[Week 1] Research Project Journey - 6-Degree-of-Freedom Object Pose Estimation
After working on various projects during my Ph.D., I decided to write down my experience on a new project I started a few weeks ago and share it publicly.

A brief introduction about myself: I'm a 5th-year Ph.D. student in Computer Science at Penn. Throughout my Ph.D., I've worked on different machine perception problems, from tomography estimation and depth estimation to object detection. I recently published a paper on multi-agent localization using pairwise measurements. I would describe myself as an applied science researcher: I enjoy working on projects that aim to equip robots and intelligent agents with the capability to sense the world using cameras and LiDARs. This time, my goal is to estimate the 6-DoF pose of an object from monocular RGB images.
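
To make the goal concrete: a 6-DoF pose is a rigid transform with 3 rotation degrees of freedom plus 3 translation degrees of freedom, mapping points from the object frame into the camera frame; the challenge is recovering it from a single RGB image. Here is a minimal sketch of how such a pose, together with camera intrinsics, projects an object point into the image. All numbers (intrinsics, pose, point) are made-up placeholders, purely for illustration.

```python
import numpy as np

# A 6-DoF pose: rotation matrix R (3 DoF) + translation vector t (3 DoF).
# Given pinhole intrinsics K, the pose maps a 3D point on the object model
# to a pixel in the monocular RGB image.

K = np.array([[600.0,   0.0, 320.0],   # assumed focal lengths / principal point
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

R = np.eye(3)                          # rotation (object frame -> camera frame)
t = np.array([0.10, -0.05, 1.20])      # translation in meters

X_obj = np.array([0.02, 0.01, 0.00])   # a 3D point on the object model

X_cam = R @ X_obj + t                  # express the point in the camera frame
u, v, w = K @ X_cam                    # homogeneous image coordinates
print(u / w, v / w)                    # pixel where the point projects
```

Pose estimation is essentially the inverse of this projection: given the image (and a model of the object), recover R and t.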
This week, I spent time reading a few papers, including:
- [1] He, Yisheng, et al. “PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
- [2] Li, Zhigang, et al. “Robust RGB-based 6-DoF Pose Estimation without Real Pose Annotations.” arXiv preprint arXiv:2008.08391 (2020).
- [3] Wu, Di, et al. “6D-VNet: End-to-End 6-DoF Vehicle Pose Estimation from Monocular RGB Images.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2019.
I found that Wu et al. [3] looks closest to the problem I want to solve. Therefore, I read it in more detail and started cloning their codebase to experiment with it. I'll post an update on the experiments later.