The aim of this project is to develop a new method of estimating the poses of imaged scene surfaces, provided that they can be locally approximated by their tangent planes. Our approach performs an accurate direct estimation by exploiting the robustness of the scale-invariant feature transform (SIFT). The results are representative of the state of the art for this challenging task.
Retrieving the poses of keypoints in addition to matching them is an essential task in many computer-vision applications, because it transforms unconstrained problems into constrained ones. This project proposes a new method of estimating the poses of regions around keypoints, provided that they can be considered locally planar. While this has previously been attempted by adapting iterative algorithms developed for template matching, no explicit and accurate direct estimation has been introduced before. Our approach simultaneously learns the “nuisance residual” structure present in the detection and description steps of the SIFT algorithm, allowing the local perspective properties of distinctive features to be recovered through a homography. The system is trained using synthetic images generated from a single reference view of the surface.
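To make the role of the homography concrete, the following is a minimal sketch of the standard plane-induced homography between a reference view and a second view of a locally planar patch. It is not the project's training or estimation procedure. The intrinsic matrix `K`, the relative pose `R`, `t`, the plane parameters `n`, `d`, and both function names are assumptions introduced only for illustration.

```python
import numpy as np

def plane_induced_homography(K, R, t, n, d):
    """Homography taking reference-view pixels to new-view pixels.

    Assumed conventions (illustrative, not taken from the project):
      - the patch lies on the plane n . X = d in the reference camera frame,
      - a point X in the reference frame maps to R @ X + t in the new frame,
      - both views share the same intrinsic matrix K.
    """
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    # Fix the projective scale; assumes H[2, 2] is non-zero (moderate poses).
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map an (N, 2) array of pixel coordinates through H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical example values: a fronto-parallel patch 2 units away,
# viewed again after a 10-degree rotation about the y axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.1, 0.0, 0.05])
n = np.array([0.0, 0.0, 1.0])
d = 2.0

H = plane_induced_homography(K, R, t, n, d)
# Corners of a small square patch around a keypoint at pixel (320, 240).
corners = np.array([[310.0, 230.0], [330.0, 230.0],
                    [330.0, 250.0], [310.0, 250.0]])
warped_corners = apply_homography(H, corners)
print(warped_corners)
```

Under these assumptions, sampling many poses `R`, `t` and warping the reference view with the resulting homographies is one way to obtain synthetic views of a planar patch from a single image; the sampling scheme actually used for training is not specified here.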
The method produces an accurate, detailed, and fine-grained set of local poses, and it can also be applied to non-rigid surfaces. In particular, the accuracy and robustness of the method are representative of the state of the art for this challenging task. At present, we are investigating the use of the estimated homographies to build a pose-invariant descriptor for 3D face recognition.