Assignment-3: Two-view reconstruction #10
Can you please explain the construction of the normalization transform matrix T? Currently, with the given definition of T, it seems like we are shrinking the image coordinates into the range [-sqrt(2), sqrt(2)] instead of [-1, 1]. Could you explain why we are doing this? Moreover, the scaling factor is the same along the x and y directions, but if we are normalizing the points it should arguably be different for x and y, possibly the average width and height of the image. Could you also explain why we are taking the Euclidean distance of the image points? TIA
Good questions.
This normalization matrix first translates all the image coordinates so that they are centered around the origin, and then applies a scaling so that the average distance of a point from the origin is sqrt(2). This means an "average" point is roughly equal to (1, 1, 1) in homogeneous coordinates. This is desirable because each entry in the A matrix will then have a similar magnitude, and since DLT effectively minimizes the algebraic error built from A, adjusting any entry has a similar effect on the solution instead of the result being skewed by a few large entries. This makes the algorithm more numerically stable. The point is explained in much more detail in this paper by Hartley.
In the paper he also shows that applying a non-isotropic scaling (different factors for x and y directions) actually has little effect on the results.
I don't have a good answer for why we take the L2 norm rather than any other norm.
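For concreteness, here is a minimal MATLAB sketch of building such a normalization transform, following the description above (center at the centroid, then scale so the mean distance from it is sqrt(2)). Variable names are illustrative, not reference code; note that the T specified in the assignment below uses the factor 1.44 and the mean distance from the origin, so adapt accordingly.

```matlab
% pts: N x 2 matrix of image points (x, y)
mu = mean(pts, 1);                        % centroid of the points
d  = mean(sqrt(sum((pts - mu).^2, 2)));   % mean distance from the centroid
s  = sqrt(2) / d;                         % scale so the mean distance becomes sqrt(2)

T = [s, 0, -s*mu(1);
     0, s, -s*mu(2);
     0, 0, 1];

% Apply to homogeneous points: each normalized point is T * [x; y; 1]
pts_h    = [pts, ones(size(pts, 1), 1)]'; % 3 x N homogeneous coordinates
pts_norm = T * pts_h;
```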
Task:
To reconstruct the sparse structure of a scene from two given images of it.
Steps:
Feature extraction and matching:
Detect interest points in both images, extract descriptors (using SIFT/SURF) for each of them, and match them to form two corresponding sets of points. Existing library implementations such as this one can be used for this step.
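One possible way to get started with MATLAB's Computer Vision Toolbox is sketched below (file names are placeholders; any equivalent library works just as well):

```matlab
I1 = rgb2gray(imread('view1.jpg'));   % placeholder file names
I2 = rgb2gray(imread('view2.jpg'));

% Detect interest points and extract descriptors (SURF here; SIFT also works)
pts1 = detectSURFFeatures(I1);
pts2 = detectSURFFeatures(I2);
[f1, vpts1] = extractFeatures(I1, pts1);
[f2, vpts2] = extractFeatures(I2, pts2);

% Match descriptors to form two corresponding point sets
indexPairs = matchFeatures(f1, f2);
matched1 = vpts1(indexPairs(:, 1)).Location;   % N x 2 pixel coordinates
matched2 = vpts2(indexPairs(:, 2)).Location;
```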
Motion estimation:
From the two sets of corresponding points, estimate the fundamental matrix between the two views using the normalized eight-point algorithm. Implement the algorithm within a RANSAC scheme to take care of outliers from the previous step. Remember to normalize the image points and then to 'de-normalize' the estimated fundamental matrix. Use T = [1.44/d, 0, -1.44/d * mu(0); 0, 1.44/d, -1.44/d * mu(1); 0, 0, 1] where d is the mean distance of all the image points from the origin, and mu is the mean of all the points.
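For reference, a minimal sketch of the normalized eight-point step itself is given below; the RANSAC loop (repeatedly sampling eight correspondences, estimating F, and counting inliers by epipolar distance) would wrap around it and is omitted. Here `x1`, `x2` are N x 2 matched points and `T1`, `T2` are the normalization transforms built as specified above; names are illustrative, not provided code.

```matlab
function F = eightPoint(x1, x2, T1, T2)
% x1, x2 : N x 2 corresponding image points (N >= 8)
% T1, T2 : 3 x 3 normalization transforms for each view

n  = size(x1, 1);
p1 = (T1 * [x1, ones(n, 1)]')';   % normalized homogeneous points, N x 3
p2 = (T2 * [x2, ones(n, 1)]')';

% Each correspondence x2' * F * x1 = 0 gives one row of A with A * f = 0,
% where f stacks the entries of F row by row.
A = [p2(:,1).*p1(:,1), p2(:,1).*p1(:,2), p2(:,1), ...
     p2(:,2).*p1(:,1), p2(:,2).*p1(:,2), p2(:,2), ...
     p1(:,1),          p1(:,2),          ones(n, 1)];

[~, ~, V] = svd(A);
F = reshape(V(:, end), 3, 3)';    % smallest singular vector, reshaped to 3 x 3

% Enforce the rank-2 constraint
[U, S, V] = svd(F);
S(3, 3) = 0;
F = U * S * V';

% De-normalize so that F acts on the original image coordinates
F = T2' * F * T1;
end
```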
Then convert the fundamental matrix into an essential matrix using the provided calibration matrix, and decompose it into the relative rotation and translation using any existing implementation (Hartley & Zisserman's method). A Matlab implementation of the decomposition is also provided.
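Since an implementation of the decomposition is provided, the following is only an illustrative sketch of the standard Hartley & Zisserman recipe, assuming both views share the provided calibration matrix K; the cheirality check that picks the correct candidate pose is omitted.

```matlab
E = K' * F * K;                  % essential matrix from F and the calibration K

[U, ~, V] = svd(E);
W = [0 -1 0; 1 0 0; 0 0 1];

R1 = U * W  * V';                % two candidate rotations
R2 = U * W' * V';
if det(R1) < 0, R1 = -R1; end    % ensure proper rotations
if det(R2) < 0, R2 = -R2; end

t = U(:, 3);                     % translation up to sign (and scale)

% The four candidate poses are (R1, t), (R1, -t), (R2, t), (R2, -t);
% the correct one places the triangulated points in front of both cameras.
```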
Triangulation:
Once the relative motion (orientation) has been estimated, implement linear triangulation to estimate the 3D position of each pair of corresponding points.
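A minimal sketch of linear (DLT) triangulation for a single correspondence is shown below, assuming projection matrices such as P1 = K*[I | 0] and P2 = K*[R | t]; the function name and arguments are illustrative.

```matlab
function X = triangulatePoint(x1, x2, P1, P2)
% x1, x2 : 2 x 1 image points in the two views (pixel coordinates)
% P1, P2 : 3 x 4 camera projection matrices
% X      : 3 x 1 estimated 3D point

% Each view contributes two rows of the form x*(p3' X) - p1' X = 0, etc.
A = [x1(1) * P1(3,:) - P1(1,:);
     x1(2) * P1(3,:) - P1(2,:);
     x2(1) * P2(3,:) - P2(1,:);
     x2(2) * P2(3,:) - P2(2,:)];

[~, ~, V] = svd(A);
Xh = V(:, end);           % homogeneous solution (smallest singular vector)
X  = Xh(1:3) / Xh(4);     % de-homogenize
end
```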
Files:
The accompanying files can be found here. MATLAB (with the Computer Vision Toolbox) is recommended for this assignment since it is easier to get started with.
Deliverables:
Email your results to karnikram@gmail.com and ansariahmedjunaid@gmail.com.
Deadline:
Wednesday, the 29th.
Use this thread for any doubts you might have.