Labs2: Camera Calibration and Pose Reconstruction
For this session, you will calibrate a camera and use the calibration data to reconstruct the skeleton of a posing model.

Calibration

In the pinhole camera model, a camera is described by two matrices: the intrinsic matrix and the extrinsic matrix. The intrinsic matrix contains the properties of the internal structure of the camera: focal length, image format and principal point. It is called K and can be written as follows, where f_u and f_v represent the focal length in terms of pixels and (u_0, v_0) are the coordinates of the principal point:

$$K = \begin{pmatrix} f_u & 0 & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{pmatrix}$$

The extrinsic matrix is related to the position of the camera in the world: it defines how to go from the world coordinate system to the camera coordinate system, and consists of a rotation matrix R and a translation vector t. Camera calibration is the process of recovering these two matrices. To do this, we will use the Bouguet Camera Calibration Toolbox, which is available here. Follow the first tutorial and compute the intrinsic matrix K of the camera, using the files im*.jpg that you can find here.

Reconstruction of a posing model

This part tackles the methods commonly used in Motion Capture. The goal is to reconstruct the 3D skeleton of our posing model from a pair of photos taken from different angles. In real Motion Capture, the model wears markers that are easy for a set of cameras to track (thanks to the material they are made of). A 3D reconstruction of all the markers is computed for each frame over a period of time, which results in an animated skeleton that can be applied to a 3D model to create realistic CG animation.

To make this part easier, we use the same camera for both shots, so that we do not have to calibrate the intrinsics of two different cameras. Note that this trick only works if the model stays absolutely still between the two shots (in reality, two different cameras would take a photo at exactly the same time). From now on we will pretend that we used two cameras.

The first step of the process is to compute the extrinsics (i.e. rotation and translation) of both cameras. This can be done easily with the toolbox, using a single photo of the checkerboard for each camera (the checkerboard, of course, must not move between the shots). The checkerboard then defines the world coordinate system: one of its corners is the origin and its sides are the unit vectors of the basis. For practical reasons, we keep the checkerboard visible in the shots of the model (so the same images can be used for calibration), but it is also possible to use independent shots of the checkerboard beforehand to perform the extrinsic calibration, as long as the positions of the cameras remain unchanged afterwards.

Once the cameras are calibrated, you will pick the 2D positions of distinctive parts of the model's body in both photos. Pick enough points to build a skeleton, for instance: the tips of both feet, the point between the legs, the collar, the elbows, the tips of the hands and the tip of the nose.
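As a minimal Matlab sketch of how the calibration results can be turned into the two 3x4 projection matrices used below (the file name Calib_Results.mat and the variable names fc, cc, Rc_1, Tc_1, Rc_2, Tc_2 are assumptions, not part of the hand-out; adapt them to whatever your calibration session actually saved):

```matlab
% Sketch: assemble one intrinsic matrix and two projection matrices from
% the calibration results. All variable names below are assumptions --
% use the names your own calibration session produced.
load('Calib_Results.mat');          % assumed to contain fc (focal) and cc (principal point)

K = [fc(1)    0   cc(1);            % f_u   0   u_0
        0  fc(2)  cc(2);            %  0   f_v  v_0
        0     0      1];            %  0    0    1   (skew ignored)

% Extrinsics of each view, computed from the checkerboard visible in the shot
% (Rc_i, Tc_i: rotation and translation from the checkerboard/world frame to camera i).
P1 = K * [Rc_1, Tc_1];              % 3x4 projection matrix of the first view
P2 = K * [Rc_2, Tc_2];              % 3x4 projection matrix of the second view
```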
Now let's see the maths behind the reconstruction process. Don't worry: the actual Matlab code that you will have to write to solve the equations fits in about ten lines. We use the following notations: X denotes the homogeneous coordinates of a 3D point, and (u, v) the measured 2D position of that point in an image.

A 3D point is projected onto the camera using the projection matrix P, defined as the product of the intrinsic matrix and the extrinsic matrix:

$$P = K \,[\,R \mid t\,] = \begin{pmatrix} P_u \\ P_v \\ P_w \end{pmatrix}$$

Note that P_u, P_v and P_w are 1x4 vectors. To project a point X onto the camera, we multiply its homogeneous coordinates by P. This gives another homogeneous vector, and we obtain the 2D position of the point by dividing by its last coordinate. We can then write:

$$P X = \begin{pmatrix} P_u X \\ P_v X \\ P_w X \end{pmatrix}, \qquad \hat u = \frac{P_u X}{P_w X}, \quad \hat v = \frac{P_v X}{P_w X}$$

Now the problem is the following: we want to find the position of the 3D point X such that, in an ideal world, the measured 2D position (u, v) and its calculated 2D projection are the same. Because there is no exact solution, we actually want to minimise the error between the measured position and the calculated projection. We define the error and develop it as follows:

$$E = (u - \hat u)^2 + (v - \hat v)^2
  = \frac{\big((u\,P_w - P_u)\,X\big)^2 + \big((v\,P_w - P_v)\,X\big)^2}{(P_w X)^2}$$

We want to minimise this error. Up to the positive factor (P_w X)^2 in the denominator (dropping it turns the geometric error into the usual algebraic error, which is what makes the problem linear), this amounts to minimising the numerator on the right-hand side, which is equivalent to minimising both of its terms, since they are both positive. This is finally equivalent to solving the following linear problem, where the superscript (i) refers to camera i:

$$A\,X = \begin{pmatrix} u^{(1)} P_w^{(1)} - P_u^{(1)} \\ v^{(1)} P_w^{(1)} - P_v^{(1)} \\ \vdots \\ u^{(n)} P_w^{(n)} - P_u^{(n)} \\ v^{(n)} P_w^{(n)} - P_v^{(n)} \end{pmatrix} X = 0$$

Each camera brings two lines to this matrix, which is therefore 2n x 4. To have a fully constrained system, we need at least two cameras: this is why the 3D position of a point can only be reconstructed from a minimum of two different views. To solve this system we use the SVD decomposition of the matrix A:

$$A = U\,S\,V^{T}$$

The position of the 3D point corresponds to the last column of V (the singular vector associated with the smallest singular value). Be careful: it is still in homogeneous coordinates. Repeat this process for each point. Once you have the positions of all the points, you can use the method joinPoints to draw segments between pairs of points in 3D and build your final skeleton.
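A minimal sketch of the triangulation step described above, assuming P1 and P2 are the 3x4 projection matrices of the two views and x1 = [u1; v1], x2 = [u2; v2] are the 2D positions of the same body point clicked in each photo:

```matlab
function Xw = triangulate(P1, x1, P2, x2)
% TRIANGULATE  Reconstruct a 3D point from its 2D positions in two views.
%   P1, P2: 3x4 projection matrices; x1, x2: [u; v] measured in each photo.
    A = [x1(1)*P1(3,:) - P1(1,:);    % u1*Pw - Pu  (camera 1)
         x1(2)*P1(3,:) - P1(2,:);    % v1*Pw - Pv  (camera 1)
         x2(1)*P2(3,:) - P2(1,:);    % u2*Pw - Pu  (camera 2)
         x2(2)*P2(3,:) - P2(2,:)];   % v2*Pw - Pv  (camera 2)
    [~, ~, V] = svd(A);              % A = U*S*V'
    X  = V(:, end);                  % homogeneous solution: last column of V
    Xw = X(1:3) / X(4);              % convert back to Cartesian coordinates
end
```

Calling this once per clicked landmark gives the 3D points of the skeleton; they can then be passed to the provided joinPoints helper (whose exact signature is defined by the files you downloaded) to draw the segments.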
Guidelines:

You can download the different files that you will need here. The squares of the checkerboard are 28 mm x 28 mm.

Update: