UC-Berkeley 24FA CV Project 4a & 4b:
Creating panorama images by detecting correspondences, projectively warping, resampling, and compositing them together.
A simpler reproduction of Multi-Image Matching using Multi-Scale Oriented Patches
Project 4a: Manual Image Panorama
Creating panoramas by manually registering corresponding points
Shoot and Digitize pictures
First, we capture some images around Memorial Glade at Berkeley.
These images share the same center of projection (COP) but have different viewing directions.
Then we define the following corresponding points manually:
Recover Homographies
The transform between the images is a projective transform with 8 degrees of freedom. This can be written as: \[ H \mathbf{p} = \mathbf{p}' \] where: \[ H = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{pmatrix} \]
Expanding it out would get: \[ \begin{aligned} w'x' &= a x + b y + c \\ w'y' &= d x + e y + f \\ w' &= g x + h y + 1 \end{aligned} \]
Further expansion (into eight separate variables and a linear system) will result: \[ \begin{pmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 x_1' & -y_1 x_1' \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 y_1' & -y_1 y_1' \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2 x_2' & -y_2 x_2' \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -x_2 y_2' & -y_2 y_2' \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_N & y_N & 1 & 0 & 0 & 0 & -x_N x_N' & -y_N x_N' \\ 0 & 0 & 0 & x_N & y_N & 1 & -x_N y_N' & -y_N y_N' \end{pmatrix} \cdot \begin{pmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{pmatrix} = \begin{pmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ \vdots \\ x_N' \\ y_N' \end{pmatrix} \]
Thus, we use the least squares method to solve this over-determined system:
# Solve the normal equations directly:
h = np.linalg.inv(A.T @ A) @ (A.T @ b)
# or, equivalently, use numpy's least-squares solver:
h, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
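For reference, here is a minimal sketch of building the linear system above and solving it with least squares. The function name and the (N, 2) point format are illustrative assumptions, not the project's exact code.

import numpy as np

def compute_homography(pts1, pts2):
    # pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    # Builds the 2N x 8 matrix A and the vector b of the linear system above.
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    A, b = np.array(A), np.array(b)
    # Least-squares solution of A h = b, then reshape into the 3x3 homography
    h, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1).reshape(3, 3)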
Warp the Images
With the projection matrix, we may warp any image 1 onto image 2. We implement this by:
- First, project the four corners of image 1 onto image 2, defining the area that image 1 will be projected onto.
- Second, rasterize this area, covering all pixels that image 1 will be projected onto.
- Third, inversely project every pixel in the area back to image 1 and sample it bilinearly. This gives us the projected image.
Remark: using scipy.interpolate.griddata is too slow here since it has to query the whole point set. Hence we implement bilinear interpolation ourselves and vectorize it with numpy operations.
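Below is a minimal sketch of such an inverse warp with vectorized bilinear sampling, assuming H maps image-1 coordinates to the output canvas and img is a float (H, W, C) array; all names are illustrative.

import numpy as np

def inverse_warp(img, H, out_shape):
    # img: float (H, W, C) source image; H: 3x3 homography from img to the
    # output canvas; out_shape: (height, width) of the output canvas.
    H_inv = np.linalg.inv(H)
    h_out, w_out = out_shape
    # All output pixel coordinates in homogeneous form, shape (3, N)
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Inversely project output pixels back into the source image
    src = H_inv @ coords
    sx, sy = src[0] / src[2], src[1] / src[2]
    # Bilinear interpolation: integer neighbors and fractional weights
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = sx - x0, sy - y0
    h_in, w_in = img.shape[:2]
    valid = (x0 >= 0) & (y0 >= 0) & (x1 < w_in) & (y1 < h_in)
    x0, x1 = np.clip(x0, 0, w_in - 1), np.clip(x1, 0, w_in - 1)
    y0, y1 = np.clip(y0, 0, h_in - 1), np.clip(y1, 0, h_in - 1)
    val = (img[y0, x0] * ((1 - wx) * (1 - wy))[:, None]
           + img[y0, x1] * (wx * (1 - wy))[:, None]
           + img[y1, x0] * ((1 - wx) * wy)[:, None]
           + img[y1, x1] * (wx * wy)[:, None])
    out = np.zeros((h_out, w_out) + img.shape[2:])
    out.reshape(-1, img.shape[2])[valid] = val[valid]
    return out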
Here is our example result:
Image Rectification
According to the project requirements, we perform "rectification" on an image to check that the projective warp works correctly.
Here is an example:
Here we require the iPad in the scene to appear rectangular (and hence define corresponding points at its four corners).
Another example:
Image Blending
To avoid weird edge artifacts (usually caused by simply overlaying one image on another),
we develop a method to blend images together with a seamless transition.
First, we generate a mask as follows.
Second, we blend images together according to the mask's values.
For pixels covered only by the middle image, we use its original value.
For pixels covered by multiple images, we set the result to
mask * middle_value + (1 - mask) * avg(other_values)
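A minimal sketch of this blending rule for the two-image case follows; all arrays are assumed to already be warped onto the same canvas, and names such as cov_mid are illustrative.

import numpy as np

def blend_pair(middle, other, mask, cov_mid, cov_other):
    # middle, other: float (H, W, C) images on the same canvas
    # mask: (H, W) weights in [0, 1]; cov_*: boolean coverage masks
    out = np.zeros_like(middle)
    only_mid = cov_mid & ~cov_other     # covered by the middle image only
    only_other = cov_other & ~cov_mid   # covered by the other image only
    both = cov_mid & cov_other          # overlap: weighted combination
    out[only_mid] = middle[only_mid]
    out[only_other] = other[only_other]
    w = mask[both][:, None]             # broadcast the mask over color channels
    out[both] = w * middle[both] + (1 - w) * other[both]
    return out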
Here is an example result:
Ablation study:
Top: w/o blending; Bottom: with blending
Project 4a: Result Gallery
Bancroft St.
Oxford St.
My Home
Project 4b: Automatic Image Panorama
Here we derive a method to detect corresponding points automatically.
Thus, the image panorama process will be completely automatic, without manually registering points.
Harris Corner Detector
We implement the Harris corner detector, a classic method for finding corners in an image.
The basic idea of the Harris corner detector is to slide a window over the image and measure the change in pixel values.
First, we define the change in appearance of window W for the shift \([u, v]\) as follows: \[ E(u, v) = \sum_{(x, y) \in W}{\left[I(x + u, y + v) - I(x, y)\right]^2} \]
Then, we use a First-order Taylor approximation for small motions \([u, v]\): \[ \begin{aligned} I(x+u, y+v) &= I(x, y) + I_x u + I_y v + \text{higher order terms} \\ &\approx I(x, y) + I_x u + I_y v \\ &= I(x, y) + \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} \end{aligned} \] Plugging this into \(E(u, v)\), we would get: \[ \begin{aligned} E(u, v) &= \sum_{(x, y) \in W} \left[I(x+u, y+v) - I(x, y)\right]^2 \\ &\approx \sum_{(x, y) \in W} \left[I(x, y) + \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} - I(x, y)\right]^2 \\ &= \sum_{(x, y) \in W} \left(\begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}\right)^2 \\ &= \sum_{(x, y) \in W} \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} \end{aligned} \]
This gives us the second moment matrix \(M\), which approximates the local intensity change of the image: \[ M = \sum_{(x, y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \]
Therefore, we may use this quadratic form's eigenvalues and eigenvectors to describe a rotation-invariant corner strength.
Here, we use the following function value as the "corner strength": \[ R = \det(M) - k \cdot \text{tr}(M)^2 \] where \(\det(M) = \lambda_1 \lambda_2\) and \(\text{tr}(M) = \lambda_1 + \lambda_2\).
Note that \(k\) is an empirical value, usually in \([0.04, 0.06]\). The larger this value, the fewer corners are detected, and vice versa.
Summary:
- Compute Gaussian derivatives at each pixel
- Compute second moment matrix M in a Gaussian window around each pixel
- Compute the corner response function R and threshold it (see the sketch below)
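A minimal sketch of these three steps using Gaussian derivatives and Gaussian windowing; sigma and k are illustrative defaults, not the project's tuned values.

import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(im, sigma=1.0, k=0.05):
    # im: grayscale float image
    # 1) Gaussian derivatives at each pixel
    Ix = gaussian_filter(im, sigma, order=(0, 1))
    Iy = gaussian_filter(im, sigma, order=(1, 0))
    # 2) Entries of the second moment matrix M, summed in a Gaussian window
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # 3) Corner response R = det(M) - k * tr(M)^2 (threshold it afterwards)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2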
Example Results:
Adaptive Non-Maximal Suppression
Since the computational cost of matching is superlinear in the number of interest points, it is desirable to restrict the number of interest points extracted from each image. Also, we wish the interest points to be spatially well distributed. Hence, we implement ANMS (Adaptive Non-Maximal Suppression) to prune points.
Method: Interest points are suppressed based on the corner strength \(R\), and only those that are a maximum in a neighborhood of radius \(r\) pixels are retained.
In practice, we robustify the non-maximal suppression by requiring that a neighbor has a sufficiently larger strength. Thus the minimum suppression radius \(r_i\) is given by: \[ r_i = \min_{j} \lVert x_i - x_j \rVert, \quad \text{s.t. } f(x_i) < c_{\text{robust}} f(x_j), \; x_j \in \mathcal{I} \] where \(x_i\) is a 2D interest point image location and \(\mathcal{I}\) is the set of all interest point locations.
Note that \(c_{\text{robust}}\) is an empirical value, usually taken as \(0.9\).
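A brute-force O(N^2) sketch of ANMS follows; the number of retained points, n_keep, is an illustrative parameter.

import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    # coords: (N, 2) interest point locations; strengths: (N,) corner responses
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # Neighbors strong enough to suppress point i
        stronger = strengths[i] < c_robust * strengths
        if np.any(stronger):
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()
    # Keep the n_keep points with the largest suppression radii
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]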
Example results:
Left: w/o pruning; Right: with ANMS pruning
Feature Descriptor Extraction
For each retained interest point, we extract a feature descriptor. This facilitates the matching process by providing more information than a single pixel's R value.
In this project, we simply extract an 8×8 axis-aligned patch, downsampled from a 40×40 window centered at each interest point.
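A sketch of this patch extraction is below. It assumes the 40×40 window lies fully inside a grayscale image; the anti-aliasing blur and the bias/gain normalization are standard MOPS-style choices assumed here for illustration, not quoted from the project code.

import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(im, x, y, window=40, size=8):
    # im: grayscale float image; (x, y): integer interest point location
    half, step = window // 2, window // size
    patch = im[y - half:y + half, x - half:x + half]
    # Blur before subsampling the 40x40 window down to 8x8 to avoid aliasing
    patch = gaussian_filter(patch, sigma=step / 2)
    desc = patch[::step, ::step]
    # Bias/gain normalization: zero mean, unit variance
    return (desc - desc.mean()) / (desc.std() + 1e-8)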
Here are some example results:
Feature Matching
Here, we compute an error metric for each candidate match and then threshold it to keep correctly matched points.
Specifically, we use the ratio of the distance to the closest match over the distance to the second-closest match (1NN/2NN).
This ratio is thresholded at the empirical value 0.67, according to the following graph. Here is an example after pruning points.
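A brute-force sketch of this ratio test, assuming the descriptors are flattened into (N, 64) vectors; the threshold is the empirical value above.

import numpy as np

def match_features(desc1, desc2, ratio=0.67):
    # desc1: (N1, D), desc2: (N2, D) flattened descriptors
    # Pairwise distances between every descriptor in image 1 and image 2
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(d[i])
        nn1, nn2 = d[i, order[0]], d[i, order[1]]
        # Keep the match only if the 1NN is much closer than the 2NN
        if nn1 / (nn2 + 1e-8) < ratio:
            matches.append((i, order[0]))
    return matches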
RANSAC Homography Estimation
As can be seen from the image, there are still some outliers that are incorrectly matched. Hence we use RANSAC (RANdom SAmple Consensus) to compute a robust transform matrix.
RANSAC iterates many times; in each iteration, it randomly samples a few correspondences (four suffice for a homography) and computes a homography matrix from them. The algorithm then counts how many points projected by this homography are correct, meaning they land close to their matched points in the other image. The homography with the largest set of such inliers is kept and re-estimated from all of its inliers.
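A sketch of this loop, reusing the compute_homography sketch from Project 4a above; n_iters and the inlier threshold eps are illustrative parameters.

import numpy as np

def ransac_homography(pts1, pts2, n_iters=1000, eps=2.0):
    # pts1, pts2: (N, 2) arrays of matched points; returns a homography pts1 -> pts2
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        # Sample 4 correspondences and fit an exact homography to them
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])
        # Count inliers: projected points that land close to their matches
        p = np.hstack([pts1, np.ones((len(pts1), 1))]) @ H.T
        proj = p[:, :2] / p[:, 2:3]
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Re-estimate with least squares on the largest inlier set
    return compute_homography(pts1[best_inliers], pts2[best_inliers])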
Here is an example after calculating a robust transform matrix by RANSAC:
With the homography matrix, we can repeat the steps from Project 4a to produce a completely automatic image panorama!
Result Comparison
Left: panorama with automatic correspondence detection; Right: panorama with manually registered points
Remark: Due to errors in manually registering points, the last image on the right is incorrect, while the automatic panorama's result is correct.
Project 4b: Result Gallery
Sather Gate
Somewhere near RSF
Lower plaza
Summary and Takeaways
- From what HAS been implemented:
- Start from simple ideas. Corner detection is a hard job, but detecting the change in a small sliding window is not.
- Do it inversely. Forward projection makes it hard to land on integer grid points, but inverse sampling makes it much simpler and parallelizable.
- From what has NOT been implemented:
- Make it multi-resolution. If you can't detect corners at multiple scales, run the detection multiple times with different window sizes.
- Use transform-invariant descriptors, for example eigenvalues; or align the descriptor patch with the corner's dominant orientation instead of the image axes.