Image Matching & Colorizing

 2024/09/09 

UC-Berkeley 24FA CV Project 1: Colorizing the Images of the Russian Empire

RoadMap

This project involves two parts:
1. Colorize the image by aligning the mismatched RGB channels separately.
2. Fix the color issues with Auto-Contrasting & White-Balancing.

Background

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a man well ahead of his time. Convinced, as early as 1907, that color photography was the wave of the future, he won the Tzar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw, including the only color portrait of Leo Tolstoy. And he really photographed everything: people, buildings, landscapes, railroads, bridges... thousands of color pictures!
His idea was simple: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter. Never mind that there was no way to print color photographs until much later -- he envisioned special projectors to be installed in "multimedia" classrooms all across Russia where the children would be able to learn about their vast country. Alas, his plans never materialized: he left Russia in 1918, right after the revolution, never to return again.
Luckily, his RGB glass plate negatives, capturing the last years of the Russian Empire, survived and were purchased in 1948 by the Library of Congress. The LoC has recently digitized the negatives and made them available on-line.

Image Aligning

Preprocessing: Removing Borders

We start by computing the mean pixel value of every row and of every column, which gives an [H, 1] matrix of row means and a [1, W] matrix of column means.
Next, we threshold these means to remove the outermost white border and the black border just inside it.
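The border removal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function name `crop_borders`, the thresholds `black_thresh` and `white_thresh`, and the restriction of the search to an outer band `max_frac` are all hypothetical choices, and the image is assumed to be a 2-D float array in [0, 1].

```python
import numpy as np

def crop_borders(img, black_thresh=0.15, white_thresh=0.85, max_frac=0.1):
    """Crop white/black border rows and columns from a 2-D image in [0, 1].

    All parameter names and default values are illustrative assumptions.
    """
    row_mean = img.mean(axis=1)  # one mean per row (the [H, 1] profile)
    col_mean = img.mean(axis=0)  # one mean per column (the [1, W] profile)

    def bounds(means):
        n = means.size
        limit = int(n * max_frac)  # only inspect the outer band on each side
        border = (means > white_thresh) | (means < black_thresh)
        head = np.flatnonzero(border[:limit])
        tail = np.flatnonzero(border[n - limit:])
        # Start just past the innermost border line found in the leading band.
        start = head[-1] + 1 if head.size else 0
        # Stop at the innermost border line found in the trailing band.
        stop = n - limit + tail[0] if tail.size else n
        return start, stop

    top, bottom = bounds(row_mean)
    left, right = bounds(col_mean)
    return img[top:bottom, left:right]
```

Restricting the search to an outer band is one way to keep a genuinely dark or bright row inside the picture from being mistaken for a border; the right band width and thresholds would depend on the scans.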

Single-Layer Image Aligning

First, we align low-resolution images by exhaustively searching over a window of possible displacements, scoring each one with an image matching metric, and taking the displacement with the best score. This is brute force, but at low resolution the computation time is acceptable.
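A minimal sketch of such an exhaustive search is shown below. The names `align_exhaustive` and `neg_ssd`, the default `radius`, and the `margin` crop are assumptions made for illustration; the scoring function is a parameter, and the negative sum of squared differences is only a placeholder metric. Shifts are reported as (dx, dy), which is also an assumed convention.

```python
import numpy as np

def neg_ssd(a, b):
    """Placeholder score (higher is better): negative sum of squared differences."""
    return -np.sum((a - b) ** 2)

def align_exhaustive(moving, ref, radius=15, score=neg_ssd, margin=0.1):
    """Return the (dx, dy) shift of `moving` that best matches `ref`.

    Both inputs are 2-D float arrays of the same shape.
    """
    h, w = ref.shape
    mh, mw = int(h * margin), int(w * margin)  # ignore edges when scoring
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps around, so only the interior crop is scored.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            s = score(shifted[mh:h - mh, mw:w - mw], ref[mh:h - mh, mw:w - mw])
            if s > best_score:
                best_shift, best_score = (dx, dy), s
    return best_shift
```

The cost is (2 * radius + 1)^2 score evaluations, each over the whole cropped image, which is why this only stays cheap at low resolution.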
A good metric for assessing the similarity of two images is crucial. The traditional Euclidean distance is a poor choice, since different channels don't necessarily share features such as average brightness or variance.
Here we choose SSIM as our assessment function. SSIM is defined as the product of three terms: luminance, contrast, and structure. The following figure illustrates why SSIM is a better choice than Euclidean distance:


Figure 1: equal-MSE hypersphere from cns.nyu.edu/~lcv/ssim/

\[ \text{SSIM}(x, y) = l(x, y)^\alpha \cdot c(x, y)^\beta \cdot s(x, y)^\gamma \]

\[ \begin{aligned} l(x, y) &= \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \\ c(x, y) &= \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \\ s(x, y) &= \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \end{aligned} \]

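Here \(\mu_x, \mu_y\) are the means, \(\sigma_x, \sigma_y\) the standard deviations, and \(\sigma_{xy}\) the covariance of the two images.

Remark: The constants \(C_1, C_2, C_3\) are small positive numbers introduced to avoid division by zero when the means or variances are close to zero.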
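The formulas above can be sketched in NumPy as follows. This is a simplified, single-window version that computes the statistics over the whole image, with \(\alpha = \beta = \gamma = 1\); SSIM is usually averaged over local windows instead. The function name `ssim_global` is hypothetical, and the constants follow the commonly used defaults \(C_1 = (0.01L)^2\), \(C_2 = (0.03L)^2\), \(C_3 = C_2/2\) for dynamic range \(L\), which may differ from what this project used.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM of two same-shaped images (alpha = beta = gamma = 1)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast term
    c3 = c2 / 2                  # stabilizes the structure term

    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()  # covariance

    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    con = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    struct = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return lum * con * struct
```

Because the score is higher for more similar images, it can be passed directly to a search that maximizes its metric.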

Multiscale Image Aligning

Although the basic method works for low-resolution images, it becomes impractical for larger images: the search window has to grow with the image, and the number of candidate displacements grows as \(O(n^2)\) in the window radius \(n\), with every candidate scored over many more pixels. To address this, we use an image pyramid approach, which combines both low-resolution (down-sampled) and high-resolution (original) versions of the image. This allows us to handle the problem iteratively: we first align the smallest image, then transfer the resulting offsets to progressively larger images, aligning each one in turn. Specifically, this approach uses a stack structure.

Initially, a while loop pushes images, each halved in size from the previous one, onto the stack.
A second loop then pops the top image (the smallest one remaining) from the stack and aligns it in each iteration.
The resulting offset is then doubled and used as the starting offset for the next higher-resolution image in the subsequent loop iteration.
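The stack-based procedure above can be sketched as follows. This is an illustration under assumptions rather than the project's code: the names `downsample2`, `search`, and `align_pyramid`, the parameters `min_size`, `coarse_radius`, and `fine_radius`, the 2x2 block averaging, and the placeholder mean-squared-error score are all hypothetical choices.

```python
import numpy as np

def downsample2(img):
    """Halve each dimension by averaging 2x2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(moving, ref, center, radius):
    """Exhaustive search around `center`; returns the best (dx, dy)."""
    cx, cy = center
    best, best_score = center, -np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            s = -np.mean((shifted - ref) ** 2)  # placeholder metric
            if s > best_score:
                best, best_score = (dx, dy), s
    return best

def align_pyramid(moving, ref, min_size=32, coarse_radius=8, fine_radius=2):
    """Coarse-to-fine alignment; returns the full-resolution (dx, dy)."""
    # First loop: push progressively halved image pairs onto the stack.
    stack = [(moving, ref)]
    while min(stack[-1][1].shape) // 2 >= min_size:
        m, r = stack[-1]
        stack.append((downsample2(m), downsample2(r)))

    # Second loop: pop the smallest pair, align, and pass the offset upward.
    shift, radius = (0, 0), coarse_radius
    while stack:
        m, r = stack.pop()
        shift = search(m, r, shift, radius)
        if stack:  # a larger level remains, so double the offset for it
            shift = (2 * shift[0], 2 * shift[1])
        radius = fine_radius  # finer levels only need a small correction
    return shift
```

Only the coarsest level searches a wide window; every finer level refines the doubled offset within a few pixels, so the total work stays close to that of a single low-resolution search.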


Figure 2: image pyramid from Pyramid (image processing) - Wikipedia

Interim Result Gallery

Gshift: (23, 49) Rshift: (41, 106)	Gshift: (2, 5) Rshift: (3, 12)	Gshift: (4, 25) Rshift: (-4, 58)
Gshift: (17, 59) Rshift: (14, 123)	Gshift: (17, 40) Rshift: (23, 90)	Gshift: (9, 55) Rshift: (12, 118)
Gshift: (10, 81) Rshift: (13, 177)	Gshift: (2, -3) Rshift: (2, 3)	Gshift: (27, 51) Rshift: (37, 108)
Gshift: (-11, 33) Rshift: (-27, 140)	Gshift: (29, 78) Rshift: (37, 175)	Gshift: (14, 52) Rshift: (11, 111)
Gshift: (3, 3) Rshift: (3, 7)	Gshift: (5, 41) Rshift: (31, 85)

Figure 3: My interim results with multi-layer matching.
The G and R values below each image are the shift values of the green and red channels.

Post-Processing on Images

Auto White-Balancing

In this project, we implement the gray world algorithm, a classic auto white balance algorithm that estimates the illuminant of an image by assuming that the average reflectance of surfaces in the world is achromatic, or gray. Each channel is then scaled so that its mean matches a common gray level.

Remark: when calculating the mean values, the outer 10% of the image on each side is excluded to avoid potential bias from the damaged borders.
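A minimal gray world sketch, including the border exclusion from the remark, might look like this. The function name `gray_world`, the parameter `border_frac`, and the choice of the mean of the three channel means as the target gray level are assumptions; the image is assumed to be an H x W x 3 float array in [0, 1].

```python
import numpy as np

def gray_world(img, border_frac=0.1):
    """Gray world white balance for an H x W x 3 float image in [0, 1]."""
    h, w, _ = img.shape
    bh, bw = int(h * border_frac), int(w * border_frac)
    inner = img[bh:h - bh, bw:w - bw]  # exclude the outer border from the statistics

    channel_means = inner.reshape(-1, 3).mean(axis=0)
    gray = channel_means.mean()  # target level shared by all channels
    gains = gray / np.maximum(channel_means, 1e-8)  # guard against empty channels

    # The gains are estimated on the interior but applied to the whole image.
    return np.clip(img * gains, 0.0, 1.0)
```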

Auto Contrasting

This is a straightforward algorithm. We take the minimum and maximum values \(\min, \max\) over all three channels together and linearly remap the entire image to \([0, 255]\). It’s important to note that we do not apply auto-contrasting separately to each channel, as doing so would disrupt the white balance established in the Auto White-Balancing step.
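The remapping can be sketched as below, using a single global minimum and maximum so that all channels receive the same linear transform. The function name `auto_contrast` and the handling of a completely flat image are assumptions made for illustration.

```python
import numpy as np

def auto_contrast(img):
    """Linearly stretch an H x W x 3 image (0-255 scale) to the full [0, 255] range."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()  # global extremes over all three channels
    if hi == lo:
        # Flat image: nothing to stretch, return it unchanged.
        return np.clip(np.round(img), 0, 255).astype(np.uint8)
    out = (img - lo) / (hi - lo) * 255.0  # same transform for every channel
    return np.round(out).astype(np.uint8)
```

For example, an image whose values span 50 to 200 maps 50 to 0 and 200 to 255, while the ordering and relative spacing of the channels are preserved.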

Ablation Study

Figure 4: Auto-contrasting & white-balancing (left: off; right: on)

Figure 5: Auto-contrasting & white-balancing (top: off; bottom: on)

In comparison, the unprocessed images (left in Figure 4, top in Figure 5) are overly blue, green, or red. The auto-contrasting and white-balancing algorithms make slight adjustments that improve their appearance.