ColorAlignNet: a reference-based video colorization network with temporal aggregation
ZHU Wenzhi, WANG Tong
Journal of Donghua University (English Edition), 2026, Vol. 43, Issue 2: 94-102.
Video colorization is an important technique for breathing life back into old movies. Current colorization methods work well on still images and low-motion video, but they often struggle with complex dynamic scenes. To address this problem, this study proposes ColorAlignNet, a reference-based video colorization network with temporal aggregation. The network uses source-reference attention to propagate color information from reference frames to grayscale frames, which improves color accuracy, and uses deformable convolution to align the features of adjacent frames, which enhances temporal consistency. A cyclic transformer module then reconstructs the final prediction. Experiments on the DAVIS and Videvo datasets show that ColorAlignNet outperforms other state-of-the-art methods on both the learned perceptual image patch similarity (LPIPS) and color distribution consistency (CDC) metrics.
deformable convolution / video colorization / Swin-transformer
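
The abstract describes source-reference attention only at a high level. As a rough illustration of the general idea (not the authors' implementation), the sketch below propagates chrominance values from reference-frame positions to grayscale-frame positions through a softmax over feature similarity. The function name `source_reference_attention`, the cosine-similarity scoring, the `temperature` parameter, and the flattened `(positions, channels)` layout are all assumptions made for this example.

```python
import numpy as np


def source_reference_attention(src_feat, ref_feat, ref_color, temperature=0.01):
    """Propagate reference colors to source positions (illustrative sketch).

    src_feat:  (N, C) features of N grayscale-frame positions (hypothetical layout)
    ref_feat:  (M, C) features of M reference-frame positions
    ref_color: (M, 2) chrominance (e.g. ab channels) at the reference positions
    Returns the (N, 2) propagated colors and the (N, M) attention weights.
    """
    eps = 1e-8
    # L2-normalize so that the dot product becomes cosine similarity.
    src = src_feat / (np.linalg.norm(src_feat, axis=1, keepdims=True) + eps)
    ref = ref_feat / (np.linalg.norm(ref_feat, axis=1, keepdims=True) + eps)

    # Similarity of every source position to every reference position.
    logits = (src @ ref.T) / temperature              # (N, M)
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each source position receives a weighted average of reference colors.
    return weights @ ref_color, weights


# Toy example: four reference positions with orthogonal features.
ref_feat = np.eye(4)
ref_color = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
src_feat = ref_feat[[2, 0]]  # source positions matching references 2 and 0
colors, weights = source_reference_attention(src_feat, ref_feat, ref_color)
```

In this toy case each source position matches exactly one reference position, so the low temperature makes the attention nearly one-hot and the propagated colors are close to the colors of references 2 and 0. A real network would compute the features with learned encoders and operate on full feature maps.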