Space-time video super-resolution using long-term temporal feature aggregation

Kuanhao Chen, Zijie Yue, Miaojing Shi

Autonomous Intelligent Systems, 2023, Vol. 3, Issue 1: 5. DOI: 10.1007/s43684-023-00051-9
Original Article

Abstract

Space-time video super-resolution (STVSR) aims to reconstruct high-resolution, high-frame-rate videos from their low-resolution, low-frame-rate counterparts. Recent approaches use end-to-end deep learning models for STVSR: they first interpolate intermediate frame features between the given frames, then perform local and global refinement over the feature sequence, and finally increase the spatial resolution of these features. However, in the crucial feature interpolation phase, they capture spatial-temporal information only from the most adjacent frame features, and thus fail to model the long-term spatial-temporal correlations among multiple neighbouring frames that are needed to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts that extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features; these are then combined with weights produced by several gating nets to obtain the interpolation results. Next, we perform local and global feature refinement using the Locally-temporal Feature Comparison (LFC) module and a bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, indicate the effectiveness and superiority of our approach over the state of the art.
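
The gated expert combination described in the abstract can be illustrated with a minimal PyTorch sketch: several experts each process a window of consecutive frame features, and a gating network produces per-expert weights used to blend their outputs into one interpolated frame feature. The module name MoEFeatureInterpolation, the layer choices, and all hyper-parameters below are illustrative assumptions, not the paper's actual LTMoE implementation.

# Minimal sketch of a mixture-of-experts feature interpolation step.
# All names and hyper-parameters are hypothetical, for illustration only.
import torch
import torch.nn as nn


class MoEFeatureInterpolation(nn.Module):
    def __init__(self, channels: int = 64, num_experts: int = 4, window: int = 4):
        super().__init__()
        # Each expert sees the concatenated features of `window` neighbouring
        # frames and predicts one candidate interpolated feature map.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(window * channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_experts)
        )
        # The gating net maps the same input to one weight per expert.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(window * channels, num_experts, 1),
            nn.Softmax(dim=1),
        )

    def forward(self, neighbour_feats: torch.Tensor) -> torch.Tensor:
        # neighbour_feats: (B, window * C, H, W), stacked consecutive frame features.
        weights = self.gate(neighbour_feats)                      # (B, E, 1, 1)
        candidates = torch.stack(
            [expert(neighbour_feats) for expert in self.experts], dim=1
        )                                                         # (B, E, C, H, W)
        # Weighted sum over experts gives the interpolated frame feature.
        return (weights.unsqueeze(2) * candidates).sum(dim=1)


if __name__ == "__main__":
    feats = torch.randn(2, 4 * 64, 32, 32)   # toy batch of stacked frame features
    out = MoEFeatureInterpolation()(feats)
    print(out.shape)                          # torch.Size([2, 64, 32, 32])

In this sketch the gating weights are global (one scalar per expert); a spatially varying gate or attention over a longer frame window would be a natural extension under the same idea.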

Keywords

Space-time video super-resolution / Mixture of experts / Deformable convolutional layer / Long-term temporal feature aggregation

Cite this article

Kuanhao Chen, Zijie Yue, Miaojing Shi. Space-time video super-resolution using long-term temporal feature aggregation. Autonomous Intelligent Systems, 2023, 3(1): 5. https://doi.org/10.1007/s43684-023-00051-9

