Global Spatial-Temporal Information-based Residual ConvLSTM for Video Space-Time Super-Resolution

Date

2025-02-18

Advisors

Journal Title

Journal ISSN

ISSN

Volume Title

Publisher

IEEE

Type

Article

Peer reviewed

Yes

Abstract

By converting low-frame-rate, low-resolution videos into high-frame-rate, high-resolution ones, space-time video super-resolution techniques can enhance visual experiences and facilitate more efficient information dissemination. We propose a convolutional neural network (CNN) for space-time video super-resolution, namely GIRNet. Our method combines long-term global information and short-term local information from the video to better extract complete and accurate spatial-temporal information To generate highly accurate features and thus improve performance, the proposed network integrates a feature-level temporal interpolation module with deformable convolutions and a global spatial-temporal information-based residual convolutional long short-term memory (convLSTM) module. In the feature-level temporal interpolation module, we leverage deformable convolution, which adapts to deformations and scale variations of objects across different scene locations. This provides a more efficient solution than conventional convolution for extracting features from moving objects. Our network effectively uses forward and backward feature information to determine inter-frame offsets, leading to the direct generation of interpolated frame features. In the global spatial-temporal information-based residual convLSTM module, the first convLSTM is used to derive global spatial-temporal information from the input features, and the second convLSTM uses the previously computed global spatial-temporal information feature as its initial cell state. This second convLSTM adopts residual connections to preserve spatial information, thereby enhancing the output features. Experiments on the Vimeo90K dataset show that the proposed method outperforms open source state-of-the-art techniques in peak signal-to-noise-ratio (by 1.45 dB, 1.14 dB, and 0.2 dB over STARnet, TMNet, and 3DAttGAN, respectively), structural similarity index(by 0.027, 0.023, and 0.006 over STARnet, TMNet, and 3DAttGAN, respectively), and visual quality.

Description

The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.

Keywords

Video space-time super-resolution, Deformable convolution, ConvLSTM

Citation

Fu, C., Yuan, H., Jiang, S., Zhang, G., Shen, L. and Hamzaoui, R. (2024) Global Spatial-Temporal Information-based Residual ConvLSTM for Video Space-Time Super-Resolution. IEEE Transactions on Multimedia,

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/

Research Institute

Institute of Sustainable Futures