Solving Video Inverse Problems Using Image Diffusion Models

KAIST

Solving video inverse problems using only image diffusion models, with a batch-consistent sampling strategy.

Reconstruction Results for temporal degradations

 (256x256 ADM)

Abstract

Diffusion model-based inverse problem solvers (DIS) have emerged as state-of-the-art approaches for addressing inverse problems. However, their application to video inverse problems arising from spatio-temporal degradation remains largely unexplored. In response, we introduce an innovative video inverse problem solver that uses only image diffusion models. Specifically, our method treats the time dimension of a video as the batch dimension of an image diffusion model and solves spatio-temporal optimization problems on the denoised spatio-temporal batches produced by the image diffusion model. We address the batch inconsistency issue in diffusion models by controlling batch stochasticity, which enables batch-consistent sampling.
Ours 😁
Video reconstruction: better reconstruction quality and batch-consistent samples
Memory efficiency: requires 13 GB of VRAM for 16-frame videos; up to 32 frames fit in 24 GB of VRAM
Accessibility: uses an open-source image diffusion model (ADM)
Experimental results demonstrate that our method effectively addresses various temporal and spatial degradations in video inverse problems, achieving state-of-the-art reconstructions.
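The core idea of reusing the video's time axis as the image model's batch axis can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `image_denoiser` is a hypothetical stand-in for a pretrained image model such as ADM (here just a toy shrinkage), and the (C, T, H, W) storage layout and helper names are assumptions.

```python
import numpy as np

def video_to_batch(video_cthw):
    """(C, T, H, W) video -> (T, C, H, W) batch of T images."""
    return np.ascontiguousarray(video_cthw.transpose(1, 0, 2, 3))

def batch_to_video(batch_tchw):
    """(T, C, H, W) batch of images -> (C, T, H, W) video."""
    return np.ascontiguousarray(batch_tchw.transpose(1, 0, 2, 3))

def image_denoiser(x_t, sigma):
    """Hypothetical stand-in for a pretrained image denoiser (e.g. ADM).

    Like a real image model, it processes every batch item independently;
    here it is only a toy shrinkage toward zero.
    """
    return x_t / (1.0 + sigma**2)

def denoise_video(video_cthw, sigma):
    batch = video_to_batch(video_cthw)      # frames become batch items
    x0_hat = image_denoiser(batch, sigma)   # one call denoises all T frames
    return batch_to_video(x0_hat)

rng = np.random.default_rng(0)
video = rng.standard_normal((3, 16, 8, 8))  # C=3, T=16 frames, toy 8x8 resolution
out = denoise_video(video, sigma=0.5)
print(out.shape)  # (3, 16, 8, 8)
```

No video-specific network is involved: the time dimension only changes how the tensor is laid out before and after the image model call.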

Batch-Consistent Sampling Strategy


Geometric illustration of how the sampling path evolves. (a) Conventional batch sampling in image diffusion models is batch-independent. (b) Standard batch-consistent sampling synchronizes the stochastic noise components, so every batch element is sampled as the same image. (c) Frame-dependent perturbation through multi-step conjugate gradient (CG) lets batch-consistent sampling also satisfy spatio-temporal data consistency.
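Panels (a) and (b) differ only in how the stochastic noise is drawn. The sketch below assumes a DDIM-style stochastic update; the exact sampler and coefficients used on this page are not stated, and the helper names `ddim_step` and `sample_noise` are hypothetical. Drawing one noise tensor and broadcasting it over the batch axis keeps frames with identical denoised estimates identical, whereas per-frame noise makes them drift apart. Panel (c) then introduces frame-dependent differences through the data-consistency step, not through the noise.

```python
import numpy as np

def ddim_step(x0_hat, eps_hat, abar_prev, sigma, noise):
    """One stochastic DDIM-style update (assumed form, for illustration).

    x_prev = sqrt(abar_prev) * x0_hat
             + sqrt(1 - abar_prev - sigma^2) * eps_hat
             + sigma * noise
    """
    direction = np.sqrt(1.0 - abar_prev - sigma**2) * eps_hat
    return np.sqrt(abar_prev) * x0_hat + direction + sigma * noise

def sample_noise(rng, shape, batch_consistent):
    """Draw the stochastic term for a (B, C, H, W) batch of frames."""
    if batch_consistent:
        # One draw shared by every frame, broadcast over the batch axis.
        z = rng.standard_normal((1,) + tuple(shape[1:]))
        return np.broadcast_to(z, shape)
    # Conventional batch sampling: an independent draw per frame.
    return rng.standard_normal(shape)

rng = np.random.default_rng(0)
shape = (4, 3, 8, 8)  # 4 frames treated as one batch
# Identical denoised estimates for every frame, to isolate the effect of the noise.
x0_hat = np.broadcast_to(rng.standard_normal((1, 3, 8, 8)), shape)
eps_hat = np.broadcast_to(rng.standard_normal((1, 3, 8, 8)), shape)

for consistent in (False, True):
    z = sample_noise(rng, shape, consistent)
    x_prev = ddim_step(x0_hat, eps_hat, abar_prev=0.5, sigma=0.3, noise=z)
    frames_identical = np.allclose(x_prev, x_prev[:1])
    print(f"batch_consistent={consistent}: frames identical -> {frames_identical}")
# batch_consistent=False: frames identical -> False
# batch_consistent=True: frames identical -> True
```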

Method


Illustration of the intermediate sampling process in our video inverse problem solver, which uses only image diffusion models. We iteratively solve the video inverse problem by running multiple CG steps in video space on the Tweedie-denoised manifold. We also synchronize the stochastic noise components in the reverse diffusion process, which keeps sampling batch-consistent while achieving state-of-the-art reconstruction performance.
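A minimal sketch of the data-consistency step, assuming it is plain conjugate gradient on the normal equations A^T A x = A^T y, warm-started at the Tweedie estimate x0_hat and run for a fixed number of steps. The degradation operator here is a hypothetical 3-frame temporal box blur on a toy single-channel video; `cg_normal`, the operator, and all sizes are illustrative assumptions. Because A mixes neighboring frames, the CG update differs from frame to frame, which is the kind of frame-dependent perturbation the sampling-path illustration describes in panel (c).

```python
import numpy as np

def cg_normal(A, AT, y, x_init, n_steps, tol=1e-10):
    """Conjugate gradient on A^T A x = A^T y, warm-started at x_init.

    Over the growing Krylov subspace around the warm start, the data
    residual ||A x - y|| never increases from one iteration to the next.
    """
    x = x_init.copy()
    r = AT(y) - AT(A(x))          # residual of the normal equations
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(n_steps):
        if np.sqrt(rs) < tol:     # already consistent with the measurements
            break
        Ap = AT(A(p))
        alpha = rs / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Hypothetical temporal degradation: 3-frame box blur along time ("valid" mode).
T, H, W = 8, 4, 4
M = np.zeros((T - 2, T))
for s in range(T - 2):
    M[s, s:s + 3] = 1.0 / 3.0

def A(x):   # (T, H, W) -> (T-2, H, W)
    return np.einsum("st,thw->shw", M, x)

def AT(y):  # adjoint of A: (T-2, H, W) -> (T, H, W)
    return np.einsum("st,shw->thw", M, y)

rng = np.random.default_rng(0)
x_true = rng.standard_normal((T, H, W))
y = A(x_true)
# Stand-in for the Tweedie estimate: the clean video plus denoising error.
x0_hat = x_true + 0.5 * rng.standard_normal((T, H, W))

x_dc = cg_normal(A, AT, y, x0_hat, n_steps=3)
before = np.linalg.norm(A(x0_hat) - y)
after = np.linalg.norm(A(x_dc) - y)
print(f"residual before CG: {before:.4f}, after 3 CG steps: {after:.4f}")
```

Running only a few CG steps, rather than solving to convergence, keeps the result close to the denoised estimate while pulling it toward the measurements.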

Experimental results

As the teaser videos show, our method handles temporal degradations with various point spread functions (PSFs). When spatial degradations are added as well, it remains robust across various combinations of spatio-temporal degradations.
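The page does not specify the degradation operators, so the following is a hypothetical example of a spatio-temporal forward model: a normalized 1-D temporal PSF convolved along the frame axis (edge-padded so the frame count is preserved), followed by 2x spatial average pooling. The kernel, padding mode, downsampling factor, and function names are assumptions chosen for illustration.

```python
import numpy as np

def temporal_blur(video, psf):
    """Convolve every pixel's time series with a normalized 1-D PSF.

    video: (T, C, H, W). Edge padding keeps the number of frames at T.
    """
    psf = np.asarray(psf, dtype=float)
    psf = psf / psf.sum()
    k = psf.size
    if k % 2 == 0:
        raise ValueError("use an odd-length PSF so it has a center tap")
    pad = k // 2
    T = video.shape[0]
    padded = np.pad(video, [(pad, pad)] + [(0, 0)] * (video.ndim - 1), mode="edge")
    out = np.zeros(video.shape, dtype=float)
    for i, w in enumerate(psf[::-1]):   # flipped kernel = true convolution
        out += w * padded[i:i + T]
    return out

def spatial_downsample(video, factor=2):
    """Average-pool H and W by `factor` (both must be divisible by it)."""
    T, C, H, W = video.shape
    v = video.reshape(T, C, H // factor, factor, W // factor, factor)
    return v.mean(axis=(3, 5))

def forward(video, psf, factor=2):
    """Hypothetical spatio-temporal degradation: temporal PSF, then downsampling."""
    return spatial_downsample(temporal_blur(video, psf), factor)

rng = np.random.default_rng(0)
video = rng.standard_normal((16, 3, 8, 8))
y = forward(video, psf=np.ones(7))   # 7-frame box PSF (illustrative choice)
print(y.shape)  # (16, 3, 4, 4)
```

Swapping the PSF (for example a box kernel versus a peaked one) changes only the temporal stage, which is one way to produce the "various PSFs" setting with a fixed spatial degradation.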

Reconstruction Results for spatio-temporal degradations

 (256x256 ADM)

Visual Comparison with Competing Methods


We provide video visualizations of our reconstructions alongside those of competing methods, together with ablation study results.