Abstract

Diffusion model-based inverse problem solvers (DIS) have emerged as state-of-the-art approaches for addressing inverse problems. However, their application to video inverse problems arising from spatio-temporal degradation remains largely unexplored. In response, we introduce an innovative video inverse solver using only image diffusion models. Specifically, our method treats the time dimension of a video as the batch dimension of image diffusion models, thereby solving spatio-temporal optimization problems within denoised spatio-temporal batches derived from each image diffusion model. We address the batch inconsistency issue in diffusion models by controlling batch-stochasticity, thereby enabling batch-consistent sampling.

	Ours 😁
Video reconstruction	Better reconstruction quality and batch-consistent samples
Memory efficiency	Requires 13 GB VRAM for 16-frame videos; at most 32 frames fit in 24 GB VRAM.
Accessibility	Uses an open-source image diffusion model (ADM)

Experimental results demonstrate that our method effectively addresses various temporal and spatial degradations in video inverse problems, achieving state-of-the-art reconstructions.

Batch-consistent Sampling Strategy

Geometric illustration of the sampling path evolution. (a) Common batch sampling in image diffusion models is batch-independent. (b) In standard batch-consistent sampling, identical images are sampled by synchronizing the stochastic noise components. (c) Frame-dependent perturbation through multi-step CG ensures that the batch-consistent sampling meets the spatio-temporal data consistency.
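The difference between panels (a) and (b) can be sketched with a toy sampler. This is a minimal illustration, not the authors' implementation: `toy_reverse_step` is a hypothetical linear stand-in for one stochastic reverse diffusion step, the `(T, C, H, W)` layout is assumed, and sharing the initial noise across frames is also an assumption made so that the synchronized batch yields identical images.

```python
import numpy as np

def toy_reverse_step(x, noise, shrink=0.9, sigma=0.1):
    """Hypothetical stand-in for one stochastic reverse diffusion step."""
    return shrink * x + sigma * noise

def sample(num_frames, shape, steps, rng, synchronized):
    """Run the toy sampler on a batch of `num_frames` images.

    synchronized=False: every frame draws its own noise (batch-independent).
    synchronized=True: one noise image is drawn per step and shared by all frames.
    """
    noise_batch = 1 if synchronized else num_frames
    init = rng.standard_normal((noise_batch, *shape))
    x = np.broadcast_to(init, (num_frames, *shape)).copy()
    for _ in range(steps):
        # With noise_batch == 1 the noise broadcasts over the batch (frame) axis.
        eps = rng.standard_normal((noise_batch, *shape))
        x = toy_reverse_step(x, eps)
    return x

independent = sample(4, (3, 8, 8), steps=10,
                     rng=np.random.default_rng(0), synchronized=False)
consistent = sample(4, (3, 8, 8), steps=10,
                    rng=np.random.default_rng(0), synchronized=True)

# Synchronized noise makes every frame follow the same sampling path.
frames_identical = bool(np.allclose(consistent, consistent[0]))
# Independent noise lets the frames drift apart.
frames_differ = not np.allclose(independent[0], independent[1])
```

In this toy setting the synchronized batch collapses to a single image repeated across frames, which is why panel (c) needs a frame-dependent data-consistency step to recover actual motion.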

Method

Illustration of the intermediate sampling process in our video inverse problem solver that leverages only image diffusion models. By taking multi-step CG in the video space of the Tweedie denoised manifold, we iteratively solve the video inverse problem. We synchronize the stochastic noise components in the reverse diffusion process to ensure batch-consistent sampling with state-of-the-art reconstruction performance.
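The data-consistency step described above can be sketched as a few conjugate gradient (CG) iterations on the normal equations, started from the Tweedie-denoised frames. This is a sketch under stated assumptions: a variance-preserving noise schedule for the Tweedie formula, a circular temporal moving average as the forward operator, and hypothetical names (`tweedie_denoise`, `temporal_average`, `cg_refine`) and toy sizes chosen only for illustration. The noise predictor is faked by perturbing the true noise.

```python
import numpy as np

def tweedie_denoise(x_t, eps_pred, alpha_bar):
    """Tweedie estimate of the clean frames (variance-preserving schedule assumed)."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

def temporal_average(video, width=7):
    """Assumed forward operator A: circular moving average over the frame axis.

    With an odd `width` the kernel is symmetric, so A is its own adjoint.
    """
    half = width // 2
    shifted = [np.roll(video, s, axis=0) for s in range(-half, half + 1)]
    return sum(shifted) / width

def cg_refine(x0, y, forward, adjoint, num_steps):
    """A few CG iterations on A^T A x = A^T y, started at x0."""
    x = x0.copy()
    r = adjoint(y) - adjoint(forward(x))  # residual of the normal equations
    p = r.copy()
    rs_old = np.vdot(r, r)
    for _ in range(num_steps):
        if rs_old < 1e-20:  # already converged
            break
        Ap = adjoint(forward(p))
        alpha = rs_old / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r)
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(0)
clean = rng.standard_normal((16, 3, 8, 8))  # toy video, frames on the batch axis
y = temporal_average(clean)                 # noiseless measurement

alpha_bar = 0.5
eps = rng.standard_normal(clean.shape)
x_t = np.sqrt(alpha_bar) * clean + np.sqrt(1.0 - alpha_bar) * eps
eps_pred = eps + 0.2 * rng.standard_normal(clean.shape)  # imperfect noise prediction
denoised = tweedie_denoise(x_t, eps_pred, alpha_bar)

refined = cg_refine(denoised, y, temporal_average, temporal_average, num_steps=5)

residual_before = np.linalg.norm(temporal_average(denoised) - y)
residual_after = np.linalg.norm(temporal_average(refined) - y)
```

Because the operator mixes neighbouring frames, the CG update couples the batch entries along time even though the denoiser itself sees each frame independently.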

Experimental results

As demonstrated by the teaser videos, we successfully solved temporal degradations with various PSFs. Even with the additional spatial degradations, our method demonstrates robustness against various combinations of spatio-temporal degradations.
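For concreteness, a temporal degradation of this kind can be modelled as a 1-D convolution along the frame axis with a point spread function (PSF). The sketch below is illustrative only: the function names, the edge-replicating boundary handling, and the Gaussian width and `sigma` are assumptions, since the page does not state them.

```python
import numpy as np

def uniform_psf(width):
    """Temporal PSF for a `width`-frame average."""
    return np.full(width, 1.0 / width)

def gaussian_psf(width, sigma):
    """Normalized Gaussian temporal PSF (width and sigma are hypothetical values)."""
    t = np.arange(width) - (width - 1) / 2
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    return kernel / kernel.sum()

def temporal_blur(video, psf):
    """Convolve a (T, C, H, W) video along the frame axis.

    Edge frames are replicated at the boundaries (an assumption), so the
    output keeps the same number of frames as the input. Expects an odd-length PSF.
    """
    half = len(psf) // 2
    pad_width = [(half, half)] + [(0, 0)] * (video.ndim - 1)
    padded = np.pad(video, pad_width, mode="edge")
    return np.apply_along_axis(
        lambda signal: np.convolve(signal, psf, mode="valid"), 0, padded
    )

# A single bright frame is smeared over 7 neighbouring frames.
impulse = np.zeros((16, 1, 2, 2))
impulse[8] = 1.0
smeared = temporal_blur(impulse, uniform_psf(7))

# A static video is unchanged by any normalized PSF.
static = np.ones((16, 1, 2, 2))
still_static = temporal_blur(static, gaussian_psf(7, sigma=1.0))
```

Spatial degradations such as super-resolution, deblurring, or inpainting would then be applied per frame on top of this temporal operator to form the combined spatio-temporal measurement.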

Visualization with Comparative Methods

We provide video visualizations comparing our reconstructions with those of competing methods, along with ablation study results.

Solving Video Inverse Problems Using Image Diffusion Models

Solving video inverse problems using only image diffusion models, with a batch-consistent sampling strategy.

Reconstruction Results for temporal degradations

(256x256 ADM)

“16-frames 256x256 13-frame averaged”

“16-frames 256x256 13-frame averaged”

“16-frames 256x256 13-frame averaged”

“16-frames 256x256 13-frame averaged”

“16-frames 256x256 7-frame averaged”

“16-frames 256x256 7-frame averaged”

“16-frames 256x256 7-frame averaged”

“16-frames 256x256 7-frame averaged”

“16-frames 256x256 convolved with Gaussian PSF”

“16-frames 256x256 convolved with Gaussian PSF”

“16-frames 256x256 convolved with Gaussian PSF”

Reconstruction Results for spatio-temporal degradations

(256x256 ADM)

“SRx4 & 7-frame averaged”

“SRx4 & 7-frame averaged”

“SRx4 & 7-frame averaged”

“Deblur (σ = 2.0) & 7-frame averaged”

“Deblur (σ = 2.0) & 7-frame averaged”

“Deblur (σ = 2.0) & 7-frame averaged”

“Deblur (σ = 2.0) & 7-frame averaged”

“Inpaint (50%) & 7-frame averaged”

“Inpaint (50%) & 7-frame averaged”

“Inpaint (50%) & 7-frame averaged”

“Inpaint (50%) & 7-frame averaged”
