SeedVR2 Upscaler — Image & Video

Support for single image, batch images, and MP4 video upscaling.

This application uses the ComfyUI-SeedVR2_VideoUpscaler backend for inference.

⚠️ No GPU Detected (CPU Mode)

Neither CUDA (NVIDIA) nor MPS (macOS) was detected. Processing will be extremely slow.

  • Recommendation: Clone this repository to a local machine with a GPU for full functionality.
  • If running online (CPU): Please process Images Only.
  • Model Selection: Use a 3B GGUF model, or the 7B model with Q4_K_M quantization. Heavier models will likely fail.

Configure a custom GitHub repository to test different versions or forks.

Preset mode

Automatically adjusts compilation, tiling, and offload settings based on your hardware capabilities.

Output Format (Default: webp)

Format for saved images. For video input, the CLI produces MP4 (or PNG sequence), and this app converts the final result if needed.

Video Backend

Video encoder backend. 'ffmpeg' requires ffmpeg on the system PATH but supports advanced features such as 10-bit encoding.

Use x265 10-bit encoding (reduces banding). Requires ffmpeg backend.
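
For reference, a 10-bit HEVC encode of a rendered frame sequence can be expressed as an ffmpeg invocation like the one built below (the CRF value and frame pattern are illustrative, not the app's exact settings):

```python
def x265_10bit_cmd(frame_pattern: str, fps: float, out_path: str) -> list[str]:
    # Build an ffmpeg command for 10-bit x265 output; yuv420p10le is
    # the 10-bit pixel format that reduces gradient banding.
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frame_pattern,          # e.g. "frames/%05d.png"
        "-c:v", "libx265",
        "-pix_fmt", "yuv420p10le",
        "-crf", "18",                 # illustrative quality setting
        out_path,
    ]
```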

When checked, the DiT model dropdown shows GGUF models from cmeka/SeedVR2-GGUF, an efficient choice for low-VRAM setups.

DiT model (Format: RepoID/Filename)

DiT transformer model. 7B models have higher quality but require more memory than 3B models.

Color correction

Method for matching the output's colors to the input: 'lab' (perceptual, recommended), 'wavelet' (frequency-based), 'adain' (statistical), etc.
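
The 'adain' (statistical) option, for example, shifts and scales each channel so its mean and standard deviation match the reference. A one-channel sketch of that idea (not the backend's exact implementation):

```python
from statistics import mean, pstdev

def adain_match(src: list[float], ref: list[float]) -> list[float]:
    # Normalize src to zero mean / unit std, then re-apply the
    # reference channel's statistics ('adain'-style matching).
    m_s, s_s = mean(src), pstdev(src) or 1.0
    m_r, s_r = mean(ref), pstdev(ref) or 1.0
    return [(x - m_s) / s_s * s_r + m_r for x in src]
```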

Reduces image size before upscaling. Helps remove JPEG artifacts or noise, as noted in community tips.

Pad final batch to match batch_size. Prevents temporal artifacts caused by small final batches. Adds extra compute.
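
In effect, padding repeats the last frame until the frame count divides evenly by the batch size. A sketch of that step (not the backend's exact implementation):

```python
def pad_final_batch(frames: list, batch_size: int) -> list:
    # If the last batch would come up short, repeat the final frame
    # to fill it; short trailing batches can cause temporal artifacts.
    remainder = len(frames) % batch_size
    if remainder == 0:
        return frames
    return frames + [frames[-1]] * (batch_size - remainder)
```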

Replaces the standard blockswap logic with the improved version from Nunchaku. Useful for faster offloading.

Offload DiT I/O layers for extra VRAM savings. Requires Offload Device.

DiT Offload device

Device to move DiT to when idle. 'cpu' frees VRAM between phases.

VAE Offload device

Device to move VAE to when idle. 'cpu' frees VRAM between phases.

Tensor Offload device

Where to store intermediate tensors. 'cpu' is recommended to save VRAM.
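
The offload pattern behind these three settings is the same: park a module on the offload device while idle and move it to the compute device only for its phase. A device-agnostic sketch using a dummy module (class and function names are illustrative):

```python
class DummyModule:
    # Stand-in for a DiT or VAE module; records device moves so the
    # offload sequence can be inspected.
    def __init__(self):
        self.device = "cpu"
        self.moves = []

    def to(self, device):
        self.device = device
        self.moves.append(device)
        return self

def run_phase(module, compute_device="cuda"):
    # Move the module onto the compute device, run its phase, then
    # offload it back to CPU so the next phase has the VRAM.
    module.to(compute_device)
    result = f"ran on {module.device}"
    module.to("cpu")
    return result
```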

Keep DiT model in memory between generations. Useful for batch/directory mode or streaming.

Keep VAE model in memory between generations. Useful for batch/directory mode or streaming.

Process VAE encoding in tiles to reduce VRAM usage (good for large inputs).

Process VAE decoding in tiles to reduce VRAM usage.
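
Tiling splits the input into overlapping spans so each VAE pass fits in VRAM, then blends the seams. A 1-D sketch of the span computation (the overlap handling is illustrative, not the backend's exact scheme):

```python
def tile_spans(size: int, tile: int, overlap: int) -> list[tuple[int, int]]:
    # Return (start, end) spans of width `tile` covering `size`,
    # each sharing `overlap` pixels with its neighbour; the final
    # span is shifted left so it never runs past the edge.
    stride = tile - overlap
    spans, start = [], 0
    while start < size:
        end = min(start + tile, size)
        spans.append((max(0, end - tile), end))
        if end == size:
            break
        start += stride
    return spans
```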

Tile Debug Visualization

Visualizes the tiling process for debugging purposes.

20-40% speedup. Requires PyTorch 2.0+ and Triton. May increase memory usage.

15-25% speedup for VAE encoding/decoding.

Backend

'inductor' (full optimization) or 'cudagraphs' (lightweight).

Mode

Optimization level: 'default' (fast compile), 'max-autotune' (best speed, slow compile), etc.

Attention Mode

Attention backend. 'sdpa' (default), 'flash_attn' (faster), or 'sageattn' (Blackwell).

Compile entire model as single graph. Faster but less flexible.

Handle varying input shapes without recompilation.

Show verbose output in CLI logs.
