Everyone wants 4K AI. But true native-4K data is still surprisingly scarce.
Excited to share our new work: 4KLSDB: A Large-Scale Dataset for 4K Image Restoration and Generation, accepted to CVPR 2026 DataCV.
Most public datasets are built around sub-1K, HD, or 2K images. But at 4K resolution, small artifacts become big problems: blurry textures, distorted boundaries, repeated patterns, and missing fine details.
To address this gap, we introduce ๐ฐ๐๐๐ฆ๐๐, a large-scale native-4K dataset and benchmark for high-resolution restoration and generation.
4KLSDB includes:
- 129K native-4K training images
- 2K validation images and 1,984 test images
- Diverse categories: nature, urban scenes, people, food, artwork, CGI, and more
- Aligned 4K image-text pairs for generative modeling
- Paired LR/HR evaluation sets for super-resolution
We also build a multi-stage curation pipeline combining resolution filtering, LMM-based quality scoring, texture-richness filtering, and human verification.
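As a rough illustration of how a multi-stage curation pipeline like this can be chained, here is a minimal Python sketch that applies the four filters in order and counts the survivors after each stage. Everything in it is an assumption, not the authors' implementation: the thresholds, the hypothetical `Candidate` record, the precomputed `lmm_score` (no real LMM is called), and the mean-gradient texture measure are illustrative stand-ins. The resolution check only looks at pixel dimensions, so it cannot tell a native capture from an upscaled one.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds for illustration only; not the values used for 4KLSDB.
MIN_LONG_SIDE, MIN_SHORT_SIDE = 3840, 2160  # UHD "4K" frame size
MIN_LMM_SCORE = 0.7                         # assumed quality-score scale in [0, 1]
MIN_TEXTURE = 5.0                           # assumed mean-gradient threshold


@dataclass
class Candidate:
    name: str
    width: int
    height: int
    lmm_score: float                  # assumed: precomputed by some LMM quality scorer
    gray: List[List[float]]           # grayscale luminance (e.g. a crop or thumbnail)
    human_ok: Optional[bool] = None   # None means not yet reviewed


def is_native_4k(c: Candidate) -> bool:
    # Accept either orientation; only dimensions are checked, not upscaling.
    long_side, short_side = max(c.width, c.height), min(c.width, c.height)
    return long_side >= MIN_LONG_SIDE and short_side >= MIN_SHORT_SIDE


def texture_richness(gray: List[List[float]]) -> float:
    # Mean absolute difference between horizontally and vertically adjacent
    # pixels: a simple stand-in for whatever texture measure the paper uses.
    rows = len(gray)
    cols = len(gray[0]) if rows else 0
    total, count = 0.0, 0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                total += abs(gray[y][x + 1] - gray[y][x])
                count += 1
            if y + 1 < rows:
                total += abs(gray[y + 1][x] - gray[y][x])
                count += 1
    return total / count if count else 0.0


def curate(candidates: List[Candidate]):
    # Stages run in order; each later stage only sees earlier survivors.
    stages = [
        ("resolution", is_native_4k),
        ("lmm_quality", lambda c: c.lmm_score >= MIN_LMM_SCORE),
        ("texture", lambda c: texture_richness(c.gray) >= MIN_TEXTURE),
        ("human", lambda c: c.human_ok is True),
    ]
    kept = list(candidates)
    counts = {}
    for stage_name, keep in stages:
        kept = [c for c in kept if keep(c)]
        counts[stage_name] = len(kept)
    return kept, counts


# Toy usage with made-up candidates.
flat = [[128.0] * 8 for _ in range(8)]                                   # no texture
checker = [[255.0 * ((x + y) % 2) for x in range(8)] for y in range(8)]  # strong texture
pool = [
    Candidate("small_2k", 2560, 1440, 0.9, checker, True),
    Candidate("low_quality", 3840, 2160, 0.4, checker, True),
    Candidate("flat_sky", 3840, 2160, 0.9, flat, True),
    Candidate("rejected_by_rater", 4096, 2160, 0.95, checker, False),
    Candidate("keeper", 2160, 3840, 0.85, checker, True),
]
kept, counts = curate(pool)
print([c.name for c in kept])  # ['keeper']
print(counts)  # {'resolution': 4, 'lmm_quality': 3, 'texture': 2, 'human': 1}
```

Putting the cheap metadata check first and the expensive steps (LMM scoring, human review) last means the costly stages only run on images that already passed the earlier filters, which is a common reason to order a curation pipeline this way.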
Across classical SR, real-world blind SR, and 4K text-to-image generation, fine-tuning on 4KLSDB consistently improves fidelity, local detail, perceptual quality, and human preference.
Main takeaway: native-4K supervision matters.
As visual AI moves toward ultra-high-resolution restoration and generation, we need datasets and benchmarks that expose the fine-scale failures hidden by low-resolution evaluation.
Project page:
4klsdb.github.io
GitHub:
github.com/taco-group/4KLSDB
Dataset:
huggingface.co/datasets/Sing…
#ComputerVision #GenerativeAI #ImageRestoration #SuperResolution #TextToImage #DiffusionModels #Dataset #Benchmarking #TAMU