📢 Count Anything: New AI Model Counts Objects from Text Across 6 Domains
Researchers from Tsinghua University released Count Anything, a vision model that counts objects in images based on a text query.
It uses a dual approach: a region-level counter for large, sparse objects and a pixel-level counter for small, densely packed ones. The two outputs are fused into a single point set that marks the location of each counted instance.
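To make the fusion step more concrete, here is a minimal Python sketch. The post does not say how Count Anything actually merges its two outputs, so everything below is an assumption: `fuse_point_sets`, the size cutoff `min_area`, and the duplicate radius `merge_radius` are hypothetical, and the greedy distance-based merge is a generic technique, not the authors' method.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def fuse_point_sets(
    region_points: List[Tuple[float, float, float]],  # (x, y, box_area) from a region-level branch
    pixel_points: List[Point],                        # (x, y) points from a pixel-level branch
    min_area: float = 1024.0,                         # hypothetical cutoff for "large" objects
    merge_radius: float = 8.0,                        # hypothetical duplicate-suppression radius
) -> List[Point]:
    # Keep region-level points only for large instances.
    fused: List[Point] = [(x, y) for x, y, area in region_points if area >= min_area]
    # Add each pixel-level point unless it lies within merge_radius of a point already kept.
    for px, py in pixel_points:
        if all(math.hypot(px - fx, py - fy) > merge_radius for fx, fy in fused):
            fused.append((px, py))
    return fused

# The final count is simply the number of fused points.
region = [(50.0, 50.0, 4000.0), (200.0, 200.0, 100.0)]
pixel = [(52.0, 49.0), (300.0, 300.0), (303.0, 301.0), (400.0, 10.0)]
points = fuse_point_sets(region, pixel)
print(len(points), points)
```

In this toy input, the pixel point at (52, 49) is treated as a duplicate of the large region-level object at (50, 50), and the two nearby pixel points around (300, 300) collapse into one, so the sketch counts three instances.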
The model covers six domains: general scenes, remote sensing, histopathology, cellular microscopy, agriculture, and microbiology.
They also built CLOC, a dataset of 220K images spanning 619 categories and 15M object instances, used to train and benchmark the model.
According to the authors, Count Anything substantially outperforms existing open-world counting methods in all six domains.
Project page: GitHub
#ComputerVision #ObjectCounting #AIResearch #Tsinghua
---
🤖 For more AI news and story sources, search "GenAISpot" on Telegram