We built a rigorous filtering, deduplication, and re-captioning pipeline to create MONET:
⛽ Sourced from a pool of 2.9B images in open datasets (LAION, COYO, etc.)
✅ Filtered for high-res, aesthetics & strict safety/NSFW standards
👬 Deduplicated & stripped of stock/watermarked images
💬 Re-captioned using 4 top VLMs for rich, diverse text descriptions
🕹️ Augmented with safe, permissive synthetic data
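
To make the filter and dedup steps concrete, here is a minimal sketch, not the actual MONET code. Everything in it is an assumption for illustration: the `ImageRecord` fields, the threshold values, and the use of an exact SHA-256 content hash for deduplication (a production pipeline would more likely use near-duplicate detection such as perceptual hashes or embeddings).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical per-image metadata; field names are illustrative only."""
    url: str
    width: int
    height: int
    aesthetic_score: float  # assumed: higher means more aesthetic
    nsfw_score: float       # assumed: higher means less safe
    has_watermark: bool     # assumed output of a watermark/stock detector
    content: bytes          # raw image bytes, used for exact dedup


# Hypothetical thresholds, not values from the MONET pipeline.
MIN_SIDE = 512
MIN_AESTHETIC = 5.0
MAX_NSFW = 0.1


def passes_filters(rec: ImageRecord) -> bool:
    """Keep only high-res, aesthetic, safe, watermark-free images."""
    return (
        min(rec.width, rec.height) >= MIN_SIDE
        and rec.aesthetic_score >= MIN_AESTHETIC
        and rec.nsfw_score <= MAX_NSFW
        and not rec.has_watermark
    )


def curate(records: list[ImageRecord]) -> list[ImageRecord]:
    """Filter records, then drop exact duplicates by content hash."""
    seen: set[str] = set()
    kept: list[ImageRecord] = []
    for rec in records:
        if not passes_filters(rec):
            continue
        digest = hashlib.sha256(rec.content).hexdigest()
        if digest in seen:
            continue  # same bytes already kept once
        seen.add(digest)
        kept.append(rec)
    return kept
```

Filtering before hashing keeps the dedup set small, since rejected images never need to be hashed. Re-captioning and synthetic augmentation would run as later stages on the output of `curate`.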