Abaka AI

Tweets

Pinned Tweet

Abaka AI

@AbakaAI_Tech

May 28

🚨 Accepted at CVPR 2026: "ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding", in collaboration with @MIT & @IBM. A 1.5M-sample open-source dataset for robust chart understanding. Each sample aligns a chart image, plotting code, CSV/table data, a natural-language summary, and QA with reasoning. Full breakdown 👇 #CVPR26 #CVPR2026

Abaka AI

@AbakaAI_Tech

Jun 6

Buzzing at #CVPR2026, Booth #701. Come by for free swag and AI chats.

Abaka AI

@AbakaAI_Tech

Jun 5

Kicked off #CVPR2026 with the DataMFM workshop 📍 On June 3rd, the Emerging Directions in Data for Multimodal Foundation Models Workshop brought together speakers from top labs and universities: Ranjay Krishna @RanjayKrishna, Ziwei Liu @liuziwei7, Aishwarya Agrawal @aagrawalAA, Yilun Du @du_yilun, and the challenge winners. It is now clear that the next big leaps in multimodal AI will come from better foundations! Huge thanks to all speakers, organizers, challenge participants, and attendees for making the workshop a success. Abaka AI was proud to support the workshop as an organizer and sponsor 🚀 If you missed it, catch highlights and workshop details here: datamfm.github.io/

Abaka AI

@AbakaAI_Tech

Jun 5

Learn more insights on multimodal data: abaka.ai/?utm_source=x&utm_m…

Abaka AI | Abaka AI - AI Data Annotation & Solution - Your Data Partner In The AI Industry

Abaka AI offers data collection, data cleaning, data annotation, and high-quality datasets for world-class Automobile AI, Generative AI, and Embodied AI industry leaders.

abaka.ai

Abaka AI

@AbakaAI_Tech

Jun 3

✈️ We're heading to #CVPR2026
🏄 Booth #701 with free swag
📡 DataMFM workshop with insights and prizes
🪩 A happy hour with food, drinks, and relaxed chats
DataMFM workshop: June 3rd, Room 111, 1:00–6:00 PM
Happy hour (spots are limited): June 6th
Don’t forget to stop by and bring your perspective.
DataMFM: datamfm.github.io/
Happy hour, register now: luma.com/rcz2dzht

Abaka AI

@AbakaAI_Tech

Jun 1

ChartNet: a 1.5M-sample open-source dataset for chart understanding. Each sample aligns a chart image, plotting code, CSV/table data, a summary, and QA with reasoning. It improves chart reconstruction, data extraction, summarization, and chart QA across model sizes. Huge thanks to our respected collaborators @kondic_jovana, @RogerioFeris, @AudeOliva, @ZihanWang123, and everyone involved! The dataset is out on Hugging Face. Test it, break it, and we'll see you at #CVPR2026 to discuss!

Abaka AI

@AbakaAI_Tech

May 28

The gains on public benchmarks: Granite-Vision-2B improved from 1.6 to 12.4 BLEU on ChartCap and from 30.8 to 58.4 on ChartMimic-v2.

Abaka AI

@AbakaAI_Tech

May 28

From describing what they see to recovering the structure underneath. For document AI, analytics, scientific plots, business dashboards, and visual reasoning, that difference is huge. Paper: arxiv.org/abs/2603.27064 Dataset: huggingface.co/datasets/ibm-…

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for...

Understanding charts requires models to jointly reason over geometric visual patterns, structured numerical data, and natural language -- a capability where current vision-language models (VLMs)...

arxiv.org

Abaka AI

@AbakaAI_Tech

May 16

🚀Abaka AI is heading to @CVPR 2026 Meet our data experts at Booth #701. And join us June 6 for a curated AI happy hour near the venue. Local drinks. Right people. Late conversations. Limited spots, register now: luma.com/rcz2dzht #CVPR2026 #AI #GenAI #AbakaAI

Abaka AI

@AbakaAI_Tech

May 11

Why are Coding Agents hitting a "glass ceiling"? It’s not model scale—it’s post-training data quality. We built AB-Terminal Bench to prove it. Using just 1.8k curated samples, we boosted @Alibaba_Qwen 3-32B’s Pass@1 from 3.37% → 17.24%. 🚀 The secret? Data must be: ✅ Containerized ✅ Pytest-verified ✅ Reproducible Generating tasks is easy; providing a true discriminative signal is the hard part. 👉🏻Full blog: lnkd.in/gEY_J5GS #AI #AbakaAI #CodingAgents #LLM
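For readers unfamiliar with the metric, here is a minimal sketch of how a Pass@1 score like the one quoted above can be computed. It assumes each task yields a single pass/fail flag from its first attempt (for example, whether the task's test suite succeeded). The function name `pass_at_1` and the sample results are hypothetical, not taken from AB-Terminal Bench.

```python
from typing import Sequence


def pass_at_1(first_attempt_passed: Sequence[bool]) -> float:
    """Return the fraction of tasks whose first attempt passed verification."""
    if not first_attempt_passed:
        raise ValueError("need at least one task result")
    # True counts as 1 and False as 0, so the sum is the number of solved tasks.
    return sum(first_attempt_passed) / len(first_attempt_passed)


# Hypothetical per-task outcomes: one flag per task, first attempt only.
results = [True, False, False, True, False]
print(f"Pass@1 = {pass_at_1(results):.2%}")
```

Read this way, the reported move from 3.37% to 17.24% is a gain of 13.87 percentage points, or roughly a 5.1x increase in the share of tasks solved on the first try.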

Abaka AI

@AbakaAI_Tech

May 1

ICLR 🇧🇷 was loud. The signals weren’t. Where models break. How reasoning holds. Why data sits at the center. Small. Curated. Right people. What comes next is taking shape. @iclr_conf #AI #AbakaAI #ICLR #MachineLearning #LLM #Data

Abaka AI

@AbakaAI_Tech

Apr 10

The Full-Duplex Paradox: 2026 AI architectures with 2004 data. Even state-of-the-art models like Moshi rely on the Fisher corpus: 8 kHz telephone speech from 20 years ago. The hardware has evolved, but inside is the same old (in every sense) data. @nvidia's PersonaPlex paper correctly diagnoses the core bottleneck: finding speech with natural interruptions where speakers are separated at the source. Synthetic TTS is a band-aid. True conversational AI needs the "messiness" of real human overlap. Enter the Abaka Bidirectional Speech Corpus: ✅ 20,000 hours (10x larger than legacy sets) ✅ 7 global languages ✅ Hardware-synced dual-channel isolation ✅ Deeply annotated for conversational nuances. Building Spoken Dialogue Models? DM for a technical sample pack 📦

Abaka AI

@AbakaAI_Tech

Apr 10

Full article: abaka.ai/blog/bidirectional-…
