2077AI

Pinned Tweet

2077AI

@2077AI

Jun 3

The Rising Star Award has been announced! Congratulations to Yining Hong @yining_hong, Rising Star Awardee for Spatial Intelligence, and to finalists Zhiyang Dou, Jiafei Duan @DJiafei, Hezhen Hu, Tiange Xiang @xxtiange, and Junyi Zhang @junyi42.

The awards are supported by 2077AI, with up to USD 30,000 in research gift funding to the awardee's institution and USD 2,000 in API credits for each finalist, helping early-career researchers further develop promising ideas in spatial intelligence.

Full Rising Star list: e2e3d.github.io/rising_star.…

Join the E2E3D Workshop today: 13:00–18:00 · Room 501
E2E3D Workshop: e2e3d.github.io/index.html

2077AI

@2077AI

Jun 6

Come meet 2077AI at #CVPR2026 Booth #716. Stop by to talk about benchmarks, multimodal data, evaluation, and open research collaboration. Researchers, engineers, students, and friends are all welcome. @ZihanWang123 @Rubyzx67 @H7803954325458 @Neutrino_l

2077AI

@2077AI

Jun 5

Wrapped up the DataMFM Workshop at #CVPR2026! The workshop brought together discussions on the data foundations of multimodal AI, aiming to help define the foundation of next-generation multimodal data ecosystems. Congratulations to the DataMFM Challenge winners! Many thanks to our speakers @RanjayKrishna @liuziwei7 @du_yilun @aagrawalAA, workshop organizers Pengyuan Li from @MITIBMLab @ZexueHe @ZihanWang123 @Rubyzx67 @WenhuChen @ManlingLi_ @RogerioFeris, the challenge organizers @aa_blueshark @HanSineng @H7803954325458, and everyone who joined us! DataMFM Workshop: datamfm.github.io/ #CVPR2026 #DataMFM #2077AI

2077AI

@2077AI

Jun 3

Great to see this level of engagement at the E2E3D Workshop @ CVPR in Denver! The room is at full capacity, with many attendees standing and many more gathered outside the room. Huge thanks to our speakers @lucacarlone1 @jiajunwu_cs @pesarlin @geopavlakos @drmapavone and all the organizers @zhiwen_fan_ @QianqianWang5 @CongWenyan0320 @ManlingLi_ @_vztu @Neutrino_l @ZihanWang123. This workshop is sponsored by 2077AI, and we remain committed to advancing frontier research in spatial intelligence and embodied AI. Workshop: e2e3d.github.io/index.html #CVPR2026 #2077AI

2077AI

@2077AI

Jun 1

Current VLM pipelines have a major blind spot in long-horizon household tasks: noisy auto-labels and weak spatial grounding. These errors compound fast, leading to object hallucinations, skipped steps, weak physical reasoning, and loss of task intent over time.

Our #CVPR2026 paper, EgoTL, addresses this data bottleneck.

Paper: arxiv.org/abs/2604.09535

2077AI

@2077AI

Jun 1

EgoTL-Bench evaluates 6 egocentric capabilities across 3 reasoning layers:

- memory-conditioned planning
- scene-aware action reasoning
- next-action prediction
- action recognition
- direction recognition
- distance estimation

It also evaluates world models on 60-second egocentric rollouts.

2077AI

@2077AI

Jun 1

Results: EgoTL improves both VLM reasoning and world-model rollouts.

With LoRA fine-tuning on ~1.2k curated Q&A pairs, Qwen2.5-VL-7B surpasses the strongest tested baselines across the reported EgoTL-Bench metrics. Distance estimation is the clearest gain: 39.45% MRA, nearly 2x the best pre-finetuning setup.

Fine-tuned COSMOS-Predict2 also improves long-horizon rollouts, better following think-aloud instructions while preserving object identity and scene layout.

Project: ego-tl.github.io/
Dataset: huggingface.co/datasets/luuu…

EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks

EgoTL is an egocentric benchmark for long-horizon household tasks with think-aloud reasoning, spatial annotations, and multimodal foundation model evaluation.

ego-tl.github.io
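
The results post above quotes distance estimation as "39.45% MRA" without defining the metric. The sketch below shows one common reading, assuming MRA means mean relative accuracy as used in VSI-Bench-style numeric evaluation: a prediction is scored by the fraction of confidence thresholds (0.50 to 0.95 in steps of 0.05) at which its relative error stays under 1 minus the threshold. The threshold set and the function names are assumptions for illustration, not taken from the EgoTL paper.

```python
from statistics import mean

# Assumed confidence thresholds: 0.50, 0.55, ..., 0.95.
# This set is an assumption, not confirmed by the post.
THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def relative_accuracy(pred: float, target: float) -> float:
    """Fraction of thresholds at which the relative error is below 1 - threshold."""
    if target == 0:
        raise ValueError("relative error is undefined for a zero target")
    rel_err = abs(pred - target) / abs(target)
    return mean(1.0 if rel_err < 1 - theta else 0.0 for theta in THRESHOLDS)


def mean_relative_accuracy(preds: list[float], targets: list[float]) -> float:
    """Average the per-sample relative accuracy over a set of predictions."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal in length")
    return mean(relative_accuracy(p, t) for p, t in zip(preds, targets))
```

Under this reading, an exact distance estimate scores 1.0, an estimate that is 22% off scores 0.6, and an estimate that is off by 50% or more scores 0.0.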

2077AI

@2077AI

May 31

ChartNet, our paper developed with @IBM and @MIT, has been accepted at #CVPR2026. It introduces 1.5M multimodal chart samples across 24 chart types and 6 plotting libraries. Core question: how do we train models to connect a chart image with the numbers, code, text, and reasoning behind it? 🧵

2077AI

@2077AI

May 31

On public benchmarks, Granite-Vision-2B ChartNet improves:

- ChartCap: 1.6 → 12.4 BLEU
- ChartMimic-v2: 30.84 → 58.42

These benchmarks cover chart summarization and chart-to-code translation.
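
To put the two before/after pairs on a common footing, the short sketch below computes the absolute and relative gain for each. Only the four scores come from the post; the dictionary layout and the `gain` helper are illustrative assumptions, and the post does not name the ChartMimic-v2 metric, so it is left unlabeled here.

```python
# Scores quoted in the post, as (before, after) pairs.
REPORTED = {
    "ChartCap (BLEU)": (1.6, 12.4),
    "ChartMimic-v2": (30.84, 58.42),
}


def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, ratio of after to before)."""
    if before == 0:
        raise ValueError("ratio is undefined when the starting score is zero")
    return after - before, after / before


if __name__ == "__main__":
    for name, (before, after) in REPORTED.items():
        delta, ratio = gain(before, after)
        print(f"{name}: +{delta:.2f} absolute, {ratio:.2f}x relative")
```

The ChartCap score rises 7.75x from a very low base, while the ChartMimic-v2 score rises about 1.89x, so the two ratios are not directly comparable.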

2077AI

@2077AI

May 31

Full paper and dataset:
arxiv.org/abs/2603.27064
huggingface.co/datasets/ibm-…

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for...

Understanding charts requires models to jointly reason over geometric visual patterns, structured numerical data, and natural language -- a capability where current vision-language models (VLMs)...

arxiv.org