Research Scientist @Salesforce AI Research, Ph.D. from @SCSatCMU

Joined January 2021
Jielin Qiu retweeted
Today I finally get to share something our team has been quietly grinding on for months – we've created an open sourced version of Cursor Bench @cursor_ai. If you’ve been following Cursor’s Composer launch and their internal "Cursor Bench" for testing vibe coding models, you can think of our LCBA bench as the open-source, model-agnostic counterpart. Here is what we provide by @SFResearch.
With LCBA bench we:
• Ship a Cursor-style agent stack: ReAct loop, semantic @ codebase search, grep, file read/write, refactor tools, and a three-tier memory system inspired by production coding assistants like Cursor.
• Take 8,000 real-world vibe coding scenarios and turn them into interactive agent gyms across 10 languages and 10K–1M token codebases.
• Let you plug in any model (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro, etc.) and see how it actually behaves on long, messy, multi-turn coding tasks.
A few fun findings: Cursor-style agents with context management are surprisingly robust at 1M-token contexts, but there’s a hard trade-off between deep exploration and efficiency: no one frontier model sits in the “perfect” top-right corner yet. Anthropic Claude 4.5 and Google Gemini 2.5 Pro are at the Pareto Frontier.
Everything is open source (agent, code, scenarios, traces, metrics) on @huggingface:
📄 Tech Report: arxiv.org/pdf/2509.09614
🤖 GitHub: github.com/SalesforceAIResea…
🤗 Dataset: huggingface.co/datasets/jaso…
If you’re building coding agents, benchmarking your model against GPT/Claude/Gemini, or want to train your coding agents with RL in real coding environments, we’d love for you to try LCBA bench and tell us your findings!
Jielin Qiu retweeted
🚨 Introducing LoCoBench-Agent: a comprehensive benchmark for evaluating LLM agents in long-context software engineering
📄 Paper: bit.ly/49mPrBv
🔗 GitHub: bit.ly/3KbpkTN
✨ Key Features:
🤖 8,000 interactive agent scenarios with multi-turn conversations (up to 50 turns)
🔍 Context lengths: 10K-1M tokens across 10 programming languages
⚡ 9 bias-free evaluation metrics (5 comprehension, 4 efficiency)
🛠️ 8 specialized development tools: file operations, semantic search, grep, code analysis
🎯 8 task categories: architectural understanding, cross-file refactoring, multi-session development, bug investigation, feature implementation, code comprehension, integration testing, and security analysis
🔬 Key Findings:
- Fundamental comprehension-efficiency trade-off
- Tool usage patterns matter more than raw capabilities
- Strategic exploration > exhaustive exploration
LoCoBench-Agent assesses agent behavior across extended development sessions, measuring context retention, adaptive strategy refinement, and tool usage efficiency.
Authors: Jielin Qiu @Jason_Q, Zuxin Liu @LiuZuxin, Zhiwei Liu @JYJimLiu, Rithesh Murthy @rithesh__rn, Jianguo Zhang @JianguoZhang3, Haolin Chen @HaolinChen11, Shiyu Wang @shiyu04490786, Ming Zhu @ming_zhu0527, Liangwei Yang @Liangwei_Yang, Juntao Tan @chrisjtan, Roshan Ram @shoonyaka1, Akshara Prabhakar @aksh_555, Tulika Awalgaonkar @tulika614, Zixiang Chen @_zxchen_, Zhepeng Cen @ZhepengCen, Cheng Qian @qiancheng1231, Shelby Heinecke @shelbyh_ai, Weiran Yao @iscreamnearby, Silvio Savarese @silviocinguetta, Caiming Xiong @CaimingXiong, Huan Wang @huan__wang
#LLM #AIAgents #SoftwareEngineering #MachineLearning #Benchmark #FutureOfAI #EnterpriseAI
Jielin Qiu retweeted
🚨 Introducing LoCoBench: a comprehensive benchmark for evaluating long-context LLMs in complex software development
📄 Paper: bit.ly/4ponX3P
🔗 GitHub: bit.ly/4pvIfbZ
✨ Key Features:
📊 8,000 evaluation scenarios across 10 programming languages
🔍 Context lengths: 10K-1M tokens (100× variation!)
⚡ 17 evaluation metrics across 4 dimensions (6 newly proposed)
🎯 8 essential task categories: architectural understanding, cross-file refactoring, multi-session development, bug investigation, feature implementation, code comprehension, integration testing, and security analysis
Current SOTA models show dramatic performance drops as context increases, highlighting critical gaps in long-context understanding for real-world software engineering.
Authors: Jielin Qiu @_Jason_Q, Zuxin Liu @LiuZuxin, Zhiwei Liu @JYJimLiu, Rithesh Murthy @rithesh__rn, Jianguo Zhang @JianguoZhang3, Haolin Chen @HaolinChen11, Shiyu Wang @shiyu04490786, Ming Zhu @ming_zhu0527, Liangwei Yang @Liangwei_Yang, Juntao Tan @chrisjtan, Zhepeng Cen @ZhepengCen, Cheng Qian @qiancheng1231, Shelby Heinecke @shelbyh_ai, Weiran Yao @iscreamnearby, Silvio Savarese @silviocinguetta, Caiming Xiong @CaimingXiong, Huan Wang @huan__wang
#LLM #SoftwareEngineering #MachineLearning #Benchmark #FutureOfAI #EnterpriseAI
Jielin Qiu retweeted
19 Jan 2024
Excited to see the first paper accepted at @DMLRJournal. Over the last few months, we have been fascinated by the quality of reviews and the engaging interactions between authors and reviewers! Thanks, everyone! Please continue to send your best work about Data x ML 😀
'Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift' by Jielin Qiu, Yi Zhu, Xingjian Shi, Florian Wenzel, Zhiqiang Tang, Ding Zhao, Bo Li, Mu Li Action Editor: Hongyang Zhang openreview.net/forum?id=Vc1f… #Multimodal #Robustness #DistributionShift
19 Jan 2024
🎊 Extremely honored to share that our paper on multimodal model robustness has been accepted as the 1st paper for the Journal of Data-centric Machine Learning Research @DMLRJournal With @yizhu59 @sxjscience @flwenz @mli65 #Multimodal #Robustness #DistributionShift
'Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift' by Jielin Qiu, Yi Zhu, Xingjian Shi, Florian Wenzel, Zhiqiang Tang, Ding Zhao, Bo Li, Mu Li Action Editor: Hongyang Zhang openreview.net/forum?id=Vc1f… #Multimodal #Robustness #DistributionShift
Jielin Qiu retweeted
📚🌟 Evaluate any story to your heart's content with our new personalized story evaluation model, PerSE! No more worries about diverse preferences - get your own story evaluation report now! 📝🎯 arxiv.org/abs/2310.03304 1/5
Jielin Qiu retweeted
24 May 2023
What is missing in the text generation evaluation for BERTScore, BLEURT, COMET, SEScore & SEScore2? Explanation! Can we build a metric that not only produces a well-correlated quality score but also tells you the rationales, error type, and error location? Check out InstructScore!
Jielin Qiu retweeted
🚀 Excited to share our latest work in EMNLP main conference: "Learning from Mistakes via Interactive Study Assistant for Large Language Models". We introduce a study assistant (SALAM) to conduct thoughtful analysis on LLMs' mistakes and provide guidelines to avoid past mistakes
Jielin Qiu retweeted
😭 Tired of in-context demos & docs for LLM tool use? 💰 Too GPU-poor to tune LLMs for unseen tools? 🤬 Frustrated with frequent syntax errors in tool calls? Check out our new preprint ToolDec that addresses all these issues from the decoding side! arxiv.org/abs/2310.07075 1/5
Jielin Qiu retweeted
Excited to share our recent work, AnyMAL -- a unified Multimodal LLM built on LLaMA-2 that can reason over various inputs, e.g. images, audio, motion sensors. Check out our paper for more information on the model training, evaluation, safety and more! ➡️ arxiv.org/abs/2309.16058
Jielin Qiu retweeted
17 Dec 2022
Check out our new evaluation benchmarks and metrics for robustness of image-text multimodal models! @AmazonScience #multimodal #stablediffusion
16 Dec 2022
Are Multimodal Models Robust to Image and Text Perturbations? deepai.org/publication/are-m… by Jielin Qiu et al. including @yizhu59 #OpenSource #ComputerVision
Jielin Qiu retweeted
24 Mar 2022
A topic that comes up in every interview: Bias, variance, and their relationship with machine learning algorithms. Here is a simple summary that you will easily remember. ↓
Jielin Qiu retweeted
25 Mar 2022
Our #ACL2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions" is out (arxiv.org/abs/2203.12667)!!! It serves as a thorough reference for the VLN research community (for both starters and experts). github.com/eric-ai-lab/aweso…

Jielin Qiu retweeted
How to present a line plot? Line plots are effective for describing the relationship between two variables of interest. Unfortunately, most junior students simply copy and paste the figure from the paper into their talk, which causes much confusion. 😕 Let's break it down ... 🧵
Jielin Qiu retweeted
23 Feb 2022
Our team at Google Brain is looking for outstanding PhD students (expected graduation after 2023) who are interested in student researcher internships this year (2022). careers.google.com/jobs/resu…

Jielin Qiu retweeted
11 Feb 2022
The Embodied AI Lecture Series at AI2 is back! Subscribe to the mailing list for info about how to join these free lectures live, or stay tuned and we'll post the recorded sessions after the fact. Subscribe: allenai.us1.list-manage.com/… More info: prior.allenai.org/lectures
Jielin Qiu retweeted
I've been writing research articles for over 10 years now and one of the hardest parts is writing consistently and efficiently without procrastinating. I'm going to share some of my tips here 🧵 1/10
Jielin Qiu retweeted
8 Feb 2022
AI2's computer vision team PRIOR announced an exciting new release of their #EmbodiedAI platform AI2-THOR – in partnership with @unity, you can now train headlessly on multiple GPUs. 📈 Learn more: medium.com/ai2-blog/ai2-thor…
Jielin Qiu retweeted
I'm hiring an intern on the Google AI team for 2022! Email me (arjunakula@google.com) if you 1) are graduating in 2023 or 2024; 2) are interested in multi-modal representation learning and language grounding; and 3) have a strong publication record. #NLProc #intern #computervision #google #hiring
Jielin Qiu retweeted
Where do the rewards for robotic reinforcement learning come from? In this blog post we explore how, using crowdsourced language annotations and videos of humans, we can learn reward functions and enable them to generalize more broadly. ai.stanford.edu/blog/reward-…