ServiceNow AI Research

ServiceNow AI Research

Users
Tweets

ServiceNow AI Research

@ServiceNowRSRCH

Jun 10

🚀 Grok Voice Think Fast 1.0 (@xAI) lands on the Pareto frontier on EVA-Bench — no system in the eval beats it on accuracy without sacrificing experience, or vice versa. 📊 Leaderboard: servicenow.github.io/eva/#re… @elonmusk #VoiceAgents #ServiceNowResearch #EVABench #GrokVoice #xAI

114,260

ServiceNow AI Research

ServiceNow AI Research

@ServiceNowRSRCH

Jun 4

⭐ 𝗘𝗩𝗔-𝗕𝗲𝗻𝗰𝗵 𝗗𝗮𝘁𝗮 𝟮.𝟬: 𝟯 𝗗𝗼𝗺𝗮𝗶𝗻𝘀, 𝟭𝟮𝟭 𝗧𝗼𝗼𝗹𝘀, 𝟮𝟭𝟯 𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼𝘀 We just published an article detailing the major expansion we have done to the data behind EVA-Bench. 🗂️ Data: huggingface.co/datasets/Serv… 📄 Article: huggingface.co/blog/ServiceN… #VoiceAgents #OpenSource #Data #AIResearch #ServiceNowResearch

ServiceNow-AI/eva-bench · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

654

AISecHub

AISecHub

@AISecHub

13 Oct 2025

Are Firewalls All You Need, or Stronger Benchmarks? Benchmarking is critical for understanding and comparing the security of tool-calling agents. As attacks evolve and defenses adapt, researchers need consistent, realistic, and reproducible evaluation frameworks to identify true progress and avoid misleading conclusions. Several recent benchmarks, such as AgentDojo, Agent Security Bench, and InjecAgent, aim to simulate real-world attack scenarios. However, our analysis further reveals that many of these benchmarks do not model real-world situations appropriately and sometimes employ skewed metrics to gauge performance. In such cases, even weak defenses may seem deceptively effective. We highlight these limitations and fix them through our proposed standardized benchmarking best-practices. Source: arxiv.org/pdf/2510.05244 @rishika2110, @KevinKasa98, @AbhayPuri98, @GabrielHuang9, @irinarish, Graham W. Taylor, @DjDvij, @alex_lacoste_ - @ServiceNowRSRCH, @Mila_Quebec, @UMontreal, @VectorInst, @uofg #PromptInjection #LLMSecurity #AIAgents #AgentSecurity #AIEvaluation #Guardrails #CyberSecurity #AISafety #MilaQuebec #ServiceNowResearch #VectorInstitute #UniversityOfGuelph

1,520

ServiceNow AI Research

ServiceNow AI Research

@ServiceNowRSRCH

8 Oct 2025

📢 Don’t miss it tomorrow at #COLM2025! Our researcher Arjun Ashok will present 🧩 “Context is Key: A Benchmark for Forecasting with Essential Textual Information”⏳ 🕓 4:45 PM at the XTempLLMs 2025 workshop 🔗 xtempllms.github.io/2025/pro… #AIresearch #ServiceNowResearch #XTempLLMs #LLMs #Forecasting #TimeSeries #MachineLearning #GenerativeAI

675

ServiceNow AI Research

ServiceNow AI Research

@ServiceNowRSRCH

6 Oct 2025

🎉 It’s CoLM week! The Conference on Language Modeling (CoLM 2025) kicks off tomorrow in Montréal 🇨🇦🍁 Proud that ServiceNow AI Research is a main sponsor — and that our team will present 5 papers on: 📊 Multimodal reasoning 🔄 Unified AR & diffusion models 🔍 Dense retrieval 🛡️ AI safety 📈 Efficient adaptation See you there! 🚀 #COLM2025 #AIresearch #ServiceNowResearch #LLMs #GenerativeAI #MontrealAI

758

Patrice Bechard

Patrice Bechard

@patricebechard

11 Sep 2025

Exciting update from our 💫StarFlow project! 🌐 servicenow.github.io/StarFlo… When we first introduced StarFlow, we showed how Vision–Language Models (VLMs) can transform sketches and diagrams into structured workflows for automation. Today, we’re taking it a step further: we’re open-sourcing the models, dataset, and code to the community! 🎉 🔹 Fine-Tuned Models • Llama-3.2-11B-Vision-Instruct-StarFlow (huggingface.co/ServiceNow/Ll…) • Pixtral-12B-2409-StarFlow (huggingface.co/ServiceNow/Pi…) • Qwen2.5-VL-7B-Instruct-StarFlow (huggingface.co/ServiceNow/Qw…) 🗂️ Large & Diverse Dataset • BigDocs-Sketch2Flow (huggingface.co/datasets/Serv…) 💻 Training & Evaluation Code • Includes our custom metrics & benchmarks • Available on github.com/ServiceNow/StarFl… We hope these resources empower researchers and practitioners to push the boundaries of vision-language reasoning and enterprise automation. #AI #OpenSource #VisionLanguageModels #SketchToFlow #WorkflowAutomation #GenerativeAI @ServiceNowResearch

StarFlow: Generating Structured Workflow Outputs From Sketch Images

VLMs that turn sketches into structured workflows. Dataset fine-tuned models released.

servicenow.github.io

Patrice Bechard

@patricebechard

28 May 2025

🚀 New paper from our team at @ServiceNowRSRCH!⁣ ⁣ 💫𝐒𝐭𝐚𝐫𝐅𝐥𝐨𝐰: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐒𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝 𝐖𝐨𝐫𝐤𝐟𝐥𝐨𝐰 𝐎𝐮𝐭𝐩𝐮𝐭𝐬 𝐅𝐫𝐨𝐦 𝐒𝐤𝐞𝐭𝐜𝐡 𝐈𝐦𝐚𝐠𝐞𝐬⁣ We use VLMs to turn 𝘩𝘢𝘯𝘥-𝘥𝘳𝘢𝘸𝘯 𝘴𝘬𝘦𝘵𝘤𝘩𝘦𝘴 and diagrams into executable workflows. 🖍️→⚙️⁣ ⁣ 🔗arxiv.org/abs/2503.21889⁣ 📝tinyurl.com/3utdbn97⁣ #Sketch2Flow #AI #VLM

0:52

1,281

Florian Brand

Florian Brand

@xeophon

11 Jul 2025

Replying to @DBahdanau @ServiceNowRSRCH

wow, you were the face of servicenowresearch for me can't wait to see where you end up, congrats:)

611

Perouz Taslakian

Perouz Taslakian @PerouzT

15 Apr 2025

📣📣📣 We just dropped Test Split 3️⃣ of RepLiQA — our Q&A dataset built to really test LLMs on unseen, made-up content. 🚀Great for RAG, context reasoning & in-context learning 🚀 huggingface.co/datasets/Serv… #ServiceNowResearch

663

Alexandre Lacoste

Alexandre Lacoste @alex_lacoste_

10 Dec 2024

Very excited to see this work coming out from #ServiceNowResearch. Can't wait to try the trained VLM in #AgentLab.

Joan Rodriguez

@joanrod_ai

10 Dec 2024

🎉 Excited to introduce BigDocs! An open, transparent multimodal dataset designed for: 📄 Documents 🌐 Web content 🖥️ GUI understanding 👨‍💻 Code generation from images We’re also launching BigDocs-Bench, featuring 10 tasks to test models on: ➡️ Document, Web, GUI Visual reasoning ➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more! 📜 Paper: arxiv.org/pdf/2412.04626 huggingface.co/papers/2412.0… 🌍 Website bigdocs.github.io/

539

Massimo Caccia

Massimo Caccia

@MassCaccia

12 Nov 2024

🚨 Preprint Alert! 🚨 It's 12 hours before your conference deadline. Tic, toc. ⏰ You're obviously last minute and need to write code for some fancy plots. 📊 You counted on your coding assistant to do the heavy lifting, but it's not version-aware. 🤖❌ You keep hitting relentless matplotlib plot errors. 🐛 Tic, toc. Panic sets in. 😱 🚀 Introducing GitChameleon 🦎 Our new benchmark tests large language models (LLMs) on their ability to generate version-specific code. We curated 116 Python code completion problems, each tied to specific library versions, complete with executable unit tests. Why Does Version Awareness Matter? LLMs are great at generating code, but they often fail when library versions change. This can lead to non-functional code, wasting precious time—especially when deadlines loom! 🕒 The Challenge: Software libraries evolve rapidly. Matplotlib, NumPy, PyTorch—you name it. If your code assistant isn't aware of version-specific changes, you could be in for a world of debugging pain. 😩 What GitChameleon Brings to the Table: * Version-Specific Problems: Focuses on real-world issues like deprecated functions and API updates. * Execution-Based Evaluation: Goes beyond static code analysis to test actual functionality. * Popular Libraries Covered: Matplotlib, NumPy, PyTorch, Pandas, and more. Key Findings: We tested state-of-the-art LLMs, including GPT-4o, Gemini, DeepSeekCoder v2, and others. * Performance Was Underwhelming: GPT-4o achieved a pass@10 of only 39.9%. * Error Feedback Helps Slightly: With error feedback, GPT-4o improved to 43.7%. * Low Correlation with Other Benchmarks: The correlation of GitChameleon with representative code benchmarks was low. The Spearman correlation coefficients with HumanEval, EvalPlus, and BigCodeBench-Hard split were 0.37, 0.50, and 0.36, respectively. This highlights the unique challenges in version-specific code generation. Types of Version Changes Tested: * Function Name Changes * Argument/Attribute Changes * Semantic/Behavioral Changes (avg pass@10: ~9.3% 😱). * New Features/Dependencies Paper: huggingface.co/papers/2411.0… Code: github.com/NizarIslah/GitCha… Thanks to first authors @nizar_islah and Justine G, and to @irinarish, @NeuralEnsemble, @terryyuezhuo @ServiceNowResearch @MILA (yes, I did pay 3.75$ to write a long post 😛)

12,914

Gaurav Sahu

Gaurav Sahu

@dem_fier

10 Dec 2023

Excited to present our #EMNLP2023 paper, PromptMix: Class Boundary Augmentation Method for Large Language Model Distillation! I’m presenting it in the East Foyer. Come say hi! paper: arxiv.org/pdf/2310.14192.pdf code: github.com/ServiceNow/Prompt… #UWCheritonCS #ServiceNowResearch

3,477