Björn Schotte retweeted

Ilia

@iliaa

Jun 9

Ran /codesage-review with Fable 5 (High effort) over three of my PHP extensions to see how benchmarks compare to reality. Biggest takeaway: AI code review isn't one-and-done, and Fable 5 isn't much of a change from Opus 4.8. Every repo needed multiple passes, and each pass found real bugs the previous one walked straight past, including bugs the model introduced in its own fixes. Per-repo breakdown below.

191

Polsia

@polsia

Jun 7

Built CodeSage today. It monitors your repos 24/7, flags bugs before they ship, and sends you fixes. Your codebase's senior dev who never sleeps.

nav_tw @LonareNavneet

Apr 7

Used to spend hours understanding new codebases. Built CodeSage 🧠 Ask your repo questions in plain English → get clear answers. Stack: FAISS, RAG, LLM, Node, React. Open source 👇 github.com/snipeet03/CodeSag… #buildinpublic #OpenSource #DevTools #AI

GitHub - snipeet03/CodeSage

Contribute to snipeet03/CodeSage development by creating an account on GitHub.

github.com

Redha | رضا @Redha_twe

Mar 9

A revolutionary tool for analyzing code in moments! The CodeSage AI project can now inspect any code, detect errors, and suggest improvements automatically. 🎯 Why does it matter to you? It will save developers hours of work and reduce software bugs before release. #AI #GitHub #OpenSource #TechNews

codeSage

@wisdom_ikoi

Jan 29

How to launch a product in the shortest time possible! #startup #founder #ai #codesage #productlaunch

3:04

Teachable Machine @TeachableAI

23 Jun 2025

Researchers are using powerful language models to make searching and recommending code easier! They tested different models on various datasets, looking at how well they find the right code snippets. A model called Codesage-small-v2 did really well on one dataset, while BGE-base and GIST-base performed similarly on another. Starcoder2-7B worked across multiple programming languages for matching code and identifying its parts. arxiv.org/abs/2506.15655 #ArtificialIntelligence

cAST: Enhancing Code Retrieval-Augmented Generation with...

Retrieval-Augmented Generation (RAG) has become essential for large-scale code generation, grounding predictions in external code corpora to improve factuality. However, a critical yet...

arxiv.org

hashim alsharif

@hashim1

18 Jun 2025

Built CodeSage, a 19M param LLM for code understanding. Been hacking on tokenizers and transformers… maybe one day I’ll build a real model haha. Gonna deploy it, do some reinforcement learning, feed it more data. Just need cloud credits lol

1,054

meng shao

@shao__meng

5 Dec 2024

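Voyage-code-3: a more accurate, more efficient next-generation code retrieval engine. "Voyage AI has launched a new-generation code retrieval model that, through innovative dimension reduction and quantization techniques, significantly improves retrieval accuracy (beating OpenAI by 13.8%) while sharply cutting storage and compute costs, a breakthrough for the code search field."

1. Voyage AI released the new-generation code retrieval model voyage-code-3, with significantly better performance:
- 13.80% better on average than OpenAI's model
- 16.81% better on average than CodeSage's model
- Supports a longer context length (32K tokens)

2. New features:
- Supports embeddings in multiple dimensions (2048/1024/512/256)
- Offers multiple quantization formats that can greatly reduce storage cost
- Uses Matryoshka learning ("nesting-doll learning"), so a single vector can be used flexibly at different lengths

3. Practical advantages:
- Much lower storage cost: 8-bit or 1-bit storage saves 4x or 32x the space, respectively
- Small performance loss: retrieval quality stays high even in the compressed formats
- Works with mainstream vector databases such as Milvus and Qdrant

4. Training and evaluation:
- Trained on larger and more diverse code data
- Covers more than 300 programming languages
- Comprehensively tested on 238 datasets
- Supports multiple code retrieval scenarios: text-to-code, code-to-code, document-to-code, and so on

What this release means for developers and enterprises:
- Better code retrieval at lower cost
- Much lower storage and compute costs while keeping performance high
- More flexible deployment options, with dimensions and storage formats chosen to fit the need

This is an important advance in code retrieval, with notable gains in efficiency and cost in particular. The first 200 million tokens are free, and developers can get started through the documentation.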

Voyage AI by MongoDB

@VoyageAI

5 Dec 2024

📢 Announcing voyage-code-3 embedding model! 1. more accurate: 14% gain over OpenAI-v3-large 2. flexible dimension (Matryoshka): 256-2048 3. quantized embeddings: float, int8, binary 4. new Pareto frontier: (binary,256 dim.) is 6% better than OpenAI (float,3072 dim.) 🧵🧵
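
The storage savings associated with the quantized formats listed above (float, int8, binary) follow from per-dimension arithmetic. A minimal sketch, assuming "float" means 32-bit floats and that binary vectors are bit-packed; the function names are hypothetical and not part of any Voyage AI API:

```python
# Bits used per dimension in each format.
# Assumption: "float" is a 32-bit float; binary is packed 1 bit per dimension.
FORMATS = {"float32": 32, "int8": 8, "binary": 1}


def bytes_per_vector(dims: int, bits_per_dim: int) -> int:
    """Storage for one embedding, rounded up to whole bytes."""
    return (dims * bits_per_dim + 7) // 8


def savings_vs_float32(dims: int) -> dict[str, float]:
    """How many times smaller each format is than float32 at the same dimension."""
    base = bytes_per_vector(dims, FORMATS["float32"])
    return {
        name: base / bytes_per_vector(dims, bits)
        for name, bits in FORMATS.items()
    }


if __name__ == "__main__":
    # Dimensions taken from the announcement (Matryoshka sizes 256 to 2048).
    for dims in (2048, 1024, 512, 256):
        sizes = {n: bytes_per_vector(dims, b) for n, b in FORMATS.items()}
        print(dims, sizes, savings_vs_float32(dims))
```

Under these assumptions a 1024-dimension vector takes 4096 bytes as float32, 1024 bytes as int8, and 128 bytes as binary, which is where the 4x and 32x ratios come from. Reducing the dimension shrinks storage further, independently of the format.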

1,042

Voyage AI by MongoDB

@VoyageAI

4 Dec 2024

We evaluated various embedding models, @OpenAI, @awscloud CodeSage, CodeRankEmbed, @JinaAI_ v2 code, along with @VoyageAI’s newly released voyage-code-3 (blog.voyageai.com/2024/12/04…) on these datasets:

772

ChainIDE

@ChainIDE

8 Jul 2024

ChainIDE had the pleasure of being part of the @HackQuest_ x @arbitrum IRL Bootcamp in Kolkata! The energy and passion of the developers were incredible. We're thrilled to have been a guest for this Partner-sharing session! 🎉 We showcased how to use the AI capabilities of ChainIDE-CodeSage for full-stack and AI-driven Dapp development, from front-end to back-end to smart contracts.

2,136

C0ss4ck @CossackWang

6 Jun 2024

The wait-and-see crowd finally got CodeSAGE. Next week I'll test how the large model does on SCA! Staking out a placeholder in advance, mark: arxiv.org/abs/2402.01935

Code Representation Learning At Scale

Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code...

arxiv.org

1,076

Marktechpost AI

@Marktechpost

21 Feb 2024

AWS AI Labs Introduce CodeSage: A Bidirectional Encoder Representation Model for Source Code. Quick read: marktechpost.com/2024/02/21/… Researchers from AWS AI Labs have introduced CodeSage, a bidirectional encoder representation model designed specifically for source code. The model uses a two-stage training scheme on a dataset far larger than those traditionally used in this field. The approach combines identifier deobfuscation with a refined masked language modeling objective that goes beyond conventional masking techniques, with the aim of capturing the semantic and structural nuances of programming languages more effectively. Paper: arxiv.org/abs/2402.01935 #ArtificialIntelligence @awscloud

206

Wasi Ahmad @ahmadwasi

18 Feb 2024

🚀 Our latest research paper on code representation learning, CodeSage, outperforms OpenAI text-embedding-3-large on Code2Code search and is on par with it on NL2Code search tasks! Dive into the techniques and insights - check them out on the blog: code-representation-learning…

Wasi Ahmad @ahmadwasi

9 Feb 2024

Introducing #CodeSage, a family of embedding models for generating code representations. To appear at #ICLR2024, co-led w/ @DejiaoZhang. 1/5 Paper: arxiv.org/abs/2402.01935 Evaluation code: github.com/amazon-science/Co… Model checkpoints: huggingface.co/codesage

660

Philipp Schmid

@_philschmid

13 Feb 2024

New Embedding Models for Code released by @awscloud! Embedding Models are at the heart of every RAG application. Without good embeddings, retrieving relevant context to answer your user prompts is impossible. 🔍 Super exciting to see Amazon release CodeSage, a family of open code embedding models with an encoder architecture that supports a wide range of source code understanding tasks. 🤗

TL;DR:
📏 Comes in 3 sizes: 130M, 356M, 1.3B
📚 Pre-trained on @BigCodeProject The Stack (237 million code files)
🇪🇺 Fine-tuned on 75 million bimodal (code and natural language) pairs
🔍 Using hard negatives & hard positives improves MAP > 10%
🔠 Using @BigCodeProject StarCoder Tokenizer
⚖️ Licensed under Apache 2.0
🥇 Outperforms @OpenAI and others on 0-shot Code Search
🚀 SOTA performance on NL2Code (Natural Language to Code)
🤗 Available on @huggingface and supported in Sentence Transformers

217

28,213

Wasi Ahmad @ahmadwasi

9 Feb 2024

Code Representation Learning At Scale

Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code...

arxiv.org

1,203

GLBITM,GL Bajaj Institute of Tech. & Management

@glbajaj

14 Aug 2023

#GLBajaj (GLBITM) is proud to share that Team CodeSage from GL Bajaj has emerged as a winner of KAVACH 2023 in the "New Age Women Safety App" category. #Kavach2023 #winners #glbajastudents #hackathon #CyberSecurityHackathon #MinistryOfEducation #AICTE #MIC #BPR&D #I4C

400

Pillai University @pillaicollege

10 Aug 2023

CodeSage, Cookie Bytes, Photon in a Double Slit and Little Champs were the four teams that came out on top in the Grand Finale of KAVACH 2023! #kavach2023 #cybersecurityhackathon #ministryofeducation #ministryofhomeaffairs #aicte #mic #bpr&d #i4c #kavachhackathon2023

458

Sunwoo👺

@sunwooz

9 Aug 2023

Replying to @violetto96 @shnai0

Ah, didn't know about that. Codesage will be fundamentally different because the user chooses a GitHub repository and version/tag to chat with. Ogpt seems to be a skin on top of an older API version of GPT-4.

TechnoMag @TechnoMagZw

19 Feb 2016

Introduction To Programming goo.gl/4rtAmZ @codesage @SaharaHacker @SecureITZim @Neolabtech #Twimbos