Sooraj

Sooraj

Users
Tweets

Sooraj

@suryanox7

Jun 10

But how come python run netflix, google, netflix? Google: Large-scale search indexing, backend core infrastructure, and Android’s runtime environment. Uber: The distributed storage and database layer (Docstore), along with thousands of core backend microservices. Netflix: The entire Control Plane (the user-facing application architecture, streaming API gateways, personalization engines, and Spring Boot-based microservices). All is Java.

629

Karry × Orb 🔮

Karry × Orb 🔮

@karry_viber

Apr 29

今天 Orb🔮 帮我做了啥 · 04/28 🐛 今天主轴是把"自反思"系统的几个静默故障挖了出来。Orb 每天会自反思生成 lesson 沉淀踩坑，但最近发现 escalation 反复推荐同一个 root-cause——查下去才知道 prompt 没要求 LLM 实地 grep/ls 验证，就照抄历史描述对付过去。更隐蔽的是 crudVerify 用 80 字 dedup key 撞老内容直接返回 NOOP，整条"知识提取"链路就这么静默通过 audit。修法是加硬约束让每条候选先 ls/grep/cat 验证锚点，同时把 cron 串到 skill 提议双轨。顺手发现 lesson cache 里有 5 个已删条目的 ghost vector 在污染相似度计算，清掉后高相似 pair 从 7 个掉到 1 个，整套收敛舒服多了。 🪞 复盘下来挺尴尬：一个号称"自反思"的系统，最终是被我手动 grep 才发现它在静默欺骗自己。这种"audit 全绿但底下烂掉"的模式，以后得在 prompt 里把 grounding 要求写死，光让 LLM 自己声称"我看了"是不够的。 🛠️ 同期偷了两份开源项目的设计闭环。一份是把 PreToolUse hook 的黑名单思路落地成 12 类危险命令拦截（grep -E 用 `:::` 替 `|` 踩了个坑），顺手在 TDD skill 里加了 Anti-Pattern: Horizontal Slices；另一份是把三态边语义引进来——边带上 extracted / inferred / ambiguous 标签，注入上下文时打 ⚙️ 或 ❓ 前缀让自己分辨可信度，加 DocStore PreToolUse hint hook，4/4 验收都过了。偷的是设计不是代码，融进现有机制。 📈 投资侧给一家做 SaaS 平台的票起了 opportunity 档案，AI 商业化路径是核心 thesis，比 ATH 折让 39%；X 运营脚本踩了个硬编码 deadline 的 bug，改成动态算剩余天数。 ⏳ 待跟进有三个：三态边分布要观察一周稳定性，daily-notes monitor cron 状态待查，还有一份全局 CLAUDE.md 老段落 worker 改不动得我手删。 from Orb🔮 with Karry

275

U N C L E BIGBAY ✨

U N C L E BIGBAY ✨

@unclebigbay143

Feb 4

Just finished reading How Uber Serves 40M Reads Per Second using an Integrated Cache. Seeing how Uber combines their in-house database Docstore with CacheFront is a real eye-opener into what happens behind the scenes to deliver instant data at scale. If you’re curious about database caching layers and micro-services at scale, that is a solid read.

U N C L E BIGBAY ✨

@unclebigbay143

Feb 3

My interest in caching mechanisms has really grown lately. It’s fascinating how almost every piece of digital tech we use relies on some form of cache.

303

Rohit Lakhotia 🐝

Rohit Lakhotia 🐝

@rohitlakh

7 Oct 2025

1/3 Every time you check your Uber ride or browse restaurant menus, you’re part of 40M requests/sec. How does Uber keep it lightning-fast? Meet Docstore, Uber’s custom distributed DB: Stateless query engine (brains) Stateful storage (MySQL Raft) Control plane for orchestration Problem: Microservices trigger dozens of queries → MySQL bottleneck & expensive scaling.

297

Dev

Dev

@dev_11217

20 Sep 2025

Ever wondered how Uber powers its app with lightning-fast data access during peak hours? Their CacheFront system, integrated with Redis caching and MySQL via Docstore, handles over 150 million reads per second. It prioritises cache hits, using Lua scripts for deduplicated writes, and Flux for subsecond invalidations to ensure strong consistency. This read-heavy architecture (up to 100:1 ratio) boosts scalability with optimizations like adaptive timeouts and 99.9% hit rates, keeping everything snappy even under massive load. Here is a link to their blog - uber.com/en-IN/blog/how-uber… What's your favorite caching hack for high-scale systems?

How Uber Serves over 150 Million Reads per Second from Integrated Cache with Stronger Consistency...

This is the second blog about the integrated cache infrastructure for Docstore, our online storage at Uber. In the previous blog, we presented CacheFront and outlined its core functionality, design...

uber.com

402

samir

samir @samireey

1 Sep 2025

ML grind day 163/365🎯 > explored langchain agents [zero-shot, conversational, react-docstore, self-ask-with- search] > planning to build one from scratch tomorrow

samir @samireey

31 Aug 2025

ML grind day 162/365🎯 > dived into langchain tools [builtin custom] > tested out tool calling & binding > built a currency converter tool on top of it

1,044

vineet

vineet @x0_vineet

13 Jul 2025

retriever only returns vectors and not the text data, for us to understand we need text i.e original nodes, and this is where docstore comes in it stores a original node data like text and a unique id.

vineet

vineet @x0_vineet

13 Jul 2025

wtf is docstore? so in RAG we convert our documents in nodes and then we use embedding model to convert these nodes into embeddings or vectors. we store these embeddings in vector database, and when we do retrieving, it finds semantically similar vectors to our query vector.

199

اقرأ-تِك - Eqraatech

اقرأ-تِك - Eqraatech

@Eqraatechcom

4 Jun 2025

ورقة وقلم وهنتكلم عن Uber Docstore Architecture 🚀 شركة Uber كانت بتستعمل الـ Schemaless Types من قواعد البيانات , واللي بناء عليها اتجهت لاستعمال Cassandra كـ Database أساسية وتبقى كـ General Purpose Database لمعظم الـ Business Verticals. ولكن مع حجم الـ Scale بتاع Uber كان الـ Operations على Cassandra كبيرة جدًا ومش فعالة بالنسبة ليهم من ناحية الكفاءة بتاعتها ، بالاضافة كمان لان Cassandra بتدعم الـ Eventual Consistency وده Consistency Level بالنسبة لـ Uber مكنش أفضل حاجة من ناحية المتطلبات بتاعتهم خصوصا ان هم بيطمحوا لتحقيق الـ Strict Serializability Consistency Level. ومن ثم ظهر الحاجة لتصميم Docstore واللي مبني على MySQL Database Engine. 💎 هنلاقي أن Docstore متقسمة بشكل رئيسي لثلاث أجزاء أو طبقات Layers 💎 🔸 الـ Stateless Query Engine مسئول بشكل أساسي عن الـ Query Planning والـ Routing والـ Sharding والـ Schema Management وكمان الـ Node Health Monitoring والـ Request Parsing والـ Validation والـ AuthN/AuthZ. والـ AuthN اللي هي اختصار لـ Authentication والـ AuthZ اختصار للـ Authorization. 🔸 الـ Stateful Storage Engine مسئول بشكل أساسي عن تحقيق الـ Consensus من خلال Raft وده طبعًا بيتم استعماله بشكل أساسي في النظم الموزعة لضمان تحقيق الـ Replication بكفاءة واتساق البيانات أو ما يعرف بالـ Consistency. والـ Storage Engine كذلك مسئول عن الـ Replication والـ Transactions والـ Concurrency Control والـ Load Management. 🔸 الـ Control Plane هو مسئول بشكل أساسي على انه يـ Assign الـ Shards للـ Docstore Partitions ويعدل ويغير من الـ Shard بناء على الـ Failure اللي ممكن تحصل في أي وقت. فهو زي المخ لعملية تحديد الـ Shards على الـ Partitions. وزي ماحنا شايفين في الصورة احنا عندنا أكتر من Partition كل جزء بيكون عبارة عن بعض الـ MySQL Nodes مدعومة بـ NVMe SSDs واللي قادرة على انها تتحمل الأحمال الثقيلة في القراءة والكتابة Heavy Read and Write Workloads. البيانات متقسمة على أكتر من جزء وكل جزء بيكون فيه Leader واحد , و 2 Follower وطبعا من خلال استعمال Raft لتحقيق الـ Consensus. طب ايه التضحية اللي Uber بتاهدها في سبيل اختيارلهم للـ Consistency Level ده والـ Transactions هيكون ايه نظامها خصوصا في الـ Concurrent Writes وايه هو الـ Architecture بتاع Docstore ده اللي هنعرفه من خلال الرابط ده 👇 eqraatech.com/ubers-docstore… --- وتقدروا دلوقتي تشتركوا في اقرأ-تِك بخصم 50% وتستمتعوا بكافة المقالات المميزة ومحتوى ورقة وقلم ومدونات فطين بشكل حصري وتستمتعوا بمواضيع متنوعة في كل ما يخص هندسة البرمجيات بالعربي وبتصاميم بجودة عالية 🎁

1,469

اقرأ-تِك - Eqraatech

اقرأ-تِك - Eqraatech

@Eqraatechcom

29 Apr 2025

مٌدونات فطين وتجربة عملية من شركة Uber 🚀 شركة Uber بتستعمل Docstore وهو عبارة عن قاعدة البيانات الموزعة بتاعتهم واللي مبنية على MySQL و Docstore بتخزن عشرات الـ PetaBytes من البيانات وبتخدم عشرات الملايين من ال Requests في الثانية. ودي واحدة من أكبر محركات قواعد البيانات عند Uber واللي بتستخدمها كتير من الـ Microservices في كل القطاعات التجارية أو اللي بنسميها Business Verticals عندهم. والكلام ده من ساعة ما بدأت في 2020، عدد المستخدمين وحالات الاستخدام بتاعت Docstore في ازدياد، وكمان حجم الطلبات والبيانات في زيادة. المطالب المتزايدة من القطاعات التجارية واحتياجاتهم بتضطر تقدم Microservices معقدة جدًا. وبالتالي، التطبيقات بتطلب زمن استجابة قليل Latency، وأداء عالي High Performance، وقابلية توسع من قاعدة البيانات Scalability، وفي نفس الوقت الكلام ده بيولد Workloads وأحمال عالية. طب ازاي شركة Uber قدرت إنها تحقق ده ؟ تقدروا تعرفوا التجربة كاملة وتجارب شركات تانية كتير متنوعة ومميزة من خلال كتاب مدونات فطين في تصميم النظم 👇 library.eqraatech.com/b/V1e8… والكتاب بيغطي 15 تجربة عملية من داخل الشركات العالمية في تصميم النظم الضخمة بتصاميم بجودة عالية 😍

239

EXM Cloud

EXM Cloud @EXMCloud

16 Dec 2024

The future of document management is here. Discover the trends and how DocStore keeps you at the forefront! Future-Proof your document Management with DocStore Today. #Integration #DocumentManagement #DocStore #trends #Salesforce

EXM Cloud

EXM Cloud @EXMCloud

28 Nov 2024

Don’t just store your docs—give them a home with DocStore! fact resources: invenioit.com/continuity/dat… cognidox.com/blog/document-m… novabackup.com/blog/seven-co… #DocStore #Salesforce #DataSecurity #exmcloud #DocumentManagement

Hrishi

Hrishi

@hrishioa

18 Oct 2024

Replying to @yersultansapar

Thank you! I'd say embeddings should be the last step instead of the first. Typespecs would be useful at the document level, at the docstore level, even summarizing documents and letting the LLM choose which ones to read in detail would significantly cut down the search space

189

anarki

anarki

@basedanarki

3 Oct 2024

definitely docstore this one btw. reference it, goes beyond fastapi project architectures imo. a few days of fucking react components lol you start to learn these lessons and they all (should) look just the same diff from many templates out there. dgaf github.com/zhanymkanov/fasta…

EXM Cloud

EXM Cloud @EXMCloud

5 Sep 2024

Ready to level up your document management? With DocStore, connect Salesforce with Google Drive, Azure, and SharePoint for seamless storage. Manage, automate, and secure all your files in one place! 🔗💼 #SalesforceIntegration #DocumentManagement #salesforce #sharepoint

akuwano

akuwano

@kuwa_tw

23 Aug 2024

UberのMySQLをDocstoreからMyRocksに置き換えた際に増分バックアップなどをどう実現したか、などのBlog！ Differential Backups in MyRocks Based Distributed Databases at Uber uber.com/en-JP/blog/differen…

Differential Backups in MyRocks Based Distributed Databases at Uber

Learn about how the Storage team at Uber significantly reduced costs and improved speed for backups of its Petabyte-scale, MyRocks-based distributed databases by devising a Differential Backups...

uber.com

845

اقرأ-تِك - Eqraatech

اقرأ-تِك - Eqraatech

@Eqraatechcom

3 Aug 2024

العدد الـ 21 أصبح متاح دلوقتي واتبعت لاكتر من 1,000 مشترك 🚀 عنوان العدد كان : How LinkedIn Improves Microservices Performance With Protobuf 🎉 ودي المواضيع اللي اتكلمنا فيها ❗️ 1- How LinkedIn Improves Microservices Performance With Protobuf 2- Uber’s Docstore High Level Architecture 3- Linux Administration Series 4- OpenID Connect 5- Stackoverflow Developer Survey رابط العدد الـ 21 👇 newsletter.eqraatech.com/p/v…

3,303

اقرأ-تِك - Eqraatech

اقرأ-تِك - Eqraatech

@Eqraatechcom

1 Aug 2024

ورقة وقلم وهنتكلم عن Uber Docstore Architecture 🚀 شركة Uber كانت بتستعمل الـ Schemaless Types من قواعد البيانات , واللي بناء عليها اتجهت لاستعمال Cassandra كـ Database أساسية وتبقى كـ General Purpose Database لمعظم الـ Business Verticals. ولكن مع حجم الـ Scale بتاع Uber كان الـ Operations على Cassandra كبيرة جدًا ومش فعالة بالنسبة ليهم من ناحية الكفاءة بتاعتها ، بالاضافة كمان لان Cassandra بتدعم الـ Eventual Consistency وده Consistency Level بالنسبة لـ Uber مكنش أفضل حاجة من ناحية المتطلبات بتاعتهم خصوصا ان هم بيطمحوا لتحقيق الـ Strict Serializability Consistency Level. ومن ثم ظهر الحاجة لتصميم Docstore واللي مبني على MySQL Database Engine. هنلاقي أن Docstore متقسمة بشكل رئيسي لثلاث أجزاء أو طبقات Layers: 1- الـ Stateless Query Engine مسئول بشكل أساسي عن الـ Query Planning والـ Routing والـ Sharding والـ Schema Management وكمان الـ Node Health Monitoring والـ Request Parsing والـ Validation والـ AuthN/AuthZ. والـ AuthN اللي هي اختصار لـ Authentication والـ AuthZ اختصار للـ Authorization. 2- الـ Stateful Storage Engine مسئول بشكل أساسي عن تحقيق الـ Consensus من خلال Raft وده طبعًا بيتم استعماله بشكل أساسي في النظم الموزعة لضمان تحقيق الـ Replication بكفاءة واتساق البيانات أو ما يعرف بالـ Consistency. والـ Storage Engine كذلك مسئول عن الـ Replication والـ Transactions والـ Concurrency Control والـ Load Management. 3- والـ Control Plane هم مسئول بشكل أساسي على انه يـ Assign الـ Shards للـ Docstore Partitions ويعدل ويغير من الـ Shard بناء على الـ Failure اللي ممكن تحصل في أي وقت. فهو زي المخ لعملية تحديد الـ Shards على الـ Partitions. وزي ماحنا شايفين في الصورة احنا عندنا أكتر من Partition كل جزء بيكون عبارة عن بعض الـ MySQL Nodes مدعومة بـ NVMe SSDs واللي قادرة على انها تتحمل الأحمال الثقيلة في القراءة والكتابة Heavy Read and Write Workloads. البيانات متقسمة على أكتر من جزء وكل جزء بيكون فيه Leader واحد , و 2 Follower وطبعا من خلال استعمال Raft لتحقيق الـ Consensus. طب ايه التضحية اللي Uber بتاهدها في سبيل اختيارلهم للـ Consistency Level ده والـ Transactions هيكون ايه نظامها خصوصا في الـ Concurrent Writes وايه هو الـ Architecture بتاع Docstore ده اللي هنعرفه من خلال الرابط ده 👇 eqraatech.com/ubers-docstore… --- وبنفكركوا انكوا تقدروا دلوقتي تشتركوا في اقرأ-تِك من خلال انستا باي وفودافون كاش من خلال رسايل الصفحة او الايميل لو فيه مشاكل في الدفع الاونلاين 😍

353

Mahmoud Youssef

Mahmoud Youssef

@mahyoussef_

28 Jul 2024

شركة Uber بتستعمل Docstore وهو عبارة عن قاعدة البيانات الموزعة بتاعتهم واللي مبنية على MySQL و Docstore بتخزن عشرات الـ PetaBytes من البيانات وبتخدم عشرات الملايين من ال Requests في الثانية. ودي واحدة من أكبر محركات قواعد البيانات عند Uber واللي بتستخدمها كتير من الـ Microservices في كل الـ Business Verticals عندهم. وبما ان المطالب بدأت تزيد واحتياجاتهم كذلك فده أدى إلى أن بناء الـ Microservices تكون معقدة وبالتالي التطبيقات بدأت تتطلب ان يكون فيه Latency قليل , وكمان High Performance وطبعا قابلية التوسع وتحقيق الـ Scalability. وفي نفس الوقت الكلام ده كله بيولد Workloads ضخمة جدًا ان DocStore يستوعبها خصوصا انه بيخدم على كل الـ Business Verticals. وعل الرغم من أن Docstore كان ممكن يكون يتم استعماله ويلبي المتطلبات دي كلها ، ولكن كان هيكون فيه مشكلة الا وهي التكلفة بسبب الـ Scalability اللي محتاجينها بالاضافة للـ Operational Management بالمناسبة احنا اتكلمنا عن الـ High-Level Architecture بتاع Docstore في العدد الـ 20 من النشرة الأسبوعية بالاضافة انكوا تقدروا تشوفوا المقال كامل في اقرأ-تِك من هنا ولكن جزء صغير هيكون مجاني , والباقي محتاج اشتراك 👇 eqraatech.com/how-uber-serve… خلونا قبل ما نشوف Uber عملت ايه ، نبص مع بعض على التحديات اللي غالبًا بتقابل أي System كبير بيكون محتاج لـ Low Latency و High Throughput : 1- سرعة استرجاع البيانات من الـ Disk بيكون ليها Limit معين متقدرش تتخطاه مقرون بالـ Data Model وكذلك الـ Query Optimization 2- الـ Vertical Scaling بكل تأكيد هيكون ليه Limit خاص بالـ Resources اللي انت بتزودها هتوصل برضو لحد معين وما هتعرف تزود عنه 3- الـ الـ Horizontal Scaling رغم فوايده وانه ممكن يكون حل عملي الا ان هيقدم مشاكل على النقيض من ناحية الـ Durability والـ Resiliency واللي برضو ممكن يعرضنا لمشاكل زي الـ Hot Shards/Partitions/Keys 4- فيه عدم توازن بيحصل نتيجة الـ Read / Write Requests واللي ممكن في كتير من الحالات عمليات القراءة بتكون أكتر بكتير من عمليات الكتابة مرة واحدة. 5- التكلفة اللي هنحطها في التوسع الرأسي أو الأفقي عشان نحسن من الـ Latency هيكون مكلف على المدى الطويل والتكلفة بتتضاعف في وجود Regions عشان تحقيق الـ High-Availability. فعشان نعالج المشكلة دي كتير من الـ Microservices Architecture كـ Best Practices بتبدأ تلجأ للـ Distributed Caching فتبدأ تكتب عمليات الكتابة في الـ Database والـ Cache وتخلي القراءة تتم من خلال الـ Cache عشان تحسين الـ Latency ولكن زي ماحنا عارفين برضو هيكون فيه شوية تحديات تانية زي: 1- كل فريق مسئول عن انه يـ Maintain الـ Cache Services الخاص بيه واللي بيخدم على الـ Services بتاعته وده بما اننا عندنا Microservices فكل فريق بيكون ليه Ownership عليهم. 2- الـ Cache Invalidation Logic هيكون معموله Implementation بشكل غير مركزي وده لان كل فريق هيكون مسئول عنه. 3- لو حصلت مشكلة في Region من الـ Regions وبدأ يحصل Failover ، لازم نكون محتفظين بنسخ الـ Cache التانية انها تكون مستعدة ومعمولها Warmup لان في أي وقت هيحصل Failover مفاجئ ممكن يحصل عدد مهول من الـ Cache Misses وبالتالي ده يوقع قاعدة البيانات بسبب عدد الـ Requests الضخم اللي هيضربها مرة واحدة. وعشان كده Uber بدأت تفكر في عمل CacheFront وهو Integrated Caching Solution يكون ناحية Docstore في الـ Query Engine لان ده اللي بيستقبل الـ Queries من الـ Client بأهداف محددة. الـ CacheFront بيستعمل الـ Cache Aside Strategy عشان الـ Cached Reads ، وكمان بيستعملوا الـ CDC عشان يتجنبوا مشاكل الـ Conditional Updates باستخدام Flux والـ MySQL Binlog وبدأوا يعملوا بعض التحسينات زي : 1- الـ Compare Cache مع الـ Shadow Requests عشان ضمان الـ Consistency والـ Monitoring للـ Consistency 2- الـ Cache Warming عشان مشكلة الـ Cross Regions 3- الـ Negative Caching عشان الـ Un-Existing Entries 4- الـ Sharding وخلوا الـ Partition Key بتاع الـ Cache مختلف عن قاعدة البيانات عشان يعالجوا مشكلة الـ Hot Partition لو الجزء ده حصله مشكلة في الـ Cache الـ Requests متروحش كلها مرة واحدة للـ Database في Shard معين. 5- الـ Circuit Breakers 6- الـ Adaptive Timeouts طبعا الكلام ده كله احنا ذكرناه بالتفصيل في المقالة ، فتقدروا تقرأوها كاملة من خلال اقرأ-تِك وتسمتعوا بجرعة من الـ System Design الجميلة وتشوفوا ازاي فريق Uber قدر يفكر في حل المشكلة وازاي قدر يوجد حلول للمشاكل اللي كانت بتطلع وايه التحديات والتضحيات اللي خدوها ❤️

3,447

Mahmoud Youssef

Mahmoud Youssef

@mahyoussef_

25 Jul 2024

يتم الآن التجهيز للعدد العشرون من النشرة الأسبوعية لاقرأ-تِك واللي هيكون فيه موضوع دسم عن ازاي Uber بتـ Serve حوالي 40 Million Reads Per Second وهنتكلم عن الـ Docstore الـ Distributed Database بتاعتهم والـ Integrated Cache وفكرته. هيكون فيه جزء مجاني متاح للجميع قراءته وباقي المقال باذن الله وشرح الـ Use Case للأعضاء المشتركين في اقرأ-تِك 😍

1,816