From the creators of Postgres & Apache Spark -- Build reliable backends effortlessly

Joined January 2024
125 Photos and videos
Pinned Tweet
May 22
Over the past month, the DBOS product team has focused heavily on two areas: workflow operations tooling and performance optimization. What's new: - Major performance improvements - Database-backed dynamic queue configuration - Timeline visualization for workflows and steps - Spring Boot integration … and more! Would love to hear your thoughts, especially from folks running workflows in production.
1
2
12
1,097
DBOS retweeted
We’re having a user group at 11 AM PT tomorrow (Thursday)! These meetups are awesome–one of the best parts of working on open-source is seeing the community come together. The big focus this month is observability. First, my teammate Alex will be introducing some big improvements to the console, including a workflow timeline view, aggregated visualizations , and optimizations to handle millions of workflows with thousands of steps. Then, I’ll demo the new OpenMetrics integration, showing you how to connect your workflows to the rest of your observability stack and export data to or configure alerts in your Grafana or Datadog. Check it out! 👇
1
3
9
429
DBOS retweeted
Cool new feature: Metrics! You can now scrape metrics about your DBOS applications’ workflows and steps from a Prometheus/OpenMetrics endpoint. This makes it easier to integrate your workflows into your observability stack, letting you monitor them directly from your APM.
1
4
16
1,847
DBOS retweeted
Save state. Resume from the last success. That's durable execution in one sentence. 🔄 @jedberg at @DBOS_Inc joins us on "Hello, Agent!" to explain why this primitive matters so much for agentic systems — non-deterministic LLM calls mean failures are inevitable, and being able to replay from a checkpoint rather than restarting from scratch saves time, money, and compute. Click the link below for this episode of “Hello, Agent!” ⤵️ redpanda.com/hello-agent/dur…
5
7
396
DBOS retweeted
Can Postgres Queues Scale? The hardest part of building durable workflows on Postgres is scaling Postgres-backed task queues. At scale, thousands of workers are polling the same database table at the same time, creating a brutal workload that exposes subtle contention and performance issues. But the amazing thing about Postgres is that, with the right optimizations, it can handle that workload and scale queues to tens of thousands of tasks per second, or billions per month. In this blog post, we talk about how to scale Postgres for a workload it wasn’t designed for, but excels at anyways: 👇
1
7
35
2,771
DBOS retweeted
Join us at this month's @DBOS_Inc User Group! We're focusing on workflow observability: - ​Workflow Observability Redesigned: @poliakov_dbos will showcase major upgrades to DBOS Conductor UI. You can now visualize workflow execution timelines, navigate workflows with thousands of steps, and use multi-faceted filtering capabilities to quickly find workflows of interest. - OpenMetrics Integration: @petereliaskraft will introduce our new OpenMetrics support and demonstrate how to export workflow metrics to Datadog and other observability platforms for monitoring, dashboards, and alerting. If you're building long-running or complex workflows, these updates make it much easier to understand performance, debug issues, and keep your systems running reliably. Registration link below.
1
3
12
589
DBOS retweeted
Replying to @DBOS_Inc
@DBOS_Inc loving some of the new UI in the console!
5
7
361
DBOS retweeted
Congrats to the Soria team! Our teams have worked closely together since the early days, and it's been amazing to watch their journey and growth.
Soria is an AI Bloomberg Terminal built for sector-specialist investors, starting with healthcare. Agents aggregate hundreds of sources, monitor inflections, and alert investors in real time— and they're already live with major banks and hedge funds. ycombinator.com/launches/QXr…
2
8
1,044
DBOS retweeted
Cool new feature: you can now perform bulk workflow actions from the console. You can filter and select many workflows, then cancel, resume, or delete them. This is really useful when responding to unexpected events (for example, cancelling and resuming workflows to apply a bug fix), letting you manage all your workflows without leaving the console.
4
21
3,098
Jun 1
We just released DBOS Transact for Go v0.16.0 This release adds support for SQLite as a durability backend, making it even easier to build durable workflows and background jobs without provisioning a database. SQLite is a great fit for local development, edge deployments, and lightweight applications that still need reliable recovery from failures. Release notes: github.com/dbos-inc/dbos-tra…
1
2
11
402
DBOS retweeted
A surprisingly hard challenge building AI agents is putting a human in the loop. If you want your agent to be able to perform critical tasks in production, it probably needs to wait for human approval. However, because there are real people involved, approval doesn't always happen instantly, and agents need to be able to wait hours or days for human intervention, then quickly resume when it arrives. This creates a new reliability problem for agents, as they’re now running for hours or days instead of seconds or minutes. As they’re running for longer, it’s much more likely they’re interrupted (server maintenance, code upgrade, process crash) while waiting. For agents to really be usable in production, they need to be able to automatically recover from these interruptions and resume from where they left off. Durable workflows can help make long-running, human-in-the-loop agents resilient to failure. The idea is to checkpoint an agent’s progress in a database so that if the agent is interrupted, it can recover and resume from its last checkpoint. To handle human-in-the-loop specifically, we can use a database-backed messaging system where an agent awaits a notification delivered through the database. When the agent first starts waiting, it checkpoints a timeout. If the agent is interrupted, it recovers from its checkpoints and continues waiting towards the timeout. When a human approves the agent, the approval message is written to a database table so that when the agent is ready and recovered, it can read the message and continue execution. That way, an agent can run for days waiting for human approval and be ready to go as soon as it arrives.
4
4
14
1,186
DBOS retweeted
Postgres is all you need... for durable workflows and queues.
21
22
732
98,121
DBOS retweeted
💬 Hello, Agent: How do you make AI agents fail-safe? Learn the answer in Episode 3 of our "Hello, Agent!" podcast with the super savvy @jedberg, C-Suite Advisor at @DBOS_Inc. In this ep, Jeremy explains how durable execution solves reliability issues, its role in building production-ready AI agents, and shares insights from his time at Netflix and Reddit on enterprise-scale reliability engineering. Tune in 🎧 redpanda.com/hello-agent/dur…
3
5
723
DBOS retweeted
One unique benefit of DBOS is that it allows you to update your database record and start a background task in the same transaction. This guarantees both atomicity and durability in one place, without needing extra coordination between your application and external queueing/messaging system.
One important pattern for building reliable systems is a transactional outbox. It solves an important problem: how to reliably update a database record and send a message to another system. This is trickier than it sounds because the operations usually need to be atomic: they either both happen or neither do, even if there are failures (such as process crashes or network glitches) while performing them. Otherwise, the database might go out of sync with other systems, which could cause serious data integrity issues. Typically, to implement an outbox, we add a new “outbox” table to our database. When we need to perform an atomic update, we run a single database transaction that both: - Updates the database record - Writes the message we want to send to the "outbox" table. A separate background process then polls the outbox table and sends the messages there to the other system. Performing the database record update and writing the message to the "outbox" table in one transaction guarantees atomicity: either both records are updated and neither are, and once the message is written to the outbox, it will asynchronously be consumed and sent by the background process even if failures occur later. Durable workflows make a transactional outbox pattern easier in one of two ways. First, instead of using an outbox at all, you can both perform the transaction and send the message in a workflow. The workflow guarantees atomicity: if a failure occurs after writing to the database but before sending the message to the external system, the workflow will recover from its last completed step (writing to the database) and retry the next step (sending the message) until the message is successfully sent. This is the same guarantee a conventional transactional outbox provides: assuming the message is eventually delivered after enough retries, either both operations occur or neither do. Alternatively, instead of sending a message, you can enqueue a workflow in the same transaction as your database operation (because workflows are backed by database tables). Then, the workflow can perform whatever operation you want to happen atomically with your database update. All these patterns provide the same guarantees, but using workflows can be simpler in practice.
1
3
12
1,820
DBOS retweeted
One important pattern for building reliable systems is a transactional outbox. It solves an important problem: how to reliably update a database record and send a message to another system. This is trickier than it sounds because the operations usually need to be atomic: they either both happen or neither do, even if there are failures (such as process crashes or network glitches) while performing them. Otherwise, the database might go out of sync with other systems, which could cause serious data integrity issues. Typically, to implement an outbox, we add a new “outbox” table to our database. When we need to perform an atomic update, we run a single database transaction that both: - Updates the database record - Writes the message we want to send to the "outbox" table. A separate background process then polls the outbox table and sends the messages there to the other system. Performing the database record update and writing the message to the "outbox" table in one transaction guarantees atomicity: either both records are updated and neither are, and once the message is written to the outbox, it will asynchronously be consumed and sent by the background process even if failures occur later. Durable workflows make a transactional outbox pattern easier in one of two ways. First, instead of using an outbox at all, you can both perform the transaction and send the message in a workflow. The workflow guarantees atomicity: if a failure occurs after writing to the database but before sending the message to the external system, the workflow will recover from its last completed step (writing to the database) and retry the next step (sending the message) until the message is successfully sent. This is the same guarantee a conventional transactional outbox provides: assuming the message is eventually delivered after enough retries, either both operations occur or neither do. Alternatively, instead of sending a message, you can enqueue a workflow in the same transaction as your database operation (because workflows are backed by database tables). Then, the workflow can perform whatever operation you want to happen atomically with your database update. All these patterns provide the same guarantees, but using workflows can be simpler in practice.
2
22
2,639
DBOS retweeted
Postgres is all you need for durable execution. I wrote a blog post about the big idea we’ve been working on for the past couple years. Durable execution is a powerful tool for building reliable software, and the core idea is simple: checkpoint the progress of your programs in a database so that if anything fails, you can recover them from their last completed steps. So what do you need to make durable execution work? It turns out, not much: a database and a library that connects to it. And if you build your durable execution on Postgres, you get access to its rich ecosystem: built-in tooling for scalability, observability, availability, and any other operational need. I wrote about how this idea works and how we’re implementing it at DBOS: 👇
2
5
46
2,373
May 22
Over the past month, the DBOS product team has focused heavily on two areas: workflow operations tooling and performance optimization. What's new: - Major performance improvements - Database-backed dynamic queue configuration - Timeline visualization for workflows and steps - Spring Boot integration … and more! Would love to hear your thoughts, especially from folks running workflows in production.
1
2
12
1,097
DBOS retweeted
Replying to @brian_armstrong
To ensure your workflows are durable, look no further than @DBOS_Inc
3
6
333
DBOS retweeted
appreciate it — built on solid ground; DBOS is doing serious heavy lifting under the hood!
3
4
178
May 19
Congrats!! p.s., Memory Store is built on DBOS 😎
Memory Store (@memorydotstore) gives your team and AI agents a shared company brain. Your team's knowledge & decisions are scattered across slack, emails, and people's heads. Memory Store turns them into a living wiki for your agents and teammates. Congrats on the launch, @ishitajindal17 & @diwanksingh! ycombinator.com/launches/QPs…
2
3
14
843