Professor of data management for ML at @bifoldberlin. Ex-@UvA_Amsterdam, @NYUDataScience, @Twitter intern; member of @TheASF & @EFF. Views are my own.

Joined June 2010
Job Opening: Our lab is hiring a student employee (40h/month) for the development of a new benchmark for ML engineering agents with realistic ML pipelines. deem.berlin/#jobs-204356
1
2
5
382
Sebastian retweeted
That's nothing, I know software engineers in big tech who were capable of this feat even before the advent of GenAI
59
91
3,198
123,838
Sebastian retweeted
10 Oct 2025
bet you’ve heard of train on test, but have you heard of test on train?
2
3
26
2,079
Activity of the day: tricking "AI Agents" into doom-loops ;)
159
Sebastian retweeted
Blog Post: Looking back on the first decade as faculty (2014-2024). I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection. data-people-group.github.io/…
1
12
47
9,996
Sebastian retweeted
I miss the days of being a PhD student, or postdoc. I would give almost anything to have multiple full days at a time, just to concentrate deeply and single-mindedly on open-ended research.
63
115
3,080
297,190
Sebastian retweeted
I should write or record a longer piece on this at some point. But hopefully the slides will be useful to someone. Link: github.com/okhat/blog/blob/m…
2
3
17
920
12 Sep 2025
Next time my students ask me what real-world data looks like, I will point them to this article :) jimmyhmiller.com/ugliest-bea…
1
2
290
Sebastian retweeted
New research agenda we're kickstarting at Berkeley: redesigning data systems to serve the dominant workload of the future: agents! Agentic speculation is massive, heterogeneous, steerable, and redundant: properties data systems can better support and take advantage of. Take a look: arxiv.org/abs/2509.00997
6
49
265
33,856
Sebastian retweeted
2 Sep 2025
Vol:18 No:12 → mlidea: Interactively Improving ML Data Preparation Code via "Shadow Pipelines" vldb.org/pvldb/vol18/p5359-g…
2
9
888
17 Jul 2025
If you are at #icml25, don't miss @o_ovcharenko's spotlight poster today from 11 a.m. to 1:30 p.m. PDT at West Exhibition Hall B2-B3 #W-311. ICML link: icml.cc/virtual/2025/poster/…

Replying to @o_ovcharenko
Thanks to all co-authors Florian Barkmann, Philip Toma, @ImantDaunhawer, @vogt_je, @sscdotopen and @val_boeva 📄 Full paper: openreview.net/pdf?id=jnPHZq… 💻 Code: github.com/BoevaLab/scSSL-Be…
5
394
Sebastian retweeted
Join our lab's presentations at ICML'2025 @icmlconf in beautiful Vancouver!
1. Thursday, Olga Ovcharenko (@o_ovcharenko) will present our work with @sscdotopen and @vogt_je on "scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data", selected for a spotlight poster. icml.cc/virtual/2025/poster/…. Paper: arxiv.org/abs/2506.10031
2. Saturday, Marc Glettig (@GlettigMarc) will present our work on "H&Enium, Applying Foundation Models to Computational Pathology and Spatial Transcriptomics to Learn an Aligned Latent Space", selected for a poster presentation at the Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences. Paper: openreview.net/forum?id=W64N… ICML link: icml.cc/virtual/2025/worksho…
3. Saturday, I will give an invited talk about our CancerFoundation model by @Theus__A and Florian Barkmann at the Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences. Preprint to be updated soon with new results: biorxiv.org/content/10.1101/…
2
33
1,604
14 Jul 2025
The DEEM Lab is at ICML this week for the first time, with two contributions! (1/3)
1
2
8
508
14 Jul 2025
On Thursday, Olga will present her research on "scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data". This paper is joint work with ETH Zuerich and was selected as a spotlight poster: icml.cc/virtual/2025/poster/… (2/3)
1
1
4
450
14 Jul 2025
On Saturday, @o_ovcharenko will present a poster on "Towards Cross-Modal Error Detection with Tables and Images" at the DataWorld workshop, which details our initial ideas on finding errors in tables by inspecting corresponding image data: olgaovcharenko.github.io/_pa… (3/3)
2
170
Sebastian retweeted
Our paper "Towards Cross-Modal Error Detection with Tables and Images" was accepted for the DataWorld workshop at ICML'25! 🥳 Thanks to @sscdotopen!
1
1
11
390
Sebastian retweeted
New PhD position at @AmlabUva on learning concepts with theoretical guarantees using #causality and #RL with me, Frans Oliehoek (TU Delft) and @herkevanhoof 💥 Deadline: 15 June werkenbij.uva.nl/en/vacancie…

11
43
4,794
26 May 2025
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation pipelines designed along responsibility objectives. This is a fully-funded position at @bifoldberlin, co-supervised by @stoyanoj from NYU. Details: deem.berlin/#jobs-17725
5
7
637
12 May 2025
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation for ML/AI systems. This is a fully-funded position with salary level E13 at the DEEM Lab, as part of @bifoldberlin. Details available at deem.berlin/#jobs-2225
1
5
12
1,386
Sebastian retweeted
Today we had a great @bifoldberlin Day 2025 (incl. reception) with awesome keynotes by @CzyIna (Berlin Senate), @tkluewer (BMBF), and @MatthiasBethge (Tuebingen AI Center), as well as a variety of talks, posters, and networking. Thanks to all participants.
2
12
498