Job Opening:
Our lab is hiring a student employee (40h/month) for the development of a new benchmark for ML engineering agents with realistic ML pipelines.
deem.berlin/#jobs-204356
Blog Post: Looking back on the first decade as faculty (2014-2024).
I list my favorite papers from the decade, why I enjoyed working on them, and provide backstory and reflection.
data-people-group.github.io/…
I miss the days of being a PhD student, or postdoc. I would give almost anything to have multiple full days at a time, just to concentrate deeply and single-mindedly on open-ended research.
New research agenda we're kickstarting at Berkeley: redesigning data systems to serve the dominant workload of the future: agents!
Agentic speculation is massive, heterogeneous, steerable, and redundant: properties data systems can better support and take advantage of.
Take a look: arxiv.org/abs/2509.00997
Join our lab's presentations at ICML'2025 @icmlconf in beautiful Vancouver!
1. Thursday, Olga Ovcharenko (@o_ovcharenko) will present our work with @sscdotopen and @vogt_je on "scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data", selected for a spotlight poster. icml.cc/virtual/2025/poster/…. Paper: arxiv.org/abs/2506.10031
2. Saturday, Marc Glettig (@GlettigMarc) will present our work on "H&Enium, Applying Foundation Models to Computational Pathology and Spatial Transcriptomics to Learn an Aligned Latent Space", selected for a poster presentation at the Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences. Paper: openreview.net/forum?id=W64N… ICML link: icml.cc/virtual/2025/worksho…
3. Saturday, I will give an invited talk about our CancerFoundation model by @Theus__A and Florian Barkmann at the Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences. Preprint to be updated soon with new results: biorxiv.org/content/10.1101/…
On Thursday, Olga will present her research on "scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data". This paper is joint work with ETH Zurich and was selected as a spotlight poster:
icml.cc/virtual/2025/poster/…
(2/3)
On Saturday, @o_ovcharenko will present a poster on "Towards Cross-Modal Error Detection with Tables and Images" at the Data World workshop. The poster details our initial ideas on finding errors in tables by inspecting corresponding image data:
olgaovcharenko.github.io/_pa…
(3/3)
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation pipelines designed along responsibility objectives.
This is a fully-funded position at @bifoldberlin, co-supervised by @stoyanoj from NYU.
Details: deem.berlin/#jobs-17725
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation for ML/AI systems.
This is a fully-funded position with salary level E13 at the DEEM Lab, as part of @bifoldberlin.
Details available at deem.berlin/#jobs-2225
Today we had a great @bifoldberlin Day 2025 (incl. reception) with awesome keynotes by @CzyIna (Berlin Senate), @tkluewer (BMBF), and @MatthiasBethge (Tuebingen AI Center), as well as a variety of talks, posters, and networking. Thanks to all participants.