Excited to share that I'll be joining @thomsonreuters Labs in Zug, Switzerland, as an Applied AI Scientist Intern from April to September 2026! 🏔️
A nice bridge between finishing my MSc at JKU Linz and starting my PhD later this year - let me know if you're around!
wtpsplit now supports length-constrained segmentation ✂️
min/max chunk length (chars) while preserving semantic chunks - should be great for RAG!
Example (≤30 chars):
[Landing 5pm → Beimen.]
[Let's meet at: Ximen Exit 6.]
[Then: Ningxia Night Market...]
[Late-night snack!]
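To make the idea concrete, here is a minimal sketch of length-constrained chunking in plain Python. It is not wtpsplit's implementation or API: `pack_segments` is a hypothetical helper, it assumes the text has already been split into semantic units, and it only enforces a maximum length (a minimum length is not handled).

```python
def pack_segments(segments, max_len):
    """Greedily merge adjacent segments into chunks of at most max_len chars.

    Hypothetical helper for illustration only; not part of wtpsplit.
    A single segment longer than max_len is kept whole rather than cut.
    """
    chunks = []
    current = ""
    for seg in segments:
        # Try to append the next segment to the chunk being built.
        candidate = f"{current} {seg}" if current else seg
        if len(candidate) <= max_len:
            current = candidate
        else:
            # Adding the segment would exceed the limit: close the chunk.
            if current:
                chunks.append(current)
            current = seg
    if current:
        chunks.append(current)
    return chunks


# The pre-split units from the example above (assumed input).
units = [
    "Landing 5pm → Beimen.",
    "Let's meet at: Ximen Exit 6.",
    "Then: Ningxia Night Market...",
    "Late-night snack!",
]
chunks = pack_segments(units, max_len=30)
```

With a 30-character limit none of these units can be merged with a neighbour, so each stays its own chunk; a larger limit would start merging adjacent units.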
Excited to share two new papers on AI-generated music detection from my research internship at @Deezer, published in @ismir_conf #ISMIR2025 and @aclmeeting #ACL2025 Findings! 🎶🤖
The problem: most AI music detectors are impractical or unreliable in real-world settings.
I view this work as an important extension of current single-modality detectors that keeps their flexibility and modularity. It's not production-ready, but it highlights two key paradigms for detection:
using all the information available from the audio alone, and focusing on robustness.
Wtpsplit, our text segmentation tool, just reached ⭐️1000 stars⭐️ on GitHub! Excited to see it is proving useful!
Check it out here: github.com/segment-any-text/… 🎉
We created Approximate Likelihood Matching, a principled (and very effective) method for *cross-tokenizer distillation*!
With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level and a bunch more🧵
[Image: illustration showing that ALM can enable ensembling, transfer to bytes, and general cross-tokenizer distillation.]
Curious about our SoTA text segmentation tool? 🪓 It's gonna help you across all kinds of NLP tasks!
Learn more at our poster session: Tuesday, 4pm, Jasmine room at #EMNLP2024! 🗓️
See you there!
I'll be attending the whole conference - happy to connect with everyone! 👋
Excited to share that I joined @researchdeezer as a research intern to work with @evpure and @Gabolsgabs on detecting AI-generated lyrics! 🎶
The first few weeks have been amazing, and I am excited about what is to come—life in Paris certainly has unparalleled charm!
This was an awesome summer! I can only recommend ETH's summer research fellowship program 🏔️
Also happy about the project's progress - integrating videos into existing architectures is quite exciting, stay tuned! Super grateful to Ryan Cotterell and @glnmario for supervising me.
Excited to share that I joined @ETH Zürich as a summer research fellow, supervised by Prof. @ryandcotterell, working on ✨Multimodal LLMs! ✨
The first few weeks have been a blast, and I'm looking forward to the weeks ahead! 📽️
Congratulations to C4AI Research Grant recipient @FrohmannM and all authors of "Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation" for their EMNLP acceptance! 🥳
Just in time for acceptance to #EMNLP2024, we added ONNX support to SaT 🚀
The 12-layer model can now segment 45k chars in ~137ms. That is faster than the previous implementation of the 3-layer model.
We just released v2.1.0 of our library! ⚡️
It now supports GPU inference via ONNX, leading to a further ~50% speedup for all models.
Check out our state-of-the-art sentence segmentation tool here: github.com/segment-any-text/…