I've always been passionate about data quality -- that's where the best war stories are, too. My friend @jeremystan is writing a book on it and you can read the first chapter now:
📚 HOT OFF THE PRESS 📚
We’re excited to share an exclusive first look at @OReillyMedia’s new book Automating Data Quality Monitoring at Scale written by @jeremystan & Paige
Chapter 1: The Data Quality Imperative is out now
Download👉 anomalo.com/oreilly/automati… #dataquality 📈
Hi, this is your annual reminder that @squarecog and I wrote a book a year ago. Reviews are in and they're pretty solid, if I don't say so myself (I do).
Might make a good gift. 'Tis the season and all. Just sayin'. Ok, retweet 🔃 this and carry on. 😁
amazon.com/Missing-README-Gu…
Hello tech friends. If you are still hiring, please reply to this with links to jobs and share. If you can help in any capacity also please reply. As long as we’re in this dumpster fire let’s get toasty together.
We just released our first post in a series on monitoring metrics in @anomalo_hq.
Visualizations are 🔑 to doing this well: explore the history of the metric, understand the drivers of change, and audit the forecast.
anomalo.com/post/monitoring-…
Vince @Conitzer returned to CMU this fall from @DukeU with appointments in @CSDatCMU and @mldcmu. We say returned because Vince did his Ph.D. here and we couldn't be more jazzed to have him back and heading up the Foundations of Cooperative AI Lab!
youtu.be/tt_e2ZOfWZg
Here’s an example of a “state of the art prompt”. In my opinion, it involves a lot of overloading, and sometimes adding in things you don’t exactly want, like “deep sea creature” (or whatever else), to find more interesting forms and textures
I’d love to see other approaches
I'm very excited to announce that I will be teaching a course on Data Engineering for Machine Learning in September! (1/5)
getsphere.com/ml-engineering…
Bindu, can you please explain why you took my work, replaced my watermark with your logo, and tweeted it as your own? Many people I respect follow you so I’m trying to think of a good explanation but coming up short.
.@joe_hellerstein, @vsreekanti, @cgwu0530 and I have been working on a new project (and company) to simplify infrastructure for data scientists. We're looking for feedback from data engineers who support data scientists. If you or a friend are willing to chat, we'd appreciate it!
We don't store data in warehouses. We operate data factories without quality controls.
Then we fight fires when dashboards and data products break.
My latest on what we need to do differently: thenewstack.io/build-data-fa…
Take a look if you're working with Machine Learning for NLP. Fun challenge and a change of pace if you're advising, consulting or taking a break.
Slides and lectures already available as a starting point!
We are on the search for a lecturer to teach NLP 243, Machine Learning for Fall 2022!
If interested, send an email to nlp@ucsc.edu indicating your interest and qualifications, with subject line "Teaching NLP243".
Syllabus:
docs.google.com/document/d/1…
👋 The project I've been working on is now open source!
Open Robo-Advisor is a Python library that acts as an advisor 🤖 for passive indexing (think Wealthfront). It's very basic, but I wanted to get it out early.
Check it out and send feedback! 👀
github.com/highwire-ai/open-…
A question I ask to prioritize data projects:
'Imagine you're done. It took 4 months. It works OK. Now, what metric has improved? By how much? Was it worth the effort & opportunity cost?'
More on assessing impact from @jikechong & @yuec's book on leading in data science:
You might think this only applies to big companies with complex planning and metrics, but I'm surprised how often people don't pause to wonder 'OK, what if this works?' before getting started.
Same. @TheBeastAcademy is one of those rare products that immediately turn you into an evangelist. After trying so many math apps that were based on rote memorization or games w/ the occasional addition question, BA was the only one I felt could inspire a love of math.
Our boys are doing AoPS (or Beast Academy) for their math curriculum. 1 year in, they are both loving it, are developing strong mathematical thinking and interest in exploring more. I attribute all this to AoPS
My favorite data science algorithm (division) saves the day — this time as the final step of automated bad data root-cause analysis by @anomalo_hq. Intuitive ➡️ easy to trust, which is key for this application.
Row level insights are key to explaining data issues:
- expected or not?
- cause for concern?
- where did it happen?
- why?
Our root cause analysis does this automatically, every time. New post by machine learning engineer @VickyAndonova explains how
anomalo.com/post/root-causin…
For most ML systems, data quality is *the* impact bottleneck. It sneaks up on you while the data team is busy modeling; when discovered it's dismissed as a 'bug, fixed' vs. a systemic issue.
Impressed with what @jeremystan and @eshmu built so far & their vision; congrats!