This intersection has been a disaster for years. Drivers on Delancey block the box at commute hours and beyond.
@ChrisMarteNYC this is your district.
@LincolnRestler @crystalrhudson your constituents who bike into Manhattan have to navigate this every day.
Was out testing new film equipment. Drivers behaving brutally all over, but none so badly as the chaos at Chrystie & Delancey Streets.
Cranks love to complain about #bikenyc and say drivers don't break laws.
We know that's just not true.
@StreetsblogNYC @bikenewyork @OpenPlans
To show the need for our checklist for ML research, we looked for a study to refute. It took just 1 or 2 hours to find that a model claimed to be 97% accurate is no better than random. My working assumption is that most ML-based science these days is junk. aisnakeoil.com/p/introducing…
ALT Every music producer is looking for the next hit song. So when a paper claimed that machine learning can predict hit songs with 97% accuracy, it would have been music to their ears. News outlets, including Scientific American and Axios, published pieces about how this "frightening accuracy" could revolutionize the music industry. Earlier studies have found that it is hard to predict if a song will be successful in advance, so this paper seemed to be a dramatic achievement.
Unfortunately for music producers, we found that the study's results are bogus.
The model presented in the paper exhibits one of the most common pitfalls in machine learning: data leakage. This roughly means that the model is evaluated on the same, or similar, data as it is trained on, which makes estimates of accuracy exaggerated. In the real world, the model would perform far worse. This is like teaching to the test—or worse, giving away the answers before an exam.
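A minimal sketch of how leakage can inflate accuracy, under assumptions that are not from the study: the data here is pure noise, the classifier is a toy 1-nearest-neighbor, and the leak is introduced by duplicating rows before the train/test split. It is meant only to illustrate the "same, or similar, data" problem described above, not to reproduce the paper's pipeline.

```python
import random

def one_nn_predict(train, x):
    # Return the label of the closest training point (squared Euclidean distance).
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(train, test):
    hits = sum(one_nn_predict(train, x) == y for x, y in test)
    return hits / len(test)

def split(rows, rng, test_fraction=0.5):
    # Shuffle a copy and cut it into (train, test).
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * test_fraction)
    return rows[cut:], rows[:cut]

rng = random.Random(0)

# Features and labels are independent noise, so an honest evaluation
# should land near chance (about 50%).
data = [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(200)]

# Clean protocol: split the original rows.
train, test = split(data, rng)
clean_acc = accuracy(train, test)

# Leaky protocol: duplicate every row first, then split, so most test rows
# have an exact copy sitting in the training set.
leaky_train, leaky_test = split(data * 4, rng)
leaky_acc = accuracy(leaky_train, leaky_test)

print(f"clean: {clean_acc:.2f}  leaky: {leaky_acc:.2f}")
```

The leaky score looks impressive even though there is nothing to learn, which is why the split has to happen before any step that copies or resamples rows.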
It only took a couple of millennia, but @NYTimes confirms this morning that the exhortation of Cato the Elder has come to pass:
Carthago deleta est.
ALT A map of Africa published online by the New York Times on 2023-07-29. The country of Tunisia has been replaced on the map by a new arm of the Mediterranean Sea.
Seriously, the map in this article replaces Tunisia with a new arm of the Mediterranean. What map tool even allows you to do that?
nytimes.com/2023/07/29/world…
Our 2023 ICASSP paper is now up on arXiv:
Dataset balancing can hurt model performance
arxiv.org/abs/2307.00079
Dataset balancing works differently than you might assume:
- can cause overfitting;
- doesn’t improve performance on rare classes;
- speeds up training convergence.
We used a partial version of the balancing scheme from the PANN and PSLA/AST papers.
Full balancing (PANN, PSLA/AST) hurts mAP on a held-out set, but partial balancing gives a small boost over baseline.
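A rough sketch of what "partial" balancing could look like, with loud assumptions: it uses single-label examples, and a hypothetical exponent `beta` that interpolates between no balancing (`beta=0`) and full balancing (`beta=1`) by weighting each example with its class count raised to `-beta`. The function names and the toy labels are illustrative and are not taken from the paper or from the PANN/PSLA/AST code.

```python
import random
from collections import Counter

def sampling_weights(labels, beta):
    # Weight each example by (class count) ** -beta.
    # beta=0 leaves the natural distribution; beta=1 equalizes the classes.
    counts = Counter(labels)
    return [counts[y] ** -beta for y in labels]

def expected_class_share(labels, beta):
    # Fraction of sampled examples expected to come from each class.
    weights = sampling_weights(labels, beta)
    total = sum(weights)
    share = Counter()
    for y, w in zip(labels, weights):
        share[y] += w / total
    return dict(share)

# Toy imbalanced dataset (hypothetical): 90 common examples, 10 rare ones.
labels = ["speech"] * 90 + ["siren"] * 10

for beta in (0.0, 0.5, 1.0):
    print(beta, expected_class_share(labels, beta))

# Drawing one training batch with partial balancing.
rng = random.Random(0)
batch = rng.choices(labels, weights=sampling_weights(labels, 0.5), k=32)
```

With full balancing each rare example is drawn nine times as often as a common one in this toy set, which is one way to see how repeated rare examples could encourage overfitting; an intermediate `beta` softens that.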
We love the bells at the Co-Cathedral of St. Joseph & respect the daily ringing in honor of COVID-19 victims & heroes. But we live right nearby, and 30 min daily is stressing our 7 y.o. & waking the toddler. Would shorter be possible? @BpDiMarzio @cmlauriecumbo @BrooklynDiocese
Brooklyn’s Co-Cathedral of St. Joseph has been tolling its bells for the past ten minutes. Ominous-sounding but I don’t see anything in the news; can you comment on what they’re for, @BrooklynDiocese?
@Optimum @Optimumhelp We're on day five without usable internet and it looks like we're not the only ones. The only reason we haven't cancelled yet is that @verizonfios can't do an install until next week.
@Optimum @OptimumHelp your phone support person told us that it was an issue with our specific modem; that you have working modems in stock; but that you can't send us one. @verizonfios any chance you can do an install this week?