@kchonyc's blog post on anxiety and frustration among late-stage PhD students has caught attention over the last week, so I wanted to reflect on it as a former PhD student from Kyunghyun's class of 2020.
indeed, when i started doing ML research circa 2014/2015, there were relatively few experts in the field, and AI largely wasn't on people's minds beyond sci-fi. but around that time deep learning started shattering benchmarks and demonstrating new capabilities, signaling that it was only a matter of time before it went mainstream.
before the 2020s, AI conferences were filled with crazy and interesting ideas (for example neural style transfer, backpropagation with learned gradients, learning to ponder, and memory networks, among many others i am forgetting). researchers and practitioners from both academia and industry were developing creative concepts we hadn't previously considered. what was even more impressive was that many innovations with a massive impact on the field (for example, Adam and ResNet) were developed on relatively small compute clusters.
as the 2010s progressed, it became increasingly evident that those with more compute consistently achieved more compelling results and wrote more impactful papers. inventions like BERT in 2019, which required substantial compute (64 TPU chips to train BERT-Large), demonstrated the necessity of such resources for breakthroughs. as a result, the field began shifting from clever ideas on small compute clusters to scaling and optimization. note: i am not disregarding the impact of the idea itself, as BERT introduced the masked language modeling (MLM) objective for transformers.
fast forward to today: almost everyone is working on LLMs and foundation models in one way or another. there's nothing wrong with this; people naturally gravitate toward winning ideas, pushing and improving them. but as a result of this focus on LLMs and foundation models, the research job market has changed:
- people focus on scale because it potentially offers predictable returns and wins
- people focus on products because AI is delivering amazing commercial results (people pay for AI-native tools, and ARR growth for such products is at record highs)
- core ML expertise is becoming less necessary, as prompting and today's ML tools allow almost anyone to train AI models with little to no code and virtually no ML expertise
- the field is becoming more focused on optimizing existing ideas rather than generating new ones; there's less need for creative exploration and more for creative & useful optimizations
now, as a PhD student or ML practitioner, you might be thinking that the research game is over and that it's just a matter of further optimizing foundation models until we get to AGI. that might be the case (who knows?), but it doesn't settle the question of whether creativity and unique ideas are still necessary in ML these days. what if the creativity now lies in the optimizations themselves?
so my advice for today's PhD students and researchers:
- embrace LLMs and foundation models. i doubt they'll disappear anytime soon, and staying up-to-date on them is important. you don't need to know every detail (the field is vast), but having a good overview and expertise in one or two subtopics will put you in a great position. that said, nothing lasts forever, and i wouldn't be surprised if another meta takes over the AI space sooner or later.
- not all problems are simply "solved" with scale. looking at the latest o1 report, i see that models that think longer still hallucinate as badly as purely autoregressive LLMs, so further out-of-the-box breakthroughs are necessary in this area. in other words, don't choose a topic that clearly improves with scale.
- learn skills beyond core ML, including product building, sales, UX understanding, and taste. it's clear that products, not just research, are playing a more central role in AI. with these skills, you can launch successful personal AI side projects, like i did with @sourcelyai and @yomu_ai, which generate profit. in the best-case scenario, they become very big. there's a lot of opportunity as we enter an age where AI is useful and is making everyone more productive.
based on my experience hiring at Amazon, the 2024 job market is much better than in 2022. when i attended NeurIPS in 2022, i had to decline internship offers and couldn't interview great full-time candidates because of hiring freezes due to the looming recession. now there is a path forward with more available positions; candidates just have to recognize and adapt to the fact that the field is no longer the same.
champions adjust.
good luck and have fun!