this is a weird long post without much substance
I strongly recommend against reading it
...
so, do you feel like whatever you're working on right now is pointless, or will have zero value soon, due to the crazy times we're living in? then, perhaps you should stop, and start working on the only unsolved problem that actually matters TODAY:
✨ replicating GPT-3 on a laptop ✨
"why is that so important?"
because it would make AI incredibly cheap, which would mean everyone would have Fable-class models on their laptops, without depending on Anthropic, OpenAI, or any other hyper-scaler giant. and that's amazing, don't you think?
"isn't that literally impossible?"
that's the cool part: as far as computer science is concerned, no. not really. not at all. it is entirely plausible and, as far as we know, most likely not even hard.
it takes one good idea. one breakthrough. one great "aha moment", to go from zero to "hey, this software I wrote is producing credible English sentences"
and whenever that happens:
- the entire AI industry collapses
- clusters are liquidated
- we all get Fable at home
- you become famous and rich, if that's your thing
sounds fun, doesn't it?
"wtf you talking, OF COURSE that is hard"
so prove it.
show me a paper, a lean file, anything that proves that training a Fable-class model fundamentally requires billions of dollars. you can't, because, guess what - it is not true! the only "evidence" we have is purely psychological. "many attempted over decades, and the best thing we have is GPTs, so, it is a hard problem" - but that's not a scientific argument. that's a human, psychological, sociological argument. and if that's it, consider the following counter-argument:
✨ humans are stupid as hell ✨
I mean, 10 years ago we didn't have transformers, so, that very argument could be used against GPTs existing. yet, they exist. we have them now, because someone found them. and, guess what, it isn't even complex. I mean, karpathy implemented the whole thing on a napkin. and it probably compiles.
we were just too dumb to figure GPTs out... for decades.
just like GPTs, there ARE other approaches, other algorithms, other architectures, equally simple or even simpler, that do work. this is a mathematical certainty. and one of them might be astronomically faster than what we're doing right now.
and you might be the one to find it!
"me? why me???"
because you're intelligent, creative and handsome.
I see a lot of potential in you.
in fact, I always believed in you.
and I think you're wasting your time, doing that silly agent orchestrator. nobody wants that. quit it. take your most interesting ideas, intuition, creativity, and work on a problem that matters. take your best shot at reproducing GPT-3 on your own laptop.
do NOT fork llama.cpp.
do NOT train another LLM.
do something... ✨different✨
it must be unique, novel, full of YOUR soul. something nobody thought of, or bothered doing.
go ahead and implement that thing in C/CUDA (or Bend!).
no Python!
zero excuses for Python.
any model is fluent in GPGPU now. build a real kernel.
and then, train your thing. download wikipedia, give it time and compute to absorb the patterns of English speech. you can rent GPUs anywhere nowadays. let it train. then, ask it some questions. chances are it will just respond back. just like GPT-2 answered OpenAI. computers are incredible. don't underestimate them!
"many tried. nobody succeeded. why would I?"
see - that's your mistake again. turns out not many actually tried, at all. I promise you. who do you think is seriously working on that?
people at Mozilla?
they're busy building a browser
Linus Torvalds?
he is busy building an OS
employees at OpenAI, Anthropic, xAI?
they're paid to work on what is proven to work: GPTs.
what about all the AI enthusiasts all around the world?
yeah, you know they're mostly fine tuning Qwen
and how about your friends?
if only they weren't busy building a SaaS on the eve of AGI...
how about people from the past?
bro - people from the past seriously expected Lisp would be AGI. just dismiss them. they didn't have the compute, the resources, the knowledge, the MODELS that we have today. that YOU have access to.
so, what's left? not much.
the world looks big. it is not.
truth is: ✨almost nobody is working on this ✨
"I still think it is impossible. I don't trust you"
well, don't take my word for it.
Ilya himself, in his 2019 talk on GPT-2, said:
> "the story of deep learning is this: empirically old simple methods which were usually invented in the 80s and the 90s when scaled up on very large clusters work really well."
and then:
> "(we took) normal simple reinforcement learning method, scaled it up, and discovered that it suddenly becomes very capable of solving extremely hard problems."
and again:
> "you take a simple tool which is unimposing and barely works, and then you run it on a big cluster and suddenly it works, it becomes a capable tool for solving problems"
do you see the point here?
Ilya isn't arguing that transformers are magic.
Ilya is arguing that SCALING is magic
step #1: take a simple, elegant algorithm.
step #2: shove compute at its face.
step #3: ...?
step #4: your computer is talking to you
THAT is the key insight that led to GPT-3
THAT is what Ilya saw
THAT is what caused the OpenAI x Anthropic war
THAT is the founding principle of the ongoing era
not "scaling transformers works"
but "scaling beautiful algorithms works"
that's the incredible lesson.
yet, we all took it and... threw it away.
- zuck bought 100k GPUs. to train GPTs
- musk bought 100k GPUs. to train GPTs
- bezos bought 100k GPUs. to train GPTs
...
that's what everyone is doing.
so, no. not many are trying to replicate GPT-3 through other means.
we're just ants, after all...
whenever we find a pile of sugar, we leave a trail of pheromones, which guides the rest of the colony towards the new food source. the colony then swarms around the pile and extracts all of it, until no grain is left.
but piles of sugar aren't spontaneously generated in the middle of nowhere. they imply something more profound: "humans are around". and, if humans are in sight, even better things must be around too. like a big sweet cake.
a colony that only follows the pheromone trail would miss the cake for the grains. that's why every ant species has scouts and exploratory foragers. and, just like a pile of sugar implies something more profound, LLMs also imply something quite profound:
*computers are capable of thinking*
a pile of sugar is never alone.
GPTs are most likely not the only system capable of thinking.
so, if you find yourself a bit lost, without purpose, like your work is pointless and Fable 3 will soon one-shot it anyway... consider becoming a scout. find a new approach to AI. bring something new to humanity. breaking free of the massive cost associated with training GPTs is the next big step in AI, and it will only happen if people like you work to make it happen.