PyTorch's design origins: its connection to Lua, its deep, intertwined connection to JAX, and its symbiotic connection to Chainer
The groundwork for PyTorch originally started in early 2016, online, among a band of Torch7's contributors.
Torch7 (~2010-2017)
These days, we also commonly refer to Torch7 as LuaTorch, as it was used via Lua. Torch7 was written by Ronan Collobert, @clmt and @koraykv in ~2010. I was deeply involved in Torch7 since 2012, with official "maintainer" status, joining these three original authors in April 2014.
Refactoring LuaTorch to be language agnostic (late 2015 to mid 2016)
LuaTorch's C backend with all the CPU and CUDA code for Linear Algebra and Neural Networks was deeply intertwined with Lua. So, a bunch of us led by @lantiga, @neurosp1ke, @szagoruyko5, me, @apaszke and @fvsmassa refactored these backends to be agnostic of Lua, and usable independently. We did this after discussing online that we should move LuaTorch to a new, modern design, but hadn't quite framed what that design should be.
Writing a new Python based Torch (mid 2016)
@apaszke reached out to me in early 2016 looking for internships. At that time, the entire LuaTorch team at @AIatMeta was ~3 people (@GregoryChanan, @TrevorKilleen and me). I asked Adam to come do an internship to build the next version of LuaTorch, with a modern design. @colesbury was in-between projects, so he joined in full-time as well.
We started from a fork of the LuaTorch and LuaTorch-nn codebases, specifically for two things:
1. the TH/THC and THNN/THCUNN C backends
2. Building compatibility with LuaTorch's checkpoints, so that LuaTorch users could smoothly continue into PyTorch. We did this by transpiling LuaTorch's `nn` code to Python. We called this package in PyTorch `torch.legacy.nn`.
Then, coming to the design itself, we debated a lot of designs. The strong inspirations were:
1. torch-autograd (written by @awiltschko and @clmt)
2. Chainer (written by the team at @PreferredNet)
@ebetica, who loved Chainer, would obsessively tell us it's the best thing ever, so he came on board to build this together with us. Quite a few others, such as Natalia Gimelshein and @adamlerer, got involved part-time in various ways.
We wrote the code for the new design of PyTorch from scratch.
The connection to JAX: inspiration of HIPS/autograd
@awiltschko's torch-autograd (which was a big inspiration for PyTorch's design) was directly inspired by @SingularMattrix, @DougalMaclaurin, @DavidDuvenaud and @ryan_p_adams's HIPS/autograd library, so in that indirect sense, we had strong inspiration from Ryan's library. In fact, we were so oblivious to certain origins that we named our Autodiff engine `torch.autograd` because we thought it was the norm within the autodiff community to call things "autograd". We later had to apologize to @SingularMattrix and team about the name of our subpackage conflicting with their `autograd` package.
Later, @SingularMattrix, @DougalMaclaurin and others went on to create JAX, continuing down their design exploration of HIPS/autograd.
The inspiration from Chainer -> PyTorch and the inspiration for PyTorch -> Chainer v2
Chainer was a strong inspiration; we really liked the concept of Chains and stuff. The Chainer devs were friends of ours, and we interacted with them a lot as well. I visited them in Japan in 2017.
Chainer's design is, in my opinion, a revolutionary design -- very original for that time and pretty awesome. We are proud to have been inspired by it.
However, contrary to a common misunderstanding and misattribution, we didn't simply replicate Chainer's design as-is. People have posted online that PyTorch's design looks exactly like Chainer's and hence its origins are just copy-paste -- and that's because they don't understand the co-evolution. After PyTorch's release, Chainer evolved to include some of PyTorch's good ideas, and eventually the two converged to look the same. For example, Chainer's nn Chains required you to pass in all the modules to the constructor (or use an add_link). The concept of self-assignment (i.e. `self.conv = nn.Conv2d(...)`) and the concept of `Parameter` were things we introduced as an evolved upgrade from Chainer v1. We also innovated in the way the autodiff engine was implemented -- things like "variable versioning" to detect correctness issues with in-place operations, and a few other new ideas that eventually went back into Chainer in their v2.
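To make those two ideas concrete, here is a toy, pure-Python sketch. It is not PyTorch's or Chainer's actual code: the `Module`, `Parameter`, `Tensor` and `SavedTensor` classes below are hypothetical stand-ins written only for illustration. The first half shows how plain attribute assignment can register parameters and submodules, and the second half shows how a per-tensor version counter can catch an in-place change to a value that was saved for the backward pass.

```python
# Toy sketch only: hypothetical stand-ins, not PyTorch's implementation.

class Parameter:
    """A value that is marked as trainable simply by its type."""
    def __init__(self, data):
        self.data = data


class Module:
    """Registers Parameters and sub-Modules when they are assigned as attributes."""
    def __init__(self):
        # Bypass our own __setattr__ while creating the registries.
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        if isinstance(value, Parameter):
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def named_parameters(self, prefix=""):
        # Own parameters first, then recurse into registered submodules.
        for name, p in self._parameters.items():
            yield prefix + name, p
        for name, m in self._modules.items():
            yield from m.named_parameters(prefix + name + ".")


class Linear(Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        self.weight = Parameter([[0.0] * n_in for _ in range(n_out)])
        self.bias = Parameter([0.0] * n_out)


class Net(Module):
    def __init__(self):
        super().__init__()
        self.fc1 = Linear(4, 3)  # self-assignment registers the submodule
        self.fc2 = Linear(3, 2)
        self.scale = 2.0         # plain attributes are not registered


class Tensor:
    """Carries a version counter that every in-place operation bumps."""
    def __init__(self, values):
        self.values = list(values)
        self.version = 0

    def mul_(self, k):
        self.values = [v * k for v in self.values]
        self.version += 1
        return self


class SavedTensor:
    """Remembers the version a tensor had when it was saved for backward."""
    def __init__(self, tensor):
        self.tensor = tensor
        self.saved_version = tensor.version

    def unpack(self):
        if self.tensor.version != self.saved_version:
            raise RuntimeError("tensor saved for backward was modified in-place")
        return self.tensor


param_names = [name for name, _ in Net().named_parameters()]
print(param_names)

x = Tensor([1.0, 2.0])
saved = SavedTensor(x)  # e.g. saved by an op whose gradient needs x
x.mul_(3.0)             # in-place change after saving
try:
    saved.unpack()
    stale_detected = False
except RuntimeError:
    stale_detected = True
print(stale_detected)
```

In this sketch the model author never lists the submodules anywhere; assigning them to `self` is enough for `named_parameters()` to find them, and the stale saved value is rejected at the moment the backward pass would have used it.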
When Chainer's community wanted to stop development, @PreferredNet amicably and proactively joined the PyTorch community (link in references).
Post-launch evolution (2017 to present)
This post doesn't have the space to cover PyTorch's:
* evolution to add in ideas from Caffe2 (@jiayq, @dzhulgakov et al.)
* its 5 compiler designs before we landed on what seems great (Zach DeVito, @ezyang, @apaszke, @jamesr66a, Jason Ansel, Christian Sarofeen et al.)
* our inspirations from JAX and designing functorch (Richard Zou, @cHHillee, @vfdev_5, Animesh Jain)
* our entire distributed design and evolution
* the origins of the sparse package (@braizh) and its evolution (@cpuhrsch et al.)
* PyTorch's domain libraries
* data loading (@colesbury, @TongzhouWang)
* community design, community growth, innovation in design of incentives (@ptrblck_de, Alban Desmaison, me)
* several innovations in GPU code (several key folks from NVIDIA and Meta)
There are many other parts of PyTorch that I didn't include -- it's become somewhat of a monolith at this point.
Attributing ideas is healthy, awesome and should be done more often
Since PyTorch launched, several new libraries have used its designs and ideas -- the particular new ideas that we introduced eventually propagated to many other libraries -- and this is awesome.
We are proud to have been inspired by work before us, and we are proud to have inspired work after us.
We also take pride in always attributing our inspirations clearly -- torch-autograd, Chainer and many other projects that have inspired us in lesser ways.
I think people don't attribute their origins clearly often enough -- either ego or corporate controls come into play to erase history -- and people should do more here. In that sense, I'm really proud of my JAX friends who see framework design as a scientific endeavor, openly discussing ideas and evolutions, and proudly attributing their origins and inspiration.
References:
1. My reply in March'17 on the origins of PyTorch: discuss.pytorch.org/t/pytorc…
2. Chainer's v1 design: github.com/chainer/chainer/b…
3. pytorch.org/blog/pytorch-add…
4. PyTorch's autodiff innovations in a short paper: openreview.net/pdf?id=BJJsrm…
5. The PyTorch paper: proceedings.neurips.cc/paper…