Types of Neural Networks - Evolution Of Deep Learning Architectures.
Oppenheimer, the movie, has all of us thinking about the 40s and WW2. Believe it or not, the first model of a neural network (NN) dates from the same era: 1943!
Warren McCulloch and Walter Pitts, the founding fathers of NNs, were intrigued by how biological neurons worked and proposed a mathematical model for a NN.
It was not until 1958 that Frank Rosenblatt invented the "Perceptron", which was basically a computer program designed to learn from its mistakes. It ran on a very big machine and essentially did binary classification. While there was a lot of excitement around these baby NNs, they required a lot of compute and data, which meant that they needed some serious funding.
In 1969, a book titled "Perceptrons" by Minsky and Papert killed almost all innovation in NNs. It showed that a single-layer perceptron couldn't solve even simple problems that are not linearly separable, the XOR problem being the classic example. NNs came to be seen as severely limited and funding largely dried up. In the decades that followed, other methods took the spotlight, most notably Support Vector Machines (SVMs) in the 1990s, and NNs took a back seat.
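To make the XOR limitation concrete, here is a minimal sketch of a single perceptron trained with the classic perceptron learning rule. The function names, learning rate, and epoch count are illustrative assumptions, not Rosenblatt's original setup.

```python
def predict(weights, bias, x):
    # Fire (output 1) only if the weighted sum crosses the threshold.
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(data, epochs=20, lr=1):
    # Perceptron learning rule: nudge the weights whenever a prediction is wrong.
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def accuracy(data, weights, bias):
    return sum(predict(weights, bias, x) == t for x, t in data) / len(data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

AND is linearly separable, so the perceptron gets all four cases right. XOR is not, so no single straight line (and therefore no single perceptron) can classify all four points correctly, however long it trains.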
Multi-layer perceptrons (MLPs) were viewed as a way to address the issues that single-layer perceptrons had, but training these MLPs proved to be very difficult. Not until 1986 did we see the resurgence of NNs. Rumelhart, Hinton, and Williams popularized the backpropagation algorithm, which uses the chain rule to work out how much each weight contributed to the error, and suddenly training multi-layer NNs became tractable. Computers were becoming more powerful and more data became available. NNs were back in business.
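Here is a rough sketch of what backpropagation does on a tiny network with 2 inputs, 2 sigmoid hidden units, and 1 sigmoid output. The weights, the squared-error loss, and every function name are assumptions made for illustration. The sketch compares the backpropagated gradient against a finite-difference estimate and then takes one gradient-descent step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backprop(params, x, t):
    # Chain rule, applied from the output layer back to the hidden layer.
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - t) * y * (1 - y)                      # dL/dz at the output
    grad_W2 = [delta_out * hi for hi in h]
    delta_hidden = [delta_out * w * hi * (1 - hi) for w, hi in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_W1, delta_hidden, grad_W2, delta_out

def numerical_grad_w1(params, x, t, i, j, eps=1e-6):
    # Central finite difference, used only to sanity-check backprop.
    W1, b1, W2, b2 = params
    def loss_with(value):
        W1_mod = [row[:] for row in W1]
        W1_mod[i][j] = value
        return loss((W1_mod, b1, W2, b2), x, t)
    return (loss_with(W1[i][j] + eps) - loss_with(W1[i][j] - eps)) / (2 * eps)

def sgd_step(params, grads, lr=0.5):
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    new_W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    new_b1 = [b - lr * g for b, g in zip(b1, gb1)]
    new_W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return new_W1, new_b1, new_W2, b2 - lr * gb2

# Arbitrary example weights and a single training example (assumed values).
params = ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1], [0.7, -0.4], 0.05)
x, t = [1.0, 0.5], 1.0

grads = backprop(params, x, t)
analytic = grads[0][0][1]
numeric = numerical_grad_w1(params, x, t, 0, 1)
loss_before = loss(params, x, t)
loss_after = loss(sgd_step(params, grads), x, t)
print(analytic, numeric, loss_before, loss_after)
```

The two gradient values should agree to many decimal places, and the loss should drop after the update. Repeat that update over many examples and you have, in essence, how an MLP learns.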
In the late 80s, Yann LeCun introduced CNNs. The convolutional layers of a CNN can model the spatial hierarchy of images, and NNs started to become useful in image-processing applications. Still, SVMs were the cool kids and NNs were being used for niche tasks like handwriting recognition.
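For a feel of what a convolutional layer computes, here is a bare-bones sketch of a single 2D filter sliding over a tiny image. The image, the kernel, and the function name `conv2d` are made up for illustration. A real CNN learns its kernels and stacks many such layers.

```python
def conv2d(image, kernel):
    # "Valid" sliding-window sum of products (cross-correlation, as in most DL libraries).
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 3x4 image that is dark (0) on the left and bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 1x2 kernel that responds where brightness increases from left to right.
kernel = [[-1, 1]]

edges = conv2d(image, kernel)
print(edges)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The same small kernel is reused at every position, which is why the filter finds the vertical edge wherever it appears in the image.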
Only in the 2000s did we see a true renaissance of NNs. Geoff Hinton introduced Deep Belief Networks in 2006 and the term deep learning (DL) began to take off.
In 2012, deep learning had a seminal breakthrough with a CNN called AlexNet, which won the ImageNet image-classification challenge by a wide margin over every other approach. Since then we have seen an explosion in NN architectures.
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks proved useful for modeling patterns in sequential data. In 2015, ResNets used skip connections to ease the vanishing gradient problem (another pesky issue with DL training), which made much deeper networks trainable. DL research was exploding.
Just before that, in 2014, generative NNs had a big moment - Generative Adversarial Networks (GANs) were invented by Ian Goodfellow et al. GANs were really good at generating realistic images, and they paved the way for deepfakes :)
Finally, in 2017, Vaswani et al. introduced Transformers. Their self-attention mechanism lets the model weigh the importance of each word in relation to every other word in the input, and so better understand language.
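A stripped-down sketch of that self-attention idea follows. As a simplifying assumption, the token vectors themselves serve as queries, keys, and values. A real Transformer first passes them through learned projection matrices and uses multiple attention heads. The toy vectors and function names are illustrative only.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    # X is a list of token vectors; here queries = keys = values = X.
    d = len(X[0])
    weights = []
    for q in X:
        # Scaled dot-product score of this token against every token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights.append(softmax(scores))
    # Each output is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, X)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs

# Three toy "word" vectors of dimension 2 (made-up numbers).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = self_attention(X)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much one word "looks at" every other word when building its new representation.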
BERT, released in 2018, is a Transformer-based model that reads text in both directions at once, so every word can use the context on its left and on its right. BERT is pre-trained on massive amounts of text (e.g. Wikipedia) and can then be fine-tuned for specific tasks like Q/A and text classification.
Just a few months earlier, also in 2018, OpenAI introduced the first GPT model. It was unidirectional but was also trained on massive amounts of data. Unlike BERT, GPTs are pre-trained on next-word prediction, which makes them a natural fit for text generation. Since 2018, we have seen better and more sophisticated versions of the GPT series...with GPT-4, released in 2023, reported to perform at a human level on a range of professional and academic benchmarks, alongside strong generation and basic reasoning!!
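One simplified way to picture the bidirectional vs. unidirectional difference is through the attention mask: which positions each token is allowed to look at. The helper below is a hypothetical illustration, not code from either model.

```python
def attention_mask(n_tokens, causal):
    # mask[i][j] == 1 means token i may attend to token j.
    return [
        [1 if (not causal or j <= i) else 0 for j in range(n_tokens)]
        for i in range(n_tokens)
    ]

bidirectional = attention_mask(3, causal=False)  # BERT-style: see both directions
unidirectional = attention_mask(3, causal=True)  # GPT-style: see only the past

print(bidirectional)   # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(unidirectional)  # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
```

With the causal mask, a token never sees the words that come after it, which is exactly what you want when the training task is to predict the next word.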
So what started almost 80 years ago is now finally beginning to take over and transform the world completely!! 🤯