A Story 90 Years in the Making

If you think artificial intelligence started in 2022 when ChatGPT dropped and the internet collectively lost its mind — you are in excellent company, and you are also very wrong.

The story of neural networks begins in 1943, and it begins like a movie. In fact, it is a movie, or at least a movie mirrors it almost perfectly. If you have seen Good Will Hunting, you already know the rough shape of the story: a mathematically brilliant young man from a difficult background, doing odd jobs around a university, meets an established professor — and together they change the world.

That young man was Walter Pitts. A teenager from a rough neighborhood, Pitts was the kind of person who reads Bertrand Russell and Alfred North Whitehead's Principia Mathematica for fun — a dense, thousand-page tome of mathematical logic — and then writes the authors a letter pointing out errors in the proofs. He was right. Pitts ran away from home at 15, bounced between odd jobs, and ended up doing janitorial work around a university. That university happened to be where Warren McCulloch was a professor of psychiatry and neurophysiology — someone fascinated by the mathematical structure of the brain.

McCulloch took Pitts in, gave him a place to live, and over the dinner table, the two started connecting their worlds: Pitts' abstract mathematics and McCulloch's biological understanding of neurons. The result was the 1943 paper that proposed the first mathematical model of a neuron. Artificial intelligence, born from homelessness, hospitality, and dinner conversation.

Why This Matters in Practice

Neural networks have been through cycles of enormous hype and devastating winters — periods when funding dried up and whole research communities had to find other work. We are in a period of extraordinary momentum right now, but practitioners who understand the pattern of AI winters are better equipped to make level-headed decisions about what this technology can and cannot do, and where it is likely headed.

The Long Arc

Here is the timeline that should permanently recalibrate your intuition about how new any of this actually is:

  1. 1943

    The First Mathematical Neuron

    McCulloch and Pitts propose the first mathematical model of a neuron — born from dinner-table conversation between a runaway teenager and a psychiatry professor.

  2. 1958

    The Perceptron

    Frank Rosenblatt invents the Perceptron — the first machine that learns from examples, adjusting its own weights in response to training data. He called it "the first machine having human qualities."

  3. 1970s

    The First AI Winter

    Minsky and Papert's 1969 book Perceptrons documents the Perceptron's fundamental limits, notably that a single-layer network cannot learn functions like XOR. Funding collapses. Research stalls. An entire generation of researchers pivots to other fields.

  4. 1980s

    Revival and Backpropagation

    Hopfield networks (1982) revive interest. Backpropagation, discovered independently by several researchers and popularized in 1986 by Rumelhart, Hinton, and Williams, makes multi-layer neural networks actually trainable at scale for the first time.

  5. 1990s–2000s

    The Second AI Winter

    Neural networks fall out of fashion. Support vector machines dominate. Deep learning research continues quietly in a handful of labs.

  6. 2009

    ImageNet

    Fei-Fei Li at Stanford publishes ImageNet, a massive labeled image dataset, and soon turns it into an annual competition — because nobody cared about the data on its own. (Engineers love a competition.)

  7. 2012

    AlexNet

    AlexNet, a deep convolutional neural network from Geoffrey Hinton's lab, wins the ImageNet competition by a massive margin, cutting the top-5 error rate from roughly 26% to 15% and shocking the machine learning community. The modern deep learning era begins.

  8. 2017

    "Attention Is All You Need"

    The Transformer architecture is introduced, enabling the large language models that would define the next decade.

  9. 2022

    ChatGPT

    ChatGPT launches. The internet collectively loses its mind. You know the rest.

Each leap forward in neural networks was enabled by a combination of three things: better algorithms, more data, and more computational power. When any one of those three is missing, progress stalls. When all three converge — as they did dramatically around 2012 — things move fast.

Checkpoint: Multiple Choice

According to the three-factor framework discussed in this section, which combination most directly triggered the breakthrough of 2012?