Neural Network

In 1935, a boy in Detroit ducked into a public library to hide from bullies. He stayed for three days.

Walter Pitts was twelve years old, the son of a boilermaker who used his fists freely. The library became his refuge, and over the next few years he taught himself Greek, Latin, logic, and mathematics. He read Bertrand Russell and Alfred North Whitehead’s Principia Mathematica, one of the most formidable works in the history of mathematics, and mailed Russell a letter pointing out errors in the first volume. Russell was so impressed that he invited the boy to Cambridge.

Pitts never made it to England. Instead, at fifteen, he ran away from home to the University of Chicago, where he eventually moved in with a neuroscientist named Warren McCulloch. In 1943, the two published a paper with a dense title (“A Logical Calculus of the Ideas Immanent in Nervous Activity”) and a radical idea: a mathematical model of the brain, built from simple connected units that could, in theory, compute anything.

They had just invented the neural network. And every AI tool you use today, from ChatGPT to Midjourney, is a direct descendant of that idea.

What a Neural Network Actually Is

Your brain contains roughly 86 billion neurons. Each one receives electrical signals from other neurons, and if those signals are strong enough, it fires, sending its own signal down the line. Learning happens when the connections between neurons strengthen or weaken based on experience. See a dog enough times, and certain neural pathways light up the moment you spot one.

An artificial neural network borrows this concept (loosely, not literally) and rebuilds it in math. Instead of biological cells, you have tiny mathematical functions. Instead of electrical signals, you have numbers. Instead of synapses strengthening through experience, you have “weights,” numerical values that get adjusted during training until the network produces the right outputs.
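The basic unit is small enough to sketch in a few lines of code. What follows is a toy illustration in the spirit of the original 1943 McCulloch-Pitts unit, not code from any real AI tool: multiply each input by its weight, add them up, and fire only if the total clears a threshold.

```python
def neuron(inputs, weights, threshold):
    """A McCulloch-Pitts-style unit: weighted sum, then a hard threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Two input signals, each weighted 0.6; the unit needs both to fire.
print(neuron([1, 1], [0.6, 0.6], 1.0))  # fires: 1.2 clears the threshold
print(neuron([1, 0], [0.6, 0.6], 1.0))  # stays quiet: 0.6 does not
```

Training is nothing more than adjusting those weight values until the unit fires at the right times.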

Think of it like revising a manuscript. Your first draft is rough. You compare it to what you intended, identify where it went wrong, and adjust. Second draft, third draft, hundredth draft. Each pass gets you closer, not because you memorized a perfect manuscript, but because you internalized the patterns of good writing. A neural network does the same thing, except it runs through millions of drafts per hour and the “adjustments” are tiny numerical tweaks to thousands or millions of connection weights.
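The revision loop itself fits in a few lines. Here is a deliberately tiny sketch with invented numbers, not a real training recipe: a network with a single weight nudges that weight a little in whichever direction shrinks the error, then repeats, draft after draft.

```python
# Toy training loop: learn a weight w so that w * 2.0 comes out near 10.0.
w = 0.0                       # the network's single "connection weight"
x, target = 2.0, 10.0         # one training example: input and desired output
learning_rate = 0.05          # how big each adjustment is

for draft in range(100):      # one hundred "drafts"
    prediction = w * x
    error = prediction - target
    w -= learning_rate * error * x   # nudge w in the direction that shrinks error

print(round(w, 3))            # settles very close to 5.0
```

A real network does exactly this, just with millions of weights adjusted at once.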

The network is organized in layers. An input layer receives the raw data (your text, the pixels of an image, a sound clip). One or more “hidden” layers process and transform that data, identifying patterns and relationships. An output layer produces the result: a word prediction, an image classification, a translated sentence. When a network has many hidden layers, it’s called a “deep” neural network, which is where the term “deep learning” comes from.
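Stacked in layers, the units simply feed each other's outputs forward. A minimal sketch with made-up weights (2 inputs, one hidden layer of 3 units, 1 output), using a smooth "squashing" function in place of the hard threshold, as modern networks do:

```python
import math

def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1 / (1 + math.exp(-x))

def layer(inputs, weight_rows):
    # Each row of weights produces one unit's output.
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

# Made-up weights: 2 inputs -> 3 hidden units -> 1 output.
hidden_weights = [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.3]]
output_weights = [[0.8, -0.4, 0.6]]

hidden = layer([1.0, 0.5], hidden_weights)   # hidden layer transforms the input
result = layer(hidden, output_weights)       # output layer produces the answer
print(result)                                # a single number between 0 and 1
```

A "deep" network is this same pattern with many hidden layers chained together.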

The Idea That Wouldn’t Die

The history of neural networks reads like a novel with several acts and an unlikely cast of characters.

After McCulloch and Pitts laid the theoretical groundwork in 1943, a Cornell psychologist named Frank Rosenblatt built the first machine that could actually learn. In 1958, he demonstrated the Perceptron using an IBM 704, a five-ton computer that taught itself to distinguish between cards marked on the left and cards marked on the right after just fifty training examples. The New York Times ran the headline “NEW NAVY DEVICE LEARNS BY DOING.” The New Yorker called it “the first serious rival to the human brain ever devised.” Rosenblatt’s Mark I Perceptron now sits in the Smithsonian.

Then the backlash arrived. In 1969, Marvin Minsky and Seymour Papert (Minsky had been a schoolmate of Rosenblatt’s at the Bronx High School of Science) published Perceptrons, a mathematical critique proving that single-layer networks couldn’t solve certain basic problems. Funding dried up. Researchers abandoned the field. The period that followed is known as the “AI winter,” and it lasted more than a decade.

But a small group of researchers refused to give up. In 1986, Geoffrey Hinton and colleagues published a paper in Nature popularizing backpropagation, a training method that allowed multi-layer networks to learn effectively by comparing their output to the correct answer and adjusting weights backward through every layer. Neural networks came back to life.

The real validation came in 2012, when a graduate student named Alex Krizhevsky, working with Hinton at the University of Toronto, trained a deep neural network called AlexNet on two consumer-grade graphics cards. It demolished the competition at a major image recognition challenge, cutting the error rate nearly in half. That result convinced the research community that deep neural networks weren’t just viable; they were the future.

Five years later, a team at Google published “Attention Is All You Need,” introducing the transformer architecture. The transformer is a type of neural network designed to process language by understanding relationships between all the words in a passage simultaneously rather than plodding through them one at a time. It became the foundation for every major language model that followed. (The “T” in ChatGPT stands for Transformer.)

Hinton, along with fellow neural network pioneers Yann LeCun and Yoshua Bengio, received the 2018 Turing Award (computing’s equivalent of the Nobel Prize) for their decades of persistence. In 2024, Hinton won an actual Nobel Prize, in Physics, for his foundational work on machine learning with neural networks.

Why This Matters for Your Writing Life

Every AI tool in your workflow is a neural network. ChatGPT, Claude, Sudowrite, NovelCrafter, Midjourney, ElevenLabs, DeepL, Grammarly. Different types, different architectures, different purposes, but all built on the same core idea that McCulloch and Pitts sketched out in 1943: connected layers of simple mathematical functions that learn patterns from data.

Understanding that one concept clears up a lot about how your tools behave.

Why AI is creative but an unreliable fact-checker. Neural networks learn statistical patterns, not facts. When ChatGPT writes a convincing paragraph about the history of the printing press, it’s generating text based on patterns it absorbed during training, not looking anything up. This is why AI can produce beautiful prose that contains completely fabricated details (a phenomenon called hallucination). The network learned how language works, not necessarily what’s true.

Why “temperature” and “creativity” sliders do what they do. When you adjust a creativity setting in Sudowrite or NovelCrafter, you’re changing how much randomness the neural network uses when choosing the next word. Low settings tell the network to stick with the most statistically likely choices. High settings tell it to take risks. Same network, different behavior, all controlled by one variable.
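That one variable can be sketched directly. Below is a hedged illustration with toy word scores I invented, not numbers from any real model: dividing the scores by a temperature before converting them to probabilities flattens or sharpens the distribution the network samples from.

```python
import math
import random

def sample_next_word(scores, temperature):
    """Pick the next word: low temperature favors the top score, high takes risks."""
    scaled = [s / temperature for s in scores.values()]
    # Softmax: turn scores into probabilities that sum to 1.
    exps = [math.exp(s) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    return random.choices(list(scores), weights=probs)[0]

# Toy scores for the next word after "The sky was ..."
scores = {"blue": 3.0, "gray": 2.0, "screaming": 0.5}

print(sample_next_word(scores, temperature=0.2))  # almost always "blue"
print(sample_next_word(scores, temperature=2.0))  # riskier choices show up often
```

At temperature 0.2 the word "blue" wins well over 99 percent of the time; at 2.0, the long shots get a real chance. That is the whole trick behind the slider.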

Why these tools keep getting better. The progression from GPT-3 to GPT-4 to whatever comes next is fundamentally about building bigger, better-trained neural networks. More parameters, more training data, smarter architectures. The underlying concept hasn’t changed since 1943. The scale has changed enormously.

Walter Pitts died in 1969, without a degree, having burned his own papers in a fit of despair. Rosenblatt drowned on his forty-third birthday in 1971. Neither lived to see their ideas vindicated. But the concept they helped bring into the world, that a network of simple connected units could learn to do extraordinary things, turned out to be one of the most consequential ideas in the history of computing. Every time you open an AI writing tool and watch it produce something that surprises you, you’re seeing the latest chapter of a story that started with a boy hiding in a library.