There is an old and popular idea that we learn only what we are rewarded for. Some psychologists have claimed that human learning is based entirely on reinforcement by reward: that even when we train ourselves with no external inducements, we are still learning from reward — only now in the form of signals from inside ourselves. But we cannot trust an argument that assumes what it purports to prove, and in any case, when we try to use this idea to explain how people learn to solve hard problems, we encounter a deadly circularity. You first must be able to do something before you can be rewarded for doing it!
This circularity was no great problem when Ivan Pavlov studied conditioned reflexes nearly a century ago, because in his experiments the animals never needed to produce new kinds of behavior; they only had to link new stimuli to old behaviors. Decades later, Pavlov's research was extended by the Harvard psychologist B. F. Skinner,
who recognized that higher animals did indeed sometimes exhibit new forms of behavior, which he called operants. Skinner's experiments confirmed that when a certain operant is followed by a reward, it is likely to reappear more frequently on later occasions. He also discovered that this kind of learning has much larger effects if the animal cannot predict when it will be rewarded. Under names like operant conditioning and behavior modification, Skinner's discoveries had a wide influence in psychology and education, but they never led to an explanation of how brains produce new operants. Furthermore, few of these animal experiments shed much light on how humans learn to form and carry out their complex plans; the trouble is that other animals can scarcely learn such things at all. Those twin ideas — reward for success and punish for failure — do not explain enough about how people learn to produce the new ideas that enable them to solve difficult problems that would otherwise require many lifetimes of ineffectual trial and error.
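Skinner's basic finding, that behavior followed by reward tends to recur, resembles the incremental update rule used in simple reinforcement-learning models. The following is a minimal, hypothetical sketch in Python of that idea only: an operant has a "strength," a rewarded emission nudges that strength upward, and withheld reward lets it decay (extinction). It is a toy illustration, not Skinner's procedure, and it does not capture his further finding about unpredictable reward schedules.

```python
import random

def run_trials(strength, n_trials, reward_prob, gain=0.1, decay=0.05):
    """Toy 'law of effect': the probability of emitting the operant
    rises after rewarded emissions and falls after unrewarded ones."""
    for _ in range(n_trials):
        if random.random() < strength:               # the operant is emitted
            if random.random() < reward_prob:        # ...and happens to be rewarded
                strength += gain * (1.0 - strength)  # strengthen toward 1
            else:
                strength -= decay * strength         # weaken toward 0 (extinction)
    return strength

random.seed(1)
s = run_trials(0.2, 300, reward_prob=1.0)   # acquisition: every emission rewarded
print(f"after acquisition: {s:.2f}")
s = run_trials(s, 300, reward_prob=0.0)     # extinction: reward withheld
print(f"after extinction:  {s:.2f}")
```

Even this caricature makes Minsky's complaint visible: the update can only strengthen a behavior the learner already produces; it says nothing about where a genuinely new operant comes from.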
The answer must lie in learning better ways to learn. In order to discuss these things, we'll have to start by using many ordinary words like goal, reward, learning, thinking, recognizing, liking, wanting,
imagining, and remembering — all based on old and vague ideas. We'll find that most such words must be replaced by new distinctions and ideas. Still, there's something common to them all: in order to solve any hard problem, we must use various kinds of memories. At each moment, we must keep track of what we've just done — or else we might repeat the same steps over and over again. Also, we must somehow maintain our goals — or we'll end up doing pointless things. Finally, once our problem is solved, we need access to records of how it was done, for use when similar problems arise in the future.
Much of this book will be concerned with memory — that is, with records of the mental past. Why, how, and when should such records be made? When the human brain solves a hard problem, many millions of agents and processes are involved. Which agents could be wise enough to guess what changes should then be made? The high-level agents can't know such things; they scarcely know which lower-level processes exist. Nor can lower-level agents know which of their actions helped us to reach our high-level goals; they scarcely know that higher-level goals exist. The agencies that move our legs aren't concerned with whether we are walking toward home or toward work — nor do the agents involved with such destinations know anything of controlling individual muscle units. Where in the mind are judgments made about which agents merit praise or blame?