Book Review — The Information: A History, A Theory, A Flood by James Gleick
A Brief Summary of Information Theory

I first saw this book, The Information: A History, A Theory, A Flood by James Gleick, recommended in Mark Manson’s Nine Books that Explains How the World Works. Because information theory has always intrigued me, I picked up the book and am glad I did. This article covers a few insights I took from it, thanks to the author’s excellent writing on information theory and his vivid accounts of its far-reaching implications for the world, culture, and our daily lives.
Overview
The book has roughly three parts, as listed in the title. It starts with the history of information, going back to a time long before the term itself existed. More accurately, it is the history of how our ancestors communicated.
The author then moves to the era before and during Claude Shannon’s development of information theory. This part is intertwined with the invention of computing machines, particularly the Turing machine and, a century before Turing, Babbage’s Analytical Engine. It is the most fascinating part of the book. I am in awe of those intellectual giants, whose work still shapes our lives generations later.
As a result, we are now at the peak of the information era, with computer farms growing in step with Moore’s law and data exploding everywhere. This is what the book’s Flood part is about.
I will expand on each part a bit more.
The History
Gleick starts with the story of how native Africans used drums to communicate. Imagine that on the vast and sparsely populated continent, people working in a field raise their heads on hearing a short drum rhapsody from far away. Drums delivered messages much faster than a messenger could in person. A drum, however, only has pitch and rhythm. How can it convey what human language conveys? It is the same problem that Samuel Morse encountered centuries later, before he invented Morse code.
Gleick lays out the challenges in two aspects:
- Code efficiency: The code must approximate the complexity of spoken language while remaining simple enough for drummers to learn and apply quickly.
- Message transmission and relays: Communication at that time needed to cover hundreds of miles, so in an emergency each drummer had to relay the message to the next until it reached its final destination. The transmission protocol was vital for survival. The further a message travels, the more errors or confusion can creep in.
These two challenges remained throughout human history and were only fully resolved when information theory was born. Gleick gave the hint early in the book.
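To make the relay problem concrete, here is a minimal sketch of how errors accumulate over distance. It is not from the book: it assumes a message reduced to a single binary symbol and assumes each relay independently flips that symbol with the same probability (the names `survival_probability`, `flip_prob`, and `hops` are illustrative).

```python
def survival_probability(flip_prob: float, hops: int) -> float:
    """Probability that one binary symbol arrives unchanged after `hops` relays.

    Assumption: every relay independently flips the symbol with probability
    `flip_prob`. An even number of flips restores the original symbol, which
    gives the closed form (1 + (1 - 2p)^n) / 2.
    """
    return 0.5 * (1.0 + (1.0 - 2.0 * flip_prob) ** hops)


if __name__ == "__main__":
    # With a 5% error per relay, reliability decays toward a coin toss (0.5).
    for hops in (1, 5, 20, 100):
        print(hops, round(survival_probability(0.05, hops), 3))
```

Under these assumptions, a long enough chain of relays leaves the receiver no better off than guessing, which is why the protocol, and not only the code, mattered.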
The author also delves into the history of human language through the lens of English dictionaries. It is fascinating to learn how the first English dictionary came into existence with only 2,500 words, what it looked like, and how later dictionaries grew from there. This history brought home to me how language enables humans to think. Without language, humans would have continued to live as chimpanzees. Humans are proud of their high cognitive functions, such as reasoning, math, categorization, and metaphor, but these capabilities only became possible after language was invented. Language is the foundation for higher cognitive functions in the brain. Below is what Gleick says in the book:
Literate people take for granted their own awareness of words, along with the array of word-related machinery: classification, reference, definition. Before literacy, there is nothing obvious about such techniques. “Try to explain to me what a tree is,” Luria says, and a peasant replies, “Why should I? Everyone knows what a tree is; they don’t need me telling them.”
Because of words, humans can comprehend and think in the abstract. As Gleick puts it: “It is a twisting journey from things to words, from words to categories, from categories to metaphor and logic,” and “Mathematics, too, followed from the invention of writing.” The maturity of language and writing marked the start of Western civilization, when great philosophers and thinkers could devote their lifetimes to reasoning, discovering, and developing the knowledge systems that we still marvel at today.
In history, we always find a few who lived or thought ahead of their time. Charles Babbage was one of them. As a mathematician, he conceived of a machine that would do mathematics for humans, eliminating errors and intensive manual labor, at a time when electricity was not yet available. In the 19th century, he designed a steam-powered machine, the Analytical Engine, with an unprecedented ambition for automation. Babbage made a remarkable futurist statement when he presented his first design to the Royal Society in 1822: “A kinetic machine…. for the demand of computation, would grow as the uses of commerce, industry, and science came together.”
Under Babbage’s guidance, Augusta Ada King, Countess of Lovelace, the daughter of the famous poet Lord Byron, wrote the first computer program in history, concluding that Babbage’s Analytical Engine could essentially carry out any mathematical manipulation. As Gleick put it, “She devised a process, a set of rules, a sequence of operations,” including variables and loops, and “in another century, this would be called an algorithm, later a computer program….”. Unfortunately, Babbage failed to secure enough funding from the Royal Society for his work, and his Analytical Engine was never completed. His and Ada’s work remained a lonely star, admired and drawn upon only generations later.
The Theory
One century later, in 1936, Alan Turing designed a similar computing machine. This time, he ignored the engineering part and constructed it only as a thought experiment.
With this model (typewriter) in mind, Turing imagined another kind of machine, of the utmost purity and simplicity. Being imaginary, it was unencumbered by the real-world details one would need for a blueprint, an engineering specification, or a patent application… Turing did not plan ever to build this machine… He listed the very few items his machine must possess: tape, symbols, and states.
Twelve years later (1948), Claude Shannon published his famous paper on information theory, A Mathematical Theory of Communication, to quantify the complex process of transmitting information from one point to another. Even Gleick admitted that “the mathematics was difficult for many engineers, and mathematicians meanwhile lacked the engineering context.”
In the book, Gleick distilled the theory into two core aspects. The first is the general communication framework of five components (see the picture below):
- Message Source: the person or machine generating the message.
- Transmitter: that encodes the message to produce a signal.
- Channel: the medium to transmit the signal.
- Receiver: that inverts the operation of the transmitter and decodes the message.
- Destination: the person or machine that uses the message.

It is the framework for any communication, regardless of the content of a message. Gleick uses human communication as an example: “In the case of ordinary speech, these elements are the speaker’s brain, the speaker’s vocal cords, the air, the listener’s ear, and the listener’s brain.”
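The five components can be sketched as a small pipeline. This is only an illustration under stated assumptions, not Shannon's or Gleick's formulation: the function names are hypothetical, the transmitter is assumed to encode text as UTF-8 bits, and the channel is assumed to flip each bit independently with probability `flip_prob`.

```python
import random


def transmitter(message: str) -> list[int]:
    """Encode the source message into a signal: UTF-8 bytes as a flat list of bits."""
    return [(byte >> shift) & 1
            for byte in message.encode("utf-8")
            for shift in range(7, -1, -1)]


def channel(signal: list[int], flip_prob: float = 0.0, seed: int = 0) -> list[int]:
    """Carry the signal; each bit is flipped independently with probability flip_prob."""
    rng = random.Random(seed)
    return [bit ^ int(rng.random() < flip_prob) for bit in signal]


def receiver(signal: list[int]) -> str:
    """Invert the transmitter: regroup bits into bytes and decode them as text."""
    data = bytearray()
    for start in range(0, len(signal), 8):
        byte = 0
        for bit in signal[start:start + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    source = "hello"                                  # message source
    print(receiver(channel(transmitter(source))))      # noiseless: destination sees the message
    print(receiver(channel(transmitter(source), flip_prob=0.05)))  # noisy channel
```

Note that none of the three functions inspects what the message means, which mirrors the point that the framework is independent of content.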
The second core aspect is the elegant equation listed below. It measures information as a function of probabilities by summing them with a logarithmic weighting (the logarithm has a base of 2).

$$H = -\sum_{i} p_i \log_2 p_i$$

Here p_i is the probability of the i-th possible outcome.
A simple example is a coin flip. There are two possible outcomes, heads or tails, each with a probability of 1/2. Plugging them into the equation gives H = -(1/2 × log(1/2) + 1/2 × log(1/2)) = 1, where the unit is the bit.
Using the same coin-flipping example, suppose a trick coin lands on heads 100% of the time; H becomes 0 because log(1) = 0. If the probability is 25% for heads and 75% for tails, H becomes about 0.81 bit, smaller than 1 bit.
These examples show that information is a measure of unexpectedness and surprise. When each outcome has a 50% chance, the result is least predictable, and this complete randomness carries the most information. In contrast, 100% certainty of landing on heads means zero information. With a 75% chance of tails, the odds favor one side, and the amount of information decreases accordingly.
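The three coin examples can be checked with a few lines of Python. This is a minimal sketch that takes the entropy to be H = -Σ p·log2(p) over the outcome probabilities; the function name `entropy_bits` is my own.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits: the sum of -p * log2(p) over all outcomes.

    Outcomes with zero probability are skipped, since they contribute nothing.
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(entropy_bits([0.5, 0.5]))    # fair coin
    print(entropy_bits([1.0, 0.0]))    # trick coin that always lands on heads
    print(entropy_bits([0.25, 0.75]))  # biased coin
```

The fair coin gives 1 bit, the trick coin gives 0, and the 25/75 coin gives roughly 0.81 bit.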
In contrast to the simplicity of the equation, its implication is profound. Below are a few quotes from the author:
- “…but generally more choices meant more uncertainty — more information.”
- “The essence of communication is that the message is not created; it is selected. It is a choice.”
- “The more inherent order exists in a sample of English text, the more predictability there is, and in Shannon’s terms, the less information is conveyed by each subsequent letter. When the subject guesses the next letter with confidence, it is redundant, and the arrival of the letter contributes no new information. Information is surprise.”
- “Shannon said, “Information can be considered as order wrenched from disorder.””
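The link between order and redundancy can be illustrated by estimating the entropy per letter of a text sample. The sketch below is a simplification and an assumption on my part: it only counts single-letter frequencies, whereas the guessing experiments described in the quote also exploit context, so real English is more redundant than this estimate suggests. The function name and sample strings are hypothetical.

```python
import math
from collections import Counter


def letter_entropy_bits(text: str) -> float:
    """Estimate entropy per letter (in bits) from single-letter frequencies only."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    ordered = "aaaaaaaaaaaaaaaa"   # fully predictable: no surprise
    varied = "abcdefghijklmnop"    # 16 distinct, equally frequent letters
    english = "the more inherent order exists in a sample of english text"
    for sample in (ordered, varied, english):
        print(round(letter_entropy_bits(sample), 3), repr(sample))
```

The fully repetitive string carries zero bits per letter, the evenly spread one carries the maximum for its alphabet, and the English sample falls in between.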
Another crucial part of the equation is H, which is called entropy. Gleick explores the boundary between Shannon’s information theory and several branches of physics, including thermodynamics and quantum physics.
Gleick simplifies the first two laws of thermodynamics as follows:
First law: The energy of the universe is constant.
Second law: The entropy of the universe always increases.
The second law suggests the process from orderly to disorderly is irreversible — “Entropy thus became a physical equivalent of probability: the entropy of a given macrostate is the logarithm of the number of its possible microstates.” In other words, entropy is a measure of uncertainty about the state of a physical system, while in information theory, entropy is a measure of uncertainty about a message.
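The parallel between the two entropies can be written out as a short worked derivation. This is a standard textbook relation, not a formula quoted from the book; it assumes a macrostate with W equally likely microstates, with k_B denoting Boltzmann's constant.

```latex
\begin{align*}
S &= k_B \ln W
  && \text{(thermodynamic entropy of $W$ equally likely microstates)} \\
H &= -\sum_{i=1}^{W} \frac{1}{W} \log_2 \frac{1}{W} = \log_2 W
  && \text{(Shannon entropy with } p_i = 1/W \text{)} \\
S &= (k_B \ln 2)\, H
  && \text{(since } \ln W = \ln 2 \cdot \log_2 W \text{)}
\end{align*}
```

Under that assumption the two quantities differ only by a constant factor, which is one way to read the statement that both measure uncertainty.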
Going further, information is entropy, which aligns with the second law of thermodynamics in its treatment of uncertainty and disorder. However, if we dive deeper, something is unsettling. The second law says the entropy of the universe always increases, meaning a process from order to disorder, a process of irreversible decay. An information process is also one-directional, but it should reduce uncertainty in the end: human brains or machines make sense of the information, select from it, and eventually make a choice. Human brains and AI are built for pattern recognition, looking for certainties, and decision-making. Does this contradict what the theories say?
The answer is no. Information theory measures the amount of information without regard to its content. It concerns the total amount of information across all possible outcomes, not any single outcome. Similarly, the second law of thermodynamics describes entropy at the level of “the universe,” a macro trend summed over all possible microstates or processes. So entropy only makes sense at an aggregate level, such as human society, within which each living individual is, in fact, actively creating order out of chaos and “feeds upon negative entropy,” as the quantum physicist Schrödinger put it.
Lastly, Gleick touches upon quantum computing, which I found fascinating. Quantum computing and its application of information theory will remain an exciting area to explore in the coming decades.
As the counterpart of the bit in information theory, the qubit is the smallest nontrivial unit of a quantum system. Like a classical bit, a qubit has two possible values, zero or one, representing two distinguishable states. In addition, quantum mechanics allows it to exist in a superposition of the two.
“The qubit is not just either-or. Its 0 and 1 values are represented by quantum states that can be reliably distinguished — for example, horizontal or vertical polarizations — but coexisting with these are the whole continuum of intermediate states, such as diagonal polarizations, that lean toward 0 or 1 with different probabilities. So a physicist says that a qubit is a superposition of states; a combination of probability amplitudes. It is a determinate thing with a cloud of indeterminacy living inside. But the qubit is not a muddle; a superposition is not a hodgepodge but a combining of probabilistic elements according to clear and elegant mathematical rules.”
Gleick then added:
“Qubits can encode these Boolean values along with all their possible superpositions. This gives a quantum computer a potential for parallel processing that has no classical equivalent. So quantum computers — in theory — can solve certain classes of problems that had otherwise been considered computationally infeasible.”
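The idea of a superposition as "a combination of probability amplitudes" can be illustrated with a tiny sketch. It is a toy model under my own assumptions, not a quantum simulator: a single qubit is represented by two complex amplitudes, and the chance of measuring 0 or 1 is taken to be the squared magnitude of each amplitude after normalization. The function name is hypothetical.

```python
import math


def measurement_probabilities(alpha: complex, beta: complex) -> tuple[float, float]:
    """Return (P(0), P(1)) for a qubit with amplitudes alpha and beta.

    The amplitudes are normalized first, so any non-zero pair is accepted.
    """
    norm = abs(alpha) ** 2 + abs(beta) ** 2
    if norm == 0:
        raise ValueError("at least one amplitude must be non-zero")
    return abs(alpha) ** 2 / norm, abs(beta) ** 2 / norm


if __name__ == "__main__":
    print(measurement_probabilities(1, 0))  # "horizontal": always reads 0
    print(measurement_probabilities(0, 1))  # "vertical": always reads 1
    diagonal = 1 / math.sqrt(2)
    print(measurement_probabilities(diagonal, diagonal))  # leans equally toward 0 and 1
    tilt = math.pi / 6
    print(measurement_probabilities(math.cos(tilt), math.sin(tilt)))  # leans toward 0
```

The intermediate states in the quote correspond to amplitude pairs between the two pure cases, each leaning toward 0 or 1 with a different probability.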
The Flood
Toward the end of the book, Gleick eloquently describes the information explosion of the modern era: “The three are fundamentally equivalent: information, randomness, and complexity — three powerful abstractions, bound all along like secret lovers.” It suggests the general direction of information for humankind: as the second law of thermodynamics predicts for entropy, it moves toward more randomness and chaos. This is certainly something we all experience in our daily lives.
The ultimate purpose of communication is for the user to choose, reason, and decide. Given the enormous growth of information storage and internet bandwidth in the past decades, the capacity of the communication channel has become almost unlimited. Along with the immense amount of information, noise has grown as well. Shannon could not have foreseen this magnitude in his time. If he were alive today, he would probably be working on the challenges we are facing.
In conclusion, James Gleick offers tremendous knowledge and insight in this book through skilled writing and fascinating storytelling. It gives not only a vivid account of the astounding history before and after the birth of information theory, but also a look at information theory’s thought-provoking and profound implications for physics, engineering, language, biology, and the evolution of human culture. If you haven’t read the book, I hope this review helps and that you might pick it up someday.






