## Devlin's Angle

Shannon was born in Michigan in 1916. After obtaining degrees in both mathematics and engineering at the University of Michigan, he went to MIT to pursue graduate studies in mathematics. There he came into contact with some of the men who were laying much of the groundwork for the information revolution that would take off after the Second World War, notably the mathematician Norbert Wiener (who later coined the term cybernetics for some of the work in information theory that he, Shannon, and others did at MIT and elsewhere) and Vannevar Bush, the dean of engineering at MIT (whose conceptual "Memex" machine foretold the modern World Wide Web and whose subsequent achievements included the establishment of the National Science Foundation).

In the early 1930s, Bush had built a mechanical, analog computer at MIT called the Differential Analyzer, designed to solve equations that were too complex for the (mechanical) calculating machines of the time. This massive assemblage of cog wheels, shafts, gears, and axles took up several hundred feet of floor space, and was powered by electric motors. Preparing the device to work on a particular problem required physically configuring the machine, and could take two or three days. After the machine had completed the cycle that constituted "solving" the equation, the answer was read off by measuring the changes in position of various components.

Always a "tinkerer," Shannon took to working with the
Analyzer with great enthusiasm. At Bush's suggestion, for
his master's thesis, he carried out a mathematical analysis
of the operation of the machine's relay circuits. In 1938,
he published the results of this study in the *Transactions
of the American Institute of Electrical Engineers* under
the title "A Symbolic Analysis of Relay and Switching
Circuits."

Bush's seemingly mundane motivation for having Shannon
do the work was the telephone industry's need for a
mathematical framework in which to describe the behavior
of the increasingly complex automatic switching circuits that
were starting to replace human telephone operators. What
Shannon produced far transcended that aim. The ten-page
article that he published in the *Transactions of the AIEE*
has been described as one of the most important engineering
papers ever written. And with good reason: quite simply, it set
the stage for digital electronics.

Shannon began by noting that, although the Analyzer computed in an analog fashion, its behavior at any time was governed by the positions of the relay switches, and they were always in one of just two states: open or closed (or on or off). This led him to recall the work of the nineteenth-century logician George Boole, whose mathematical analysis of the "laws of thought" was carried out using an algebra in which the variables have just the two "truth values" T and F (or 1 and 0). From there it was a single -- but major -- step to thinking of using relay circuits to build a digital "logic machine" that could carry out not just numerical computations but also other kinds of "information processing."
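The correspondence Shannon exploited can be sketched in a few lines of code (my own illustration, not Shannon's notation): switches wired in series conduct only when both are closed, which is Boole's AND, while switches in parallel conduct when either is closed, which is OR.

```python
# Two-valued (Boolean) algebra applied to relay circuits, in the spirit
# of Shannon's 1938 paper: 1 = switch closed (current flows), 0 = open.

def series(a, b):
    # Two switches in series conduct only if both are closed: logical AND.
    return a & b

def parallel(a, b):
    # Two switches in parallel conduct if either is closed: logical OR.
    return a | b

def negate(a):
    # A normally-closed relay contact inverts its control signal: logical NOT.
    return 1 - a

# Any switching network reduces to an algebraic expression. For example,
# a network that conducts when exactly one of two switches is closed (XOR):
def exclusive(a, b):
    return parallel(series(a, negate(b)), series(negate(a), b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", exclusive(a, b))
```

The point is not the particular gates but the reduction itself: once a tangle of relays becomes an algebraic expression, circuits can be designed, simplified, and verified symbolically.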

In 1940, Shannon obtained his doctorate in mathematics, and went to the Institute for Advanced Study at Princeton as a National Research Fellow, where he worked with Hermann Weyl. The following year, he took a position at the Bell Telephone Laboratories in New Jersey, joining a research group that was trying to develop more efficient ways of transmitting information and to improve the reliability of long-distance telephone and telegraph lines.

In the 1950s, Shannon became interested in the idea of machine intelligence, and was one of the conveners -- together with his soon-to-be-famous mentees John McCarthy and Marvin Minsky -- of the now legendary 1956 conference at Dartmouth College in New Hampshire that is generally acknowledged as the birth of artificial intelligence (or AI), as it later became known. But while others (McCarthy and Minsky among them) would become identified with AI, Shannon's name will be forever associated with the theory of information and communication that the world learned of from the Shannon-Weaver pamphlet.

Prior to Shannon's work, mathematicians and engineers working on communications technology saw their job as finding ways to maintain the integrity of an analog signal traveling along a wire as a fluctuating electric current or through the air as a modulated radio wave. Shannon took a very different approach. He viewed "information" as being completely encoded in digital form, as a sequence of 0s and 1s -- which he referred to as "bits" (for "binary digits"), following a suggestion of his Princeton colleague John Tukey. In addition to providing the communications engineers with a very different way of designing transmission circuits, this shift in focus also led to a concept of "information" as an objective commodity, disembodied from a human "sender" or "receiver." After Shannon, the name of the game became: how can you best send a sequence of discrete electrical or electromagnetic pulses from one point to another?

A particular consequence of this new approach, which Shannon himself was not slow to observe, was that whereas even a small variation in an analog signal distorts -- and can conceivably corrupt -- the information being carried by that signal, the discrete yes-or-no/on-or-off nature of a digital signal means that information conveyed digitally is far less prone to corruption; indeed, by adding extra bits to the signal, automatic error detection and correction can be built into the system. (It is this feature of digital coding that, decades later, would enable Napster users to download music files over the phone lines and play the latest pop music on their desktop PCs with a fidelity limited only by the quality of the computer's sound system, and that is further exemplified by the oft-repeated claim of CD manufacturers that you can drill a centimeter-wide hole in your favorite music CD and it will still play perfectly.)
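The "extra bits" idea can be seen in miniature with a single parity bit -- a toy scheme of my own choosing, far simpler than the codes used on real channels: append one bit so that the total number of 1s is even, and any single flipped bit becomes detectable at the receiving end.

```python
# Toy illustration of the point that extra bits buy error detection:
# a single even-parity bit reveals any one-bit corruption in transit.

def add_parity(bits):
    # Append a bit that makes the total number of 1s even.
    return bits + [sum(bits) % 2]

def check_parity(received):
    # An even count of 1s means no single-bit error occurred.
    return sum(received) % 2 == 0

message = [1, 0, 1, 1, 0, 0, 1]
sent = add_parity(message)
print(check_parity(sent))          # intact signal passes the check

corrupted = sent.copy()
corrupted[3] ^= 1                  # flip one bit "on the wire"
print(check_parity(corrupted))     # the corruption is detected
```

A lone parity bit can only detect a single error, not locate or repair it; the correcting codes Shannon's theory made possible use more redundancy to do both.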

From a mathematical point of view, arguably the most significant aspect of Shannon's new, digital conception of information is that it provides a way to measure information -- to say exactly how much information a particular signal carries. The measure is simple: count the minimum number of bits it takes to encode the information. To do this, you have to show how a given item of information can be arrived at by giving the answers to a sequence of yes/no questions.

For example, suppose that eight work colleagues apply for a promotion: Alberto, Bob, Carlo, David, Enid, Fannie, Georgina, and Hilary. After the boss has chosen which person will get the position, what is the minimum number of yes/no questions you have to ask to discover his or her identity? A few moments' thought will indicate that the answer is 3. Thus, the information content of the message announcing who got the job is 3 bits. Here is one way to arrive at this figure:

*First question:* Is the person male?

That cuts down the number of possibilities from 8 to 4.

*Second question:* Does the person's name end in a vowel?

That reduces the field to a single pair.

*Third question:* Is the person the taller of the two?

Now you have your answer. Of course, this particular sequence of questions assumes that no final pair of applicants are the same height. Moreover, I rigged it to have four males and four females, with carefully chosen names. But the principle will work for any example. All you need is a framework within which a series of yes/no questions (or other binary decisions) will repeatedly halve the number of possibilities until just one remains. (If the number of possibilities at the outset is not a power of 2, there will be a little redundancy in the decision sequence, but you'll still get a measure of the information content. For example, if there were just 7 candidates, the information content of the final decision will still be 3 bits.)
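The counting argument above amounts to taking a base-2 logarithm: for n equally likely candidates, the minimum number of yes/no questions is log2 of n, rounded up when n is not a power of 2. A quick sketch, using the eight applicants from the example:

```python
# Information content of picking one of n equally likely candidates:
# the number of yes/no questions needed is the base-2 logarithm of n,
# rounded up when n is not a power of 2.
import math

def bits_needed(n):
    return math.ceil(math.log2(n))

candidates = ["Alberto", "Bob", "Carlo", "David",
              "Enid", "Fannie", "Georgina", "Hilary"]
print(bits_needed(len(candidates)))  # 8 candidates -> 3 bits
print(bits_needed(7))                # 7 candidates still need 3 questions

# Each question halves the remaining field, as in the example above:
field = candidates
questions = 0
while len(field) > 1:
    field = field[: len(field) // 2]  # a yes/no answer keeps one half
    questions += 1
print(questions)                      # 3 halvings single out the winner
```

Doubling the number of candidates adds exactly one question, which is why the bit count grows logarithmically rather than in step with the size of the field.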

Building on this simple idea, Shannon was able to develop an entire quantitative theory of information content that has proved to be of enormous importance to the engineers who have to decide how much "channel capacity" a particular communications network requires at each point. So complete was his initial analysis that, although you can find the theory described in many contemporary textbooks, you might just as well go back to his original 1949 pamphlet with Weaver. Except for one thing: the name "information theory" is misleading.

As has been pointed out by a number of workers (including
myself in my 1991 book
*Logic and Information*),
Shannon's theory does not deal with "information" as that
word is generally understood, but rather with data -- the raw
material out of which information is obtained. (See my book
*InfoSense* for a discussion of the distinction.) In
Shannon's theory, what is measured is the size of the (binary)
*signal.* It does not matter what that signal denotes.
According to Shannon's measure, any two books of 100,000
words have exactly the same information content. That's a
useful (if misleading) thing to say if your goal is simply to
transmit both books digitally over the Internet. But if one is an
instruction manual for building a nuclear-powered submarine
and the other a trashy novel, no one would claim that the two
contain the same amount of "information."

By the same token, anyone who thinks that the information content of Shannon's 1948 paper can be captured by the statement that it is "100 pages worth" must surely have been in a trance for the past fifty years in which Shannon's ideas have transformed the world.

Devlin's Angle is updated at the beginning of each month.