Measuring Information Mathematically And Its Impact On Communication
Introduction
Information is facts about things or people. It has always existed in one form or another, but there came a time when it became essential to develop a formal theory of information.
The purpose of this paper is to examine how Shannon developed information theory, and how his theory contributed to building better communication systems and to saving people's lives during World War II. We will also look at how these ideas converged into a theory of thinking machines, and how that theory is linked to machine learning and artificial intelligence today.
We will also see how information cannot be separated from probability theory, and how Shannon used probabilities to build a mathematical model of communication and to measure information in a quantized form.
Measuring information in a quantized form was important because it brings all sorts of information (for example, text, music, or photos) down to a common ground where it can be stored, retrieved, compared, and understood better. Shannon proposed measuring the amount of information in bits, so the "bit" became a common metric for measuring entropy. Later in the paper, we will answer why Shannon used bits as a measure of entropy.
Information and Probability Theory
One of Shannon's quotes says, "Information is the resolution of uncertainty. As long as something can be relayed that resolves uncertainty, that is the fundamental nature of information." [2] Probability theory allows us to make uncertain statements and to reason in the presence of uncertainty; information theory allows us to quantify the amount of uncertainty in a probability distribution.
Consider a discrete communication system: the transmitter sends a sequence of symbols, and each symbol carries a certain number of bits of information. From the receiver's end, we need to know the probability of each symbol occurring, so the whole concept of a communication system rests on the mathematical framework of probability theory.
In information theory, if there is no surprise, there is no information. Every language has a statistical structure and is redundant to a certain extent. The more the redundancy, the less the information, because the patterns become easily recognizable and there is no surprise left. For example, the probability of the letter 'h' occurring after the letter 't' is close to 90%, and 'e' is the most frequent letter in English. Shannon applied probability theory and calculated that English has a redundancy of about 50%. [3]
The point of explaining redundancy here is that if you know the statistical nature of the source and use proper encoding, the same amount of information can be sent over a communication system with fewer symbols, making our systems faster and reducing the required channel capacity without losing the original information; the sketch below illustrates the idea. Information, then, cannot be separated from probabilities. As the theory of information improved over time, so did our communication systems and their efficiency.
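As a rough illustration, the following minimal Python sketch (not from the original paper; the sample sentence is an arbitrary assumption) estimates the entropy of English from single-letter frequencies and compares it with the cost of giving every letter an equal-length code:

```python
import math
from collections import Counter

def letter_entropy(text: str) -> float:
    """Zeroth-order entropy in bits per letter, estimated from letter frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Arbitrary sample of English text (any reasonably natural sentence would do).
sample = "information is the resolution of uncertainty and every language has a statistical structure"

h = letter_entropy(sample)   # average bits actually needed per letter
fixed = math.log2(26)        # bits per letter if every letter gets an equal-length code
print(f"estimated entropy: {h:.2f} bits/letter")
print(f"fixed-length code: {fixed:.2f} bits/letter")
print(f"redundancy captured by letter frequencies alone: {1 - h / fixed:.0%}")
```

Shannon's figure of roughly 50% redundancy also counts longer-range structure (pairs of letters, words, grammar), which a single-letter count like this one misses.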
Discrete Symbols and the Idea of Mapping to Make and Break Codes
During World War II, the Germans believed that the codes for their radio messages were unbreakable. But mathematicians like Shannon, who worked at Bell Labs during the war, did great work in cryptography that contributed to the defeat of the Germans. The systems used were secret systems in which the right symbols were mapped to wrong symbols and communicated as secret messages. The sender and the receiver both had a key, which was itself a secret. The key told the sender which symbols to map to which, and told the receiver which symbols to flip back. In a sense, this system was like a noisy communication channel in which unnecessary symbols were added and wrong symbols were mapped onto right ones to hide the actual message. As long as both the encoding and decoding sides of the system keep the key secret, intercepted messages cannot be deciphered.
The longer the key, the more difficult it is to crack, because the code breaker needs to try every possible key. To put this in context, each binary unit of information, or bit, has a value of 0 or 1, so an 8-bit key has 2^8 = 256 possible values; beyond a certain length it becomes practically impossible to try every combination, as the sketch below shows.
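A minimal Python sketch (not from the original paper) of how the key space grows with key length; the figure of one billion guesses per second is an illustrative assumption, not a measured value:

```python
# How the number of possible keys grows with key length in bits.
GUESSES_PER_SECOND = 1_000_000_000  # assumed brute-force speed, purely illustrative

for bits in (8, 32, 64, 128):
    keys = 2 ** bits                      # every extra bit doubles the key space
    seconds = keys / GUESSES_PER_SECOND   # time to try every key at the assumed speed
    years = seconds / (365 * 24 * 3600)
    print(f"{bits:>3}-bit key: {keys:.3e} keys, ~{years:.3e} years to exhaust")
```

Even at this optimistic guessing rate, exhausting a 128-bit key space would take unimaginably longer than the age of the universe, which is why key length matters so much.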
A code breaker like Shannon would see a long array of symbols that was supposed to look random to the enemy. The goal was to find patterns within this apparent junk and exploit them to find a way in. As long as there was some kind of pattern or statistical regularity, as a natural language like English has, the code could be broken by someone who understood the patterns of the language well enough.
The Idea Of Thinking Machines
Claude Shannon said, "I visualize a time when we will be to robots what dogs are to humans. And I am rooting for the machines." [2]
Great mathematicians like Claude Shannon and Alan Turing pondered the possibility of machines learning to think. Surprisingly, this idea existed even before the transistor was invented. Thinking, seen at a very basic level, is not something illogical: it is a step-by-step logical process of classifying important details and discarding unnecessary ones in order to reach a decision.
Machines were considered very mechanical things, designed to perform repetitive tasks without being creative. But seen from the perspective of a programmer, as Turing saw himself, a machine sits in a particular state and can move in different directions depending on that state. In that way it can imitate the process of thinking and come to a decision. In the 1940s this was just a theory. As the transistor came into existence and computers were built, they could be fed particular instructions and were able to imitate the thinking process to a certain extent. In this way the idea of thinking machines, proposed long before, started becoming a reality.
Now, almost a century later, we are able to make machines learn and improve on their own by continuously consuming more and more data. Artificial intelligence is everywhere, and the way machines interact with humans nowadays was unimaginable 50 years ago (for example, Google Home, smart lights, Alexa). At the current pace of development, I would not be surprised if machines overtake humans in intelligence 100 years from now. The applications of artificial intelligence would fundamentally transform our society in the same way that digital computers did.
But we should never forget where it came from: Shannon's theory of information and communication, and the idea of storing information as a bit, laid the foundations of digital circuits. The theory of information, combined with Alan Turing's idea of thinking machines, is what modern machine learning and artificial intelligence are based on. In the next section we will answer why Shannon used 'bits' as a measure of entropy (the amount of information in an event).
'Bits' as a Measure of Entropy
As mentioned earlier in the paper, information cannot be separated from probabilities. A message can be represented as the outcome of a process that generates events with discrete probabilities. [3] A 'bit' represents the amount of uncertainty in the flip of a fair coin.
According to Shannon, the measure of information (entropy) is a measure of uncertainty. This means that if you can calculate the uncertainty (through probability), you know exactly how many bits of information are in a particular message. So the ultimate goal was to measure information as a function of probabilities.
Shannon put this idea into a mathematical framework and defined a quantity H (entropy): the average, over a message, of the logarithm of each symbol's uncertainty, that is, the expected value of the negative logarithm of each symbol's probability. Because base 2 was used for the logarithm, the unit was called the bit (binary digit).
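In modern notation (the formula is implied but not spelled out above), Shannon's entropy of a source whose symbols occur with probabilities p₁, …, pₙ is

H = −(p₁ log₂ p₁ + p₂ log₂ p₂ + … + pₙ log₂ pₙ) bits per symbol.

For a fair coin, both outcomes have probability 1/2, so H = −(½ log₂ ½ + ½ log₂ ½) = 1 bit, which is exactly the coin-flip uncertainty mentioned above.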
The unit of information, the smallest possible chunk that cannot be divided any further, was called the 'bit'. The most widely used digital code in modern electronics is based on bits that can each take only one of two values: 0 or 1. The idea is to convert your message, letter by letter, into a code made of 0s and 1s, then send this long string of digits down a wire, with every 0 represented by a brief low-voltage signal and every 1 by a brief burst of high voltage. This led to better encoding of messages, which in turn meant faster communication and more storage, all without loss of information.
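A minimal Python sketch (illustrative only, not a scheme described in the paper) of converting a message letter by letter into a string of 0s and 1s using a fixed-length 8-bit code per character, and recovering it:

```python
def to_bits(message: str) -> str:
    """Encode each character as 8 bits using its ASCII/Latin-1 code point."""
    return "".join(format(ord(ch), "08b") for ch in message)

def from_bits(bits: str) -> str:
    """Decode a string of 0s and 1s back into characters, 8 bits at a time."""
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

encoded = to_bits("hello")
print(encoded)               # 0110100001100101011011000110110001101111
print(from_bits(encoded))    # hello
```

A fixed-length code like this ignores the redundancy discussed earlier; a variable-length code that gives frequent letters shorter codewords would need fewer bits for the same message.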
Conclusion
To conclude, we saw how Shannon used mathematics, specifically probability, to measure information, something that had been treated as immeasurable for most of human history: it was always there but never measured. We saw how important it was to understand the structure and underlying patterns of a language, both to turn it into secret codes and to break those codes. This knowledge helped save millions of lives during World War II, when German secret communications were broken, thanks in large part to the early work of Polish cryptanalysts.
Towards the end, we saw how the philosophy of thinking machines and the measurement of information in bits laid the foundations of modern digital circuits and artificial intelligence. When anyone talks about the information revolution of the last few decades, it is Shannon's idea of information they are talking about. Without his contribution of turning 'information' from a vague word into a mathematical quantity, today's communication systems could never have been built, and artificial intelligence would still be a distant dream.