Computer Engineering Concepts

2.8 Information Representation

Today's computers do much more than perform calculations on numbers. A computer's ability to work with other types of information depends on representing that information as numbers. Text, sound, graphics, and motion video are all translated into numbers before the computer can process them. These types of information require a large number of bits when converted to binary, so a prefix such as kilo or mega (as in kilobits or megabits) is used to make the number of bits easier to express. The prefix kilo represents either 10³ (1000) or 2¹⁰ (1024). The first value is based on base 10 and the second on base 2, and each is more convenient in its own number system. Both definitions are in common use because they describe approximately the same amount of information (2¹⁰ ≈ 10³). The following table shows the prefix factors and their amounts.

Table 2.4. Commonly used order of magnitude prefixes.

Another unit of measure is the byte. A byte is equivalent to 8 bits. This measure is commonly used because characters were initially represented using sets of 8 bits, which led to character information being measured in bytes instead of bits. It is now common to measure information in both bits and bytes. To distinguish between them, a capital B is used for bytes and a lowercase b for bits. For example, a 1.44 MB floppy disk can hold 1.44 megabytes of information, or 1.44 × 8 = 11.52 megabits (Mb).
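The two meanings of each prefix, and the conversion between bytes and bits, can be checked with a short calculation. The following is a minimal Python sketch; the function names are made up for illustration, and the list of prefixes (kilo to tera) is an assumption about what the table covers.

```python
# Each step up the prefix ladder multiplies by 10^3 (base 10) or 2^10 (base 2).
PREFIX_STEPS = {"kilo": 1, "mega": 2, "giga": 3, "tera": 4}

def prefix_values(name):
    """Return (base-10 value, base-2 value) for a prefix name."""
    step = PREFIX_STEPS[name]
    return 10 ** (3 * step), 2 ** (10 * step)

def bytes_to_bits(n_bytes):
    """A byte is 8 bits, so multiply by 8."""
    return n_bytes * 8

for name in PREFIX_STEPS:
    decimal_value, binary_value = prefix_values(name)
    print(name, decimal_value, binary_value)

# 1.44 megabytes expressed in megabits
print(bytes_to_bits(1.44))
```

Note that the gap between the two values grows with each step: kilo differs by 24 in 1000, while tera differs by almost 10%.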

Characters

Characters are represented by assigning a number to each character. This concept is similar to Morse code, which was used in the past to send messages using a system of dots and dashes, or short beeps and long beeps. In computing there are several different methods of representing characters in binary, but the most popular standard is the American Standard Code for Information Interchange (ASCII). Standard ASCII defines 128 characters using 7 bits, but each character is normally stored in 8 bits or 1 byte. Extended 8 bit versions of the code can represent a total of 2⁸ = 256 characters.

For example, the letter A is assigned the number 65 or 01000001₂ and the letter B the number 66 or 01000010₂. The computer handles the text using the assigned numbers in binary. This idea is also used in the storage of information within the computer. The word “computer” will take up approximately 8 × 8 = 64 bits of space within a storage system. The ASCII code is found in the appendix, and it is worth noting that lowercase “a” and uppercase “A” are considered two separate characters. Essentially all characters and symbols (letters, numbers, punctuation marks, etc.) used within the computer need to be distinguished separately, or the computer will not be able to process the information. Some of the ASCII codes are used as control codes. These codes are used to format and organize the text for output. For example, decimal 9 in the ASCII code is used for the tab function.
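The character assignments can be inspected directly in most programming languages. The following Python sketch uses the built-in ord function to look up the code of a character and prints it as an 8 bit pattern; the helper name to_ascii_bits is hypothetical.

```python
def to_ascii_bits(text):
    """Return the 8 bit binary pattern of each character's ASCII code."""
    return [format(ord(ch), "08b") for ch in text]

print(ord("A"), to_ascii_bits("A"))   # uppercase A
print(ord("a"), to_ascii_bits("a"))   # lowercase a has a different code
print(ord("\t"))                      # the tab control code
print(len("computer") * 8)            # bits needed at 8 bits per character
```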

In the ASCII code table the binary assignment for the character 7 is 00110111. Converting this binary number to decimal gives 55, not 7; the bit pattern is simply used to represent the symbol 7. From this example it is evident that a decimal number that is coded in binary and a decimal number that is converted to binary will not necessarily produce the same result. A code that represents each decimal digit by its 4 bit binary equivalent is called Binary Coded Decimal (BCD). It is interesting to note that the last four bits of the ASCII representation of a digit (0111 for the character 7) do follow the BCD method.
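The difference between coding a digit and converting a number can be seen in a few lines. This is an illustrative Python sketch only; the function names are made up, and to_bcd assumes a non-negative whole number.

```python
def ascii_digit_bits(digit_char):
    """8 bit ASCII pattern used to represent a digit symbol."""
    return format(ord(digit_char), "08b")

def low_four_bits(digit_char):
    """Value of the last four bits of the ASCII pattern."""
    return ord(digit_char) & 0b1111

def to_bcd(number):
    """BCD: one 4 bit group for each decimal digit of the number."""
    return " ".join(format(int(digit), "04b") for digit in str(number))

print(ascii_digit_bits("7"))        # bit pattern for the symbol 7
print(int(ascii_digit_bits("7"), 2))  # the same bits read as a binary number
print(low_four_bits("7"))           # last four bits follow BCD
print(to_bcd(47), format(47, "b"))  # BCD coding versus binary conversion
```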

Codes other than ASCII can also be used to represent information in a computer. The Extended Binary Coded Decimal Interchange Code (EBCDIC) was developed by IBM and used on IBM equipment. The concept is the same, but EBCDIC assigns different bit patterns to the characters. This system also follows the BCD idea.

The number of bits used to represent the characters is of interest because it will affect the processing time and storage requirements for the information. Therefore, the number of bits used is kept to a bare minimum. For example, if each character is represented with 20 bits using code A and with 10 bits using code B, then the storage requirement for code A will turn out to be twice that of code B, but more characters can be represented using code A than code B.

Graphics

The representation of graphics in the computer is done using the same kind of idea as character representation. A graphic is created, stored, and processed by considering it in terms of small dots called pixels.

Fig. 2.2. The pixels of the image are enlarged and shown

Each graphic is made up of many pixels that together define the image. The information for each pixel, such as its colour and location, is maintained in binary. The method used to maintain the information can vary, as seen in the different file types (gif, jpg, bmp, etc.) that are used to store graphical information. Figure 2.2 shows a magnified view of a graphic created by a computer on an output device. Notice that it has been created using small boxes (pixels). If the number of pixels used to represent a graphic is increased, the pixel size drops and the quality of the graphic improves. This idea is referred to as resolution: the higher the resolution, the smaller the pixel size, and the better the quality. Better quality means more pixels per unit area, which means that the computer has to keep track of more information. This is easily seen by comparing two graphic files stored on a disk: the file with the better quality takes up more space (bits) than the file with the lower quality. Resolution can also be measured as pixel density, the number of dots or pixels per unit length, expressed in dots per inch (dpi). A higher dpi value means better graphic quality. This measure is commonly used to describe the graphic quality of monitors and printers.

In a colour graphic the colour information of each pixel needs to be tracked as well. The amount of colour information is referred to as the number of bits of colour. For example, a graphic with 8 bit colour has 2⁸ = 256 possible colours to choose from for each pixel, and a graphic with 4 bit colour has 2⁴ = 16 colours to choose from for each pixel. When pixel colours are chosen from a larger selection, the colour quality of the image becomes better; however, the larger selection means a larger amount of information for the computer to process. The set of colours available for a graphic is called the colour palette, and as the number of bits for the colour increases so does the number of colours within the colour palette.
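The link between bits of colour, palette size, and the amount of information can be sketched as follows. This Python example is illustrative only: the function names are hypothetical, and the storage figure assumes every pixel is stored directly with no file format overhead or compression.

```python
def palette_size(bits_per_pixel):
    """Number of colours that can be distinguished with the given bits."""
    return 2 ** bits_per_pixel

def raw_image_bits(width, height, bits_per_pixel):
    """Bits needed to store the colour of every pixel, uncompressed."""
    return width * height * bits_per_pixel

print(palette_size(4), palette_size(8))
# Doubling the bits of colour doubles the raw size of the same image
print(raw_image_bits(100, 100, 4), raw_image_bits(100, 100, 8))
```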

Other Information

Representing information in binary is the key to processing it with a computer. In both character processing and graphic processing, the information is represented in binary using a scheme before it is handled by the computer. The first step in processing any kind of general information is therefore to represent it in binary using a scheme. The methods used can vary, but the concept is the same. For example, to represent sound or audio information, the volume and the pitch of the sound at every instant need to be tracked. If the number of bits used to track the information is increased, then the resolution or quality of the information goes up. Consider temperature measurement by a computer as another example. If the range of measurements is from 1° to 64° and the number of bits used is 2, then the range can be divided into 4 segments, with each segment represented by a two bit binary number as follows.

1° to 16° as 00

17° to 32° as 01

33° to 48° as 10

49° to 64° as 11

In this case the resolution is low because the level of detail is low. Temperature measurements of 7° and 12° will both be recorded as the same value.

Now consider a 3 bit system of representation with 8 divisions of the range of measurement. In the 3 bit system, 7° and 12° will be recorded by the computer as being different. Now imagine an 8 bit system; in this case the range will be divided into 256 divisions, leading to greater resolution or level of detail. In the 8 bit scheme a 0.25° change will be recorded, compared to an 8° change for the 3 bit system.
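The effect of the number of bits on the recorded value can be tried out with a small sketch. The Python function below is hypothetical and assumes the range starts at 1° and is 64° wide, divided into equal segments that are numbered upward from 0.

```python
def quantize(temp, bits, low=1.0, span=64.0):
    """Return the binary code of the segment that a temperature falls in."""
    levels = 2 ** bits
    index = int((temp - low) * levels / span)
    index = max(0, min(levels - 1, index))   # keep out-of-range values inside
    return format(index, f"0{bits}b")

for bits in (2, 3, 8):
    print(bits, quantize(7, bits), quantize(12, bits))
```

With 2 bits both temperatures receive the same code, while with 3 bits or more the codes differ.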

Regardless of what kind of information is being processed by the computer, the concept of binary representation using a fixed number of bits is the same. The number of bits to use for a type of information depends on the resolution needed for the application and on the processing capability of the hardware. Improved hardware performance allows more bits to be used, which leads to improved quality of the information. The relationship between the number of bits and the resolution can be represented using the following equation.

Resolution = (maximum value − minimum value) / 2ⁿ, where n is the number of bits

Example: 6 bits are available to measure temperatures from 20 °C to 60 °C. Determine the resolution of the temperature measurement.

Solution: Resolution = (60 °C − 20 °C) / 2⁶ = 40 °C / 64 = 0.625 °C

This means that a change of 0.625 °C can be measured with such a system. Any change less than 0.625 °C will not be detected.
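The same calculation can be written as a short program, which also makes it easy to work backwards from a required resolution to a number of bits. This is a minimal Python sketch with made-up function names, assuming the range is divided into 2ⁿ equal steps.

```python
def resolution(low, high, bits):
    """Smallest change that can be recorded over the range with the given bits."""
    return (high - low) / 2 ** bits

def bits_needed(low, high, step):
    """Smallest number of bits whose resolution is no coarser than step."""
    bits = 0
    while resolution(low, high, bits) > step:
        bits += 1
    return bits

print(resolution(20, 60, 6))
print(bits_needed(20, 60, 0.625))
```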

Encryption

Encryption is the process by which information is scrambled in a particular manner to prevent unauthorized access to it. Those who know the scrambling method will have access to the information. For example, the letters in a word could be reversed to make the information difficult to read when the scrambling method is not known. The word COMPUTER would be represented as RETUPMOC. If every word in a given text is scrambled using this method, it becomes difficult for someone to get the information out without knowing the scrambling method. In this example the encryption method is relatively simple, and therefore can be easily broken; a more complex encryption method is harder to break. The difficulty of breaking a scrambling method is called the strength of the encryption. Strong encryption is difficult to unscramble without the key, and weak encryption is easier to unscramble without the key. The following is an example of a more complex encryption method. Try unscrambling it.

Fodszqujpo!jt!jefbm!gps!qspwjejoh!jogpsnbujpo!tfdvsjuz

This is a bit harder to unscramble or decrypt than the previous example because of the complexity of the encryption. Compare your attempts at accessing the information to the actual information given later in this section.

Encryption ideas can also be applied directly to a binary sequence. In this case a simple bit by bit replacement method would not work, because it would simply switch the 1's and 0's around. Therefore other ideas need to be used. For example, the bits could be grouped into sets, and these sets of bits could be replaced. Consider the bit string 100100110111011. If the bits are grouped into sets of 3 bits, the string becomes 100 100 110 111 011. These sets can now be replaced using different methods. For example, each set of three could be replaced with the next binary number, i.e. 100 becomes 101 and 001 becomes 010. In this approach 111 could be assigned to become 000. Using this scheme the original bit string is expressed as 101 101 111 000 100 to produce 101101111000100. Similarly, other replacement methods could be employed using different operations.
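The grouping and replacement steps described above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the length of the bit string is a multiple of the group size.

```python
def encrypt_groups(bits, size=3):
    """Replace each group of bits with the next binary number, wrapping to 0."""
    if len(bits) % size != 0:
        raise ValueError("bit string length must be a multiple of the group size")
    groups = []
    for i in range(0, len(bits), size):
        value = (int(bits[i:i + size], 2) + 1) % (2 ** size)
        groups.append(format(value, f"0{size}b"))
    return "".join(groups)

def decrypt_groups(bits, size=3):
    """Undo encrypt_groups by replacing each group with the previous number."""
    groups = []
    for i in range(0, len(bits), size):
        value = (int(bits[i:i + size], 2) - 1) % (2 ** size)
        groups.append(format(value, f"0{size}b"))
    return "".join(groups)

scrambled = encrypt_groups("100100110111011")
print(scrambled)
print(decrypt_groups(scrambled))
```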

If access to information can be controlled effectively by using a strong encryption method, then encryption becomes a useful tool for providing information security. Information security is needed to provide private communications over public communication systems like computer networks and the internet. The ability to maintain privacy is essential in many areas, for example financial transactions, which form the basis for conducting business over the internet. For example, to have credit card information transmitted over the internet, the information needs to be in a secure format before transmission can take place. Encryption can also create problems because it can be used as a tool for illegal activity; therefore, many countries have laws controlling the use of encryption.

Now going back to the encryption challenge presented earlier, it should be evident that decryption without knowing the encryption method is difficult. The solution to the encrypted text is shown below.

Encryption is ideal for providing information security

Now that you know the answer, compare the letters and determine the method of encryption that was used. The method is a replacement method, where each character is replaced with another character. Simple replacement methods can be easily cracked using statistical analysis. The frequency of each letter in the encrypted text is determined, and the letters that occur most often are most likely to stand for the vowels and other commonly used letters. This statistical information can then be used to work out the encryption method. It is interesting to note that computers are usually used to crack encryption systems by analysing mathematical patterns. Therefore, most strong encryption software uses encryption systems with complex mathematical patterns that are difficult to break. One such method is RSA encryption, which is based on prime numbers.
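For readers who want to check their answer, the challenge text can be decrypted and analysed with a short program. The Python sketch below assumes that each character was replaced with the next character in the character code (so a space became an exclamation mark); shift_text is a made-up helper name.

```python
from collections import Counter

def shift_text(text, shift):
    """Replace every character with the one `shift` places later in the code."""
    return "".join(chr(ord(ch) + shift) for ch in text)

cipher = "Fodszqujpo!jt!jefbm!gps!qspwjejoh!jogpsnbujpo!tfdvsjuz"

# Shifting back by one place recovers the original text
print(shift_text(cipher, -1))

# Frequency count: the most common symbols are good candidates for
# common letters and for the space between words
print(Counter(cipher).most_common(3))
```

In this text the most frequent encrypted character turns out to stand for a vowel, which is the kind of clue that statistical analysis relies on.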

Data Compression

Data compression is used to represent data using fewer bits than its original representation. There are two types of data compression: lossless data compression and lossy data compression.

In lossless data compression the idea is to represent the data with fewer bits without losing the accuracy of the original data. For example, the binary data 10000000000000000000000000011011 is 32 bits long. If it is stored as it is, it takes 32 bits of space in a storage system. This 32 bit number is mostly 0's with a few 1's, so if it can be represented using fewer bits, the storage space required will be reduced. In this case, the space requirement is reduced if each pair of bits is reassigned using the following scheme.

00 as 0

01 as 10

10 as 110

11 as 1110

Using this scheme, the bits in the original data are grouped into sets of 2 bits and each set is replaced with its code, as shown below.

10 00 00 00 00 00 00 00 00 00 00 00 00 01 10 11 - 32 bits (original)

110 0 0 0 0 0 0 0 0 0 0 0 0 10 110 1110 - 24 bits (new)

In this case the number of bits needed to represent the data has been reduced by a significant amount. This system works well, but it runs into a problem when there is an odd number of data bits to work with. This problem can be dealt with by using a single bit at the beginning to specify whether the number of bits in the original data is odd or even. For example, if the first bit is 0 then the number of bits in the original data is even, and if the first bit is 1 then the number of bits in the original data is odd. If the number of bits in the data is odd, it can be made even by adding a 0 at the end, as shown next.

10 00 00 00 00 01 0 - Original data, 13 bits.

10 00 00 00 00 01 00 - A 0 is added to make the number of bits even.

1 110 0 0 0 0 10 0 - Leading 1 shows an odd number of data bits.

11100000100 - 11 bits.

When using data compression methods, it must always be possible to recreate the original data from the compressed bit pattern, otherwise the compression method is useless. Let us now see how the original data can be recovered from the compressed bit pattern.

11100000100 - Compressed data.

1 1100000100 - First bit shows an odd number of bits in the original data.

1 110 0 0 0 0 10 0 - Each 0 marks the end of a code, which identifies the bit groupings.

10 00 00 00 00 01 00 - Original bits are recovered with the 0 at the end.

10 00 00 00 00 01 0 - Since the 1st bit is 1, the last 0 is removed.

1000000000010 - Original 13 bit pattern has been recovered.

From the analysis, it can be seen that data representations can be made more efficient or compact by using compression schemes. This particular scheme has a limitation. It works best when there are lots of 0's in the data. If the incidence of 1's and 0's in the data is equal, there will be little or no compression, and if the data is mainly 1's, this method will actually increase the data size. Therefore, it must be noted that there are limitations to data compression. Lossless data compression schemes are commonly used in data compression software like WinZip.
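The compression scheme and its limitation can be explored with a short program. The following Python sketch is one possible implementation of the scheme described in this section, not a standard algorithm; the names compress and decompress are made up, and the sketch always writes the odd/even flag bit at the start, including for data with an even number of bits.

```python
CODES = {"00": "0", "01": "10", "10": "110", "11": "1110"}
PAIRS = {code: pair for pair, code in CODES.items()}

def compress(bits):
    """Flag bit (1 = odd length, 0 = even) followed by one code per 2 bit pair."""
    odd = len(bits) % 2 == 1
    padded = bits + "0" if odd else bits
    body = "".join(CODES[padded[i:i + 2]] for i in range(0, len(padded), 2))
    return ("1" if odd else "0") + body

def decompress(packed):
    """Rebuild the original bits; every code ends at its first 0."""
    odd, body = packed[0] == "1", packed[1:]
    pairs, current = [], ""
    for bit in body:
        current += bit
        if bit == "0":
            pairs.append(PAIRS[current])
            current = ""
    bits = "".join(pairs)
    return bits[:-1] if odd else bits   # drop the padding 0 if one was added

for data in ("1000000000010", "0110", "11111111"):
    packed = compress(data)
    print(data, packed, len(data), len(packed), decompress(packed) == data)
```

Running it shows the data with many 0's shrinking, while the string of all 1's grows to more than twice its original length.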

In the lossy data compression method the original data cannot be recovered exactly from the compressed data, but a close representation is recreated. This approach works best in situations where the decompressed data does not have to be exactly the same as the original data. In such cases much higher levels of compression can be achieved. For example, still graphics, motion video, and sound information can be compressed to a much higher level by giving up accuracy on decompression. JPG and MPEG files are examples of lossy data compression. Under these lossy compression schemes, quality is sacrificed in favour of reduced file sizes. The reduced file sizes become a significant advantage when working with the internet because smaller file sizes mean faster transfer of the files.

2.8 Practice Questions

1. Explain how characters are represented using binary.

2. Represent the following words in binary using the ASCII code.

a. BAG b. Bag c. now d. CPU

3. Explain why the ASCII code for A and a are different.

4. What is a pixel and how is it used to generate graphics on the computer?

5. How does the resolution represent the quality of a given type of information?

6. A computerized weighing scale has a range of 0 to 1 kg. How many bits are needed if the scale has to measure changes of 0.1 grams?

7. Use a simple replacement method that takes a letter or number and replaces it with the previous letter or number (d becomes c and 3 becomes 2) to encrypt the following words.

a. School b. engineering c. Computers d. encryption

8. What are some methods of improving the encryption idea in question 7?

9. Create an encryption scheme that is stronger than the method outlined in question 7, and compare it with methods that were developed by your classmates.

10. Explain what is meant by the term encryption strength.

11. Using the data compression method outlined earlier, compress the following binary strings.

a. 1010110010000000000000000000101

b. 01000000000000000100010100000000

c. 1101000010001010000000110100001000101000000011010010000

d. 1010000000000001000101000110100001000110100001000100000

12. Calculate the % compression achieved in each part of question 11.

13. If the compression method described earlier is used on bit strings with more 1's than 0's, then what will happen to the % compression?

GlobalEduTech Solutions
