What is the difference between the Shannon and Hartley formulas? Shannon's formula and information entropy

In 1928 the American engineer R. Hartley considered the process of obtaining information as the choice of one message from a finite, pre-specified set of N equiprobable messages, and he defined the amount of information I contained in the selected message as the binary logarithm of N.

Hartley formula: I = log2 N, or N = 2^I

Suppose you need to guess one number from a set of numbers from one to one hundred. Using the Hartley formula, you can calculate how much information is required for this: I = log2 100 ≈ 6.644. Thus, the message about a correctly guessed number contains an amount of information approximately equal to 6.644 units of information.
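As a quick illustration, here is a minimal Python sketch of this calculation (the helper name hartley is just an illustrative choice):

```python
import math

# Hartley's formula: I = log2(N) for N equiprobable outcomes.
def hartley(n: int) -> float:
    return math.log2(n)

print(hartley(100))  # ~6.644 bits to identify one number out of 100
print(hartley(2))    # 1.0 bit for a coin toss
```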

Here are some other examples of equiprobable messages:

1. when tossing a coin: "heads came up", "tails came up";

2. on a page of a book: "the number of letters is even", "the number of letters is odd".

Let us now determine whether the messages "a woman will be the first to leave the door of the building" and "a man will be the first to leave the door of the building" are equiprobable. It is impossible to answer this question unambiguously. It all depends on what kind of building we are talking about. If it is, for example, a metro station, then the probability of going out the door first is the same for a man and a woman, but if it is a military barracks, then for a man this probability is much higher than for a woman.

For problems of this kind, the American scientist Claude Shannon proposed in 1948 another formula for determining the amount of information, one that takes into account the possibly unequal probabilities of the messages in the set.

Shannon formula: I = -(p1·log2 p1 + p2·log2 p2 + ... + pN·log2 pN),

where pi is the probability that exactly the i-th message is selected from the set of N messages.

It is easy to see that if the probabilities p1, ..., pN are equal, then each of them equals 1/N, and Shannon's formula turns into Hartley's formula.
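A small Python sketch of this reduction, assuming a simple helper named shannon for the entropy sum:

```python
import math

def shannon(probs):
    # I = -sum(p_i * log2(p_i)); zero-probability terms contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

N = 8
uniform = [1 / N] * N
print(shannon(uniform))  # 3.0 -- Shannon's formula for equal probabilities...
print(math.log2(N))      # 3.0 -- ...matches Hartley's log2(N)
```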

In addition to the two considered approaches to determining the amount of information, there are others. It is important to remember that any theoretical results are applicable only to a certain range of cases, outlined by the initial assumptions.

As the unit of information, Claude Shannon proposed to take one bit (English bit, short for binary digit).

A bit in information theory is the amount of information required to distinguish between two equally probable messages (such as "heads" or "tails", "even" or "odd", etc.).

In computer science, a bit is the smallest "portion" of computer memory required to store one of the two characters "0" and "1" used for the internal machine representation of data and commands.

A bit is too small a unit of measurement. In practice, a larger unit is more often used: the byte, equal to eight bits. Eight bits are required to encode any of the 256 characters of the computer keyboard alphabet (256 = 2^8).



Even larger derived units of information are also widely used:

1 Kilobyte (KB) = 1024 bytes = 2^10 bytes,

1 Megabyte (MB) = 1024 KB = 2^20 bytes,

1 Gigabyte (GB) = 1024 MB = 2^30 bytes.

Recently, due to the increase in the volume of processed information, derived units such as the following have come into use:

1 Terabyte (TB) = 1024 GB = 2^40 bytes,

1 Petabyte (PB) = 1024 TB = 2^50 bytes.

As the unit of information one could instead choose the amount of information needed to distinguish, for example, ten equally probable messages. This would not be a binary (bit) but a decimal (dit) unit of information.

The amount of information contained in the message is determined by the amount of knowledge that this message carries to the person receiving it. A message contains information for a person if the information contained in it is new and understandable for this person, and, therefore, replenishes his knowledge.

The information that a person receives can be considered a measure of reducing the uncertainty of knowledge. If a certain message leads to a decrease in the uncertainty of our knowledge, then we can say that such a message contains information.

The amount of information that we receive when the uncertainty is reduced by a factor of 2 is taken as the unit of the amount of information. This unit is called a bit.

In a computer, information is represented in binary code, or machine language, whose alphabet consists of two digits (0 and 1). These digits can be considered as two equiprobable states. When one binary digit is written, a choice is made between two possible states (one of the two digits); therefore, one binary digit carries 1 bit of information. Two binary digits carry 2 bits of information, three digits 3 bits, and so on.



Let us now pose the inverse problem and determine: "How many different binary numbers N can be written using I binary digits?" With one binary digit you can write two different numbers (N = 2 = 2^1), with two binary digits four binary numbers (N = 4 = 2^2), with three binary digits eight binary numbers (N = 8 = 2^3), and so on.

In the general case, the number of different binary numbers is determined by the formula

N = 2^I,

where N is the number of possible (equiprobable) events and I is the number of binary digits.

In mathematics, there is a function with which such an exponential equation is solved; this function is called the logarithm. The solution of this equation is:

I = log2 N.   (1)

If the events are equiprobable, then the amount of information is determined by this formula.

The amount of information for events with different probabilities is determined by Shannon's formula:

I = -(p1·log2 p1 + p2·log2 p2 + ... + pN·log2 pN),   (2)

where I is the amount of information;

N is the number of possible events;

pi is the probability of the individual events.

Example 3.4

There are 32 balls in a lottery drum. How much information does the message about the first number drawn contain (for example, the number 15 was drawn)?

Solution:

Since drawing any of the 32 balls is equally likely, the amount of information about one drawn number is found from the equation 2^I = 32.

But 32 = 2^5. Therefore, I = 5 bits. Obviously, the answer does not depend on which number is drawn.
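A one-line check of this example in Python:

```python
import math

# 2**I = 32  =>  I = log2(32)
print(math.log2(32))  # 5.0 bits
print(2 ** 5)         # 32 -- check: five binary questions distinguish 32 balls
```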

Example 3.5

How many questions are enough to ask your interlocutor to determine for sure the month in which he was born?

Solution:

We will consider the 12 months as 12 possible events. If you ask about specific months of birth one by one, you may have to ask 11 questions (if the first 11 questions are answered in the negative, the 12th question is not necessary, since the remaining month must be the correct one).

It is more correct to ask "binary" questions, that is, questions that can only be answered with "yes" or "no". For example, "Were you born in the second half of the year?". Each such question splits the set of options into two subsets: one corresponds to the answer "yes" and the other to the answer "no".

The correct strategy is to ask questions in such a way that the number of possible options is halved each time. Then the number of possible events in each of the resulting subsets will be the same, and guessing them is equally probable. In this case, at each step the answer ("yes" or "no") will carry the maximum amount of information (1 bit).

Using formula (1) and a calculator, we get:

I = log2 12 ≈ 3.585 bits.

The number of bits of information received corresponds to the number of questions asked, but the number of questions cannot be a non-integer. Rounding up to the nearest integer, we get the answer: with the right strategy, no more than 4 questions need to be asked.
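A possible Python sketch of the halving strategy; the variable secret and the list layout are illustrative assumptions, not part of the original problem:

```python
import math

months = list(range(1, 13))                   # 12 equiprobable options
print(math.ceil(math.log2(len(months))))      # 4 -> at most 4 questions are needed

# Halving strategy: each "yes/no" answer keeps one half of the remaining options.
secret = 12                                   # hypothetical month of birth
options, questions = months, 0
while len(options) > 1:
    half = len(options) // 2
    questions += 1
    options = options[:half] if secret in options[:half] else options[half:]
print(questions)                              # 4 for this secret (never more than 4)
```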

Example 3.6

After the computer science exam, which your friends took, the grades are announced ("2", "3", "4" or "5"). How much information will be carried by the message about the grade of student A, who learned only half of the tickets, and by the message about the grade of student B, who learned all the tickets?

Solution:

Experience shows that for student A all four grades (events) are equally probable, and then the amount of information carried by the grade message can be calculated using formula (1): I = log2 4 = 2 bits.

Based on experience, we can also assume that for student B the most likely grade is "5" (p1 = 1/2), the probability of a grade of "4" is half as large (p2 = 1/4), and the probabilities of the grades "2" and "3" are smaller by another factor of two (p3 = p4 = 1/8). Since the events are not equally probable, we will use formula (2) to calculate the amount of information in the message: I = -(0.5·log2 0.5 + 0.25·log2 0.25 + 0.125·log2 0.125 + 0.125·log2 0.125) = 1.75 bits.
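A quick Python check of both numbers, reusing an assumed shannon helper for the entropy sum:

```python
import math

def shannon(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits  (student A, equiprobable grades)
print(shannon([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits (student B)
```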

The calculations show that with equiprobable events we get more information (2 bits) than with non-equiprobable events (1.75 bits).

Example 3.7

An opaque bag contains 10 white, 20 red, 30 blue and 40 green marbles. How much information will a visual message about the color of a drawn marble contain?

Solution:

Since the number of balls of different colors is not the same, the probabilities of visual messages about the color of the ball taken out of the bag also differ and are equal to the number of balls of a given color divided by the total number of balls:

Pwhite = 0.1; Pred = 0.2; Pblue = 0.3; Pgreen = 0.4.

The events are not equally probable; therefore, to determine the amount of information contained in the message about the color of the ball, we use formula (2):

I = -(0.1·log2 0.1 + 0.2·log2 0.2 + 0.3·log2 0.3 + 0.4·log2 0.4).

A calculator can be used to evaluate this expression containing logarithms: I ≈ 1.85 bits.
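A short Python sketch of this calculation (the dictionary of marble counts mirrors the problem statement):

```python
import math

counts = {"white": 10, "red": 20, "blue": 30, "green": 40}
total = sum(counts.values())
probs = [n / total for n in counts.values()]   # 0.1, 0.2, 0.3, 0.4
I = -sum(p * math.log2(p) for p in probs)
print(round(I, 2))                             # ~1.85 bits
```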

Example 3.8

Using Shannon's formula, it is quite simple to determine how many bits of information or binary digits are needed to encode 256 various characters. 256 different symbols can be considered as 256 different equally probable states (events). In accordance with the probabilistic approach to measuring the amount of information, the required amount of information for binary encoding of 256 characters is:

I=log 2 256=8 bits=1 byte

Therefore, for binary encoding of 1 character, 1 byte of information or 8 bits is required.

How much information is contained, for example, in the text of the novel War and Peace, in the frescoes of Raphael, or in the human genetic code? Science does not give answers to these questions and, in all likelihood, will not give any soon. Is it possible to objectively measure the amount of information? The most important result of information theory is the following conclusion: “Under certain, very broad conditions, one can neglect the qualitative features of information, express its quantity as a number, and also compare the amount of information contained in different groups of data.”

At present, approaches to defining the concept of "amount of information" are based on the fact that the information contained in a message can be interpreted loosely in the sense of its novelty or, in other words, as a reduction of the uncertainty of our knowledge about an object. These approaches use the mathematical concepts of probability and logarithm.

We have already mentioned that Hartley's formula is a special case of Shannon's formula for equiprobable alternatives.

Substituting into Shannon's formula (1), instead of each pi, its value (which in the equiprobable case does not depend on i), we get:

H = -N·(1/N)·log2(1/N) = -log2(1/N) = log2 N.

Thus, Hartley's formula looks very simple:

H = log2 N.   (2)

It clearly follows from it that the greater the number of alternatives (N), the greater the uncertainty (H). These quantities are related in formula (2) not linearly but through the binary logarithm. Taking the logarithm to base 2 brings the number of options to units of information, bits.

Note that the entropy will be an integer only if N is a power of 2, i.e. if N belongs to the series: {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, ...}

Fig. 10. Dependence of entropy on the number of equiprobable choices (equivalent alternatives).

Recall what a logarithm is.

Fig. 11. Finding the logarithm of b to base a means finding the power to which a must be raised in order to obtain b.

The base 2 logarithm is called binary:

log2(8) = 3, since 2^3 = 8

log2(10) ≈ 3.32, since 2^3.32 ≈ 10

The logarithm to base 10 is called decimal:

log10(100) = 2, since 10^2 = 100

The main properties of the logarithm:

    log(1)=0 because any number to the zero power gives 1;

    log(a^b) = b*log(a);

    log(a*b)=log(a)+log(b);

    log(a/b)=log(a)-log(b);

    log(1/b)=0-log(b)=-log(b).

To solve inverse problems, when the uncertainty (H), or the amount of information obtained as a result of its removal (I), is known and it is necessary to determine how many equiprobable alternatives correspond to the occurrence of this uncertainty, the inverse Hartley formula is used, which looks even simpler:

N = 2^H.   (3)

For example, if it is known that 3 bits of information were received as a result of determining that Kolya Ivanov, who is of interest to us, lives on the second floor, then the number of floors in the house can be determined by formula (3): N = 2^3 = 8 floors.

If the question is: "There are 8 floors in the house; how much information did we receive when we learned that Kolya Ivanov, who is of interest to us, lives on the second floor?", then formula (2) must be used: I = log2(8) = 3 bits.
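Both directions of this example can be checked with a couple of lines of Python:

```python
import math

floors_bits = math.log2(8)   # formula (2): 8 equiprobable floors -> 3.0 bits
floors_back = 2 ** 3         # formula (3): 3 bits -> 8 equiprobable alternatives
print(floors_bits, floors_back)  # 3.0 8
```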

    1. The amount of information received in the process of message transmission

So far, we have given formulas for calculating the entropy (uncertainty) H, noting that H in them can be replaced by I, because the amount of information obtained upon the complete removal of the uncertainty of some situation is quantitatively equal to the initial entropy of that situation.

But uncertainty can also be removed only partially, so the amount of information I obtained from some message is calculated as the decrease in entropy that occurs as a result of receiving the given message.

I = H1 - H2,   (4)

where H1 is the uncertainty before the message is received and H2 is the uncertainty remaining after it.

For the equiprobable case, using the Hartley formula to calculate the entropy, we get:

I = log2 N1 - log2 N2 = log2(N1/N2).   (5)

The second equality follows from the properties of the logarithm. Thus, in the equiprobable case, I depends on how many times the number of choices under consideration (the diversity considered) has changed.

Based on (5), we can deduce the following:

If N2 = 1, then I = H1 = log2 N1: the uncertainty is completely removed, and the amount of information received in the message is equal to the uncertainty that existed before the message was received.

If N2 = N1, then I = log2(1) = 0: the uncertainty has not changed, therefore no information has been obtained.

If N2 < N1, then I > 0; if N2 > N1, then I < 0. That is, the amount of information received is positive if the number of alternatives under consideration has decreased as a result of receiving the message, and negative if it has increased.

If the number of alternatives under consideration is halved as a result of receiving the message, i.e. N1/N2 = 2, then I = log2(2) = 1 bit. In other words, receiving 1 bit of information excludes half of the equivalent options from consideration.

Consider, as an example, an experiment with a deck of 36 cards.

Fig. 12. Illustration for an experiment with a deck of 36 cards.

Let someone take one card from the deck. We are interested in which of the 36 cards he took out. The initial uncertainty, calculated by formula (2), is H = log2(36) ≈ 5.17 bits. The one who draws the card tells us part of the information. Using formula (5), we determine how much information we receive from each of these messages:

Option A. "This card is of a red suit."

I = log2(36/18) = log2(2) = 1 bit (half of the cards in the deck are red; the uncertainty has decreased by a factor of 2).

Option B. "This card is of the spades suit."

I = log2(36/9) = log2(4) = 2 bits (the spades make up a quarter of the deck; the uncertainty has decreased by a factor of 4).

Option C. "This is one of the highest cards: jack, queen, king or ace."

I = log2(36) - log2(16) = 5.17 - 4 = 1.17 bits (the uncertainty has decreased by more than a factor of two, so the amount of information received is more than one bit).

Option D. "This is one card from the deck."

I = log2(36/36) = log2(1) = 0 bits (the uncertainty has not decreased; the message is not informative).

Option E. "This is the queen of spades."

I = log2(36/1) = log2(36) = 5.17 bits (the uncertainty is completely removed).
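A compact Python sketch of all five variants, assuming a helper info(n_before, n_after) that implements formula (5):

```python
import math

def info(n_before: int, n_after: int) -> float:
    # I = log2(N1 / N2): drop in the number of alternatives under consideration.
    return math.log2(n_before / n_after)

print(info(36, 18))  # A: "a red suit"             -> 1.0 bit
print(info(36, 9))   # B: "the spades suit"        -> 2.0 bits
print(info(36, 16))  # C: "jack/queen/king/ace"    -> ~1.17 bits
print(info(36, 36))  # D: "one card from the deck" -> 0.0 bits
print(info(36, 1))   # E: "the queen of spades"    -> ~5.17 bits
```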

    It is known a priori that the ball is in one of three urns: A, B, or C. Determine how many bits of information are contained in the message that it is in urn B. Options: 1 bit, 1.58 bits, 2 bits, 2.25 bits.

    The probability of the first event is 0.5, and of the second and third 0.25 each. What is the information entropy for such a distribution? Options: 0.5 bit, 1 bit, 1.5 bits, 2 bits, 2.5 bits, 3 bits.

    Here is a list of employees of some organization:

Determine the amount of information missing in order to fulfill the following requests:

    Please call Ivanova on the phone.

    I am interested in one of your employees, she was born in 1970.

    Which message contains more information:

    As a result of tossing a coin (heads, tails), tails fell out.

    The traffic lights (red, yellow, green) are now green.

As a result of throwing a die (1, 2, 3, 4, 5, 6), 3 points fell out.

Information will be defined through its main properties (because, along with matter and energy, it is a primary concept of our world and therefore cannot be defined in the strict sense):

  • information brings knowledge about the surrounding world that was not present at the point under consideration before it was received;
  • information is not material and cannot exist in isolation from the form in which it is presented (sequences of signals or signs, i.e. messages);
  • messages contain information only for those who are able to recognize it.

Messages contain information not because they copy objects of reality, but by social agreement about the connection between carriers and the objects designated by those carriers (for example, a word denotes some object of objective reality). In addition, carriers can be formed by naturally occurring physical processes.

In order for a message to be transmitted to a recipient, it is necessary to use some physical process that can propagate from the source to the recipient of the message at some speed. The time-varying physical process that reflects the transmitted message is called a signal.

To apply mathematical means to the study of information, it is necessary to abstract from the meaning, the content of the information. This approach was common to the researchers we have mentioned, since pure mathematics operates with quantitative relations without going into the physical nature of the objects behind those relations. Therefore, if the meaning is abstracted away from messages, the starting point for the information evaluation of an event is only the set of events that differ from one another and, accordingly, the messages about them.

Suppose we are interested in the following information about the state of certain objects: in which of the four possible states (solid, liquid, gaseous, plasma) is a given substance? In which of the four courses of the technical school does a student study? In all these cases, there is an uncertainty of the event of interest to us, characterized by the presence of a choice of one of four possibilities. If we ignore the meaning of the answers to the above questions, then both answers carry the same amount of information, since each of them singles out one of four possible states of the object and, therefore, removes the same uncertainty of the message.

Uncertainty is inherent in the concept of probability. Reducing uncertainty is always associated with the selection (choice) of one or more elements (alternatives) from some set of them. This mutual reversibility of the concepts of probability and uncertainty served as the basis for using the concept of probability in measuring the degree of uncertainty in information theory. If we assume that any of the four answers to a question is equally likely, then its probability in both questions is equal to 1/4.

The same probability of answers in this example also determines the equal uncertainty removed by the answer in each of the two questions, which means that each answer carries the same information.

Now let us compare the following two questions: in which of the four courses of the technical school does a student study? How will a coin land when tossed: heads ("coat of arms") up or tails ("number") up? In the first case, four equally probable answers are possible, in the second, two. Therefore, the probability of any particular answer in the second case is greater than in the first (1/2 > 1/4), while the uncertainty removed by the answer is greater in the first case. Any possible answer to the first question removes more uncertainty than any answer to the second question. Therefore, the answer to the first question carries more information. Consequently, the lower the probability of an event, the more uncertainty the message about its occurrence removes and, consequently, the more information it carries.

Let us assume that some event has m equally likely outcomes. Such an event can be, for example, the appearance of any character from an alphabet containing m such characters. How can we measure the amount of information that can be transmitted using such an alphabet? This can be done by determining the number N of possible messages that can be transmitted using this alphabet. If a message is formed from one character, then N = m; if from two, then N = m·m = m^2. If a message contains n characters (n is the length of the message), then N = m^n. It would seem that the required measure of the amount of information has been found. It can be understood as a measure of the uncertainty of the outcome of an experiment, if by the experiment we mean a random selection of one message from the set of possible ones. However, this measure is not entirely convenient.

If the alphabet consists of a single character, i.e. when m = 1, only this character can appear. Therefore, there is no uncertainty in this case, and the appearance of this symbol carries no information. Meanwhile, the value of N at m = 1 does not go to zero. For two independent message sources (or alphabets) with N1 and N2 possible messages, the total number of possible messages is N = N1·N2, while it would be more logical to assume that the amount of information received from two independent sources should be not the product but the sum of the constituent quantities.

A way out was found by R. Hartley, who proposed that the information I per message be determined by the logarithm of the total number of possible messages N:

I(N) = log N.

If the entire set of possible messages consists of one message (N = m = 1), then

I(N) = log 1 = 0,

which corresponds to the absence of information in this case. If there are two independent sources of information with N1 and N2 possible messages, then

I(N) = log N = log(N1·N2) = log N1 + log N2,

i.e. the amount of information per message is equal to the sum of the amounts of information that would be received from the two independent sources taken separately.

The formula proposed by Hartley satisfies these requirements. Therefore, it can be used to measure the amount of information. If the appearance of any character of the alphabet is equiprobable (and we have so far assumed that it is), then this probability is p = 1/m. Assuming that N = m, we get

I = log N = log m = log(1/p) = -log p.

The resulting formula makes it possible to determine the amount of information in some cases. However, for practical purposes it is necessary to specify the unit of its measurement. To do this, assume that information is removed uncertainty. Then, in the simplest case, the uncertainty lies in the choice between two mutually exclusive, equally probable messages, for example, between two qualitative attributes: a positive and a negative pulse, a pulse and a pause, etc.

The amount of information transmitted in this simplest case is most conveniently taken as the unit of the amount of information. The resulting unit of the amount of information, corresponding to a choice between two equally probable events, is called a binary unit, or bit. (The name bit is formed from the first two letters and the last letter of the English expression binary unit.)

A bit is not only a unit of the amount of information but also a unit of measurement of the degree of uncertainty. This refers to the uncertainty contained in one experiment that has two equally probable outcomes. The amount of information received from a message is affected by the surprise factor for the recipient, which depends on the probability of receiving the particular message. The lower this probability, the more unexpected and therefore more informative the message. A message whose probability is high, and whose degree of surprise is accordingly low, carries little information.

R. Hartley understood that messages have different probabilities and, therefore, that the unexpectedness of their appearance for the recipient is not the same. But in quantifying the amount of information, he tried to completely eliminate the "surprise" factor. Therefore, the Hartley formula allows one to determine the amount of information in a message only for the case when the occurrence of symbols is equiprobable and they are statistically independent. In practice, these conditions are rarely fulfilled. When determining the amount of information, it is necessary to take into account not only the number of different messages that can be received from the source, but also the probability of receiving them.

The most widely used approach to determining the average amount of information contained in messages from sources of very different nature is the approach of C. Shannon.

Consider the following situation. A source transmits elementary signals of k different types. Let us follow a fairly long segment of a message. Let it contain N1 signals of the first type, N2 signals of the second type, ..., Nk signals of the k-th type, where N1 + N2 + ... + Nk = N is the total number of signals in the observed segment, and f1, f2, ..., fk are the frequencies of the corresponding signals. As the length of the message segment increases, each of the frequencies tends to a fixed limit, i.e.

lim fi = pi, (i = 1, 2, ..., k),

where pi can be considered the probability of the i-th signal. Suppose a signal of the i-th type is received with probability pi and carries -log pi units of information. In the segment under consideration, the i-th signal will occur approximately N·pi times (we assume that N is large enough), so the total information delivered by signals of this type will be equal to -N·pi·log pi. The same applies to signals of any other type, so the total amount of information delivered by a segment of N signals will be approximately equal to -N·(p1·log p1 + p2·log p2 + ... + pk·log pk). To determine the average amount of information per signal, i.e. the specific information content of the source, this number must be divided by N. With unlimited growth of N, the approximate equality turns into an exact one.

As a result, an asymptotic relation is obtained, Shannon's formula:

H = -(p1·log p1 + p2·log p2 + ... + pk·log pk).

It turned out that the formula proposed by Hartley is a special case of the more general Shannon formula.
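A minimal Python sketch of this idea, estimating the entropy from the observed frequencies in a finite message; the function name empirical_entropy and the sample strings are illustrative assumptions:

```python
import math
from collections import Counter

def empirical_entropy(message: str) -> float:
    """Estimate H = -sum(p_i * log2 p_i) from the observed signal frequencies."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(empirical_entropy("aabbaabb"))   # 1.0 bit/signal: two equally frequent signals
print(empirical_entropy("aaaaaaab"))   # ~0.54 bit/signal: skewed frequencies
```

As the message segment grows, these frequency-based estimates approach the probabilities pi, which is exactly the limiting argument described above.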

In addition to this formula, Shannon proposed an abstract communication scheme consisting of five elements (information source, transmitter, communication line, receiver and addressee), and formulated theorems on bandwidth, noise immunity, coding, etc.

60. Measurement of information: probabilistic and alphabetic approaches. The Hartley and Shannon formulas. An example in MS Excel.

From the point of view of information as removed uncertainty, the amount of information in a message about some event depends on the probability of that event.

A scientific approach to evaluating messages was proposed back in 1928 by R. Hartley. The Hartley formula for calculating the amount of information for equiprobable events looks like:

I = log2 N or 2^I = N,

where N is the number of equiprobable events (the number of possible choices) and I is the amount of information.

If N = 2 (choice of two possibilities), then I = 1 bit.

Example 1. Using the Hartley formula to calculate the amount of information: how many bits of information does the message that the train arrives on one of 8 tracks carry?

Hartley formula: I = log 2 N,

where N is the number of equiprobable outcomes of the event referred to in the message,

I is the amount of information in the message.

I = log2 8 = 3 (bits). Answer: 3 bits.

The modified Hartley formula, used below for non-equiprobable events. Since the occurrence of each of the N possible events has the same probability p = 1/N, we have N = 1/p, and the formula takes the form

I = log2 N = log2(1/p) = -log2 p

The quantitative relationship between the probability of an event (p) and the amount of information in the message about it (I) is expressed by the formula:

I = log 2 (1/ p)

The probability of an event is calculated by the formula p = K/N, where K is a value showing how many times the event of interest to us has occurred and N is the total number of possible outcomes (events). If the probability decreases, then the amount of information increases.

Example 2. There are 30 people in a class. For a test in mathematics, 6 fives, 15 fours, 8 threes and 1 two were received. How many bits of information does the message that Ivanov received a four carry?

The probability of this event is p = 15/30 = 0.5, so I = log2(1/p) = log2 2 = 1 bit.

Answer: 1 bit.
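A two-line Python check of this example:

```python
import math

p = 15 / 30            # 15 of the 30 students received a "four"
print(math.log2(1 / p))  # 1.0 bit
```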

Using the Shannon formula. The general case is the calculation of the amount of information in a message about one of N non-equiprobable events. This approach was proposed by K. Shannon in 1948.

The basic quantity is the average amount of information per character (message):

Iavg = -Σ pi·log2 pi,

where pi is the probability of the i-th event; in the equiprobable case, pi = 1/N.

Example 3. How many bits of information does the randomly generated message "headlight" carry (in the original Russian it is a four-letter word spelled with the letters "f", "a", "r", "a"), if, on average, for every thousand letters in Russian texts the letter "a" occurs 200 times, the letter "f" 2 times, and the letter "r" 40 times?

We will assume that the probability of a character appearing in a message coincides with the frequency of its occurrence in texts. The letter "a" occurs with an average frequency of 200/1000 = 0.2, so the probability of the letter "a" appearing in the text (pa) can be taken as approximately 0.2; the letter "f" occurs with a frequency of 2/1000 = 0.002, and the letter "r" with a frequency of 40/1000 = 0.04, so pr = 0.04 and pf = 0.002. Then we proceed according to K. Shannon: we take the binary logarithm of the value 0.2 and call the result the amount of information that a single letter "a" carries in the text under consideration. We do the same operation for each letter. The amount of information carried by one letter is then log2(1/pi) = -log2 pi. As the measure of the amount of information it is more convenient to use the average amount of information per character of the alphabet.

Iavg = -Σ pi·log2 pi.

The value Iavg reaches a maximum for equally probable events, that is, when all pi are equal: pi = 1/N.

In this case, Shannon's formula turns into Hartley's formula.

I = M·Iavg = 4·(-(0.002·log2 0.002 + 0.2·log2 0.2 + 0.04·log2 0.04 + 0.2·log2 0.2)) = 4·(-(0.002·(-8.967) + 0.2·(-2.322) + 0.04·(-4.644) + 0.2·(-2.322))) = 4·(0.018 + 0.46 + 0.19 + 0.46) = 4·1.1325 ≈ 4.53

Answer: 4.53 bits
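A short Python sketch reproducing the text's calculation (the letter probabilities and the spelling of the word are taken from the problem as discussed above):

```python
import math

# Letter probabilities assumed from the problem statement (per 1000 letters).
p = {"a": 0.2, "f": 0.002, "r": 0.04}
word = ["f", "a", "r", "a"]                              # the four letters of the message
i_avg = -sum(p[ch] * math.log2(p[ch]) for ch in word)    # the four-term sum from the text
print(round(4 * i_avg, 2))                               # ~4.53 bits, matching the answer above
```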

Alphabetical approach to measuring information

The alphabetic approach is used in technology; in this case, the amount of information does not depend on the content, but on the power of the alphabet and the number of characters in the text.

For ASCII encoding, the alphabet power = 256.

I = log2 256 = 8 (bits); when character information is encoded in such codes, each character, including spaces and punctuation marks, is encoded with 1 byte (8 bits).
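A small Python sketch of the alphabetic approach; the sample string is an arbitrary illustration:

```python
import math

alphabet_power = 256                        # ASCII-style alphabet
bits_per_char = math.log2(alphabet_power)   # 8.0 bits = 1 byte per character
text = "Hello, world!"
print(bits_per_char * len(text))            # 104.0 bits for this 13-character text
```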

Units of measurement of information in computing:

1 bit (technical approach): the minimum unit of information; the amount of information is measured only in an integer number of bits.

1 KB (kilobyte) = 2^10 bytes = 1024 bytes ≈ 1 thousand bytes.

1 MB (megabyte) = 2^10 KB = 2^20 bytes ≈ 1 million bytes.

1 GB (gigabyte) = 2^10 MB = 2^30 bytes ≈ 1 billion bytes.


Shannon's formula, like the Hartley formula, is used in computer science to calculate the total amount of information for events with different probabilities.

An example of events with unequal probabilities is people leaving the barracks of a military unit. A soldier, an officer, or even a general may leave the barracks. But the numbers of soldiers, officers, and generals in the barracks differ, which is obvious: there are the most soldiers, officers come next in number, and generals are the rarest. Since the probabilities are not equal for the three types of military personnel, Shannon's formula is used to calculate how much information such an event carries.

For equally probable events, such as a coin toss (the probabilities of heads and tails are the same, 50%), Hartley's formula is used.

Now, let's look at the application of this formula on a specific example:

Which message contains the least information (Count in bits):

  1. Vasily ate 6 sweets, 2 of them were barberries.
  2. There are 10 folders in the computer, the desired file was found in the 9th folder.
  3. Baba Luda made 4 meat pies and 4 cabbage pies. Gregory ate 2 pies.
  4. Africa has 200 days of dry weather and 165 days of monsoons. An African hunted 40 days a year.

In this problem, note that options 1, 2 and 3 are easy to evaluate, since the events are equally likely, and for them we will use the Hartley formula I = log2 N (Fig. 1). But what about the 4th item, where the distribution of days is clearly not even (dry weather predominates)? For such events, the Shannon formula, or information entropy, is used: I = -(p1·log2 p1 + p2·log2 p2 + ... + pN·log2 pN) (Fig. 3).

THE FORMULA FOR THE AMOUNT OF INFORMATION (HARTLEY FORMULA, FIG. 1): I = log2(1/p) = -log2 p

where:

  • I is the amount of information;
  • p is the probability that the event will occur.

The events of interest to us in our problem are

  1. There were two barberries out of six (2/6)
  2. There was one folder in which the required file was found in relation to the total number (1/10)
  3. There were eight pies in total, of which Gregory ate two (2/8)
  4. and lastly, forty days of hunting relative to two hundred dry days, and forty days of hunting relative to one hundred and sixty-five rainy days: (40/200) and (40/165).

Thus we get:

THE PROBABILITY FORMULA FOR AN EVENT: p = K / N,

where K is the number of occurrences of the event of interest to us and N is the total number of possible events. As a self-check: the probability of an event cannot be greater than one (K never exceeds N).

THE SHANNON FORMULA FOR CALCULATING THE AMOUNT OF INFORMATION (FIG. 3): I = -(p1·log2 p1 + p2·log2 p2 + ... + pN·log2 pN)

Let us return to our problem and calculate how much information each message contains.

By the way, when calculating the logarithm, it is convenient to use the site - https://planetcalc.ru/419/#

  • For the first case: 2/6 ≈ 0.33, and further I = -log2 0.33 ≈ 1.599 bits
  • For the second case: 1/10 = 0.10, I = -log2 0.10 ≈ 3.322 bits
  • For the third: 2/8 = 0.25, I = -log2 0.25 = 2 bits
  • For the fourth: 40/200 = 0.2 and 40/165 ≈ 0.24, respectively; then we calculate by the formula: -(0.2·log2 0.2) - (0.24·log2 0.24) ≈ 0.959 bits

Thus, the message in option 4 contains the least information, so the answer to our problem is option 4.
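A short Python sketch that evaluates all four options, assuming a helper info(p) = -log2(p); small differences from the numbers above come only from rounding:

```python
import math

def info(p: float) -> float:
    return -math.log2(p)     # I = log2(1/p)

print(info(2 / 6))           # option 1: ~1.58 bits (the text rounds 2/6 to 0.33 and gets 1.599)
print(info(1 / 10))          # option 2: ~3.32 bits
print(info(2 / 8))           # option 3: 2.0 bits
# Option 4: two non-equiprobable outcomes, summed as in the text.
print(-(0.2 * math.log2(0.2)) - (0.24 * math.log2(0.24)))  # ~0.96 bits -> the least
```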


