Domestic data encryption standard. Domestic data encryption standard Cryptographic conversion algorithm GOST 28147 89

Brief description of the cipher

GOST 28147-89 - Soviet and Russian standard for symmetric encryption, introduced in 1990, is also a CIS standard. Full name - “GOST 28147-89 Information processing systems. Cryptographic protection. Cryptographic Transformation Algorithm”. Block cipher algorithm. When using the encryption method with gamma, it can perform the functions of a stream cipher algorithm.

GOST 28147-89 is a block cipher with a 256-bit key and 32 conversion cycles, operating with 64-bit blocks. The basis of the cipher algorithm is the Feistel Network. The basic encryption mode according to GOST 28147-89 is the simple replacement mode (more complex modes are also defined - gamma, gamma with feedback and simulated insert mode).

How the algorithm works

The algorithm is not fundamentally different from DES. It also contains encryption cycles (there are 32 of them) according to the Feistel scheme (Fig. 2.9.).

Rice. 2.9. Encryption rounds of the GOST 28147-89 algorithm.

To generate subkeys, the original 256-bit key is divided into eight 32-bit blocks: k 1 …k 8 . Keys k 9 ... k 24 are a cyclic repetition of keys k 1 ... k 8 (numbered from the least significant bits to the most significant ones). The keys k 25 ...k 32 are the keys k 1 ...k 8 going in reverse order.

After all 32 rounds of the algorithm are completed, blocks A 33 and B 33 are glued together (note that A 33 becomes the most significant bit, and B 33 becomes the least significant bit) - the result is the result of the algorithm.

Function f(A i ,K i) is calculated as follows: A i and K i are added modulo 2 32 , then the result is divided into eight 4-bit subsequences, each of which is fed to the input of its own substitution table node(in ascending order of bit precedence), called below S-block. The total number of GOST S-blocks is eight, i.e. the same as the number of subsequences. Every S-block represents a permutation of numbers from 0 to 15. The first 4-bit subsequence falls on the input of the first S-box, the second - on the input of the second, etc. The outputs of all eight S-boxes are combined into a 32-bit word, then the whole word is cyclically shifted to the left (to the most significant bits) by 11 bits. All eight S-boxes can be different. In fact, they can be additional key material, but more often they are a schema parameter that is common to a certain group of users. The text of the standard indicates that the delivery of the filling of replacement units (S-blocks) is carried out in the prescribed manner, i.e. algorithm developer. The community of Russian CIPF developers agreed on the replacement nodes used on the Internet.

Decryption is performed in the same way as encryption, but the order of subkeys k i is reversed.

Operation modes of the GOST 28147-89 algorithm

The GOST 28147-89 algorithm has four modes of operation.

1. Modesimple replacement accepts data as input, the size of which is a multiple of 64 bits. The result of encryption is the input text, converted in blocks of 64 bits in the case of encryption with the "32-3" cycle, and in the case of decryption - with the "32-P" cycle.

2. Modescaling accepts data of any size as input, as well as an additional 64-bit parameter - sync message. During operation, the sync message is converted in the "32-Z" cycle, the result is divided into two parts. The first part is added modulo 2 32 with a constant value of 1010101 16 . If the second part is equal to 2 32 -1, then its value does not change, otherwise it is added modulo 2 32 -1 with a constant value of 1010104 16 . The value obtained by combining both converted parts, called the cipher gamma, enters the "32-3" loop, its result is bitwise added modulo 2 with a 64-bit block of input data. If the latter is less than 64 bits, then the extra bits of the received value are discarded. The resulting value is sent to the output. If there is still incoming data, then the action is repeated: the block composed of 32-bit parts is converted in parts, and so on.

3. Modescaling with feedback also accepts data of any size and a sync message as input. The block of input data is bitwise added modulo 2 with the result of the transformation in the cycle "32-3" of the sync message. The resulting value is sent to the output. The value of the sync message is replaced in the case of encryption by the output block, and in the case of decryption - by the input, i.e. encrypted. If the last block of incoming data is less than 64 bits, then the extra bits of the gamma (output of the "32-3" loop) are discarded. If there is still incoming data, then the action is repeated: the gamma cipher is formed from the result of encryption of the replaced value, and so on.

4. Production modeimitation inserts takes input data that is at least two full 64-bit blocks in size, and returns a 64-bit block of data, called insert impersonation. The temporary 64-bit value is set to 0, then, while there is input data, it is bitwise added modulo 2 with the result of the execution of the "16-3" cycle, the input of which is the block of input data. After the end of the input, the temporary value is returned as the result.

Cryptanalysis of the cipher

The GOST 28147-89 cipher uses a 256-bit key and the size of the key space is 2256 . No general purpose computer currently in existence can guess a key in less than many hundreds of years. The Russian standard GOST 28147-89 was designed with a large margin and exceeds the American DES standard by many orders of magnitude with its real key size of 56 bits and the volume of the key space of only 2 56 .

There are also attacks on the full-round GOST 28147-89 without any modifications. One of the first open works, in which the analysis of the algorithm was carried out, uses the weaknesses of the key expansion procedure of a number of well-known encryption algorithms. In particular, the full-round algorithm GOST 28147-89 can be broken using differential cryptanalysis on linked keys, but only if weak substitution tables are used. The 24-round version of the algorithm (which lacks the first 8 rounds) is opened in the same way for any substitution tables, however, strong substitution tables make such an attack completely impractical.

Domestic scientists A.G. Rostovtsev and E.B. Makhovenko in 2001 proposed a fundamentally new method of cryptanalysis by forming an objective function from a known plaintext, the ciphertext corresponding to it and the desired key value and finding its extremum corresponding to the true value of the key. They also found a large class of weak keys of the GOST 28147-89 algorithm, which allow you to open the algorithm using only 4 selected plaintexts and their corresponding ciphertexts with a fairly low complexity.

In 2004, a group of experts from Korea proposed an attack with which, using differential cryptanalysis on associated keys, it is possible to obtain a 12-bit secret key with a probability of 91.7%. The attack requires 235 chosen plaintexts and 236 encryption operations. As you can see, this attack is practically useless for the real opening of the algorithm.

The substitution table is a long-term key element, that is, it is valid for much more long term than a single key. It is assumed that it is common to all encryption nodes within one cryptographic protection system. The quality of this table determines the quality of the cipher. With a "strong" substitution table, the strength of the cipher does not fall below a certain acceptable limit, even if it is disclosed. Conversely, the use of a "weak" table may reduce the strength of the cipher to an unacceptably low limit. No information on the quality of the table of substitutions in open seal Russia has not been published, but the existence of "weak" tables is beyond doubt - an example is the "trivial" substitution table, according to which each value is replaced by itself. In a number of works, it is erroneously concluded that the secret substitution tables of the GOST 28147-89 algorithm can be part of the key and increase its effective length (which is not significant, since the algorithm has a very large 256-bit key).

Encryption algorithm GOST 28147-89. Simple substitution method. — Archive WASM.RU

“While you are alive, do not die, look at this world.
Many here have a dead soul - they are dead inside.
But they walk and laugh, not knowing that they are not there,
Don't rush your death," she told me.
Aria, "Up There"
2.1 Feistel networks.
2.2 Block cipher GOST 28147-89

3.1 Key information
3.2 The main step of the crypto transformation
3.3 Basic cycles:“ 32-Z” , “ 32-R” .

4.1 Implementation of the main step of crypto transformation
4.2 Increasing the speed of the algorithm
5. Key information requirements
6. List of used literature
7. Thanks.

Introduction.

This document is my attempt to describe the method of simply replacing the GOST 28147-89 encryption algorithm in the simplest, but, nevertheless, technically literate language. How well I succeeded, the reader will give his opinion after reading the first six points.

In order for my work to give more benefit I recommend arming yourself with the works of the authors indicated in the list of used literature. A calculator is also recommended, so that it has a function for calculating the operation XOR, because reading the article assumes that the reader has set out to study this encryption algorithm. Although it is also suitable as a reference tool, I wrote this article precisely as a tutorial.

Introduction to block ciphers.

Before we begin to consider the algorithm, we need to familiarize ourselves with the history of the creation of this kind of cipher. The algorithm belongs to the category of block ciphers, in the architecture of which the information is divided into a finite number of blocks, the final one naturally may not be complete. The encryption process takes place precisely over the full blocks, which form the ciphertext. The final block, if it is incomplete, is supplemented with something (I will talk about the nuances of its addition below) and is encrypted in the same way as full blocks. By ciphertext, I mean - the result of the action of the encryption function on a certain amount of data that the user submitted for encryption. In other words, the ciphertext is the end result of encryption.

The history of the development of block ciphers is associated with the beginning of the 70s, when IBM, realizing the need to protect information when transmitting data over computer communication channels, began to implement own program scientific research dedicated to the protection of information in electronic networks, including cryptography.

A group of researchers - developers of the IBM company, which began to study encryption systems with a symmetric key scheme, was headed by Dr. Horst Feistel.

2.1 Feistel networks

The architecture of the new encryption method proposed by Feistel in the classical literature was called the "Feistel Architecture", but on this moment in Russian and foreign literature, a more established term is used - "Feistel's network" or Feistel`s NetWork. Subsequently, according to this architecture, the "Lucifer" cipher was built - which was later published and caused a new wave of interest in cryptography in general.

The idea of the Feistel network architecture is as follows: the input information stream is divided into blocks of n bits in size, where n is an even number. Each block is divided into two parts - L and R, then these parts are fed into an iterative block cipher, in which the result of the j-th stage is determined by the result of the previous stage j-1! This can be illustrated with an example:

Rice. 1

Where, function A is the main action of the block cipher. It can be a simple action, such as an XOR operation, or it can have a more complex look, be a sequence of a number of simple actions - modulo addition, left shift, element replacement, etc., in total these simple steps form the so-called - the main step of crypto-transformation.

It should be noted that the key elements of the function are the supply of key elements and the XOR operation, and how well thought out the work of these operations is, speaks of the cryptographic strength of the cipher as a whole.

In order for the idea of Feistel networks to be completely clear, consider the simplest case depicted in Fig. rice. 1, where in the function A - the operations “mod 2” (“xor”) will perform, but this protozoan case, in a more serious situation, for example, hiding information of national importance, function A can be more complex (as far as I have seen, function A can indeed be very complicated):

Initial data:

L=1110b, R=0101, K=1111b

Get a ciphertext

1. (R + K) mod 2 4 = Smod, Smod = 0100b

2. (Smod + L) mod 2 = Sxor, Sxor = 1010b

3. L=R, R=Sxor

L=0101b, R=1010b

Let's explain our steps:

1. This operation is addition mod 2 4 . In practice, such an operation boils down to a simple addition, where we must add two numbers and ignore the carry to the 5th digit. Since, if we put exponents over the digits of the binary representation of a number, there will be an indicator of four over the fifth digit, take a look at the figure below, which shows the actions of our operation:

Rice. 2

Here I pointed to the exponents with an arrow, as you can see, the result should have been 10100, but since the carry is ignored during the mod 2 4 operation, we get 0100.

2. This operation in the literature is called mod 2, in assembly language it is implemented by the command XOR. But her more correct name mod 2 1 . Without this unique operation, it is hardly possible to build a fast, easy-to-implement encryption algorithm and, at the same time, be quite crypto-resistant. The uniqueness of this operation lies in the fact that it is the reverse of itself! For example, if the number A is XORed with the number B, the result will be C, in the future it is enough to re-XOR the numbers B and C together to get the previous value of A!

In this operation, we got 1010 having the numbers 1110 and 0100, to get back 1110, it is enough to XOR the numbers 0100 and 1010 together! More details about this operation can be found in the article, which is embedded on the site. www.wasm.ru, « Elementary guide toCRC_error detection algorithms» the author who Ross N. Williams. There is a paragraph in this work - " 5. Binary arithmetic excluding transfers". It is in this article that the operation is described xor! I exclaim because in this article this operation is described in such a way that the reader not only understands how this operation works, he even begins it see, hear and feel!

3. This action is necessary so that when decrypting from the ciphertext, you can get the original values.

2.2 Block cipher GOST 28147-89

The encryption algorithm GOST 28147 - 89 belongs to the category of block ciphers operating on the Feistel balanced network architecture, where two parts of the selected information block are of equal size. The algorithm was developed in the depths of the eighth department of the KGB, now transformed into FAPSI, and was fixed as an encryption standard. Russian Federation back in 1989 under the USSR.

For this method of the algorithm to work, it is necessary to break the information into blocks of 64 bits in size. Generate or enter into the encryption system the following key information: key and substitution table. The choice of the key and substitution table for encryption should be taken very seriously, because. this is the foundation of the security of your information. About what requirements are imposed on the key, and the table of substitutions, see the paragraph "Requirements for key information".

When considering the method, we will not focus on this, because. this article, as I said above, was written with the aim of teaching the reader how to encrypt data using the simple replacement method this algorithm encryption, but we will definitely touch on this issue at the end of the article.

theoretical minimum.

3.1 Key information

As I said above, the following are actively involved in data encryption:

3.1.1. The key is a sequence of eight elements of 32 bits each. Further, we will denote the symbol K, and the elements of which it consists - k1,k2,k3,k4,k5,k6,k7,k8.

3.1.2 The substitution table is a matrix of eight rows and sixteen columns, hereinafter referred to as Hij. Each element at the intersection of row i and column j takes 4 bits.

3.2 The main step of crypto transformation

The main action in the encryption process is the main step of the crypto transformation. This is nothing more than an action to encrypt data using a certain algorithm, only the name the developers introduced was too cumbersome.

Before starting to encrypt, the block is divided into two parts L and R, 32 bits each. The key element is selected and only then these two parts of the block are fed, the key element is the substitution table into the main step function, the result of the main step is one iteration of the base cycle, which will be discussed in the next paragraph. The main step consists of the following steps:

The addition part of the block R is added to the key element K mod 2 32 . I described a similar operation above, here the same thing is only the exponent is not “4”, but “32” - the result of this operation will be denoted by Smod in the future.

We divide the previously obtained result Smod into four bit elements s7,s6,s5,s4,s3,s2,s1,s0 and feed it into the replacement function. The replacement is as follows: the element Smod - s i is selected, from the beginning we start from the lowest element, and replace it with the value from the substitution table for the i - row and column indicated by the value of the element s i . We pass to s i +1 element and do the same and continue until we replace the value of the last element Smod - the result of this operation will be denoted as Ssimple.

In this operation, the value of Ssimple is shifted cyclically to the left by 11 bits and we get Srol.

We select the second part of the block L and add mod 2 with Srol, as a result we have Sxor.

At this stage, the L part of the block becomes equal to the value of the R part, and the R part in turn is initialized with the result of Sxor, and this completes the main step function!

3.3 Basic cycles: “32-Z”, “32-R”.

In order to encrypt information, it is necessary to break it into blocks of 64 bits in size, naturally the last block can be less than 64 bits. This fact is the Achilles' heel of this "simple substitution" method. Since its addition to 64 bits is a very important task to increase the cryptographic strength of the ciphertext, this sensitive place, if it is present in the array of information, but it may not be (for example, a file of 512 bytes in size!), should be treated with great responsibility!

After you have broken the information into blocks, you should break the key into elements:

K = k1,k2,k3,k4,k5,k6,k7,k8

Encryption itself consists in using the so-called basic cycles. Which, in turn, include the n -th number of basic crypto-transformation steps.

Basic cycles are, how to say, marked: n - m. Where n is the number of basic crypto-transformation steps in the base loop, and m is the "type" of the base loop, i.e. what is it about, "Z" encryption or "R" encryption of data.

The basic encryption cycle 32–3 consists of 32 basic cryptographic steps. The block N and the element of the key K are fed into the function that implements the actions of the step, and the first step occurs with k1, the second over the result with the element k2, etc. according to the following scheme:

k1,k2,k3,k4,k5,k6,k7,k8,k1,k2,k3,k4,k5,k6,k7,k8,k1,k2,k3,k4,k5,k6,k7,k8k8,k7, k6,k5,k4,k3,k2,k1

The 32-P decryption process proceeds in a similar way, but the key elements are fed in the reverse order:

k1,k2,k3,k4,k5,k6,k7,k8,k8,k7,k6,k5,k4,k3,k2,k1,k8,k7,k6,k5,k4,k3,k2,k1,k8, k7,k6,k5,k4,k3,k2,k1

Practice.

4.1 Implementation of the main step of crypto transformation

After we got acquainted with the theory of how to encrypt information, it is time to see how encryption works in practice.

Initial data:

Let's take a block of information N = 0102030405060708h, here the parts L and R are equal:

L = 01020304h, R = 05060708h, take the key:

K=' as28 zw37 q839 7342ui23 8e2t wqm2 ewp1' (these are ASCII codes, in order to view the hexadecimal representation, you can open this file in view mode in Total Commander pressing the key " F3” and then the key “ 3 "). In this key, the element values will be:

k1 = 'as28', k2 = 'zw37', k3 = 'q839', k4 = '7342'

k5 = 'ui23', k6 = '8e2t', k7 = 'wqm2', k8 = 'ewp1'

Also take the following substitution table:

Rice. 3

Here the rows are numbered from 0 to 7, the columns from 0 to F.

Warning: All information, including the key with the substitution table, is taken as an example for considering the algorithm!

Using the "Initial data", you need to get the result of the action of the main step of the crypto transformation.

1. Select part R = 05060708h and key element k1 = ‘as28’, in hexadecimal form the key element will look like this: 61733238h. Now we do the operation of summation mod 2 32:

Rice. 4

As you can see in the figure, we did not transfer to 33 bits marked in red and with the exponent " 32 ". And if we had other values of R and the key element, this could well happen, and then we would ignore it, and in the future use only the bits marked in yellow.

I perform such an operation with the assembler command add:

; eax=R, ebx='as28'

The result of this operation is Smod = 66793940h

2. Now the most intricate operation, but if you look closely, it is no longer as scary as it seems at first. Let's imagine Smod in the following form:

Rice. 5

I tried to visualize the Smod elements in the figure, but I’ll explain anyway:

s0 = 0, s1 = 4, s2 = 9, etc.

Now, starting from the lowest element s0, we make a replacement. Remembering the paragraph 3.2 The main step of crypto transformation» i is a row, s i is a column, we are looking for a value in the zero row and zero column:

Fig.6

So the current value of Smod is not 6679394 0 h, and 6679394 5 h.

We proceed to replace s1, i.e. four. Using the first row and fourth column (s1= 4!). Let's look at the picture:

Rice. 7

Now the value is Smod, not 667939 4 5h, 667939 2 5h. I assume that now the replacement algorithm is clear to the reader, and I can say that after the end result of Ssimple will have the following value - 11e10325h.

I will talk about how this is easiest to implement in the form of assembler commands later in the next paragraph, after I talk about the extended table.

We must shift the resulting Ssimple value 11 bits to the left.

Rice. 8

As you can see, this action is quite simple, and is implemented by one assembly language command - roll and the result of this Srol operation is 0819288Fh.

4. Now it remains part L of our block of information to XOR with the value Srol. I take calculator from w2k sp4 and get Sxor = 091b2b8bh.

5. This action is final and we simply assign, clean R the value of the L part, and initialize the L part with the value of Sxor.

Final result:

L=091b2b8bh, R=01020304h

4.2 Increasing the speed of the algorithm

Now let's talk about optimizing the algorithm for speed. When implementing a project, one has to take into account that a program that works with registers more often than with memory works the fastest, and here this judgment is also very important, because. over one block of information as many as 32 encryption steps!

When I implemented the encryption algorithm in my program, I did the following:

1. Selected part of the block L in the eax register, and R in edx.

2. Initialized into the esi register with the address of the extended key, more on that below.

3. I assigned the value of the address of the extended substitution table to the ebx register, more on this below

4. Passed the information of points 1,2, 3 to the function of the basic cycle 32 - Z or 32 - R, depending on the situation.

If you look at the scheme for supplying key elements in paragraph " Basic cycles: “32-Z”, “32-R””, then our key for the basic cycle 32 - Z can be represented as follows:

K 32-Z =

'as28','zw37','q839','7342','ui23','8e2t','wqm2','ewp1',

'as28','zw37','q839','7342','ui23','8e2t','wqm2','ewp1',

'ewp1','wqm2','8e2t','ui23','7342','q839','zw37','as28'

Those. from the beginning go k1,k2,k3,k4,k5,k6,k7,k8 - ‘ as28', 'zw37', 'q839', '7342', 'ui23', '8e2t', 'wqm2', 'ewp1' This sequence is repeated three times. Then the elements go in reverse order, i.e.: k8,k7,k6,k5,k4,k3,k2,k1 - 'ewp1', 'wqm2', '8e2t', 'ui23', '7342', 'q839', 'zw37', 'as28'.

I pre-arranged the elements in the array in the order they should be served in 32 - Z. Thus, I increased the memory required per key, but saved myself from some thinking processes that I did not need, and increased the speed of the algorithm, for by reducing the memory access time! Here I described only the key for 32 - Z, for the cycle 32 - R I did the same, but using a different scheme for supplying elements, which I also described in paragraph " Basic cycles: “32-Z”, “32-R».

It's time to describe the implementation of the replacement function, as I promised above. I could not describe earlier, because. this requires the introduction of a new concept - an extended substitution table. I can't explain to you what it is. Instead, I will show you it, and you can formulate for yourself what it is - an extended substitution table?

So, in order to understand what an extended substitution table is, we need a substitution table, for example, I will take the one shown in Fig. 3.

For example, we needed to replace the number 66793940h. I will present it in the following form:

Rice. 9

Now if we take the elements s1,s0, i.e. low byte, then the result of the replacement function will be 25h! After reading the article by Andrey Vinokurov, which I cited in the paragraph “ List of used literature”, you will actually find that if you take two lines, you can get an array that allows you to quickly find replacement elements using an assembler command xlat. They say it is possible in another way, faster, but Andrey Vinokurov spent about four years researching fast algorithms for implementing GOST! I don't think you should reinvent the wheel when it already exists.

So, about the array:

Let's take the first two lines, zero and first, and create an array of 256 bytes. Now we observe one feature that if you need to convert 00h, then the result will be 75h (based on Fig. 3) - put this value in the array at offset 00h. We take the value 01h, the result of the replacement function is 79h, put it in the array at offset 01 and so on until 0FFh, which will give us 0FCh, which we will put in the array at offset 0FFh. So we got an extended table of replacements for the first group of strings: the first and zero. But there are still three groups: the second page 2, page 3, the third page 4, page 5, the fourth page 6, page 7. With these three groups we proceed in the same way as with the first. The result is an extended substitution table!

Now you can implement an algorithm that will perform the replacement. To do this, we take the source codes that Andrey Vinokurov posted on his page, see " Bibliography».

lea ebx,extented_table_simple

mov eax,[put number to replace]

add ebx,100h ;go to the next two nodes

sub ebx,300h ; so that in the future ebx points to the table

Now one more feature, we not only replaced the previous actions, but also shifted the number by 8 bits to the left! We just have to shift the number another 3 bits to the left:

and we get the result of the operation rol eax,11!

There is nothing more I can add on optimization, the only thing I can emphasize what I said above is to use registers more often than memory accesses. I think these words are only for beginners, experienced people understand this very well without my words.

Key Information Requirements.

As stated in the article by Andrey Vinokurov, the key is selected according to two criteria:

The criterion for the equiprobable distribution of bits between the values 1 and 0. Usually, the criterion for the equiprobable distribution of bits is Pearson's criterion ("chi-square").

This means the key, in principle any number can. That is, when forming the next bit of the key, the probability of its initialization by one or zero is 50/50!

Please note that the key has eight elements, each 32 bits, so there are 32 * 8 = 256 bits in total in the key and the number possible keys 2256! Doesn't it strike you?

series criteria.

If we look at our key, which I gave in paragraph " 4.1 Implementation of the main step of crypto transformation”, then you will notice that the following entry is true:

Rice. 10

In one phrase, the value of k 1 should not be repeated not in k 2 , not in any other element of the key.

That is, the key that we have chosen as the consideration of the encryption algorithm, well meets the two criteria above.

Now about the choice of substitution table:

Now let's talk about how to choose the right substitution table. The main requirement for the choice of substitution tables is the phenomenon of "non-repeatability" of elements, each of which is 4 bits in size. As you saw above, each row of the substitution table consists of the values 0h, 1h, 2h, 3h, ..., 0fh. So the main requirement is that each line contains the values 0h, 1h, 2h, ... , 0fh and each such value in one copy. For example, the sequence:

1 2 3 4 5 6 7 8 9 A B C D E F

Fully meets this requirement, but still! It is not recommended to select such a sequence as a string. Because if you supply a value to the input of a function that relies on such a string, then you will get the same value at the output! Don't believe? Then take the number 332DA43Fh and eight such lines as a substitution table. Perform the replacement operation, and I assure you, the output will be the number 332DA43Fh! That is, the same that you submitted to the input of the operation! And this is not a sign of good form in encryption, and was it?

That was one requirement, the next criterion says that - each bit of the output block must be statistically independent of each bit of the input block!

How does it look easier? And here is how, for example, we chose the element s0 = 0Fh, 01111b from the above number. The probability that we now replace the first bit with a one or a zero is 0.5! The probability of replacing the second, third and fourth bits, each bit, considered separately, by ones or zeros is also 0.5. When choosing s1 = 0Eh, we will replace the probability that we are a zero bit, and this is “0”, by zero or one too equals - 0.5! Thus, according to this criterion, there is no regularity between the replacement of zero bits of elements s0, s1! Yes, you could substitute ones, but you could also put zeros.

To evaluate the table according to this criterion, you can build a table of correlation coefficients calculated by the formula:

If p = 1, then the value of bit j at the output is equal to the value of bit i at the input for any combination of bits at the input;

If p = -1, then the value of bit j at the output is always the inverse of input bit i;

If p = 0, then the output bit j takes on the values 0 and 1 with equal probability for any fixed value of the input bit i.

Let's take a single line example:

Let's break it down into components:

We calculate one coefficient using the formula above. To make it easier to understand how this is done, I will explain in more detail:

We take the 0th bit of the 0th number (0) at the input and the 0th bit of the 0th number at the output (1) and perform the operation 0 XOR 1 = 1.

We take the 0th bit of the 1st number (1) at the input and the 0th bit of the 1st number at the output (1) and perform the operation 1 XOR 1 = 0.

We take the 0th bit of the 2nd number (0) at the input and the 0th bit of the 2nd number at the output (0) and perform the operation 0 XOR 0 = 0.

We take the 0th bit of the 3rd number (1) at the input and the 0th bit of the 3rd number at the output (1) and perform the operation 1 XOR 1 = 0.

After sequentially performing XOR operations in such a sequence, we count the number of all non-zero values, we get the value 6. Hence P 00 = 1-(6/2 4-1) = 0.25. So, it turned out that the value of bit 0 at the output is equal to the value of bit 0 at the input in 4 cases out of 16;

Final odds table:

As can be seen from the table of correlation coefficients, bit 3 at the input is inverted relative to bit 0 at the output in 14 cases out of 16, which is 87.5% This is no longer acceptable for normal encryption systems. For a change, let's take another example:

The table of coefficients will be as follows (who is not lazy to recalculate)

Well, things are even worse in this table - bits 1 and 2 of the group remain unchanged! The cryptanalyst has a place to turn around Taking into account all these requirements, a simple enumeration (“head-on”) found permutation tables corresponding to the specified theory (today - 1276 combinations) Here are some of them:

09 0D 03 0E-06 02 05 08-0A 07 00 04-0C 01 0F 0B

00 05 0A 07-03 08 0F 0C-0E 0B 04 09-0D 06 01 02

06 0B 0F 00-0C 01 02 0D-08 07 09 04-05 0A 03 0E

04 0E 00 09-0B 01 0F 06-03 0D 07 0A-0C 02 08 05

04 02 08 0E-05 0F 03 09-0B 01 0D 07-0A 0C 06 00

07 03 09 0C-08 00 06 0F-0E 04 01 0A-0D 0B 02 05

06 0F 03 08-0D 04 0A 01-09 02 05 0C-00 0B 0E 07

0C 06 08 01-03 09 07 0E-0B 05 0F 02-04 0A 00 0D

04 0B 09 06-0E 01 00 0F-0A 05 03 0C-0D 02 07 08

00 0E 0F 01-07 08 09 06-04 0B 0A 05-03 0D 0C 02

0F 09 01 07-04 0A 08 06-0E 00 02 0C-05 03 0B 0D

0A 03 04 01-05 0C 0B 0E-08 06 0F 0D-07 09 00 02

0B 06 0F 01-04 0A 08 05-00 0D 0C 02-07 09 03 0E

0C 03 02 08-0D 06 0B 05-07 09 04 0F-0A 00 01 0E

02 0B 0F 04-09 00 06 0D-05 0E 01 08-0C 07 0A 03

List of used literature.

Article by Andrey Vinokurov:

Encryption algorithm GOST 28147-89, its use and implementation

for computers Intel platforms x86.

Here are the source codes for the implementation of the encryption algorithm.

Article by Horst Feistel:

Cryptography and Computer Security.

(can be found at the same address as the previous article)

Ross N. Williams:

An Elementary Guide to CRC Error Detection Algorithms
Posted on the site www.wasm.en.

Thanks.

I would like to express my gratitude to all visitors of the www.wasm.ru forum. But I would especially like to thank ChS, who is currently known as SteelRat, he helped me understand things that I probably would never understand, as well as helping me write the paragraph: “ Key information requirements”, the main part of this paragraph was written by him. I am also deeply grateful to the employee of KSTU. A.N. Tupolev Anikin Igor Vyacheslavovich and it would be a sin not to mention Chris Kaspersky, for the fact that he is, and Volodya / wasm.ru for his instructions. Oh, and I get from him. I also want to note Sega-Zero / Callipso for bringing some mathematical jungle to my mind.

This is perhaps all I would like to tell you.

I would be grateful for criticism or questions related to this article or just advice. My contact details: [email protected], ICQ - 337310594.

Sincerely, Evil`s Interrupt.

P.S.: I did not try to outdo anyone with this article. It was written with the intent to make it easier to study GOST, and if you have difficulties, this does not mean that I am to blame for this. Be reasonable and be patient, all the best to you!

“While you are alive, do not die, look at this world.
Many here have a dead soul - they are dead inside.
But they walk and laugh, not knowing that they are not there,
Don't rush your death," she told me.

Aria, "Up There"

Introduction

Introduction to block ciphers

2.1 Feistel networks.
2.2 Block cipher GOST 28147-89

Theoretical minimum

3.1 Key information
3.2 The main step of the crypto transformation

3.3 Basic cycles:“ 32-Z” , “ 32-R” .

Practice

4.1 Implementation of the main step of crypto transformation
4.2 Increasing the speed of the algorithm
5.
6. List of used literature
7. Thanks.

Introduction.

This document is my attempt to describe the method of simply replacing the GOST 28147-89 encryption algorithm in the simplest, but, nevertheless, technically literate language. How well I succeeded, the reader will give his opinion after reading the first six points.

In order for my work to be more useful, I recommend arming yourself with the works of the authors indicated in the list of used literature. A calculator is also recommended, so that it has a function for calculating the operation XOR, because reading the article assumes that the reader has set out to study this encryption algorithm. Although it is also suitable as a reference tool, I wrote this article precisely as a tutorial.

Introduction to block ciphers.

Before we begin to consider the algorithm, we need to familiarize ourselves with the history of the creation of this kind of cipher. The algorithm belongs to the category of block ciphers, in the architecture of which the information is divided into a finite number of blocks, the final one naturally may not be complete. The encryption process takes place precisely over the full blocks, which form the ciphertext. The final block, if it is incomplete, is supplemented with something (I will talk about the nuances of its addition below) and is encrypted in the same way as full blocks. By ciphertext, I mean - the result of the action of the encryption function on a certain amount of data that the user submitted for encryption. In other words, the ciphertext is the end result of encryption.

The history of the development of block ciphers is associated with the beginning of the 70s, when IBM, realizing the need to protect information when transmitting data over computer communication channels, began to carry out its own research program dedicated to the protection of information in electronic networks, including cryptography.

A group of researchers - developers of the IBM company, which began to study encryption systems with a symmetric key scheme, was headed by Dr. Horst Feistel.

2.1 Feistel networks

The architecture of the new encryption method proposed by Feistel in the classical literature was called the "Feistel Architecture", but at the moment in Russian and foreign literature a more established term is used - "Feistel's network" or Feistel's NetWork. Subsequently, according to this architecture, the Lucifer cipher was built - which was later published and caused a new wave of interest in cryptography in general.

The idea of the Feistel network architecture is as follows: the input information stream is divided into blocks of n bits in size, where n is an even number. Each block is divided into two parts - L and R, then these parts are fed into an iterative block cipher, in which the result of the j-th stage is determined by the result of the previous stage j-1! This can be illustrated with an example:

Rice. 1

Where, function A is the main action of the block cipher. It can be a simple action, such as the XOR operation, or it can have a more complex form, be a sequence of a number of simple actions - modulo addition, left shift, element replacement, etc., in aggregate, these simple actions form the so-called - the main step of the crypto transformation.

It should be noted that the key elements of the function are the supply of key elements and the XOR operation, and how well thought out the work of these operations is, speaks of the cryptographic strength of the cipher as a whole.

In order for the idea of Feistel networks to be completely clear, consider the simplest case depicted in Fig. rice. 1, where in the function A - the operations “mod 2” (“xor”) will perform, but this protozoan case, in a more serious situation, for example, hiding information of national importance, function A can be more complex (as far as I have seen, function A can indeed be very complicated):

Initial data:

L=1110b, R=0101, K=1111b

Get a ciphertext

(R + K) mod 2 4 = Smod, Smod = 0100b
(Smod + L) mod 2 = Sxor, Sxor = 1010b
L=R, R=Sxor

L=0101b, R=1010b

Let's explain our steps:

This operation is addition mod 2 4 . In practice, such an operation boils down to a simple addition, where we must add two numbers and ignore the carry to the 5th digit. Since, if we put exponents over the digits of the binary representation of a number, there will be an indicator of four over the fifth digit, take a look at the figure below, which shows the actions of our operation:

Rice. 2

Here I pointed to the exponents with an arrow, as you can see, the result should have been 10100, but since the carry is ignored during the mod 2 4 operation, we get 0100.

This operation in the literature is called mod 2, in assembly language it is implemented by the command XOR. But its more correct name is mod 2 1 . Without this unique operation, it is hardly possible to build a fast, easy-to-implement encryption algorithm and, at the same time, be quite crypto-resistant. The uniqueness of this operation lies in the fact that it is the reverse of itself! For example, if the number A is XORed with the number B, the result will be C, in the future it is enough to re-XOR the numbers B and C together to get the previous value of A!

In this operation, we got 1010 having the numbers 1110 and 0100, to get back 1110, it is enough to XOR the numbers 0100 and 1010 together! More details about this operation can be found in the article, which is embedded on the site. www.wasm.ru, « Elementary guide toCRC_error detection algorithms» the author who Ross N. Williams. There is a paragraph in this work - 5. Binary arithmetic without carries". It is in this article that the operation is described xor! I exclaim because in this article this operation is described in such a way that the reader not only understands how this operation works, he even begins it see, hear and feel!

This action is necessary so that when decrypting from the ciphertext, you can get the original values.

2.2 Block cipher GOST 28147-89

The encryption algorithm GOST 28147 - 89 belongs to the category of block ciphers operating on the Feistel balanced network architecture, where two parts of the selected information block are of equal size. The algorithm was developed in the bowels of the eighth department of the KGB, now transformed into FAPSI, and was enshrined as the encryption standard of the Russian Federation back in 1989 under the USSR.

For this method of the algorithm to work, it is necessary to break the information into blocks of 64 bits in size. Generate or enter into the encryption system the following key information: key and substitution table. The choice of the key and substitution table for encryption should be taken very seriously, because. this is the foundation of the security of your information. About what requirements are imposed on the key, and the table of substitutions, see the paragraph "Requirements for key information".

When considering the method, we will not focus on this, because. this article, as I said above, was written with the aim of teaching the reader how to encrypt data using the method of simply replacing this encryption algorithm, but we will definitely touch on this issue at the end of the article.

theoretical minimum.

3.1 Key information

As I said above, the following are actively involved in data encryption:

3.1.1. The key is a sequence of eight elements of 32 bits each. Further, we will denote the symbol K, and the elements of which it consists - k1,k2,k3,k4,k5,k6,k7,k8.

3.1.2 The substitution table is a matrix of eight rows and sixteen columns, hereinafter referred to as Hij. Each element at the intersection of row i and column j takes 4 bits.

The main action in the encryption process is the main step of the crypto transformation. This is nothing more than an action to encrypt data according to a certain algorithm, only the name the developers introduced was too cumbersome :).

Before starting to encrypt, the block is divided into two parts L and R, 32 bits each. The key element is selected and only then these two parts of the block are fed, the key element is the substitution table into the main step function, the result of the main step is one iteration of the base cycle, which will be discussed in the next paragraph. The main step consists of the following steps:

The addition part of the block R is added to the key element K mod 2 32 . I described a similar operation above, here it’s the same thing, only the exponent is not “4”, but “32” - the result of this operation will be denoted by Smod in the future.
We divide the previously obtained result Smod into four bit elements s7,s6,s5,s4,s3,s2,s1,s0 and feed it into the replacement function. The replacement is as follows: the element Smod - s i is selected, from the beginning we start from the youngest element, and replace it with the value from the table of replacements by i - the row and column indicated by the value of the element s i . We pass to s i +1 element and do the same and continue until we replace the value of the last element Smod - the result of this operation will be denoted as Ssimple.
In this operation, the value of Ssimple is shifted cyclically to the left by 11 bits and we get Srol.
We select the second part of the block L and add mod 2 with Srol, as a result we have Sxor.
At this stage, the L part of the block becomes equal to the value of the R part, and the R part in turn is initialized with the result of Sxor, and this completes the main step function!

3.3 Basic cycles: “32-Z”, “32-R”.

In order to encrypt information, it is necessary to break it into blocks of 64 bits in size, naturally the last block can be less than 64 bits. This fact is the Achilles' heel of this "simple substitution" method. Since its addition to 64 bits is a very important task to increase the cryptographic strength of the ciphertext, this sensitive place, if it is present in the array of information, but it may not be (for example, a file of 512 bytes in size!), should be treated with great responsibility!

After you have broken the information into blocks, you should break the key into elements:

K = k1,k2,k3,k4,k5,k6,k7,k8

Encryption itself consists in using the so-called basic cycles. Which, in turn, include the n -th number of basic crypto-transformation steps.

Basic cycles are, how to say, marked: n - m. Where n is the number of basic crypto-transformation steps in the base loop, and m is the "type" of the base loop, i.e. what is it about, "Z" encryption or "R" encryption of data.

The basic encryption cycle 32–3 consists of 32 basic cryptographic steps. The block N and the element of the key K are fed into the function that implements the actions of the step, and the first step occurs with k1, the second over the result with the element k2, etc. according to the following scheme:

k1,k2,k3,k4,k5,k6,k7,k8,k1,k2,k3,k4,k5,k6,k7,k8,k1,k2,k3,k4,k5,k6,k7,k8k8,k7, k6,k5,k4,k3,k2,k1

The 32-P decryption process proceeds in a similar way, but the key elements are fed in the reverse order:

k1,k2,k3,k4,k5,k6,k7,k8,k8,k7,k6,k5,k4,k3,k2,k1,k8,k7,k6,k5,k4,k3,k2,k1,k8, k7,k6,k5,k4,k3,k2,k1

Practice.

After we got acquainted with the theory of how to encrypt information, it is time to see how encryption works in practice.

Initial data:

Let's take a block of information N = 0102030405060708h, here the parts L and R are equal:

L = 01020304h, R = 05060708h, take the key:

K=' as28 zw37 q839 7342ui23 8e2t wqm2 ewp1' (these are ASCII codes, in order to view the hexadecimal representation, you can open this file in view mode in Total Commander by pressing the " F3” and then the key “ 3 "). In this key, the element values will be:

k1 = 'as28', k2 = 'zw37', k3 = 'q839', k4 = '7342'

k5 = 'ui23', k6 = '8e2t', k7 = 'wqm2', k8 = 'ewp1'

Also take the following substitution table:

Rice. 3

Here the rows are numbered from 0 to 7, the columns from 0 to F.

Warning: All information, including the key with the substitution table, is taken as an example for considering the algorithm!

Using the "Initial data", you need to get the result of the action of the main step of the crypto transformation.

We select the part R = 05060708h and the key element k1 = ‘as28’, in hexadecimal form the key element will look like this: 61733238h. Now we do the operation of summation mod 2 32:

Rice. 4

As you can see in the figure, we did not transfer to 33 bits marked in red and with the exponent " 32 ". And if we had other values of R and the key element, this could well happen, and then we would ignore it, and in the future use only the bits marked in yellow.

I perform such an operation with the assembler command add:

; eax=R, ebx='as28'

The result of this operation is Smod = 66793940h

Now the most intricate operation, but if you look closely, it is no longer as scary as it seems at first. Let's imagine Smod in the following form:

PATTERN NOT SAVE

Rice. 5

I tried to visualize the Smod elements in the figure, but I’ll explain anyway:

s0 = 0, s1 = 4, s2 = 9, etc.

Now, starting from the lowest element s0, we make a replacement. Remembering the paragraph 3.2 The main step of crypto transformation» i is a row, s i is a column, we are looking for a value in the zero row and zero column:

Fig.6

So the current value of Smod is not 6679394 0 h, and 6679394 5 h.

We proceed to replace s1, i.e. four. Using the first row and fourth column (s1= 4!). Let's look at the picture:

Rice. 7

Now the value is Smod, not 667939 4 5h, 667939 2 5h. I assume that now the replacement algorithm is clear to the reader, and I can say that after the end result of Ssimple will have the following value - 11e10325h.

I will talk about how this is easiest to implement in the form of assembler commands later in the next paragraph, after I talk about the extended table.

We must shift the resulting Ssimple value 11 bits to the left.

Rice. 8

As you can see, this action is quite simple, and is implemented by one assembly language command - roll and the result of this Srol operation is 0819288Fh.

Now it remains part L of our block of information to XOR with the value Srol. I take calculator from w2k sp4 and get Sxor = 091b2b8bh.
This action is final and we simply assign, clean R the value of the L part, and initialize the L part with the value of Sxor.

Final result:

L=091b2b8bh, R=01020304h

4.2 Increasing the speed of the algorithm

Now let's talk about optimizing the algorithm for speed. When implementing a project, one has to take into account that a program that works with registers more often than with memory works the fastest, and here this judgment is also very important, because. over one block of information as many as 32 encryption steps!

When I implemented the encryption algorithm in my program, I did the following:

Chose part of block L into register eax, and R into edx.
Initialized into the esi register with the address of the extended key, more on that below.
I assigned the value of the address of the extended substitution table to the ebx register, more on this below
Passed the information of points 1,2, 3 to the function of the basic cycle 32 - Z or 32 - R, depending on the situation.

If you look at the scheme for supplying key elements in paragraph " Basic cycles: “32-Z”, “32-R””, then our key for the basic cycle 32 - Z can be represented as follows:

K 32-Z =

'as28','zw37','q839','7342','ui23','8e2t','wqm2','ewp1',

'ewp1','wqm2','8e2t','ui23','7342','q839','zw37','as28'

Those. from the beginning go k1,k2,k3,k4,k5,k6,k7,k8 - ‘ as28', 'zw37', 'q839', '7342', 'ui23', '8e2t', 'wqm2', 'ewp1' This sequence is repeated three times. Then the elements go in reverse order, i.e.: k8,k7,k6,k5,k4,k3,k2,k1 - 'ewp1', 'wqm2', '8e2t', 'ui23', '7342', 'q839', 'zw37', 'as28'.

I pre-arranged the elements in the array in the order they should be served in 32 - Z. Thus, I increased the memory required per key, but saved myself from some thinking processes that I did not need, and increased the speed of the algorithm, for by reducing the memory access time! Here I described only the key for 32 - Z, for the cycle 32 - R I did the same, but using a different scheme for supplying elements, which I also described in paragraph " Basic cycles: “32-Z”, “32-R».

It's time to describe the implementation of the replacement function, as I promised above. I could not describe earlier, because. this requires the introduction of a new concept - an extended substitution table. I can't explain to you what it is. Instead, I will show you it, and you can formulate for yourself what it is - an extended substitution table?

So, in order to understand what an extended substitution table is, we need a substitution table, for example, I will take the one shown in Fig. 3.

For example, we needed to replace the number 66793940h. I will present it in the following form:

PATTERN NOT SAVE

Rice. 9

Now if we take the elements s1,s0, i.e. low byte, then the result of the replacement function will be 25h! After reading the article by Andrey Vinokurov, which I cited in the paragraph “ List of used literature”, you will actually find that if you take two lines, you can get an array that allows you to quickly find replacement elements using an assembler command xlat. They say it is possible in another way, faster, but Andrey Vinokurov spent about four years researching fast algorithms for implementing GOST! I don't think you should reinvent the wheel when it already exists.

So, about the array:

Let's take the first two lines, zero and first, and create an array of 256 bytes. Now we observe one feature that if you need to convert 00h, then the result will be 75h (based on Fig. 3) - put this value in the array at offset 00h. We take the value 01h, the result of the replacement function is 79h, put it in the array at offset 01 and so on until 0FFh, which will give us 0FCh, which we will put in the array at offset 0FFh. So we got an extended table of replacements for the first group of strings: the first and zero. But there are still three groups: the second page 2, page 3, the third page 4, page 5, the fourth page 6, page 7. With these three groups we proceed in the same way as with the first. The result is an extended substitution table!

Now you can implement an algorithm that will perform the replacement. To do this, we take the source codes that Andrey Vinokurov posted on his page, see " Bibliography».

lea ebx,extented_table_simple

mov eax,[put number to replace]

add ebx,100h ;go to the next two nodes

sub ebx,300h ; so that in the future ebx points to the table

Now one more feature, we not only replaced the previous actions, but also shifted the number by 8 bits to the left! We just have to shift the number another 3 bits to the left:

and we get the result of the operation rol eax,11!

There is nothing more I can add on optimization, the only thing I can emphasize what I said above is to use registers more often than memory accesses. I think these words are only for beginners, experienced people understand this very well without my words :).

Key Information Requirements.

As stated in the article by Andrey Vinokurov, the key is selected according to two criteria:

- criterion of equiprobable distribution of bits between values 1 and 0. Usually, Pearson's criterion ("chi-square") acts as a criterion of equiprobable distribution of bits.

This means the key, in principle any number can. That is, when forming the next bit of the key, the probability of its initialization by one or zero is 50/50!

Please note that the key has eight elements, each 32 bits, so there are 32 * 8 = 256 bits in total in the key and the number of possible keys is 2 256! Doesn't it strike you? 🙂

— series criterion.

If we look at our key, which I gave in paragraph " 4.1 Implementation of the main step of crypto transformation”, then you will notice that the following entry is true:

Rice. 10

In one phrase, the value of k 1 should not be repeated not in k 2 , not in any other element of the key.

That is, the key that we have chosen as the consideration of the encryption algorithm, well meets the two criteria above.

Now about the choice of substitution table:

Now let's talk about how to choose the right substitution table. The main requirement for the choice of substitution tables is the phenomenon of "non-repeatability" of elements, each of which is 4 bits in size. As you saw above, each row of the substitution table consists of the values 0h, 1h, 2h, 3h, ..., 0fh. So the main requirement is that each line contains the values 0h, 1h, 2h, ... , 0fh and each such value in one copy. For example, the sequence:

1 2 3 4 5 6 7 8 9 A B C D E F

Fully meets this requirement, but still! It is not recommended to select such a sequence as a string. Because if you supply a value to the input of a function that relies on such a string, then you will get the same value at the output! Don't believe? Then take the number 332DA43Fh and eight such lines as a substitution table. Perform the replacement operation, and I assure you, the output will be the number 332DA43Fh! That is, the same that you submitted to the input of the operation! And this is not a sign of good form in encryption, and was it? 🙂

That was one requirement, the next criterion says that - each bit of the output block must be statistically independent of each bit of the input block!

How does it look easier? And here is how, for example, we chose the element s0 = 0Fh, 01111b from the above number. The probability that we now replace the first bit with a one or a zero is 0.5! The probability of replacing the second, third and fourth bits, each bit, considered separately, by ones or zeros is also 0.5. When choosing s1 = 0Eh, we will replace the probability that we are a zero bit, and this is “0”, by zero or one too equals - 0.5! Thus, according to this criterion, there is no regularity between the replacement of zero bits of elements s0, s1! Yes, you could substitute ones, but you could also put zeros. 🙂

To evaluate the table according to this criterion, you can build a table of correlation coefficients calculated by the formula:

— if p = 1, then the value of bit j at the output is equal to the value of bit i at the input for any combination of bits at the input;

— if p = -1, then the value of bit j at the output is always the inversion of the input bit i;

— if p = 0, then the output bit j takes on the values 0 and 1 with equal probability for any fixed value of the input bit i.

Let's take a single line example:

D	B	4	1	3	F	5	9	0	A	E	7	6	8	2	C

Let's break it down into components:

We calculate one coefficient using the formula above. To make it easier to understand how this is done, I will explain in more detail:

- we take the 0th bit of the 0th number (0) at the input and the 0th bit of the 0th number at the output (1) we perform the operation 0 XOR 1 = 1.

- we take the 0th bit of the 1st number (1) at the input and the 0th bit of the 1st number at the output (1) we perform the operation 1 XOR 1 = 0.

- we take the 0th bit of the 2nd number (0) at the input and the 0th bit of the 2nd number at the output (0) we perform the operation 0 XOR 0 = 0.

- we take the 0th bit of the 3rd number (1) at the input and the 0th bit of the 3rd number at the output (1) we perform the operation 1 XOR 1 = 0.

After sequentially performing XOR operations in such a sequence, we count the number of all non-zero values, we get the value 6. Hence P 00 = 1-(6/2 4-1) = 0.25. So, it turned out that the value of bit 0 at the output is equal to the value of bit 0 at the input in 4 cases out of 16;

Final odds table:

The table of coefficients will be as follows (who is not lazy to recalculate)

	Entrance
Exit		0	1	2	3
	0	-0,25	0,00	0,00	0,00
	1	0,00	1,00	0,00	0,00
	2	0,00	0,00	1,00	0,00
	3	0,00	0,00	0,00	-0,50

Well, things are even worse in this table - bits 1 and 2 of the group remain unchanged! The cryptanalyst has a place to turn around 🙂 Taking into account all these requirements, a simple enumeration (“head-on”) found permutation tables corresponding to the specified theory (today - 1276 combinations) Here are some of them:

09 0D 03 0E-06 02 05 08-0A 07 00 04-0C 01 0F 0B

00 05 0A 07-03 08 0F 0C-0E 0B 04 09-0D 06 01 02

06 0B 0F 00-0C 01 02 0D-08 07 09 04-05 0A 03 0E

04 0E 00 09-0B 01 0F 06-03 0D 07 0A-0C 02 08 05

04 02 08 0E-05 0F 03 09-0B 01 0D 07-0A 0C 06 00

07 03 09 0C-08 00 06 0F-0E 04 01 0A-0D 0B 02 05

06 0F 03 08-0D 04 0A 01-09 02 05 0C-00 0B 0E 07

0C 06 08 01-03 09 07 0E-0B 05 0F 02-04 0A 00 0D

04 0B 09 06-0E 01 00 0F-0A 05 03 0C-0D 02 07 08

00 0E 0F 01-07 08 09 06-04 0B 0A 05-03 0D 0C 02

0F 09 01 07-04 0A 08 06-0E 00 02 0C-05 03 0B 0D

0A 03 04 01-05 0C 0B 0E-08 06 0F 0D-07 09 00 02

0B 06 0F 01-04 0A 08 05-00 0D 0C 02-07 09 03 0E

0C 03 02 08-0D 06 0B 05-07 09 04 0F-0A 00 01 0E

02 0B 0F 04-09 00 06 0D-05 0E 01 08-0C 07 0A 03

List of used literature.

Article by Andrey Vinokurov:

Encryption algorithm GOST 28147-89, its use and implementation

for Intel x86 platform computers.

(can be found at: http://www.enlight.ru/crypto/frame.htm).

Here are the source codes for the implementation of the encryption algorithm.

Article by Horst Feistel:

Cryptography and Computer Security.

(can be found at the same address as the previous article)

Ross N. Williams:

An Elementary Guide to CRC Error Detection Algorithms

Posted on the site www.wasm.en.

Thanks.

I would like to express my gratitude to all visitors of the forum www.wasm.ru. But I would especially like to thank ChS, who is currently known as SteelRat, he helped me understand things that I probably would never understand, as well as helping me write the paragraph: “ Key information requirements”, the main part of this paragraph was written by him. I am also deeply grateful to the employee of KSTU. A.N. Tupolev Anikin Igor Vyacheslavovich and it would be a sin not to mention Chris Kaspersky, for the fact that he is, and Volodya / wasm.ru for his instructions. Oh, and I get from him :). I also want to note Sega-Zero / Callipso for bringing some mathematical jungle to my mind.

This is perhaps all I would like to tell you.

I would be grateful for criticism or questions related to this article or just advice. My contact details: [email protected], ICQ - 337310594.

Sincerely, Evil`s Interrupt.

P.S.: I did not try to outdo anyone with this article. It was written with the intent to make it easier to study GOST, and if you have difficulties, this does not mean that I am to blame for this. Be reasonable and be patient, all the best to you!

The well-known term "processor performance" in society is an objective, calculated parameter that is measured in flops. However, the majority measures it in gigahertz, naively believing that this is the same thing. Nobody knows the term "code performance", and I will immediately explain why.

The reason is that I only recently came up with it and have not yet told anyone about it. However, code performance, like processor performance, has objective characteristics that can be measured. This article is specifically about the performance of the code executed by the processor core.

How is code performance measured? Since I was the first to talk about this, then by the right of the discoverer I will measure it in RTT-scales;).

Now seriously. In modern processors, the main transformations are operations on 32-bit numbers, everything else is, by and large, exotic. Therefore, we will take into account the main thing - operations with 32-bit numbers. How many 32-bit operations do you think the core of a modern processor can simultaneously execute?

The student will answer - one, his teacher will think and say that four, the professional - that there are only twelve operations so far.

So, a program code that loads all the processor's executive devices simultaneously throughout the entire code execution time will have a performance of 12 RTT NIS. Maximum! To be honest, I have never written such code before, but in this article I will try to make an effort on myself.

I will prove that code with simultaneous execution of twelve 32-bit operations is possible

The program code that uses one execution unit in the processor core will naturally have a performance of 1 RTT. Programs generated by language compilers can "boast" of such code performance. high level, and interpreters virtual machines. It should not be assumed that the CPU usage indicator, which can be seen in the OS task manager, can serve as an objective criterion for the effectiveness of the code. The processor core load can be 100%, but the program code will use one execution device in it (performance 1 RTT). In this case, at 100% load, the processor core will work at 1/12 of its maximum performance. In other words, when Windows Task Manager shows maximum CPU usage, its actual performance can vary from 1 to 12 RTT. Seeing in the performance window 100% load on any processor core, it is wrong to assume that all executive devices work in this core, by no means!

The only criterion for indirectly evaluating the operation of a processor core with maximum performance can be its power consumption and, as a result, the noise of the cooler. Now, if the cooler is noisy, then yes - the load has gone to the maximum. However, it is time to finish with general concepts and move on to harsh practice.

Traditional implementation of GOST 28147-89

I am not a professional in the field information security, but still familiar with the topic of encryption. It was my conversations with a professional cryptographer whom I deeply respect that inspired me to get into specifically symmetric stream encryption. And, having taken up this topic, I tried to do it well, and not just well, but also quickly, performing the maximum number of operations per unit of time. In other words, I was faced with the task of writing the program code with the maximum RTT value.

Cryptographic conversion according to GOST 28147-89 is used for streaming encryption of information in communication channels and on disk drives.

Currently, the software implementation of this GOST on RON is widely used. CPU. In the well-known methods for implementing GOST, all secret information (encryption keys, replacement blocks) is placed in random access memory. This reduces the reliability of encryption, since, having a RAM dump, it is possible to completely reveal all the secret elements of the crypto transformation. In addition, the method has speed limitations due to the location of the main objects of crypto-transformation in the OP and incomplete loading of the ALU executive devices. Modern processors, implementing a cryptoprocedure using a well-known method, can provide an encryption speed of 40–60 megabytes per second. And if you really understand it to the end, then the reason for the low performance and weak security of the crypto conversion is the software implementation of the substitution block. See its description in GOST in fig. 1.

According to clause 1.2 of GOST, this block implements tetrad (four bits) permutations in a 32-bit word, but the x86 / 64 processor architecture and its instruction set are not capable of effectively manipulating tetrads.

For the software implementation of the substitution block, special tables in RAM are used, which are prepared at the stage of initialization of the cryptofunction. These tables combine the substitution nodes of adjacent tetrads into 8 × 8 bit byte tables, so there are four 256-byte tables in RAM.

In more advanced implementations, these tables are 1024 bytes in size (256 words of four bytes). This is done in order to implement in the tables an additional cyclic shift by 11 positions of the 32-bit word obtained as a result of substitution (the next operation of the conversion algorithm according to GOST). An example of the implementation of GOST according to this method shown in appendix 1 (on disk).

The information of the substitution block is a secret component of the cryptofunction (as it is formulated in GOST, see Fig. 2).

The placement of these tables with the keys of the substitution block in the OP contradicts the requirements of GOST (clause 1.7), since secret information becomes available to third party programs working on a computer installation. The FSB, which also certifies software implementations of encryption according to GOST, looks at this violation, to put it mildly, condescendingly. If to place keys in the OP, the FSB still requires a “fig leaf” - masking the keys with the XOR operation, then nothing is required for replacement blocks in the OP, they are stored in clear text.

In short, the FSB skips such software implementations of the cryptoprocedure, despite the obvious decrease in the strength of such a solution and a direct violation of its own requirements according to GOST (clause 1.7). And this is despite the well-known methods of breaking ciphers through the removal of a memory dump ...

We will return to the issue of storing keys and replacement blocks in the internal registers of the processor a little later (there is a beautiful and quick solution), but for now we will only store encryption keys in MMX registers, this is more reliable.

But enough of the lyrics, it is important within the framework of the topic under consideration that this program code has a performance of 1 RTT-shku. Now let's write a code with a performance of 2 RTTs.

Multithreaded implementation of GOST 28147-89

The only way to speed up the crypto-procedures in the known algorithm is the introduction of multithreading. The meaning of such a change in the implementation of the algorithm is to calculate several blocks of data in parallel at once.

Most programmers mean by parallel processing only the work of several processor cores, synchronized through interrupts and semaphores in memory.

However, there is another variant of parallel data processing on a single processor core. Let me explain this non-obvious idea.

Modern processors include at least two, and even three to six arithmetic logic units. These ALUs (FPUs, address arithmetic units, and so on) can work independently of each other, the only condition for their parallel operation is non-overlapping software objects that they operate on. In other words, in instructions that simultaneously execute the ALU, the memory addresses and register numbers must be different. Or, no write operations should be performed to common registers and memory addresses accessed by various executive devices of the processor.

The loading of the work of all ALUs is controlled by a special hardware block inside the processor core - the scheduler, which looks through the executable code forward, to a depth of 32–64 bytes. If the scheduler finds commands that can be run on the ALU without conflicts, then it runs them simultaneously on different execution units. In this case, the counter of executed commands points to the executable command (there are several of them in such a scheme), after which all commands have already been executed.

Most program sequences generated automatically (by compilers) cannot load all the ALUs and FPUs that are in the processor core. In this case, the processor hardware is idle, which significantly reduces its resulting performance. Processor developers understand this and introduce modes to increase the core frequency when the equipment is not fully used. Hyper trading systems are also designed for this, and I will use this system to “press” the code to the maximum in the future.

Compilers, even the most optimized ones, and even more so - virtual machine engines, cannot generate optimized code in terms of performance. Only a programmer with engineering knowledge can write such optimized code, and the tool for writing it is exclusively assembler.

A characteristic illustration of the possibility of executing several independent program threads on one processor core is the GOST implementation, which is executed in two threads on a single processor core. The idea of the code is simple: there are two blocks of data for encryption/decryption, but one processor core that will perform the conversion. It is possible to perform the transformation for these two blocks of data sequentially, and this has been done up to now. In this case, the time required to perform the transformations is doubled.

But you can do otherwise: alternate commands related to the processing of different blocks of data. Graphically, these options are shown in Fig. 3.

In the figure, the upper example shows the usual order in which two independent data blocks are processed. First, the first block is processed, then the processor proceeds to the processing of the second block. Naturally, the resulting time is equal to twice the time required to process one block, and the execution units of the processor core are not fully loaded.

The following shows an example with interleaving commands from different processing threads. In this case, commands related to different data blocks are interleaved. The scheduler selects instructions that are independent of each other and passes them to ALU1 and ALU2 for execution. The grouping of commands of the first and second threads on these ALUs is carried out automatically, since the scheduler's operation algorithm includes grouping of commands with gearing according to common data on the same executive device.

In order for such program code to work without ALU idle time, it is necessary that each program thread work with its own set of registers. The cache in this scheme becomes a bottleneck (it has only two data output ports), so we store the keys in MMX registers. Since in this case the replacement (and shift) nodes are read-only in memory, they can be shared by both program threads.

This, of course, is a very simplified explanation of the principle of parallel execution of program threads on a single core, in reality everything is much more complicated. In practice, it is necessary to take into account the pipeline architecture of execution units, restrictions on simultaneous access to the cache and the block of RON registers, the presence of address arithmetic nodes, switches, and much more ... So this is a topic for professionals who can be counted on the fingers of ... one hand.

The parallel encryption method is effectively implemented only for the 64-bit processor operation mode, since in this mode there is a sufficient amount of RON (as many as 16 pieces!). An example of the implementation of GOST according to this method is shown in Appendix 2 (on disk).

It is clear that this GOST implementation has a code performance of 2 RTTs. Now let's see how this affects the execution time.

The encryption cycle for one stream (Appendix 1) is 352 cycles, and during this time 8 bytes of data are calculated, for a two-stream implementation of GOST (Appendix 2) 416 processor cycles are required, but 16 bytes are calculated. Thus, the resulting conversion speed increases from 80 to 144 megabytes for a 3.6 GHz processor.

An interesting picture is obtained: the code contains exactly twice as many instructions, and it takes only 15% longer, but I think readers have already understood the reason for this phenomenon ...

Theoretically, the code from the second example should be executed in the same number of cycles as the code from the first example, but the scheduler node is being developed even though engineers by Intel, but also people, and we are all far from perfect. So there is an opportunity to evaluate the effectiveness of their creation. This code will work on an AMD processor as well, and you can compare their results.

If someone doesn't take my word for it, then test programs with clock counters are included on the disk for such unbelievers. Programs in source codes, naturally in assembler, so there is an opportunity to check my words, and at the same time to peep some tricks of professional coding.

Using SSE-registers and AVX-commands of modern processors for the implementation of GOST 28147-89

Modern x86/64 architecture processors include a set of 16-byte SSE registers and specialized FPUs (at least two) to perform various operations on these registers. It is possible to implement GOST on this equipment, and in this case, replacement nodes can be placed not in the form of tables in RAM, but directly on dedicated SSE registers.

On one SSE-register, you can place two tables of 16 rows at once. Thus, four SSE registers will completely accommodate all replacement tables. The only condition for such placement is the interleaving requirement, according to which tetrads of the same byte must be placed in different SSE registers. In addition, it is advisable to place the low and high tetrads of the input bytes, respectively, in the low and high tetrads of the SSE registers.

These requirements are determined by optimization for the existing set of AVX commands. Thus, each byte of the SSE register will contain two tetrads corresponding to different bytes of the input register of the substitution block, while the position of the byte in the SSE register uniquely corresponds to the index in the replacement table of the substitution block.

A diagram of one of the possible placements of replacement nodes on SSE registers is shown in fig. 4.

Placing secret information of substitution nodes on SSE registers increases the security of the cryptoprocedure, but complete isolation of this secret information is possible under the following conditions:

The processor core has been put into hypervisor host mode and the interrupt block (APIC) has been forcibly disabled. In this case, the processor core is completely isolated from the operating system and applications running on the computing installation.
The loading of SSE registers and isolation of the computational core is performed before the start of the OS; it is optimal to perform these procedures from the trusted boot module (TDM).
Programs of cryptoprocedures according to GOST are placed in the non-modifiable memory area of the computing unit (either BIOS or flash memory MDZ).

Compliance with these requirements will ensure complete isolation and immutability program code cryptoprocedures and the secret information used in them.

For effective sampling from the SSE registers of tetrads, the multi-input byte switches available in the FPUs are used. These switches allow transfers from any byte of the source to any byte of the destination, by indexes located in a special SSE index register. Moreover, the transfer is performed in parallel for all 16 bytes of the SSE-register-receiver.

Having substitution storage nodes on SSE registers and a multi-input switch in FPU units, it is possible to organize the following transformation in the substitution unit (Fig. 5).

In this scheme, the input register in each tetrad sets the address for the corresponding switch, which transfers information from the replacement node drives to the output register via the data bus. Such a scheme can be organized in three ways:

Create an appropriate chip design, but that's fantastic for us.
Reprogramming the microcode and creating your own processor instruction to implement this function on existing processors is no longer a fantasy, but, unfortunately, it is unrealistic in the current conditions.
Write a program using the official AVX commands. The option, albeit not very effective, but we can implement it “here and now”. So that's what we're going to do next.

The switches are controlled by a special three-address command AVX VPSHUFB. Its first operand is the receiver of information from the switches, the second is the source to which the inputs of the switches are connected. The third operand is a control register for switches, each byte of which is associated with a corresponding switch; the value in it specifies the number of the direction from which the switch reads information. For a description of this command from the official Intel documentation, see fig. 5. In fig. Figure 6 shows the operation of this command - only half of the SSE registers are shown, for the second half everything is similar.

The switch uses only the least significant four bits to determine the switching direction, the last bit in each byte is used to force the corresponding receiver byte to zero, but this switch function is not yet required in our case.

A program with a selection of notebooks through FPU switches was written, but I didn’t even put it into the application - it’s too poor. Having a 128 bit register and using only 32 bits in it is unprofessional.

As they say, “Our finish is the horizon”, so squeeze it out like that... we will press it and put it in bags!

This is not a play on words, but a harsh FPU reality - SSE registers can be divided into equal parts and the same transformations can be performed on these parts with one command. In order for the processor to understand this, there is a magic letter "P" - a package that is placed before the command mnemonic, and no less magic letters "Q", "D", "W", "B", which are placed at the end and declare, what parts the SSE registers are divided into in this command.

We are interested in batch mode with a breakdown of the SSE register into four 32-bit blocks; accordingly, all commands will be prefixed with "P" and terminated with a "D" symbol. This makes it possible to process four blocks of 32 bits in parallel with one processor instruction, that is, to calculate four blocks of data in parallel.

The program that implements this method is available in Appendix 3, there are all the explanations.

However, press so press! Modern processors have at least two FPUs, and two independent instruction streams can be used to fully load them. If you correctly interleave commands from independent streams, then you can completely load both FPUs and get eight parallel processed data streams at once. Such a program was written, and you can see it in Appendix 4, but you need to look carefully - you can go crazy. This is what is called, "the code is not for everyone ...".

Issue price

The use of SSE registers to store replacement nodes is understandable - it gives a certain guarantee of secret information isolation, but the meaning of calculating the cryptofunction itself on the FPU is not obvious. Therefore, measurements were taken of the execution time of standard procedures using the direct replacement method in accordance with GOST for four and eight streams.

For four threads, an execution speed of 472 processor cycles was obtained. Thus, for a processor with a frequency of 3.6 GHz, one thread is considered at a speed of 59 megabytes per second, and four threads, respectively, at a speed of 236 megabytes per second.

For eight threads, an execution speed of 580 processor cycles was obtained. Thus, for a 3.6 GHz processor, one thread is considered at a speed of 49 megabytes per second, and eight threads at a speed of 392 megabytes per second.

As the reader may notice, the code in example #3 has a throughput of 4 RTT, while the code in example #4 has a throughput of 8 RTT. In these examples, on SSE registers, the patterns are the same as when using RON, only the scheduler has reduced its efficiency. It now provides a 20% increase in duration for a 2x increase in code length.

Moreover, these results were obtained using the universal AVX commands available both in Intel processors, and in AMD processors. If you optimize for an AMD processor, the result will be much better. Sounds counter-trend, but it's true nonetheless, and here's why: AMD processors have an additional set of instructions, the so-called XOP extension, and in this additional set there are commands that greatly simplify the implementation of the GOST algorithm.

This refers to the commands of the logical packet shift of bytes and the packet cyclic shift of double words. The examples given in Appendices 3 and 4 use sequences of universal commands that implement the necessary transformation: in the first case, one "extra" command, and in the other case, four extra commands at once. So there are optimization reserves, and considerable ones.

If we are talking about further optimization, it is worth remembering the presence of 256-bit registers (YMM-registers), using which you can theoretically double the speed of calculations. But for now, this is only a prospect, at the moment, processors slow down a lot when executing 256-bit instructions (FPUs have a path width of 128 bits). Experiments have shown that on modern processors, a count of 16 threads on YMM registers does not give any gain. But this is only for now, on new processor models, the speed of 256-bit instructions will undoubtedly be increased, and then the use of 16 parallel threads will become expedient and will lead to an even greater increase in the speed of the cryptoprocedure.

Theoretically, you can count on a speed of 600-700 megabytes per second if the processor has two FPUs with a working path width of 256 bits each. In this case, we can talk about writing code with an efficiency of 16 RTT, and this is not fantasy, but the near future.

mixed mode

Again, the question of the number of registers arises, they are not enough to promote such an algorithm. But the hyper trading mode will help us. The processor core has a second set of registers available in logical processor mode. Therefore, we will execute the same code on two logical processors at once. In this mode, of course, we will not have more executive devices, but due to alternation, we can get a full load of all executive devices.

You can't count on an increase of 50% here, the bottleneck is the cache memory where technological masks are stored, but you can still get an increase of 100 additional megabytes. This option is not listed in the appendices (the macros are the same as those used in the 8 RTT code), but it is available in program files. So if anyone does not believe in the possibility of encryption at a speed of 500 megabytes per second on a single processor core, let him run the test files. There are also texts with comments so that no one thinks that I am cunning.

This focus is possible only on Intel processors, AMD has only two FPUs per two processor modules (analogous to the hyper-trading mode). But there are four more ALUs, which are a sin not to use.

You can drive the processor modules of the Bulldozer into a mode similar to the hyper trading mode, but run conversion to RON on different modules in one thread, and to SSE registers in another thread and get the same 12 RTT. I have not tested this option, but I think the code in 12 RTT will work more efficiently on AMD. Those who wish can try, test programs can be adjusted to work on the "Bulldozers" quite easily.

Who needs it?

A serious question, but with a simple answer - everyone needs it. Soon we will all sit down on the clouds, we will store both data and programs there, and there, oh, how you want to equip your own, private corner. To do this, you will have to encrypt traffic, and the speed of crypto conversion will be the main determining factor in comfortable work in the cloud. Our choice of encryption algorithm is small - either GOST or AES.

Moreover, oddly enough, the AES algorithm encryption built into the processors turns out to be much slower, tests show a speed of 100-150 megabytes per second, and this is with the hardware implementation of the algorithm! The problem lies in the single-threaded counting and the replacement block, which operates on bytes (a table of 256 rows). So GOST turns out to be more efficient in implementation on the x86 / 64 architecture, who would have thought ...

This is if we talk about the achieved level of encryption speed. And if we keep in mind the theoretical refinements in the field of improving the efficiency of the code, then most likely nobody needs it. There are practically no specialists of level 3–6 RTT, compilers generally generate code at the level of 1–2.5 RTT, and the majority of programmers do not know assembler, and if they know its spelling, they do not understand the device of a modern processor. And without this knowledge, what is assembler, what is some kind of SI-sharp - it doesn't matter.

But not everything is so sad: in the "bottom line" after a week of sleepless nights there is a new algorithm for the implementation of GOST, which is a sin not to patent. And applications for patents (as many as three) have already been drawn up and filed, so, gentlemen, businessmen, line up - women and children have a discount.

This algorithm is mandatory for use as an encryption algorithm in government organizations RF and a number of commercial ones.

Description of the algorithm

The scheme of the algorithm is shown in fig. 3.1. As you can see, the scheme of this algorithm is quite simple, which unambiguously simplifies its software or hardware implementation.

The GOST 28147-89 algorithm encrypts information in blocks of 64 bits, which are divided into two subblocks of 32 bits (N1 and N2). Subblock N1 is processed in a certain way, after which its value is added

with the subblock value N2 (addition is performed modulo 2), then the subblocks are swapped. Such a transformation is performed for a certain number of rounds: 16 or 32, depending on the mode of operation of the algorithm (described below). The following operations are performed in each round:

1. Key overlay. The content of the /VI subblock is added modulo 2 32 to the Kx part of the key.

The encryption key of the GOST 28147-89 algorithm has a dimension of 256 bits, and Kx is its 32-bit part, i.e. a 256-bit encryption key is represented as a concatenation of 32-bit subkeys (Fig. 3.2):

SH ATI, AG2, Yu, AG4, K5, Kb, K7.

During the encryption process, one of these subkeys is used, depending on the round number and the mode of operation of the algorithm.

Rice. 3.1. Scheme of the algorithm GOST 28147-

Rice. 3.2. Encryption key of the GOST 28147-89 algorithm

2. Tabular replacement. After overlaying the key, the /VI subblock is divided into 8 parts of 4 bits, the value of each of which is individually replaced in accordance with the replacement table for this part of the subblock. Table replacements (Substitution box, S-box) are often used in modern encryption algorithms, so it is worth considering them in more detail.

Table replacement is used in this way: a data block of a certain dimension (in this case, 4-bit) is fed to the input, the numeric representation of which determines the number of the output value. For example, we have an S-box of the following form:

4, 11, 2, 14, 15, 0, 8, 13, 3, 12, 9, 7, 5, 10, 6, 1.

Let the 4-bit block “0100” come to the input, i.e. the value is 4. According to the table, the output value will be 15, i.e. "1111" (0 is replaced by 4, 1 is replaced by 11, the value of 2 is not changed, etc.).

As you can see, the scheme of the algorithm is very simple, which means that the greatest burden of data encryption falls on the replacement tables. Unfortunately, the algorithm has the property that there are "weak" substitution tables, using which the algorithm can be revealed by cryptanalytic methods. Weak ones include, for example, a table in which the output is equal to the input:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.

3. Bitwise cyclic left shift by 11 bits.

Algorithm Modes

The GOST 28147-89 algorithm has 4 modes of operation:

□ simple replacement mode;

□ gamma mode;

P gamming mode with feedback;

□ production mode of imitation attachments.

These modes are somewhat different from the generally accepted ones (described in Section 1.4), so it is worth considering them in more detail.

These modes have different purposes, but use the same encryption transformation described above.

Easy Swap Mode

In simple replacement mode, the 32 rounds described above are simply performed to encrypt each 64-bit block of information. 32-bit subkeys are used in the following sequence:

□ KO, Kl, K2, K3, K4, K5, Kb, AG7, KO, ATI, etc. - in rounds 1 to 24;

□ K1, Kb, K5, K4, K3, K2, K\, KO - in rounds 25 to 32.

Decryption in simple replacement mode is done in exactly the same way, but with a slightly different sequence of using subkeys:

□ KO, K\, K2, KZ, K4, K5, Kb, KP - in rounds 1 to 8;

□ KP, Kb, K5, K4, K3, K2, K\, KO, K1, Kb, etc. - in rounds 9 to 32.

Similar to the standard ECB mode, due to the separate encryption of blocks, the simple replacement mode is strongly not recommended for encrypting the actual data; it should only be used to encrypt other encryption keys in multi-key schemes.

Gamma Mode

In the gamma mode (Fig. 3.3), each block of the plaintext is added bit by bit modulo 2 to the gamma block of the cipher with a size of 64 bits. The cipher gamma is a special sequence that is generated using the transformations described above as follows:

1. In the registers N1 and N2, their initial filling is written - a 64-bit value, called a "sync message" (a sync message, in fact, is an analogue of the initialization vector in the CBC, CFB and OFB modes).

2. The contents of the /VI and N2 registers (in this case, sync messages) are encrypted in the simple replacement mode.

3. The contents of N1 are added modulo (2 32 - 1) with the constant CI = 2 24 + 2 16 + 2 8 + 4 , the result of the addition is written to the /VI register.

4. The content of N2 is added modulo 2 with the constant C2 = 2 24 + 2 16 + 2 8 +1, the result of the addition is written to register N2.

5. The contents of the /VI and N2 registers are output as a 64-bit cipher gamma block (ie, in this case, /VI and N2 form the first gamma block).

6. If the next gamma block is needed (i.e. the encryption or decryption needs to continue), return to step 2.

For decryption, a gamma is generated in the same way, then the XOR operation is again applied to the ciphertext and gamma bits.

To generate the same cipher gamma, the user decrypting the cryptogram must have the same key and the same sync message value that were used to encrypt the information. Otherwise, you will not be able to get the original text from the encrypted one.

In most implementations of the GOST 28147-89 algorithm, the sync message is not a secret element, but the sync message can be as secret as the encryption key. In this case, we can consider that the effective length of the algorithm key (256 bits) is increased by another 64 bits of the sync message, which can be considered as an additional key element.

Feedback gamma mode

In the feedback gamma mode, starting from the 2nd block, the /VI and L/2 registers are filled not with the previous gamma block, but with the result of encryption of the previous plaintext block (Fig. 3.4). The first block in this mode is generated in exactly the same way as the previous one.

Rice. 3.4. Generation of the cipher gamma in the gamma mode with feedback

Imitator generation mode

A spoof is a cryptographic checksum calculated using an encryption key and designed to check the integrity of messages. To calculate it, there is a special mode of the GOST 28147-89 algorithm.

The generation of the imitation prefix is performed as follows:

1. The first 64-bit block of information, for which the impersonator is calculated, is written into registers N1 and N2 and encrypted in the reduced simple replacement mode, in which the first 16 rounds out of 32 are performed.

2. The result obtained is summed modulo 2 with the next block of information, saving the result in N1 and N2.

3. M and N2 are again encrypted in the reduced mode of simple replacement, and so on until the last block of information.

A prefix is considered to be the 64-bit resulting contents of registers N1 and N2 or part of it. Most often, a 32-bit imitation prefix is used, that is, half of the contents of the registers. This is enough, because, like any checksum, the imitation prefix is intended primarily to protect against accidental distortion of information. To protect against deliberate modification of data, other cryptographic methods are used - primarily electronic digital signature(see section 1.1).

The imitation prefix is used as follows:

1. When encrypting any information, the plaintext imitator is calculated and sent along with the ciphertext.

2. After decoding, the imitation prefix is calculated again and compared with the one sent.

3. If the calculated and sent imitation prefixes do not match, the ciphertext was distorted during transmission or incorrect keys were used during decryption.

The imitation prefix is especially useful for verifying the correct decryption of key information when using multi-key schemes.

The imitation prefix is some analogue of the MAC message authentication code computed in CBC mode; the difference is that the prefix calculation does not use a sync message, while the MAC calculation uses an initialization vector.

Cryptographic strength of the algorithm

In 1994, the description of the GOST 28147-89 algorithm was translated into English and published; it was after this that the results of his analysis performed by foreign experts began to appear; however, no attacks approaching practicable were found for a significant amount of time.

□ large key length - 256 bits; together with the secret sync message, the effective key length increases to 320 bits;

□ 32 rounds of transformations; already after 8 rounds, the full effect of dispersion of the input data is achieved: changing one bit of the plaintext block will affect all bits of the ciphertext block, and vice versa, i.e. there is a multiple safety margin.

Consider the results of cryptanalysis of the GOST 28147-89 algorithm.

Analysis of substitution tables

Since substitution tables are not given in the standard, a number of works (for example, in) suggest that a "competent organization" can issue both "good" and "bad" substitution tables. However, the famous expert Bruce Schneier calls such assumptions "rumors". It is clear that the cryptographic strength of the algorithm largely depends on the properties of the substitution tables used, respectively, there are weak substitution tables (see above for an example), the use of which can simplify the algorithm's attack. Nevertheless, the possibility of using different substitution tables seems to be a very worthy idea, supported by the following two facts from the history of the DES encryption standard (see Section 3.15 for details):

□ attacks using both linear and differential cryptanalysis of the DES algorithm use specific features of substitution tables; when using other tables, cryptanalysis will have to start over;

□ Attempts have been made to strengthen DES against linear and differential cryptanalysis by using stronger substitution tables; such tables, which are indeed more stable, have been proposed, for example, in the s 5 DES algorithm; but, alas, it was impossible to replace DES with s 5 DES, since the replacement tables are rigidly defined in the standard, respectively, the implementations of the algorithm probably do not support the ability to change tables to others.

In a number of works (for example, , and ) it is erroneously concluded that the secret substitution tables of the GOST 28147-89 algorithm can be part of the key and increase its effective length (which is not significant, since the algorithm has a very large 256-bit key). However, the work proves that the secret substitution tables can be calculated using the following attack, which can be applied practically:

1. Set the null key and search for the "null vector", i.e. the value z = /(0), where /() is the round function of the algorithm. This stage takes about 2 encryption operations.

2. Using the zero vector, the values of the substitution tables are calculated, which takes no more than 2 11 operations.

Algorithm modifications and their analysis

In the work, a cryptanalysis of modifications of the GOST 28147-89 algorithm was carried out:

□ the GOST-H algorithm, in which, relative to the original algorithm, the order of using subkeys is changed, namely, in rounds from the 25th to the 32nd subkeys are used in direct order, i.e. in the same way as in the previous rounds of the algorithm ;

□ A 20-round GOST® algorithm that uses XOR instead of modulo 2 32 for key overlay.

Based on the results of the analysis, it was concluded that GOST-H and GOST© are weaker than the original GOST 28147-89 algorithm, since both have classes of weak keys. It is worth noting that in terms of GOST© cryptanalysis, the work repeats word for word the section on cryptanalysis of the GOST 28147-89 algorithm, published in 2000 in a well-known work (without any reference to the original). This calls into question the professionalism of the authors of the work and its other results.

A very interesting modification of the algorithm is proposed in the work: the tables S \ ... Ss must necessarily be different; in each round of the algorithm, they must be permuted according to a certain law. This permutation may be dependent on the encryption key, or may be secret (i.e., be part of a larger encryption key than the original 256-bit key). Both of these options, according to their authors, significantly increase the resistance of the algorithm against linear and differential cryptanalysis.

And one more modification related to substitution tables is given in the work, in which one of the possible methods for calculating substitution tables based on the encryption key is analyzed. The authors of the work concluded that such dependence weakens the algorithm, since it leads to the presence of weak keys and to some potential vulnerabilities of the algorithm.

Full-Round Algorithm Analysis

There are also attacks on the full-round GOST 28147-89 without any modifications. One of the first open works in which the analysis of the algorithm was carried out, a well-known work, is devoted to attacks that exploit weaknesses in the key expansion procedure of a number of well-known encryption algorithms. In particular, the full-round algorithm GOST 28147-89 can be broken using differential cryptanalysis on linked keys, but only if weak substitution tables are used. The 24-round version of the algorithm (which lacks the first 8 rounds) is opened in the same way for any substitution tables, but strong substitution tables (for example, given in ) make such an attack absolutely impractical.

Domestic scientists A. G. Rostovtsev and E. B. Makhovenko in 2001 proposed a fundamentally new method of cryptanalysis (according to the authors, it is significantly more effective than linear and differential cryptanalysis) by forming an objective function from a known plaintext corresponding to it ciphertext and the desired value of the key and finding its extremum corresponding to the true value of the key. They also found a large class of weak keys of the GOST 28147-89 algorithm, which allow you to open the algorithm using only 4 selected plaintexts and their corresponding ciphertexts with a fairly low complexity. The cryptanalysis of the algorithm is continued in the work.

In 2004, a group of experts from Korea proposed an attack with which, using differential cryptanalysis on associated keys, it is possible to obtain with a probability of 91.7% 12 bits of a secret key. The attack requires 235 chosen plaintexts and 236 encryption operations. As seen, this attack, is practically useless for the real opening of the algorithm.