# Cryptographic hash functions

A **cryptographic hash function** is a hash function that takes an arbitrary block of data and returns a fixed-size bit string, the cryptographic hash value, such that any (accidental or intentional) change to the data will (with very high probability) change the hash value. The data to be encoded are often called the message, and the hash value is sometimes called the message digest or simply digest.

## Contents |

## About hash funtions

Hash function for a long time used in computer science are functions, mathematical or otherwise, which receive the input variable-length string (called a prototype ) and convert it into a fixed string, usually smaller length (called a hash value ). As a simple hash function can be considered a function that takes the inverse image and returns a byte representing the XOR of all the input bytes . The meaning of the hash function is to obtain a characteristic trait preimage values on which analyzes various prototypes of the inverse problem. Since usually the hash function is the ratio of " many to one " , it is impossible to say with all ficiently that the two strings are the same , but they can be used to give a reasonable estimate of the accuracy . One-way hash function - a hash function that works only in one direction. Easy to calculate the hash value on a prototype , but it is difficult to create a prototype , the hash value is equal to a given value . Mentioned earlier hash function , generally speaking, are not unidirectional : setting a specific byte , it is not difficult to create a string of bytes , XOR which gives a predetermined value. With one-way hash function this is not possible . A hash function is open, the secrets of its calculation does not exist. Safety one-way hash function is in its pointedness . At the exit, there is no visible depending on the input . Changing single bit leads to a change in prototype (average) half bits of the hash value is known as an avalanche effect . Computationally infeasible to find the inverse image corresponding to the specified hash value.

## Requirements

Hash function *H* is considered cryptographically strong, if it meets three basic requirements on which the majority of hash functions in cryptography are based:

- Irreversibility or durability to the restoration of the inverse image: for a given hash value of
*m*must be computationally infeasible to find the data block*X*, for which*H(X)=m*. - Collision resistance of first type or a preimage resistance: for a given message
*M*must be computationally infeasible to find another message*N*, for which*H(N)=H(M)*. - Collision resistance of second type of a second-preimage resistance: it should be computationally impossible to find a pair of messages
*(M, M')*, which have the same hash.

These requirements are not independent:

- Reversible function is unstable to collisions of first and second kind .
- Function, which is not resistant to the collisions of the first kind, is not resistant to collisions of the second kind, but not vice versa.

To tell the truth, there is no prove of existence of one-way hash function for which calculation of an input from a given hash value is theoretically impossible. Usually, finding the input is only computationally challenging.

Friendly "birthdays" attack allows find a collision of the n bits hash function average for about *2 ^{ n/2}* hash computations in average. Therefore,

*n*- bit hash function is considered cryptographically strong if the computational complexity of finding collisions is close to

*2*.

^{ n/2}It is also important that the slightest change in the argument value of hash function cause significant changes of output (avalanche effect). In particular, the hash value should not leak information even about specific bits of the argument.

## Design principles

### Iterative sequential scheme

Generally, the basis for constructing a hash function is an iterative sequential circuit. Core algorithm is contractive function - converting input *k* in *n* output bits, where *n* - number of output bits hash function, and *k* - arbitrary number greater *n* . Compressing function must satisfy all the conditions of the reliability.

The input stream is divided into blocks by *(k-n)* bits. The algorithm uses a temporary n bit size variable with random number as initial value. Each next data block is combined with the output value of the compressing function of the previous iteration. Hash value is output n bits of the last iteration. Each bit of the output value of the hash function based on the entire input data stream and the initial value. This way avalanche effect is achieved.

There is a problem with the size of the input data stream when designing a hash function based on an iterative scheme. The size of the input data stream must be a multiple *(k-n)* . Typically, before algorithm execution data is expanded by predefined way.

In addition to single pass algorithms multi pass algorithms exists and they have enhanced avalanche effect. In this case, data is first processed, and then expanded to the required size.

### Contractive function based on a symmetric block algorithm.

Symmetric block cipher algorithm can be used as the compression function. For greater security data block intended for hashing at this iteration can be used as the key and the previous results of compressing function as the input. Then the final iteration is the output of the algorithm. In this case, the security hash function based on the security algorithm used.

Usually hash function uses more complex system. Generalized scheme of symmetric block encryption algorithm is shown in Fig.2

Thus, we obtain 64 variant for constructing compression function. Most of them are either trivial or unsafe. Below four most secure schemes for all types of attacks are shown.

The main disadvantage of hash functions designed on the basis of block algorithms is low speed. Needed cryptographic integrity can be provided with few operations on the input data. There are faster hashing algorithms designed independently from scratch, based on the requirements of the reliability (the most common ones - MD5, SHA- 1, SHA- 2 and GOST R 34.11-94).

## Применения

### Digital signature

Digital signature is actually message encrypted with public key algorithm. Text, encrypted with a secret key is combined with the original message. Then checking the signature is decryption the message with public key and if the resulting text is similar to the original text - the signature is valid.

Hash function usually used to optimize the algorithm. Prover encrypts not all the message but hash of it. This method provides the following advantages:

- Lower computational complexity. Typically, the document is much bigger than its hash.
- Increased reliability. Cryptanalyst using the public key can not compute signature of the message, but only of its hash.
- Ensuring compatibility. Most algorithms operates with bit strings of data, but some use other representations. Hash function can be used to convert an arbitrary input text into a suitable format.

### Checking password

Checking password

In most cases passwords are not stored on the target objects but stored only their hash values. Saving passwords themselves is impractical because in the case of unauthorized access to the password file attacker gets them in open and ready to use form, and while stored hash values, he learns only hashes that are not reversible into the original data. During the authentication, hash value of entered password is computed and compared with the stored hash value.

For example, OS GNU / Linux and Microsoft Windows keeps only password hashes of user accounts.

This system involves transmission of messages over a secure channel, i.e. channel intruder cannot intercept or send his messages. Otherwise, he can intercept the hash value of the password and use it to further illegal authentication. Defending against such attacks can be managed by the method of "triple handshake".

Suppose a client named Alice, produces authentication by passphrase, “pass”, on a certain server. The server stores the hash value H (pass, R2), where R2 - pseudo-random, pre-selected number. The client sends a request (Alice, R1), where R1 – pseudo-random, each time new, number. In response, the server sends the value of R2. The client calculates a hash value H (R1, H (pass, R2)) and sends it to the server. The server also computes the value of H (R1, H (pass, R2)) and compares it with the received. If the values match - authentication is correct.

In this situation, the password is not stored on the server and cryptanalyst cannot recover the password even intercepting messages between the client and server. Due using random values, transmitted hash is different each time.

## Random oracle model

Let`s recall mixing property, which is inherent to hash function: for any argument hash is indistinguishable from a uniformly distributed bit string in the range of functions. If we replace last phrase by "belongs to a uniformly distributed random variables", we get a very powerful hypothetical function called random oracle (random oracle). It has three properties: determinism, efficiency, uniformity of distribution of the resulting values. However, none known computational models correspond the random oracle model in varying degrees. Uniformity and determinacy of values, computed by random oracle, means that the entropy of the results is higher than the entropy the input. Since the mixing property, which has hash function is only a computational nature hypothesis, the real hash function should only provide computational indistinguishability, i.e. the results should have a certain probability distribution in the range of values that can not be determined in polynomial time. This way, the real hash function only mimic the behavior of the random oracle, albeit with high accuracy.

## Attack based on the birthday paradox

Assume that the hash function H is random oracle. It is assumed when using attack on the square root (attack based on the birthday paradox) that for collision detection with non-zero probability it is sufficient to perform 2 to the power | h | / 2 computations of hash function from random values . For organizing the attack based on the birthday paradox, the attacker must generate a pairs of "cleartext - hash value" until two messages m and m`, m is not equal to m`, h (m) = h (m`), appears. This pair of messages called a collision of hash function H. For example, for function SHA- 1 condition | h | = 160, and hence its resistance based on the birthday paradox is estimated at 2 ^{80}.

## Глоссарий

- Атака «дней рождения»
- Аутентификация
- Бит
- Восстановление прообраза
- Вычислительная сложность
- Ключи в криптографии
- Коллизия хэш-функции
- Криптостойкость
- Лавинный эффект
- Хэш-функция
- Электронная цифровая подпись

## Литература

Move to bibliography for this section.

Составители

**Душа И. / Аверичев Р.**