Lightweight stream ciphers

A stream cipher — is a symmetric cipher in which each character of plaintext is transformed into a symbol of the ciphertext, where this process depends not only on the used key, but also on its position in the flow of the plaintext. A stream cipher uses a different approach to symmetric encryption, rather than block ciphers.

Right below is shown a list of discussed in this article lightweight cryptographic algorithms based on stream cipher and the basic mechanisms of implementation. But before we get to the description, you must familiarize yourself with the basic realizations, which are used in hardware: such as LFSR and FCSR.

This article is one the divided articles of the entry called "Lightweight ciphers"; so reading it has a recommendation character for the deeper understanding of the criteria of this kind of algorithms, the spheres of its' implementations as well as their pros and cons.

LFSR

Linear feedback shift register — a shift register of bit words, in which the value of the input bits equals the linear boolean function of the values of the remaining bits to the shift register. It can be organized both software and hardware. Suitable for generation pseudorandom sequences of bits, which is used, inter alia, in cryptography.

Principle of work

Figure 1. Linear feedback shift register

During each tact linear feedback shift register performs the following operations:

• bit, located in the cell L-1, is read; It is the next bit in output sequence;
• feedback function computes a new value for the cell 0 from the current values ​​of the cells;
• the contents of each i-cell is moved to the next cell i+1 where i = 0, 1, ..., L-2;
• 0 is written into the cell bits, previously calculated by feedback function.

If the feedback function performs a logical operation «XOR», bit values ​​of cells can be calculated by the following formulas:

FCSR

The idea of ​​using a feedback with carry shift register (FCSR) to create a stream filter was firstly proposed by Clapper and Goresky in 1994. Later, they developed an algorithm for this cipher. One FCSR without connecting the linear component can not be used as a stream cipher, as can be easily decrypted. In 2002 was offered self-synchronize stream cipher based on sharing and FCSR and LFSR. Later, it was subjected to an attack with a choice of ciphertext. In 2005, Arnaud and Berger have proposed the idea of ​​sharing FCSR and line filter for stream cipher, called F-FCSR (Filtered FCSR). Later they've offered 4 algorithms implementing this idea: F-FCSR-SF1, F-FCSR-SF, F-FCSR-DF1 and F-FCSR-DF8. The first two of them use the static filters, the last - filter, depending on the key. Later it was revealed the weakness of all these algorithms to the different types of attacks. In 2005, Terry Berger, François Arnol and Cedric Larad proposed two ciphers based on the F-FCSR for the contest eSTREAM: F-FCSR-H for hardware implementation and F-FCSR-8 program. As a result, subsequent tests for original versions of F-FCSR-H and F-FCSR-8 found vulnerabilities in them, which were later fixed in version F-FCSR-H v.2 and F-FCSR-16. An improved version of F-FCSR-H v.2 was a finalist of eSTREAM. But after the discovery of vulnerability was excluded from eSTREAM Portfolio (rev.1).

Principle of work

Figure 2. Configuration Galois for FCSR

FCSR has two implementions: the Galois and Fibonacci. The F-FCSR Galois configuration is used as it is more effective. We introduce the following notations:

1. q — the integrity of the connection (connection integer) - negative odd integer satisfying the following conditions:
• T = (| q | - 1) / 2 — simple, 2T — the period of the bit sequence p / q
• The number of units in the binary representation of (1 - q) / 2 with order n / 2
2. p — parameter depending on the key, such as 0 < p < | q |
3. n — the size of the main register FCSR, | q | in binary notation has n + 1 characters: 2 n </ sup> <-q <2 n + 1 </ sup>
4. d: d = (1 - q) / 2, in binary notation , d i = {0, 1}, d n-1 = 1
5. M — n-bit main register, the value of its i-th digit, .
6. C — l-bit shift register, l + 1 — the number of units in binary notation d, .
7. (m, c) - state FCSR

If (m, c) — FCSR state at the time t 0, , then — binary representation of p / q, where p = m + 2c.

The Grain family of stream ciphers

Grain is notable for its extremely small hardware representation. During the initial phase of the eSTREAM project, the original version, Grain v0, was strengthened after some observations by Berbain. The final version is known as Grain v1.

Like the other portfolio ciphers, Grain v1 is modern in the sense that it allows for public IVs, yet they only use 80-bit keys. Recognizing the emerging need for 128-bit keys, Hell proposed Grain-128 supporting 128-bit keys and 96-bit IVs. The design is akin to that of 80-bit Grain, but notably, the nonlinear parts of the cipher have smaller degrees than their counterparts in Grain v1.

The general idea of Grain with key size |k| is to use an NFSR and an LFSR, each of size |k|, and an output function which uses state material from both registers. The LFSR feeds into the NFSR. The NFSR is loaded with the key, and the LFSR is loaded with the IV, padded with a constant. After key-and-IV-loading, but before keystream generation, the state is initialized in 2 · |k| state updates where the suppressed output is fed back to the LFSR. After initialization, keystream generation begins.

Implementations of any keystream generator matching the above description can be made faster by implementing the state update and output function several times in parallel. One particular feature of the Grain ciphers is that newly produced bits in the registers are not used for several clockings. This means that no recursion is necessary to implement the parallel functions.

Grain v1

The stream cipher Grain was designed by Martin Hell, Thomas Johansson, and Willi Meier. Their main goal was to design an algorithm which is very easy to implement in hardware and requires only small chip area. It is a bit-oriented synchronous stream cipher which means that the key stream is generated independently from the plaintext. In general, a stream cipher consists of two phases.

The first phase is the initialization of the internal state using the secret key and the IV. After that, the state is repeatedly updated and hence used to generate key-stream bits. The main elements of the stream cipher Grain are two 80-bit shift registers where one has a linear feedback (LFSR) and the other a nonlinear feedback (NFSR). The key size is specified with 80 bits and additionally an initial value of 64 bits is required. Unfortunately, the initial version of Grain (Version 0) had a weakness in the output function which was discovered during the first evaluation phase. This paper uses Grain (Version 1) for implementation which solves the security issues of the initial version.

The basic structure of the algorithm can be seen in Figure 3. Two polynomials of degree 80, f(x) and g(x), are used as feedback function for the feedback shift registers LFSR and NFSR. The output function h(x) uses as input selected bits from both feedback shift registers. Additionally, seven bits of NFSR are XORed together and the result is added to the function h(x). This output is used during the initialization phase as additional feedback to LFSR and NFSR (grey lines in the figure indicate this feedback). During normal operation this value is used as key stream output.

Figure 3. Scheme of stream cipher Grain v1

The hardware module of Grain was implemented with a 16-bit AMBA APB interface in a 0.35 μm CMOS process technology. This interface fits to the 16-bit datapath architecture. The reason for implementing a 16-bit word size was the low-power design approach as presented in Section 2. The details of the datapath are shown in Figure 4. It can be seen that the feedback shift registers NFSR and LFSR shift 16 bits per clock cycle. Only a single register is clocked at the same point in time via clock gating which eases the input of the key and the initial value because the same 16 input wires are connected to all registers. Additionally, it reduces the mean power consumption significantly. This comes at the expense of having a temporary register which stores intermediate results. Additionally, all combinational circuits like the feedback functions g function, f function, and h function have to be implemented in radix 16. The inputs of these functions are selected bits from the registers and are not shown in detail in this figure. The h function in the datapath description also includes the XORoperations of the output function in the algorithm description. The output of the module is registered and instead of the key stream the encrypted result of the data input is stored in the register. Instead of a multiplexer that selects the correct feedback function for the temporary register, AND gates are used to enable and disable the appropriate inputs. Producing a 16-bit encryption result after initialization requires 13 clock cycles.

Figure 4. Datapath of Grain v1

Right below is shown a table with main values of components of steam cipher Grain v1 datapath in GE.

Table 1.Components of Grain v1 implementation
NFSR + LFSR (160 бит)
1,275
1,130
Temporary register
0
85
Output register
50
150
Combinational logic and misc.
315
1,835
Controller (FSM)
120
160
Total
1,760
3,360

Trivium

The developers of the stream cipher Trivium are Christophe De Canni`ere and Bart Preneel. This hardware-oriented synchronous stream cipher was designed to explore how simply a stream cipher could be designed without sacrificing its security. It is possible to generate up to 2^64 bits of key stream from an 80-bit key and an initial value (IV) of 80 bits. The state consists of 288 bits which are denoted as S0, S2, ..., S287. The algorithm uses 15 specific bits of the state to generate three variables which are used to update the state and which produce one bit of the output. During the initialization, which is not shown in the figure, the key and the IV are loaded to the state and the same update function is applied 1,152 times without using the output for the key stream. In the algorithm description, N is used for the number of output bits of the stream cipher.

The implementation of the Trivium module has the same 16-bit AMBA APB interface as the one for Grain. Implementing a radix-16 datapath is also motivated by the low-power design technique. Figure 5 shows the details of the architecture. The boxes denoted with comb are the combinational logic elements of the algorithm that are used for updating the state according to the algorithm specification. The 288 flip-flops for the state are separated in 16-bit registers. Additionally, two temporary registers are necessary which store intermediate results. The output register is again used for directly applying the XOR operation of the key stream with the input value. Again, clock gating is used to only clock one register per clock cycle. During initialization, the key, the IV, and the constants are loaded into the registers. Then the combinational circuit is used to update the registers in a kind of pipeline where the temporary registers are used to prevent overwriting of needed values. The generation of a 16-bit key stream after the initialization phase requires 22 clock cycles.

Figure 5. Datapath of Trivium

Right below is shown a table with main values of components of steam cipher Trivium datapath in GE.

Table 2. Components of Trivium implementation
State registers (288 bits)
1,840
2,040
Temporary registers
0
200
Output register
50
150
Combinational and misc.
290
410
Controller (FSM)
210
290
Total
2,390
3,090

Bean

BEAN is a stream cipher that has some superficial similarities with Grain: the NFSR and the LFSR are replaced by two FCSRs. There is a sound motivation behind this idea since the LFSR in Grain is used to provide large period and to guarantee random-like properties while the NFSR is used to provide nonlinearity. An FCSR combines both these properties, while still being efficient in hardware. Thus, BEAN has two shift register components, both providing nonlinearity, large period and random-looking sequences. There have been several FCSR-based stream ciphers. One notable design is F-FCSR-H which was selected for the final portfolio in the eSTREAM project. The hardware performance is good and the design is very simple in that it only uses a linear filter together with one FCSR. The performance of, and interest in, the F-FCSR-H stream cipher made it evident that FCSRs are attractive building blocks for stream ciphers. Although the attack on FCSRs in Galois architecture has been successful, BEAN uses Fibonacci FCSRs.

Right below in figure 6 is shown an architecture of stream cipher Bean.

Figure 6. Architecture of stream cipher Bean

Keystream generation

Both FCSRs are 80 bits in size, i.e., the same as the key size. The state of FCSR-I at time instance i is denoted by and correspondingly the state of FCSR-II at time instance i by :

FCSR-I is updated according to:

Similarly, FCSR-II is updated as:

A Boolean function is used to produce the keystream. This is given as a 6-to-4-bit Sbox in the original description but as only one bit from each word is taken as output, it is easier to analyze it if it is considered as a 6-to-1 Boolean function. The keystream bits are then given by:

The algebraic normal form of is:

Hummingbird

Different from existing lightweight cryptographic primitives which are either block ciphers or stream ciphers, Hummingbird is an elegant combination of the above two cipher structures with 16-bit block size, 256-bit key size, and 80-bit internal state. The size of the key and the internal state of Hummingbird provide a security level which is adequate for many embedded applications. A top-level structure of the Hummingbird cryptographic algorithm is shown in Figure 7.

Figure 7. A Top-Level Description of the Hummingbird Cryptographic Algorithm

The overall structure of the Hummingbird encryption algorithm consists of four 16-bit block ciphers - four 16-bit internal state registers RS1, RS2, RS3 and RS4, and a 16-stage LFSR. The 256-bit secret key K is divided into four 64-bit subkeys k1,k2,k3 and k4 which are used in the four block ciphers, respectively. A 16-bit plaintext block is encrypted by first executing a modulo addition of and the content of the first internal state register RS1. The result of the addition is then encrypted by the first block cipher . This procedure is repeated in a similar manner for another three times and the output of is the corresponding ciphertext . Furthermore, the states of the four internal state registers will also be updated in an unpredictable way based on their current states, the outputs of the first three block ciphers, and the state of the LFSR. The decryption process follows the similar pattern as the encryption.

When using Hummingbird in practice, four 16-bit random nonce are first chosen to initialize the four internal state registers (i = 1, 2, 3, 4), respectively, followed by four consecutive encryptions on the message (RS1+ RS3)mod2 by Hummingbird running in the initialization mode. The final 16-bit ciphertext TV is used to initialize the LFSR. Moreover, the 13th bit of the LFSR is always set to prevent a zero register. The LFSR is also stepped once before it is used to update the internal state register RS3.

Four identical 16-bit block ciphers are employed in a consecutive manner in the Hummingbird encryption scheme. The 16-bit block cipher is a typical substitution- permutation (SP) network with 16-bit block size and 64-bit key. It consists of four regular rounds and a final round that only includes the key mixing and the S-box substitution steps. Like any other SP network, one regular round comprises of three stages: a key mixing step, a substitution layer, and a permutation layer. For the key mixing, a simple exclusive-OR operation is used in this 16-bit block cipher for efficient implementation in both software and hardware.

Bibliography

Kouznetsov N.P., 2015