Processing Queries on an Encrypted Database

From CryptoWiki
Jump to: navigation, search


Formulation of the problem

Currently, due to the growth of information systems and the requirements for their performance become especially popular cloud database. The model database as a service in the cloud (Database-as-a-service, DBaaS) provides the client access to a remote database, all maintenance tasks that (administration, scalability, backup, disaster recovery, etc.) Are carried out cloud providers [L10]. In this regard, the use of cloud-based databases requires a much lower cost than setting up and managing their own databases. Database as a service in the cloud is a cost-effective solution, but it has one major drawback: the need to ensure the security of information stored in the remote database. Cloud DBMS must ensure the protection of sensitive data from both external attacks and from the cloud service provider.

The solution to this problem

As a solution to this problem has been the model of a protected database, which is to ensure the security of information using a variety of cryptographic algorithms. Secure database must store confidential information in encrypted and never produce decryption operations, moreover, it must not contain the encryption key. All encryption and decryption operations must be performed on the client side of the cloud service. Currently, there are several projects dealing with the development of secure database: CryptDB, MONOMI, Cipherbase, TrustDB [PoReZeBa11], [HaIyLiMe02], [TuKaMaZe13], [CuJoPo11]. To perform operations on encrypted data are commonly used encryption algorithms, preserving order, and homomorphic encryption algorithms. The first type of algorithm allows you to sort and compare the encrypted data, the second - to perform arithmetic operations on them.

Secure database CryptDB

The best-known solution in the field of secure database is a project CryptDB, developed by a group of scientists at the Massachusetts University of Technology (MIT) [PoReZeBa11]. The model CryptDB the following cryptographic algorithms:

  • Random (RND) — provides the maximum security in CryptDB, but you cannot perform any operations on encrypted messages;
  • Deterministic (DET) — allows you to identify the equivalent outgoing messages, checking the equivalence of their respective encrypted messages;
  • Order-preserving encryption (OPE) — allows order relations between data items to be established based on their encrypted values, without revealing the data itself;
  • Join (JOIN and OPE-JOIN) — separate encryption scheme is necessary to allow equality joins between two columns, because we use different keys for DET to prevent cross-column correlations;
  • Word search (SEARCH) — allows you to search over encrypted data;
  • Homomorphic encryption (HOM) — allows you to perform arithmetic operations on encrypted data.

Encryption algorithms are listed in descending order of cryptographic strength, that is the greatest secrecy provides a probabilistic encryption, and the least - homomorphic.

The model CryptDB for efficient storage of encrypted data using the so-called "onion-like": each value in the table sequentially encrypted by different encryption algorithms in order of increasing secrecy (Figure 1). In other words, the value in the enveloping layers of encrypted messages, each new layer is more resistant than the previous one. For all layers using different encryption keys.

Figure 1. Onion encryption layers and the classes of computation they allow. Onion names stand for the operations they allow at some of their layers (Equality, Order, Search, and Addition). In practice, some onions or onion layers may be omitted, depending on column types or schema annotations provided by application developers. DET and JOIN are often merged into a single onion layer, since JOIN is a concatenation of DET and JOIN-ADJ. A random IV for RND, shared by the RND layers in Eq and Ord, is also stored for each data item

To process the request is necessary to decode all the outer layers of the encrypted data to the layer, allowing perform the requested operation, and has a maximum level of privacy. To do this, the client application sends to the server the corresponding encryption keys. For example, suppose the query is necessary to check the equality of two values. The bulbous structure shown in Figure 2, two coats: DET and OPE - contain encrypted messages, allowing to perform this operation. In order to ensure maximum privacy decryption happens to the level of DET. Note that the server is never executed decryption of the first layer (in this case a layer OPE), that is in an untrusted environment, decrypted data is never completely.

Figure 2. The principle of the onion model

According to this model, all the values in a secure database initially encrypted to the highest level of secrecy. It is expected that as a result of processing multiple requests to the server will remove unnecessary layers of encrypted messages, and further away the need for cryptographic operations. That is, the bulbous structure provides dynamic regulation of the level of secrecy. In use, built on this principle protected database reaches a state in which all values of encrypted cryptographic algorithms that let you perform common values for this type of operation.

Lack CryptDB model is that if for the same data received requests requiring execution of different operations, the resulting initial value most of the time will be encrypted with the encryption algorithms lower secrecy.

Secure IBM data base

In contrast to the model CryptDB, in the model of a protected database, developed by IBM, for storing secret data encryption is also used in addition to the safe codes that let you perform queries over encrypted data. For each relationship ОбЗапФор3.png cloud database server is stored encrypted attitude ОбЗапФор4.png where attribute corresponds ОбрЗапФор 4.png tuple relations ОбЗапФор12.png encrypted block encryption algorithm (DES or AES). Each attribute is an index ОбЗапФор5.png attribute ОбЗапФор6.png necessary to perform operations on encrypted data. Thus, a tuple ОбЗапФор7.png stored on a server as ОбЗапФор8.png. To calculate the index attribute ОбЗапФор6.png used mapping function ОбЗапФор9.png. The project uses two types of mapping functions: preserving the order of the original values and probability. Using the index allows for safe operation of sorting and comparing data (by order-preserving mapping functions), as well as the effective implementation of collection queries for one or more parameters. Options sampling encrypted using mapping functions, the values obtained are compared with the indices stored in a secure database, on the basis of a match, the encrypted ОбЗапФор10.png relationships that are sent to the client machine, which stands ОбЗапФор11.png attribute, and the result set is generated [HaIyLiMe02].

The disadvantage of this method of storage of encrypted data is that the result of execution of the request is excessive: the client machine stands completely full motorcade, and not just the attributes defined by the query.

Secure database MONOMI

MONOMI is a system for securely executing analytical workloads over sensitive data on an untrusted database server. MONOMI works by encrypting the entire database and running queries over the encrypted data. MONOMI introduces split client/server query execution, which can execute arbitrarily complex queries over encrypted data, as well as several techniques that improve performance for such workloads, including per-row precomputation, space-efficient encryption, grouped homomorphic addition, and pre-filtering. Since these optimizations are good for some queries but not others, MONOMI introduces a designer for choosing an efficient physical design at the server for a given workload, and a planner to choose an efficient execution plan for a given query at runtime. A prototype of MONOMI running on top of Postgres can execute most of the queries from the TPC-H benchmark with a median overhead of only 1.24× (ranging from 1.03× to 2.33×) compared to an un-encrypted Postgres database where a compromised server would reveal all data [TuKaMaZe13].


In order to execute queries that cannot be computed on the server alone, MONOMI partitions the execution of each query across the untrusted server, which has access only to encrypted data, and a trusted client machine, which has access to the decryption keys necessary to decrypt the encrypted data. Consider TPC-H query 11, as shown in Figure 2. This query requires checking whether a SUM() for each group is greater than a sub-select expression that computes its own SUM() and multiplies it by a constant. MONOMI’s encryption schemes cannot support such queries directly over encrypted data, as described in the previous section, because addition and comparison involve incompatible encryption schemes, and because no efficient encryption scheme allows multiplication of two encrypted values.

Figure 3. TPC-H query 11, with join clauses omitted for brevity

To answer such queries, MONOMI partitions the execution of the query between the client and the server, by constructing a SQL operator tree for the query, consisting of regular SQL operators and decryption operators that execute on the client, as well as RemoteSQL operators that execute a SQL statement over encrypted data on the untrusted server. For example, Figure 4 shows a potential split query plan for executing TPC-H query 11. The general algorithm for computing a split query plan is presented in Algorithm 1. This algorithm can handle arbitrary SQL queries, but we explain its operation using TPC-H query 11 as an example, as follows [TuKaMaZe13].

Lines 6–13 try to find a way to execute the WHERE n_name=:1 clause on the server. The REWRITESERVER(expr,E, enctype) function returns an expression for computing the value of expr at the server, encrypted with enctype, given a set of encrypted columns E. For enctype=PLAIN, REWRITESERVER produces an expression which generates the plaintext value of expr. In our example, this translates n_name=:1 to n_name_DET=encrypt(:1), which producesthe same (plaintext) boolean value without revealing n_name.

Lines 14–18 try to move the GROUP BY clause to the server, by using REWRITESERVER to find a deterministic encryption of the group keys (in our example, ps_partkey) by passing in a DET enctype. For our example, REWRITESERVER returns ps_partkey_DET, which is placed into RemoteQ, the query that will run on the server.

Lines 22–26 try to push the HAVING clause to the server, assuming that the GROUP BY can be performed on the server. Since in our example the HAVING clause cannot be computed at the server, REWRITESERVER returns Nil. To execute the HAVING clause on the client, line 26 uses the EXPRS(expr) helper function to determine what sub-expressions can be computed on the server in order to then compute the entire expr on the client, and adds those expressions to the list of projections fetched by RemoteQ. Since the HAVING clause involves a sub-select, EXPRS recursively calls back to GENERATEQUERYPLAN, which determines how to run the sub-select on the server. This eventually returns the RemoteSQL, LocalDecrypt, and LocalProjection operators shown in the lower right branch of Figure 4.

Lines 32–37 determine how to best fetch the projections from the server, passing the ANY enctype to REWRITESERVER since any encryption scheme would suffice. In our example, this translates ps_partkey to ps_partkey_DET, and SUM(..) into GROUP(precomp_DET). Here, we assume the server has a precomputed deterministic encryption of ps_supplycost*ps_availqty, denoted by precomp_DET, but does not have its homomorphic encryption (as specified in the E argument). Thus, the GROUP() operator concatenates all values from the group, and the SUM() will be computed at the client. The planner may choose this set of encryption schemes E if computing the SUM() at the client is faster than decrypting a homomorphic ciphertext. Finally, the algorithm constructs the local part of the query plan.

Line 38 constructs the RemoteSQL operator coupled with a LocalDecrypt operator, as shown in the two branches in Figure 4.

Lines 39–40 apply any remaining WHERE or JOIN clauses. Lines 41– 44 add operators for client-side GROUP BY or HAVING operations.

Figure 4. Example split query plan for TPC-H query 11. “Local” operators run on the trusted client, and “Remote” operators run on the untrusted server. GROUP() denotes the concatenation of all values from each GROUP BY group, 0xabcdef denotes the deterministic encryption of the argument :1, and precomp_DET denotes the deterministic encryption of the precomputed expression ps_supplycost * ps_availqty. $n refers to the n-th column of the child operator
Algorithm 1. Pseudo-code for GENERATEQUERYPLAN. For brevity, the pseudo-code assumes that the query is a SELECT statement and does not have ORDER BY and LIMIT components. The pseudo-code also does not keep track of the positions of projections necessary to reconstruct expressions on the client

Secure database Microsoft Cipherbase

Figure 5 presents the Cipherbase architecture. An application interacts with Cipherbase through the Cipherbase client module. The Cipherbase client presents a plaintext database interface and hides data confidentiality details from the application. This client performs query encryption before sending the request to the Cipherbase server and decrypts returning results before forwarding them to the application. To implement this functionality, the Cipherbase client takes the encryption schema (see Section II) and the client encryption key as input. The Cipherbase server consists of the trusted module (TM) and a modified (SQL Server) database system (the untrusted module, UM or UMDBMS). A client application interacts with Cipherbase just as it would with a standard database system: using ad-hoc queries, stored procedures, and SQL scripts. A client library (Cipherbase client) mediates between the application and the Cipherbase server, providing the application with a transparent interface. The data confidentiality requirement for each column can be independently specified using a column-level encryption scheme. Informally, Cipherbase ensures that whenever a value (or derived value) is in the cloud, outside of the secure hardware module, it is encrypted using the specified (or stronger) encryption scheme. The supported encryption schemes are vetted standards: (1) strong encryption, which is non-deterministic (meaning that multiple encryptions of the same plaintext will produce different ciphertext). This provides strong data-at-rest security—formally, indistinguishability under chosen-plaintext attack (IND-CPA). It is implemented in Cipherbase with AES in CTR mode; (2) deterministic encryption, which always generates the same ciphertext from a given plaintext value. This allows plaintext equality to be checked without first decrypting by directly comparing ciphertext values—useful for operations such as equality filters, joins, and grouping. It is implemented in Cipherbase with AES in ECB mode and format-preserving encryption [ArEgJoKaKoRa15].

Figure 5. Cipherbase Architecture



Move to bibliography for "Processing Queries on an Encrypted Database" article.

Move to article "Processing Queries on an Encrypted Database" in Russian.

Move to Main Page

Arsentiev I.A., 2015