Proof of data possession
Proof of data possession protocols (PDP) provide the integrity of test data stored on remote servers, and provide a number of requirements: easy and reliable test, which is based on various assumptions, security and computational efficiency.
Relevance of the use of Proof of data possession protocols
Modern technological advances have led to an increase in the popularity of the cloud storage and services. Digital content grows continuously, and therefore there grow the demand for data storage. The remote storage systems is growing interest on the part of users and companies, since they increase the cost-effectiveness of the existing architecture. These solutions support the transmission, remote storage and data-intensive. However, these forward-looking at first glance, in the design the cloud services generate a large number of complex issues, mostly related to the loss of end-user control of the basic properties of information in the process of remote processing. These issues are directly related to the state security of the cloud system. Critical security problems associate with the storage of sensitive data in the cloud services, with users required proof of safety of services that enable trusted providers of commercial cloud services. As storage-outsourcing services and resource-sharing networks have become popular, the problem of efficiently proving the integrity of data stored at untrusted servers has received increased attention. In the provable data possession (PDP) model, the client preprocesses the data and then sends it to an untrusted server for storage, while keeping a small amount of meta-data. The client later asks the server to prove that the stored data has not been tampered with or deleted (without downloading the actual data). However, the original PDP scheme applies only to static (or append-only) files. One of the most acute problems associated with the storage of data from remote services is the problem of ensuring the integrity and availability of data to untrusted systems. Typically, providers of cloud servers argue that storing user files made redundant, sufficient to provide protection from data loss. Most systems used solutions providing distributed data storage fragments. The distribution of data across multiple independent software and hardware platform provides resistance against the problems associated with the physical violation of the integrity of computing and disk resources (hardware problems, breakdowns, natural disasters and others). However, in order to reduce operating costs and keep a place in the "cloud" deliberately dishonest suppliers may neglect the replication procedures that can lead to permanent damage or data loss. The attacks of various kinds by third parties can also lead to data leakage. Consequently, the client who uses the cloud service, must have an effective way to periodically check the basic properties of information stored in the service, without having to download the data to a computer. Worry client increase its limited storage capabilities, relatively low processing power of client equipment and large amounts of information. The solution may be a deterministic protocol PDP, which, in turn, have a number of advantages:
Firstly, such protocols supported by the public that the data owners frees the necessity of periodic inspection.
Second, it provide a constant complex switching in which messages are exchanged between the storage server and the client.
Third, it's effective and safe, so it used the approach to ensure the confidentiality of the data is resistant to attack and eliminate the leak-tested data.
Proof of data possession is a protocol that allows the client to verify the data file stored on a remote server cloud, the availability of the original version. The client and server exchange messages according to the model Request - Answer.
Protocol PDP consists of four treatments: pre-process, inquiry, confirmation, check. The client C (data owner) preprocesses the file, generating a small piece of metadata that is stored locally, transmits the file to the server S, and may delete its local copy. The server stores the file and responds to challenges issued by the client. To verify ownership of a file, the client sends a random challenge to the server to check for evidence of the specified file. The server generates a proof in response. This calculation requires the possession of the original data and the data of the current request, to avoid repeated attacks. Upon receipt, the client compares the evidence with a locally stored file metadata.
PDP-protocols are widely analyzed in two categories, depending on the role of inspection in the protocol: private check (private verifiability), where only the data owner can check the data on the server and multiple (public) verification (public verifiability), where any competent authority can perform verification procedure.
The simplest PDP based on a hash function
The simplest solution for the development of the PDP is hash function-based way. The client pre-computes a k random calls and calculates the relevant evidence . During the procedure, the client sends a request server, which computes . If the comparison is made, the client assumes that the server stores the file with the correct data. This solution has the disadvantage - the client can verify the authenticity of the files on the server only k times.
Requirements for PDP-protocol
1. Public verifiability: the public data possession verification is an important requirement, permitting any authorized entity to verify the correctness of outsourced data. Thus, the data owner can be relieved from the burden of storage and computation. 2. Stateless verification: proofs should be generated according to a randomly produced challenge. Thus, stateless verification requires the use of unpredictable values. 3. Low computation overhead: on one hand, for scalability reasons, the amount of computation at the cloud storage server should be also minimized, as the server may be involved in concurrent interactions. On the other hand, the proposed algorithms should also have low processing complexity, at the client side. 4. Low communication overhead: an efficient PDP should minimize the usage of bandwidth, relying on low communication cost. 5. Low storage cost: the limited storage capacities of the user devices has a critical importance in designing our solution. So, low storage cost at the client side is highly recommended. 6. Unlimited challenges: the number of challenges should be unlimited. This condition is considered as important to the efficiency of a PDP scheme.
Probabilistic proof of ownership of the data
The protocol is based on the generation of probabilistic evidence of ownership of random sample sets of server units, which significantly reduces costs. The client maintains a constant amount of metadata to test the evidence. Request / response protocols transmit a small, constant amount of data which minimizes communication costs. The protoсol PDP for remote data monitoring is light weight and supports large data sets in a distributed storage system. Experiments using this implementation checked practicality PDP. It's performance is limited more by hardware rather than cryptographic computations.
Formulation of the problem. The client C wants to store on the server S a file F which is a finite ordered collection of F blocks: .We denote the output x of an algorithm A by . We denote by |x| the length of x (in bits). Homomorphic Verifiable Tags (HVTs). Lets introduce the concept of homomorphic checked the tag that will be used as a building blocks for systems PDP.
For the message b (out of a block of a file) will be denoted by its homomorphic checked tag. Tags will be stored on the server with the file F. Homomorphic tags checks serve to check the metadata file blocks. It’s genuine and have a tendency to Blockless-check: using HVTs, the server can build the evidence that will allow the client to check whether the server block definition files, even if the client does not It has access to the actual file blocks.
HVT is a pair of values , where Wi is a random value obtained from an index i and is stored at the server. The index i can be seen as a one-time index because it is never reused for computing tags (a simple way to ensure that every tag uses a different index i is to use a global counter for i). The random value Wi is generated by concatenating the index i to a secret value, which ensures that Wi is different and unpredictable each time a tag is computed. HVTs and their corresponding proofs have a fixed constant size and are (much) smaller than the actual file blocks. The client must be able to verify specific tag file block, even if the client does not possess any of these blocks.
Definition Protocol PDP in the context of the task. The protocol is a set of four PDP polynomial-time algorithmsKeygen, TagBlock, GenProof, CheckProof.
- an algorithm that runs on the server to generate the proof of ownership of data; takes the input public key pk', an ordered set of F units, the request chal' and ordered set of , which is a test of metadata corresponding block in the F. The algorithm returns the proof of ownership of the data for units F, which are defined by the query chal.
- an algorithm executed by the client to check for proof of ownership data server. Takes as input the public key pk, the secret key sk, request chal and proof of ownership . The algorithm returns a response, whether correct proof of ownership for units installed chal.
The two phases of construction of the PDP-protocol
Setup and Request.
Request. C generates the call of chal, indicating concrete blocks, for which C wants to check the proof of ownership. C sends chal to S. S launches and sends proof of ownership to C. Now, C can check the validity of the evidence , running the .
In phase Settings, C computes the tags for each block of the file and stores them with the file on S. In the phase Request C asks for the proof of ownership for a subset of blocks in F. This phase can be done unlimited number of times in order to find out whether the all S has selected block. Note that GenProof and CheckProof may receive different input values for chal, because these algorithms run by S and C, respectively.
Let and - simple and let - module of RSA. Let g - Generator , in a unique cyclic subgroup order . We can get g, such , where , such that . All exponentiation performed modulo N, and for simplicity, sometimes it will be skipped. Let - deterministic hash function that displays rows evenly in .
In summary, look at two sheme embodiments of the algorithm.
Go to list of references to the section " Proof of Data Possession, PDP "
Kiryakina, S. 2015