# Privacy-preserving deep learning


# Introduction

Fueled by the massive influx of data and advanced algorithms, modern deep neural networks (DNNs) have benefited IoT applications across a spectrum of domains, including visual detection, smart security, audio analytics, health monitoring, and infrastructure inspection. In recent years, the efficient integration of DNNs and IoT has received increasing attention from both academia and industry. DNN-driven applications typically follow a two-phase paradigm: 1) a training phase, wherein a model is trained using a training dataset; and 2) an inference phase, wherein the trained model outputs results (e.g., a prediction, decision, or recognition) for a piece of input data. For deployment on IoT devices, the inference phase is mainly used to process data collected on the fly. Because complex DNN inference tasks can involve a huge number of computational operations, executing them on resource-constrained IoT devices is challenging, especially for time-sensitive tasks. For example, a single inference using popular DNN architectures for visual detection (e.g., AlexNet, FaceNet, and ResNet) can require billions of operations. Moreover, many IoT devices are battery-powered, and executing such complex inference tasks quickly drains the battery. To relieve IoT devices of this heavy computation and energy consumption, outsourcing complex DNN inference tasks to public cloud computing platforms has become a popular choice in the literature. However, this type of “cloud-backed” system raises privacy concerns when the data sent to remote cloud servers contain sensitive information.

# Background and problem formulation

The computational flow of a DNN inference consists of multiple linear and non-linear computational layers. The input of each layer is a matrix or a vector, and the output of each layer is used as the input of the next layer until the last layer is reached. In this project, we investigate the convolutional neural network (CNN), an important representative of DNNs, as an example. In a CNN, the linear operations of an inference are mainly performed in fully-connected (FC) and convolution (CONV) layers. Non-linear layers (e.g., activation and pooling layers) are typically placed after a CONV or FC layer to perform data transformation. In CONV and FC layers, the dot product operation DoT(·) is executed repeatedly. Specifically, an FC layer takes a vector v ∈ R^n as input and outputs y ∈ R^m via the linear transformation y = W · v + b, where W ∈ R^{m×n} is the weight matrix and b is the bias vector. During the calculation of W · v, m dot products are computed as y[i] = DoT(W[i, :], v), 1 ≤ i ≤ m. In a CONV layer, an input matrix X ∈ R^{n×n} is processed by H kernels. A (k × k) kernel K scans the matrix from the top-left corner, moving from left to right. Each scan is a linear transformation that takes a (k × k) window of the input matrix, computes its dot product with the kernel, and adds a bias term to the result.
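The layer computations above can be sketched directly in NumPy. This is an illustrative plaintext reference (stride 1, no padding, a single kernel), not the paper's implementation:

```python
import numpy as np

def fc_layer(W, v, b):
    """Fully-connected layer: y[i] = DoT(W[i, :], v) + b[i], 1 <= i <= m."""
    m = W.shape[0]
    y = np.empty(m)
    for i in range(m):
        y[i] = np.dot(W[i, :], v) + b[i]   # m dot products in total
    return y

def conv_layer(X, K, b):
    """Single-kernel CONV layer with stride 1 and no padding: each scan
    takes a (k x k) window of X, computes its dot product with the
    kernel K, and adds the bias term b."""
    n, k = X.shape[0], K.shape[0]
    out = np.empty((n - k + 1, n - k + 1))
    for r in range(n - k + 1):
        for c in range(n - k + 1):
            window = X[r:r + k, c:c + k]
            out[r, c] = np.sum(window * K) + b
    return out
```

A full CONV layer applies H such kernels, producing H output channels.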

As depicted in Fig. 2, our framework involves two non-colluding edge computing servers, resource-constrained IoT devices, and the device owner.

- Edge servers: we consider two non-colluding servers, denoted EdgeA and EdgeB, that are deployed close to the IoT devices. Each edge server, e.g., a regular laptop, is capable of efficiently processing DNN inference tasks over plaintext. Each edge server obtains the linear layers of a trained DNN model from the device owner. EdgeA and EdgeB process encrypted DNN inference requests from IoT devices in a privacy-preserving manner. The multi-server architecture, wherein at least one server does not collude with the others, has been widely adopted to balance security and efficiency in privacy-preserving outsourcing.
- IoT devices: we consider resource-constrained IoT devices with limited computing capability and battery life. These devices collect data and need to process them on the fly using DNN inference.
- Device owner: the device owner has pre-trained DNN models and deploys IoT devices for service.

In this project, we focus on designing a framework in which an IoT device can outsource the majority of the computation in a DNN inference task to two non-colluding edge servers in a privacy-preserving manner. At the end of the inference, the IoT device obtains the result over its input data, whereas the two edge servers learn nothing about the input data, the intermediate outputs, or the final inference result. As all IoT devices are deployed by the owner, he/she has access to all data collected and processed by these devices when necessary.

# Privacy-preserving outsourcing of DNN inference

In our framework, the IoT device outsources the execution of the linear (CONV and FC) layers and keeps the compute-efficient non-linear layers local. Without loss of generality, we consider a DNN that contains q CONV and FC layers, each followed by non-linear activation layers where necessary. We use µ to denote the bit length of an element in the input matrix of a CONV layer or the input vector of an FC layer, and λ to denote the security parameter. Random numbers used in our design are λ-bit values generated with a pseudorandom function F(·). Our framework has three major phases: Setup, Data Encryption, and Privacy-Preserving Execution. In Setup, the owner prepares a pre-trained DNN model and generates the encryption and decryption keys for the IoT device. When the IoT device needs to perform DNN inference over its collected data, it executes the Data Encryption phase to encrypt the data and send them to the two edge servers. The DNN inference is then executed in the Privacy-Preserving Execution phase. All outsourced DNN operations performed by the edge servers are over encrypted data.

## Detailed Construction

Setup: To set up the framework, the device owner prepares a trained DNN model and sends its q linear layers (CONV and FC) to EdgeA and EdgeB. For the i-th linear layer, the owner generates a pair of encryption and decryption keys {S_i,in, S_i,out}, 1 ≤ i ≤ q. As presented in Algorithm 1, S_i,in for the i-th linear layer is randomly generated according to the input dimension of that layer, and each element of S_i,in is a λ-bit random number. S_i,out is the output of the i-th linear layer when it takes S_i,in as input. The key pairs {S_i,in, S_i,out}, 1 ≤ i ≤ q, are deployed on the IoT device for later privacy-preserving DNN inference tasks.
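The owner-side key generation can be sketched as follows. The function name and layer representation (a callable plus its input shape) are hypothetical, and a NumPy generator stands in for the pseudorandom function F(·):

```python
import numpy as np

def keygen(linear_layers, lam=32):
    """Sketch of Algorithm 1 (assumed structure): for the i-th linear
    layer, S_in is drawn randomly to match the layer's input dimension,
    each element being a lam-bit random number, and S_out is simply the
    layer's output on S_in."""
    rng = np.random.default_rng()
    keys = []
    for f, in_shape in linear_layers:
        S_in = rng.integers(0, 2**lam, size=in_shape).astype(np.float64)
        S_out = f(S_in)              # output of the layer on the random input
        keys.append((S_in, S_out))
    return keys
```

The resulting list of (S_in, S_out) pairs is what gets deployed on the IoT device.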

Data Encryption: When the IoT device needs a DNN inference, it outsources the execution of the linear layers to EdgeA and EdgeB in a privacy-preserving manner. Specifically, for the i-th linear layer, its input is encrypted and sent to EdgeA and EdgeB for processing. The intermediate results returned by the edge servers are decrypted by the IoT device and then fed into the following non-linear layers. The output of the non-linear layers is used as the input of the (i + 1)-th linear layer. This process is conducted interactively until all layers of the DNN are executed, as shown in Algorithm 2.
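Conceptually, each outsourced linear layer is a three-step round trip: mask, evaluate, unmask. The sketch below simplifies to a single server (the real framework splits the work between EdgeA and EdgeB) and assumes the device can cancel the layer's bias term during decryption; both are assumptions of this illustration, not the paper's exact protocol:

```python
def encrypt(x, S_in):
    """Device-side: additive one-time mask over the layer input."""
    return x + S_in

def edge_evaluate(f, e):
    """Edge-side: run the linear layer f directly on the masked input."""
    return f(e)

def decrypt(result, S_out, bias):
    """Device-side: for an affine layer f, f(x + S_in) = f(x) + (S_out - bias),
    so subtracting the key-dependent offset recovers f(x). Assumes the
    device knows the layer's bias (a simplification of this sketch)."""
    return result - (S_out - bias)
```

Because the mask is removed exactly rather than approximately, the recovered layer output is identical to the plaintext computation.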

With this design, the IoT device can efficiently handle each layer in a DNN. The compute-intensive linear (CONV and FC) layers are securely outsourced to the edge after encryption, while the compute-efficient non-linear layers are handled directly by the IoT device. Since our privacy-preserving solution is a general design, it can be customized and plugged into other DNN architectures.

# Security analysis

In this section, we first prove that the encryption algorithm used in our framework is CPA-secure. Then, the security of the overall DNN inference outsourcing is analyzed.

# Performance evaluation

The theoretical analysis of our framework is summarized in Table II. For simplicity of expression, we use one floating point operation (FLOP) to denote an addition or a multiplication. Compared with ref and with outsourcing the CONV (or FC) layer without privacy protection, our framework achieves the same computational cost on each edge server. While our framework doubles the computational cost on the IoT device compared with ref, this cost is significantly less than the amount of outsourced computation, as shown in Table II. With regard to the communication cost, our framework transmits twice as many elements as ref. This is caused by the two-edge-server design, since the IoT device needs to communicate with both edge servers. Fortunately, we integrate a data compression technique into our design, which shrinks the size of the transmitted ciphertext by more than 70%. As a result, the proposed framework achieves better communication performance than ref, as shown in the case studies of AlexNet and FaceNet (next paragraph). More importantly, ref has a storage-overhead bottleneck that grows linearly with the number of DNN requests to be executed: each DNN inference requires ref to pre-store a new set of keys to support its privacy-preserving outsourcing. If the IoT device in ref instead generates a new set of keys for each DNN inference on the fly, the computational cost equals that of executing a complete DNN inference, which is not only time-consuming on the IoT device but also quickly drains its battery. In comparison, the proposed framework only requires the IoT device to store one set of keys for the entire deployment life cycle.
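To give a feel for the scale of the outsourced computation, the FLOP counts of the two linear layer types can be estimated as below, counting each multiplication or addition as one FLOP per the convention above. The formulas are standard estimates, not the exact entries of Table II:

```python
def conv_flops(n, k, c_in, c_out):
    """Approximate FLOPs of a stride-1, no-padding CONV layer on an
    (n x n x c_in) input with c_out (k x k x c_in) kernels: each output
    element needs k*k*c_in multiplications and about as many additions."""
    out = n - k + 1                      # output spatial dimension
    per_output = 2 * k * k * c_in
    return out * out * c_out * per_output

def fc_flops(n_in, n_out):
    """FLOPs of an FC layer: n_out dot products of length n_in,
    each costing about n_in multiplications and n_in additions."""
    return 2 * n_in * n_out
```

Summing such terms over a deep architecture quickly reaches the billions of FLOPs cited for AlexNet-scale networks, which is exactly the workload shifted off the IoT device.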

# Related work

The problem of privacy-preserving neural network inference (or prediction) has been studied in recent years in the cloud computing environment. These works focus on the “machine learning as a service” scenario, wherein the cloud server has a trained neural network model and users submit encrypted data for prediction. One recent line of research uses somewhat or fully homomorphic encryption (HE) to evaluate the neural network model over encrypted inputs after approximating the non-linear layers of the network. Combining multiple secure computation techniques (e.g., HE, secure multi-party computation (MPC), and oblivious transfer (OT)) is another trend in supporting privacy-preserving neural network inference. The idea behind these mixed protocols is to evaluate scalar products using HE and non-linear activation functions using MPC techniques. In particular, SecureML utilizes the mixed-protocol framework proposed in ABY, which involves arithmetic sharing, boolean sharing, and Yao’s garbled circuits, to implement both privacy-preserving training and inference in a two-party computation setting. MiniONN supports privacy-preserving inference by transforming neural networks into a corresponding oblivious version with the Single Instruction Multiple Data (SIMD) batching technique. Chameleon invokes a trusted third party and hence greatly reduces the computation and bandwidth cost of a privacy-preserving inference. GAZELLE leverages lattice-based packed additive homomorphic encryption (PAHE) together with two-party secure computation techniques; it deploys PAHE in an automorphism manner to achieve fast matrix multiplication/convolution and thus boosts run-time efficiency. A multi-server solution, named SecureNN, greatly improves privacy-preserving inference performance, running 42.4× faster than MiniONN, and 27× and 3.68× faster than Chameleon and GAZELLE, respectively.
While the performance of evaluating neural networks over encrypted data for inference keeps improving, existing research works focus only on small-scale neural networks. Taking the state-of-the-art SecureNN as an example, the network-A it evaluates requires only about 1.2 million FLOPs for an inference, which costs 3.1 s with wireless communication in their 3PC setting. As a comparison, the AlexNet evaluated in our framework involves 2.27 billion FLOPs for one inference, which costs 3.08 s in our framework at a similar wireless transmission speed. It is also worth noting that SecureNN uses a powerful cloud server (36 vCPUs, 132 ECUs, 60 GB memory) for evaluation, whereas the edge computing device in this paper is just a regular laptop. Scaling up the network size is not a trivial task: compared with the type-A network in [17], the type-C network with 500× more multiplications increases the computational and communication costs by 430× and 592×, respectively. Recent research has proposed an online/offline strategy that improves the efficiency of online real-time DNN inference on IoT devices via offline precomputation. While this scheme boosts efficiency on complex DNN architectures, its practical deployment is limited by a storage-overhead bottleneck. Specifically, each set of pre-computed keys stored on the IoT device can only be used for one DNN inference request. In practice, IoT devices are typically deployed to continually collect and process data (using DNN inference in ref). Taking AlexNet and FaceNet as examples, the pre-computed keys in ref occupy 20.49 GB and 74.91 GB, respectively, to support only 1,000 requests. If the IoT device instead generates a new set of keys by itself, the computational cost is the same as executing a complete DNN inference.

# Conclusion

In this paper, we proposed a two-edge-server framework that enables resource-constrained IoT devices to efficiently execute DNN inference requests with privacy protection. The proposed framework designs a lightweight encryption scheme to provide private and efficient outsourcing of DNN inference tasks. Exploiting the fact that linear operations in DNNs over the input and over random noise can be separated, our scheme generates decryption keys that remove the random noise, thus boosting the performance of real-time DNN requests. By integrating local edge devices, our framework ameliorates network latency and service availability issues. Moreover, the proposed framework makes privacy-preserving operation over encrypted data on the edge device as efficient as operation over unencrypted data. In addition, the privacy protection in our framework does not introduce any accuracy loss to the DNN inference, since no approximation is required for any DNN operation. A random-sampling-based integrity checking strategy is also proposed, which enables IoT devices to detect computation errors contained in the results returned by the edge servers. A thorough security analysis shows that our framework is secure in the defined threat model. Extensive numerical analysis as well as a prototype implementation over well-known DNN architectures demonstrate the practical performance of our framework.


# Bibliography

https://eprint.iacr.org/2020/155.pdf
