NASS: Optimizing Secure Inference via Neural Architecture Search
Abstract
Due to increasing privacy concerns, neural network (NN) based secure inference (SI) schemes that simultaneously hide the client inputs and the server models have attracted major research interest. While existing works focus on developing secure protocols for NN-based SI, in this work we take a different approach. We propose NASS, an integrated framework that searches for NN architectures tailored specifically for SI. In particular, we propose to model cryptographic protocols as design elements with associated reward functions. The characterized models are then adopted in a joint optimization with predicted hyperparameters to identify the NN architectures that best balance prediction accuracy and execution efficiency. Our experiments demonstrate that NASS achieves the best of both worlds: on the CIFAR-10 dataset, the prediction accuracy improves from 81.6% to 84.6%, while the inference runtime is reduced by 2x and the communication bandwidth by 1.9x.
1 Introduction
With growing concerns over the security risks of property stealing [11] and private information leakage [20] in machine-learning-as-a-service schemes, the security properties of both neural network (NN) training [15] and inference [12] have become one of the most important research topics across disciplines. In particular, a secure inference (SI) scheme refers to the setting where Alice, as a client, wants to hide her inputs from Bob, the NN service provider. Meanwhile, Bob also needs to protect his trained network model, since such a model is extremely valuable due to costly dataset preparation and lengthy training processes.
While a number of protocols have been proposed for both secure inference and training on neural networks [15, 16, 18, 12], the general approach of existing works is to find equivalent NN operations (e.g., matrix-vector products, activation functions) in the secure domain (e.g., using garbled circuits or homomorphic encryption) and instantiate the secure protocols accordingly. In other words, security is not an integral part of the design but rather an added feature that, in many cases, carries serious performance penalties.
Recent advances in secure machine learning suggest that secure protocols can be formulated as a design automation problem. For example, the authors of [3] proposed a framework that automatically instantiates parameters for homomorphic encryption (HE) schemes. Likewise, a line of research [12, 10, 7] has explored how to optimize HE parameters and packing capabilities to improve the efficiency of secure computations, especially for neural-network-based protocols. In all of these works, the secure primitives are designed to maximize the efficiency of a predefined neural architecture (in fact, many of them use the same manually designed architecture).
We argue that design techniques based on fixed neural architectures lead to unsatisfactory solutions, as the efficiency of SI (in terms of inference time and network bandwidth) is significantly affected by the architecture. Fig. 1 demonstrates the performance nonlinearities of cryptographic primitives by plotting the computational and communication costs of SI with respect to the quantization factors of a neural architecture. From Fig. 1, we can see that for certain quantization intervals (e.g., from 14-bit to 15-bit), the inference time doubles, while for other intervals (e.g., from 2-bit up to 15-bit), the inference time remains unchanged. This nonlinear performance curve is primarily due to the underlying primitive (in this case, a packed additive homomorphic encryption (PAHE) scheme), which is constrained by its cryptographic parameters. We conclude that, to obtain better design tradeoffs, a joint exploration of both secure primitives and neural architectures is required to push forward the Pareto frontier of efficiency and prediction accuracy for NN-based SI.
In this work, we propose NASS, a novel Neural Architecture Search framework for Secure inference, which integrates the optimization of cryptographic primitives and NN prediction accuracy. To the best of our knowledge, we are the first to take a synthetic approach that improves both the accuracy and the efficiency of SI on NNs. In NASS, finding the best SI scheme is formulated as predicting the most rewarding neural architecture, where the reward is derived from accuracy and efficiency statistics. A system optimizer based on reinforcement learning takes feedback from the rewards to generate new architectures, acting as a neural architecture search (NAS) engine. Our main contributions are summarized as follows.

Synthesizing Secure Architectures: To the best of our knowledge, NASS is the first work to search for neural architectures optimized for secure applications with multiple cryptographic building blocks. In NASS, cryptographic primitives are modeled as design elements, and secure computations with these elements become abstract operators that can be automatically synthesized by the optimization engine.

Optimizing HE Parameters: While existing works have already treated the instantiation of HE parameters as a design problem and proposed some solutions [3], we point out that these solutions are not adequate. In particular, we identify an optimization dilemma in learning with errors (LWE) based HE parameter instantiation, and observe that this optimization problem is (computationally) rather difficult to solve, especially for NAS-based optimization that requires fast turnaround times.

A Thorough Architectural Search for SI: Through extensive architectural search, we demonstrate that the cost of SI can be reduced while the prediction accuracy is improved. We achieve a prediction accuracy of 84.6% on the CIFAR-10 dataset while reducing the computation time by 2x and the network bandwidth by 1.9x, compared to the best known SI scheme [12], which achieves a prediction accuracy of only 81.6%.
The rest of this paper is organized as follows. First, in Section 2, the basics of PAHE, secure inference, and NAS are discussed. Second, the NASS framework is outlined in Sections 3 and 4, where we detail how security and parameter analyses can be systematically performed, along with the design of reward functions for integration with the NAS engine. Next, the output architectures of NASS, along with performance statistics, are presented in Section 5. Finally, our work is summarized in Section 6.
2 Preliminaries
2.1 Cryptographic Building Blocks
In this work, we mainly consider optimizations involving two types of cryptographic primitives: packed additive homomorphic encryption (PAHE) based on the ring learning with errors (RLWE) problem [5, 9, 4, 6], and garbled circuits (GC) [23]. In what follows, we provide a high-level abstraction of each primitive.
PAHE: A PAHE is a cryptosystem whose encryption ($\mathrm{Enc}$) and decryption ($\mathrm{Dec}$) functions act as group (additive) homomorphisms between the plaintext and ciphertext spaces. Besides $\mathrm{Enc}$ and $\mathrm{Dec}$, a PAHE scheme is equipped with the following three abstract operators. We use $[x]$ to denote the ciphertext of a plaintext vector $x \in \mathbb{Z}_p^n$, where $p$ is the plaintext modulus and $n$ is some lattice dimension.

Homomorphic addition ($\boxplus$): for $x, y \in \mathbb{Z}_p^n$, $\mathrm{Dec}([x] \boxplus [y]) = x + y$.

Homomorphic Hadamard product ($\boxdot$): for $x, y \in \mathbb{Z}_p^n$, $\mathrm{Dec}([x] \boxdot y) = x \circ y$, where $\circ$ is the elementwise multiplication operator.

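Homomorphic rotation ($\mathrm{rot}$): for $x \in \mathbb{Z}_p^n$, let $x' = (x_k, x_{k+1}, \ldots, x_{n-1}, x_0, \ldots, x_{k-1})$; then $\mathrm{Dec}(\mathrm{rot}([x], k)) = x'$ for $k \in \{0, \ldots, n-1\}$.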
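To make the three operator semantics concrete, the following minimal Python sketch models only what each operator does to the underlying plaintext slots. It performs no encryption: a "ciphertext" is represented by the vector it would decrypt to, and `hom_add`, `hom_hadamard`, and `hom_rot` are hypothetical helper names, not the API of any PAHE library.

```python
from typing import List

# Plaintext-only model of the PAHE operators: a "ciphertext" [x] is represented
# by the slot vector x it would decrypt to. No encryption happens here.
Vec = List[int]


def hom_add(x: Vec, y: Vec, p: int) -> Vec:
    """Semantics of [x] (+) [y]: slot-wise addition modulo p."""
    return [(a + b) % p for a, b in zip(x, y)]


def hom_hadamard(x: Vec, y: Vec, p: int) -> Vec:
    """Semantics of [x] (.) y: slot-wise (Hadamard) product modulo p."""
    return [(a * b) % p for a, b in zip(x, y)]


def hom_rot(x: Vec, k: int) -> Vec:
    """Semantics of rot([x], k): cyclic left rotation of the slots by k."""
    k %= len(x)
    return x[k:] + x[:k]
```

For example, with $p = 17$, `hom_hadamard([1, 2, 3, 4], [5, 6, 7, 8], 17)` yields `[5, 12, 4, 15]`, and `hom_rot([1, 2, 3, 4], 1)` yields `[2, 3, 4, 1]`.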
GC: GC can be considered a more general form of HE. In particular, the circuit garbler, Alice, “encrypts” some function $f$ along with her input and sends them to Bob, the circuit evaluator. Bob evaluates $f$ using his encrypted input, which he receives from Alice obliviously, and obtains the encrypted outputs. Alice and Bob then jointly “decrypt” the output of the function, and one of the two parties learns the result.
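As a toy illustration of the garble/evaluate split described above, the sketch below garbles a single AND gate with random 16-byte wire labels, using SHA-256 of the two input labels as the row-encryption pad and a zero tag so the evaluator can recognize the one row it can open. It is insecure and simplified by assumption: the oblivious transfer through which Bob obtains his input label is replaced by a direct hand-off, and optimizations that real GC frameworks typically add (e.g., point-and-permute, free-XOR) are omitted.

```python
import hashlib
import os
import random
from typing import Dict, List, Tuple

LABEL_LEN = 16  # bytes per wire label (toy choice)


def _pad(la: bytes, lb: bytes) -> bytes:
    """32-byte pad derived from the two input-wire labels."""
    return hashlib.sha256(la + lb).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def garble_and() -> Tuple[Dict[str, Tuple[bytes, bytes]], List[bytes], Dict[bytes, int]]:
    """Garbler (Alice): two random labels per wire and an encrypted truth table."""
    labels = {w: (os.urandom(LABEL_LEN), os.urandom(LABEL_LEN)) for w in "abc"}
    table = []
    for x in (0, 1):
        for y in (0, 1):
            row_plain = labels["c"][x & y] + bytes(LABEL_LEN)  # output label || zero tag
            table.append(_xor(_pad(labels["a"][x], labels["b"][y]), row_plain))
    random.SystemRandom().shuffle(table)  # hide which row matches which inputs
    decode = {labels["c"][0]: 0, labels["c"][1]: 1}  # output decoding table
    return labels, table, decode


def evaluate(table: List[bytes], la: bytes, lb: bytes) -> bytes:
    """Evaluator (Bob): only the matching row decrypts to a zero tag."""
    pad = _pad(la, lb)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == bytes(LABEL_LEN):
            return plain[:LABEL_LEN]
    raise ValueError("no garbled row could be opened")
```

Alice sends `table` and her input label `labels["a"][x]`; Bob obtains `labels["b"][y]` (through OT in a real protocol), calls `evaluate`, and the decoding table reveals $x \wedge y$ without exposing the other labels.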
2.2 Homomorphic Evaluation Errors in RLWE-based PAHE
In this work, we omit the implementation details of RLWE-based PAHE schemes, such as BFV [5, 9], BGV [4], and CKKS [6]. However, in all of these schemes (and in most RLWE-based cryptosystems), the ciphertext output by the encryption function bears some intrinsic error, which can be thought of as an additive component of the ciphertext, i.e.,
$[x] = c_x + e_x$    (1)
where $c_x$ is the “errorless” ciphertext and $e_x$ is the error (both are vectors in $\mathbb{Z}_q^n$, where $q$ is the ciphertext modulus and $n$ the lattice dimension). When we add two ciphertexts, $[x] \boxplus [y]$, the errors also add up (i.e., the resulting error is $e_x + e_y$). Similarly, homomorphic Hadamard products and rotations also increase the error. Therefore, each level of homomorphic evaluation increases the error contained in the ciphertext, and when the error becomes too large (i.e., after too many levels of homomorphic evaluation), some ciphertexts can no longer be decrypted correctly. We emphasize that not all ciphertexts become undecipherable, as the size of the error is randomly distributed; this probabilistic behavior can be exploited to improve the efficiency of SI schemes.
2.3 Secure Neural Network Inference
While a number of pioneering works have established the concept of secure inference and training with neural networks [15, 16, 18], only recently have such protocols gained practical significance. For example, in [15], an inference on a single CIFAR-10 image takes more than 500 seconds to complete. Using the same neural architecture, Gazelle [12], one of the most recent state-of-the-art SI schemes, reduces this to less than 13 seconds. Unfortunately, 13 seconds per image is still unsatisfactory, especially given the large amount of data exchanged in real-world applications. Therefore, we adopt the Gazelle protocol in this work and take a system-level approach to improve its efficiency.
An overview of the Gazelle protocol is outlined in Fig. 2, where Alice wants to classify some input (e.g., an image) and Bob holds the weights. The Gazelle protocol classifies all NN operations into two types of layers: i) linear layers, where the computations are efficiently carried out by PAHE-based cryptographic primitives, and ii) nonlinear layers, where interactive protocols such as multiplication triples [2] or GC are employed.
Threat Model: The threat model in Gazelle and in this work is that both Bob and Alice are semi-honest, in the sense that both parties follow the described protocol (e.g., the encryption and decryption procedures in PAHE and GC), but want to learn as much information as possible from the other party. In particular, Alice wishes to gain knowledge of the trained model from Bob, and Bob is curious about the encrypted inputs from Alice.
2.4 Neural Architecture Search
Recently, Neural Architecture Search (NAS) has consistently broken accuracy records in a variety of machine learning applications, such as image classification [24], image segmentation [?], video action recognition [?], and many more. NAS has attracted major attention mainly because it eliminates the need for human expertise and labor in identifying high-accuracy neural architectures.
A typical NAS engine, such as that in [24], is composed of a controller and a trainer. The controller iteratively predicts (i.e., generates) neural architecture parameters, referred to as child networks. Each child network is trained from scratch by the trainer, and its prediction accuracy is obtained on a held-out dataset. The accuracy is then fed back to update the controller. Finally, once the number of child networks predicted by the controller exceeds a predefined threshold, the search process terminates, and the searched architecture with the highest accuracy is identified as the output of the NAS engine.
Existing works have demonstrated that automatically searched neural architectures can achieve accuracy close to that of the best human-invented architectures [25, 24]. However, without proper performance measures, the identified architectures can have overly complex structures [?, ?, ?, ?, ?] that render them impractical in real-world applications, especially in security-related schemes. Therefore, the main motivation for establishing the NASS framework is to find neural architectures that are both accurate and efficient for secure inference.
3 NASS Framework
3.1 Problem Formulation and Challenges
In this paper, we aim to identify the most efficient secure neural network inference via neural architecture search. The problem is informally defined as follows: given a specific dataset and a set of secure inference protocols, our objective is to automatically generate a quantized neural network architecture and the parameters of each cryptographic primitive, such that the reward of the resulting neural network after training is maximized. Here, we define the reward as a function of the prediction accuracy and performance statistics, including the inference time and network bandwidth.
To solve the above problem, several challenges need to be addressed from both the neural architecture search perspective and the secure protocol perspective. We list two main challenges as follows.
Challenge 1: There are missing links among neural architecture optimization, quantization optimization, and cryptographic protocol optimization, resulting in suboptimal solutions in existing works. This is based on our observation that all of these optimizations are tightly cross-coupled; that is, optimization in one direction (e.g., better prediction accuracy) can have a positive or negative impact on the others (e.g., a larger quantization level and a higher secure inference time). Therefore, a framework that jointly optimizes neural architectures, quantizations, and the performance of cryptographic primitives is needed. In this work, we derive the NASS framework (Sections 3.2 and 3.3) to fill this gap.
Challenge 2: To the best of our knowledge, there exists no efficient performance estimator for secure inference (SI) involving multiple cryptographic primitives. Since the NAS engine iteratively generates a large number of intermediate architectures, it is impossible to evaluate the performance statistics of these networks in SI without an automatic performance estimator. In this paper, we contribute efficient estimator engines (Section 4).
3.2 Overview of NASS
Fig. 3 illustrates an overview of the proposed NASS framework with four components: ➀ ParmGen, ➁ Machine Learning (ML) Estimators, ➂ Cryptographic Estimators, and ➃ Controller. Specifically, component ➀ parameterizes the architecture and quantization, which identifies a unique neural architecture for the subsequent computations. Upon receiving the input architecture from component ➀, component ➁ trains it and evaluates its accuracy, and component ➂ optimizes the cryptographic primitives to estimate its performance. Finally, component ➃ controls the optimization flow. All of the components collaboratively explore the parameter spaces of neural architectures, quantizations, and cryptographic primitives to jointly optimize accuracy, time, and bandwidth.
The NASS framework works in three steps. First, the controller predicts a quantized neural architecture (called a child network), which is formulated as a parameter set $\mathcal{P}$ (detailed in Section 3.3). Second, the child network is evaluated by the ML estimators to obtain its prediction accuracy ($A$), and optimized by the cryptographic estimators to provide inference time ($T$) and bandwidth ($B$) feedback. Lastly, a reward signal is generated from $A$, $T$, and $B$ to update the controller. Details of each component are introduced in Section 3.3.
In practice, the lengthy training process dominates the search time. In NASS, we add a switch before the architecture parameter (AP) subcomponent to dramatically reduce the number of training processes. This is based on the observation that different quantization parameters for the same architecture can be evaluated using the same trained (floating-point) weights. If the switch is on ($\delta = 1$), we train the architecture from scratch to generate weights and obtain accuracy statistics under the given quantization. Otherwise ($\delta = 0$), we reuse the weights and apply the new quantization parameters to obtain the accuracy.
The switch can be controlled by using a predefined function. In this work, we demonstrate the exploration procedure using the following function:
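$\delta(i) = \begin{cases} 1, & \text{if } i \bmod \kappa = 0, \\ 0, & \text{otherwise,} \end{cases}$    (2)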
where $i$ is the episode index assigned to each child network predicted by the controller, and $\kappa$ is a scalar indicating the number of child networks with the same architecture but different quantizations to be explored.
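Under the assumption that Eq. (2) retrains whenever the episode index $i$ is a multiple of $\kappa$, the switch and its effect on the ML estimator can be sketched as follows. `switch`, `ml_estimator`, and the `train` / `quantize_and_eval` callbacks are hypothetical names standing in for the NASS components.

```python
from typing import Any, Callable, Dict


def switch(i: int, kappa: int) -> int:
    """Assumed form of Eq. (2): 1 = train a new architecture, 0 = reuse weights."""
    return 1 if i % kappa == 0 else 0


def ml_estimator(i: int, kappa: int, arch: Any, quant: Any,
                 train: Callable[[Any], Any],
                 quantize_and_eval: Callable[[Any, Any], Any],
                 state: Dict[str, Any]) -> Any:
    """Sketch of component 2: train only when the switch is on."""
    if switch(i, kappa) == 1:
        # delta = 1: a new architecture arrives; train it from scratch in floating point
        state["weights"] = train(arch)
    # In both cases, quantize the cached floating-point weights and evaluate accuracy
    return quantize_and_eval(state["weights"], quant)
```

With $\kappa = 3$, episodes 0, 3, 6, ... trigger training, and the two episodes in between only re-quantize the cached weights, so the number of training runs drops by a factor of $\kappa$.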
3.3 NASS Framework Details
➀ ParmGen. The ParmGen block generates layer parameters for the subsequent computations. A neural architecture consists of a set of layers. According to the linearity of each layer's function, there are two types of layers: linear layers (e.g., convolution, fully connected) and nonlinear layers (e.g., ReLU, pooling). Each layer is specified by a set of parameters, and different types of layers have different parameters. We use $\mathcal{P}^{\mathrm{lin}}$ and $\mathcal{P}^{\mathrm{nl}}$ to denote the parameters of linear and nonlinear layers, respectively.
For linear layers, the set of layer parameters is denoted as $\mathcal{P}^{\mathrm{lin}} = \{x_h, x_w, f_h, f_w, Q_x, Q_w, c_i, c_o\}$, where $x_h$ and $x_w$ represent the dimensions of the feature maps (i.e., input data); $f_h$ and $f_w$ represent the dimensions of the filters (i.e., weights); $Q_x$ and $Q_w$ indicate the data and weight quantizers; and $c_i$ and $c_o$ stand for the numbers of input and output channels.
Nonlinear layers do not contain weights and therefore have no parameters for filter quantization or dimensions. In addition, the number of channels does not change, so we only record the input channel number $c_i$. Consequently, we denote the parameter set of a nonlinear layer as $\mathcal{P}^{\mathrm{nl}} = \{x_h, x_w, Q_x, c_i\}$.
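A neural architecture can then be represented by the collection of parameters of all its layers, i.e., $\mathcal{P} = \{\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_L\}$, where $\mathcal{P}_l$ represents the $l$-th layer.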
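One way to hold $\mathcal{P}^{\mathrm{lin}}$, $\mathcal{P}^{\mathrm{nl}}$, and $\mathcal{P}$ in code is with frozen dataclasses, which also makes an architecture hashable (useful as a cache key when the same architecture is revisited with new quantizers). The class and field names below are our own illustrative choices, and the two counting helpers are hypothetical conveniences rather than part of NASS.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class LinearLayer:
    """P^lin = {x_h, x_w, f_h, f_w, Q_x, Q_w, c_i, c_o}."""
    x_h: int      # feature-map height
    x_w: int      # feature-map width
    f_h: int      # filter height
    f_w: int      # filter width
    quant_x: int  # data quantizer Q_x (bits)
    quant_w: int  # weight quantizer Q_w (bits)
    c_i: int      # input channels
    c_o: int      # output channels


@dataclass(frozen=True)
class NonLinearLayer:
    """P^nl = {x_h, x_w, Q_x, c_i}; the channel count is unchanged."""
    x_h: int
    x_w: int
    quant_x: int
    c_i: int


Layer = Union[LinearLayer, NonLinearLayer]
Architecture = List[Layer]  # P = {P_1, ..., P_L}


def num_weights(arch: Architecture) -> int:
    """Number of filter weights in the linear layers."""
    return sum(layer.f_h * layer.f_w * layer.c_i * layer.c_o
               for layer in arch if isinstance(layer, LinearLayer))


def num_activations(arch: Architecture) -> int:
    """Number of elements fed to nonlinear layers (drives GC / triple cost)."""
    return sum(layer.x_h * layer.x_w * layer.c_i
               for layer in arch if isinstance(layer, NonLinearLayer))
```

For instance, a $3 \times 3$ convolution from 3 to 64 channels followed by a ReLU on a $32 \times 32 \times 64$ feature map gives 1,728 weights and 65,536 activations.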
➁ Machine Learning (ML) Estimator. An ML estimator is composed of a trainer and an accuracy evaluator. Depending on the status of the switch before AP in ➀, the ML estimator takes different actions. When the switch is on (i.e., $\delta = 1$), a new architecture flows into the ML estimator and is trained from scratch in floating point. Then, the accuracy evaluator quantizes the weights from the trainer according to the given quantization parameters to obtain the accuracy of the quantized neural network. When the switch is off (i.e., $\delta = 0$), the previously predicted architecture is applied with new quantization parameters. In this case, the ML estimator does not train the architecture; the weights trained in a previous iteration are reused with the new quantization parameters to obtain the prediction accuracy on the training dataset.
➂ Cryptographic Estimator. A cryptographic estimator contains two subcomponents: the parameter instantiation engine (PIE) and the performance characterization engine (PCE). These engines take $\mathcal{P}$ as input and collaboratively instantiate the cryptographic parameters while evaluating their performance. In particular, the outputs of PIE are the cryptographic parameters. For example, for RLWE-based PAHE schemes (e.g., used for homomorphic matrix-vector products in linear layers and for multiplication triples in square activations), the cryptographic parameters are $(n, p, q)$. During performance characterization, PCE consults PAHE and GC libraries to produce characterized scores for a single round of secure inference using the architecture specified in $\mathcal{P}$. Details on the implementations of PIE and PCE are discussed in Section 4. Note that, for some cryptographic protocols (e.g., GC used in implementing ReLU), the cryptographic parameters can be determined directly from $\mathcal{P}$ without using a PIE.
➃ Controller. The controller is the core component of the NASS framework. Based on the outputs of the ML estimator (➁) and the cryptographic estimator (➂), the controller predicts a new $\mathcal{P}$ that is expected to have higher accuracy, lower latency, and a lower bandwidth requirement than the architecture predicted in the previous iteration.
The controller can be implemented with different techniques, such as reinforcement learning or evolutionary algorithms. In both cases, however, the key element of the controller design is the reward function. In this work, we employ reinforcement learning in the controller, whose interactions with the environment are modeled as a Markov decision process (MDP). The reward function is formulated as follows.
(3) 
where $A$ is the prediction accuracy and $S$ is the performance score reported by the cryptographic estimators. The detailed definition of $S$ can be found in Eq. (8). After calculating the reward $R$, we follow the Monte Carlo policy gradient algorithm [?] to update the controller:
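$\nabla_\theta J(\theta) = \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{H} \gamma^{H-t}\, \nabla_\theta \log \pi_\theta(a_t \mid a_{1:t-1})\, (R_k - b)$    (4)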
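where $\theta$ denotes the controller parameters, $a_t$ is the action (i.e., the architecture choice) taken at step $t$, $R_k$ is the reward of the $k$-th child network in the batch, $m$ is the batch size, and $H$ is the total number of steps in each episode. The rewards are discounted at every step by an exponential factor $\gamma$, and the baseline $b$ is the exponential moving average of the rewards.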
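The following sketch implements the update of Eq. (4) for a deliberately simplified controller: instead of an RNN, it assumes one independent softmax distribution per decision step (a hypothetical `TabularController`), so that the gradient of $\log \pi_\theta(a_t)$ with respect to the step's logits has the closed form $\mathbb{1}[j = a_t] - \pi_j$. The discount $\gamma^{H-t}$, the averaging over $m$ episodes, and the exponential-moving-average baseline $b$ follow the description above.

```python
import math
import random
from typing import List, Optional


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TabularController:
    """Hypothetical controller: one independent categorical policy per decision step."""

    def __init__(self, num_steps: int, num_choices: int,
                 lr: float = 0.5, gamma: float = 1.0, ema: float = 0.9) -> None:
        self.logits = [[0.0] * num_choices for _ in range(num_steps)]
        self.lr, self.gamma, self.ema = lr, gamma, ema
        self.baseline: Optional[float] = None  # b in Eq. (4)

    def sample(self, rng: random.Random) -> List[int]:
        """Sample one child network: one action a_t per step."""
        return [rng.choices(range(len(z)), weights=softmax(z))[0] for z in self.logits]

    def update(self, episodes: List[List[int]], rewards: List[float]) -> None:
        """One gradient-ascent step following Eq. (4)."""
        m, horizon = len(episodes), len(self.logits)
        if self.baseline is None:
            self.baseline = sum(rewards) / m
        grads = [[0.0] * len(z) for z in self.logits]
        for actions, r in zip(episodes, rewards):
            advantage = r - self.baseline
            for t, a in enumerate(actions, start=1):
                probs = softmax(self.logits[t - 1])
                w = self.gamma ** (horizon - t) * advantage / m
                for j, p_j in enumerate(probs):
                    # d/dz_j log softmax(z)[a] = 1[j == a] - p_j
                    grads[t - 1][j] += w * ((1.0 if j == a else 0.0) - p_j)
        for z, g in zip(self.logits, grads):
            for j in range(len(z)):
                z[j] += self.lr * g[j]
        for r in rewards:  # exponential moving average of the rewards
            self.baseline = self.ema * self.baseline + (1 - self.ema) * r
```

In a toy run where the reward is the fraction of steps that choose action 2, alternating `sample` and `update` shifts each step's probability mass toward that action; in NASS, the reward would instead come from Eq. (3).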
4 Estimators for Cryptographic Primitives
While CHET [7] recognizes the importance of an abstraction layer that hides specific HE implementation details from the NN designer, it does not treat cryptographic primitives as design elements with distinct performance tradeoffs (in fact, CHET only focuses on compiling a single FHE primitive). As observed in Fig. 2, Gazelle instantiates different protocols according to the specific NN layers. Hence, in this section, we describe how to construct estimators that model cryptographic primitives as delay elements with communication costs, analogous to the FPGA components modeled in the FNAS [?] framework.
4.1 Constructing PIE for PAHE
In this work, we use the widely adopted BFV [9] cryptosystem as the example PAHE scheme, but our method applies broadly to all RLWE-based PAHE cryptosystems. In BFV, three parameters are required to instantiate the cryptosystem, $(n, p, q)$, where $n$ is the lattice dimension, $p$ the plaintext modulus, and $q$ the ciphertext modulus.
The Feedback Loop
In the Gazelle protocol, since each linear layer is evaluated independently (decryption is performed after only one layer of homomorphic evaluation), the parameters can be minimized. For example, in our experiments, $q$ ranges from 60 to 180 bits. Therefore, even one bit of loose error margin can easily result in a 1.5x to 2x performance penalty on 64-bit machines, due to the requirement of extra integer slots (e.g., when $q$ grows from 61 bits to 62 bits).
In Gazelle, as long as the dimensions and quantizers are the same, the parameters do not scale with the number of layers in the NN. Therefore, parameter minimization needs to be carried out for every NN layer with varying quantizations and dimensions. The main difficulty of per-layer parameter minimization lies in the feedback loop between PCE and PIE. The dilemma is that, in order for PIE to instantiate parameters that ensure correct decryption, the error size (explained in Section 2.2) needs to be estimated by PCE. Meanwhile, PCE needs the instantiated PAHE parameters from PIE to perform error analysis, thereby forming the loop. Iterating through all possible parameter combinations with error calculations for each NN layer creates a significant computational burden in the NASS optimization process. In addition, generating large primes can also be time consuming, as BFV imposes additional constraints on the relationship among $n$, $p$, and $q$ to enable the batching technique [21], which is essential to the efficiency of SI. In particular, both $p$ and $q$ need to satisfy $p \equiv q \equiv 1 \pmod{2n}$, where $q$ can be a large integer (e.g., 120 bits).
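The batching constraint can be met by scanning the arithmetic progression of integers congruent to 1 modulo $2n$ for primes. Below is a minimal sketch using a Miller-Rabin test; `batching_prime` is a hypothetical helper (libraries such as SEAL ship their own prime generators), and the scan simply starts at $2^{\text{bits}-1}$ to target a given bit length.

```python
import random


def is_probable_prime(n: int, rounds: int = 32) -> bool:
    """Miller-Rabin primality test (probabilistic for large n)."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % sp == 0:
            return n == sp
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    rng = random.Random(n)  # deterministic witnesses, for reproducibility
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True


def batching_prime(bits: int, n: int) -> int:
    """Smallest prime p >= 2^(bits-1) with p = 1 (mod 2n), as batching requires."""
    step = 2 * n
    lo = 1 << (bits - 1)
    p = lo + ((1 - lo) % step)  # first integer >= lo that is 1 mod 2n
    while not is_probable_prime(p):
        p += step
    return p
```

For instance, `batching_prime(16, 4096)` returns 40961: the first candidate above $2^{15}$, 32769, is divisible by 3, and the next one, $40961 = 5 \cdot 8192 + 1$, is prime.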
Instantiating the Parameters
An overview of the joint parameter optimization procedure is illustrated in Fig. 4.
➀ Initialization: To start the optimization process, the inputs are first fed to PIE. The inputs include $n_0$, the initial lattice dimension, and $(Q_x, Q_w)$, the respective quantizers for NN inputs and filters. Here, $n_0$ is an arbitrary number and can be set as the smallest $n$ that grants some security level for an extremely small $q$. $Q_x$ and $Q_w$ are used to determine the plaintext modulus $p$, which must be large enough with respect to both quantizers to carry out a successful inference. After generating the plaintext modulus $p$, along with $n_0$ and other parameters in $\mathcal{P}$ (e.g., the input dimensions $x_h$ and $x_w$ and the filter dimensions $f_h$ and $f_w$), we can calculate a working ciphertext modulus $q$. Note that this estimate can be loose, but it will be tightened in PCE through optimization iterations.
➁ Optimization Loop: Upon receiving parameters from PIE, PCE performs two important evaluations: i) security level estimation and ii) decryption failure rate estimation. Failing either check results in immediate rejection. First, in i), the security level is checked against the community standard established in [1]. When the security standard is not met, we regenerate the lattice dimension $n$ and retry the security analysis. Next, in ii), after obtaining a valid $n$ for the estimated $q$, a set of ciphertexts is created to check whether $q$ is large enough for correct decryption. If the decryption failure rate is too high, we regenerate a larger $q$ and re-evaluate the security of $n$ with respect to the new $q$. After deriving a valid $(n, p, q)$ that passes all the tests, the parameters are fed into a PAHE library to characterize the estimated time and memory consumed by the computation of a single layer.
➂ Output Statistics: Steps ➀ and ➁ are repeated for every layer in the input neural architecture, and all performance statistics are summed to produce a final score, which the overall NASS framework uses to search for better neural architectures for SI.
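The steps above can be condensed into the following sketch of the PIE/PCE loop for a single layer. The security check uses the 128-bit classical-security bounds on $\log_2 q$ for ternary secrets tabulated in the HE security standard [1] (the same values SEAL applies by default), which we treat as illustrative; the `failure_rate` callback is a hypothetical stand-in for PCE's decryption failure estimation, and $q$ is represented only by its bit length.

```python
from typing import Callable, Tuple

# Max log2(q) for 128-bit classical security with ternary secrets, per lattice
# dimension n, as tabulated in the HE security standard [1]; illustrative here.
MAX_LOG_Q_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}


def is_secure(n: int, q_bits: int) -> bool:
    return q_bits <= MAX_LOG_Q_128.get(n, 0)


def instantiate_layer(n0: int, p: int, q_bits0: int,
                      failure_rate: Callable[[int, int, int], float],
                      eps: float, max_n: int = 32768) -> Tuple[int, int, int]:
    """Sketch of the PIE <-> PCE loop for one layer; returns (n, p, log2 q)."""
    n, q_bits = n0, q_bits0
    while n <= max_n:
        if not is_secure(n, q_bits):
            n *= 2        # (i) insecure: grow the lattice dimension
            continue
        if failure_rate(n, p, q_bits) > eps:
            q_bits += 1   # (ii) too many decryption failures: grow q ...
            continue      # ... and re-check security for the new q
        return n, p, q_bits
    raise RuntimeError("no secure parameters found up to max_n")
```

Running `instantiate_layer` for every layer and summing the characterized costs corresponds to step ➂; the one-bit increments of $q$ mirror the tight margins discussed under The Feedback Loop.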
Generating a Valid Ciphertext Modulus
One last note on the ciphertext modulus is that, as mentioned in Section 2.2, not all ciphertexts become undecipherable when $q$ is small. The probability that a ciphertext becomes undecipherable is called the decryption failure rate. Observe that, different from [3], we do not need an expensive simulation to ensure an asymptotically small decryption failure probability, since NN-based SI mispredicts far more often than that. In most cases, a 0.1% accuracy degradation is not noticeable for practical CNN applications. Therefore, we can use the standard Monte Carlo simulation technique to ensure that $q$ is large enough that $\Pr[\text{decryption failure}] \le \epsilon$, where $\epsilon$ ranges from $10^{-3}$ (1 decryption failure in 1000 inferences) to $10^{-2}$ (1 in 100), depending on the prediction accuracy requirement.
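A minimal Monte Carlo sketch of this check is shown below. It assumes, purely for illustration, that the accumulated error in a slot is a zero-mean Gaussian with standard deviation `sigma_e`, and that decryption succeeds when $|e| < \Delta/2$ with $\Delta = \lfloor q/p \rfloor$, as in BFV. In NASS, the samples would instead come from ciphertexts produced by the PAHE library. Note that certifying a rate near $\epsilon$ requires well over $1/\epsilon$ trials.

```python
import random


def failure_rate(q: int, p: int, sigma_e: float,
                 trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[|e| >= Delta/2], Delta = floor(q / p).

    Assumed toy noise model: e ~ N(0, sigma_e^2) per slot.
    """
    rng = random.Random(seed)
    half_delta = (q // p) / 2
    fails = sum(abs(rng.gauss(0.0, sigma_e)) >= half_delta for _ in range(trials))
    return fails / trials


def smallest_q_bits(p: int, sigma_e: float, eps: float,
                    start_bits: int, max_bits: int = 200) -> int:
    """Smallest log2(q) whose estimated failure rate is at most eps."""
    for bits in range(start_bits, max_bits + 1):
        if failure_rate(1 << bits, p, sigma_e) <= eps:
            return bits
    raise ValueError("eps not reachable below max_bits")
```

Because the target rate is only $10^{-3}$ to $10^{-2}$, this search settles on a $q$ a few bits smaller than one sized for an asymptotically negligible failure probability would be.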
4.2 PCE: Performance Characterization
Characterizing Linear Layers
The main arithmetic computations in both convolutional and fully connected layers involve a set of homomorphic inner products between a plaintext matrix and a ciphertext matrix (flattened as vectors). To compute such homomorphic inner products, the pioneering work in [12] proposes to align the weight matrix with the rotating input ciphertext vector to minimize the number of homomorphic operations. In general, the algorithm computes the product between $W \in \mathbb{Z}_p^{n_o \times n_i}$, a weight matrix, and $[u]$, the encryption of the input vector $u \in \mathbb{Z}_p^{n_i}$, as follows.
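$v = W u$    (5)
$[v'] = \sum_{i=0}^{n_o - 1} \mathrm{rot}([u], i) \boxdot w_i$    (6)
$[v] = \sum_{k=0}^{n_i / n_o - 1} \mathrm{rot}([v'], k\, n_o)$    (7)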
where $[v']$ holds the partial sums of the result vector $v$, and the $w_i$'s are the diagonally aligned columns of $W$, each of dimension $n_i$ (with $n_o$ assumed to divide $n_i$). In Eq. (6), we first rotate $[u]$ $n_o$ times, each time multiplying it with the aligned vector $w_i$. Each multiplication generates an intermediate ciphertext in which every slot holds the product of a single entry of $W$ with the corresponding entry of $u$. Summing these ciphertexts gives a single ciphertext packed with partial sums of the corresponding inner products, and these packed partial sums are then summed up, as in Eq. (7), to obtain the final product $[v]$.
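To check the packing logic of Eqs. (6) and (7), the following sketch replays this hybrid method on plaintext integer vectors, with list rotation standing in for $\mathrm{rot}$ and elementwise products for $\boxdot$. It assumes $n_o$ divides $n_i$, builds the diagonals as $w_i[j] = W_{j \bmod n_o,\,(j+i) \bmod n_i}$ (our reading of "diagonally aligned"), and sums all $n_i/n_o$ rotated copies directly in the second stage; a logarithmic rotate-and-add schedule gives the same result when $n_i/n_o$ is a power of two.

```python
from typing import List


def rot(x: List[int], k: int) -> List[int]:
    """Plaintext stand-in for rot([x], k): cyclic left rotation by k."""
    k %= len(x)
    return x[k:] + x[:k]


def hybrid_diagonals(W: List[List[int]], n_o: int, n_i: int) -> List[List[int]]:
    """w_i[j] = W[j mod n_o][(j + i) mod n_i] for i = 0, ..., n_o - 1."""
    return [[W[j % n_o][(j + i) % n_i] for j in range(n_i)] for i in range(n_o)]


def hybrid_matvec(W: List[List[int]], u: List[int]) -> List[int]:
    n_o, n_i = len(W), len(u)
    assert n_i % n_o == 0 and all(len(row) == n_i for row in W)
    # Eq. (6): n_o rotations and n_o Hadamard products, accumulated into v'
    v_prime = [0] * n_i
    for i, w_i in enumerate(hybrid_diagonals(W, n_o, n_i)):
        v_prime = [acc + a * b for acc, a, b in zip(v_prime, rot(u, i), w_i)]
    # Eq. (7): add up the n_i / n_o packed blocks of partial sums
    v = [0] * n_i
    for k in range(n_i // n_o):
        v = [acc + a for acc, a in zip(v, rot(v_prime, k * n_o))]
    return v[:n_o]  # slots 0..n_o-1 hold W u
```

For example, `hybrid_matvec([[1, 2, 3, 4], [5, 6, 7, 8]], [1, 1, 1, 1])` returns `[10, 26]`, matching the direct product $W u$ while using only two rotations and two Hadamard products in the first stage.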
The performance nonlinearity illustrated in Fig. 1 stems critically from the way homomorphic inner products are computed. Take a toy example where $n_o = 10$ and $n_i$ equals the lattice dimension $n$, so that the input vector $u$ can be tightly stored in a single ciphertext $[u]$. Using the Gazelle algorithm, we rotate the input ciphertext 10 times and compute 10 homomorphic Hadamard products. However, suppose that the input dimension is slightly larger than $n$. Since the lattice dimension can only be a power of 2, the ciphertext size becomes $2n$ in order to hold the input vector $u$. All subsequent homomorphic evaluations then require twice the computation and bandwidth of the original case. If the accuracy improvement from the larger input dimension is marginal, this suggested change should be rejected.
The above example illustrates the precise procedure performed in PCE, where all the information contained in $\mathcal{P}$ jointly determines how packing can be performed to maximize protocol efficiency. The output of this procedure is the inference time $T$ and the network bandwidth $B$. We use a simple weighted sum of $T$ and $B$ to derive the performance score $S$:
$S = \lambda_T T + \lambda_B B$    (8), where $\lambda_T$ and $\lambda_B$ are weighting factors.
One important implementation detail is that, instead of performing the entire calculation with an actual PAHE implementation, only the basic operations ($\boxplus$, $\boxdot$, and $\mathrm{rot}$) need to be characterized. The actual runtime and bandwidth usage can then be scaled from the combination of these basic operations. This is a critical performance improvement, as secure inference is still quite slow for deep neural networks (10 to 100 seconds), and running PCE for full-network characterization would become a bottleneck in the optimization process.
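The sketch below illustrates this scaling for a single $W \cdot [u]$ product. It assumes hypothetical per-operation timings and ciphertext sizes measured once per lattice dimension, counts operations as in the two-stage rotate/multiply/accumulate procedure of Gazelle (rounding the number of packed blocks up), and assumes one input and one output ciphertext cross the network. The power-of-two choice of $n$ in `lattice_dim` reproduces the jump in the toy example above: growing $n_i$ from 2048 to 2049 doubles both $T$ and $B$ when the per-operation costs at $n = 4096$ are twice those at $n = 2048$.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class OpCost:
    """Hypothetical per-operation measurements for one lattice dimension n."""
    add_ms: float   # one homomorphic addition
    mul_ms: float   # one plaintext-ciphertext Hadamard product
    rot_ms: float   # one rotation
    ct_bytes: int   # size of one ciphertext


def lattice_dim(num_slots: int, min_n: int = 1024) -> int:
    """Smallest power-of-two lattice dimension n >= num_slots."""
    n = min_n
    while n < num_slots:
        n *= 2
    return n


def fc_layer_cost(n_o: int, n_i: int, table: Dict[int, OpCost]) -> Tuple[float, int]:
    """Scale (T in ms, B in bytes) for one W . [u] from basic-operation costs."""
    c = table[lattice_dim(n_i)]
    blocks = math.ceil(n_i / n_o)
    rots = n_o + (blocks - 1)          # rotate-and-multiply stage + rotate-and-sum stage
    muls = n_o                         # Hadamard products in the first stage
    adds = (n_o - 1) + (blocks - 1)    # accumulations in both stages
    t_ms = rots * c.rot_ms + muls * c.mul_ms + adds * c.add_ms
    b_bytes = 2 * c.ct_bytes           # assumed: one ciphertext in, one out
    return t_ms, b_bytes


def score(t: float, b: float, lam_t: float, lam_b: float) -> float:
    """Weighted-sum performance score S = lam_t * T + lam_b * B."""
    return lam_t * t + lam_b * b
```

Keeping only a small table of per-operation costs per lattice dimension is what lets PCE score thousands of child networks without running full encrypted inferences.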
PCE for NonLinear Layers
Running PCE for interactive protocols such as multiplication triples [2] and GC [23] is much simpler than for linear layers, as these nonlinear functions (e.g., square and ReLU) are evaluated on a per-element basis with fixed functionality. Their performance statistics can be characterized once and reused across all layers when properly scaled.
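A minimal sketch of this scaling, with hypothetical function names: the per-element cost is characterized once for a fixed batch of elements and then scaled by the number of elements in each nonlinear layer. For example, using the 551 ms per 10,000 ReLU evaluations reported by Gazelle (see Section 5.1), 120,000 ReLUs scale to about 6.6 s; the bandwidth figure passed in would likewise come from a one-time measurement.

```python
from typing import Tuple


def relu_count(x_h: int, x_w: int, c_i: int) -> int:
    """Number of per-element ReLU evaluations in one nonlinear layer."""
    return x_h * x_w * c_i


def nonlinear_cost(num_elements: int, ms_per_batch: float,
                   bytes_per_batch: float, batch: int = 10_000) -> Tuple[float, float]:
    """Scale a one-time characterization of `batch` elements to a whole layer."""
    scale = num_elements / batch
    return scale * ms_per_batch, scale * bytes_per_batch
```

Because the characterization is linear in the number of elements, summing `relu_count` over all nonlinear layers first and scaling once gives the same total.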
5 Numerical Experiments and Parameter Instantiations
5.1 Experiment Setup
In this work, we compare NASS with two of the best-performing recent works on secure inference, namely, Gazelle [12] and XONN [17]. First, we point out that the statistics reported in Gazelle are not entirely reliable. For example, the architecture used for a single CIFAR-10 inference needs more than 120,000 ReLU calls, and Gazelle reports 551 ms of runtime per 10,000 ReLU evaluations, which alone would amount to more than 6.6 seconds; nevertheless, the reported total online inference time is less than three seconds. Since the main focus of NASS is to improve the architectural design of NNs, the performance of both Gazelle and the architectures derived in this work is characterized by the proposed performance estimator. We base our experiments on three datasets: MNIST [14], Fashion-MNIST [22], and CIFAR-10 [13]. The characterization experiments are conducted on an Intel i3-6100 3.7 GHz CPU, and the architectural search is performed on an NVIDIA P100 GPU. The adopted PAHE library is SEAL version 3.3.0 [19], and the GC protocols are implemented using ABY [8].
5.2 Architectural Optimization and the Pareto Frontier
First, in Fig. 5, we use an example NASS run on the CIFAR-10 dataset to show the effectiveness of the proposed framework. We trained the NASS controller over a 500-episode window, with the neural architecture fixed to four convolution layers. As explained in Section 3.2, each episode generates a child network, whose reward is calculated by Eq. (3) from the prediction accuracy and the cryptographic performance score. The results in Fig. 5 indicate that both the rewards (in whose calculation accuracy dominates, as described in Eq. (3)) and the secure inference times converge to their optimized states as the learning episodes proceed. Furthermore, in Fig. 6, the best-performing data points are gathered to plot the Pareto frontiers generated by NASS. Here, the vertical axis is the estimated performance score, and the horizontal axis denotes the prediction accuracy on the CIFAR-10 dataset. Two observations can be made. First, the proposed NASS engine is able to learn the extremely nonlinear design space of CNN-based SI; second, the NASS framework pushes forward the Pareto frontier for SI compared to existing works on both SI and NAS.
Table 1: Architectures on the Pareto frontier of the tested datasets.
Architecture  Accuracy  Total Time  Bandwidth  No. Episodes (Search Time)
MNIST-Acc  98.6%  0.79 s  17 MB  1000 (17 hrs.)
MNIST-Per  98.6%  0.79 s  17 MB  1000 (17 hrs.)
Fashion-Acc  90.6%  1.67 s  50 MB  2000 (32 hrs.)
Fashion-Per  90.4%  0.72 s  22 MB  2000 (32 hrs.)
CIFAR-Acc  84.6%  8.0 s  944 MB  1200 (60 hrs.)
CIFAR-Per  82.6%  5.1 s  582 MB  1200 (60 hrs.)
The predicted architectures lying on the Pareto frontier of the tested datasets are summarized in Table 1. Two types of architectures are selected here: architectures with the suffix Acc are the child networks that have better accuracy but (relatively) worse performance, and those with the suffix Per are the reverse. The insight here is that, for smaller (i.e., easier) datasets such as MNIST, the search is almost exhaustive, so that a single architecture achieves both the highest prediction accuracy and the best cryptographic performance. Nevertheless, for more complex datasets, the differences become increasingly large, and distinctive trade-offs between neural architectures emerge. We emphasize that, depending on the application, all neural architectures on the Pareto frontier are legitimate candidates. However, as also shown in Fig. 6, in many cases a significant performance degradation results in only a marginal accuracy improvement, and vice versa.
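The frontier selection and the "marginal improvement" observation can be checked on the numbers in Table 1 and Table 2. This minimal sketch uses total secure-inference time as the cost axis (an assumption; Fig. 6 uses the estimated performance score) and the standard dominance test for two objectives.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    accuracy: float   # prediction accuracy in percent (higher is better)
    time_s: float     # total secure-inference time in seconds (lower is better)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    no_worse = a.accuracy >= b.accuracy and a.time_s <= b.time_s
    better = a.accuracy > b.accuracy or a.time_s < b.time_s
    return no_worse and better


def pareto_frontier(points):
    """Keep every point not dominated by another (O(n^2), fine for small sets)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def marginal_tradeoff(acc_arch: Point, per_arch: Point):
    """Accuracy gained (percentage points) and slowdown factor when moving
    from the Per architecture to the Acc architecture."""
    return acc_arch.accuracy - per_arch.accuracy, acc_arch.time_s / per_arch.time_s


# CIFAR10 figures taken from Table 1 (NASS) and Table 2 (Gazelle).
cifar = [
    Point("CIFAR-Acc", 84.6, 8.0),
    Point("CIFAR-Per", 82.6, 5.1),
    Point("Gazelle", 81.6, 16.4),
]
frontier = pareto_frontier(cifar)
print([p.name for p in frontier])  # Gazelle is dominated by CIFAR-Per

fashion_acc = Point("Fashion-Acc", 90.6, 1.67)
fashion_per = Point("Fashion-Per", 90.4, 0.72)
gain, slowdown = marginal_tradeoff(fashion_acc, fashion_per)
print(f"+{gain:.1f} points of accuracy for a {slowdown:.1f}x slowdown")
```

On Fashion-MNIST, the 0.2-point accuracy gain of Fashion-Acc costs roughly a 2.3x longer inference, which is the kind of imbalance the paragraph above refers to.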
5.3 Comparison to Existing Works
Table 2: Architecture of Gazelle versus the best architecture searched by NASS on CIFAR10.
Gazelle  Best Searched by NASS
Layer  Dimension  Quant.  Layer  Dimension  Quant.
CR  23  CR  
CR  23  CR  
PL  23  PL  
CR  23  CR  
CR  23  CR  
PL  23  PL  
CR  23  CR  
CR  23  
FC  23  FC  
Accuracy: 81.6%  Accuracy: 84.6%  
Bandwidth: 1.815 GB  Bandwidth: 977 MB
PAHE Time: 3.22 s  PAHE Time: 1.62 s  
GC Time: 13.2 s  GC Time: 6.38 s  
Total Time: 16.4 s  Total Time: 8.0 s 
We selected CIFAR-Acc from Table 1 to compare NASS against the baseline architecture proposed in [15], listed as Gazelle in Table 2. Here, CR denotes a convolution layer followed by a ReLU layer, and PL denotes an average pooling layer. Dimension indicates the filter dimension, and the input dimension is 32×32×3 for the CIFAR10 dataset. The important observation is that, by using architectural search, we do not need to trade accuracy for performance. The generated neural architecture requires only 5 convolution layers rather than the 6 used in the baseline architecture, while improving the prediction accuracy from 81.6% to 84.6%. At the same time, the inference time and network bandwidth are reduced by 2x and 1.9x, respectively. The reduction rate can be increased to more than 3x when the baseline level of accuracy suffices, as demonstrated by the CIFAR-Per architecture in Table 1. Finally, we note that the very recent work [17], which achieves a prediction accuracy of 85%, requires more than 30 seconds to carry out the inference, so CIFAR-Acc (8.0 s) reduces the inference time by roughly 4x at a comparable accuracy.
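The reduction factors quoted in this section follow directly from the tables. This sketch recomputes them, taking the CIFAR-Acc bandwidth (977 MB) and the baseline figures from Table 2 and the CIFAR-Per figures from Table 1; the dictionary names are illustrative.

```python
# Reduction factors from Section 5.3, recomputed from the tables.
BASELINE = {"time_s": 16.4, "bandwidth_mb": 1815.0}   # Gazelle column, Table 2
CIFAR_ACC = {"time_s": 8.0, "bandwidth_mb": 977.0}    # NASS column, Table 2
CIFAR_PER = {"time_s": 5.1, "bandwidth_mb": 582.0}    # CIFAR-Per row, Table 1


def reduction(base: dict, new: dict) -> dict:
    """Per-metric reduction factor base/new (higher means cheaper)."""
    return {key: base[key] / new[key] for key in base}


# The PAHE and GC times in Table 2 should add up to the reported totals.
assert abs((3.22 + 13.2) - BASELINE["time_s"]) < 0.05
assert abs((1.62 + 6.38) - CIFAR_ACC["time_s"]) < 0.05

for name, arch in [("CIFAR-Acc", CIFAR_ACC), ("CIFAR-Per", CIFAR_PER)]:
    r = reduction(BASELINE, arch)
    print(f"{name}: {r['time_s']:.2f}x time, {r['bandwidth_mb']:.2f}x bandwidth")

# XONN [17] needs more than 30 s at 85% accuracy; CIFAR-Acc needs 8.0 s.
print(f"XONN vs. CIFAR-Acc: more than {30.0 / CIFAR_ACC['time_s']:.2f}x")
```

The computed factors (about 2.05x and 1.86x for CIFAR-Acc, 3.22x and 3.12x for CIFAR-Per, and more than 3.75x against XONN) round to the 2x, 1.9x, more than 3x, and roughly 4x stated in the text.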
6 Conclusion
In this work, NASS is proposed to optimize the neural network architectures used in secure inference schemes. Models of cryptographic primitives are created to automatically generate computation and communication profiles. Rewards are generated from these profiles and fed to a NAS optimizer that searches the architectural space of convolutional neural networks. Experiments show that security-centric designs achieve better inference speed and a smaller bandwidth footprint than manually tuned neural architectures, while also achieving better prediction accuracy.
Acknowledgment
This work was partially supported by JSPS KAKENHI Grant Nos. 17H01713 and 17J06952, and by a Grant-in-Aid for JSPS Fellows (DC1).
References
[1] (2018-11) Homomorphic encryption security standard. Technical report, HomomorphicEncryption.org, Toronto, Canada.
[2] (1991) Efficient multiparty protocols using circuit randomization. In Annual International Cryptology Conference, pp. 420–432.
[3] (2019) DArL: dynamic parameter adjustment for LWE-based secure inference. In Proc. of DATE, pp. 1739–1744.
[4] (2014) (Leveled) fully homomorphic encryption without bootstrapping. ACM Transactions on Computation Theory (TOCT) 6 (3), pp. 13.
[5] (2012) Fully homomorphic encryption without modulus switching from classical GapSVP. In Advances in Cryptology–CRYPTO 2012, pp. 868–886.
[6] (2017) Homomorphic encryption for arithmetic of approximate numbers. In International Conference on the Theory and Application of Cryptology and Information Security, pp. 409–437.
[7] (2019) CHET: an optimizing compiler for fully-homomorphic neural-network inferencing. In Proc. of PLDI, pp. 142–156.
[8] (2015) ABY: a framework for efficient mixed-protocol secure two-party computation. In Proc. of NDSS.
[9] (2012) Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, pp. 144.
[10] (2018) Secure outsourced matrix computation and application to neural networks. In Proc. of ACM SIGSAC Conference on Computer and Communications Security, pp. 1209–1222.
[11] (2019) PRADA: protecting against DNN model stealing attacks. In Proc. of EuroS&P, pp. 512–527.
[12] (2018) Gazelle: a low latency framework for secure neural network inference. arXiv preprint arXiv:1801.05507.
[13] (2009) Learning multiple layers of features from tiny images. Technical report, Citeseer.
[14] (2010) MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist
[15] (2017) Oblivious neural network predictions via MiniONN transformations. In Proc. of ACM SIGSAC Conference on Computer and Communications Security, pp. 619–631.
[16] (2017) SecureML: a system for scalable privacy-preserving machine learning. In Proc. of Security and Privacy (SP), pp. 19–38.
[17] (2019) XONN: XNOR-based oblivious deep neural network inference. IACR Cryptology ePrint Archive 2019, pp. 171.
[18] (2018) DeepSecure: scalable provably-secure deep learning. In Proc. of DAC, pp. 1–6.
[19] (2019-06) Microsoft SEAL (release 3.3). https://github.com/Microsoft/SEAL, Microsoft Research, Redmond, WA.
[20] (2017) Membership inference attacks against machine learning models. In Proc. of Security and Privacy (SP), pp. 3–18.
[21] (2010) Fully homomorphic encryption with relatively small key and ciphertext sizes. In International Workshop on Public Key Cryptography, pp. 420–443.
[22] (2017-08-28) arXiv preprint cs.LG/1708.07747.
[23] (1982) Protocols for secure computations. In 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), pp. 160–164.
[24] (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578.
[25] (2017) Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012.