A Flexible Privacy-preserving Framework for Singular Value Decomposition under Internet of Things Environment

Shuo Chen, Nanyang Technological University, Singapore (chen1087@e.ntu.edu.sg)
Rongxing Lu, Faculty of Computer Science, University of New Brunswick, Canada (rxlu@ieee.org)
Jie Zhang, Nanyang Technological University, Singapore (zhangj@ntu.edu.sg)
Abstract

The singular value decomposition (SVD) is a widely used matrix factorization tool which underlies many useful applications, e.g., recommendation systems, anomaly detection and data compression. In the emerging Internet of Things (IoT) environment, there will be an increasing demand for data analysis to improve people's lives and create new points of economic growth. Moreover, due to the large scale of IoT, most of the data analysis work should be done at the network edge, i.e., handled by fog computing. However, the devices which provide fog computing may not be trustworthy, while data privacy is often a significant concern of IoT application users. Thus, when performing SVD for data analysis purposes, the privacy of user data should be preserved. For these reasons, in this paper, we propose a privacy-preserving fog computing framework for SVD computation. The security and performance analysis shows the practicability of the proposed framework. Furthermore, since different applications may utilize the result of the SVD operation in different ways, three applications with different objectives are introduced to show how the framework can flexibly achieve their purposes, which indicates the flexibility of the design.

1 Introduction

With the prosperous development of communication and computation technologies, the Internet of Things (IoT), which allows physical objects to be sensed, accessed and controlled remotely through the network infrastructure, is no longer a fantasy nowadays. The major advantage of IoT is that, by analyzing the huge amount of information collected from the physical world, the server is capable of making more accurate and optimal decisions which would produce considerable benefits. It is estimated that the global IoT market will reach 14.4 trillion dollars by 2022 [1], the potential economic impact of IoT will be 3.9 to 11.1 trillion dollars by 2025 [2], and the number of devices connected to the Internet will be about 50 billion by 2020 [3]. It can be expected that, rather than being conducted on the cloud or inside company intranets, data analysis will be performed everywhere and at any time in the future due to the ubiquitous IoT. As the number of data analysis tasks increases in IoT, the singular value decomposition (SVD), which is widely used in different data analysis applications [4][5][6][7][8], will be performed frequently. However, the traditional way of performing SVD, i.e., calculating the SVD on a central server, may not be practical in future IoT due to the huge number of IoT devices. If all the data were transmitted to a central server for computation, it would lead to considerable computation and communication resource consumption on the server, which would further severely impact the quality of service (QoS) of IoT applications.

To ease the burden of the IoT server and guarantee the QoS, a new technique called fog computing, proposed by Cisco [9], is suitable to be applied. The main idea of fog computing is to provide storage, computing and networking services between the environmental devices and the central server. The fog devices, which are in close proximity to end devices, normally possess considerable storage and computation resources. With the equipped resources, the fog devices can process the collected data locally so as to lighten the workload of the server. Specifically, there are three tiers in the fog computing architecture: the environmental tier, the edge tier and the central tier. In the environmental tier, there are billions of heterogeneous IoT devices collecting and uploading information about the physical world, e.g., medical sensors in eHealth and personal mobile phones. The data collected by IoT devices is transmitted to the edge tier. The fog devices in the edge tier are organized hierarchically, a characteristic inherited from the traditional network architecture. For example, the switches of a local area network could function as first-layer fog devices, and the gateways which manage those switches could serve as second-layer fog devices. The fog devices in the edge tier perform the application-specific operations on the received data locally and send the results to the server in the central tier. Owing to the processing performed by fog devices, the volume of data sent to the server can be reduced to a large extent. Since the fog devices are spread in a highly distributed environment, it is impractical for the government or an institution which owns the central server to provide and maintain all those fog devices. Therefore, it is reasonable to assume that the fog devices are supplied by third parties.

Under the context of fog computing, one could perform the SVD operations on the fog devices. However, another problem arises: the privacy issue. The third parties which control the fog devices may not be trustworthy, while in many IoT applications the data collected from the environment is considered private by the IoT application users, e.g., the vital signs in eHealth, the location and speed of vehicles, and the power usage in a smart grid. Performing SVD on plaintext at the fog devices is unacceptable if privacy is a primary concern of the data owners. Therefore, how to take advantage of fog computing to locally process data in a privacy-preserving way is a challenging issue.

In this paper, we propose a flexible fog computing framework for performing SVD with privacy preserved. The homomorphic encryption technique called Paillier encryption [10] is applied to protect the data privacy. The framework is designed to be capable of supporting different applications based on the SVD computation. The main contributions of this paper are three-fold.

  • First, to perform data analysis for IoT applications, we propose a fog computing framework for SVD computation to ease the burden of the server. Since the computation is performed on fog devices which may not be trustworthy, the framework achieves privacy-preserving SVD computation.

  • Second, there is only one communication round between the data providers and data processors in our work, whereas most existing works require iterative communications, which bring heavy overhead.

  • Third, besides the framework for the basic SVD operation, three applications are introduced in detail to demonstrate the flexibility of the framework. It is shown that the proposed framework can flexibly adapt to different applications with slight adjustments or a few extra procedures.

The remainder of this paper is organized as follows. In Section 2, the preliminaries of our scheme are introduced. The system model, security requirements and design goals are described in Section 3. In Section 4, the proposed framework is presented in detail. The security analysis and performance evaluation are discussed in Sections 5 and 6. Three applications based on the proposed privacy-preserving SVD framework are illustrated in Section 7. In Section 8, we discuss the related work, and finally conclude our current work in Section 9.

2 Preliminaries

In this section, the Paillier Cryptosystem [10] and the singular value decomposition [11], which are the basis of the proposed framework, are reviewed.

2.1 Paillier Cryptosystem

The Paillier Cryptosystem is additively homomorphic: additions of plaintexts, as well as multiplications of a plaintext by a constant, can be carried out through algebraic manipulations of the corresponding ciphertexts. This property is extensively desired in many privacy-preserving applications [12, 13, 14]. In this paper, this feature allows fog devices to process the user data in encrypted form without learning the data content. The Paillier Cryptosystem comprises three phases: key generation, encryption and decryption.

  • Key Generation: Given a security parameter $\kappa$, generate two large prime numbers $p$, $q$, where $|p| = |q| = \kappa$. Then compute the modulus $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, and choose a generator $g \in \mathbb{Z}^*_{n^2}$. Define the function $L(u) = \frac{u-1}{n}$ and calculate $\mu = (L(g^\lambda \bmod n^2))^{-1} \bmod n$. Then PK $= (n, g)$ is published as the public key and SK $= (\lambda, \mu)$ is kept as the corresponding private key.

  • Encryption: Given a message $m \in \mathbb{Z}_n$, randomly choose a number $r \in \mathbb{Z}^*_n$; the ciphertext can be calculated as $c = E(m) = g^m \cdot r^n \bmod n^2$.

  • Decryption: Given the ciphertext $c \in \mathbb{Z}^*_{n^2}$, the corresponding plaintext can be recovered as $m = L(c^\lambda \bmod n^2) \cdot \mu \bmod n$. Note that the Paillier Cryptosystem is provably secure against chosen plaintext attacks; the correctness and security proofs can be found in [10].

The homomorphic property of the Paillier Cryptosystem utilized in this work is $E(m_1) \cdot E(m_2) \bmod n^2 = E(m_1 + m_2 \bmod n)$; as a direct consequence, $E(m)^k \bmod n^2 = E(k \cdot m \bmod n)$ for any constant $k$.
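To make these properties concrete, the following toy Python sketch implements textbook Paillier with fixed small primes and checks both identities; the primes, messages and helper names are illustrative assumptions (far too small to be secure), so they stand in for, rather than reproduce, the parameters chosen in Section 4.

# Toy Paillier sketch: checks E(m1)*E(m2) -> m1+m2 and E(m)^k -> k*m.
# Parameters are deliberately tiny and NOT secure; for illustration only.
import math
import random

p, q = 104723, 104729          # two small, well-known primes (toy "security parameter")
n = p * q
n2 = n * n
g = n + 1                      # a common choice of generator
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)           # with g = n + 1, L(g^lam mod n^2) = lam mod n

def L(u):
    return (u - 1) // n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                     # r must be a unit modulo n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

m1, m2, k = 123, 456, 7
c1, c2 = encrypt(m1), encrypt(m2)
assert decrypt((c1 * c2) % n2) == (m1 + m2) % n    # additive homomorphism
assert decrypt(pow(c1, k, n2)) == (k * m1) % n     # multiplication by a constant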

2.2 Singular Value Decomposition

SVD is a powerful and popular matrix factorization tool that underlies many useful applications, e.g., anomaly detection [4, 5], recommendation systems [6, 7] and data compression [8]. Let $A$ be an $m \times n$ matrix; the SVD of $A$ is of the form $A = U\Sigma V^*$, where $*$ denotes the conjugate transpose, $U$ is an $m \times m$ unitary matrix, $\Sigma$ is an $m \times n$ rectangular diagonal matrix with non-negative diagonal values, and $V$ is an $n \times n$ unitary matrix. The non-negative diagonal entries of $\Sigma$ are the singular values of matrix $A$, while the columns of $U$ and $V$ are known as the left-singular vectors and right-singular vectors of $A$, respectively. Note that we only consider real matrix entries in this paper, thus the conjugate transpose can simply be regarded as the transpose.

Another widely used matrix factorization tool is the eigenvalue decomposition. Unlike SVD, which can be applied to any matrix, the eigenvalue decomposition is less general and can only be performed on square matrices. However, the two tools are closely related, as shown below:

$AA^T = U\Sigma V^T V \Sigma^T U^T = U(\Sigma\Sigma^T)U^T, \qquad A^TA = V\Sigma^T U^T U \Sigma V^T = V(\Sigma^T\Sigma)V^T$   (1)

It is obvious that the left-singular vectors (the columns of $U$) are the eigenvectors of $AA^T$, the right-singular vectors (the columns of $V$) are the eigenvectors of $A^TA$, and the singular values are the square roots of the non-zero eigenvalues of $AA^T$ and $A^TA$. We will show in later sections how this relation can be utilized to achieve privacy-preserving SVD.
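The relation in (1) can also be checked numerically; the short NumPy sketch below uses an arbitrary random matrix (sizes chosen for illustration) to verify that the eigenvalues of $AA^T$ and $A^TA$ are the squared singular values of $A$.

# Numerical check of relation (1): eigenvalues of A A^T and A^T A are the squared singular values.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))             # a real 4 x 6 data matrix

U, s, Vt = np.linalg.svd(A)                 # A = U * diag(s) * V^T

w_left, Q_left = np.linalg.eigh(A @ A.T)    # symmetric eigendecomposition of A A^T (4 x 4)
w_right, Q_right = np.linalg.eigh(A.T @ A)  # symmetric eigendecomposition of A^T A (6 x 6)

assert np.allclose(np.sort(w_left)[::-1], s**2)
assert np.allclose(np.sort(w_right)[::-1][:4], s**2)   # the remaining eigenvalues are ~0
# The corresponding eigenvectors equal the columns of U and V up to sign and ordering.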

3 System Model, Security Requirements and Design Goals

In this section, we describe the system model, discuss the security requirements and identify the design goals on privacy-preserving SVD.

3.1 System Model

In this work, we mainly focus on how to utilize fog computing to compute the SVD of the data uploaded by environmental devices with privacy preserved. Specifically, there are four categories of entities in the system model, namely the server, first-layer fog devices, second-layer fog devices and environmental devices, as shown in Fig. 1.

Figure 1: System Model

Server: the server is a fully trusted entity located in the central tier. It is responsible for initializing the whole system and distributing key materials to the other entities. The other operations the server may conduct are application-specific; several examples are given in Section 7.

Environmental Device (ED): EDs are the devices distributed in the environmental tier of the IoT environment. The analysis of the data uploaded by EDs enables better decision-making.

First Layer Fog Device (FD): FDs are the fog devices which communicate with EDs directly. FDs process the collected data and upload the results to the second layer fog devices.

Second Layer Fog Device (SD): SDs are the fog devices which communicate with FDs. Compared to FDs, SDs are closer to the server and do not contact EDs directly. In the proposed framework, there are three SDs playing different roles in the SVD operation. One of them is responsible for decrypting the messages from FDs; the other two are in charge of decomposing $AA^T$ and $A^TA$, where $A$ is the matrix of collected data. We denote the SD for decryption, the SD for decomposing $AA^T$ and the SD for decomposing $A^TA$ as SD$_1$, SD$_2$ and SD$_3$, respectively.

The basic workflow of our model is as follows: the server first initializes the whole system and distributes the key materials or secrets to the other entities accordingly. After initialization, EDs start to collect and upload application-specific data as required. Each piece of data transmitted to the FDs is encrypted. After the FDs gather the data from the EDs, they randomize the collected data and send the results to SD$_1$. Upon decrypting the data from the FDs, SD$_1$ performs further operations on the plaintext and sends the outcomes to SD$_2$ and SD$_3$, respectively. With the messages from SD$_1$, SD$_2$ and SD$_3$ can recover $AA^T$ and $A^TA$ and conduct the eigenvalue decompositions accordingly. Finally, the SVD of the data collection is split into two parts which are held by SD$_2$ and SD$_3$ separately.

3.2 Security Requirements

Security is fundamental for the effectiveness of the proposed framework. In this work, the server and EDs are assumed to be trusted. The fog devices, i.e., FDs and SDs, are assumed to be honest-but-curious [15, 16], which means they will follow the specified procedures faithfully while being curious about the uploaded data. In addition, FDs and SDs are assumed not to collude with each other. The non-collusion assumption can be realized similarly to the EigenTrust scheme [17]. Briefly speaking, for each SVD computation, the server chooses fog devices based on a distributed hash table. Due to the large number of fog devices, it is infeasible for the device providers to determine whether they will be selected for the same computation and to negotiate collusion in advance. Based on the above assumptions, confidentiality is the security requirement that should be fulfilled, i.e., even though FDs and SDs process the collected data, they should not learn anything about the actual data values. As for authenticity and integrity, many existing signature schemes, e.g., the Boneh-Lynn-Shacham (BLS) short signature [18], can be adopted; this work therefore focuses on confidentiality.

3.3 Design Goals

According to the aforementioned system model and security requirements, the goal is to design a flexible fog computing framework for privacy-preserving SVD computation. Specifically, the proposed framework should achieve the following objectives.

  • The confidentiality should be guaranteed in the proposed framework. All the user data contained in the transmitted messages should be protected. The processing in fog devices should not leak data privacy.

  • The framework should be flexible enough to be adopted by different applications. Instead of being the ultimate goal, SVD is the basis or initial step of many applications, which means the further procedures after SVD could be quite different for various scenarios. Therefore, the design of the framework should consider the flexibility such that the results of SVD could be further utilized to achieve the final purposes of different applications.

4 The Proposed Framework

In this section, the proposed framework for SVD computation is presented in detail. The framework is composed of five phases: system initialization, data collection, data randomization, pre-computation and eigenvalue decomposition.

4.1 System Initialization

The server is the trusted entity which bootstraps the whole system. Assume the number of users supported by the system is $N$, each user's data is $m$-dimensional, and the range of each dimension value is $[0, d]$, where $d$ is a constant; these quantities are the system parameters. Let $|x|$ denote the bit length of $x$. Given the security parameter $\kappa$, the server calculates the public key of the Paillier Cryptosystem PK $= (n, g)$ and the corresponding private key SK $= (\lambda, \mu)$, where $p$, $q$ are two large primes with $|p| = |q| = \kappa$. The server also randomly chooses two coprime integers, which will be used to randomize the user data and later derandomize the results, and a superincreasing sequence $\vec{a} = (a_1, \dots, a_m)$ whose elements grow fast enough that the per-dimension values packed in the data collection phase can be uniquely recovered. Finally, the server publishes the public parameters, sends the Paillier private key to SD$_1$ as a secret, and sends the remaining secrets to the FDs, SD$_2$ and SD$_3$, respectively.

4.2 Data Collection

In the environmental tier, the EDs collect and upload application-specific $m$-dimensional data. The data from the $N$ EDs form the data matrix $A$.

Computing the SVD of matrix $A$ with privacy preserved is the goal of this work. The $i$-th column of matrix $A$ is the data $\vec{x}_i = (x_{i1}, \dots, x_{im})$ from the $i$-th device ED$_i$. To upload its data, ED$_i$ performs the following steps:

  • Step-1. Utilize the superincreasing sequence $\vec{a}$ to pack the $m$ dimension values into one integer (see the sketch after these steps):

    $D_i = a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im}$   (2)
  • Step-2. Choose a random number $r_i \in \mathbb{Z}^*_n$ and compute the ciphertext

    $C_i = g^{D_i} \cdot r_i^{n} \bmod n^2$   (3)
  • Step-3. Send the encrypted data $C_i$ to the FD which communicates with it.
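The packing of Step-1 can be sketched in a few lines of Python; the base-$(d+1)$ sequence used here is the simplest choice satisfying the superincreasing property, while the paper's actual $a_j$ must additionally leave room for the masking values added by the FDs (the bound 15 and the sample vector below are made up for illustration).

# Step-1 sketch: pack an m-dimensional reading into one integer D_i = sum_j a_j * x_ij.
def superincreasing_sequence(m, d_max):
    # a_{j+1} = (d_max + 1) * a_j, hence a_{j+1} > d_max * (a_1 + ... + a_j),
    # so values in [0, d_max] can later be unpacked uniquely (Algorithm 1).
    a = [1]
    for _ in range(m - 1):
        a.append(a[-1] * (d_max + 1))
    return a

def pack(x, a):
    return sum(aj * xj for aj, xj in zip(a, x))

x_i = [7, 0, 12, 3]                        # one ED's 4-dimensional reading, each value in [0, 15]
a = superincreasing_sequence(len(x_i), 15)
D_i = pack(x_i, a)                         # D_i is what gets Paillier-encrypted in Step-2
print(a, D_i)                              # [1, 16, 256, 4096] 15367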

4.3 Data Randomization

Each FD performs the following steps to randomize the received data.

  • Step-1. For the $i$-th ciphertext $C_i$, the FD chooses two random masking numbers from the range prescribed by the server and computes the corresponding masking term.

  • Step-2. The FD randomizes $C_i$ homomorphically, using the property of Section 2.1 (a generic sketch of this pattern follows these steps), as

    (4)
  • Step-3. The FD sends the randomized data to SD$_1$.
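The exact randomization in (4) depends on the server's masking parameters, which are not reproduced here; the sketch below only illustrates the generic Paillier pattern an FD can rely on, namely turning $E(D_i)$ into an encryption of $\alpha D_i + \beta$ without ever decrypting. The toy primes and the names alpha and beta are assumptions for illustration, not the paper's symbols.

# Generic masking pattern on Paillier ciphertexts: E(d)^alpha * E(beta) = E(alpha*d + beta).
import math
import random

p, q = 104723, 104729                          # toy primes, not secure
n, n2, g = p * q, (p * q) ** 2, p * q + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def E(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def D(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

d, alpha, beta = 15367, 17, 999                # d: packed value received from an ED
c = E(d)                                       # what the ED actually sends
c_masked = (pow(c, alpha, n2) * E(beta)) % n2  # done by the FD, no secret key needed
assert D(c_masked) == (alpha * d + beta) % n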

4.4 Pre-computation

Upon receiving the randomized ciphertexts from the FDs, SD$_1$ performs the following steps to compute the randomized versions of $AA^T$ and $A^TA$.

  • Step-1. For each received ciphertext, SD$_1$ utilizes the secret key SK to decrypt it and obtain the aggregation of the randomized data:

    (5)
  • Step-2. By running Algorithm 1, SD$_1$ recovers the randomized value of each dimension of the data.

    1:procedure Recover($D$, $a_1, \dots, a_m$)   Input: aggregated value $D$ and sequence $\vec{a}$
    2:Output: randomized data $x_1, \dots, x_m$
    3:      Let $X_m = D$
    4:      for $k = m$ down to $2$ do
    5:            $X_{k-1} = X_k \bmod a_k$
    6:            $x_k = (X_k - X_{k-1}) / a_k$
    7:      end for
    8:      $x_1 = X_1 / a_1$
    9:      return $x_1, \dots, x_m$
    10:end procedure
    Algorithm 1 Recover randomized values from aggregated data
  • Step-3. From each decrypted aggregation, SD$_1$ obtains an $m$-dimensional randomized data vector. In total, SD$_1$ obtains the randomized data matrix $\tilde{A}$, whose $(j, i)$-th entry is the randomized value of $x_{ij}$. SD$_1$ then simply computes $\tilde{A}\tilde{A}^T$ and $\tilde{A}^T\tilde{A}$, and sends the two resulting matrices to SD$_2$ and SD$_3$, respectively.

The correctness of Algorithm 1. In Algorithm 1, the aggregated value $D$ is a superincreasing combination of the randomized per-dimension values. Since each data value lies in the range $[0, d]$ and the masking numbers are chosen from the range prescribed by the server, every randomized value is strictly smaller than the gap left by the superincreasing sequence, i.e.,

(6)

Therefore, the modulo and division operations in the loop isolate the contribution of the $m$-th dimension, and

(7)

Similarly, it can be proved that the randomized values of the remaining dimensions are recovered correctly.
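A Python sketch of the recovery logic of Algorithm 1 follows; the packing helper is repeated so the snippet is self-contained, and the bound and sample values are illustrative rather than the framework's actual parameters.

# Algorithm 1 sketch: recover per-dimension (randomized) values from a superincreasing aggregation.
def superincreasing_sequence(m, bound):
    a = [1]
    for _ in range(m - 1):
        a.append(a[-1] * (bound + 1))          # a_{k+1} > bound * (a_1 + ... + a_k)
    return a

def recover(D, a):
    x = [0] * len(a)
    X = D
    for k in range(len(a) - 1, 0, -1):         # k = m, ..., 2 in 1-based notation
        X_prev = X % a[k]
        x[k] = (X - X_prev) // a[k]
        X = X_prev
    x[0] = X // a[0]                           # a_1 = 1
    return x

bound = 4095                                   # every randomized value must stay below this bound
a = superincreasing_sequence(4, bound)
y = [812, 7, 4000, 55]                         # randomized per-dimension values (illustrative)
D = sum(aj * yj for aj, yj in zip(a, y))
assert recover(D, a) == y                      # each dimension is isolated by the mod/div steps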

4.5 Eigenvalue Decomposition

4.5.1 SD$_2$:

When receiving the randomized $AA^T$, SD$_2$ performs the following steps to compute the left part of the SVD of matrix $A$, i.e., the matrices $U$ and $\Sigma$.

  • Step-1. For each entry of the received matrix, SD$_2$ derandomizes the entry using its secrets as follows:

    (8)

    The result is the corresponding entry of matrix $AA^T$.

  • Step-2. After SD$_2$ recovers the matrix $AA^T$, it performs the eigenvalue decomposition of $AA^T$ and obtains the matrices $U$ and $\Sigma$ (the singular values being the square roots of the eigenvalues).

4.5.2 SD$_3$:

Similarly to SD$_2$, SD$_3$ derandomizes the entries of the received matrix to recover $A^TA$. Then SD$_3$ performs the eigenvalue decomposition of $A^TA$ to get the right part of $A$'s SVD, i.e., the matrices $V$ and $\Sigma$.

By now, the SVD of $A$ is held separately by SD$_2$ and SD$_3$.

The correctness of derandomization. Each entry of the randomized product matrix received by SD$_2$ is implicitly formed as

(9)

Since the masking terms introduced by the FDs are exactly cancelled by the derandomization in (8), the derandomized result equals the corresponding entry of matrix $AA^T$. Similarly, each derandomized entry obtained by SD$_3$ equals the corresponding entry of matrix $A^TA$.

5 Security Analysis

In this section, the security properties of the proposed framework are analysed. As mentioned in the security requirements of Section 3, the participants of the framework will faithfully follow the defined procedures while being curious about the user data. Thus, we first analyze the ability of each participant to learn private data under normal operations, i.e., the probability of leaking privacy when the legitimate processes are followed. Then, the possible extra operations, denoted as potential attacks, which could be conducted by certain participants to snoop on the data are analyzed. The resistance of the framework against those attacks is discussed and the principles for system configuration are demonstrated.

5.1 Privacy Leakage Probability under Normal Operations

5.1.1 EDs:

In the proposed framework, each ED will encrypt its data with Paillier Cryptosystem before uploading. Thus, each ED could only know the plaintext of its own message and learn nothing about the other EDs’ data.

5.1.2 FDs:

The messages collected by the FDs are all Paillier ciphertext. Since the FDs do not have the private key for decrypting messages and it is assumed that there is no collusion among fog devices, the user data privacy is protected by the Paillier Cryptosystem no matter what operations are performed on the ciphertext by FDs.

5.1.3 SD$_1$:

SD$_1$ is the only fog device which has the private key for decryption. However, the plaintexts SD$_1$ obtains are only the randomized data. Since SD$_1$ has no idea about the values of the masking numbers, it cannot gain any knowledge of the original data.

5.1.4 SD$_2$:

Through derandomization, SD$_2$ obtains $AA^T$ and further gets $U$ and $\Sigma$. Since $A = U\Sigma V^T$, SD$_2$ needs to find the correct unitary matrix $V$ to determine $A$. Since there are infinitely many unitary matrices, SD$_2$ cannot recover the original data matrix $A$ with only $U$ and $\Sigma$.

5.1.5 SD$_3$:

Similarly to SD$_2$, SD$_3$ cannot recover the data matrix $A$ with only $V$ and $\Sigma$.

Based on the above analysis, data privacy is fully preserved in the proposed framework when the participants follow the defined procedures. In the following, the possible extra computations performed by participants to discover private data are considered.

5.2 Potential Attacks

For EDs and FDs, since they only hold encrypted user data, they cannot obtain any meaningful information no matter what operations they perform on the ciphertexts. Therefore, we mainly discuss the potential attacks from SD$_1$, SD$_2$ and SD$_3$ in this part.

SD$_1$: As mentioned above, the information SD$_1$ gets is the randomized data. What SD$_1$ needs to do is to find the values of the masking numbers and recover the original data as

(10)

Since every original value is mixed with a random combination of the masking numbers, it is infeasible for SD$_1$ to determine them without additional information. Therefore, we consider situations in which SD$_1$ knows some of the user data. With this knowledge, the possible operations SD$_1$ could perform are as follows:

  • Step-1. For each known data value, SD$_1$ converts the corresponding randomized value into a simpler form by subtracting the known contribution. Let $S$ denote the set of converted values.

  • Step-2. SD$_1$ performs a brute force attack, i.e., tries all possible values of one masking number. For each guess, SD$_1$ applies the corresponding modulo operation to every element of $S$ and then computes the greatest common divisor (GCD) of the resulting set. If the GCD is larger than 1, it is the value of the other masking number, and the currently selected guess is correct.

The rationale behind this attack is as follows: the probability of $k$ randomly chosen integers being coprime is $1/\zeta(k)$, where $\zeta$ is the Riemann zeta function [19]. When $k$ is large, the probability that they are not coprime is negligible. Thus, after the modulo operations on $S$, only when the guess is correct are all the elements of the resulting set multiples of the remaining masking number, so their GCD is larger than 1 and equals that masking number.
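The GCD step of this attack is easy to illustrate; in the toy snippet below (made-up masks and values, not the paper's notation), a handful of values that share a hidden multiplicative factor reveal that factor through their GCD with overwhelming probability.

# Toy illustration of the GCD step: values of the form alpha * x_i share the factor alpha,
# so the GCD of a few of them equals alpha unless all x_i happen to share a common factor.
import math
import random
from functools import reduce

alpha = random.getrandbits(80) | 1                  # hidden multiplicative mask (>= 80 bits, as required)
xs = [random.getrandbits(32) for _ in range(20)]    # values with the additive part already stripped
masked = [alpha * x for x in xs]

recovered = reduce(math.gcd, masked)
print(recovered == alpha)                           # True with overwhelming probability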

Note that, in some cases, SD$_1$ could still form a set whose elements are of the desired form with high probability, even if it does not know any user data. For example, if the data matrix is sparse, most of the randomized values are already of the desired form. Another case is when the data range is not large enough compared to the amount of data: SD$_1$ could compute the differences of all possible pairwise combinations, and some of the resulting values would be of the desired form. In those cases, SD$_1$ could perform the brute force attack.

Parameter Selection. To resist the brute force attack in the above cases, the bit length of the guessed masking number, i.e., the size of the search space, should be at least 80 bits. Moreover, SD$_1$ could compute the differences of all possible combinations of randomized values. If certain combinations share the same random component, the resulting differences would share a common factor, and SD$_1$ could learn that factor efficiently by computing the GCD of those differences even when the above bit-length requirement is met. To avoid this case, the randomly chosen masking numbers should be distinct from each other with high probability. According to the generalized birthday problem [20], the probability that at least two of the chosen masking numbers match can be bounded, so the probability of no match can be made arbitrarily close to 1 and the parameter which determines their range can be selected accordingly (see the bound computed after this paragraph). Note that if the FDs could cooperatively choose the set of masking numbers such that no match occurs, then the range only needs to be larger than the number of masking numbers.
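A quick way to size the mask range is the standard birthday approximation $P(\text{no match}) \approx e^{-k(k-1)/(2R)}$ for $k$ masks drawn uniformly from a range of size $R$; the snippet below computes the resulting collision probability for a few illustrative range sizes (the count of 10,000 masks is an assumption, not a value from the paper).

# Birthday-bound check: probability that k uniformly drawn masks from a range of size R = 2^bits
# are all distinct, using the standard approximation exp(-k*(k-1)/(2*R)).
import math

def prob_no_collision(k, range_bits):
    R = 2 ** range_bits
    return math.exp(-k * (k - 1) / (2 * R))

k = 10_000                                     # number of masks chosen system-wide (illustrative)
for bits in (40, 48, 64):
    print(bits, 1 - prob_no_collision(k, bits))    # collision probability shrinks as the range grows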

SD$_2$: SD$_2$ obtains $AA^T$, $U$ and $\Sigma$, where $r$ denotes the rank of $A$. The purpose of SD$_2$ is to find the matrix $V$. Note that the first $r$ elements of each row of $V$ correspond to a column of $A$. If SD$_2$ knew the left $r$ columns of $V$, it could recover the original data. Since there are infinitely many unitary matrices, SD$_2$ cannot determine the correct $V$ if it has no additional information. Thus, we assume that SD$_2$ could obtain some original data in certain cases. Note that one known data vector, i.e., one column of $A$, yields a system of equations for the corresponding row of $V$; by solving the equations from one user data vector, SD$_2$ can determine the first $r$ elements of that row.

When fewer than $N-1$ user data vectors are known, since each row of $V$ is linearly independent of the others, obtaining some rows does not help to learn the remaining rows. Thus, SD$_2$ cannot utilize the known data to learn the rest of the unknown data.

When $N-1$ user data vectors are known, SD$_2$ can determine the first $r$ elements of $N-1$ rows of $V$; the first $r$ elements of the last unknown row can then be determined because $V$ is unitary. The last unknown user data vector can be recovered accordingly.

SD$_3$: Similarly to SD$_2$, the matrices which can be utilized by SD$_3$ are $A^TA$, $V$ and $\Sigma$, where $r$ is the rank of $A$. The purpose of SD$_3$ is to find the matrix $U$. Different from the case of SD$_2$, the first $r$ elements of each row of $U$ correspond to a row of $A$, i.e., the data values of a certain dimension from all users. If SD$_3$ knew the left $r$ columns of $U$, it could recover the original data. Similarly, we assume that SD$_3$ could obtain some linearly independent user data in certain cases. We say "linearly independent" here because linearly dependent data would not produce new linearly independent equations; thus, only the number of linearly independent data vectors matters. Each known user data vector, i.e., one column of $A$, forms one equation for each row of $U$:

(11)

When $r < m$: if SD$_3$ knows at least $r$ linearly independent user data vectors, it can form $r$ linearly independent equations for each row of $U$; by solving these equations, SD$_3$ can determine the left $r$ columns of $U$ and thus recover the whole matrix $A$, which contains the other unknown user data. On the other hand, if fewer than $r$ linearly independent data vectors are known, SD$_3$ cannot recover the other unknown linearly independent data due to the lack of enough linearly independent equations. However, SD$_3$ can recover the data that is linearly dependent on the known data, because the corresponding columns exhibit the same linear relationships as those existing among the user data.

When $r = m$, there is an additional condition for solving the equations of $U$, i.e., $U$ is a unitary matrix. Specifically, the rows of $U$ can be regarded as the coordinate axes of an $m$-dimensional space whose rotation degree of freedom is $m-1$. For each linearly independent data vector known to SD$_3$, the rotation degree of freedom of the coordinate axes reduces by 1. Therefore, if $m-1$ linearly independent data vectors are known, the rotation degree of freedom of the coordinate axes reduces to 0, i.e., the coordinate axes are fixed. Moreover, since each row of $U$ has unit norm, each row can be seen as a point lying on the unit sphere in $m$ dimensions. Thus, the intersection points between the fixed coordinate axes and the $m$-dimensional unit sphere give the solution of $U$.

Based on the above analysis, the proposed framework can resist the potential attacks launched by SD$_1$ through properly choosing the system parameters. For SD$_2$, only when $N-1$ user data vectors are obtained can it learn the value of the last unknown data vector. For SD$_3$, if $r < m$, the framework can resist the leakage of up to $r-1$ linearly independent user data vectors, and it can resist the leakage of up to $m-2$ linearly independent user data vectors if $r = m$.

6 Performance Evaluation

In this section, we evaluate the performance of the proposed fog computing framework in terms of the capacity and efficiency. The capacity demonstrates the number of required ciphertexts for different matrix sizes while the efficiency indicates the computational complexity and communication overhead of the framework.

6.1 Capacity

In the proposed framework, the aggregated randomized data is a superincreasing combination of the masked per-dimension values. To guarantee that the aggregated data can be recovered correctly after decryption, its value must be smaller than the Paillier modulus $n$. At the same time, the superincreasing sequence must satisfy its growth constraint so that each masked dimension value fits into its slot, and the two randomization integers must satisfy the derandomization constraints. To resist the potential attacks from SD$_1$, the bit length of the masking numbers should be chosen based on the data range and the number of users, and should not be less than 80 bits.

These constraints determine the bit length of one aggregated value. For simplicity, assume that the FDs cooperatively select the additive masking numbers such that no match happens; then their range only needs to exceed the number of masking numbers. Adding up the bits required for the data range, the masking numbers and the superincreasing sequence, each packed dimension consumes a bounded number of bits, so the bit length of the aggregated data grows roughly linearly with the data dimension $m$.

It is obvious that the data dimension has a great influence on the length of the aggregated data. Given different data dimensions and data ranges, the number of users which one ciphertext with $|n| = 1024$ can support is evaluated as shown in Fig. 2(a).

Figure 2: Capacity of the proposed framework ((a) the number of users one ciphertext with $|n| = 1024$ can support; (b), (c) the number of required ciphertexts under two further parameter settings)

From Fig. 2(a), it can be seen that increasing the dimensionality dramatically decreases the number of users which one ciphertext can support, while the impact of the data range is not as significant. One ciphertext can support a large number of users with low-dimensional data. To support higher-dimensional data for the same number of users, each ED needs to use multiple ciphertexts to aggregate its data; e.g., supporting the same number of users with 16-dimensional data requires 2 ciphertexts, each of which aggregates 8 dimensions. Given different parameter settings, the number of required ciphertexts is evaluated in Fig. 2(b) and Fig. 2(c), respectively; the figures show how many ciphertexts each ED needs to use for uploading its data.
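The budgeting behind Fig. 2 can be approximated with a deliberately simplified model: if each packed dimension consumes roughly a fixed number of bits (data range plus masking and carry margins, here lumped into a single assumed constant of 128 bits), then one ciphertext with modulus bit length $|n|$ holds $\lfloor |n|/b \rfloor$ dimensions and an ED needs $\lceil m / \lfloor |n|/b \rfloor \rceil$ ciphertexts. The sketch below is illustrative only and does not reproduce the exact constraint derived above.

# Rough capacity estimate: packed slots per ciphertext and ciphertexts per ED (simplified model).
import math

def ciphertexts_needed(m, n_bits=1024, bits_per_dim=128):
    slots = n_bits // bits_per_dim             # how many packed dimensions fit under the modulus
    return math.ceil(m / slots)

for m in (8, 10, 16, 150):
    print(m, ciphertexts_needed(m))            # e.g. 16 dimensions -> 2 ciphertexts of 8 slots each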

6.2 Efficiency

As analyzed above, each ED may need more than one Paillier ciphertext for aggregating its user data. Let $w$ denote the number of required ciphertexts for each ED. In the following, the computational complexity and communication overhead of the proposed framework are analyzed.

Computational Complexity: Since the crypto-operations are much heavier than the computations on plaintext, the number of crypto-operations is the main concern in this part. The overall crypto-operations performed by each entity in the procedures of the proposed framework are as follows.

ED: each encryption needs exponentiation and multiplication in $\mathbb{Z}_{n^2}$. The overall crypto-operation cost of an ED is that of $w$ encryptions.

FD: each randomization needs exponentiation and multiplication in $\mathbb{Z}_{n^2}$. The overall crypto-operation cost of an FD is that of one randomization per ciphertext received from the EDs communicating with it.

SD$_1$: to decrypt one ciphertext, SD$_1$ needs to perform exponentiation in $\mathbb{Z}_{n^2}$. After decryption, the other operations are conducted on plaintexts, and their cost is negligible compared to decryption. The overall crypto-operation cost of SD$_1$ is that of one decryption per received ciphertext, i.e., $Nw$ decryptions in total.

Note that SD$_2$ and SD$_3$ only perform computations on plaintexts, and the server is only in charge of system initialization. Therefore, their computation cost is negligible compared to that of the other entities.

Since a fog computing platform at the current stage possesses resources comparable to those of a smartphone, we have implemented the Paillier Cryptosystem on an Android mobile phone. The phone is a Huawei Honor 3C (H30-U10) with the following specifications: ARM Cortex-A7 quad-core CPU @ 1.3 GHz, 2 GB memory and Android version 4.2.2. When $|n| = 1024$, the average running time (over 1000 iterations) of an exponentiation in $\mathbb{Z}_{n^2}$ is 55.493 milliseconds, a multiplication in $\mathbb{Z}_{n^2}$ takes 0.201 milliseconds, and a multiplication in $\mathbb{Z}_n$ takes 0.101 milliseconds. It is obvious that the cost of multiplication is negligible compared to that of exponentiation. The computational cost of the different entities is shown in Table 1.
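For reference, the dominant crypto-operation can be timed in a few lines; the sketch below measures an exponentiation in $\mathbb{Z}_{n^2}$ for a stand-in 1024-bit modulus using Python's built-in pow, so the numbers will naturally differ from the Android measurements reported above.

# Micro-benchmark of the dominant operation: exponentiation in Z_{n^2} with |n| = 1024.
import random
import time

n = random.getrandbits(1024) | (1 << 1023) | 1      # stand-in 1024-bit odd modulus (not a real RSA n)
n2 = n * n
base = random.randrange(2, n2)
exp = random.getrandbits(1024)

iters = 200
start = time.perf_counter()
for _ in range(iters):
    pow(base, exp, n2)
elapsed = (time.perf_counter() - start) / iters
print(f"average exponentiation in Z_(n^2): {elapsed * 1000:.3f} ms")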

Entity Computational Cost (milliseconds)
ED
FDs
SD$_1$
Table 1: Computational Cost of the Proposed Framework

Another notable point is that the evaluation here implicitly assumes that the IoT environmental devices are as powerful as a smartphone. This is true for some IoT applications which use mobile phones or vehicles to upload environmental information. However, for applications utilizing low-power sensors as EDs, the Paillier operations are still too heavy. To circumvent this issue, a sensor may transmit its data to a nearby, more powerful device which conducts the crypto-operations on its behalf. For example, a wristband could connect with a mobile phone for processing and uploading data.

Communication Overhead: In this part, the communication overhead during the SVD computation is evaluated. Note that for the Paillier Cryptosystem, the ciphertext space is $\mathbb{Z}^*_{n^2}$; thus, the bit length of one ciphertext is $2|n|$. The overhead of each communication flow is shown in Table 2.

Communication Flow	Bit Length of Message
ED → FD	$2w|n|$
FDs → SD$_1$	$2Nw|n|$
SD$_1$ → SD$_2$	$m^2$ entries of the randomized $AA^T$
SD$_1$ → SD$_3$	$N^2$ entries of the randomized $A^TA$
Table 2: Communication Overhead of the Proposed Framework

For the "ED → FD" communication flow, each ED transmits $w$ ciphertexts to its corresponding FD, so the communication overhead is $2w|n|$ bits. For the "FDs → SD$_1$" communication flow, SD$_1$ gathers $Nw$ ciphertexts in total from all the FDs, so the communication overhead is $2Nw|n|$ bits. For the "SD$_1$ → SD$_2$" communication flow, SD$_1$ sends the randomized $AA^T$ to SD$_2$; since the matrix has $m^2$ entries in total, the communication overhead is $m^2$ times the bit length of one entry. Similarly, the communication overhead of the "SD$_1$ → SD$_3$" communication flow is $N^2$ times the bit length of one entry.

7 Applications

In this section, we describe potential IoT applications which could utilize the proposed framework. Basically, the proposed framework can be applied if the application possesses the following characteristics: 1) the application collects environmental information for data analysis; 2) the data analysis is based on SVD; 3) the number of data analysis tasks is huge; 4) the environmental information is considered private by the application users. The last two characteristics are actually the motivation of this work: the large number of data analysis tasks motivates us to analyze data on the fog computing platform, and the privacy concern requires the analysis to be privacy-preserving. Moreover, since different applications may utilize the result of the SVD operation in different ways, we discuss how to adapt the proposed framework to achieve the purposes of different applications. In the following, three applications with different objectives are given as examples to show that, with only a few additional procedures or slight adjustments, the framework can be adapted to specific applications, which demonstrates its flexibility.

7.1 Anomaly Detection

Since SVD finds the singular values of the data matrix, which reflect the features of the data, it has been proposed to apply SVD, or the closely related technique of principal component analysis, to anomaly detection [4][5]. The basic idea is quite straightforward: since abnormal data causes a larger variation of the first eigenvector than normal data, the degree of variation of the first eigenvector can be utilized as the indicator of an anomaly. When new data comes in, the system performs SVD on the data matrix and checks how much the direction of the first eigenvector changes (see the sketch after this paragraph). Note that to get the correct first eigenvector, it is usually required to normalize the data before performing SVD. However, the framework proposed above is in its general form, which does not contain the normalization procedure. Thus, we show below how to adjust the framework to accomplish that goal. Note that the eigenvector used for detection comes from matrix $U$, thus we only need to adjust the procedures related to computing $AA^T$.
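On public (non-encrypted) data, the detection idea can be illustrated with plain NumPy: normalize each dimension, compare the direction of the first eigenvector before and after new samples arrive, and flag a large angular change. The column-per-sample convention, the latent-factor data generator and the batch sizes below are illustrative assumptions, not part of the framework.

# Sketch of first-eigenvector-based anomaly detection (each column of A is one sample).
import numpy as np

rng = np.random.default_rng(1)

def first_eigenvector(A):
    # Normalize each dimension (row) to zero mean / unit variance, then take the
    # dominant left-singular vector, i.e. the first eigenvector of the normalized A A^T.
    Z = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0]

def angle(u, v):
    c = abs(float(u @ v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def correlated_samples(k):
    # 5-dimensional samples driven by one latent factor, so a clear first eigenvector exists.
    w = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
    return np.outer(w, rng.standard_normal(k)) + 0.3 * rng.standard_normal((5, k))

base = correlated_samples(200)
normal_batch = correlated_samples(20)
abnormal_batch = 3.0 * rng.standard_normal((5, 20))     # breaks the usual correlation pattern

v0 = first_eigenvector(base)
print(angle(v0, first_eigenvector(np.hstack([base, normal_batch]))))    # typically small
print(angle(v0, first_eigenvector(np.hstack([base, abnormal_batch]))))  # noticeably larger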

To include the normalization procedure, the randomization parameters need to be chosen to satisfy additional constraints. Then, some extra steps need to be added to the pre-computation operation of SD$_1$ and the derandomization operation of SD$_2$. The extra steps are as follows.

  • Step-1. For each randomized data vector, SD$_1$ performs

    (12)

    Then SD$_1$ performs the matrix multiplication as before and sends the resulting matrix to SD$_2$.

  • Step-2. For each entry of the received matrix, SD$_2$ performs a sequence of derandomization and correction computations (with case distinctions on the entry's indices). We denote the resulting matrix as $B$.

  • Step-3. Let $b_{jk}$ denote the $j$-th row and $k$-th column entry of $B$. SD$_2$ computes the standard deviation of the $j$-th dimensional data as

    (13)

    Then, for each entry $b_{jk}$, SD$_2$