Outsource Photo Sharing and Searching for Mobile Devices With Privacy Protection
With the proliferation of mobile devices, cloud-based photo sharing and searching services are becoming common due to the mobile devices’ resource constraints. Meanwhile, there is also increasing concern about privacy in photos. In this work, we present a framework SouTu, which enables cloud servers to provide privacy-preserving photo sharing and search as a service to mobile device users. Privacy-seeking users can share their photos via our framework to allow only their authorized friends to browse and search their photos using resource-bounded mobile devices. This is achieved by our carefully designed architecture and novel outsourced privacy-preserving computation protocols, through which no information about the outsourced photos or even the search contents (including the results) would be revealed to the cloud servers. Our framework is compatible with most of the existing image search technologies, and it requires few changes to the existing cloud systems. The evaluation of our prototype system with 31,772 real-life images shows the communication and computation efficiency of our system.
With the increasing number of smart personal devices (e.g., smartphones, tablet PCs) as well as the emergence of wearable devices (e.g., Google Glass), a huge number of photos are produced every day. The data volume of photos is growing exponentially due to high-resolution on-board cameras, and this makes photo management and sharing challenging for mobile devices. Facing this challenge, users often choose to outsource the burdensome image storage and searching to cloud servers such as Amazon Cloud Drive, Dropbox and image-oriented clouds (e.g., Cloudinary). Various social networking systems (Flickr, Facebook, Google Plus etc.) also provide photo sharing services for personal use.
To some extent, the above privacy concerns come from the fear that our photos might be illegally searched by a malicious hacker, and this is probably one of the primary reasons why users want to get rid of face recognition. However, the object recognition techniques could bring powerful ability to image search, and we believe simply disabling automatic recognition is not the best solution to the privacy problem because it also eliminates the potential utilities lying in the image search functionality. Ideally, privacy-sensitive users should have an option to use the secure version of photo sharing and searching system with little extra overhead, in which image search with object recognition is allowed for authorized users but the privacy leakages due to automatic recognition are prevented.
To achieve this vision, in this paper, we first design a framework SouTu, allowing mobile device users to enjoy photo sharing with fine-grained privacy protection policies, which can be provided by any cloud to attract privacy-seeking users. Via our framework, an owner can share his photos in a cloud without unintended access to his photos, and an authorized querier can send photo queries to conduct image search on others’ photos. To allow resource-bounded mobile devices to search a huge volume of photos, SouTu outsources the heaviest computation and storage tasks to the cloud, and only the private parts of photos are selectively protected to further reduce the computation and communication overhead. Despite such outsourcing, SouTu does not reveal the private image contents or the query contents (including the results) to the cloud. In a later section, to enhance the performance more aggressively, we further introduce an optimized variant in which the computation overhead is reduced by half with only a little loss of accuracy.
SouTu can be considered as a step towards easily deployable frameworks for privacy-preserving photo sharing and searching services among mobile device users, taking advantage of the availability of cloud servers who possess powerful computation and storage abilities.
In summary, our contributions are:
We propose a novel modularized photo sharing and searching framework to let mobile device users outsource photos and the majority of heavy jobs (storage, access control and searching) to cloud servers, without breaching users’ photo-related privacy.
We design two outsourced vector distance computation protocols for both real and binary vectors, which are the core of the framework. Different from the existing multi-party computation based methods, our protocols enable efficient vector distance computation in a non-interactive way, which means the photo owner does not have to interact with the cloud or the querier.
Our framework is compatible with common image services and feature-based image search methods. To achieve a good user experience, all privacy protection modules work automatically and are transparent to users. After the privacy setting, users can enjoy the sharing and searching services as usual. We implement and evaluate our framework using 31,772 real-life images on both smartphones and laptops. The evaluation shows that very low extra overhead is incurred by our method.
2 Backgrounds and Motivation
One of our main contributions is enabling efficient photo sharing and searching on encrypted photos. In order to achieve high search accuracy, SouTu leverages the state-of-the-art image search technologies in the computer vision field. In this section, we briefly review these search techniques and then discuss the privacy issues emerging from them.
2.1 Descriptors Based Image Search
Images are usually searched by their contents. Different types of visual descriptors are proposed to model the visual characteristics of the image, e.g., color, intensity, texture or objects within the image. Various image contents can be recognized and localized (e.g., people and face) using visual descriptors. Among these, human face detection has received extraordinary attention and is one of the most mature object detection techniques so far [5, 4].
The feature descriptor is usually constructed as a set of numeric vectors, denoted as feature vectors. There are some statistical feature vectors (e.g., intensity/color histograms). Also, many well-designed visual descriptors are proposed to achieve accurate image search, e.g., SIFT and SURF. In those works, each feature vector is generated from an interest point of the image to describe the visual characteristics around the point. Interest points are pixels containing distinguishing information of the image. In general, all feature vectors belonging to the same descriptor have the same dimension (e.g., SIFT has 128 dimensions and SURF has 64 dimensions). The numeric type of vectors may be real number [1, 2] or binary [7, 8], and different types of vectors are used for different applications. Specifically, at the cost of a little accuracy, binary descriptors are usually more efficient in computation and suitable for resource-restricted mobile applications. We design our framework to be capable of dealing with both real number and binary feature vectors.
Given a query image, one needs the following three steps to search the top-$k$ similar images from the database. Firstly, the pre-defined image descriptor is extracted from the query image. Secondly, each feature vector in the query image is compared with feature vectors from the database images. Thirdly, a similarity score for every database image is measured based on the vector comparison, and finally the top-$k$ high-score images are returned to the querier.
2.2 Privacy Implications
The rich content of photos raises various privacy implications. As mentioned above, there are many mature techniques to detect and recognize the objects within photos. These techniques can be used to automatically analyze the photos and mine sensitive information with various data mining techniques. By combining the location stamps and time stamps embedded in a photo, more sensitive information about the person may be derived (e.g., home location, occupation, income level). Therefore, the private part (denoted as Region Of Privacy (ROP) hereafter) of a photo needs to be protected, so that no human or machine-runnable algorithm can learn sensitive information in the photo.
Besides the outsourced photos, the query sent to the cloud side incurs privacy implications as well. Even though the uploaded photos are well protected via encryption so that the cloud does not gain useful information of them, their contents can be easily deduced if the queries’ contents and results are revealed to the cloud. Since the entire search process should be outsourced to the cloud for resource saving, protecting queries’ contents as well as the results is equally important to protecting the uploaded photos.
3 System Overview
SouTu is a novel framework allowing clouds to provide privacy-preserving photo sharing and searching services to mobile device users.
It can attract users who need both outsourced photo service and privacy protection.
Figure 1 illustrates the architecture and workflow of our framework.
3.1 Privacy Preserving Photo Storage
Before users upload their photos to cloud servers for sharing, the photos need to be pre-processed. Firstly, the region of privacy (ROP), which is a rectangle defined by two pixel-level coordinates (top-left and bottom-right) on the photo, is determined either automatically or manually. In the automatic manner, the user can define a category of objects as private content, e.g., faces. Then all private objects will be detected by an object recognition algorithm and set as ROPs, e.g., the face in Fig. 2. Otherwise, the owner can manually define the ROP by selecting a rectangular region on the photo. Then, the feature vectors of the ROP are extracted according to the definition of the image descriptor (Section 2.1). Note that hereafter we use human faces as example ROPs of photos in this work, but other objects, such as pedestrians and cars, can also be defined as ROPs with corresponding recognition algorithms. Moreover, except for defining the ROP, all the following operations are conducted automatically by the system and are transparent to users.
After the ROP is selected, it is separated into a public part and a secret part, where the public part doesn’t contain any sensitive information and the secret part is encrypted such that only the authorized users with keys can access it and recover the original ROP. We review the following three different methods for the separation:
1. Mask: fills public part of ROP with solid black (all intensity values ‘0’) and takes the original ROP as secret part.
2. P3: separates ROP based on a threshold in the DCT frequency domain; sets the higher frequency part as secret part and the remaining as public part.
3. Blur: a normalization box filter is applied to ROP to generate the public part; subtracts the public part from ROP in a pixel-wise way to achieve the secret part.
Then, the public part of the whole photo is produced by replacing its ROP with the public part of ROP (as shown in Fig. 2). Our experiment (Section 6) shows that all three methods are resistant to automatic detection algorithms, but the blur based method outperforms others in the storage cost, hence we adopt the blur as the default separation method in SouTu. After extracting the private part from ROP, the owner encrypts the secret part as well as its image descriptor as a private bag, and uploads the private bag to the search cloud. Then, he also uploads the public part of the original photo as a public bag to the sharing cloud (Fig. a).
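The Blur separation can be illustrated with a minimal sketch. It assumes a single-channel ROP stored as a list of integer rows, a box filter of radius 1 that averages whatever neighbours exist at the borders, and integer division for normalization. The function names `box_blur`, `separate` and `recover` are hypothetical and are not taken from the SouTu prototype, which relies on OpenCV.

```python
def box_blur(region, radius=1):
    """Normalized box filter over a 2-D list of ints (borders use available neighbours)."""
    h, w = len(region), len(region[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [region[j][i] for j in ys for i in xs]
            out[y][x] = sum(window) // len(window)
    return out


def separate(region, radius=1):
    """Public part = blurred ROP; secret part = pixel-wise residual (may be negative)."""
    public = box_blur(region, radius)
    secret = [[r - p for r, p in zip(r_row, p_row)]
              for r_row, p_row in zip(region, public)]
    return public, secret


def recover(public, secret):
    """An authorized user adds the decrypted residual back onto the public part."""
    return [[p + s for p, s in zip(p_row, s_row)]
            for p_row, s_row in zip(public, secret)]


rop = [[10, 200, 30], [40, 250, 60], [70, 80, 90]]
public, secret = separate(rop)
assert recover(public, secret) == rop  # separation is lossless
```

Because the secret part is an exact residual, merging it with the public part restores the original ROP without loss, which is what the transparent browsing in the sharing step depends on.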
3.2 Fine-grained Photo Sharing
SouTu allows fine-grained photo sharing among users. The photo owner uses an access control scheme (e.g., [11, 12]) to encrypt the search keys so that only the authorized users with certain attributes can obtain them. As shown in Step 5 of Fig. a, the owner encrypts the search keys under the access rule that he defines, and the encrypted search keys are uploaded to the sharing cloud and published. Having obtained the search keys, an authorized user can generate valid photo queries and decrypt the private part of the ROP. The complete original images can be recovered simply by merging the public parts of the images with the private parts of the ROPs. Here, all these operations are also automatic and transparent to users, and the authorized user can browse the shared images as usual.
3.3 Light-weight Photo Searching
When a querier wants to search for a photo among someone else’s photos, he pre-processes the querying photo to obtain the corresponding image descriptor. Then, only if he satisfies the owner’s access rule can he retrieve the search keys to search the owner’s photos. It is the cloud, however, that conducts the searching job and returns the result to the querier obliviously, i.e., without knowing the contents of the owner’s ROPs or the contents of the query photo. After fetching the query result, as mentioned above, the system generates the original image for the querier transparently. For the querier, the whole system appears like a common image search system. Fig. b illustrates the search procedure.
3.4 System Design Goals
Our system is designed to achieve efficiency, privacy protection and accuracy goals.
Efficiency: To overcome the resource limitation of mobile client, operations at the user side should be light-weight, and most of the expensive computations should be outsourced to the cloud side.
Privacy Preservation: Users outsource not only the storage of photos but also the searching to the cloud side in SouTu. Therefore, the framework is expected to protect users’ privacy in various aspects:
1. ROP Privacy: Unauthorized parties, including cloud servers, should not learn the secret part of the ROP.
2. Query Privacy: Cloud servers should not learn query photos.
3. Result Privacy: Cloud servers should not learn search results.
These are all non-trivial challenges, since the cloud servers are the parties that conduct the searching jobs on the photos stored on their side.
Accuracy: Introducing the privacy protection mechanism should not bring much accuracy loss. That is, the search results from SouTu should be comparable with those of traditional image search technologies conducted on plaintext photos.
3.5 Threat Model
W.l.o.g., we assume curious-but-honest cloud servers and malicious users in this work. Cloud servers will follow the protocol specification in general, but they will try their best to harvest any information about user’s photos. This is a justifiable assumption because deviating from the protocol and not returning a correct search result will lead to bad user experience as well as potential revenue loss of the service provider. However, they might conduct extra work to illegally harvest useful information from the protocol communications in order to infer the secret part of ROP or the contents of queries, which is sensitive information to be protected. On the other hand, queriers may misbehave throughout the protocol to infer the search keys to forge a valid photo query, where the search keys are supposed to be kept secret as well.
4 System Design
In this section, we first present the building blocks of our system, and then give the detail of our non-interactive private image search protocol, which is the core of the system and one of our main contributions.
4.1 Building Blocks of Our System
SouTu is a modularized and well integrated image sharing and searching system, which consists of several building blocks.
Image search is composed of three steps: image descriptor extraction, finding matching vector and similarity score calculation.
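Matching Feature Vector. Given a feature vector $v$ and another descriptor $D'$, let $d(v, v')$ be the Euclidean distance between two feature vectors $v$ and $v'$. Then, given $v$’s nearest neighbor $v'_1 \in D'$ and second nearest neighbor $v'_2 \in D'$, $v$ and $v'_1$ are a matching pair iff:
$$\frac{d(v, v'_1)}{d(v, v'_2)} < \tau$$
That is, $v$ and $v'_1$ are a matching pair iff the ratio between the nearest distance and the second nearest distance is less than a threshold $\tau$. For most object recognition algorithms, $\tau$ is set as .
Similarity Score. Given a querying descriptor $D$ and a queried descriptor $D'$, the similarity score between $D$ and $D'$ is defined as the number of matching pairs $D$ has in $D'$, i.e.,
$$\mathrm{score}(D, D') = \left|\{\, v \in D : v \text{ has a matching pair in } D' \,\}\right|$$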
Given a querying image, matching images with high similarity score can be searched in a database.
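As an illustration of the matching rule and the similarity score on plaintext descriptors, the following sketch applies the nearest/second-nearest ratio test to small 2-D vectors. The threshold default of 0.8 is an assumption made for this example (a value commonly used with SIFT-style matching), not a value taken from this paper, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def has_match(v, descriptor, tau):
    """Ratio test: nearest distance / second nearest distance < tau."""
    dists = sorted(euclidean(v, w) for w in descriptor)
    if len(dists) < 2 or dists[1] == 0:
        return False  # no second neighbour, or the ratio is undefined
    return dists[0] / dists[1] < tau


def similarity(query, target, tau=0.8):  # tau=0.8 is an assumed default
    """Number of query feature vectors that have a matching pair in target."""
    return sum(has_match(v, target, tau) for v in query)


target = [(0, 1), (5, 5), (10, 9)]
query = [(0, 0), (10, 10), (5, 0)]
score = similarity(query, target)
```

In this toy input, the first two query vectors each have one clearly closest neighbour and pass the test, while `(5, 0)` is almost equally far from two target vectors and is rejected as ambiguous.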
Our system also takes advantage of rich cryptographic algorithms for privacy protection in cloud-based image search. It includes: homomorphic encryption, attribute based encryption and oblivious transfer.
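Homomorphic Encryption. We employ Paillier’s cryptosystem as a building block, which has the following additive homomorphic properties, where $E(\cdot)$ and $D(\cdot)$ denote encryption and decryption under a key pair with modulus $n$:
$$D\big(E(m_1) \cdot E(m_2) \bmod n^2\big) = m_1 + m_2 \bmod n, \qquad D\big(E(m)^{c} \bmod n^2\big) = c \cdot m \bmod n$$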
Note that the numeric type of feature vectors may be real number, but the Paillier’s cryptosystem is based on large integers, therefore we need to use integers to represent real numbers first. SouTu uses the fixed point representation to represent real numbers rather than floating-point representation due to its efficiency.
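To make the interplay between fixed-point encoding and Paillier’s additive homomorphism concrete, here is a toy sketch. It is not the prototype’s Java implementation: the primes are two small, well-known Mersenne primes chosen only so the example runs quickly (real deployments need far larger random primes), the generator is fixed to $n + 1$, and the scale factor `SCALE = 10**6` is an assumed precision.

```python
import math
import secrets

SCALE = 10 ** 6  # assumed fixed-point precision: 6 decimal digits


def to_fixed(x):
    return round(x * SCALE)


def from_fixed(k):
    return k / SCALE


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation from two known primes (insecure, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    # Generator g = n + 1; negative plaintexts are represented modulo n.
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to a signed integer


n, sk = keygen()
ca, cb = encrypt(n, to_fixed(1.25)), encrypt(n, to_fixed(-0.5))
c_sum = ca * cb % (n * n)        # multiplying ciphertexts adds plaintexts
c_scaled = pow(ca, 3, n * n)     # exponentiation multiplies by a constant
total = from_fixed(decrypt(n, sk, c_sum))
tripled = from_fixed(decrypt(n, sk, c_scaled))
```

Here `total` is 0.75 and `tripled` is 3.75; both are recovered without the party holding only the ciphertexts ever seeing 1.25 or -0.5.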
Ciphertext-Policy Attribute Based Encryption. We also adopt ciphertext-policy attribute-based encryption (CP-ABE) for access control due to its generality and security. Other attribute-based encryption methods can also be adopted. In CP-ABE, a trusted authority (not the image service provider) is responsible for generating public parameters. Given the public parameters, a data owner can encrypt a message such that only the users satisfying a certain access rule can decrypt it. Secret keys of users contain attribute values for the key holders, and the access rule is expressed with boolean operators (AND, OR etc.) and attribute values. CP-ABE is proven to be IND-CCA1 secure, which implies semantic security against chosen plaintext attacks.
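The shape of such an access rule can be sketched as a small boolean tree over attribute values. The snippet below models only the policy check (who would be able to decrypt), not the CP-ABE cryptography itself; the tuple encoding and the attribute names are hypothetical.

```python
def satisfies(policy, attributes):
    """Evaluate an AND/OR policy tree against a set of attribute values."""
    if isinstance(policy, str):       # leaf: a single required attribute
        return policy in attributes
    op, *children = policy
    results = (satisfies(child, attributes) for child in children)
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical rule: family members, or colleagues on a given project.
policy = ("OR", "family", ("AND", "colleague", "project-x"))
allowed = satisfies(policy, {"colleague", "project-x"})
denied = satisfies(policy, {"colleague"})
```

In an actual CP-ABE scheme this check is enforced cryptographically: a user whose key attributes do not satisfy the tree cannot decrypt the search keys.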
Oblivious Transfer. The $k$-out-of-$n$ oblivious transfer (OT) lets a receiver obtain any subset of $k$ items from the sender’s $n$ items, while the sender remains oblivious of the receiver’s selection and the receiver remains oblivious of the other items.
4.2 System Join
Whenever a new user joins the system, he generates a pair of Paillier Keys and picks a random vector , which has the same dimension of the feature vector. Then, he uses CP-ABE to encrypt under the access rule he wishes to enforce (i.e. who can search on his images). He uploads the following to the sharing cloud, which are the search keys to be used in the photo searching later.
4.3 Public & Private Bag Generation
When an owner wants to upload his photo , the ROP is selected either automatically or manually, and the image descriptor of ROP is extracted. is a set of fixed-dimension feature vectors . A photo may have several ROPs (several persons in the same photo), but w.l.o.g we consider only one ROP per image since multiple ROP is a simple extension. Then, the owner separates the ROP as public ROP and secret ROP as in Section 3.1, and the following public bag is uploaded to the sharing cloud:
After the public bag is uploaded, the owner encrypts the private part of the ROP as the private bag using symmetric encryption such as AES-256. Also, for the cloud-based search, he homomorphically encrypts the feature descriptor, which is stored in the search bag (Protocol 1).
Then, the private bag and the search bag of are:
where and represent the sets and (Hadamard product between and each ) respectively. The private/search bag are uploaded to the sharing/search cloud respectively.
4.4 Cloud-based Image Search
When a querier wants to search an image among a specific owner’s images, he extracts corresponding image descriptor and obtains the owner’s search keys from the server. If he is authorized to search on the owner’s images, he will successfully decrypt the search keys and further proceed. Next, he encodes every single dimension of the feature vectors in the querying image as follows:
Consequently, the querier achieves two sets of encoded feature descriptors corresponding to . He then sends these two sets to the cloud server to outsource the image search. After receiving the encoded descriptors, the cloud conducts several homomorphic operations to achieve the encrypted pairwise distances between and for all with in the search cloud (Protocol 2). Then, he sends all the ciphertexts of results back to the querier.
Upon receiving the ciphertexts of pair-wise distances, the querier uses to decrypt every . Then, he finds the top-2 nearest distances to compute the similarity scores between feature descriptor and every according to Eq. 1.
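The idea of letting the cloud compute a distance over encrypted feature vectors can be illustrated with a generic additive-homomorphic sketch. This is explicitly not Protocol 2: in this simplified version the cloud sees the query vector in the clear, whereas SouTu additionally encodes the query with the search keys so that the cloud learns neither side. The toy Paillier parameters and all function names are assumptions made for the example.

```python
import math
import secrets

# Toy Paillier key from two known Mersenne primes; far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # Python 3.8+


def enc(m):
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def owner_upload(y):
    """Owner encrypts each coordinate and the sum of squares of the vector."""
    return [enc(v) for v in y], enc(sum(v * v for v in y))


def cloud_distance(enc_y, enc_y_sq, x):
    """Cloud derives E(sum (x_i - y_i)^2) = E(sum y^2 - 2 sum x*y + sum x^2)."""
    acc = enc_y_sq * pow(N + 1, sum(v * v for v in x), N2) % N2
    for c, v in zip(enc_y, x):
        acc = acc * pow(c, (-2 * v) % N, N2) % N2
    return acc


enc_y, enc_y_sq = owner_upload([3, 1, 4])
squared_distance = dec(cloud_distance(enc_y, enc_y_sq, [1, 5, 9]))
```

The cloud never decrypts anything and the owner is not contacted during the computation; only the key holder can open `squared_distance`, which here equals 45. The vectors are integers, as they would be after fixed-point encoding.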
4.5 Image Retrieval
Based on the similarity scores, the querier requests the public bags as well as the private bags of the top-$k$ similar images from the sharing cloud (e.g., by requesting the URLs). However, an explicit request reveals the search result to the server. Even if every secret part of ROP is encrypted and the query contents are well protected, the cloud may infer side information by gathering statistics of the image retrieval (e.g., popular images and frequently visited images). Thus, we need to hide the retrieval pattern as well.
To meet this requirement, we employ the $k$-out-of-$n$ OT (Section 4). Since it is extremely expensive to construct a $k$-out-of-$n$ OT with a large $n$, we do not directly run a $k$-out-of-$n$ OT across the whole database to obliviously retrieve images. Instead, we try to find a trade-off between privacy and performance as follows. The querier determines a random subset $S'$ of the database which contains the set $S$ of images that he wants to retrieve. The sizes of $S$ and $S'$ are $k$ and $n'$ respectively. Then, the querier and the sharing cloud engage in a $k$-out-of-$n'$ OT to let the querier obliviously select the $k$ images.
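The subset selection step can be sketched as follows. The snippet covers only how the querier might pad the wanted images with random decoys before the oblivious transfer is run; the OT itself is not implemented, and `build_request_set` together with its parameters is a hypothetical helper.

```python
import secrets


def build_request_set(wanted, database_ids, subset_size):
    """Return a shuffled subset of the given size that contains all wanted ids."""
    wanted = list(dict.fromkeys(wanted))  # drop duplicates, keep order
    wanted_set = set(wanted)
    if not wanted_set <= set(database_ids):
        raise ValueError("wanted ids must exist in the database")
    if not len(wanted) <= subset_size <= len(database_ids):
        raise ValueError("subset size must lie between k and the database size")
    rng = secrets.SystemRandom()
    decoys = [i for i in database_ids if i not in wanted_set]
    chosen = wanted + rng.sample(decoys, subset_size - len(wanted))
    rng.shuffle(chosen)  # the position must not reveal which ids are wanted
    return chosen


chosen = build_request_set([3, 7], list(range(20)), 6)
```

A larger subset hides the retrieval pattern among more candidates but makes the subsequent oblivious transfer more expensive, which is the privacy/performance trade-off described above.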
5 Security Analysis and Refinement
5.1 Security Analysis
Firstly, the secret part of the ROP is well protected by the symmetric encryption, whose key is encrypted with CP-ABE, which is proven to be semantically secure. Besides, the search keys are also protected by CP-ABE. Therefore, the clouds cannot infer sensitive information from what they store in SouTu.
Then, we design a game to prove theoretically that SouTu reveals no sensitive information to the cloud servers during the photo searching procedure. We omit the proof here due to space limitations; readers can refer to the appendix of () for the details of the proof.
Besides the adversarial cloud servers, we have also assumed malicious queriers in our adversarial model. However, unauthorized malicious users are not as threatening as cloud servers since they never get involved in any transaction with valid users. All they can do except compromising the server is to try man-in-the-middle attacks to sniff the search results, but this can be trivially prevented by introducing secure communication channel. Even if they compromised a server, CP-ABE guarantees the indistinguishability of the ciphertexts. In conclusion, malicious users do not learn about sensitive information either.
5.2 Refinements for Binary Descriptor
Some image retrieval systems use binary image feature descriptors because they are more compact and computationally manageable than real number ones, at the cost of a little accuracy in content recognition. They are more suitable for resource-limited applications. However, directly applying SouTu on mobile platforms with binary descriptors does not fully exploit this advantage. The exponentiation operations contribute the majority of the computation overhead in our cryptographic building blocks, but both image owners and queriers need exponentiations throughout the protocol where is the number of interest points in an image.
To relax this bottleneck, we further design our framework for the special case where binary descriptors are used (refer to the system as ). Note that for any two binary vectors $u, v \in \{0, 1\}^d$, we have:
$$\lVert u - v \rVert^2 = \sum_{i=1}^{d} (u_i \oplus v_i)$$
where $u_i$ is the $i$-th bit of $u$ and $\oplus$ is the bitwise XOR operator. Therefore, we consider using a succinct garbled circuit in combination with homomorphic encryption to achieve a light-weight and non-interactive framework dedicated to binary descriptor based search, which is one of our contributions.
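For binary vectors, the squared Euclidean distance coincides with the number of positions where the bitwise XOR is 1, i.e., the Hamming distance. The short check below illustrates this on a 5-bit example; the helper names are hypothetical, and the packed-integer variant is only shown because it is how binary descriptors are typically compared in practice.

```python
def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hamming(u, v):
    """Sum of bitwise XORs over the coordinates of two bit vectors."""
    return sum(a ^ b for a, b in zip(u, v))


def hamming_packed(x, y):
    """Same distance when each descriptor is packed into one integer."""
    return bin(x ^ y).count("1")


u = [1, 0, 1, 1, 0]
v = [0, 0, 1, 0, 1]
d_euclid = squared_euclidean(u, v)
d_xor = hamming(u, v)
d_packed = hamming_packed(0b10110, 0b00101)
```

All three values agree, so any protocol that can evaluate XOR gates and sum their outputs can compute the same distance that the real-number variant obtains with multiplications.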
Yao’s Garbled Circuit
To aid understanding, we briefly review Yao’s garbled circuit (GC), and we direct readers to the relevant literature for technical details. Yao’s garbled circuit is designed for two-party computation, where two parties wish to jointly compute a function over their private inputs using a garbled boolean circuit. Here we use an XOR gate as an example.
Two random values , are chosen to represent the bit values 0 and 1 for each wire . Then, the shuffled Table 5.2.1 represents the garbled XOR gate (shuffled so that inputs are not inferred from the row number). Given two garbled inputs, the evaluator can obliviously evaluate the boolean gate by looking up the shuffled table and decrypting the output to get a garbled output.
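A textbook-style garbled XOR gate can be sketched as below. This is a generic construction for illustration and not the hash-chain based garbling of the paper’s protocol: SHA-256 is used as the row-encryption pad, and a 16-byte all-zero tag is appended to each output label so that the evaluator can recognize the single row it is able to decrypt. All names are hypothetical.

```python
import hashlib
import secrets

LABEL = 16                 # bytes per wire label
TAG = b"\x00" * 16         # redundancy marking a correctly decrypted row


def pad(label_a, label_b):
    return hashlib.sha256(label_a + label_b).digest()  # 32 bytes


def xor_bytes(x, y):
    return bytes(p ^ q for p, q in zip(x, y))


def new_wire():
    """Two random labels standing for bit 0 and bit 1 on one wire."""
    return (secrets.token_bytes(LABEL), secrets.token_bytes(LABEL))


def garble_xor(wire_a, wire_b, wire_out):
    """Encrypt the output label of a XOR b under each pair of input labels."""
    rows = [xor_bytes(pad(wire_a[a], wire_b[b]), wire_out[a ^ b] + TAG)
            for a in (0, 1) for b in (0, 1)]
    secrets.SystemRandom().shuffle(rows)  # row order must not leak the inputs
    return rows


def evaluate(rows, label_a, label_b):
    """Evaluator holds one label per input wire and learns one output label."""
    key = pad(label_a, label_b)
    for row in rows:
        plain = xor_bytes(key, row)
        if plain[LABEL:] == TAG:
            return plain[:LABEL]
    raise ValueError("no row decrypted correctly")


wa, wb, wc = new_wire(), new_wire(), new_wire()
table = garble_xor(wa, wb, wc)
out_label = evaluate(table, wa[1], wb[0])  # garbled inputs for bits 1 and 0
```

The evaluator obtains `wc[1]`, the label for the output bit 1, without learning which bit any of the labels stands for.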
Public & Private Bag Generation
To upload a photo , the owner extracts the ROP as well as the binary image descriptor and generates the public bag as in the original framework SouTu. After uploading the public bag to the sharing cloud, he symmetrically encrypts the private part of ROP, and keeps it as well as the key in the private bag. Then, he uses a collision-resistant hash function and the search keys to garble each bit as a garbled gate (Protocol 3), where denotes applying the hash function for times.
From the protocol, the feature vector is encrypted to a series of garbled gates (Fig. 4),
and the following are corresponding private bag and search bag of :
Cloud-based Image Search
To search a photo from other’s ones, the querier extracts corresponding descriptor and obtains the owner’s . If he successfully decrypts it, he further uses or to encode each -th bit as the garbled input to finally achieve the set of garbled inputs , which is uploaded to the cloud. The cloud server conducts homomorphic operations to achieve for all without interacting with the requester or the image owner (Protocol 4). Then, he sends the ciphertexts back to the querier, who proceeds as SouTu.
6 Implementation and Evaluation
6.1 Development Environment
We implemented both the client side and the cloud side of SouTu. The client side program is developed for Android smartphones and for commodity laptops for performance comparison, and the cloud side program is developed only for laptops. We used an HTC G17 (1228MHz CPU, 1G RAM) and a ThinkPad X1 (i7, 2.7GHz CPU, 4G RAM).
The CP-ABE is implemented based on the PBC library, and other building blocks (Section 4) are implemented in Java, including AES (128-bit), Paillier’s cryptosystem (512-bit primes), $k$-out-of-$n$ oblivious transfer and the fixed point operations. Based on these building blocks, we implemented the core protocols in both variants SouTu and . The automatic ROP detection is implemented with cascade object detection (e.g., face detection). We employed the widely used 64-dimensional SURF descriptor and 128-dimensional SIFT descriptor for the variant of real number descriptors (SouTu), and 64-bit binary SURF and 128-bit binary SIFT for . Although our evaluation is conducted with these descriptors, our system is compatible with other vector-based descriptors too. Both the object detection and the descriptor extraction are implemented using the image processing library OpenCV for Windows and Android. ROP separation (Mask, P3, and Blur) is also implemented with it.
6.2 Real-life Datasets
To measure the privacy protection and the cost of SouTu, we used the well-known Labelled Faces in the Wild (LFW) dataset, which consists of 30,281 real-life images collected from news photographs. We detect all human faces automatically and set those faces as ROPs of images, and feature vectors are extracted as their image descriptor. On average, the ROP occupies less than 20% of each image for 80% of the images. We also used the INRIA Holidays dataset, which contains 1,491 high-resolution personal photos taken during holidays (the majority with a resolution of 2560×1920 px). We set the entire image of the INRIA dataset as the ROP.
6.3 Image Recognition on Public Part
ROPs are separated in three different methods (Mask, P3 and Blur) respectively. To evaluate the safety against the object detection algorithms, we ran face detection  and feature points detection  algorithms on the public part of ROPs. On average, there are faces in each original image in LFW, but only , and faces are detected in the public part of the ROPs generated by Mask, P3 and Blur respectively, and our manual examination shows that majority of the detections were false positives (e.g., some textures being detected as faces). Therefore, we conclude that almost no faces are detected in the public parts of images by algorithm. Also, no matched feature points are detected in the public parts of ROPs for both LFW and Holiday datasets as well. As a conclusion, all three methods provide good privacy protection against face/feature detection algorithms.
We also compare the computation cost and storage cost of the three methods. Figure 5 illustrates the CDF of run time for processing each image with the three methods. On average, Mask has the minimum computation cost with s per image in LFW Dataset, and s per image in Holiday Dataset; Blur needs s for LFW Dataset and s for Holiday Dataset; P3 needs s for LFW Dataset and s for Holiday Dataset. This result also confirms that protecting the entire image is much more expensive than protecting sub-regions of the image. Figures 6 and 7 present the normalized storage cost of the three methods for LFW and Holiday. The sizes of Blur-processed images are only of the original ones in Holiday dataset on average. In conclusion, Mask and Blur outperform P3 in computation performance while Blur has the best storage performance; therefore SouTu uses Blur as the default method.
6.4 Search Accuracy
In SouTu, the search procedure follows exactly the same vector-based similarity comparison as typical image search technologies (e.g., ). Also, the accuracy loss introduced by the fixed point representation is almost negligible (less than in each value where is often 10 and is greater than 5), therefore SouTu provides a comparable accuracy as existing image search techniques.
6.5 Client Side Performance
For the photo owner, the computation overhead mainly comes from the following operations: (1) object detection and descriptor extraction; (2) ROP separation by Blur; (3) symmetric encryption of the secret part; (4) descriptor encryption, which are all in the public & private bag generation. For the querier, the expensive operations include: (1) descriptor extraction; (2) descriptor encoding; (3) distance results decryption; (4) similarity calculation, which are all in the cloud-based photo searching. The time cost of other operations, e.g., fixed-point representation conversion, is negligible. The cost of search key encryption and decryption by CP-ABE can also be ignored, since this is a one-time operation for each user and takes less than a second.
As microbenchmark tests for each procedure (Table 2), Table a shows that protecting subregions (e.g., faces) of a image only takes the owner s to extract the descriptor and separate the ROP, while protecting the whole image takes s. Table b presents the computation overhead of main procedures in SouTu and .
Public & Private Bag Generation. Binary feature vector reduces the owner’s computation overhead by half to s per feature vector. In a typical scenario in LFW dataset, there are only 9 feature vectors for each face. If we use the 64-dimensional SURF descriptor, it takes s on laptops and s on smartphones to generate the public bag, private bag and search bag. When we use binary descriptor , the cost is reduced to s on laptops and s on smartphones, which is a significant reduction.
Cloud-based Photo Searching It takes a querier roughly 1s to encode the querying descriptor in SouTu. The run time becomes negligible in , and this is especially desirable for mobile devices. After the querier obtains the search result, it takes s on laptops and s on smartphones to decrypt each encrypted distance in both variants. In the LFW dataset, if a querier searches a photo among 1,000 photos, it takes s to process the search result on laptops and s on the smartphones on average. It is slightly beyond acceptable if owners have hundreds of photos on average. However, this non-negligible extra overhead comes from the linear search in all photos of an owner with a linear complexity, and it is promising and not trivial to reduce the complexity with existing optimized search mechanisms such as k-d tree . Thus, the scalability can be achieved using those search algorithms.
The communication overhead for the image owner mainly comes from uploading the public bag and private bag to the cloud. When using Blur to separate ROPs, as presented in Figures 6 and 7, the size of the public part is of the original image in LFW Dataset and only in Holiday Dataset. The size of the secret part is in LFW Dataset and in Holiday Dataset. The average size of the encrypted descriptor is 72 KB per image for both variants, and this can be further reduced to 690 B per image when using a common lossless compression, e.g., ZIP. In summary, for the LFW Dataset, the extra communication cost brought by SouTu or is roughly of that for a system without privacy consideration. But for the Holiday Dataset, our method actually saves the communication cost by .
The communication overhead for uploading the encoded feature descriptors (query) is approximately 36 KB in SouTu and 9 KB in , which are reduced to 350 B and 90 B respectively after compression. The communication overhead for downloading the similarity result is 128 B for each compared image in the database, and the overhead for downloading each image is similar to the image owner’s uploading overhead. Note that, to achieve - oblivious transfer, the querier needs to download extra images from the search server to hide the search pattern, where can be specified according to the trade-off between privacy and performance.
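As a rough worked example of the querier's download cost, the sketch below adds the 128 B similarity result per compared image to the size of the images fetched through oblivious transfer. The number of fetched images and the per-image size are hypothetical inputs chosen for illustration; only the 128 B figure comes from the measurement above.

```python
def query_download_bytes(n_compared, n_fetched, image_bytes, per_result_bytes=128):
    """Bytes downloaded for one query: similarity results plus fetched images.

    n_compared:   images whose encrypted similarity result is returned
    n_fetched:    images downloaded, including the extra ones that hide
                  the search pattern (hypothetical parameter name)
    image_bytes:  assumed average size of one downloaded image
    """
    return n_compared * per_result_bytes + n_fetched * image_bytes


# 1,000 compared images, 5 fetched images of an assumed 50,000 bytes each.
total = query_download_bytes(n_compared=1000, n_fetched=5, image_bytes=50_000)
```

With these assumed inputs the similarity results account for 128,000 bytes and the fetched images for 250,000 bytes, so the number of extra images quickly dominates the cost as it grows.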
6.6 Cloud Side Performance
On the clouds, similarly, the image storage and communication is more for LFW Dataset and less for Holiday Dataset. The main computation overhead comes from the distance computation. We evaluated the search performance on laptops, so the actual performance when deployed on more powerful cloud servers should be significantly better. Our privacy-preserving distance protocols take nearly s to calculate the distance between two real-number feature vectors (SouTu) and only s for binary feature vectors (). For well-studied objects like faces (9 feature vectors in a descriptor), where each owner usually has hundreds of images on the cloud, the computation time for a laptop to process one request is less than one minute. When there are large-scale complicated images whose ROPs may contain random objects other than faces, the optional optimization methods mentioned above may be applied to reduce the query response time.
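To give a feel for why distance computation dominates the cloud-side cost, the following sketch computes a squared Euclidean distance under Paillier encryption using its additive homomorphism. It is a generic textbook construction and not the SouTu protocol: here the evaluating party sees one vector in plaintext, whereas in SouTu the cloud learns neither vector. The key size is a toy value chosen for readability, and all names are assumptions introduced for this illustration.

```python
import math
import random

# Toy key made of two Mersenne primes; far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                                # valid because G = N + 1


def encrypt(m):
    """Paillier encryption: G^m * r^N mod N^2 with a random unit r."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N, L(u) = (u - 1) / N."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2


def scale(c, k):
    """Raising a ciphertext to k multiplies the plaintext by k (mod N)."""
    return pow(c, k % N, N2)


def encrypted_sq_distance(enc_a, enc_a_sq_sum, b):
    """Return Enc(sum a_i^2 - 2 * sum a_i b_i + sum b_i^2) = Enc(|a - b|^2)."""
    acc = add(enc_a_sq_sum, encrypt(sum(x * x for x in b)))
    for c, x in zip(enc_a, b):
        acc = add(acc, scale(c, -2 * x))
    return acc


a = [3, 1, 4, 1, 5]                       # vector that is only seen encrypted
b = [2, 7, 1, 8, 2]                       # vector known to the evaluator
enc_a = [encrypt(x) for x in a]
enc_a_sq = encrypt(sum(x * x for x in a))
result = decrypt(encrypted_sq_distance(enc_a, enc_a_sq, b))
```

Each vector entry costs one modular exponentiation over a modulus of twice the key length, which is why the per-distance time grows with both the descriptor dimension and the number of feature vectors per image.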
7 Related work
Image Privacy Protection A set of solutions have been proposed to mask sensitive contents of images, e.g., human faces, to prevent any potential breach of owners’ privacy, e.g.,  and . P3  proposes to separate an image into a private part and a public part and simply encrypts the private part. But the public parts produced by those works are of limited utility and cannot be searched. There is some work in the literature providing privacy-preserving face recognition on a database of face photos. Those methods provide privacy protection to the requested images as well as the outcome, but the result is not secure against the photo service provider, and those works do not consider personal photo storage and sharing. Supporting privacy-preserving image search with an untrusted server is still an open problem.
Privacy Preserving Cloud Services Many research efforts have been devoted to providing secure cloud-based storage, sharing and searching services to users. Those privacy-preserving outsourced storage and sharing systems, e.g., and , provide good access control over private data, but cannot support search on encrypted data. Searchable encryption has been proposed to enable secure search over encrypted data via keywords. But the existing approaches, e.g., [28, 29, 30, 31], focus on keyword search by examining the occurrences of the searched terms (or words). They are not suitable for content-based image search since they cannot measure the distance between encrypted feature vectors.
Privacy-preserving Euclidean Distance Euclidean distance can be computed privately among parties using secure multi-party computation (SMC) methods . However, this requires online interaction between the image owner and queriers, and is unsuitable for a cloud-based image service, where the owners are not guaranteed to stay online.  proposes an approach using Fourier-related transforms to hide accurate sensitive data while approximately preserving the Euclidean distances among them. It works well for some data mining purposes on common datasets, but for feature vectors the distances still reveal information about the objects in images.
8 Conclusion
We present a framework SouTu, which enables cloud servers to provide privacy-preserving photo sharing and searching services to mobile device users who intend to outsource photo management while protecting their privacy in photos. Our framework not only protects the outsourced photos so that no unauthorized users can access them, but also enables users to encode their image search so that the search can also be outsourced to an untrusted cloud server obliviously, without leakage of the query contents or results. Our analysis shows the security of the framework, and the implementation shows a small storage overhead and communication overhead for both mobile clients and cloud servers.
Appendix A Security Proof
First, we prove through the following game that SouTu is secure against adversarial cloud servers that are not authorized for image search.
Initialize: The system is initialized, and the relevant cryptosystems (Paillier’s cryptosystem, CP-ABE, OT etc.) are initialized by the challenger. The challenger publishes the relevant public keys to the adversary.
Setup: The challenger generates and encrypts the search keys, and pre-processes and encrypts a set of images according to the specification of SouTu such that the adversary cannot search on them. Then, the challenger publishes the encrypted search keys and the public/private/search bags to the adversary.
Phase 1: The adversary obtains a polynomial number of encoded descriptors (encoded with the challenger’s search keys) without knowing the corresponding original descriptors.
Challenge: The adversary submits two images to the challenger. The challenger selects a bit uniformly at random and generates two sets of encoded feature descriptors corresponding to the image selected by that bit (Section 4.4), which are given to the adversary.
Guess: The adversary gives a guess on the bit.
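The advantage of the adversary in this game is defined as the absolute difference between the probability that its guess equals the challenger’s bit and 1/2.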
It is not hard to see this is an adversarial cloud server’s advantage, since the game is designed to ‘mimic’ a cloud server’s transaction.
Any probabilistic polynomial time adversary (PPTA) has at most negligible advantage in the above game.
We define two PPTAs , and define their advantages as:
That is, is the advantage of when he is only given and gives a guess on . Since is given both adversaries’ views, if agree on the same guess, he will also give the same guess, otherwise his advantage does not change. Then, we have the following probabilities for four cases:
Since and give their guesses based on independent views, we have
which also applies to the other three cases. Given those conditional probabilities, the total probability is ():
which leads to
Both Paillier’s cryptosystem and CP-ABE are proved to be semantically secure against chosen plaintext attack (SS-CPA).
Besides adversarial cloud servers, we assumed malicious users in our adversarial model. However, unauthorized malicious users are not as threatening as cloud servers, since they never get involved in any transaction with valid users. All they can do is attempt man-in-the-middle attacks to sniff the search results, which can be prevented with a secure communication channel.
- We logically divide the cloud into sharing cloud and search cloud for explanation purpose, but revealing this structure does not breach users’ privacy at all.
- Computation is conducted in a finite cyclic group, and modular operations are followed. We omit the modular operations for the sake of simplicity and defer the detailed description of the finite group selection to Section 6.
- CP-ABE is proved to achieve IND-CPA, which implies SS-CPA.
- D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, vol. 60, no. 2, pp. 91–110, 2004.
- H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (surf),” CVIU, vol. 110, no. 3, pp. 346–359, 2008.
- B. Leibe, E. Seemann, and B. Schiele, “Pedestrian detection in crowded scenes,” in CVPR. IEEE, 2005.
- P. Viola and M. J. Jones, “Robust real-time face detection,” IJCV, vol. 57, no. 2, pp. 137–154, 2004.
- M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of cognitive neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
- K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” IJCV, vol. 60, no. 1, pp. 63–86, 2004.
- M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robust independent elementary features,” in ECCV, 2010.
- K. Peker, “Binary sift: Fast image retrieval using binary quantized sift features,” in CBMI, 2011.
- M.-R. Ra, R. Govindan, and A. Ortega, “P3: Toward privacy-preserving photo sharing,” in NSDI. USENIX, 2013.
- M. McDonnell, “Box-filtering techniques,” Computer Graphics and Image Processing, vol. 17, no. 1, pp. 65–70, 1981.
- J. Bethencourt, A. Sahai, and B. Waters, “Ciphertext-policy attribute-based encryption,” in S&P. IEEE, 2007, pp. 321–334.
- M. Chase and S. S. Chow, “Improving privacy and security in multi-authority attribute-based encryption,” in CCS. ACM, 2009.
- P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in EUROCRYPT. Springer, 1999, pp. 223–238.
- J. Camenisch, G. Neven et al., “Simulatable adaptive oblivious transfer,” in EUROCRYPT. Springer, 2007, pp. 573–590.
- “Extended version with security proof.” https://www.dropbox.com/s/c721a3gov75ylvt/SouTu.pdf.
- A. Alahi, R. Ortiz, and P. Vandergheynst, “Freak: Fast retina keypoint,” in CVPR. IEEE, 2012.
- Y. Lindell and B. Pinkas, “A proof of Yao’s protocol for secure two-party computation,” IACR Cryptology ePrint Archive, p. 175, 2004.
- G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” Tech. Rep.
- J. Luo, Y. Ma, E. Takikawa, S. Lao, M. Kawade, and B.-L. Lu, “Person-specific sift features for face recognition,” in ICASSP. IEEE, 2007.
- H. Jegou, M. Douze, and C. Schmid, “Hamming embedding and weak geometric consistency for large scale image search,” in ECCV, 2008.
- T. Lindeberg, “Feature detection with automatic scale selection,” IJCV, vol. 30, no. 2, pp. 79–116, 1998.
- M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration,” in VISAPP (1), 2009, pp. 331–340.
- E. M. Newton, L. Sweeney, and B. Malin, “Preserving privacy by de-identifying face images,” TKDE, vol. 17, no. 2, pp. 232–243, 2005.
- W. Zhang, S.-C. S. Cheung, and M. Chen, “Hiding privacy information in video surveillance system,” in ICIP, 2005.
- A.-R. Sadeghi, T. Schneider, and I. Wehrenberg, “Efficient privacy-preserving face recognition,” in Information, Security and Cryptology. Springer Berlin Heidelberg, 2010, pp. 229–244.
- Y. Tang, P. P. Lee, J. C. Lui, and R. Perlman, “Secure overlay cloud storage with access control and assured deletion,” TDSC, vol. 9, no. 6, pp. 903–916, 2012.
- B. Wang, B. Li, and H. Li, “Oruta: Privacy-preserving public auditing for shared data in the cloud,” in CLOUD. IEEE, 2012.
- C. Wang, N. Cao, K. Ren, and W. Lou, “Enabling secure and efficient ranked keyword search over outsourced cloud data,” TPDS, vol. 23, no. 8, pp. 1467–1479, 2012.
- M. Li, S. Yu, W. Lou, and Y. T. Hou, “Toward privacy-assured cloud data services with flexible search functionalities,” in ICDCSW. IEEE, 2012.
- C. Wang, K. Ren, S. Yu, and K. M. R. Urs, “Achieving usable and privacy-assured similarity search over outsourced cloud data,” in INFOCOM. IEEE, 2012, pp. 451–459.
- Y. Ren, Y. Chen, J. Yang, and B. Xie, “Privacy-preserving ranked multi-keyword search leveraging polynomial function in cloud computing,” in Globecom. IEEE, 2014.
- S. Mukherjee, Z. Chen, and A. Gangopadhyay, “A privacy-preserving technique for euclidean distance-based mining algorithms using fourier-related transforms,” VLDB Journal, vol. 15, no. 4, pp. 293–315, 2006.