Concealing the identity of faces in oblique images with adaptive hopping Gaussian mixtures
Cameras mounted on Micro Aerial Vehicles are increasingly used for recreational photography. However, aerial photographs of public places often contain faces of bystanders, thus leading to a perceived or actual violation of privacy. To address this issue, we propose to pseudo-randomly modify the appearance of face regions in the images using a privacy filter that prevents a human or a face recogniser from inferring the identities of people. The filter, which is applied only when the resolution is high enough for a face to be recognisable, adaptively distorts the face appearance as a function of its resolution. Moreover, the proposed filter locally changes its parameters to discourage attacks that use parameter estimation. The filter exploits both global adaptiveness to reduce distortion and local hopping of the parameters to make their estimation difficult for an attacker. To evaluate the effectiveness of the proposed approach, we use a state-of-the-art face recognition algorithm and synthetically generated face data with 3D geometric image transformations that mimic faces captured from an MAV at different heights and pitch angles. Experimental results show that the proposed filter protects privacy while reducing distortion and exhibits resilience against attacks.
Keywords: privacy protection, hopping Gaussian blur, micro aerial vehicles
- AMC: Airborne Mobile Camera
- IMU: Inertial Measurement Unit
- UAV: Unmanned Aerial Vehicle
- MAV: Micro Aerial Vehicle
- UAS: Unmanned Aerial System
- CCTV: Closed Circuit Television
- PSNR: Peak Signal to Noise Ratio
- SSIM: Structural Similarity Index
- BSIA: British Security Industry Association
- CEN: European Committee for Standardization
- Axis Communications
- ROI: Region of Interest
- VSN: Visual Sensor Network
- IR: Infrared
- EO: Electro Optics
- UTC: Coordinated Universal Time
- GPS: Global Positioning System
- RFID: Radio Frequency Identification
- OCR: Optical Character Recognition
- PRM: Proposed Research Module
- FOV: Field of View
- AHGMM: Adaptive Hopping Gaussian Mixture Model
- PSF: Point Spread Function
- PRNG: Pseudorandom Number Generator
- LFW: Labelled Faces in the Wild
- 3DMM: 3D Morphable Model
- AGB: Adaptive Gaussian Blur
- SVGB: Space Variant Gaussian Blur
- FGB: Fixed Gaussian Blur
- CCD: Charge Coupled Device
- SLM: Spatial Light Modulator
- ROC: Receiver Operating Curve
MAVs are becoming common platforms for a number of civilian applications such as search and rescue (Waharte and Trigoni, 2010), disaster management (Quaritsch et al., 2010) and news reporting (Babiceanu et al., 2015). Moreover, individuals use MAVs equipped with high resolution cameras for recreational photography and videography in public places during sports activities and social gatherings (Hexo+, 2018; AirDog, 2018). Such use in public places raises privacy concerns as bystanders who happen to be within the field of view of the camera are captured as well. The identity of bystanders could be protected by locating and removing (or sufficiently distorting) key image regions, such as faces, using algorithms called privacy filters. However, in order to maintain the aesthetic value of an image, only a minimal distortion of the image content should be allowed.
A privacy filter for recreational aerial photography should satisfy the following properties: (a) introduce only a minimal distortion; (b) be robust against attacks; and (c) be computationally efficient. Minimal distortion is necessary to keep the quality of a protected image close to that of the unprotected one, so that the attention of a viewer is not diverted. Therefore, blanking out a face (Schiff et al., 2007) is not a desirable option. Robustness is important to avoid privacy violations by various attacks, e.g. brute-force, naïve, parrot and reconstruction attacks (Kundur and Hatzinakos, 1996; Boult, 2005; Newton et al., 2005; Dufaux and Ebrahimi, 2008; Erdelyi et al., 2014; Korshunov and Ebrahimi, 2014; Dong et al., 2016). A brute-force attack tries to decipher the protected probe images by an exhaustive search (Boult, 2005; Dufaux and Ebrahimi, 2008). Other attacks use gallery images in addition to the protected probe images (Newton et al., 2005; Erdelyi et al., 2014; Korshunov and Ebrahimi, 2014; Dong et al., 2016). In a naïve attack, the protected probe images are compared against the unprotected gallery images (Newton et al., 2005; Erdelyi et al., 2014; Korshunov and Ebrahimi, 2014). In a parrot attack, the attacker has knowledge about the privacy filter and can transform the gallery images into the distorted domain (Newton et al., 2005). In a reconstruction attack, the attacker has some knowledge of how to (partially) reconstruct the probe image from the protected to the unprotected domain (Kundur and Hatzinakos, 1996). Examples of reconstruction methods include inverse filtering and super-resolution techniques (Kundur and Hatzinakos, 1996; Dong et al., 2016). Finally, computational efficiency is desirable when the filter operates using the limited computational and battery power of an MAV.
Privacy filters for aerial photography must address challenges posed by the ego-motion of the camera, changing illumination conditions, and variable face orientation and resolution. Recent frameworks that support facial privacy preservation in airborne cameras are Generic Data Encryption (Kim et al., 2014), Unmanned Aircraft Systems-Visual Privacy Guard (Babiceanu et al., 2015) and Adaptive Gaussian Blur (Sarwar et al., 2016). Generic Data Encryption sends an encrypted face region to a privacy server that Gaussian-blurs or mosaics the face and then forwards it to an end-user. Unmanned Aircraft Systems-Visual Privacy Guard (Babiceanu et al., 2015) and Adaptive Gaussian Blur (Sarwar et al., 2016) are instead aimed at on-board implementation, with the objective of reducing latency and discouraging brute-force attacks on the server (Kim et al., 2014). Adaptive Gaussian Blur adaptively configures the Gaussian kernel depending upon the face resolution in order to minimise distortion, while Unmanned Aircraft Systems-Visual Privacy Guard blurs faces with a fixed filter. Both methods are prone to parrot attacks (Newton et al., 2005) on the Gaussian blur.
In this paper, we present a novel privacy protection filter to be used on-board an MAV. The proposed filter distorts a face region with secret parameters to be robust to naïve, parrot and reconstruction attacks. The distortion is minimal and adaptive to the resolution of the captured face: we select the smallest Gaussian kernel that reduces the face resolution below a certain threshold. The selected threshold protects the face against the naïve attack while maintaining its resolution at a specified level. To prevent the other attacks, we then insert supplementary Gaussian kernels inside the selected Gaussian kernel and hop their parameters locally using a pseudorandom number generator (PRNG), so that their estimation from the filtered face image is difficult. The block diagram of the proposed filter is shown in Figure 1.
An updated work based on the proposed filter, targeting airborne videography rather than photography, is presented in Sarwar et al. (2018). The main contributions of this paper are: (1) the basic idea of the hopping Gaussian kernels and their details, (2) a large-scale synthetic face image data set emulating faces captured from an MAV, and (3) extensive experiments validating the proposed hopping Gaussian kernels, including under reconstruction attacks.
The paper is organised as follows. Sec. 2 covers the state of the art in visual privacy protection filters. Sec. 3 defines the problem. Sec. 4 describes the proposed algorithm and discusses its computational complexity and security level. Sec. 5 presents our face data set generation and Sec. 6 discusses the experimental results. Finally, Sec. 7 concludes the paper.
Visual privacy protection filters can be applied as pre-processing or post-processing (Fig. 2).
Pre-processing privacy filters are irreversible and operate during image acquisition to prevent a camera from capturing sensitive regions. These filters disable the software or hardware of the camera or notify about a photography prohibition (Safe Haven, 2003). Hardware-based filters prevent the camera from taking images, for example by bursting back an intense light against flash photography (Eagle Eye, 1997; Zhu et al., 2017), or by detecting human faces with an infrared sensor and then obfuscating them using a spatial light modulator placed in front of the Charge Coupled Device (CCD) sensor (Zhang et al., 2014).
Post-processing privacy filters protect sensitive regions after image acquisition and can be reversible or irreversible. Reversible filters conceal sensitive regions using a private key, which can later be used to recover the original sensitive region. Irreversible filters deform the features of a sensitive region permanently. Both reversible and irreversible filters can be non-adaptive or adaptive.
Reversible non-adaptive filters are based on generic encryption (Boult, 2005; Chattopadhyay and Boult, 2007; Rahman et al., 2010; Winkler and Rinner, 2011; Zhang et al., 2018). Reversible adaptive filters include scrambling (Dufaux and Ebrahimi, 2006, 2008; Baaziz et al., 2007; Sohn et al., 2011; Ruchaud and Dugelay, 2017), warping (Korshunov and Ebrahimi, 2013b) and morphing (Korshunov and Ebrahimi, 2013a). While reversible adaptive filters are robust against a parrot attack, their protected faces can be compromised by spatial-domain (Jiang et al., 2016a, b) or frequency-domain attacks (Rashwan et al., 2015).
Irreversible non-adaptive filters blank out (Schiff et al., 2007; Koelle et al., 2018) or replace a face with a de-identified representation (Newton et al., 2005). For example, to maintain k-anonymity, the algorithm "k-Same" (Newton et al., 2005) replaces k faces with their average face. Variants of this algorithm use additional specialised detectors to preserve attributes such as facial expressions, pose, gender, race and age (Gross et al., 2006; Du et al., 2014; Lin et al., 2012; Letournel et al., 2015; Meden et al., 2018). Irreversible non-adaptive filters are robust to parrot attacks. Irreversible adaptive filters lower the resolution of a sensitive region so that humans or algorithms cannot recognise the identity. Examples include pixelation (Chinomi et al., 2008), Gaussian blur (Wickramasuriya et al., 2004) and cartooning (Erdelyi et al., 2014). The kernel size of these privacy filters can be manually selected (Korshunov and Ebrahimi, 2014; Erdelyi et al., 2014), or the centre kernel size is manually selected and the Space Variant Gaussian Blur (SVGB) filter (Saini et al., 2012) then automatically decreases the kernel size from the centre to the boundary of the detected face. AGB (Sarwar et al., 2016) exploits the different horizontal and vertical resolutions that are typical in aerial photography, and automatically adapts an anisotropic kernel based on the resolution of the detected face. However, irreversible adaptive filters are vulnerable to parrot attacks.
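As a concrete illustration of the de-identification idea behind k-Same, the sketch below replaces each group of k faces (treated as flat pixel vectors) by the group average; it groups faces sequentially for simplicity, whereas the original algorithm selects the k closest faces:

```python
def k_same(faces, k):
    """k-Same sketch: partition faces (flat pixel vectors) into groups of k
    and replace every face in a group by the group's average face."""
    out = []
    for start in range(0, len(faces), k):
        group = faces[start:start + k]
        # Average each pixel position across the group.
        avg = [sum(px) / len(group) for px in zip(*group)]
        out.extend([avg[:] for _ in group])
    return out
```

Since every face in a group maps to the same average, a recogniser can at best identify the group, not the individual, which is exactly the k-anonymity guarantee.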
[Table 1: comparison of representative privacy filters in terms of distortion (adaptive, image-based control) and robustness to brute-force, naïve, inverse filter, super-resolution and parrot attacks (the last with detectors for some filters).]
In summary, Table 1 compares representative filters for the following categories: reversible & adaptive (Dufaux and Ebrahimi, 2008), reversible & non-adaptive (Boult, 2005), and irreversible & non-adaptive (Du et al., 2014). The remaining filters (Babiceanu et al., 2015; Erdelyi et al., 2014; Saini et al., 2012; Korshunov and Ebrahimi, 2014; Sarwar et al., 2016), as well as the proposed one, are irreversible & adaptive.
3 Problem Definition
Let the set $\mathcal{F} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ contain face data of $S$ subjects, where $y_i \in \{1, \dots, S\}$ represents the identity (label) of face image $\mathbf{x}_i$. Let each subject appear in at most $m$ images. Let $\mathcal{G} \subset \mathcal{F}$ and $\mathcal{P} \subset \mathcal{F}$ be the gallery and probe sets, respectively. Usually $|\mathcal{G}| > |\mathcal{P}|$, where $|\cdot|$ is the cardinality of a set, and $\mathcal{G} \cap \mathcal{P} = \emptyset$.
Let a privacy filter $\phi_{\sigma_d}$ distort image features in order to reduce the probability for an attacker to correctly predict labels. This operation produces a protected probe set $\tilde{\mathcal{P}}$, whose distortion depends on the parameter $\sigma_d$, where $d \in \{u, v\}$ indicates the horizontal and vertical direction in an image. Let the distortion generated by $\phi_{\sigma_d}$ be measured by the Peak Signal to Noise Ratio (PSNR):
$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{L^2}{\mathrm{MSE}}\right),$$
where $L$ is the dynamic range of the pixel values. The mean square error, MSE, between the pixel intensities of an unprotected, $F$, and protected, $\tilde{F}$, face is
$$\mathrm{MSE} = \frac{1}{WH} \sum_{x=1}^{W} \sum_{y=1}^{H} \big(F(x, y) - \tilde{F}(x, y)\big)^2,$$
where $W$ and $H$ are the width and height of $F$, respectively.
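The two fidelity measures can be sketched directly from these definitions (a minimal pure-Python version, treating images as 2-D lists of grayscale intensities):

```python
import math

def mse(f, g):
    """Mean squared error between two equal-size grayscale images."""
    h, w = len(f), len(f[0])
    return sum((f[y][x] - g[y][x]) ** 2
               for y in range(h) for x in range(w)) / (w * h)

def psnr(f, g, dynamic_range=255.0):
    """Peak signal-to-noise ratio in dB."""
    e = mse(f, g)
    # Identical images have zero error; report unbounded PSNR as infinity.
    return float("inf") if e == 0 else 10.0 * math.log10(dynamic_range ** 2 / e)
```

A stronger privacy filter lowers the PSNR of the protected face; the design goal stated above is to keep this drop as small as the privacy target allows.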
We express the privacy level of a face region as the accuracy $A$ of a face recogniser (Erdelyi et al., 2014; Korshunov and Ebrahimi, 2014). The value of $A$ is the cumulative rank-n accuracy in face identification, or is derived from the Equal Error Rate (EER) in face verification. We consider in this paper face verification, thus
$$A = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$ and $TN$ are the numbers of true positives and true negatives, and $FP$ and $FN$ the numbers of false positives and false negatives, respectively. Our target is to force the face recogniser of an attacker to have the accuracy of a random classifier, which for face verification is $A = 0.5$.
We therefore aim to design $\phi_{\sigma_d}$ so that it irreversibly but minimally distorts the appearance of $F$ and the identity is not recognisable with a probability higher than a random guess. The ideal distortion parameter, $\sigma_d^{*}$, should be derived as
$$\sigma_d^{*} = \arg\min_{\sigma_d} \Big[\, \mathrm{MSE}\big(F, \phi_{\sigma_d}(F)\big) + \lambda \,\big| A(\phi_{\sigma_d}) - 0.5 \big| \,\Big],$$
where $\lambda$ weights the two terms. The first term aims to introduce a minimal distortion, whereas the second term leads the classification results to be equivalent to those of a random classifier, irrespective of whether the filtered or reconstructed face is compared against the unprotected, filtered or reconstructed gallery data sets. The second objective depends upon the recognition capability of a given face recogniser and is addressed heuristically (Chriskos et al., 2016; Erdélyi et al., 2017; Pittaluga and Koppal, 2017).
The content of $\tilde{\mathcal{P}}$ should be protected against naïve-T, parrot-T and reconstruction attacks. Let an attacker have access to $\{\mathcal{G}, \tilde{\mathcal{G}}, \hat{\mathcal{G}}\}$, where $\tilde{\mathcal{G}}$ is the filtered gallery data set and $\hat{\mathcal{G}}$ is the filtered and reconstructed gallery data set. An attacker can modify the gallery set, the probe set, or both, to correctly predict the labels of $\tilde{\mathcal{P}}$. In a naïve attack (here referred to as naïve-T attack), a privacy filter is applied on $\mathcal{P}$ to generate a protected probe data set $\tilde{\mathcal{P}}$, while the unaltered $\mathcal{G}$ is used for training (Newton et al., 2005). A parrot attack (here referred to as parrot-T attack) learns the privacy filter type and its parameters (e.g. the standard deviation of the Gaussian blur used to generate $\tilde{\mathcal{P}}$). Then, the learned filter is applied on $\mathcal{G}$ to generate a privacy-protected gallery data set $\tilde{\mathcal{G}}$. Finally, $\tilde{\mathcal{G}}$ and $\tilde{\mathcal{P}}$ are used for training and testing, respectively (Newton et al., 2005). In a reconstruction attack, the discriminating features of $\tilde{\mathcal{P}}$ are first restored (e.g. using an inverse filter or a super-resolution algorithm) to generate a reconstructed probe data set $\hat{\mathcal{P}}$, which is then compared against $\mathcal{G}$ or a reconstructed gallery data set $\hat{\mathcal{G}}$. An inverse filter first estimates the parameters of the privacy filter from $\tilde{\mathcal{P}}$ and then performs an inverse operation to reconstruct the original faces (Kundur and Hatzinakos, 1996). Similarly, a super-resolution algorithm first learns embeddings between high-resolution faces and their corresponding low-resolution versions and then reconstructs high-resolution faces for $\tilde{\mathcal{P}}$ (Dong et al., 2016).
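The difference between the naïve-T and parrot-T threat models reduces to which data sets are passed through the filter. The toy sketch below makes this explicit; `privacy_filter` is a stand-in for any concrete filter and the image values are placeholders:

```python
def naive_t(gallery, probe, privacy_filter):
    """Naive-T: train on the clean gallery, test on the protected probe."""
    return gallery, [privacy_filter(x) for x in probe]

def parrot_t(gallery, probe, privacy_filter):
    """Parrot-T: the attacker applies the learned filter to the gallery too,
    so training and test images live in the same distorted domain."""
    return ([privacy_filter(x) for x in gallery],
            [privacy_filter(x) for x in probe])
```

For example, with `blur = lambda img: "blur(" + img + ")"` as a string placeholder filter, `naive_t` leaves the gallery untouched while `parrot_t` blurs both sets, which is why parrot attacks typically recover more accuracy than naïve ones.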
4 Proposed Approach
In order to minimally distort the face region as well as to achieve robustness against brute-force, naïve-T, parrot-T and reconstruction attacks, we propose the Adaptive Hopping Gaussian Mixture Model (AHGMM) algorithm. The AHGMM consists of a globally estimated optimal Gaussian Point Spread Function (PSF) and supplementary Gaussian PSFs added inside the optimal Gaussian PSF. For a single supplementary Gaussian PSF inside an optimal Gaussian PSF, the AHGMM is illustrated in Fig. 3, while the pseudo-code is given in Algorithm 1. A list of important notations is presented in the Appendix.
Figure 1 shows the processing diagram of the proposed framework; its blocks are explained in more detail in the following subsections.
4.1 Pixel Density Estimation
Let an MAV capture an image while flying at an altitude of $h_a$ meters. Let the principal axis of its on-board camera be tilted by $\theta_t$ from the nadir direction (see Figure 4). We assume that the height and tilt angle of the camera can be estimated.
A value of $\theta_t > 0$ generates an oblique image. Let $h_f$ be the height of the face above ground (while each image could contain several faces, for simplicity we consider in this paper only the single-face case). We represent the face region in the image as $F$, which is viewed at an angle $\theta_v$.
Let $\rho_d$, with $d \in \{u, v\}$ indicating the horizontal and vertical direction, represent the pixel density (px/cm) around the centre of $F$. If $\mu_u$ and $\mu_v$ represent the physical dimensions of a pixel in the horizontal and vertical direction, respectively, and $f_l$ is the focal length of the camera, the horizontal density for a pixel around the centre of $F$ (Sarwar et al., 2016) is
$$\rho_u = \frac{f_l}{\mu_u D}, \qquad D = \frac{h_a - h_f}{\cos\theta_t},$$
where $D$ is the distance from the camera to the face, and the vertical density $\rho_v$, by exploiting the small angle approximation for a single pixel of the image sensor (Sarwar et al., 2016), is
$$\rho_v = \frac{f_l}{\mu_v D}\cos\theta_v.$$
Let $\beta \in \{0, 1\}$ define whether $F$ is naturally protected ($\beta = 1$) because of a low horizontal and vertical density, or not ($\beta = 0$) (Sarwar et al., 2016):
$$\beta = \begin{cases} 1 & \text{if } \rho_u < T_u \text{ and } \rho_v < T_v,\\ 0 & \text{otherwise,} \end{cases}$$
where $T_u$ and $T_v$ are the pixel densities at which a state-of-the-art machine algorithm starts recognising human faces, simply called thresholds. If $\beta = 1$, the original frame can be transmitted without any modification. Otherwise, $F$ should be protected by a privacy filter that reduces its pixel densities below $T_u$ and $T_v$. When $F$ is not inherently protected, we assume that the corresponding bounding box is given.
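The natural-protection test can be sketched as follows; the density formulas below are a hedged approximation using the slant distance from camera to face (the exact expressions are those of Sarwar et al. (2016)):

```python
import math

def pixel_densities(focal_cm, pix_w_cm, pix_h_cm, altitude_cm,
                    face_height_cm, tilt_rad, view_rad):
    """Approximate horizontal/vertical pixel densities (px/cm) on the face.
    Assumed model: slant distance D = (h_a - h_f)/cos(tilt), with the
    vertical density foreshortened by the viewing angle."""
    dist = (altitude_cm - face_height_cm) / math.cos(tilt_rad)
    rho_u = focal_cm / (pix_w_cm * dist)
    rho_v = (focal_cm / (pix_h_cm * dist)) * math.cos(view_rad)
    return rho_u, rho_v

def naturally_protected(rho_u, rho_v, t_u, t_v):
    """Face is already unrecognisable if both densities fall below thresholds."""
    return rho_u < t_u and rho_v < t_v
```

When the test returns `True`, the frame is transmitted unmodified; otherwise the filter described next is applied to the face bounding box.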
4.2 Optimal Gaussian PSF
In the case of Gaussian blur, the filter kernel is an approximated Gaussian function of zero mean and standard deviation $\sigma_d$ (Saini et al., 2012; Korshunov and Ebrahimi, 2014; Sarwar et al., 2016), and is thus called a Gaussian PSF of parameter $\sigma_d$. More specifically, the parameter $\sigma_d$ controls the distortion strength of the filter and determines the resulting pixel density $\hat{\rho}_d$ in $\tilde{F}$.
As a higher $\sigma_d$ results in a lower $\hat{\rho}_d$, we first find the minimum value $\sigma_d^{*}$, called the optimal parameter, that makes $A = 0.5$. As a result, $\sigma_d^{*}$ provides the minimum distortion in $\tilde{F}$ while making it robust against the naïve-T attack. Increasing $\sigma_d$ beyond $\sigma_d^{*}$ increases the distortion without improving the privacy level, as the recogniser performance is already at the level of the random classifier. For a face captured from an MAV with pixel densities $(\rho_u, \rho_v)$, we calculate the parameters of an optimal Gaussian PSF (lines 2-5 in Algorithm 1), where the mean is zero as in traditional Gaussian blur (Saini et al., 2012; Korshunov and Ebrahimi, 2014) and $\sigma_d^{*}$ (Sarwar et al., 2016) is estimated from the frequency response of the blur,
$$\hat{G}(f) = \exp\!\left(-\frac{f^2}{2\sigma_f^2}\right), \qquad \sigma_f = \frac{\rho_d}{2\pi\sigma_d},$$
where $f$ is measured in cycles/cm, $\sigma_d$ in px and $\rho_d$ in px/cm. Let $f_n = \rho_d/2$ represent the Nyquist frequency of $\rho_d$. Let $f_c$ be the highest spatial frequency component that we want to completely remove using a low-pass filter, i.e. the Gaussian blur. In other words, $f_c$ is the Nyquist frequency of $T_d$, i.e. the pixel density after filtering. The two frequencies are related as
$$f_c = \frac{T_d}{\rho_d}\, f_n = \frac{T_d}{2}.$$
As we are interested in removing frequency components beyond $f_c$, we can select $f_c = 3\sigma_f$, because the amplitude response of a Gaussian PSF at three times its standard deviation is very close to zero, and multiplication (convolution in the space domain) with such a Gaussian PSF will suppress frequencies larger than $f_c$. Substituting $\sigma_f = \rho_d/(2\pi\sigma_d)$ and $f_c = T_d/2$ into this relation and rearranging gives the optimal standard deviation of the Gaussian PSF as
$$\sigma_d^{*} = \frac{3\rho_d}{\pi T_d}.$$
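Assuming the reconstructed closed form $\sigma^{*} = 3\rho/(\pi T)$, the minimal per-direction blur strength can be computed as:

```python
import math

def optimal_sigma(rho, threshold):
    """Smallest Gaussian std (px) whose blur pushes the pixel density rho
    (px/cm) below the recognition threshold T (px/cm): sigma = 3*rho/(pi*T).
    (Hedged reconstruction of the formula derived in the text.)"""
    return 3.0 * rho / (math.pi * threshold)
```

A lower threshold (i.e. a stronger protection requirement) yields a larger standard deviation, hence a stronger blur, which is the distortion/privacy trade-off studied in the experiments.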
4.3 Hopping GMM Kernels
Filtering with the optimal Gaussian PSF defined by $\sigma_d^{*}$ would only protect $F$ from a naïve-T attack, but not from a parrot-T attack or a reconstruction attack. To ensure that the probability of correctly predicting the label of $F$ is not increased by the parrot-T attack or by the reconstruction attack, we secretly modify $\sigma_d^{*}$ while generating $\tilde{F}$, so that an adversary is unable to accurately reconstruct the face region, or even to generate matching filtered or reconstructed gallery sets. For this purpose, we generate a set $B = \{b_1, \dots, b_K\}$ of $K$ sub-regions of $F$ in such a way that each sub-region covers a small area of $F$.
The sub-region size $s$ (in pixels) determines the total number $K$ of sub-regions per face region $F$, which could influence its privacy level. Smaller values of $K$ (larger sub-regions) result in a reduced distortion.
After finding $\sigma_d^{*}$ and generating $B$, we make a hopping mixture of Gaussians for each sub-region, i.e. we pseudo-randomly change $\sigma_d^{*}$ to a per-region value $\sigma_{k,d}$ for each $b_k \in B$. Moreover, we select supplementary Gaussian PSFs inside this optimal Gaussian PSF and vary their parameters based on pseudo-random weights (lines 9-17 in Algorithm 1).
Let the set $\Theta_k$ contain the parameters of the modified optimal and supplementary Gaussian PSFs for each sub-region $b_k$, represented as
$$\Theta_k = \big\{(\boldsymbol{\mu}_{k,j}, \sigma_{k,j,d})\big\}_{j=0}^{n_s},$$
where the element with $j = 0$ is the modified optimal Gaussian PSF, while the remaining elements (i.e. $j = 1, \dots, n_s$) belong to the supplementary Gaussian PSFs. These elements are calculated as
$$\sigma_{k,j,d} = \gamma\, w_{k,j}\, \sigma_d^{*}, \qquad \boldsymbol{\mu}_{k,j} = \mathbf{u}_{k,j},$$
where $w_{k,j}$ and $\mathbf{u}_{k,j}$ are normalised pseudo-randomly generated numbers that control the local distortion in filtering. The variable $\gamma$ controls the relative size of the supplementary Gaussian PSFs w.r.t. the optimal Gaussian PSF.
[Table: example kernel parameter pairs: (6.21, 4.63), (6.21, 4.56), (3.11, 2.17), (3.11, 2.00), (1.55, 0.89), (1.55, 0.74), (0.78, 0.29), (0.78, 0.20).]
Finally, a set of mixture models $\Phi = \{\phi_k\}_{k=1}^{K}$ is generated, one per sub-region (line 22 in Algorithm 1), where each element $\phi_k$ is calculated as the normalised mixture of its modified optimal and supplementary Gaussian PSFs,
$$\phi_k(x, y) = \frac{1}{n_s + 1} \sum_{j=0}^{n_s} g\big(x, y;\; \boldsymbol{\mu}_{k,j}, \sigma_{k,j,u}, \sigma_{k,j,v}\big),$$
with $g(\cdot)$ an anisotropic 2D Gaussian function.
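The key-dependent hopping of the per-sub-region parameters can be sketched with a seeded PRNG; the uniform weight range 0.8-1.2 below is an illustrative assumption, not the paper's exact scheme:

```python
import random

def hopping_sigmas(sigma_opt, n_blocks, n_supp, seed, w_min=0.8, w_max=1.2):
    """Per-block pseudo-random std values for the modified optimal kernel and
    n_supp supplementary kernels, reproducible from the secret seed."""
    rng = random.Random(seed)  # the seed plays the role of the secret key
    params = []
    for _ in range(n_blocks):
        block = [sigma_opt * rng.uniform(w_min, w_max)
                 for _ in range(1 + n_supp)]
        params.append(block)
    return params
```

Because the generator is seeded with the secret key, the exact parameter sequence is reproducible by the owner but unpredictable to an attacker who only observes the filtered face.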
4.4 Local and Global Filtering
We have now a discretised Gaussian mixture model $\phi_k \in \Phi$ for each of the $K$ sub-regions of $F$. We locally convolve each sub-region $b_k$ with its respective $\phi_k$ to obtain a protected sub-region
$$\tilde{b}_k = b_k * \phi_k,$$
where $*$ denotes 2D convolution. Changing the convolutional kernel for each sub-region generates blocking artefacts (see Fig. 5). To smooth these artefacts, we apply a global convolution filter (line 25 in Algorithm 1) with a Gaussian kernel of zero mean and a standard deviation chosen as a function of the sub-region size $s$ (in pixels). As a result, a smoothed protected face $\tilde{F}$ is obtained and replaces $F$ in the captured image to generate a privacy-protected image. Fig. 6 shows a few sample images filtered by AHGMM at different thresholds.
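A 1-D sketch of the two filtering stages: each block is blurred with its own hopped kernel, and a final global pass smooths the blocking artefacts between sub-regions (the hop range and kernel sizes are illustrative assumptions):

```python
import math, random

def gauss_kernel(sigma, radius):
    """Normalised 1-D Gaussian kernel."""
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    """1-D convolution with replicated borders."""
    r, n = len(kernel) // 2, len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def hopping_blur(signal, sigma_opt, block, seed):
    """Per-block blur with a hopped sigma, then a global smoothing pass."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(signal), block):
        sigma = sigma_opt * rng.uniform(0.8, 1.2)  # locally hopped parameter
        blurred = convolve(signal, gauss_kernel(sigma, max(1, int(3 * sigma))))
        out.extend(blurred[start:start + block])   # keep this block's samples
    # Global pass smooths the blocking artefacts between sub-regions.
    return convolve(out, gauss_kernel(block / 3.0, max(1, block)))
```

The per-block stage is what defeats parameter-estimation attacks, while the global stage only restores visual continuity; it uses a fixed, non-secret kernel.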
4.5 Computational Complexity
The generation of a convolutional kernel is more complex in AHGMM than in the adaptive Gaussian blur filter (Sarwar et al., 2016). In fact, the latter only needs to compute a single Gaussian function, while AHGMM requires the computation of $n_s + 1$ Gaussian functions per sub-region. Moreover, the adaptive Gaussian blur exploits the separability property of 2D convolutional kernels, i.e. $g(x, y) = g(x)\,g(y)$, to reduce the number of multiplications and additions per pixel from $W_g H_g$ to $W_g + H_g$, where $W_g$ and $H_g$ represent the width and height of the kernel in pixels, respectively. Instead, AHGMM dynamically reconfigures the convolutional kernel after processing each sub-region, and its mixture kernel is not separable; it therefore requires the full $W_g H_g$ multiplications and additions per pixel.
5 Dataset Generation
To the best of our knowledge, there is no large publicly available face dataset collected from an MAV. We therefore generate face images as if they were captured from an MAV, via geometric transformation and down-sampling of the LFW dataset (Huang et al., 2007). The LFW dataset was collected in an unconstrained environment with extreme illumination conditions and poses. We use the standard verification benchmark test of the LFW dataset (12000 images of 4281 subjects), divided into 10 folds for cross-validation. Each fold contains 600 images forming matched (same-subject) pairs and 600 images forming mismatched (different-subject) pairs. We use the deep-funnelled version of the LFW dataset.
Figure 8 shows sample images of the stages of the dataset generation pipeline. We detect 68 facial landmarks (Zhu and Ramanan, 2012) on an input image and then iteratively fit a 3D Morphable Model (3DMM) (Bas et al., 2016) to generate a 3D representation of the face. (Among the 12000 images, the landmark detector (Zhu and Ramanan, 2012) was unable to detect 68 facial landmarks on 74 images; for these we could not fit a 3DMM and used the original images in order to comply with the standard verification test script of the LFW data set.) As the subject captured in an image may already exhibit a few degrees of pitch (e.g. a person looking slightly downward or upward), we rotate the 3D face to the desired pitch by applying a geometric transformation computed from the estimated pose of the fitted 3DMM. This disturbs the image alignment of the original data set, so a realignment is required, which we perform after generating the pitch effect. We generate eight synthetic pitch angles at a fixed step size and project each rotated face back to generate a corresponding 2D image. In order to align this image so that the eyes and nose appear at the same place among the images belonging to the same pitch angle, we apply an affine transformation, computed by detecting the eyes and nose tip using the Dlib library (King, 2009), such that the transformed face has a fixed resolution. As the detection accuracy of the eyes and nose decreases with increasing pitch angle, we generate a ground truth (location of eyes and nose tip) at a low pitch angle and use it for the higher pitch angles.
Finally, to introduce different height effects for the synthetically generated images, we down-sample them at four factors, generating four additional resolutions. Thus, we increase the size of the original standard verification test of the LFW data set by 40 times (five resolutions at eight pitch angles), i.e. from 12000 images to 480000 images. Fig. 7 shows 40 sample images belonging to the same and different subjects.
We manually determined the values of $T_u$ and $T_v$ by
$$T_u = \frac{s_f}{15.45}, \qquad T_v = \frac{s_f \cos\theta_p}{20.75},$$
where $s_f$ is the cropped face size in pixels, $\theta_p$ is the pitch angle of the image, and 15.45 cm and 20.75 cm are the average human face dimensions, i.e. the bitragion breadth and the menton-crinion length, respectively (DoD, 2000).
6 Experimental Results
6.1 Experimental Set up
We compare AHGMM against Space Variant Gaussian Blur (SVGB) (Saini et al., 2012), Adaptive Gaussian Blur (AGB) (Sarwar et al., 2016) and Fixed Gaussian Blur (FGB), which uses a constant Gaussian kernel defined with respect to the highest-resolution face. Thus, we estimate the kernel for FGB as in (Sarwar et al., 2016) for the highest-resolution face at the smallest pitch angle. For the SVGB filter, we divide the face into four concentric circles and reduce the kernel size by a fixed factor while radially moving out between two consecutive regions, as in (Saini et al., 2012). Although the kernel for the innermost region was manually selected in the original work, we choose the anisotropic kernel as estimated by the AGB (Sarwar et al., 2016) and convert it into an isotropic kernel for a fair comparison. We use a fixed block size for the AHGMM across all experiments.
To compare privacy filters, we measure the face verification accuracy using OpenFace (Amos et al., 2016), an open-source implementation of Google's face recognition algorithm FaceNet (Schroff et al., 2015). OpenFace uses a deep Convolutional Neural Network (CNN) as a feature extractor, trained on a large face data set (500k images). This feature extractor is applied on the training and test images to obtain their representations (embeddings), which are used for classification (Schroff et al., 2015).
We perform experiments with 480,000 images (consisting of 5 different resolutions and 8 different pitch angles) to determine the validity of the proposed AHGMM to protect the identity information of an individual. For this purpose, we analyse the effect of a naïve-T attack, a parrot-T attack, an inverse filter attack and a super-resolution attack. Moreover, we quantify the corresponding fidelity degradation caused by the AHGMM.
As AGB and SVGB do not use any secret key, we evaluate them only using their accurate parameters in the parrot-T, inverse filter and super-resolution attacks. In contrast, any of these attacks on AHGMM can be further divided into three sub-attacks: optimal kernel, pseudo AHGMM and accurate AHGMM. In the optimal kernel sub-attack, we assume that an attacker is able to estimate the parameters of the optimal kernel and applies the optimal kernel to the entire face. In the pseudo AHGMM sub-attack, we assume that the attacker knows the optimal kernel and randomly modifies the filter parameter for the sub-regions. In the accurate AHGMM sub-attack, we assume that the attacker has access to the secret key and can decipher all filter parameters for the sub-regions. As this prior-knowledge can be exploited for both probe and gallery images, we therefore evaluate AHGMM under 13 different scenarios stated in Table 2.
We assume that an attacker is able to determine the pitch angle of a protected face using the background information of an image captured from an MAV and can apply a geometric transformation to transform the gallery images at that pitch angle. Therefore, in all the following attacks, both the gallery and the probe images are at the same pitch angle which can be protected or unprotected depending upon the attack type. Moreover, we use the same resolution for both the gallery images and the probe images.
6.2 Naïve-T Attack
First, we perform a naïve-BL attack, which gives the baseline face verification accuracy when both the probe data set and the gallery data set are unprotected. The results of the naïve-BL attack are given in Fig. 9. We then perform a naïve-T attack in which the gallery images are unprotected, while the probe images are protected using FGB, SVGB (Saini et al., 2012), AGB (Sarwar et al., 2016) and AHGMM. The results of this attack at different thresholds are given in Fig. 10.
The naïve-BL attack shows that the accuracy on our synthetically generated data set decreases with decreasing face resolution and with increasing face pitch angle. However, this trend vanishes at the two highest pitch angles, where the accuracy shows slight randomness. Finally, for the lowest-resolution faces, the accuracy is unaffected by the pitch angle and only slightly oscillates. We therefore consider the lowest-resolution images inherently privacy protected and remove them from the analysis of the privacy filters.
[Table 2: attack scenarios for the optimal kernel, pseudo AHGMM and accurate AHGMM sub-attacks.]
From the naïve-T attack, we are interested in finding the optimal threshold, which defines the optimal kernel for AGB (Sarwar et al., 2016) (see Section 4.2). It is clear from Fig. 10 that the accuracy of the naïve-T attack decreases as the threshold decreases. Once the threshold is small enough, the difference between the accuracy achieved by AGB (Sarwar et al., 2016) and that of a random classifier becomes very small except, unexpectedly, at high pitch angles. This difference decreases further at the two smallest thresholds. Thus, three candidate thresholds can define the optimal kernel; the two smallest decrease the accuracy only negligibly but distort the images severely. Therefore, we perform a trade-off analysis of the accuracy (under the naïve, parrot and reconstruction attacks) and the distortion at these three thresholds.
At these three thresholds under the naïve-T attack, the accuracy of the AHGMM is higher than that of the AGB (Sarwar et al., 2016). This slightly higher accuracy is due to the under-blurred sub-regions of the AHGMM-filtered face, as the filter hops its kernel below and above the optimal Gaussian kernel. In contrast, the accuracy of the Space Variant Gaussian Blur (Saini et al., 2012) is always lower than that of AGB and AHGMM. This is because SVGB uses an isotropic Gaussian kernel, which deteriorates a face more severely than the anisotropic kernel of the AGB and AHGMM filters. FGB has the lowest accuracy at any threshold due to over-blurring of all images except the highest-resolution images at the smallest pitch angle.
6.3 Parrot-T Attack
In the parrot-T attack, we filter both the gallery and the probe images and then evaluate the achieved accuracy. We study the parrot-T attack on AHGMM under three sub-attacks: the optimal kernel, pseudo AHGMM and accurate AHGMM parrot-T sub-attacks. The accuracy results of these sub-attacks at different thresholds are given in Fig. 10, while Receiver Operating Curves for the accurate AHGMM parrot-T sub-attack at the optimal threshold are presented in Fig. 11.
The parrot-T attack on state-of-the-art privacy filters increases the accuracy as compared to the naïve-T attack. Under the optimal kernel parrot sub-attack, our AHGMM shows the least accuracy improvement at any of the three thresholds. This is because the optimal kernel Gaussian blur is a spatially invariant blur that is not helpful in recognising spatially varying Gaussian blurred images, e.g. the AHGMM filtered images. Thus, our AHGMM provides the lowest accuracy against the parrot-T attack using the optimal kernel.
The pseudo AHGMM parrot-T sub-attack improves the accuracy slightly further compared with the optimal kernel parrot-T sub-attack. The main reason is that both the gallery and the probe images are now filtered using a spatially varying Gaussian blur. However, under the pseudo AHGMM sub-attack, the accuracy of AHGMM remains below that of the other three state-of-the-art privacy filters. Thus, our AHGMM provides the highest privacy protection even against the pseudo AHGMM parrot-T sub-attack.
Finally, the accurate AHGMM sub-attack improves the accuracy compared with the optimal kernel sub-attack and is almost equivalent to the pseudo AHGMM sub-attack. Even under the accurate AHGMM sub-attack, AHGMM performs better than FGB, AGB (Sarwar et al., 2016) and SVGB (Saini et al., 2012) at these three thresholds, with the smallest improvement at px/cm.
From the accurate AHGMM sub-attack, it is apparent that our AHGMM permanently removes the sensitive information from the face: an attacker cannot recognise it with high accuracy even with access to the secret key. This is in contrast to reversible filters, e.g. encryption/scrambling-based filters, which allow the original face to be reconstructed once the secret key is available. Thus, our AHGMM is robust against a brute-force attack.
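The role of the secret key can be illustrated with a seeded PRNG that draws per-sub-region kernel widths. This is a simplified sketch under assumed parameter names, not the paper's exact hopping rule: knowing the seed reproduces the kernel parameters, but the blur itself discards information, which is why even the accurate key does not restore the face.

```python
import numpy as np

def hopping_sigmas(n_subregions, sigma_opt, delta, seed):
    """Draw one kernel width per sub-region, hopping pseudo-randomly
    below and above the optimal width sigma_opt by at most +/- delta."""
    rng = np.random.default_rng(seed)
    return sigma_opt + rng.uniform(-delta, delta, size=n_subregions)

# the same seed reproduces the same kernel parameters...
s1 = hopping_sigmas(8, sigma_opt=4.0, delta=1.5, seed=42)
s2 = hopping_sigmas(8, sigma_opt=4.0, delta=1.5, seed=42)
# ...but reproducing the kernels does not undo the blur: Gaussian
# smoothing is lossy, so the filter stays irreversible even with the key
```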
6.4 Inverse Filter Attack
In the inverse-filter (IF) attack, we reconstruct the probe images by deconvolving the protected face with an accurate or estimated kernel. We evaluate the IF attack under four sub-attacks: optimal kernel naïve-IF sub-attack, pseudo AHGMM naïve-IF sub-attack, accurate AHGMM naïve-IF sub-attack and accurate AHGMM parrot-IF sub-attack. Fig. 12 depicts the effect of inverse filtering on selected sample images protected with AGB, SVGB and AHGMM. Fig. 13 shows the achieved accuracies under the different sub-attacks at different values of , while Fig. 14 presents ROC curves for the accurate AHGMM parrot-IF sub-attack at px/cm.
As can be seen in Fig. 12, the face reconstruction quality decreases as the threshold decreases (i.e. as the filter kernel grows), even if the filter parameters are known. This is true for both the space-invariant Gaussian blur (AGB) and the linearly space-variant Gaussian blur (SVGB). The main reason is that the boundaries of the face propagate towards the centre of the face as the threshold is decreased. Thus, it becomes difficult to distinguish between reconstructed faces at the lower thresholds (see Fig. 13).
In the case of non-linear space-variant blur (AHGMM), the reconstruction becomes even more challenging, even when the same hopping kernels are used as for the protection. The main reason, in addition to the boundary propagation, is that while deconvolving a sub-region, the IF incorrectly treats the adjacent sub-regions as if they were filtered with the same kernel, and thus fails to reconstruct the original face (see Fig. 12). Consequently, it becomes difficult to accurately predict the label of the reconstructed face.
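The kernel-mismatch effect can be illustrated with Wiener deconvolution on a one-dimensional toy signal. The setup (circular convolution, noise-to-signal ratio, kernel widths) is an assumption for illustration, not the paper's inverse filter.

```python
import numpy as np

def gaussian_kernel(n, sigma):
    """Normalised 1-D Gaussian kernel of length n, centred at n // 2."""
    x = np.arange(n) - n // 2
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def wiener_deconv(blurred, kernel, nsr=1e-3):
    """Frequency-domain Wiener deconvolution under a circular-convolution
    model; nsr is an assumed noise-to-signal ratio acting as regulariser."""
    H = np.fft.fft(np.fft.ifftshift(kernel))
    G = np.fft.fft(blurred)
    return np.real(np.fft.ifft(G * np.conj(H) / (np.abs(H) ** 2 + nsr)))

rng = np.random.default_rng(1)
signal = rng.normal(size=64)
k_true = gaussian_kernel(64, 2.0)  # kernel actually used for protection
blurred = np.real(np.fft.ifft(np.fft.fft(signal) *
                              np.fft.fft(np.fft.ifftshift(k_true))))

# deconvolving with the correct kernel recovers more of the signal than
# deconvolving with a mismatched (wrongly estimated) kernel
err_right = np.abs(wiener_deconv(blurred, k_true) - signal).mean()
err_wrong = np.abs(wiener_deconv(blurred, gaussian_kernel(64, 4.0)) - signal).mean()
```

With hopping kernels, every sub-region boundary creates exactly this kind of mismatch for the attacker's deconvolution.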
|Threshold||AGB (Sarwar et al., 2016)||SVGB (Saini et al., 2012)||AHGMM|
|(optimal kernel)||(pseudo AHGMM)||(accurate AHGMM)||(accurate AHGMM)|
In contrast to the naïve-IF attacks, the parrot-IF attack is more severe and significantly increases the accuracy, especially for AGB, FGB and SVGB. AHGMM also shows an accuracy improvement, but a smaller one than AGB, FGB and SVGB, and remains more robust to an inverse-filter attack even when the accurate secret key is used.
6.5 Super-resolution Attack
In this attack, we reconstruct the filtered probe images with SRCNN (Dong et al., 2016). SRCNN first learns a mapping between high-resolution images and their corresponding low-resolution versions, and then applies this mapping to enhance the details of a low-resolution image. We learn the SRCNN mapping for iterations between the protected images (i.e. low resolution) and their corresponding unprotected images (i.e. high resolution) using the same data sets (91-images and Set5) as used in (Dong et al., 2016). As learning the mapping is a time-consuming process, we investigate the super-resolution attack for a single point of our synthetic data set: 12000 images, each with pixels and pitch angle.
We evaluate the super-resolution (SR) attack under four sub-attacks: optimal kernel naïve-SR sub-attack, pseudo AHGMM naïve-SR sub-attack, accurate AHGMM naïve-SR sub-attack and accurate AHGMM parrot-SR sub-attack. Tab. 3 summarises the achieved accuracies under the different sub-attacks, while Fig. 15 presents the ROC for the accurate AHGMM parrot-SR sub-attack. Fig. 16 depicts a visual comparison of the super-resolution reconstruction for three sample faces protected by AGB, SVGB and AHGMM filters.
|Sub-attack||AGB (Sarwar et al., 2016)||SVGB (Saini et al., 2012)||AHGMM|
|optimal naïve-SR||0.592 (0.012)||0.566 (0.016)||0.515 (0.014)|
|pseudo AHGMM naïve-SR||–||–||0.520 (0.006)|
|accurate AHGMM naïve-SR||–||–||0.532 (0.018)|
|accurate AHGMM parrot-SR||0.634 (0.015)||0.583 (0.034)||0.546 (0.018)|
For the space-invariant Gaussian blur (AGB), it is apparent from Fig. 16 that the SR attack can reconstruct the faces rather effectively, even when the kernel size is quite large (i.e. px/cm). Therefore, faces protected by AGB achieve a higher accuracy (see Tab. 3). In contrast, faces protected by the linearly space-variant Gaussian blur (SVGB) are difficult to reconstruct. The main reason is that the SR mapping becomes erroneous, especially for patches that contain parts processed by different kernels. However, SR can effectively reconstruct patches where the Gaussian blur is locally invariant (e.g. compare the areas around the eyes of the SVGB-restored faces in Fig. 16). The overall reconstruction is worse than for AGB, and thus the achieved accuracy is lower.
Reconstruction by super-resolution is even more challenging for AHGMM-protected faces. The main reason is that a single patch used for learning the mapping contains several sub-regions, each filtered with pseudo-randomly varied Gaussian mixture models. Thus, the error in the learned SR mapping increases, resulting in the lowest accuracy compared with AGB and SVGB.
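Why a single learned mapping degrades under hopping blur can be illustrated with a linear least-squares restorer as a crude stand-in for SRCNN. The "blur" model and all names here are illustrative assumptions; SRCNN learns a far richer convolutional mapping, but the failure mode is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def blur(patches, strength):
    """Crude surrogate for Gaussian blur: attenuate the 'high-frequency'
    half of each patch vector by exp(-strength)."""
    out = patches.copy()
    out[:, out.shape[1] // 2:] *= np.exp(-np.asarray(strength))
    return out

# learn a linear restoration map from fixed-strength blurred patches
X = rng.normal(size=(500, 16))                       # "sharp" patches
W, *_ = np.linalg.lstsq(blur(X, 2.0), X, rcond=None)

test_patches = rng.normal(size=(100, 16))
# fixed blur (AGB-like): the learned map restores patches almost exactly
err_fixed = np.abs(blur(test_patches, 2.0) @ W - test_patches).mean()
# hopping blur (AHGMM-like): a per-patch pseudo-random strength breaks
# the single learned mapping
hop = rng.uniform(1.0, 3.0, size=(100, 1))
err_hop = np.abs(blur(test_patches, hop) @ W - test_patches).mean()
```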
Similarly to the parrot-IF attack, the accuracy improves for the parrot-SR attack, where SR reconstruction is also performed on the gallery images. Especially for AGB and SVGB, the similarity between the (protected and reconstructed) gallery images and the (reconstructed) probe images increases; thus, the accuracy increases. As for the other attacks, AHGMM is more robust to parrot attacks than AGB and SVGB, and achieves the lowest accuracy.
6.6 Distortion Analysis
We measure the distortion of the FGB, SVGB (Saini et al., 2012), AGB (Sarwar et al., 2016) and AHGMM using PSNR. For a trade-off analysis between distortion and privacy, we plot the face verification accuracy against PSNR. The results of this trade-off analysis are presented in Fig. 17.
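For reference, PSNR can be computed as follows (the standard definition, assuming 8-bit images with peak value 255):

```python
import numpy as np

def psnr(original, filtered, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher values mean lower distortion."""
    mse = np.mean((np.asarray(original, float) - np.asarray(filtered, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100.0)
val = psnr(a, a + 10.0)  # constant error of 10 grey levels -> MSE = 100
```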
AGB (Sarwar et al., 2016) has the highest average PSNR values, followed by SVGB (Saini et al., 2012), AHGMM and FGB. The main reason is that AGB uses a single anisotropic kernel instead of the spatially linearly varying kernel used by SVGB (Saini et al., 2012). Although AHGMM also uses an anisotropic kernel like AGB, the spatial hopping of the Gaussian mixture model results in higher distortion (lower PSNR values) than AGB and SVGB (see Fig. 6). FGB has the highest distortion, as it does not adapt its parameters to the resolution of the face.
We presented an irreversible visual privacy protection filter that is robust against the parrot, inverse-filter and super-resolution attacks to which ad hoc blurring of sensitive regions is vulnerable. The proposed filter is based on an adaptive hopping Gaussian mixture model. Depending on the captured resolution of a sensitive region, the filter globally adapts the parameters of the Gaussian mixture model to minimise the distortion, while locally hopping them pseudo-randomly so that an attacker is unable to estimate these parameters. We evaluated the validity of the AHGMM using a state-of-the-art face recognition algorithm and a synthetic face data set with faces at different pitch angles and resolutions, emulating faces captured from an MAV. The proposed algorithm provides the highest privacy level under the parrot, inverse-filter and super-resolution attacks, and an almost equivalent level of privacy to state-of-the-art privacy filters under a naïve attack.
Unlike face de-identification approaches (Newton et al., 2005; Gross et al., 2006; Du et al., 2014; Lin et al., 2012; Letournel et al., 2015; Chriskos et al., 2015), we do not depend on an auxiliary visual detector (i.e. pose, facial expression, age, gender, race) to counter a parrot, an inverse-filter or a super-resolution attack. Moreover, unlike encryption/scrambling filters (Dufaux and Ebrahimi, 2006, 2008; Baaziz et al., 2007; Sohn et al., 2011; Korshunov and Ebrahimi, 2013b, a; Boult, 2005; Chattopadhyay and Boult, 2007; Rahman et al., 2010; Winkler and Rinner, 2011), AHGMM prevents the recovery of the original face even with access to the seed of the PRNG.
We will make available to the research community the face dataset of 4281 subjects we generated to emulate faces captured from an MAV under varying poses and illumination conditions.
All the symbols used in the paper along with their meanings are summarised in Table 4.
|unprotected, protected and reconstructed face region|
|width and height of|
|A data set including both gallery and probe data sets|
|unprotected, protected and reconstructed gallery data set|
|unprotected, protected and reconstructed probe data set|
|original and predicted identity labels|
|a privacy filter of parameter|
|a function that an attacker exploits|
|distortion introduced by|
|probability of predicting the label of a face|
|face verification accuracy|
|verification accuracy of a random classifier|
|focal length of the camera|
|physical dimension of a pixel in direction|
|height of a camera and face from ground level|
|vectors representing Nadir and principal axis of a camera|
|angle between , and ,|
|Number of sub-regions of|
|Number of supplementary Gaussian functions|
|pixel density (px/cm), where|
|threshold pixel density for privacy filtering|
|mean and standard deviation of a Gaussian PSF|
|mean and standard deviation of an optimal Gaussian PSF|
|randomly modified and for Gaussian PSF|
|randomly generated numbers for and|
|a tuple (, ), (, ) and ()|
|Nyquist frequency of and|
|frequency domain standard deviation corresponding to|
|scaling factor for|
|set of tuple containing parameters of Gaussian functions|
|a set of Gaussian functions|
|an element of|
|a set of weights for Gaussian mixture model|
|an element of|
|Gaussian mixture model|
|an element of|
|sub-region size in pixels|
|standard deviation of global smoothing filter|
O. Sarwar was supported in part by Erasmus Mundus Joint Doctorate in Interactive and Cognitive Environment, which is funded by the Education, Audio-visual & Culture Executive Agency under the FPA no 2010-0015.
- AirDog (2018) AirDog (2018) https://www.airdog.com/, [Last accessed: 2018-10-21]
- Amos et al. (2016) Amos B, Ludwiczuk B, Satyanarayanan M (2016) Openface: A general-purpose face recognition library with mobile applications. Tech. rep., CMU-CS-16-118, CMU School of Computer Science
- Baaziz et al. (2007) Baaziz N, Lolo N, Padilla O, Petngang F (2007) Security and privacy protection for automated video surveillance. In: Proc. IEEE Int. Symposium on Signal Processing and Information Technology, Cairo, Egypt, pp 17–22, DOI 10.1109/ISSPIT.2007.4458044
- Babiceanu et al. (2015) Babiceanu R, Bojda P, Seker R, Alghumgham M (2015) An onboard UAS visual privacy guard system. In: Proc. Integrated Communication, Navigation, and Surveillance Conf., Herdon, USA, pp J1:1–J1:8, DOI 10.1109/ICNSURV.2015.7121232
- Bas et al. (2016) Bas A, Smith WAP, Bolkart T, Wuhrer S (2016) Fitting a 3D morphable model to edges: A comparison between hard and soft correspondences. In: Proc. Asian Conf. on Computer Vision, Taipei, Taiwan, pp 1–15
- Boult (2005) Boult TE (2005) PICO: Privacy through invertible cryptographic obscuration. In: Proc. Computer Vision for Interactive and Intelligent Environment, Lexington, USA, pp 27–38, DOI 10.1109/CVIIE.2005.16
- Chattopadhyay and Boult (2007) Chattopadhyay A, Boult TE (2007) PrivacyCam: A privacy preserving camera using uCLinux on the blackfin DSP. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Minneapolis, USA, pp 1–8, DOI 10.1109/CVPR.2007.383413
- Chinomi et al. (2008) Chinomi K, Nitta N, Ito Y, Babaguchi N (2008) Prisurv: Privacy protected video surveillance system using adaptive visual abstraction. In: Proc. Int. Conf. on Advances in Multimedia Modeling, Kyoto, Japan, pp 144–154
- Chriskos et al. (2015) Chriskos P, Zoidi O, Tefas A, Pitas I (2015) De-identifying facial images using projections on hyperspheres. In: Proc. IEEE Int. Conf. and Workshops on Automatic Face and Gesture Recognition, Ljubljana, Slovenia, vol 04, pp 1–6, DOI 10.1109/FG.2015.7285020
- Chriskos et al. (2016) Chriskos P, Zoidi O, Tefas A, Pitas I (2016) De-identifying facial images using singular value decomposition and projections. Multimedia Tools and Applications pp 1–34, DOI 10.1007/s11042-016-4069-8
- DoD (2000) Human Engineering Design Data Digest, Department of Defense Human Factors Engineering Technical Advisory Group.
- Dong et al. (2016) Dong C, Loy CC, He K, Tang X (2016) Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2):295–307, DOI 10.1109/TPAMI.2015.2439281
- Du et al. (2014) Du L, Yi M, Blasch E, Ling H (2014) Garp-face: Balancing privacy protection and utility preservation in face de-identification. In: Proc. IEEE Int. Joint Conf. on Biometrics, pp 1–8, DOI 10.1109/BTAS.2014.6996249
- Dufaux and Ebrahimi (2006) Dufaux F, Ebrahimi T (2006) Scrambling for video surveillance with privacy. In: Proc. Computer Vision and Pattern Recognition Workshops, New York, USA, pp 160–160, DOI 10.1109/CVPRW.2006.184
- Dufaux and Ebrahimi (2008) Dufaux F, Ebrahimi T (2008) Scrambling for privacy protection in video surveillance systems. IEEE Trans on Circuits and Systems for Video Technology 18(8):1168–1174, DOI 10.1109/TCSVT.2008.928225
- Eagle Eye (1997) Eagle Eye (1997) Bulletin of the Connecticut Academy of Science and Engineering 12(2)
- Erdelyi et al. (2014) Erdelyi A, Barat T, Valet P, Winkler T, Rinner B (2014) Adaptive cartooning for privacy protection in camera networks. In: Proc. Int. Conf. on Advanced Video and Signal Based Surv., Seoul, Korea, pp 44–49, DOI 10.1109/AVSS.2014.6918642
- Erdélyi et al. (2017) Erdélyi Á, Winkler T, Rinner B (2017) Privacy protection vs. utility in visual data. Multimedia Tools and Applications pp 1–28, DOI 10.1007/s11042-016-4337-7
- Gross et al. (2006) Gross R, Sweeney SL, Torre FJdl, Baker SM (2006) Model-based face de-identification. In: Proc. Conf. on Computer Vision and Pattern Recognition Workshop, New York, USA, pp 161–161, DOI 10.1109/CVPRW.2006.125
- Hexo+ (2018) Hexo+ (2018) https://hexoplus.com/, [Last accessed: 2018-10-21]
- Huang et al. (2007) Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Tech. Rep. 07-49, University of Massachusetts, Amherst
- Jiang et al. (2016a) Jiang R, Al-Maadeed S, Bouridane A, Crookes D, Celebi M (2016a) Face recognition in the scrambled domain via salience-aware ensembles of many kernels. IEEE Trans on Information Forensics and Security 11(8):1807–1817, DOI 10.1109/TIFS.2016.2555792
- Jiang et al. (2016b) Jiang R, Bouridane A, Crookes D, Celebi M, Wei HL (2016b) Privacy-protected facial biometric verification using fuzzy forest learning. IEEE Trans on Fuzzy Systems 24(4):779–790, DOI 10.1109/TFUZZ.2015.2486803
- Kim et al. (2014) Kim Y, Jo J, Shrestha S (2014) A server-based real-time privacy protection scheme against video surveillance by unmanned aerial systems. In: Proc. Int. Conf. on Unmanned Aircraft Systems, Orlando, USA, pp 684–691, DOI 10.1109/ICUAS.2014.6842313
- King (2009) King DE (2009) Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research 10:1755–1758
- Koelle et al. (2018) Koelle M, Ananthanarayan S, Czupalla S, Heuten W, Boll S (2018) Your smart glasses’ camera bothers me!: Exploring opt-in and opt-out gestures for privacy mediation. In: Proc. Nordic Conf. on Human-Computer Interaction, Oslo, Norway, pp 473–481, DOI 10.1145/3240167.3240174
- Korshunov and Ebrahimi (2013a) Korshunov P, Ebrahimi T (2013a) Using face morphing to protect privacy. In: Proc. IEEE Int. Conf. on Advanced Video and Signal Based Surv., Kraków, Poland, pp 208–213, DOI 10.1109/AVSS.2013.6636641
- Korshunov and Ebrahimi (2013b) Korshunov P, Ebrahimi T (2013b) Using warping for privacy protection in video surveillance. In: Proc. Int. Conf. on Digital Signal Processing, Fira, Santorini, Greece, pp 1–6, DOI 10.1109/ICDSP.2013.6622791
- Korshunov and Ebrahimi (2014) Korshunov P, Ebrahimi T (2014) Towards optimal distortion-based visual privacy filters. In: Proc. IEEE Int. Conf. on Image Processing, Paris, France, pp 6051–6055, DOI 10.1109/ICIP.2014.7026221
- Kundur and Hatzinakos (1996) Kundur D, Hatzinakos D (1996) Blind image deconvolution. IEEE Signal Processing Magazine 13(3):43–64, DOI 10.1109/79.489268
- Letournel et al. (2015) Letournel G, Bugeau A, Ta VT, Domenger JP (2015) Face de-identification with expressions preservation. In: Proc. IEEE Int. Conf. on Image Processing, pp 4366–4370, DOI 10.1109/ICIP.2015.7351631
- Lin et al. (2012) Lin Y, Wang S, Lin Q, Tang F (2012) Face swapping under Large Pose Variations: A 3D model based approach. In: Proc. IEEE Int. Conf. on Multimedia and Expo, pp 333–338, DOI 10.1109/ICME.2012.26
- Meden et al. (2018) Meden B, Emeršič Ž, Štruc V, Peer P (2018) k-Same-Net: k-Anonymity with generative deep neural networks for face de-identification. Entropy 20(1), DOI 10.3390/e20010060
- Nawaz and Ferryman (2015) Nawaz T, Ferryman J (2015) An annotation-free method for evaluating privacy protection techniques in videos. In: Proc. IEEE Int. Conf. on Advanced Video and Signal Based Surv., Karlsruhe, Germany, pp 1–6, DOI 10.1109/AVSS.2015.7301800
- Newton et al. (2005) Newton EM, Sweeney SL, Malin SB (2005) Preserving privacy by de-identifying facial images. IEEE Trans on Knowledge and Data Engineering 17:232–243
- Oppenheim et al. (1996) Oppenheim A, Willsky A, Nawab S (1996) Signals & Systems (2nd Ed.). Prentice-Hall, Inc., Upper Saddle River, USA
- Pittaluga and Koppal (2017) Pittaluga F, Koppal SJ (2017) Pre-capture privacy for small vision sensors. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(11):2215–2226, DOI 10.1109/TPAMI.2016.2637354
- Popkin et al. (2010) Popkin T, Cavallaro A, Hands D (2010) Accurate and efficient method for smoothly space-variant gaussian blurring. IEEE Trans on Image Processing 19(5):1362–1370
- Quaritsch et al. (2010) Quaritsch M, Kruggl K, Wischounig-Strucl D, Bhattacharya S, Shah M, Rinner B (2010) Networked UAVs as aerial sensor network for disaster management applications. e & i Elektrotechnik und Informationstechnik 127:56–63
- Rahman et al. (2010) Rahman S, Hossain M, Mouftah H, El Saddik A, Okamoto E (2010) A real-time privacy-sensitive data hiding approach based on chaos cryptography. In: Proc. IEEE Int. Conf. on Multimedia and Expo, Suntec City, Singapore, pp 72–77, DOI 10.1109/ICME.2010.5583558
- Rashwan et al. (2015) Rashwan H, García M, Ballesté A, Puig D (2015) Defeating face de-identification methods based on DCT-block scrambling. Machine Vision and Applications 27:251–262, DOI 10.1007/s00138-015-0743-5
- Ruchaud and Dugelay (2017) Ruchaud N, Dugelay JL (2017) Aseppi: Robust privacy protection against de-anonymization attacks. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, Honolulu, Hawaii, US, pp 1352–1359, DOI 10.1109/CVPRW.2017.177
- Safe Haven (2003) Safe Haven from Iceberg Systems ensures privacy from camera phones; Camera phone voyeurs and spies can be defeated by new technology. [Last accessed: 2017-03-17]
- Saini et al. (2012) Saini M, Atrey PK, Mehrotra S, Kankanhalli M (2012) Adaptive transformation for robust privacy protection in video surv. Advances in Multimedia 2012:1–14, DOI 10.1155/2012/639649
- Sarwar et al. (2016) Sarwar O, Rinner B, Cavallaro A (2016) Design space exploration for adaptive privacy protection in airborne images. In: Proc. IEEE Advanced Video and Signal-based Surv., Colorado Springs, USA, pp 159–165
- Sarwar et al. (2018) Sarwar O, Rinner B, Cavallaro A (2018) Temporally smooth privacy-protected airborne videos. In: Proc. IEEE Int. Conf. on Intelligent Robots and Systems, Madrid, Spain, pp 1–6
- Schiff et al. (2007) Schiff J, Meingast M, Mulligan DK, Sastry S, Goldberg K (2007) Respectful cameras: detecting visual markers in real-time to address privacy concerns. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, San Diego, USA, pp 971–978, DOI 10.1109/IROS.2007.4399122
- Schroff et al. (2015) Schroff F, Kalenichenko D, Philbin J (2015) Facenet: A unified embedding for face recognition and clustering. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Boston, USA, pp 815–823, DOI 10.1109/CVPR.2015.7298682
- Sohn et al. (2011) Sohn H, Wesley DN, Man Ro Y (2011) Privacy protection in video surveillance systems: Analysis of subband-adaptive scrambling in JPEG XR. IEEE Trans on Circuits and Systems for Video Technology 21(2):170–177, DOI 10.1109/TCSVT.2011.2106250
- Waharte and Trigoni (2010) Waharte S, Trigoni N (2010) Supporting search and rescue operations with UAVs. In: Proc. Int. Conf. on Emerging Security Technologies, Canterbury, UK, pp 142–147, DOI 10.1109/EST.2010.31
- Wickramasuriya et al. (2004) Wickramasuriya J, Datt M, Mehrotra S, Venkatasubramanian N (2004) Privacy protecting data collection in media spaces. In: Proc. Int. Conf. on Multimedia, New York, USA, pp 48–55, DOI 10.1145/1027527.1027537
- Winkler and Rinner (2011) Winkler T, Rinner B (2011) Securing Embedded Smart Cameras with Trusted Computing. EURASIP J Wirel Commun Netw 2011:8:1–8:20, DOI 10.1155/2011/530354
- Zhang et al. (2018) Zhang X, Seo S, Wang C (2018) A lightweight encryption method for privacy protection in surveillance videos. IEEE Access 6:18074–18087, DOI 10.1109/ACCESS.2018.2820724
- Zhang et al. (2014) Zhang Y, Lu Y, Nagahara H, Taniguchi Ri (2014) Anonymous camera for privacy protection. In: Proc. Int. Conf. on Pattern Recognition, Stockholm, Sweden, pp 4170–4175
- Zhu et al. (2017) Zhu S, Zhang C, Zhang X (2017) Automating visual privacy protection using a smart led. In: Proc. Int. Conf. on Mobile Computing and Networking, Snowbird, Utah, USA, pp 329–342, DOI 10.1145/3117811.3117820
- Zhu and Ramanan (2012) Zhu X, Ramanan D (2012) Face detection, pose estimation, and landmark localization in the wild. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Providence, USA, pp 2879–2886, DOI 10.1109/CVPR.2012.6248014