Concealing the identity of faces in oblique images with adaptive hopping Gaussian mixtures

Omair Sarwar · Bernhard Rinner · Andrea Cavallaro

O. Sarwar: Institute of Networked and Embedded Systems, Alpen-Adria-Universität Klagenfurt, Austria, and Centre for Intelligent Sensing, Queen Mary University of London, UK. Email: omair.sarwar@aau.at
B. Rinner: Institute of Networked and Embedded Systems, Alpen-Adria-Universität Klagenfurt, Austria. Email: bernhard.rinner@aau.at
A. Cavallaro: Centre for Intelligent Sensing, Queen Mary University of London, UK. Email: a.cavallaro@qmul.ac.uk
Abstract

Cameras mounted on Micro Aerial Vehicles (MAVs) are increasingly used for recreational photography. However, aerial photographs of public places often contain faces of bystanders, leading to a perceived or actual violation of privacy. To address this issue, we propose to pseudo-randomly modify the appearance of face regions in the images using a privacy filter that prevents a human or a face recogniser from inferring the identities of people. The filter, which is applied only when the resolution is high enough for a face to be recognisable, adaptively distorts the face appearance as a function of its resolution. Moreover, the proposed filter locally changes its parameters to discourage attacks based on parameter estimation. The filter exploits both global adaptiveness, to reduce distortion, and local hopping of the parameters, to make their estimation difficult for an attacker. To evaluate the effectiveness of the proposed approach, we use a state-of-the-art face recognition algorithm and synthetically generated face data with 3D geometric image transformations that mimic faces captured from an MAV at different heights and pitch angles. Experimental results show that the proposed filter protects privacy while reducing distortion, and exhibits resilience against attacks.

Keywords:
Privacy protection · Hopping Gaussian blur · Micro aerial vehicles

AMC: Airborne Mobile Camera
IMU: Inertial Measurement Unit
UAV: Unmanned Aerial Vehicle
MAV: Micro Aerial Vehicle
UAS: Unmanned Aerial System
CCTV: Closed Circuit Television
PSNR: Peak Signal to Noise Ratio
SSIM: Structural Similarity Index
BSIA: British Security Industry Association
CEN: European Committee for Standardization
AC: Axis Communication
ROI: Region of Interest
VSN: Visual Sensor Network
IR: Infrared
EO: Electro-Optics
UTC: Coordinated Universal Time
GPS: Global Positioning System
RFID: Radio Frequency Identification
OCR: Optical Character Recognition
PRM: Proposed Research Module
FOV: Field of View
AHGMM: Adaptive Hopping Gaussian Mixture Model
PSF: Point Spread Function
PRNG: Pseudorandom Number Generator
LFW: Labelled Faces in the Wild
3DMM: 3D Morphable Model
AGB: Adaptive Gaussian Blur
SVGB: Space Variant Gaussian Blur
FGB: Fixed Gaussian Blur
CCD: Charge-Coupled Device
SLM: Spatial Light Modulator
ROC: Receiver Operating Curve

1 Introduction

MAVs are becoming common platforms for a number of civilian applications, such as search and rescue (Waharte and Trigoni, 2010), disaster management (Quaritsch et al., 2010) and news reporting (Babiceanu et al., 2015). Moreover, individuals use MAVs equipped with high-resolution cameras for recreational photography and videography in public places during sports activities and social gatherings (Hexo+, 2018; AirDog, 2018). Such use in public places raises privacy concerns, as bystanders who happen to be within the field of view of the camera are captured as well. The identity of bystanders could be protected by locating and removing (or sufficiently distorting) key image regions, such as faces, using algorithms called privacy filters. However, in order to maintain the aesthetic value of an image, only a minimal distortion of the image content should be allowed.

A privacy filter for recreational aerial photography should satisfy the following properties: (a) introduce only a minimal distortion; (b) be robust against attacks; and (c) be computationally efficient. Minimal distortion is necessary to keep the quality of a protected image close to that of the unprotected one, so that the attention of a viewer is not diverted; blanking out a face (Schiff et al., 2007) is therefore not a desirable option. Robustness is important to avoid privacy violations by various attacks, e.g. brute-force, naïve, parrot and reconstruction attacks (Kundur and Hatzinakos, 1996; Boult, 2005; Newton et al., 2005; Dufaux and Ebrahimi, 2008; Erdelyi et al., 2014; Korshunov and Ebrahimi, 2014; Dong et al., 2016). A brute-force attack tries to decipher the protected probe images by an exhaustive search (Boult, 2005; Dufaux and Ebrahimi, 2008). Other attacks use gallery images in addition to the protected probe images (Newton et al., 2005; Erdelyi et al., 2014; Korshunov and Ebrahimi, 2014; Dong et al., 2016). In a naïve attack, the protected probe images are compared against the unprotected gallery images (Newton et al., 2005; Erdelyi et al., 2014; Korshunov and Ebrahimi, 2014). In a parrot attack, the attacker has knowledge about the privacy filter and can transform the gallery images into the distorted domain (Newton et al., 2005). In a reconstruction attack, the attacker has some knowledge of how to (partially) reconstruct the probe image from the protected to the unprotected domain (Kundur and Hatzinakos, 1996); examples of reconstruction methods include inverse filtering and super-resolution techniques (Kundur and Hatzinakos, 1996; Dong et al., 2016). Finally, computational efficiency is desirable as the filter operates using the limited computational and battery power of an MAV.


Figure 1: Block diagram of the proposed Adaptive Hopping Gaussian Mixture Model filter. KEY – $\rho_x$, $\rho_y$: number of pixels (px) per unit distance (cm) (pixel densities) of a sensitive region $R$; $h_c$, $\theta_c$: altitude and tilt angle of the camera, used to calculate the pixel densities; $u$: control signal generated from the pixel densities to decide when to protect $R$; $R_b$: sub-regions of $R$; $\tilde\sigma_x$, $\tilde\sigma_y$: standard deviations of the hopping Gaussian mixture model that filters $R_b$ to generate the protected sub-regions $\hat{R}_b$; $\hat{I}$: protected image.

Privacy filters for aerial photography must cope with the challenges caused by the ego-motion of the camera, changing illumination conditions, and variable face orientation and resolution. Recent frameworks that support facial privacy preservation in airborne cameras are Generic Data Encryption (Kim et al., 2014), Unmanned Aircraft Systems-Visual Privacy Guard (Babiceanu et al., 2015) and Adaptive Gaussian Blur (Sarwar et al., 2016). Generic Data Encryption sends an encrypted face region to a privacy server that Gaussian-blurs or mosaics the face and then forwards it to an end-user. Unmanned Aircraft Systems-Visual Privacy Guard (Babiceanu et al., 2015) and Adaptive Gaussian Blur (Sarwar et al., 2016) instead target an on-board implementation, with the objective of reducing latency and discouraging brute-force attacks on the server (Kim et al., 2014). Adaptive Gaussian Blur adaptively configures the Gaussian kernel depending on the face resolution in order to minimise distortion, while Unmanned Aircraft Systems-Visual Privacy Guard blurs faces with a fixed filter. These methods are prone to parrot attacks (Newton et al., 2005) on the Gaussian blur.

In this paper, we present a novel privacy protection filter to be used on board an MAV. The proposed filter distorts a face region with secret parameters to be robust against naïve, parrot and reconstruction attacks. The distortion is minimal and adaptive to the resolution of the captured face: we select the smallest Gaussian kernel that reduces the face resolution below a certain threshold. The selected threshold protects the face against the naïve attack while maintaining its resolution at a specified level. To prevent the other attacks, we then insert supplementary Gaussian kernels into the selected Gaussian kernel and hop their parameters locally using a pseudorandom number generator (PRNG), so that estimating them from the filtered face image is difficult. The block diagram of the proposed filter is shown in Figure 1.

A follow-up work, based on the proposed filter but targeting airborne videography rather than photography, is presented in Sarwar et al. (2018). The main contributions of this paper are: (1) the concept and details of the hopping Gaussian kernels, (2) a large-scale synthetic face image data set emulating faces captured from an MAV, and (3) extensive experiments validating the proposed hopping Gaussian kernels, including under reconstruction attacks.

The paper is organised as follows. Sec. 2 covers the state of the art in visual privacy protection filters. Sec. 3 defines the problem. Sec. 4 describes the proposed algorithm and discusses its computational complexity and security level. Sec. 5 presents the generation of our face data set and Sec. 6 discusses the experimental results. Finally, Sec. 7 concludes the paper.

2 Background

Visual privacy protection filters can be applied as pre-processing or post-processing (Fig. 2).


Figure 2: A taxonomy of visual privacy protection filters.

Pre-processing privacy filters are irreversible and operate during image acquisition to prevent a camera from capturing sensitive regions. These filters disable the software or hardware of the camera, or signal that photography is prohibited (Safe Haven, 2003). Hardware-based filters prevent the camera from taking usable images, for example by firing an intense light back at flash photography (Eagle Eye, 1997; Zhu et al., 2017), or by detecting human faces with an infrared sensor and then obfuscating them with a spatial light modulator placed in front of the Charge-Coupled Device (CCD) sensor (Zhang et al., 2014).

Post-processing privacy filters protect sensitive regions after image acquisition and can be reversible or irreversible. Reversible filters conceal sensitive regions using a private key, which can later be used to recover the original sensitive region. Irreversible filters deform the features of a sensitive region permanently. Both reversible and irreversible filters can be non-adaptive or adaptive.

Reversible non-adaptive filters are based on generic encryption (Boult, 2005; Chattopadhyay and Boult, 2007; Rahman et al., 2010; Winkler and Rinner, 2011; Zhang et al., 2018). Reversible adaptive filters include scrambling (Dufaux and Ebrahimi, 2006, 2008; Baaziz et al., 2007; Sohn et al., 2011; Ruchaud and Dugelay, 2017), warping (Korshunov and Ebrahimi, 2013b) and morphing (Korshunov and Ebrahimi, 2013a). While reversible adaptive filters are robust against a parrot attack, their protected faces can be compromised by spatial-domain (Jiang et al., 2016a, b) or frequency-domain attacks (Rashwan et al., 2015).

Irreversible non-adaptive filters blank out (Schiff et al., 2007; Koelle et al., 2018) or replace a face with a de-identified representation (Newton et al., 2005). For example, to maintain k-anonymity, the "k-Same" algorithm (Newton et al., 2005) replaces k faces with their average face. Variants of this algorithm use additional specialised detectors to preserve attributes such as facial expression, pose, gender, race and age (Gross et al., 2006; Du et al., 2014; Lin et al., 2012; Letournel et al., 2015; Meden et al., 2018). Irreversible non-adaptive filters are robust to parrot attacks. Irreversible adaptive filters lower the resolution of a sensitive region so that humans or algorithms cannot recognise the identity. Examples include pixelation (Chinomi et al., 2008), Gaussian blur (Wickramasuriya et al., 2004) and cartooning (Erdelyi et al., 2014). The kernel size of these privacy filters can be fully manually selected (Korshunov and Ebrahimi, 2014; Erdelyi et al., 2014); alternatively, in the Space Variant Gaussian Blur (SVGB) filter (Saini et al., 2012), the centre kernel size is manually selected and the kernel size then automatically decreases from the centre to the boundary of the detected face. AGB (Sarwar et al., 2016) exploits the different horizontal and vertical resolutions that are typical in aerial photography, and automatically adapts an anisotropic kernel based on the resolution of the detected face. However, irreversible adaptive filters are vulnerable to parrot attacks.

[Table 1 compares DCT-S, PICO, GARP, UAS-VPG, Cartooning, SVGB, ODBVP, AGB and the proposed filter with respect to: distortion adaptive control (image based, navigation sensors); 2D kernel (isotropic, anisotropic); robustness to brute-force, naïve, inverse filter, super-resolution and parrot attacks (with and without detectors); and computational simplicity.]
Table 1: Post-processing privacy filters. KEY – DCT-S: Discrete Cosine Transform Scrambling (Dufaux and Ebrahimi, 2008); PICO: Privacy through Invertible Cryptographic Obscuration (Boult, 2005); GARP: Gender, Age and Race Preservation (Du et al., 2014); UAS-VPG: Unmanned Aircraft Systems-Visual Privacy Guard (Babiceanu et al., 2015); Cartooning (Erdelyi et al., 2014); SVGB: Space Variant Gaussian Blur (Saini et al., 2012); ODBVP: Optimal Distortion-Based Visual Privacy (Korshunov and Ebrahimi, 2014); AGB: Adaptive Gaussian Blur (Sarwar et al., 2016). Adaptive control modulates the strength of a privacy filter.

In summary, Table 1 compares representative filters from the following categories: reversible & adaptive (Dufaux and Ebrahimi, 2008), reversible & non-adaptive (Boult, 2005) and irreversible & non-adaptive (Du et al., 2014). The remaining filters (Babiceanu et al., 2015; Erdelyi et al., 2014; Saini et al., 2012; Korshunov and Ebrahimi, 2014; Sarwar et al., 2016), as well as the proposed one, are irreversible & adaptive.

3 Problem Definition

Let the set $\mathcal{D}$ contain the face data of $N$ subjects, where $y \in \{1, \dots, N\}$ represents the identity (labels). Let each subject appear in at most $m$ images. Let $\mathcal{G}$ and $\mathcal{P}$ be the gallery and probe sets, respectively. Usually $|\mathcal{G}| > |\mathcal{P}|$, where $|\cdot|$ denotes the cardinality of a set, and $\mathcal{G} \cap \mathcal{P} = \emptyset$.

Let a privacy filter $\phi_\sigma$ distort image features in order to reduce the probability that an attacker correctly predicts the labels. This operation produces a protected probe set $\hat{\mathcal{P}}$, whose distortion depends on the parameter $\sigma = (\sigma_x, \sigma_y)$, where $x$ and $y$ indicate the horizontal and vertical direction in an image. Let the distortion $\Delta$ generated by $\phi_\sigma$ be measured by the Peak Signal to Noise Ratio (PSNR):

$$\mathrm{PSNR} = 10 \log_{10} \frac{L^2}{\mathrm{MSE}}, \qquad (1)$$

where $L$ is the dynamic range of the pixel values. The mean square error, MSE, between the pixel intensities of an unprotected face $R$ and a protected face $\hat{R}$ is

$$\mathrm{MSE} = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \big( R(i,j) - \hat{R}(i,j) \big)^2, \qquad (2)$$

where $W$ and $H$ are the width and height of $R$, respectively.

We express the privacy level of a face region as the accuracy $A$ of a face recogniser (Erdelyi et al., 2014; Korshunov and Ebrahimi, 2014). The value of $A$ is the cumulative rank-n accuracy in face identification or the Equal Error Rate (EER) in face verification. In this paper we consider face verification, thus

$$A = \frac{TP + TN}{N_v}, \qquad (3)$$

where $TP$ and $TN$ are the numbers of true positives and true negatives, respectively, and $N_v$ is the total number of verification pairs. Our target is to force the face recogniser of an attacker to the accuracy of a random classifier, which for face verification is $A_r = 0.5$.
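As an illustration of Eq. 3, a minimal sketch (function and variable names are ours, not from the original paper) that computes the verification accuracy and shows that random guessing converges to $A_r = 0.5$:

    import random

    # Sketch of Eq. 3: verification accuracy from per-pair outcomes.
    def verification_accuracy(predictions, labels):
        """predictions, labels: lists of booleans (True = 'same subject')."""
        tp = sum(p and l for p, l in zip(predictions, labels))              # true positives
        tn = sum((not p) and (not l) for p, l in zip(predictions, labels))  # true negatives
        return (tp + tn) / len(labels)

    # A classifier that guesses at random converges to A_r = 0.5:
    labels = [random.random() < 0.5 for _ in range(10000)]
    preds = [random.random() < 0.5 for _ in range(10000)]
    print(verification_accuracy(preds, labels))  # approximately 0.5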

We therefore aim to design $\phi_\sigma$ such that it irreversibly but minimally distorts the appearance of $R$, so that the identity is not recognisable with a probability higher than a random guess. The ideal distortion parameter $\sigma^*$ should be derived as:

(4)

The first term aims to introduce a minimal distortion, whereas the second term forces the classification results to be equivalent to those of a random classifier, irrespective of whether the filtered or reconstructed face is compared against the unprotected, filtered or reconstructed gallery data sets. The second objective depends on the recognition capability of a given face recogniser and is addressed heuristically (Chriskos et al., 2016; Erdélyi et al., 2017; Pittaluga and Koppal, 2017).

The content of $\hat{\mathcal{P}}$ should be protected against naïve-T, parrot-T and reconstruction attacks. Let an attacker have access to the gallery data $\mathcal{G}$, the filtered gallery data set $\hat{\mathcal{G}}$ and the filtered and reconstructed gallery data set $\tilde{\mathcal{G}}$. An attacker can modify the gallery set, the probe set, or both, to correctly predict the labels of $\hat{\mathcal{P}}$. In a naïve attack (here referred to as naïve-T attack), a privacy filter is applied on $\mathcal{P}$ to generate a protected probe data set $\hat{\mathcal{P}}$, while the unaltered $\mathcal{G}$ is used for training (Newton et al., 2005). A parrot attack (here referred to as parrot-T attack) learns the privacy filter type and its parameters (e.g. a Gaussian blur of a certain standard deviation used to generate $\hat{\mathcal{P}}$). The learned filter is then applied on $\mathcal{G}$ to generate a privacy protected gallery data set $\hat{\mathcal{G}}$. Finally, $\hat{\mathcal{G}}$ and $\hat{\mathcal{P}}$ are used for training and testing, respectively (Newton et al., 2005). In a reconstruction attack, the discriminating features of $\hat{\mathcal{P}}$ are first restored (e.g. using an inverse filter or a super-resolution algorithm) to generate a reconstructed probe data set $\tilde{\mathcal{P}}$, which is then compared against $\mathcal{G}$ or a reconstructed gallery data set $\tilde{\mathcal{G}}$. An inverse filter first estimates the parameters of a privacy filter from the protected faces and then performs an inverse operation to reconstruct the original faces (Kundur and Hatzinakos, 1996). Similarly, a super-resolution algorithm first learns embeddings between high-resolution faces and their corresponding low-resolution versions and then reconstructs high-resolution faces from $\hat{\mathcal{P}}$ (Dong et al., 2016).

4 Proposed Approach

In order to minimally distort $R$ and to achieve robustness against brute-force, naïve-T, parrot-T and reconstruction attacks, we propose the Adaptive Hopping Gaussian Mixture Model (AHGMM) algorithm. The AHGMM consists of a globally estimated optimal Gaussian Point Spread Function (PSF) and supplementary Gaussian PSFs added inside the optimal Gaussian PSF. Fig. 3 illustrates the AHGMM for a single supplementary Gaussian PSF inside an optimal Gaussian PSF, while the pseudo-code is given in Algorithm 1. A list of important notations is summarised in Table 4 in the appendix.

Figure 1 shows the processing diagram of the proposed framework; its different blocks are explained in more detail in the following subsections.

4.1 Pixel Density Estimation

Let an MAV capture an image while flying at an altitude of $h_c$ meters. Let the principal axis of its on-board camera be tilted by $\theta_c$ from the nadir direction (see Figure 4). We assume that the height and tilt angle of the camera can be estimated.


Figure 3: Visualisation of the local filtering in AHGMM. The face region $R$ is divided into sub-regions $R_b$ and each sub-region is convolved ($*$) with a hopping Gaussian mixture model kernel $m_b$, which is composed of an optimal Gaussian function and one (or more) supplementary Gaussian functions added inside the optimal Gaussian function. While convolving with each sub-region of the face, the optimal and supplementary Gaussian functions change their parameters, i.e. mean and standard deviation, which consequently changes the shape of the Gaussian mixture model kernel.

A tilt angle $\theta_c > 0$ generates an oblique image. Let $h_f$ be the height of the face above the ground. (While each image could contain several faces, for simplicity we consider in this paper only the case of a single face.) We represent the face region in the image as $R$, which is viewed at an angle $\theta_f$.

Let $\rho_x$ and $\rho_y$ represent the pixel densities (px/cm) around the centre of $R$. If $s_x$ and $s_y$ represent the physical dimensions of a pixel in the horizontal and vertical direction, respectively, and $f$ is the focal length of the camera, the horizontal density around the centre of $R$ (Sarwar et al., 2016) is

(5)

and the vertical density, obtained by exploiting the small-angle approximation for a single pixel of the image sensor (Sarwar et al., 2016), is

(6)

Let $u$ define whether $R$ is naturally protected ($u = 0$) because of low horizontal and vertical pixel densities, or not ($u = 1$) (Sarwar et al., 2016):

$$u = \begin{cases} 0 & \text{if } \rho_x < \rho_{T_x} \text{ and } \rho_y < \rho_{T_y},\\ 1 & \text{otherwise,} \end{cases} \qquad (7)$$

where $\rho_{T_x}$ and $\rho_{T_y}$ are the pixel densities at which a state-of-the-art machine algorithm starts recognising human faces, simply called thresholds. If $u = 0$, the original frame can be transmitted without any modification. Otherwise, $R$ should be protected by a privacy filter that reduces its pixel densities below $\rho_{T_x}$ and $\rho_{T_y}$. When $R$ is not inherently protected, we assume that its bounding box is given.
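This gating step is easy to express in code; the following is a minimal sketch (function and variable names are ours, and the densities and thresholds are assumed to be already estimated):

    # Sketch of the control signal u (Eq. 7): filter only when the face
    # is not already protected by its low resolution.
    def control_signal(rho_x, rho_y, rho_Tx, rho_Ty):
        """Return 0 if the face is naturally protected, 1 if it must be filtered."""
        if rho_x < rho_Tx and rho_y < rho_Ty:
            return 0  # densities already below the recognition thresholds
        return 1      # apply the AHGMM privacy filter

    u = control_signal(rho_x=3.11, rho_y=2.17, rho_Tx=0.8, rho_Ty=0.8)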

Input: unprotected image $I$
                detected face region $R$
                pixel densities $\rho_d$, where $d \in \{x, y\}$

Output: protected image $\hat{I}$

1:procedure FilterAHGMM($I$, $R$, $\rho_x$, $\rho_y$)
2:     for $d \in \{x, y\}$ do
3:          $\sigma^*_d \leftarrow 3\rho_d / (\pi \rho_{T_d})$ ▷ optimal Gaussian PSF (Eq. 11)
4:          $\mu^*_d \leftarrow 0$
5:     end for
6:     $\{R_1, \dots, R_B\} \leftarrow$ sub-regions of $R$ (Eq. 12)
7:     for $b \leftarrow 1$ to $B$ do
8:          for $k \leftarrow 0$ to $S$ do
9:               for $d \in \{x, y\}$ do
10:                    if $k = 0$ then
11:                         $\tilde\sigma_{b,k,d} \leftarrow$ hopped $\sigma^*_d$ (Eq. 14)
12:                         $\tilde\mu_{b,k,d} \leftarrow$ hopped $\mu^*_d$ (Eq. 15)
13:                    else
14:                         $\tilde\sigma_{b,k,d} \leftarrow$ supplementary deviation (Eq. 16)
15:                         $\tilde\mu_{b,k,d} \leftarrow$ supplementary mean (Eq. 17)
16:                    end if
17:               end for
18:               $\Theta_b \leftarrow \Theta_b \cup \{(\tilde\mu_{b,k}, \tilde\sigma_{b,k})\}$
19:               $g_{b,k} \leftarrow$ compute Gaussian functions (Eq. 19)
20:               $w_{b,k} \leftarrow$ generate weights (Eq. 22)
21:          end for
22:          $m_b \leftarrow \sum_k w_{b,k}\, g_{b,k}$ ▷ Gaussian mixture model (Eq. 24)
23:          $\hat{R}_b \leftarrow R_b * m_b$ ▷ local filtering (Eq. 25)
24:     end for
25:     $\hat{R} \leftarrow \hat{R} * g_{\sigma_g}$ ▷ apply global filter on $\hat{R}$ (Eq. 26)
26:     $\hat{I} \leftarrow$ replace $R$ with $\hat{R}$ in $I$
27:     return $\hat{I}$
28:end procedure
Algorithm 1 AHGMM

4.2 Optimal Gaussian PSF

A 2D PSF $g$, or impulse response, is the output of a filter when the input is a point source. In the discrete domain (Oppenheim et al., 1996), the filtered output is given as $\hat{R} = R * g$, where $*$ is the convolution operation:

$$(R * g)(i,j) = \sum_{m}\sum_{n} R(m,n)\, g(i-m, j-n). \qquad (8)$$
Figure 4: Capturing an image with an airborne camera at height $h_c$. The principal axis of the camera is tilted by $\theta_c$ from the nadir direction. The face region $R$, at height $h_f$ above the ground, is viewed at an angle $\theta_f$. The variables $\rho_x$ and $\rho_y$ represent the horizontal and vertical pixel density of $R$ at its centre in the captured image. Four sample images show a scrambled, a blanked, a Gaussian-blurred and an AHGMM-filtered image.

In the case of Gaussian blur, $g$ is an approximated Gaussian function of mean $\mu$ and standard deviation $\sigma$ (Saini et al., 2012; Korshunov and Ebrahimi, 2014; Sarwar et al., 2016), and is thus called a Gaussian PSF of parameter $(\mu, \sigma)$. More specifically, the parameter $\sigma = (\sigma_x, \sigma_y)$ controls the distortion strength of the filter and determines the pixel density that remains in $\hat{R}$.

As a higher $\sigma$ results in a lower pixel density, we first find the minimum value $\sigma^*$, called the optimal parameter, that makes $A = A_r$. As a result, $\sigma^*$ provides the minimum distortion in $\hat{R}$ while making it robust against the naïve-T attack. Increasing $\sigma$ beyond $\sigma^*$ increases the distortion without improving the privacy level, as the recogniser performance is already at the level of a random classifier. For a face captured from an MAV with pixel densities $(\rho_x, \rho_y)$, we calculate the parameters $(\mu^*, \sigma^*)$ of an optimal Gaussian PSF (lines 2-5 in Algorithm 1), where $\mu^* = 0$ as in traditional Gaussian blur (Saini et al., 2012; Korshunov and Ebrahimi, 2014) and $\sigma^*$ (Sarwar et al., 2016) is estimated as follows:

A Gaussian PSF of standard deviation $\sigma$ in the spatial domain corresponds to another Gaussian PSF of standard deviation $\sigma_f$ in the frequency domain, and the two are related as

$$\sigma_f = \frac{\rho}{2\pi\sigma}, \qquad (9)$$

where $\sigma_f$ is measured in cycles/cm, $\sigma$ in px and the pixel density $\rho$ in px/cm. Let $f_N = \rho/2$ represent the Nyquist frequency of $\rho$. Let $f_c$ be the highest spatial frequency component that we want to completely remove using a low-pass filter, i.e. the Gaussian blur. In other words, $f_c$ is the Nyquist frequency of the threshold density $\rho_T$, i.e. the pixel density after filtering. The two are related as

$$f_c = \frac{\rho_T}{2}. \qquad (10)$$

As we are interested in removing frequency components beyond $f_c$, we can select $f_c = 3\sigma_f$, because the amplitude response of a Gaussian PSF at three times its standard deviation is very close to zero, and multiplication (convolution in the spatial domain) with such a Gaussian PSF will suppress frequencies larger than $f_c$. Substituting $f_c = 3\sigma_f$ in Eq. 10, then $\sigma_f$ from Eq. 9 in the resulting relation, and finally rearranging gives the optimal standard deviation of the Gaussian PSF as

$$\sigma^*_d = \frac{3\rho_d}{\pi \rho_{T_d}}, \qquad d \in \{x, y\}. \qquad (11)$$
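The optimal standard deviation of Eq. 11 is a one-line computation per direction; a minimal sketch (function name ours), using pixel-density values that appear in Fig. 6 as example inputs:

    import math

    # Sketch of Eq. 11: smallest Gaussian standard deviation that pushes
    # the pixel density below the threshold.
    def optimal_sigma(rho, rho_T):
        """rho: captured pixel density (px/cm); rho_T: threshold density (px/cm)."""
        return 3.0 * rho / (math.pi * rho_T)

    sigma_x = optimal_sigma(rho=6.21, rho_T=0.8)  # horizontal direction
    sigma_y = optimal_sigma(rho=4.63, rho_T=0.8)  # vertical direction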

4.3 Hopping GMM Kernels

Filtering with the optimal Gaussian PSF defined by $(\mu^*, \sigma^*)$ would only protect $R$ from a naïve-T attack, but not from a parrot-T attack or a reconstruction attack. To ensure that the probability of correctly predicting the label of $\hat{R}$ is not increased by a parrot-T attack or a reconstruction attack, we secretly modify $(\mu^*, \sigma^*)$ while generating $\hat{R}$, so that an adversary is unable to accurately reconstruct the face region, or even to generate matching filtered or reconstructed gallery sets. For this purpose, we partition $R$ into a set of $B$ sub-regions, each covering a small area of $R$:

$$R = \bigcup_{b=1}^{B} R_b, \qquad R_b \cap R_{b'} = \emptyset \text{ for } b \neq b'. \qquad (12)$$

The number $B$ of sub-regions per face region depends on the sub-region size in pixels and could influence the privacy level. Smaller values of $B$ (larger sub-regions) result in a reduced distortion; a sketch of the partitioning follows.
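A minimal sketch of this partitioning (Eq. 12), with NumPy and names of our choosing:

    import numpy as np

    # Sketch: partition the face region R (an H x W array) into
    # non-overlapping beta x beta sub-regions as in Eq. 12.
    def sub_regions(R, beta):
        """Yield (top, left, block) for each beta x beta block of R."""
        H, W = R.shape[:2]
        for top in range(0, H, beta):
            for left in range(0, W, beta):
                yield top, left, R[top:top + beta, left:left + beta]

    R = np.zeros((96, 96))            # a face region
    blocks = list(sub_regions(R, 8))  # B = 144 sub-regions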

After finding $(\mu^*, \sigma^*)$ and generating the sub-regions, we make a hopping mixture of Gaussians for each sub-region, i.e. we pseudo-randomly change $(\mu^*, \sigma^*)$ to $(\tilde\mu_b, \tilde\sigma_b)$ for each $R_b$. Moreover, we select $S$ supplementary Gaussian PSFs inside this optimal Gaussian PSF and vary their parameters based on pseudo-random weights (lines 9-17 in Algorithm 1).

Let the set $\Theta_b$ contain the parameters of the modified optimal and supplementary Gaussian PSFs for each sub-region $R_b$:

(13)

where $S$ is the number of supplementary Gaussian PSFs. The element for $k = 0$ represents the modified optimal Gaussian PSF, given by

(14)
(15)

while the remaining elements ($k = 1, \dots, S$) belong to the supplementary Gaussian PSFs. These elements are calculated as

(16)
(17)

where $r_\mu$ and $r_\sigma$ are normalised pseudo-randomly generated numbers that control the local distortion in filtering, and the variable $\alpha$ controls the relative size of a supplementary Gaussian PSF w.r.t. the optimal Gaussian PSF.
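Since Eqs. 14-17 are not reproduced above, the following sketch is only an illustrative reading of the hopping step: the optimal parameters are perturbed by bounded pseudo-random amounts and each supplementary PSF is a scaled copy placed pseudo-randomly inside the optimal one. The perturbation law, the seed and all names are our assumptions, not the authors' exact formulas:

    import random

    # Illustrative sketch of the hopping parameters (Eqs. 14-17).
    # A seeded PRNG makes the hops reproducible for the key holder only.
    def hop_parameters(sigma_opt, n_supp, alpha, rng):
        params = []
        # k = 0: pseudo-randomly modified optimal Gaussian PSF
        r_sigma = rng.random()                             # normalised PRNG number
        params.append((0.0, sigma_opt * (0.5 + r_sigma)))  # hop below/above sigma_opt
        # k = 1..S: supplementary PSFs inside the optimal one
        for _ in range(n_supp):
            r_mu, r_sigma = rng.random(), rng.random()
            mu = (2.0 * r_mu - 1.0) * sigma_opt   # centre within the optimal PSF
            sigma = alpha * r_sigma * sigma_opt   # size relative to the optimal PSF
            params.append((mu, sigma))
        return params

    rng = random.Random(42)  # 42: a hypothetical secret seed
    theta_b = hop_parameters(sigma_opt=7.4, n_supp=1, alpha=0.5, rng=rng)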


Figure 5: Minimising the blocking artefacts of the spatially hopping Gaussian functions in the AHGMM filter by a convolution with a global kernel. (a) Original image from the LFW data set, (b) image after the local filtering in AHGMM, showing blocking artefacts, and (c) image after the local filtering followed by the global filtering in AHGMM.

After generating the parameters of the Gaussian PSFs, a set $G_b$ of 2D anisotropic, discretised Gaussian PSFs corresponding to $\Theta_b$ is created as

(18)

where each element $g_{b,k}$ is calculated (line 19 in Algorithm 1) as (Popkin et al., 2010)

(19)
Figure 6: Visual comparison between Fixed Gaussian Blur (FGB), AGB (Sarwar et al., 2016) and AHGMM on the multi-resolution synthetically generated face data set. (a) Original images with pixel densities decreasing from left to right due to different heights and pitch angles; the pairs (6.21, 4.63), (6.21, 4.56), (3.11, 2.17), (3.11, 2.00), (1.55, 0.89), (1.55, 0.74), (0.78, 0.29) and (0.78, 0.20) indicate the horizontal and vertical pixel densities in px/cm. (b-d) For various thresholds, results of FGB (first row), AGB (second row) and the AHGMM filter (third row). For each threshold, FGB is selected w.r.t. the highest-pixel-density image in the data set. FGB does not adapt its parameters and therefore almost blanks out the images with smaller pixel densities. In contrast, both AGB and AHGMM maintain high smoothness by varying their parameters depending on the pixel densities of an image. Comparatively, AGB produces smoother images, while the AHGMM filter creates blocking artefacts due to the spatial switching of its parameters.
Figure 7: Sample images belonging to (a) a single subject and (b) multiple subjects from our synthetically generated airborne data set based on the LFW data set (Huang et al., 2007). In each row, the pitch angle increases from left to right in equal steps, while the image resolution remains constant within a row and decreases from the first to the fifth row.

where the horizontal and vertical terms of the kernel are given by

(20)

and

(21)

In order to develop a mixture model from the discretised Gaussian PSFs of each sub-region, a set of weights $\Omega_b = \{w_{b,0}, \dots, w_{b,S}\}$ is required. We again utilise a PRNG to generate $\Omega_b$ such that

$$\sum_{k=0}^{S} w_{b,k} = 1, \qquad w_{b,k} \geq 0. \qquad (22)$$

Finally, a set $M = \{m_1, \dots, m_B\}$ of mixture models, one per sub-region, is generated (line 22 in Algorithm 1) as

(23)

where each element is calculated as

$$m_b = \sum_{k=0}^{S} w_{b,k}\, g_{b,k}. \qquad (24)$$
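Putting Eqs. 19-24 together, the mixture kernel of a sub-region can be sketched as follows; the kernel size, parameter values and names are our assumptions, and the discretisation of Popkin et al. (2010) is replaced here by plain sampling of the Gaussian:

    import numpy as np

    # Sketch of Eqs. 19-24: a discretised anisotropic Gaussian mixture kernel
    # built from the hopped parameters of one sub-region.
    def gaussian_kernel(mu_x, mu_y, sig_x, sig_y, size):
        ax = np.arange(size) - (size - 1) / 2.0
        gx = np.exp(-((ax - mu_x) ** 2) / (2 * sig_x ** 2))
        gy = np.exp(-((ax - mu_y) ** 2) / (2 * sig_y ** 2))
        k = np.outer(gy, gx)
        return k / k.sum()                 # normalise to unit gain

    def mixture_kernel(params, weights, size):
        """params: [(mu_x, mu_y, sig_x, sig_y), ...]; weights sum to 1 (Eq. 22)."""
        m = sum(w * gaussian_kernel(*p, size) for w, p in zip(weights, params))
        return m / m.sum()

    weights = np.random.dirichlet(np.ones(2))   # PRNG weights summing to 1
    m_b = mixture_kernel([(0, 0, 7.4, 5.5), (2.0, -1.5, 3.7, 2.8)], weights, size=45)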

4.4 Local and Global Filtering

We now have a discretised Gaussian mixture model $m_b$ for each of the $B$ sub-regions of $R$. We locally convolve each sub-region (Eq. 12) with its respective mixture model to obtain a protected sub-region $\hat{R}_b$:

$$\hat{R}_b = R_b * m_b, \qquad b = 1, \dots, B. \qquad (25)$$

Changing the convolutional kernel for each sub-region generates blocking artefacts (see Fig. 5). To smooth these artefacts, we apply a global convolution filter (line 25 in Algorithm 1) with a Gaussian kernel of zero mean and standard deviation $\sigma_g$:

(26)

where $\beta$ represents the sub-region size in pixels. As a result, a smoothed protected face $\hat{R}$ is obtained, which replaces $R$ in the captured image to generate a privacy protected image $\hat{I}$. Fig. 6 shows a few sample images filtered by AHGMM at different thresholds.
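A compact sketch of this local-then-global filtering stage is given below. The per-block kernel callback and the choice of $\sigma_g$ are assumptions (Eq. 26 is not reproduced above), and convolving the full face once per block trades efficiency for simplicity:

    import numpy as np
    from scipy.ndimage import convolve, gaussian_filter

    # Sketch of Sec. 4.4: each sub-region is taken from the face convolved
    # with that sub-region's own mixture kernel; a global Gaussian then
    # smooths the blocking artefacts.
    def ahgmm_filter(R, beta, kernel_for_block, sigma_g):
        out = np.empty_like(R, dtype=float)
        H, W = R.shape
        for top in range(0, H, beta):
            for left in range(0, W, beta):
                m_b = kernel_for_block(top, left)          # hopping GMM kernel (Eq. 24)
                filtered = convolve(R.astype(float), m_b)  # local filtering (Eq. 25)
                out[top:top + beta, left:left + beta] = \
                    filtered[top:top + beta, left:left + beta]
        return gaussian_filter(out, sigma_g)               # global smoothing (Eq. 26)

    # e.g. protected = ahgmm_filter(face, beta=8,
    #                               kernel_for_block=lambda t, l: m_b, sigma_g=2.0)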

4.5 Computational Complexity

The generation of a convolutional kernel is more complex in AHGMM than in the adaptive Gaussian blur filter (Sarwar et al., 2016). The latter only needs to compute a single Gaussian function, while AHGMM requires the computation of $B(S+1)$ Gaussian functions. Moreover, the adaptive Gaussian blur exploits the separability property of its 2D convolutional kernel, i.e. $g(i,j) = g_x(i)\, g_y(j)$, to reduce the number of multiplications and additions per output pixel from $W_g H_g$ to $W_g + H_g$, where $W_g$ and $H_g$ represent the width and height of the kernel in pixels, respectively (for example, a 15 x 15 kernel needs 30 instead of 225 multiplications per pixel). Instead, AHGMM dynamically reconfigures the convolutional kernel after processing each sub-region; its mixture kernels are not separable in general, and it therefore requires the full $W_g H_g$ multiplications and additions per output pixel.

5 Dataset Generation

To the best of our knowledge, there is no large publicly available face data set collected from an MAV. We therefore generate face images as if they were captured from an MAV, via geometric transformation and down-sampling of the LFW data set (Huang et al., 2007). The LFW data set was collected in an unconstrained environment with extreme illumination conditions and extreme poses. We use the standard verification benchmark test of the LFW data set (12000 images of 4281 subjects), divided into ten folds for cross-validation. Each fold contains 300 image pairs of the same subject and 300 image pairs of different subjects. We use the deep-funneled version of the LFW data set.

Figure 8: Sample images at different stages of the data set generation process. (a) Original image, (b) image after fitting a 3D morphable model, (c) image with a synthetic pitch effect produced by applying a 3D geometric transformation, (d) aligned image produced by applying an affine transformation computed from the detected eye and nose locations and (e) down-sampled image emulating an image captured at a different height.

Figure 8 shows sample images at the stages of the data set generation pipeline. We first detect 68 facial landmarks (Zhu and Ramanan, 2012) on an input image and then iteratively fit a 3D Morphable Model (3DMM) (Bas et al., 2016) to generate a 3D representation of the face. (Among the 12000 images, the landmark detector was unable to detect the 68 facial landmarks on 74 images; for these we could not fit a 3DMM and used the original images in order to comply with the standard verification test script of the LFW data set.) As a subject may already be captured with a few degrees of pitch (e.g. a person looking slightly downward or upward), we rotate the 3D representation to each synthetic pitch angle by applying a geometric transformation computed from the estimated pose of the fitted 3DMM, and project it back to generate a corresponding 2D image; the synthetic pitch angles increase in equal steps. As this rotation disturbs the alignment of the original data set, we realign the projected images so that the eyes and nose appear at the same location among all images of the same pitch angle: we apply an affine transformation, computed by detecting the eyes and nose tip with the Dlib library (King, 2009), so that every transformed face has the same resolution. As the detection accuracy for the eyes and nose decreases with increasing pitch angle, we generate a ground truth (locations of the eyes and nose tip) at the smallest pitch angle and use it for the higher pitch angles.

Finally, to introduce different height effects, we down-sample the synthetically generated images with four increasing factors, producing four lower-resolution versions of each image, as sketched below. Thus, we increase the size of the original standard verification test of the LFW data set by a factor of 40 (8 pitch angles at 5 resolutions), i.e. from 12000 to 480,000 images. Fig. 7 shows 40 sample images belonging to the same and to different subjects.
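The down-sampling step itself is straightforward; in the sketch that follows, the factors are placeholders, since the exact values are not reproduced above:

    import cv2

    # Sketch of the height emulation step: repeated down-sampling of the
    # aligned face crops. The factors below are hypothetical placeholders.
    def multi_resolution(face, factors=(2, 4, 8, 16)):
        versions = [face]
        h, w = face.shape[:2]
        for f in factors:
            versions.append(cv2.resize(face, (w // f, h // f),
                                       interpolation=cv2.INTER_AREA))
        return versions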

We manually determined the pixel densities $\rho_x$ and $\rho_y$ of the generated images by

(27)
(28)

where $s$ is the cropped face size in pixels, $\theta_p$ is the pitch angle of the image, and 15.45 cm and 20.75 cm are the average human face dimensions, i.e. the bitragion breadth and the menton-crinion length, respectively (DoD, 2000).
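Eqs. 27 and 28 are not reproduced above; one plausible reading, encoded in the sketch below purely as our assumption, divides the face extent in pixels by the average physical face dimensions, with the vertical extent foreshortened by the cosine of the pitch angle:

    import math

    # Assumed reading of Eqs. 27-28 (our reconstruction, not the verbatim
    # formulas): pixel density = pixels spanned by the face divided by its
    # average physical size, with cos(pitch) foreshortening vertically.
    BITRAGION_BREADTH_CM = 15.45   # average face width (DoD, 2000)
    MENTON_CRINION_CM = 20.75      # average face height (DoD, 2000)

    def face_pixel_densities(face_px, pitch_deg):
        rho_x = face_px / BITRAGION_BREADTH_CM
        rho_y = face_px * math.cos(math.radians(pitch_deg)) / MENTON_CRINION_CM
        return rho_x, rho_y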

6 Experimental Results

6.1 Experimental Set up

We compare AHGMM against Space Variant Gaussian Blur (SVGB) (Saini et al., 2012), Adaptive Gaussian Blur (AGB) (Sarwar et al., 2016) and Fixed Gaussian Blur (FGB), which uses a constant Gaussian kernel defined with respect to the highest-resolution face. Thus, we estimate the kernel for FGB as in Sarwar et al. (2016) for the highest-resolution face at the smallest pitch angle. For the SVGB filter, we divide the face into four concentric circles and reduce the kernel size by a fixed fraction while moving radially outward between two consecutive regions, as in Saini et al. (2012). Although the kernel for the innermost region was manually selected in the original work, we choose the anisotropic kernel as estimated by AGB (Sarwar et al., 2016) and convert it into an isotropic kernel for a fair comparison. For the AHGMM, we use a fixed sub-region size and fixed hopping parameters.

To compare privacy filters, we measure the face verification accuracy using OpenFace (Amos et al., 2016), an open-source implementation of Google's face recognition algorithm FaceNet (Schroff et al., 2015). OpenFace uses a deep Convolutional Neural Network (CNN) as a feature extractor, trained on a large face data set (500k images). The feature extractor is applied to the training and test images to obtain their representations (embeddings), which are then used for classification (Schroff et al., 2015).

To measure distortion, as in Erdelyi et al. (2014) and Nawaz and Ferryman (2015), we use the PSNR, i.e. the ratio between the peak signal power of the original image and the power of the error introduced by filtering (Eqs. 1-2).
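A minimal sketch of this measure, following Eqs. 1-2 (an 8-bit dynamic range is assumed):

    import numpy as np

    # Sketch of Eqs. 1-2: PSNR between an unprotected face R and its
    # protected version R_hat.
    def psnr(R, R_hat, dynamic_range=255.0):
        mse = np.mean((R.astype(float) - R_hat.astype(float)) ** 2)
        if mse == 0:
            return float('inf')   # identical images
        return 10.0 * np.log10(dynamic_range ** 2 / mse)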

We perform experiments with 480,000 images (5 resolutions and 8 pitch angles) to validate the ability of the proposed AHGMM to protect the identity information of an individual. For this purpose, we analyse the effect of a naïve-T attack, a parrot-T attack, an inverse filter attack and a super-resolution attack. Moreover, we quantify the corresponding fidelity degradation caused by the AHGMM.

As AGB and SVGB do not use any secret key, we evaluate them only with their accurate parameters in the parrot-T, inverse filter and super-resolution attacks. In contrast, any of these attacks on AHGMM can be further divided into three sub-attacks: optimal kernel, pseudo AHGMM and accurate AHGMM. In the optimal kernel sub-attack, we assume that the attacker is able to estimate the parameters of the optimal kernel and applies it to the entire face. In the pseudo AHGMM sub-attack, we assume that the attacker knows the optimal kernel and randomly modifies the filter parameters for the sub-regions. In the accurate AHGMM sub-attack, we assume that the attacker has access to the secret key and can recover all filter parameters for the sub-regions. As this prior knowledge can be exploited for both probe and gallery images, we evaluate AHGMM under the 13 scenarios stated in Table 2.

Gallery images: unprotected | protected (unchanged) | protected, reconstructed (IF) | protected, reconstructed (SR)

Probe images:
unprotected — naïve-BL | N/A | N/A | N/A
protected (unchanged) — naïve-T | parrot-T (optimal, pseudo, accurate) | N/A | N/A
protected, reconstructed (IF) — naïve-IF (optimal, pseudo, accurate) | N/A | parrot-IF (accurate) | N/A
protected, reconstructed (SR) — naïve-SR (optimal, pseudo, accurate) | N/A | N/A | parrot-SR (accurate)
Table 2: Attacks used to evaluate the privacy level of the proposed AHGMM algorithm. Both the gallery faces and the probe faces can be protected or unprotected (naïve-BL). Moreover, the protected faces could be either unchanged or reconstructed (e.g. through an inverse-filter (IF) or super-resolution (SR)). Finally, any AHGMM attack could be further divided into three sub-attacks corresponding to the prior-knowledge of an attacker: optimal, pseudo and accurate.

We assume that an attacker is able to determine the pitch angle of a protected face from the background of an image captured from an MAV, and can apply a geometric transformation to bring the gallery images to that pitch angle. Therefore, in all the following attacks, both the gallery and the probe images are at the same pitch angle, and each can be protected or unprotected depending on the attack type. Moreover, we use the same resolution for the gallery and the probe images.

6.2 Naïve-T Attack

First, we perform a naïve-BL attack, which gives the baseline face verification accuracy when both the probe and the gallery data sets are unprotected; the results are given in Fig. 9. We then perform a naïve-T attack, in which the gallery images are unprotected while the probe images are protected using FGB, SVGB (Saini et al., 2012), AGB (Sarwar et al., 2016) and AHGMM. The results of this attack are given in Fig. 10 for different thresholds.

The naïve-BL attack shows that the accuracy on our synthetically generated data set decreases as the face resolution decreases and as the pitch angle increases. However, this trend vanishes at the two highest pitch angles, where the accuracy fluctuates slightly. Finally, for the lowest-resolution faces, the accuracy shows no dependence on the pitch angle and oscillates slightly. We therefore consider the lowest-resolution faces inherently privacy protected and remove these images from the analysis of the privacy filters.


Figure 9: Face verification accuracy of a naïve-BL attack on our synthetically generated face data set. In general, the accuracy increases with increasing face size, except at the two highest pitch angles, where it fluctuates slightly. For the lowest-resolution faces, the accuracy is the lowest and rather independent of the pitch angle.
Figure 10: Face verification accuracy achieved by naïve and parrot attacks (columns: naïve-T; parrot-T with optimal kernel; parrot-T with pseudo AHGMM; parrot-T with accurate AHGMM) on images protected by four different privacy protection filters at five decreasing thresholds (one per row). The filled marker shows the mean and the vertical bar the standard deviation of the accuracy over the multi-resolution images. Legend: AHGMM, AGB (Sarwar et al., 2016), SVGB (Saini et al., 2012), FGB. Under the naïve-T attack, AHGMM possesses the highest accuracy, which converges towards that of a random classifier as the threshold is decreased; at the lowest threshold the difference between AHGMM, AGB, SVGB and FGB becomes negligible, except unexpectedly at the two highest pitch angles. The parrot-T attack on AHGMM is divided into three sub-attacks: optimal kernel, pseudo AHGMM and accurate AHGMM. In contrast to the naïve-T attack, AHGMM provides the lowest accuracy under all three parrot-T sub-attacks; this advantage becomes negligible at the lowest threshold under the accurate AHGMM parrot-T attack.
Figure 11: Receiver Operating Curves (ROCs) for the accurate AHGMM parrot-T attack. Each ROC is the mean of the 10 curves generated by the 10 folds used for cross-validation. Legend: Unprotected, AGB, SVGB, FGB, AHGMM; the legend values represent the Area Under the Curve (AUC). Within each column the image resolution is constant (four resolutions, decreasing from left to right) and within each row the pitch angle is constant (eight angles, increasing from top to bottom).

From the naïve-T attack, we are interested in finding the optimal threshold, which defines the optimal kernel for AGB (Sarwar et al., 2016) (see Section 4 and Eq. 11). Fig. 10 shows that the accuracy of the naïve-T attack decreases as the threshold decreases. At the third-highest threshold, the difference between the accuracy achieved by AGB (Sarwar et al., 2016) and that of a random classifier becomes very small except, unexpectedly, at high pitch angles. This difference decreases further at the two lowest thresholds. Thus, any of these three thresholds could define the optimal kernel; the two lowest decrease the accuracy only negligibly but distort the images severely. Therefore, we perform a trade-off analysis of the accuracy (under the naïve, parrot and reconstruction attacks) and the distortion at these three thresholds.

At these three thresholds, under the naïve-T attack, the accuracy of AHGMM is slightly higher than that of AGB (Sarwar et al., 2016). The main reason is the under-blurred sub-regions of the AHGMM-filtered face, as the filter hops its kernel below and above the optimal Gaussian kernel. In contrast, the accuracy of Space Variant Gaussian Blur (Saini et al., 2012) is always lower than that of AGB and AHGMM, because SVGB uses an isotropic Gaussian kernel that deteriorates a face more severely than the anisotropic kernels of AGB and AHGMM. FGB yields the lowest accuracy at any threshold because it over-blurs all images except the highest-resolution faces at the smallest pitch angle.

6.3 Parrot-T Attack

In the parrot-T attack, we filter both the gallery and the probe images and then evaluate the achieved accuracy. We study the parrot-T attack on AHGMM under three sub-attacks: the optimal kernel, the pseudo AHGMM and the accurate AHGMM parrot-T sub-attack. The accuracy results of these sub-attacks are given in Fig. 10 for different thresholds, while the Receiver Operating Curves for the accurate AHGMM parrot-T sub-attack are presented in Fig. 11.

The parrot-T attack increases the accuracy on all state-of-the-art privacy filters compared to the naïve-T attack. Under the optimal kernel parrot-T sub-attack, AHGMM shows the smallest accuracy increase at any of the three thresholds. This is because the optimal-kernel Gaussian blur is spatially invariant, which does not help in recognising spatially varying Gaussian-blurred images such as the AHGMM-filtered ones. Thus, AHGMM provides the lowest accuracy against the parrot-T attack with the optimal kernel.

The pseudo AHGMM parrot-T sub-attack improves the accuracy slightly further compared to the optimal kernel parrot-T sub-attack, mainly because both the gallery and the probe images are now filtered with a spatially varying Gaussian blur. However, under this sub-attack the accuracy of AHGMM remains below that of the other three state-of-the-art privacy filters. Thus, AHGMM provides the highest privacy protection even against the pseudo AHGMM parrot-T sub-attack.

Finally, the accurate AHGMM sub-attack improves the accuracy compared to the optimal kernel sub-attack and performs almost equivalently to the pseudo AHGMM sub-attack. Even under the accurate AHGMM sub-attack, AHGMM performs better than FGB, AGB (Sarwar et al., 2016) and SVGB (Saini et al., 2012) at all three thresholds, with the smallest margin at the lowest threshold.

The accurate AHGMM sub-attack makes apparent that AHGMM permanently removes the sensitive information from the face: an attacker cannot recognise it with high accuracy even with access to the secret key. This is in contrast to reversible filters, e.g. encryption- or scrambling-based filters, which allow the original face to be reconstructed once the secret key is obtained. Thus, AHGMM is robust against a brute-force attack on the key.

6.4 Inverse Filter Attack

In the inverse-filter (IF) attack, we reconstruct the probe images by deconvolving the protected faces with an accurate or estimated kernel. We evaluate the IF attack under four sub-attacks: the optimal kernel naïve-IF, the pseudo AHGMM naïve-IF, the accurate AHGMM naïve-IF and the accurate AHGMM parrot-IF sub-attack. Fig. 12 depicts the effect of inverse filtering on selected sample images protected with AGB, SVGB and AHGMM. Fig. 13 shows the achieved accuracies under the different sub-attacks at different thresholds, while Fig. 14 presents ROCs for the accurate AHGMM parrot-IF sub-attack.

As can be seen in Fig. 12, the face reconstruction quality decreases as the threshold decreases (i.e. as the filter kernel grows), even if the filter parameters are known. This holds for both the space-invariant Gaussian blur (AGB) and the linearly space-variant Gaussian blur (SVGB). The main reason is that the boundaries of the face propagate towards its centre as the threshold is decreased. Thus, it becomes difficult to distinguish between reconstructed faces at the lower thresholds (see Fig. 13).

In the case of the non-linear space-variant blur (AHGMM), the reconstruction becomes even more challenging, even when the same hopping kernels are used as for the protection. The main reason, in addition to the boundary propagation, is that while deconvolving a sub-region, the IF incorrectly treats the adjacent sub-regions as if they were filtered with the same kernel, preventing it from reconstructing the original face (see Fig. 12). Consequently, it becomes difficult to accurately predict the label of the reconstructed face.

Figure 12: Inverse filtering of faces protected at different thresholds by AGB (Sarwar et al., 2016), SVGB (Saini et al., 2012) and AHGMM (filtered and reconstructed versions; for AHGMM, with the optimal, pseudo and accurate kernels). AGB- and SVGB-protected faces can be reconstructed by inverse filtering to some extent. Inverse filtering of AHGMM-protected faces is hardly possible even if the hopping kernel parameters are known.
Figure 13: Face verification accuracy achieved by inverse filter (IF) attacks (columns: naïve-IF with optimal kernel; naïve-IF with pseudo AHGMM; naïve-IF with accurate AHGMM; parrot-IF with accurate AHGMM) on images protected by four different privacy protection filters at three decreasing thresholds (one per row). The filled marker shows the mean and the vertical bar the standard deviation of the accuracy over the multi-resolution images. Legend: AHGMM, AGB, SVGB, FGB. The AHGMM achieves a slightly higher accuracy under the naïve-IF attacks than the state-of-the-art filters, independently of the threshold. In contrast, AHGMM achieves the lowest accuracy under the parrot-IF attack. As the accuracy is close to that of a random classifier under the naïve-IF attack at the smallest of these thresholds, we do not perform experiments at lower thresholds.
Figure 14: Receiver Operating Curves (ROCs) for the accurate AHGMM parrot-IF attack. Each ROC is the mean of the 10 curves generated by the 10 folds used for cross-validation. Legend: Unprotected, AGB, SVGB, FGB, AHGMM; the legend values represent the Area Under the Curve (AUC). Within each column the image resolution is constant (decreasing from left to right) and within each row the pitch angle is constant (increasing from top to bottom).

In contrast to the naïve-IF attacks, the parrot-IF attack is more severe and significantly increases the accuracy, especially for AGB, FGB and SVGB. AHGMM also shows an accuracy improvement, but a smaller one than AGB, FGB and SVGB; it is thus more robust to an inverse filter attack even when an accurate secret key is used.

6.5 Super-resolution Attack

In this attack, we reconstruct the filtered probe images with SRCNN (Dong et al., 2016). SRCNN first learns a mapping between high-resolution images and their corresponding low-resolution versions, and then applies this mapping to enhance the details of a low-resolution image. We learn the SRCNN mapping between the protected (i.e. low-resolution) images and their corresponding unprotected (i.e. high-resolution) images using the same data sets (91-images and Set5) as in Dong et al. (2016). As learning the mapping is a time-consuming process, we investigate the super-resolution attack for a single point of our synthetic data set: 12000 images at a single resolution and a single pitch angle.

We evaluate the super-resolution (SR) attack under four sub-attacks: optimal kernel naïve-SR sub-attack, pseudo AHGMM naïve-SR sub-attack, accurate AHGMM naïve-SR sub-attack and accurate AHGMM parrot-SR sub-attack. Tab. 3 summarises the achieved accuracies under the different sub-attacks, while Fig. 15 presents the ROC for the accurate AHGMM parrot-SR sub-attack. Fig. 16 depicts a visual comparison of the super-resolution reconstruction for three sample faces protected by AGB, SVGB and AHGMM filters.

Attack type | AGB | SVGB | AHGMM
optimal naïve-SR | 0.592 (0.012) | 0.566 (0.016) | 0.515 (0.014)
pseudo AHGMM naïve-SR | - | - | 0.520 (0.006)
accurate AHGMM naïve-SR | - | - | 0.532 (0.018)
accurate AHGMM parrot-SR | 0.634 (0.015) | 0.583 (0.034) | 0.546 (0.018)
Table 3: Face verification accuracy after a super-resolution attack on faces protected by adaptive Gaussian blur (AGB), space variant Gaussian blur (SVGB) and AHGMM. The values are given as mean (standard deviation) over the 10-fold cross-validation. In the naïve-SR attack, the reconstructed probe faces are compared against the unprotected gallery images, while both the probe and the gallery images are super-resolved in the parrot-SR attack.

Figure 15: Receiver Operating Curve (ROC) for the accurate AHGMM parrot-SR attack. Each ROC is the mean of the 10 curves generated by the 10 folds used for cross-validation. Legend: Unprotected, AGB, SVGB, AHGMM; the legend values represent the Area Under the Curve (AUC). This test is performed only for a single resolution and a single pitch angle.
Figure 16: Visual comparison of faces reconstructed with the super-resolution algorithm SRCNN (Dong et al., 2016) from faces protected (filtered and restored versions) by AGB (Sarwar et al., 2016), SVGB (Saini et al., 2012) and AHGMM (optimal, pseudo and accurate). Reconstruction performance deteriorates from AGB over SVGB to AHGMM-protected faces.

For the space-invariant Gaussian blur (AGB), Fig. 16 makes apparent that the SR attack can reconstruct the faces rather effectively, even when the kernel size is large. Therefore, faces protected by AGB yield a higher accuracy (see Tab. 3). In contrast, faces protected by the linearly space-variant Gaussian blur (SVGB) are difficult to reconstruct. The main reason is that the SR mapping becomes erroneous, especially for patches that contain parts processed by different kernels. However, SR can effectively reconstruct patches in which the Gaussian blur is locally invariant (e.g. compare the areas around the eyes of the SVGB-restored faces in Fig. 16). The overall reconstruction is worse than for AGB, and thus the achieved accuracy is lower.

Reconstruction by super-resolution is even more challenging for AHGMM-protected faces. The main reason is that a single patch used for learning the mapping contains several sub-regions, each filtered with a pseudo-randomly parameterised Gaussian mixture model. Thus, the error in the learned SR mapping increases, resulting in the lowest accuracy compared to AGB and SVGB.

Similarly to the parrot-IF attack, the accuracy improves for the parrot-SR attack, where SR reconstruction is also performed on the gallery images. Especially for AGB and SVGB, the similarity between the (protected and reconstructed) gallery images and the (reconstructed) probe images increases, and thus the accuracy increases. As for the other attacks, AHGMM is more robust to parrot attacks than AGB and SVGB, and achieves the lowest accuracy.

Figure 17: Trade-off between the face verification accuracy and the distortion introduced by the different privacy filters under the naïve-T, parrot-T and inverse filter (IF) attacks. The distortion is measured by the Peak Signal to Noise Ratio (PSNR). Legend: AHGMM, AGB (Sarwar et al., 2016), SVGB (Saini et al., 2012), FGB. Under the naïve-T attack, the proposed AHGMM possesses an accuracy almost equivalent to the state-of-the-art filters, but the lowest accuracy under the parrot-T attacks. AHGMM has a slightly lower PSNR than AGB and SVGB, but a much higher PSNR than FGB. (a) naïve-T attack, (b) accurate AHGMM parrot-IF attack, (c) optimal kernel naïve-IF attack, (d) optimal kernel parrot-T attack, (e) pseudo AHGMM naïve-IF attack, (f) pseudo AHGMM parrot-T attack, (g) accurate AHGMM naïve-IF attack and (h) accurate AHGMM parrot-T attack. For the last three naïve-IF and parrot-T attacks, the results of AGB, SVGB and FGB are identical and have been superimposed for comparison. See Sections 6.2, 6.3 and 6.4 for the details of the attacks.

6.6 Distortion Analysis

We measure the distortion introduced by FGB, SVGB (Saini et al., 2012), AGB (Sarwar et al., 2016) and AHGMM using the PSNR. For a trade-off analysis between distortion and privacy, we plot the face verification accuracy against the PSNR (Fig. 17).

AGB (Sarwar et al., 2016) has the highest average PSNR, followed by SVGB (Saini et al., 2012), AHGMM and FGB. The main reason is that AGB uses a single anisotropic kernel instead of the spatially, linearly varying kernel used by SVGB (Saini et al., 2012). Although AHGMM also uses anisotropic kernels like AGB, the spatial hopping of its Gaussian mixture model results in a higher distortion (lower PSNR) than AGB and SVGB (see Fig. 6). FGB causes the highest distortion, as it does not adapt its parameters to the resolution of the face.

7 Conclusion

We presented an irreversible visual privacy protection filter that is robust against the parrot, inverse-filter and super-resolution attacks faced by ad-hoc blurring of sensitive regions. The proposed filter is based on an adaptive hopping Gaussian mixture model. Depending on the captured resolution of a sensitive region, the filter globally adapts the parameters of the Gaussian mixture model to minimise distortion, while locally hopping them pseudo-randomly so that an attacker is unable to estimate them. We evaluated the AHGMM using a state-of-the-art face recognition algorithm and a synthetic face data set with faces at different pitch angles and resolutions, emulating faces captured from an MAV. The proposed algorithm provides the highest privacy level under the parrot, inverse-filter and super-resolution attacks, and a privacy level almost equivalent to that of state-of-the-art privacy filters under a naïve attack.

Unlike face de-identification approaches (Newton et al., 2005; Gross et al., 2006; Du et al., 2014; Lin et al., 2012; Letournel et al., 2015; Chriskos et al., 2015), we do not depend on an auxiliary visual detector (e.g. of pose, facial expression, age, gender or race) to counter a parrot, an inverse-filter or a super-resolution attack. Moreover, unlike encryption/scrambling filters (Dufaux and Ebrahimi, 2006, 2008; Baaziz et al., 2007; Sohn et al., 2011; Korshunov and Ebrahimi, 2013b, a; Boult, 2005; Chattopadhyay and Boult, 2007; Rahman et al., 2010; Winkler and Rinner, 2011), AHGMM prevents the recovery of the original face even with access to the seed of the PRNG.

We will make available to the research community the face dataset of 4281 subjects we generated to emulate faces captured from an MAV under varying poses and illumination conditions.

All the symbols used in the paper along with their meanings are summarised in Table 4.

Notation Meaning
unprotected, protected and reconstructed face region
centre of
width and height of
a data set including both gallery and probe data sets
unprotected, protected and reconstructed gallery data set
unprotected, protected and reconstructed probe data set
original and predicted identity labels
a privacy filter of parameter
a function that an attacker exploits
distortion introduced by
probability of predicting the label of a face
face verification accuracy
verification accuracy of a random classifier
focal length of the camera
physical dimension of a pixel in direction
height of a camera and face from ground level
vectors representing Nadir and principal axis of a camera
angle between , and ,
number of sub-regions of
number of supplementary Gaussian functions
pixel density (px/cm), where
threshold pixel density for privacy filtering
mean and standard deviation of a Gaussian PSF
mean and standard deviation of an optimal Gaussian PSF
randomly modified and for Gaussian PSF
randomly generated numbers for and
a tuple (, ), (, ) and ()
Nyquist frequency of and
frequency domain standard deviation corresponding to
scaling factor for
set of tuple containing parameters of Gaussian functions
a set of Gaussian functions
an element of
a set of weights for Gaussian mixture model
an element of
Gaussian mixture model
an element of
sub-region size in pixels
standard deviation of global smoothing filter
Table 4: List of notations.

Acknowledgment

O. Sarwar was supported in part by the Erasmus Mundus Joint Doctorate in Interactive and Cognitive Environments, which is funded by the Education, Audiovisual and Culture Executive Agency under FPA no. 2010-0015.

References

  • AirDog (2018) AirDog (2018) https://www.airdog.com/, [Last accessed: 2018-10-21]
  • Amos et al. (2016) Amos B, Ludwiczuk B, Satyanarayanan M (2016) Openface: A general-purpose face recognition library with mobile applications. Tech. rep., CMU-CS-16-118, CMU School of Computer Science
  • Baaziz et al. (2007) Baaziz N, Lolo N, Padilla O, Petngang F (2007) Security and privacy protection for automated video surveillance. In: Proc. IEEE Int. Symposium on Signal Processing and Information Technology, Cairo, Egypt, pp 17–22, DOI 10.1109/ISSPIT.2007.4458044
  • Babiceanu et al. (2015) Babiceanu R, Bojda P, Seker R, Alghumgham M (2015) An onboard UAS visual privacy guard system. In: Proc. Integrated Communication, Navigation, and Surveillance Conf., Herdon, USA, pp J1:1–J1:8, DOI 10.1109/ICNSURV.2015.7121232
  • Bas et al. (2016) Bas A, Smith WAP, Bolkart T, Wuhrer S (2016) Fitting a 3D morphable model to edges: A comparison between hard and soft correspondences. In: Proc. Asian Conf. on Computer Vision, Taipei, Taiwan, pp 1–15
  • Boult (2005) Boult TE (2005) PICO: Privacy through invertible cryptographic obscuration. In: Proc. Computer Vision for Interactive and Intelligent Environment, Lexington, USA, pp 27–38, DOI 10.1109/CVIIE.2005.16
  • Chattopadhyay and Boult (2007) Chattopadhyay A, Boult TE (2007) PrivacyCam: A privacy preserving camera using uCLinux on the blackfin DSP. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Minneapolis, USA, pp 1–8, DOI 10.1109/CVPR.2007.383413
  • Chinomi et al. (2008) Chinomi K, Nitta N, Ito Y, Babaguchi N (2008) Prisurv: Privacy protected video surveillance system using adaptive visual abstraction. In: Proc. Int. Conf. on Advances in Multimedia Modeling, Kyoto, Japan, pp 144–154
  • Chriskos et al. (2015) Chriskos P, Zoidi O, Tefas A, Pitas I (2015) De-identifying facial images using projections on hyperspheres. In: Proc. IEEE Int. Conf. and Workshops on Automatic Face and Gesture Recognition, Ljubljana, Slovenia, vol 04, pp 1–6, DOI 10.1109/FG.2015.7285020
  • Chriskos et al. (2016) Chriskos P, Zoidi O, Tefas A, Pitas I (2016) De-identifying facial images using singular value decomposition and projections. Multimedia Tools and Applications pp 1–34, DOI 10.1007/s11042-016-4069-8
  • DoD (2000) DoD (2000) Human Engineering Design Data Digest, Department of Defense Human Factors Engineering Technical Advisory Group. http://www.acq.osd.mil/rd/hptb/hfetag/products/documents/HE_Design_Data_Digest.pdf
  • Dong et al. (2016) Dong C, Loy CC, He K, Tang X (2016) Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2):295–307, DOI 10.1109/TPAMI.2015.2439281
  • Du et al. (2014) Du L, Yi M, Blasch E, Ling H (2014) Garp-face: Balancing privacy protection and utility preservation in face de-identification. In: Proc. IEEE Int. Joint Conf. on Biometrics, pp 1–8, DOI 10.1109/BTAS.2014.6996249
  • Dufaux and Ebrahimi (2006) Dufaux F, Ebrahimi T (2006) Scrambling for video surveillance with privacy. In: Proc. Computer Vision and Pattern Recognition Workshops, New York, USA, pp 160–160, DOI 10.1109/CVPRW.2006.184
  • Dufaux and Ebrahimi (2008) Dufaux F, Ebrahimi T (2008) Scrambling for privacy protection in video surveillance systems. IEEE Trans on Circuits and Systems for Video Technology 18(8):1168–1174, DOI 10.1109/TCSVT.2008.928225
  • Eagle Eye (1997) Eagle Eye (1997) Bulletin of the Connecticut Academy of Science and Engineering 12(2)
  • Erdelyi et al. (2014) Erdelyi A, Barat T, Valet P, Winkler T, Rinner B (2014) Adaptive cartooning for privacy protection in camera networks. In: Proc. Int. Conf. on Advanced Video and Signal Based Surv., Seoul, Korea, pp 44–49, DOI 10.1109/AVSS.2014.6918642
  • Erdélyi et al. (2017) Erdélyi Á, Winkler T, Rinner B (2017) Privacy protection vs. utility in visual data. Multimedia Tools and Applications pp 1–28, DOI 10.1007/s11042-016-4337-7
  • Gross et al. (2006) Gross R, Sweeney SL, Torre FJdl, Baker SM (2006) Model-based face de-identification. In: Proc. Conf. on Computer Vision and Pattern Recognition Workshop, New York, USA, pp 161–161, DOI 10.1109/CVPRW.2006.125
  • Hexo+ (2018) Hexo+ (2018) https://hexoplus.com/, [Last accessed: 2018-10-21]
  • Huang et al. (2007) Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Tech. Rep. 07-49, University of Massachusetts, Amherst
  • Jiang et al. (2016a) Jiang R, Al-Maadeed S, Bouridane A, Crookes D, Celebi M (2016a) Face recognition in the scrambled domain via salience-aware ensembles of many kernels. IEEE Trans on Information Forensics and Security 11(8):1807–1817, DOI 10.1109/TIFS.2016.2555792
  • Jiang et al. (2016b) Jiang R, Bouridane A, Crookes D, Celebi M, Wei HL (2016b) Privacy-protected facial biometric verification using fuzzy forest learning. IEEE Trans on Fuzzy Systems 24(4):779–790, DOI 10.1109/TFUZZ.2015.2486803
  • Kim et al. (2014) Kim Y, Jo J, Shrestha S (2014) A server-based real-time privacy protection scheme against video surveillance by unmanned aerial systems. In: Proc. Int. Conf. on Unmanned Aircraft Systems, Orlando, USA, pp 684–691, DOI 10.1109/ICUAS.2014.6842313
  • King (2009) King DE (2009) Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research 10:1755–1758
  • Koelle et al. (2018) Koelle M, Ananthanarayan S, Czupalla S, Heuten W, Boll S (2018) Your smart glasses’ camera bothers me!: Exploring opt-in and opt-out gestures for privacy mediation. In: Proc. Nordic Conf. on Human-Computer Interaction, Oslo, Norway, pp 473–481, DOI 10.1145/3240167.3240174
  • Korshunov and Ebrahimi (2013a) Korshunov P, Ebrahimi T (2013a) Using face morphing to protect privacy. In: Proc. IEEE Int. Conf. on Advanced Video and Signal Based Surv., Kraków, Poland, pp 208–213, DOI 10.1109/AVSS.2013.6636641
  • Korshunov and Ebrahimi (2013b) Korshunov P, Ebrahimi T (2013b) Using warping for privacy protection in video surveillance. In: Proc. Int. Conf. on Digital Signal Processing, Fira, Santorini, Greece, pp 1–6, DOI 10.1109/ICDSP.2013.6622791
  • Korshunov and Ebrahimi (2014) Korshunov P, Ebrahimi T (2014) Towards optimal distortion-based visual privacy filters. In: Proc. IEEE Int. Conf. on Image Processing, Paris, France, pp 6051–6055, DOI 10.1109/ICIP.2014.7026221
  • Kundur and Hatzinakos (1996) Kundur D, Hatzinakos D (1996) Blind image deconvolution. IEEE Signal Processing Magazine 13(3):43–64, DOI 10.1109/79.489268
  • Letournel et al. (2015) Letournel G, Bugeau A, Ta VT, Domenger JP (2015) Face de-identification with expressions preservation. In: Proc. IEEE Int. Conf. on Image Processing, pp 4366–4370, DOI 10.1109/ICIP.2015.7351631
  • Lin et al. (2012) Lin Y, Wang S, Lin Q, Tang F (2012) Face swapping under Large Pose Variations: A 3D model based approach. In: Proc. IEEE Int. Conf. on Multimedia and Expo, pp 333–338, DOI 10.1109/ICME.2012.26
  • Meden et al. (2018) Meden B, Emeršič Ž, Štruc V, Peer P (2018) k-Same-Net: k-Anonymity with generative deep neural networks for face de-identification. Entropy 20(1), DOI 10.3390/e20010060
  • Nawaz and Ferryman (2015) Nawaz T, Ferryman J (2015) An annotation-free method for evaluating privacy protection techniques in videos. In: Proc. IEEE Int. Conf. on Advanced Video and Signal Based Surv., Karlsruhe, Germany, pp 1–6, DOI 10.1109/AVSS.2015.7301800
  • Newton et al. (2005) Newton EM, Sweeney SL, Malin SB (2005) Preserving privacy by de-identifying facial images. IEEE Trans on Knowledge and Data Engineering 17:232–243
  • Oppenheim et al. (1996) Oppenheim A, Willsky A, Nawab S (1996) Signals & Systems (2nd Ed.). Prentice-Hall, Inc., Upper Saddle River, USA
  • Pittaluga and Koppal (2017) Pittaluga F, Koppal SJ (2017) Pre-capture privacy for small vision sensors. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(11):2215–2226, DOI 10.1109/TPAMI.2016.2637354
  • Popkin et al. (2010) Popkin T, Cavallaro A, Hands D (2010) Accurate and efficient method for smoothly space-variant gaussian blurring. IEEE Trans on Image Processing 19(5):1362–1370
  • Quaritsch et al. (2010) Quaritsch M, Kruggl K, Wischounig-Strucl D, Bhattacharya S, Shah M, Rinner B (2010) Networked UAVs as aerial sensor network for disaster management applications. e & i Elektrotechnik und Informationstechnik 127:56–63
  • Rahman et al. (2010) Rahman S, Hossain M, Mouftah H, El Saddik A, Okamoto E (2010) A real-time privacy-sensitive data hiding approach based on chaos cryptography. In: Proc. IEEE Int. Conf. on Multimedia and Expo, Suntec City, Singapore, pp 72–77, DOI 10.1109/ICME.2010.5583558
  • Rashwan et al. (2015) Rashwan H, García M, Ballesté A, Puig D (2015) Defeating face de-identification methods based on DCT-block scrambling. Machine Vision and Applications 27:251–262, DOI 10.1007/s00138-015-0743-5
  • Ruchaud and Dugelay (2017) Ruchaud N, Dugelay JL (2017) Aseppi: Robust privacy protection against de-anonymization attacks. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, Honolulu, Hawaii, US, pp 1352–1359, DOI 10.1109/CVPRW.2017.177
  • Safe Haven (2003) Safe Haven from Iceberg Systems ensures privacy from camera phones; Camera phone voyeurs and spies can be defeated by new technology. [Last accessed: 2017-03-17]
  • Saini et al. (2012) Saini M, Atrey PK, Mehrotra S, Kankanhalli M (2012) Adaptive transformation for robust privacy protection in video surv. Advances in Multimedia 2012:1–14, DOI 10.1155/2012/639649
  • Sarwar et al. (2016) Sarwar O, Rinner B, Cavallaro A (2016) Design space exploration for adaptive privacy protection in airborne images. In: Proc. IEEE Advanced Video and Signal-based Surv., Colorado Springs, USA, pp 159–165
  • Sarwar et al. (2018) Sarwar O, Rinner B, Cavallaro A (2018) Temporally smooth privacy-protected airborne videos. In: Proc. IEEE Int. Conf. on Intelligent Robots and Systems, Madrid, Spain, pp 1–6
  • Schiff et al. (2007) Schiff J, Meingast M, Mulligan DK, Sastry S, Goldberg K (2007) Respectful cameras: detecting visual markers in real-time to address privacy concerns. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, San Diego, USA, pp 971–978, DOI 10.1109/IROS.2007.4399122
  • Schroff et al. (2015) Schroff F, Kalenichenko D, Philbin J (2015) Facenet: A unified embedding for face recognition and clustering. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Boston, USA, pp 815–823, DOI 10.1109/CVPR.2015.7298682
  • Sohn et al. (2011) Sohn H, Wesley DN, Man Ro Y (2011) Privacy protection in video surveillance systems: Analysis of subband-adaptive scrambling in JPEG XR. IEEE Trans on Circuits and Systems for Video Technology 21(2):170–177, DOI 10.1109/TCSVT.2011.2106250
  • Waharte and Trigoni (2010) Waharte S, Trigoni N (2010) Supporting search and rescue operations with UAVs. In: Proc. Int. Conf. on Emerging Security Technologies, Canterbury, UK, pp 142–147, DOI 10.1109/EST.2010.31
  • Wickramasuriya et al. (2004) Wickramasuriya J, Datt M, Mehrotra S, Venkatasubramanian N (2004) Privacy protecting data collection in media spaces. In: Proc. Int. Conf. on Multimedia, New York, USA, pp 48–55, DOI 10.1145/1027527.1027537
  • Winkler and Rinner (2011) Winkler T, Rinner B (2011) Securing Embedded Smart Cameras with Trusted Computing. EURASIP J Wirel Commun Netw 2011:8:1–8:20, DOI 10.1155/2011/530354
  • Zhang et al. (2018) Zhang X, Seo S, Wang C (2018) A lightweight encryption method for privacy protection in surveillance videos. IEEE Access 6:18074–18087, DOI 10.1109/ACCESS.2018.2820724
  • Zhang et al. (2014) Zhang Y, Lu Y, Nagahara H, Taniguchi Ri (2014) Anonymous camera for privacy protection. In: Proc. Int. Conf. on Pattern Recognition, Stockholm, Sweden, pp 4170–4175
  • Zhu et al. (2017) Zhu S, Zhang C, Zhang X (2017) Automating visual privacy protection using a smart led. In: Proc. Int. Conf. on Mobile Computing and Networking, Snowbird, Utah, USA, pp 329–342, DOI 10.1145/3117811.3117820
  • Zhu and Ramanan (2012) Zhu X, Ramanan D (2012) Face detection, pose estimation, and landmark localization in the wild. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Providence, USA, pp 2879–2886, DOI 10.1109/CVPR.2012.6248014