Personal data disclosure and data breaches: the customer’s viewpoint

Giuseppe D’Acquisto (Garante per la Protezione dei Dati Personali, Piazza di Monte Citorio n. 121, Rome, Italy)
Maurizio Naldi (Università di Roma Tor Vergata, Dipartimento di Informatica Sistemi Produzione, Via del Politecnico 1, 00133 Roma, Italy)
Giuseppe F. Italiano (Università di Roma Tor Vergata, Dipartimento di Informatica Sistemi Produzione, Via del Politecnico 1, 00133 Roma, Italy)
July 9, 2019

1 Introduction

In most cases, a customer can receive a service over the Internet without disclosing its personal data, aside from its IP address. However, some services may need personal information or may be accomplished faster and more easily if the customer is willing to release some personal information. In other cases the service may be enhanced if more is known about the customer. For example, an online store may propose additional items for purchase if the customer’s purchasing history or tastes are known, and a cloud computing provider may sell additional CPU time or storage space if the consumption behaviour of the customer is known. The effect of service enhancements as incentives for the customer to loosen its privacy requirements has long been known as a marketing tool [1]. In all those cases the customer is induced into divulging some personal information, so that the customer and the service provider share that information. However, the release of personal information carries negative consequences as well when it leaks outside the customer-service provider circle. Such unintentional release of secure information to an untrusted environment is called a data breach. When a data breach occurs, that information falls into the hands of a third party, which may carry out an identity theft or exploit those data for malicious uses (e.g., fraudulent unemployment claims, fraudulent tax returns, fraudulent loans, home equity fraud, and payment card fraud, as briefly reported in [2]). Such leakages may turn into an economic loss both for the service provider that had to protect the customer’s data and for the customer as well [3]. A service provider then has to trade off its investments in security against the losses it may incur because of data breaches, and may be led to invest in privacy-enhancing technologies [4].
On the other hand, the customer has to balance the benefits obtained by releasing its personal information against the potential losses associated with a data breach. In the following, the term customer does not necessarily refer to an individual, but includes companies that act as customers, for which the amount of money at risk through data breaches is considerably larger than for individuals.

Although some effort has been spent to determine optimal investment policies for a company investing in security for its information system [5, 6], to the best of our knowledge no aids have been proposed to support the customer’s decision when the customer heavily relies on the security of its service provider’s information system. However, it has been argued that the economic analysis of the trade-offs that a company faces when tracking its customers is gaining relevance, since some start-ups have started offering bargains in return for users’ data [7]. Such offers strengthen the awareness that customers’ personal information has a value, and spur customers themselves to carefully assess whether the release of that information is worthwhile.

Our aim is to explore the pros and cons for the customer of releasing personal data while facing the danger of identity theft. In this paper we introduce a model to describe the advantages and disadvantages associated with the release of personal data, and formulate the customer’s problem of choosing the right level of personal information release as a trade-off problem, with the aim of maximizing the customer’s surplus. Rather than considering just the service provider as the party in charge of protecting the data, our models take the wider viewpoint that data breaches may occur on the service provider’s side as well as on the customer’s side. We prove that the solution to the trade-off problem exists and is unique, and show that solution in a reference scenario. The results represent a first step to model, in a parsimonious way, the complex interaction between service providers and customers as to demand, security, and privacy, from an economic perspective. It appears that, among the parameters outside the control of the customer, the customer’s decision is most sensitive to the price imposed for the service and to the impact of the profiling capability of the service provider on the loss incurred after data breaches. Finally, we consider the limit case of a perfectly secure service provider, so that data breaches can occur just on the customer’s side. For that case we provide closed-form expressions both for the optimal level of exposure for the customer and for the sensitivity with respect to all the parameters involved in the trade-off problem (embodied by the elasticity or quasi-elasticity functions). We believe that the results may provide theoretical grounds to introduce a regulatory approach to the issue, to achieve a regulated balance between the contrasting aims of preserving the customer’s privacy and spurring the service provider’s business.

2 Service demand curve, data disclosure and the effects of data breaches

When releasing personal data in exchange for enhanced services, the customer has to perform a trade-off between what it releases, the benefits it gets, and the risk related to divulging that information. The main variables involved in that trade-off are: the quantity of services that are sold; their unit price; the personal information that is released as part of the service sale; and the potential loss deriving from that information disclosure. In order to derive some guidelines for the customer releasing that information, we need to define the relationship between these quantities. In this section, we provide analytical models that link all four quantities.

We recall that the service provider sells services at a unit price p; the customer buys a quantity q of such services, represented, e.g., by minutes of phone traffic, bytes of data traffic volume, digital units, CPU time, or bytes of storage capacity. The relationship between p and q is the demand curve [8]. For the sake of simplicity, we assume here that the relationship is linear. When no personal information is disclosed, those quantities are then related by the expression

q = q_M (1 - p/p_M)   (1)

where q_M is the maximum quantity of service that the service provider can provide, and p_M is the maximum unit price that the customer can sustain (a.k.a. its willingness-to-pay). When the service is free (p = 0), the customer asks for the maximum quantity that the service provider can supply (q = q_M). When the price is larger than the willingness-to-pay (p > p_M), the customer will not buy the service, and the quantity of service sold will be q = 0. In the following we treat both the quantity of service and the unit price as continuous variables (though their variation is actually discrete), since we assume that their granularity is extremely small with respect to the values at hand.
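As a quick numerical illustration of the linear demand curve just described, the following sketch computes the quantity demanded at a few prices; the function name and the sample values (borrowed from the case study of Section 5) are our own choices:

```python
# Minimal sketch of the linear demand curve of Eq. (1).
# q_max is the maximum quantity of service; p_max is the willingness-to-pay.

def demand(p, q_max, p_max):
    """Quantity demanded at unit price p under a linear demand curve."""
    if p >= p_max:  # price above the willingness-to-pay: no purchase
        return 0.0
    return q_max * (1.0 - p / p_max)

print(demand(0.0, 250, 1.0))  # free service: maximum quantity, 250.0
print(demand(0.5, 250, 1.0))  # half the willingness-to-pay: 125.0
print(demand(1.2, 250, 1.0))  # price above willingness-to-pay: 0.0
```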

However, if the customer is willing to release some personal information, the service provider eases the provision of services, e.g., by providing personalized services or automatic login. In fact, the more the service provider knows about the customer, the better it (or any other third party associated to the service provider and sharing that information) can shape and direct its offer to achieve a sale. The release of personal data can therefore help reduce the product/service search costs for both parties: the time employed by customers when looking for that product/service, and the effort spent by sellers trying to reach out to their customers. Varian has shown that customers rationally want some of their personal information to be available to sellers [9]. Hence, the customer is incentivized to supply its personal data and increase its consumption. We assume that the overall information that the customer can release is collected into a number of sets D_1, D_2, …, D_n, so that each set includes the information belonging to the previous one, i.e., D_{i-1} ⊆ D_i, with i = 2, …, n.

Each release of information by the customer is rewarded by a new offer by the service provider, which at the same time incentivizes consumption. The demand curve correspondingly changes, e.g., as illustrated in Figure 1, where we can visualize the shift in the customer’s consumption by observing how the working point moves onto the new demand curve. For example, consider in Figure 1 a working point on the pre-release demand curve represented by Eq. (1). After the release of personal data the customer may move to a new working point on the after-release demand curve.

Figure 1: The demand curve before and after the release of personal data

If we assume the willingness-to-pay to stay unchanged and the demand curve to remain linear, the change in the demand curve is equivalent to a translation of the maximum amount of service, as illustrated in Figure 1. When the customer releases the personal information contained in the set D_i, both the marginal demand (i.e., the increase in demand for a unit decrease in price) and the maximum consumption increase by the factor 1 + ε_i, where ε_i is the marginal demand factor. Since D_{i-1} ⊆ D_i, we can safely assume that ε_{i-1} ≤ ε_i. The new line passes through the points (p = 0, q = q_M(1 + ε_i)) and (p = p_M, q = 0), so that its equation is now

q = q_M (1 + ε_i)(1 - p/p_M)   (2)

The movement of the customer on the new demand curve is a key point in determining its viability, and therefore the interest in releasing personal information. We have to make a distinction between customers whose consumption is time-constrained and customers who are not.

If a user has a limited amount of time to use the service, we can assume that the extension of the allowed range of consumption is of no interest to it: time-constrained customers will consume the same quantity of service regardless of the new offer, and will not find it convenient to release personal data to get a new offer.

Customers that are not time-constrained can in principle move to any position on the new demand curve. However, we expect that their moves, as well as the provider’s offer, aim at increasing the respective satisfaction. In turn, that means that customers will change their consumption so as to increase their surplus, while a service provider will be willing to put forward a new offer if that leads to a revenue increase. Both assumptions pose conditions on the set of working points on the new demand curve; their intersection defines the set of valid working points.

We examine first the condition that the customer’s surplus must increase. By moving to the new demand curve, the customer changes its surplus from S_1 to S_2:

(3)

The customer’s surplus then increases iff the following condition is satisfied

(4)

We now deal with the thrust that moves the service provider: getting more revenues. On the first demand curve, with working point (p_1, q_1), the revenues of the service provider, as given by the product of price and quantity, are

R_1 = p_1 q_1   (5)

After the release of personal information and the passage to the new demand curve, with working point (p_2, q_2), the revenues become

R_2 = p_2 q_2   (6)

The revenues increase if R_2 > R_1, which in turn leads to the following quadratic inequality

q_2^2 - q_M (1 + ε_i) q_2 + p_1 q_1 q_M (1 + ε_i)/p_M < 0   (7)

By solving it, we obtain that the passage to the new demand curve leads to increased revenues for the service provider if the new demand satisfies the inequality

(q_M (1 + ε_i)/2) [1 - sqrt(1 - 4 p_1 q_1/(p_M q_M (1 + ε_i)))] < q_2 < (q_M (1 + ε_i)/2) [1 + sqrt(1 - 4 p_1 q_1/(p_M q_M (1 + ε_i)))]   (8)

By combining the two constraints represented by the inequalities (4) and (8), a necessary condition for the release of private information to be beneficial both to the customer and to the service provider is that the new demand satisfies the inequality

(9)

We notice that, if the service provider maintains its unit price, the customer, acting as a price taker, changes its demand to

q_2 = q_1 (1 + ε_i)   (10)

which is surely within the acceptable region defined by the inequality (9). In fact, the constraint on the customer’s surplus is satisfied, since the consumption increases at an unchanged unit price. And the provider’s revenues increase, since the unit price has been kept constant and the demand has increased.

The width of the valid region represented by the inequality (9) shows that the provider may increase its revenues even if the unit price is lowered.
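The price-taker case above can be checked numerically. In the sketch below (a toy verification; the symbol names and numeric values are our own illustrative choices, the demand-curve form is the linear one assumed in this section), holding the price constant makes the demand, and hence the provider’s revenue, grow by the factor 1 + ε:

```python
def demand(p, q_max, p_max):
    """Linear demand curve; returns 0 when the price exceeds p_max."""
    return max(0.0, q_max * (1.0 - p / p_max))

q_max, p_max, price, eps = 250.0, 1.0, 0.5, 0.2  # illustrative values

q1 = demand(price, q_max, p_max)              # consumption before release
q2 = demand(price, q_max * (1 + eps), p_max)  # after release, same price

# demand scales by (1 + eps), so revenues grow by the same factor
assert abs(q2 - q1 * (1 + eps)) < 1e-9
assert price * q2 > price * q1
```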

But the disclosure of personal information comes with a cost: its potential malicious use by third parties. That misuse may turn into an economic loss for the customer. This may be both a direct loss deriving from the identity theft and an indirect one due, e.g., to the expenses incurred to recover deleted data, the loss of reputation, legal expenses, or the inability to use networking services. Each release of information D_i is then associated to a potential money loss ℓ_i, with i = 1, 2, …, n. In the following we assume that all quantities (e.g., the losses and the quantity of service) are referred to a unit period of time, be it a month or a year. Again we can safely assume that the potential loss grows with the size of the information set, i.e., ℓ_{i-1} ≤ ℓ_i. Since we have at the same time a positive effect (increase of services) and a negative one (potential money loss) linked to the same root cause (the disclosure of personal information), we can draw a relationship between the two effects through the common cause. In this paper we assume that relationship to be expressed through a power law:

ε_i = ε_n (ℓ_i/ℓ_n)^γ   (11)

In addition to its well-known property of scale invariance and its appearance in a number of contexts (see, e.g., [10][11]), the choice of a power law allows us to describe a variety of behaviours by acting on the single parameter γ, which we call the privacy parameter. If γ < 1, the customer releases its information starting with the most potentially damaging, and the additional risk associated to further releases is a decreasing function of the information released. In particular, if γ ≪ 1 (i.e., the service provider is privacy-friendly), the customer gains a large benefit (i.e., a large extension of the maximum quantity of services) even for small pieces of the information released (i.e., small potential losses). When γ = 1, we have instead a linear relationship between the information released and the associated economic loss. The case γ > 1 models instead the situation where the customer releases information starting with the least sensitive one. Here we do not strongly support any specific value for the privacy parameter. However, we could set its value by exploiting a single instance of Eq. (11), i.e., considering the fraction ℓ_i/ℓ_n of the maximum potential loss corresponding to a given fraction ε_i/ε_n of the benefit obtained. For example, we might apply the Pareto principle, which states that roughly 80% of the effects come from 20% of the causes. That 80/20 relationship has been observed in many cases in the context of information security (see, e.g., [12] [13]). If we adopt the principle that 80% of the benefit is obtained with 20% of the effort, and assume ε_i/ε_n = 0.8 and ℓ_i/ℓ_n = 0.2 in Eq. (11), we obtain γ = ln 0.8 / ln 0.2 ≃ 0.1386. Empirical data that shed some light on the nature of the information release law embodied by Equation (11) are provided in [14]: a transactional privacy mechanism has been proposed, whereby customers choose to release access to their personal information (namely, their browsing behaviour) in exchange for money. In the transactional privacy mechanism, customers are led to release their personal information starting with the least sensitive pieces, since this strategy optimizes their revenues through that privacy-selling mechanism.
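Under the Pareto-principle calibration just described, the privacy parameter follows from a one-line computation (the power-law form of Eq. (11) is assumed as in the text):

```python
import math

# Solving 0.8 = 0.2 ** gamma for the privacy parameter:
# gamma = ln(0.8) / ln(0.2)
gamma = math.log(0.8) / math.log(0.2)
print(round(gamma, 6))  # 0.138647, the value used in the later case study
```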

3 Data vulnerability

In Section 2 we have shown that the extension of service provisioning (i.e., the increase in the maximum consumption by the factor 1 + ε_i) is related to the potential loss ℓ_i that the customer incurs when it releases the personal information D_i and a data breach occurs. Such loss is uncertain; we need to associate that value with the probability that a data breach occurs. In this section we provide a model for such an occurrence.

We consider that a data breach may take place because of deficiencies on either of the two sides of the customer-service provider relationship. The data theft may be due either to an attack on the service provider’s information system or to an attack on the customer’s data repository (e.g., its computer). We assume that the failures on the two sides are independent of each other, and that a data breach takes place as either of the two sides fails. Under these hypotheses, a suitable model for the overall data breach phenomenon is the classical series combination of two systems that we can borrow from the reliability field (see Ch. 3.2 in [15]). The data breach probability π is then related to the individual data breach probabilities of the two sides, π_S (service provider) and π_C (customer), by the formula

π = 1 - (1 - π_S)(1 - π_C) = π_S + π_C - π_S π_C   (12)

As to the vulnerability on the customer’s side, we consider that the probability of data breach is a growing function of the amount of personal information that the customer has divulged, since that determines its level of exposure to security threats. We assume a simple power-law function to hold, and, by exploiting the relationship between information released and potential loss established in Section 2, we obtain the following function:

π_C = π_M (ℓ_i/ℓ_n)^δ   (13)

where π_M is the probability of breach corresponding to the maximum release of information. The parameter δ describes the balance between the probability of breach and the quantity of personal information released (for which the economic loss represents a proxy): if δ ≪ 1 (reckless customer) the probability of data breach is close to its maximum even for the smallest pieces of released information; if δ ≫ 1 (privacy-aware customer) the customer has to release a substantial amount of information before it suffers a significant probability of data breach. We call δ the security parameter. We can set a value for it, again by resorting to the Pareto principle invoked for the privacy parameter.

For data breaches on the service provider’s side a model has been proposed by Gordon and Loeb [6] to describe the dependence of the data breach probability on the investments in security carried out by the service provider. However, in this context we prefer not to explicitly account for the dependence of the data breach probability on that, or other determinants. In fact, the inclusion of specific variables affecting the data breach probability would leave the customer with the problem of determining their value to derive its optimal information disclosure strategy: in the Gordon-Loeb model the customer would have to guess the amount of security investment. Instead, we consider the system-related data breach probability as a parameter. We assume that the customer may estimate through a number of sources, and certainly more easily than the value of investments or any other information private to the service provider.
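The two sub-models of this section can be combined in a few lines. In the sketch below, the function names and all numeric values are our own illustrative assumptions; with the security parameter set via the Pareto principle, releasing information worth 20% of the maximum loss already yields 80% of the maximum customer-side breach probability:

```python
def customer_breach_prob(loss, max_loss, pi_max, delta):
    """Customer-side breach probability, growing as a power law of the
    released information (with the potential loss as its proxy)."""
    return pi_max * (loss / max_loss) ** delta

def overall_breach_prob(pi_provider, pi_customer):
    """Series combination of the two independent failure modes."""
    return 1.0 - (1.0 - pi_provider) * (1.0 - pi_customer)

# Illustrative values: delta = 0.138647 from the Pareto principle.
pi_c = customer_breach_prob(2000.0, 10000.0, 0.5, 0.138647)
pi_total = overall_breach_prob(0.1, pi_c)
print(round(pi_c, 3), round(pi_total, 3))  # 0.4 0.46
```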

4 Maximization of customer’s surplus

In Sections 2 and 3 we have provided models respectively for the price paid by the customer and for the data vulnerability, and we have also linked those quantities to the potential loss suffered by the customer through the disclosure of its personal information. In this section we use those relationships to determine the net surplus for the customer and the best information disclosure strategy.

The net surplus for the customer is given by the algebraic sum of two terms. The positive component is the surplus obtained by paying a price for the service lower than its willingness-to-pay. This surplus grows with the quantity of service that the customer receives, which in turn grows with the personal information that the customer reveals. Hence, the positive term grows with the amount of disclosed information (and with the related potential loss). On the other hand, the customer suffers a potential loss due to the same information disclosure; this is the negative term, which again grows with the amount of disclosed information. We therefore expect to find a trade-off value for the amount of disclosed information that maximizes the net surplus. That represents the strategy the customer has to follow when revealing its private information to the service provider. In that strategy the customer acts as a price taker: the price is set by the service provider and the customer responds with a consumption dictated by the after-release demand curve. The latter depends however on the quantity of personal information released by the customer: different degrees of information correspond to different slopes of the demand curve shown in Figure 1. The quantity of personal information released is then the leverage employed by the customer to maximize its surplus. Hereafter we provide the expression of the net surplus.

It is to be noted that we do not assume any specific relationship between the information released by the customer and the potential loss for it; we just assume that there is one, and that the customer knows it (we expect that relationship to be different for each customer). In the following, the surplus is maximized by considering the loss as a leverage, since it actually represents a proxy for the information released. If the customer is able to identify the value of the loss that maximizes its surplus, it can at the same time identify the associated amount of information to release.

When the customer reveals the set of information D_i, its net surplus S_i is given by the difference of the two terms mentioned above. If we indicate the price associated to the generic quantity q as p(q), we have

S_i = ∫_0^{q_i} [p(q) - p] dq - π ℓ_i   (14)

where the first term embodies the surplus deriving from paying a price lower than the willingness-to-pay, integrated over the quantity of service q_i received by the customer. We can now recall that: a) the unit price and the quantity of service are related through the demand curve (2); b) the maximum amount of service is related to the potential loss through the power law (11); c) the data breach probability π is represented by exprs. (12) and (13). With that additional information the net surplus can be expressed in the following form

S(ℓ_i) = (q_M (p_M - p)^2 / (2 p_M)) [1 + ε_n (ℓ_i/ℓ_n)^γ] - [π_S + (1 - π_S) π_M (ℓ_i/ℓ_n)^δ] ℓ_i   (15)

If we stick to the discrete framework we have adopted in Section 2, the optimal amount of information to be disclosed by the customer is the set associated to the loss

ℓ* = argmax_{ℓ_i ∈ {ℓ_1, …, ℓ_n}} S(ℓ_i)   (16)

However, we can gain a substantial insight into the trade-off problem, with a little loss in accuracy, if we now treat the potential loss as a continuous quantity, i.e., by replacing the continuous variable ℓ ∈ [0, ℓ_n] for ℓ_i in expr. (15). The optimal loss can now be derived as

ℓ* = argmax_{ℓ ∈ [0, ℓ_n]} S(ℓ)   (17)
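Before differentiating, the continuous trade-off can also be explored numerically. The sketch below is a crude grid search, assuming the surplus combines the consumer-surplus term of the after-release linear demand curve with the expected loss from a data breach; all symbol names and parameter values are our own illustrative assumptions, not the case-study figures:

```python
def surplus(loss, q_max=250.0, p_max=1.0, price=0.5, eps=0.2,
            gamma=0.138647, delta=0.138647, max_loss=10000.0,
            pi_s=1e-4, pi_m=1e-3):
    """Hypothetical net surplus: consumer surplus on the after-release
    demand curve minus the expected economic loss of a data breach."""
    frac = loss / max_loss
    # consumer surplus under a linear demand curve scaled by (1 + eps_i)
    cs = q_max * (1 + eps * frac ** gamma) * (p_max - price) ** 2 / (2 * p_max)
    # breach probability: series combination of provider and customer sides
    breach = 1.0 - (1.0 - pi_s) * (1.0 - pi_m * frac ** delta)
    return cs - breach * loss

# crude grid search for the loss maximizing the surplus
grid = [i * 10.0 for i in range(1, 1001)]
best = max(grid, key=surplus)
```

With these made-up parameters the maximizer lies in the interior of the loss range, illustrating the trade-off between the growing consumer surplus and the growing expected loss.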

If we derive expr. (15), we get

dS/dℓ = (γ ε_n q_M (p_M - p)^2 / (2 p_M ℓ_n)) (ℓ/ℓ_n)^(γ-1) - π_S - (1 - π_S) π_M (δ + 1) (ℓ/ℓ_n)^δ   (18)

By equating that derivative to zero we get the optimal loss as the solution of the equation

(γ ε_n q_M (p_M - p)^2 / (2 p_M ℓ_n)) (ℓ/ℓ_n)^(γ-1) - (1 - π_S) π_M (δ + 1) (ℓ/ℓ_n)^δ = π_S   (19)

which we can rewrite in the synthetic form

A (ℓ/ℓ_n)^(γ-1) - B (ℓ/ℓ_n)^δ = π_S   (20)

where A and B are the following positive quantities

A = γ ε_n q_M (p_M - p)^2 / (2 p_M ℓ_n),   B = (1 - π_S) π_M (δ + 1)   (21)

We note that the derivative is made of three terms, two of which are powers of ℓ, while the third one is a constant. That derivative is then a continuous function of ℓ over the interval (0, ℓ_n].

Solving the decision equation (20) provides us with the optimal amount of information (for which the economic loss is a proxy) that the customer can release. However, since we are interested in solutions that are at the same time positive and within the interval [0, ℓ_n], identified as the range of potential money losses embodied in expr. (11), we introduce the following definitions:

Definition 1 (Legal solution).

A legal solution of the trade-off problem is any value ℓ* > 0 for which the customer’s surplus reaches a maximum.

Definition 2 (Feasible solution).

A feasible solution of the trade-off problem is any legal solution ℓ* such that ℓ* ≤ ℓ_n.

Then, we are actually looking for a feasible solution of the trade-off problem. We can state the following theorem.

Theorem 1 (Trade-off solution).

If γ < 1, a feasible solution of the trade-off problem exists and is unique.
If γ > 1, a feasible solution of the trade-off problem exists and is unique if condition (32) holds.


If γ = 1, a feasible solution of the trade-off problem exists and is unique iff condition (38) holds.

Proof.

We consider the decision equation (20), whose solution provides us with an extremal point of the customer’s surplus. We deal separately with the three cases γ < 1, γ > 1, and γ = 1.
When γ < 1 it is convenient to rewrite the derivative of the customer’s surplus as follows:

dS/dℓ = A (ℓ/ℓ_n)^(γ-1) - B (ℓ/ℓ_n)^δ - π_S   (22)

The behaviour of such derivative at either end of the interval over which we look for a legal solution is easily obtained as follows:

lim_{ℓ→0+} dS/dℓ = +∞,   lim_{ℓ→+∞} dS/dℓ = -∞   (23)

In addition the second derivative is

d²S/dℓ² = (A (γ - 1) (ℓ/ℓ_n)^(γ-2) - B δ (ℓ/ℓ_n)^(δ-1)) / ℓ_n   (24)

so that, when γ < 1, the first derivative is a monotonically decreasing function.

We can find a closed interval such that the first derivative is positive at its lower end and negative at its upper end. We can prove this statement by construction. The upper bound can be found by solving the expression

(25)

and noting that

(26)

As to the lower bound, we can set it so that the first derivative is positive there. At the lower bound the derivative of the customer’s surplus is then

(27)

By Bolzano’s theorem we can then conclude that there exists a value at which the first derivative vanishes. Such a value is an extremal point for the customer’s surplus. Since the second derivative is strictly negative over the same interval, the customer’s surplus reaches its maximum there. In addition, since the first derivative is a monotonic function, its inverse is unique; hence there is a single such value.

We have so far a legal solution of the trade-off problem. In order to find a feasible solution we note that if the legal solution does not exceed ℓ_n, then it is also feasible. Instead, if it exceeds ℓ_n, then the feasible solution is ℓ* = ℓ_n, since in that case the first derivative is positive over the whole interval [0, ℓ_n] and the customer’s surplus is a growing function over that interval.
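The Bolzano argument used above translates directly into a numerical procedure: since for γ < 1 the first derivative decreases monotonically from positive to negative values, its unique zero can be found by bisection. The derivative below is a hypothetical stand-in with that shape (our own coefficients, for illustration only):

```python
def bisect_root(f, lo, hi, tol=1e-9):
    """Zero of a continuous f with f(lo) > 0 and f(hi) < 0 (Bolzano setting)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# hypothetical monotonically decreasing derivative with the
# A*l**(g-1) - B*l**d - c shape of the decision equation
f = lambda l: 2.0 * l ** (-0.5) - 0.1 * l ** 0.2 - 0.05
root = bisect_root(f, 1e-6, 1e6)
```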

We consider now the case γ > 1 and further subdivide it into the two subcases γ - 1 < δ and γ - 1 > δ, which we label subcases a) and b) respectively.
In both subcases we have

dS/dℓ |_{ℓ=0} = -π_S ≤ 0   (28)

But the behaviour of the first derivative at the other end of the interval is different in the two subcases

lim_{ℓ→+∞} dS/dℓ = -∞ in subcase a),   lim_{ℓ→+∞} dS/dℓ = +∞ in subcase b)   (29)

In subcase a) we then have a first derivative that takes the same sign at both ends of the interval. We have a legal solution if that derivative takes the zero value within that interval. In order to understand what happens, we must look at the second derivative (24), which turns out to take mixed signs. Namely, we have

(30)

This means that the first derivative starts at -π_S, first increases, reaches a peak, and then falls without end. The peak is reached when the second derivative vanishes. We can have the two different trends shown in Figure 2, depending on the sign of the first derivative at its peak. If the peak value is negative, we have no legal solution (and a fortiori no feasible solution); if the peak value is positive, we have two legal solutions, but 0, 1, or 2 feasible solutions.

Figure 2: Alternative trends of the first derivative in subcase (a)

In fact, even if the peak is positive, the number of feasible solutions depends on the position of the zeros. Since those zeros represent two extremal points for the customer’s surplus, we indicate them by the symbols ℓ_A and ℓ_B respectively, with ℓ_A < ℓ_B. We summarize the possible situations in Table 1.

Condition No. of feasible solutions
0
AND 1
2
Table 1: Number of feasible solutions in subcase (a)

We have then a single feasible solution just if the first derivative crosses the x-axis downward only once within the feasibility range. A sufficient condition for that to happen is that the first derivative is still positive at ℓ = ℓ_n:

dS/dℓ |_{ℓ=ℓ_n} = A - B - π_S > 0   (31)

which can be reformulated as a condition on the maximum potential loss

ℓ_n < γ ε_n q_M (p_M - p)^2 / (2 p_M [π_S + (1 - π_S) π_M (δ + 1)])   (32)

In subcase (b), where γ - 1 > δ, the first derivative takes opposite signs at either end of the interval. Since it is a continuous function, we are sure that it crosses the x-axis for some value of ℓ. But it is not monotone, so that there might be more than one legal solution. This calls for a look at the second derivative (24), and we find that the situation is reversed with respect to Equation (30) of case (a), namely

(33)

According to (33), the first derivative starts at -π_S, first decreases, and then grows without interruption, as depicted in Figure 3.

Figure 3: Trend of the first derivative in subcase (b)

This warrants a single legal solution. For it to be a single feasible solution as well, we must evaluate the first derivative at ℓ = ℓ_n: we have a single feasible solution iff it is positive. This turns out to be the same condition as (32), which is then a sufficient condition for a single feasible solution both in subcases (a) and (b), i.e., when γ > 1.

Finally, when γ = 1, the decision equation (20) reduces to

A - B (ℓ/ℓ_n)^δ = π_S   (34)

which leads to the unique legal solution

ℓ* = ℓ_n ((A - π_S)/B)^(1/δ)   (35)

if

A > π_S   (36)

This solution is also feasible if ℓ* ≤ ℓ_n, i.e.

A - π_S ≤ B   (37)

By introducing the definitions (21) in the two previous conditions, we find that we have a unique legal solution of the decision equation (20) iff

π_S < ε_n q_M (p_M - p)^2 / (2 p_M ℓ_n) ≤ π_S + (1 - π_S) π_M (δ + 1)   (38)

The useful range obtained for the maximum loss is always nonzero since B > 0.

5 Sensitivity of customer’s feasible solution

In Section 4 we have shown that the trade-off problem has a unique solution under mild conditions. The solution is represented by the maximum loss that the customer is willing to suffer to maximize its surplus. It is interesting to show the shape of the solution in a typical scenario such as that illustrated in Table 2. In this section we examine the relationship between the optimal potential loss and the parameters appearing in the decision equation. We first consider the price imposed by the service provider as the single driving factor, and then analyze the sensitivity to all the other driving factors. Throughout this section, we consider the case γ < 1.

We have stated in Section 4 that the customer acts as a price taker. The price is then a relevant leverage that the service provider may use to drive the behaviour of the customer and eventually raise its tolerance towards taking more potential risks (i.e., increasing the maximum loss it tolerates). In order to examine the effect of price on the risk-taking attitude of the customer, we plot in Figure 4 the optimal potential loss for the customer when the other quantities take the values in Table 2.

Parameter                                   Value
Maximum quantity of service                 250
Willingness-to-pay                          1
Privacy parameter                           0.138647
Security parameter                          0.138647
Maximum loss                                5000, 10000
Marginal demand factor                      20%
System data breach probability
Maximum customer data breach probability

Table 2: Values of parameters for the case study

Figure 4: Impact of the unit price on the optimal amount of information released

Aside from the initial flat portion (due to the limitation on the maximum loss), the optimal potential loss is a decreasing function of the price. We see that the maximum loss has no effect on the curve, except for the cap that determines the initial flat portion. Hence, we can conclude that increasing the price makes the customer more careful in taking security-related risks. If the service provider wishes to relax the customer’s attitude towards taking risks (in order to gather more personal information), it should then lower the price, though it is not necessary to go below the threshold corresponding to the saturation value (i.e., the maximum loss, equal to either 5000 or 10000 in Figure 4).

On the other hand, the service provider could use the price for reasons other than driving the customer’s behaviour towards taking risks, for example to maximize its profit. Though it can derive profits from exploiting the customer’s personal information, we can consider just the profits deriving from the customer’s payments to gain some insight. Actually, for the same data of Table 2, we can see in Figure 5 that there is an optimal price maximizing the revenues. This optimal price is roughly 0.5 in this case, slightly larger than the maximum price needed to get all the personal information from the customer (which is roughly 0.4 in this case). Hence, by setting the price in this range, the service provider may strike a balance (and achieve the overall maximum profit) between making a profit by exploiting the customer’s personal information and extracting revenues from the customer’s payments.

Figure 5: Impact of the unit price on the revenues collected by the service provider

In order to analyse the impact that each factor in the decision equation has on the trade-off solution, we impose a change on each of the driving factors separately (only one at a time) and observe the resulting change in the optimal potential loss. Among driving factors undergoing the same relative change, the one producing the largest change in the optimal potential loss can be considered the most relevant. We take the scenario described in Table 3 as a reference. The optimal potential loss in that case is .

Parameter Value
Maximum quantity of service 250
Willingness-to-pay 1
Price 0.5
Privacy parameter 0.138647
Security parameter 0.138647
Maximum loss 10000
Marginal demand factor 20%
System Data Breach Probability
Maximum Customer Data Breach Probability
Table 3: Reference values for the sensitivity analysis

In order to perform the sensitivity analysis, we divide the parameters into two groups: the first group is composed of the parameters possessing a unit of measure, namely the maximum quantity of service, the willingness-to-pay, the price, and the maximum loss; the second group is composed of the dimensionless variables, namely the security parameter, the privacy parameter, and the two data breach probabilities. For the two groups we adopt two different sensitivity measures: the elasticity for the first group and the quasi-elasticity for the second.

We start with the parameters possessing a unit of measure. For them we define a discrete version of the elasticity. The elasticity is the ratio of the percent change in the output variable to the percent change in the driving factor, for small changes imposed on the driving factor. In our case the output variable is the optimal potential loss, but we consider significant, rather than small, changes in the driving factor. With reference to a generic driving factor x (any of the four parameters possessing a unit of measure), the discrete elasticity is

E_x = (ΔL*/L*) / (Δx/x)    (39)

where ΔL* and Δx are respectively the discrete changes of the optimal potential loss and of the driving factor. Here we impose in turn the same relative change on each of the driving factors, observe the resulting changes in the optimal potential loss, and compute the elasticity through Definition (39). The driving factors exhibiting larger values of elasticity have a greater influence on the optimal potential loss, and hence on the customer's attitude towards divulging personal data. In order to compare the driving factors at hand, we draw a Tornado chart. The Tornado chart is a special type of bar chart, widely used to perform sensitivity analyses [16]; in our case the bars represent the values of the elasticity, and we arrange the driving factors vertically, ordered so that the largest bar appears at the top of the chart, the second largest appears second from the top, and so on. The Tornado chart for the maximum quantity of service, the willingness-to-pay, and the maximum loss is shown in Figure 6, where the darker bars show the elasticity for positive increments of the driving factor, and the lighter bars for negative ones. We see that there are very large differences in the impact of each quantity. The elasticities with respect to the willingness-to-pay and to the maximum quantity of service are both positive; hence, when either the willingness-to-pay or the maximum quantity of service increases/decreases, the optimal potential loss increases/decreases as well. Since the corresponding elasticity is close to one, changes in the maximum quantity of service translate into roughly the same percentage change in the optimal potential loss. Instead, the effect of changes in the willingness-to-pay is nearly trebled, and the effect of price changes is nearly doubled. At the other end of the scale, the maximum loss has a negligible impact on the optimal potential loss (as long as we do not enter the saturation region, i.e., the flat portion of Figure 4; on that flat portion each change of the maximum loss is reflected in an equal change of the optimal potential loss, i.e., the elasticity is unitary).
The elasticities with respect to the price and to the maximum loss are both negative: if we increase either the maximum loss or the price, the optimal potential loss decreases.
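The discrete elasticity of Definition (39), and the ordering of factors in a Tornado chart, can be sketched as follows. The response function and the elasticity values below are hypothetical stand-ins (chosen only to mirror the rough magnitudes discussed above), not the paper's model.

```python
def discrete_elasticity(model, x0, rel_step):
    """Discrete elasticity (Definition (39)): ratio of the percent change
    in the output to the percent (relative) change in the driving factor."""
    l0, l1 = model(x0), model(x0 * (1.0 + rel_step))
    return ((l1 - l0) / l0) / rel_step

# Hypothetical power-law response L*(x) = c * x**k: the discrete
# elasticity tends to the exponent k as the step shrinks.
model = lambda x: 3.0 * x ** 2
e = discrete_elasticity(model, 10.0, 0.01)

# Tornado-chart ordering: rank factors by decreasing |elasticity|
# (illustrative values echoing the magnitudes discussed in the text).
elasticities = {"willingness-to-pay": 3.0, "price": -2.0,
                "max quantity": 1.0, "max loss": -0.05}
ranked = sorted(elasticities, key=lambda f: -abs(elasticities[f]))
```

The sign of the elasticity is kept when ranking is done on its absolute value, so the chart can still distinguish positive from negative responses with differently shaded bars.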

Figure 6: Elasticity for the maximum quantity of service, the willingness-to-pay, and the maximum loss

We can then see that the maximum loss is by far the least relevant factor in the customer's decision, while a greater role is played by the price and by the quantity of service. We can observe that a decrease in the price or an increase in the maximum quantity of service are benefits that the customer appreciates readily. Since the maximum loss is associated with the worst case, and its evaluation is often influenced by attention-grabbing catastrophic statements or press releases, we can expect the effect of such statements on the actual customer's behaviour to be quite small. This conclusion aligns with the small impact of data breach notifications on identity thefts reported in [2]. The major effects are thus provided by immediate benefits (price and maximum quantity of service) rather than by prospective effects (the maximum loss that may occur in the future). In fact, it has been observed that customers tend to underestimate the present value of a future loss, e.g., by applying a hyperbolic discount rate, and to underinsure themselves against future risks [17]; such behaviour is well predicted by our model.

In addition to the degree of influence they exert on the customer's decision, the four quantities considered so far differ also in the capability of the service provider to act on them. In fact, the willingness-to-pay and the maximum quantity of service depend mainly on macro-economic and social factors (e.g., inflation and salaries, service penetration level, social use of the service), on which the service provider has little or no influence. On the other hand, the service provider has nearly total control over the price (excluding regulatory interventions and market pressure). The price is therefore an easy lever in the hands of the service provider. The service provider may exert a relevant influence on the maximum potential loss as well, for example by determining the level of anonymization applied to the personal data so as to mitigate the losses in the case of a data breach.

For the dimensionless driving factors (i.e., the security parameter, the privacy parameter, and the two data breach probabilities) we cannot use the elasticity. In fact, the popular interpretation of elasticity is that it represents the percentage change in the output variable upon a 1% change in the driving factor, and the major reason why the elasticity is preferred to the derivative is that it is invariant to the arbitrary units of measurement of both variables. In the present case, however, the driving factor is either a probability or a bounded parameter, and its scale is not arbitrary, since it ranges from 0 to 1. It is then usual to resort to the quasi-elasticity, where instead of two percentage changes we form the ratio of a percentage value and an absolute quantity (see, e.g., Section 2.1 of [18]). In our case, the quantity expressed as a percentage is the optimal potential loss. As for the elasticity, a positive sign of the quasi-elasticity means that the optimal potential loss changes in the same direction as the driving factor. For each dimensionless driving factor x, we define the following discrete version of the quasi-elasticity:

Ẽ_x = (ΔL*/L*) / Δx    (40)

We deal first with the quasi-elasticity with respect to the security and privacy parameters. For both, we examine what happens to the optimal potential loss when they take two values at either side of the reference value 0.138647, namely 0.1 and 0.2. We plot the resulting quasi-elasticities in Figure 7, where we note that the privacy parameter exerts by far the larger influence. The two quasi-elasticities also differ in sign. We can conclude that the customer appears to be more sensitive to the profiling activities conducted by the service provider (embodied by the privacy parameter) than to its own self-protection attitude (embodied by the security parameter). If the service provider is privacy-friendly, the customer is willing to share more data (since the optimal potential loss grows). A privacy-friendly attitude by the provider spurs a positive market reaction, and the opt-in approach (requiring the customer's express consent to be profiled or to receive information, merchandise, or messages from a marketer) is not a detrimental factor for the market [19, 20].
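The quasi-elasticity of Definition (40) can be computed along the same lines as the discrete elasticity, except that the change in the driving factor enters as an absolute rather than a relative quantity. The response function below is a hypothetical stand-in for the trade-off solution, used only to illustrate the computation and the sign convention.

```python
import math

def discrete_quasi_elasticity(model, x0, dx):
    """Discrete quasi-elasticity (Definition (40)): percent change of the
    output over an absolute change of a bounded, dimensionless factor."""
    l0, l1 = model(x0), model(x0 + dx)
    return ((l1 - l0) / l0) / dx

# Hypothetical response to a parameter in [0, 1]: L*(a) = C * exp(-5a).
# The quasi-elasticity is negative: the optimal loss shrinks as a grows.
model = lambda a: 1000.0 * math.exp(-5.0 * a)
qe = discrete_quasi_elasticity(model, 0.1, 0.1)
```

Note that the constant C cancels out: like the elasticity, the quasi-elasticity is insensitive to the overall scale of the output variable.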

Figure 7: Quasi-Elasticity for the security and privacy parameters

We plot the Tornado chart for the quasi-elasticity with respect to the two data breach probabilities in Figure 8. For both probabilities the considered range is two octaves wide; the very large values (in the order of thousands) reported for the quasi-elasticity in this case are a consequence of the very small values of the data breach probabilities, which form the denominator in the ratio employed in Definition (40). In both cases the quasi-elasticity is negative: an increase in either data breach probability leads the customer towards a more cautious behaviour. The influence of the two driving factors is quite comparable, though the influence of the customer's data breach probability is slightly larger.

Figure 8: Quasi-Elasticity for the data breach probabilities

6 The case of a perfectly secure provider

In Section 3 we have seen that both the customer and the provider have a role in determining the overall probability of a data breach. In the sample cases shown in Section 5 we have considered the case where both parties are vulnerable. However, we may expect the provider’s information system to be better protected than the individual customer against data thefts. It is then interesting to consider what happens when the data breach is entirely due to the customer’s vulnerability. In this section we consider the limit case of the perfectly secure service provider and observe the effect on the customer’s trade-off decision.

In order to examine the case of the perfectly secure provider, we go back to the decision equation (19) and set the service provider's data breach probability to zero. We get the new decision equation

(41)

which can be solved for the optimal potential loss

(42)

We expect the risky attitude of the customer to grow as an effect of the security assured by the provider. We can measure such an effect by forming the ratio of the optimal loss values obtained when the provider's data breach probability is zero and when it is positive, all the other parameters being equal. We may then define the Optimal Loss Ratio as follows

R = L*_secure / L*_unsecure    (43)

If the ratio is larger than one, the customer is willing to accept a larger potential loss as a consequence of the perfect security offered by the service provider. We consider the sample case defined in Table 3 to see how the risk-taking attitude of the customer changes in the perfect provider scenario. We plot the ratio in Figure 9. We see that the customer is actually ready to increase its potential loss, even doubling it, and that the optimal loss ratio increases with the service price. The discontinuity exhibited by the ratio in the picture is due to the optimal potential loss reaching its maximum value in the completely secure provider scenario while it is still growing in the unsecure provider case.
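As a numerical illustration of the Optimal Loss Ratio, the sketch below maximizes a hypothetical concave surplus (a stand-in for the paper's decision problem, not its actual decision equation) over a grid of candidate potential losses, once with a vulnerable provider and once with a perfectly secure one.

```python
import math

def optimal_loss(surplus, grid):
    """Numerically maximize a surplus function over a grid of candidate losses."""
    return max(grid, key=surplus)

def make_surplus(p_provider, p_customer):
    """Hypothetical customer surplus: a concave benefit of sharing more data
    minus the expected loss; a breach on either side exposes the data."""
    p_breach = p_provider + p_customer - p_provider * p_customer
    return lambda L: 50.0 * math.log1p(L) - p_breach * L

grid = [10.0 * i for i in range(1, 1001)]                # L in [10, 10000]
l_unsecure = optimal_loss(make_surplus(0.01, 0.01), grid)
l_secure = optimal_loss(make_surplus(0.0, 0.01), grid)   # perfectly secure provider
olr = l_secure / l_unsecure                              # Optimal Loss Ratio (43)
```

With these illustrative parameters the ratio comes out close to two, echoing the behaviour seen in Figure 9: zeroing the provider's breach probability roughly halves the overall breach probability, so the customer accepts a markedly larger potential loss.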

Figure 9: Increase in the optimal potential loss under complete provider’s security

In addition to observing how the optimal potential loss moves when the service provider increases the price, we can examine the sensitivity of the trade-off solution to each of the variables appearing in the decision equation (41), i.e., the driving factors. In the case of the unsecure provider, where we did not have a closed-form solution of the decision equation, we resorted to the numerically-evaluated discrete elasticity, defined for significant changes of the driving factor. Here, since we have the solution (42), we can use instead the elasticity

E_x = (x / L*) · (dL*/dx)    (44)

where x is again any of the driving factors. By differentiating expression (42) with respect to each of the driving factors and replacing the trade-off solution and its first derivative in the definition of elasticity, we find the following expressions for the elasticities:

(45)

We note first that all the elasticities depend on the difference between the security parameter and the privacy parameter; the elasticities with respect to the maximum quantity and the maximum loss depend on that difference only. The same difference also appears in the solution (42). Its presence means that there is a sort of counterbalance between the two parameters: if the customer has set its optimal amount of shared personal data (i.e., its proxy, the optimal loss) and the service provider gets looser in the treatment of personal data (the privacy parameter grows), the customer may take a counterbalancing move by increasing its self-protecting attitude, i.e., by increasing its security parameter so as to keep the difference constant. Instead, the elasticities with respect to the price imposed by the service provider and to the willingness-to-pay of the customer also depend on the price-to-willingness-to-pay ratio. We can then plot the elasticity as a function of the difference between the two parameters, with the price-to-willingness-to-pay ratio as a parameter. We obtain the curves shown in Figure 10 (where the ratio is indicated as PW). The elasticity with respect to the willingness-to-pay is always positive: easy spenders take larger risks than low spenders. On the other hand, the elasticity with respect to the price is always negative: an increase in price spurs a more restrained attitude towards divulging personal data. This observation confirms the conclusions reached in [21], where it was shown that customers aware of the use of their personal data are less willing to be profiled (profiling takes place through recording the purchasing history of the customer) and refuse to buy at high prices. A similar attitude towards being profiled is also present in [22], where the customer may refuse an offer by a service provider with the aim of getting a lower price.
We see that in all cases, as the difference grows (i.e., as we tend towards a combination of privacy-friendly provider and privacy-aware customer), the absolute value of the elasticity decays towards a limit value. In the limit the response is quite anelastic, so that the customer is relatively unwilling to increase its exposure to risk because of changes in the service offer by the service provider or in the maximum loss. In all cases we note that the ranking among the different elasticities observed for the unsecure provider in Section 5 is preserved.

Similarly, for the parameters that do not possess a unit of measure, we use the quasi-elasticity

Ẽ_x = (1 / L*) · (dL*/dx)    (46)

For the privacy and the security parameters and the customer’s data breach probability we have respectively

(47)

We note that none of these quasi-elasticities is bounded. The quasi-elasticities with respect to the security and the privacy parameters may take both positive and negative values, while the quasi-elasticity with respect to the customer's data breach probability is always negative. The sign of the quasi-elasticity with respect to the privacy parameter depends on the optimal potential loss: if the privacy parameter is not very close to 1, the quasi-elasticity may be positive for nearly the whole range of prices; in the reference scenario of Table 3, it is positive if the optimal potential loss is larger than 7.13. An analogous threshold condition determines the sign of the quasi-elasticity with respect to the security parameter in the scenario of Table 3.

Figure 10: Elasticity with respect to the price and to the willingness-to-pay under complete provider's security

7 Conclusions

We have formulated the decision to be taken by the customer on the amount of personal information to release as a trade-off problem, where the customer aims at maximizing its surplus, represented by the algebraic sum of benefits and disadvantages. The benefit for the customer is the reduction of price (or, equivalently, the increase in the quantity of service obtained) achieved in return for the release of personal information, but the customer may suffer an economic loss due to the fraudulent use of its personal data. We have proved that the solution to the trade-off problem exists and is unique, providing the optimal potential loss associated with the amount of personal information the customer finds convenient to release. We have also performed a sensitivity analysis to identify the most influential driving factors, i.e., the parameters that have the largest influence on the trade-off solution. The major effects are provided by immediate benefits (price and maximum quantity of service) rather than by prospective negative effects (the maximum loss that may occur in the future). A major role is also played by the so-called privacy parameter, by which the service provider regulates the benefits released to the customers. For the special case of a perfectly secure provider (where a data breach may occur just on the customer's side) we have provided a closed-form solution of the decision equation, and analytical expressions for the elasticity (or quasi-elasticity) with respect to all the variables involved. The results obtained for the perfectly secure provider are quite close to those obtained for the general case. The price and the willingness-to-pay play a major role in the customer's decision in any case. Easy spenders take larger risks than low spenders, but in general an increase in price spurs a more restrained attitude towards divulging personal data.
The results may be used by the customer to determine its behaviour in response to the information disclosure requests coming from its service provider.

References

  • [1] G.R. Milne and M.E. Gordon. Direct mail privacy-efficiency trade-offs within an implied social contract framework. Journal of Public Policy & Marketing, 12(2):206–215, 1993.
  • [2] S. Romanosky, R. Sharp, and A. Acquisti. Data breaches and identity theft: When is mandatory disclosure optimal? In The Ninth Workshop on the Economics of Information Security (WEIS), Harvard University, USA, 7-8 June 2010.
  • [3] The Ponemon Institute. 2009 Annual Study: Cost of a Data Breach. Technical report, January 2010.
  • [4] London Economics. Study on the economic benefits of privacy-enhancing technologies (PETs). Final Report to the European Commission DG Justice, Freedom and Security. Technical report, July 2010.
  • [5] Huseyin Cavusoglu, Birendra Mishra, and Srinivasan Raghunathan. A model for evaluating IT security investments. Commun. ACM, 47:87–92, July 2004.
  • [6] Lawrence A. Gordon and Martin P. Loeb. The economics of information security investment. ACM Trans. Inf. Syst. Secur., 5(4):438–457, 2002.
  • [7] Balachander Krishnamurthy. I know what you will do next summer. ACM SIGCOMM Computer Communication Review, 40(5):65–70, October 2010.
  • [8] N.G. Mankiw. Principles of Microeconomics. South-Western College Pub, 3rd edition, 2003.
  • [9] Hal Varian. Economic aspects of personal privacy. In William H. Lehr and Lorenzo Maria Pupillo, editors, Internet Policy and Economics, pages 101–110. Springer, 2009.
  • [10] M. Newman. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46:323–351, September 2005.
  • [11] D.C. Roberts and D.C. Turcotte. Fractality and self-organized criticality of wars. Fractals, 6(4):351–357, 1998.
  • [12] Ye-Sho Chen, P. Pete Chong, and Bin Zhang. Cyber security management and e-government. Electronic Government, 1(3):316–327, 2004.
  • [13] L. Chung. Dealing with security requirements during the development of information systems. In 5th International Conference on Advanced Information Systems Engineering (CAiSE 93), pages 234–251, Paris, France, June 1993.
  • [14] Christopher Riederer, Vijay Erramilli, Augustin Chaintreau, Pablo Rodriguez, and Balachander Krishnamurthy. For Sale: Your Data. By: You. In Tenth ACM Workshop on Hot Topics in Networks (HotNets-X), Cambridge (MA), 14-15 November 2011.
  • [15] Boris Gnedenko and Igor Ushakov. Probabilistic Reliability Engineering. John Wiley & Sons, 1995.
  • [16] Ted G. Eschenbach. Spiderplots versus tornado diagrams for sensitivity analysis. Interfaces, 22(6):40–46, Nov. - Dec. 1992.
  • [17] Alessandro Acquisti. Privacy (in Italian). Rivista di Politica Economica, pages 319–368, May-June 2005.
  • [18] J.S. Cramer. Logit Models from Economics and Other Fields. Cambridge University Press, 2003.
  • [19] Article 29 Data Protection Working Party. Opinion 2/2010 on online behavioural advertising. European Commission, Directorate-General Justice, Working Paper WP-171, 22 June 2010.
  • [20] Federal Trade Commission. Protecting Consumer Privacy in an Era of Rapid Change. Technical report, FTC, December 2010.
  • [21] Curtis R. Taylor. Consumer privacy and the market for customer information. The RAND Journal of Economics, 35(4):631–650, 2004.
  • [22] Maurizio Naldi and Andrea Pacifici. Optimal sequence of free traffic offers in mixed fee-consumption pricing packages. Decision Support Systems, 50(1):281–291, 2010.