Privacy-Preserving Methods for Sharing Financial Risk Exposures
We thank Arnout Eikeboom, Martin Hirt,
Bob Merton, and Ron Rivest for helpful comments and discussion. The
views and opinions expressed in this article are those of the authors only,
and do not necessarily represent the views and opinions of AlphaSimplex,
EPFL, MIT, any of their affiliates or
employees, or any of the individuals acknowledged above.
Disclosures: In addition to his academic affiliation, A. Lo is also Chief Investment
Strategist of AlphaSimplex Group, LLC, a consultant to the Office of
Financial Research, a research associate of the National Bureau of
Economic Research, a member of FINRA’s Economic Advisory Committee, the
New York Fed’s Financial Advisory Roundtable, Moody’s Academic Advisory
and Research Committee, and Beth Israel Deaconess Medical Center’s Board
of Overseers. In addition to his academic affiliation, A. Khandani is also an associate
at Morgan Stanley.
Unlike other industries in which intellectual property is patentable, the financial industry relies on trade secrecy to protect its business processes and methods, which can obscure critical financial risk exposures from regulators and the public. We develop methods for sharing and aggregating such risk exposures that protect the privacy of all parties involved, without the need for a trusted third party. Our approach employs secure multiparty computation techniques from cryptography in which multiple parties are able to compute joint functions without revealing their individual inputs. In our framework, individual financial institutions evaluate a protocol on their proprietary data which cannot be inverted, leading to secure computations of real-valued statistics such as concentration indexes, pairwise correlations, and other single- and multi-point statistics. The proposed protocols are computationally tractable on realistic sample sizes. Potential financial applications include: the construction of privacy-preserving real-time indexes of bank capital and leverage ratios; the monitoring of delegated portfolio investments; financial audits; and the publication of new indexes of proprietary trading strategies.
Introduction
While there is still considerable controversy over the root causes of the Financial Crisis of 2007–2009, there is little dispute that regulators, policymakers, and the financial industry did not have ready access to information with which early warning signals could have been generated. For example, prior to the Dodd-Frank Act of 2010, even systemically important financial institutions such as AIG and Lehman Brothers were not obligated to report their amount of financial leverage, asset illiquidity, counterparty risk exposures, market share, and other critical risk data to any regulatory agency. If aggregated over the entire financial industry, such data could have played a critical role in providing regulators and investors with advance notice of AIG’s unusually concentrated position in credit default swaps, as well as the exposure of money market funds to Lehman bonds. Of course, such information is currently considered proprietary and highly confidential, and releasing it into the public domain would clearly disadvantage certain companies and benefit their competitors. But without this information, regulators and investors cannot react in a timely and measured fashion to growing threats to financial stability, thereby assuring their realization.
At the heart of this vexing challenge is privacy. Unlike other industries in which intellectual property is protected by patents, the financial industry consists primarily of “business processes” that the U.S. Patent Office deems unpatentable, at least until recently [1]. Therefore, trade secrecy has become the preferred method by which financial institutions protect the vast majority of their intellectual property, hence their need to limit disclosure of their business processes, methods, and data. Forcing a financial institution to publicly disclose its proprietary information, and without the quid pro quo of 17-year exclusivity that a patent affords, will obviously discourage innovation, which benefits no one. Accordingly, government policy has trodden carefully on the financial industry’s disclosure requirements.
In this paper, we propose a new approach to financial systemic risk management and monitoring via cryptographic computational methods in which the two seemingly irreconcilable objectives of protecting trade secrets and providing the public with systemic risk transparency can be achieved simultaneously. To accomplish these goals, we develop self-regulated protocols for securely computing aggregate risk measures. The protocols are constructed using secure multiparty computation tools [2, 3, 4, 5, 6, 7], specifically using secret sharing [8]. It is known from [6, 2] that general Boolean functions can be securely computed using “circuit evaluation protocols”. Since computing any function on real-valued data is approximated arbitrarily well by computing a function on quantized (or binary) data, such an approach can theoretically be used. However, for arbitrary functions and high precision, the resulting protocols may be computationally too demanding and therefore impractical. We show in this paper that for computing aggregate risk measures based on standard sample moments such as means, variances, and covariances (the typical inputs for financial risk measures), simple and efficient protocols can be achieved using secret sharing over large finite fields or directly over the reals.
With the resulting measures, it is possible to compute the aggregate risk exposures of a group of financial institutions (for example, a concentration (or “Herfindahl”) index of the credit default swaps market, the aggregate leverage of the hedge-fund industry, or the margin-to-equity ratio of all futures brokers) without jeopardizing the privacy of any individual institution. More importantly, these measures will enable regulators and the public to accurately measure and monitor the amount of risk in the financial system while preserving the intellectual property and privacy of individual financial institutions.
Privacy-preserving risk measures may also facilitate the ability of the financial industry to regulate itself more effectively. Despite the long history of “self-regulatory organizations” (SROs) in financial services, the efficacy of self-regulation has been sorely tested by the recent financial crisis. However, SROs may be considerably more effective if they had access to timely and accurate information about systemic risk that did not place any single stakeholder at a competitive disadvantage. Also, the broad dissemination of privacy-preserving systemic risk measures will enable the public to respond appropriately as well, reducing general risk-taking activity as the threat of losses looms larger due to increasing systemic exposures. Truly sustainable financial stability is more likely to be achieved by such self-correcting feedback loops than by any set of regulatory measures.
Secure Protocols
Many important statistical measures, such as means, standard deviations, concentration ratios, and pairwise correlations, can be obtained by taking summations and inner products on the data. Therefore, we present secure protocols for these two specific functions.
We start with a basic protocol to securely compute the sum of $m$ secret numbers. This protocol results from an application of secret sharing [8] and basic probability results. We assume that each number belongs to a known range, which we pick to be $[0,1]$ for simplicity. Recall that the operation $a$ modulo $m$ (written $a \bmod m$) produces the unique number $a + km \in [0, m)$ where $k$ is an integer, e.g., $3.6 \bmod 3 = 0.6$.
SecureSum Protocol
For $i = 1, \dots, m$, party $i$ possesses the secret number $x_i \in [0,1]$ as an input, and the output to each party is $\sum_{i=1}^{m} x_i$ (where the addition is over the reals).
The protocol is as follows:

1. Each pair of parties privately exchanges random numbers. Namely, for all $i, j$ with $i \neq j$, party $i$ provides to party $j$ a random number $R_{ij}$ drawn uniformly at random in $[0, m)$.

2. For each $i$, party $i$ adds to its secret number the random numbers it has received from other parties and subtracts the random numbers it has provided to other parties. More formally, party $i$ computes $S_i = x_i + \sum_{j \neq i} R_{ji} - \sum_{j \neq i} R_{ij} \pmod{m}$. Each party publicly reveals $S_i$.

3. Each party computes $\sum_{i=1}^{m} S_i \pmod{m}$, which equals $\sum_{i=1}^{m} x_i$.
Numerical example. Let $m = 3$ (i.e., three parties), $x_1 = 0.1$, $x_2 = 0.2$, and $x_3 = 0.3$. In the first round of the protocol, the parties exchange random numbers $R_{ij} \in [0, 3)$. For example,

| | Party 1 | Party 2 | Party 3 |
|---|---|---|---|
| Party 1 provides | | 1.4 | 2.1 |
| Party 2 provides | 1.1 | | 2.3 |
| Party 3 provides | 0.3 | 2.9 | |

In the second round, party $i$ adds to its secret number the elements of the $i$th column and subtracts the elements of the $i$th row (using modulo 3 arithmetic). Each party publishes the result $S_i$:

| $S_1$ | $S_2$ | $S_3$ |
|---|---|---|
| 1 | 1.1 | 1.5 |

Finally, the parties add these numbers (modulo 3) and compute the output sum: $S_1 + S_2 + S_3 = 3.6 \bmod 3 = 0.6$.
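The three rounds of this example can be replayed in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `secure_sum_round` and the matrix layout are illustrative assumptions, and the secrets 0.1, 0.2, 0.3 are the values consistent with the exchange table and the published results 1, 1.1, 1.5.

```python
from typing import List, Tuple


def secure_sum_round(
    secrets: List[float], exchanged: List[List[float]], modulus: float
) -> Tuple[List[float], float]:
    """Return the published values S_i and their sum modulo `modulus`.

    exchanged[i][j] is the random number party i gives to party j
    (diagonal entries are ignored). This layout is an assumption of the sketch.
    """
    m = len(secrets)
    published = []
    for i in range(m):
        received = sum(exchanged[j][i] for j in range(m) if j != i)  # column i
        provided = sum(exchanged[i][j] for j in range(m) if j != i)  # row i
        published.append((secrets[i] + received - provided) % modulus)
    return published, sum(published) % modulus


# Exchange table from the numerical example (row = provider, column = receiver).
EXCHANGED = [
    [0.0, 1.4, 2.1],
    [1.1, 0.0, 2.3],
    [0.3, 2.9, 0.0],
]
published, total = secure_sum_round([0.1, 0.2, 0.3], EXCHANGED, 3)
print([round(s, 6) for s in published], round(total, 6))
```

Up to floating-point rounding, the published values are 1, 1.1, and 1.5 and their sum modulo 3 is 0.6, matching the example.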
Protocol correctness and secrecy. If the parties follow the protocol correctly, it is easy to check that the correct sum is always obtained, since each element $R_{ij}$ is added and subtracted once in $\sum_i S_i$. In addition, we show that this protocol reveals nothing about the secret numbers other than their sum, even if the parties attempt to infer more from the exchanged data. For example, Party 1 may try to learn more about other parties’ secret numbers by using the information gathered in $\{R_{12}, R_{13}, R_{21}, R_{31}, S_2, S_3\}$. We state the secrecy guarantee informally in the following theorem and provide a formal statement and proof in the appendix. We first illustrate a weaker fact here by plotting the values of $(S_1, S_2, S_3)$ for several realizations of the random numbers $R_{ij}$, while keeping $x_1$, $x_2$, and $x_3$ fixed. As shown in Figure 1, the realizations of $(S_1, S_2, S_3)$ uniformly cover the set of points for which $S_1 + S_2 + S_3 = x_1 + x_2 + x_3 \pmod{3}$, suggesting that there is no relevant information in the $S_i$’s other than their sum.
The following is obtained assuming that parties follow the protocol requirements without deviating from them.
Theorem 1.
The SecureSum protocol outputs the sum of privately owned real numbers and does not reveal any additional information about the individual numbers.
This theorem follows directly from secret sharing [8] and basic probability results. For convenience, we provide a proof in the Appendix.
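The plotting experiment described before the theorem can be imitated numerically. The sketch below is only an empirical illustration under stated assumptions (three parties, secrets 0.1, 0.2, 0.3, a seeded pseudo-random generator, and hypothetical helper names); it is not a substitute for the proof.

```python
import random


def published_values(secrets, modulus, rng):
    """One run of steps 1-2: draw the exchanged numbers, return the S_i."""
    m = len(secrets)
    # r[i][j] is the number party i gives party j; the diagonal is zero.
    r = [[rng.uniform(0, modulus) if i != j else 0.0 for j in range(m)]
         for i in range(m)]
    return [
        (secrets[i]
         + sum(r[j][i] for j in range(m))
         - sum(r[i][j] for j in range(m))) % modulus
        for i in range(m)
    ]


def distance_mod(a, b, modulus):
    """Distance between a and b on the circle of circumference `modulus`."""
    d = (a - b) % modulus
    return min(d, modulus - d)


rng = random.Random(0)  # seeded only so the sketch is reproducible
secrets = [0.1, 0.2, 0.3]
draws = [published_values(secrets, 3, rng) for _ in range(20000)]

# Every realization satisfies S_1 + S_2 + S_3 = 0.6 (mod 3) ...
on_plane = all(distance_mod(sum(s), 0.6, 3) < 1e-9 for s in draws)
# ... while S_1 alone looks uniform on [0, 3): about a third falls in [0, 1).
share_low = sum(1 for s in draws if s[0] < 1.0) / len(draws)
print(on_plane, round(share_low, 2))
```

The first check confirms that only the sum is pinned down; the second is a crude uniformity check on a single published value.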
SecureInnerProduct Protocol
To compute securely the inner product of two real vectors, slightly more sophisticated protocols are developed and presented in the appendix, using basic secret sharing [8], secret sharing as employed in [7, 3, 4], and Oblivious Transfer [9, 10]. The variants include information-theoretic and cryptographic protocols on quantized or real data, and have different attributes discussed in the appendix. We state here an informal result regarding one of these protocols, which we call SecureInnerProduct protocol 1.
Theorem 2.
The SecureInnerProduct protocol 1 outputs the inner product of two privately owned quantized vectors and does not reveal any additional information about the individual vectors.
Note that the previous two theorems hold provided that the parties follow the protocol requirements (without colluding or cheating). Extensions to malicious parties or other types of functions can also be developed but are not discussed here.
Illustrative Example
To illustrate the practical implementation of privacy-preserving measures, we provide a simple numerical example using publicly available quarterly data from June 1986 to December 2010 (released in arrears by the U.S. Federal Reserve) on the total amount of outstanding loans linked to real estate issued by three major bank holding companies: Bank of America, JPMorgan, and Wells Fargo [11]. Suppose that the aggregate value of these loans across the three banks is the risk exposure of interest, and the magnitude of outstanding loans for each bank is the proprietary data to be kept private. The historical time series of these data are displayed in Figure 2; the bar graph in blue is the aggregate risk exposure to be computed and the three line graphs are the proprietary inputs.
The desired result can be obtained with an application of the SecureSum protocol described above [12], which consists of two steps. In the first step, each institution produces two random numbers to be shared, one for each of the other two participating institutions. These numbers are shown in line graphs of Figure 2 where the color coding indicates the institution generating the random numbers. Since these numbers are purely random, there is no relationship between them and the private data of Figure 2, a fact that is clear from visual inspection of the intermediate outputs in Figure 2.
In the second step of the SecureSum protocol, each institution uses its private data, the two numbers it receives from the other two participating banks, as well as the two numbers it sends to the other two institutions to produce a single value, which we refer to as the privacy-preserving measure of its private data. This value will be revealed to the other two institutions. While these privacy-preserving measures, shown in Figure 2, look like pure noise, they retain just enough of the original data so that the sum of these three numbers under modulo arithmetic yields the correct sum of the original inputs. The key here is that the randomness produced in the first step, as shown in Figure 2, exactly cancels in the second step due to the way that the protocol is constructed. It is apparent that the aggregate loans outstanding in Figure 2 is identical to the corresponding graph in Figure 2, but the former graph has been computed using only the privacy-preserving measures of Figure 2.
Despite the fact that the underlying data used in this example are not confidential, even in this simple illustrative case privacy-preserving measures may still prove useful in providing financial institutions and regulators with an incentive to release the data without a lag. More timely releases would obviously benefit all stakeholders by allowing them to respond more nimbly to changing market conditions, but such releases could also disadvantage certain parties in favor of others if privacy were not assured. Moreover, this example underscores the simplicity with which more sensitive data such as leverage ratios, positions in illiquid assets, and off-balance-sheet derivatives holdings can be shared regularly, securely, and in a timely fashion.
We consider only three institutions in this example because it is the simplest non-trivial case in which privacy-preserving measures of aggregate sums can be constructed. Clearly, the protocol is applicable for any number of participants greater than two, and implementation for even several thousand participants is extremely fast. More complex risk exposures such as Herfindahl concentration indices require two applications of the SecureSum protocol, but the computational burdens are still quite modest. The SecureInnerProduct protocol can be used to construct multi-point statistical measures such as average correlations between changes in securities holdings or leverage across industry participants.
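One plausible reading of the "two applications" needed for a Herfindahl index is one SecureSum run on the exposures and one on their squares, since the index equals $\sum_i x_i^2 / (\sum_i x_i)^2$. The sketch below follows that reading; the exposures, the bound of 1000, and the function names are hypothetical and chosen only for illustration.

```python
import random


def secure_sum(secrets, upper_bound, rng):
    """SecureSum for inputs in [0, upper_bound]; arithmetic is modulo m * upper_bound."""
    m = len(secrets)
    modulus = m * upper_bound
    # r[i][j] is the random number party i gives party j; the diagonal is zero.
    r = [[rng.uniform(0, modulus) if i != j else 0.0 for j in range(m)]
         for i in range(m)]
    published = [
        (secrets[i]
         + sum(r[j][i] for j in range(m))
         - sum(r[i][j] for j in range(m))) % modulus
        for i in range(m)
    ]
    return sum(published) % modulus


def secure_herfindahl(exposures, upper_bound, rng):
    """Herfindahl index from two SecureSum runs (assumed decomposition)."""
    total = secure_sum(exposures, upper_bound, rng)
    # Each party squares its own exposure locally before the second run.
    total_sq = secure_sum([x * x for x in exposures], upper_bound ** 2, rng)
    return total_sq / total ** 2


rng = random.Random(1)               # seeded only for reproducibility
exposures = [120.0, 45.5, 310.25]    # hypothetical values, not real data
hhi = secure_herfindahl(exposures, 1000.0, rng)
direct = sum(x * x for x in exposures) / sum(exposures) ** 2
print(abs(hhi - direct) < 1e-9)
```

Only the two aggregates are ever reconstructed; the ratio is then computed from public quantities.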
Discussion
By construction, privacy-preserving measures of financial risk exposures cannot be “reverse-engineered” to yield information about the individual constituents. Accordingly, there is no guarantee that the individual inputs are truthful. In this respect, the potential for misreporting and fraud is no different for these measures than it is for current reporting obligations by financial institutions to their regulators, and existing mechanisms for ensuring compliance (random periodic examinations and severe criminal and civil penalties for misleading disclosures) must be applied here as well.
However, unlike traditional regulatory disclosures, privacy-preserving measures will provide their users with a strong incentive to be truthful because the mathematical guarantee of privacy eliminates the primary motivation for obfuscation. Since each institution’s proprietary information remains private even after disclosure, dishonesty yields no discernible benefits but could have tremendous reputational costs, and this asymmetric payoff provides significantly greater economic incentive for compliance. Moreover, accurate and timely measures of system-wide risk exposures can benefit the entire industry in allowing institutions and investors to engage in self-correcting behavior that can reduce the likelihood of systemic shocks. For example, if all stakeholders were able to monitor the aggregate amount of leverage in the financial system at all times, there is a greater chance that market participants would become more wary and less aggressive as they observe leverage rising beyond prudent levels.
A related issue is whether participation in privacy-preserving disclosures of financial risk exposures is voluntary or mandated by regulation. Given the extremely low cost/benefit ratio of such disclosures, there is reason to believe that the financial industry may well adopt such disclosures voluntarily. A case in point is Markit, a successful industry consortium of dealers of credit default swaps (CDS) that emerged in 2001 to pool confidential pricing data on individual CDS transactions and make the anonymized data available to each other and the public so as to promote transparency and liquidity in this market [13]. According to Markit’s website, the data of its consortium members are “provided on equal terms to whoever wanted to use it, with the same data released to all customers at the same time, giving both the sell-side and buy-side access to exactly the same daily valuation and risk management information”. From this carefully crafted statement, it is clear that equitable and easy access to data is of paramount importance in structuring this popular data-sharing consortium. Privacy-preserving methods of sharing information could greatly enhance the efficacy and popularity of such cooperatives.
The same motivation applies to the sharing of aggregate financial risk exposures, but with even greater stakes as the recent financial crisis has demonstrated. Once a privacy-preserving system-risk-exposures consortium is established, the benefits will so clearly dominate the nominal costs of participation that it should gain widespread acceptance and adoption in short order. Indeed, participation in such a consortium may serve as a visible commitment to industry best practices that yields tangible benefits for business development, leading to a “virtuous cycle” of privacy-preserving risk disclosure throughout the financial industry.
Conclusion
Privacy-preserving measures of financial risk exposures solve the challenge of measuring aggregate risk among multiple financial institutions without encroaching on the privacy of any individual institution. Previous approaches to addressing this challenge require trusted third parties, i.e., regulators, to collect, archive, and properly assess systemic risk. Apart from the burden this places on government oversight, such an approach is also highly inefficient, requiring properly targeted and perfectly timed regulatory intervention among an increasingly complex and dynamic financial system. Privacy-preserving measures can promote more efficient “crowdsourced” responses to emerging threats to systemic stability, enabling both regulators and market participants to accurately monitor systemic risks in a timely and coordinated fashion, creating a more responsive negative-feedback loop for stabilizing the financial system. This feature may be especially valuable for promoting international coordination among multiple regulatory jurisdictions. While a certain degree of regulatory competition is unavoidable given the competitive nature of sovereign governments, privacy-preserving measures do eliminate a significant political obstacle to regulatory collaboration across national boundaries.
Privacy-preserving risk measures have several other financial and non-financial applications. Investors such as endowments, foundations, pension and sovereign wealth funds can use these measures to ensure that their investments in various proprietary vehicles (hedge funds, private equity, and other private partnerships) are sufficiently diversified and not overly concentrated in a small number of risk factors. Financial auditors charged with the task of valuing illiquid assets at a given financial institution can use these measures to compare and contrast their valuations with the industry average and the dispersion of valuations across multiple institutions. Real-time indexes of the aggregate amount of hedging activity in systemically important markets like the S&P 500 futures contract may be constructed, which could have served as an early warning signal for the “Flash Crash” of May 6, 2010.
More broadly, privacy-preserving measures of risk exposures may be useful in other industries in which aggregate risks are created by individual institutions and where maintaining privacy in computing such risks is important for promoting transparency and innovation, such as healthcare, epidemiology, and agribusiness.
References and Notes
 [1] Lerner, J. Where does state street lead?: A first look at financial patents, 1971–2000. Journal of Finance 57, 901–930 (2002).
 [2] Goldreich, O., Micali, S. & Wigderson, A. How to play any mental game. In ACM Sympos. on Theory of Comput. (STOC), 218–229 (New York, NY, 1987).
 [3] Chaum, D., Crépeau, C. & Damgard, I. Multiparty unconditionally secure protocols. In Proceedings of the twentieth annual ACM symposium on Theory of computing, STOC ’88, 11–19 (1988).
 [4] Cramer, R., Damgard, I., Dziembowski, S., Hirt, M. & Rabin, T. Efficient multiparty computations with dishonest minority. In Proceedings of EuroCrypt, Springer Verlag LNCS series (1999).
 [5] Beaver, D., Micali, S. & Rogaway, P. The round complexity of secure protocols. In ACM Sympos. on Theory of Comput. (STOC), 503–513 (New York, NY, 1990).
 [6] Yao, A. C. Protocols for secure computations. In 23rd Annual Symposium on Foundations of Computer Science (FOCS), 160–164 (1982).
 [7] Ben-Or, M., Goldwasser, S. & Wigderson, A. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In ACM Sympos. on Theory of Comput. (STOC), 1–10 (New York, NY, 1988).
 [8] Shamir, A. How to share a secret. Communications of the ACM 22, 612–613 (1979).
 [9] Rabin, M. O. How to exchange secrets by oblivious transfer. In Technical Report TR81 (1981).
 [10] Even, S., Goldreich, O. & Lempel, A. A randomized protocol for signing contracts. In Communications of the ACM, vol. 28, 637–647 (1985).
 [11] We have used series BHCK1410 (Loans secured by real estate), which is disclosed by US Bank Holding Companies to the Federal Reserve via form FR Y-9C. For Bank of America we use RSSD ID 1026016 prior to 1998-09 and RSSD ID 1073757 after, for Wells Fargo we use RSSD ID 1027095 prior to 1998-12 and RSSD ID 1120754 after, and for JP Morgan we use RSSD ID 1039502.
 [12] The SecureSum protocol discussed previously assumes that inputs are in the range [0,1]. The protocol works equally well with any range; in this application we first expressed the raw data in billions of dollars and then assumed a range of [0,10000] for all inputs. This implies that all arithmetic is done modulo 30000 (3 × 10000).
 [13] This consortium, currently known as “Markit”, was originally called “MarkIt Partners”, and in a December 16, 2003 press release, the company described itself in the following way: “Founded early in 2001 with the support of contributing data partners ABN Amro, Bank of America, CitiGroup, CSFB, Deutsche Bank, Dresdner Kleinwort Wasserstein, Goldman Sachs, JP Morgan, Lehman Brothers, Merrill Lynch, Morgan Stanley, TD Securities, and UBS, MarkIt Partners currently receives daily credit default swap (CDS) pricing on over 3,700 issues and receives pricing on over 30,000 cash securities. These banks feed current and historical credit data into the Markit Partners system on a daily basis, facilitating better decision-making and credit risk management within banks’ credit operations.”
 [14] Rivest, R., Shamir, A. & Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21, 120–126 (1978).
 [15] Goldreich, O. Secure multiparty computation (working draft). Available from http://www.wisdom.weizmann.ac.il/home/oded/public html/ foc.html (1998).
 [16] Naor, M. & Pinkas, B. Efficient oblivious transfer protocols. In Proceedings of the SIAM Symposium on Discrete Algorithms (SODA) (Washington DC, 2001).
 [17] Cramer, R., Damgaard, I. & Nielsen, J. B. Multiparty computations from threshold homomorphic encryption. In Proceedings of 20th Annual IACR EUROCRYPT, vol. 2045, 280–300 (Springer Verlag LNCS, Innsbruck, Austria, 2001).
 [18] Franklin, M. & Haber, S. Joint encryption and message-efficient secure computation. Journal of Cryptology 9, 217–232 (1996).
 [19] Gentry, C. Fully homomorphic encryption using ideal lattices. In ACM Sympos. on Theory of Comput. (STOC), 169–178 (2009).
 [20] Brakerski, Z. & Vaikuntanathan, V. Efficient fully homomorphic encryption from (standard) LWE. In 23rd Annual Symposium on Foundations of Computer Science (FOCS) (2011).
 [21] Damgard, I., Groth, J. & Salomonsen, G. The theory and implementation of an electronic voting system. 77–100 (Kluwer Academic Publishers, 2002).
 [22] Naor, M., Pinkas, B. & Sumner, R. Privacy preserving auctions and mechanism design. In Proceedings of the 1st ACM conference on Electronic commerce (1999).
 [23] Lindell, Y. & Pinkas, B. Privacy preserving data mining. Lecture Notes in Computer Science 1880, 36–54 (2000).
 [24] Chaum, D. Blind signatures for untraceable payments. Lecture Notes in Computer Science 1880, 36–54 (2000).
 [25] Bogetoft, P. et al. Multiparty computation goes live. Cryptology ePrint Archive, Report 2008/068 (2008).
 [26] Chase, M., Lauter, K., Benaloh, J. & Horvitz, E. Patient-controlled encryption: patient privacy in electronic medical records. ACM Cloud Computing Security Workshop (2009).
Appendix
In this appendix, we provide formal theorems and proofs of the security guarantees ensured by the SecureSum and three SecureInnerProduct protocols, assuming semi-honest parties (possibly curious but following the protocol correctly). Extensions to malicious parties can be considered but are not discussed here.
SecureInnerProduct protocols 1 and 2 use a third dummy party to help with the computations while SecureInnerProduct protocol 3 does not. The dummy party does not possess inputs or receive meaningful information but simply helps with the computation (note that for the applications in mind, the use of a dummy party does not represent a significant obstacle). SecureInnerProduct protocols 1 and 3 are defined on quantized data, while SecureInnerProduct protocol 2 applies directly to real-valued data. Finally, SecureInnerProduct protocol 1 provides information-theoretic security, SecureInnerProduct protocol 2 provides ‘almost’ information-theoretic security (as defined in Theorem 5) and both protocols require only elementary operations at a computational level, while SecureInnerProduct protocol 3 provides cryptographic security (i.e., it relies on computational-hardness assumptions) and uses OT protocols (hence non-elementary operations such as RSA [14] encryptions and decryptions).
An important benchmark for the practical consideration of secure protocols is the number of communication rounds, which require exchange of data over communications media such as the internet. With a standard internet connection and for arbitrary distances this can take no longer than 2–3 seconds but may also dominate the protocol running time. All protocols proposed here require few communication rounds. The following table summarizes these properties, where denotes the vector dimension and the quantization level.
| Protocols | Security | Dummy party | Data | Rounds | Complexity |
|---|---|---|---|---|---|
| SecureSum | IT | no | real | 2 | elem. op. |
| SecureInnerProduct 1 | IT | yes | quantized | 3 | elem. op. |
| SecureInnerProduct 2 | almost IT | yes | real | 3 | elem. op. |
| SecureInnerProduct 3 | crypto | no | quantized | 3 | OT |
Sum Protocols and Theorems
For convenience, we restate the SecureSum protocol.
SecureSum Protocol.
Inputs: for $i = 1, \dots, m$, party $i$ possesses the secret number $x_i \in [0,1]$.
Output: each party obtains $\sum_{i=1}^{m} x_i$ (where the addition is over the reals).
Protocol:

1. Each pair of parties privately exchanges random numbers. Namely, for all $i, j$ with $i \neq j$, party $i$ provides to party $j$ a random number $R_{ij}$ drawn uniformly at random in $[0, m)$.

2. For each $i$, party $i$ adds to its secret number the random numbers it has received from other parties and subtracts the random numbers it has provided to other parties. In formula, party $i$ computes $S_i = x_i + \sum_{j \neq i} R_{ji} - \sum_{j \neq i} R_{ij} \pmod{m}$. Each party publicly reveals $S_i$.

3. Each party computes $\sum_{i=1}^{m} S_i \pmod{m}$, which equals $\sum_{i=1}^{m} x_i$.
One can define other variants and extensions of this protocol, in which fewer random numbers are exchanged to minimize information flow, or in which more information is exchanged to check the correctness of the parties’ computations (one may also use virtual parties for that).
Theorem 3.
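Let $x_1, \dots, x_m \in [0,1]$ be privately owned real numbers. Let $i \in \{1, \dots, m\}$ and let $V_i$ denote the view of party $i$ obtained from the SecureSum protocol with inputs $x_1, \dots, x_m$. The protocol outputs the sum $\sum_{j=1}^{m} x_j$, and the distribution of $V_i$ depends on $x_1, \dots, x_m$ only through $x_i$ and $\sum_{j=1}^{m} x_j$.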
We first give the proof argument for $m = 3$. Assume that party 1 collects all the data it possesses and has received from other parties to try to learn something about their secret numbers. That is, party 1 possesses its secret number $x_1$, the numbers $R_{12}, R_{13}, R_{21}, R_{31}$ exchanged in step 1, the numbers $S_1, S_2, S_3$ revealed in step 2, and the output sum (whose information is already contained in the $S_i$’s). From these, party 1 can subtract in $S_2$ and $S_3$ the terms depending on $R_{12}, R_{13}, R_{21}, R_{31}$ and obtain the right-hand side of

$$x_2 + R_{32} - R_{23} = S_2 - R_{12} + R_{21} \pmod{3}, \tag{1}$$

$$x_3 + R_{23} - R_{32} = S_3 - R_{13} + R_{31} \pmod{3}, \tag{2}$$

and this is all the information party 1 can gather about the other parties’ secret numbers. Adding these equations provides $x_2 + x_3$, i.e., what can be deduced from knowing the sum of the secret numbers. To see that nothing else can be inferred from (1) or (2), note that $R_{32} - R_{23}$ is uniform on $[0, 3)$ modulo 3. Moreover, for any fixed number $a$, if one adds to it a random number uniformly drawn in $[0, 3)$, the resulting number (modulo 3) is also uniformly drawn in $[0, 3)$. Therefore, (1) (or (2)) does not provide any further information about $x_2$ (or $x_3$).
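The key step in this argument, that adding a uniform random number modulo the range to a fixed number yields a uniform number whatever the fixed number is, can be checked exactly on a discretized range. The grid of 30 points (steps of 0.1 on $[0,3)$) and the function name are assumptions made so that the check uses exact integer arithmetic.

```python
from collections import Counter

GRID = 30  # [0, 3) discretized in steps of 0.1; an assumption made for exactness


def masked_distribution(secret_tenths: int) -> Counter:
    """Counts of (secret + R) mod 3 as R ranges over the uniform grid."""
    return Counter((secret_tenths + r) % GRID for r in range(GRID))


# Two different secrets (0.2 and 0.9, in tenths) give the same masked distribution.
same = masked_distribution(2) == masked_distribution(9)
print(same)  # True
```

Each grid point is hit exactly once for any secret, so the masked value carries no information about the secret.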
Proof of Theorem 3.
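All the arithmetic in this proof is modulo $m$. We first check that the protocol indeed computes the sum. We set $R_{ii} = 0$ for all $i$, to simplify notation. Correctness is straightforward since $S_i = x_i + \sum_{j} R_{ji} - \sum_{j} R_{ij}$ and hence $\sum_i S_i = \sum_i x_i$. Let $V_1$ be the protocol view of party 1, i.e., $V_1 = \left(x_1, \{R_{1j}\}_{j \neq 1}, \{R_{j1}\}_{j \neq 1}, \{S_j\}_{j}\right)$.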
Party 1 can subtract the ’s it has access to in the ’s, obtaining as a sufficient statistic for , where
and
Let us define and , where contains all the for which (in increasing order). Note that and are a random vectors of dimension respectively and . We then have that
where is the matrix whose th row is filled with 0’s except at columns where it is 1, and is a permutation matrix. Note that the rank of and the rank of is , implying that , where
Therefore, for any , there exists such that and
where the second equality uses the fact that and are both i.i.d. uniform over . This shows that is uniform over and is uniform over
Therefore, the distribution of , and hence of , depends only on and . By symmetry, the analogue conclusion holds for any parties, which concludes the proof of the theorem. ∎
InnerProduct Protocols and Theorems
We now present secure protocols to compute the sample correlation, or equivalently the inner product, between two real vectors. Recall that the sample correlation of two vectors and is given by
where , , , , and .
Definition 1.
We denote by the set , and by the same set equipped with the Galois field operations when is a power of a prime. We define by the sets of tuples in which add up to , i.e.,
We may call the ’s to be shares of .
SecureInnerProduct Protocol 1.
Common inputs: (the quantization level), (the vector dimensions) and a prime larger than .
Party 1 inputs: .
Party 2 inputs: .
Party 3 inputs: none.

For , party 1 splits in three shares , and uniformly drawn in and party 2 splits in three shares , and uniformly drawn in . Party provides privately to party the shares and privately to party the share . Party provides privately to party the shares and privately to party the share .

Party 1 sets and , party 2 sets and , and party 3 sets and . For , party splits in three shares and uniformly drawn in and reveals privately to party , for .

For , party computes . Parties 1 and 2 exchange and and party 3 provides to parties 1 and 2. Parties 1 and 2 compute .
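The structure of this protocol (share the entries, let each party compute a local sum of cross products, then run a secure sum) can be sketched in Python as follows. This is a single-process simulation under stated assumptions: the particular routing of shares between the three parties is one valid choice made for illustration and may differ from the routing in step 1, and the names `share3` and `secure_inner_product_3party` are hypothetical.

```python
import secrets

def share3(v, p):
    """Three additive shares of v modulo p."""
    a, b = secrets.randbelow(p), secrets.randbelow(p)
    return [a, b, (v - a - b) % p]

def secure_inner_product_3party(x, y, p):
    """Simulate a 3-party inner product modulo a prime p > <x, y>."""
    partial = [0, 0, 0]  # partial[k]: local sum held by party k+1
    for xi, yi in zip(x, y):
        x1, x2, x3 = share3(xi, p)  # drawn by party 1
        y1, y2, y3 = share3(yi, p)  # drawn by party 2
        # Assumed routing (illustrative): party 1 receives y1, party 2
        # receives x2, party 3 receives x1, x3, y2, y3. No party other
        # than the owner ever holds all three shares of an entry.
        partial[0] += (x1 + x2 + x3) * y1   # party 1
        partial[1] += x2 * (y2 + y3)        # party 2
        partial[2] += (x1 + x3) * (y2 + y3) # party 3
    # Secure sum: each party re-shares its partial sum, and party j
    # adds up the j-th shares it receives.
    reshared = [share3(a % p, p) for a in partial]
    s = [sum(reshared[k][j] for k in range(3)) % p for j in range(3)]
    # The three values s[0], s[1], s[2] are exchanged and added.
    return sum(s) % p

result = secure_inner_product_3party([1, 2, 3], [4, 5, 6], 101)
```

The nine cross products of the shares are partitioned among the three parties so that their local sums add up to the product of the two entries, which is what makes the final secure sum return the inner product.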
Theorem 4.
Let and be two privately owned vectors on . Let denote the view of party 1 obtained from the Secure-Inner-Product protocol 1 with inputs . The protocol outputs the inner product and the distribution of depends on only through and . The analogous result holds for party 2.
Proof of Theorem 4.
In what follows, all arithmetic is over . We first check that the protocol indeed computes the inner product. For every , , hence
Moreover, , hence
Let be the protocol view of party 1, which is a function of
where contains all components for and similarly for the . Note that for , are independent and uniformly drawn in , where . Moreover, steps 2 and 3 of the protocol are equivalent to running the secure-sum protocol on . Hence, from Theorem 3, for any realization of , the distribution of depends only on the sum and on , where depends only on and on which are independent and uniformly distributed over . Therefore, the distribution of , hence , depends only on and on . ∎
Secure-Inner-Product Protocol 2.
Common input: (the vector dimensions) and
Party 1 inputs: .
Party 2 inputs: .
Party 3 inputs: none.

For ,

party 1 splits in three shares by evaluating a random polynomial at , where and where is uniformly drawn in . Party 1 reveals to party for ,

party 2 splits in three shares , for where is uniformly drawn in , and reveals to party for .


For ,

party computes ,

party draws independently and uniformly at random in and for , sets and shares with party ,

is made available to parties 1 and 2.


Parties 1 and 2 compute by interpolating a degree-2 polynomial on , , obtaining .
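The polynomial mechanics of this protocol can be sketched in Python with exact rational arithmetic. The sketch below is a single-process simulation under several assumptions that are not taken from the protocol statement: the evaluation points 1, 2, 3, integer random coefficients drawn from a bounded range `bound`, and a re-randomization step that adds random polynomials with zero constant term. All function names are hypothetical.

```python
import random
from fractions import Fraction

ALPHAS = [Fraction(1), Fraction(2), Fraction(3)]  # assumed evaluation points
_rng = random.SystemRandom()

def poly_shares(secret, bound):
    """Evaluate the random line secret + a*z at each evaluation point."""
    a = _rng.randint(-bound, bound)  # assumed coefficient range
    return [Fraction(secret) + a * z for z in ALPHAS]

def interpolate_at_zero(values):
    """Lagrange-interpolate the degree <= 2 polynomial through
    (ALPHAS[k], values[k]) and return its value at 0."""
    total = Fraction(0)
    for k, zk in enumerate(ALPHAS):
        weight = Fraction(1)
        for j, zj in enumerate(ALPHAS):
            if j != k:
                weight *= (0 - zj) / (zk - zj)
        total += weight * values[k]
    return total

def secure_inner_product_poly(x, y, bound=10**6):
    c = [Fraction(0)] * 3  # c[k]: value held by party k+1
    for xi, yi in zip(x, y):
        xs = poly_shares(xi, bound)  # party 1 sends xs[k] to party k+1
        ys = poly_shares(yi, bound)  # party 2 sends ys[k] to party k+1
        # Products of shares are evaluations of a degree-2 polynomial
        # whose constant term is xi * yi.
        c = [ck + xk * yk for ck, xk, yk in zip(c, xs, ys)]
    # Assumed re-randomization: each party adds evaluations of a random
    # polynomial u*z + v*z^2, which vanishes at 0.
    for _ in range(3):
        u, v = _rng.randint(-bound, bound), _rng.randint(-bound, bound)
        c = [ck + u * z + v * z * z for ck, z in zip(c, ALPHAS)]
    return interpolate_at_zero(c)

value = secure_inner_product_poly([1, 2, 3], [4, 5, 6])
```

Three evaluations suffice because the product of two degree-1 polynomials has degree 2, and a degree-2 polynomial is determined by its values at three distinct points.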
Theorem 5.
Let and be two privately owned real vectors on , where is fixed. Let denote the view of party 1 obtained from the Secure-Inner-Product protocol 2 (over the reals) with inputs . The protocol outputs the inner product and the distribution of can be approximated arbitrarily closely (in total variation distance, as increases) by a distribution depending on only through and . The analogous result holds for party 2.
We omit the proof of this theorem to conserve space, since it falls outside the main scope of the paper. We refer to Theorem 4 for a proof for a Secure-Inner-Product protocol that can be used on real data via quantization.
We provide a third protocol that securely computes the inner-product function without a third dummy party, at the cost of ensuring only cryptographic security. This protocol uses the Oblivious Transfer (OT) protocol developed in [9, 10], an important building block for multiparty computation: in particular, it allows the parties to compute secret shares of the product of two bits and , which can then be used to evaluate more general circuits. The basic OT protocol allows a sender to transfer one of potentially many bits to a receiver; the sender remains oblivious as to which bit the receiver wants, and the receiver learns nothing about any bit other than the one requested. In other words, the functionality in the OT protocol takes the bits as inputs for the first party and the index for the second party, and produces as output nothing for the first party and the bit requested by the second party. Formally,
where denotes the no-information symbol. We now describe OT.
OT protocol
Sender inputs: and a private key .
Receiver inputs: and a public key .
Algorithm:

The sender generates two random numbers and transmits them to the receiver.

The receiver generates a random number , encrypts it with the public key and scrambles the outcome with to produce

The sender decrypts the two numbers and to get and respectively (i.e., it computes for ). Note that either or is equal to , but these are equally likely for the sender, and reciprocally, is not accessible to the receiver. The sender then transmits and .

The receiver finds .
The OT protocol for multiple sender bits is easily obtained by extending the previous protocol, and similarly, one can extend the protocol to non-binary fields.
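The four steps of the two-bit OT protocol above can be traced in the following Python sketch, which simulates both roles in one function. It assumes textbook RSA with a toy key (modulus 3233 = 61 × 53, public exponent 17, private exponent 2753); these parameters and the variable names are illustrative assumptions and provide no real security.

```python
import secrets

# Toy RSA key, far too small to be secure: N = 61 * 53 and
# E * D = 1 modulo both 60 and 52.
N, E, D = 3233, 17, 2753

def oblivious_transfer(b0, b1, choice):
    """Simulate a 1-out-of-2 OT; returns what the receiver learns."""
    # Step 1 (sender): two random numbers are sent to the receiver.
    x = [secrets.randbelow(N), secrets.randbelow(N)]
    # Step 2 (receiver): draw a random k, encrypt it with the public
    # key, and scramble the result with x[choice].
    k = secrets.randbelow(N)
    v = (x[choice] + pow(k, E, N)) % N
    # Step 3 (sender): decrypt both candidates. Exactly one equals k,
    # but the sender cannot tell which one.
    k0 = pow((v - x[0]) % N, D, N)
    k1 = pow((v - x[1]) % N, D, N)
    masked = [(b0 + k0) % N, (b1 + k1) % N]
    # Step 4 (receiver): remove the mask k from the chosen message.
    return (masked[choice] - k) % N

received = oblivious_transfer(0, 1, 1)
```

The receiver cannot unmask the other message without decrypting a value it did not encrypt, which is where the hardness assumption on RSA enters.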
We now present a cryptographic protocol for the inner product.
Secure-Inner-Product Protocol 3.
Common inputs: (the quantization level), (the vector dimensions).
Party 1 inputs: .
Party 2 inputs: .

For ,

party 1 picks uniformly at random in and reveals it to party 2, who picks uniformly at random in and reveals it to party 1.

party 1 picks uniformly at random in and sends
(all operations ) with OT to party 2 who picks the th element.

party 2 picks uniformly at random in and sends
(all operations ) with OT to party 1 who picks the th element.

party 1 computes and party computes . Note that these are shares of the product .


Party 1 computes and reveals it to party 2, who computes and reveals it to party 1.

Each party computes .
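The way OT yields additive shares of a product can be sketched in Python as follows. This is a simplified, one-directional variant written for illustration, not the exact message flow of the protocol above: the OT is replaced by an ideal stand-in function, entries are assumed to lie in {0, ..., m-1}, and the names `ideal_ot`, `product_shares`, and `secure_inner_product_ot` are hypothetical.

```python
import secrets

def ideal_ot(messages, index):
    """Stand-in for a 1-out-of-len(messages) OT: the receiver learns
    only messages[index], and the sender learns nothing about index."""
    return messages[index]

def product_shares(xi, yi, m, p):
    """Additive shares modulo p of xi * yi, for yi in {0, ..., m-1}."""
    r = secrets.randbelow(p)                      # party 1's random mask
    table = [(xi * j - r) % p for j in range(m)]  # party 1's OT inputs
    received = ideal_ot(table, yi)                # party 2 picks entry yi
    return r, received                            # r + received = xi * yi mod p

def secure_inner_product_ot(x, y, m, p):
    s1 = s2 = 0
    for xi, yi in zip(x, y):
        r, t = product_shares(xi, yi, m, p)
        s1 = (s1 + r) % p  # accumulated by party 1
        s2 = (s2 + t) % p  # accumulated by party 2
    # The parties reveal their accumulated shares to each other and add.
    return (s1 + s2) % p

ip = secure_inner_product_ot([1, 2, 3], [4, 5, 6], m=8, p=257)
```

Each table entry is masked by the same uniform value, so the single entry that party 2 obtains is uniformly distributed and reveals nothing about party 1's input on its own.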
From the protocol construction, we have the following result.
Lemma 1.
Secure-Inner-Product protocol 3 privately reduces the correlation computation to the OT protocol.
The notion of being “privately reducible” is formally defined in Section 2.2 of [15]. From the composition theorem for the semi-honest setting in Section 2.2 of [15], one obtains as a consequence of the previous lemma that Secure-Inner-Product protocol 3 privately computes the inner product provided that trapdoor one-way permutations exist. In particular, using RSA for the encryptions in OT, the protocol is secure provided that RSA cannot be broken.
This protocol requires OT protocols but only three communication rounds. This still means a possibly high number of public and private encryptions/decryptions (e.g., with RSA). One may use [16] to improve the running time of the OT protocols. Another approach consists in using a Boolean circuit for correlations as in Figure 3, using OT protocols to compute shares of the multiplication gates (and simply adding shares for the XOR gates). Such an approach, as developed in [2], or related approaches as in [6, 5], may be particularly useful for other functions such as the quantile function, which does not have the arithmetic structure of the summation or inner-product functions. In particular, [6, 5] provide protocols with a constant number of communication rounds, which may matter for practical considerations, although for real data problems, the practicality of such algorithms needs to be further investigated.
Related literature on MPCs
Theory
The problem of secure multiparty computation emerged with the work of Yao [6] in 1982, and with the work of Goldreich, Micali and Wigderson [2] in 1987. It is shown in [6] that two parties can compute any Boolean functionality without requiring an external trusted party, and [2] provides protocols for arbitrarily many parties. Since these papers, many have proposed variations of MPC settings, allowing different kinds of adversarial parties, security, and efficiency attributes. In particular, [5] introduces cryptographic protocols with bounded circuit depths (requiring finitely many communication rounds) and [7, 3, 4] develop information-theoretic protocols. Homomorphic encryption has also been shown to provide another approach to secure multiparty computations [17, 18], and more recently, Gentry [19] showed that fully homomorphic encryption schemes can be constructed, allowing addition and multiplication to be performed on encrypted data without having to decrypt it. This approach leads to MPC protocols whose number of communication rounds does not increase with the circuit complexity, although fully homomorphic encryption is still considered impractical. For certain functionalities, progress toward practical fully homomorphic encryption has been achieved in [20] with somewhat fully homomorphic encryption schemes using the learning-with-errors assumption.
Applications
The main applications associated with MPCs in the literature include distributed voting [21], private bidding and auctions [22], data mining [23], and sharing of signatures [24]. MPC was first used in a real-world application only in 2008, when 1,200 farmers in Denmark employed an MPC protocol in a nationwide auction to determine the market price of sugar-beet contracts without revealing their selling and buying prices [25]. The whole computation took about half an hour, a satisfactory time for this application. In a different context, [26] introduces a “Patient Controlled Encryption” scheme, in which an electronic health record system that allows searches on encrypted data is developed.