An Effective Private Data Storage and Retrieval System Using a Secret Sharing Scheme Based on Secure Multi-party Computation
Privacy of outsourced data is a major challenge. The insecurity of the network environment and the untrustworthiness of service providers are obstacles to offering the database as a service. The collection and storage of personally identifiable information is a major privacy concern. Online public databases and resources pose a significant risk to user privacy, since a malicious database owner may monitor user queries and infer useful information about the customer. The challenge in data privacy is to share data with a third party while protecting the valuable information from unauthorized access and use by that party. A Private Information Retrieval (PIR) scheme allows a user to query a database while hiding the identity of the data retrieved. The naive solution for confidentiality is to encrypt the data before outsourcing; query execution, key management and statistical inference are the major challenges in that case. The proposed system suggests a mechanism for secure storage and retrieval of private data using the secret sharing technique. The idea is to store private information with a highly available storage provider so that it can be accessed from anywhere using queries, while hiding the actual data values from the storage provider. The private information retrieval system is implemented using the Secure Multi-party Computation (SMC) technique, which is based on secret sharing. Multi-party computation enables parties to compute a joint function over their private inputs. The query results are obtained by performing a secure computation on the shares held by the different servers.
Keywords: Database, Data Storage, Private Information Retrieval, Query Processing, Shamir’s Secret Sharing, Secure Multi-party Computation
Secure storage of confidential data and its private retrieval are major research challenges when the data is outsourced to an untrusted third-party service provider. Private Information Retrieval (PIR) allows clients to retrieve data from a database server in a privacy-preserving manner. PIR schemes use cryptographic protocols to safeguard the privacy of database users. This allows clients to retrieve records from public databases while the identity of the retrieved records remains completely hidden from the database owners. The major goal is that the database server should be able to respond to client queries without learning any information about the records retrieved.
A trivial solution is to encrypt the database using cryptographic techniques. For query processing, however, the entire database must be downloaded and the queries issued locally. Query execution over encrypted data is a major research challenge, and most solutions are inefficient because of the long query processing time and the complexity of key management. Using an encrypted database in this way is clearly information-theoretically secure, and the server cannot learn which record the client seeks, but key management, the time-consuming encryption and decryption process, the overhead of downloading a large encrypted database and the difficulty of query processing make the scheme impractical.
Fragmentation is another solution for providing confidentiality of outsourced data. The data owner partitions the tables horizontally or vertically and distributes them to different servers. Encryption is unavoidable in this case as well, because fragmentation cannot preserve the confidentiality of a single attribute. Collusion between servers is also a security issue. Agrawal et al. use a secret sharing technique to provide confidentiality. Their solution allows different types of queries to run efficiently. However, untrusted servers may have prior knowledge about the data distribution or frequency.
The proposed system suggests a secret sharing method for the confidentiality of outsourced data. A relation is split into random shares, and these shares are sent to different servers. This provides both reliability and security. With a $(t, n)$ threshold secret sharing scheme, the data can be retrieved from any $t$ of the $n$ servers where the shares are stored. The random shares also provide information-theoretic security at the cost of additional storage space. Query processing and searching are an issue here, and an efficient mechanism for both is also suggested in this paper. It requires interaction between the client and the different servers. The servers send the shares that form the result of the query, and the shares are then combined to recover the original data. Since the computations are performed on shares, the system provides a Secure Multi-party Computation (SMC) environment.
There are several situations in which mutually distrustful parties need to perform a joint computation without revealing their inputs to each other. This happens, for example, during auctions, voting, negotiations and business analytics. The problem is how to perform such a computation without revealing the inputs, and SMC is the solution to such problems. It permits a group of parties to jointly compute a function of their private inputs while preserving the privacy of the inputs and the correctness of the result. Every participant obtains the result of the computation without exposing its input. The SMC protocol was first introduced by Yao in 1982 through the famous Millionaires’ problem. A protocol is secure if no participant can learn more than what follows from the description of the public function and the result of the computation.
SMC is accomplished here by using Shamir’s secret sharing scheme. In secret sharing, the secret is not held by a single party but is distributed among several parties, so that the secret can still be reconstructed even if some of the parties involved in the computation are malicious. A verifiable secret sharing scheme is one in which the parties can verify the shares for consistency. To handle malicious parties involved in a computation, the secret sharing scheme needs to be verifiable.
The development of secret sharing schemes started as a solution to the problem of safeguarding cryptographic keys: the key is distributed among $n$ participants, and any $t$ or more of the participants can recover it by pooling their shares. Thus the authorized set is any subset of participants containing more than $t-1$ members. This scheme is denoted as a $(t, n)$ threshold scheme. The notion of a threshold secret sharing scheme was independently proposed by Shamir and Blakley in 1979. Since then, much work has been put into the investigation of such schemes. Linear constructions are the most efficient and widely used. A threshold secret sharing scheme is called ideal if the share size is the same as the secret size, and perfect if fewer than $t$ shares give no information about the secret. Blakley’s scheme is not perfect, while Shamir’s scheme is perfect. Both Blakley’s and Shamir’s constructions realize a $t$-out-of-$n$ threshold secret sharing scheme. However, their constructions are fundamentally different.
Shamir’s scheme is based on polynomial interpolation over a finite field. It uses the fact that a polynomial of degree $t-1$ can be constructed only if $t$ data points are given. A polynomial $f(x) = a_0 + a_1 x + \dots + a_{t-1} x^{t-1}$, in which $a_0$ is set to the secret value $s$ and the coefficients $a_1$ to $a_{t-1}$ are assigned random values in the field, is used for secret sharing. The polynomial is evaluated at $n$ different points $x_i$, and each value is given as a share to a participant. That is, the value $f(x_i)$ is given to the $i$-th user as the secret share. Here $t$ is the threshold. When any $t$ out of the $n$ users join together, they can reconstruct the polynomial using Lagrange interpolation with $t$ points and hence obtain the secret $s = f(0)$. Any set of $t-1$ users cannot gain any information about the secret $s$, so the scheme is perfect. The scheme is easily computable when the necessary data is available, and it avoids a single point of failure. It also increases reliability, security, safety and convenience.
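The sharing and reconstruction steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the prime modulus `PRIME`, the choice of evaluation points $x_i = i$ and the function names `make_shares` and `reconstruct` are assumptions made for the example.

```python
import secrets

# Assumption: a Mersenne prime modulus; the paper does not name a specific field.
PRIME = 2**127 - 1


def make_shares(secret, t, n, prime=PRIME):
    """Split `secret` into n shares; any t of them reconstruct it."""
    if not (1 <= t <= n < prime):
        raise ValueError("need 1 <= t <= n < prime")
    if not (0 <= secret < prime):
        raise ValueError("secret must be a field element")
    # f(x) = a_0 + a_1 x + ... + a_{t-1} x^{t-1}, with a_0 = secret
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(t - 1)]

    def f(x):
        # Horner evaluation of f at x, modulo the prime
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % prime
        return acc

    # Participant i receives the point (i, f(i)), for i = 1..n
    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares, prime=PRIME):
    """Recover f(0) from t shares by Lagrange interpolation at x = 0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * (-xm)) % prime
                den = (den * (xj - xm)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+)
        secret = (secret + yj * num * pow(den, -1, prime)) % prime
    return secret
```

With `shares = make_shares(1234, 3, 5)`, passing any three of the five points to `reconstruct` returns 1234. With fewer than three points, every candidate secret remains equally consistent with the data.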
The rest of the paper is organized as follows. Related work is discussed in Section II. The proposed system and its architecture are described in Section III. Section IV contains an example. Conclusions are drawn in Section V.
II Related Work
Private Information Retrieval (PIR) was introduced by Chor et al. and has since received a lot of attention. The study of PIR is motivated by the growing concern about user privacy when querying a large commercial database.
Protocols for PIR and Symmetric Private Information Retrieval (SPIR) provide a limited type of privacy-preserving search. PIR involves a server and a client: the server has a database of $n$ items, and the client wants to obtain the item at position $i$ without the server learning the value of $i$. In the case of SPIR, it is additionally required that the user does not learn any information about items other than the one that was requested. These protocols improve on general multi-party computation and have sub-linear communication and polynomial computational complexity. However, they remain inefficient for many practical uses and support only simple selection rather than general query capability.
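To illustrate what "the server does not learn $i$" means, the following Python sketch shows the textbook two-server XOR construction for information-theoretic PIR. It is not the proposed system and not one of the sub-linear protocols mentioned above: its communication is linear in $n$, it assumes two non-colluding servers that hold identical copies of the database, and it stores records as non-negative integers. The function names are hypothetical.

```python
import secrets


def pir_queries(n, i):
    """Build two query masks for index i over a database of n records."""
    # A uniformly random subset of {0..n-1}, as a list of booleans
    q1 = [secrets.randbits(1) == 1 for _ in range(n)]
    # The second query differs from the first only at position i
    q2 = list(q1)
    q2[i] = not q2[i]
    return q1, q2


def pir_answer(database, query):
    """Server side: XOR together the records selected by the query mask."""
    acc = 0
    for record, selected in zip(database, query):
        if selected:
            acc ^= record
    return acc


def pir_retrieve(db_copy_1, db_copy_2, i):
    """Client side: query both non-colluding servers and combine the answers."""
    q1, q2 = pir_queries(len(db_copy_1), i)
    return pir_answer(db_copy_1, q1) ^ pir_answer(db_copy_2, q2)
```

Each server sees a uniformly random mask, so neither learns $i$ on its own. The two answers differ exactly by record $i$, which the client recovers with one XOR.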
In database outsourcing, one party possesses a large amount of data but does not have enough storage at hand to store it reliably. Many papers address the issues related to database outsourcing. The major issue is that the data must be kept confidential from the untrusted server and must be retrieved without revealing any information. Several approaches use encryption systems. Searching over encrypted data is time consuming, since the search word must be encrypted before searching. The running time of the search in these approaches is therefore linear in the number of searchable tokens, and searching becomes inefficient even though it provides better security. This highlights the trade-off between efficiency and strong privacy guarantees. Curtmola et al. use inverted indices to gain efficiency: they suggest that the querier preprocess the data and compute inverted indices on the search words. In this case, an untrusted server can learn the search pattern over multiple queries.
SMC based on homomorphic public-key encryption has also been proposed. In this approach, each party distributes encryptions of its private inputs to the other parties, and the computations are performed on the encrypted data. The homomorphic property of the encryption can be used to achieve a specific functionality. An authorized set of users can then perform threshold decryption to obtain the final result.
III Proposed System
The proposed system suggests a method for storing and retrieving private data in a secure and effective manner. The private data may include personal information, sensitive information or unique identifiers. The data store may be a private information store built on a cloud database.
III-A Secure Data Storage
The system does not use any encryption technique. Shares corresponding to each relation are generated using Shamir’s secret sharing scheme and are then stored on different servers. The architecture for data storage is shown in Figure 1.
The architecture has four main modules
The database owner gives the table schema to the table schema handler, which copies it to all the database servers. When data is inserted, the data insertion module passes the data to the share generator module. The share generator converts the data into bytecode form using the bytecode generator and then divides it into shares using the Shamir secret share generator. The shares are then distributed to and stored in the different database servers.
III-B Algorithm for Data Storage
Step1: The database schema is copied into database servers
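Step2: When an insert operation is performed, each attribute value is divided into shares and stored in the database servers:
$DBS_i \leftarrow (a_{1i}, a_{2i}, \dots, a_{mi})$
where $DBS_i$ is the $i$-th database server, $(a_{1i}, a_{2i}, \dots, a_{mi})$ is the record containing the attribute shares of the $m$ attributes, and $a_{ji}$ is the $i$-th share of the $j$-th attribute.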
Step3: Along with the attribute values, a primary key column containing index values starting from 1 is created automatically. Its purpose is to make the retrieval process easy.
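The three storage steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the servers are plain in-memory lists, the "bytecode" conversion is assumed to be the UTF-8 bytes of the value read as an integer, the field modulus `PRIME` is chosen for the example, and the names `encode`, `share`, `ShareStore` and the `idx` column are hypothetical.

```python
import secrets

PRIME = 2**127 - 1  # assumption: field modulus, not specified in the paper


def encode(value):
    """Hypothetical 'bytecode' step: map a string to an integer below PRIME."""
    data = value.encode("utf-8")
    if len(data) > 15:
        raise ValueError("value too long for this toy field")
    return int.from_bytes(data, "big")


def share(secret, t, n):
    """Shamir shares f(1), ..., f(n) of a random degree t-1 polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    return [sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
            for x in range(1, n + 1)]


class ShareStore:
    """n in-memory 'database servers', each holding one share per attribute."""

    def __init__(self, schema, t, n):
        self.schema, self.t, self.n = list(schema), t, n
        # Step 1: the same schema is used on every server
        self.servers = [[] for _ in range(n)]

    def insert(self, record):
        # Step 3: index values start from 1 and are assigned automatically
        index = len(self.servers[0]) + 1
        # Step 2: every attribute value is split into n shares
        shares = {a: share(encode(record[a]), self.t, self.n)
                  for a in self.schema}
        for i, rows in enumerate(self.servers):
            rows.append({"idx": index,
                         **{a: shares[a][i] for a in self.schema}})
        return index
```

After `store.insert({"Patientname": "Ann", "Diagonosis": "Aids"})`, each server holds one row with the same index but different, random-looking integers in place of the attribute values.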
III-C Secure Data Retrieval
The architecture for secure data retrieval is shown in Figure 2.
The main modules are
Step1: The client issues a query.
Step2: The Query Handler parses the query, extracts the attribute of the WHERE condition and passes it to the Computation Agent (CA).
Step3: The CA requests the shares of the condition attribute from the DB Server Manager, which forwards the request to all share-holding databases.
Step4: After receiving the attribute shares, the CA reconstructs the attribute values, checks the condition and finds the index values of the rows that satisfy it.
Step5: The Query Handler supplies the select attribute name and requests the CA to fetch the shares corresponding to the index values obtained in Step 4.
Step6: The CA forwards a packet containing the following fields to the DB Manager, and from there to the database servers:
|Index value||Select attribute name||Client IP address|
Step7: The DB servers send the shares of the requested attribute column at the specified index values to the client IP address provided.
Step8: The result constructor in the client combines the shares to recover the actual query result.
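The retrieval steps can be sketched end to end in Python. The sketch below is an assumption-laden illustration, not the authors' code: network transport, the DB Manager and the packet format are collapsed into direct function calls, values are encoded as the integer form of their UTF-8 bytes, only equality conditions are handled, and all names (`build_servers`, `run_query`, `PRIME`) are hypothetical.

```python
import secrets

PRIME = 2**127 - 1  # assumption: field modulus, not specified in the paper


def encode(text):
    # Short strings only: the integer must stay below PRIME
    return int.from_bytes(text.encode("utf-8"), "big")


def decode(number):
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode("utf-8")


def share(secret, t, n):
    """Shamir shares (x, f(x)) for x = 1..n of a random degree t-1 polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]


def reconstruct(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num = den = 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = (num * (-xm)) % PRIME
                den = (den * (xj - xm)) % PRIME
        total = (total + yj * num * pow(den, -1, PRIME)) % PRIME
    return total


def build_servers(rows, t, n):
    """Owner side: servers[i][attr][idx] is the share held by server i."""
    servers = [{} for _ in range(n)]
    for idx, row in enumerate(rows, start=1):
        for attr, value in row.items():
            for i, point in enumerate(share(encode(value), t, n)):
                servers[i].setdefault(attr, {})[idx] = point
    return servers


def run_query(servers, t, select_attr, where_attr, where_value):
    """Steps 2-8 for: SELECT select_attr ... WHERE where_attr = where_value."""
    chosen = servers[:t]  # any t servers would do
    # Steps 3-4: the CA reconstructs the condition column and finds the indexes
    indexes = [idx for idx in chosen[0][where_attr]
               if decode(reconstruct([s[where_attr][idx] for s in chosen]))
               == where_value]
    # Steps 5-8: shares of the selected attribute reach the client,
    # which reconstructs the result
    return [decode(reconstruct([s[select_attr][idx] for s in chosen]))
            for idx in indexes]
```

In this sketch the condition column is reconstructed in full by the CA, as in Step 4, so the servers never see the plaintext values or the WHERE constant. They do see which index values are requested in the second round.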
IV Example Scenario
Consider a hospital database system that contains patients’ disease records. Since it is a large database, it is outsourced to cloud storage. The database contains sensitive information, so its content should not be revealed to a third party. Suppose also that the hospital authority wants to know how many AIDS patients there are while preserving the anonymity of the patients.
In this case, the hospital data is stored in 3 database servers in the form of shares, and a (2,3) scheme is used. Each database item is split into shares using Shamir’s secret sharing scheme and stored in different servers that have the same database schema. The database owner stores the data in the form of shares, and the Database Server Manager holds the locations of the database servers where these shares are stored.
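To make the (2,3) setting concrete, a small worked instance is given below. The numbers are illustrative assumptions only (prime $p = 257$, secret $s = 42$, random coefficient $a_1 = 7$, evaluation points $x = 1, 2, 3$); they are not the values that appear in the tables of this section.

```latex
\begin{aligned}
f(x) &= s + a_1 x \bmod p = 42 + 7x \bmod 257,\\
f(1) &= 49,\quad f(2) = 56,\quad f(3) = 63,\\
s &= f(1)\,\frac{0-3}{1-3} + f(3)\,\frac{0-1}{3-1}
   = 49\cdot\frac{3}{2} - 63\cdot\frac{1}{2}
   = \frac{84}{2} = 42 \pmod{257}.
\end{aligned}
```

Here the shares of servers 1 and 3 are combined by Lagrange interpolation at $x = 0$; any other pair of servers gives the same result. A single share such as $f(1) = 49$ reveals nothing, because for every candidate secret $s$ there is exactly one coefficient $a_1 = 49 - s \bmod 257$ that is consistent with it.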
Consider the patient_details table as shown in Table II.
The client generates the query SELECT Patientname FROM patient_details WHERE Diagonosis='Aids'.
The query is passed to the Query Handler (QH) module. The QH extracts the WHERE clause, takes the attribute Diagonosis and sends it to the Computation Agent (CA).
The CA forwards the attribute name to the Database Manager.
The Database Manager requests the Diagonosis column values from any 2 database servers, as per the threshold.
On receiving the request, each database server replies by sending its shares of the requested attribute column (Diagonosis) to the Database Manager.
The Database Manager forwards the shares to the CA, so the CA obtains the Diagonosis column shares from two of the database servers, as shown in Table VI.
| Diagonosis | Diagonosis |
| 1111 | 210 |
| 931 | 1245 |
| 832 | 911 |
| 120 | 319 |
TABLE VI: Shares from servers
The CA applies Lagrange interpolation to reconstruct the original values of the Diagonosis field, as shown in Table VII.
| Diagonosis |
| Aids |
| Cancer |
| Fever |
| Aids |
TABLE VII: After reconstruction
The CA checks the WHERE condition of the query, that is Diagonosis='Aids', and computes the index values 1 and 4, which satisfy the condition.
The Query Handler passes the select attribute name (Patientname) to the CA, and the CA forwards a packet containing the index values, the select attribute name and the client IP address to the Database Manager.
The Database Manager requests the share-holding DB servers to send the shares of the attribute Patientname at the specified indexes to the client IP address.
The DB servers pass the shares to the specified IP address of the client.
The result constructor in the client receives the shares of Patientname, as in Table VIII.
| Patientname | Patientname |
| 1115 | 789 |
| 641 | 319 |
TABLE VIII: Shares of Patientname
After reconstruction, Client gets the result as in Table IX
| Patientname |
| Ann |
| Dona |
TABLE IX: Result of query
V Conclusion
PIR and the secure storage and retrieval of data on untrusted servers pose a major security challenge. We presented a secure database storage and retrieval system based on secret sharing. Since the data is stored as shares in the databases, knowledge of fewer than the threshold number of shares does not reveal any clue about the original data. Query analysis and result reconstruction are performed in the client-side computation agent, which ensures privacy-preserving query processing and computation. The system proves to be efficient, secure and reliable. The work can be extended to unstructured data. Collusion among service providers to recover the original data is a major security concern. A secret vector containing the values at which the secret polynomial is evaluated for each user, known only to the clients, can be used to provide added security against untrusted service providers. A simple and efficient XOR-based secret sharing scheme can be used if the number of servers and the threshold are small.
-  Divyakant Agrawal, Amr El Abbadi, Fatih Emekci, and Ahmed Metwally. Database management as a service: Challenges and opportunities. In Data Engineering, 2009. ICDE’09. IEEE 25th International Conference on, pages 1709–1716. IEEE, 2009.
-  VP Binu and A Sreekumar. An epitome of multi secret sharing schemes for general access structure. arXiv preprint arXiv:1406.5596, 2014.
-  George Robert Blakley. Safeguarding cryptographic keys. In Managing Requirements Knowledge, International Workshop on, page 313. IEEE Computer Society, 1979.
-  Dan Boneh, Xavier Boyen, and Hovav Shacham. Short group signatures. In Advances in Cryptology–CRYPTO 2004, pages 41–55. Springer, 2004.
-  Christian Cachin, Silvio Micali, and Markus Stadler. Computationally private information retrieval with polylogarithmic communication. In Advances in Cryptology–EUROCRYPT’99, pages 402–414. Springer, 1999.
-  Alberto Ceselli, Ernesto Damiani, Sabrina De Capitani Di Vimercati, Sushil Jajodia, Stefano Paraboschi, and Pierangela Samarati. Modeling and assessing inference exposure in encrypted databases. ACM Transactions on Information and System Security (TISSEC), 8(1):119–152, 2005.
-  Benny Chor, Eyal Kushilevitz, Oded Goldreich, and Madhu Sudan. Private information retrieval. Journal of the ACM (JACM), 45(6):965–981, 1998.
-  Yvo Desmedt and Yair Frankel. Threshold cryptosystems. In Advances in Cryptology–CRYPTO’89 Proceedings, pages 307–315. Springer, 1990.
-  Craig Gentry and Zulfikar Ramzan. Single-database private information retrieval with constant communication rate. In Automata, Languages and Programming, pages 803–815. Springer, 2005.
-  Shafi Goldwasser. Multi party computations: past and present. In Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing, pages 1–6. ACM, 1997.
-  Yuval Ishai and Eyal Kushilevitz. Improved upper bounds on information-theoretic private information retrieval. In Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 79–88. ACM, 1999.
-  Felipe Saint-Jean. Java implementation of a single-database computationally symmetric private information retrieval (cspir) protocol. Technical report, DTIC Document, 2005.
-  Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.
-  Lena Wiese. Horizontal fragmentation for data outsourcing with formula-based confidentiality constraints. In Advances in Information and Computer Security, pages 101–116. Springer, 2010.
-  Peter Williams, Radu Sion, and Bogdan Carbunar. Building castles out of mud: practical access pattern privacy and correctness on untrusted storage. In Proceedings of the 15th ACM conference on Computer and communications security, pages 139–148. ACM, 2008.
-  David Woodruff and Sergey Yekhanin. A geometric approach to information-theoretic private information retrieval. In Computational Complexity, 2005. Proceedings. Twentieth Annual IEEE Conference on, pages 275–284. IEEE, 2005.
-  Andrew Chi-Chih Yao. Protocols for secure computations. In FOCS, volume 82, pages 160–164, 1982.
-  Hong Zhu, Jing Cheng, Renchao Jin, and Kevin Lu. Executing query over encrypted character strings in databases. In Frontier of Computer Science and Technology, 2007. FCST 2007. Japan-China Joint Workshop on, pages 90–97. IEEE, 2007.