Get Complete Project Material File(s) Now! »
Outsourced data Confidentiality by Encryption
Despite that traditional encryption mechanisms (e.g., Data Encryption Standard (DES) [NIST 1999] and Advanced Encryption Standard (AES) [FIPS. 2001]) ensures strong confidentiality guarantees. They are suffering from two main limitations that reduce the effectiveness of these traditional schemes, especially because of the large amount of outsourced data.
First, when outsourcing the data to an untrusted (honest by curious) service provider, data owners generally encrypt the data before its outsourcing to the remote storage server. However, the employment of traditional encryption schemes is incon-venient for huge amount of data as it reduces significantly the computation capacity both at the client and the server sides. Second, these traditional encryption approaches are deterministic, they are not adaptable and do not allow performing operations over encrypted data (e.g., search, computation, etc.). Search over encrypted data is very useful to overcome the traditional all-or-nothing retrieval policy of encrypted data. When designing an approach for searching over encrypted data, the computation on the client-side, as well as the communication overhead, plays critical roles for deciding its efficiency. More the computation on the client-side is minimized and the communication overhead is reduced, more the searching over encrypted data approach is efficient.
Several approaches have emerged to allow efficient search over encrypted outsourced data. They can be classified into several classes: Bucketing-based on some specialized data structures [Damiani et al. 2003b, Damiani et al. 2003a, Shi et al. 2007, Park 2011, Wang et al. 2014].
Bucketing Based Approaches
Hakan Hacigümüs et al. [Hacigümüs et al. 2002] have introduced a model for ensuring the confidentiality of outsourced databases based on a bucketization-based technique. This technique consists of segmenting the domain of attributes, which their assumed values are considered sensitive, into a number of non-intersecting subsets of values called buckets and having the same size. Then, a unique random identifier ID is associated to the set of unencrypted values in each Bucket. Finally, the encrypted tuples together with buckets IDs are outsourced to a cloud storage server. Let us suppose that a database D composed of a finite set of relational tables {T1, · · · , Tn} has to be outsourced. For each tuple ti = {v1i, · · · , vmii } in the relational table Ti of D, tsi = {etuple, IDv1i , · · · , IDvmii } will be outsourced to the cloud server, where etuple represents the encryption of the tuple (v1i, · · · , vmii ) and IDji represents the identifier of the bucket containing each vji, 1 ≤ j ≤ mi. The bucketization method (segmentation function) used to generate the corresponding bucket IDs for the values of each attribute are stored privately by the data owner. For illustrative purpose, let us consider that the relational table Patient (Figure 2.1 (a)) will be outsourced. Figure 2.1 (b) represents the bucketization method used for the values of the attribute Yob and Figure 2.1 (c) represents the encrypted form of the relation Patient that will be outsourced. The values of the attribute IC in Figure 2.1 are indexes obtained by the application of the bucketization method in Figure 2.1 for the attribute Y ob. In order to query the outsourced relation, an authorized user replaces the values used to search over each attribute by their corresponding buckets IDs calculated using the adopted bucketization method. For example, if an authorized user wants to execute the query q : πP atient.SSN (σP atient.Dob=1965(P atient)) over the outsourced relation, he or she needs to rewrite the query q by replacing the value 1965 by it corresponding bucket ID 9.
Hacigümüs’s bucketization technique suffers from three main disadvantages. First, despite this method allows to perform exact search over outsourced relational databases as equal values are indexed with equal bucket IDs, it does not support order search since the bucketization method used to generate bucket IDs does not necessarily preserve the plaintext domain ordering (e.g., the bucketization method used in Figure 2.1 (b)). Second, it is clear that the final results provided by this encrypted data querying method contain false positives and needs to be filtered to remove out any tuples that do not satisfy the original query conditions. Third, this technique may give rise to a possible privacy violations at server side.
Enhancing Hacigümüs’s technique, Hore et al. [Hore et al. 2004] investigated firstly the problem of performing order search over encrypted relational databases while min-imizing the number of false positives in the queries results. In a second time, they studied the leakage of information due to the bucketization-based technique. A two levels analysis is provided. In the first level, the leakage of information due to the pub-lication of the bucket IDs of only one attribute is considered. In the second level, they consider the leakage of information due to the publication of the bucket IDs of all attributes. They propose two metrics for privacy measures relevant to data bucketization: a measure based on the distribution of the values within each bucket and a measure based on the distribution entropy of the values within a bucket. To ensure the best level of privacy, the data owner has to maximize the distribution entropy of the values within each bucket. Finally, they prove that finding the best trade-off between data privacy and querying efficiency when using bucketization techniques is NP-complete. To overcome this problem, they propose an heuristic-based method fixing a maximum permitted degradation threshold of querying performance.
Another problem related to this bucketization techniques is that, to achieve the best level of data protection, the buckets should all have the same size (the number of tuples in each bucket is the same). Despite this property can be ensured in the case of data at rest databases 2, it is too hard to maintain it for data in motion databases 3.
Specialized Data Structures Based Techniques
Damiani et al. [Damiani et al. 2003a, Damiani et al. 2003b] introduced an approach allowing to perform exact and order searches over encrypted data using a B+– tree based indexing method. The proposed technique is used for each attribute to efficiently allows order and exact searches over it. That is, a B+– tree is created over the unencrypted values of the attributes to be used to search the data. Then, for each node in the created trees, a couple (id, enode) is created and outsourced to the cloud server, where id is a node identifier and enode = hlid, E(value), ridi represents the encrypted form of the cleartext node using a deterministic encryption E, lid and rid represent respectively the left and the right nodes identifiers.
To perform an order search (i,e,. range query) over an attribute attr, as a first step, the client (or data owner) selects the root node of the outsourced B+– tree created over the values of attr. In the second step, the client decrypts E(value) and compares value with the search condition to figure out with branch (left or right) is to be taken. This procedure repeats until the leaf level of the B+– tree is reached. Finally, the client navigates through the leaves to select the corresponding values. Although the final querying results produced using this method are accurate (no false positive), to perform a single search operation, we still need to perform many interactions (the height of B+– tree) between the client and the cloud server.
Shi et al. [Shi et al. 2007] proposed MRQED – a confidentiality-preserving search-able scheme supporting multi-dimensional order search over encrypted data. In this approach, a discrete integer values 1 throught D is used to encode each attribute, where D represents the number of possible values that might be assumed by each attribute. Therefore, a binary interval tree BIT (D) is created over integers 1 from D. Despite this approach ensures provably strong security, it is useless in practice since a server needs to fully scan the whole encrypted relation to perform single order search request.
Based on the work proposed by Dan Boneh et al. in [Boneh and Waters 2007], a public key based approach called Hiden Vector Encryption (HVE) is introduced in [Park 2011] supporting equality and order searches over encrypted data. Nonetheless, the complexity of range search per data item is linear in the range size, which can be too much expensive in terms of execution time when the range size is large. Moreover, the proposed technique does not use any form of indexing to reduce access complexity that might be extremely expensive when dealing with large datasets.
Table of contents :
1 Introduction
1.1 Data Outsourcing Basics
1.2 Data Outsourcing Security Challenges
1.3 Objectives and Contributions
1.4 Organization of the dissertation
I Preserving Outsourced Data Confidentiality
2 Outsourced Data Confidentiality: State of the Art
2.1 Introduction
2.2 Outsourced data Confidentiality by Encryption
2.2.1 Bucketing Based Approaches
2.2.2 Specialized Data Structures Based Techniques
2.2.3 Order-Preserving Encryption Based Approaches .
2.2.4 Searchable Symmetric Encryption Based Approaches .
2.2.5 Homomorphic Encryption Based Approaches
2.3 Outsourced Data Confidentiality by Dissociation
2.3.1 Confidentiality by Fragmentation
2.3.2 Confidentiality by Combining Fragmentation and Encryption
2.4 Conclusion
3 Preserving Multi-relational Outsourced Databases Confidentiality using Fragmentation and Encryption
3.1 Introduction
3.1.1 Motivating Scenario
3.1.2 Contributions
3.1.3 Chapter Outline
3.2 Technical Preliminaries
3.2.1 Architecture
3.2.2 Trust and Attack Model
3.3 Confidentiality Using Fragmentation and Encryption
3.4 Query Transformation and Optimization
3.5 Preserving Data Unlinkability
3.5.1 PIR System design
3.6 Implementation and Evaluation
3.6.1 Experimental Design
3.6.2 Evaluation
3.7 Conclusion and Contributions
4 Combining Encryption-based Mechanisms to ensure Outsourced Data Confidentiality
4.1 Introduction
4.2 Problem Description
4.2.1 Adjustable Database Encryption
4.2.2 Functional Requirements
4.2.3 Security Levels
4.2.4 Security Requirements and The Need for Policy Configuration .
4.2.5 Policy Enforcement
4.3 Policy Configuration
4.3.1 System Modeling
4.3.2 Policy Modeling
4.3.3 Policy Conflict Detection
4.3.4 Policy Satisfaction
4.3.5 Heuristic Search
4.4 Use Case
4.5 Implementation and Evaluation
4.5.1 Experimental Design
4.5.2 Evaluation
4.6 Conclusion and Contributions
II Heterogeneous Security and Utility Requirements Specification and Enforcement over Outsourced Data
5 An Epistemic Temporal Logic based language for Specifying Security Policy and Security Mechanisms
5.1 Introduction
5.2 An Epistemic Temporal Logic based language
5.2.1 Syntax
5.2.2 Semantics
5.2.3 Axiomatizations
5.3 Data Model Specification
5.4 Goal Specification
5.5 Policy Specification
5.5.1 Security Constraints
5.5.2 Utility Constraint
5.6 Mechanisms Specification
5.6.1 Encryption Based Security Mechanisms
5.6.2 Anonymization Based Security Mechanisms
5.6.3 Watermarking Based Security Mechanisms
5.6.4 Fragmentation Based Security Mechanisms
5.7 Conclusion and Contributions
6 Formal Reasoning Method to enforce Security Policies over Outsourced Data
6.1 Related Work
6.2 Choosing the Right Security Mechanisms
6.2.1 Fourth step: Choosing the Right Security Mechanisms.
6.3 Security Mechanisms Planning to Enforce Security Policies
6.3.1 Graphplan Description
6.3.2 Graphplan Modelling of the Planning Problem
6.3.3 Extending Planning Graph: Planning Under Security Constraints
6.3.4 Searching the Best Security Mechanisms Plan
6.4 Implementation and Evaluations
6.4.1 Implementation
6.4.2 Experimental Setup
6.4.3 Experimental Results
6.5 Conclusion and Contribution
7 Best effort based Approach for Security Mechanisms Planning to Enforce Security Policies Over Outsourced Data
7.1 Related Work
7.2 Preliminaries
7.3 Planning Graph Tainting
7.3.1 Taint Propagation Rules
7.4 Finding the Best Compromise
7.4.1 Heuristic Search
7.5 Conclusion
8 Conclusions and Perspectives
A Expression et déploiement de politiques de sécurité intégrées pour données externalisées
A.1 Introduction
A.2 middling version
A.3 middling version
A.3.1 Spécification de la politique à déployer
A.3.2 Déploiement de la politique
A.4 middling version
A.5 Conclusion
B The Specification of the System Used to Evaluate our Approach Proposed in Chapter 6
C Proofs of Theorems and Lemmas of Chapter 7
C.1 Proof of Lemma 1
C.2 Proof of Lemma 2
C.3 Proof of Theorem 11
C.4 Proof of Theorem 12
C.5 Proof of Theorem 13
List of Publications
Bibliography