Get Complete Project Material File(s) Now! »
They key role of privacy research
In the early 2000s it became apparent after many credit card scams, that digital commerce, today a standard practice, was doomed to failure if the private and public parties were not able to cooperate and gain the trust of consumers.
Today, new data-driven technologies such as Big Data, Internet Of Things, machine learning, computational biology have to potential to solve the great chal-lenges of our time but are at risk of being held back by the fear of the repercussion they might have over our private lives. Privacy research is again the key to restore the confidence of citizens in the new technologies and allow them to benefit from this untapped potential. A report from the European Commission DG Freedom, Security and Justice [Eco10] denotes the economic benefits of Privacy Enhancing Technologies (PETs) and future challenges. The US Defense Research Agency (DARPA) recently launched project Brandeis which “aims to enable individuals, enterprises, and U.S. government agencies to keep private and/or proprietary in-formation private.” These efforts prove that there is an awareness that the adoption of new ground-braking technologies will only be possible if regulations, guaran-tees and transparency are put in place.
The modern privacy scenario is extremely complex, a crossroad of technol-ogy, law, economy and social sciences. The contribution of information science can be twofold: finding formal privacy definitions (and measures) and designing mechanisms to enforce them.
The main characteristic that separates privacy form the other aspects of secu-rity is the partial disclosure of information. A classical problem of information security when handling a sensitive piece of information is keeping it confidential e.g. encrypting a document. When studying the information flow in a system, an analogous property is non interference, stating that from the observable out-puts nothing should be learned of the secret inputs. These notions characterize a quality of the system, being safe or not.
On the contrary in privacy we typically want to extract part of the information of the secret in order to obtain some utility while maintaining safe its sensitive nature. For example a user may be willing to reveal her country of residence in order to have a website localized in the right language while keeping safe her home address. Because of the nature of the problem, we don’t usually denote a system as being private or not, but rather how much private. In other words, in systems that leak some information by design, we are interested in quantitative measures of privacy to characterize this leakage. Finding the right balance between privacy and utility in one of the key challenges of the field.
Moreover privacy is an extremely diverse and personal topic, that may be in-terpreted differently from person to person. Going back to the example of the address, for some people revealing even the city of residence could still be con-sidered harmless while for others revealing the country is already too much. Ul-timately it is important to include the user in the discussion and leave room for personalization.
In summary the approach that we take in protecting privacy is to first define clearly what the user wishes to protect and how much, thus focusing on a formal quantitative definition of privacy. We then develop a mechanism, also called Pri-vacy Enhancing Technology (PET), that enforces the definition and we evaluate it measuring both its privacy and utility.
Location Privacy
In recent years, the popularity of devices capable of providing an individual’s position with a range of accuracies (e.g. through wifi access points, GPS satel-lites, etc) has led to a growing use of Location Based Services (LBSs) that record and process location data. A typical example of such systems are mapping ap-plications, Points of Interest retrieval, coupon providers, GPS navigation, and location-aware social networks – providing a service related to the user’s loca-tion. Although users are often willing to disclose their location in order to obtain a service, there are serious concerns about the privacy implications of the constant disclosure of location information. This concern was demonstrated, for instance, when in 2011 Allan and Warden 2 discovered that iOS4 was constantly logging the user’s location in the device together with time-stamps for each location. The discovery made the news and Apple had to release a press statement claiming that the information was stored for the sole purpose of helping localizing the device when requested. Nonetheless the amount of information stored and especially the number of devices affected were enough to create alarm. More recently, during the Snowden scandal, The Guardian reported that NSA and GCHQ were target-ing leaky mobile applications to collect user’s private information, location being among them [Bal14]. Indeed location can be easily linked to a variety of other information that individuals wish to protect, such as sexual preferences, political views, religious inclinations, etc. Again from the Snowden documents we know that NSA was inferring personal relationships from phones tracking [Pos13].
On the private sector, building (or accessing) high quality maps, in the past 5 years, has been a foundational step of any IT company. Consider the eco-nomical effort of Apple and Microsoft to build their own in-house solutions to catch up with Google Maps. Incredibly large sums have been proposed by the automotive industry for the acquisition of Nokia Here. More recently Mapbox, a young startup, enjoyed extremely successful funding drives to build maps on top of OpenStreetMap data.
This interest for maps is not surprising if we consider that access to rich geo-located information is the fundamental starting point for all the other data-driven companies. These companies typically dispose of a large user base, whose pri-vate data is collected, analyzed and joined with the public information from the map. Companies generate most of their revenu from advertising – e.g. Foursquare, Facebook, Google, . . . – and personalized ads, tuned to the profile of the user, are the most profitable. More recently however data licensing has become a source of revenue too. The data extracted from these huge datasets are sold to other compa-nies or public administrations e.g. Strava Metro provides statistics for roads and city planning among other things. The size of the players in the sector of geolo-cated services should serve as indication of the strategic importance of this field. From the user perspective, location information is often considered the most valu-able among personal informations [SOL+14] and at the same time there is little awareness of the how and when this information is collected. For example, the location of a wifi connected device through a geolocated database of wifi SSIDs is an extremely opaque mechanism for the average user despite being by far the most often used in mobile devices.
State of the Art
In this thesis we consider the problem of a user accessing a LBS while wishing to hide her location from the service provider. We should emphasize that, in con-trast to several works in the literature [GG03, MCA06], we are interested not in hiding the user’s identity, but instead her location. In fact, the user might be au-thenticated to the provider, in order to obtain a personalized service (personalized recommendations, friend information from a social network, etc); still she wishes to keep her location hidden.
Several techniques to address this problem have been proposed in the liter-ature, satisfying a variety of location privacy definitions. A widely-used such notion is k-anonymity (often called l-diversity in this context), requiring that the user’s location is indistinguishable among a set of k points. A different approach is to report an obfuscated location z to the service provider, typically obtained by adding random noise to the real one. Shokri et al. [STT+12] propose a method to construct an obfuscation mechanism of optimal privacy for a given quality loss constraint, where privacy is measured as the expected error of a Bayesian adver-sary trying to guess the user’s location [STBH11].
The main drawback of the aforementioned location privacy definitions is that they depend on the adversary’s background knowledge, typically modeled as a prior distribution on the set of possible locations. If the adversary can rule out some locations based on his prior knowledge, then k-anonymity will be trivially violated. Similarly, the adversary’s expected error directly depends on his prior. As a consequence, these definitions give no precise guarantees in the case when the adversary’s prior is different.
Differential privacy [Dwo06] was introduced for statistical databases exactly to cope with the issue of prior knowledge. The goal in this context is to answer aggregate queries about a group of individuals without disclosing any individual’s value. This is achieved by adding random noise to the query, and requiring that, when executed on two databases x; x0 differing on a single individual, a mech-anism should produce the same answer z with similar probabilities. Differen-tial privacy has been successfully used in the context of location-based systems [MKA+08, HR11, CAC12] when aggregate location information about a large number of individuals is published. However, in the case of a single individual accessing a LBS, this property is too strong, as it would require the information sent to the provider to be independent from the user’s location.
vacy adapted to location-based systems, introduced recently in [ABCP13]. Based on the idea that the user should enjoy strong privacy within a small radius, and weaker as we move away from his real location, geo-indistinguishability requires that the closer (geographically) two locations are, the more indistinguishable they should be. This means that when locations x; x0 are close they should produce the same reported location z with similar probabilities; however the probabilities can become substantially different as the distance between x and x0 increases. The result is that the mechanism protects the accuracy of the real location but reveals that the user is, say, in Paris instead than London, which is appropriate for the kind of applications (LBSs) we are targeting. Together with the privacy definition the authors propose a mechanism to enforce it, the Planar Laplace mechanism, which adds noise to the user’s location drawn from a 2-dimensional Laplace distribution.
Despite the simplicity and effectiveness of geo-indistinguishability together with its Planar Laplace mechanism, it presents two major drawbacks for its appli-cability in practice. First, the repeated use of the mechanism causes a rapid erosion of the privacy guarantee due to the correlation between locations in a trace. This is common to all obfuscation techniques that in general treat privacy as a budget that eventually is depleted, limiting greatly their use over time. The mechanism needs to be more flexible over time, adapting to different budget saving strategies and privacy goals of the user. Second, geo-indistinguishability needs to be tuned to a fixed degree of privacy, while in practice the protection should adapt depending on the characteristics of the user’s location. The privacy need in a dense city is different from a empty countryside and the mechanism needs to be flexible with the user’s movement in space.
Contributions
We show that the correlation in the trace can be exploited in terms of a prediction function that tries to guess the new location based on the previously reported lo-cations. The proposed mechanism tests the quality of the predicted location using a private test; in case of success the prediction is reported otherwise the location is sanitized with new noise. If there is considerable correlation in the input trace, the extra cost of the test is small compared to the savings in budget, leading to a more efficient predictive mechanism.
For an accurate use of the budget we also introduced a budget manager, a component that given a global privacy budget can tune the mechanism to a different level of privacy at each location release. The two strategies that we developed allow, from a fixed privacy budget to prolong the use the mechanism over time, degrading the accuracy of the reported location, or viceversa to limit the noise ad-dition at the cost of a limit number of reported locations. We evaluate the mecha-nism in the case of a user accessing a location-based service while moving around the city of Beijing thanks to two large datasets of real GPS traces, Geolife and TDrive. Using a simple prediction function and two budget spending strategies, optimizing either the utility or the budget consumption rate, we show that the pre-dictive mechanism offers substantial improvements over independently applied noise.
Additionally, for highly recurrent locations such as work and home, we pro-pose the use of geographic fences that on one side are publicly known to the attacker while on the other have no cost in terms of privacy, no matter how many times the mechanism is used (while in these locations). This technique can be used in combination with any other dX -private mechanism to form a fenced mechanism effectively stopping the privacy erosion.
The predictive and fenced mechanisms extends geo-indistinguishability with great flexibility of the budget usage over a prolonged time.
In the second part of this thesis we use semantic information extracted from the OpenStreetMap geographical database to build an elastic mechanism that adapts the privacy protection to the features of the user’s location. We develop an efficient graph-based algorithm to compute the distinguishability metric and test its scala-bility by building an elastic mechanism for the Paris’ wide metropolitan area. The mechanism is evaluated over the datasets of two location-based social networks, Gowalla and Brightkite, and shows a flexible behavior over space, providing ade-quate protection in the city as well as in less dense areas.
Moreover we propose an idea for a more efficient, although less flexible, ver-sion of the elastic mechanism. This mechanism provides a different protection degree, but for larger areas, like tiles covering a map. The tiled mechanism de-spite being more coarse than the elastic, can be computed for a fraction of the resources and this allows for easy integration with existing tools such as Location Guard, a popular browser extension for geo-indistinguishability.
Finally we describe in detail Location Guard and its central role for experi-menting with the techniques developed in this thesis.
Publications
The work presented in this thesis has been partially published in the following conferences. The predictive mechanism, in Section 4.2, has first appeared in the paper A Predictive Differentially-Private Mechanism for Mobility Traces in the proceedings of Privacy Enhancing Technologies Symposium 2014 [CPS14]. The elastic and fenced mechanism appeared in the paper Constructing elastic distin-guishability metrics for location privacy in the proceedings of Privacy Enhancing Technologies Symposium 2015 [CPS15]. Additionally all the mechanisms have been implemented and, together with their evaluation, they are available with an open source license on github [ela]. Finally Location Guard has been available in the different browsers websites for download and its source code is available at [loc] since November 2013.
Plan of the thesis
In Chapter 3.1 we introduce the concepts developed in the literature that are nec-essary to build our work upon. In Chapter 4 we present the predictive mechanism, explaining in details the challenges posed by repetitive use, the various compo-nents of the mechanism and its proofs of privacy and utility. A large section is de-voted to the experimental evaluation of the predictive mechanism with two large datasets of location traces from real users and the comparison with standard geo-indistinguishability. The chapter concludes with the presentation of geographical fences and their integration in the other mechanisms. In Chapter 5 we explain in detail the rigidity of standard geo-indistinguishability and propose our elas-tic mechanism. We propose an efficient and scalable graph-based algorithm to compute the mechanism and perform an extensive evaluation using two datasets of location based social networks. We then introduce the tiled mechanism as a lighter alternative, explaining how we can obtain a approximate elasticity with modest computing resources. We conclude outlining a practical implementation of the tiled mechanism for inclusion in Location Guard. More details on Location Guard can be found in Chapter 6, including motivation and adoption.
Table of contents :
1 Introduction
1.1 They key role of privacy research
1.2 Location Privacy
1.3 State of the Art
1.4 Contributions
1.5 Publications
1.6 Plan of the thesis
2 State of the Art
2.1 k-anonymity
2.1.1 l-diversity and t-closeness
2.1.2 Background knowledge
2.2 Differential Privacy
2.2.1 Compositionality and Budget
2.2.2 Background Knowledge
2.3 Bayesian Approach
2.3.1 min-entropy
2.3.2 g-leakage
2.3.3 Relation with Differential Privacy
2.4 Location Privacy
2.4.1 Location Data
2.4.2 Attacks
2.4.3 k-anonymity
2.4.4 Differential Privacy
2.4.5 Bayesian metrics for Location Privacy
2.4.6 Others metrics
3 Preliminaries
3.1 Location Privacy through reduced Accuracy
3.2 Privacy Definitions
3.2.1 dX -privacy
3.2.2 Geo-indistinguishability
3.2.3 Distinguishability level and
3.2.4 Traces
3.3 Privacy Mechanisms
3.3.1 Laplace Mechanism
3.3.2 Planar Laplace Mechanism
3.3.3 Exponential Mechanism
3.3.4 Independent Mechanism
3.4 Utility
3.4.1 Expected Error
3.4.2 ()-accuracy
4 Repeated Use over Time
4.1 Introduction
4.2 A Predictive dX -private Mechanism
4.2.1 Budget management
4.2.2 The mechanism
4.2.3 Privacy
4.2.4 Utility
4.2.5 Skipping the test
4.3 Predictive mechanism for location privacy
4.3.1 Prediction Function
4.3.2 Budget Managers
4.3.3 Configuration of the mechanism
4.3.4 Evaluation
4.3.5 Future Work
4.4 Incorporating fences in the metric
4.4.1 Fences
4.4.2 Mechanism
4.4.3 Future work
4.5 Conclusion
5 Flexible Use over Space
5.1 Introduction
5.2 An elastic distinguishability metric
5.2.1 Privacy mass
5.2.2 Requirement
5.2.3 Extracting location quality
5.2.4 An efficient algorithm to build elastic metrics
5.3 Evaluation
5.3.1 Metric construction for Paris’ metropolitan area
5.3.2 Evaluation using the Gowalla and Brightkite datasets
5.4 Tiled Mechanism
5.5 Conclusion
6 Location Guard
6.1 A web browser extension
6.2 Desktop and mobile
6.3 Operation
6.4 Adoption
6.5 Future Work
7 Conclusion