Search Profiles
It is reasonable to assume that the better a search engine is at matching the intent of the user with relevant content, the more likely the user is to return to the same search engine for information in the future. The efficiency and speed with which modern search engines do exactly this are astounding. Finding information has never been easier. The business model employed by search engines, Internet advertising [43], means that they are not only easy to use but also free. One can truly say that, for the first time, the world's information is literally at one's fingertips.
In this work, a search profile is the history of the queries issued by a user to a specific search engine. In order for a search engine to construct such a profile, it must have the ability to recognise or track users through time.
There are a number of methods that a search engine can employ to achieve this goal; Aljifri et al. [8] highlight two of them:
By recording the IP address associated with the query request. Whether fixed or dynamic, an IP address can be considered Personally Identifiable Information (PII).
By placing a unique cookie on the user's machine through the Web browser, which enables the search engine to recognise the user upon his/her return.
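The two methods above can be illustrated with a minimal sketch, assuming a search engine that keys each profile both on the client IP address and on a cookie it issues itself. The cookie name `uid`, the function `record_query` and the in-memory `profiles` store are hypothetical and chosen for illustration only; the sketch shows why a cookie links queries even when the user's IP address changes.

```python
import secrets
from collections import defaultdict

# Hypothetical cookie name; real search engines choose their own.
COOKIE_NAME = "uid"

# Hypothetical in-memory store: tracking key -> queries issued under that key.
profiles = defaultdict(list)


def record_query(query: str, cookies: dict, ip_address: str) -> dict:
    """Log a query under both tracking keys; return any cookie the response should set."""
    set_cookie = {}
    uid = cookies.get(COOKIE_NAME)
    if uid is None:
        # First visit (or cookies cleared): issue a fresh, unique identifier.
        uid = secrets.token_hex(16)
        set_cookie[COOKIE_NAME] = uid
    # Method 1: queries from the same IP address are linked together.
    profiles["ip:" + ip_address].append(query)
    # Method 2: queries carrying the same cookie are linked, even across IP addresses.
    profiles["cookie:" + uid].append(query)
    return set_cookie


# Usage: the second query arrives from a different IP address but returns the cookie.
first = record_query("flu symptoms", cookies={}, ip_address="198.51.100.7")
uid = first[COOKIE_NAME]
later = record_query("flu remedies", cookies={COOKIE_NAME: uid}, ip_address="203.0.113.9")
print(profiles["cookie:" + uid])  # ['flu symptoms', 'flu remedies']
```

Keyed on IP address alone, the two queries would land in separate profiles; the cookie joins them.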
Another mechanism that may be used to track a user relies on distinguishing characteristics of the Web request, for example the User-Agent header coupled with the browser's dimensions, reported location and number of sessions. Regardless of the technique used, we assume that search engines are aware of, and have implemented, at least one of the available methods.
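A request-based fingerprint of this kind can be sketched as a hash over a fixed set of request characteristics. The attribute set used here (User-Agent, browser dimensions and reported location) and the function name `fingerprint` are assumptions for illustration; the number of sessions is left out of this sketch, since a value that changes between visits would not produce a stable hash.

```python
import hashlib


def fingerprint(user_agent: str, width: int, height: int, location: str) -> str:
    """Derive a pseudo-identifier from request characteristics (hypothetical attribute set)."""
    # Fixed field order so the same inputs always serialise, and hash, identically.
    material = "|".join([user_agent, f"{width}x{height}", location])
    # Truncated SHA-256 digest: short, but practically unique for distinct inputs.
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


ua = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
monday = fingerprint(ua, 1920, 1080, "Cape Town, ZA")
tuesday = fingerprint(ua, 1920, 1080, "Cape Town, ZA")
laptop = fingerprint(ua, 1366, 768, "Cape Town, ZA")
print(monday == tuesday, monday == laptop)  # True False
```

Unlike a cookie, nothing is stored on the user's machine; the identifier is simply recomputed from each request.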
The ability to track a user's queries through time means that, although it is easy for a user to issue queries to a search engine, unless he/she uses a Privacy Enhancing Technology (PET) of some kind, it is just as easy for the search engine to associate each query with his/her profile. The nature of the Web is such that every user of a search engine can be mapped to a unique search profile consisting of the following:
A set of search queries
The time at which each query was issued
Where applicable, the link that was eventually of relevance to the user, i.e., the link he/she chose to follow from the set of links the search engine matched to the original query. Tracking these links is trivial: hyperlinks to relevant content are simply redirected through the search engine.
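The three components of a search profile listed above, together with redirect-based link tracking, can be sketched as follows. The class names, the redirect endpoint `https://search.example/url` and its `q_id`/`url` parameters are hypothetical; real search engines use their own, often more elaborate, redirect schemes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical redirect endpoint on the search engine's own domain.
REDIRECT_BASE = "https://search.example/url"


@dataclass
class ProfileEntry:
    query: str                          # the search query
    issued_at: datetime                 # when the query was issued
    clicked_url: Optional[str] = None   # result followed, if any


@dataclass
class SearchProfile:
    entries: List[ProfileEntry] = field(default_factory=list)

    def log_query(self, query: str) -> int:
        """Record a query with its timestamp; return an id for later click attribution."""
        self.entries.append(ProfileEntry(query, datetime.now(timezone.utc)))
        return len(self.entries) - 1


def result_link(target: str, entry_id: int) -> str:
    """Wrap a result URL so that following it passes through the search engine."""
    return REDIRECT_BASE + "?" + urlencode({"q_id": entry_id, "url": target})


def handle_redirect(profile: SearchProfile, link: str) -> str:
    """Attach the clicked result to its originating query; return the real destination."""
    params = parse_qs(urlparse(link).query)
    entry_id = int(params["q_id"][0])
    target = params["url"][0]
    profile.entries[entry_id].clicked_url = target
    # A real server would now answer with an HTTP redirect to `target`.
    return target


profile = SearchProfile()
qid = profile.log_query("privacy enhancing technologies")
link = result_link("https://en.wikipedia.org/wiki/Privacy-enhancing_technologies", qid)
dest = handle_redirect(profile, link)
print(profile.entries[qid].clicked_url == dest)  # True
```

The user only sees an ordinary result link, yet the click is attributed to the query that produced it, completing the third component of the profile.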
Offline Profiles
It can be argued that search profiles are nothing new, in the sense that they are the online equivalent of a system that has been around for some time.
Consider the library: its users are much like users of a search engine, since they too are looking for information. Online users issue a query to a search engine, which returns a set of links that redirect to Web pages with more information. The onus is then upon the user to scan the content of those Web pages in order to determine whether they are relevant.
Instead of a set of links on a Web page, users in a library have a number of alternatives:
The library may classify its content using the Dewey Decimal System, in which case the user could go directly to the relevant section and begin going through the available sources.
The user could ask the librarian for assistance or perhaps to make a recommendation.
The library may have a computer system in place which has already indexed the content of its books. Finding relevant information would consist of using the system to determine which book is relevant, and then finding the book itself.
With this in mind, the search profile built up by the search engine could be compared to the history of a user in a library. In each system, a request for information results in a persisted record.
The search engine records what the user searched for (a query) and the library records, at the very least, what the user checked out (a book, magazine, journal et cetera). If the information stored by the library and by the search engine were equivalent, then the privacy regulation applicable to a library could be applied to a search engine.
Of course, this is not the case. A search profile is vastly different from the checkout history of a library. In terms of the library scenario, a search engine profile would be equivalent to a librarian following the user around and recording not only what is eventually checked out but, more importantly, how it was found (which books were looked through) and for what purpose (which book provided the most relevance). The reason is simple: with the astounding growth in search engine popularity, search engines are not only the starting point for almost all queries for information, but are also able to track what the user searches for over time, in addition to what was eventually relevant.
1 Aims and Scope
1.1 Introduction
1.2 Outline
2 Privacy Enhancing Technologies
2.1 Introduction
2.2 Privacy
2.3 Anonymity
2.4 Privacy Enhancing Technologies
2.5 Conclusion
3 Search Privacy
3.1 Introduction
3.2 Search
3.3 Search Profiles
3.4 PETs and Search Privacy
3.5 Conclusion
4 Search Privacy Through Anonymity
4.1 Introduction
4.2 Motive
4.3 Background
4.4 The Case for an External Attack
4.5 Assumptions
4.6 The Attack
4.7 A Simulation
4.8 Conclusion
5 Search Privacy Through Personal Control
5.1 Introduction
5.2 P3P
5.3 Trust and P3P
5.4 Proxies and P3P
5.5 Search Privacy and Personal Control
6 Search Privacy Through Private Communication
6.1 Introduction
6.2 TrackMeNot
6.3 Recognising TMN
6.4 Obfuscation and Search
6.5 Conclusion
7 A Search Network
7.1 Introduction
7.2 A Case for Sharing
7.2.1 Analysis of Search Data
7.3 A Search Network
7.4 Formalisation
7.5 Conclusion
8 Conclusion
8.1 Does this enhance search privacy?
8.2 Is the search engine still a threat?