1 Introduction
With the rapid development of mobile networks, location-based services have become popular in people's daily lives. Service providers can recommend profitable services to users by mining their frequent interests or frequently visited places. However, on the one hand, historical data on the Internet can easily leak user-relationship privacy; on the other hand, the historical interests of a user are always bound to time. Therefore, this paper studies a privacy protection method for time-constrained points of interest (PoIs) based on the group relationships of users.
Existing technologies provide different strategies for the privacy protection problem, which can be roughly divided into two categories. The first category [1–3] protects private data by replacing part of the original data with noisy data. A representative strategy of this category is the generalization-based method, which integrates a series of lower-level concepts into higher-level similar ones. The second category [4–6] classifies and divides the data based on its attributes, the type of service provided by third-party applications, and the type of privacy protection. This kind of method usually divides the data into sensitive and non-sensitive parts. A more detailed division can be employed for graph-based data, covering nodes, attributes, edges, etc.
In practical applications, the above two categories of methods can be combined to protect the privacy of user PoIs. Our contributions are as follows. Firstly, we mine inter-user relationships to extract the common PoIs of multiple users within different time dimensions. Secondly, we propose a relational privacy protection algorithm based on the mined inter-user relationships to hide the PoIs of users. Finally, extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform state-of-the-art algorithms.
2 Our model
The model of (m, n)-anonymity is proposed to protect the time-constrained PoIs of users, as stated in Definition 1.
Definition 1 ((m, n)-anonymity). Given a time-constrained data set D of users, (m, n)-anonymity ensures that the number of each vulnerable PoI in D is not less than the required anonymity bound, where m and n denote the number of common PoIs of two users and of multiple users, respectively, which attackers hold as prior knowledge.
Here, a time-constrained data set D contains a set of users U and a set of time-constrained PoIs L, and each point of interest l_i at time-stamp t_j is denoted as l_i^{t_j}.
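To make the notation concrete, the following minimal sketch (in Python, with illustrative names only; the nested-dictionary representation, the identifiers, and the sample values are our assumptions, not the paper's implementation) shows one way to store a Tab.1-style time-constrained data set D and to enumerate the time-stamped PoIs l_i^{t_j} of a user.

```python
# Illustrative representation of a time-constrained data set D (assumed, not the
# paper's implementation): each user maps to {timestamp: PoI}, so a pair
# (poi, timestamp) plays the role of l_i^{t_j}.
from typing import Dict, Set, Tuple

TimePoI = Tuple[str, str]            # (poi, timestamp), i.e. l_i^{t_j}
Dataset = Dict[str, Dict[str, str]]  # user -> {timestamp: poi}

D: Dataset = {                       # hypothetical sample values
    "u1": {"t1": "l1", "t2": "l3", "t3": "l2"},
    "u2": {"t1": "l1", "t2": "l3", "t3": "l5"},
    "u3": {"t1": "l1", "t2": "l4", "t3": "l2"},
}

def time_constrained_pois(D: Dataset, user: str) -> Set[TimePoI]:
    """Return the set of time-stamped PoIs l_i^{t_j} of a user."""
    return {(poi, t) for t, poi in D[user].items()}
```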
3 Privacy protection of user relationship
The privacy protection of user relationships is divided into the anonymity of single-relationships and the anonymity of group-relationships, with respect to the prior knowledge of attackers.
3.1 Anonymity of single-relationship
Assume that an attacker already knows that two users share m common PoIs. The anonymity of single-relationship is to anonymize the vulnerable PoIs, such that the number of anonymized PoIs at each timestamp is more than the bound required by the prior knowledge m.
Consider the time-constrained data set of user PoIs in Tab.1, which contains a set of users U, a set of timestamps T, and a set of PoIs L. Given the prior knowledge m and n, the single-relationship between two users u_i and u_j is calculated as the set R(u_i, u_j) of their common time-stamped PoIs. The anonymity of R(u_i, u_j) is to anonymize these common PoIs, such that the number of common PoIs does not exceed the bound allowed by m and the number of distinct PoIs in each timestamp column is not less than the required bound. Thus, replacing one of the common PoIs can satisfy the constraint of the prior knowledge m. However, a replacement at a timestamp where it would leave the number of distinct PoIs in that column below the required bound does not satisfy the constraint of the prior knowledge m, and is therefore rejected.
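As a sketch of the single-relationship check under the representation assumed in Section 2 (the comparison against m reflects our reading of the constraint, and all function names are illustrative), the common time-stamped PoIs of a pair and the per-timestamp diversity can be computed as follows.

```python
# Sketch of the single-relationship R(u, v) and the two constraints discussed
# above; the exact numeric bounds of the paper's example are assumed.
def single_relationship(D, u, v):
    """R(u, v): common time-stamped PoIs of two users."""
    return time_constrained_pois(D, u) & time_constrained_pois(D, v)

def vulnerable_pair(D, u, v, m):
    """A pair is vulnerable if an attacker who knows m common PoIs can still
    find at least m common PoIs in the published data (assumed reading)."""
    return len(single_relationship(D, u, v)) >= m

def column_diversity(D, t):
    """Number of distinct PoIs appearing in the timestamp column t of Tab.1."""
    return len({pois[t] for pois in D.values() if t in pois})
```

A candidate replacement at timestamp t is rejected when it would drop the column diversity at t below the required bound.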
Further, the selected replacement PoIs are filtered based on the distance between PoIs. The distance should not exceed a threshold theta, which is evaluated over the pairwise distances of PoIs: the larger the distance between an original PoI and its replacement is, the more easily attackers can perceive the modification, so a replacement with a smaller distance is preferred when anonymizing the vulnerable PoIs of a user. Considering the assumed distance matrix of PoIs in Tab.2, a candidate replacement whose distance from the original PoI exceeds theta is discarded in the single-relationship R(u_i, u_j), and only a replacement within the limits of the anonymous distance in Tab.2 is applied.
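A minimal sketch of this distance filter, assuming Tab.2 is stored as a symmetric dict-of-dicts distance matrix and theta denotes the threshold (both names are illustrative):

```python
# Sketch of the anonymous-distance filter over an assumed dict-of-dicts
# distance matrix `dist` such as Tab.2.
def admissible_replacements(dist, poi, candidates, theta):
    """Keep only replacement PoIs within distance theta of the original PoI,
    ordered by increasing distance (smaller distances are harder for an
    attacker to perceive)."""
    within = [c for c in candidates if c != poi and dist[poi][c] <= theta]
    return sorted(within, key=lambda c: dist[poi][c])
```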
3.2 Anonymity of group-relationship
Assume that an attacker already knows that multiple users share n common PoIs. The anonymity of group-relationship is to anonymize the vulnerable PoIs, such that the number of common anonymized PoIs at each timestamp is not less than the bound required by the prior knowledge n.
The anonymity of group-relationship is similar to that of single-relationship. Consider the time-constrained data set of user PoIs in Tab.1 and the prior knowledge m and n: the group relationship of three users u_i, u_j and u_k is calculated as the set R(u_i, u_j, u_k) of their common time-stamped PoIs. Thus, a common PoI of one user in the group can be replaced by another PoI to destroy the attacker's prior knowledge of the group-relationship, within the limits of the prior knowledge and the anonymous distance.
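Under the same assumed representation, a group-relationship is simply the intersection of the members' time-stamped PoIs; the sketch below (illustrative only) reuses time_constrained_pois from Section 2.

```python
# Sketch of the group-relationship R(u_i, ..., u_k) as the intersection of the
# members' time-stamped PoIs.
from functools import reduce

def group_relationship(D, users):
    """Common time-stamped PoIs shared by every user in the group."""
    sets = [time_constrained_pois(D, u) for u in users]
    return reduce(lambda a, b: a & b, sets) if sets else set()
```

Replacing one member's PoI at a timestamp removes that time-stamped PoI from the intersection, which is how the group prior knowledge is destroyed.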
Note that, if n is bigger than m, the prior knowledge is meaningless for the group relationship. Since the anonymity of single-relationship already ensures that the number of common PoIs is not more than m, there does not exist a group prior knowledge of n common PoIs such that n > m.
3.3 Anonymous algorithm of user relationship
The anonymization algorithm for user relationships is presented in Algorithm 1. The inputs are the time-constrained data set of user PoIs D, with users U and PoIs L, the prior knowledge m and n, and a distance matrix of PoIs. The output is an anonymous data set D' derived from D.
Algorithm 1 consists of two subroutines. The first subroutine constructs the user relationships as a tree (Lines 1−5). The tree is rooted at the single-relationships and takes the group-relationships as nodes, where a branch denotes a user linking a single-relationship to a group-relationship (Line 5). The second subroutine anonymizes the user PoIs in the time-constrained data set (Lines 6−11). According to Section 3.2, the prior knowledge is meaningless for the group relationship if n is bigger than m. Therefore, only the user PoIs of single-relationships are anonymized if n > m (Lines 6−7); otherwise, the user PoIs of both single-relationships and group-relationships are anonymized (Lines 9−11). The anonymization strategy is executed over the tree iteratively (Line 11). Once a node has been anonymized, its descendant nodes are verified to check whether the group-relationship still exists. If a descendant node still satisfies the group-relationship, that node is anonymized and its own descendants are verified in turn.
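The control flow described above can be sketched as follows. This is our schematic reconstruction, not the paper's pseudocode: it reuses the helpers sketched in the previous sections, restricts group nodes to three-user groups for brevity, and the greedy replacement step replace_one_common_poi is a placeholder strategy of our own.

```python
# Schematic reconstruction of Algorithm 1's control flow (illustrative only;
# reuses single_relationship, group_relationship, time_constrained_pois and
# admissible_replacements from the earlier sketches).
from itertools import combinations

def replace_one_common_poi(D, users, dist, theta):
    """Greedily replace one common time-stamped PoI of one user in `users`
    by the nearest admissible PoI; return True on success."""
    common = (group_relationship(D, users) if len(users) > 2
              else single_relationship(D, *users))
    all_pois = {p for pois in D.values() for p in pois.values()}
    for poi, t in common:
        for u in users:
            candidates = admissible_replacements(dist, poi, all_pois, theta)
            if candidates:
                D[u][t] = candidates[0]   # break the common PoI at timestamp t
                return True
    return False

def anonymize(D, m, n, dist, theta):
    # Subroutine 1: build the relationship tree, rooted at vulnerable
    # single-relationships, with (three-user) group-relationships as nodes.
    tree = {pair: [g for g in combinations(D, 3) if set(pair) <= set(g)]
            for pair in combinations(D, 2)
            if len(single_relationship(D, *pair)) >= m}
    # Subroutine 2: anonymize single-relationships; if n <= m, also verify
    # and anonymize the descendant group-relationships iteratively.
    for pair, groups in tree.items():
        while (len(single_relationship(D, *pair)) >= m
               and replace_one_common_poi(D, pair, dist, theta)):
            pass
        if n <= m:
            for g in groups:
                while (len(group_relationship(D, g)) >= n
                       and replace_one_common_poi(D, g, dist, theta)):
                    pass
    return D
```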
4 Experimental evaluation
Experiments were conducted on a machine with an Intel i3 3.80 GHz CPU and 4 GB of memory. The Smart Driving Dataset is a real user-trajectory data set that contains the trajectory data of 6,180 users, where the data of each user is stored in a separate text file. The Shanghai PoIs Dataset records the location preferences of users in the city of Shanghai, whose geographic area lies in the range of latitude 30.67°−31.87° and longitude 120.85°−122.13°.
Experimental metrics: the error rate of query (ERQ) denotes the error rate of querying a user PoI at a certain time; the clustering rate of change (CRC) denotes the ratio between an original PoI and its anonymized counterpart; the clustering average rate of change (CARC) is defined as the average of CRC; the change of attributes (CA) denotes the ratio between an original PoI and its anonymized counterpart at a certain time.
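One possible way to compute such metrics over an original and an anonymized data set (in the representation assumed earlier) is sketched below; the concrete formulas are our reading of the informal definitions above, not the paper's.

```python
# Illustrative metric computation; the exact formulas used in the paper are
# assumed from the informal definitions above.
def evaluation_metrics(original, anonymized):
    """original / anonymized: user -> {timestamp: poi} dictionaries."""
    cells = [(u, t) for u in original for t in original[u]]
    changed = {(u, t) for u, t in cells if anonymized[u][t] != original[u][t]}
    erq = len(changed) / len(cells)                  # ERQ: changed cells yield wrong query answers
    crc = {u: sum(1 for (x, _) in changed if x == u) / len(original[u])
           for u in original}                        # CRC per user
    carc = sum(crc.values()) / len(crc)              # CARC: average of CRC
    timestamps = {t for _, t in cells}
    ca = {t: sum(1 for (_, x) in changed if x == t) /
             sum(1 for _, x in cells if x == t)
          for t in timestamps}                       # CA per timestamp
    no_unmodified = len(cells) - len(changed)        # NO: unmodified PoIs
    return erq, crc, carc, ca, no_unmodified
```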
The experimental evaluations of these metrics on different scales of data tables are shown in Fig.1, and the corresponding settings are listed in Tab.3. The smart driving and Shanghai PoI data sets are divided by time and by location, respectively. Fig.1 shows that the values of CARC, ERQ and ACA exhibit an overall upward trend as m increases, while the number of unmodified points of interest (NO) shows a downward trend. The upward trend means that the anonymized data has worse data and query availability, and the decreasing NO indicates more laborious anonymization operations. In general, the larger the value of m is, the worse the data availability of the anonymized data is.
5 Conclusions
This paper designed an algorithm to protect the relationship privacy exposed by user PoIs. The experimental evaluation shows that our method can protect time-constrained PoIs under the assumed prior knowledge of attackers.