TPII: tracking personally identifiable information via user behaviors in HTTP traffic

Yi LIU, Tian SONG, Lejian LIAO

Front. Comput. Sci., 2020, Vol. 14, Issue 3: 143801. DOI: 10.1007/s11704-018-7451-z
RESEARCH ARTICLE

Abstract

Mobile applications commonly and actively collect non-critical personally identifiable information (PII) from users’ devices and send it to the cloud, where application service providers (ASPs) use it to provide precise recommendation services. Meanwhile, Internet service providers (ISPs) and local network providers also have a strong need to collect PII for finer-grained traffic control and security services. However, locating PII accurately in massive volumes of network traffic is like looking for a needle in a haystack. In this paper, we address this challenge by presenting an efficient and lightweight approach, TPII, which locates and tracks PII at the HTTP layer rebuilt from raw network traffic. The approach collects only three features from HTTP fields as user behaviors and then builds a tree-based decision model to mine PII efficiently and accurately. Without any prior knowledge, TPII can identify any type of PII from any mobile application, which gives it broad applicability. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51%, respectively, and that a parallel implementation of TPII can mine and label 213 million records within one hour, coming close to supporting 1 Gbps wire-speed inspection in practice. Our approach offers network service providers a practical way to collect PII for better services.
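To put the reported figures in perspective, the short sketch below derives two numbers that the abstract does not state directly: the F1 score implied by the reported precision and recall, and the sustained record rate implied by 213 million records per hour. The function names are illustrative only and are not part of TPII; the derived values are simple arithmetic on the abstract's figures, not results reported by the authors.

```python
# Figures quoted in the abstract.
PRECISION = 0.9172
RECALL = 0.9451
RECORDS_PER_HOUR = 213_000_000


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def records_per_second(records: int, seconds: int = 3600) -> float:
    """Average processing rate over the given time window."""
    return records / seconds


# Derived values (hypothetical helpers, not reported in the paper).
f1 = f1_score(PRECISION, RECALL)
rate = records_per_second(RECORDS_PER_HOUR)

print(f"Implied F1 score: {f1:.4f}")
print(f"Implied sustained rate: {rate:,.0f} records/s")
```

Whether this rate corresponds to 1 Gbps depends on the average size of an HTTP record in the dataset, which the abstract does not give.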

Keywords

network traffic analysis / personally identifiable information / privacy leakage / mobile applications / HTTP

Cite this article

Yi LIU, Tian SONG, Lejian LIAO. TPII: tracking personally identifiable information via user behaviors in HTTP traffic. Front. Comput. Sci., 2020, 14(3): 143801. https://doi.org/10.1007/s11704-018-7451-z

RIGHTS & PERMISSIONS

2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature