Home location inference from sparse and noisy data: models and applications
Tian-ran HU, Jie-bo LUO, Henry KAUTZ, Adam SADILEK
Accurate home location is increasingly important for urban computing. Existing methods either rely on continuous (and expensive) Global Positioning System (GPS) data or suffer from poor accuracy. In particular, the sparse and noisy nature of social media data makes it difficult to pinpoint where people live at scale. We revisit this problem and infer home locations within 100 m × 100 m squares at 70% accuracy for 76% and 71% of active users in New York City and the Bay Area, respectively. To the best of our knowledge, this is the first time home location has been detected at such a fine granularity from sparse and noisy data. Because people spend a large portion of their time at home, our model enables novel applications. As an example, we model people's health at scale by linking their home locations with publicly available statistics, such as education disparity. Results in multiple geographic regions demonstrate both the effectiveness and the added value of our home localization method and reveal insights that eluded earlier studies. In addition, our method lets us discover the real buzz, that is, what people are actually talking about, in the communities where they live.
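The abstract does not describe the model itself. Purely to make the task and the 100 m × 100 m granularity concrete, the following is a minimal Python sketch of a common naive baseline, not the authors' method: snap each geotagged post to a roughly 100 m × 100 m grid cell and take the cell with the most night-time posts as the home estimate. The night window (22:00 to 06:59 local time), the equirectangular projection, and the names `to_cell` and `infer_home_cell` are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from collections.abc import Iterable
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical parameters: the cell size mirrors the 100 m x 100 m squares in
# the abstract; the night window is an assumption, not taken from the paper.
CELL_METERS = 100.0
NIGHT_HOURS = set(range(0, 7)) | {22, 23}  # local hours 22:00-06:59
EARTH_RADIUS_M = 6_371_000.0


def to_cell(lat: float, lon: float, ref_lat: float) -> Tuple[int, int]:
    """Map a coordinate to an approximately 100 m x 100 m grid cell using an
    equirectangular projection centred on ref_lat (adequate at city scale)."""
    y = math.radians(lat) * EARTH_RADIUS_M
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    return (math.floor(x / CELL_METERS), math.floor(y / CELL_METERS))


def infer_home_cell(
    posts: Iterable[Tuple[datetime, float, float]], ref_lat: float
) -> Optional[Tuple[int, int]]:
    """Naive baseline: posts are (local_timestamp, lat, lon) tuples. Returns the
    grid cell with the most night-time posts, or None if there are none."""
    counts = Counter(
        to_cell(lat, lon, ref_lat)
        for ts, lat, lon in posts
        if ts.hour in NIGHT_HOURS
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

A most-frequent-cell rule like this breaks down when a user has only a handful of geotagged posts, which is the sparse and noisy setting the abstract emphasizes; it serves only as a reference point for what "home location within a 100 m square" means operationally.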
Keywords: Home location; Mobility patterns; Healthcare