Mining the Web and the Internet for Accurate IP Address Geolocations
- Chuanxiong Guo
- Yunxin Liu
- Wenchao Shen
- Helen Wang
- Qing Yu
- Yongguang Zhang
IEEE Infocom mini conference
Published by IEEE Communications Society
In this paper, we present Structon, a novel approach that combines Web mining, inference, and IP traceroute to geolocate IP addresses with significantly better accuracy than existing automated approaches. Structon rests on three ideas, which we realize in three corresponding steps. First, we extract geolocation information for Web server IP addresses from Web pages. Second, we devise heuristic algorithms that take these Web server IP addresses and their geolocations as input and improve both the accuracy and the coverage of the IP geolocation database. Third, for the IP segments not covered by the first two steps, we use IP traceroute to identify their access routers. When the location of an access router is known, we can deduce the location of the associated segment, since the segment is co-located with its access router.
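The three steps above can be sketched roughly as follows. This is a minimal illustration only, not the heuristics the paper actually uses: the /24 segment size, the majority-vote rule, the choice of the last hop before the target as the access router, and all function names (`segment_of`, `vote_segments`, `locate_by_access_router`) are assumptions made for the example. The IP addresses come from documentation ranges and the city labels are invented.

```python
from collections import Counter
from ipaddress import ip_network


def segment_of(ip, prefix_len=24):
    """Return the enclosing segment of `ip` as a string (a /24 is assumed here)."""
    return str(ip_network(f"{ip}/{prefix_len}", strict=False))


def vote_segments(web_server_locations, prefix_len=24):
    """Steps 1-2 (sketch): label each segment with the city that most of its
    mined Web server addresses point to. Majority voting is an assumed
    stand-in for the paper's heuristic algorithms."""
    votes = {}
    for ip, city in web_server_locations:
        votes.setdefault(segment_of(ip, prefix_len), Counter())[city] += 1
    return {seg: counts.most_common(1)[0][0] for seg, counts in votes.items()}


def locate_by_access_router(segment_db, router_locations, traceroute_path,
                            target_ip, prefix_len=24):
    """Step 3 (sketch): for a segment missing from `segment_db`, reuse the
    location of its access router, assumed here to be the last traceroute
    hop before the target. Returns None if nothing is known."""
    seg = segment_of(target_ip, prefix_len)
    if seg in segment_db:
        return segment_db[seg]
    hops = [hop for hop in traceroute_path if hop != target_ip]
    if hops and hops[-1] in router_locations:
        return router_locations[hops[-1]]
    return None


# Hypothetical mined (Web server IP, city) pairs.
mined = [
    ("203.0.113.5", "Beijing"),
    ("203.0.113.9", "Beijing"),
    ("203.0.113.77", "Shanghai"),
    ("198.51.100.3", "Guangzhou"),
]
segment_db = vote_segments(mined)

# Hypothetical traceroute to an address in an uncovered segment.
city = locate_by_access_router(
    segment_db,
    router_locations={"192.0.2.1": "Chengdu"},
    traceroute_path=["10.0.0.1", "192.0.2.1", "192.0.2.200"],
    target_ip="192.0.2.200",
)
```

In this sketch a segment already labeled by the Web-mined data is never overridden by traceroute, which mirrors the abstract's statement that traceroute is used only for segments the first two steps do not cover.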
By mining 500 million Web pages collected in China in 2006 (11 percent of the total Web pages in China at that time), we are able to identify the geolocations of 103 million IP addresses. This represents nearly 88 percent of the IP addresses allocated to China as of March 2008. Structon is 87.4 percent accurate at city granularity and up to 93.5 percent accurate at province level. We also used a 10-day Windows Live client log to evaluate our coverage of client IP addresses: Structon identified the geolocations of 98.9 percent of the client IP addresses.
Copyright © 2007 IEEE. Reprinted from IEEE Communications Society. This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.