Cross-device user matching based on massive browse logs: The runner-up solution for the 2016 CIKM Cup

Presented at the CIKM Cup 2016 workshop

As the number and variety of smart devices increase, users may use many devices in their daily lives, and their online activities become highly fragmented. Building an accurate user identity has therefore become a difficult but important problem for advertising companies. The task for Track 1 of the CIKM Cup 2016 was to identify the same user across multiple devices. This paper discusses our solution to the challenge. It comprises three main parts: comprehensive feature engineering, negative sampling, and model selection. For each part we describe the specific steps we took and show how they improve performance. We won second prize in the competition with an F1-score of 0.41669.
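
To make two of the terms above concrete, here is a minimal Python sketch of negative sampling for device-pair classification and of an F1-score computed over predicted pairs. It is an illustration only, not the paper's actual pipeline: the function names, the uniform random sampling strategy, the `ratio` parameter, and the assumption that F1 is computed over unordered device pairs are all hypothetical choices made for this example.

```python
import random


def sample_negative_pairs(positive_pairs, devices, ratio=2, seed=0):
    """Draw random device pairs that are not known matches.

    Hypothetical helper: uniform sampling with a fixed negative-to-positive
    ratio is an assumption, not the strategy described in the paper.
    """
    rng = random.Random(seed)
    positives = {frozenset(p) for p in positive_pairs}
    devices = sorted(devices)
    # Cap the target so the loop terminates even on tiny inputs.
    total_pairs = len(devices) * (len(devices) - 1) // 2
    target = min(ratio * len(positives), total_pairs - len(positives))
    negatives = set()
    while len(negatives) < target:
        a, b = rng.sample(devices, 2)  # two distinct devices
        pair = frozenset((a, b))
        if pair not in positives:
            negatives.add(pair)
    return negatives


def pairwise_f1(predicted_pairs, true_pairs):
    """F1-score over unordered device pairs (assumed evaluation unit)."""
    predicted = {frozenset(p) for p in predicted_pairs}
    truth = {frozenset(p) for p in true_pairs}
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)
```

For example, with four devices and one known match, `sample_negative_pairs([("d1", "d2")], ["d1", "d2", "d3", "d4"])` returns two non-matching pairs that a classifier could be trained against, and `pairwise_f1` scores a set of predicted matches against the ground truth regardless of the order of devices within each pair.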