News Provenance: Revealing News Text Reuse at Web-Scale in an Augmented News Search Experience
The media industry has a practice of reusing news content, which may surprise news consumers. Whether reuse happens by agreement or through plagiarism, the lack of explicit citations makes it difficult to understand where news comes from and how it spreads. We reveal news provenance by reconstructing the history of near-duplicate news in the web index, identifying the origins of republished content and the impact of original content. Aggregating provenance information and presenting it as part of news search results may help users make more informed decisions about which articles to read and which publishers to trust. We report on early analysis and user feedback, highlighting the critical tension between the desire for media transparency and the risks of disrupting an already fragile ecosystem.