By Rob Knies, Managing Editor, Microsoft Research
Battling search spam. Streamlining Web-page monitoring. Helping protect online privacy. Enabling the illiterate to use computers.
These are just a few of the ways Microsoft Research is demonstrating its commitment to making the Internet a more secure, easily searchable, user-friendly destination for consumers worldwide.
Each of those goals was featured May 8-12 during the 16th International World Wide Web Conference (WWW 2007), held at the Fairmont Banff Springs Hotel in Alberta’s Banff National Park.
The conference, which attracts innovators, decision-makers, technologists, businesses, and standards bodies from around the globe, is an annual gathering to discuss the future of the Web. And, as is customary, Microsoft Research was fully invested in supporting those efforts.
In its five labs worldwide, Microsoft Research undertakes a wide variety of projects designed to enhance the value of the World Wide Web, in areas as diverse as security, search, user interfaces, data mining, and technology for emerging markets.
Of 111 papers accepted for the conference, 16 (14 percent) were submitted by Microsoft Research, the most of any single organization represented at the event. Four of Microsoft Research’s five worldwide labs had papers accepted, and one of those papers, Wherefore Art Thou R3579? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, received the conference’s Best Paper Award. It was co-authored by Lars Backstrom and Jon Kleinberg of Cornell University in collaboration with Cynthia Dwork, a principal researcher for Microsoft Research Silicon Valley.
Bill Buxton, Microsoft Research principal researcher, served as a plenary speaker on May 11, delivering a commentary on social networking and Web communities entitled Design for the World Narrow Web.
He was hardly alone. Colleague Kentaro Toyama, assistant managing director of Microsoft Research India, participated in a panel discussion on Web Delivery Models for Developing Regions. Susan Dumais, principal researcher for Microsoft Research Redmond, also served as a panelist, on the topic of Searching Personal Content.
A workshop on Adversarial Information Retrieval on the Web included participation by Microsoft Research’s Krysta Svore, Qiang Wu, and Chris J.C. Burges, who, along with Microsoft’s Aaswath Raman, wrote the paper Improving Web Spam Classification using Rank-Time Features. Another paper presented at that workshop was Transductive Link Spam Detection, written by Burges, colleague Dengyong Zhou, and Microsoft’s Tao Tao.
Marc Najork, principal researcher for Microsoft Research Silicon Valley, served as track chair for the Tutorials and Workshops committee. Toyama was deputy chair for the Technology for Developing Regions committee, and Xing Xie, lead researcher for Microsoft Research Asia, was the deputy chair for the Browsers and User Interfaces committee. No fewer than a dozen other Microsoft Research representatives participated as members of various WWW 2007 committees.
Such conference support will be further in evidence in 2008, when the event will be held in Beijing. Hsiao-Wuen Hon, principal researcher and deputy managing director for Microsoft Research Asia, will be the vice general chair for WWW 2008, and Wei-Ying Ma, principal researcher and research manager for the same lab, will be a program chair.
Collaboration, as always, was a hallmark of Microsoft Research’s participation in WWW 2007. Of the 16 papers accepted from the organization, 10 of them featured co-authorships with academic colleagues, representing 12 universities from around the world. Microsoft Research also contributed five poster papers to the conference, and four of those represented collaboration with academic partners.
Stopping Search Spam
Among those academic collaborations was a paper entitled Spam Double-Funnel: Connecting Web Spammers with Advertisers, part of the conference’s Industrial Practice and Experience track. The paper was co-written by Yi-Min Wang, principal researcher of Microsoft Research Redmond’s Cybersecurity and Systems Management research group; Ming Ma, a research software-design engineer in the same group; and Yuan Niu and Hao Chen of the University of California, Davis.
“Our goal is to provide visibility into the complicated structure of the search-spam industry,” Wang says, “to educate the user community and the search industry on how search spammers operate and to suggest how good guys can work together to win the war against the bad guys.”
Search spammers use questionable search-engine-optimization techniques to promote low-quality Web pages into top search results, Wang explains. These attempts waste the time of users, who are conned into visiting junk pages before finding one with useful content.
“In contrast with the common approach to search spam by merely detecting and blacklisting spam pages,” Wang says, “our study pursues a new, ‘follow the money’ strategy by identifying the actual companies and individuals who are involved in the search-spam industry to make money.
“We show that a large part of the search-spam industry is based on advertising syndication, and it can be modeled as a double funnel with five layers. We expose the major players at each level and suggest a more effective anti-spam approach by attacking the bottleneck.”
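To make the “attack the bottleneck” idea concrete, here is a minimal Python sketch. It assumes each observed spam path can be written as a five-element chain with one entity per funnel layer, counts the distinct entities seen at each layer, and picks the narrowest layer as the bottleneck. The layer indices, function names, and toy chains are hypothetical and invented for illustration; they are not data or code from the paper.

```python
from collections import defaultdict

def layer_entities(chains):
    """Collect the distinct entities seen at each layer (0 = entry side)."""
    layers = defaultdict(set)
    for chain in chains:
        for depth, entity in enumerate(chain):
            layers[depth].add(entity)
    return [layers[d] for d in sorted(layers)]

def bottleneck(chains):
    """Return (layer index, entities) for the layer with the fewest entities."""
    layers = layer_entities(chains)
    depth = min(range(len(layers)), key=lambda d: len(layers[d]))
    return depth, layers[depth]

# Hypothetical toy chains: one placeholder entity per layer, wide at both
# ends and narrow in the middle, the "double funnel" shape.
chains = [
    ("L0-a", "L1-a", "L2-a", "L3-a", "L4-a"),
    ("L0-b", "L1-a", "L2-a", "L3-a", "L4-b"),
    ("L0-c", "L1-b", "L2-a", "L3-b", "L4-c"),
    ("L0-d", "L1-b", "L2-a", "L3-b", "L4-d"),
    ("L0-e", "L1-c", "L2-a", "L3-a", "L4-a"),
    ("L0-f", "L1-c", "L2-a", "L3-b", "L4-b"),
]
print([len(s) for s in layer_entities(chains)])  # → [6, 3, 1, 2, 4]
print(bottleneck(chains))  # → (2, {'L2-a'})
```

Counting distinct entities is the simplest width measure; weighting each layer by traffic or revenue would be an equally plausible choice.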
Consolidating Web Updates
Another way to assist Web users is to make it easier for them to monitor pages they have identified as personally useful. This is the idea behind Homepage Live: Automatic Block Tracing for Web Personalization, a WWW 2007 paper co-written by Jie Han, Dingyi Han, and Yong Yu of Shanghai Jiao Tong University, along with Chenxi Lin, Hua-Jun Zeng, and Zheng Chen of Microsoft Research Asia, and presented in the Personalization session of the conference’s Browsers and User Interfaces track.
“We want to enable Web users to mark blocks in Web pages and trace each block through the life of the Web page,” Chen says. “Our application allows users the freedom to virtually mark any block within a Web page and automatically trace the blocks when the pages change.”
The Homepage Live project works like this: A user selects a section of a Web page to track, and a technique called block tracing keeps that selection updated as the page is updated. The user can collect a number of sections of his or her favorite Web pages and assemble those sections on a customized page, thereby keeping abreast of pertinent information as it is updated.
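The article does not say how block tracing matches a marked block against a changed page. As a rough illustration only, the Python sketch below assumes each block is a (DOM path, text) pair and scores candidates by a weighted mix of path similarity and text similarity, using the standard library’s `difflib.SequenceMatcher`. The names `trace_block` and `w_path`, the weights, and the threshold are hypothetical choices, not the Homepage Live algorithm.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """SequenceMatcher similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()

def trace_block(marked, new_blocks, w_path=0.5, threshold=0.5):
    """Return the index of the block in the updated page that best matches
    the user's marked block, or None if no block scores at least `threshold`.

    Blocks are (dom_path, text) pairs. Mixing path and text similarity lets a
    block be found again after its content changes (its path stays similar)
    or after it moves (its text stays similar).
    """
    marked_path, marked_text = marked
    best_index, best_score = None, 0.0
    for i, (path, text) in enumerate(new_blocks):
        score = (w_path * similarity(marked_path, path)
                 + (1 - w_path) * similarity(marked_text, text))
        if score > best_score:
            best_index, best_score = i, score
    return best_index if best_score >= threshold else None

# Hypothetical example: the user marked a headline block yesterday,
# and today its text has changed.
marked = ("html/body/div[@id='news']/ul",
          "Headlines: Rain expected Monday; Markets open higher")
today = [
    ("html/body/header", "Home | Sports | Weather"),
    ("html/body/div[@id='news']/ul",
     "Headlines: Sunny weekend ahead; Markets close lower"),
    ("html/body/footer", "Copyright 2007"),
]
print(trace_block(marked, today))  # → 1
```

Raising `w_path` toward 1 favors layout stability, which suits blocks whose text changes constantly; lowering it favors content, which suits sites that get redesigned.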
“Our application can enhance the Web experience for users,” Chen explains, “by making browsing more efficient. Users no longer need to visit their favorite Web pages repeatedly. They can just mark blocks within their favorite Web pages and organize those blocks into a single page. With those simple steps, users will be able to follow all their favorite Web pages from a single page.”
Helping Protect Privacy on Social Networks
Then there is the winner of the WWW 2007 Best Paper Award, Wherefore Art Thou R3579? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, part of the WWW 2007 Data Mining track’s Mining in Social Networks session.
The paper’s amusing title masks a serious concern. Some social-network sites on the Web have proposed releasing anonymized versions of their communication graphs, in which user names are replaced by meaningless identifiers. Dwork and her Cornell colleagues argue that such anonymization alone does not protect participants’ privacy.
“We described two attacks, one active, one passive,” Dwork says. “The heart of both attacks is to create a small structure in the communication graph that can be recognized. This structure corresponds to a small subgraph, where each vertex is a user account and an edge between vertices indicates communication between the two user accounts.
“Once an attacker has located the structure, she or he can find the connection pattern between any two accounts that are both connected to the structure. For example, a small group of friends can together find out whether Alice and Bob, each of whom is linked to the small group, are in communication with one another.”
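Dwork’s description of the active attack can be illustrated with a small Python sketch. It follows the outline in the quote: the attacker creates k accounts, wires them into a pattern, and links each target to a distinct subset of those accounts. After the anonymized graph is released, the attacker searches for the pattern and reads off whether the targets are connected. The sketch assumes an undirected graph stored as a dict of neighbor sets, assumes no honest user links to the attacker accounts, and uses a plain backtracking search on the pattern’s edges and degrees. The function names and the path-plus-random-edges construction are simplifications chosen for illustration, not the paper’s exact algorithm.

```python
import itertools
import random

def add_edge(g, u, v):
    """Undirected edge in a graph stored as {node: set(neighbors)}."""
    g.setdefault(u, set()).add(v)
    g.setdefault(v, set()).add(u)

def plant_subgraph(g, targets, k, rng, extra_p=0.5):
    """Before release: create k attacker accounts, join them in a path plus
    random extra edges, and link each target to a distinct nonempty subset."""
    first = max(g) + 1
    attackers = list(range(first, first + k))
    for a in attackers:
        g.setdefault(a, set())
    internal = {(i, i + 1) for i in range(k - 1)}  # path keeps the pattern connected
    for i, j in itertools.combinations(range(k), 2):
        if j > i + 1 and rng.random() < extra_p:  # extra edges make the pattern rarer
            internal.add((i, j))
    for i, j in internal:
        add_edge(g, attackers[i], attackers[j])
    subsets = [s for r in range(1, k + 1) for s in itertools.combinations(range(k), r)]
    if len(targets) > len(subsets):
        raise ValueError("too many targets for k attacker accounts")
    links = dict(zip(targets, subsets))
    for t, s in links.items():
        for i in s:
            add_edge(g, t, attackers[i])
    degrees = [len(g[a]) for a in attackers]  # the attacker records these privately
    return internal, links, degrees

def anonymize(g, rng):
    """Replace every node ID with a randomly permuted one."""
    nodes = list(g)
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    perm = dict(zip(nodes, shuffled))
    return {perm[u]: {perm[v] for v in nbrs} for u, nbrs in g.items()}, perm

def find_embeddings(g, k, internal, degrees):
    """All ordered k-tuples whose induced edges and degrees match the pattern."""
    found = []
    def extend(chosen):
        i = len(chosen)
        if i == k:
            found.append(list(chosen))
            return
        # Pattern node i is always adjacent to node i-1 (the path), so only
        # neighbors of the previous choice need to be tried.
        candidates = g if i == 0 else g[chosen[-1]]
        for v in candidates:
            if v in chosen or len(g[v]) != degrees[i]:
                continue
            if all(((j, i) in internal) == (chosen[j] in g[v]) for j in range(i)):
                chosen.append(v)
                extend(chosen)
                chosen.pop()
    extend([])
    return found

def targets_connected(anon, k, internal, links, degrees, t1, t2):
    """After release: locate the pattern, find both targets through their
    unique links to it, and report whether they share an edge.
    Returns None if the pattern or a target cannot be identified uniquely."""
    matches = find_embeddings(anon, k, internal, degrees)
    if len(matches) != 1:
        return None
    emb = matches[0]
    planted = set(emb)
    def locate(subset):
        want = {emb[i] for i in subset}
        hits = [v for v in anon if v not in planted and anon[v] & planted == want]
        return hits[0] if len(hits) == 1 else None
    u, v = locate(links[t1]), locate(links[t2])
    if u is None or v is None:
        return None
    return v in anon[u]

rng = random.Random(7)
g = {}
for u in range(12):
    add_edge(g, u, (u + 1) % 12)  # twelve honest users in a ring
add_edge(g, 0, 6)  # the secret: users 0 and 6 communicate
internal, links, degrees = plant_subgraph(g, [0, 6], k=4, rng=rng, extra_p=0.0)
anon, _ = anonymize(g, rng)
print(targets_connected(anon, 4, internal, links, degrees, 0, 6))  # → True
```

Setting `extra_p=0.0` keeps this toy deterministic. On a large real graph, the random internal edges are what make the planted pattern unlikely to occur by chance, so that the search finds exactly one match.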
Such discoveries could wreak havoc on the implied trust social networks seem to offer.
“Our project,” Dwork says, “is on privacy-preserving analysis of data. The goal is to enable site hosts to reveal interesting information about the social-networking graph hosted on their computers without compromising privacy.”
Enabling Non-Readers to Use Computers
Microsoft Research India has been pursuing intriguing work on enabling illiterate or semi-literate persons to make effective use of PCs. The latest step in the lab’s research is Optimal Audio-Visual Representations for Illiterate Users of Computers, a paper by Indrani Medhi, Archana Prasad, and Toyama of Microsoft Research India, presented in the Communication in Developing Regions session of the conference’s Technology for Developing Regions track.
“We wanted to find out what was the most comprehensible way to represent concepts to a non-literate person,” Medhi explains. “The project was a careful study comparing a variety of different representational types.
“We tested how health symptoms could best be represented, with a subject group of 200 illiterate people. For each, we randomly selected one representation from among 10: text, static drawings, static photographs, hand-drawn animations, and video, each with and without voice annotation.”
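The ten representations in Medhi’s description form a simple 5 × 2 design: five visual forms, each with and without voice annotation. The Python sketch below enumerates them and draws one condition per participant at random. The participant IDs, the `seed` parameter, and per-participant assignment are assumptions for illustration, since the quote does not spell out the randomization procedure.

```python
import itertools
import random

VISUALS = ["text", "static drawing", "static photograph",
           "hand-drawn animation", "video"]
VOICE = [False, True]  # without / with voice annotation

# Every visual form paired with both voice settings: 5 x 2 = 10 conditions.
CONDITIONS = list(itertools.product(VISUALS, VOICE))

def assign_conditions(participants, seed=None):
    """Randomly pick one of the 10 conditions for each participant."""
    rng = random.Random(seed)
    return {p: rng.choice(CONDITIONS) for p in participants}

assignment = assign_conditions(range(200), seed=1)
print(len(CONDITIONS))  # → 10
```

Independent random draws leave group sizes uneven. Shuffling a list that holds each condition 20 times would instead give exactly 20 of the 200 participants per condition.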
Among the study’s findings:
- Voice annotation helped users understand more quickly, but the target population was sometimes confused by the combination of audio and visual information.
- Richer information was not necessarily better understood.
- Various factors influence the comparative effectiveness of dynamic versus static images.
“We hope that the results of our research will improve the design of user interfaces for illiterate and first-time computer users,” Medhi concludes. “The results of this study would apply to help make any Web site comprehensible to illiterate users.”