Research Forum | Episode 4

Research Forum Brief | September 2024

Fostering appropriate reliance on AI



“This is where I think it is our responsibility as people working in UX disciplines—as people researching UX and human-computer interaction—to really, really step up to the front and see how it is our moment to shine and to address this problem.”

Mihaela Vorvoreanu, Director UX Research and Responsible AI Education, Microsoft Aether

Transcript: Lightning Talk

Fostering appropriate reliance on AI

Mihaela Vorvoreanu, Director UX Research and Responsible AI Education, Microsoft Aether

Because of their probabilistic nature, all AI systems will make mistakes. One of the main challenges in human-AI interaction is to foster appropriate reliance on AI and empower users of AI systems to determine when to accept or not accept an AI system’s recommendation. Hear about the work we’re doing at Microsoft to foster appropriate reliance and help people accept AI outputs when they are correct and reject them when they are wrong.

Microsoft Research Forum, September 3, 2024

MIHAELA VORVOREANU: Hi, everyone. My name is Mihaela, or Mickey, Vorvoreanu. I lead UX Research and Responsible AI Education in Aether, Microsoft’s research and advisory body on AI ethics and effects in engineering and research. And in a previous life, I was a professor of UX design and research.

During the past few months, I’ve had the privilege of leading a cross-company team of researchers and product builders focused on fostering appropriate reliance on AI, specifically generative AI and our Copilot product. In this working group, we think of fostering appropriate reliance on AI as striking a balance: on one side, not overrelying on AI by accepting its outputs when they are incorrect or incomplete; on the other, not under-relying by ignoring or distrusting AI outputs even when they could be useful. And so across all of us, we have started looking into how we can foster research that leads to improvements in our own products.

My team started looking into the problem of overreliance on AI quite a while back. About two years ago, we released our first review of research literature about overreliance on AI. In that paper, we isolated antecedents, mechanisms, and consequences of overreliance on AI, as well as a series of mitigations that showed promise in the research literature. However, as we know, many such mitigations can backfire, actually increasing overreliance rather than mitigating it.

More recently, we released a second synthesis of research literature, this one focused specifically on generative AI. We find that generative AI makes this tricky problem of overreliance even more difficult for several reasons, one of them being that it is so much more difficult to spot incorrect or incomplete AI outputs, especially when they are formulated so fluently and with such impressive grammar. In this paper, we also looked at some overreliance mitigations. Some of them have been mentioned in the literature before, such as cognitive forcing functions, and others are quite new, involving the use of generative AI to critique existing answers or to stimulate critical thinking in generative AI users.
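To make the idea of a cognitive forcing function concrete, here is a minimal sketch in Python of one pattern from that literature: withholding the AI’s answer until the user has committed to their own judgment first. The names and flow are hypothetical and purely illustrative; they do not describe any Microsoft product or a specific paper.

```python
# Minimal sketch of one cognitive forcing function discussed in the
# overreliance literature: withhold the AI's answer until the user has
# committed to their own judgment first. All names here are hypothetical.

from dataclasses import dataclass


@dataclass
class AssistantTurn:
    question: str
    ai_answer: str  # precomputed model output


def forced_first_pass(turn: AssistantTurn) -> str:
    """Ask the user for their own answer before showing the AI's."""
    user_draft = input(f"{turn.question}\nYour answer first: ").strip()
    if not user_draft:
        # Refuse to reveal the AI output until the user engages;
        # this is the "forcing" part of the function.
        return "Please jot down your own answer before seeing the AI's."
    return (
        f"Your answer: {user_draft}\n"
        f"AI answer:   {turn.ai_answer}\n"
        "Compare the two before accepting either."
    )


if __name__ == "__main__":
    turn = AssistantTurn(
        question="What does the warning notice in your product say?",
        ai_answer="Something like: AI-generated content may be incorrect.",
    )
    print(forced_first_pass(turn))
```

The point of the pattern is not the specific prompt but the added friction, which slows down automatic acceptance of the AI’s output.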

As Eric Horvitz and Abby Sellen point out in a recent opinion piece, using generative AI places a high cognitive burden on regular people during everyday life. Such levels of attention and vigilance were previously expected only of highly trained professionals, such as airline pilots. And so in our group, we wonder how we might make using generative AI products a little bit easier, so people can maximize the benefits and minimize the risks without spending as much mental energy as an airline pilot would.

In our internal research—and here I want to acknowledge my wonderful team members who have done all of this research—we have identified three possible directions. Each one of these is both a problem and an opportunity. The first is that most people, even advanced users of generative AI, don’t have useful mental models of how these technologies work. They mostly think of them as traditional web search, and that mental model doesn’t always serve them well. This points to the opportunity of helping people form useful mental models through AI literacy. We can create AI literacy not only through formal or informal education, but also through responsible communication in journalism and in marketing, and during interaction with a product.

We could do a better job of teaching people about generative AI while they interact with generative AI products. This is where the guidelines for human-AI interaction from the HAX Toolkit, particularly Guidelines 1, 2, and 11, can really come into play. They emphasize how important it is to make clear to users not only the system’s capabilities but also its limitations, and to provide some explanation of how it works so that users can form mental models.

I also invite you to keep an eye on the HAX Toolkit because we have been adding new examples and content related specifically to appropriate reliance. This is one idea of how we could intervene at the user interaction layer to actually foster AI literacy and more useful mental models.

The second direction, and the second research finding, is that overall, people are not inclined to verify AI outputs. Also, one of the most popular strategies used in most products to date is what I like to call the warning sticker strategy, where we might show something like, “AI-generated content might be incorrect.” This is partially useful. People seem to have learned that.

However, this type of notice doesn’t mention that AI-generated content might also be incomplete. And so people might miss altogether the fact that important or useful information is not in the answer in the first place. That also raises the opportunity: how might we get people’s attention, and arouse that attention and vigilance just a little bit, so they know when it is time to check answers and when it is not, especially in more important or high-risk situations?

In the research that we highlight on the working team’s webpage, we show some papers that talk about communicating uncertainty verbally, via text output, or via highlights that might help users spot when it might be time to increase their alertness level and verify outputs more carefully.
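As a rough illustration of the highlighting idea, the sketch below assumes the system already has per-sentence confidence scores (how those scores would be produced varies across the papers and is out of scope here) and simply marks low-confidence sentences so a UI could render them as highlights. The threshold, markup, and example values are hypothetical.

```python
# Minimal sketch of surfacing uncertainty via highlights, assuming
# per-sentence confidence scores already exist. The threshold and the
# "[verify?]" marker are illustrative, not from any shipped product.

from typing import List, Tuple

VERIFY_THRESHOLD = 0.6  # hypothetical cutoff below which a sentence is flagged


def highlight_low_confidence(scored_sentences: List[Tuple[str, float]]) -> str:
    """Wrap low-confidence sentences in a marker that a UI could render
    as a highlight, nudging the user to verify just those spans."""
    rendered = []
    for sentence, confidence in scored_sentences:
        if confidence < VERIFY_THRESHOLD:
            rendered.append(f"[verify? {confidence:.0%}] {sentence}")
        else:
            rendered.append(sentence)
    return " ".join(rendered)


if __name__ == "__main__":
    answer = [
        ("The report covers fiscal year 2023.", 0.92),
        ("Revenue grew by roughly 40 percent.", 0.41),  # low confidence
        ("Figures are drawn from the attached spreadsheet.", 0.85),
    ]
    print(highlight_low_confidence(answer))
```

The design goal is selective vigilance: raise attention only where the system itself is unsure, rather than attaching the same blanket warning to every answer.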

Finally, the third direction is that the user experience of verifying generative AI outputs is rather difficult for many people. The primary UI paradigm we use for this is citing sources, as we would in a research or school paper. Now, this format in itself suggests a level of rigor and trustworthiness that AI-generated outputs may not have; they are not the equivalent of research papers. Because of this signal, people might not be inclined to verify, because what’s really more trustworthy than a research or a school paper?

This raises the opportunity of making the relationship between AI-generated outputs and the information they work with (their grounding data) more transparent. How might we make it easier to verify, to spot discrepancies, to spot incompleteness? We are also looking even further into how we might use LLMs to propose critiques of their own responses or, as we see in some research that we highlight on the webpage, to not just give people a response but stimulate them to engage in critical thinking, which could be a very different paradigm of interacting with generative AI and large language models in particular.
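As a toy illustration of making that relationship more transparent, the sketch below checks each claim in an answer against its grounding passages. A crude word-overlap heuristic stands in for what would realistically be an entailment model or an LLM critic; all names, thresholds, and data here are illustrative assumptions, not features of any product.

```python
# Minimal sketch of linking an AI answer back to its grounding data so a
# user can see which claims to verify. Word overlap is a stand-in for a
# real entailment or critic model; everything below is illustrative.

from typing import Dict, List


def support_score(claim: str, passage: str) -> float:
    """Fraction of the claim's (lowercased) words that appear in the passage."""
    claim_words = {w.strip(".,").lower() for w in claim.split()}
    passage_words = {w.strip(".,").lower() for w in passage.split()}
    if not claim_words:
        return 0.0
    return len(claim_words & passage_words) / len(claim_words)


def audit_answer(claims: List[str], grounding: Dict[str, str]) -> None:
    """Print, for each claim, its best-supported source passage, so a user
    can jump straight to the spot worth verifying."""
    for claim in claims:
        best_source, best = max(
            ((name, support_score(claim, text)) for name, text in grounding.items()),
            key=lambda pair: pair[1],
        )
        flag = "check this" if best < 0.5 else f"see '{best_source}'"
        print(f"- {claim}  [{flag}, overlap={best:.2f}]")


if __name__ == "__main__":
    grounding = {
        "doc1": "The quarterly report lists revenue of 4.2 million dollars.",
        "doc2": "Headcount grew from 120 to 140 employees during the quarter.",
    }
    claims = [
        "Revenue for the quarter was 4.2 million dollars.",
        "Profit margins doubled compared to last year.",  # not grounded
    ]
    audit_answer(claims, grounding)
```

Surfacing the claim-to-source mapping this way makes discrepancies and missing grounding easier to spot than a flat list of citations at the end of an answer.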

Throughout all this, what I would really like to highlight, and I do this with my co-authors in a piece that appeared as an opening article in ACM Interactions not very long ago, is that this is a moment for UX disciplines to shine. As you can see, a lot of these mitigations, a lot of these techniques for fostering appropriate reliance, are UX interventions.

This is where I think it is our responsibility as people working in UX disciplines—as people researching UX and human-computer interaction—to really, really step up to the front and see how it is our moment to shine and to address this problem. That being said, I hope you stay in touch. I hope you follow our research, which we publish on the team’s webpage, and I hope you help us follow your research. So maybe together, we can work towards making progress on this very tricky but important problem. With that, I want to thank you so much for following this presentation. I look forward to working with you.