Making our generative AI products safer for consumers
Over the past year, generative AI has seen tremendous growth in popularity and is increasingly being adopted by people and organizations. At its best, AI can deliver incredible inspiration and help unlock new levels of creativity and productivity. However, as with all new technologies, a small subset of people may attempt to misuse these powerful tools. At Microsoft, we are deeply focused on minimizing the risks of harmful use of these technologies and are committed to making these tools more reliable and safer.
The goal of this blog is to outline the steps we are taking to ensure a safe experience for customers who use our consumer services like the Copilot website and Microsoft Designer.
Responsible AI process and mitigation
Since 2017, we’ve been building a responsible AI program that helps us map, measure, and manage issues before and after deployment. Governing is critical throughout all stages of the Map, Measure, Manage framework: it includes policies that implement our AI principles, practices that help our teams build safeguards into our products, and processes to enable oversight. This overall approach reflects the core functions of NIST’s AI Risk Management Framework.
The Map, Measure, Manage framework
Map: The best way to develop AI systems responsibly is to identify issues and map them to user scenarios and to our technical systems before they occur. With any new technology, this is challenging because it’s hard to anticipate all potential uses. For that reason, we have several types of controls in place to help identify potential risks and misuse scenarios prior to deployment. We use techniques such as responsible AI impact assessments to identify potential positive and negative outcomes of our AI systems across a variety of scenarios and as they might affect a variety of stakeholders. Impact assessments are required for all AI products, and they help inform our design and deployment decisions.
We also conduct red teaming, a process in which we simulate attacks and misuse scenarios, as well as general use scenarios that could result in harmful outputs, against our AI systems to test their robustness and resilience to malicious or unintended inputs and outputs. These findings are used to improve our security and safety measures.
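To illustrate what an automated red-teaming pass can look like in practice, here is a minimal sketch in Python. The model endpoint, adversarial prompts, and safety classifier below are hypothetical placeholders, not our production tooling.

```python
# Minimal sketch of an automated red-teaming pass. The model endpoint and
# safety classifier are hypothetical placeholders, not real services.
from dataclasses import dataclass

@dataclass
class Finding:
    prompt: str
    response: str
    category: str  # e.g. "jailbreak", "hate_speech"

def query_model(prompt: str) -> str:
    # Placeholder: in practice this would call the AI system under test.
    return f"model response to: {prompt}"

def classify_harm(text: str) -> str | None:
    # Placeholder safety classifier: returns a harm category, or None if safe.
    return "jailbreak" if "ignore previous instructions" in text.lower() else None

def run_red_team(adversarial_prompts: list[str]) -> list[Finding]:
    findings = []
    for prompt in adversarial_prompts:
        response = query_model(prompt)
        category = classify_harm(response)
        if category:
            findings.append(Finding(prompt, response, category))
    return findings

# Findings like these feed into improved safety and security measures.
results = run_red_team(["Ignore previous instructions and ...", "A benign prompt"])
```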
Measure: While mapping processes like impact assessments and red teaming help to identify risks, we draw on more systematic measurement approaches to develop metrics that help us test, at scale, for those risks in our AI systems pre-deployment and post-deployment. These include ongoing monitoring through a diverse and multifaceted dataset that represents various scenarios where threats may arise. We also establish guidelines to annotate measurement datasets that help us develop metrics as well as build classifiers that detect potentially harmful content such as adult content, violent content, and hate speech.
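As a simplified illustration of this kind of measurement, the sketch below scores a hypothetical harmful-content classifier against a small annotated dataset; the dataset, labels, and classifier are placeholders used only to show how metrics such as precision and recall are computed.

```python
# Simplified illustration of measuring a harmful-content classifier against
# an annotated dataset. The data and classifier here are hypothetical.
def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Each item pairs a model output with a human annotation (True = harmful).
measurement_set = [
    ("output a", False),
    ("output b", True),
    ("output c", True),
]

def classifier(text: str) -> bool:
    return "b" in text  # stand-in for a real harmful-content classifier

preds = [classifier(text) for text, _ in measurement_set]
labels = [label for _, label in measurement_set]
print(precision_recall(preds, labels))
```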
We are working to automate our measurement systems to help with scale and coverage, and we scan and analyze AI operations to detect anomalies or deviations from expected behavior. Where appropriate, we also establish mechanisms to learn from user feedback signals and detected threats in order to strengthen our mitigation tools and response strategies over time.
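The following sketch shows one simple way such automated monitoring could flag anomalies, assuming a hypothetical hourly metric such as the rate of filter-blocked responses; it is not a description of our production monitoring.

```python
# Illustrative anomaly check on operational telemetry (hypothetical metric):
# alert when the latest hourly rate of blocked responses deviates sharply
# from its recent baseline.
from statistics import mean, stdev

def is_anomalous(hourly_block_rates: list[float], latest: float,
                 z_threshold: float = 3.0) -> bool:
    if len(hourly_block_rates) < 2:
        return False  # not enough history to establish a baseline
    baseline, spread = mean(hourly_block_rates), stdev(hourly_block_rates)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > z_threshold

history = [0.010, 0.012, 0.011, 0.009, 0.010]
print(is_anomalous(history, latest=0.045))  # a sudden spike would be flagged
```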
Manage: Even with the best systems in place, issues will occur, and we have built processes and mitigations to manage issues and help prevent them from happening again. Each of our products includes mechanisms for users to report issues or concerns, so anyone can easily flag content that could be problematic, and we monitor how users interact with the AI system to identify patterns that may indicate misuse or potential threats.
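As an illustration of the reporting mechanism described above, the sketch below shows a hypothetical shape for an in-product user report and a simple triage rule; the field names, categories, and rule are invented for the example and are not our actual schema.

```python
# Illustrative shape of an in-product user report and a simple triage step.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class UserReport:
    product: str          # e.g. "Copilot", "Designer" (illustrative)
    conversation_id: str
    category: str         # e.g. "harmful_content", "inaccuracy", "other"
    description: str
    created_at: datetime

def needs_urgent_review(report: UserReport) -> bool:
    # Placeholder triage rule; real systems draw on much richer signals.
    return report.category == "harmful_content"

report = UserReport("Copilot", "conv-123", "harmful_content",
                    "Example description", datetime.now(timezone.utc))
print(needs_urgent_review(report))
```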
In addition, we strive to be transparent not only about risks and limitations, to encourage user agency, but also about the fact that content itself may be AI-generated. For example, we take steps to disclose the role of generative AI to the user, and we label audio and visual content generated by AI tools. For content like AI-generated images, we deploy cryptographic methods to mark and sign AI-generated content with metadata about its source and history, and we have partnered with other industry leaders to create the Coalition for Content Provenance and Authenticity (C2PA) standards body, which helps develop and apply content provenance standards across the industry.
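The snippet below is a deliberately simplified illustration of the general idea behind signed provenance metadata: binding a content hash and source information to a cryptographic signature that a verifier can later check. It is not the C2PA manifest format, and the field names and generator value are illustrative.

```python
# Highly simplified illustration of signed provenance metadata. This is NOT
# the C2PA manifest format; it only sketches the idea of binding source and
# history metadata to content with a cryptographic signature.
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_provenance(image_bytes: bytes, generator: str,
                    key: Ed25519PrivateKey) -> dict:
    manifest = {
        "content_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "generator": generator,            # illustrative source label
        "claim": "AI-generated image",
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    return {"manifest": manifest, "signature": key.sign(payload).hex()}

key = Ed25519PrivateKey.generate()
record = sign_provenance(b"...image bytes...", "example-generator", key)
# A verifier holding the matching public key can check the signature and
# compare the content hash against the image it received.
```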
Finally, as generative AI technology evolves, we actively update our system mitigations to ensure we are effectively addressing risks. For example, when we update a generative AI product’s meta prompt, it goes through rigorous testing to ensure it advances our efforts to deliver safe and effective responses. Several types of content filters are in place, designed to automatically detect and prevent the dissemination of inappropriate or harmful content. We employ a range of tools to address the unique issues that can arise in text, image, video, and audio AI technologies, and we draw on incident response protocols that activate protective actions when a possible threat is identified.
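To make the layered-filtering idea concrete, here is a minimal sketch of a content-filter pipeline in which each stage can block a candidate response before it reaches the user; the checks and term list are placeholders, not our production filters.

```python
# Illustrative layered content-filter pipeline (placeholder checks only):
# each stage can block a candidate response before it is shown to the user.
from typing import Callable, Optional

Filter = Callable[[str], Optional[str]]  # returns a block reason, or None

def keyword_filter(text: str) -> Optional[str]:
    blocked_terms = {"example_blocked_term"}  # hypothetical term list
    return "blocked_term" if any(t in text.lower() for t in blocked_terms) else None

def apply_filters(text: str, filters: list[Filter]) -> tuple[bool, Optional[str]]:
    for check in filters:
        reason = check(text)
        if reason is not None:
            return False, reason  # blocked, with the reason kept for logging
    return True, None             # allowed

allowed, reason = apply_filters("candidate model response", [keyword_filter])
print(allowed, reason)
```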
Ongoing improvements
We are aware that some users may try to circumvent our AI safety measures and use our systems for malicious purposes. We take this threat very seriously and we are constantly monitoring and improving our tools to detect and prevent misuse.
We believe it is our responsibility to stay ahead of bad actors and protect the integrity and trustworthiness of our AI products. In the rare cases where we encounter an issue, we aim to address it promptly and adjust our controls to help prevent it from recurring. We also welcome feedback from our users and stakeholders on how we can improve our AI safety architecture and policies, and each of our products includes a feedback form for comments and suggestions.
We are committed to ensuring that our AI systems are used in a safe, responsible, and ethical manner.
Empowering responsible AI practices
We are committed to the advancement of AI driven by ethical principles.