Experimentation Platform

Articles

Article

External Validity of Online Experiments: Can We Predict the Future?

November 20, 2024

“It is difficult to make predictions, especially about the future” – Yogi Berra (perhaps apocryphal). How well can experiments be used to predict the future? At Microsoft’s Experimentation Platform (ExP), we pride ourselves on ensuring the trustworthiness of our experiments.…

Article

Experimentation in Generative AI: C++ Team’s Practices for Continuous Improvement

November 12, 2024

By Sinem Akinci, Microsoft Developer Division, and Cindy Chiu, Microsoft Experimentation Platform. Generative AI [1] leverages deep learning models to identify underlying patterns and generate original content, such as text, images, and videos. This technology has been applied to various…

Diagram illustrating an A/B test splitting traffic between two backends.

Article

A/B Testing Infrastructure Changes at Microsoft ExP

January 29, 2024

The Experimentation Platform at Microsoft (ExP) has evolved over the past sixteen-plus years and now runs thousands of online A/B tests across most major Microsoft products every month. Throughout this time, we have seen impactful A/B tests on a huge…

Article

How to Evaluate LLMs: A Complete Metric Framework

September 27, 2023

Over the past year, excitement around Large Language Models (LLMs) skyrocketed. With ChatGPT and BingChat, we saw LLMs approach human-level performance in everything from standardized exams to generative art. However, many of these LLM-based features are new and…

Article

A/B Interactions: A Call to Relax

August 2, 2023

If you’re a regular reader of the Experimentation Platform blog, you know that we’re always warning our customers to be vigilant when running A/B tests. We warn them about the pitfalls of even tiny SRMs (sample ratio mismatches), small bits…

CUPED adjusts metrics by the predicted value from a regression of Y on X. The treatment effect estimate has a lower standard error. As a consequence, the estimated confidence intervals are narrower and the power of tests is increased.
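As a rough illustration of the adjustment described above, here is a minimal Python sketch that uses only the standard library. It assumes X is a pre-experiment covariate, that the regression is a simple one-variable least-squares fit on pooled data, and that the data is simulated; the function names (`cuped_adjust`, `diff_and_se`) and the simulated effect size are hypothetical and are not taken from the ExP articles.

```python
import math
import random
import statistics


def cuped_adjust(y, x):
    """Return y - theta * (x - mean(x)), with theta the OLS slope of Y on X."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    # OLS slope from regressing Y on X: cov(X, Y) / var(X).
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    theta = cov_xy / var_x
    # Centering x keeps the overall mean of the metric unchanged.
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


def diff_and_se(values, treated):
    """Difference in means (treatment - control) and its standard error."""
    t = [v for v, flag in zip(values, treated) if flag]
    c = [v for v, flag in zip(values, treated) if not flag]
    diff = statistics.fmean(t) - statistics.fmean(c)
    se = math.sqrt(statistics.variance(t) / len(t) + statistics.variance(c) / len(c))
    return diff, se


# Simulated data (an assumption for illustration only).
rng = random.Random(0)
n = 2000
x = [rng.gauss(10, 2) for _ in range(n)]      # pre-experiment covariate
treated = [i % 2 == 0 for i in range(n)]      # alternating assignment
y = [xi + rng.gauss(0, 1) + (0.2 if t else 0.0) for xi, t in zip(x, treated)]

y_adj = cuped_adjust(y, x)
raw_diff, raw_se = diff_and_se(y, treated)
adj_diff, adj_se = diff_and_se(y_adj, treated)

print(f"raw:   diff={raw_diff:.3f}  se={raw_se:.3f}")
print(f"cuped: diff={adj_diff:.3f}  se={adj_se:.3f}")
```

In this sketch the adjusted metric keeps the same overall mean as the raw metric, while its standard error shrinks because the part of Y explained by X has been removed. A production implementation would likely differ, for example by handling multiple covariates or missing pre-experiment data.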

Article

Deep Dive Into Variance Reduction

November 15, 2022

Variance Reduction (VR) is a popular topic that is frequently discussed in the context of A/B testing. However, it requires a deeper understanding to maximize its value in an A/B test. In this blog post, we will answer questions including:…

Article

For Event-based A/B tests: why they are special

September 26, 2022

An “event-based” A/B test is a method used to test two or more variables during a limited duration. We can use what we learn to increase user engagement, satisfaction, or retention of a product, while also applying our insights to…

Article

STEDII Properties of a Good Metric

April 6, 2022

Good metrics enable good decisions. What makes a metric good? In this blog post we introduce the STEDII (Sensitivity, Trustworthiness, Efficiency, Debuggability, Interpretability, and Inclusivity) framework to define and evaluate the good properties of a metric and of an A/B…

Diagram that shows that quantitative plus qualitative methods equal improved product development.

Article

Measurably improve your product by combining qualitative and quantitative methods

February 11, 2022

Imagine that you have developed a new hypothesis for how to improve the user experience of your product. Now you need to test it. There are many ways that you could approach this. For instance, running an A/B test, engaging…