Editor’s note: This is the second in an ongoing series on moving our network to the cloud internally at Microsoft.
At Microsoft, the Microsoft Digital Employee Experience (MDEE) team—our company IT organization—is using the Azure SDK, Azure Container Instances, and the Azure Compute Gallery to create a platform for deploying our virtual labs into secure, user-defined hub-and-spoke networks in Microsoft Azure. These labs provide isolated environments where our employees can create their own on-demand, scalable virtual machine and network environments for testing and development purposes.
This collection of technologies enables our employees to create virtual lab environments across multiple Azure tenants at scale, using infrastructure as code (IaC) to quickly deploy lab templates from the Azure Compute Gallery.
[Read the first blog in our “Moving our network to the cloud” series.]
ACI for flexibility and scalability
Azure Container Instances (ACI) is a critical component of our provisioning process. ACI is a fully managed service offered by Azure that enables users to deploy and run containerized applications in the cloud without having to manage virtual machines or learn new tools. It offers exceptional flexibility and scalability, making it ideal for managing our virtual labs environment.
ACI enables simplified orchestration of containers, especially when compared to more complex solutions like Kubernetes. ACI offers simple configuration for isolated containers, eliminating the need for deep knowledge of the network stack and the need to create complex YAML-based configurations. This simplicity streamlines the development process, reduces complexity, and ensures that container security measures are always included.
ACI also supports a wide variety of container images, including Docker images pulled from Azure Container Registry, Docker Hub, or other private container registries. In our experience, it scales very well with lightweight .NET Core images.
ACI offers rapid container deployment and orchestration. Our containers are available quickly to coordinate virtual lab deployment and can be shut down promptly when their work is completed. This dynamic allocation ensures that resources are only utilized when necessary. This works well in our stateless workload scenarios and is especially useful for batch processing. It also eliminates the overhead of cluster management tasks and lets us focus on deploying containers immediately.
We configure ACI to ensure graceful region-based failover. ACI offers versatile options for region failover and makes our business continuity and disaster recovery scenarios simple to implement. We use an Azure function to initialize failover groups based on region availability, creating a seamless user experience.
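The core of region-based failover is a selection step: given an ordered list of preferred regions and a view of which ones are currently healthy, pick the first healthy one. The following is a minimal sketch of that step only, not our production implementation. The function name, region list, and health map are all hypothetical, and a real system would obtain health data from a monitoring source.

```python
from typing import Iterable, Mapping, Optional


def pick_region(preferred: Iterable[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the first preferred region that is reported healthy, or None."""
    for region in preferred:
        # Regions missing from the health map are treated as unavailable.
        if healthy.get(region, False):
            return region
    return None


# Hypothetical failover group, ordered from most to least preferred.
failover_group = ["westus2", "eastus", "northeurope"]
health = {"westus2": False, "eastus": True, "northeurope": True}

print(pick_region(failover_group, health))  # eastus
```

Treating unknown regions as unavailable is a deliberately conservative choice in this sketch: a container is never scheduled into a region whose state has not been confirmed.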
We use ACI for data processing, batch jobs, and event-driven functions where the workload varies and can be executed independently from the API services. We use messaging queues like Azure Service Bus to coordinate between the APIs running in Azure Kubernetes Service (AKS) and the background processing tasks in ACI. This configuration ensures that the API services can trigger or communicate with the background processing components when necessary.
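The pattern described here, where an API enqueues work and a background worker picks it up independently, can be illustrated with a small in-process sketch. It uses Python's standard `queue.Queue` purely as a stand-in for a message broker such as Azure Service Bus. The message shape and function names are assumptions made for illustration and do not reflect our actual message contracts.

```python
import queue
import threading


def api_request_lab(jobs: queue.Queue, lab_id: str) -> None:
    """API side: enqueue a provisioning message and return immediately."""
    jobs.put({"lab_id": lab_id, "action": "provision"})


def worker(jobs: queue.Queue, results: list) -> None:
    """Background side: process messages until a None sentinel arrives."""
    while True:
        message = jobs.get()
        if message is None:
            break
        results.append(f"{message['action']}:{message['lab_id']}")


def run_demo() -> list:
    jobs = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for lab_id in ("lab-1", "lab-2"):
        api_request_lab(jobs, lab_id)
    jobs.put(None)  # sentinel: tell the worker to stop
    thread.join()
    return results


print(run_demo())  # ['provision:lab-1', 'provision:lab-2']
```

Because the API only writes to the queue, it never waits on the background work, which is what allows the two sides to scale independently.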
Because ACI scales horizontally and spins up instances quickly, we can continue delivering high performance to our users, even when our system is under heavy load. Our platform creates almost 40,000 ACI instances each month.
The dynamic nature of ACI ensures that the resources are only utilized when necessary, keeping costs at a minimum. Additionally, we initialize containers with the fewest vCPU and memory resources required for their specific tasks to optimize resource allocation and cost tracking.
This fine-grained resource allocation ensures efficient utilization and simplifies cost tracking for each lab deployment, resulting in highly available, high-performing, cost-effective operations.
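One way to think about requesting the fewest vCPU and memory resources a task needs is as a "smallest fit" lookup. The sketch below assumes a hypothetical catalogue of container sizes; the values are illustrative and are not ACI limits or the sizes we actually use.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ContainerSize:
    vcpu: float
    memory_gb: float


# Hypothetical size catalogue; real limits vary by region and configuration.
SIZES = (
    ContainerSize(0.5, 1.0),
    ContainerSize(1.0, 2.0),
    ContainerSize(2.0, 4.0),
    ContainerSize(4.0, 8.0),
)


def smallest_fit(
    need_vcpu: float,
    need_memory_gb: float,
    sizes: Sequence[ContainerSize] = SIZES,
) -> Optional[ContainerSize]:
    """Return the smallest size that satisfies both requirements, or None."""
    candidates = [
        s for s in sizes
        if s.vcpu >= need_vcpu and s.memory_gb >= need_memory_gb
    ]
    return min(candidates, key=lambda s: (s.vcpu, s.memory_gb), default=None)


print(smallest_fit(0.75, 1.5))  # ContainerSize(vcpu=1.0, memory_gb=2.0)
```

Recording the chosen size alongside each lab deployment is one way such a lookup could feed per-lab cost tracking.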
ACI’s serverless infrastructure allows developers to focus on developing their applications, not managing infrastructure. ACI provides the capacity to deploy containers and apply platform updates promptly to ensure security and compliance.
“Getting started with containers can be intimidating, but ACI makes it very simple to deploy a container,” says Justin Song, a senior software engineering manager on the Azure Container Instances team at Microsoft. “With Hyper-V isolation by default, support for burst workloads, and a wide array of powerful capabilities, we can scale to the highest performance applications.”
Azure Compute Gallery for rapid VM provisioning
We use the Azure Compute Gallery to bring efficiency and scalability to VM provisioning for our labs.
Azure Compute Gallery enables us to manage lab virtual machine images globally, with replication across multiple Azure regions.
Managed replication helps us ensure that VM images are readily available wherever our users need them. We're also using custom least recently used (LRU) cache logic on top of the Gallery Image SDK to reduce the costs associated with hosting images across multiple regions. This custom logic cleans up replicas that are no longer in use, reducing costs while still maintaining the accessibility and reliability of our virtual labs.
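To show the general idea behind LRU-based replica cleanup, here is a minimal sketch built on Python's `collections.OrderedDict`. It is not our actual logic and makes no calls to the Gallery Image SDK: the class name, the `max_replicas` limit, and the idea of "touching" a region on each deployment are assumptions chosen for illustration. The caller would be responsible for removing the replicas the sketch reports as evicted.

```python
from collections import OrderedDict
from typing import List


class ReplicaLru:
    """Track regions holding a replica of one image, evicting the least recently used."""

    def __init__(self, max_replicas: int) -> None:
        if max_replicas < 1:
            raise ValueError("max_replicas must be at least 1")
        self.max_replicas = max_replicas
        self._regions = OrderedDict()  # oldest use first, newest use last

    def touch(self, region: str) -> List[str]:
        """Record a deployment in `region`; return regions whose replicas can be removed."""
        self._regions[region] = True
        self._regions.move_to_end(region)  # mark as most recently used
        evicted = []
        while len(self._regions) > self.max_replicas:
            oldest, _ = self._regions.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def regions(self) -> List[str]:
        return list(self._regions)


lru = ReplicaLru(max_replicas=2)
lru.touch("westus2")
lru.touch("eastus")
lru.touch("westus2")             # westus2 becomes the most recently used
print(lru.touch("northeurope"))  # ['eastus']
print(lru.regions())             # ['westus2', 'northeurope']
```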
We allow our users to deploy pre-configured lab environments called templates. We can create versioned labs using Azure Compute Gallery’s versioning capabilities, effectively capturing unique lab configurations at different development stages. This feature enables our users to save and share meticulously crafted lab setups through templates, fostering global collaboration and knowledge sharing.
They can effortlessly create snapshots of their labs, simplifying collaboration, promoting consistency, and providing control over their virtual lab experiences. Azure Compute Gallery’s versioning puts lab management in the hands of our users, offering flexible, streamlined collaboration.
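Azure Compute Gallery image versions are named in a Major.Minor.Patch format, so selecting the newest version of a template means comparing the parts as numbers rather than as strings. The sketch below illustrates that comparison and a simple patch increment for a new snapshot. The helper names are hypothetical, and the increment rule is an assumption, not a description of how our platform assigns versions.

```python
from typing import Iterable, Tuple


def parse_version(name: str) -> Tuple[int, int, int]:
    """Split 'Major.Minor.Patch' into integers; raises ValueError if malformed."""
    major, minor, patch = (int(part) for part in name.split("."))
    return major, minor, patch


def latest_version(names: Iterable[str]) -> str:
    """Return the highest version, compared numerically rather than as text."""
    return max(names, key=parse_version)


def next_patch(name: str) -> str:
    """Return the version name for a new snapshot of the same template."""
    major, minor, patch = parse_version(name)
    return f"{major}.{minor}.{patch + 1}"


versions = ["1.9.0", "1.10.0", "1.2.5"]
print(latest_version(versions))              # 1.10.0
print(next_patch(latest_version(versions)))  # 1.10.1
```

A plain string comparison would rank "1.9.0" above "1.10.0", which is why the sketch parses each part into an integer first.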
Role-based access control (RBAC) provides the core access management functionality for Azure Compute Gallery images. Using RBAC and Azure Active Directory identities, we can share access to images and image versions with other users, service principals, and groups, or restrict it.
Azure SDK for efficient resource orchestration at scale
The Azure SDK for .NET provides the foundation for our platform’s scalability and resource management. We’re using the Azure SDK’s comprehensive set of open-source libraries, tools, and resources to simplify and expedite application and service development in Azure. The Azure SDK enables our development teams to ensure uniform features and design patterns for Azure applications and services across different programming languages and platforms.
Azure SDK packages adhere to common design guidelines—the Azure.Core package that is included in the SDK supplies a broad feature set, including HTTP request handling, authentication, retry policies, logging, diagnostics, and pagination. We’ve used the SDK to develop additional APIs that are easily integrated with other cloud-based services.
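To make the idea of a retry policy concrete, the following sketch shows a generic retry loop with exponential backoff. It illustrates the concept only. It does not use Azure.Core or reproduce its defaults, and `TransientError`, `with_retries`, and the delay values are hypothetical names and numbers chosen for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, such as throttling."""


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1.0s, 2.0s, ...
    raise AssertionError("unreachable")


calls = {"count": 0}


def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("throttled")
    return "ok"


delays = []
print(with_retries(flaky, sleep=delays.append), delays)  # ok [0.5, 1.0]
```

Passing the sleep function as a parameter keeps the example testable without real waiting. Having this kind of behavior supplied by a shared core library is what spares each service client from reimplementing it.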
With the Azure SDK APIs, our developers have a unified interface to Azure services without needing to learn distinct APIs for each resource type. Development and resource management are streamlined across the entire Azure platform.
With a unified approach, we can use the Azure SDK to manage diverse resources across multiple Azure subscriptions and accounts.
Here are some tips for getting started with the Azure SDK, Azure Container Instances, and the Azure Compute Gallery at your company:
- Use ACI to simplify container orchestration with a smaller developer learning curve, especially when compared to more complex solutions like Kubernetes.
- Configure region failover using resources across multiple Azure regions to quickly deploy containers in healthy regions when another region fails. This ensures service continuity and provides a seamless experience for users.
- Use ACI scaling to quickly deploy instances across Azure regions, delivering high performance and availability for heavily loaded systems.
- Configure replication in Azure Compute Gallery to provide global replication management for virtual machine images, ensuring images are readily available to users worldwide.
- Use Azure Compute Gallery versioning capabilities to allow users to capture unique virtual machine configurations at different development stages.
- Access important resources that can help you navigate this process with the Azure SDK. The Azure.Core package in the SDK offers a unified, standardized approach to accessing Azure functionality across various resource types.
- Use the Azure SDK to enable seamless management and deployment of data plane resources at scale across different Azure subscriptions and accounts.
Try out managed identities with Azure Container Instances.
- Check out these best practices and considerations for managing your Azure Container Instances.
- Learn how to manage your Azure security baseline for your Azure Container Instances.
- Discover more about cloud infrastructure and agility at Microsoft.