Keep data virtually forever with Stretch Database in SQL Server 2016

April 13, 2016

Product
SQL Server 2016

This post was authored by Joe Yong, Sr. Program Manager, SQL Server.

Stretch Database in SQL Server 2016 allows users to keep as much data as they need, for as long as they need, without incurring business service level agreement (SLA) risks or the high costs of enterprise storage. Unlike typical cold data solutions, by leveraging the endless storage and compute capacity of Azure your data is always online and most applications don’t require changes. DBAs only need to enable the database for stretch.

As organizations continue to accumulate massive amounts of data in their transactional systems, much of it eventually becomes cold, and therefore infrequently accessed. Overtime, individual tables can grow to millions or billions of rows and up to terabytes in size.

Stretch Database

For business and regulatory compliance purposes, users need the data to remain online and accessible on-demand. Storage administrators and IT managers are continually looking for ways to meet business goals with inadequate IT budgets. Meanwhile, DBAs are pulled in different directions by these conflicting goals, trying to keep database performance and availability within business SLAs as the database grows, even as maintenance windows and budgets continue to shrink.

Core scenarios

Stretch entire table: If you currently already have a dedicated table for cold data, you can stretch this entire table. For example, you may have an Order_details and an Order_details_history table where the Order_details_history table only contains cold data moved from the Order_details table.
Stretch cold rows: If you have hot and cold data in the same table, you can stretch just the cold rows from the table to Azure. You only need to define which rows are cold (usually by date or status) and SQL Server will take care of the movement.

Benefits

With Azure SQL Stretch Database, you can leverage Azure on your terms:

Get as much enterprise-class storage as needed, when it is needed. Automated backup and geo-redundancy are available by default.
Scale compute and storage resources independently based on workload requirements—and only pay for what is consumed.
Centralize access control via integrated security for customers who federate their on-premises Active Directory with Azure Active Directory.
Leverage existing knowledge and tools such as SQL Server Management Studio, SQL Server Data Tools, T-SQL and PowerShell, while adding enhanced experiences via Azure portal.

And the best part is, most applications will require no code changes in order to take advantage of Stretch Database.

Base requirements

Stretch Database is only available in SQL Server 2016, so users will need to first upgrade their existing SQL Server database to 2016. It works with RC0 or later, which can be downloaded here. Customers will also need an Azure subscription to create a new SQL Server Stretch Database. Any subscription admin, co-admin, owner or contributor account will have this privilege. If you don’t currently have an Azure subscription, sign-up for a free trial subscription. Implementing Stretch Database generally requires little effort if your database does not have unsupported objects or features.

How it works

To get started, first identify either a cold table that you wish to stretch entirely or a table where you will stretch the cold rows. The actual process to enable Stretch Database is simple and can be accomplished via a SQL Server Management Studio wizard or via T-SQL. Details and step-by-step guidance is available on this documentation page.

As part of the Stretch Database process, SQL Server establishes a secure connection to Azure to create a new server (if you choose to do so) and a new Stretch Database (occurs each time a database is stretched). Once you have a table enabled for stretch, SQL Server will create a new table in the Stretch Database created earlier and begin migrating data silently in the background.

SQL Server always performs Stretch Database tasks over a secure channel and validates the target certificate; nothing is ever sent to Azure in clear text. It ignores user defined settings for this specific area to ensure users cannot accidently configure Stretch Database in an un-secure manner.

Data is trickled to Azure instead of a high volume migration. This ensures minimal impact to the production database. The entire system is online and applications continue working against the database throughout the entire process. Enabling Stretch Database does not incur downtime.

Enabling Stretch Database

Queries against a Stretch Database do not change. If the query needs to retrieve data from the remote database, the query processor automatically executes the query and pushes appropriate filters (WHERE clause) to the remote database for execution. Only required rows are returned and joined with local data, if any, and presented to the application/user. If the query does not require data from the remote database, the query processor will not execute the query remotely. This avoids the overhead and latency of the Azure roundtrip.

When Stretch Database is enabled, you can monitor its activity and progress in different ways. We offer Dynamic Management Views (DMVs) that provide current migration activity or schema update status. You can also view the overall status via the Stretch Database Monitor in SSMS (figure below). Full details are documented here.

Stretch Database Monitor in SSMS

After a period of time when a large percentage of the cold data has moved to Azure, you will see that important maintenance operations like backup/restore will take less time and fewer resources. This is because the cold data in Azure no longer affect these operations.

So, how big of an impact can this have? As for impact, that really depends on how large the tables have gotten and what your business SLAs are. The largest single table we’ve found working with customers is 45 billion rows (and growing), with about 99% of that being cold data. All of it needs to be online and query-able at any time, although actual access to cold data is rare. Imagine you were the DBA and were tasked with index maintenance for that table. If that weren’t big enough for you, the largest partitioned table we found with a customer is 1.3 trillion rows (and still growing rapidly).

While these are pretty extreme examples, they do show what the future holds. Most users today are well below the billion-row range, but still face similar challenges.

Next steps

Download the latest release of SQL Server 2016 if you haven’t already done so, and check out Stretch Database running against the new Stretch Database service in Azure. Share you experiences with us and let us know what works and doesn’t work for you.

Finally, if you have a really large table or even a lot of moderately large tables, we’re very interested in working with you. Talk to us in the comment section or on MSDN.