Building and sharing Jupyter Books in Azure Data Studio
The notebook experience in Azure Data Studio allows users to create and share documents containing live code, execution results, and narrative text. Potential usage includes data cleaning and transformation, statistical modeling, troubleshooting guides, data visualization, and machine learning. Jupyter books compile a collection of notebooks into a richer experience with more structure and a table of contents. In Azure Data Studio we are able not only to use Jupyter books but also create and share them. Learn the basics of notebooks in Azure Data Studio from the documentation and read on to learn how to leverage a GitHub Action to publish and share remote Jupyter books.
Create a Jupyter Book
Use the command “Jupyter Books: Create Book (Preview)” to launch a preview experience for creating a Jupyter book through an Azure Data Studio notebook. The notebook handles the installation of any Python dependencies and prompts for the location of notebooks and markdown files to be compiled into a Jupyter book.
Jupyter books can be shared widely with low end-user friction through two methods: remote books and Jupyter book extensions. The former requires specifically formatted GitHub releases, and the latter requires packaging an extension containing the Jupyter book. The rest of this post explores remote Jupyter books, including a GitHub action that facilitates creating them.
Accessing Remote Jupyter Books
Remote Jupyter books load a Jupyter book from a public repository into Azure Data Studio, either in the currently open folder or a temporary location. Adding a remote Jupyter book to Azure Data Studio starts through the action menu in the notebooks pane.
The resulting dialog has a text input for the repository URL where a remote Jupyter book is hosted. Once the repository URL is input, you are presented with the Jupyter book releases hosted in that repository. A single repository can host multiple releases and multiple Jupyter books, whether they are variations for different SQL engines, use cases, or documentation language.
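As a rough illustration of the information such a dialog works with, the sketch below lists a repository's releases and their attached archives through the public GitHub REST API. It is not Azure Data Studio's own implementation; the helper names are hypothetical, and filtering on .zip and .tar.gz assets is an assumption made for the example.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def releases_url(owner_repo: str) -> str:
    """Build the REST endpoint that lists releases for an 'owner/repo' string."""
    return f"{API_ROOT}/repos/{owner_repo}/releases"


def summarize_releases(releases: list) -> list:
    """Reduce the API payload to (release title, [archive asset names]) pairs."""
    summary = []
    for release in releases:
        archives = [
            asset["name"]
            for asset in release.get("assets", [])
            # Assumption: only compressed archives are candidate books.
            if asset["name"].endswith((".zip", ".tar.gz"))
        ]
        # Fall back to the tag when the release has no title.
        title = release.get("name") or release.get("tag_name")
        summary.append((title, archives))
    return summary


def fetch_releases(owner_repo: str) -> list:
    """Download the release list (unauthenticated, so rate limits apply)."""
    request = urllib.request.Request(
        releases_url(owner_repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling summarize_releases(fetch_releases("Microsoft/tigertoolbox")) would return one entry per release in that repository, which is roughly the list a user chooses from.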
Creating a Remote Jupyter Book
The remote Jupyter books feature in Azure Data Studio is an integration with GitHub releases, and it follows that creating a remote Jupyter book is a variation of creating a GitHub release. The remote book release on GitHub requires the Jupyter book to be attached as both a .zip archive and a .tar.gz archive for full cross-platform compatibility. The Azure Data Studio fields for book name, version number, and language are populated from the GitHub release title and the name of the uploaded compressed Jupyter book.
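To make the relationship between an archive name and those fields concrete, here is a minimal sketch of how such a name could be split. It assumes the name has the form book-version-language with hyphen separators, in that order; the function and type names are hypothetical, and Azure Data Studio's actual parsing may differ.

```python
from typing import NamedTuple

ARCHIVE_SUFFIXES = (".tar.gz", ".zip")


class BookArchive(NamedTuple):
    book: str
    version: str
    language: str


def parse_archive_name(filename: str) -> BookArchive:
    """Split 'book-version-language.<ext>' into its three fields."""
    for suffix in ARCHIVE_SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        raise ValueError(f"not a .zip or .tar.gz archive: {filename}")

    # Split from the right so a book name may itself contain hyphens.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected book-version-language, got: {stem}")
    return BookArchive(*parts)
```

Splitting from the right is a design choice for this sketch: it keeps a hyphenated book name intact as long as the version and language fields contain no hyphens.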
This release can be created through the GitHub release interface after manually creating the Jupyter book .zip and .tar.gz archives. The naming scheme for the archive files is crucial: hyphens separate the book, version, and language parameters. Once the release is published on GitHub, it is available in Azure Data Studio as a remote book.
Sharing this Jupyter book with users is now as straightforward as giving them the repository URL, such as “repos/Microsoft/tigertoolbox”. While the process of creating a remote Jupyter book might seem daunting, it can be streamlined with GitHub Actions.
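The manual archiving step can be scripted. The following sketch uses Python's standard shutil.make_archive to produce both archives from a book folder. The function name is hypothetical, the book-version-language order is assumed from the description above, and placing the book's contents at the archive root (rather than inside a top-level folder) is also an assumption to verify against a working release.

```python
import shutil
from pathlib import Path


def package_book(book_dir, out_dir, book, version, language):
    """Create <book>-<version>-<language>.zip and .tar.gz from book_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    # Assumption: hyphen-separated name in book, version, language order.
    base_name = out_path / f"{book}-{version}-{language}"

    archives = []
    for fmt in ("zip", "gztar"):  # "gztar" produces a .tar.gz file
        created = shutil.make_archive(str(base_name), fmt, root_dir=str(book_dir))
        archives.append(Path(created))
    return archives
```

The output folder should sit outside the book folder so the archives do not end up inside themselves. The two resulting files can then be attached to the release in the GitHub interface.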
Automating a Remote Jupyter Book Release
GitHub Actions is a hosted automation service that runs software development workflows right from a repository on GitHub. Prepackaged actions are available on the GitHub Marketplace and can be combined to create custom workflows. To expedite the creation of a GitHub release for a remote Jupyter book, an action is available that takes inputs similar to those in the Azure Data Studio interface and creates the corresponding GitHub release.
The “Remote Jupyter Book Publish” action pairs nicely with a manual trigger, also known as workflow_dispatch, which creates a form in GitHub with inputs for each of the remote book variables. GitHub Actions workflows are defined in YAML files stored in the /.github/workflows directory of the repository. With the workflow_dispatch trigger, inputs can be defined and made available to repository maintainers through a familiar-looking form in GitHub. Entering the required inputs and clicking “run workflow” creates a release for a remote Jupyter book on the repository.
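A workflow with a workflow_dispatch trigger can also be started without the form, through the GitHub REST API's workflow dispatch endpoint. The sketch below only builds the request. The workflow file name and the input names (book, version, language) are hypothetical placeholders; the real input names are whatever the workflow file in the repository declares.

```python
import json
import urllib.request


def build_dispatch_request(owner_repo, workflow_file, token, inputs, ref="main"):
    """Build a POST request that triggers a workflow_dispatch run."""
    url = (
        f"https://api.github.com/repos/{owner_repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # "ref" is the branch or tag whose copy of the workflow file is run.
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical values for illustration only.
request = build_dispatch_request(
    "example-owner/example-repo",
    "publish-book.yml",
    token="<personal-access-token>",
    inputs={"book": "sample", "version": "1.0", "language": "EN"},
)
# Sending it would be: urllib.request.urlopen(request)
```

The token needs permission to run workflows on the repository, so for occasional releases the form in the GitHub interface remains the simpler route.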
This complete sample is available in the documentation on the GitHub marketplace where the action “Remote Jupyter Book Publish” is now available in preview. Check out the GitHub action in use on this sample repository. To learn more about GitHub actions, check out the quickstart documentation.