A new best practice to protect technology supply chain integrity
This post is authored by Mark Estberg, Senior Director, Trustworthy Computing.
The success of digital transformation ultimately relies on trust in the security and integrity of information and communications technology (ICT). As ICT systems become more critical to economic prosperity, governments and organizations around the world are increasingly concerned about threats to the technology supply chain. These concerns stem from fear that an adversary might tamper with or manipulate products during development, manufacture, or delivery. This poses a challenge to the technology industry: If our products are to be fully trusted, we must be able to provide assurance to our customers that the technology they reviewed and approved before deployment is the same software that is running on their computers.
To increase confidence, organizations have increasingly turned to source code analysis through direct inspection of the supply chain by a human expert or an automated tool. Source code is a set of computer instructions written in a programming language that humans can read. This code is converted (or compiled) into a binary file of instructions—a language of zeroes and ones that machines can process and execute, or executable. This conversion of human-readable code to machine-readable code, however, raises the unsettling question of whether the machine code—and ultimately the software program running on computers—was built from the same source code files that the expert or tool analyzed. There has been no efficient and reliable method to answer this, even for open source software. Until now.
At Microsoft, we have developed a way to definitively demonstrate that a compiled machine-readable executable was generated from the same human-readable source code that was reviewed. It’s based on the concept of a “birth certificate” for binary files, which consists of unique numbers (or hash values) that are cryptographically strong enough to identify individual source code files.
As source code is compiled in Visual Studio, the compiler assigns the source code a hash value generated in such a way that it is virtually impossible that any other code will produce the same hash value. By matching hash values from the compiler to those generated from the examined source code files, we can verify that the executable code did indeed result from the original source code files.
This method is described in more detail in Hashing Source Code Files with Visual Studio to Assure File Integrity. The paper gives a full description of the new Visual Studio switch for choosing a hashing algorithm, suggested scenarios where such hashes might prove useful, and how to use Visual Studio to generate these source code hashes.
Microsoft believes that the technology industry must do more to assure its stakeholders of the integrity of software and the digital supply chain. Our work on hashing is both a way to help our customers and a way to further how the industry is addressing this growing problem:
- This source file hashing can be employed when building C, C++, and C# executable programs in Visual Studio.
- Technology providers can use unique hash value identifiers in their own software development for tracking, processing, and controlling source code files that definitively demonstrate a strong linkage to the specific executable files.
- Standards organizations can include in their best practices the requirement to take this very specific and powerful step toward authenticity.
We believe that capabilities such as binary source file hashing are necessary to establish adequate trust to fulfill the potential of digital transformation. Microsoft is committed to building trust in the technology supply chain and will continue to innovate with our customers, partners and other industry stakeholders.
Practical applications of digital birth certificates
There are many practical applications for our binary source file hashing capability, including these:
- Greater assurance through automated scanning. As an automated analysis tool scans the source code files, it can also generate a hash value for each of the files being scanned. Matching hash values from the compiler with hash values generated by the analysis not only definitively demonstrates that they were compiled into the executable code, but that the source code files were scanned with the approved tool.
- Improved efficiency in identifying vulnerabilities. If a vulnerability is identified in a source file, the hash value of the source file can be used to search among the birth certificates of all the executable programs to identify programs likely to include the same vulnerability.