Microsoft Research Lab – Asia

Molecular Dynamics Simulation Accelerates Research of the Pathogenic Mechanism of COVID-19

Since COVID-19 first broke out, nearly 280 million people worldwide have contracted the disease, and more than 5.4 million people have died as a result, inflicting heavy losses on the global economy and on social life. In contrast, the SARS epidemic in 2003 lasted over a year and resulted in over 8,000 reported cases and over 900 deaths. In 2012, MERS was prevalent mainly in the Middle East. While these are all infectious diseases caused by a coronavirus, why in particular is SARS-CoV-2 so contagious? And how does it infect the human body?

Facing the current battle with COVID-19, scientists from around the world have moved quickly to conduct research on the virus, and this has also further accelerated the integration of new technologies such as AI with the life sciences field. In the past two years, researchers at Microsoft Research Asia have been putting a lot of thought into how they can combine the advantages they have in AI, deep learning, and other computer fields with professional knowledge in the life sciences to contribute their effort to fighting the pandemic. Not long ago, Microsoft Research Asia collaborated with the School of Life Sciences and the Infectious Disease Research Center at Tsinghua University to achieve two important results in the cross-disciplinary research of COVID-19, providing a new direction for understanding the mechanism of the virus.

Research on pathogenic mechanism of COVID-19 yields results and highlights the potential of computational biology

COVID-19 is caused by the SARS-CoV-2 virus. Like other coronaviruses, its surface is studded with a spike glycoprotein, known as the S (spike) protein. For the virus to enter human cells, the S protein needs to bind with receptors on the human cells. The S protein is shaped like the letter “Y.” The vertical S2 region plays a supporting role, and two branches protrude upward from it: the receptor-binding domain (RBD) and the N-terminal domain (NTD). Scientists have found that RBD is the region that directly mediates infection; whether it is in an “up” state or a “down” state determines whether receptor binding can occur. Only when RBD is “up” can it bind the receptor and thereby infect the human body.

Based on this background knowledge, researchers at Microsoft Research Asia raised a series of questions: Now that the function of RBD is known, what can be said about the role that NTD plays in the infection process? Does NTD have a synergistic effect on RBD state changes during virus infection? If we discover the rules surrounding the “up” and “down” states of RBD, would it then be possible to inhibit viral invasion? To answer these questions, researchers hoped to use computational biology—in particular, molecular dynamics simulation technology—to conduct in-depth research on NTD. When they approached Professor Haipeng Gong of Tsinghua University’s School of Life Sciences with this idea, the two parties immediately launched into collaborative research.

Professor Haipeng Gong sharing his research at Microsoft Research Asia

Following analysis, the researchers found that many previous studies simulated only a small portion of RBD or NTD and were therefore unable to model their changes within the context of the entire S protein. Simulation accuracy was also lacking. Although the target is a single protein, the simulation involves millions of atoms, so the computational complexity of the task is considerable. To tackle this, researchers at Microsoft Research Asia used enhanced sampling and acceleration algorithms on a powerful computing platform to build a large-scale, all-atom molecular dynamics simulation model capable of ultra-long molecular dynamics simulations.

  • A large-scale, all-atom model refers to the construction of a complete S protein with millions of atoms, rather than an abstract simulation on 10,000 or 100,000 selected points. Modeling every atom improves simulation accuracy.
  • An extended calculation refers to a run consisting of billions of steps, with each step representing 1 femtosecond (one quadrillionth of a second), for a total simulated time of 20 microseconds.
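As a rough check on the scale of such a run, the step count follows directly from the two figures given in the text: a time step of 1 femtosecond (10^-15 s) and a simulated time of 20 microseconds. The short Python sketch below does that arithmetic with integers; the function and constant names are illustrative only and are not taken from the researchers' software.

```python
# Illustrative arithmetic only; the names below are hypothetical.
# Assumed inputs from the text: 1 fs per step, 20 microseconds of simulated time.

FS_PER_MICROSECOND = 10**9  # 1 microsecond = 1e-6 s = 10^9 femtoseconds


def md_step_count(simulated_microseconds: int, timestep_fs: int) -> int:
    """Return how many integration steps cover the simulated time span."""
    total_fs = simulated_microseconds * FS_PER_MICROSECOND
    return total_fs // timestep_fs


steps = md_step_count(simulated_microseconds=20, timestep_fs=1)
print(f"{steps:,} steps")  # prints "20,000,000,000 steps"
```

Twenty billion steps, each evaluated over millions of atoms, gives a sense of why enhanced sampling and a powerful computing platform were needed.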

In the end, Microsoft Research Asia was the first to propose the “wedge” model, which shows NTD playing a regulatory role in the viral infection process. The results were published as a cover article in the journal Advanced Theory and Simulations in October 2021. Tong Wang, a senior researcher at Microsoft Research Asia, explained the simulation findings, saying, “RBD actually has a tendency to tilt downward because, just as people feel, lying down is definitely more comfortable. But when RBD tries to tilt downward, NTD blocks the gap under RBD like a wedge, enforcing the standing state that is able to infect humans.”

Schematic illustration of the regulatory function of NTD in the conformational change of the S protein of SARS-CoV-2

Using this “wedge” model, the researchers further conducted virtual screening of traditional Chinese medicine compounds found in the Chinese herbal medicine database TCMSP and detected 18 compounds in eight types of traditional Chinese medicine that have strong binding abilities with NTD at the site of action, thus providing reference value for the research and development of COVID-19 drugs.

This manner of using computer simulations to perform biological experiments, and even to make predictions and inferences, is called “dry experimenting.” But biological research still cannot do without “wet experiments,” that is, biological experiments based on molecules, cells, physiological traits, etc. While carrying out the exploratory research on NTD, Tong Wang learned that the teams under Professors Xinquan Wang and Linqi Zhang at Tsinghua University were collaborating on research into the pathogenic mechanism of COVID-19. The three parties decided to collaborate. Through experiments in structural biology and immunology, the teams led by the two Tsinghua University professors found that, compared with other coronaviruses, the mutation at position 372 of the SARS-CoV-2 S protein caused a loss of glycosylation at position 370. This change prompts RBD to maintain a standing position more frequently, thus enhancing the infectivity of the virus. Researchers from Microsoft Research Asia then used molecular dynamics simulation and other computational methods to further analyze how glycosylation at position 370 affects the conformational change of the S protein and the virus’s ability to infect. Ultimately, the conclusion was verified by a combination of dry and wet experiments, and the related paper was accepted by Cell Research, a top journal in the field of biology.

On this forward-looking scientific research carried out by the three parties, Professor Linqi Zhang expressed, “We have found an extremely important point of intersection between computer science and the life sciences. This is the result of having completed a large amount of data analyses, experimental verifications, and predictions. By cooperating with Microsoft Research Asia, we’ve been able to recognize that by connecting the two fields, we can accelerate the discovery of key links in life-related phenomena, solve problems in life sciences, and further understand the field, thereby playing a critical role in the development of new drugs to either block or promote certain life phenomena.”

AI unlocks new directions for life science research and opens up new industries

As Professor Linqi Zhang has articulated, the deep integration of innovative means such as AI and big data with the life sciences is introducing new directions for life science research and even changing the research paradigm of the field. The development of life science research has gone through many stages, from descriptive observations prior to the 20th century to experimental analyses in the 20th century. With the effort of scientists over time, the code of life has been cracked little by little. However, these traditional biological research methods rely on continuous trial and error and accumulation, which is not only costly but also involves long cycles. At the same time, the development of low-level data collection technologies such as genomics and the continuous generation of data from drug trials have led to an explosive growth of biological data. Although this opens up the possibility of personalized targeted drug research and development and precision medicine, the massive amount of data has made it impossible to carry out data sorting, analysis, and mining by manpower alone.

Now, with the improvement of computing power and the refinement of models such as machine learning, big data has steadily improved the research conditions for computational biology and is playing an increasingly important role in basic scientific research. With regard to the merging of AI and the life sciences, Professor Haipeng Gong said, “Are we able to find patterns in the data from wet experiments? Human logic can yield general judgment, but it is not detailed enough. AI can demonstrate its advantages in this regard.” Professor Linqi Zhang agrees. He believes that the life sciences cannot rely on mere feelings; rather, the field must develop towards quantification and precision. “The results seen in wet experiments are often static,” he said, “but all life processes are dynamic, and changes in molecular structures, furthermore, consist of instantaneous reactions that flash by under natural conditions, where the human eye has no chance of seeing them. Certain new algorithms and techniques can play a very important role in molecular dynamics simulation and quantitative evaluation.”

Professor Linqi Zhang (left) and Doctor Tie-Yan Liu (right)

In addition to promoting the development of basic scientific research such as of viruses and pathogenic mechanisms, the merging of computer science and the life sciences may also end up creating a whole new biomedical industry.

Traditional research and development of new drugs is extremely risky and difficult, consisting of long cycles and high costs. In the past decade, the success rate of drug development projects advancing from Phase 1 clinical trials to gaining FDA approval has only been 7.9%.

Professor Linqi Zhang has personal experience in this matter. Not long ago, the amubarvimab/romlusevimab combination therapy for treating COVID-19, whose development he led, was approved by the China National Medical Products Administration (NMPA). He said, “AI can play a huge role at every step of the way in developing a new drug, such as providing support for antibody screening, evaluation, prediction, optimization, etc., shortening development time, and reducing development costs. Additionally, if we can use AI technology to pick up patterns and make predictions based on big data analysis or design antibodies for mutations before the virus mutates, we can then have a head start and turn passive responses into active measures.”

In the future, the seamless connection of dry and wet experiments through cross-disciplinary research in everything from initial research to clinical trials will unite real world issues and theoretical data, ultimately broadening the horizons for the life sciences field.

Cross-disciplinary collaboration is key to breaking through set dimensions

Although cross-disciplinary cooperation between computer science and the life sciences is looking promising, the collaborative process still needs work. The scientists from the two fields are used to working with very different structures of knowledge and language systems. How these industry barriers can be broken down and how a cooperative ecosystem can be jointly built are key questions that need to be answered. The collaborative work carried out so far between Microsoft Research Asia and Tsinghua University has contributed to an accumulation of experience in bringing together these two fields.

So what is the secret to collaboration between scientists from different backgrounds?

First of all, you must know your strengths and your weaknesses and work towards complementing each other. Professor Linqi Zhang has long been focusing on the pathogenesis of major infectious viral diseases such as AIDS, as well as studying antiviral drugs, antibodies, and vaccines; Professor Xinquan Wang’s main research direction is structural biology; and Professor Haipeng Gong has been committed to applying new methods such as molecular dynamics simulation in analyzing large-scale conformational changes of biological macromolecules. All three professors and their teams have a solid background and world-class influence in their respective fields. The professional, cutting-edge insights of these experts in the life sciences provide the foundation for realizing algorithms, and they can help algorithm experts understand the science behind the data. Microsoft, meanwhile, is a platform company with computer technology as its core competency. The company can provide advanced support in artificial intelligence and cloud computing for other fields and disciplines.

Tie-Yan Liu, Assistant Managing Director of Microsoft Research Asia, stated, “Microsoft Research Asia does not have expertise in biology, materials science, physics, or chemistry, and so we need to work closely together with real experts in those fields. During this process, each party will influence and effect change on the other. AI scientists can provide data-based, end-to-end problem-solving ideas that are more effective than traditional scientific computing, and natural science experts can provide unique domain knowledge to enable computing power to be used in a way that conforms with scientific laws.”

Secondly, cross-disciplinary collaboration requires people to ask the most forward-looking and challenging scientific questions. Only cutting-edge topics can give full play to the strengths of both parties, motivating researchers to overcome difficulties and allocate resources rationally. According to Tie-Yan Liu, “Although people believe that AI can play a role in any field, the key lies with how to find the key scientific questions, and this requires domain experts and AI experts to sit down and have discussions in order to uncover the truly important issues.” At the beginning of the cooperation, the researchers from Microsoft Research Asia and the professors and students from Tsinghua University encountered problems such as misaligned expectations and communication gaps. Through regular follow-up meetings and discussions, the two sides gradually came to recognize each other’s strengths as well as the most difficult problems each was dealing with. When experimental results differed, members of the collaboration jointly analyzed the probable causes from different angles, and their trust in each other strengthened over time.

Lastly, there needs to be patience and persistence. Life science research can be a long and tedious process, and oftentimes, basic research cannot bring about direct benefits in a short period of time. Professor Haipeng Gong said, “You must be down-to-earth to be able to do scientific research. In order to solve practical problems in biology, your goal must be to promote scientific development and not simply to publish papers. Microsoft Research Asia, while providing powerful computing resources and AI algorithms, also delivers patience, which is the foundation for cross-disciplinary cooperation.”

During the collaboration, the two parties were able to deepen their understanding of each other’s industry and organization. Prior to starting their work with Microsoft Research Asia, the professors at Tsinghua had harbored some doubt. “In our view, corporate research departments were more oriented towards short-term performance,” said Professor Xinquan Wang. “But after having worked together, we’ve realized that Microsoft Research Asia is a true academic institution, and its academic stances and values are very consistent with Tsinghua’s. Only thus can solid academic research collaboration be carried out.”

Professor Xinquan Wang sharing his research at Microsoft Research Asia

Whether it is using deep learning to optimize air pollution emissions, applying Graphormer to catalyst design, using neural networks for new physical discoveries, or the recent science-related keynote speeches at the top AI conference NeurIPS that sparked huge interest, it all shows that AI for Science has become a trend. The intertwining and collision of computer science and artificial intelligence with fields such as the life sciences, biomedicine, quantum science, and astronomy in a series of basic scientific research projects will inject strong impetus into scientific development.