An eye-opening 238 pages of just-released documents from the National Institutes of Health (NIH) disclose that in June 2020, at the request of researchers at China’s Wuhan University, the NIH deleted information about COVID-19 genetic sequencing. The tranche of emails, obtained by the non-partisan group Empower Oversight following a Freedom of Information Act (FOIA) request, reveals the frenzy of activity at the NIH following the deletions and shows that an expert advised then-NIH Director Francis Collins and Dr. Anthony Fauci that the coronavirus driving the global pandemic originated outside the Wuhan food market, contrary to what the Chinese Communist Party (CCP) had asserted.

Requested last summer, the documents obtained by Empower Oversight highlight the circumstances surrounding the significant deletions by the NIH and are in stark contrast to the agency’s “best practices of scientific openness and collaboration.”

Introduction to the Deleted NIH Coronavirus Sequences

The deletions from the NIH’s Sequence Read Archive (SRA) were first brought to light in a preprint published on June 22, 2021, by Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center. Bloom had discovered that public access to the sequences had been removed. He contacted the NIH in June 2021 to discuss his findings, explaining in an email that the gene sequences might help explain how the pandemic began. The NIH remained silent, and Bloom’s preprint, released later that month, prompted several media reports and letters from U.S. Senators. The first paragraph of Bloom’s research paper, titled “Recovery of deleted deep sequencing data sheds more light on the early Wuhan SARS-CoV-2 epidemic,” states:

“The origin and early spread of SARS-CoV-2 remains shrouded in mystery. Here I identify a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that has been deleted from the NIH’s Sequence Read Archive. I recover the deleted files from the Google Cloud, and reconstruct partial sequences of 13 early epidemic viruses. Phylogenetic analysis of these sequences in the context of carefully annotated existing data suggests that the Huanan Seafood Market sequences that are the focus of the joint WHO-China report are not fully representative of the viruses in Wuhan early in the epidemic. Instead, the progenitor of known SARS-CoV-2 sequences likely contained three mutations relative to the market viruses that made it more similar to SARS-CoV-2’s bat coronavirus relatives.”

On July 14, 2021, following Bloom’s article, Empower Oversight filed a FOIA request with the NIH, seeking transparency about controversial deletions from the SRA “within the next 20 days.” With open, global cooperation as the intended goal, the agency operates the database as part of its participation in the International Nucleotide Sequence Database Collaboration (INSDC) to “capture, organize, preserve and present nucleotide sequence data as part of the open scientific record.” Echoing this objective, a statement released by INSDC on SARS-CoV-2 sequence data sharing during the pandemic reinforced the need for mutual effort and transparency. The group asserted, “The global COVID-19 crisis has brought an urgent need for the rapid open sharing of data related to the outbreak.” 

The pandemic made such transparency all the more pressing. After four months without a response from the NIH to its FOIA request, Empower Oversight filed a lawsuit (with an amended complaint) against the agency on Nov. 17, 2021, to force its compliance with the FOIA and obtain the requested documents. Commenting on the 238-page batch of emails finally received, Empower Oversight notes that the NIH’s FOIA staff made significant errors when searching for relevant records and reviewing them for FOIA exemptions, resulting in erroneously redacted content. Nonetheless, the documents shared by the NIH thus far contain crucial new information.

Critical Findings in the 238 Pages of NIH Documents

The recently released emails show that on Mar. 17, 2020, a researcher from Wuhan University submitted genetic sequences to the NIH to upload to the SRA. Then, in early June 2020, the researcher requested that the agency remove them. The NIH, which admittedly funded gain-of-function research in Wuhan, initially declined to delete the data. However, when the researcher made the same request in mid-June, with a different rationale for the deletion, the NIH removed the sequences.

Screenshot / First and second request from researcher at Wuhan University to NIH for deletion of genetic sequence in SRA database.

Interestingly, Empower Oversight notes that “the researcher’s first rationale for removal was compliant with the NIH’s conditions for removal, but his latter rationale was not.” The day after the researcher’s second request, the NIH agreed to the deletion and sought clarification from the Wuhan University researcher on whether the previous submission should also be deleted, despite the agency’s refusal to remove it a week earlier. The researcher responded, indicating they wanted both submissions, as well as all related bioprojects and biosamples, removed.

Screenshot / Confirmation to remove both submissions, bioprojects, biosamples, and SRA objects from NIH database.

The NIH obscured the identity of the Chinese researcher(s) requesting the deletions when it produced the documents through FOIA. Still, according to a story in the New York Times, one of the researchers was Ben Hu of Wuhan University (the FOIA emails also mention a researcher named Aisi Fu). The July 31, 2021, article claims:

“On July 5, more than a year after the researchers withdrew the sequences from the Sequence Read Archive and two weeks after Dr. Bloom’s report was published online, the sequences were quietly uploaded to a database maintained by China National Center for Bioinformation by Ben Hu, a researcher at Wuhan University and a co-author of the Small paper.”

Beyond the NIH’s indifference towards Bloom and its refusal to examine the sequence deletions with him in a transparent process following his June 2021 email, the FOIA documents indicate that the agency appears to have misled reporters about its policy for removing sequences. Off-the-record emails also show that an NIH official guided reporters towards “more favorable” Washington Post coverage of Bloom’s paper and away from the New York Times article because of its “tone.” NIH’s Renate Myles wrote to a reporter at The Hill, “Off the record: we think this WaPo story does a good job characterizing the situation.”

A week after proposing a conversation with the NIH about transparency around the deleted sequences, Bloom pressed the agency about another, separate set of deletions being examined by “an investigative entity.” He noted that this set of deleted sequences had “reappeared” without explanation, and he questioned the NIH about “the puzzling” reappearance of “another previously unreported deletion of pangolin coronavirus sequences at the request of South China Agricultural University.” Again, Bloom did not receive enough information from the NIH to answer his straightforward questions.

Americans Deserve Transparency from the NIH

In releasing these documents, Empower Oversight points out they “raise several questions that need further investigation to answer fully.” The group says one of the most disturbing elements of the emails is proof showing the NIH has refused to participate in a transparent process to examine data on the deleted sequences. The group—which released a COVID-19 origins timeline in Sept. 2021—urges Congress to “press the NIH for answers on why it is stonewalling Senate inquiries and dragging its feet on basic transparency through FOIA,” adding:

“Most importantly, why has NIH refused to examine archival copies of deleted sequences in an open scientific process to determine whether any of that information might be able to shed light on the origins of the COVID-19 pandemic?”

This article was last updated on March 30, 2022.