Last year, 13 gene sequences isolated from people infected with COVID-19 in the early days of the pandemic in China were mysteriously deleted from an online database, but they have now been recovered.
Jesse Bloom, a computational biologist and virus evolution expert at the Fred Hutchinson Cancer Research Center, discovered that the sequences had been deleted from an online database at the request of scientists in Wuhan, China. But with some internet sleuthing, he was able to recover copies of the data stored on Google Cloud.
These sequences have not fundamentally changed scientists’ understanding of the origin of COVID-19, including the fraught question of whether the coronavirus spread naturally from animals to people or escaped in a laboratory accident. But their deletion has added to concerns that the Chinese government’s secrecy has hindered international efforts to understand how COVID-19 emerged.
Bloom’s findings were published in a preprint paper, which has not been peer-reviewed by other scientists and was released on Tuesday. “I think this must be consistent with the attempt to hide the sequence,” he told BuzzFeed News.
Bloom learned of the deleted data after reading a paper by a team led by Carlos Farkas of the University of Manitoba in Canada on some of the earliest gene sequences of SARS-CoV-2. Farkas’ paper described sequences from samples taken from hospital outpatients by Wuhan researchers in a project to develop a diagnostic test for the virus. But when Bloom tried to retrieve the sequences from the Sequence Read Archive (SRA), an online database run by the National Institutes of Health, he received an error message indicating that they had been deleted.
Bloom realized that copies of the SRA data were also maintained on servers run by Google, and he was able to work out the URLs at which the missing sequences could be found in the cloud. In this way, he recovered 13 gene sequences that may help answer questions about how the coronavirus evolved and where it came from.
Bloom found that the deleted sequences, like other sequences collected later outside the city, were more similar to bat coronaviruses (presumed to be the ultimate ancestors of the virus that causes COVID-19) than were the sequences linked to the Huanan (South China) Seafood Market in Wuhan. This adds to earlier suggestions that the seafood market may have been an early victim of COVID-19, rather than the place where the coronavirus first spread from animals to people.
“This is a very interesting study conducted by Dr. Bloom. In my opinion, the analysis is completely correct,” Farkas told BuzzFeed News via email. Scott Gottlieb, the former commissioner of the US Food and Drug Administration, also praised the findings on Twitter.
But some scientists were less impressed. Robert Garry of Tulane University in New Orleans told BuzzFeed News via email: “This does not help the origin debate.” Garry believes that the South China market or other markets in Wuhan may still be the source of COVID-19.
Bloom was one of 18 scientists who in May published a letter criticizing the joint WHO-China study of the origin of SARS-CoV-2. The scientists argued that the WHO-China report failed to give “balanced consideration” to the competing ideas that the coronavirus spread naturally from animals to humans or escaped from a laboratory, a theory the report deemed “extremely unlikely.” After the WHO-China report was published, the governments of the United States and 13 other countries complained that the study “lacked access to complete, original data and samples.”
The deleted virus sequences were first uploaded to the SRA in early March 2020, around the time that researchers led by Yan Li and Tiangang Liu of Wuhan University published a preprint describing their work using gene sequencing to diagnose COVID-19. Just a few days earlier, China’s State Council had ordered that all papers related to COVID-19 be approved by the central government.
The sequences were subsequently withdrawn from the SRA in June, around the time the final version of the paper appeared in a scientific journal. According to the NIH, the authors requested that the sequences be deleted. “The requester stated that the sequence information had been updated, was being submitted to another database, and wanted the data removed from the SRA to avoid version control issues,” NIH spokesperson Amanda Fine told BuzzFeed News via email.
However, it is not clear whether these sequences have been published online in another database.
“There is no reasonable scientific reason for deletion,” Bloom wrote in his preprint, arguing that the sequences may have been “deleted to conceal their existence.” He wrote that this suggests a less than wholehearted effort to trace the early spread of the epidemic.
Although the sequences were deleted, Garry pointed out that the key genetic mutations they contain are still published in a table in the Wuhan team’s final paper. “Jesse Bloom did not discover anything new that is not part of the scientific literature,” Garry told BuzzFeed News, accusing Bloom of writing his preprint in an “unscientific and unnecessarily inflammatory” manner.
Bloom wrote to researchers in Wuhan, asking them why they deleted these sequences, but did not get a response. Li and Liu also did not immediately respond to queries from BuzzFeed News.
This is not the first time scientists have expressed concern about the deletion of data that may help answer questions about the origin of COVID-19. The main database of coronavirus sequence information maintained by the Wuhan Institute of Virology, which has been the focus of speculation about a possible “laboratory leak” of the virus, went offline in September 2019. Researchers studying the origin of the pandemic visited the institute in February and were told that the database, which reportedly included data on 22,000 coronavirus samples and sequence records, had been taken offline after repeated hacking attacks.