Forster and colleagues found that the closest type of COVID-19 to the one discovered in bats – type ‘A’, the “original human virus genome” – was present in Wuhan, but surprisingly was not the city’s predominant virus type.
Mutated versions of ‘A’ were seen in Americans reported to have lived in Wuhan, and a large number of A-type viruses were found in patients from the US and Australia.
Zhou et al. (7) recently reported a closely related bat coronavirus, with 96.2% sequence similarity to the human virus. We use this bat virus as an outgroup, resulting in the root of the network being placed in a cluster of lineages which we have labeled “A.”
There are two subclusters of A which are distinguished by the synonymous mutation T29095C. In the T-allele subcluster, four Chinese individuals (from the southern coastal Chinese province of Guangdong) carry the ancestral genome, while three Japanese and two American patients differ from it by a number of mutations. These American patients are reported to have had a history of residence in the presumed source of the outbreak in Wuhan. The C-allele subcluster sports relatively long mutational branches and includes five individuals from Wuhan, two of which are represented in the ancestral node, and eight other East Asians from China and adjacent countries. It is noteworthy that nearly half (15/33) of the types in this subcluster, however, are found outside East Asia, mainly in the United States and Australia.
Two derived network nodes are striking in terms of the number of individuals included in the nodal type and in mutational branches radiating from these nodes. We have labeled these phylogenetic clusters B and C.
For type B, all but 19 of the 93 type B genomes were sampled in Wuhan (n = 22), in other parts of eastern China (n = 31), and, sporadically, in adjacent Asian countries (n = 21). Outside of East Asia, 10 B-types were found in viral genomes from the United States and Canada, one in Mexico, four in France, two in Germany, and one each in Italy and Australia. Node B is derived from A by two mutations
A complex founder scenario is one possibility, and a different explanation worth considering is that the ancestral Wuhan B-type virus is immunologically or environmentally adapted to a large section of the East Asian population, and may need to mutate to overcome resistance outside East Asia.
Type C differs from its parent type B by the nonsynonymous mutation G26144T which changes a glycine to a valine. In the dataset, this is the major European type (n = 11), with representatives in France, Italy, Sweden, and England, and in California and Brazil. It is absent in the mainland Chinese sample, but evident in Singapore (n = 5) and also found in Hong Kong, Taiwan, and South Korea.
February 2020, the first Brazilian was reported to have been infected following a visit to Italy, and the network algorithm reflects this with a mutational link between an Italian and his Brazilian viral genome in cluster C (SI Appendix, Fig. S1). In another case, a man from Ontario had traveled from Wuhan in central China to Guangdong in southern China and then returned to Canada, where he fell ill and was conclusively diagnosed with coronavirus disease 2019 (COVID-19) on 27 January 2020. In the phylogenetic network (SI Appendix, Fig. S2), his virus genome branches from a reconstructed ancestral node, with derived virus variants in Foshan and Shenzhen (both in Guangdong province), in agreement with his travel history. His virus genome now coexists with those of other infected North Americans (one Canadian and two Californians) who evidently share a common viral genealogy. The case of the single Mexican viral genome in the network is a documented infection diagnosed on 28 February 2020 in a Mexican traveler to Italy. Not only does the network confirm the Italian origin of the Mexican virus (SI Appendix, Fig. S3), but it also implies that this Italian virus derives from the first documented German infection on 27 January 2020 in an employee working for the Webasto company in Munich, who, in turn, had contracted the infection from a Chinese colleague in Shanghai who had received a visit by her parents from Wuhan. This viral journey from Wuhan to Mexico, lasting a month, is documented by 10 mutations in the phylogenetic network.
這篇報告還附了一個病毒基因的突變演化系譜:
局部放大:
還附圖說:
Phylogenetic network of 160 SARS-CoV-2 genomes. Node A is the root cluster obtained with the bat (R. affinis) coronavirus isolate BatCoVRaTG13 from Yunnan Province. Circle areas are proportional to the number of taxa, and each notch on the links represents a mutated nucleotide position. The sequence range under consideration is 56 to 29,797, with nucleotide position (np) numbering according to the Wuhan 1 reference sequence (8). The median-joining network algorithm (2) and the Steiner algorithm (9) were used, both implemented in the software package Network5011CS (https://www.fluxus-engineering.com/), with the parameter epsilon set to zero, generating this network containing 288 most-parsimonious trees of length 229 mutations. The reticulations are mainly caused by recurrent mutations at np11083. The 161 taxa (160 human viruses and one bat virus) yield 101 distinct genomic sequences. The phylogenetic diagram is available for detailed scrutiny in A0 poster format (SI Appendix, Fig. S5) and in the free Network download files.