For the first time, a Chinese research team has sequenced a complete Chinese genome from telomere to telomere (T2T), producing a high-quality, real human diploid genome. The result is a complete, gapless whole-genome reference sequence (44+XY) that includes the Y chromosome.
The sample originated from the Locust Tree of Hongdong, Shanxi Province, near the ruins of the ancient Tang state established by Emperor Yao thousands of years ago. This led to the team calling the program Tang Yao, and the reference genome T2T-YAO.
Specific Chinese genome required
The human reference genome, the standard used by researchers worldwide, is biased and heavily underrepresents the Chinese population and its genetic diversity.
Research leader Gao Zhancheng found that a number of disease syndromes differed considerably in their clinical manifestations among different ethnic groups.
"To date, all sequencing diagnostic reports for tumors and genetic diseases rely on the U.S. reference genome GRCh37/38," said Gao. "The U.S.-based genome is derived from Africans and Europeans, is incomplete, and hardly representative of the Chinese and broader Asian populations."
"The prevailing opinion posits that the genetic variance between different ethnic groups is only one in a thousand. However, in clinical practice, the actual difference may be much larger than this figure," he said. "As a result, it is necessary for Chinese scientists to create a national reference genome."
Over the past three decades, scientists worldwide have been endeavoring to build a more complete and accurate reference genome for the biomedical research community. Two leading entities, the Human Pangenome Reference Consortium (HPRC) and the T2T Consortium, have already released their inaugural human pangenome references. However, all participating scientists in both the HPRC and the T2T Consortium are from Europe and the U.S.
Choosing appropriate sample
In 2020, Gao assembled a team in Shanxi to develop a Chinese genome reference.
Selecting the appropriate samples marked the initial and crucial phase for the research team. The aim of crafting a Chinese-specific genome is to better serve contemporary medical practices, so the samples need to better represent the genomic characteristics of modern Chinese. As a result, the team picked a healthy male Han Chinese as a sample.
Kang Yu, a member of the research team, emphasized the importance of the sample, saying, "It would better represent the modern Chinese genetic traits." T2T-YAO was designated as the project's primary focus, so the team decided to start with the Han Chinese, the largest ethnic group in the nation.
The team's choice was influenced by a historical belief held by many Chinese, both domestically and abroad, concerning their ancestral migration 600 years ago. Ancestry analysis shows that the majority of T2T-YAO is characteristic of East Asian populations. "We are confident that the genome will serve as an accurate representation of the contemporary Han Chinese population," said Gao.
T2T-YAO showed significant differences between the Chinese and European genomes. A comparison with T2T-CHM13, the human reference genome released by the T2T Consortium in 2022, revealed differences in 11 percent of the sequences and 5 percent of the genes.
Chen Runsheng, an academician at the Chinese Academy of Sciences (CAS), said that the unveiling of the complete Chinese genome sequence will change the previous perception that the genomes of different human populations differ by only one in a thousand.
T2T-YAO release within just two years
The Human Genome Project (HGP) took three decades of work to obtain the complete haploid human genome sequence, including the Y chromosome, but T2T-YAO was completed within just two years.
As assessed by Merqury, the quality value (QV) of T2T-YAO is higher than that of T2T-CHM13. Moreover, T2T-YAO is the first diploid genome of its kind, containing both sets of chromosomes, including the Y chromosome.
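For readers unfamiliar with the metric, the quality value that Merqury reports is Phred-scaled: QV = -10 log10(E), where E is the estimated per-base error rate, so every 10-point gain means ten times fewer errors. The minimal Python sketch below shows only that conversion. The function names and the error rates are illustrative assumptions, not figures from the T2T-YAO or T2T-CHM13 assessments.

```python
import math


def qv_from_error_rate(error_rate: float) -> float:
    """Convert a per-base error rate into a Phred-scaled quality value."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    return -10.0 * math.log10(error_rate)


def error_rate_from_qv(qv: float) -> float:
    """Convert a Phred-scaled quality value back into a per-base error rate."""
    return 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    # Hypothetical error rates, chosen only to illustrate the scale.
    for rate in (1e-4, 1e-6, 1e-7):
        qv = qv_from_error_rate(rate)
        print(f"error rate {rate:.0e} corresponds to QV {qv:.1f}")
```

On this scale, a higher QV for one assembly than for another means fewer estimated consensus errors per base, which is the sense in which the two genomes are compared here.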
"We could assemble T2T-YAO so fast thanks to rapid advances in DNA sequencing and assembly technology, as well as the accumulation of a great deal of technological progress and theoretical knowledge, including from the HGP," said Kang.
"Moving forward, we will conduct further parsing and annotation of T2T-YAO, so that it can be better used in clinical settings," said Kang. He hopes to pioneer sequencing techniques, genomic analyses and diagnostic tools based on an indigenous reference genome to better serve Chinese people, and to promote the development of new drugs in the future.