The Human Genome Project released the first human genome sequences only about two decades ago, though it can feel like a lifetime. The final sequence still had several gaps due to the technical limitations of the time, among them the inability to piece together DNA that contained long stretches of repeated base pairs. As a result, 15% of the human genome was missing at the time of release. Even the most recent version, patched in 2019, is missing 8% of the sequence because of heterochromatin and similarly difficult DNA sections. Now geneticists in the Telomere-to-Telomere (T2T) Consortium are using new technology to fill in those gaps, leaving only the Y chromosome unexplored. Here’s an update on their progress.

Human Genome Sequence Progress by the T2T Consortium

The T2T Consortium is an international effort spanning about 30 institutions. It was launched by Karen Miga of the University of California, Santa Cruz, Adam Phillippy of the National Human Genome Research Institute and Evan Eichler of the University of Washington School of Medicine so that previously unmappable centromere regions could be researched. In May 2021, T2T posted a preprint (not yet peer-reviewed) entitled “The Complete Sequence of a Human Genome,” claiming that the remainder of the human genome had been sequenced. The paper adds 200 million DNA base pairs and 115 protein-coding genes to the sequence. That brings the total to 3.05 billion base pairs, a 4.5% increase, and the number of protein-coding genes to 19,969, a 0.4% increase.

Next Generation Sequencing Technology

T2T’s latest draft of the sequence, T2T-CHM13, was produced using next generation sequencing technology from Pacific Biosciences in the US and Oxford Nanopore in the UK. These platforms can read long chains of DNA stretching up to 20,000 base pairs at a time, a huge improvement over traditional sequencers that can read only a few hundred at a time. Researchers then reassemble the reads, much like pieces of a puzzle. Longer reads are easier to assemble, both because there are fewer pieces to fit together and because neighboring pieces share longer overlapping sequences.
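
To make the puzzle analogy concrete, here is a minimal Python sketch of overlap-based assembly. It is a toy illustration only, not the T2T pipeline: the 22-base `genome` string, the helper names (`make_reads`, `overlap`, `greedy_assemble`) and the read lengths are all made up for this example, and real assemblers also have to cope with sequencing errors, both DNA strands and repeats.

```python
def make_reads(genome: str, length: int, step: int) -> list[str]:
    """Cut a sequence into overlapping windows (hypothetical error-free reads)."""
    return [genome[i:i + length] for i in range(0, len(genome) - length + 1, step)]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the pair of pieces with the largest overlap."""
    pieces = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps enough; keep separate contigs
        merged = pieces[best_i] + pieces[best_j][best_k:]
        pieces = [p for idx, p in enumerate(pieces) if idx not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Made-up 22-base sequence, chosen so that no 6-base stretch occurs twice.
genome = "ATGCGTACCTTAGGCAATCGGA"

# Four 10-base reads; neighboring reads share 6 bases.
reads = make_reads(genome, length=10, step=4)

# Feed the reads in reverse order to show that the input order does not matter.
contigs = greedy_assemble(reads[::-1], min_len=6)
print(contigs == [genome])  # True
```

The sketch works because every 6-base overlap in this toy sequence is unique, so each merge is unambiguous. When a repeat is longer than the reads, several pieces overlap equally well and the puzzle can no longer be solved with confidence, which is why reads long enough to span repeats make such a difference.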

Rather than taking DNA from a living person, the process used a cell line derived from a hydatidiform mole, a type of tissue that forms in humans when a sperm fertilizes an egg without a nucleus. Because all of the chromosomes in these cells come from the sperm, researchers only have to deal with one version of each chromosome, including a single X chromosome. It also means that the completed sequence doesn’t contain the Y chromosome, which triggers male development biologically. T2T does acknowledge that challenges related to passaged cell lines, along with problematic genome areas that are difficult to quality check, could leave an estimated 0.3% of the sequence containing errors.

The Human Genome Sequence in the Future

So where will the human genome sequence go in the future? Currently, the T2T genomics researchers are using the same basic methods to sequence the Y chromosome, which would give us a full sequence of biological male genetics. They’re also planning to sequence a genome containing chromosomes from two parents, providing the sequence of a human being as it exists at conception.

T2T has also partnered with the Human Pangenome Reference Consortium to sequence over 300 genomes from around the world over the next three years. The organizations will use T2T-CHM13 as a reference to understand how the genome differs from person to person. With so many resources and tools available for sequencing, the research should identify links between human disease and the newly sequenced regions. It may eventually make genome sequencing an everyday activity.