Conventional genome assemblies collapse the genetic information of a diploid or polyploid individual into a single-haplotype representation. Such a representation is useful for some downstream analyses, such as variant discovery, but it also has disadvantages. Most importantly, it omits the alternative haplotype(s), which may be important for traits of interest. This is particularly inconvenient for species that harbor large genetic diversity, as well as for traits that tend to be associated with complex regions (e.g. resistance-related genes).
To overcome the limitations of a conventional (haploidized) assembly, we generated a haplotype-resolved assembly of the most important ornamental nightshade, petunia (Petunia hybrida), using the latest PacBio HiFi sequencing technology combined with a Phase Genomics Hi-C kit for scaffolding. The haplotype-resolved assembly comprises two sets of seven chromosomes (2n = 14), with each haplotype approximately 1.3 Gb in size, as well as a chloroplast and a mitochondrion assembly. Remarkably, the PacBio HiFi data in combination with Hi-C achieved higher contiguity than the gold-standard trio-binning approach, which uses sequencing data from the parents of the sequenced P. hybrida individual. Neither assembly contained evidence of haplotype switches.
Judged by its assembly statistics, the haplotype-resolved genome assembly was a remarkable success. However, a genome assembly is only as valuable as the information that we can derive from it. We therefore annotated the petunia assembly and integrated it into a pangenome with two publicly available (haploidized) genome assemblies, those of P. axillaris and P. inflata. Using the graph-based pangenomics toolkit PanTools, we analyzed gene presence/absence polymorphisms and found species-specific regions as well as larger structural variants. For this analysis, having both haplotypes of the highly heterozygous P. hybrida sample provides crucial information on candidate genes for a number of traits.
This work was presented by our colleague Bart Nijland at the AGBT Ag meeting in San Diego on April 4, 2022, and by Peter van Dam at Plant Genomes Online on April 28, 2022.