Single nucleotide variations (SNVs), small insertions and deletions (INDELs), and structural variations (SVs) of the human genome are routinely explored to study genomic variation in biomedical studies. However, most of these studies are based on the human reference genome, which was built from several individuals and includes only a consensus of their genomes. Therefore, reference-based methods may miss some sequence variations within or between populations. Indeed, previous studies have discovered various types of novel sequences that are not present in the human reference genome. For example, more than 3700 non-repetitive non-reference (NRNR) sequences were called from whole-genome sequence data of 15,219 Icelanders by de novo assembly of the unmapped reads into contigs. In another study, by analyzing the unmapped reads from ~10,000 deeply sequenced human genomes, Telenti et al. found that each genome carried an average of 0.7 Mb of sequence not found in the human reference genome. The Simons Genome Diversity Project reported high-quality genomes of 300 individuals from 142 diverse populations and suggested that at least 5.8 Mb of sequence from these genomes was not present in the human reference genome. These novel sequences may harbor functional genomic elements that are ethnic-specific and may affect gene regulation or transcriptional diversity. For example, a 766-bp NRNR sequence was found to be associated with myocardial infarction in Icelanders. Adding these novel sequences to the human reference genome could improve the efficiency of mapping and variant calling.

Over the past decade, due to the rapid decrease in sequencing cost, pan-genome analysis has become popular in bacteria and plants. The approach was first introduced by Tettelin et al. in a study of Streptococcus agalactiae and aimed to reveal gene or gene family presence-absence variations (PAVs) within a species or a population. The pan-genome is composed of a "core genome" containing genes present in all individual genomes and a "distributed genome" (or dispensable genome, a somewhat misleading term as discussed by Marroni et al.) containing genes present in only a subset of individuals of the species. The first human pan-genome study was carried out in 2010 and analyzed only two representative genomes, from Africa and Asia. In that study, about 5 Mb of novel sequence absent from the reference genome (hg19 assembly) was detected for each individual, and the total sequence absent from the reference genome was estimated at 19 to 40 Mb, which may be an underestimate in light of a study of 10 Danish trios. In a subsequent study, re-analysis of the 5 Mb of novel sequence from a Chinese individual showed that 3.7 Mb could be aligned to the GRCh38 human reference genome. In another Chinese genome, HX1, 12.8 Mb of sequence not present in GRCh38 was detected, but 68% of these novel sequences could be found in Asian populations. Another reported pan-genome contained about 300 Mb of unique sequence missing from the human reference genome. Notably, most of these novel sequences were individual-specific, and only 81 Mb was found in two or more individuals. These studies indicate the significance of population-specific genome diversity. The possibility that these non-reference genomic regions carry driver mutations for some diseases, especially those that predominantly affect a specific ethnic group, is worth investigating. The explosive growth of human whole-genome sequencing data brings both significant challenges and tremendous opportunities for studying the pan-genome of a specific population. However, constructing pan-genome sequences from hundreds of individual genomes remains a huge challenge.
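The split between a core genome and a distributed genome can be illustrated with a small sketch. It assumes each individual's gene content is already available as a set of gene names; the function name `split_pan_genome` and the toy data are hypothetical and are not taken from any of the studies mentioned here.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, distributed) gene sets from per-individual gene sets."""
    if not gene_sets:
        return set(), set()
    all_sets = list(gene_sets.values())
    pan = set().union(*all_sets)        # every gene seen in at least one individual
    core = set.intersection(*all_sets)  # genes present in all individuals
    distributed = pan - core            # genes present in only a subset
    return core, distributed


# Hypothetical toy data: three individuals and their gene content.
toy = {
    "individual_A": {"gene1", "gene2", "gene3"},
    "individual_B": {"gene1", "gene2", "gene4"},
    "individual_C": {"gene1", "gene3"},
}

core, distributed = split_pan_genome(toy)
print("core:", sorted(core))
print("distributed:", sorted(distributed))
```

Real pan-genome pipelines work from assembled sequences and annotation rather than ready-made gene sets, so this only shows the presence-absence logic, not how the gene sets are obtained.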