Study of largest human genome database reports its findings on disease-causing gene variants

The Genome Aggregation Database has gathered 15,708 genomes and 125,748 exomes.

A study of the largest-ever human genome database, built to help shed light on how genetic mutations can lead to disease, has now produced its first results.

The gnomAD Project has scoured the Genome Aggregation Database, a library of 15,708 human genomes and 125,748 exome sequences. Exomes are the segments of the genome that code for proteins, which in turn are involved in every major biochemical process in the body.

In a conversation with Science Focus, Dr Daniel MacArthur, scientific lead of the gnomAD Project, shared his experiences and recounted how the project began.

MacArthur said the project was started to obtain better databases of normal variation in genomes. The study, he said, was aimed at making sense of the genetic changes seen in patients affected by severe muscle diseases, like muscular dystrophy.

"We don’t actually generate the data ourselves. The gnomAD Project teams are data parasites: we take advantage of the data that’s been generated by others when they agree to contribute that data to our project," he revealed.

MacArthur also explained the significance of exomes. Exomes, he said, contain the genes that are most commonly the cause of very rare diseases like muscular dystrophy or severe retinal disease.

He asserted that humans carry somewhere between three and six million points of variation across their genomes.

"In some people, such as people who suffer from severe diseases like muscular dystrophy, those genetic changes can be catastrophic," he said.

According to the World Health Organization, dysfunctional gene behaviour is commonly termed a mutation and can be responsible for disease. If a gene mutation exists in the egg or sperm cell, children can inherit the defective gene from their parents.

As more sequencing data is generated by various scientific enterprises, researchers are being given the opportunity to study how frequent some of these variants are in large, multi-ethnic groups. This, the study says, helps predict the variant most common in a population. 

Disease-causing mutations/variants in some genes were less common than others in the data researchers analysed. MacArthur suggests that these rarer variants are more likely to be harmful than more prevalent ones. They are also likely to be diluted in a large population through natural selection, leaving most healthy people with normal variants.

The current study looks at a specific inherited illness, arrhythmogenic right ventricular cardiomyopathy, which was found to correlate with mutations in five specific genes in the human genome: PKP2, DSP, DSG2, DSC2 and JUP. The researchers found that analysing large cross-ethnic population sequencing data significantly improved the interpretation of disease variants using information from online databases.

Researchers hope that analysis of variants can also be used to help identify other disease-causing variants based on ethnic groups.
