HGDP00491 Imputed and Real ENA versions

On Sale
$6.00

Imputation in genetics is the process of filling in missing genotypes in a DNA dataset by comparing the known variants to a larger reference panel of sequenced genomes. But does imputation really work? To put this to the test, I've chosen a case study from Melanesia: a DNA sample from Bougainville. This population is especially interesting because of its deep and unique genetic history, which makes it a demanding test case for the accuracy of imputation methods.

For this project, I started with Reich Lab's .HO dataset and converted the Bougainville sample into a microarray-style format using PLINK. The conversion produced a very low-quality file with poor SNP overlap, making it nearly useless for predicting health, traits, or phenotypes. To see if I could rescue the data, I imputed the missing genotypes with BEAGLE, using 1000 Genomes reference VCFs. Then, to check accuracy, I compared the imputed results with a high-quality version of the same sample that I generated from raw FASTQ files downloaded from the European Nucleotide Archive. The results were surprising: about 90% of the genotypes were identical between the imputed and sequenced versions.
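If you want to run a similar comparison on the two files yourself, the following is a minimal sketch in Python. It assumes both TXT files use a tab-separated microarray-style layout (rsid, chromosome, position, genotype) with `#` comment lines; that layout, the function names `read_genotypes` and `concordance`, and the tiny inline data are illustrative assumptions, not a description of how the figure above was calculated. It also ignores strand flips and genome-build differences, which a real comparison would need to handle.

```python
import io


def read_genotypes(handle):
    """Read genotype calls keyed by (chromosome, position).

    Assumes tab-separated columns: rsid, chromosome, position, genotype.
    """
    calls = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        _rsid, chrom, pos, genotype = fields[:4]
        if not pos.isdigit():
            continue  # skip an uncommented header row
        if not genotype or "-" in genotype:
            continue  # skip no-calls such as "--"
        # Sort alleles so unphased "AG" and "GA" compare as equal.
        calls[(chrom, pos)] = "".join(sorted(genotype.upper()))
    return calls


def concordance(imputed, truth):
    """Return (number of shared sites, fraction with identical genotypes)."""
    shared = imputed.keys() & truth.keys()
    if not shared:
        return 0, 0.0
    matches = sum(1 for key in shared if imputed[key] == truth[key])
    return len(shared), matches / len(shared)


# Made-up example data, only to show the mechanics.
imputed_text = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t100\tAG\n"
    "rs2\t1\t200\tCC\n"
    "rs3\t2\t300\tTT\n"
    "rs4\t2\t400\t--\n"
)
truth_text = (
    "rs1\t1\t100\tGA\n"
    "rs2\t1\t200\tCT\n"
    "rs3\t2\t300\tTT\n"
    "rs4\t2\t400\tAA\n"
    "rs5\t3\t500\tGG\n"
)

imputed_calls = read_genotypes(io.StringIO(imputed_text))
truth_calls = read_genotypes(io.StringIO(truth_text))
n_shared, rate = concordance(imputed_calls, truth_calls)
print(f"{n_shared} shared sites, {rate:.1%} identical")
```

To use it on the downloaded files, replace the `io.StringIO(...)` objects with `open("path/to/file.txt")` handles. Matching on chromosome and position rather than rsid is a design choice here, since imputed output and sequencing-derived calls may not share the same identifiers.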

You will get the following files:
  • TXT (30MB)
  • TXT (49MB)