ASMC data

    █████╗   ███████╗  ███╗   ███╗   ██████╗
   ██╔══██╗  ██╔════╝  ████╗ ████║  ██╔════╝
   ███████║  ███████╗  ██╔████╔██║  ██║     
   ██╔══██║  ╚════██║  ██║╚██╔╝██║  ██║     
   ██║  ██║  ███████║  ██║ ╚═╝ ██║  ╚██████╗
   ╚═╝  ╚═╝  ╚══════╝  ╚═╝     ╚═╝   ╚═════╝

This page contains genomic annotations from the paper:

High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. P. Palamara, J. Terhorst, Y. Song, A. Price. Nature Genetics. August 2018. [paper] [preprint] [full-text]

The software page can be found here. For any questions or comments, please contact Pier Palamara using <lastname>@stats.ox.ac.uk. Additional data from the paper may be available upon request.

Bed format annotations

  • DRC_150 annotation of recent positive selection (hg19 coordinates, computed using UK Biobank data). This annotation was computed using the posterior probability of pairwise coalescent times for all analyzed pairs of individuals in the UKBB data set. The posterior probability tables for all pairs of samples were summed together and renormalized. The DRC_150 statistic is the observed average coalescent rate between generations 0 and 150. The fourth and fifth columns of the bed file report the DRC_150 statistic and its p-value, respectively, for each site.

  • ASMC_avg annotation of background selection (hg19 coordinates, computed using GoNL data). This annotation was computed using the posterior probability of pairwise coalescent times for all pairs of individuals in the GoNL data set. The posterior probability tables for all pairs of samples were summed together and renormalized. This averaged posterior probability was then used to compute the expected coalescent time at each site. The ASMC_avg annotation used in LDSC analyses (see below) involves additional processing steps.

  • ASMC_med annotation (hg19 coordinates, computed using GoNL data). As ASMC_avg above, but using the median instead of the mean.

  • ASMC_med.het annotation (hg19 coordinates, computed using GoNL data). As above, using the median instead of the mean. Here, however, we only consider sites where the pair of individuals is heterozygous (genotype = 0,1 or 1,0), so that this quantity is related to allele age.
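
The sum-and-renormalize step described in the bullets above can be sketched for a single site as follows. This is a minimal illustration, not the code used in the paper: the function names, the three-interval time discretization, and the representative time assigned to each interval are all hypothetical, and taking the median of the averaged posterior is only one reading of "using median instead of mean".

```python
from itertools import accumulate


def average_posterior(pair_posteriors):
    """Sum per-pair posteriors over time intervals for one site, then renormalize.

    pair_posteriors: list with one entry per pair of individuals; each entry is
    a list of probabilities over the same discretized coalescence-time intervals.
    """
    n_intervals = len(pair_posteriors[0])
    summed = [sum(p[k] for p in pair_posteriors) for k in range(n_intervals)]
    total = sum(summed)
    return [s / total for s in summed]


def expected_tmrca(posterior, interval_times):
    """Posterior mean: probability-weighted sum of the interval times."""
    return sum(p * t for p, t in zip(posterior, interval_times))


def median_tmrca(posterior, interval_times):
    """Time of the first interval at which the cumulative probability reaches 0.5."""
    for cumulative, t in zip(accumulate(posterior), interval_times):
        if cumulative >= 0.5:
            return t
    return interval_times[-1]


# Hypothetical inputs: two pairs, three time intervals (times in generations).
interval_times = [10.0, 100.0, 1000.0]
pair_posteriors = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
]

avg = average_posterior(pair_posteriors)
mean_time = expected_tmrca(avg, interval_times)
median_time = median_tmrca(avg, interval_times)
print(avg, mean_time, median_time)
```

Under the same assumptions, a heterozygous-only variant would pass to average_posterior only the pairs that are heterozygous at the site.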

LD-Score format annotations

  • ASMC_avg annotation of background selection (hg19 coordinates, computed using GoNL data). This annotation contains the same values as the bed format ASMC_avg annotation above, restricted to the SNPs used in the LD-Score analysis. The values are also quantile-normalized using 10 minor allele frequency bins (see paper for details). The tarball contains an annot.gz file (ASMC_avg annotation values) and an l2.ldscore.gz file (LD-Score values) for each chromosome. This annotation is now included in the baseline-LD v2.0 model, which you can find here (version info here). The baseline-LD v2.0 model is recommended for LD-Score analyses, as it jointly considers a total of 76 genomic annotations.
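
As a rough sketch of how the per-chromosome annot.gz and l2.ldscore.gz files might be inspected, the snippet below reads a gzipped, whitespace-delimited table with a header row using only the Python standard library. The helper name read_gz_table and the column names in the demo (CHR, BP, SNP, CM, ASMC_avg) are assumptions made for illustration; check the header of the actual files before relying on them.

```python
import gzip
import io


def read_gz_table(source):
    """Read a gzipped, whitespace-delimited table with a header row.

    source: a file path or a binary file object.
    Returns a list of dicts mapping column name to the raw string value.
    """
    with gzip.open(source, "rt") as handle:
        header = handle.readline().split()
        return [dict(zip(header, line.split())) for line in handle if line.strip()]


# In-memory stand-in for a real annot.gz file; the columns are hypothetical.
demo = gzip.compress(b"CHR\tBP\tSNP\tCM\tASMC_avg\n1\t1000\trs1\t0.0\t0.25\n")
rows = read_gz_table(io.BytesIO(demo))
print(rows[0]["SNP"], float(rows[0]["ASMC_avg"]))
```

For a file on disk, the same helper would be called with the file path in place of the in-memory buffer.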

Average posterior TMRCA values

  • UKBB_posteriors. Files containing the average posterior TMRCA density inferred for the UK Biobank data set at each site. These can be used to plot the heatmaps in figures 3.b, 3.c, and S7 of the paper by running sh getFigures.sh. This script is included in the tarball and in turn calls the plotPosteriorHeatMap.py script.