Hot takes: interesting papers from June

Transcriptome/Proteome

Defining the consequences of genetic variation on a proteome-wide scale, Chick et al. Nature

“This study quantified both protein and transcript abundance in a genetically diverse population of mice, mapping their genetic architecture … We conclude that most local pQTL affected both protein and transcript abundance, consistent with a transcriptional mode of regulation. However, distant pQTL affected protein abundance independently of the transcript, consistent with a post-transcriptional mode of regulation.”

Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain, Lake et al. Science

“Our results demonstrate that postmortem SNS [single-nucleus RNA sequencing] can identify expected and previously unidentified neuronal subtypes that provide insight into brain function through distinct profiles of activity-defining genes”

Epigenome

The molecular hallmarks of epigenetic control, Allis et al. Nat Rev Genet

“Here, we provide a personal perspective on the development of epigenetics, from its historical origins to what we define as ‘the modern era of epigenetic research’.”

Epigenome-wide Association Studies and the Interpretation of Disease -Omics, Birney et al. PLOS Genet

“As EWAS have generally been only rarely performed with concurrent genotyping of the same individuals or transcriptional studies of the same cells, we have no way of knowing whether the positive results of EWAS to date are testing the starting hypothesis that genuine epigenetic changes occur within a subset of cells in the population. Instead, the results may be due to residual meta-epigenomic effects of cell subtypes or attributable to untested influences of genomic or transcriptomic variability. This being the case, and with similar caveats affecting transcriptomic studies, no EWAS to date can be said to be fully interpretable … Analytically, insights into DNA sequence variants upon DNA methylation (methylation quantitative trait loci, mQTLs) for the cell type studied will allow approaches to be developed to account for this major influence upon the epigenome. One particular approach, two-step mendelian randomization, is being applied in prospective and case/control EWAS, building on the non-modifiable nature of germline genetic variation to provide causal anchors within a causal inference setting.”

The landscape of accessible chromatin in mammalian preimplantation embryos, Wu et al. Nature

“Here we provide a genome-wide survey of accessible chromatin in the mouse preimplantation embryos using ATAC-seq. A fundamental question in preimplantation development is to what extent gene expression is linked to epigenome reprogramming. Our data indicate that gene activation and establishment of open chromatin could occur, at least in part, through different pathways from those for epigenetic modification reprogramming.”

Heritability/correlation

Contrasting the Genetic Architecture of 30 Complex Traits from Summary Association Data, Shi et al. AJHG

“Here, we introduce methods that estimate the total trait variance explained by the typed variants at a single locus in the genome (local SNP heritability) from genome-wide association study (GWAS) summary data while accounting for linkage disequilibrium among variants. We applied our estimator to ultra-large-scale GWAS summary data of 30 common traits and diseases to gain insights into their local genetic architecture. First, we found that common SNPs have a high contribution to the heritability of all studied traits. Second, we identified traits for which the majority of the SNP heritability can be confined to a small percentage of the genome. Third, we identified GWAS risk loci where the entire locus explains significantly more variance in the trait than the GWAS reported variants. Finally, we identified loci that explain a significant amount of heritability across multiple traits.”

Transethnic Genetic-Correlation Estimates from Summary Statistics, Brown et al. AJHG

“We have developed transethnic genetic-effect and genetic-impact correlations and provided a method for estimating these quantities on the basis of only summary-level GWAS information and suitable reference panels … In all phenotypes analyzed, the genetic correlation was significantly different from both 0 and 1.”

Imputing Phenotypes for Genome-wide Association Studies, Hormozdiari et al. AJHG

“We propose an approach called phenotype imputation that allows one to perform a GWAS on a phenotype that is difficult to collect. Our approach leverages the correlation structure between multiple phenotypes to impute the uncollected phenotype … Our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes.”
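To make the "weighted linear combination" concrete, here is a minimal sketch in Python. It assumes the summary statistics are jointly normal, so the weights are the usual conditional-mean weights obtained by solving `corr_observed · w = corr_target`. The function names and the toy correlations are hypothetical, and this is an illustration of the idea rather than the paper's exact estimator.

```python
def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def imputation_weights(corr_observed, corr_target):
    """Weights w solving corr_observed w = corr_target (assumed Gaussian model)."""
    return solve(corr_observed, corr_target)


def impute_statistic(weights, observed_stats):
    """Imputed statistic for the target phenotype: weighted sum of observed ones."""
    return sum(w * z for w, z in zip(weights, observed_stats))


# Toy inputs (hypothetical): two collected phenotypes, one uncollected target.
corr_observed = [[1.0, 0.5],
                 [0.5, 1.0]]      # correlation between the collected phenotypes
corr_target = [0.6, 0.3]          # correlation of each with the target phenotype
observed_z = [2.0, 1.0]           # per-SNP statistics for the collected phenotypes

weights = imputation_weights(corr_observed, corr_target)
imputed = impute_statistic(weights, observed_z)
print(weights, imputed)
```

In this toy case the second phenotype gets a weight of roughly zero: its correlation with the target (0.3) is fully explained by its correlation with the first phenotype (0.5 × 0.6). Note that the raw combination has variance below 1, so it would need rescaling before being read as a z-score.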

Multikernel linear mixed models for complex phenotype prediction, Weissbrod et al. Genome Res

“Here, we present multikernel linear mixed model (MKLMM), a flexible modeling framework that allows for both global and local high-order interactions modeled via RKHS, as well as modeling of a heterogeneous effect-size distribution … MKLMM-Adapt held a statistically significant advantage over AMB [adaptive multi-BLUP] in prediction of Crohn’s disease (CD), type 1 diabetes (T1D), and ulcerative colitis (UC), whereas AMB did not hold a statistically significant advantage over MKLMM-Adapt in any data set”

Computational phasing

Fast and accurate long-range phasing in a UK Biobank cohort, Loh et al. Nat Genet

“The basic idea of our approach is to harness IBD from distant relatedness (up to ~12 generations from a common ancestor) that is pervasive within very large cohorts … we observed that Eagle analysis of all N ≈ 150,000 samples together completed three times faster than SHAPEIT2 10 × 15,000 analysis while achieving a 67% (1%) decrease in switch error rate: switch error rate of 0.30% (0.01%) for Eagle 1 × 150,000 versus 0.90% (0.06%) for SHAPEIT2 10 × 15,000 … corresponding to perfect phase in a majority of 10-Mb segments”
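The quoted 67% figure is a relative decrease in switch error rate, not a difference in percentage points. A quick check in Python, using only the two rates from the quote (the function name is just for illustration):

```python
def relative_decrease(baseline, new):
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


shapeit2_ser = 0.90  # switch error rate (%), SHAPEIT2 10 x 15,000
eagle_ser = 0.30     # switch error rate (%), Eagle 1 x 150,000

decrease = relative_decrease(shapeit2_ser, eagle_ser)
print(f"{decrease:.0%}")  # relative decrease, rounded to a whole percent
```

The absolute gap is 0.60 percentage points; dividing by the 0.90% baseline gives about two thirds, which rounds to the 67% reported.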

Haplotype estimation for biobank-scale data sets, O’Connell et al. Nat Genet

“SHAPEIT3 enhances SHAPEIT2 in two ways that enable it to deal with very large data sets, such as the UKB study. The first advance is based on the intuition that larger sample sizes are likely to result in increased local similarity between groups of haplotypes due to the higher probability of more recent shared ancestry … The second advance involves changes to the Markov chain Monte Carlo (MCMC) sampling routine that result in additional gains in speed. As sample size grows, it becomes more likely that two individuals will have a long stretch of sequence in common within a particular window … We removed the trio parents from the data set and phased the whole of chromosome 20 (16,265 genotyped sites) for the remaining 152,112 individuals. This run resulted in a median switch error rate of 0.4% and took 38.5 h using ten threads.”

Popgen

The genetic history of Ice Age Europe, Fu et al. Nature

“Here we analyse genome-wide data from 51 Eurasians from ~45,000–7,000 years ago. Over this time, the proportion of Neanderthal DNA decreased from 3–6% to around 2%, consistent with natural selection against Neanderthal variants in modern humans. Whereas there is no evidence of the earliest modern humans in Europe contributing to the genetic composition of present-day Europeans, all individuals between ~37,000 and ~14,000 years ago descended from a single founder population which forms part of the ancestry of present-day Europeans.”