bioinfo-statistics

Plasma proteome variation and its genetic determinants in children and adolescents, Nat Genet (2025)



spnz3 2025. 4. 20. 10:44

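Why I read this:

- Most large biobank cohorts such as UK Biobank consist mainly of older participants, so they do not properly represent the population as a whole.

- I had been missing data from younger ages, so a paper on proteome data in children and adolescents caught my interest.

- In addition, proteins were measured by MS (mass spectrometry) rather than by affinity-based methods such as Olink or SomaScan. Measuring many proteins across a large cohort with MS is still difficult, and the authors state that this is the largest cohort analyzed with MS-based proteomics to date.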

Study design

 

 

 

Summary

1. Analyzed associations between proteins and age / sex / BMI / obesity

  • About half of the proteins were associated with at least one of these four traits
  • Age and BMI could also be predicted from protein levels
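Predicting age from protein levels can be pictured with a minimal sketch. This post does not describe the paper's actual model, features or validation scheme, so the sketch assumes closed-form ridge regression on synthetic data; `fit_ridge` and `predict_ridge` are hypothetical helpers, not the authors' code.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (assumed model; the paper's method may differ)."""
    # Standardize each protein so the penalty treats all proteins equally
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - mu) / sd
    y_mean = y.mean()
    # Solve (X'X + alpha * I) w = X'(y - mean(y))
    w = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(Xs.shape[1]), Xs.T @ (y - y_mean))
    return mu, sd, w, y_mean

def predict_ridge(model, X):
    mu, sd, w, y_mean = model
    return ((X - mu) / sd) @ w + y_mean

# Synthetic data: 500 children, 50 proteins, the first 5 of which track age
rng = np.random.default_rng(0)
n, p = 500, 50
age = rng.normal(12, 3, size=n)
X = rng.normal(size=(n, p))
X[:, :5] += 0.5 * (age[:, None] - 12)

train, test = slice(0, 400), slice(400, n)
model = fit_ridge(X[train], age[train], alpha=10.0)
pred = predict_ridge(model, X[test])
r = np.corrcoef(pred, age[test])[0, 1]
print(f"Pearson r on held-out samples = {r:.2f}")
```

Standardizing the training proteins and reusing their mean/SD on the held-out samples keeps the test set from leaking into the fit.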

 

2. pQTL analysis (genetic variant–protein association analysis)

They ran a pQTL analysis and report the number of pQTLs found and their breakdown by type.
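pQTLs are usually split into cis (the variant lies near the gene encoding the protein) and trans (anywhere else). A minimal sketch of that labeling, assuming a ±1 Mb window around the gene's transcription start site; this is a common convention and may not match the paper's exact definition. The `PQTL` class and the coordinates are hypothetical.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # assumed +/-1 Mb window; the paper's definition may differ

@dataclass
class PQTL:
    variant_chrom: str
    variant_pos: int
    gene_chrom: str
    gene_tss: int  # transcription start site of the gene encoding the protein

def classify(q: PQTL, window: int = CIS_WINDOW_BP) -> str:
    """Label a pQTL 'cis' if the variant is near the protein's own gene, else 'trans'."""
    if q.variant_chrom == q.gene_chrom and abs(q.variant_pos - q.gene_tss) <= window:
        return "cis"
    return "trans"

# Hypothetical coordinates, for illustration only
examples = [
    PQTL("3", 186_660_000, "3", 186_665_000),  # same chromosome, 5 kb away
    PQTL("3", 180_000_000, "3", 186_665_000),  # same chromosome, ~6.7 Mb away
    PQTL("19", 44_908_000, "3", 186_665_000),  # different chromosome
]
labels = [classify(q) for q in examples]
print(labels)  # ['cis', 'trans', 'trans']
```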

Particularly interesting is the peptide-level analysis, which current affinity-based proteomic assays cannot do:

Genetic variations normally affect the expression level of the entire protein. Therefore, all peptides identifying the same protein should generally show the same fold change between genotypes. Our peptide-level analysis showed that 77% of reported pQTLs had at least two supporting peptides (Fig.3d and Supplementary Table 4). Notably, in 94% of these cases, all peptides exhibited the same direction of effect, indicating highly consistent quantitative information at the peptide level (Extended Data Fig. 4a–c). The peptide-level data also helps quantify protein variants affected by amino acid substitutions, a limitation in affinity-based proteomics as demonstrated by the influence of rs9898 on histidine-rich glycoprotein abundance 42. We identified the association between rs9898 and circulating histidine-rich glycoprotein levels, which was successfully replicated in both the pediatric and adult cohorts, with a 62% protein sequence coverage and 26 supporting peptides (Extended Data Fig.5a–d). Importantly, protein quantification was unaffected by the missense mutation (Pro204Ser), which was not identified, probably because it would only produce a four-amino-acid sequence (NCPR) but would have been an outlier if it had. This example illustrates an important advantage of MS-based proteomics in navigating the complexities of protein quantification across variants.
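The two peptide-level numbers in this quote (share of pQTLs with at least two supporting peptides, and share of those whose peptides all point in the same direction) come down to simple counting. A sketch on a hypothetical input format (pQTL id mapped to per-peptide effect sizes) with toy values; it is not the paper's data or pipeline.

```python
def summarize_peptide_support(pqtl_peptide_betas):
    """
    pqtl_peptide_betas: dict of pQTL id -> list of per-peptide effect sizes
    (hypothetical format, e.g. log fold change per effect allele).
    Returns (fraction with >= 2 peptides, fraction of those with all peptides same sign).
    """
    multi = {k: v for k, v in pqtl_peptide_betas.items() if len(v) >= 2}
    concordant = [
        k for k, betas in multi.items()
        if all(b > 0 for b in betas) or all(b < 0 for b in betas)
    ]
    frac_multi = len(multi) / len(pqtl_peptide_betas)
    frac_concordant = len(concordant) / len(multi) if multi else float("nan")
    return frac_multi, frac_concordant

toy = {
    "pQTL_A": [0.41, 0.38, 0.52],   # all peptides up
    "pQTL_B": [-0.20, -0.31],       # all peptides down
    "pQTL_C": [0.15, -0.05, 0.22],  # one discordant peptide
    "pQTL_D": [0.60],               # single supporting peptide
}
frac_multi, frac_concordant = summarize_peptide_support(toy)
print(frac_multi, frac_concordant)  # 0.75 0.6666666666666666
```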

 

3. Decomposed the variance of protein levels

- For 63% of proteins, pQTLs contributed more variance than age, sex, BMI-SDS and obesity combined 

- Comparing across age groups, they found "remarkable stability in genetic influences, with Pearson correlations between 0.95 and 0.98 (Fig. 4b–d)."

* Is this the correlation of variance explained rather than of beta?
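To make the question above concrete: for a single additive variant under Hardy-Weinberg equilibrium, the fraction of protein variance it explains is approximately 2p(1−p)β²/Var(y), where p is the effect-allele frequency and β the per-allele effect. The sketch assumes the Fig. 4b–d correlations are computed over per-protein variance explained in each age group (one reading of the note above, not confirmed here); all betas and frequencies are hypothetical.

```python
import math

def variance_explained(beta, eaf, var_y=1.0):
    """Approximate variance explained by one additive SNP (assumes HWE):
    2 * p * (1 - p) * beta^2 / Var(y)."""
    return 2 * eaf * (1 - eaf) * beta ** 2 / var_y

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical lead pQTL per protein, with beta in SD units of the standardized
# protein level, estimated separately in two age groups
eaf = [0.10, 0.25, 0.40, 0.05, 0.30]
beta_young = [0.80, 0.35, 0.20, 1.10, 0.50]
beta_older = [0.78, 0.37, 0.18, 1.05, 0.52]

ve_young = [variance_explained(b, p) for b, p in zip(beta_young, eaf)]
ve_older = [variance_explained(b, p) for b, p in zip(beta_older, eaf)]
r = pearson(ve_young, ve_older)
print(f"r = {r:.3f}")
```

Because variance explained depends on β², correlating variance explained and correlating β can give different numbers for the same data, which is why the distinction in the note matters.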

 

 

4. Took a closer look at pQTL effect sizes

- (Fig. 5a–f) For 143 pQTLs of 71 proteins, protein levels differed twofold depending on the pQTL genotype; in these cases the phenotype could also be predicted from the variant.
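A twofold difference corresponds to an absolute log2 fold change of 1. A sketch that estimates the per-allele effect as the least-squares slope of log2 protein level on allele dosage (0/1/2), using toy numbers. Whether the paper's twofold threshold is per allele or between homozygote groups is not stated in this post, so the per-allele reading is an assumption.

```python
def per_allele_log2_effect(dosages, log2_levels):
    """Least-squares slope of log2 protein level on allele dosage (0/1/2)."""
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(log2_levels) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, log2_levels))
    sxx = sum((x - mx) ** 2 for x in dosages)
    return sxy / sxx

# Toy data: 8 individuals with 0, 1 or 2 copies of the effect allele
dosages = [0, 0, 0, 1, 1, 1, 2, 2]
log2_levels = [10.0, 10.2, 9.8, 11.0, 11.1, 10.9, 12.2, 12.2]

slope = per_allele_log2_effect(dosages, log2_levels)
fold_per_allele = 2 ** slope
at_least_twofold = abs(slope) >= 1  # assumed per-allele reading of "twofold"
print(round(slope, 3), round(fold_per_allele, 3), at_least_twofold)
```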

 

5. Integrated the pQTL results with variant–trait association results

- Checked whether each pQTL had been reported before by comparing against 35 studies: of 1,947 pQTLs for 443 proteins, 643 pQTLs for 213 proteins were novel (of which 140 proteins had no previously reported genetic regulation)

- Checked overlap with GWAS variants + colocalization analysis

- Mendelian randomization analysis

  Analyzed 267 proteins × 47 cardiometabolic GWAS outcomes and found 345 causal relationships between 106 proteins and 36 traits (P < 2.3 × 10⁻⁶). Of these, 101 (29%) causal relationships between 41 genes and 33 traits were also supported by colocalization.
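The basic MR estimators can be sketched from summary statistics: the Wald ratio β_outcome / β_exposure for a single instrument, and the fixed-effect inverse-variance weighted (IVW) average across independent instruments. This is a generic textbook sketch with hypothetical numbers; the post does not say which MR methods or instrument-selection rules the paper used. The Wald-ratio SE is the first-order approximation that ignores uncertainty in the SNP–protein effect.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate with first-order SE (ignores exposure SE)."""
    estimate = beta_outcome / beta_exposure
    std_err = se_outcome / abs(beta_exposure)
    return estimate, std_err

def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate over independent instruments."""
    ratios = [wald_ratio(bx, by, sy) for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome)]
    weights = [1 / s ** 2 for _, s in ratios]
    estimate = sum(w * r for w, (r, _) in zip(weights, ratios)) / sum(weights)
    std_err = math.sqrt(1 / sum(weights))
    return estimate, std_err

def two_sided_p(z):
    """Two-sided normal P value."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical summary statistics: 3 independent pQTLs for one protein
bx = [0.50, 0.30, 0.40]     # SNP -> protein effects
by = [0.10, 0.062, 0.078]   # SNP -> outcome effects from a GWAS
se_y = [0.01, 0.012, 0.011]

est, se = ivw(bx, by, se_y)
p = two_sided_p(est / se)
print(f"IVW estimate = {est:.3f} (SE {se:.3f}), P = {p:.2e}")
```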

 

 

6. Highly replicated pQTLs in children and adults

Unfortunately, most pQTLs also replicated in adults, so there seem to be few child-specific pQTLs.

We successfully replicated 97% of pQTLs in children (99% of the cis, 92% of the trans and 92% of the novel) and 91% in adults (92% of the cis, 88% of the trans and 90% of the novel) with nominal significance (P < 0.05). The high replication rate in adults suggests that the vast majority of detected pQTLs are not life-stage-specific. To detect such potential pQTLs would probably require larger cohorts with similar sizes and health status.
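The replication rates above are proportions of discovery pQTLs that reach P < 0.05 in the replication cohort. A sketch with hypothetical dicts; requiring the same effect direction as in discovery is an added assumption, since the quote only mentions nominal significance.

```python
def replication_rate(discovery, replication, alpha=0.05):
    """
    discovery / replication: dicts of pQTL id -> (beta, p_value) (hypothetical format).
    A pQTL counts as replicated if P < alpha in the replication cohort and,
    as an assumption here, its effect direction matches the discovery estimate.
    """
    tested = [k for k in discovery if k in replication]
    replicated = [
        k for k in tested
        if replication[k][1] < alpha and (replication[k][0] > 0) == (discovery[k][0] > 0)
    ]
    return len(replicated) / len(tested)

discovery = {"q1": (0.40, 1e-30), "q2": (-0.25, 1e-12), "q3": (0.18, 1e-9), "q4": (0.30, 1e-15)}
adult = {"q1": (0.35, 1e-20), "q2": (-0.20, 0.003), "q3": (0.05, 0.40), "q4": (-0.10, 0.01)}
rate = replication_rate(discovery, adult)
print(rate)  # 0.5
```

Here q3 fails on significance and q4 fails on direction, leaving 2 of 4 replicated.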

 

Discussion 

First paragraph

Prior studies have demonstrated the importance of large sample sizes in pQTL studies15. As far as we know, ours is the largest of its kind, although it remains moderate compared to affinity-based proteomics studies. Nevertheless, we identified pQTLs for over one-third of the quantified plasma proteome with robust peptide-level evidence, including hundreds of novel pQTLs. Our findings suggest that technological improvements in MS-based plasma proteomics will probably enhance the genetic associations detected, even within current studies. Thus, expanding proteome depth without compromising throughput or accuracy is crucial, which may be achievable with emerging workflows combining depletion, multiplexing and further improved MS acquisition schemes.

 

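On age

This study did not observe pQTLs changing with age. Citing Medawar's mutation accumulation theory, however, the authors note that environmental influences grow with age and could affect pQTL analyses in older people, and they mention eQTL studies that have shown such an effect.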

According to Medawar’s mutation accumulation theory, genetic regulation is more robust in early life owing to stronger selective pressures57,58. This aligns with our observation that the proportions of variance explained by pQTLs remained relatively stable across the three age groups. The high replication rate in adults further suggests that pQTLs are largely stable between children and adults, despite influences like disease. However, it would be valuable to investigate how aging affects pQTL detection in the older population as environmental factors and age-related processes become more prominent. Prior studies have shown a 4.7% decline in detected expression QTLs in the blood of patients aged 70–80 years57 and that aging impacts the predictive power of expression QTLs differently across tissues59. This indicates that pQTL stability observed in younger populations may be altered as individuals age.

 

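The authors also mention several age-related limitations of this study. It is cross-sectional, whereas a longitudinal approach would be better suited to tracking age-dependent plasma protein trajectories. Also, age was approximately normally distributed, so there are few samples far from the mean age of about 12 years, which may reduce the accuracy of the trajectories.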

 

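On MS-based proteome analysis

- They checked the effect directions of previously reported pQTLs with the MS-based method, which let them exclude spurious pQTLs caused by epitope effects.

- MS-based methods can also be subject to epitope effects, however; to address this, they grouped pQTLs into confidence tiers and devised a pQTL validation framework.

- As future directions, they mention extending the analysis to tissue- and cell-type-level pQTLs and validating affinity-based pQTLs with MS-based methods.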

 

* tissue specificity annotation of proteins was downloaded from the Human Protein Atlas database (https://www.proteinatlas.org/about/download).

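Interesting points

- Once larger child cohorts become available, will age-specific pQTLs be found? (It would be even better to have both children and adults in a single cohort so they could be analyzed and compared together.)

- The point in the discussion that pQTLs may differ in even older populations (70–80 years) is interesting.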