Bioinformatics Analysis Methods - A List of Statistical Principles

bioinfo-statistics

Thought Notes

spnz3 2025. 5. 11. 23:08

ChatGPT-4o version. To be updated.

🧬 1. Genomic Association Analysis

Analysis	Statistical Methods	Statistical Principle

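GWAS	Linear regression, Logistic regression, Linear Mixed Models (LMMs), GLMMs, Firth regression, Bayesian fine-mapping	Linear regression (least squares) and logistic regression (maximum likelihood) estimate marginal per-SNP effects; LMMs add a random polygenic effect whose covariance is a genetic relationship matrix, modeling relatedness and population structure, with variance components estimated by REML; GLMMs extend this to binary traits; Firth regression penalizes the likelihood to reduce small-sample bias for rare variants and unbalanced case-control designs; Bayesian fine-mapping computes posterior inclusion probabilities under sparsity priors.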
Meta-analysis of GWAS	Fixed-effects, Random-effects, Inverse variance weighting, Bayesian meta-analysis	Fixed-effects models assume one shared true effect and weight each study's estimate by its inverse variance; random-effects models add a between-study variance (τ²) so true effects can differ across studies; Bayesian models place hierarchical priors on effect heterogeneity.
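Rare variant association	Burden test, SKAT, SKAT-O, C-alpha, REGENIE	Burden tests collapse the rare variants in a gene or region into one score and test it in a linear or logistic regression; SKAT uses a variance-component score test in a mixed-model framework, which stays powerful when variant effects differ in direction; SKAT-O combines the burden and SKAT statistics with a data-adaptive weight; C-alpha tests for overdispersion of variant counts between cases and controls; REGENIE fits whole-genome ridge regression in step 1 (leave-one-chromosome-out predictions that approximate an LMM) and then tests single variants or gene burdens in step 2.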
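
The fixed- and random-effects rules in the meta-analysis row can be sketched in plain Python. This is a minimal illustration, not the code of any named tool. It assumes that per-study effect sizes (`betas`) and standard errors (`ses`) are already harmonized to the same effect allele. For τ² it uses the DerSimonian-Laird moment estimator, which is one common choice among several. All function names are hypothetical.

```python
import math


def ivw_fixed(betas, ses):
    """Fixed-effects meta-analysis: inverse-variance weighted mean of study effects."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    return beta, math.sqrt(1.0 / w_sum)


def dersimonian_laird(betas, ses):
    """Random-effects meta-analysis using the DerSimonian-Laird estimate of tau^2."""
    weights = [1.0 / se ** 2 for se in ses]
    beta_fe, _ = ivw_fixed(betas, ses)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    # Moment estimator of between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(betas) - 1)) / c)
    # Random-effects weights add tau^2 to each study's sampling variance
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    w_sum = sum(re_weights)
    beta = sum(w * b for w, b in zip(re_weights, betas)) / w_sum
    return beta, math.sqrt(1.0 / w_sum), tau2


def two_sided_p(beta, se):
    """Two-sided normal p-value for the Wald statistic z = beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2))
```

When all studies report the same estimate, Cochran's Q is zero and τ² is truncated at 0. In that case the random-effects result equals the fixed-effects one.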

🧪 2. Expression and QTL Mapping

Analysis	Statistical Methods	Statistical Principle

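eQTL/sQTL/pQTL analysis	Linear regression, ANOVA, FastQTL, Matrix eQTL, Bayesian QTL mapping	Tests genotype association with expression, splicing, or protein levels using linear models, or ANOVA when genotype is treated as categorical; FastQTL derives gene-level p-values from permutations approximated by a beta distribution; Matrix eQTL applies vectorized least squares across all SNP-gene pairs; Bayesian methods infer posterior SNP effects using sparsity and sharing priors.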
Allele-specific expression (ASE)	Binomial/Beta-binomial models	Models reference-allele read counts at heterozygous sites as binomial with p = 0.5 under no allelic imbalance; a beta-binomial likelihood handles overdispersion across sites and samples.
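Multi-tissue QTL analysis	Meta-Tissue, mashr, TensorQTL	Meta-Tissue fits a mixed model that accounts for individuals shared across tissues and combines per-tissue effects by meta-analysis; mashr applies multivariate adaptive shrinkage, estimating a mixture of multivariate normal priors on effects by empirical Bayes; TensorQTL is a GPU implementation of linear-regression cis/trans QTL mapping with FastQTL-style permutations.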
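
Below is a minimal sketch of the beta-binomial ASE test described in the table. It makes two simplifying assumptions. First, a single overdispersion parameter `rho` is fixed at 0.1, whereas real ASE tools estimate overdispersion from the data. Second, it uses a coarse grid search for the allelic ratio instead of a proper optimizer. The test compares the likelihood at the balanced ratio p = 0.5 with the best ratio on the grid, using a 1-df likelihood-ratio test. `ase_lrt` and its parameters are hypothetical names.

```python
import math


def log_beta(x, y):
    """Log of the beta function B(x, y) via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinom_logpmf(k, n, p, rho):
    """Log P(k reference reads of n) under a beta-binomial with mean p and overdispersion rho."""
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def ase_lrt(ref_counts, totals, rho=0.1):
    """Likelihood-ratio test of allelic balance (p = 0.5) against a free allelic ratio p."""

    def loglik(p):
        return sum(betabinom_logpmf(k, n, p, rho) for k, n in zip(ref_counts, totals))

    ll_null = loglik(0.5)
    # Crude grid search for the MLE of p (assumption: a real tool would optimize numerically)
    ll_alt = max(loglik(i / 1000) for i in range(1, 1000))
    stat = max(0.0, 2 * (ll_alt - ll_null))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical reference-read counts at one heterozygous site in four individuals
stat, p_value = ase_lrt(ref_counts=[18, 17, 19, 18], totals=[20, 20, 20, 20])
```

Setting `rho` close to 0 makes the model approach the plain binomial.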

🔍 3. Causal Inference and Mediation

Analysis	Statistical Methods	Statistical Principle

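Mendelian Randomization (MR)	IVW, MR-Egger, Weighted median/mode, GSMR, Bayesian MR	IVW regresses SNP-outcome effects on SNP-exposure effects through the origin with inverse-variance weights (asymptotically equivalent to two-stage least squares on individual-level data); MR-Egger adds an intercept that estimates directional pleiotropy under the InSIDE assumption; weighted median and weighted mode stay consistent when at least half of the instrument weight, or the largest group of instruments by weight, comes from valid instruments; GSMR uses generalized least squares that account for LD between instruments and removes pleiotropic outliers with the HEIDI test; Bayesian MR uses prior-informed inference on causal parameters.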
Mediation MR / Two-step MR	Product of coefficients, Sobel test, Delta method, Parametric bootstrap	Estimates the indirect effect as the product of path coefficients; standard errors come from the delta method (the Sobel test is its first-order form) or from parametric bootstrap resampling; assumes a linear causal model and independent instruments.
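Causal discovery	PC algorithm, GES, LiNGAM, FCI, DAG-GNN	Constraint-based methods (PC, FCI) prune edges with conditional independence tests, and FCI additionally allows latent confounders; score-based GES greedily searches over equivalence classes of DAGs to maximize a score such as BIC; LiNGAM assumes a linear model with non-Gaussian noise, which makes the causal direction identifiable; DAG-GNN fits a structural causal model with a graph-neural-network variational autoencoder under a continuous acyclicity constraint.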
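
The IVW and MR-Egger estimators in the Mendelian randomization row reduce to weighted regressions on summary statistics. The sketch makes these assumptions:

- Per-SNP effects on the exposure (`bx`) and on the outcome (`by`) are harmonized, and `se_by` holds the outcome standard errors.
- Uncertainty in `bx` is ignored (first-order weights).
- Egger returns point estimates only.

It illustrates the estimators and is not the implementation in any MR package. The function names are hypothetical.

```python
import math


def mr_ivw(bx, by, se_by):
    """Fixed-effect IVW: weighted regression of by on bx through the origin."""
    w = [1.0 / s ** 2 for s in se_by]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    # Returns the causal estimate and its first-order (fixed-effect) standard error
    return num / den, math.sqrt(1.0 / den)


def mr_egger(bx, by, se_by):
    """MR-Egger: weighted regression of by on bx with an intercept (directional pleiotropy)."""
    # Orient each SNP so that its effect on the exposure is positive
    pairs = [(-x, -y) if x < 0 else (x, y) for x, y in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se_by]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, pairs)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, pairs)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, pairs))
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, pairs))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept
```

Suppose the outcome effects equal 0.5 times the exposure effects plus a constant offset. MR-Egger then returns slope 0.5 and recovers the offset as its intercept, while IVW, forced through the origin, would be biased by that offset.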

🔬 4. Transcriptomics and Epigenomics

Analysis	Statistical Methods	Statistical Principle

Differential gene expression (DGE)	DESeq2, edgeR, limma	DESeq2 and edgeR fit negative binomial GLMs to read counts and shrink gene-wise dispersion estimates; limma fits linear models (to log-CPM values with voom precision weights for RNA-seq) and moderates gene-wise variances with empirical Bayes.
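Alternative splicing analysis	rMATS, MAJIQ, SUPPA2	rMATS tests group differences in exon inclusion level (PSI) with a likelihood-ratio test in a hierarchical model; MAJIQ quantifies local splicing variations (LSVs) with Bayesian posterior PSI distributions; SUPPA2 computes PSI from transcript abundances and assesses significance empirically against between-replicate variability.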
DNA methylation analysis	Beta regression, M-value linear regression, limma	Methylation beta values (between 0 and 1) are modeled with beta regression, or transformed to M-values, log2(beta / (1 - beta)), and fit with linear models; limma adds empirical Bayes variance shrinkage.
ATAC-seq/ChIP-seq differential peak	DiffBind, csaw, DESeq2	Negative binomial models of read counts in consensus peaks (DiffBind, which calls DESeq2 or edgeR) or in sliding windows (csaw); significance via likelihood-ratio or Wald tests.
RNA editing analysis	REDItools, GLMs for mismatch rates	Binomial/Poisson modeling of editing proportions at specific loci; sometimes beta-binomial for overdispersion.
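
This two-group sketch makes two rows concrete: the M-value transform from the methylation row, and the empirical Bayes variance moderation from the DGE and methylation rows. The prior degrees of freedom `d0` and prior variance `s0_sq` are assumed known. limma instead estimates them jointly from all features, so this shows the shrinkage formula rather than reproducing limma's output. Function names are hypothetical.

```python
import math
import statistics


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value (0..1) to an M-value, log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1 - eps)  # clip to avoid log of 0 or division by 0
    return math.log2(b / (1 - b))


def moderated_t(group1, group2, d0, s0_sq):
    """Two-group t statistic with limma-style shrinkage of the variance toward s0_sq."""
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2  # residual degrees of freedom
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s_sq = ss / d  # pooled residual variance for this feature
    # Posterior variance: degrees-of-freedom-weighted average of prior and observed variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return (m1 - m2) / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
```

With `d0=0` this is the ordinary pooled two-sample t statistic. A larger `d0` pulls small-sample variances toward the prior, which stabilizes the statistic for features with few replicates.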

🧫 5. Multi-Omics Integration

Analysis	Statistical Methods	Statistical Principle

Multi-omics factor analysis	MOFA, iCluster, SNF	MOFA uses Bayesian group factor analysis with variational inference; iCluster fits penalized Gaussian latent variable models; SNF fuses similarity graphs across omics.
Multi-omics regression / prediction	Elastic net, Random forest, SVM, Kernel Ridge Regression	Elastic net is a regularized linear model with combined L1 and L2 penalties; random forests are non-linear tree ensembles; SVM and kernel ridge regression use kernels to capture non-linear relationships in high-dimensional data.
Multi-omics QTL integration	eQTL + meQTL + pQTL	Joint linear models or Bayesian multivariate models; conditional independence and co-mapping of variants across omics layers.
Pathway-based integration	PARADIGM, NetGSA	PARADIGM infers pathway activity with probabilistic factor graph models; NetGSA tests gene sets with a mixed linear model whose mean and covariance incorporate a known gene-gene network.
Mediation with multi-omics	Two-step MR, structural equation modeling	Estimates indirect effects across omics layers using path analysis, product of coefficients, or SEM with latent variables.

🤖 6. Machine Learning and Predictive Modeling

Analysis	Statistical Methods	Statistical Principle

Classification / regression	Logistic regression, Random forest, SVM, XGBoost	Logistic regression uses maximum likelihood; random forests are ensemble decision trees using bootstrapping; SVM maximizes margin; XGBoost uses gradient-boosted trees.
Survival analysis	Cox regression, DeepSurv	Cox models estimate hazard ratios via partial likelihood; DeepSurv uses neural networks to model risk functions under right-censoring.
Dimensionality reduction	PCA, t-SNE, UMAP, Autoencoders	PCA decomposes variance via SVD; t-SNE and UMAP are non-linear manifold-learning embeddings that preserve local neighborhood structure; autoencoders learn latent representations with neural networks.
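Feature selection	LASSO, RFE, Boruta	LASSO penalizes coefficients with the L1 norm, shrinking some exactly to zero; RFE recursively removes the features with the smallest model weights or importances; Boruta compares each feature's random forest importance with that of shuffled "shadow" copies of the features.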
Deep learning for sequences	CNN, RNN, Transformers	CNN captures local patterns; RNN models temporal dependencies; Transformers learn attention-based representations in sequence data.
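
The Cox partial likelihood from the survival analysis row can be maximized with a few Newton-Raphson steps. This sketch assumes a single covariate and no tied event times, so the Breslow or Efron tie corrections are not needed. Censored subjects (`events[i] == 0`) contribute only through the risk sets. Function names are hypothetical.

```python
import math


def _time_order(times):
    """Indices sorted by time; with no ties, the risk set at position k is order[k:]."""
    return sorted(range(len(times)), key=lambda i: times[i])


def cox_partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for a single covariate."""
    order = _time_order(times)
    ll = 0.0
    for pos, i in enumerate(order):
        if events[i]:
            risk = order[pos:]
            ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll


def cox_newton(times, events, x, n_iter=50, tol=1e-10):
    """Maximize the partial likelihood with Newton-Raphson; returns the log hazard ratio."""
    order = _time_order(times)
    beta = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for pos, i in enumerate(order):
            if not events[i]:
                continue  # censored subjects appear only in risk sets
            risk = order[pos:]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            mean = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
            second = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
            score += x[i] - mean  # gradient contribution of this event
            info += second - mean ** 2  # risk-weighted variance = observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

Exponentiating the returned coefficient gives the hazard ratio per unit of `x`.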

🧠 7. Population and Evolutionary Genetics

Analysis	Statistical Methods	Statistical Principle

Ancestry inference	PCA, ADMIXTURE, STRUCTURE	PCA performs eigen-decomposition of genotype matrix; ADMIXTURE uses maximum likelihood estimation of ancestry proportions; STRUCTURE uses Bayesian clustering with MCMC.
Phasing and imputation	SHAPEIT, Beagle, IMPUTE	Hidden Markov Models (HMMs) for haplotype state transitions; genotype imputation via forward-backward or phasing-assisted likelihoods.
Selection scans	iHS, XP-EHH, Fst, PBS	Detects selection via extended haplotype homozygosity (iHS/XP-EHH), population differentiation (Fst), or allele frequency branch lengths (PBS).
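IBD / ROH detection	KING, GERMLINE, PLINK	KING estimates kinship with robust moment estimators from shared genotypes; GERMLINE finds shared haplotype segments by hashing exact seed matches and extending them; PLINK estimates genome-wide IBD proportions by method of moments and detects runs of homozygosity with a sliding-window scan.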
Demographic inference	MSMC, fastsimcoal2, dadi	MSMC fits a sequentially Markovian coalescent HMM to infer coalescence rates through time; fastsimcoal2 (coalescent simulation) and dadi (diffusion approximation) infer demographic parameters by composite likelihood over the site frequency spectrum.
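
The Fst and PBS selection statistics in the selection scans row can be computed directly from allele frequencies. The sketch uses Hudson's Fst estimator, combined over SNPs as a ratio of averages; this is one of several Fst estimators. It assumes a fixed number of sampled haplotypes per population. PBS then converts pairwise Fst values to branch lengths with T = -log(1 - Fst). Function names are hypothetical.

```python
import math


def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst over SNPs as a ratio of averages (sum of numerators / sum of denominators).

    p1, p2: per-SNP allele frequencies; n1, n2: haplotypes sampled in each population."""
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-population sampling variance
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


def pbs(fst_ab, fst_ac, fst_bc):
    """Population branch statistic for population A from pairwise Fst among A, B, and C."""

    def branch(f):
        return -math.log(1 - f)

    return (branch(fst_ab) + branch(fst_ac) - branch(fst_bc)) / 2
```

The sampling-variance correction can make Fst slightly negative when the two populations have identical frequencies. A large PBS value indicates extra differentiation specific to population A.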
