Skip to content

Tool Comparisons & Benchmarks

Differential Expression Demystified: DESeq2 vs. edgeR vs. limma-voom

The choice of statistical tool can make or break your RNA-seq analysis. We break down the strengths and weaknesses of the big three.

HoppeSyler Bioinformatics Team

Published November 29, 2025

11 minute read

Executive Summary

Differential expression analysis is the cornerstone of RNA-seq, but "one size fits all" does not apply. We compared DESeq2, edgeR, and limma-voom across various experimental designs.

  • DESeq2 is the most robust choice for small sample sizes (n < 10) due to its conservative normalization and outlier handling.
  • limma-voom shines with large datasets (n > 20) and complex designs, offering speed and flexibility for heteroscedastic data.
  • edgeR is a versatile middle ground, offering powerful GLM capabilities for time-series and multi-factor experiments.

The Statistical Landscape

All three tools are excellent, but they make different assumptions about the data. DESeq2 and edgeR are based on the Negative Binomial (NB) distribution, which models count data directly and accounts for the overdispersion typically seen in RNA-seq data (where variance > mean). limma-voom takes a different approach: it transforms counts to logCPM (counts per million), estimates the mean-variance relationship (the "voom" part), and then applies linear models originally developed for microarrays.

Under the Hood: Normalization & Dispersion

The "secret sauce" of these tools often lies in how they handle two critical steps before any testing occurs: normalization and dispersion estimation.

Normalization Strategies

  • DESeq2 (Median of Ratios): Calculates a pseudo-reference sample (geometric mean of all samples) and estimates size factors based on the median ratio of each sample to this reference. This is incredibly robust to outlier genes that might skew total counts.
  • edgeR (TMM - Trimmed Mean of M-values): Compares each sample to a reference sample, trimming the most highly expressed genes and the genes with the largest log-fold changes before calculating a scaling factor. It assumes that the majority of genes are not differentially expressed.
  • limma-voom: Typically uses TMM normalization (via edgeR's `calcNormFactors`) to adjust library sizes before the log-transformation and precision weighting steps.

Dispersion Estimation (Sharing Information)

In small sample sizes, it's impossible to accurately estimate the variance of a single gene. All three tools solve this by "borrowing information" across genes.

  • DESeq2: Assumes that genes with similar average expression strength have similar dispersion. It fits a curve to the dispersion estimates and shrinks gene-wise estimates towards this curve.
  • edgeR: Uses a similar "common dispersion" and "tagwise dispersion" approach, utilizing conditional maximum likelihood to share information.
  • limma-voom: Estimates the mean-variance trend non-parametrically (Lowess fit) and assigns a weight to each observation based on its predicted variance. This allows the linear model to "trust" precise measurements more than noisy ones.

When to Use What

Scenario Recommended Tool Why?
Small Sample Size (n < 5) DESeq2 Its shrinkage estimators for dispersion and fold change are superior at controlling false positives in low-replicate data.
Large Sample Size (n > 50) limma-voom It is significantly faster and handles the variability of large cohorts better than the iterative fitting of NB models.
Complex Designs (Time-series) All (limma is fastest) All three handle GLMs and interaction terms, but limma-voom is significantly faster for complex models.
Outlier Sensitivity DESeq2 Cook's distance outlier detection is built-in and automatically flags or filters genes with extreme values.

Code in Action: Standard Workflows

Seeing the syntax can help clarify the differences in workflow. Here are the minimal standard pipelines for each tool in R.

Note: In the examples below, counts refers to a raw integer count matrix (e.g., from featureCounts or STAR), not normalized values.

# 1. Create DESeqDataSet
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)

# 2. Run the pipeline (Normalization, Dispersion, Fitting)
dds <- DESeq(dds)

# 3. Get Results
res <- results(dds)
resOrdered <- res[order(res$padj),]

# 1. Create DGEList and Filter
y <- DGEList(counts=counts, group=group)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]

# 2. Normalize and Estimate Dispersion
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# 3. Fit Model (QL framework recommended)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef=2)
topTags(qlf)

# 1. Create DGEList and Filter
dge <- DGEList(counts=counts)
keep <- filterByExpr(dge, design)
dge <- dge[keep,,keep.lib.sizes=FALSE]

# 2. Normalize
dge <- calcNormFactors(dge)

# 3. Voom Transformation (Mean-Variance modeling)
v <- voom(dge, design, plot=TRUE)

# 4. Linear Modeling
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef=2)

Common Pitfalls

1. Ignoring Batch Effects

None of these tools can magically fix a confounded experimental design. If all your controls were sequenced on Monday and all your treatments on Friday, no statistical test can save you. Always include batch as a covariate in your design matrix (e.g., `~ Batch + Condition`).

2. Using FPKM/RPKM/TPM for Differential Expression

Never use FPKM, RPKM, or TPM for differential expression. These metrics normalize for library size and gene length, which can introduce biases when comparing across samples. Always start with raw counts and let the tool handle the normalization (TMM for edgeR, Median of Ratios for DESeq2).

3. Failing to Filter Lowly Expressed Genes

Genes with very low counts (e.g., 0 or 1 across most samples) provide no statistical power and only serve to increase the burden of multiple testing correction. Tools like `edgeR` and `limma` require explicit filtering (e.g., `filterByExpr`). DESeq2 handles this automatically via independent filtering, but pre-filtering can still improve speed and dispersion estimation.

Key Visualizations

Don't trust the p-values blindly. Use these plots to diagnose your analysis:

  • PCA Plot: The first step in any analysis. Samples should cluster by condition. If they cluster by date or technician, you have a batch effect.
  • MA Plot (Mean-Difference): Plots log-fold change vs. mean expression. It should look like a funnel centered at y=0. If the cloud is skewed, your normalization failed.
  • P-value Histogram: Plot a histogram of your raw p-values. You should see a flat uniform distribution for null genes and a peak near zero for significant genes. If you see a hill or a U-shape, something is wrong with your model or covariates.

References

  • Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology, 15(12), 550.
  • Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.
  • Law, C. W., Chen, Y., Shi, W., & Smyth, G. K. (2014). voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome biology, 15(2), R29.

Conclusion

Stop using default settings blindly. Understand your data's structure. If you have a small pilot study, stick with DESeq2. If you are analyzing a massive clinical trial, switch to limma-voom. And always, always visualize your data with PCA before running any test.

Struggling with your analysis?

Our bioinformaticians are experts in statistical design.