From FastQ to Figure 1: A Step-by-Step Guide to Publication-Ready Single-Cell Visualization

Your data is only as good as your ability to communicate it. Here is how to create single-cell figures that reviewers love.

HoppeSyler Scientific Team

Published November 29, 2025

13 minute read

Executive Summary

In single-cell genomics, the UMAP plot has become the iconic "Figure 1." However, creating a visualization that is both scientifically accurate and aesthetically pleasing is an art form that balances statistical rigor with graphic design principles. This guide moves beyond default settings to help you produce publication-quality figures.

  • Clarity is King: Avoid "over-plotting" where millions of dots obscure the underlying structure.
  • Accessibility Matters: Use colorblind-friendly palettes (like Viridis or Magma) to ensure your data is interpretable by everyone.
  • Tell a Story: Don't just show clusters; use dot plots and heatmaps to demonstrate why those clusters are biologically distinct.

The Role of Figure 1: Setting the Stage

The first figure of your paper sets the tone for the entire manuscript. It tells the reviewer whether the data is clean, the analysis is rigorous, and the biological conclusions are supported by the evidence. A messy, cluttered, or unreadable Figure 1 raises immediate red flags about the quality of the underlying QC and processing.

Your goal is not just to show data, but to guide the viewer's eye through the logic of your experiment: Here are the cells we captured, here is how they group, and here is what they are.


Step 0: The Prerequisites (FastQ to Count Matrix)

While this guide focuses on the visualization aspect, we must acknowledge the journey from raw data. "Figure 1" is rarely just a UMAP. A complete Figure 1 typically includes:

  • Panel A (Study Design): A schematic (often created in BioRender or Illustrator) showing sample collection, processing, and sequencing workflow.
  • Panel B (Quality Control): Violin plots showing nFeature_RNA, nCount_RNA, and percent.mt. This proves to the reviewer that your beautiful UMAP isn't just clustered debris or doublets.
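To make the three QC metrics concrete, here is a minimal Python sketch that computes them for a toy count matrix. The gene counts and the `qc_metrics` helper are made up for illustration, and the "MT-" prefix assumes human gene symbols; in practice your analysis framework computes and stores these values for you.

```python
# Toy counts: one dict per cell mapping gene symbol -> UMI count (made-up values).
cells = {
    "cell_A": {"CD3E": 4, "MS4A1": 0, "MT-CO1": 2, "MT-ND1": 1, "ACTB": 13},
    "cell_B": {"CD3E": 0, "MS4A1": 0, "MT-CO1": 30, "MT-ND1": 18, "ACTB": 2},
}

def qc_metrics(counts, mito_prefix="MT-"):
    """Return (n_features, n_counts, percent_mito) for one cell."""
    # Number of genes detected with at least one count (nFeature_RNA).
    n_features = sum(1 for c in counts.values() if c > 0)
    # Total counts in the cell (nCount_RNA).
    n_counts = sum(counts.values())
    # Share of counts coming from mitochondrial genes (percent.mt).
    mito = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
    percent_mito = 100.0 * mito / n_counts if n_counts else 0.0
    return n_features, n_counts, percent_mito

metrics = {name: qc_metrics(counts) for name, counts in cells.items()}
for name, (nf, nc, pm) in metrics.items():
    print(f"{name}: nFeature={nf} nCount={nc} percent.mt={pm:.1f}")
```

Here cell_B has few detected genes and almost all of its counts are mitochondrial, which is the kind of profile a QC violin plot is meant to expose.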

Once your raw FastQ files have been aligned and quantified (via Cell Ranger, STARsolo, or Alevin) and have passed QC, you are ready for the fun part.

Step 1: The Foundation (Data Prep)

Great figures start with great data. No amount of ggplot wizardry can fix a UMAP derived from poorly normalized data or uncorrected batch effects.

  • Feature Selection: Ensure you are clustering on highly variable genes that carry biological signal, not technical noise (such as mitochondrial or ribosomal protein genes, unless those are your focus).
  • Batch Correction: If your UMAP shows clusters separated by "Sample_ID" rather than "Cell_Type," you need to integrate your data (using Harmony, Seurat v5 Integration, or scVI) before visualizing.
  • Annotation Hygiene: Remove "doublet" clusters and low-quality "debris" clusters before generating final figures. A "clean" UMAP implies a clean dataset.
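The feature selection point can be sketched in a few lines of Python. This is a simplified illustration, not the method used by Seurat or Scanpy: it ranks genes by a plain variance-to-mean ratio and skips gene families by name prefix. The expression values, the prefix list (human symbols), and the `top_variable_genes` helper are all assumptions made for the example.

```python
from statistics import mean, pvariance

# Made-up normalized expression: gene -> values across five cells.
expression = {
    "CD3E":   [0.0, 2.1, 0.0, 2.4, 0.1],
    "MS4A1":  [1.9, 0.0, 2.2, 0.0, 0.0],
    "ACTB":   [3.0, 3.1, 2.9, 3.0, 3.1],
    "MT-CO1": [0.5, 4.0, 0.2, 3.8, 0.1],
    "RPL13":  [1.0, 3.5, 0.8, 3.9, 0.7],
}

# Prefixes for human mitochondrial and ribosomal protein genes
# (an assumption; other species use different naming).
TECHNICAL_PREFIXES = ("MT-", "RPS", "RPL")

def top_variable_genes(expr, n_top, exclude_prefixes=TECHNICAL_PREFIXES):
    """Rank genes by variance/mean, skipping technical gene families."""
    scores = {}
    for gene, values in expr.items():
        if gene.startswith(exclude_prefixes):
            continue  # never let these genes drive the clustering
        mu = mean(values)
        if mu > 0:
            scores[gene] = pvariance(values) / mu
    return sorted(scores, key=scores.get, reverse=True)[:n_top]

selected = top_variable_genes(expression, n_top=2)
print(selected)
```

Note that MT-CO1 and RPL13 vary a lot in this toy data, so they would rank highly if they were not excluded; that is exactly how technical signal ends up shaping a UMAP.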

Step 2: Mastering the UMAP

The UMAP (Uniform Manifold Approximation and Projection) is the centerpiece. But the default output from Seurat or Scanpy is rarely publication-ready.

The Labeling Problem

Legends with 25 colors are cognitively exhausting. Instead, label clusters directly on the plot.

R / Seurat Tip
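library(Seurat)
library(ggplot2)

# Label clusters on the plot itself and drop the legend (seurat_obj is your annotated Seurat object)
DimPlot(seurat_obj, reduction = "umap", label = TRUE, label.box = TRUE, repel = TRUE) +
  NoLegend() +
  theme(axis.text = element_blank(), axis.ticks = element_blank()) +
  ggtitle("Cell Type Annotation")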

Using label.box = TRUE adds a background to text, making it readable against complex backgrounds.

The Over-plotting Problem

When you have more than 50,000 cells, points overlap and the plot no longer shows where cells are densest. For a massive dataset, consider:

  • Rasterization: Use ggrastr to rasterize the points while keeping axes and text as vectors. This keeps file sizes manageable.
  • Density Plots: Instead of raw points, use density contours (geom_density_2d) to show where the bulk of cells are located.
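To illustrate why density views help, the Python sketch below bins simulated 2D coordinates into a square grid and counts cells per bin. Note that this is a plain 2D histogram, a simpler stand-in for the kernel density contours that geom_density_2d draws. The simulated points and the `bin_density` helper are assumptions for the example.

```python
import random
from collections import Counter

def bin_density(points, bin_size):
    """Count points per square bin; keys are (column, row) bin indices."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // bin_size), int(y // bin_size))] += 1
    return counts

# Simulated embedding: a dense population plus a sparse, spread-out one.
rng = random.Random(0)
dense = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(9000)]
sparse = [(rng.gauss(5.0, 1.5), rng.gauss(5.0, 1.5)) for _ in range(1000)]
points = dense + sparse

density = bin_density(points, bin_size=0.5)
busiest_bin, n_cells = density.most_common(1)[0]
print(f"{len(points)} points fall into {len(density)} bins")
print(f"busiest bin {busiest_bin} holds {n_cells} cells")
```

In a scatter plot, the 9,000 tightly packed points would collapse into one small blob that looks no more important than the sparse population; the per-bin counts make the difference explicit and can be mapped to color.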

Step 3: Evidence of Identity (Dot Plots vs. Heatmaps)

Once you show where the cells are, you must prove what they are. This is usually done with marker genes.

Why Dot Plots Win

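Heatmaps can be misleading because they often rescale each gene independently (for example, to a 0 to 1 range or to z-scores). The brightest tile for a gene then says nothing about how many cells express it, hiding the fact that a "highly expressed" marker might only be found in 5% of the cells in that cluster.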

Dot Plots encode two dimensions of information:

  1. Size: Percent of cells expressing the gene (Sensitivity).
  2. Color: Average expression level (Intensity).
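The two quantities behind each dot are easy to compute by hand. The Python sketch below does so for one marker gene in two clusters; the expression values and the `dot_stats` helper are made up for illustration, and the plain mean used here is a simplification of the averaging and scaling that plotting functions apply.

```python
from statistics import mean

# Made-up normalized expression of one marker gene in two clusters.
clusters = {
    "T cells": [2.1, 1.8, 0.0, 2.4, 1.9, 2.2, 0.0, 2.0],
    "B cells": [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0],
}

def dot_stats(values, threshold=0.0):
    """Return (percent of cells above threshold, mean expression over all cells)."""
    expressing = [v for v in values if v > threshold]
    pct = 100.0 * len(expressing) / len(values)
    return pct, mean(values)

stats = {name: dot_stats(values) for name, values in clusters.items()}
for name, (pct, avg) in stats.items():
    print(f"{name}: dot size ~ {pct:.1f}% expressing, dot color ~ mean {avg:.2f}")
```

The single B cell with a value of 3.0 has the highest expression in the data, yet the small dot (12.5% of cells) makes clear that the gene is not a marker for that cluster.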

When to use Heatmaps

Heatmaps are superior when you want to show broad patterns across many genes, such as:

  • Gene modules or pathways.
  • Regulon activity (SCENIC).
  • Copy number variations (inferCNV).

Tip: Use ComplexHeatmap in R for adding annotation bars (e.g., patient sex, treatment condition) to the top of your heatmap.

Step 4: Color Theory for Genomics

Your choice of color palette can distort data or make it intuitive.

  • Categorical Data (Clusters, Samples): Use distinct, high-contrast palettes. Avoid gradients.
    Tools: ggsci (Nature, Lancet, NEJM palettes), tableau20.
  • Continuous Data (Gene Expression): Use "perceptually uniform" palettes, where equal steps in value produce equal perceived steps in color.
    Good: Viridis, Magma, Plasma.
    Bad: Jet/Rainbow (these introduce artificial boundaries).
  • Diverging Data (Up/Down Regulation): Use a palette with a neutral midpoint and a distinct color on each side, so that "no change" reads as neutral.
    Good: Scico "Vik" (light midpoint) or "Berlin" (dark midpoint).
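One quick way to see why rainbow ramps mislead is to check how lightness changes along the palette. The Python sketch below computes relative luminance for five viridis colors and for a hand-built rainbow ramp. Monotonic lightness is only one ingredient of perceptual uniformity, so treat this as a rough sanity check; the rainbow colors are an approximation chosen for the example, not the exact Jet colormap.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color given as '#RRGGBB' (0 = black, 1 = white)."""
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB gamma curve before weighting the channels.
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_monotonic(values):
    """True if values strictly increase or strictly decrease."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

# Five evenly spaced viridis colors.
viridis = ["#440154", "#3B528B", "#21908C", "#5DC863", "#FDE725"]
# Hand-built rainbow ramp (blue, cyan, green, yellow, red), for contrast only.
rainbow = ["#0000FF", "#00FFFF", "#00FF00", "#FFFF00", "#FF0000"]

for name, palette in [("viridis", viridis), ("rainbow", rainbow)]:
    lum = [relative_luminance(c) for c in palette]
    print(name, [round(v, 2) for v in lum], "monotonic:", is_monotonic(lum))
```

Viridis gets steadily lighter from low to high, while the rainbow ramp jumps up at cyan and yellow and falls again at red, which is where the eye sees boundaries that are not in the data.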

The 5 Most Common Visualization Mistakes Reviewers Hate

  1. The "Fruit Salad" UMAP: Using a default rainbow color palette with 50 clusters. It's impossible to distinguish cluster 12 from cluster 13. Fix: Group related clusters into broader "super-clusters" for the main figure.
  2. Over-plotting: Plotting 100,000 cells with large dot sizes, creating a solid blob where no density information is visible. Fix: Use smaller point sizes (pt.size = 0.1) or density plots.
  3. Unreadable Axes: Tiny font sizes that require a microscope to read. Fix: All text should be legible when the figure is printed on a standard A4 page (usually >8pt font).
  4. Missing Legends: Showing expression levels without a scale bar. Is red high or low? Fix: Always include a clear, annotated color bar.
  5. "Cherry-picked" Markers: Showing only the one gene that works perfectly on a feature plot. Fix: Use a Dot Plot to show the expression of top 5 markers per cluster to prove specificity.
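For the font size fix, the arithmetic is worth doing before you export: when a figure is scaled to fit the page, its text scales with it. Here is a minimal Python sketch, assuming uniform scaling of the whole figure; the helper names and the 7 inch target width are hypothetical, so check your journal's actual figure widths.

```python
def printed_font_size(font_pt, designed_width_in, printed_width_in):
    """Font size after the whole figure is scaled to its printed width."""
    return font_pt * printed_width_in / designed_width_in

def min_design_font(target_pt, designed_width_in, printed_width_in):
    """Smallest design-time font size that still prints at target_pt."""
    return target_pt * designed_width_in / printed_width_in

# Example: a 14 in wide multi-panel figure placed at 7 in
# (a hypothetical full-page figure width).
shrunk = printed_font_size(10, designed_width_in=14, printed_width_in=7)
needed = min_design_font(8, designed_width_in=14, printed_width_in=7)
print(f"10 pt text prints at {shrunk:.1f} pt; design at >= {needed:.1f} pt")
```

A simpler habit is to build the figure at its final printed size from the start, so that the font sizes you set are the font sizes the reader sees.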

The Toolkit for Success

While Seurat and Scanpy have built-in plotting functions, taking your figures to the next level often requires dedicated visualization libraries. We've included both R and Python options below.

| Tool           | Language | Best For                                                                                 |
|----------------|----------|------------------------------------------------------------------------------------------|
| scCustomize    | R        | Enhancing Seurat plots with better themes, legends, and helper functions.                |
| dittoSeq       | R        | Colorblind-friendly visualizations and easy bar plots of cell type proportions.          |
| ComplexHeatmap | R        | The gold standard for annotated heatmaps showing metadata alongside expression.          |
| Nebulosa       | R        | Kernel density estimation plots (great for visualizing gene expression in sparse data).  |
| Patchwork      | R        | Stitching multiple ggplot objects into a single cohesive figure panel.                   |
| Scanpy         | Python   | The core Python framework. Use sc.pl.umap and sc.pl.dotplot as your base.                |

Figure QC Checklist

  • Color Palette: Is it colorblind safe? (Check with Coblis or CVDSimulator)
  • Resolution: Is the output vector-based (PDF/SVG) or high-res raster (300+ DPI PNG)?
  • Labels: Are clusters labeled directly on the plot (where possible) to avoid legend fatigue?
  • Consistency: Do all panels use the same color mapping for the same cell types?
  • Context: Does the figure caption explain what is being shown, not just how it was plotted?
  • Scale: Are all axes and color bars labeled, with units where they apply (e.g., "UMAP_1", "Log Normalized Expression")?
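The consistency check is easiest to pass if you define the color mapping once and reuse it in every panel. Below is a minimal Python sketch using the Okabe-Ito colorblind-safe palette; the cell type labels and the `build_color_map` helper are hypothetical, and you would pass the resulting mapping to whatever color argument your plotting function accepts.

```python
# Okabe-Ito colorblind-safe palette (black omitted).
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7"]

def build_color_map(cell_types, palette=OKABE_ITO):
    """Assign each cell type a fixed color, independent of input order."""
    unique = sorted(set(cell_types))
    if len(unique) > len(palette):
        raise ValueError("more cell types than palette colors; merge or extend")
    return dict(zip(unique, palette))

# Hypothetical labels; build the map once from the full annotation.
all_types = ["T cell", "B cell", "Monocyte", "NK cell"]
color_map = build_color_map(all_types)

# Panels showing a subset, in any order, still look up the same colors.
panel_b_types = ["NK cell", "B cell"]
panel_b_colors = [color_map[t] for t in panel_b_types]
print(color_map)
print(panel_b_colors)
```

Sorting the labels before assigning colors means the mapping does not depend on the order in which clusters happen to appear, which is a common source of mismatched colors between panels.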
