Published on 2026-02-22

How PDF Generation Works Under the Hood

A technical deep-dive into how Markdown documents are transformed into print-ready PDFs — covering parsing, rendering, layout engines, and the headless browser pipeline.

Generating a pixel-perfect PDF from a Markdown document involves a surprisingly deep pipeline. Understanding it helps you write Markdown that exports cleanly every time.

The Four-Stage Pipeline

Stage 1: Parsing to an Abstract Syntax Tree (AST)

The raw Markdown text is parsed into a structured tree of nodes: headings, paragraphs, code blocks, lists, images, and so on. This is done by a parser such as markdown-it, remark, or commonmark.js.

The parser applies the CommonMark specification to resolve ambiguous syntax. For example, is a * an italic marker or a bullet point? The specification's rules decide, and the resulting AST records exactly one interpretation.
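To make the * example concrete, here is a minimal sketch of that kind of disambiguation. It is a toy rule set, not the CommonMark algorithm: a line that starts with "* " opens a list item, and *text* with no space just inside the asterisks is emphasis. The Node class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


# Toy emphasis rule: "*text*" where the text neither starts nor ends with a space.
EMPHASIS = re.compile(r"\*(\S(?:[^*]*\S)?)\*")


def parse_inline(text: str) -> list[Node]:
    """Split a line's content into text and emphasis nodes."""
    nodes, pos = [], 0
    for match in EMPHASIS.finditer(text):
        if match.start() > pos:
            nodes.append(Node("text", text[pos:match.start()]))
        nodes.append(Node("emphasis", match.group(1)))
        pos = match.end()
    if pos < len(text):
        nodes.append(Node("text", text[pos:]))
    return nodes


def parse_line(line: str) -> Node:
    """Decide once whether a leading asterisk is a bullet or inline markup."""
    if line.startswith("* "):
        return Node("list_item", children=parse_inline(line[2:]))
    return Node("paragraph", children=parse_inline(line))
```

Once parse_line has run, nothing downstream has to look at the raw asterisk again: the node type carries the decision.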

Extensions (plugins) extend the AST with custom node types:

KaTeX plugin adds math expression nodes
Mermaid plugin marks fenced code blocks as diagram nodes
TOC plugin builds a heading index and injects a navigation tree
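As an illustration of what a TOC plugin might do, the sketch below turns a flat list of headings into a nested navigation tree. The Heading and TocEntry types are hypothetical and do not reflect the data model of any particular plugin.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    level: int  # 1 for a top-level heading, 2 for the next level, and so on
    title: str


@dataclass
class TocEntry:
    title: str
    level: int
    children: list["TocEntry"] = field(default_factory=list)


def build_toc(headings: list[Heading]) -> list[TocEntry]:
    """Nest each heading under the closest preceding heading of a lower level."""
    root = TocEntry("root", 0)
    stack = [root]
    for heading in headings:
        # Pop until the top of the stack is a shallower heading (root is level 0).
        while stack[-1].level >= heading.level:
            stack.pop()
        entry = TocEntry(heading.title, heading.level)
        stack[-1].children.append(entry)
        stack.append(entry)
    return root.children
```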

Stage 2: Rendering to HTML and CSS

The AST is walked recursively and each node type is converted into corresponding HTML elements. A code block becomes a <pre><code> pair with syntax-highlighted spans. A table becomes a <table> with proper <thead> and <tbody> structure.
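The recursive walk can be sketched in a few lines. This example assumes a simplified AST made of plain dictionaries with hypothetical node types, and it leaves out syntax highlighting; real renderers handle many more node types.

```python
from html import escape

# Node types that map directly onto a single wrapping HTML element.
SIMPLE_TAGS = {"document": "article", "paragraph": "p", "emphasis": "em"}


def render(node: dict) -> str:
    """Convert one AST node, and recursively its children, into HTML."""
    kind = node["type"]
    if kind == "text":
        return escape(node["value"])
    if kind == "code_block":
        lang = escape(node.get("lang", ""), quote=True)
        code = escape(node["value"])
        return f'<pre><code class="language-{lang}">{code}</code></pre>'
    inner = "".join(render(child) for child in node.get("children", []))
    tag = f"h{node['level']}" if kind == "heading" else SIMPLE_TAGS[kind]
    return f"<{tag}>{inner}</{tag}>"
```

Escaping happens at the leaves (text and code), so markup characters in the source never leak into the generated HTML.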

At this stage, diagrams are routed. Fenced code blocks with a diagram language identifier (e.g., ```mermaid) are handled in one of two ways:

Mermaid and Chart.js diagrams are left in the page and render client-side via JavaScript once the browser loads it (Stage 3)
PlantUML, Graphviz, D2, and others are sent to a Kroki server, which returns an SVG image

The SVG is embedded directly in the HTML, ensuring it renders at full vector quality in the PDF.
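A minimal sketch of this routing step follows. The sets of language identifiers and the use of the public kroki.io instance are assumptions for illustration; the URL encoding (zlib deflate followed by URL-safe Base64) is the scheme Kroki documents for GET requests. No network call is made here, the function only builds the URL.

```python
import base64
import zlib

CLIENT_SIDE = {"mermaid", "chartjs"}          # assumed fence identifiers
KROKI_TYPES = {"plantuml", "graphviz", "d2"}  # Kroki diagram type names
KROKI_BASE = "https://kroki.io"               # assumption: public instance


def kroki_url(diagram_type: str, source: str, base: str = KROKI_BASE) -> str:
    """Build a Kroki GET URL that returns the diagram as SVG."""
    compressed = zlib.compress(source.encode("utf-8"), 9)
    payload = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"{base}/{diagram_type}/svg/{payload}"


def route(lang: str, source: str) -> tuple[str, str]:
    """Decide how a fenced code block should be handled."""
    if lang in CLIENT_SIDE:
        return ("client", source)  # left for the browser's JavaScript
    if lang in KROKI_TYPES:
        return ("kroki", kroki_url(lang, source))
    return ("code", source)        # ordinary code block
```

For example, route("graphviz", "digraph G { a -> b }") yields a ("kroki", url) pair whose URL can be fetched to obtain the SVG.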

Stage 3: Layout by a Headless Browser

A real web browser (Chromium, running headlessly without a visible window) loads the HTML page. This is the same layout engine that powers the website preview, so the PDF matches the preview closely, apart from print-specific styles and the fixed page width.

Why a real browser instead of a dedicated PDF library?

Full HTML/CSS support: flexbox, CSS grid, and every other feature Chromium implements behave exactly as they do in the browser
JavaScript execution: client-side diagram libraries (Mermaid, Chart.js) execute and render their output before the page is captured
Accurate font rendering: web fonts load and render the same way they do on screen

The browser is instructed to render the page at a specific viewport width (matching the PDF page width), load all external assets, wait for JavaScript to complete, and then capture the layout.
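The viewport width that corresponds to a page width follows from two fixed CSS conversions: 1 inch is 96 CSS pixels and 25.4 mm. The sketch below applies them; the function names and the choice to subtract margins are assumptions, since the exact viewport a given service uses is not stated here.

```python
PAGE_SIZES_MM = {"A4": (210.0, 297.0), "Letter": (215.9, 279.4)}
CSS_PX_PER_INCH = 96
MM_PER_INCH = 25.4


def mm_to_css_px(mm: float) -> int:
    """Convert millimetres to CSS pixels, rounded to the nearest pixel."""
    return round(mm / MM_PER_INCH * CSS_PX_PER_INCH)


def viewport_for(page: str, margin_mm: float = 0.0) -> tuple[int, int]:
    """Return (width, height) in CSS pixels for the printable area of a page."""
    width_mm, height_mm = PAGE_SIZES_MM[page]
    return (
        mm_to_css_px(width_mm - 2 * margin_mm),
        mm_to_css_px(height_mm - 2 * margin_mm),
    )
```

An A4 page with no margins comes out at 794 by 1123 CSS pixels, and Letter at 816 by 1056.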

Stage 4: Printing to PDF

Once the page is fully laid out, the browser’s print engine generates the PDF. This is functionally identical to pressing Ctrl+P in Chrome and choosing “Save as PDF” — but automated and reproducible.

The print engine handles:

Page breaking: the CSS break-before, break-after, and break-inside properties (and the legacy page-break-before and page-break-after aliases) are respected
Page size: A4, Letter, or custom dimensions
Margins: configurable via CSS @page rules
Headers and footers: injected via browser print settings or CSS
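The print settings above mostly come down to a handful of CSS rules. The following sketch assembles such a stylesheet as a string; the function name, defaults, and selectors are hypothetical, while @page, size, margin, break-before, and break-inside are standard CSS.

```python
def print_css(
    size: str = "A4",
    margin: str = "20mm",
    break_before: tuple[str, ...] = ("h1",),
    avoid_break_inside: tuple[str, ...] = ("pre", "table", "figure"),
) -> str:
    """Build a small print stylesheet from page settings."""
    rules = [f"@page {{ size: {size}; margin: {margin}; }}"]
    if break_before:
        # Start each of these elements on a new page.
        rules.append(f"{', '.join(break_before)} {{ break-before: page; }}")
    if avoid_break_inside:
        # Keep these elements from being split across two pages where possible.
        rules.append(f"{', '.join(avoid_break_inside)} {{ break-inside: avoid; }}")
    return "\n".join(rules)
```

Calling print_css(size="Letter", margin="1in") switches the page size and margins without touching the break rules.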

Fonts in PDF

Fonts embedded in the PDF are either:

System fonts — resolved at render time from the server’s installed fonts
Web fonts — downloaded from Google Fonts or a CDN during rendering

In both cases the browser embeds the font data in the PDF file (typically only the glyphs the document actually uses), so the PDF displays correctly even on devices without the font installed.

Why PDF Generation Takes Time

Several steps in the pipeline are inherently slow:

Step	Why it takes time
External diagram rendering	Network round-trip to Kroki server
Web font loading	HTTP download
Mermaid rendering	JavaScript execution
Browser startup	Cold-start of headless Chromium
PDF compression	Large documents take longer to compress

For long documents with many diagrams, generation can take 10–30 seconds. The progress indicator shows your position in the queue and the current stage.
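If you run a similar pipeline yourself, timing each stage is the quickest way to see where those seconds go. The helper below is a generic sketch with hypothetical stage names; it is not how any particular service reports progress.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self) -> None:
        self.durations: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Name of the stage that has taken the most time so far."""
        return max(self.durations, key=self.durations.get)


timer = StageTimer()
with timer.stage("diagrams"):
    time.sleep(0.02)  # stand-in for a network round-trip
with timer.stage("print"):
    pass              # stand-in for the print step
```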

The Difference Between HTML Preview and PDF

The HTML preview and the PDF use the same HTML source, but with two differences:

CSS media queries: The PDF render uses @media print styles, which can override screen styles (e.g., hiding navigation, adjusting margins).
Page dimensions: The PDF is rendered at a fixed page width, while the HTML preview fills the available browser viewport.

This is why it’s always worth checking the PDF output even if the preview looks perfect.
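To illustrate the first difference, the sketch below wraps a set of overrides in an @media print block, so they apply to the PDF but not to the on-screen preview. The helper name and the selectors in the usage example are hypothetical.

```python
def media_print(rules: dict[str, dict[str, str]]) -> str:
    """Wrap CSS rules in an @media print block."""
    body = "\n".join(
        f"  {selector} {{ "
        + " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        + " }"
        for selector, declarations in rules.items()
    )
    return f"@media print {{\n{body}\n}}"


# Hide navigation and let the content use the full page width in the PDF only.
stylesheet = media_print({
    "nav": {"display": "none"},
    "main": {"max-width": "none", "margin": "0"},
})
```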