# Writing Academic Papers with Math Using KaTeX
How to write academic papers, theses, and technical reports in Markdown with fully typeset mathematical notation using KaTeX — complete with structure, citations, and PDF export.
Markdown with KaTeX is a surprisingly capable platform for academic writing — fast to type, version-controllable, and exportable to a professional PDF. This guide covers everything from paper structure to complex equation typesetting.
## When to Use Markdown + KaTeX for Academic Writing

**✅ Good fit:**
- Conference papers, technical reports, and theses with moderate formatting needs
- Working drafts shared across a team
- Documentation of mathematical models, algorithms, or proofs
- Any paper where the author values writing speed over layout micro-control
**⚠️ Not ideal for:**
- Papers requiring journal-specific LaTeX templates (use LaTeX directly)
- Complex multi-column layouts (use LaTeX or InDesign)
- Papers with custom bibliography packages (LaTeX + BibTeX is better)
## Paper Structure Template
```markdown
# Title of the Paper

**Author Name**¹, **Co-Author Name**²

¹ University of Example, Department of Computer Science
² Institute of Things, Research Division

*Submitted: February 22, 2026*

---

## Abstract

A concise summary of the paper's contribution, methods, and results.
This should be 150–250 words.

---

## 1. Introduction

Introduce the problem and motivation.

## 2. Related Work

Survey relevant prior work.

## 3. Methodology

Describe your approach.

## 4. Results

Present your findings.

## 5. Discussion

Interpret the results.

## 6. Conclusion

Summarise and identify future work.

## References

[1] Author, A. (2024). *Title of Paper*. Journal Name, 12(3), 45–67.
```
## Typesetting Equations

### Inline Equations
Use single dollar signs for inline math. This is ideal for referencing variables in text:
Let $f: \mathbb{R}^n \to \mathbb{R}$ be a convex function. The gradient
at point $x^*$ satisfies $\nabla f(x^*) = 0$.
### Display Equations (Numbered)
Use double dollar signs for display equations. To number an equation, add `\tag{…}` inside it; to make the number cross-referenceable, pair the equation with an HTML anchor:

The loss function is defined as:

<a id="eq-1"></a>

$$
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]
\tag{1}
$$

Later text can then link to it with `[Equation (1)](#eq-1)`.
### Multi-Line Equations with Alignment

Use the `align` environment to align equations at the equals sign (KaTeX also accepts `aligned` inside `$$` if your renderer only supports inner environments):
$$
\begin{align}
\nabla_\theta \mathcal{L} &= -\frac{1}{N}\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right) x_i \\
\theta_{t+1} &= \theta_t - \eta \nabla_\theta \mathcal{L}
\end{align}
$$
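As a sanity check, equation (1) and the two-step derivation above translate directly into plain Python. This is a minimal sketch with scalar features, assuming a logistic model $\hat{y} = \sigma(\theta x)$ (which is what makes the gradient in the `align` block exact); `bce_loss` and `gradient_step` are illustrative names, not part of any template:

```python
import math

def bce_loss(y, y_hat):
    """Binary cross-entropy, term by term as in equation (1)."""
    n = len(y)
    return -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                for yi, p in zip(y, y_hat)) / n

def gradient_step(theta, xs, ys, eta=0.1):
    """One gradient-descent update: theta <- theta - eta * grad."""
    n = len(ys)
    # Logistic predictions y_hat_i = sigma(theta * x_i).
    preds = [1 / (1 + math.exp(-theta * x)) for x in xs]
    # Gradient from the align block: -(1/N) * sum (y_i - y_hat_i) x_i.
    grad = -sum((yi - pi) * xi for yi, pi, xi in zip(ys, preds, xs)) / n
    return theta - eta * grad
```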
## Theorems and Proofs
Use blockquotes and bold for theorem-style formatting:
> **Theorem 1** (Cauchy–Schwarz Inequality).
> For all vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$:
>
> $$|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\| \cdot \|\mathbf{v}\|$$
**Proof.** Consider the function $f(t) = \|\mathbf{u} + t\mathbf{v}\|^2 \geq 0$ for all $t \in \mathbb{R}$...
∎
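The inequality is easy to spot-check numerically. A throwaway sketch (`inner` and `norm` are ad-hoc helpers, not a library API):

```python
import math
import random

def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm ||u||."""
    return math.sqrt(inner(u, u))

# Spot-check |<u, v>| <= ||u|| * ||v|| on random vectors.
random.seed(0)
for _ in range(1000):
    u = [random.uniform(-1, 1) for _ in range(5)]
    v = [random.uniform(-1, 1) for _ in range(5)]
    assert abs(inner(u, v)) <= norm(u) * norm(v) + 1e-12
```

Equality holds exactly when the vectors are parallel, which the proof's discriminant argument makes precise.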
## Matrices
The covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ is:
$$
\Sigma = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^\top
= \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{pmatrix}
$$
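The $1/(N-1)$ sample-covariance formula above translates directly into dependency-free Python (`covariance` is an illustrative name; each row of `samples` is one observation $x_i$):

```python
def covariance(samples):
    """Sample covariance matrix with the 1/(N-1) factor,
    matching the outer-product formula above."""
    n_obs = len(samples)
    dim = len(samples[0])
    # Component-wise sample mean x-bar.
    mean = [sum(s[j] for s in samples) / n_obs for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        # Deviation (x_i - x-bar), then accumulate its outer product.
        d = [s[j] - mean[j] for j in range(dim)]
        for j in range(dim):
            for k in range(dim):
                cov[j][k] += d[j] * d[k] / (n_obs - 1)
    return cov
```

The result is symmetric by construction, mirroring $\Sigma = \Sigma^\top$ in the displayed matrix.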
## Common Academic Symbols
| Symbol | KaTeX Syntax |
|---|---|
| Real numbers $\mathbb{R}$ | `\mathbb{R}` |
| Expectation $\mathbb{E}[X]$ | `\mathbb{E}[X]` |
| Probability $\mathbb{P}(A)$ | `\mathbb{P}(A)` |
| Norm $\lVert x \rVert$ | `\lVert x \rVert` |
| Inner product $\langle u,v \rangle$ | `\langle u,v \rangle` |
| Partial derivative $\partial f / \partial x$ | `\partial f / \partial x` |
| Nabla $\nabla f$ | `\nabla f` |
| Infinity $\infty$ | `\infty` |
| Approximately $\approx$ | `\approx` |
| Proportional to $\propto$ | `\propto` |
| For all $\forall$ | `\forall` |
| There exists $\exists$ | `\exists` |
| In set $x \in \mathcal{X}$ | `x \in \mathcal{X}` |
| Implies $\Rightarrow$ | `\Rightarrow` |
| If and only if $\iff$ | `\iff` |
## Including Figures and Tables
Label figures and tables for cross-referencing:
The architecture is shown in Figure 1.
<figure>
<img src="./model-architecture.png" alt="Model architecture" style="max-width: 80%;" />
<figcaption><strong>Figure 1:</strong> The proposed model architecture. The encoder (left)
processes input sequences; the decoder (right) generates output.</figcaption>
</figure>
Results are summarised in Table 1.
**Table 1:** Comparison of model performance on the MNIST benchmark.
| Model | Accuracy | Parameters | Training Time |
|-------|----------|-----------|---------------|
| Baseline CNN | 99.1% | 430K | 12 min |
| Proposed Model | **99.4%** | 210K | 8 min |
| Transformer | 99.2% | 1.2M | 45 min |
## References
Use numbered references with consistent formatting:
## References
[1] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. *Proceedings of the IEEE*, 86(11), 2278–2324.
[2] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. *Advances in Neural Information Processing Systems*, 30.
[3] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. *Proceedings of CVPR*, 770–778.
Cite inline as "…as shown in prior work [1, 2]…" or "LeCun et al. [1] demonstrated…".
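Finally, for the PDF export mentioned in the introduction, one common toolchain (an assumption here, not the only option) is Pandoc, which typesets Markdown with math to PDF through a LaTeX engine; `paper.md` is a placeholder filename:

```bash
# Markdown -> PDF via Pandoc's default LaTeX engine
# (requires pandoc plus a TeX distribution such as TeX Live).
pandoc paper.md -o paper.pdf

# Markdown -> standalone HTML with math rendered by KaTeX:
pandoc paper.md -s --katex -o paper.html
```

The PDF route uses LaTeX for math, so any KaTeX-only syntax should be double-checked against plain LaTeX before export.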