
How I Built My MDX Blog System (and Why It's Fast Enough)

February 21, 2025 · 22 min

I wanted a blog that felt like Markdown, but could still render React components, callouts, and embeds without turning into a templating mess. MDX hit the sweet spot.

[Figure: MDX compilation flow diagram]

Why MDX (not just Markdown)

Markdown is simple and portable. The downside: you’re stuck with plain content. MDX gives you the ability to import React components right inside a post, which means you can drop in things like callouts, counters, or custom embeds. Under the hood, MDX turns your content into a structured syntax tree (think AST) before it becomes React.

If you’ve used a static site generator before, MDX feels like “Markdown with superpowers” — but it’s still readable in plain text, which keeps the writing workflow fast.

Example:

import Callout from '@/components/mdx/Callout';

<Callout type="info" title="Quick tip">
  This is an MDX component rendered inside your post.
</Callout>

The pipeline: files → frontmatter → compiled content

At a high level, my posts live in src/content/blog/*.mdx. When a blog page is requested, the site:

  1. Reads the file from disk
  2. Parses front matter with gray-matter
  3. Compiles MDX using next-mdx-remote
  4. As part of that compile step, runs a remark/rehype pipeline (Markdown → HTML) for math, tables, syntax highlighting, and heading slugs

This happens in getBlogPostBySlug (for a single post) and in getAllBlogPosts (which builds the list for the blog index).

The key point: the MDX compilation happens on the server, not in the browser. Visitors never download a Markdown parser to render the content; they get clean HTML.

Here’s the actual code that loads a post:

export async function getBlogPostBySlug(
  slug: string
): Promise<BlogPost | null> {
  // Try MDX file first, then fall back to MD
  let fullPath = path.join(blogDirectory, `${slug}.mdx`);

  if (!fs.existsSync(fullPath)) {
    fullPath = path.join(blogDirectory, `${slug}.md`);
    if (!fs.existsSync(fullPath)) {
      return null;
    }
  }

  const fileContents = fs.readFileSync(fullPath, "utf8");
  const { data, content } = matter(fileContents);

  // Extract plain text for searching
  const searchableContent = extractPlainText(content);

  // Extract headings from the raw content
  const headings = extractHeadings(content);

  const { content: compiledContent } = await compileMDX({
    source: content,
    components,
    options: {
      parseFrontmatter: false, // We already parsed it with gray-matter
      mdxOptions: {
        remarkPlugins: [remarkMath, remarkGfm],
        rehypePlugins: [
          rehypeSlug,
          rehypeKatex,
          [rehypeHighlight, rehypeHighlightOptions],
        ],
        development: process.env.NODE_ENV === "development",
      },
    },
  });

  return {
    slug,
    content: compiledContent,
    searchableContent,
    headings,
    metadata: { /* ... */ },
  };
}
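
To make the front matter step concrete, here is a minimal sketch of what separating metadata from the body involves. It is a simplified stand-in, not gray-matter's implementation: `splitFrontMatter` is a hypothetical helper that only understands flat `key: value` lines, whereas gray-matter parses full YAML.

```typescript
// Simplified illustration of front matter splitting (hypothetical helper,
// not gray-matter's implementation; handles flat `key: value` pairs only).
interface ParsedFile {
  data: Record<string, string>;
  content: string;
}

function splitFrontMatter(source: string): ParsedFile {
  // Front matter must open on the first line with `---` and close with `---`
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
  if (!match) {
    return { data: {}, content: source };
  }

  const data: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip lines that are not key: value
    const key = line.slice(0, separator).trim();
    const value = line.slice(separator + 1).trim();
    // Strip one pair of surrounding quotes, if present
    data[key] = value.replace(/^(["'])(.*)\1$/, "$2");
  }

  // Everything after the closing `---` is the MDX body
  return { data, content: source.slice(match[0].length) };
}

const example = splitFrontMatter(
  '---\ntitle: "Hello MDX"\ndate: 2025-02-21\n---\n# Heading\n\nBody text.'
);
```

Here `example.data` holds the title and date, and `example.content` is the body that would be handed to the compiler.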

Deep dive: the remark/rehype plugin pipeline

Here’s where things get interesting. MDX doesn’t just parse Markdown — it runs your content through a unified pipeline with two processing stages:

  1. Remark — operates on the Markdown AST (mdast)
  2. Rehype — operates on the HTML AST (hast)

The flow looks like this:

MDX Source

Parse to mdast (Markdown AST)

Remark plugins transform mdast

Convert to hast (HTML AST)

Rehype plugins transform hast

Stringify to JSX/React components
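
The two stages are easier to picture with a toy model. The sketch below is not the unified API: the node shapes are simplified stand-ins for mdast and hast, and `trimHeadings`, `classifyParagraphs`, and `runPipeline` are hypothetical names used only to show where each kind of plugin sits.

```typescript
// Toy model of the two-stage pipeline. The node shapes are simplified
// stand-ins for mdast and hast, and none of this is the unified API.
type MdNode =
  | { type: "heading"; depth: number; text: string }
  | { type: "paragraph"; text: string };

interface HtmlNode {
  tagName: string;
  properties: Record<string, string>;
  text: string;
}

type MdPlugin = (tree: MdNode[]) => MdNode[];
type HtmlPlugin = (tree: HtmlNode[]) => HtmlNode[];

// Stage 1 plugin: works on Markdown nodes (here: trims heading text)
const trimHeadings: MdPlugin = (tree) =>
  tree.map((node) =>
    node.type === "heading" ? { ...node, text: node.text.trim() } : node
  );

// Bridge between stages: Markdown nodes become HTML nodes
function toHtmlTree(tree: MdNode[]): HtmlNode[] {
  return tree.map((node) => ({
    tagName: node.type === "heading" ? `h${node.depth}` : "p",
    properties: {},
    text: node.text,
  }));
}

// Stage 2 plugin: works on HTML nodes (here: adds a class to paragraphs)
const classifyParagraphs: HtmlPlugin = (tree) =>
  tree.map((node) =>
    node.tagName === "p"
      ? { ...node, properties: { ...node.properties, class: "prose" } }
      : node
  );

function runPipeline(
  source: MdNode[],
  mdPlugins: MdPlugin[],
  htmlPlugins: HtmlPlugin[]
): HtmlNode[] {
  // All Markdown-stage plugins finish before any HTML-stage plugin starts
  const mdTree = mdPlugins.reduce((tree, plugin) => plugin(tree), source);
  return htmlPlugins.reduce((tree, plugin) => plugin(tree), toHtmlTree(mdTree));
}

const output = runPipeline(
  [
    { type: "heading", depth: 2, text: "  My plugin stack " },
    { type: "paragraph", text: "Plugins run in two stages." },
  ],
  [trimHeadings],
  [classifyParagraphs]
);
```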

My plugin stack

Here’s exactly what I’m using and why:

mdxOptions: {
  remarkPlugins: [remarkMath, remarkGfm],
  rehypePlugins: [
    rehypeSlug,      // Add IDs to headings
    rehypeKatex,     // Render LaTeX math
    [rehypeHighlight, rehypeHighlightOptions],
  ],
}

| Plugin | Stage | Purpose |
| --- | --- | --- |
| remark-math | Remark | Parses $inline$ and $$block$$ math syntax |
| remark-gfm | Remark | Enables GitHub Flavored Markdown (tables, strikethrough, autolinks) |
| rehype-slug | Rehype | Auto-generates id attributes on headings for anchor links |
| rehype-katex | Rehype | Renders math expressions using KaTeX |
| rehype-highlight | Rehype | Syntax highlighting via highlight.js |

💬 Plugin order matters

Remark plugins run first (on Markdown), then Rehype plugins run on the resulting HTML. Within each stage, plugins run in array order. Put rehypeSlug before anything that depends on heading IDs.
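
A small sketch of why array order matters within a stage. The plugins here are hypothetical toys, not the real rehype-slug or any published package: `addAnchorLinks` can only do its job if `addIds` has already run.

```typescript
// Toy demonstration of ordering within one stage (hypothetical plugins,
// not the real rehype-slug or any published package).
interface HeadingNode {
  tagName: string;
  text: string;
  id?: string;
  anchorHref?: string;
}

type Plugin = (nodes: HeadingNode[]) => HeadingNode[];

// Gives every heading an id derived from its text
const addIds: Plugin = (nodes) =>
  nodes.map((node) => ({
    ...node,
    id: node.text.toLowerCase().replace(/\s+/g, "-"),
  }));

// Adds an anchor link, but only to headings that already have an id
const addAnchorLinks: Plugin = (nodes) =>
  nodes.map((node) =>
    node.id ? { ...node, anchorHref: `#${node.id}` } : node
  );

// Plugins run left to right, in array order
const applyInOrder = (nodes: HeadingNode[], plugins: Plugin[]): HeadingNode[] =>
  plugins.reduce((current, plugin) => plugin(current), nodes);

const headings: HeadingNode[] = [{ tagName: "h2", text: "My plugin stack" }];

const idsFirst = applyInOrder(headings, [addIds, addAnchorLinks]);
const linksFirst = applyInOrder(headings, [addAnchorLinks, addIds]);
```

With `addIds` first, the heading ends up with both an id and an anchor link. With the order reversed, the link plugin sees no id yet and silently does nothing.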

Syntax highlighting configuration

I configure rehype-highlight with auto-detection so I don’t have to specify every language:

const rehypeHighlightOptions = {
  detect: true,      // Auto-detect language if not specified
  ignoreMissing: true, // Don't throw on missing language
  subset: false,     // Use all languages
};

The ignoreMissing: true setting is key: it keeps the build from failing when a code block names a language that highlight.js doesn’t recognize.

Table of contents + searchability

I wanted a TOC on every post and quick search on the blog index. So I added two features:

  • Heading extraction for a table of contents (I parse raw Markdown headings so the TOC is stable and predictable)
  • Plain‑text extraction for search (strip imports, JSX, code blocks, then index the result)

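rehype-slug gives the rendered headings predictable IDs, and the IDs generated for the TOC entries have to match them for the anchor links to resolve. Because both are derived from the heading text, the links stay stable across builds.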

Here’s the heading extraction logic:

export function extractHeadings(content: string): Heading[] {
  const headings: Heading[] = [];
  const usedIds = new Set<string>();

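  // Drop fenced code blocks first so lines like "# comment" inside them
  // are not mistaken for headings
  const withoutCode = content.replace(/```[\s\S]*?```/g, "");

  // Match Markdown ATX headings (# through ######) at the start of a line
  const headingRegex = /^(#{1,6})\s+(.+)$/gm;
  let match: RegExpExecArray | null;

  while ((match = headingRegex.exec(withoutCode)) !== null) {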
    const level = match[1].length;
    const text = match[2].trim();

    // Generate unique ID
    const baseId = generateSlug(text);
    let id = baseId;
    let counter = 1;

    // Ensure ID is unique
    while (usedIds.has(id)) {
      id = `${baseId}-${counter}`;
      counter++;
    }

    usedIds.add(id);
    headings.push({ id, text, level });
  }

  return headings;
}
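
extractHeadings relies on a generateSlug helper that isn't shown here. For TOC links to work, its output has to match the IDs that rehype-slug writes into the rendered HTML (rehype-slug delegates slug generation to the github-slugger package). The sketch below is a hypothetical, ASCII-only approximation, so it is worth checking against the real output for headings with accents, emoji, or inline formatting.

```typescript
// Hypothetical generateSlug for ASCII headings. The real helper is not
// shown in the post, so treat this as one possible implementation.
function generateSlug(text: string): string {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9 _-]/g, "") // drop punctuation and symbols
    .replace(/ /g, "-"); // each remaining space becomes a hyphen
}

const slugA = generateSlug("Why MDX (not just Markdown)");
// "why-mdx-not-just-markdown"

const slugB = generateSlug("Table of contents + searchability");
// "table-of-contents--searchability" (the removed "+" leaves two spaces,
// and this sketch does not collapse the resulting double hyphen)
```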

📝 Tiny performance win

Doing those extra passes once during compilation keeps the client UI lightweight.

The search index strips away all the MDX noise to get searchable content:

function extractPlainText(content: string): string {
  // Remove fenced code blocks first, so code never leaks into later steps
  let plainText = content.replace(/```[\s\S]*?```/g, "");

  // Remove inline code
  plainText = plainText.replace(/`.*?`/g, "");

  // Remove import statements (including a trailing semicolon)
  plainText = plainText.replace(/import\s+.*?from\s+['"].*?['"];?/g, "");

  // Remove JSX/HTML tags
  plainText = plainText.replace(/<[^>]*>/g, " ");

  // Remove Markdown formatting
  plainText = plainText
    .replace(/#{1,6}\s+/g, "")     // Headers
    .replace(/\*\*(.*?)\*\*/g, "$1") // Bold
    .replace(/\*(.*?)\*/g, "$1")   // Italic
    .replace(/!\[(.*?)\]\(.*?\)/g, "$1") // Images (before links, or a stray "!" is left behind)
    .replace(/\[(.*?)\]\(.*?\)/g, "$1") // Links
    .replace(/\n>/g, "\n");        // Blockquotes

  // Remove extra whitespace
  plainText = plainText.replace(/\s+/g, " ").trim();

  return plainText;
}

This gives me instant client-side filtering without needing a full-text search engine.
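
As a rough sketch of what that filtering can look like: the `PostSummary` shape and `filterPosts` helper below are assumptions for illustration, not the site's actual code. Every query term has to appear somewhere in the title or the extracted text.

```typescript
// Sketch of client-side filtering over pre-extracted text. The PostSummary
// shape and filterPosts helper are assumptions for illustration.
interface PostSummary {
  slug: string;
  title: string;
  searchableContent: string;
}

function filterPosts(posts: PostSummary[], query: string): PostSummary[] {
  // Split the query into lowercase terms; every term must appear somewhere
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return posts; // empty query: show everything

  return posts.filter((post) => {
    const haystack = `${post.title} ${post.searchableContent}`.toLowerCase();
    return terms.every((term) => haystack.includes(term));
  });
}

const posts: PostSummary[] = [
  {
    slug: "mdx-blog",
    title: "How I Built My MDX Blog",
    searchableContent: "remark rehype pipeline",
  },
  {
    slug: "notebooks",
    title: "Embedding Notebooks",
    searchableContent: "jupyter html lazy loading",
  },
];

const matches = filterPosts(posts, "MDX pipeline");
```

A plain substring scan like this is linear in the total text size, which is fine for a handful of posts and is the point where a prebuilt index starts to pay off as the archive grows.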

Creating custom MDX components

Want to add a new component? Here’s the pattern I use:

Step 1: Create the component

// src/components/mdx/MyComponent.tsx
import React from "react";

interface MyComponentProps {
  title: string;
  children: React.ReactNode;
}

export default function MyComponent({ title, children }: MyComponentProps) {
  return (
    <div className="my-4 p-4 border border-accent/30 rounded-lg">
      <h4 className="font-bold mb-2">{title}</h4>
      <div>{children}</div>
    </div>
  );
}

Step 2: Register it in the MDX components map

In src/lib/blog.ts, add your component to the registry:

import Counter from "@/components/mdx/Counter";
import Callout from "@/components/mdx/Callout";
import MyComponent from "@/components/mdx/MyComponent";

const components = {
  Counter,
  Callout,
  MyComponent, // Add it here
};

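Step 3: Use it in your MDX

next-mdx-remote resolves the tag from the components map registered in Step 2, so the registration is what makes it work; the import line mainly documents where the component lives.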

import MyComponent from '@/components/mdx/MyComponent';

<MyComponent title="Hello">
  This content renders inside the component.
</MyComponent>

⚠️ Client components need 'use client'

If your component uses hooks like useState or useEffect, add "use client" at the top of its file. Server components can’t use React state.

Current component inventory

Here’s what I’ve got available:

| Component | Type | Purpose |
| --- | --- | --- |
| Callout | Server | Tips, warnings, notes with styled boxes |
| Counter | Client | Interactive counter demo (uses useState) |
| NotebookEmbed | Client | Lazy-loads Jupyter notebook HTML with collapsible UI |

The constraint is intentional: fewer components means fewer ways for content to break over time.

Styling: a focused MDX layer

All post content is wrapped in .mdx-content, with typography and media styles in src/app/markdown.css. That means:

  • Code blocks look good without extra wrappers
  • Tables are readable
  • Images and iframes are responsive by default

Because the styles are centralized in a single file, I can tweak typography without editing every post. That’s a small thing that pays off over time.

The CSS uses CSS custom properties for theming:

.mdx-content {
  color: var(--primary-text);
  font-family: "Atkinson Hyperlegible", system-ui, sans-serif;
}

.mdx-content code:not(pre code) {
  background-color: var(--code-bg);
  color: var(--code-text);
  padding: 0.125rem 0.375rem;
  border-radius: 0.25rem;
}

.mdx-content pre {
  background-color: var(--code-bg);
  padding: 1rem;
  border-radius: 0.5rem;
  overflow-x: auto;
}

The build process: what actually happens

When you run next build, here’s the flow:

  1. generateStaticParams scans src/content/blog/ for all .mdx and .md files
  2. For each slug, Next.js calls the page component at build time
  3. getBlogPostBySlug reads, parses, and compiles each post
  4. The compiled React components get serialized into the static output
  5. At runtime, the server returns pre-rendered HTML, with no MDX compilation needed

export async function generateStaticParams() {
  const files = fs.readdirSync(blogDirectory);
  return files
    .filter((file) => file.endsWith(".md") || file.endsWith(".mdx"))
    .map((file) => ({
      slug: file.replace(/\.(md|mdx)$/, ""),
    }));
}

This is Static Site Generation (SSG) — pages are built once and served as static HTML.

SEO considerations

MDX plays nicely with SEO because the output is clean HTML. Here’s what I’m doing:

Semantic HTML structure

  • One <h1> per page (the post title)
  • Proper heading hierarchy (h2 → h3 → h4)
  • <time> elements for dates
  • <article> wrapper for the main content

Meta tags and Open Graph

Next.js 13+ uses the Metadata API. You can generate these from frontmatter:

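export async function generateMetadata({
  params,
}: {
  params: { slug: string };
}) {
  const post = await getBlogPostBySlug(params.slug);
  // getBlogPostBySlug returns null for unknown slugs; fall back to defaults
  if (!post) return {};
  return {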
    title: post.metadata.title,
    description: post.metadata.excerpt,
    openGraph: {
      title: post.metadata.title,
      description: post.metadata.excerpt,
      type: 'article',
      publishedTime: post.metadata.date,
    },
  };
}

Structured data

For blog posts, you can add JSON-LD structured data:

const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'BlogPosting',
  headline: post.metadata.title,
  datePublished: post.metadata.date,
  author: { '@type': 'Person', name: 'Your Name' },
};

💬 Test your structured data

Use Google’s Rich Results Test to validate your JSON-LD before deploying.
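
The JSON-LD object still has to reach the page as a script tag. A minimal sketch, assuming a hypothetical `jsonLdScript` helper: the one detail worth showing is escaping `<`, so that a stray `</script>` inside post metadata cannot end the tag early.

```typescript
// Sketch: serialize a JSON-LD object into a script tag. The helper name is
// hypothetical; the "<" escaping guards against a "</script>" sequence in
// post metadata ending the tag early.
function jsonLdScript(data: Record<string, unknown>): string {
  // "\u003c" is a valid JSON escape for "<", so parsers still read it as "<"
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

const tag = jsonLdScript({
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  headline: "Using </script> safely",
  datePublished: "2025-02-21",
});
```

In a React page the same escaped string would typically go into the script element's inner HTML rather than being concatenated by hand.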

URL structure

Clean URLs like /blog/my-post-slug are better than query strings. The file-based routing handles this automatically — my-post.mdx becomes /blog/my-post.
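
The mapping itself is tiny. Here is a sketch with a hypothetical `fileToUrl` helper; the `/blog` prefix comes from the example URL above, and files with other extensions are ignored.

```typescript
// Sketch of the filename-to-URL mapping; fileToUrl is a hypothetical helper
// and the "/blog" prefix is taken from the example URL in the text.
function fileToUrl(fileName: string): string | null {
  // Accept only .md and .mdx files; everything before the extension is the slug
  const match = /^(.+)\.(md|mdx)$/.exec(fileName);
  return match ? `/blog/${match[1]}` : null;
}

const postUrl = fileToUrl("my-post.mdx"); // "/blog/my-post"
const ignored = fileToUrl("notes.txt"); // null
```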

Performance: where it matters (and where it doesn’t)

The three main performance costs are:

  1. Compilation time – MDX compilation is expensive compared to plain Markdown.
  2. List page aggregation – the blog index compiles every post so it can render excerpts, tags, and searchable text.
  3. Syntax highlighting – code highlighting adds overhead, but it keeps posts readable for technical content.

Actual metrics from my site

| Metric | Value | Notes |
| --- | --- | --- |
| Build time (10 posts) | ~8-12 s | Includes full MDX compilation |
| Per-post compile time | ~200-400 ms | Varies with code blocks and math |
| Page weight (typical post) | ~80-120 KB | Gzipped, includes fonts |
| First Contentful Paint | ~0.8-1.2 s | Static HTML helps a lot |
| Largest Contentful Paint | ~1.0-1.5 s | Usually the hero image |

In practice, this is fine for a small-to-mid sized blog. If I were scaling to hundreds of posts, the next step would be caching the compiled output and only re-compiling posts that have changed.
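
One way to skip unchanged posts is to key a cache on a hash of each post's source. This is a sketch under stated assumptions, not something the site does today: `compileWithCache`, the cache shape, and the FNV-1a hash are all illustrative, and a real setup would more likely persist entries to disk and use a cryptographic hash.

```typescript
// Sketch of a compile cache keyed by content hash. The cache shape, the
// compileWithCache name, and the FNV-1a hash are illustrative assumptions.
type Compile<T> = (source: string) => T;

interface CacheEntry<T> {
  hash: string;
  result: T;
}

// 32-bit FNV-1a over UTF-16 code units: small and dependency-free, but
// collisions are possible, so prefer a stronger hash for real use
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

function compileWithCache<T>(
  cache: Map<string, CacheEntry<T>>,
  slug: string,
  source: string,
  compile: Compile<T>
): T {
  const hash = fnv1a(source);
  const cached = cache.get(slug);
  if (cached && cached.hash === hash) {
    return cached.result; // unchanged post: reuse the earlier output
  }
  const result = compile(source);
  cache.set(slug, { hash, result });
  return result;
}

// Usage with a stand-in "compiler" that counts how often it runs
let compileCalls = 0;
const fakeCompile: Compile<string> = (source) => {
  compileCalls++;
  return source.toUpperCase();
};

const cache = new Map<string, CacheEntry<string>>();
compileWithCache(cache, "my-post", "# Hello", fakeCompile);
compileWithCache(cache, "my-post", "# Hello", fakeCompile); // cache hit
compileWithCache(cache, "my-post", "# Hello again", fakeCompile); // recompiles
```

After the three calls, the stand-in compiler has run twice: once for the original source and once after the edit.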

Performance optimization tips

1. Lazy load heavy components

The NotebookEmbed component only fetches HTML when expanded:

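useEffect(() => {
  // Only load HTML when expanded, and only if it isn't loaded yet
  if (isOpen && !htmlContent) {
    setLoading(true);
    fetch(`/downloads/${notebookHtml}.html`)
      .then((res) => {
        // fetch resolves on 404/500 too, so check the status explicitly
        if (!res.ok) throw new Error(`Notebook request failed: ${res.status}`);
        return res.text();
      })
      .then((content) => setHtmlContent(content))
      .catch((error) => console.error(error))
      // Clear the loading state whether the request succeeded or failed
      .finally(() => setLoading(false));
  }
}, [isOpen, notebookHtml, htmlContent]);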

2. Use Next.js Image optimization

For images in MDX, consider using Next.js <Image> instead of raw <img>:

import Image from 'next/image';

<Image
  src="/images/blog/my-image.png"
  alt="Description"
  width={800}
  height={400}
  priority={false}
/>

This gets you automatic WebP conversion, lazy loading, and responsive sizing.

3. Minimize client-side JavaScript

Keep components server-rendered when possible. Only add "use client" when you need interactivity. My Callout component is a server component — it renders to static HTML with zero JavaScript.

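4. Limit the languages used for syntax highlighting

Highlighting runs on the server during compilation, so its cost shows up in build time rather than in the client bundle. If you only write in a few languages, restricting auto-detection to that subset keeps detection faster and reduces wrong guesses:

const rehypeHighlightOptions = {
  detect: true,
  ignoreMissing: true,
  subset: false, // Or subset: ['javascript', 'python', 'bash'] to limit auto-detection
};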

Rendering model (why the page still feels snappy)

The heavy lifting happens on the server. By the time the browser gets the page, the MDX is already rendered into HTML and styled by the .mdx-content layer. That’s the payoff of pre-rendering: the client only needs to display the result, not compile it.

With Next.js 13+ and the App Router, you get the benefits of React Server Components — most of the page is static HTML with no client-side React hydration needed.

The parts I don’t want to compromise on

  • Readable source files (writing should feel like writing, not coding).
  • Composable components (callouts, embeds, and custom UI).
  • Plain SEO‑friendly HTML on the other side.

MDX checks all three boxes for me.

Measuring performance (lightweight but honest)

I keep an eye on:

  • Build time (does compiling all posts start to drag?)
  • Page load on mobile (do embeds slow things down?)
  • Core Web Vitals for individual posts

If any of those start to slip, I’ll adjust the pipeline rather than throwing more JS at the browser.

💡 Why I'm okay with it

For my current post volume, build‑time compilation is a trade‑off I’m happy with: simpler code, fewer moving parts, and no extra build pipeline.

What I’d improve next

  • Incremental compilation (only rebuild changed posts) — Next.js ISR could help here
  • Prebuilt search index for faster client filtering — consider Fuse.js or FlexSearch
  • Image optimization with metadata (width/height) for layout stability
  • RSS feed generation from the post metadata
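
For the RSS item, a minimal sketch of what generating a feed from post metadata could look like. `FeedPost`, `buildRssFeed`, and the example.com URL are placeholders; the field names mirror the title, excerpt, and date metadata used earlier in the post.

```typescript
// Sketch of RSS 2.0 generation from post metadata. FeedPost, buildRssFeed,
// and the example.com URLs are placeholders, not the site's real values.
interface FeedPost {
  slug: string;
  title: string;
  excerpt: string;
  date: string; // ISO date, e.g. "2025-02-21"
}

// Escape characters that are not allowed verbatim in XML text content
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildRssFeed(siteUrl: string, feedTitle: string, posts: FeedPost[]): string {
  const items = posts
    .map((post) => {
      const link = `${siteUrl}/blog/${post.slug}`;
      return [
        "    <item>",
        `      <title>${escapeXml(post.title)}</title>`,
        `      <link>${escapeXml(link)}</link>`,
        `      <guid>${escapeXml(link)}</guid>`,
        `      <description>${escapeXml(post.excerpt)}</description>`,
        `      <pubDate>${new Date(post.date).toUTCString()}</pubDate>`,
        "    </item>",
      ].join("\n");
    })
    .join("\n");

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    `    <title>${escapeXml(feedTitle)}</title>`,
    `    <link>${escapeXml(siteUrl)}</link>`,
    `    <description>${escapeXml(feedTitle)}</description>`,
    items,
    "  </channel>",
    "</rss>",
  ].join("\n");
}

const feed = buildRssFeed("https://example.com", "My Blog", [
  { slug: "my-post", title: "MDX & Me", excerpt: "Notes on <Callout>.", date: "2025-02-21" },
]);
```

The resulting string could be written to a static file during the build or returned from a route handler, whichever fits the deployment.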

A note on images and embeds

Images live in public/images/blog/..., which keeps paths stable and makes them easy to reference from MDX. For videos, I use responsive <iframe> embeds that inherit the .mdx-content styling so they don’t overflow the layout on small screens.

A quick video that inspired the approach

Further reading

If you’re building your own MDX pipeline and want to compare notes, I’m happy to share more details.