quikdown Architecture

Design Philosophy

quikdown is designed with these core principles:

Small & Fast - Optimized for size (~8.5KB minified) and performance
Secure by Default - All HTML is escaped unless explicitly trusted
Zero Dependencies - No external libraries required
Extensible - Plugin system for custom rendering
Practical - Focused on the markdown subset actually used in chat/LLM outputs

Parser Architecture

Overview

quikdown uses a multi-phase regex-based parser that prioritizes safety and simplicity:

Input Markdown
    ↓
Phase 1: Extract & Protect (Code blocks, inline code)
    ↓
Phase 2: Escape HTML (XSS protection)
    ↓
Phase 3: Process Block Elements (Tables, headings, lists)
    ↓
Phase 4: Process Inline Elements (Bold, italic, links)
    ↓
Phase 5: Create Paragraphs
    ↓
Phase 6: Restore Protected Content
    ↓
Output HTML

Phase 1: Extract & Protect

Before any processing, we extract code blocks and inline code, replacing them with placeholders:

Fenced code blocks → %%%CODEBLOCK0%%%, %%%CODEBLOCK1%%%, etc.
Inline code → %%%INLINECODE0%%%, %%%INLINECODE1%%%, etc.

This prevents code content from being processed as markdown and ensures special characters remain intact.
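The extraction step can be sketched as follows. This is a minimal illustration, not quikdown's actual source: the function name `extractCode`, the return shape, and both regexes are assumptions. Only the placeholder format comes from the description above.

```javascript
// Sketch of Phase 1. Names and regexes are illustrative assumptions.
const FENCE = '`'.repeat(3); // built at runtime so the example holds no literal fence

function extractCode(markdown) {
  const codeBlocks = [];
  const inlineCodes = [];

  // Fenced blocks go first so backticks inside a fence are not read as inline code.
  const fenced = new RegExp(
    '^' + FENCE + '([^\\n]*)\\n([\\s\\S]*?)^' + FENCE + '[ \\t]*$', 'gm');
  let text = markdown.replace(fenced, (match, lang, code) => {
    codeBlocks.push({ lang: lang.trim(), code });
    return '%%%CODEBLOCK' + (codeBlocks.length - 1) + '%%%';
  });

  // Inline code: a single pair of backticks on one line.
  text = text.replace(/`([^`\n]+)`/g, (match, code) => {
    inlineCodes.push(code);
    return '%%%INLINECODE' + (inlineCodes.length - 1) + '%%%';
  });

  return { text, codeBlocks, inlineCodes };
}
```

The two arrays are kept alongside the text so that the final phase can look each placeholder up by its index.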

Phase 2: HTML Escaping

All remaining content is HTML-escaped to prevent XSS attacks:

< → &lt;
> → &gt;
& → &amp;
" → &quot;
' → &#39;

This is done BEFORE markdown processing, ensuring no user input can inject HTML tags.
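A minimal sketch of such an escaping function is shown here. The names `ESCAPE_MAP` and `escapeHtml` are assumptions for illustration, and the entity chosen for the single quote (`&#39;`) is one common option.

```javascript
// Sketch of Phase 2: replace the five HTML-significant characters with entities.
const ESCAPE_MAP = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

function escapeHtml(text) {
  // A single pass over the input; each match is looked up in the map.
  return text.replace(/[&<>"']/g, (ch) => ESCAPE_MAP[ch]);
}
```

Because the replacement happens in one pass, an ampersand produced by the escaping itself is never escaped a second time.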

Phase 3: Block Elements

Process larger structural elements:

Tables - Multi-line processing with alignment support
Headings - ATX-style headers (# through ######)
Blockquotes - Line-by-line > prefixes
Horizontal rules - Three or more hyphens
Lists - Both ordered and unordered with nesting
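Three of the simpler block rules might look like the sketch below. The function name and regexes are assumptions, not quikdown's own. The detail worth noticing is that this phase runs on already-escaped text, so a blockquote marker arrives as `&gt;` rather than `>`.

```javascript
// Sketch of a few Phase 3 rules applied to escaped text (assumed regexes).
function processBlocks(escaped) {
  return escaped
    // ATX headings: one to six # characters, whitespace, then the heading text.
    .replace(/^(#{1,6})[ \t]+(.+?)[ \t]*$/gm, (match, hashes, body) => {
      const level = hashes.length;
      return `<h${level}>${body}</h${level}>`;
    })
    // Horizontal rule: a line made of three or more hyphens.
    .replace(/^-{3,}[ \t]*$/gm, '<hr>')
    // Blockquote line: Phase 2 has already turned the > marker into &gt;.
    .replace(/^&gt;[ \t]?(.*)$/gm, '<blockquote>$1</blockquote>');
}
```

This sketch wraps each quoted line separately; a fuller implementation would presumably merge adjacent blockquote lines, and tables and lists need multi-line handling that is omitted here.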

Phase 4: Inline Elements

Process text formatting within blocks:

Images - Processed before links to avoid conflicts
Links - Standard markdown link syntax
Bold - **text** or __text__
Italic - *text* or _text_
Strikethrough - ~~text~~
Line breaks - Two trailing spaces
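The ordering of the inline rules can be illustrated with a short chain of replacements. This is a sketch under assumptions: the function name and regexes are illustrative, and the underscore variants of bold and italic are left out for brevity.

```javascript
// Sketch of Phase 4 on escaped text. Order matters: images, then links, then emphasis.
function processInline(text) {
  return text
    // Images first; otherwise the link rule would consume the [alt](src) part.
    .replace(/!\[([^\]]*)\]\(([^)\s]+)\)/g, '<img src="$2" alt="$1">')
    .replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, '<a href="$2">$1</a>')
    // Bold before italic, so ** is not read as two single asterisks.
    .replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>')
    .replace(/\*(.+?)\*/g, '<em>$1</em>')
    .replace(/~~(.+?)~~/g, '<del>$1</del>')
    // Two trailing spaces before a newline become a hard line break.
    .replace(/ {2}\n/g, '<br>\n');
}
```

If the link rule ran first, `![logo](logo.png)` would come out as a stray `!` followed by an anchor, which is the conflict the ordering avoids.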

Phase 5: Paragraphs

Double newlines are converted to paragraph breaks. Block elements that must not sit inside <p> tags (headings, lists, tables, and so on) are then unwrapped.
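One way to sketch this wrap-then-unwrap step, assuming the block elements have already been rendered to HTML by Phase 3 (the function name and the list of block tags are assumptions):

```javascript
// Sketch of Phase 5: wrap everything in <p>, then strip <p> around block elements.
function createParagraphs(html) {
  // Two or more consecutive newlines mark a paragraph boundary.
  const wrapped = '<p>' + html.trim().replace(/\n{2,}/g, '</p><p>') + '</p>';

  // A paragraph that starts with a block-level tag is unwrapped again.
  const blockStart = /<p>(<(?:h[1-6]|hr|ul|ol|table|blockquote|pre)\b[^>]*>[\s\S]*?)<\/p>/g;
  return wrapped.replace(blockStart, '$1');
}
```

Wrapping first and unwrapping afterwards keeps the paragraph rule itself trivial; the cost is a second pass over the output.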

Phase 6: Restore Protected Content

Finally, we replace the placeholders with the original code content, HTML-escaped and wrapped in code elements (or handed to a fence plugin, if one is configured).
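A restoration step might look like the sketch below. The function names, the shape of the `codeBlocks` entries, and the `language-` class convention are assumptions for illustration. Since the code skipped Phase 2, it has to be escaped here.

```javascript
// Sketch of Phase 6: swap placeholders back for escaped, wrapped code.
function escapeCode(text) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

function restoreCode(html, codeBlocks, inlineCodes) {
  // Replacer functions are used so a "$1" inside the code is inserted literally
  // instead of being treated as a replacement pattern.
  return html
    .replace(/%%%CODEBLOCK(\d+)%%%/g, (match, i) => {
      const { lang, code } = codeBlocks[Number(i)];
      const cls = lang ? ' class="language-' + escapeCode(lang) + '"' : '';
      return '<pre><code' + cls + '>' + escapeCode(code) + '</code></pre>';
    })
    .replace(/%%%INLINECODE(\d+)%%%/g, (match, i) =>
      '<code>' + escapeCode(inlineCodes[Number(i)]) + '</code>');
}
```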

Key Design Decisions

Why Regex Instead of AST?

Size - No parser/lexer overhead
Speed - Each phase is a single regex pass for most operations
Simplicity - Easier to audit and understand
Good enough - Handles 95% of real-world markdown

Why Extract-Escape-Process-Restore?

This pattern ensures:

Code blocks are never modified
HTML is always escaped (security)
Markdown syntax inside code is preserved
Processing order is predictable

Why No HTML Passthrough?

By default, all HTML is escaped for security. However, trusted HTML can be rendered using the fence plugin system:

// Controlled HTML rendering via fence blocks
const plugin = (content, lang) => {
  if (lang === 'html-render') {
    return content; // Trust this specific block
  }
  return undefined; // Use default escaping
};

This makes trust explicit and granular.

Performance Considerations

Optimizations

Single-pass regex where possible
Pre-compiled patterns (via JavaScript's regex literals)
Minimal string concatenation
Early returns for empty/invalid input

Trade-offs

No streaming - Entire document processed at once
Regex limitations - Some edge cases in deeply nested structures
No incremental updates - Full re-parse on change

These trade-offs are acceptable for the target use case (chat messages, LLM outputs) where documents are typically small.

Memory Usage

Linear with input size - No exponential growth
Temporary arrays for code blocks and placeholders
No AST - No intermediate tree structure

Browser Compatibility

ES6 features used: Template literals, arrow functions, const/let
No polyfills needed for modern browsers (2017+)
Regex compatibility: Lookbehind assertions are avoided because older Safari versions do not support them
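To illustrate the lookbehind point, here is one common workaround: capture the preceding character and put it back in the replacement. The function and regex are assumptions for illustration, not taken from quikdown.

```javascript
// With lookbehind (a syntax error in engines that lack it):
//   /(?<!\*)\*([^*\n]+)\*(?!\*)/g
// Rough equivalent without lookbehind: match the preceding character
// (or the start of the string) in a group and re-emit it as $1.
function italicize(text) {
  return text.replace(/(^|[^*])\*([^*\n]+)\*(?!\*)/g, '$1<em>$2</em>');
}
```

Lookahead, as in the trailing `(?!\*)`, has long been supported everywhere, so only the leading assertion needs rewriting.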

Extensibility Points

1. Fence Plugin System

Custom renderers for fenced code blocks:

function myPlugin(content, language) {
  // content: Raw, unescaped content
  // language: The language identifier (if any)
  // Return: HTML string or undefined (fall back to default)
}
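As a concrete illustration of that contract, here is a hypothetical plugin that handles a made-up `note` language and defers everything else. The language tag, the function name, and the output markup are all assumptions. Because the content arrives raw and unescaped, the plugin escapes it itself.

```javascript
// Hypothetical fence plugin: renders "note" fences, defers everything else.
function notePlugin(content, language) {
  if (language !== 'note') {
    return undefined; // fall back to the default code-block rendering
  }
  // The content is raw, so escaping is this plugin's responsibility.
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  const safe = content.replace(/[&<>"']/g, (ch) => map[ch]);
  return '<aside class="note">' + safe.trim() + '</aside>';
}
```

Returning `undefined` for unknown languages lets one plugin handle a single fence type without having to re-implement the default rendering.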

2. Style Options

Inline styles: Embed CSS directly in elements
CSS classes: Use external stylesheets
Custom prefix: Avoid class name collisions

3. Configuration

The configure() method creates reusable configured instances:

const myParser = quikdown.configure({
  inline_styles: true,
  fence_plugin: myPlugin
});
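The idea of a reusable configured instance can be sketched with a closure. Everything here is a stand-in: `render` is a toy renderer that only handles bold text, `configure` is a guess at the general shape rather than quikdown's implementation, and the `quikdown-strong` class name is an assumed example of a prefixed class.

```javascript
// Toy stand-in renderer (hypothetical): handles only **bold**.
function render(markdown, options = {}) {
  // Inline styles embed CSS in the element; otherwise a prefixed class is emitted.
  const attr = options.inline_styles
    ? ' style="font-weight:bold"'
    : ' class="quikdown-strong"';
  return markdown.replace(/\*\*(.+?)\*\*/g, '<strong' + attr + '>$1</strong>');
}

// configure() captures the options once and returns a function of markdown only.
function configure(options) {
  return (markdown) => render(markdown, options);
}

const styled = configure({ inline_styles: true });
const classed = configure({ inline_styles: false });
```

Each returned function carries its own options, so differently configured parsers can be used side by side without passing options on every call.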

Security Model

Default Protections

HTML Escaping - All user input is escaped
No Script Execution - No eval() or dynamic code
No HTML Parsing - No innerHTML on untrusted content
Protected Code Blocks - Code content is preserved exactly

Trust Boundaries

Input: Untrusted markdown text
Output: Safe HTML (escaped)
Plugins: Trusted code (developer-provided)
Plugin Content: Potentially unsafe (plugin's responsibility)

Recommended Practices

Only use fence plugins from trusted sources
Validate plugin output if accepting third-party plugins
Use Content Security Policy (CSP) headers
Sanitize URLs in production applications
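For the URL recommendation, a minimal allow-list check could look like the sketch below. The function name, the allowed schemes, and the `#` fallback are assumptions; a production application would likely want a stricter, well-tested sanitizer.

```javascript
// Sketch of a scheme allow-list for link and image URLs (assumed policy).
const ALLOWED_SCHEMES = ['http', 'https', 'mailto'];

function sanitizeUrl(url) {
  const trimmed = String(url).trim();
  // Drop whitespace and control characters before looking for a scheme,
  // so "java\tscript:" is still recognized as a scheme.
  const compact = trimmed.replace(/[\u0000-\u0020\u007f]/g, '').toLowerCase();
  const scheme = compact.match(/^([a-z][a-z0-9+.-]*):/);
  if (scheme && !ALLOWED_SCHEMES.includes(scheme[1])) {
    return '#'; // neutral fallback for disallowed schemes
  }
  return trimmed; // relative URLs and allowed schemes pass through
}
```

An allow-list is used rather than a block-list because the set of dangerous schemes is open-ended, while the set an application actually needs is usually small.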

Limitations by Design

Not Supported

Full CommonMark specification
HTML blocks (security)
Reference-style links (complexity)
Footnotes (uncommon in chat)
Definition lists (uncommon)
Nested blockquotes with different markers

Edge Cases

Mixed emphasis markers can mis-parse
Deeply nested lists beyond 10 levels
Tables without proper separator rows
Unclosed fenced code blocks

These limitations keep the parser small, fast, and secure.