
Streaming LLM output

When an LLM streams tokens, you usually want the output rendered as markdown (code blocks, tables, lists), not as raw text. The simplest approach works well: re-parse the entire buffer with quikdown(buffer) on every chunk. The parser is small and fast enough that full re-parses are imperceptible.

Live simulated stream
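The live demo can be approximated in a few lines. This is a minimal sketch with no network involved: `chunks` is canned markdown, `render` is a placeholder standing in for the real `quikdown` import, and `target` is any object with an `innerHTML` property (in a page, a DOM element).

```javascript
// Placeholder parser for the sketch; swap in quikdown(md) in a real page.
const render = (md) => md;

// Canned "LLM output", deliberately split mid-word and mid-construct.
const chunks = ['# Str', 'eaming\n\n- one\n', '- two\n\n**bo', 'ld**\n'];

function simulateStream(target, delayMs = 50) {
  let buffer = '';
  let i = 0;
  return new Promise((resolve) => {
    const tick = () => {
      if (i === chunks.length) return resolve(buffer);
      buffer += chunks[i++];
      target.innerHTML = render(buffer); // re-parse the whole buffer, every chunk
      setTimeout(tick, delayMs);
    };
    tick();
  });
}
```

Calling `simulateStream(document.getElementById('out'))` replays the stream, re-rendering the accumulated buffer on each tick exactly as the fetch-based version below does.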

The pattern

import quikdown from 'quikdown';

async function streamFromLLM(prompt, target) {
  let buffer = '';

  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    target.innerHTML = quikdown(buffer);   // ← re-render on each chunk
  }

  buffer += decoder.decode();            // flush any bytes buffered by the decoder
  target.innerHTML = quikdown(buffer);   // final render
}
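One detail worth calling out: the { stream: true } flag lets TextDecoder hold back an incomplete multi-byte UTF-8 sequence that straddles a chunk boundary, instead of emitting a replacement character. A small self-contained demonstration (plain TextDecoder, no quikdown involved):

```javascript
// 'café' as UTF-8, with the two-byte 'é' (0xC3 0xA9) split across
// two network-style chunks.
const part1 = new Uint8Array([0x63, 0x61, 0x66, 0xC3]); // 'caf' + first byte of 'é'
const part2 = new Uint8Array([0xA9]);                   // second byte of 'é'

const decoder = new TextDecoder();
let out = '';
out += decoder.decode(part1, { stream: true }); // 'caf'; dangling byte is buffered
out += decoder.decode(part2, { stream: true }); // 'é'
out += decoder.decode();                        // final flush (empty here)
// out === 'café'
```

Without { stream: true }, the first decode would emit 'caf\uFFFD' and the character would be corrupted; this is also why the loop ends with a bare decoder.decode() to flush anything still buffered.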

Why this works for streaming

  • quikdown is small. ~9 KB of parser, no virtual DOM, no diffing. Re-parsing a 4 KB buffer takes microseconds.
  • Setting innerHTML is cheap. For buffers of a few kilobytes, modern browsers re-parse and lay out the markup in well under a frame.
  • Mid-token markdown is fine. If a chunk arrives partway through a code fence (an opening ``` with no closing fence yet), quikdown still produces sensible HTML — it renders the partial content as a code block. When the closing fence arrives, the next re-render snaps into place.
  • XSS-safe. LLM output is untrusted. quikdown escapes HTML entities and sanitizes javascript: URLs before they hit the DOM.
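To see why the last point matters, here is a sketch of the two defenses in principle. This is an illustration of the technique, not quikdown's actual internals: entity-escape untrusted text, and reject javascript: URLs, before anything reaches innerHTML.

```javascript
// Entity-escape untrusted text (illustrative; quikdown does its own escaping).
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Neutralize javascript: URLs before they become href/src attributes.
function safeUrl(url) {
  return /^\s*javascript:/i.test(url) ? '#' : url;
}

const hostile = '<img src=x onerror=alert(1)>';
// escapeHtml(hostile) === '&lt;img src=x onerror=alert(1)&gt;'
// safeUrl('javascript:alert(1)') === '#'
```

An LLM can be prompted into emitting exactly this kind of markup, so the escaping has to happen in the renderer, not in the prompt.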

This is the pattern quikchat uses to render markdown in streaming chat bubbles.