
Streaming LLM output

When an LLM streams tokens, you usually want the output rendered as markdown (code blocks, tables, lists), not as raw text. The simplest approach works well: re-parse the entire buffer with quikdown(buffer) on every chunk. The parser is small and fast enough that full re-parses are imperceptible.

Live simulated stream
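The live demo can be approximated in a few lines. This is a minimal sketch with no network involved: `chunks` is canned markdown, `render` is a placeholder standing in for the real `quikdown` import, and `target` is any object with an `innerHTML` property (in a page, a DOM element).

```javascript
// Placeholder parser for the sketch; swap in quikdown(md) in a real page.
const render = (md) => md;

// Canned "LLM output", deliberately split mid-word and mid-construct.
const chunks = ['# Str', 'eaming\n\n- one\n', '- two\n\n**bo', 'ld**\n'];

function simulateStream(target, delayMs = 50) {
  let buffer = '';
  let i = 0;
  return new Promise((resolve) => {
    const tick = () => {
      if (i === chunks.length) return resolve(buffer);
      buffer += chunks[i++];
      target.innerHTML = render(buffer); // re-parse the whole buffer, every chunk
      setTimeout(tick, delayMs);
    };
    tick();
  });
}
```

Calling `simulateStream(document.getElementById('out'))` replays the stream, re-rendering the accumulated buffer on each tick exactly as the fetch-based version below does.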

The pattern

import quikdown from 'quikdown';

async function streamFromLLM(prompt, target) {
  let buffer = '';

  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    target.innerHTML = quikdown(buffer);   // ← re-render on each chunk
  }

  buffer += decoder.decode();            // flush any bytes buffered by the decoder
  target.innerHTML = quikdown(buffer);   // final render
}
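One detail worth calling out: the { stream: true } flag lets TextDecoder hold back an incomplete multi-byte UTF-8 sequence that straddles a chunk boundary, instead of emitting a replacement character. A small self-contained demonstration (plain TextDecoder, no quikdown involved):

```javascript
// 'café' as UTF-8, with the two-byte 'é' (0xC3 0xA9) split across
// two network-style chunks.
const part1 = new Uint8Array([0x63, 0x61, 0x66, 0xC3]); // 'caf' + first byte of 'é'
const part2 = new Uint8Array([0xA9]);                   // second byte of 'é'

const decoder = new TextDecoder();
let out = '';
out += decoder.decode(part1, { stream: true }); // 'caf'; dangling byte is buffered
out += decoder.decode(part2, { stream: true }); // 'é'
out += decoder.decode();                        // final flush (empty here)
// out === 'café'
```

Without { stream: true }, the first decode would emit 'caf\uFFFD' and the character would be corrupted; this is also why the loop ends with a bare decoder.decode() to flush anything still buffered.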

Why this works for streaming

  • quikdown is small. ~9 KB of parser, no virtual DOM, no diffing. Re-parsing a 4 KB buffer takes microseconds.
  • Setting innerHTML is cheap. For buffers of a few kilobytes, modern browsers re-parse and lay out the markup in well under a frame.
  • Mid-token markdown is fine. If a chunk arrives partway through a code fence (an opening ``` with no closing fence yet), quikdown still produces sensible HTML — it renders the partial content as a code block. When the closing fence arrives, the next re-render snaps into place.
  • XSS-safe. LLM output is untrusted. quikdown escapes HTML entities and sanitizes javascript: URLs before they hit the DOM.
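To see why the last point matters, here is a sketch of the two defenses in principle. This is an illustration of the technique, not quikdown's actual internals: entity-escape untrusted text, and reject javascript: URLs, before anything reaches innerHTML.

```javascript
// Entity-escape untrusted text (illustrative; quikdown does its own escaping).
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Neutralize javascript: URLs before they become href/src attributes.
function safeUrl(url) {
  return /^\s*javascript:/i.test(url) ? '#' : url;
}

const hostile = '<img src=x onerror=alert(1)>';
// escapeHtml(hostile) === '&lt;img src=x onerror=alert(1)&gt;'
// safeUrl('javascript:alert(1)') === '#'
```

An LLM can be prompted into emitting exactly this kind of markup, so the escaping has to happen in the renderer, not in the prompt.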

This is the pattern quikchat uses to render markdown in streaming chat bubbles.