How to Clean Up HTML Into Markdown
A lot of documentation starts in Markdown, gets published as HTML, and then someone needs to put it back into a Markdown-based system. Going from HTML to Markdown is messier than the reverse, so it is worth knowing which parts of your HTML will survive the trip and which won't.
When this conversion makes sense
HTML to Markdown conversion makes sense when: you're migrating documentation from a CMS or website builder to a Markdown-based system (GitHub, Notion, Hugo, MkDocs), you're pulling content from a web page to edit in a text editor or repository, or you want to clean up HTML-heavy content for easier editing and version control.
If your platform accepts Markdown natively and converts it to HTML internally, don't convert the HTML back to Markdown and then reconvert it. Work with the Markdown source directly.
What converts cleanly
Standard document structure converts reliably to Markdown equivalents:
<h1>–<h6> → # through ######
<strong>, <b> → **bold**
<em>, <i> → *italic*
<a href="url">text</a> → [text](url)
<code> → backtick inline code
<pre><code> → fenced code block
<ul>, <ol>, <li> → Markdown lists
<blockquote> → > blockquote
<hr> → ---
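To make a few of these mappings concrete, here is a minimal sketch built on Python's standard-library html.parser. The names MiniConverter and html_to_markdown are made up for illustration. It covers only headings, paragraphs, bold, italic, inline code, links, and <hr>. Lists, blockquotes, fenced code blocks, and whitespace normalization are left out, so treat it as a demonstration of the idea rather than a usable converter.

```python
from html.parser import HTMLParser

# Inline tags and the Markdown delimiter that wraps their text.
INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}
HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MiniConverter(HTMLParser):
    """Illustrative only: handles a small subset of the mappings above."""

    def __init__(self):
        super().__init__()
        self.out = []    # output fragments, joined at the end
        self.hrefs = []  # hrefs of currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # <h3> becomes "### ": the digit in the tag name is the level.
            self.out.append("#" * int(tag[1]) + " ")
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")
        elif tag == "hr":
            self.out.append("---\n\n")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.out.append("\n\n")  # blank line ends the block
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "a" and self.hrefs:
            self.out.append("](" + self.hrefs.pop() + ")")

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(html):
    parser = MiniConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


sample = '<h2>Title</h2><p>Some <strong>bold</strong> and <a href="https://example.com">a link</a>.</p><hr>'
print(html_to_markdown(sample))
```

Tags the sketch doesn't recognize are silently dropped while their text is kept, which is roughly what happens to unsupported markup in the cases described next.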
What doesn't convert cleanly
Tables with merged cells. Markdown tables are simple grids — no colspan, no rowspan. Complex HTML tables either get flattened into something inaccurate or preserved as raw HTML.
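One way to decide between flattening a table and keeping it as raw HTML is to check for merged cells first. The sketch below uses only the standard library. The helper name table_needs_raw_html is hypothetical, and the rule it applies (any colspan or rowspan greater than 1 means "keep as HTML") is an assumption about what counts as too complex.

```python
from html.parser import HTMLParser


def _span(value):
    """Parse a colspan/rowspan value, treating missing or invalid values as 1."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 1


class SpanDetector(HTMLParser):
    """Flags any <td> or <th> whose colspan or rowspan is greater than 1."""

    def __init__(self):
        super().__init__()
        self.has_merged_cells = False

    def handle_starttag(self, tag, attrs):
        if tag not in ("td", "th"):
            return
        for name, value in attrs:
            if name in ("colspan", "rowspan") and _span(value) > 1:
                self.has_merged_cells = True


def table_needs_raw_html(table_html):
    """Hypothetical helper: True if the table can't be a plain Markdown grid."""
    detector = SpanDetector()
    detector.feed(table_html)
    detector.close()
    return detector.has_merged_cells


merged = '<table><tr><td colspan="2">Total</td></tr><tr><td>a</td><td>b</td></tr></table>'
simple = "<table><tr><th>Name</th><th>Value</th></tr><tr><td>a</td><td>1</td></tr></table>"
print(table_needs_raw_html(merged), table_needs_raw_html(simple))
```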
CSS classes and inline styles. <span class="highlight"> has no Markdown equivalent. The span is either dropped entirely (losing the styling) or preserved as HTML (adding noise to the Markdown source).
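The two outcomes for a styled span can be shown in a few lines. This is a rough sketch using a regular expression. The function name handle_spans and its keep flag are invented for the example, and the pattern assumes attribute values never contain a ">" character.

```python
import re

# Matches opening and closing <span> tags, with or without attributes.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def handle_spans(html, keep=False):
    """Return html unchanged if keep is True; otherwise remove <span> tags but keep their text."""
    return html if keep else SPAN_TAG.sub("", html)


source = 'Use <span class="highlight">caution</span> here.'
print(handle_spans(source))             # styling lost, clean text
print(handle_spans(source, keep=True))  # styling kept, noisier source
```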
Complex layout elements. Divs, flexbox layouts, multi-column structures — these are presentation concerns Markdown doesn't model. They usually become either empty Markdown or raw HTML blocks.
HTML comments. Comments such as <!-- comment --> are dropped by most converters. If you use HTML comments for metadata, that metadata is lost.
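If comments carry metadata you need to keep, one option is to collect them before running the conversion and store them separately. A minimal sketch with the standard-library parser follows. The name extract_comments and the sample comment contents are made up for illustration.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Records the text of every HTML comment in document order."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->"
        self.comments.append(data.strip())


def extract_comments(html):
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


page = "<p>Intro</p><!-- owner: docs-team --><p>Body</p><!--draft-->"
print(extract_comments(page))
```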
Preserving complex HTML in Markdown
Markdown allows inline HTML, and most Markdown renderers pass HTML tags through as-is. For elements that can't be represented in Markdown (complex tables, forms, embedded media), leave them as HTML in the Markdown source. As long as the target renderer doesn't strip or sanitize raw HTML, they render as before and you don't lose functionality, only the pure-Markdown property of the document.