Build a Markdown to HTML Converter in Python

March 25, 2026 · 7 min read · Updated March 25, 2026 · intermediate


Markdown powers much of the web — from README files to blog posts to documentation. Converting Markdown to HTML is a common task, and Python gives you excellent tools to do it well.

This guide walks through the full conversion pipeline: reading a Markdown file, parsing it to HTML, adding syntax highlighting for code blocks, sanitizing the output against XSS attacks, and wrapping it all in a working CLI tool.

The Conversion Pipeline

A typical Markdown-to-HTML converter follows the same pipeline, with two optional steps in the middle:

read .md file
    → parse with a markdown library
    → (optional) apply syntax highlighting to code blocks
    → (optional) sanitize HTML to strip dangerous tags/attributes
    → wrap in an HTML template
    → write to output file

The two optional steps — highlighting and sanitization — are where most real-world converters spend their logic. The parsing step itself is straightforward; it’s the surrounding concerns that require care.
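The steps above can be sketched as one function whose stages are passed in as callables. The names here (run_pipeline, parse, highlight, sanitize, wrap) are hypothetical stand-ins for the real library calls introduced later in this guide, and the toy stages in the usage lines exist only to show the order of operations. Reading and writing files is left out.

```python
from typing import Callable, Optional

Stage = Callable[[str], str]


def run_pipeline(
    md_text: str,
    parse: Stage,
    highlight: Optional[Stage] = None,
    sanitize: Optional[Stage] = None,
    wrap: Stage = lambda body: body,
) -> str:
    """Apply the pipeline stages in order, skipping optional ones that are None."""
    html = parse(md_text)
    if highlight is not None:
        html = highlight(html)
    if sanitize is not None:
        html = sanitize(html)
    return wrap(html)


# Toy stages, for illustration only: a "parser" that wraps text in <p>
# and a "sanitizer" that deletes one literal tag.
page = run_pipeline(
    "hello <blink>world</blink>",
    parse=lambda text: f"<p>{text}</p>",
    sanitize=lambda html: html.replace("<blink>", "").replace("</blink>", ""),
    wrap=lambda body: f"<body>{body}</body>",
)
print(page)
```

Keeping the optional stages as None by default mirrors the diagram: parsing and wrapping always run, while highlighting and sanitizing are opt-in.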

Setting Up

Install the dependencies you need for this guide:

pip install mistune markdown pygments bleach

mistune — fast pure-Python markdown parser
markdown — the classic Python-Markdown library with a rich extension ecosystem
pygments — syntax highlighting engine used by both libraries
bleach — HTML sanitizer for stripping dangerous tags

Converting Markdown with mistune

mistune is the simplest library for converting Markdown to HTML. Its API is a single function call. Pass your markdown text to mistune.html() and it returns a string of raw HTML:

import mistune

md = """
## Heading

This is **bold** and this is *italic*.

- Item one
- Item two
"""

html = mistune.html(md)
print(html)

<h2>Heading</h2>
<p>This is <strong>bold</strong> and this is <em>italic</em>.</p>
<ul>
<li>Item one</li>
<li>Item two</li>
</ul>

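mistune.html() is a preconfigured parser with the table, strikethrough, and footnotes plugins enabled, so there is nothing to configure for basic parsing. Other features, such as task lists and linking of bare URLs, are separate plugins that you enable through mistune.create_markdown().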

Converting Markdown with Python-Markdown

The markdown library takes a different approach: a lean core plus an extension system. The core alone handles standard syntax:

import markdown

md = """
## Heading

This is **bold** text.

> A blockquote for emphasis.
"""

html = markdown.markdown(md)
print(html)

<h2>Heading</h2>
<p>This is <strong>bold</strong> text.</p>
<blockquote>
<p>A blockquote for emphasis.</p>
</blockquote>

For fenced code blocks, add the fenced_code extension. For syntax highlighting, also add codehilite:

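import markdown

# Fences can use tildes or backticks; the word after the opening fence
# names the language
md = """
Here's some Python:

~~~python
def add(a, b):
    return a + b
~~~

And a JavaScript example:

~~~javascript
const add = (a, b) => a + b;
~~~
"""

# fenced_code parses the fences; codehilite runs each block through Pygments
html = markdown.markdown(md, extensions=['fenced_code', 'codehilite'])
print(html)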

Python-Markdown requires explicit extension activation. The codehilite extension adds CSS classes to code blocks — include a Pygments stylesheet in your HTML output to see colors.
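Extensions can also take options. As a sketch, the extension_configs argument below turns off codehilite's language guessing and renames its wrapper class. The option names guess_lang and css_class belong to the codehilite extension; the class name code-block is an arbitrary choice for illustration.

```python
import markdown

md = """
~~~python
def add(a, b):
    return a + b
~~~
"""

html = markdown.markdown(
    md,
    extensions=["fenced_code", "codehilite"],
    extension_configs={
        "codehilite": {
            "guess_lang": False,        # only highlight blocks that name a language
            "css_class": "code-block",  # wrapper class (default: "codehilite")
        }
    },
)
print(html)
```

If you rename the wrapper class, remember to generate the Pygments stylesheet with the same selector.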

Adding Syntax Highlighting with Pygments

Neither mistune nor Python-Markdown highlights code on its own. Both rely on Pygments for coloring, but they integrate with it differently.

Highlighting with mistune

mistune requires a custom renderer. Subclass mistune.HTMLRenderer and override the block_code method. mistune 3 passes the fence's info string as a keyword argument named info, so the parameter name matters:

import mistune
from pygments import highlight
from pygments.lexers import get_lexer_by_name, TextLexer
from pygments.formatters import HtmlFormatter
from pygments.util import ClassNotFound


class HighlightRenderer(mistune.HTMLRenderer):
    def block_code(self, code, info=None):
        # The info string may carry extra words; the first one is the language
        words = (info or '').split()
        try:
            lexer = get_lexer_by_name(words[0], stripall=True) if words else TextLexer()
        except ClassNotFound:
            # Unknown language: fall back to plain text instead of crashing
            lexer = TextLexer()
        # HtmlFormatter already wraps its output in <div class="highlight">
        return highlight(code, lexer, HtmlFormatter())


renderer = HighlightRenderer()
markdown = mistune.create_markdown(renderer=renderer)

md = """
Here is Python:

~~~python
def multiply(a, b):
    return a * b
~~~

And JavaScript:

~~~javascript
const multiply = (a, b) => a * b;
~~~
"""

print(markdown(md))

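The block_code method receives the raw code string and the optional info string from the opening fence, whose first word is the language. It looks up the matching Pygments lexer, highlights the code, and returns the formatted HTML. The stylesheet that gives the resulting CSS classes their colors belongs in the page template, once per page, rather than in each code block.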

Highlighting with Python-Markdown

Python-Markdown’s codehilite extension handles highlighting automatically:

import markdown

md = """
Python:

~~~python
def add(a, b):
    return a + b
~~~

JavaScript:

~~~javascript
const add = (a, b) => a + b;
~~~
"""

html = markdown.markdown(md, extensions=['codehilite', 'fenced_code'])
print(html)

To actually see colors in the browser, include a Pygments stylesheet in your HTML template:

from pygments.formatters import HtmlFormatter

def get_pygments_css():
    formatter = HtmlFormatter(style='monokai')
    return f'<style>{formatter.get_style_defs(".codehilite")}</style>'
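If you cannot ship a stylesheet, Pygments can inline the colors instead. The sketch below contrasts the two modes using Pygments directly; noclasses=True is a standard HtmlFormatter option, and the sample code string is illustrative.

```python
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

code = "def add(a, b):\n    return a + b\n"

# Default mode: tokens get CSS classes, colors come from a separate stylesheet
with_classes = highlight(code, PythonLexer(), HtmlFormatter())

# Inline mode: each token carries its own style attribute, no stylesheet needed
with_inline = highlight(code, PythonLexer(), HtmlFormatter(noclasses=True))

print(with_classes)
print(with_inline)
```

Inline styles make the HTML larger, and a sanitizer that drops style attributes would remove the colors, so the class-based mode is usually the better fit for the pipeline in this guide.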

Sanitizing HTML Output

Raw Markdown conversion is not safe for untrusted input. Markdown parsers pass through raw HTML, which means a user can inject <script> tags or event handlers like <img onerror="...">.

Render untrusted Markdown through a sanitizer before outputting HTML. This is not optional — it is a security requirement.

Using bleach

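bleach.clean() keeps an allowlist of tags and attributes and strips or escapes everything else. Note that bleach's maintainers marked the project deprecated in January 2023; it still works, but check its status before adopting it in a new project. A basic call looks like this: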

import bleach

dirty = """
<p>Hello <script>alert("xss")</script></p>
<p>Click <a href="javascript:alert(1)">here</a></p>
<p><img src=x onerror="alert(1)"></p>
"""

clean = bleach.clean(
    dirty,
    tags=['p', 'a'],
    attributes={'a': ['href']},
    strip=True
)
print(clean)

<p>Hello alert("xss")</p>
<p>Click <a>here</a></p>
<p></p>

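With strip=True, bleach removes disallowed tags but keeps the text inside them, which is why alert("xss") survives as harmless text. The href attribute is allowed on <a> tags, but bleach drops any href whose scheme is not on its protocol allowlist, so the javascript: link loses its href entirely.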
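For comparison, leaving strip at its default of False escapes disallowed tags instead of removing them, so the markup shows up as visible text. A small sketch; the input string and the one-tag allowlist are illustrative:

```python
import bleach

dirty = "<p>Hello <script>alert(1)</script></p>"

# strip=False (the default): the disallowed tag is escaped, not removed
escaped = bleach.clean(dirty, tags={"p"}, attributes={})

# strip=True: the tag itself is dropped, but its text content stays
stripped = bleach.clean(dirty, tags={"p"}, attributes={}, strip=True)

print(escaped)
print(stripped)
```

Escaping makes an injection attempt visible to whoever reviews the page, while stripping produces cleaner output. Either way the script never executes.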

Full Pipeline: mistune to bleach

Combine parsing and sanitization into a single conversion function:

import mistune
import bleach


def convert_md_to_html(md_text, sanitize=True):
    raw_html = mistune.html(md_text)

    if not sanitize:
        return raw_html

    return bleach.clean(
        raw_html,
        tags=[
            'h1', 'h2', 'h3', 'h4', 'p', 'ul', 'ol', 'li',
            'strong', 'em', 'b', 'i', 'u', 's',
            'a', 'code', 'pre', 'blockquote',
            'div', 'span', 'br', 'hr',
            'img', 'table', 'thead', 'tbody', 'tr', 'th', 'td',
        ],
        attributes={
            'a': ['href', 'title', 'target', 'rel'],
            'img': ['src', 'alt', 'title', 'width', 'height'],
            'code': ['class'],
            'span': ['class'],
            'div': ['class'],
            '*': ['id'],
        },
        strip=True
    )


# Test with a potentially dangerous input
md_input = """
## A Safe Article

Click [here](https://example.com)!

<script>document.cookie</script>

    print("safe")
"""

result = convert_md_to_html(md_input)
print(result)

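<h2>A Safe Article</h2>
<p>Click <a href="https://example.com">here</a>!</p>
document.cookie
<pre><code>print(&quot;safe&quot;)
</code></pre>

The <script> tags are stripped, but their contents survive as inert text, so document.cookie appears on the page instead of executing. The link is preserved as written (bleach.clean() does not add rel="nofollow"; that is the job of bleach.linkify()), and the code block is rendered with its quotes escaped.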
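If you do want rel="nofollow" on links, bleach.clean() will not add it on its own; bleach.linkify() does, through its default callbacks. A minimal sketch, assuming the HTML has already been sanitized:

```python
import bleach

clean_html = '<p>Click <a href="https://example.com">here</a>!</p>'

# linkify() applies its default callbacks to existing links as well as
# to bare URLs it finds; the default callback adds rel="nofollow"
linked = bleach.linkify(clean_html)
print(linked)
```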

Building a Full CLI Converter

Now assemble the pieces into a reusable command-line tool. This project structure keeps concerns separate:

markdown_converter/
├── converter/
│   ├── __init__.py
│   ├── core.py
│   ├── renderer.py
│   └── sanitizer.py
├── main.py
└── requirements.txt

converter/renderer.py

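import mistune
from pygments import highlight
from pygments.lexers import get_lexer_by_name, TextLexer
from pygments.formatters import HtmlFormatter
from pygments.util import ClassNotFound


class HighlightRenderer(mistune.HTMLRenderer):
    # mistune 3 passes the fence's info string as the keyword argument "info"
    def block_code(self, code, info=None):
        # The first word of the info string is the language, if any
        words = (info or '').split()
        try:
            lexer = get_lexer_by_name(words[0], stripall=True) if words else TextLexer()
        except ClassNotFound:
            lexer = TextLexer()  # unknown language: render as plain text
        # HtmlFormatter wraps its output in <div class="highlight"> itself
        return highlight(code, lexer, HtmlFormatter())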

converter/sanitizer.py

import bleach

ALLOWED_TAGS = [
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'p', 'ul', 'ol', 'li',
    'strong', 'em', 'b', 'i', 'u', 's',
    'a', 'code', 'pre', 'blockquote',
    'div', 'span', 'br', 'hr',
    'img', 'table', 'thead', 'tbody', 'tr', 'th', 'td',
]

ALLOWED_ATTRIBUTES = {
    'a': ['href', 'title', 'target', 'rel'],
    'img': ['src', 'alt', 'title', 'width', 'height'],
    'code': ['class'],
    'span': ['class'],
    'div': ['class'],
    '*': ['id'],
}


def sanitize_html(raw_html):
    return bleach.clean(
        raw_html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True
    )
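One thing worth checking is that the allowlist keeps the markup Pygments produces, since highlighted code is only a tree of div, pre, and span elements with class attributes. The sketch below uses a trimmed, illustrative copy of the allowlist rather than importing the module above.

```python
import bleach
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

# Trimmed copy of the allowlist above, enough for highlighted code
TAGS = {"div", "pre", "span", "code"}
ATTRS = {"div": ["class"], "span": ["class"], "code": ["class"]}

highlighted = highlight("x = 1\n", PythonLexer(), HtmlFormatter())
sanitized = bleach.clean(highlighted, tags=TAGS, attributes=ATTRS, strip=True)

# The wrapper and token classes that the stylesheet targets should survive
print('class="highlight"' in sanitized)
print("<span" in sanitized)
```

If you tighten the allowlist later, rerun a check like this so that highlighting does not silently lose its classes.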

converter/core.py

from html import escape

import mistune
from pygments.formatters import HtmlFormatter

from .renderer import HighlightRenderer
from .sanitizer import sanitize_html


def convert_md_to_html(md_text, sanitize=True):
    renderer = HighlightRenderer()
    markdown = mistune.create_markdown(renderer=renderer)
    raw_html = markdown(md_text)

    if sanitize:
        return sanitize_html(raw_html)
    return raw_html


def convert_file(input_path, output_path, sanitize=True):
    with open(input_path, 'r', encoding='utf-8') as f:
        md_text = f.read()

    html = convert_md_to_html(md_text, sanitize=sanitize)
    full_page = wrap_in_html_template(html)

    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(full_page)

    return output_path


def wrap_in_html_template(body_content, title="Document"):
    # Pygments only emits CSS classes; these rules give the classes their colors
    pygments_css = HtmlFormatter().get_style_defs('.highlight')
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{escape(title)}</title>
    <style>
        body {{ font-family: system-ui, sans-serif; line-height: 1.6; max-width: 800px; margin: 0 auto; padding: 2rem; }}
        .highlight {{ background: #f4f4f4; padding: 1rem; overflow-x: auto; border-radius: 4px; }}
        pre {{ margin: 0; }}
{pygments_css}
    </style>
</head>
<body>
{body_content}
</body>
</html>"""
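The template's title parameter defaults to "Document". One option is to derive it from the first heading in the source. The helper below is a hypothetical addition, not part of the project above, and it uses a simple regular expression rather than a real Markdown parse, so a line starting with # inside a code block would also match.

```python
import re
from html import escape


def extract_title(md_text: str, default: str = "Document") -> str:
    """Return the text of the first ATX heading ("# ..."), HTML-escaped."""
    match = re.search(
        r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", md_text, flags=re.MULTILINE
    )
    return escape(match.group(1)) if match else default


print(extract_title("## Release <notes>\n\nBody text."))
print(extract_title("No heading here."))
```

The first call finds the heading and escapes its angle brackets; the second falls back to the default.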

main.py

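import argparse
import sys
from pathlib import Path

from converter.core import convert_file


def main():
    parser = argparse.ArgumentParser(description='Convert Markdown to HTML')
    parser.add_argument('input', help='Input Markdown file')
    parser.add_argument('-o', '--output', help='Output HTML file', default=None)
    parser.add_argument(
        '--no-sanitize',
        action='store_true',
        help='Skip HTML sanitization'
    )
    args = parser.parse_args()

    # with_suffix() swaps only the final extension. str.replace('.md', '.html')
    # would rewrite every ".md" in the path and leave a name without ".md"
    # unchanged, so the output would overwrite the input.
    output = args.output or str(Path(args.input).with_suffix('.html'))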

    try:
        result = convert_file(args.input, output, sanitize=not args.no_sanitize)
        print(f"Written: {result}")
    except FileNotFoundError:
        print(f"Error: '{args.input}' not found", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()

Run it from the command line:

python main.py article.md -o article.html

Written: article.html

With --no-sanitize, the converter skips the bleach step — useful when you control the input and want maximum fidelity, but dangerous for user-submitted content.

Summary

You now have a complete picture of converting Markdown to HTML in Python:

Step	What happens	Key tool
Read	Load `.md` file	built-in `open()`
Parse	Convert Markdown syntax to HTML	`mistune.html()` or `markdown.markdown()`
Highlight	Colorize code blocks	Pygments via custom renderer or `codehilite`
Sanitize	Strip dangerous HTML/attributes	`bleach.clean()`
Output	Wrap in HTML template and write	string formatting

mistune is the faster, simpler choice for new projects. Python-Markdown’s extension ecosystem is valuable when you need TOC generation, footnotes, or SmartyPants processing. Either way, make sanitization a mandatory step for any untrusted input.

The CLI tool gives you a reusable command for batch conversions or integration into static site generators. Adapt the project structure to your needs — swap the renderer for a different highlighting theme, tighten the allowed tags list for stricter security, or add a watch mode for live preview.