
email module

Overview

The email module is part of Python’s standard library and handles email message parsing, construction, and encoding. It is not an SMTP client — for sending email, look at smtplib. The email module reads and builds the message structure itself.

The package is divided into several submodules:

  • email.message — the core Message class
  • email.parser — parses raw email text into Message objects
  • email.generator — generates plain text from Message objects
  • email.policy — controls formatting and line wrapping behavior
  • email.contentmanager — handles MIME type dispatch
  • email.header — encodes non-ASCII header values
  • email.encoders — base64 and other content encodings
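
As a rough illustration of how these pieces fit together, the sketch below builds a message, serializes it, and parses the text back. The addresses and subject are made-up sample values, and policy.default is chosen here so that the modern EmailMessage API is available on the parsed result.

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Build a message (email.message), serialize it (as_string() uses
# email.generator internally), then parse it back (email.parser).
original = EmailMessage()
original["From"] = "alice@example.com"
original["To"] = "bob@example.com"
original["Subject"] = "Round trip"
original.set_content("Hello Bob\n")

wire_text = original.as_string()
parsed = message_from_string(wire_text, policy=policy.default)

print(parsed["Subject"])
print(parsed.get_content())
```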

Parsing Email

From a string

from email import message_from_string, policy

raw = """From: alice@example.com
To: bob@example.com
Subject: Hello
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hello Bob,
This is a test email.
"""

# policy.default returns an EmailMessage, which provides get_content();
# without it the parser returns a legacy Message that lacks that method
msg = message_from_string(raw, policy=policy.default)
print(msg['From'])          # => alice@example.com
print(msg['Subject'])       # => Hello
print(msg.get_content())   # => Hello Bob,\nThis is a test email.\n

From a file

from email import message_from_file

with open("email.txt") as f:
    msg = message_from_file(f)
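
When the file's encoding is unknown, reading it in binary mode avoids decoding errors before the parser sees the data. The sketch below assumes that situation and uses message_from_binary_file; the temporary file and its contents are stand-ins created only to keep the example self-contained.

```python
import os
import tempfile
from email import message_from_binary_file, policy

raw_bytes = b"From: alice@example.com\r\nSubject: Hello\r\n\r\nHello Bob\r\n"

# Write a sample message to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(suffix=".eml", delete=False) as tmp:
    tmp.write(raw_bytes)
    path = tmp.name

try:
    # 'rb' mode: the parser receives the original bytes, not a locale-decoded str.
    with open(path, "rb") as f:
        msg = message_from_binary_file(f, policy=policy.default)
finally:
    os.remove(path)

print(msg["Subject"])
```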

Using BytesParser for binary data

Email messages can arrive as bytes, particularly from network sockets or IMAP servers. Use BytesParser:

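from email import message_from_bytes, policy
from email.parser import BytesParser

raw_bytes = b"From: alice@example.com\r\n\r\nHello"
msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)

# message_from_bytes() is a shortcut for the same operation
msg = message_from_bytes(raw_bytes, policy=policy.default)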

Message Object

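The email.message.EmailMessage object, returned when you parse with policy.default, is the central type. Access headers with dict-style syntax and body content through get_content() or walk(). The legacy email.message.Message class, returned under the parser's default compat32 policy, offers get_payload() but not get_content().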

Accessing headers

msg['From']        # => 'alice@example.com' (returns first matching header)
msg.get_all('Received')  # => ['from server1', 'from server2'] (all values)
msg.keys()        # => ['From', 'To', 'Subject', ...]

Header names are matched case-insensitively.
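
A small sketch of that lookup behavior follows. The oddly cased header names in the sample message are deliberate, and X-Missing is a made-up name used to show what a lookup returns when the header is absent.

```python
from email import message_from_string, policy

raw = "SUBJECT: Hello\nfrom: alice@example.com\n\nBody\n"
msg = message_from_string(raw, policy=policy.default)

# Lookup ignores case...
print(msg["subject"])
print(msg["Subject"])
# ...but keys() reports the names as they appeared in the source.
print(msg.keys())

# A missing header yields None rather than raising KeyError.
print(msg["X-Missing"])
```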

Inspecting the payload

msg.get_content()              # decoded body: str for text/*, bytes for binary parts
msg.is_multipart()             # => True / False
msg.get_content_type()        # => 'text/plain' / 'multipart/alternative' / etc.
msg.get_content_disposition()  # => 'inline' / 'attachment' / None

Walking multipart messages

For multipart messages, walk() iterates over every part:

for part in msg.walk():
    content_type = part.get_content_type()
    if content_type == 'text/plain':
        print(part.get_content())
    elif content_type == 'text/html':
        print(part.get_content())
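
If only the main body is needed, EmailMessage.get_body() can replace a manual walk. The following is a minimal sketch on a message built in place; the preference order passed to get_body() is an arbitrary choice for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Update"
msg.set_content("Plain text version.\n")
msg.add_alternative("<p>HTML version.</p>\n", subtype="html")

# walk() visits the multipart container first, then each leaf part.
types = [part.get_content_type() for part in msg.walk()]
print(types)

# get_body() returns the best candidate for the main body, trying the
# listed kinds in order.
body = msg.get_body(preferencelist=("plain", "html"))
print(body.get_content())
```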

Building Email Messages

Plain text message

from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'alice@example.com'
msg['To'] = 'bob@example.com'
msg['Subject'] = 'Reminder'

# set_content() stores the body and adds the Content-Type headers
msg.set_content("Hello Bob,\n\nMeeting at 3pm.")

print(msg.as_string())

Unicode headers

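Non-ASCII characters in header values must be encoded for transport (RFC 2047). With EmailMessage, assign a plain str and the encoding happens at serialization; with the legacy Message class, wrap the value in Header:

from email.header import Header

del msg['Subject']  # assignment never replaces, so drop any existing Subject first
msg['Subject'] = 'Réunion à 15h'  # EmailMessage: a plain str is enough
# Legacy Message (compat32): msg['Subject'] = Header('Réunion à 15h', charset='utf-8')

# encode() yields the encoded-word form; str(Header(...)) yields the decoded text
encoded = Header('Réunion à 15h', charset='utf-8').encode()

Serialized, the header uses the =?utf-8?b?...?= (base64) or =?utf-8?q?...?= (quoted-printable) encoded-word format for email transport safety.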
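
To see the encoded-word form in practice, the sketch below serializes an EmailMessage with a non-ASCII subject and parses it back. It assumes the default policy, under which a plain str subject is encoded during serialization; whether the base64 or quoted-printable variant appears is left to the library.

```python
from email import message_from_string, policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Réunion à 15h"   # plain str; no manual encoding
msg.set_content("Bonjour\n")

# The serialized Subject line carries an encoded word, not raw UTF-8.
wire_text = msg.as_string()
print(wire_text)

# Parsing with policy.default decodes the header again on access.
parsed = message_from_string(wire_text, policy=policy.default)
print(parsed["Subject"])
```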

Multipart message

from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'alice@example.com'
msg['To'] = 'bob@example.com'
msg['Subject'] = 'Report'

# Set the text body
msg.set_content("Here is the report you requested.\n")

# Attach a file; add_attachment() converts the message to multipart/mixed
# and base64-encodes the bytes
with open("report.pdf", "rb") as f:
    msg.add_attachment(f.read(), maintype='application', subtype='pdf',
                       filename='report.pdf')

MIME Types

Common content types

Type                        Description
text/plain                  Plain text, no formatting
text/html                   HTML content
multipart/mixed             Multiple unrelated parts (body + attachments)
multipart/alternative       Same content in different formats (plain + HTML)
multipart/related           Parts related to a main body (images embedded in HTML)
application/pdf             Binary PDF attachment
application/octet-stream    Generic binary data

Creating HTML email with images

from email.message import EmailMessage

# image_bytes must already hold the PNG data as bytes (for example, read in 'rb' mode)

msg = EmailMessage()
msg['From'] = 'alice@example.com'
msg['To'] = 'bob@example.com'
msg['Subject'] = 'Newsletter'

# HTML body; the cid: URL refers to the Content-ID of the image part
msg.set_content("<p>See the image below:</p><img src='cid:logo'/>", subtype='html')

# add_related() converts the message to multipart/related, base64-encodes
# the image, and marks it inline
msg.add_related(image_bytes, maintype='image', subtype='png', cid='<logo>')

Encoding and Decoding

Base64 attachments

import base64
from email.message import Message

attachment = Message()
attachment['Content-Type'] = 'application/pdf'
attachment['Content-Disposition'] = 'attachment; filename="doc.pdf"'
attachment['Content-Transfer-Encoding'] = 'base64'

# Read the file, then store its base64 text as the payload
with open("doc.pdf", "rb") as f:
    pdf_bytes = f.read()
attachment.set_payload(base64.encodebytes(pdf_bytes).decode('ascii'))

Parsing encoded words

Messages parsed with policy.default decode encoded words when you access a header. Under the legacy compat32 policy, decode them explicitly:

from email.header import decode_header, make_header

raw_subject = msg['Subject']  # e.g. "=?utf-8?b?w6k=?="
parts = decode_header(raw_subject)
# parts => [(b'\xc3\xa9', 'utf-8')]  (bytes plus charset for each encoded word)
decoded = str(make_header(parts))  # => 'é'

Policy Control

Email policies govern header folding, line wrapping, and content transfer encoding. The email.policy module was added in Python 3.3.

Using a specific policy

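from email import message_from_string
from email.policy import compat32, default, SMTPUTF8

msg = message_from_string(raw, policy=compat32)   # legacy behavior (the parser's default)
msg2 = message_from_string(raw, policy=default)   # modern API, returns EmailMessage
msg3 = message_from_string(raw, policy=SMTPUTF8)  # like SMTP, but allows UTF-8 in headers

The email.policy.HTTP policy is meant for serializing headers for use in HTTP traffic. It behaves like the SMTP policy (CRLF line endings) except that max_line_length is None, so header lines are never wrapped.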
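
The sketch below illustrates two policy effects on a small sample message: the line separator used at serialization, and the max_line_length setting. The value 40 passed to clone() is an arbitrary number chosen for the example.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Status"
msg.set_content("All systems nominal.\n")

# policy.default writes "\n" line endings; policy.SMTP writes "\r\n".
unix_text = msg.as_string()
smtp_bytes = msg.as_bytes(policy=policy.SMTP)

# Policies are immutable; clone() returns a modified copy.
narrow = policy.default.clone(max_line_length=40)

print(repr(unix_text))
print(smtp_bytes)
print(policy.default.max_line_length, policy.HTTP.max_line_length,
      narrow.max_line_length)
```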

Common Use Cases

Extracting all email addresses from a message

import re

def extract_addresses(msg):
    addresses = set()
    for header in ['From', 'To', 'Cc', 'Bcc']:
        val = msg.get(header, '')
        found = re.findall(r'[\w.+-]+@[\w.-]+', val)
        addresses.update(found)
    return addresses
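
A regex can misread quoted display names or miss repeated headers. As an alternative sketch, email.utils.getaddresses parses address lists according to the mail header grammar. The function name extract_addresses_parsed and the sample message are made up for this example.

```python
from email import message_from_string, policy
from email.utils import getaddresses

def extract_addresses_parsed(msg):
    """Collect addresses with the standard-library address parser."""
    addresses = set()
    for header in ["From", "To", "Cc", "Bcc"]:
        # get_all() returns every occurrence of the header, not just the first.
        values = [str(v) for v in msg.get_all(header, [])]
        for _display_name, addr in getaddresses(values):
            if addr:
                addresses.add(addr)
    return addresses

raw = (
    "From: Alice <alice@example.com>\n"
    'To: "Bob, Jr." <bob@example.com>, carol@example.com\n'
    "\n"
    "Body\n"
)
msg = message_from_string(raw, policy=policy.default)
print(sorted(extract_addresses_parsed(msg)))
```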

Saving attachments

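import os

for part in msg.walk():
    if part.get_content_disposition() == 'attachment':
        filename = part.get_filename()
        if filename:
            # The name comes from the sender; drop any directory components
            safe_name = os.path.basename(filename)
            with open(safe_name, 'wb') as f:
                f.write(part.get_payload(decode=True))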

Building a text+HTML alternative email

from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'alice@example.com'
msg['To'] = 'bob@example.com'
msg['Subject'] = 'Update'

# The plain-text part goes first; clients prefer the last part they can render
msg.set_content("Plain text version of the update.")

# add_alternative() converts the message to multipart/alternative
msg.add_alternative("<p>HTML version of the <strong>update</strong>.</p>",
                    subtype='html')

Gotchas

Header parsing is case-insensitive but preserves original casing. msg['subject'] and msg['Subject'] both work but the original spelling from the raw message is preserved in keys().

get_payload() returns the payload as stored. For a part with a base64 or quoted-printable transfer encoding, that is the still-encoded string. For non-text parts like images or PDFs, pass decode=True and use the raw bytes:

# Wrong for binary
data = part.get_payload()  # base64-encoded string, not the actual bytes

# Correct for binary
data = part.get_payload(decode=True)  # raw bytes
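
The difference can be checked on a message built in place. In this sketch the eight sample bytes and the filename blob.bin are stand-ins for real attachment data.

```python
from email.message import EmailMessage

data = bytes(range(8))  # stand-in for real binary content

msg = EmailMessage()
msg.set_content("See attachment.\n")
msg.add_attachment(data, maintype="application", subtype="octet-stream",
                   filename="blob.bin")

attachment = next(msg.iter_attachments())

encoded = attachment.get_payload()             # str, still base64 text
decoded = attachment.get_payload(decode=True)  # bytes, transfer encoding undone

print(type(encoded).__name__, type(decoded).__name__)
print(attachment.get_content() == data)        # get_content() also returns the bytes
```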

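Multipart messages require a boundary. When building a multipart message, the boundary is generated automatically when the message is serialized with as_string() or as_bytes(); set_payload() does not create one. With EmailMessage, let add_attachment(), add_alternative(), or add_related() set the multipart Content-Type rather than writing the header by hand.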
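
A minimal sketch of the boundary being filled in at serialization time follows, assuming an EmailMessage built with add_attachment(). The attachment bytes are a placeholder, not a valid PDF.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("Here is the report.\n")
msg.add_attachment(b"%PDF-1.4 placeholder", maintype="application",
                   subtype="pdf", filename="report.pdf")

print(msg.get_content_type())    # multipart/mixed, set by add_attachment()

before = msg.get_boundary()      # no boundary parameter has been set yet
wire_text = msg.as_string()      # serialization chooses a unique boundary
after = msg.get_boundary()

print(before, after)
```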

The email module does not send, sign, or encrypt. It constructs messages but does not handle SMTP delivery; use smtplib for that. For signing (DKIM) or encryption (S/MIME), use additional libraries.

See Also