Regular Expressions with the re Module

March 13, 2026 · 5 min read · Updated March 13, 2026 · intermediate

python stdlib regex text-processing

Regular expressions (regex) are a powerful tool for working with text. They let you define patterns to match, extract, and replace strings. Python’s re module provides these capabilities as part of the standard library.

Why Use Regular Expressions?

Suppose you need to find all email addresses in a block of text, validate that a phone number follows a specific format, or replace all URLs with links. Writing manual string parsing code for each of these tasks gets messy fast.

Regular expressions solve this by letting you describe patterns with a compact syntax. Instead of writing loops and conditionals, you write a pattern like \d{3}-\d{4} to match seven-digit phone numbers such as 555-1234, and the regex engine handles the rest.

Your First Regex

The simplest way to use regex in Python is with re.search():

import re

text = "My phone number is 555-1234."
match = re.search(r"\d{3}-\d{4}", text)

if match:
    print(f"Found: {match.group()}")
# output: Found: 555-1234

The r prefix creates a raw string, so Python leaves backslashes untouched and the regex engine receives \d exactly as written. The \d{3} matches exactly three digits, and \d{4} matches exactly four digits.
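To see what the raw-string prefix changes, here is a small sketch (the variable names are illustrative) that writes the same pattern text with and without r. In a normal string, Python turns \b into a backspace character before the re module ever sees it.

```python
import re

# Without the r prefix, Python converts "\b" into a backspace character.
plain = "\bcat\b"
# With the r prefix, the backslash survives and re reads \b as a word boundary.
raw = r"\bcat\b"

print(len(plain), len(raw))
# output: 5 7

print(re.search(plain, "a cat sat"))
# output: None

print(re.search(raw, "a cat sat").group())
# output: cat
```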

Common Patterns

Here are the patterns you will use most often:

Pattern	Matches
`\d`	Any digit (0-9)
`\D`	Any non-digit
`\w`	Word character (letters, digits, and underscore)
`\W`	Non-word character
`\s`	Whitespace (space, tab, newline)
`\S`	Non-whitespace
`.`	Any character except newline
`^`	Start of string (or of each line with re.MULTILINE)
`$`	End of string, or just before a trailing newline
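As a quick check of the table, the sketch below runs a few of these patterns against one sample string (the string itself is made up for illustration).

```python
import re

sample = "Room 42, Floor 7"

print(re.search(r"\d", sample).group())   # first digit
# output: 4
print(re.search(r"\s", sample).span())    # position of the first whitespace
# output: (4, 5)
print(re.search(r"^\w", sample).group())  # word character at the start
# output: R
print(re.search(r"\d$", sample).group())  # digit at the end
# output: 7
print(re.search(r"^\d", sample))          # the string does not start with a digit
# output: None
```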

You can combine these with quantifiers:

import re

# Match one or more digits
re.search(r"\d+", "123 abc")      # Matches "123"

# Match zero or more digits
re.search(r"\d*", "abc")          # Matches "" (empty)

# Match exactly 3 letters
re.search(r"[a-zA-Z]{3}", "abcdef")  # Matches "abc"

# Optional character
re.search(r"colou?r", "color")     # Matches "color"
re.search(r"colou?r", "colour")    # Matches "colour"

Finding All Matches

Use re.findall() when you need every match in a string:

import re

text = "There are 3 apples, 5 oranges, and 2 bananas."

numbers = re.findall(r"\d+", text)
print(numbers)
# output: ['3', '5', '2']

# Extract emails from text
emails = """Contact us at info@example.com or support@company.org"""
email_pattern = r"\w+@\w+\.\w+"
found_emails = re.findall(email_pattern, emails)
print(found_emails)
# output: ['info@example.com', 'support@company.org']

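The findall() function returns a list of all non-overlapping matches, scanning left to right. If the pattern contains capture groups, it returns the groups rather than the full matches.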
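One detail worth seeing in action: when a pattern has capture groups, findall() returns the captured groups (as tuples if there are several). The sketch below shows that, and also re.finditer(), which yields match objects when you need positions as well. The variable names are illustrative.

```python
import re

text = "Contact us at info@example.com or support@company.org"

# Two capture groups: findall() returns a list of (user, domain) tuples.
pairs = re.findall(r"(\w+)@(\w+\.\w+)", text)
print(pairs)
# output: [('info', 'example.com'), ('support', 'company.org')]

# finditer() yields match objects, so the position is available too.
found = [(m.group(), m.start()) for m in re.finditer(r"\w+@\w+\.\w+", text)]
print(found)
# output: [('info@example.com', 14), ('support@company.org', 34)]
```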

Match Objects

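When you need more than just the matched text, use re.search() or re.match(). Both return a match object, or None when nothing matches. re.match() only matches at the start of the string, while re.search() scans the whole string: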

import re

text = "Price: $19.99"

match = re.search(r"\$(\d+\.\d{2})", text)

if match:
    print(f"Full match: {match.group(0)}")   # $19.99
    print(f"Captured: {match.group(1)}")     # 19.99
    print(f"Position: {match.span()}")       # (7, 13)

The parentheses create capture groups. group(0) returns the entire match, while group(1) returns the first captured group.
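Numbered groups get hard to track as patterns grow. A minimal sketch of the named-group syntax (?P<name>...) follows; the group names dollars and cents are chosen here for illustration.

```python
import re

text = "Price: $19.99"

# Name each group so the code does not depend on group order.
match = re.search(r"\$(?P<dollars>\d+)\.(?P<cents>\d{2})", text)

if match:
    print(match.group("dollars"))
    # output: 19
    print(match.groups())
    # output: ('19', '99')
    print(match.groupdict())
    # output: {'dollars': '19', 'cents': '99'}
```

Named groups can still be read by number, so match.group(1) and match.group("dollars") return the same text.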

Splitting and Replacing

The re.sub() function replaces matches with new text:

import re

text = "Hello    World   !"

# Replace multiple spaces with single space
cleaned = re.sub(r" +", " ", text)
print(cleaned)
# output: Hello World !

# Replace numbers with placeholder
text = "Call 555-1234 for help"
anonymized = re.sub(r"\d{3}-\d{4}", "XXX-XXXX", text)
print(anonymized)
# output: Call XXX-XXXX for help

# Use capture groups in replacement
text = "2026-03-13"
reformatted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
print(reformatted)
# output: 13/03/2026
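The replacement argument of re.sub() can also be a function, which is useful when the new text has to be computed from the match. The sketch below uses a hypothetical helper named double, and also shows the count argument that limits how many matches are replaced.

```python
import re

def double(match):
    # match.group() is the matched digits as a string.
    return str(int(match.group()) * 2)

text = "3 apples and 5 oranges"

doubled = re.sub(r"\d+", double, text)
print(doubled)
# output: 6 apples and 10 oranges

# count=1 replaces only the first match.
first_only = re.sub(r"\d+", "N", text, count=1)
print(first_only)
# output: N apples and 5 oranges
```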

The re.split() function splits on patterns:

import re

text = "apple, banana; cherry: date"

# Split on commas, semicolons, and colons
fruits = re.split(r"[,;:]", text)
print(fruits)
# output: ['apple', ' banana', ' cherry', ' date']

# Split and remove whitespace
fruits = [f.strip() for f in re.split(r"[,;:]", text)]
print(fruits)
# output: ['apple', 'banana', 'cherry', 'date']
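As an alternative to stripping afterwards, the surrounding whitespace can be folded into the split pattern itself. This sketch also shows the maxsplit argument, which stops after a given number of splits.

```python
import re

text = "apple, banana; cherry: date"

# \s* on both sides absorbs whitespace around each separator.
fruits = re.split(r"\s*[,;:]\s*", text)
print(fruits)
# output: ['apple', 'banana', 'cherry', 'date']

# maxsplit=1 splits only at the first separator.
head, rest = re.split(r"\s*[,;:]\s*", text, maxsplit=1)
print(head)
# output: apple
print(rest)
# output: banana; cherry: date
```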

Compiling Patterns

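If you use the same pattern repeatedly, compile it once and reuse the compiled object. The re module already caches recently used patterns, so the speedup is usually modest, but a named pattern is easier to read and reuse: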

import re

# Compile once
phone_pattern = re.compile(r"\d{3}-\d{4}")

# Use the compiled pattern
numbers = ["555-1234", "555-5678", "123-4567"]
for number in numbers:
    match = phone_pattern.search(number)
    if match:
        print(f"Valid: {number}")
# output: Valid: 555-1234
# output: Valid: 555-5678
# output: Valid: 123-4567
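Note that search() reports a match anywhere inside the string, so it accepts inputs that merely contain a phone number. If the goal is validation, fullmatch() is one option, since it requires the whole string to match. A small sketch with made-up candidate strings:

```python
import re

phone_pattern = re.compile(r"\d{3}-\d{4}")

candidates = ["555-1234", "5555-12345", "call 555-1234"]

# Map each candidate to (search found something, whole string matched).
results = {
    c: (phone_pattern.search(c) is not None,
        phone_pattern.fullmatch(c) is not None)
    for c in candidates
}

for candidate, (found, exact) in results.items():
    print(f"{candidate!r}: search={found}, fullmatch={exact}")
# output: '555-1234': search=True, fullmatch=True
# output: '5555-12345': search=True, fullmatch=False
# output: 'call 555-1234': search=True, fullmatch=False
```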

Compiled patterns accept flags and expose the same methods as the module-level functions:

import re

pattern = re.compile(r"\b\w*py\w*\b", re.IGNORECASE)

# Find all words containing "py", in any letter case
matches = pattern.findall("Python py PYTHON")
print(matches)
# output: ['Python', 'py', 'PYTHON']
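Flags are not limited to compiled patterns. The module-level functions take a flags argument too, and flags can be combined with the | operator. The sketch below uses re.IGNORECASE and re.MULTILINE on made-up input.

```python
import re

text = "Python py PYTHON"
print(re.findall(r"py", text, flags=re.IGNORECASE))
# output: ['Py', 'py', 'PY']

lines = "alpha 1\nBeta 2"

# By default ^ matches only at the start of the string.
print(re.findall(r"^[a-z]+", lines))
# output: ['alpha']

# With MULTILINE, ^ also matches at the start of each line.
both = re.findall(r"^[a-z]+", lines, flags=re.IGNORECASE | re.MULTILINE)
print(both)
# output: ['alpha', 'Beta']
```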

Practical Examples

Validating Email Addresses

import re

def is_valid_email(email):
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    return re.match(pattern, email) is not None

print(is_valid_email("user@example.com"))      # True
print(is_valid_email("invalid-email"))         # False
print(is_valid_email("user@domain"))           # False
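One subtlety with the anchored version: $ also matches just before a trailing newline, so an address followed by "\n" still passes. A stricter sketch, assuming that is not wanted, uses fullmatch() on a compiled pattern. The names EMAIL_RE and is_valid_email_strict are hypothetical, and the pattern is the same simplified one used above, not a full email grammar.

```python
import re

# Same character classes as above, without ^ and $; fullmatch() anchors both ends.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def is_valid_email_strict(email):
    return EMAIL_RE.fullmatch(email) is not None

anchored = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"

# The anchored pattern accepts a trailing newline.
print(re.match(anchored, "user@example.com\n") is not None)
# output: True

print(is_valid_email_strict("user@example.com\n"))
# output: False
print(is_valid_email_strict("user@example.com"))
# output: True
```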

Extracting Data from Log Files

import re

log_line = "2026-03-13 14:30:45 ERROR Connection failed from 192.168.1.100"

timestamp = re.search(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", log_line)
level = re.search(r"(ERROR|WARNING|INFO)", log_line)
ip = re.search(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", log_line)

print(f"Time: {timestamp.group(1)}")
print(f"Level: {level.group(1)}")
print(f"IP: {ip.group(1)}")
# output: Time: 2026-03-13 14:30:45
# output: Level: ERROR
# output: IP: 192.168.1.100
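The three separate searches above raise AttributeError if any of them returns None. One possible variation, assuming every log line follows the "timestamp level message" layout shown here, is a single compiled pattern with an explicit None check. LOG_PATTERN and parse_log_line are hypothetical names.

```python
import re

LOG_PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "  # timestamp
    r"(ERROR|WARNING|INFO) "                    # level
    r"(.*)"                                     # rest of the line
)

def parse_log_line(line):
    """Return (timestamp, level, message), or None if the line does not fit."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.groups()

log_line = "2026-03-13 14:30:45 ERROR Connection failed from 192.168.1.100"
print(parse_log_line(log_line))
# output: ('2026-03-13 14:30:45', 'ERROR', 'Connection failed from 192.168.1.100')

print(parse_log_line("not a log line"))
# output: None
```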

Removing HTML Tags

import re

html = "<p>Hello, <strong>World</strong>!</p>"
text = re.sub(r"<[^>]+>", "", html)
print(text)
# output: Hello, World!
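This pattern works for simple markup but breaks when an attribute value contains ">". For input like that, the standard library's html.parser module is a sturdier option. The sketch below swaps in that module as an alternative, not as part of the regex approach; TextExtractor is a hypothetical helper class.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

tricky = '<a title="a > b">link</a>'

# The regex stops at the ">" inside the attribute value.
regex_result = re.sub(r"<[^>]+>", "", tricky)
print(repr(regex_result))
# output: ' b">link'

extractor = TextExtractor()
extractor.feed(tricky)
extractor.close()
parser_result = "".join(extractor.parts)
print(repr(parser_result))
# output: 'link'
```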

Common Pitfalls

Greedy vs. Non-Greedy Matching

The * and + quantifiers are greedy: they match as much as possible. Add a ? (as in *? or +?) for non-greedy matching, which matches as little as possible:

import re

html = "<div>content</div>"

# Greedy (captures too much)
print(re.search(r"<.+>", html).group())
# output: <div>content</div>

# Non-greedy (stops at the first ">")
print(re.search(r"<.+?>", html).group())
# output: <div>
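A third option is a negated character class, which says directly that a tag cannot contain ">". The sketch below compares all three forms with findall() on the same input.

```python
import re

html = "<div>content</div>"

greedy = re.findall(r"<.+>", html)
print(greedy)
# output: ['<div>content</div>']

lazy = re.findall(r"<.+?>", html)
print(lazy)
# output: ['<div>', '</div>']

# [^>]+ means one or more characters that are not ">".
negated = re.findall(r"<[^>]+>", html)
print(negated)
# output: ['<div>', '</div>']
```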

Character Classes vs. Predefined Classes

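Both \d and [0-9] match the ASCII digits 0 through 9. For str patterns, \d also matches other Unicode decimal digits unless you pass re.ASCII. Custom character classes in square brackets list the allowed characters or ranges explicitly: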

import re

# \d matches any digit
re.search(r"\d", "abc123")    # Matches "1"

# [0-9] matches any digit
re.search(r"[0-9]", "abc123")  # Matches "1"

# [abc] matches only a, b, or c
re.search(r"[abc]", "defabc") # Matches "a"
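The difference between \d and [0-9] only shows up with non-ASCII digits. A small sketch using the Arabic-Indic digit three (U+0663) as sample input, plus a negated class for comparison:

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE

# For str patterns, \d matches any Unicode decimal digit by default.
unicode_d = re.fullmatch(r"\d", arabic_three) is not None
print(unicode_d)
# output: True

# [0-9] is limited to the ASCII digits.
ascii_class = re.fullmatch(r"[0-9]", arabic_three) is not None
print(ascii_class)
# output: False

# re.ASCII restricts \d to 0-9 as well.
ascii_flag = re.fullmatch(r"\d", arabic_three, flags=re.ASCII) is not None
print(ascii_flag)
# output: False

# A leading ^ inside the brackets negates the class.
print(re.search(r"[^abc]", "abcdef").group())
# output: d
```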

Getting Started

The re module is built into Python, so you can start using it immediately. Begin with simple patterns and gradually add complexity. Remember to use raw strings (r"...") to avoid escape sequence issues.

For complex patterns, build them piece by piece and test each part. Online regex testers can help you visualize what your pattern matches.
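One way to keep a longer pattern readable while building it up is the re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. A minimal sketch using the date format from earlier (the name date_pattern is illustrative):

```python
import re

date_pattern = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.VERBOSE,
)

match = date_pattern.search("Released on 2026-03-13")
print(match.groups())
# output: ('2026', '03', '13')
```

Because whitespace is ignored in this mode, a literal space in the pattern has to be written as \s, "[ ]", or an escaped space.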