Python re Module - A Complete Guide

In this guide, you'll explore the re module in Python, used for handling regular expressions. We’ll cover its key functions, patterns, and practical examples to help you understand and apply it effectively.

The re module in Python provides support for working with regular expressions, which are patterns used to match character combinations in strings. Regular expressions are used for searching, matching, and manipulating strings based on specific patterns.

Table of Contents

  1. Introduction
  2. re Module Functions
    • re.compile
    • re.search
    • re.match
    • re.fullmatch
    • re.split
    • re.findall
    • re.finditer
    • re.sub
    • re.subn
  3. Regular Expression Syntax
  4. Examples
    • Basic Usage
    • Using Groups and Capturing
    • Using Flags
    • Advanced Substitution
  5. Conclusion

Introduction

Regular expressions describe text patterns compactly: a short pattern such as \d+ stands for "one or more digits". The re module, part of Python's standard library, applies such patterns to strings so you can search for matches, split strings, replace substrings, and more.

re Module Functions

re.compile

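Compiles a regular expression pattern into a pattern object. The object offers the same operations as the module-level functions (search, match, findall, sub, and so on) as methods, which is convenient when the same pattern is used many times.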

import re

pattern = re.compile(r'\d+')
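
As a small illustration of why compiling can be useful, the sketch below reuses one compiled pattern for two different operations. The sample sentence and variable names are made up for this example.

```python
import re

# Compile once, then reuse the same pattern object for several operations.
digits = re.compile(r'\d+')

text = 'Order 66 shipped in 3 days'  # illustrative input
first = digits.search(text)          # first match only
all_numbers = digits.findall(text)   # every match, as strings

print(first.group())  # 66
print(all_numbers)    # ['66', '3']
```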

re.search

Scans the string and returns a match object for the first location where the pattern matches, or None if there is no match anywhere.

import re

result = re.search(r'\d+', 'Sample123String')
print(result.group())

Output:

123
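
Because re.search returns None when nothing matches, calling .group() on the result directly raises an AttributeError for non-matching input. A minimal sketch of a safer pattern, using a hypothetical helper named first_number:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r'\d+', text)
    # Check the result before calling .group(); it is None on no match.
    return match.group() if match else None

print(first_number('Sample123String'))  # 123
print(first_number('NoDigitsHere'))     # None
```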

re.match

Checks for a match only at the beginning of the string. Returns a match object if the pattern matches there, or None otherwise.

import re

result = re.match(r'\d+', '123Sample')
print(result.group())

Output:

123
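
The difference from re.search is easiest to see on a string where the digits are not at the start. The following sketch runs both functions on the same illustrative input:

```python
import re

text = 'Sample123'

at_start = re.match(r'\d+', text)   # None: the string does not begin with a digit
anywhere = re.search(r'\d+', text)  # finds '123' later in the string

print(at_start)          # None
print(anywhere.group())  # 123
```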

re.fullmatch

Succeeds only if the entire string matches the pattern. Returns a match object on success, or None otherwise.

import re

result = re.fullmatch(r'\d+', '123')
print(result.group())

Output:

123
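
One common use of re.fullmatch is input validation, since trailing characters cause it to fail where re.match would still succeed. A minimal sketch, with is_all_digits as a hypothetical helper name:

```python
import re

def is_all_digits(text):
    """Return True only if the whole string consists of digits."""
    return re.fullmatch(r'\d+', text) is not None

print(is_all_digits('123'))     # True
print(is_all_digits('123abc'))  # False

# re.match accepts the same input because it only anchors at the start.
print(re.match(r'\d+', '123abc') is not None)  # True
```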

re.split

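Splits the string at each occurrence of the pattern and returns the pieces as a list. A match at the very end of the string produces a trailing empty string, as in the output below.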

import re

result = re.split(r'\d+', 'Sample123String456Another789')
print(result)

Output:

['Sample', 'String', 'Another', '']
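
Two variations are worth knowing: the maxsplit argument limits the number of splits, and wrapping the pattern in a capturing group keeps the separators in the result. A short sketch on the same input:

```python
import re

text = 'Sample123String456Another789'

# Stop after the first split; the remainder is returned unsplit.
limited = re.split(r'\d+', text, maxsplit=1)
print(limited)  # ['Sample', 'String456Another789']

# A capturing group in the pattern includes the separators in the list.
with_separators = re.split(r'(\d+)', text)
print(with_separators)  # ['Sample', '123', 'String', '456', 'Another', '789', '']
```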

re.findall

Finds all non-overlapping matches of the pattern in the string. Returns a list of matches.

import re

result = re.findall(r'\d+', 'Sample123String456Another789')
print(result)

Output:

['123', '456', '789']
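
The shape of the list changes when the pattern contains capturing groups: with one group, re.findall returns the group's text rather than the whole match, and with several groups it returns tuples. A sketch using a made-up key=value string:

```python
import re

text = 'width=80 height=24'  # illustrative input

# One group: the list holds only the captured part of each match.
values = re.findall(r'\w+=(\d+)', text)
print(values)  # ['80', '24']

# Two groups: each match becomes a tuple of the captured parts.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)  # [('width', '80'), ('height', '24')]
```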

re.finditer

Finds all non-overlapping matches of the pattern in the string. Returns an iterator yielding match objects.

import re

result = re.finditer(r'\d+', 'Sample123String456Another789')
for match in result:
    print(match.group())

Output:

123
456
789
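
Since re.finditer yields match objects rather than plain strings, it also gives access to where each match was found. The sketch below collects the matched text together with its start and end indices:

```python
import re

text = 'Sample123String456Another789'

# Each match object knows its own position in the string.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]
print(positions)  # [('123', 6, 9), ('456', 15, 18), ('789', 25, 28)]
```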

re.sub

Replaces occurrences of the pattern with a replacement string.

import re

result = re.sub(r'\d+', '#', 'Sample123String456Another789')
print(result)

Output:

Sample#String#Another#
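
The replacement string can refer back to captured groups with \1, \2, and so on, and the count argument limits how many replacements are made. A brief sketch; the date string is an illustrative input:

```python
import re

# Reorder the parts of a date using backreferences to the three groups.
reordered = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', '2024-01-31')
print(reordered)  # 31/01/2024

# Replace only the first occurrence.
first_only = re.sub(r'\d+', '#', 'Sample123String456Another789', count=1)
print(first_only)  # Sample#String456Another789
```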

re.subn

Replaces occurrences of the pattern with a replacement string. Returns a tuple containing the new string and the number of replacements.

import re

result = re.subn(r'\d+', '#', 'Sample123String456Another789')
print(result)

Output:

('Sample#String#Another#', 3)
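
The returned tuple unpacks naturally into two variables, which is handy when you want to report how much was changed. A minimal sketch that collapses runs of whitespace in a made-up string:

```python
import re

# Collapse each run of whitespace into a single space and count the changes.
cleaned, count = re.subn(r'\s+', ' ', 'too    many   spaces')

print(cleaned)  # too many spaces
print(count)    # 2
```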

Regular Expression Syntax

Regular expressions use special characters to define patterns. Here are some commonly used special characters:

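  • .: Matches any character except a newline (with the re.DOTALL flag, it matches newlines too).
  • ^: Matches the start of the string (with re.MULTILINE, also the start of each line).
  • $: Matches the end of the string or just before a trailing newline (with re.MULTILINE, also the end of each line).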
  • *: Matches 0 or more repetitions of the preceding pattern.
  • +: Matches 1 or more repetitions of the preceding pattern.
  • ?: Matches 0 or 1 repetition of the preceding pattern.
  • {n}: Matches exactly n repetitions of the preceding pattern.
  • {n,}: Matches n or more repetitions of the preceding pattern.
  • {n,m}: Matches between n and m repetitions of the preceding pattern.
  • []: Matches any one of the characters inside the brackets.
  • |: Matches either the pattern before or the pattern after the |.
  • (): Creates a group for extracting or manipulating the matched text.
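
To make the list above more concrete, the following sketch tries several of these special characters with re.findall and re.search. The sample strings are invented for illustration only.

```python
import re

dot = re.findall(r'c.t', 'cat cot c\nt')        # . does not match the newline
star = re.findall(r'ab*', 'a ab abbb')          # zero or more b
plus = re.findall(r'ab+', 'a ab abbb')          # one or more b
optional = re.findall(r'colou?r', 'color colour')
counted = re.findall(r'\d{2,3}', '1 22 333 4444')
char_set = re.findall(r'[aeiou]', 'regex')
either = re.findall(r'cat|dog', 'cat dog bird')
starts = bool(re.search(r'^Hello', 'Hello world'))
ends = bool(re.search(r'world$', 'Hello world'))

print(dot)       # ['cat', 'cot']
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['color', 'colour']
print(counted)   # ['22', '333', '444']
print(char_set)  # ['e', 'e']
print(either)    # ['cat', 'dog']
print(starts, ends)  # True True
```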

Examples

Basic Usage

Find every run of consecutive digits in a string using a compiled pattern.

import re

pattern = re.compile(r'\d+')
matches = pattern.findall('Sample123String456Another789')
print(matches)

Output:

['123', '456', '789']

Using Groups and Capturing

Use groups to capture parts of the match.

import re

pattern = re.compile(r'(\d+)-(\d+)-(\d+)')
match = pattern.search('Phone number: 123-456-7890')
if match:
    print(match.groups())

Output:

('123', '456', '7890')
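
Groups can also be given names with the (?P<name>...) syntax, which tends to read better than numeric positions. In this sketch the group names area, prefix, and line are arbitrary labels chosen for the example:

```python
import re

pattern = re.compile(r'(?P<area>\d+)-(?P<prefix>\d+)-(?P<line>\d+)')
match = pattern.search('Phone number: 123-456-7890')

if match:
    print(match.group(0))       # 123-456-7890 (the whole match)
    print(match.group('area'))  # 123
    print(match.groupdict())    # {'area': '123', 'prefix': '456', 'line': '7890'}
```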

Using Flags

Use flags to modify the behavior of the pattern.

import re

pattern = re.compile(r'sample', re.IGNORECASE)
matches = pattern.findall('Sample123String456sample789')
print(matches)

Output:

['Sample', 'sample']
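
Other flags work the same way, and several flags can be combined with the | operator. The sketch below uses re.MULTILINE so that ^ matches at the start of every line; the three-line text is an illustrative input.

```python
import re

text = 'First line\nsecond line\nThird line'

# Without a flag, ^ matches only at the very start of the string.
default = re.findall(r'^\w+', text)
print(default)  # ['First']

# With re.MULTILINE, ^ also matches right after each newline.
per_line = re.findall(r'^\w+', text, re.MULTILINE)
print(per_line)  # ['First', 'second', 'Third']

# Flags are combined with the bitwise OR operator.
combined = re.findall(r'^t\w+', text, re.MULTILINE | re.IGNORECASE)
print(combined)  # ['Third']
```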

Advanced Substitution

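Pass a function as the replacement argument to re.sub. The function is called with each match object and returns the replacement string; here it doubles every number.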

import re

def replace(match):
    return str(int(match.group()) * 2)

result = re.sub(r'\d+', replace, 'Sample123String456Another789')
print(result)

Output:

Sample246String912Another1578
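
The same technique can drive a lookup-table replacement, where the function picks the replacement for each match from a dictionary. This is one possible sketch; the replacements mapping and the sentence are made up for the example.

```python
import re

# Hypothetical mapping of words to their replacements.
replacements = {'cat': 'dog', 'red': 'blue'}

# Build one alternation from the keys; re.escape guards against special characters.
pattern = re.compile(r'\b(' + '|'.join(map(re.escape, replacements)) + r')\b')

# The lambda looks up each matched word in the mapping.
result = pattern.sub(lambda m: replacements[m.group()], 'The red cat sat on a red mat')
print(result)  # The blue dog sat on a blue mat
```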

Conclusion

The re module provides Python's standard tools for working with regular expressions: searching and matching patterns, splitting strings, and performing substitutions. Knowing its core functions, the basic pattern syntax, and the difference between search, match, and fullmatch covers most everyday text-processing tasks.