Python stringprep Module

In this guide, you'll explore Python's stringprep module, used for preparing strings for network protocols. Learn its functions with examples.

The stringprep module in Python provides lookups into the character tables defined in RFC 3454 (stringprep). Protocols build stringprep profiles on top of these tables, such as Nameprep for internationalized domain names and SASLprep for user names and passwords. The profiles prepare strings for comparison and canonicalization, so that equivalent strings are handled consistently.

Table of Contents

  1. Introduction
  2. stringprep Module Functions and Constants
    • stringprep.in_table_a1
    • stringprep.in_table_b1
    • stringprep.in_table_c11_c12
    • stringprep.in_table_c21_c22
    • stringprep.in_table_c3
    • stringprep.in_table_c4
    • stringprep.in_table_c5
    • stringprep.in_table_c6
    • stringprep.in_table_c7
    • stringprep.in_table_c8
    • stringprep.in_table_c9
    • stringprep.map_table_b2
  3. Examples
    • Checking Characters Against Stringprep Tables
    • Mapping Characters Using Stringprep Tables
  4. Real-World Use Case
  5. Conclusion
  6. References

Introduction

The stringprep module provides functions that check whether a Unicode character appears in one of the RFC 3454 tables, or that map a character according to those tables. As the RFC requires, the tables are based on Unicode 3.2. Each function takes a single character (a string of length 1). The module exposes only the tables and does not implement a complete profile. An application combines the lookups itself into the mapping, normalization, prohibition, and bidirectional-check steps that its profile specifies.
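Python's own `idna` codec is one consumer of the RFC 3454 tables. Its documented `encodings.idna.nameprep()` function applies the Nameprep profile by combining `stringprep` lookups. The quick check below assumes a CPython build that ships the `idna` codec:

```python
import encodings.idna

# Nameprep case-folds "ß" to "ss" (Table B.2) and lowercases ASCII letters
print(encodings.idna.nameprep("Straße"))   # strasse

# The idna codec runs nameprep on each label before converting it to ASCII
print("Straße.de".encode("idna"))          # b'strasse.de'
```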

stringprep Module Functions and Constants

stringprep.in_table_a1

Checks if a character is in Table A.1 (Unassigned Code Points in Unicode 3.2).

import stringprep

is_in_table_a1 = stringprep.in_table_a1('\u0378')
print(is_in_table_a1)

Output:

True
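RFC 3454 fixes the tables at Unicode 3.2. As a result, `in_table_a1()` can report a character as unassigned even when the current Unicode database knows it. U+20B9 INDIAN RUPEE SIGN, added in Unicode 6.0, illustrates this. The sketch compares it against the `unicodedata.ucd_3_2_0` database object:

```python
import stringprep
import unicodedata

rupee = '\u20B9'  # INDIAN RUPEE SIGN, added after Unicode 3.2

print(stringprep.in_table_a1(rupee))          # True: unassigned in Unicode 3.2
print(unicodedata.category(rupee))            # Sc in the current database
print(unicodedata.ucd_3_2_0.category(rupee))  # Cn in the 3.2 database
```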

stringprep.in_table_b1

Checks if a character is in Table B.1 (Commonly Mapped to Nothing).

import stringprep

is_in_table_b1 = stringprep.in_table_b1('\u00AD')
print(is_in_table_b1)

Output:

True
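Table B.1 lists invisible characters that a profile deletes during its mapping step, such as the soft hyphen (U+00AD) and the zero width space (U+200B). The sketch below filters a whole string; the helper name `drop_table_b1` is hypothetical:

```python
import stringprep

def drop_table_b1(text):
    # Hypothetical helper: remove every character Table B.1 maps to nothing
    return ''.join(ch for ch in text if not stringprep.in_table_b1(ch))

print(drop_table_b1('ex\u00ADam\u200Bple'))  # example
```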

stringprep.in_table_c11_c12

Checks if a character is in Table C.1.1 or C.1.2 (ASCII Space Characters and Non-ASCII Space Characters).

import stringprep

is_in_table_c11_c12 = stringprep.in_table_c11_c12('\u0020')
print(is_in_table_c11_c12)

Output:

True
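`in_table_c11_c12()` is the union of two separate lookups. `in_table_c11()` matches only the ASCII space, and `in_table_c12()` matches non-ASCII spaces such as U+00A0 NO-BREAK SPACE. Profiles often treat the two differently, so it helps to see them apart:

```python
import stringprep

print(stringprep.in_table_c11(' '))       # True: the ASCII space
print(stringprep.in_table_c12(' '))       # False: C.1.2 excludes the ASCII space
print(stringprep.in_table_c12('\u00A0'))  # True: NO-BREAK SPACE
print(stringprep.in_table_c11('\u00A0'))  # False
```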

stringprep.in_table_c21_c22

Checks if a character is in Table C.2.1 or C.2.2 (ASCII Control Characters and Non-ASCII Control Characters).

import stringprep

is_in_table_c21_c22 = stringprep.in_table_c21_c22('\u0001')
print(is_in_table_c21_c22)

Output:

True
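Likewise, `in_table_c21_c22()` combines `in_table_c21()` (ASCII control characters) and `in_table_c22()` (non-ASCII control characters, such as U+0085 NEXT LINE):

```python
import stringprep

print(stringprep.in_table_c21('\u0001'))  # True: ASCII control character
print(stringprep.in_table_c22('\u0001'))  # False
print(stringprep.in_table_c22('\u0085'))  # True: NEXT LINE, a non-ASCII control
print(stringprep.in_table_c21('\u0085'))  # False
```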

stringprep.in_table_c3

Checks if a character is in Table C.3 (Private Use).

import stringprep

is_in_table_c3 = stringprep.in_table_c3('\uE000')
print(is_in_table_c3)

Output:

True
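Table C.3 is not limited to the Basic Multilingual Plane. The supplementary private use planes 15 and 16 also count:

```python
import stringprep

print(stringprep.in_table_c3('\U000F0000'))  # True: plane 15 private use
print(stringprep.in_table_c3('\U0010FFFD'))  # True: plane 16 private use
print(stringprep.in_table_c3('A'))           # False
```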

stringprep.in_table_c4

Checks if a character is in Table C.4 (Non-Character Code Points).

import stringprep

is_in_table_c4 = stringprep.in_table_c4('\uFDD0')
print(is_in_table_c4)

Output:

True
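Besides the block U+FDD0 to U+FDEF, Table C.4 contains the last two code points of every plane (U+xxFFFE and U+xxFFFF):

```python
import stringprep

print(stringprep.in_table_c4('\uFDEF'))      # True: end of the U+FDD0..U+FDEF block
print(stringprep.in_table_c4('\uFDF0'))      # False: just past that block
print(stringprep.in_table_c4('\U0001FFFF'))  # True: last code point of plane 1
```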

stringprep.in_table_c5

Checks if a character is in Table C.5 (Surrogate Codes).

import stringprep

is_in_table_c5 = stringprep.in_table_c5('\uD800')
print(is_in_table_c5)

Output:

True
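A Python `str` holds a lone surrogate only if something put it there. The most common source is the `surrogateescape` error handler, which passes undecodable bytes through as U+DC80 to U+DCFF. Table C.5 catches these characters:

```python
import stringprep

# Byte 0xFF is not valid UTF-8; surrogateescape turns it into U+DCFF
name = b'user\xff'.decode('utf-8', 'surrogateescape')

print(any(stringprep.in_table_c5(ch) for ch in name))  # True
```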

stringprep.in_table_c6

Checks if a character is in Table C.6 (Inappropriate for Plain Text).

import stringprep

is_in_table_c6 = stringprep.in_table_c6('\uFFF9')
print(is_in_table_c6)

Output:

True
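Table C.6 covers U+FFF9 to U+FFFD, which includes U+FFFD REPLACEMENT CHARACTER. Decoders insert U+FFFD when they meet invalid input, so a string decoded with `errors='replace'` may fail a profile's prohibition step:

```python
import stringprep

text = b'user\xff'.decode('utf-8', 'replace')  # 'user\ufffd'

print(stringprep.in_table_c6('\uFFFD'))                # True
print(any(stringprep.in_table_c6(ch) for ch in text))  # True
```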

stringprep.in_table_c7

Checks if a character is in Table C.7 (Inappropriate for Canonical Representation).

import stringprep

is_in_table_c7 = stringprep.in_table_c7('\u2FF0')
print(is_in_table_c7)

Output:

True

stringprep.in_table_c8

Checks if a character is in Table C.8 (Change Display Properties or Deprecated).

import stringprep

is_in_table_c8 = stringprep.in_table_c8('\u0340')
print(is_in_table_c8)

Output:

True
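Table C.8 includes the bidirectional embedding and override controls (U+202A to U+202E) as well as the marks U+200E and U+200F. Scanning for them flags strings whose displayed order differs from their stored order. The sketch below assumes such strings should simply be rejected; `has_display_controls` is a hypothetical helper name:

```python
import stringprep

def has_display_controls(text):
    # Hypothetical helper: True if any character is in Table C.8
    return any(stringprep.in_table_c8(ch) for ch in text)

print(has_display_controls('admin'))           # False
print(has_display_controls('adm\u202Enimda'))  # True: RIGHT-TO-LEFT OVERRIDE
```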

stringprep.in_table_c9

Checks if a character is in Table C.9 (Tagging Characters).

import stringprep

# U+E0001 LANGUAGE TAG is in Table C.9
print(stringprep.in_table_c9('\U000E0001'))

# U+200E LEFT-TO-RIGHT MARK is not: it belongs to Table C.8
print(stringprep.in_table_c9('\u200E'))

Output:

True
False
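Profiles typically prohibit several C tables at once. The sketch below reports every C table that contains a given character, which can help when building or debugging a profile. The `C_TABLES` mapping and the `c_tables_for` helper are hypothetical names:

```python
import stringprep

# Hypothetical mapping from RFC 3454 table names to lookup functions
C_TABLES = {
    'C.1.1': stringprep.in_table_c11,
    'C.1.2': stringprep.in_table_c12,
    'C.2.1': stringprep.in_table_c21,
    'C.2.2': stringprep.in_table_c22,
    'C.3': stringprep.in_table_c3,
    'C.4': stringprep.in_table_c4,
    'C.5': stringprep.in_table_c5,
    'C.6': stringprep.in_table_c6,
    'C.7': stringprep.in_table_c7,
    'C.8': stringprep.in_table_c8,
    'C.9': stringprep.in_table_c9,
}

def c_tables_for(ch):
    """Return the names of all C tables that contain ch."""
    return [name for name, check in C_TABLES.items() if check(ch)]

print(c_tables_for('\u200E'))      # ['C.8']
print(c_tables_for('\U000E0001'))  # ['C.9']
print(c_tables_for('a'))           # []
```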

stringprep.map_table_b2

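Returns the mapping of a character according to Table B.2 (case folding for use with NFKC normalization). For most letters this is the lowercase form, but the result can be more than one character.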

import stringprep

mapped_char = stringprep.map_table_b2('\u0041')  # 'A'
print(mapped_char)

Output:

a

Examples

Checking Characters Against Stringprep Tables

import stringprep

char = '\u00AD'  # Soft hyphen
if stringprep.in_table_b1(char):
    print(f"U+{ord(char):04X} is commonly mapped to nothing")

Output:

U+00AD is commonly mapped to nothing

Mapping Characters Using Stringprep Tables

import stringprep

char = '\u0041'  # 'A'
mapped_char = stringprep.map_table_b2(char)
print(f"Original: {char}, Mapped: {mapped_char}")

Output:

Original: A, Mapped: a

Real-World Use Case

Preparing Usernames for Comparison

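Map usernames to a canonical form before comparing them in a chat application, so that differences in case or invisible characters do not produce distinct names. This simplified version applies only the mapping step of a stringprep profile: it drops Table B.1 characters, turns the non-ASCII spaces of Table C.1.2 into an ASCII space, and case-folds with Table B.2.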

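import stringprep

def prepare_username(username):
    prepared = []
    for char in username:
        # Table B.1: drop characters that are mapped to nothing
        if stringprep.in_table_b1(char):
            continue
        # Table C.1.2: replace non-ASCII spaces with an ASCII space
        if stringprep.in_table_c12(char):
            char = ' '
        # Table B.2: case-fold (the result may be more than one character)
        char = stringprep.map_table_b2(char)
        prepared.append(char)
    return ''.join(prepared).strip()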

username1 = "User Name"
username2 = "user name"
prepared_username1 = prepare_username(username1)
prepared_username2 = prepare_username(username2)

print(f"Prepared Username1: {prepared_username1}")
print(f"Prepared Username2: {prepared_username2}")
print(f"Usernames are equal: {prepared_username1 == prepared_username2}")

Output:

Prepared Username1: user name
Prepared Username2: user name
Usernames are equal: True
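`prepare_username()` covers only the mapping step. RFC 3454 defines four steps: map, normalize with NFKC, prohibit, and check bidirectional text. The sketch below strings them together. The function name `prepare_string` and the selection of prohibited tables are assumptions, since each real profile (Nameprep, SASLprep, and so on) specifies its own:

```python
import stringprep
import unicodedata

# Hypothetical prohibition list; real profiles choose their own tables
PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c21_c22,
    stringprep.in_table_c3, stringprep.in_table_c4, stringprep.in_table_c5,
    stringprep.in_table_c6, stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)

def prepare_string(text):
    # 1. Map: drop Table B.1 characters, case-fold with Table B.2
    mapped = ''.join(stringprep.map_table_b2(ch) for ch in text
                     if not stringprep.in_table_b1(ch))
    # 2. Normalize: NFKC against the Unicode 3.2 database
    result = unicodedata.ucd_3_2_0.normalize('NFKC', mapped)
    # 3. Prohibit: reject any character from the selected C tables
    for ch in result:
        if any(check(ch) for check in PROHIBITED):
            raise ValueError(f'prohibited character U+{ord(ch):04X}')
    # 4. Bidi: text containing RandALCat characters (Table D.1) must not
    #    contain LCat characters (Table D.2) and must start and end with RandALCat
    if any(stringprep.in_table_d1(ch) for ch in result):
        if any(stringprep.in_table_d2(ch) for ch in result):
            raise ValueError('mixes right-to-left and left-to-right text')
        if not (stringprep.in_table_d1(result[0])
                and stringprep.in_table_d1(result[-1])):
            raise ValueError('right-to-left text must start and end with RandALCat')
    return result

print(prepare_string('User\u00ADName'))  # username
print(prepare_string('\uFB01le'))        # file (the "fi" ligature is folded)
```

Strings such as `'abc\uE000'` (private use) or `'\u05D0a'` (Hebrew mixed with Latin) raise `ValueError` in this sketch.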

Conclusion

The stringprep module gives Python code access to the character tables defined in RFC 3454. It does not implement a complete profile by itself, but its lookup and mapping functions are the building blocks that profiles such as Nameprep use to handle Unicode strings consistently and securely. By combining these functions according to a profile's steps, your application can compare, normalize, and canonicalize Unicode data reliably.

References
