In this guide, you'll explore Python's unicodedata module for working with Unicode characters, with examples of each of its key functions.

The unicodedata module in Python provides access to the Unicode Character Database, which contains detailed information about every character defined in the Unicode standard. This module can be used to retrieve properties of Unicode characters, normalize Unicode strings, and perform various other operations related to Unicode data.

Introduction
unicodedata Module Functions
- unicodedata.lookup
- unicodedata.name
- unicodedata.decimal
- unicodedata.digit
- unicodedata.numeric
- unicodedata.category
- unicodedata.bidirectional
- unicodedata.combining
- unicodedata.mirrored
- unicodedata.east_asian_width
- unicodedata.decomposition
- unicodedata.normalize
- unicodedata.unidata_version
More Examples
- Using lookup
- Using name
- Using decimal, digit, and numeric
- Using category
- Using bidirectional
- Using combining
- Using mirrored
- Using east_asian_width
- Using decomposition
- Using normalize
Real-World Use Case
Conclusion
References

Introduction

The unicodedata module is a part of Python's standard library that allows you to work with Unicode data. Unicode is a standard for representing text in different writing systems.

This module provides various functions to query properties of Unicode characters, such as their names, categories, numeric values, and more. It also includes functions to normalize Unicode strings, which is essential for consistent text processing.

unicodedata Module Functions

unicodedata.lookup

Looks up a character by its Unicode name and returns that character. If no character has the given name, raises a KeyError.

import unicodedata

char = unicodedata.lookup('GREEK SMALL LETTER ALPHA')
print(char)

Output:

α
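Because lookup() raises KeyError instead of returning None for an unknown name, lookups driven by user input usually need a try/except. A minimal sketch of that, plus a comparison with Python's \N{...} string escape, which resolves a character name when the literal is compiled:

```python
import unicodedata

# lookup() raises KeyError when no character has the given name
try:
    unicodedata.lookup('NOT A REAL CHARACTER NAME')
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print('KeyError:', exc)

alpha = unicodedata.lookup('GREEK SMALL LETTER ALPHA')

# The \N{...} escape in a string literal yields the same character
same = '\N{GREEK SMALL LETTER ALPHA}' == alpha
print(alpha, same)  # α True
```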

unicodedata.name

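Returns the Unicode name of a character as a string. If the character has no name, raises a ValueError, unless you pass a default as the second argument, in which case the default is returned.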

import unicodedata

name = unicodedata.name('α')
print(name)

Output:

GREEK SMALL LETTER ALPHA
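Not every character has a name: control characters such as the newline do not, so name() raises for them. A short sketch showing the error and the optional default argument (the '<unnamed>' label is just an arbitrary placeholder):

```python
import unicodedata

# Control characters such as '\n' have no name in the database,
# so name() raises ValueError for them
try:
    unicodedata.name('\n')
    has_name = True
except ValueError:
    has_name = False

# With a second argument, that value is returned instead of raising
label = unicodedata.name('\n', '<unnamed>')
print(has_name, label)  # False <unnamed>
```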

unicodedata.decimal

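Returns the decimal value assigned to a character as an int. Only decimal digits (such as '0' through '9' and the decimal digits of other scripts) have one; for any other character it raises a ValueError, unless a default is passed as the second argument.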

import unicodedata

decimal_value = unicodedata.decimal('5')
print(decimal_value)

Output:

5

unicodedata.digit

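Returns the digit value assigned to a character as an int. This covers decimal digits plus characters such as superscript and circled digits. If no such value is defined, raises a ValueError (unless a default is passed as the second argument).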

import unicodedata

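# CIRCLED DIGIT FIVE has a digit value; a Roman numeral such as 'Ⅴ'
# has only a numeric value, so digit('Ⅴ') would raise ValueError
digit_value = unicodedata.digit('⑤')
print(digit_value)

Output:

5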

unicodedata.numeric

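Returns the numeric value assigned to a character as a float. This is the broadest of the three functions: it also covers fractions and Roman numerals. If no such value is defined, raises a ValueError (unless a default is passed as the second argument).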

import unicodedata

numeric_value = unicodedata.numeric('⅕')
print(numeric_value)

Output:

0.2
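The three numeric functions form a hierarchy: every character with a decimal value also has a digit value, and every character with a digit value also has a numeric value, but not the other way round. A quick comparison, passing None as each function's default so characters without a value don't raise ValueError:

```python
import unicodedata

samples = [
    '7',       # DIGIT SEVEN
    '\u00b2',  # SUPERSCRIPT TWO
    '\u2164',  # ROMAN NUMERAL FIVE
    '\u00bd',  # VULGAR FRACTION ONE HALF
]

# Map each character to its (decimal, digit, numeric) values,
# using None where the property is not defined
table = {
    ch: (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )
    for ch in samples
}

for ch, (dec, dig, num) in table.items():
    print(f"{unicodedata.name(ch)}: decimal={dec}, digit={dig}, numeric={num}")
```

Only '7' has all three values; the superscript has a digit and numeric value, while the Roman numeral and the fraction have only a numeric value.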

unicodedata.category

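Returns the general category assigned to the character as a two-letter string, for example 'Ll' (letter, lowercase), 'Lu' (letter, uppercase), or 'Nd' (number, decimal digit).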

import unicodedata

category = unicodedata.category('α')
print(category)

Output:

Ll
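Because general categories are grouped by their first letter (L for letters, N for numbers, P for punctuation, and so on), category() is handy for filtering text. A minimal sketch; strip_punctuation is a hypothetical helper, not part of the module:

```python
import unicodedata

def strip_punctuation(text):
    # Hypothetical helper (not part of unicodedata): drop every character
    # whose general category starts with 'P' (Pc, Pd, Ps, Pe, Pi, Pf, Po)
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))

cleaned = strip_punctuation('Hello, world! ¿Qué?')
print(cleaned)  # Hello world Qué
```

Unlike a hard-coded set such as string.punctuation, this also removes non-ASCII punctuation like '¿'.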

unicodedata.bidirectional

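Returns the bidirectional class assigned to the character as a string, such as 'L' (left-to-right), 'R' (right-to-left), or 'EN' (European number). If no class is defined, returns an empty string.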

import unicodedata

bidi_class = unicodedata.bidirectional('α')
print(bidi_class)

Output:

L

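One common use of the bidirectional class is detecting right-to-left text. A minimal sketch, assuming that checking for the strong right-to-left classes 'R' (e.g. Hebrew) and 'AL' (Arabic letters) is enough for your purpose; contains_rtl is a hypothetical helper:

```python
import unicodedata

def contains_rtl(text):
    # Hypothetical helper: True if any character has a strong
    # right-to-left class ('R' = right-to-left, 'AL' = Arabic letter)
    return any(unicodedata.bidirectional(ch) in ('R', 'AL') for ch in text)

print(unicodedata.bidirectional('\u05d0'))  # HEBREW LETTER ALEF -> R
print(unicodedata.bidirectional('\u0639'))  # ARABIC LETTER AIN -> AL
print(contains_rtl('hello'), contains_rtl('hello \u05e9\u05dc\u05d5\u05dd'))  # False True
```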
unicodedata.combining

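Returns the canonical combining class assigned to the character as an int. Returns 0 if no combining class is defined, which is the case for base characters such as ordinary letters.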

import unicodedata

combining_class = unicodedata.combining('\u0301')  # COMBINING ACUTE ACCENT
print(combining_class)

Output:

230

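A classic application of combining() is removing accents: decompose the text with NFD so accented letters become a base letter plus combining marks, then drop every character with a non-zero combining class. A minimal sketch; strip_accents is a hypothetical helper:

```python
import unicodedata

def strip_accents(text):
    # Hypothetical helper: NFD splits precomposed letters such as 'é'
    # into a base letter plus combining marks; the marks have a non-zero
    # canonical combining class, so they are filtered out
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents('café naïve façade'))  # cafe naive facade
```

This is a heuristic: letters that have no canonical decomposition, such as 'ø' or 'ł', pass through unchanged.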
unicodedata.mirrored

Returns 1 if the character has the "mirrored" property (its glyph is mirrored when it appears in right-to-left text, as with parentheses and brackets), 0 otherwise.

import unicodedata

is_mirrored = unicodedata.mirrored('∑')
print(is_mirrored)

Output:

1

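To see which characters in a string a renderer would flip in right-to-left context, you can filter on mirrored(). A quick sketch:

```python
import unicodedata

expression = '(a+b) < [c]'

# Keep only characters with the Bidi_Mirrored property
mirrored_chars = [ch for ch in expression if unicodedata.mirrored(ch)]
print(mirrored_chars)  # ['(', ')', '<', '[', ']']
```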
unicodedata.east_asian_width

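Returns the East Asian width assigned to the character as a string: 'W' (wide), 'F' (fullwidth), 'Na' (narrow), 'H' (halfwidth), 'A' (ambiguous), or 'N' (neutral).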

import unicodedata

east_asian_width = unicodedata.east_asian_width('か')
print(east_asian_width)

Output:

W

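The width classes are often used to estimate how many terminal columns a string occupies. A rough sketch, assuming 'W' and 'F' characters take two columns and everything else takes one; display_width is a hypothetical helper, and it ignores ambiguous ('A') characters and zero-width combining marks:

```python
import unicodedata

def display_width(text):
    # Hypothetical heuristic: wide ('W') and fullwidth ('F') characters
    # count as two columns, every other character counts as one
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)

print(display_width('abc'))      # 3
print(display_width('かな'))     # 4
print(display_width('abcかな'))  # 7
```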
unicodedata.decomposition

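Returns the character's decomposition mapping as a string of hexadecimal code points. Compatibility decompositions are prefixed with a formatting tag such as <fraction> or <compat>. Returns an empty string if the character has no decomposition.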

import unicodedata

decomposition = unicodedata.decomposition('½')
print(decomposition)

Output:

<fraction> 0031 2044 0032
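The decomposition string is not itself text, so turning it back into characters takes a little parsing. A minimal sketch; parse_decomposition is a hypothetical helper that splits off the optional tag and converts the hexadecimal code points:

```python
import unicodedata

def parse_decomposition(ch):
    # Hypothetical helper: returns (tag or None, decomposed text)
    fields = unicodedata.decomposition(ch).split()
    # Compatibility mappings start with a tag such as '<fraction>'
    tag = fields.pop(0) if fields and fields[0].startswith('<') else None
    # The remaining fields are hexadecimal code points
    return tag, ''.join(chr(int(cp, 16)) for cp in fields)

print(parse_decomposition('\u00bd'))  # ('<fraction>', '1⁄2')
print(parse_decomposition('\u00e9'))  # no tag; 'e' + COMBINING ACUTE ACCENT
print(parse_decomposition('a'))       # (None, '')
```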

unicodedata.normalize

Returns the normal form of a Unicode string. The first argument selects the form: 'NFC', 'NFD', 'NFKC', or 'NFKD'.

import unicodedata

normalized_str = unicodedata.normalize('NFC', 'é')
print(normalized_str)

Output:

é
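The difference between the forms is easiest to see by comparing a precomposed character with its decomposed equivalent, and by including a compatibility character such as the 'ﬁ' ligature (U+FB01), which only the NFK* forms fold into plain letters:

```python
import unicodedata

composed = '\u00e9'     # LATIN SMALL LETTER E WITH ACUTE
decomposed = 'e\u0301'  # 'e' + COMBINING ACUTE ACCENT

print(composed == decomposed)                                # False
print(unicodedata.normalize('NFC', decomposed) == composed)  # True
print(unicodedata.normalize('NFD', composed) == decomposed)  # True

# Length of 'ﬁé' under each form: NFD splits the accent off,
# NFKC/NFKD additionally expand the ligature into 'f' + 'i'
lengths = {form: len(unicodedata.normalize(form, '\ufb01' + composed))
           for form in ('NFC', 'NFD', 'NFKC', 'NFKD')}
print(lengths)  # {'NFC': 2, 'NFD': 3, 'NFKC': 3, 'NFKD': 4}
```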

unicodedata.unidata_version

A string constant (not a function) giving the version of the Unicode Character Database that the module uses.

import unicodedata

version = unicodedata.unidata_version
print(version)

Output (the value depends on your Python version; 14.0.0 is the version shipped with Python 3.11):

14.0.0

More Examples

Using lookup

import unicodedata

char = unicodedata.lookup('LATIN SMALL LETTER A')
print(char)

Output:

a

Using name

import unicodedata

name = unicodedata.name('a')
print(name)

Output:

LATIN SMALL LETTER A

Using decimal, digit, and numeric

import unicodedata

decimal_value = unicodedata.decimal('9')
digit_value = unicodedata.digit('④')  # CIRCLED DIGIT FOUR ('Ⅳ' has no digit value and would raise ValueError)
numeric_value = unicodedata.numeric('½')

print(f"Decimal: {decimal_value}, Digit: {digit_value}, Numeric: {numeric_value}")

Output:

Decimal: 9, Digit: 4, Numeric: 0.5

Using category

import unicodedata

category = unicodedata.category('A')
print(category)

Output:

Lu

Using bidirectional

import unicodedata

bidi_class = unicodedata.bidirectional('A')
print(bidi_class)

Output:

L

Using combining

import unicodedata

combining_class = unicodedata.combining('a')  # base letters have class 0
print(combining_class)

Output:

0

Using mirrored

import unicodedata

is_mirrored = unicodedata.mirrored('A')  # ordinary letters are not mirrored
print(is_mirrored)

Output:

0

Using east_asian_width

import unicodedata

east_asian_width = unicodedata.east_asian_width('A')  # Latin letters are narrow
print(east_asian_width)

Output:

Na

Using decomposition

import unicodedata

decomposition = unicodedata.decomposition('\u00e9')  # LATIN SMALL LETTER E WITH ACUTE
print(decomposition)

Output:

0065 0301

Using normalize

import unicodedata

normalized_str = unicodedata.normalize('NFC', 'e\u0301')
print(normalized_str)

Output:

é

Real-World Use Case

Normalizing User Input

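Normalize user input to NFC so that visually identical strings, such as a precomposed 'é' and an 'e' followed by a combining accent, are stored and compared consistently.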

import unicodedata

def normalize_input(user_input):
    return unicodedata.normalize('NFC', user_input)

user_input = "e\u0301"  # 'e' followed by combining acute accent
normalized = normalize_input(user_input)
print(normalized)

Output:

é
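Normalization alone does not make comparisons case-insensitive. A minimal sketch that follows the Unicode Standard's "canonical caseless match" definition, NFD(casefold(NFD(text))), using str.casefold(); caseless_key and caseless_equal are hypothetical helper names:

```python
import unicodedata

def caseless_key(text):
    # Hypothetical helper: NFD, then full case folding, then NFD again
    return unicodedata.normalize('NFD', unicodedata.normalize('NFD', text).casefold())

def caseless_equal(a, b):
    # Hypothetical helper: compare two strings ignoring case and
    # differences in canonical composition
    return caseless_key(a) == caseless_key(b)

print(caseless_equal('Cafe\u0301', 'CAF\u00c9'))  # True
print(caseless_equal('Straße', 'STRASSE'))        # True
print(caseless_equal('cafe', 'café'))             # False
```

Using casefold() rather than lower() matters for characters like 'ß', which folds to 'ss'.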

Conclusion

The unicodedata module in Python provides comprehensive access to the Unicode Character Database, allowing for detailed querying and manipulation of Unicode characters. This module is essential for ensuring consistency and correctness in text processing, especially when dealing with internationalization and multilingual text.

References

Python unicodedata module documentation
