The html module in Python provides tools for handling HTML data, including escaping and unescaping HTML characters and parsing HTML documents. It is useful for web scraping, web development, and any application that needs to process HTML content.
Table of Contents
- Introduction
- Key Functions and Classes
  - html.escape
  - html.unescape
  - html.parser.HTMLParser
- Examples
  - Escaping HTML Characters
  - Unescaping HTML Characters
  - Basic HTML Parsing
- Real-World Use Case
- Conclusion
Introduction
The html module provides two functions, html.escape and html.unescape, for escaping and unescaping HTML special characters. Its html.parser submodule provides HTMLParser, a base class for parsing HTML documents. Together they cover common web scraping, web development, and data processing tasks involving HTML content.
Key Functions and Classes
html.escape
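Converts the characters &, <, and > in a string to HTML-safe sequences. By default (quote=True) it also converts double and single quotes, so the result can be placed inside an attribute value.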
import html
escaped_string = html.escape('<div class="content">Hello, World!</div>')
print(escaped_string)
Output:
&lt;div class=&quot;content&quot;&gt;Hello, World!&lt;/div&gt;
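The quote parameter of html.escape controls whether quote characters are escaped. The following is a minimal sketch; the sample string and the variable names are made up for illustration.

```python
import html

text = 'She said "hi" & left <early>'

# Default (quote=True): &, <, > and both quote characters are escaped.
attr_safe = html.escape(text)

# quote=False: only &, <, and > are escaped; quotes are left alone.
body_only = html.escape(text, quote=False)

print(attr_safe)   # She said &quot;hi&quot; &amp; left &lt;early&gt;
print(body_only)   # She said "hi" &amp; left &lt;early&gt;
```

Keeping the default is the safer choice when the result may end up inside an attribute value.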
html.unescape
Converts named and numeric character references in a string (such as &gt;, &#62;, and &#x3e;) to the corresponding Unicode characters.
import html
unescaped_string = html.unescape('&lt;div class=&quot;content&quot;&gt;Hello, World!&lt;/div&gt;')
print(unescaped_string)
Output:
<div class="content">Hello, World!</div>
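html.unescape handles named, decimal, and hexadecimal character references alike. The short sketch below shows all three forms; the samples dictionary is only an illustration.

```python
import html

# Three ways of writing character references in HTML text.
samples = {
    "named": "&lt;p&gt; &amp; &copy;",
    "decimal": "&#60;p&#62;",
    "hex": "&#x3c;p&#x3e;",
}

# Decode each sample to plain Unicode text.
decoded = {kind: html.unescape(value) for kind, value in samples.items()}

for kind, value in decoded.items():
    print(kind, "->", value)
```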
html.parser.HTMLParser
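A base class for parsing HTML documents. Subclass it and override handler methods such as handle_starttag, handle_endtag, and handle_data; the parser calls them as it encounters start tags, end tags, and text.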
from html.parser import HTMLParser
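class MyHTMLParser(HTMLParser):
    # Called for each opening tag; attrs is a list of (name, value) tuples.
    def handle_starttag(self, tag, attrs):
        print("Start tag:", tag)
        for attr in attrs:
            print(" attr:", attr)

    # Called for each closing tag.
    def handle_endtag(self, tag):
        print("End tag :", tag)

    # Called for text between tags.
    def handle_data(self, data):
        print("Data :", data)
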
parser = MyHTMLParser()
parser.feed('<div class="content">Hello, World!</div>')
Output:
Start tag: div
attr: ('class', 'content')
Data : Hello, World!
End tag : div
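feed() does not need the whole document at once: it can be called repeatedly with partial input, and incomplete markup is buffered until more data arrives or close() is called. The sketch below assumes the document arrives in three arbitrary chunks; EventCollector and its events list are hypothetical names used only for this illustration. Because a run of text may be delivered to handle_data in more than one piece, the example joins the pieces rather than relying on a single call.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Records parser callbacks instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; a dict is easier to query.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = EventCollector()
# The start tag is deliberately split in the middle of an attribute name.
for chunk in ['<div cla', 'ss="content">Hello, ', 'World!</div>']:
    parser.feed(chunk)
parser.close()  # flush anything still buffered

text = "".join(event[1] for event in parser.events if event[0] == "data")
print(parser.events[0])
print(text)
```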
Examples
Escaping HTML Characters
import html
html_string = '<div class="content">Hello, World!</div>'
escaped_string = html.escape(html_string)
print('Escaped:', escaped_string)
Output:
Escaped: &lt;div class=&quot;content&quot;&gt;Hello, World!&lt;/div&gt;
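A common reason to escape text is to insert untrusted input into a page without letting it be interpreted as markup. The following is a minimal sketch; render_comment and the paragraph template are hypothetical, not part of the html module.

```python
import html


def render_comment(user_text):
    """Wrap user-supplied text in a paragraph, escaping any markup in it."""
    return "<p>{}</p>".format(html.escape(user_text))


rendered = render_comment('<script>alert("x")</script>')
print(rendered)
```

The script tag reaches the browser as visible text rather than as executable markup.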
Unescaping HTML Characters
import html
escaped_string = '&lt;div class=&quot;content&quot;&gt;Hello, World!&lt;/div&gt;'
unescaped_string = html.unescape(escaped_string)
print('Unescaped:', unescaped_string)
Output:
Unescaped: <div class="content">Hello, World!</div>
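Escaping followed by unescaping returns the original string, but the reverse is not guaranteed, because html.escape only rewrites &, <, > and the quote characters. A short sketch with made-up sample strings:

```python
import html

original = 'Tom & "Jerry" <3 it\'s'

# escape -> unescape restores the original text.
escaped = html.escape(original)
round_trip = html.unescape(escaped)
print(round_trip == original)

# unescape -> escape does not restore "&copy;": the copyright sign is
# not one of the characters that html.escape rewrites.
re_escaped = html.escape(html.unescape("&copy; 2024"))
print(re_escaped)
```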
Basic HTML Parsing
from html.parser import HTMLParser
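class MyHTMLParser(HTMLParser):
    # Called for each opening tag; attrs is a list of (name, value) tuples.
    def handle_starttag(self, tag, attrs):
        print("Start tag:", tag)
        for attr in attrs:
            print(" attr:", attr)

    # Called for each closing tag.
    def handle_endtag(self, tag):
        print("End tag :", tag)

    # Called for text between tags.
    def handle_data(self, data):
        print("Data :", data)
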
parser = MyHTMLParser()
parser.feed('<div class="content">Hello, World!</div>')
Output:
Start tag: div
attr: ('class', 'content')
Data : Hello, World!
End tag : div
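The same handler methods can be used to pull the visible text out of a page. The sketch below is one possible approach, not a complete HTML-to-text converter: TextExtractor is a hypothetical class that collects text and skips the contents of script and style elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, ignoring script and style elements."""

    SKIPPED = ("script", "style")

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if stripped and not self._skip_depth:
            self.parts.append(stripped)

    def text(self):
        return " ".join(self.parts)


extractor = TextExtractor()
extractor.feed(
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Fish &amp; chips</p>"
    "<script>var x = 1;</script></body></html>"
)
extractor.close()
print(extractor.text())  # Title Fish & chips
```

Character references such as &amp; arrive in handle_data already decoded, because HTMLParser converts them by default (convert_charrefs=True).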
Real-World Use Case
Extracting Links from HTML
from html.parser import HTMLParser
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []  # href values collected while parsing

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we want.
        if tag == 'a':
            for attr in attrs:
                if attr[0] == 'href':
                    self.links.append(attr[1])

html_content = '''
<html>
<body>
<a href="http://example.com">Example</a>
<a href="http://example.org">Example Org</a>
</body>
</html>
'''
parser = LinkExtractor()
parser.feed(html_content)
print('Extracted links:', parser.links)
Output:
Extracted links: ['http://example.com', 'http://example.org']
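Real pages often contain relative links. One way to handle them, sketched below, is to resolve each href against the page URL with urllib.parse.urljoin. AbsoluteLinkExtractor, the base URL, and the sample markup are assumptions made for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AbsoluteLinkExtractor(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors with a missing or empty href
            self.links.append(urljoin(self.base_url, href))


link_parser = AbsoluteLinkExtractor("http://example.com/docs/")
link_parser.feed(
    '<a href="intro.html">Intro</a>'
    '<a href="/about">About</a>'
    '<a href="http://example.org">Org</a>'
    '<a name="top">No href</a>'
)
link_parser.close()
print(link_parser.links)
```

Absolute URLs pass through urljoin unchanged, so the same code works for pages that mix both kinds of link.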
Conclusion
The html module in Python provides essential tools for handling HTML data. Whether you need to escape or unescape HTML characters or parse HTML documents, this module has the functionality you need for web scraping, web development, and data processing tasks involving HTML content.