In this guide, you'll explore the difflib module in Python, which helps compare and highlight differences between sequences. We’ll cover its key functions, classes, use cases, and examples to help you use it efficiently.
The difflib module in Python provides classes and functions for comparing sequences, such as strings or lists, and generating differences (diffs) between them. This module is useful for tasks like comparing text files, computing deltas, and producing human-readable differences.
Table of Contents
- Introduction
- SequenceMatcher Class
  - Methods: __init__, set_seq1, set_seq2, set_seqs, find_longest_match, get_matching_blocks, get_opcodes, ratio, quick_ratio, real_quick_ratio
- Differ Class
  - Methods: compare
- HtmlDiff Class
  - Methods: make_file, make_table
- Utility Functions: context_diff, unified_diff, ndiff, restore, IS_CHARACTER_JUNK, IS_LINE_JUNK
- Examples
  - Using SequenceMatcher
  - Using Differ
  - Using HtmlDiff
  - Using Utility Functions
- Real-World Use Case
- Conclusion
Introduction
The difflib module provides classes and functions to compare sequences, find differences, and produce human-readable diff output. This is particularly useful for comparing text files, generating patches, or building the kind of diff view found in version control tools.
SequenceMatcher Class
The SequenceMatcher class compares pairs of sequences of any type and generates information about how they differ.
Methods
__init__
Initializes a SequenceMatcher object.
import difflib
s = difflib.SequenceMatcher(isjunk=None, a='', b='')
- isjunk: A function that takes a sequence element and returns True if it is junk.
- a: The first sequence to compare.
- b: The second sequence to compare.
set_seq1
Sets the first sequence to be compared.
s.set_seq1('new_sequence')
set_seq2
Sets the second sequence to be compared.
s.set_seq2('new_sequence')
set_seqs
Sets both sequences to be compared.
s.set_seqs('sequence1', 'sequence2')
find_longest_match
Finds the longest contiguous matching subsequence.
match = s.find_longest_match(0, len(s.a), 0, len(s.b))
print(match)  # Match named tuple with fields (a, b, size)
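To see how the isjunk argument changes what find_longest_match returns, here is a small sketch based on the case described in the standard library documentation. The two strings and the lambda are illustrative choices, not part of the snippets above.

```python
import difflib

a, b = " abcd", "abcd abcd"

# No junk function: the leading blank takes part in the match.
plain = difflib.SequenceMatcher(None, a, b)
# Treat blanks as junk: a match may not start on a blank.
no_blanks = difflib.SequenceMatcher(lambda ch: ch == " ", a, b)

m1 = plain.find_longest_match(0, len(a), 0, len(b))
m2 = no_blanks.find_longest_match(0, len(a), 0, len(b))

print(m1)  # Match(a=0, b=4, size=5)
print(m2)  # Match(a=1, b=0, size=4)
```

Without a junk function the longest match is " abcd" (five characters, found at the end of b). With blanks marked as junk the match is "abcd" at the start of b.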
get_matching_blocks
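Returns a list of (a, b, size) triples describing non-overlapping matching subsequences. The last triple is a dummy, (len(a), len(b), 0), and is the only one with size 0.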
matches = s.get_matching_blocks()
print(matches)
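A short sketch of what the returned triples look like, using two sample strings chosen only for illustration:

```python
import difflib

s = difflib.SequenceMatcher(None, "abxcd", "abcd")
blocks = s.get_matching_blocks()

for block in blocks:
    # Each block says: a[block.a : block.a + block.size] equals
    # b[block.b : block.b + block.size]
    print(block)
# Match(a=0, b=0, size=2)
# Match(a=3, b=2, size=2)
# Match(a=5, b=4, size=0)
```

The first two blocks cover "ab" and "cd"; the final zero-size block is the terminating dummy.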
get_opcodes
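Returns a list of 5-tuples (tag, i1, i2, j1, j2) describing how to turn the first sequence into the second. The tag is one of 'replace', 'delete', 'insert', or 'equal'; it applies to a[i1:i2] and b[j1:j2].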
opcodes = s.get_opcodes()
print(opcodes)
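One way to read the opcodes is to replay them and rebuild the second sequence from the first. The sketch below does that for two sample strings; the variable names are made up for the example.

```python
import difflib

a, b = "qabxcd", "abycdf"
s = difflib.SequenceMatcher(None, a, b)

tags = []
rebuilt = []
for tag, i1, i2, j1, j2 in s.get_opcodes():
    tags.append(tag)
    print(f"{tag:7} a[{i1}:{i2}] -> b[{j1}:{j2}]  {a[i1:i2]!r} -> {b[j1:j2]!r}")
    if tag == "equal":
        rebuilt.append(a[i1:i2])      # unchanged slice, copy from a
    elif tag in ("replace", "insert"):
        rebuilt.append(b[j1:j2])      # new material comes from b
    # "delete": the slice of a is dropped, nothing to append

result = "".join(rebuilt)
print(result)  # abycdf
```

Replaying the opcodes in order always reproduces b, which is why they are a convenient basis for custom diff renderers.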
ratio
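Returns a measure of the sequences' similarity as a float in the range [0, 1]. It is computed as 2.0*M / T, where T is the total number of elements in both sequences and M is the number of matches; 1.0 means the sequences are identical and 0.0 means they have nothing in common.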
similarity = s.ratio()
print(similarity)
quick_ratio
Returns an upper bound on ratio() relatively quickly.
quick_ratio = s.quick_ratio()
print(quick_ratio)
real_quick_ratio
Returns an upper bound on ratio() very quickly.
real_quick_ratio = s.real_quick_ratio()
print(real_quick_ratio)
Differ Class
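The Differ class compares sequences of lines of text and produces human-readable deltas. Each output line starts with a two-character code: '- ' for a line unique to the first sequence, '+ ' for a line unique to the second, '  ' for a line common to both, and '? ' for a hint line that is not present in either input.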
Methods
compare
Compares two sequences of lines, generating human-readable differences.
import difflib
d = difflib.Differ()
diff = d.compare('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines())
print('\n'.join(diff))
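Since every Differ line carries a two-character prefix, the delta is easy to post-process. The following sketch splits a delta into removed and added lines; the fruit lists are invented sample data.

```python
import difflib

old = ["apple", "banana", "cherry"]
new = ["apple", "blueberry", "cherry", "date"]

delta = list(difflib.Differ().compare(old, new))

# Strip the two-character prefix to recover the original line text.
removed = [line[2:] for line in delta if line.startswith("- ")]
added = [line[2:] for line in delta if line.startswith("+ ")]

print(delta)
print("removed:", removed)  # removed: ['banana']
print("added:", added)      # added: ['blueberry', 'date']
```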
HtmlDiff Class
The HtmlDiff class generates HTML side-by-side comparison with change highlights.
Methods
make_file
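Returns a string containing a complete HTML document that shows the differences between two sequences of lines side by side. It does not write anything to disk; saving the string is up to the caller.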
import difflib
hd = difflib.HtmlDiff()
html = hd.make_file('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines(), context=True, numlines=1)
print(html)
make_table
Returns a string containing only the HTML table with the differences between two sequences of lines, suitable for embedding in an existing page.
html_table = hd.make_table('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines(), context=True, numlines=1)
print(html_table)
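Because make_file only returns a string, a typical next step is to save it. The sketch below writes the report to the system temporary directory; the file name difflib_demo.html and the column labels old.txt and new.txt are placeholders chosen for this example.

```python
import difflib
import tempfile
from pathlib import Path

old_lines = "one\ntwo\nthree\n".splitlines()
new_lines = "ore\ntwo\nthree\n".splitlines()

# fromdesc/todesc become the column headings in the generated table.
html = difflib.HtmlDiff().make_file(
    old_lines, new_lines, fromdesc="old.txt", todesc="new.txt"
)

# Hypothetical output location for the demo report.
out_path = Path(tempfile.gettempdir()) / "difflib_demo.html"
out_path.write_text(html, encoding="utf-8")
print("Report written to", out_path)
```

Opening the saved file in a browser shows the two versions side by side with the changed line highlighted.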
Utility Functions
context_diff
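Compares two sequences of lines and generates the delta in context diff format, which shows each change together with a few surrounding lines (three by default, set with n).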
import difflib
diff = difflib.context_diff('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines(), lineterm='')
print('\n'.join(diff))
unified_diff
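Compares two sequences of lines and generates the delta in unified diff format, the compact format used by most patch and version control tools. Changed lines are prefixed with a single '-' or '+' and context lines with a single space.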
import difflib
diff = difflib.unified_diff('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines(), lineterm='')
print('\n'.join(diff))
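unified_diff also accepts file labels and a context size. The sketch below passes fromfile, tofile, and n; the paths a/notes.txt and b/notes.txt are made-up labels, and no files are read.

```python
import difflib

old = ["one", "two", "three", "four"]
new = ["one", "two", "3", "four"]

diff = list(
    difflib.unified_diff(
        old,
        new,
        fromfile="a/notes.txt",  # label only, not opened
        tofile="b/notes.txt",    # label only, not opened
        lineterm="",             # inputs have no trailing newlines
        n=1,                     # one line of context around each change
    )
)
print("\n".join(diff))
# --- a/notes.txt
# +++ b/notes.txt
# @@ -2,3 +2,3 @@
#  two
# -three
# +3
#  four
```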
ndiff
Compares two sequences of lines and generates a Differ-style delta, using the same two-character line prefixes as Differ.compare.
import difflib
diff = difflib.ndiff('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines())
print('\n'.join(diff))
restore
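Generates one of the two original sequences from a delta produced by ndiff or Differ.compare. Pass 1 as the second argument to recover the first sequence, or 2 to recover the second.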
import difflib
delta = list(difflib.ndiff('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines()))
restored = difflib.restore(delta, 1)
print('\n'.join(restored))
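A quick round trip makes the meaning of the second argument concrete. This sketch recovers both inputs from a single delta; the variable names are illustrative.

```python
import difflib

first = "one\ntwo\nthree\n".splitlines()
second = "ore\ntwo\nthree\n".splitlines()

# restore() consumes the delta, so keep it in a list to reuse it.
delta = list(difflib.ndiff(first, second))

restored_first = list(difflib.restore(delta, 1))
restored_second = list(difflib.restore(delta, 2))

print(restored_first)   # ['one', 'two', 'three']
print(restored_second)  # ['ore', 'two', 'three']
```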
IS_CHARACTER_JUNK
Returns True for a space or a tab character, False otherwise (a newline, for example, is not treated as junk).
import difflib
print(difflib.IS_CHARACTER_JUNK(' '))
IS_LINE_JUNK
Returns True for lines that are blank or contain only a single '#' (optionally surrounded by whitespace), False otherwise.
import difflib
print(difflib.IS_LINE_JUNK(' '))
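The exact behavior of the two junk predicates is easiest to see on a handful of inputs. The sample values below are chosen only to illustrate the edge cases.

```python
import difflib

# Character-level junk: only space and tab count.
char_results = {ch: difflib.IS_CHARACTER_JUNK(ch) for ch in [" ", "\t", "\n", "x"]}
print(char_results)
# {' ': True, '\t': True, '\n': False, 'x': False}

# Line-level junk: blank lines and lines holding a lone '#'.
line_results = {
    line: difflib.IS_LINE_JUNK(line) for line in ["\n", "  #   \n", "hello\n"]
}
print(line_results)
# {'\n': True, '  #   \n': True, 'hello\n': False}
```

Both functions are meant to be passed as the linejunk and charjunk arguments of ndiff or Differ rather than called directly.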
Examples
Using SequenceMatcher
import difflib
s = difflib.SequenceMatcher(None, "abcdef", "abcfgh")
print("Similarity ratio:", s.ratio())
Output:
Similarity ratio: 0.6666666666666666
Using Differ
import difflib
d = difflib.Differ()
diff = d.compare("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines())
print('\n'.join(diff))
Output:
- one
+ ore
  two
  three
Using HtmlDiff
import difflib
hd = difflib.HtmlDiff()
html = hd.make_file("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines())
print(html)
Using Utility Functions
context_diff
import difflib
diff = difflib.context_diff("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines(), lineterm='')
print('\n'.join(diff))
Output:
***
---
***************
*** 1,3 ****
! one
  two
  three
--- 1,3 ----
! ore
  two
  three
unified_diff
import difflib
diff = difflib.unified_diff("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines(), lineterm='')
print('\n'.join(diff))
Output:
---
+++
@@ -1,3 +1,3 @@
-one
+ore
 two
 three
ndiff
import difflib
diff = difflib.ndiff("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines())
print('\n'.join(diff))
Output:
- one
+ ore
  two
  three
restore
import difflib
delta = list(difflib.ndiff("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines()))
restored = difflib.restore(delta, 1)
print('\n'.join(restored))
Output:
one
two
three
Real-World Use Case
Generating Diffs for Version Control
Use the difflib module to generate diffs for a simple version control system.
import difflib
def generate_diff(old_text, new_text):
    # Compare line by line; lineterm='' because splitlines() drops the newlines
    diff = difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm='')
    return '\n'.join(diff)

old_version = "one\ntwo\nthree\n"
new_version = "ore\ntwo\nthree\n"
diff = generate_diff(old_version, new_version)
print(diff)
Output:
---
+++
@@ -1,3 +1,3 @@
-one
+ore
 two
 three
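In practice the two versions usually live in files rather than strings. The following sketch extends the same idea to files on disk; diff_files, v1.txt, and v2.txt are hypothetical names, and the files are created in a temporary directory so the example is self-contained.

```python
import difflib
import tempfile
from pathlib import Path


def diff_files(old_path, new_path):
    """Return a unified diff of two text files as a single string."""
    old_lines = Path(old_path).read_text(encoding="utf-8").splitlines(keepends=True)
    new_lines = Path(new_path).read_text(encoding="utf-8").splitlines(keepends=True)
    # Lines keep their newlines, so the default lineterm is used
    # and the pieces are joined with an empty string.
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=str(old_path), tofile=str(new_path)
        )
    )


with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "v1.txt"
    new_file = Path(tmp) / "v2.txt"
    old_file.write_text("one\ntwo\nthree\n", encoding="utf-8")
    new_file.write_text("ore\ntwo\nthree\n", encoding="utf-8")

    patch = diff_files(old_file, new_file)
    unchanged = diff_files(old_file, old_file)

print(patch)
```

When the two files are identical, unified_diff yields nothing, so diff_files returns an empty string. That makes the result easy to use as a "has anything changed?" check.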
Conclusion
The difflib module in Python provides built-in classes and functions for comparing sequences and generating human-readable differences. Whether you need to compare text files, implement a simple version control system, or generate HTML diffs, the difflib module has you covered.