In this guide, you'll explore the difflib module in Python, which helps compare and highlight differences between sequences. We’ll cover its key functions, classes, use cases, and examples to help you use it efficiently.
The difflib module in Python provides classes and functions for comparing sequences, such as strings or lists, and generating differences (diffs) between them. This module is useful for tasks like comparing text files, computing deltas, and producing human-readable differences.
Table of Contents
- Introduction
- SequenceMatcher Class
  - Methods
    - __init__
    - set_seq1
    - set_seq2
    - set_seqs
    - find_longest_match
    - get_matching_blocks
    - get_opcodes
    - ratio
    - quick_ratio
    - real_quick_ratio
- Differ Class
  - Methods
    - compare
- HtmlDiff Class
  - Methods
    - make_file
    - make_table
- Utility Functions
  - context_diff
  - unified_diff
  - ndiff
  - restore
  - IS_CHARACTER_JUNK
  - IS_LINE_JUNK
- Examples
  - Using SequenceMatcher
  - Using Differ
  - Using HtmlDiff
  - Using Utility Functions
- Real-World Use Case
- Conclusion
- References
Introduction
The difflib module provides a variety of classes and functions to compare sequences, find differences, and produce human-readable diff outputs. This is particularly useful for comparing text files, generating patches, or implementing features like those found in version control systems.
SequenceMatcher Class
The SequenceMatcher class compares pairs of sequences of any type, as long as the sequence elements are hashable, and generates information about how they differ.
Methods
__init__
Initializes a SequenceMatcher object.
import difflib
s = difflib.SequenceMatcher(isjunk=None, a='', b='')
- isjunk: A function that takes a sequence element and returns True if it is junk. Passing None means no element is treated as junk.
- a: The first sequence to compare.
- b: The second sequence to compare.
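To make the effect of isjunk concrete, here is a small sketch that compares the same pair of strings with and without a junk function. The lambda that treats a blank as junk is only an illustrative choice, not something the module requires.

```python
import difflib

a, b = " abcd", "abcd abcd"

# No junk function: the leading blank can take part in a match.
plain = difflib.SequenceMatcher(None, a, b)
plain_match = plain.find_longest_match(0, len(a), 0, len(b))

# Blank treated as junk (illustrative choice): a match cannot start on it.
junky = difflib.SequenceMatcher(lambda ch: ch == " ", a, b)
junk_match = junky.find_longest_match(0, len(a), 0, len(b))

print(plain_match)  # Match(a=0, b=4, size=5)
print(junk_match)   # Match(a=1, b=0, size=4)
```

With blanks marked as junk, the matcher prefers the leftmost "abcd" in the second string instead of anchoring on the blank.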
set_seq1
Sets the first sequence to be compared.
s.set_seq1('new_sequence')
set_seq2
Sets the second sequence to be compared.
s.set_seq2('new_sequence')
set_seqs
Sets both sequences to be compared.
s.set_seqs('sequence1', 'sequence2')
find_longest_match
Finds the longest contiguous matching subsequence.
match = s.find_longest_match(0, len(s.a), 0, len(s.b))
print(match)  # Match named tuple with fields (a, b, size)
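The four arguments restrict the search to a[alo:ahi] and b[blo:bhi]. The sketch below, using made-up sample strings, shows how the returned fields map back to slices of each sequence and how narrowing the range changes the result.

```python
import difflib

s = difflib.SequenceMatcher(None, "abxcd", "abcd")

# Search both sequences in full. "ab" and "cd" tie on length, and the
# match that starts earliest in the first sequence is returned.
whole = s.find_longest_match(0, len(s.a), 0, len(s.b))

# Restrict the search in the first sequence to a[2:5] == "xcd".
tail = s.find_longest_match(2, len(s.a), 0, len(s.b))

for m in (whole, tail):
    # The fields index into the sequences: a[m.a:m.a+m.size] == b[m.b:m.b+m.size]
    print(m, repr(s.a[m.a:m.a + m.size]))
```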
get_matching_blocks
Returns a list of triples (i, j, n) describing non-overlapping matching subsequences, meaning a[i:i+n] == b[j:j+n]. The last triple is a dummy with n == 0.
matches = s.get_matching_blocks()
print(matches)
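As a quick illustration with sample strings chosen for this sketch, the matching blocks for "abxcd" and "abcd" look like this. Note the final zero-length block, which acts as a terminator.

```python
import difflib

s = difflib.SequenceMatcher(None, "abxcd", "abcd")
blocks = s.get_matching_blocks()

for block in blocks:
    # Each block says a[block.a:block.a+block.size] equals
    # b[block.b:block.b+block.size].
    print(block, repr(s.a[block.a:block.a + block.size]))

# Match(a=0, b=0, size=2) 'ab'
# Match(a=3, b=2, size=2) 'cd'
# Match(a=5, b=4, size=0) ''
```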
get_opcodes
Returns a list of 5-tuples (tag, i1, i2, j1, j2) describing how to turn the first sequence into the second. The tag is one of 'replace', 'delete', 'insert', or 'equal'.
opcodes = s.get_opcodes()
print(opcodes)
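One way to read the opcodes is as an edit script. The sketch below replays them to rebuild the second string from the first; apply_opcodes is a hypothetical helper written for this example, not part of difflib.

```python
import difflib

def apply_opcodes(a, b):
    """Rebuild b from a by replaying the opcodes (hypothetical helper)."""
    s = difflib.SequenceMatcher(None, a, b)
    pieces = []
    for tag, i1, i2, j1, j2 in s.get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])   # keep the shared run
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])   # take the new text from b
        # 'delete' contributes nothing to the result
    return "".join(pieces)

a, b = "qabxcd", "abycdf"
tags = [op[0] for op in difflib.SequenceMatcher(None, a, b).get_opcodes()]
rebuilt = apply_opcodes(a, b)
print(tags)     # ['delete', 'equal', 'replace', 'equal', 'insert']
print(rebuilt)  # abycdf
```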
ratio
Returns a measure of the sequences' similarity as a float in the range [0, 1].
similarity = s.ratio()
print(similarity)
quick_ratio
Returns an upper bound on ratio() relatively quickly.
quick_ratio = s.quick_ratio()
print(quick_ratio)
real_quick_ratio
Returns an upper bound on ratio() very quickly.
real_quick_ratio = s.real_quick_ratio()
print(real_quick_ratio)
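Because the two quick variants are upper bounds, they can act as cheap filters before the more expensive ratio() call. The following is a minimal sketch of that idea; similar_enough and its cutoff default are assumptions made for this example.

```python
import difflib

def similar_enough(a, b, cutoff=0.6):
    """Return True if ratio() >= cutoff, trying cheaper bounds first.

    Hypothetical helper: if an upper bound is already below the cutoff,
    the exact ratio cannot reach it, so the costly call is skipped.
    """
    s = difflib.SequenceMatcher(None, a, b)
    return (s.real_quick_ratio() >= cutoff
            and s.quick_ratio() >= cutoff
            and s.ratio() >= cutoff)

s = difflib.SequenceMatcher(None, "abcd", "bcde")
exact, quick, real_quick = s.ratio(), s.quick_ratio(), s.real_quick_ratio()
print(exact, quick, real_quick)        # 0.75 0.75 1.0
print(similar_enough("abcd", "bcde"))  # True
print(similar_enough("abcd", "wxyz"))  # False
```

The exact ratio is 2.0 * M / T, where M is the number of matching elements and T is the total number of elements in both sequences: here 2 * 3 / 8 = 0.75.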
Differ Class
The Differ class compares sequences of lines of text and produces human-readable differences (deltas).
Methods
compare
Compares two sequences of lines, generating human-readable differences.
import difflib
d = difflib.Differ()
diff = d.compare('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines())
print('\n'.join(diff))
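Each line of a Differ delta starts with a two-character code: "- " for lines only in the first sequence, "+ " for lines only in the second, two spaces for lines common to both, and "? " for hint lines that appear in neither input. As a sketch, the hypothetical helper below uses those codes to separate removed and added lines.

```python
import difflib

def split_changes(old_lines, new_lines):
    """Return (removed, added) lines based on the delta codes.

    Hypothetical helper for illustration; common lines and "? " hint
    lines are ignored.
    """
    removed, added = [], []
    for line in difflib.Differ().compare(old_lines, new_lines):
        code, text = line[:2], line[2:]
        if code == "- ":
            removed.append(text)
        elif code == "+ ":
            added.append(text)
    return removed, added

removed, added = split_changes(["one", "two", "three"], ["ore", "two", "three"])
print(removed, added)  # ['one'] ['ore']
```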
HtmlDiff Class
The HtmlDiff class generates an HTML side-by-side comparison with change highlights.
Methods
make_file
Returns a complete HTML document, as a string, containing a table that shows the differences between two sequences of lines. Nothing is written to disk.
import difflib
hd = difflib.HtmlDiff()
html = hd.make_file('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines(), context=True, numlines=1)
print(html)
make_table
Returns only the HTML table, as a string, showing the differences between two sequences of lines. This is useful for embedding the diff in an existing page.
html_table = hd.make_table('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines(), context=True, numlines=1)
print(html_table)
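Since make_file returns a string, saving the report is a separate step. Here is a minimal sketch; write_html_diff, the column labels, and the temporary output path are all assumptions for this example.

```python
import difflib
import tempfile
from pathlib import Path

def write_html_diff(old_lines, new_lines, path):
    """Render a side-by-side diff and save it to path (hypothetical helper)."""
    html = difflib.HtmlDiff().make_file(
        old_lines, new_lines, fromdesc="old", todesc="new"
    )
    Path(path).write_text(html, encoding="utf-8")
    return html

with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "diff.html"
    html = write_html_diff(["one", "two", "three"], ["ore", "two", "three"], report)
    size = report.stat().st_size

print(f"wrote {size} bytes")
```

In real use you would pass a permanent path and open the resulting file in a browser.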
Utility Functions
context_diff
Compares two lists of lines and returns a generator that yields the delta in context diff format.
import difflib
diff = difflib.context_diff('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines(), lineterm='')
print('\n'.join(diff))
unified_diff
Compares two lists of lines and returns a generator that yields the delta in unified diff format.
import difflib
diff = difflib.unified_diff('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines(), lineterm='')
print('\n'.join(diff))
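The header lines and the amount of surrounding context can be controlled with the fromfile, tofile, and n parameters. A short sketch follows; the file names are placeholders, not real files.

```python
import difflib

old = ["one", "two", "three"]
new = ["ore", "two", "three"]

# Label both sides; the names are placeholders for this example.
labeled = list(difflib.unified_diff(
    old, new, fromfile="old.txt", tofile="new.txt", lineterm=""
))

# n=0 drops the context lines and keeps only the changed lines.
compact = list(difflib.unified_diff(old, new, n=0, lineterm=""))

print("\n".join(labeled))
print("\n".join(compact[2:]))
```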
ndiff
Generates a delta from two sequences of lines.
import difflib
diff = difflib.ndiff('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines())
print('\n'.join(diff))
restore
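Given a delta produced by Differ.compare() or ndiff(), generates the lines of one of the two original sequences. The second argument selects which one: 1 for the first sequence, 2 for the second.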
import difflib
delta = list(difflib.ndiff('one\ntwo\nthree\n'.splitlines(), 'ore\ntwo\nthree\n'.splitlines()))
restored = difflib.restore(delta, 1)
print('\n'.join(restored))
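To see both directions side by side, this sketch recovers each input from the same delta.

```python
import difflib

old = ["one", "two", "three"]
new = ["ore", "two", "three"]
delta = list(difflib.ndiff(old, new))

first = list(difflib.restore(delta, 1))   # lines that came from `old`
second = list(difflib.restore(delta, 2))  # lines that came from `new`

print(first)   # ['one', 'two', 'three']
print(second)  # ['ore', 'two', 'three']
```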
IS_CHARACTER_JUNK
Returns True for ignorable characters, meaning a blank or a tab, and False otherwise.
import difflib
print(difflib.IS_CHARACTER_JUNK(' '))
IS_LINE_JUNK
Returns True for ignorable lines, meaning lines that are blank or contain only a single '#' (optionally surrounded by whitespace), and False otherwise.
import difflib
print(difflib.IS_LINE_JUNK(' '))
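A few sample inputs make the boundaries of both predicates clearer. The inputs below were picked for this sketch.

```python
import difflib

# Character-level junk: only blank and tab qualify.
char_results = {ch: difflib.IS_CHARACTER_JUNK(ch) for ch in (" ", "\t", "\n", "x")}

# Line-level junk: blank lines, or lines holding a lone '#'.
line_results = {line: difflib.IS_LINE_JUNK(line)
                for line in ("\n", "  #   \n", "hello\n", "# comment\n")}

print(char_results)
print(line_results)
```

These predicates are meant to be passed as the linejunk and charjunk arguments of ndiff() or Differ.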
Examples
Using SequenceMatcher
import difflib
s = difflib.SequenceMatcher(None, "abcdef", "abcfgh")
print("Similarity ratio:", s.ratio())
Output:
Similarity ratio: 0.6666666666666666
Using Differ
import difflib
d = difflib.Differ()
diff = d.compare("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines())
print('\n'.join(diff))
Output:
- one
+ ore
  two
  three
Using HtmlDiff
import difflib
hd = difflib.HtmlDiff()
html = hd.make_file("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines())
print(html)
Using Utility Functions
context_diff
import difflib
diff = difflib.context_diff("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines(), lineterm='')
print('\n'.join(diff))
Output:
***
---
***************
*** 1,3 ****
! one
  two
  three
--- 1,3 ----
! ore
  two
  three
unified_diff
import difflib
diff = difflib.unified_diff("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines(), lineterm='')
print('\n'.join(diff))
Output:
---
+++
@@ -1,3 +1,3 @@
-one
+ore
 two
 three
ndiff
import difflib
diff = difflib.ndiff("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines())
print('\n'.join(diff))
Output:
- one
+ ore
  two
  three
restore
import difflib
delta = list(difflib.ndiff("one\ntwo\nthree\n".splitlines(), "ore\ntwo\nthree\n".splitlines()))
restored = difflib.restore(delta, 1)
print('\n'.join(restored))
Output:
one
two
three
Real-World Use Case
Generating Diffs for Version Control
Use the difflib module to generate diffs for a simple version control system.
import difflib
def generate_diff(old_text, new_text):
    # splitlines() drops the newlines, so lineterm='' keeps the output consistent
    diff = difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm='')
    return '\n'.join(diff)
old_version = "one\ntwo\nthree\n"
new_version = "ore\ntwo\nthree\n"
diff = generate_diff(old_version, new_version)
print(diff)
Output:
---
+++
@@ -1,3 +1,3 @@
-one
+ore
 two
 three
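In practice the two versions usually live in files rather than in string literals. The sketch below extends the same idea to files on disk; diff_files is a hypothetical helper, and the temporary files exist only to keep the example self-contained.

```python
import difflib
import tempfile
from pathlib import Path

def diff_files(old_path, new_path):
    """Return a unified diff of two text files as one string (hypothetical helper)."""
    # keepends=True preserves the newlines, so the default lineterm works as is.
    old_lines = Path(old_path).read_text(encoding="utf-8").splitlines(keepends=True)
    new_lines = Path(new_path).read_text(encoding="utf-8").splitlines(keepends=True)
    return "".join(difflib.unified_diff(
        old_lines, new_lines, fromfile=str(old_path), tofile=str(new_path)
    ))

with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.txt"
    new_file = Path(tmp) / "new.txt"
    old_file.write_text("one\ntwo\nthree\n", encoding="utf-8")
    new_file.write_text("ore\ntwo\nthree\n", encoding="utf-8")
    patch = diff_files(old_file, new_file)

print(patch)
```

Reading whole files into memory is fine for small inputs; very large files may call for a different approach.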
Conclusion
The difflib module in Python provides built-in classes and functions for comparing sequences and generating human-readable differences. Whether you need to compare text files, implement a simple version control system, or generate HTML diffs, the difflib module has you covered.