Python: Huffman Coding Algorithm

1. Introduction

Huffman coding is a popular algorithm used for lossless data compression. The idea is to assign variable-length codes to input characters, with shorter codes for more frequent characters. It's a greedy algorithm that utilizes a priority queue.

2. Program Overview

1. Create a frequency dictionary from the input string.

2. Build the Huffman Tree using a priority queue.

3. Assign binary codes to characters by traversing the Huffman Tree.

4. Encode and decode the input string.

3. Code Program

# Importing the necessary libraries
from queue import PriorityQueue

# Defining the class for the node of the tree
class Node:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    # Defining comparators for nodes
    def __lt__(self, other):
        return self.freq < other.freq

# Building Huffman Tree
def build_tree(string):
    freq = {}
    for char in string:
        if char in freq:
            freq[char] += 1
        else:
            freq[char] = 1

    pq = PriorityQueue()
    for key in freq:
        pq.put(Node(key, freq[key]))

    while pq.qsize() > 1:
        left = pq.get()
        right = pq.get()

        merged = Node(None, left.freq + right.freq)
        merged.left = left
        merged.right = right

        pq.put(merged)

    return pq.get()

# Helper function to assign codes through Huffman Tree traversal
def assign_codes(node, code, bin_codes):
    if node is None:
        return

    if node.char is not None:
        bin_codes[node.char] = code
        return

    assign_codes(node.left, code + "0", bin_codes)
    assign_codes(node.right, code + "1", bin_codes)

# Encoding the string
def encode(string):
    root = build_tree(string)
    bin_codes = {}
    assign_codes(root, "", bin_codes)
    encoded_string = "".join([bin_codes[char] for char in string])

    return encoded_string, bin_codes

# Decoding the encoded string
def decode(encoded_string, bin_codes):
    reversed_codes = {value: key for key, value in bin_codes.items()}
    current_code = ""
    decoded_string = ""

    for bit in encoded_string:
        current_code += bit
        if current_code in reversed_codes:
            decoded_string += reversed_codes[current_code]
            current_code = ""

    return decoded_string

# Test
input_str = "this is an example for huffman encoding"
encoded_data, binary_codes = encode(input_str)
decoded_data = decode(encoded_data, binary_codes)
print(f"Encoded Data: {encoded_data}")
print(f"Decoded Data: {decoded_data}")

Output:

Encoded Data: [The encoded binary string will be displayed here]
Decoded Data: this is an example for huffman encoding

4. Step By Step Explanation

1. A frequency dictionary is created for all the characters in the input string.

2. Using these frequencies, a Huffman Tree is built using a priority queue. In the tree, nodes with higher frequencies are placed near the root.

3. Once the tree is built, we traverse the tree to assign binary codes to each character. The traversal is done such that going left is '0' and going right is '1'.

4. For encoding, the input string is then converted to its binary representation using the previously generated codes.

5. For decoding, the binary string is converted back to the original string using the binary codes.

Comments