Python mmap Module

The mmap module in Python provides memory-mapped file support, allowing a file's contents to be mapped directly into memory. This can lead to efficient file I/O operations and is particularly useful for accessing parts of a file without reading the entire file into memory.

Table of Contents

  1. Introduction
  2. Key Classes and Methods
    • mmap.mmap
    • mmap.close
    • mmap.find
    • mmap.flush
    • mmap.read
    • mmap.readline
    • mmap.resize
    • mmap.seek
    • mmap.size
    • mmap.write
    • mmap.write_byte
  3. Examples
    • Creating a Memory-Mapped File
    • Reading from a Memory-Mapped File
    • Writing to a Memory-Mapped File
    • Searching within a Memory-Mapped File
    • Resizing a Memory-Mapped File
  4. Real-World Use Case
  5. Conclusion
  6. References

Introduction

Memory mapping a file is a way to create a direct byte-for-byte mapping of a file into memory. This can significantly improve file I/O performance by allowing programs to access file data as if it were in-memory data. The mmap module provides a simple interface for memory-mapping files in Python.

Key Classes and Methods

mmap.mmap

Creates a memory-mapped file object.

import mmap

# Open a file for reading and writing
with open('example.txt', 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)

mmap.close

Closes the memory-mapped file.

mm.close()

mmap.find

Searches for a substring in the memory-mapped file.

index = mm.find(b'substring')

mmap.flush

Flushes changes made to the memory-mapped file to the disk.

mm.flush()

mmap.read

Reads data from the memory-mapped file.

data = mm.read(size)

mmap.readline

Reads a line from the memory-mapped file.

line = mm.readline()

mmap.resize

Resizes the memory-mapped file.

mm.resize(new_size)

mmap.seek

Moves the file pointer to a new position.

mm.seek(position)

mmap.size

Returns the size of the memory-mapped file.

size = mm.size()

mmap.write

Writes data to the memory-mapped file.

mm.write(b'data')

mmap.write_byte

Writes a single byte to the memory-mapped file.

mm.write_byte(b'X')

Examples

Creating a Memory-Mapped File

import mmap

# Open a file for reading and writing
with open('example.txt', 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)

# Remember to close the memory-mapped file
mm.close()

Reading from a Memory-Mapped File

import mmap

# Open a file for reading
with open('example.txt', 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)
    # Read the first 10 bytes
    print(mm.read(10))
    # Move the file pointer to the beginning
    mm.seek(0)
    # Read the first line
    print(mm.readline())
    mm.close()

Writing to a Memory-Mapped File

import mmap

# Open a file for reading and writing
with open('example.txt', 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)
    # Write data to the file
    mm.write(b'Hello, World!')
    # Flush changes to the disk
    mm.flush()
    mm.close()

Searching within a Memory-Mapped File

import mmap

# Open a file for reading
with open('example.txt', 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)
    # Find a substring
    index = mm.find(b'World')
    if index != -1:
        print(f'Substring found at index {index}')
    else:
        print('Substring not found')
    mm.close()

Resizing a Memory-Mapped File

import mmap

# Open a file for reading and writing
with open('example.txt', 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)
    # Resize the file
    mm.resize(100)
    mm.close()

Real-World Use Case

Efficient Log File Processing

When processing large log files, it can be inefficient to load the entire file into memory. Using memory-mapped files allows you to access and process the file in chunks.

import mmap

def process_log(file_path):
    with open(file_path, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), 0)
        while True:
            line = mm.readline()
            if not line:
                break
            process_line(line)
        mm.close()

def process_line(line):
    # Implement your log processing logic here
    print(line)

if __name__ == '__main__':
    process_log('log.txt')

Conclusion

The mmap module in Python provides a powerful way to work with files by mapping their contents directly into memory. This can lead to significant performance improvements for I/O-intensive applications. By using memory-mapped files, you can efficiently read, write, and search within large files without loading the entire file into memory.

References

Comments

Spring Boot 3 Paid Course Published for Free
on my Java Guides YouTube Channel

Subscribe to my YouTube Channel (165K+ subscribers):
Java Guides Channel

Top 10 My Udemy Courses with Huge Discount:
Udemy Courses - Ramesh Fadatare