Warning

This library is in the planning stages. The API will be very unstable and I do not recommend its use for anything serious.

This documentation may in places be more aspirational than accurate.

bitformat is a Python module for creating and parsing file formats, especially at the bit rather than byte level.

It is intended to complement the bitstring module from the same author, and uses its Dtype, Bits and Array classes as the basis for building complex bit formats.

Features

  • A bitformat is a specification of a binary format, made up of fields that describe both how to build the binary data from supplied values and how to parse binary data to retrieve those values.

  • A wide array of data types is supported. Want to use a 13-bit integer or an 8-bit float? Fine - there are no special hoops to jump through.

  • Several field types are available:

    • The simplest is just a Field which contains a single data type, and either a single value or an array of values. These can usually be constructed from just a string.

    • A Format contains a list of other fields. These can be nested to any depth.

    • [Coming soon] Fields like Repeat, Find and Condition can be used to add more logical structure.

  • The values of other fields can be used in later calculations via an f-string-like expression syntax.

  • Data is always stored efficiently as a contiguous array of bits.
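
To give a feel for what working at the bit rather than byte level involves, here is a minimal sketch in plain Python. It does not use bitformat or bitstring, and it is not how either library is implemented; the helper names pack_fields and unpack_fields are made up for this illustration. It packs a 13-bit unsigned integer followed by a 3-bit one into two bytes, most significant bit first, and reads them back.

```python
def pack_fields(fields):
    """Pack (value, bit_width) pairs, most significant bit first, into bytes."""
    acc = 0
    total_bits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} unsigned bits")
        acc = (acc << width) | value
        total_bits += width
    padding = -total_bits % 8  # zero bits appended to reach a byte boundary
    return (acc << padding).to_bytes((total_bits + padding) // 8, "big")


def unpack_fields(data, widths):
    """Read unsigned integers of the given bit widths from the start of data."""
    acc = int.from_bytes(data, "big")
    remaining = len(data) * 8
    values = []
    for width in widths:
        remaining -= width
        values.append((acc >> remaining) & ((1 << width) - 1))
    return values


packed = pack_fields([(5000, 13), (5, 3)])
print(packed.hex())                     # 9c45
print(unpack_fields(packed, [13, 3]))   # [5000, 5]
```

Neither value starts or ends on a byte boundary from the point of view of its own width, which is exactly the bookkeeping a format description is meant to take off your hands.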

An Example

A quick example to whet the appetite: the MPEG-2 video standard specifies a ‘sequence_header’ that could be defined in bitformat by

from bitformat import Format, Repeat

seq_header = Format(['hex32 <sequence_header_code> = 0x000001b3',
                     'u12   <horizontal_size_value>',
                     'u12   <vertical_size_value>',
                     'u4    <aspect_ratio_information>',
                     'u4    <frame_rate_code>',
                     'u18   <bit_rate_value>',
                     'bool  <marker_bit>',
                     'u10   <vbv_buffer_size_value>',
                     'bool  <constrained_parameters_flag>',
                     'bool  <load_intra_quantizer_matrix>',
                     Repeat('{load_intra_quantizer_matrix}',
                         'u8 * 64 <intra_quantizer_matrix>'),
                     'bool  <load_non_intra_quantizer_matrix>',
                     Repeat('{load_non_intra_quantizer_matrix}',
                         'u8 * 64 <non_intra_quantizer_matrix>')
                     ], 'sequence_header')

To parse such a header you can simply write

seq_header.parse(some_bytes_object)

then you can access and modify the field values

seq_header['bit_rate_value'].value *= 2

before rebuilding the binary object

b = seq_header.build()
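
For comparison, the same parse, modify and rebuild cycle can be sketched by hand in plain Python. This is only an illustration of what the format definition describes, not bitformat's API or implementation: BitReader, parse_sequence_header and build_sequence_header are hypothetical names, the start code is assumed to occupy 32 bits (its value has eight hex digits), and the 12 sample bytes were made up for the example rather than taken from a real stream.

```python
# Names and bit widths of the fixed fields, in the order given above.
FIXED_FIELDS = [
    ("sequence_header_code", 32),  # assumed 32 bits wide
    ("horizontal_size_value", 12),
    ("vertical_size_value", 12),
    ("aspect_ratio_information", 4),
    ("frame_rate_code", 4),
    ("bit_rate_value", 18),
    ("marker_bit", 1),
    ("vbv_buffer_size_value", 10),
    ("constrained_parameters_flag", 1),
]


class BitReader:
    """Reads unsigned integers of arbitrary bit width, MSB first."""

    def __init__(self, data):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def read(self, width):
        if width > self.remaining:
            raise ValueError("not enough bits left")
        self.remaining -= width
        return (self.value >> self.remaining) & ((1 << width) - 1)


def parse_sequence_header(data):
    reader = BitReader(data)
    header = {name: reader.read(width) for name, width in FIXED_FIELDS}
    if header["sequence_header_code"] != 0x000001B3:
        raise ValueError("not a sequence header")
    # Each matrix is present only if the flag in front of it is set.
    for prefix in ("intra", "non_intra"):
        flag = reader.read(1)
        header[f"load_{prefix}_quantizer_matrix"] = flag
        if flag:
            header[f"{prefix}_quantizer_matrix"] = [reader.read(8) for _ in range(64)]
    return header


def build_sequence_header(header):
    fields = [(header[name], width) for name, width in FIXED_FIELDS]
    for prefix in ("intra", "non_intra"):
        flag = header[f"load_{prefix}_quantizer_matrix"]
        fields.append((flag, 1))
        if flag:
            fields.extend((v, 8) for v in header[f"{prefix}_quantizer_matrix"])
    acc = total = 0
    for value, width in fields:
        acc = (acc << width) | value
        total += width
    padding = -total % 8  # pad with zero bits to a whole number of bytes
    return (acc << padding).to_bytes((total + padding) // 8, "big")


# Made-up sample: 720x576, bit_rate_value 12500, no quantizer matrices.
sample = bytes.fromhex("000001b32d0240230c352380")
header = parse_sequence_header(sample)
print(header["horizontal_size_value"], header["vertical_size_value"])  # 720 576
header["bit_rate_value"] *= 2
rebuilt = build_sequence_header(header)
print(parse_sequence_header(rebuilt)["bit_rate_value"])  # 25000
```

Every field width and the conditional presence of the two matrices has to be handled explicitly here, whereas the Format definition states the layout once and is used for both parsing and building.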

Installation and download

I am planning a minimum viable product release by April 2024, with a fuller release later in the year. If you wish to try it out now, I recommend installing from the main branch on GitHub, as that will be far ahead of the release on PyPI.

pip install git+https://github.com/scott-griffiths/bitformat

To download the module, file defect reports or enhancement requests, or browse the Git repository, go to the project’s home on GitHub.

Documentation

These docs are styled using the Piccolo theme.