While developing the G3D geometry file format at VIM AEC we encountered the following challenge:

Given a collection of byte arrays, how do you pack them efficiently into a single block to use in memory, write to disk, and transmit over a network, so that it can be easily unpacked on different platforms including Web, Mobile, and AR devices like the Magic Leap?

This simple use case of storing and processing collections of large arrays of binary data pops up in a large number of problem domains, such as audio, image, video, 3D, geographical information systems (GIS), computational fluid dynamics (CFD), photogrammetry, etc. In fact, a lot of tabular data can be represented as collections of named arrays.

“Isn’t this a solved problem?” you might ask. “Can’t we just use one of several cross-platform binary solutions like the Concise Binary Object Representation (CBOR) or any of the other various binary formats that exist?” Certainly you can use them, but all of these binary formats are designed for nested structured data, so they are necessarily more complex and less efficient than a specialized solution for the common use case of serializing byte arrays.

A Naïve Solution

Obviously, one could implement a simple solution with relative ease. A common first approach to the problem (which several serialization libraries use) is to write the number of arrays, then for each byte array, write the number of bytes, followed by the byte data. Here is some sample code in C# to illustrate this approach:

using System.Collections.Generic;
using System.IO;

// Writes the array count, then each array as a 32-bit length followed by its bytes.
public static byte[] NaivePack(IList<byte[]> buffers)
{
    using (var stream = new MemoryStream())
    using (var bw = new BinaryWriter(stream))
    {
        bw.Write(buffers.Count);
        foreach (var b in buffers)
        {
            bw.Write(b.Length);
            bw.Write(b);
        }
        return stream.ToArray();
    }
}

// Reads the array count, then each length-prefixed array in turn.
public static IList<byte[]> NaiveUnpack(byte[] data)
{
    var r = new List<byte[]>();
    using (var stream = new MemoryStream(data))
    using (var br = new BinaryReader(stream))
    {
        var n = br.ReadInt32();
        for (var i = 0; i < n; ++i)
        {
            var localN = br.ReadInt32();
            var bytes = br.ReadBytes(localN);
            r.Add(bytes);
        }
    }
    return r;
}

This works well enough in simple cases and has acceptable performance for some use cases. However, it suffers from several shortcomings:

- The reading code must be executed on an architecture that has the same endianness as the writer.
- You can’t encode arrays that are larger than 2 GB, because each length is stored as a 32-bit signed integer.
- A reader has no easy way to know that a data block was encoded using this scheme.
- Accessing data in an array can require multiple disk accesses (one for each preceding array, to figure out where the next array is).
- Individual data buffers aren’t strictly aligned, so casting data to pointers won’t work reliably.

Overall, this is not a very robust or efficient solution.

A Better Approach

The following simple steps improve upon the previous naïve solution:

- Add a magic number so that a reader can detect the endianness the data was written with. This will also help decoders quickly identify the encoding scheme of the data.
- Add a pointer table that identifies where each binary buffer starts and ends within the data block.
- Align the data buffers to 64-byte addresses.

Given these observations, we decided to put together a formal specification called BFAST for encoding binary arrays, so that we could easily share binary arrays across different applications written in different languages.

Introducing the BFAST Specification

BFAST is an acronym for “Binary Format for Array Serialization and Transmission”. The BFAST format consists of three sections:

- Header: a fixed-size descriptor (32 bytes) describing the file contents
- Ranges: an array of offset pairs indicating the begin and end of each buffer (relative to the beginning of the file)
- Data: 64-byte aligned data buffers

The header is a 32-byte struct with the following layout:

[StructLayout(LayoutKind.Explicit, Pack = 8, Size = 32)]
public struct Header
{
    [FieldOffset(0)] public long Magic;       // 0xBFA5
    [FieldOffset(8)] public long DataStart;   // <= File size and >= 32 + Sizeof(Range) * NumArrays
    [FieldOffset(16)] public long DataEnd;    // >= DataStart and <= file size
    [FieldOffset(24)] public long NumArrays;  // Number of all buffers, including name buffer
}

The ranges start at byte 32. There are NumArrays of them. This is the total count of all buffers, including the first buffer that contains the names. Each Begin and End value is a byte offset relative to the beginning of the file. The ranges have the following layout:

[StructLayout(LayoutKind.Explicit, Pack = 8, Size = 16)]
public struct Range
{
    [FieldOffset(0)] public long Begin;
    [FieldOffset(8)] public long End;
}

The data section starts at the first 64-byte aligned address immediately following the last Range struct. This value is stored for validation purposes in the header as DataStart.

The first data buffer contains the names of the subsequent buffers as a concatenated list of UTF-8 encoded strings separated by null characters. Names may be zero-length and are not guaranteed to be unique. A name may contain any UTF-8 encoded character except the null character. There must be N-1 names, where N is the number of ranges (i.e. the NumArrays value in the header).

Final Words

We have released our implementation as an open-source project at https://github.com/vimaec/bfast with a commercially friendly open-source license (the MIT license) in the hope that others find it useful and can contribute back improvements. Let us know if you use the format or the implementation. It is always rewarding for developers to hear when their code is useful to others.
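To make the layout described in the specification above more concrete, here is a minimal sketch of a packer and unpacker written directly from that description. It is not the code from the vimaec/bfast repository: the class name BfastSketch and the methods Pack and Unpack are hypothetical names chosen for illustration. The sketch also assumes little-endian output (which is what BinaryWriter produces), zero bytes as padding between buffers, and names joined by single null separators.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical illustration of the BFAST layout; not the vimaec/bfast API.
public static class BfastSketch
{
    const long Magic = 0xBFA5;
    const long HeaderSize = 32, RangeSize = 16, Alignment = 64;

    // Round an offset up to the next multiple of 64.
    static long Align(long n) => (n + Alignment - 1) / Alignment * Alignment;

    public static byte[] Pack(IList<string> names, IList<byte[]> buffers)
    {
        if (names.Count != buffers.Count)
            throw new ArgumentException("Expected exactly one name per buffer.");

        // Buffer 0 holds the names, joined with null separators (assumption).
        var all = new List<byte[]> { Encoding.UTF8.GetBytes(string.Join("\0", names)) };
        all.AddRange(buffers);

        // Compute where each buffer begins; every buffer starts on a 64-byte boundary.
        var begins = new long[all.Count];
        var pos = Align(HeaderSize + RangeSize * all.Count);
        var dataStart = pos;
        for (var i = 0; i < all.Count; ++i)
        {
            begins[i] = pos = Align(pos);
            pos += all[i].Length;
        }
        var dataEnd = pos;

        using (var stream = new MemoryStream())
        using (var bw = new BinaryWriter(stream))
        {
            // Header: four 64-bit values, 32 bytes in total.
            bw.Write(Magic);
            bw.Write(dataStart);
            bw.Write(dataEnd);
            bw.Write((long)all.Count);
            // Ranges: one (Begin, End) pair per buffer, starting at byte 32.
            for (var i = 0; i < all.Count; ++i)
            {
                bw.Write(begins[i]);
                bw.Write(begins[i] + all[i].Length);
            }
            // Data: zero padding up to each aligned offset, then the bytes.
            for (var i = 0; i < all.Count; ++i)
            {
                bw.Write(new byte[begins[i] - stream.Position]);
                bw.Write(all[i]);
            }
            return stream.ToArray();
        }
    }

    public static List<KeyValuePair<string, byte[]>> Unpack(byte[] data)
    {
        var result = new List<KeyValuePair<string, byte[]>>();
        using (var stream = new MemoryStream(data))
        using (var br = new BinaryReader(stream))
        {
            if (br.ReadInt64() != Magic)
                throw new InvalidDataException("Not BFAST data, or written with a different endianness.");
            long dataStart = br.ReadInt64(), dataEnd = br.ReadInt64(), n = br.ReadInt64();
            if (dataStart < HeaderSize + RangeSize * n || dataEnd < dataStart || dataEnd > data.Length)
                throw new InvalidDataException("Inconsistent header.");
            if (n == 0) return result;

            // Read the range table, then copy each buffer out of the block.
            var slices = new List<byte[]>();
            for (long i = 0; i < n; ++i)
            {
                long begin = br.ReadInt64(), end = br.ReadInt64();
                if (begin < 0 || end < begin || end > data.Length)
                    throw new InvalidDataException("Range out of bounds.");
                var slice = new byte[end - begin];
                Array.Copy(data, begin, slice, 0, slice.Length);
                slices.Add(slice);
            }

            // The first buffer names the remaining N-1 buffers.
            var names = Encoding.UTF8.GetString(slices[0]).Split('\0');
            if (n > 1 && names.Length != n - 1)
                throw new InvalidDataException("Expected N-1 names.");
            for (var i = 1; i < slices.Count; ++i)
                result.Add(new KeyValuePair<string, byte[]>(names[i - 1], slices[i]));
            return result;
        }
    }
}
```

Unpack returns a list of pairs rather than a dictionary because names are not guaranteed to be unique. A production reader would more likely return views into the original block instead of copying each buffer, since avoiding copies is a large part of the point of the aligned layout.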

We Built A New Data Format for Serializing Named Binary Buffers. Introducing BFAST
