NLF (Nested-List File) format

1  Overview

NLF (Nested-List File) format is a general, chunk-based file metaformat for binary data. It is intended as a modern replacement for the Electronic Arts Interchange File Format (EA IFF 85). Back in 1985, when the IFF specification was published, memory and mass storage were scarce resources by today's standards, and file sizes of several gibibytes must have seemed a long way off. The IFF's 4-byte chunk-size field, even when used (contrary to the original specification) to hold unsigned values, results in an upper bound of 4 gibibytes on the size of a chunk. For applications such as video and multi-channel audio, the 4 GiB limit is already proving too restrictive, and new formats (eg, the EBU's RF64 format for audio) have been developed to overcome the limit. NLF format is one such: a chunk-based metaformat that can accommodate chunks of up to 2^62 − 1 bytes.

These days, when designing a file format, the obvious technology to investigate is XML. XML, a text-based format, may be unsuitable for large amounts of binary data because the encoding of the data as text (eg, using Base64) increases the size of the data (by at least 33%, using Base64) and reduces efficiency. If neither of these disadvantages is likely to be significant, then XML is the way to go.

The past decade has seen the introduction of several metaformats that are intended to be a binary equivalent of XML (eg, Fast Infoset, an ISO standard; the W3C's EXI; Extensible Binary Meta Language, used in the Matroska container). As yet, their adoption appears to be quite limited, but one of them is almost certain to be a better alternative to NLF in any particular case.

The NLF format retains the key concept of the Interchange File Format: the division of a file into tagged chunks that have a header consisting of an identifier and a size field. Apart from the reserved identifiers of special chunks, an NLF chunk identifier can be any valid unprefixed XML name that is not longer than 255 bytes when encoded as a UTF-8 sequence. The chunk-size field is always 8 bytes long.

In contrast to the variety of IFF structures — FORM, LIST and CAT — the NLF format has only one structure, which gives it its name: the nested list. A nested list is a rooted tree whose leaf nodes are chunks, and whose branch nodes are lists of chunks and other lists. This general structure was considered to be adequate for the uses to which a Nested-List File might be put.

1.1  NLF format and XML

One of the design considerations of the NLF format was that it should be possible to convert any NLF to XML without much effort. To this end, some of the constraints on values in an NLF have been chosen in order to make them compatible with their analogues in XML. In particular, identifiers and the names of attributes must be valid unprefixed XML names, and namespace names must be well-formed URI references. There is an obvious similarity between the tree structure of an NLF and that of an XML document, but there is not a one-to-one correspondence between the nodes of the two trees. A simple chunk — a leaf node — in an NLF corresponds to an XML element that has no attributes or child elements, and a list — a branch node — in an NLF corresponds to an XML element that has attributes and child elements but contains no character data.
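
The correspondence described above can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not part of the format: the in-memory classes Chunk and NList are hypothetical, and two mapping choices are assumptions, namely that a list becomes an element named after its list-instance identifier, and that the binary data of a simple chunk is carried as Base64 text.

```python
import base64
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Chunk:
    """Hypothetical in-memory simple chunk: an identifier and raw data."""
    identifier: str
    data: bytes


@dataclass
class NList:
    """Hypothetical in-memory list: instance id, namespace, attributes, children."""
    instance_id: str
    namespace: Optional[str] = None
    attributes: dict[str, str] = field(default_factory=dict)
    children: list[Union[Chunk, "NList"]] = field(default_factory=list)


def to_element(node: Union[Chunk, NList]) -> ET.Element:
    """Convert an NLF node (and its descendants) to an ElementTree element."""
    if isinstance(node, Chunk):
        # Simple chunk -> element with no attributes or child elements.
        # Assumption: the binary data becomes Base64 character data.
        element = ET.Element(node.identifier)
        element.text = base64.b64encode(node.data).decode("ascii")
        return element
    # List -> element with attributes and child elements, no character data.
    # Assumption: the element is named after the list-instance identifier.
    element = ET.Element(node.instance_id, dict(node.attributes))
    if node.namespace is not None:
        # An NLF namespace name acts as a default namespace declaration.
        element.set("xmlns", node.namespace)
    for child in node.children:
        element.append(to_element(child))
    return element


doc = NList("doc", namespace="urn:example:nlf", attributes={"lang": "en"},
            children=[Chunk("payload", b"\x01\x02\x03")])
print(ET.tostring(to_element(doc), encoding="unicode"))
```

When the output is parsed back, the unprefixed attribute lang stays outside any namespace, while doc and payload fall into urn:example:nlf. That matches the NLF rule that attributes are not in a namespace.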

2  Files

A Nested-List File consists of a file header and a root list. The root list is the list at the top level of an NLF document. The structure of an NLF is shown in table 2.1.

Nested-List File
Multiplicity  Key       Description  Size
1             -         File header  8 bytes
1             rootList  Root list    n bytes

Table 2.1

The file header, shown in table 2.2, identifies a file as a Nested-List File and indicates the NLF version. There is currently only one version, 0 (zero), so the version field of the file header contains "00". The header also contains a flags byte and a reserved byte. Only one flag is defined in version 0: the byte-order flag, which is bit 0 of flags1 and whose values are shown in table 2.3. The reserved byte must be set to zero.

File header
Multiplicity  Key      Description                         Value           Size
1             fileId   Nested-List File identifier         U+0095 + "NLF"  4 bytes
1             version  File version, ASCII decimal digits  dd              2 bytes
1             flags1   Flags                               -               1 byte
1             -        Reserved                            0               1 byte

Table 2.2

Flags (flags1)
Bit  Value  Meaning
0    0      Multi-byte size fields are big-endian
0    1      Multi-byte size fields are little-endian

Table 2.3
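
A reader might validate the 8-byte file header along the following lines. This is a minimal Python sketch, and the function name parse_file_header is hypothetical. It assumes that the U+0095 in the file identifier is stored as the single byte 0x95, because that is the only way the identifier fits in the 4 bytes given in table 2.2. The byte order is returned as a struct format prefix ('>' or '<') so that later size fields can be decoded with it.

```python
# Assumption: U+0095 is stored as the single byte 0x95, so that
# U+0095 + "NLF" occupies the 4 bytes given in table 2.2.
NLF_FILE_ID = b"\x95NLF"


def parse_file_header(header: bytes) -> tuple[int, str]:
    """Validate an 8-byte NLF file header.

    Returns (version, byte_order), where byte_order is '>' (big-endian)
    or '<' (little-endian), ready to use as a struct format prefix.
    """
    if len(header) < 8:
        raise ValueError("file header must be 8 bytes long")
    if header[0:4] != NLF_FILE_ID:
        raise ValueError("not a Nested-List File")
    version_digits = header[4:6]
    if not all(0x30 <= b <= 0x39 for b in version_digits):
        raise ValueError("version must be two ASCII decimal digits")
    flags1 = header[6]
    if header[7] != 0:
        raise ValueError("reserved byte must be zero")
    # Bit 0 of flags1 is the byte-order flag (table 2.3).
    byte_order = "<" if flags1 & 0x01 else ">"
    return int(version_digits.decode("ascii")), byte_order


print(parse_file_header(b"\x95NLF00\x00\x00"))  # (0, '>')
print(parse_file_header(b"\x95NLF00\x01\x00"))  # (0, '<')
```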

2.1  Byte order

As standards go, IFF is a pretty good one: general and flexible enough for many uses. Maybe it was too general and flexible for Microsoft, who, jointly with IBM, introduced a variant, the Resource Interchange File Format (RIFF), a few years after the introduction of IFF. RIFF is essentially the same as IFF except that the byte order of its chunk-size fields is little-endian rather than big-endian. Ostensibly, RIFF was "needed" for the little-endian 80x86 processor, even though the byte order of IFF applied only to the chunk-size fields; the chunks themselves can contain anything you like, which is one of the virtues of IFF.

The NLF format caters for both big-endian and little-endian multi-byte integers by the simple expedient of a flag in the file header. The byte order denoted by the flag applies globally to all multi-byte size fields: those in chunk headers, in list-header extensions and within attributes chunks.

3  Identifiers

Identifiers are used in two ways in a Nested-List File: to identify a chunk, and to identify the instance of a list. Every chunk in a Nested-List File has an identifier, which need not be unique. The identifier is the first field in a chunk header; it consists of a size byte followed by a UTF-8 sequence of between 1 and 255 bytes that encodes a Unicode string. The string is referred to as the value of the identifier. The structure of an identifier is shown in table 3.1.

The legal values of an identifier are restricted to valid unprefixed XML names (XML names that don't contain a ':') to allow the conversion of an NLF to XML. Identifiers that begin with the character '$' are reserved.

Identifier
Multiplicity  Key    Description         Value   Size
1             size   Size of identifier  1..255  1 byte
1             value  Identifier value    -       size bytes

Table 3.1
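
Encoding or decoding an identifier amounts to writing or reading the size byte in front of the UTF-8 sequence. The function names in this Python sketch are hypothetical. The sketch checks only the encoded length and the reserved '$' prefix, and leaves out the check that a general identifier is a valid unprefixed XML name.

```python
def encode_identifier(value: str) -> bytes:
    """Encode an identifier as a size byte followed by its UTF-8 bytes."""
    data = value.encode("utf-8")
    if not 1 <= len(data) <= 255:
        raise ValueError("identifier must encode to 1..255 UTF-8 bytes")
    return bytes([len(data)]) + data


def decode_identifier(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode the identifier at buf[offset]; return (value, next offset)."""
    size = buf[offset]
    if size == 0:
        raise ValueError("identifier size must be at least 1")
    end = offset + 1 + size
    if end > len(buf):
        raise ValueError("identifier runs past the end of the buffer")
    return buf[offset + 1:end].decode("utf-8"), end


def is_reserved(value: str) -> bool:
    """Identifiers beginning with '$' are reserved for special chunks."""
    return value.startswith("$")
```

For example, encode_identifier("$LIST") yields the 6 bytes b"\x05$LIST", which agrees with the size given for the list identifier in table 5.1. Note that the size byte counts UTF-8 bytes rather than characters, so a 5-character name such as "Größe" needs a size of 7.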

4  Chunks

The chunk is the primary structural unit of a Nested-List File. A chunk is either a simple chunk or a list. A list is a chunk that can contain other chunks. In the tree structure of a Nested-List File, simple chunks are leaf nodes and lists are branch nodes.

Chunks are also categorised as either general or special; a special chunk is one that has a reserved identifier. There are currently two kinds of special chunk: a list and an attributes chunk. The form and role of a special chunk are predetermined.

A chunk is composed of a header and data. The chunk data follow immediately after the header. The header consists of an identifier and a size field, as shown in table 4.1. The size of a chunk that appears in the size field is the size of the chunk data; it does not include the size of the header. Note that, unlike IFF chunks, the chunks in an NLF are not padded to an even length.

The extent of a chunk can be determined from its header in two steps:

  1. read the first byte of the header to get the size of the identifier, and thus to locate the size field;
  2. read the size field to locate the end of the chunk.
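
The two steps translate directly into code. In this Python sketch, the function name chunk_extent is hypothetical, and byte_order is assumed to be the struct prefix derived from the byte-order flag in the file header. Because NLF chunks are not padded, the returned data_end is also the offset of the next chunk.

```python
import struct

MAX_CHUNK_SIZE = 2**62 - 1  # largest value allowed in a chunk-size field


def chunk_extent(buf: bytes, offset: int, byte_order: str = ">") -> tuple[str, int, int]:
    """Locate the chunk starting at buf[offset].

    Returns (identifier, data_start, data_end). Because chunks are not
    padded, data_end is also the offset of the next sibling chunk.
    """
    # Step 1: the first header byte is the identifier size, which
    # locates the 8-byte size field.
    id_size = buf[offset]
    size_field = offset + 1 + id_size
    identifier = buf[offset + 1:size_field].decode("utf-8")
    # Step 2: the size field gives the size of the chunk data, which
    # does not include the header.
    (data_size,) = struct.unpack_from(byte_order + "Q", buf, size_field)
    if data_size > MAX_CHUNK_SIZE:
        raise ValueError("chunk size exceeds 2^62 - 1")
    data_start = size_field + 8
    data_end = data_start + data_size
    if data_end > len(buf):
        raise ValueError("chunk extends past the end of the buffer")
    return identifier, data_start, data_end


chunk = b"\x04NAME" + struct.pack(">Q", 3) + b"abc"
print(chunk_extent(chunk, 0))  # ('NAME', 13, 16)
```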

A chunk is in the namespace declared by its nearest ancestor that has a namespace name (a list that has a namespace name of its own is in that namespace); ie, the first namespace name that is encountered when ascending the list tree. If no namespace name is encountered, the chunk is not in a namespace.

Chunk header
Multiplicity  Key   Description    Value      Size
1             id    Identifier     -          1 + n bytes
1             size  Size of chunk  0..2^62−1  8 bytes

Table 4.1
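
Writing a simple chunk is the reverse operation: emit the identifier, then the size of the data, then the data itself, with no padding. The Python helper below is hypothetical. It assumes that the caller has already chosen the byte order from the file-header flag and has checked that the identifier is a valid unprefixed XML name.

```python
import struct

MAX_CHUNK_SIZE = 2**62 - 1


def encode_chunk(identifier: str, data: bytes, byte_order: str = ">") -> bytes:
    """Serialize a simple chunk: identifier, 8-byte size field, data."""
    id_bytes = identifier.encode("utf-8")
    if not 1 <= len(id_bytes) <= 255:
        raise ValueError("identifier must encode to 1..255 UTF-8 bytes")
    if len(data) > MAX_CHUNK_SIZE:
        raise ValueError("chunk data exceeds 2^62 - 1 bytes")
    header = bytes([len(id_bytes)]) + id_bytes + struct.pack(byte_order + "Q", len(data))
    # The size field counts only the data; nothing follows the data.
    return header + data
```

For example, encode_chunk("NAME", b"abc") produces 16 bytes: a 5-byte identifier, an 8-byte size field holding 3, and the 3 data bytes.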

5  Lists

A list is a special chunk that can contain other chunks, including other lists. It has the reserved identifier "$LIST". The chunks that are elements of a list are the children of the list, and the list that is the immediate container of a chunk is the parent of the chunk. There is no restriction on the number of children that a list may have; the only restriction is on the size of the list chunk.

A list has a header that is composed of a chunk header and a list-header extension, which consists of a list-instance identifier and a namespace name. The size of a list does not include the size of the chunk header, but it does include the size of the list-header extension. The structure of a list header is shown in table 5.1.

List header
Multiplicity  Key            Description               Value      Size
1             id             List identifier           "$LIST"    6 bytes
1             size           Size of list              0..2^62−1  8 bytes
1             instanceId     List-instance identifier  -          1 + n bytes
1             nsNameSize     Size of namespace name    0..65535   2 bytes
1             namespaceName  Namespace name            -          nsNameSize bytes

Table 5.1
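
The following Python sketch reads a list header and then walks the list's children. It does this by applying the two-step extent calculation of section 4 repeatedly until the end of the list is reached. The function name read_list is hypothetical. Treating a namespace-name size of 0 as "no namespace name" is an assumption.

```python
import struct


def read_list(buf: bytes, offset: int, byte_order: str = ">"):
    """Parse the list whose chunk header starts at buf[offset].

    Returns (instance_id, namespace, children): namespace is None when
    the list has no namespace name, and children is a list of
    (identifier, start, end) tuples, one per child chunk.
    """
    id_size = buf[offset]
    pos = offset + 1
    if buf[pos:pos + id_size] != b"$LIST":
        raise ValueError("not a list chunk")
    pos += id_size
    (list_size,) = struct.unpack_from(byte_order + "Q", buf, pos)
    pos += 8
    # The list size excludes the chunk header but includes the
    # list-header extension that follows it.
    list_end = pos + list_size
    if list_end > len(buf):
        raise ValueError("list extends past the end of the buffer")
    # List-instance identifier: size byte + UTF-8 sequence.
    inst_size = buf[pos]
    instance_id = buf[pos + 1:pos + 1 + inst_size].decode("utf-8")
    pos += 1 + inst_size
    # Namespace name: 2-byte size + UTF-8 sequence (assumed absent if 0).
    (ns_size,) = struct.unpack_from(byte_order + "H", buf, pos)
    pos += 2
    namespace = buf[pos:pos + ns_size].decode("utf-8") if ns_size else None
    pos += ns_size
    # Children follow the extension back to back, up to list_end.
    children = []
    while pos < list_end:
        child_id_size = buf[pos]
        size_field = pos + 1 + child_id_size
        child_id = buf[pos + 1:size_field].decode("utf-8")
        (child_size,) = struct.unpack_from(byte_order + "Q", buf, size_field)
        child_end = size_field + 8 + child_size
        if child_end > list_end:
            raise ValueError("child chunk overruns its parent list")
        children.append((child_id, pos, child_end))
        pos = child_end
    if pos != list_end:  # only possible if the extension overran the list
        raise ValueError("list-header extension overruns the list size")
    return instance_id, namespace, children
```

The list size is used only as a bound. A list can therefore hold any number of children, as long as they fit within that size.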

5.1  Namespaces

Like other aspects of the NLF format, namespaces are modelled on the equivalent feature in XML, though namespaces in NLF are coarser and less flexible than those in XML. In particular, there are no namespace prefixes in an NLF: a namespace name in an NLF list corresponds to a default namespace declaration in an XML element. A more elaborate namespace scheme, with prefixed chunk identifiers and namespace bindings, was considered unnecessary for NLFs.

An NLF list may have a namespace name, which is specified in a field in the list header. The field consists of a 2-byte size followed by a UTF-8 sequence of between 0 and 65535 bytes that encodes the namespace name. A namespace name must be a well-formed URI reference, eg, data:text/plain;charset=UTF-8,undenkbar. If the field contains a namespace name, a namespace with that name is declared on the list and its contents. The namespace applies to the list itself and to all of its descendants that do not come within the scope of a nested namespace declaration. In other words, the namespace that applies to a chunk is the first namespace name that is encountered when ascending the list tree from the chunk. If no namespace name is encountered, the chunk is not in a namespace.
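
Resolving the namespace of a chunk therefore means walking up the tree until a namespace name is found. This Python sketch assumes a hypothetical in-memory Node class with a parent link. Only lists carry a namespace name, so a simple chunk always inherits one from its ancestors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical in-memory NLF node (not defined by the format)."""
    identifier: str
    parent: Optional["Node"] = None
    namespace: Optional[str] = None  # set only on lists that declare one


def effective_namespace(node: Node) -> Optional[str]:
    """Return the first namespace name met when ascending from node.

    The search starts at the node itself, because a list's own
    namespace name applies to that list.
    """
    current: Optional[Node] = node
    while current is not None:
        if current.namespace is not None:
            return current.namespace
        current = current.parent
    return None  # no ancestor declares a namespace
```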

6  Attributes chunks

An attributes chunk is a special chunk that contains a list of name–value pairs. It has the reserved identifier "$ATTR". A list may contain at most one attributes chunk, which, if present, must be the first chunk in the list. The attributes of a list are intended to be analogous to the attributes of an element in an XML document.

Attributes chunk
Multiplicity  Key   Description                  Value      Size
1             id    Attributes chunk identifier  "$ATTR"    6 bytes
1             size  Size of chunk                0..2^62−1  8 bytes
0..*          -     Attribute                    -          n bytes

Table 6.1

The name and value of an attribute are Unicode strings that each appear in an NLF as two size bytes followed by a UTF-8 sequence of up to 65535 bytes that encodes the string. The structure of an attribute is shown in table 6.2. An attribute name must contain at least one character, whereas an attribute value may be empty. The names of the attributes within an attributes chunk must be unique. The legal values of an attribute name are restricted to valid unprefixed XML names (XML names that don't contain a ':') to allow the conversion of an NLF to XML.

Although an attributes chunk is in the same namespace as its parent list, the attributes themselves are considered not to be in a namespace.

Attribute
Multiplicity  Key        Description    Value     Size
1             nameSize   Size of name   1..65535  2 bytes
1             name       Name           -         nameSize bytes
1             valueSize  Size of value  0..65535  2 bytes
1             value      Value          -         valueSize bytes

Table 6.2
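
An attributes chunk can be written and read as sketched below in Python. The function names are hypothetical. The sketch assumes that, as for every chunk (section 4), the "$ATTR" identifier is followed by an 8-byte size field. It checks sizes, empty names and duplicate names, but not that names are valid unprefixed XML names.

```python
import struct


def encode_attributes(attrs: dict[str, str], byte_order: str = ">") -> bytes:
    """Serialize an attributes chunk ("$ATTR") from name-value pairs.

    Assumes an 8-byte size field follows the identifier, as for any
    chunk. Dict keys already guarantee unique attribute names.
    """
    body = bytearray()
    for name, value in attrs.items():
        name_bytes = name.encode("utf-8")
        value_bytes = value.encode("utf-8")
        if not 1 <= len(name_bytes) <= 0xFFFF:
            raise ValueError(f"attribute name {name!r} must encode to 1..65535 bytes")
        if len(value_bytes) > 0xFFFF:
            raise ValueError(f"value of {name!r} exceeds 65535 bytes")
        body += struct.pack(byte_order + "H", len(name_bytes)) + name_bytes
        body += struct.pack(byte_order + "H", len(value_bytes)) + value_bytes
    header = b"\x05$ATTR" + struct.pack(byte_order + "Q", len(body))
    return header + bytes(body)


def decode_attributes(data: bytes, byte_order: str = ">") -> dict[str, str]:
    """Parse the data of an attributes chunk (the bytes after its header)."""
    attrs: dict[str, str] = {}
    pos = 0
    while pos < len(data):
        (name_size,) = struct.unpack_from(byte_order + "H", data, pos)
        name = data[pos + 2:pos + 2 + name_size].decode("utf-8")
        pos += 2 + name_size
        (value_size,) = struct.unpack_from(byte_order + "H", data, pos)
        value = data[pos + 2:pos + 2 + value_size].decode("utf-8")
        pos += 2 + value_size
        if pos > len(data):
            raise ValueError("attribute runs past the end of the chunk")
        if name_size == 0:
            raise ValueError("attribute name must not be empty")
        if name in attrs:
            raise ValueError(f"duplicate attribute name {name!r}")
        attrs[name] = value
    return attrs
```

An empty value is encoded as a 2-byte size of zero with no UTF-8 bytes after it, so {"lang": "en", "note": ""} occupies 10 + 8 = 18 bytes of chunk data.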

Last modified: 2015-05-20