Working with Manifests
Every data package is backed by a manifest. A manifest is a self-contained reference sheet for a package, containing all of the files data and metadata necessary to work with a package.
Every time you save a data package to a registry you also save its manifest. You can inspect the manifest yourself using the manifest
property:
Manifests saved to disk are in the jsonl/ndjson format, e.g. JSON strings separated by newlines (). They are represented as a list
of dict
fragments in-memory.
Manifest specification
The first item in the manifest contains the manifest version number (version
), the package metadata (user_meta
), and a package commit message (message
).
version
is used to ensure backwards compatibility should the serialization format change. There is currently only one valid version
: v0
.
user_meta
is used to store any user-defined package metadata. In this case this package has no package metadata (yet) so user_meta
is omitted.
message
stores the package commit message. It will be None
if the package is pushed without a commit message. The field is omitted if the package was never subject to a push
.
Every item after that is a manifest entry. Entries may be files/objects or directories. All files in the package will have a corresponding entry, as will all directories with metadata. Directories without metadata are omitted.
The manifest fields are as follows:
logical_key
- The path to the entry within the package.physical_keys
- A list of files. Currently this field will always have a single entry. This field is omitted if the entry is a directory.size
- The size of the entry in raw bytes. This field is omitted if the entry is a directory.hash
- Materialized packages record a content hash for every entry in the package. This field is used to ensure package immutability (the tophash is partly a hash of these hashes).If the hash is present it will be a
dict
fragment of the form{'type': 'SHA256', 'value': '...'}
. Un-materialized package entries have ahash
ofNone
, as in our example. Directory entries omit this field.meta
- Package entry metadata. Package entries lacking metadata will have ameta
of{}
(emptydict
).
Saving and loading manifests
In almost all cases you should be using registries, build
, and push
to handle sending manifests to and fro. However, there may be advanced use cases where you want to save or load a manifest directly. For that, you can use the low-level manifest API:
Last updated