quilt3.Package
Package(self)
In-memory representation of a package
manifest
Provides a generator of the dicts that make up the serialized package.
top_hash
Returns the top hash of the package.
Note that physical keys are not hashed because the package has the same semantics regardless of where the bytes come from.
Returns
A string that represents the top hash of the package
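The note about physical keys can be illustrated with a toy hash. This is a minimal sketch, not quilt3's actual hashing algorithm: `toy_top_hash` and the entry dict layout are hypothetical. It hashes each entry's logical key, size, content hash, and metadata while skipping the physical key, so two manifests that differ only in where the bytes live produce the same digest.

```python
import hashlib
import json


def toy_top_hash(entries):
    """Hash every entry field except the physical key (illustrative only)."""
    digest = hashlib.sha256()
    for logical_key in sorted(entries):
        entry = entries[logical_key]
        # Deliberately leave out entry["physical_key"].
        hashed_fields = {
            "logical_key": logical_key,
            "size": entry["size"],
            "hash": entry["hash"],
            "meta": entry.get("meta", {}),
        }
        digest.update(json.dumps(hashed_fields, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


# Same content, different storage locations (hypothetical paths).
local = {"data.csv": {"physical_key": "file:///tmp/data.csv", "size": 3,
                      "hash": {"type": "SHA256", "value": "abc"}}}
remote = {"data.csv": {"physical_key": "s3://bucket/data.csv", "size": 3,
                       "hash": {"type": "SHA256", "value": "abc"}}}
same_hash = toy_top_hash(local) == toy_top_hash(remote)
```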
Package.__repr__(self, max_lines=20)
String representation of the Package.
Package.install(name, registry=None, top_hash=None, dest=None, dest_registry=None)
Installs a named package to the local registry and downloads its files.
Arguments
name(str): Name of package to install.
registry(str): Registry where package is located.
Defaults to the default remote registry.
top_hash(str): Hash of package to install. Defaults to latest.
dest(str): Local path to download files to.
dest_registry(str): Registry to install package to. Defaults to local registry.
Returns
A new Package that points to files on your local machine.
Package.resolve_hash(registry, hash_prefix)
Find a hash that starts with a given prefix.
Arguments
registry(string): location of registry
hash_prefix(string): hash prefix with length between 6 and 64 characters
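Prefix resolution can be sketched over a plain list of hashes. The helper name `resolve_prefix`, the exception types, and the handling of ambiguous prefixes are assumptions made for illustration; only the 6 to 64 character length rule comes from the description above.

```python
def resolve_prefix(known_hashes, hash_prefix):
    """Return the single hash in known_hashes that starts with hash_prefix."""
    if not 6 <= len(hash_prefix) <= 64:
        raise ValueError("hash_prefix must be between 6 and 64 characters long")
    matches = [h for h in known_hashes if h.startswith(hash_prefix)]
    if not matches:
        raise LookupError("no hash starts with " + hash_prefix)
    if len(matches) > 1:
        # Assumed behavior: refuse to guess between several candidates.
        raise LookupError("prefix " + hash_prefix + " is ambiguous")
    return matches[0]
```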
Package.browse(name, registry=None, top_hash=None)
Load a package into memory from a registry without making a local copy of the manifest.
Arguments
name(string): name of package to load
registry(string): location of registry to load package from
top_hash(string): top hash of package version to load
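As a usage sketch, browsing pairs naturally with a membership check on the result. The helper `package_contains` is hypothetical, and the package name and registry in the note below are placeholders; `quilt3` is imported inside the function so the definition loads even where the library is not installed.

```python
def package_contains(name, registry, logical_key, top_hash=None):
    """Browse a package and report whether it contains logical_key."""
    import quilt3  # lazy import: only needed when the function is called

    pkg = quilt3.Package.browse(name, registry=registry, top_hash=top_hash)
    # Uses Package.__contains__, so no files are downloaded here.
    return logical_key in pkg
```

A call might look like `package_contains("user/package", "s3://example-bucket", "README.md")`.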
Package.__contains__(self, logical_key)
Checks whether the package contains a specified logical_key.
Returns
True or False
Package.__getitem__(self, logical_key)
Filters the package based on prefix, and returns either a new Package or a PackageEntry.
Arguments
logical_key(str): logical key or prefix to filter on
Returns
A PackageEntry if logical_key matches an entry exactly; otherwise a Package.
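The exact-match versus prefix behavior can be modeled with a flat dict of logical keys. This is a sketch under assumptions: `get_item` is hypothetical, a real Package is not a dict, and both the relative keys in the returned subset and the `KeyError` for an unknown prefix are illustrative choices.

```python
def get_item(entries, logical_key):
    """Return one entry on an exact match, else the subset under the prefix."""
    if logical_key in entries:
        return entries[logical_key]
    prefix = logical_key.rstrip("/") + "/"
    # Keys in the subset are made relative to the prefix (assumed behavior).
    subset = {
        key[len(prefix):]: value
        for key, value in entries.items()
        if key.startswith(prefix)
    }
    if not subset:
        raise KeyError(logical_key)
    return subset
```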
Package.fetch(self, dest='./')
Copy all descendants to dest. Descendants are written under their logical names relative to self.
Arguments
dest: where to put the files (locally)
Returns
None
Package.keys(self)
Returns logical keys in the package.
Package.walk(self)
Generator that traverses all entries in the package tree and returns tuples of (key, entry), with keys in alphabetical order.
Package.load(readable_file)
Loads a package from a readable file-like object.
Arguments
readable_file: readable file-like object to deserialize package from
Returns
A new Package object
Raises
file not found
json decode error
invalid package exception
Package.set_dir(self, lkey, path=None, meta=None)
Adds all files from path to the package.
Recursively enumerates every file in path, and adds each one to the package according to its location relative to path.
Arguments
lkey(string): prefix to add to every logical key,
use '/' for the root of the package.
path(string): path to scan for files to add to package.
If None, lkey will be substituted in as the path.
meta(dict): user level metadata dict to attach to lkey directory entry.
Returns
self
Raises
When path doesn't exist
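The mapping from files on disk to logical keys can be sketched with the standard library. `logical_keys_for_dir` is a hypothetical helper, not part of quilt3, and raising `FileNotFoundError` for a missing path is an assumption, since the exception type is not stated above.

```python
import tempfile
from pathlib import Path


def logical_keys_for_dir(lkey, path):
    """Map 'lkey + relative path' to the file found under path."""
    root = Path(path)
    if not root.is_dir():
        raise FileNotFoundError(str(path))
    # '/' stands for the root of the package, so it adds no prefix.
    prefix = "" if lkey in ("", "/") else lkey.strip("/") + "/"
    mapping = {}
    for file_path in root.rglob("*"):
        if file_path.is_file():
            relative = file_path.relative_to(root).as_posix()
            mapping[prefix + relative] = str(file_path)
    return mapping


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sub").mkdir()
    Path(tmp, "a.txt").write_text("a")
    Path(tmp, "sub", "b.txt").write_text("b")
    root_keys = sorted(logical_keys_for_dir("/", tmp))
    prefixed_keys = sorted(logical_keys_for_dir("raw", tmp))
```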
Package.get(self, logical_key)
Gets object from logical_key and returns its physical path. Equivalent to self[logical_key].get().
Arguments
logical_key(string): logical key of the object to get
Returns
Physical path as a string.
Raises
KeyError: when logical_key is not present in the package
ValueError: if the logical_key points to a Package rather than PackageEntry.
Package.readme(self)
Returns the README PackageEntry
The README is the entry with the logical key 'README.md' (case-sensitive). Will raise a QuiltException if no such entry exists.
Package.set_meta(self, meta)
Sets user metadata on this Package.
Package.build(self, name, registry=None, message=None)
Serializes this package to a registry.
Arguments
name: optional name for package
registry: registry to build to
message: the commit message of the package
Returns
The top hash as a string.
Package.dump(self, writable_file)
Serializes this package to a writable file-like object.
Arguments
writable_file: file-like object to write serialized package.
Returns
None
Raises
fail to create file
fail to finish write
Package.set(self, logical_key, entry=None, meta=None, serialization_location=None, serialization_format_opts=None)
Returns self with the object at logical_key set to entry.
Arguments
logical_key(string): logical key to update
entry(PackageEntry OR string OR object): new entry to place at logical_key in the package.
If entry is a string, it is treated as a URL, and an entry is created based on it.
If entry is None, the logical key string will be substituted as the entry value.
If entry is an object and quilt knows how to serialize it, it will immediately be serialized and written
to disk, either to serialization_location or to a location managed by quilt. List of types that Quilt
can serialize is available by calling
quilt3.formats.FormatRegistry.all_supported_formats()
meta(dict): user level metadata dict to attach to entry
serialization_format_opts(dict): Optional. If passed in, only used if entry is an object. Options to help
Quilt understand how the object should be serialized. Useful for underspecified file formats like csv
when content contains confusing characters. Will be passed as kwargs to the FormatHandler.serialize()
function. See docstrings for individual FormatHandlers for full list of options -
https://github.com/quiltdata/quilt/blob/master/api/python/quilt3/formats.py
serialization_location(string): Optional. If passed in, only used if entry is an object. Where the
serialized object should be written, e.g. "./mydataframe.parquet"
Returns
self
Package.delete(self, logical_key)
Returns the package with logical_key removed.
Returns
self
Raises
KeyError: when logical_key is not present to be deleted
Package.push(self, name, registry=None, dest=None, message=None, selector_fn=<function Package.<lambda> at 0x10d02aa70>)
Copies objects to dest, then creates a new package that points to those objects. Copies each object in this package to dest according to logical key structure, then adds to the registry a serialized version of this package with physical keys that point to the new copies.
Note that push is careful to not push data unnecessarily. To illustrate, imagine you have a PackageEntry: pkg["entry_1"].physical_key = "/tmp/package_entry_1.json"
If that entry would be pushed to s3://bucket/prefix/entry_1.json, but s3://bucket/prefix/entry_1.json already contains the exact same bytes as '/tmp/package_entry_1.json', quilt3 will not push the bytes to s3, no matter what selector_fn('entry_1', pkg["entry_1"]) returns.
However, selector_fn will dictate whether the new package points to the local file or to s3:
If selector_fn('entry_1', pkg["entry_1"]) == False, new_pkg["entry_1"] = ["/tmp/package_entry_1.json"]
If selector_fn('entry_1', pkg["entry_1"]) == True, new_pkg["entry_1"] = ["s3://bucket/prefix/entry_1.json"]
Arguments
name: name for package in registry
dest: where to copy the objects in the package
registry: registry where to create the new package
message: the commit message for the new package
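selector_fn: An optional function that determines which package entries should be copied to S3. The function takes two arguments, a logical key and a PackageEntry, and returns True if the entry should be copied and False otherwise.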
Returns
A new package that points to the copied objects.
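A selector for the scenario above could look like the following sketch. `push_only_local` is a hypothetical name; it relies only on the `physical_key` attribute shown in the example above and treats any key that does not start with `s3://` as local.

```python
def push_only_local(logical_key, entry):
    """Return True for entries whose bytes are not already stored in S3."""
    # str() keeps this working whether physical_key is a string or an object.
    return not str(entry.physical_key).startswith("s3://")
```

It would be passed as `pkg.push(name, registry=registry, selector_fn=push_only_local)`, so entries that already live in S3 keep their existing physical keys in the new package.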
Package.rollback(name, registry, top_hash)
Set the "latest" version to the given hash.
Arguments
name(str): Name of package to rollback.
registry(str): Registry where package is located.
top_hash(str): Hash to rollback to.
Package.diff(self, other_pkg)
Returns three lists -- added, modified, deleted.
Added: present in other_pkg but not in self. Modified: present in both, but different. Deleted: present in self, but not other_pkg.
Arguments
other_pkg: Package to diff
Returns
added, modified, deleted (all lists of logical keys)
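The three lists can be modeled with two dicts that map logical keys to a comparable value such as a content hash. `diff_keys` is a hypothetical stand-in for illustration, and sorting the output is an assumption.

```python
def diff_keys(self_entries, other_entries):
    """Return (added, modified, deleted) logical keys, as in Package.diff."""
    added = sorted(k for k in other_entries if k not in self_entries)
    deleted = sorted(k for k in self_entries if k not in other_entries)
    modified = sorted(
        k for k in self_entries
        if k in other_entries and self_entries[k] != other_entries[k]
    )
    return added, modified, deleted
```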
Package.map(self, f, include_directories=False)
Performs a user-specified operation on each entry in the package.
Arguments
f(x, y): function
The function to be applied to each package entry.
It should take two inputs, a logical key and a PackageEntry.
include_directories: bool
Whether or not to include directory entries in the map.
Returns
list: The list of results generated by the map.
Package.filter(self, f, include_directories=False)
Applies a user-specified operation to each entry in the package, removing results that evaluate to False from the output.
Arguments
f(x, y): function
The function to be applied to each package entry.
It should take two inputs, a logical key and a PackageEntry.
This function should return a boolean.
include_directories: bool
Whether or not to include directory entries in the filter.
Returns
A new package with entries that evaluated to False removed
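A predicate for filter takes a logical key and a PackageEntry and returns a boolean. The sketch below assumes the entry exposes its byte count as a `size` attribute, which is inferred from the PackageEntry constructor arguments rather than stated here; `is_small_csv` and the size limit are hypothetical.

```python
def is_small_csv(logical_key, entry, max_bytes=1_000_000):
    """Keep CSV entries no larger than max_bytes."""
    # entry.size is assumed to hold the object's size in bytes.
    return logical_key.endswith(".csv") and entry.size <= max_bytes
```

Since filter calls the function with two arguments, `pkg.filter(is_small_csv)` would use the default limit.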
Package.verify(self, src, extra_files_ok=False)
Check if the contents of the given directory matches the package manifest.
Arguments
src(str): URL of the directory
extra_files_ok(bool): Whether extra files in the directory should cause a failure.
Returns
True if the package matches the directory; False otherwise.
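The effect of extra_files_ok can be sketched against a local directory. This is an illustrative model, not quilt3's implementation: `directory_matches` is hypothetical, the manifest is reduced to a dict of relative path to SHA-256 hex digest, and src is treated as a local path rather than a URL.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_matches(manifest, src, extra_files_ok=False):
    """Check files under src against {relative path: sha256 hex digest}."""
    root = Path(src)
    found = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    expected = set(manifest)
    if not expected <= found:
        return False  # a file listed in the manifest is missing
    if not extra_files_ok and found - expected:
        return False  # the directory holds files the manifest does not list
    return all(
        hashlib.sha256((root / key).read_bytes()).hexdigest() == digest
        for key, digest in manifest.items()
    )


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_bytes(b"alpha")
    manifest = {"a.txt": hashlib.sha256(b"alpha").hexdigest()}
    exact_match = directory_matches(manifest, tmp)
    Path(tmp, "extra.txt").write_bytes(b"extra")
    strict_with_extra = directory_matches(manifest, tmp)
    lenient_with_extra = directory_matches(manifest, tmp, extra_files_ok=True)
```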
PackageEntry(self, physical_key, size, hash_obj, meta)
Represents an entry at a logical key inside a package.
__init__
Creates an entry.
Arguments
physical_key: a URI (either s3:// or file://)
size(number): size of object in bytes
hash({'type': string, 'value': string}): hash object
for example: {'type': 'SHA256', 'value': 'bb08a...'}
meta(dict): metadata dictionary
Returns
a PackageEntry
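The hash object can be built with the standard library. This sketch assumes the value is the hexadecimal SHA-256 digest of the object's bytes, which matches the shape of the example above but is not confirmed by it; `make_hash_obj` is a hypothetical helper.

```python
import hashlib


def make_hash_obj(data):
    """Build a {'type': ..., 'value': ...} hash dict for raw bytes."""
    return {"type": "SHA256", "value": hashlib.sha256(data).hexdigest()}


payload = b"hello"
# len(payload) is the matching value for the size argument.
hash_obj, size = make_hash_obj(payload), len(payload)
```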
physical_keys
Deprecated
PackageEntry.as_dict(self)
Returns dict representation of entry.
PackageEntry.set_meta(self, meta)
Sets the user_meta for this PackageEntry.
PackageEntry.set(self, path=None, meta=None)
Returns self with the physical key set to path.
Arguments
logical_key(string): logical key to update
path(string): new path to place at logical_key in the package
Currently only supports a path on local disk
meta(dict): metadata dict to attach to entry. If meta is provided, set just
updates the meta attached to logical_key without changing anything
else in the entry
Returns
self
PackageEntry.get(self)
Returns the physical key of this PackageEntry.
PackageEntry.get_cached_path(self)
Returns a locally cached physical key, if available.
PackageEntry.get_bytes(self, use_cache_if_available=True)
Returns the bytes of the object this entry corresponds to. If 'use_cache_if_available'=True, will first try to retrieve the bytes from cache.
PackageEntry.get_as_json(self, use_cache_if_available=True)
Returns a JSON file as a dict. Assumes that the file is encoded using utf-8.
If 'use_cache_if_available'=True, will first try to retrieve the object from cache.
PackageEntry.get_as_string(self, use_cache_if_available=True)
Return the object as a string. Assumes that the file is encoded using utf-8.
If 'use_cache_if_available'=True, will first try to retrieve the object from cache.
PackageEntry.deserialize(self, func=None, **format_opts)
Returns the object this entry corresponds to.
Arguments
func: Skip normal deserialization process, and call func(bytes),
returning the result directly.
**format_opts: Some data formats may take options. Though
normally handled by metadata, these can be overridden here.
Returns
The deserialized object from the logical_key
Raises
physical key failure
hash verification fail
when deserialization metadata is not present
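A custom func receives the raw bytes and its return value is handed back directly. The following sketch shows one such function; `rows_from_csv_bytes` is a hypothetical name and it assumes the object is UTF-8 encoded CSV.

```python
import csv
import io


def rows_from_csv_bytes(data):
    """Decode UTF-8 CSV bytes into a list of rows (each a list of strings)."""
    text = data.decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

It would be used as `entry.deserialize(func=rows_from_csv_bytes)`.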
PackageEntry.fetch(self, dest=None)
Gets objects from entry and saves them to dest.
Arguments
dest: where to put the files
Defaults to the entry name
Returns
None
PackageEntry.__call__(self, func=None, **kwargs)
Shorthand for self.deserialize()