Background

  • package handles
    • Packages are referenced by a handle of the form OWNER/NAME
    • Team packages are prefixed with the team name: TEAM:OWNER/NAME
  • READMEs
  • Short hashes
    • Commands that take hashes also accept "short hashes": any prefix of the full hash, as long as that prefix is unique. In practice, 6-8 characters are usually enough.
      quilt install akarve/examples -x 4594b5
      # matches hash 4594b58d64dd9c98b79b628370618031c66e80cbbd1db48662be0b7cac36a74e
      
  • Requirements file (quilt.yml)
    $ quilt install [@FILENAME]
    # quilt.yml is the default if @FILENAME is absent
    
    • Installs a list of packages specified by a YAML file. The YAML file must contain a packages node with a list of packages:
      packages:
        - USER/PACKAGE[/SUBPACKAGE][:h[ash]|:t[ag]|:v[ersion]][:HASH|TAG|VERSION]
      
    • Example
      packages:
      - vgauthier/DynamicPopEstimate   # get latest
      - danWebster/sgRNAs:a972d92      # get a specific version via hash
      - akarve/sales:tag:latest        # get a specific version via tag
      - asah/snli:v:1.0                # get a specific version via version
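The requirement syntax above can be illustrated with a small parser. This is a hypothetical Python sketch, not Quilt's own parser. It assumes that a bare ":VALUE" with no h/t/v qualifier means a hash, as the danWebster example suggests, and it does not handle team-prefixed handles (TEAM:OWNER/NAME). The list of strings would normally come from the packages node of quilt.yml (for example via PyYAML's yaml.safe_load, not shown).

```python
# Qualifier spellings allowed by the syntax line: h[ash], t[ag], v[ersion].
KINDS = {"h": "hash", "hash": "hash", "t": "tag", "tag": "tag",
         "v": "version", "version": "version"}


def parse_requirement(spec):
    """Split a requirement into (owner, package, subpath, kind, value).

    Hypothetical sketch. Assumes a bare ':VALUE' is a hash and ignores
    team-prefixed handles, whose extra ':' this parser would misread.
    """
    handle, *rest = spec.split(":")
    owner, package, *subpath = handle.split("/")
    if not rest:
        kind, value = None, None            # no qualifier: get latest
    elif len(rest) == 1:
        kind, value = "hash", rest[0]       # assumed: bare value is a hash
    elif len(rest) == 2 and rest[0] in KINDS:
        kind, value = KINDS[rest[0]], rest[1]
    else:
        raise ValueError(f"cannot parse requirement {spec!r}")
    return owner, package, subpath, kind, value


for line in ["vgauthier/DynamicPopEstimate", "danWebster/sgRNAs:a972d92",
             "akarve/sales:tag:latest", "asah/snli:v:1.0"]:
    print(parse_requirement(line))
```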
      
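The short-hash rule from the list above is unique-prefix matching. The following Python sketch shows the idea. The helper `resolve_short_hash` and the in-memory hash list are hypothetical, and this is not how Quilt itself looks hashes up.

```python
def resolve_short_hash(prefix, known_hashes):
    """Return the one full hash in `known_hashes` that starts with `prefix`.

    Hypothetical helper, not part of Quilt. Raises ValueError when the
    prefix matches no hash or is ambiguous (matches several).
    """
    prefix = prefix.lower()
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if not matches:
        raise ValueError(f"no hash starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {len(matches)} hashes match")
    return matches[0]


# The first hash comes from the example above. The second is made up so
# that the four-character prefix "4594" becomes ambiguous.
hashes = [
    "4594b58d64dd9c98b79b628370618031c66e80cbbd1db48662be0b7cac36a74e",
    "4594" + "f" * 60,
]

print(resolve_short_hash("4594b5", hashes))  # prints the full 4594b58d... hash
```

This also shows why a few more characters are sometimes needed. "4594" is too short here because two hashes share it.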

API

Team users

See the Teams documentation for additional commands and syntax.

Core: build, push, and install packages

| Command line | Python | Description |
|---|---|---|
| quilt build USER/PACKAGE PATH | quilt.build("USER/PACKAGE", "PATH") | PATH may be a build.yml file or a directory. If a directory is given, Quilt generates a build file internally (useful, e.g., for directories of images). Use build.yml when you need fine-grained control over parsing. |
| quilt push USER/PACKAGE [--public │ --team] | quilt.push("USER/PACKAGE", is_public=False, is_team=False) | Stores the package in the registry |
| quilt install USER/PACKAGE[/SUBPATH/...] [-x HASH │ -t TAG │ -v VERSION] | quilt.install("USER/PACKAGE[/SUBPATH/...]", hash="HASH", tag="TAG", version="VERSION") | Installs a package or sub-package |
| quilt install [@FILENAME] | Not supported | Installs all packages listed in a requirements file (quilt.yml by default), using the requirements syntax above |
| quilt delete USER/PACKAGE | quilt.delete("USER/PACKAGE") | Removes the package from the registry. Does not delete local data. |

Versioning

| Command line | Python | Description |
|---|---|---|
| quilt log USER/PACKAGE | quilt.log("USER/PACKAGE") | Display push history |
| quilt version list USER/PACKAGE | quilt.version_list("USER/PACKAGE") | Display versions of a package |
| quilt version add USER/PACKAGE VERSION HASH | quilt.version_add("USER/PACKAGE", "VERSION", "HASH") | Associate a version with a hash |
| quilt tag list USER/PACKAGE | quilt.tag_list("USER/PACKAGE") | List available tags |
| quilt tag add USER/PACKAGE TAG HASH | quilt.tag_add("USER/PACKAGE", "TAG", "HASH") | Associate a tag with a hash |
| quilt tag remove USER/PACKAGE TAG | quilt.tag_remove("USER/PACKAGE", "TAG") | Remove a tag |

Instances, hashes, tags, and versions

  • A package instance is a package handle plus a hash. akarve/sales:fc7f0b is an instance. Instances are immutable.
  • Hashes are automatically generated by Quilt for each package build.
  • Tags are human-readable strings associated with a package instance. Tags can be altered to point to different instances of the same package. The most recent build is automatically tagged "latest".
  • Versions are human-readable strings associated with a package instance. Unlike tags, versions can only ever point to a single package instance.
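The difference between tags and versions comes down to whether a label may be re-pointed. The toy Python model below illustrates that difference. `InstanceLabels` and `record_build` are hypothetical names, and this is not Quilt's registry code. The short hash "fc7f0b" comes from the akarve/sales example above, while "a1b2c3" is a made-up placeholder.

```python
class InstanceLabels:
    """Toy model of tags (movable) and versions (write-once) for one package.

    Hypothetical illustration of the rules above, not Quilt's registry code.
    """

    def __init__(self):
        self.tags = {}      # tag name -> hash; may be re-pointed later
        self.versions = {}  # version string -> hash; fixed once set

    def tag_add(self, tag, pkghash):
        # Adding an existing tag simply moves it to the new instance.
        self.tags[tag] = pkghash

    def version_add(self, version, pkghash):
        # A version may never be moved to a different instance.
        current = self.versions.get(version)
        if current is not None and current != pkghash:
            raise ValueError(f"version {version!r} already points to {current}")
        self.versions[version] = pkghash

    def record_build(self, pkghash):
        # The most recent build is tagged "latest" automatically.
        self.tag_add("latest", pkghash)


labels = InstanceLabels()
labels.record_build("fc7f0b")
labels.version_add("1.0", "fc7f0b")
labels.record_build("a1b2c3")         # "latest" now points to the new build
labels.tag_add("stable", "fc7f0b")
labels.tag_add("stable", "a1b2c3")    # allowed: tags can move
# labels.version_add("1.0", "a1b2c3") would raise ValueError
```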

Access

| Command line | Python | Description |
|---|---|---|
| quilt login [TEAM] | quilt.login() or quilt.login("TEAM") | Authenticate to a registry |
| quilt access list USER/PACKAGE | quilt.access_list("USER/PACKAGE") | List users who have access to a package |
| quilt access add USER/PACKAGE USER_OR_GROUP | quilt.access_add("USER/PACKAGE", "USER_OR_GROUP") | Grant read access to a user or a group (either public or team) |
| quilt access remove USER/PACKAGE USER_OR_GROUP | quilt.access_remove("USER/PACKAGE", "USER_OR_GROUP") | Remove read access |

Local storage

| Command line | Python | Description |
|---|---|---|
| quilt ls | quilt.ls() | List installed packages |
| quilt rm USER/PACKAGE | quilt.rm("USER/PACKAGE") | Remove a package from local storage (but not from the registry) |

Search

| Command line | Python | Description |
|---|---|---|
| quilt search "SEARCH STRING" | quilt.search("SEARCH STRING") | Search the registry for packages by user or package name |

Export a package or subpackage

| Command line | Python | Description |
|---|---|---|
| quilt export USER/PACKAGE | quilt.export("USER/PACKAGE") | Export data to the current directory |
| quilt export USER/PACKAGE DEST | quilt.export("USER/PACKAGE", "DEST") | Export data to the specified destination |
| quilt export USER/PACKAGE [DEST] --force | quilt.export("USER/PACKAGE", "DEST", force=True) | Overwrite files at the destination |

If a node references raw (file) data, export can use symlinks instead of copying the data. Be cautious when exporting with symlinks:

  • On any OS
    • Editing an exported file through its symlink may modify, and corrupt, the data in the local Quilt package store
      • Preventing this is up to you
  • On Windows
    • Symlinks may not be supported
    • Creating symlinks may require special permissions
    • Creating symlinks may require an elevated ("Run as administrator") session, even if your account already has the appropriate permissions
| Command line | Python | Description |
|---|---|---|
| quilt export USER/PACKAGE [DEST] [--symlinks] | quilt.export("USER/PACKAGE", "DEST", symlinks=True) | Export data, using symlinks where possible |
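"Using symlinks where possible" implies a fallback to copying. The standard-library sketch below shows that pattern. The helper `export_file` is hypothetical and is not how quilt.export is implemented. It assumes that a failed os.symlink (an OSError, for example on Windows without the required privilege) should fall back to a copy, and it refuses to overwrite an existing destination, in the spirit of --force being opt-in.

```python
import os
import shutil
from pathlib import Path


def export_file(src, dest, use_symlinks=False):
    """Place `src` at `dest`, symlinking when requested and possible.

    Hypothetical helper, not quilt.export. Returns "symlink" or "copy"
    to report what happened. Refuses to overwrite an existing `dest`.
    """
    dest = Path(dest)
    if dest.exists() or dest.is_symlink():
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if use_symlinks:
        try:
            # Point at the absolute source path so the link works from anywhere.
            os.symlink(os.path.abspath(src), dest)
            return "symlink"
        except OSError:
            pass  # e.g. no symlink privilege on Windows: fall back to copying
    shutil.copy2(src, dest)
    return "copy"
```

When the function returns "symlink", writing to `dest` writes to `src` itself. That is exactly the package-store corruption risk described above.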

Import and use data

For a package in the public cloud:

from quilt.data.USER import PACKAGE

For a package in a team registry:

from quilt.team.TEAM_NAME.USER import PACKAGE

Using packages

Packages contain three types of nodes:

  • PackageNode - the root of the package tree
  • GroupNode - like a folder; may contain one or more GroupNode or DataNode objects
  • DataNode - a leaf node in the package; contains actual data

Working with package contents

  • Access child nodes with dot notation: PACKAGE.NODE.ANOTHER_NODE
  • Retrieve the contents of a DataNode with _data(), or simply call it with (): PACKAGE.NODE.ANOTHER_NODE()
    • Columnar data (XLS, CSV, TSV, etc.) is returned as a pandas.DataFrame
    • All other data types return a string containing the path to the object in the package store

Enumerating package contents

  • quilt.inspect("USER/PACKAGE") shows package columns, types, and shape
  • NODE._keys() returns a list of all children
  • NODE._data_keys() returns a list of all data children (leaf nodes containing actual data)
  • NODE._group_keys() returns a list of all group children (groups are like folders)
  • NODE._items() returns a generator of the node's children as (name, node) pairs.

Example

from quilt.data.uciml import wine
wine._keys()          # ['README', 'raw', 'tables']
wine._data_keys()     # ['README']
wine._group_keys()    # ['raw', 'tables']
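The node behaviors described above can be modeled with a few small classes. These include dot-notation access, calling a DataNode to get its data, and the _keys/_data_keys/_group_keys/_items enumerators. The Python sketch below is a toy model that reproduces the shape of the wine example. The Toy* classes are hypothetical stand-ins, not Quilt's node classes. The children of raw and tables, and every leaf value, are made-up placeholders.

```python
class ToyDataNode:
    """Leaf node: holds data; NODE() or NODE._data() returns it."""

    def __init__(self, value):
        self._value = value

    def _data(self):
        return self._value

    __call__ = _data  # NODE() is shorthand for NODE._data()


class ToyGroupNode:
    """Folder-like node whose children are reachable with dot notation."""

    def __init__(self, **children):
        self._children = dict(children)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for child names.
        children = self.__dict__.get("_children", {})
        if name in children:
            return children[name]
        raise AttributeError(name)

    def _keys(self):
        return list(self._children)

    def _data_keys(self):
        return [k for k, v in self._children.items() if isinstance(v, ToyDataNode)]

    def _group_keys(self):
        return [k for k, v in self._children.items() if isinstance(v, ToyGroupNode)]

    def _items(self):
        return (pair for pair in self._children.items())


class ToyPackageNode(ToyGroupNode):
    """Root of the tree; behaves like a group."""


# Placeholder contents shaped like the wine example above.
wine = ToyPackageNode(
    README=ToyDataNode("path/to/README.md"),
    raw=ToyGroupNode(wine_data=ToyDataNode("path/to/wine.data")),
    tables=ToyGroupNode(wine=ToyDataNode("stand-in for a DataFrame")),
)
print(wine._keys(), wine._data_keys(), wine._group_keys())
print(wine.raw.wine_data())
```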

Editing Package Contents

  • PACKAGENODE._set(PATH, VALUE) sets a child node. PATH is a list of strings, one per level of the tree, and VALUE is the new value. If VALUE is a pandas DataFrame, it is serialized. If VALUE is a string, it is interpreted as the path to a file containing the data to package: common columnar formats are read into DataFrames and serialized, and all other formats (e.g., images) are copied as-is.
  • GROUPNODE._add_group(NAME) adds an empty GroupNode named NAME to the children of GROUPNODE.
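The way a PATH list addresses one level of the tree per element can be shown on plain nested dicts. This hypothetical Python sketch uses dicts as groups and any other value as a leaf. `set_path` and `add_group` are illustrative names, not Quilt functions. The sketch also assumes that missing intermediate groups are created along the way, which the text above does not state.

```python
def set_path(tree, path, value):
    """Hypothetical analogue of PACKAGENODE._set(PATH, VALUE) on nested dicts.

    Dicts stand in for groups; anything else is a leaf. Assumes missing
    intermediate groups should be created.
    """
    if not path:
        raise ValueError("path must contain at least one name")
    node = tree
    for name in path[:-1]:          # walk (or create) one group per level
        child = node.setdefault(name, {})
        if not isinstance(child, dict):
            raise ValueError(f"{name!r} is a data node, not a group")
        node = child
    node[path[-1]] = value          # the last name is the child being set


def add_group(group, name):
    """Hypothetical analogue of GROUPNODE._add_group(NAME)."""
    group[name] = {}
    return group[name]


pkg = {}
set_path(pkg, ["data"], [1, 2, 3])
set_path(pkg, ["raw", "images", "cat"], "cat.png")
add_group(pkg, "empty")
print(pkg)  # {'data': [1, 2, 3], 'raw': {'images': {'cat': 'cat.png'}}, 'empty': {}}
```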

Example

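import pandas as pd
import quilt

quilt.build('USER/PKG')  # create a new, empty package
from quilt.data.USER import PKG as pkg  # import after building, so the package exists

pkg._set(['data'], pd.DataFrame(data=[1, 2, 3]))  # add a DataFrame node named "data"
pkg._set(['foo'], "example.txt")  # add the contents of ./example.txt as node "foo"
quilt.build('USER/PKG', pkg)  # rebuild the package from the modified object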

This adds a child node named data to the new, empty package, with the DataFrame as its value. It then adds the contents of example.txt as a node named foo. Finally, building the package from the modified object commits these changes to disk.

See the examples repo for additional usage examples.
