Working with the Catalog
The Quilt Catalog is the second half of Quilt. It provides an interface on top of your S3 bucket that brings Quilt features like data packages and search to the web.
For a hands-on demo, check out the public demo catalog.
Note that you can use the Quilt Python API without using the catalog product, but they are designed to work together.
The Quilt catalog provides a homepage for your bucket, based on a README.md file that you can optionally create at the top of your bucket.
The catalog lets you navigate packages in the registry on the packages tab.
You can also browse the underlying S3 files using the files tab.
Catalogs also enable you to search the contents of your bucket. We support both unstructured search (e.g. "San Francisco") and structured search with Query String Queries (e.g. "metadata_key: metadata_value"). Hits are previewed right in the search results.
You can upload a new package by providing the package name, a commit message, files, metadata, and a workflow.
The name should have the format namespace/package-name.
The commit message should describe the new revision of the package.
Files are the content of your package.
The associated workflow contains the rules for validating your package.
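As an illustration of the namespace/package-name format, the following is a minimal sketch of a name check you could run before uploading. The helper name `is_valid_package_name` is hypothetical, and the allowed character set (letters, digits, underscores, hyphens) is an assumption; the text above only specifies the two-part shape separated by a slash.

```python
import re

# Assumption: each part may contain letters, digits, underscores, and hyphens.
# The documentation only states the "namespace/package-name" shape.
PACKAGE_NAME = re.compile(r"[A-Za-z0-9_-]+/[A-Za-z0-9_-]+")


def is_valid_package_name(name: str) -> bool:
    """Return True if name has exactly two non-empty parts separated by '/'."""
    return PACKAGE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["examples/sales-data", "sales-data", "a/b/c"]:
        print(candidate, is_valid_package_name(candidate))
```

A check like this only catches malformed names early; the catalog itself remains the authority on what it accepts.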
Metadata can be added with the JSON editor, which represents it as a key/value table that supports arbitrarily deep nesting. If the workflow contains a JSON schema, the editor shows predefined key/value pairs according to that schema.
JSON editor
To add a new key/value field, double-click an empty cell and type the key name, then press "Enter" or "Tab", or click outside the cell. To change a value, double-click that value.
Values can be strings, numbers, arrays, or objects. Every value that you type will be parsed as JSON.
We don't support references and compound types yet.
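To make "parsed as JSON" concrete, here is a small sketch using Python's standard `json` module. It is not the catalog's implementation. The `parse_cell` helper is hypothetical, and the fallback to a plain string for input that is not valid JSON is an assumption about the editor's behavior.

```python
import json


def parse_cell(text: str):
    """Parse typed text as JSON.

    Assumption: input that is not valid JSON is kept as a plain string.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


if __name__ == "__main__":
    for typed in ['42', '"42"', '[1, 2]', '{"a": {"b": true}}', 'San Francisco']:
        value = parse_cell(typed)
        print(repr(typed), "->", type(value).__name__)
```

The practical consequence is that typing 42 yields a number, while typing "42" with quotes yields a string.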
You can push an existing package from one bucket to another. To use this feature, consult the workflows page.
Adding a quilt_summarize.json file to a data package (or S3 directory path) will enable content preview right on the landing page.
Colocating data with context in this way is a simple way of making your data projects approachable and accessible to collaborators.
quilt_summarize.json can be a list of paths to files in S3 that you want to include in your summary. For example: ["description.md", "../notebooks/exploration.ipynb"]. Additionally, note that if a README.md file is present, it will always be rendered as well.
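The list form described above can be generated with a few lines of Python. This is a sketch that only writes the file locally. The `write_summary` helper is hypothetical, and placing the file in your package or S3 directory is left to whichever upload method you already use.

```python
import json
import tempfile
from pathlib import Path


def write_summary(directory, paths):
    """Write quilt_summarize.json in `directory` as a JSON list of paths."""
    target = Path(directory) / "quilt_summarize.json"
    target.write_text(json.dumps(list(paths), indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # A temporary directory stands in for your local package directory.
    with tempfile.TemporaryDirectory() as tmp:
        out = write_summary(tmp, ["description.md", "../notebooks/exploration.ipynb"])
        print(out.read_text(encoding="utf-8"))
```

Generating the file from code keeps the summary in step with the files you actually ship, rather than relying on a hand-edited list.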
There are currently some small limitations with preview:
Objects linked to in quilt_summarize.json are always previewed as of the latest version, even if you are browsing an old version of a package.
Object titles and image thumbnails link to the file view, even if you are in the package view.
The Quilt catalog includes an admin panel that allows you to manage users and buckets in your stack and to customize your Quilt catalog. See the Admin UI docs for details.