Quilt allows you to create, read, and write packages both on your local filesystem and on S3 buckets configured to work with Quilt3. For convenience, we provide a simple API for working with S3 buckets that serves as an alternative to boto3.
To connect to an S3 Bucket
:
import quilt3b = quilt3.Bucket("s3://quilt-example")
This requires that the bucket is configured to work with Quilt 3. Unless this bucket is public, you will also first need to log into the catalog that controls this bucket:
# only need to run this once# ie quilt3.config('https://your-catalog-homepage/')quilt3.config('https://open.quiltdata.com/')# follow the instructions to finish loginquilt3.login()
To see the contents of a Bucket
, use keys
:
# returns a list of objects in the bucketb.keys()
To download a file or folder from a bucket use fetch
:
# b.fetch("path/to/directory", "path/to/local")b.fetch("aleksey/hurdat/", "./aleksey/")b.fetch("README.md", "./read.md")
100%|██████████| 4.07M/4.07M [00:13<00:00, 304kB/s]100%|██████████| 1.55k/1.55k [00:01<00:00, 972B/s]
You can write data to a bucket.
# put a file to a bucketb.put_file("read.md", "./read.md")# or put everything in a directory at onceb.put_dir("stuff", "./aleksey")
Note that set
operations on a Package
are put
operations on a Bucket
.
# always be careful when deleting# delete a fleb.delete("read.md")# delete a directoryb.delete_dir("stuff/")
You can search for individual objects using search
.
Note that this feature is currently only supported for buckets backed by a Quilt catalog instance. Before performing a search you must first configure a connection to that instance using quilt3.config
.
# for examplequilt3.config(navigator_url="https://open.quiltdata.com")
Quilt supports unstructured search:
# returns all files containing the word "thor"b.search("thor")
{'took': 16,'timed_out': False,'_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},'hits': {'total': 10,'max_score': 5.5741544,'hits': [{'_index': 'quilt-example-reindex-v8bc2377','_type': '_doc','_id': 'dima/node_modules2/highlight.js/README.md:KOGAC2bPIY9o7vQ3d3ryrD04VpGPmaH2','_score': 5.5741544,'_source': {'size': 19316,'comment': '','version_id': 'KOGAC2bPIY9o7vQ3d3ryrD04VpGPmaH2','last_modified': '2019-12-12T01:33:15+00:00','updated': '2019-12-12T01:33:15.209387','key': 'dima/node_modules2/highlight.js/README.md'}},{'_index': 'quilt-example-reindex-v8bc2377','_type': '_doc','_id': 'akarve/amazon-reviews/camera-reviews.parquet:yoLoCR6tdnqE141f5F4EvbFbn2J12AJt','_score': 0.080087036,'_source': {'user_meta': {},'size': 100764599,'comment': '','version_id': 'yoLoCR6tdnqE141f5F4EvbFbn2J12AJt','last_modified': '2019-10-08T02:53:01+00:00','updated': '2019-10-08T02:53:31.040985','key': 'akarve/amazon-reviews/camera-reviews.parquet'}}]}}
As well as structured search on metadata (note that this feature is experimental):
# returns all files annotated {'name': 'thor'}b.search("user_meta.name:'thor'")
{'took': 0,'timed_out': False,'_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},'hits': {'total': 0, 'max_score': None, 'hits': []}}