Search & query
Quilt provides support for queries in the Elasticsearch DSL, as well as SQL queries in Athena.
The objects in Amazon S3 buckets connected to Quilt are synchronized to an Elasticsearch cluster, which provides Quilt's search features.
Quilt maintains a near-realtime index of the objects in your S3 bucket in Elasticsearch. Each bucket corresponds to one or more Elasticsearch indexes. As objects are mutated in S3, Quilt uses an event-driven system (via SNS and SQS) to update Elasticsearch.
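To make the event-driven flow above concrete, here is a minimal sketch of how an indexer might turn one SQS message into index or delete actions. It assumes the standard AWS layout in which an S3 event notification is delivered through SNS to SQS without raw message delivery; the function name `index_actions` and the action labels are hypothetical and are not Quilt's actual implementation.

```python
import json
from urllib.parse import unquote_plus


def index_actions(sqs_body: str):
    """Yield (action, bucket, key) tuples for one SQS message body.

    Hypothetical helper: "index" and "delete" are illustrative labels only.
    """
    envelope = json.loads(sqs_body)
    # SNS-to-SQS delivery wraps the S3 notification as a JSON string under "Message".
    s3_event = json.loads(envelope["Message"])
    # S3 test events carry no "Records" list, so default to an empty one.
    for record in s3_event.get("Records", []):
        name = record["eventName"]
        if name.startswith("ObjectCreated:"):
            action = "index"
        elif name.startswith("ObjectRemoved:"):
            action = "delete"
        else:
            continue  # ignore event types this sketch does not handle
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield action, bucket, key
```

A real indexer would then write or remove the matching Elasticsearch document for each action; that step is omitted here.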
There are two types of indexing in Quilt:
- shallow indexing includes object metadata (such as the file name and size)
- deep indexing includes object contents. Quilt supports deep indexing for the following file extensions:
  - .csv, .html, .json, .md, .rmd, .rst, .tab, .txt, .tsv (plain-text formats)
  - .fcs (FlowJo)
  - .ipynb (Jupyter notebooks)
  - .parquet
  - .pdf
  - .pptx
  - .xls, .xlsx
By default, Quilt deep-indexes at most 100KB per document, and only for the file formats listed above. Both the maximum number of bytes per document and the set of file formats to deep index can be customized per bucket in the Catalog Admin settings.

Example of Admin Bucket indexing options
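The difference between shallow and deep indexing, and the per-document byte limit, can be sketched as a small planning function. This is an illustration only: the name `plan_indexing`, the returned dictionary, and the reading of 100KB as 100 * 1024 bytes are assumptions, and actually extracting text from formats such as .parquet or .pdf requires format-specific parsers that are not shown.

```python
import os

# Assumption: the default extension list is copied from the documentation above.
DEEP_INDEX_EXTENSIONS = frozenset({
    ".csv", ".html", ".json", ".md", ".rmd", ".rst", ".tab", ".txt", ".tsv",
    ".fcs", ".ipynb", ".parquet", ".pdf", ".pptx", ".xls", ".xlsx",
})
# Assumption: "100KB" is interpreted as 100 * 1024 bytes.
MAX_BYTES_PER_DOCUMENT = 100 * 1024


def plan_indexing(key, size, extensions=DEEP_INDEX_EXTENSIONS,
                  max_bytes=MAX_BYTES_PER_DOCUMENT):
    """Decide how an object would be indexed (hypothetical helper).

    Every object gets shallow metadata (key, size). Content is read only
    for deep-indexed extensions, and never beyond max_bytes.
    """
    ext = os.path.splitext(key.lower())[1]
    deep = ext in extensions
    return {
        "key": key,
        "size": size,
        "deep": deep,
        "bytes_to_read": min(size, max_bytes) if deep else 0,
    }
```

Passing a different `extensions` set or `max_bytes` value mirrors the per-bucket customization available in the Admin settings.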
The navigation bar on every page in the catalog provides a convenient shortcut for searching objects and packages in an Amazon S3 bucket.
The following are all valid search parameters:
Fields
comment
: Package comment. Example: comment: TODO
content
: Object content. Example: content:Hello
ext
: Object extension. Example: ext:*.fastq.gz
handle
: Package name. Example: handle:examples\/metadata
hash
: Package hash. Example: hash:3192ac1*
key
: Object key. Example: key:phase*
key_text
: Analyzed object key. Example: key_text:"phase"
last_modified
: Last modified date. Example: last_modified:[2022-02-04 TO 2022-02-20]
metadata
: Package metadata. Example: metadata:dapi
size
: Object size in bytes. Example: size:>=4096
version_id
: Object version id. Example: version_id:t.LVVCx*
pointer_file
: Package revision tag in S3; either "latest" or a top hash. Example: pointer_file: latest
package_stats.total_files
: Package total files. Example: package_stats.total_files:>100
package_stats.total_bytes
: Package total bytes. Example: package_stats.total_bytes:<100
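Literal values that contain reserved characters must be escaped, as the handle example does for the slash. The sketch below builds a field:value clause from a literal value. It assumes the search box follows Elasticsearch's query string syntax and its list of reserved characters; `escape_term` and `field_clause` are hypothetical helper names.

```python
import re

# Characters that Elasticsearch's query string syntax treats as reserved.
# "<" and ">" cannot be escaped at all, so they are removed instead.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_term(value: str) -> str:
    """Escape a literal value so it is not parsed as query syntax."""
    value = value.replace("<", "").replace(">", "")
    # Prefix each reserved character with a backslash.
    return _RESERVED.sub(r"\\\1", value)


def field_clause(field: str, value: str) -> str:
    """Build a field:value clause for a literal value without whitespace."""
    return f"{field}:{escape_term(value)}"


# Prints the same clause as the handle example in the field list.
print(field_clause("handle", "examples/metadata"))
```

Because wildcards are escaped too, this helper suits exact values only; write clauses such as key:phase* by hand when the wildcard is intended.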
Logical operators and grouping
AND
: Conjunction. Example: a AND b
OR
: Disjunction. Example: a OR b
NOT
: Negation. Example: NOT a
_exists_
: Matches any non-null value for the given field. Example: _exists_: content
()
: Group terms. Example: (a AND b) NOT c
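When queries are assembled programmatically, explicit grouping keeps operator precedence unambiguous. The following sketch composes clauses with the operators above; the helper names (`all_of`, `any_of`, `negate`, `exists`) are hypothetical and simply produce query strings.

```python
def all_of(*clauses: str) -> str:
    """Join clauses with AND and wrap the result in parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the result in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def negate(clause: str) -> str:
    """Negate a single clause or a parenthesized group."""
    return f"NOT {clause}"


def exists(field: str) -> str:
    """Match any non-null value for the given field."""
    return f"_exists_:{field}"


# Objects whose key starts with "phase", are at least 4096 bytes,
# and have no indexed content.
query = f'{all_of("key:phase*", "size:>=4096")} AND {negate(exists("content"))}'
print(query)
```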
Wildcard and regular expressions
*
: Zero or more characters; avoid a leading * (slows performance). Example: ext:config.y*ml
?
: Exactly one character. Example: ext:React.?sx
//
: Regular expression (slows performance). Example: content:/lmnb[12]/
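If you send such a query to Elasticsearch yourself, the search text is typically wrapped in a query_string query. The sketch below builds that request body and, as one way to guard against the slow leading-wildcard case, sets Elasticsearch's `allow_leading_wildcard` option to false. Whether the Quilt catalog sets this option is not stated here; the function name `search_body` and its defaults are assumptions.

```python
import json


def search_body(query: str, size: int = 10,
                allow_leading_wildcard: bool = False) -> dict:
    """Wrap query-string syntax in an Elasticsearch request body (sketch)."""
    return {
        "size": size,
        "query": {
            "query_string": {
                "query": query,
                # With this set to False, Elasticsearch rejects terms
                # that start with * or ?.
                "allow_leading_wildcard": allow_leading_wildcard,
            }
        },
    }


print(json.dumps(search_body("ext:config.y*ml"), indent=2))
```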

Quilt Elasticsearch queries support the following keys:
- _source: a boolean that adds or removes the _source field, or a list of fields to return
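The two accepted forms of _source can be illustrated with a small helper that sets the key on a query dictionary. The helper name `with_source` is hypothetical, and the empty starting query is a placeholder: any other keys your query uses are outside the scope of this sketch.

```python
import json


def with_source(query: dict, source) -> dict:
    """Return a copy of `query` with its _source key set.

    `source` is either a bool (include or drop the _source field)
    or a list of field names to return.
    """
    is_field_list = isinstance(source, list) and all(
        isinstance(field, str) for field in source
    )
    if not (isinstance(source, bool) or is_field_list):
        raise TypeError("_source must be a bool or a list of field names")
    return {**query, "_source": source}


# Return only the key and size of each hit.
print(json.dumps(with_source({}, ["key", "size"])))
# Drop the _source field from each hit entirely.
print(json.dumps(with_source({}, False)))
```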
Saved queries
You can provide pre-canned queries for your users by providing a configuration file at s3://YOUR_BUCKET/.quilt/queries/config.yaml:

    version: "1"
    queries:
      query-1:
        name: My first query
        description: Optional description
        url: s3://BUCKET/.quilt/queries/query-1.json
      query-2:
        name: Second query
        url: s3://BUCKET/.quilt/queries/query-2.json
The Quilt catalog displays your saved queries in a drop-down for your users to select, edit, and execute.
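Before uploading a saved-queries configuration, it can help to check its shape. The sketch below validates a configuration that has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`). The rules it enforces are inferred from the example configuration, not taken from a published schema, and `validate_saved_queries` is a hypothetical helper.

```python
def validate_saved_queries(config: dict) -> list:
    """Return a list of problems in a parsed config.yaml (empty if none).

    Assumed rules, inferred from the example: version is "1", each query
    has a name, and each url is an s3:// path to a .json file.
    """
    problems = []
    if str(config.get("version")) != "1":
        problems.append('version should be "1"')
    queries = config.get("queries")
    if not isinstance(queries, dict) or not queries:
        problems.append("queries should be a non-empty mapping")
        return problems
    for query_id, entry in queries.items():
        if not isinstance(entry, dict):
            problems.append(f"{query_id}: entry should be a mapping")
            continue
        if not entry.get("name"):
            problems.append(f"{query_id}: missing name")
        url = str(entry.get("url", ""))
        if not (url.startswith("s3://") and url.endswith(".json")):
            problems.append(f"{query_id}: url should be an s3:// path to a .json file")
    return problems


example = {
    "version": "1",
    "queries": {
        "query-1": {
            "name": "My first query",
            "description": "Optional description",
            "url": "s3://BUCKET/.quilt/queries/query-1.json",
        },
        "query-2": {
            "name": "Second query",
            "url": "s3://BUCKET/.quilt/queries/query-2.json",
        },
    },
}
print(validate_saved_queries(example))
```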
You can store reusable Athena queries in the Quilt catalog so that your users can run them. You must first set up an Athena workgroup and saved queries per AWS's Athena documentation.
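Outside the catalog, running an Athena query in a workgroup and waiting for it to finish can be sketched with the boto3 Athena client. This is a minimal illustration rather than the catalog's own code: `run_query` is a hypothetical helper, the polling interval and timeout are arbitrary defaults, and the workgroup is assumed to have a query result location configured.

```python
import time


def run_query(client, sql, workgroup, poll_seconds=1.0, timeout_seconds=60.0):
    """Start an Athena query and block until it finishes.

    `client` is expected to behave like boto3.client("athena").
    Returns the raw get_query_results response on success.
    """
    execution_id = client.start_query_execution(
        QueryString=sql, WorkGroup=workgroup
    )["QueryExecutionId"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_query_execution(
            QueryExecutionId=execution_id
        )["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return client.get_query_results(QueryExecutionId=execution_id)
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"query {execution_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {execution_id} still {state}")
        # QUEUED or RUNNING: wait before polling again.
        time.sleep(poll_seconds)
```

Passing the client in as a parameter keeps the function easy to exercise with a stub; in real use you would create it with `boto3.client("athena")`.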
"Run query" executes the selected query and waits for the result.

