Introduction

Below is the documentation for Quilt 3. See here and here for the Quilt 2 documentation.

Quilt is a versioned data portal for AWS.

  • open.quiltdata.com is a petabyte-scale open data portal that runs on Quilt

  • quiltdata.com includes case studies, use cases, videos, and information on how you can run a private Quilt instance

Who is Quilt for?

Quilt is for data-driven teams with both technical and non-technical members (executives, data scientists, data engineers, sales, product, etc.).

What does Quilt do?

Quilt adds search, visual content preview, and versioning to every file in S3.

How does Quilt work?

Quilt consists of a Python client, a web catalog, and lambda functions, all of which are open source, plus a suite of backend services and Docker containers orchestrated by CloudFormation. The latter are available on quiltdata.com under a paid license for private use.

Use cases

Quilt addresses five key use cases:

  • Share data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to create an IAM role).

  • Understand data better through inline documentation (Jupyter notebooks, Markdown) and visualizations (Vega, Vega-Lite).

  • Discover related data by indexing objects in Elasticsearch.

  • Model data by providing a home for large data and models that don't fit in Git, and by providing immutable versions for objects and data sets (a.k.a. "Quilt Packages").

  • Decide by broadening data access within the organization and supporting the documentation of decision processes through auditable versioning and inline documentation.
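
As an illustration of the immutable-version idea behind the "Model data" use case, here is a minimal Python sketch of content-addressed versioning. It is not Quilt's implementation or manifest format; the function names (file_hash, build_manifest, top_hash) and the sample files are hypothetical and exist only for this example. The sketch hashes each file, then hashes the resulting manifest, so that any change to any file yields a new version identifier.

```python
import hashlib
import json
from typing import Dict


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each logical key (hypothetical file name) to its content hash."""
    return {key: file_hash(data) for key, data in sorted(files.items())}


def top_hash(manifest: Dict[str, str]) -> str:
    """Hash the canonical JSON form of the manifest to get a version ID."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical package versions that differ in a single data value.
v1 = build_manifest({"README.md": b"# Demo", "data.csv": b"a,b\n1,2\n"})
v2 = build_manifest({"README.md": b"# Demo", "data.csv": b"a,b\n1,3\n"})

print("v1:", top_hash(v1))
print("v2:", top_hash(v2))
print("unchanged file shares a hash:", v1["README.md"] == v2["README.md"])
print("versions differ:", top_hash(v1) != top_hash(v2))
```

Because the version identifier is derived from the contents, an existing version cannot change silently: editing a file produces a different identifier rather than overwriting the old one.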

Roadmap

I - Performance and core services

  • Address performance issues with push (e.g. re-hash)

  • Refactor bucket/.quilt for improved listing and delete performance

II - CI/CD for data

  • Ability to fork/merge packages (via manifests in git)

  • Automated data quality monitoring

III - Storage agnostic (support Azure, GCP buckets)

  • Evaluate min.io and ceph.io

  • Evaluate feasibility of local storage (e.g. NAS)

IV - Cloud agnostic

  • K8s deployment for Azure, GCP

  • Shim lambdas via serverless.com?

  • Shim Elasticsearch via Solr?