Introduction

Quilt in action

open.quiltdata.com is a petabyte-scale open data portal that runs on Quilt

quiltdata.com includes case studies, use cases, videos, and instructions on how to run a private Quilt instance

What does Quilt do?

Share data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to create an IAM role).

Understand data better through inline documentation (Jupyter notebooks, Markdown) and visualizations (Vega, Vega-Lite)

Discover related data by indexing objects in Elasticsearch

Model data by providing a home for large data and models that don't fit in git, and by providing immutable versions for objects and data sets (a.k.a. "Quilt Packages")

Decide by broadening data access within the organization and supporting the documentation of decision processes through auditable versioning and inline documentation
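
The idea of immutable versions for data sets can be illustrated with a small content-addressing sketch. This is not Quilt's actual manifest format or API; the function names (object_hash, build_manifest, manifest_hash) and the JSON layout are assumptions made for illustration. Each object is hashed, the hashes are collected in a manifest, and the hash of the manifest identifies the version, so any change to any object yields a new version identifier.

```python
import hashlib
import json


def object_hash(data):
    """Return the SHA-256 hex digest of an object's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(objects):
    """Map each logical key to the size and hash of its content.

    `objects` is a dict of logical key -> bytes (hypothetical in-memory
    stand-in for objects stored in a bucket).
    """
    return {
        key: {"size": len(data), "sha256": object_hash(data)}
        for key, data in objects.items()
    }


def manifest_hash(manifest):
    """Hash a canonical JSON encoding of the manifest to get a version ID."""
    # sort_keys makes the encoding independent of insertion order
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two versions of the same hypothetical data set; only one byte differs.
v1 = manifest_hash(build_manifest({
    "README.md": b"# demo\n",
    "data/train.csv": b"a,b\n1,2\n",
}))
v2 = manifest_hash(build_manifest({
    "README.md": b"# demo\n",
    "data/train.csv": b"a,b\n1,3\n",
}))

print(v1 == v2)  # the changed object produces a different version ID
```

Because the version identifier depends only on content, re-creating the same set of objects reproduces the same identifier, which is what makes such versions safe to cite in audits and documentation.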

Roadmap

I - Performance and core services

Address performance issues with push (e.g. re-hash)

Provide Presto-DB-powered services for filtering package repos with SQL

Investigate and implement more efficient manifest formats (e.g. Parquet) that scale to 10M keys; consider abbreviated "fast manifests" for lazy browsing

Refactor s3://bucket/.quilt for improved listing and delete performance

II - CI/CD for data

Ability to fork/merge packages

Data quality monitoring

III - Storage agnostic (support Azure, GCP buckets)

Evaluate min.io and ceph.io as shims

Evaluate feasibility of on-prem local storage as a repo

IV - Cloud agnostic

Evaluate K8s and Terraform to replace CloudFormation

Shim lambdas (consider serverless.com)

Shim Elasticsearch (consider Solr)

