open.quiltdata.com is a petabyte-scale open data portal that runs on Quilt.
quiltdata.com includes case studies, use cases, videos, and information on how you can run a private Quilt instance.
Quilt is for data-driven teams of both technical and non-technical members (executives, data scientists, data engineers, sales, product, etc.).
Quilt adds search, visual content preview, and versioning to every file in S3.
Quilt consists of a Python client, a web catalog, and Lambda functions, all of which are open source, plus a suite of backend services and Docker containers orchestrated by CloudFormation. The latter are available under a paid license for private use on quiltdata.com.
Quilt addresses five key use cases:
Share data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to create an IAM role).
Understand data better through inline documentation (Jupyter notebooks, markdown) and visualizations (Vega,
Discover related data by indexing objects in
Model data by providing a home for large data and models that don't fit in git, and by providing immutable versions for objects and data sets (a.k.a. "Quilt Packages").
Decide by broadening data access within the organization and supporting the documentation of decision processes through auditable versioning and inline
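The idea of immutable versions for data sets, mentioned in the use cases above, can be illustrated with a content-addressed manifest. The sketch below is only an illustration using the Python standard library: the function names (`file_hash`, `build_manifest`, `top_hash`) and the JSON layout are assumptions made for this example, not Quilt's actual manifest format or client API.

```python
import hashlib
import json


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of an object's contents."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(objects: dict[str, bytes]) -> dict[str, str]:
    """Map each logical key to the hash of its contents (hypothetical layout)."""
    return {key: file_hash(data) for key, data in sorted(objects.items())}


def top_hash(manifest: dict[str, str]) -> str:
    """Hash the canonical JSON form of the manifest to get a version ID."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two versions of the same data set that differ in a single byte.
v1 = build_manifest({"data/train.csv": b"a,b\n1,2\n", "README.md": b"# demo\n"})
v2 = build_manifest({"data/train.csv": b"a,b\n1,3\n", "README.md": b"# demo\n"})

print(top_hash(v1) == top_hash(v1))  # same contents give the same version ID
print(top_hash(v1) != top_hash(v2))  # any changed byte gives a new version ID
```

Because the version ID is derived from the contents, a version cannot change without its ID changing, which is what makes an audit trail over such versions meaningful.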
Address performance issues with push (e.g. re-hash)
bucket/.quilt for improved listing and delete performance
Ability to fork/merge packages (via manifests in git)
Automated data quality monitoring
Evaluate min.io and ceph.io
Evaluate feasibility of local storage (e.g. NAS)
K8s deployment for Azure, GCP
Shim lambdas via serverless.com?
Shim ElasticSearch via SOLR?