[Fixed] Bump minimum required version of tqdm. Fixes a crash (UnseekableStreamError) during upload retry. (#1853)
Refactors local and s3 storage-layer code around a new PackageRegistry base class (to support improved file layouts in future releases)
Multithreaded download for large files: significant performance gains when installing packages with large files, especially on large instances
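To illustrate the idea behind a multithreaded download of one large file, here is a minimal sketch that splits the file into byte ranges and fetches them concurrently. It is not the quilt3 implementation: `split_ranges`, `parallel_fetch`, and the `fetch_range` callback are hypothetical names, and an in-memory byte string stands in for the remote object.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into consecutive half-open (start, end) ranges."""
    return [
        (start, min(start + part_size, size))
        for start in range(0, size, part_size)
    ]


def parallel_fetch(
    fetch_range: Callable[[int, int], bytes],
    size: int,
    part_size: int,
    max_workers: int = 4,
) -> bytes:
    """Fetch every range on a thread pool and reassemble the parts in order."""
    ranges = split_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input ranges.
        parts = pool.map(lambda r: fetch_range(r[0], r[1]), ranges)
        return b"".join(parts)


# Stand-in for a remote object (assumption: a real fetch would issue ranged reads).
blob = bytes(range(256)) * 10
result = parallel_fetch(lambda start, end: blob[start:end], len(blob), 100)
print(result == blob)
```

Because the parts are joined in range order, the result does not depend on which thread finishes first.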
Package name added to Package.resolve_hash
Bugfix: remove package revision by shorthash
Performance improvements for build and push
Browse full package contents (no longer limited to 1000 files)
Indexing and search of package-level metadata
Fixed issue with download button for certain text files
FCS files: content indexing and preview
Catalog sign-in with email (or username)
Catalog support for sign-in with Okta
allow hiding download button
only show stats for 2-level extensions for .gz files
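A rough sketch of what grouping by 2-level extension only for .gz files could look like. The function name `summary_extension` and the sample keys are hypothetical; the catalog's actual logic may differ.

```python
import posixpath
from collections import Counter


def summary_extension(key: str) -> str:
    """Return the extension used for grouping; two levels only when the file ends in .gz."""
    name = posixpath.basename(key).lower()
    root, ext = posixpath.splitext(name)
    if ext == ".gz":
        # Keep the inner extension too, e.g. ".csv.gz" rather than ".gz".
        return posixpath.splitext(root)[1] + ext
    return ext


keys = ["logs/a.csv.gz", "logs/b.csv.gz", "data/c.csv", "archive/d.tar.bz2", "README"]
counts = Counter(summary_extension(key) for key in keys)
print(counts)
```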
fix retries during hashing
improve progress bars
pyyaml requirements to prevent version conflicts
improve unit test coverage for indexing lambdas
fix real-time delete handling (incl. for unversioned objects)
handle all s3:ObjectCreated: and ObjectRemoved: events (fixes ES search state and bucket Overview)
Official support for Windows
Add support for Python 3.7, 3.8
Fix Package import in Python
Updated libraries for stability and security
Quiet TQDM for log files ($ export QUILT_MINIMIZE_STDOUT=true)
CLI setting of config parameters
new feature to filter large S3 directories with regex
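As an illustration of regex filtering over a large key listing, here is a minimal sketch in plain Python. `filter_keys` and the sample keys are hypothetical and are not the quilt3 API.

```python
import re
from typing import Iterable, List


def filter_keys(keys: Iterable[str], pattern: str) -> List[str]:
    """Return the keys that contain a match for the regular expression."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.search(key)]


keys = ["raw/2019/a.csv", "raw/2019/b.parquet", "raw/2020/c.csv", "tmp/d.csv"]
csv_2019 = filter_keys(keys, r"^raw/2019/.*\.csv$")
print(csv_2019)
```

Anchoring the pattern with `^` and `$` restricts the match to the whole key; an unanchored pattern matches anywhere in the key.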
more reliable bucket region inference
Support preview of larger Jupyter notebooks in S3 (via transparent GZIP)
JS (catalog) dependencies for stability and security
extended Parquet file support (for files without a .parquet extension)
Improvements to catalog signing logic for external and in-stack buckets
Special thanks to @NathanDeMaria (CLI and Windows support) and @JacksonMaxfield for contributing code to this release.
push to CLI
Updated JS dependencies
Display package truncation warning in Packages
quilt3 install foo/bar/subdirectory
Bug fixes for CopyObject and other exceptions
Fix bug introduced in 3.1.9 where uploads fail due to incorrect error checking after a HEAD request to see if an object already exists (#1512)
quilt3 install now displays the tophash of the installed package (#1461)
quilt3 --version (#1495)
quilt3 disable-telemetry CLI command (#1496)
CLI command to launch catalog directly to file viewer: quilt3 catalog $S3_URL (#1470, #1487)
No longer run local container for quilt3 catalog (#1504). See (#1468, #1483, #1482) for various bugs leading to this decision.
Add PhysicalKey class to abstract away local files vs unversioned s3 object vs versioned s3 object (#1456, #1473, #1478)
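The abstraction can be pictured with a small stand-in class. This is only a sketch of the idea (one type covering local files, unversioned S3 objects, and versioned S3 objects); the class name `PhysicalKeySketch`, its fields, and the `versionId` query parameter are assumptions for illustration and do not reproduce the real PhysicalKey class.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse


@dataclass(frozen=True)
class PhysicalKeySketch:
    bucket: Optional[str]              # None means a local file
    path: str
    version_id: Optional[str] = None   # only meaningful for S3 objects

    def is_local(self) -> bool:
        return self.bucket is None

    @classmethod
    def from_url(cls, url: str) -> "PhysicalKeySketch":
        parsed = urlparse(url)
        if parsed.scheme == "s3":
            # Assumed convention: the object version travels as ?versionId=...
            version = parse_qs(parsed.query).get("versionId", [None])[0]
            return cls(parsed.netloc, parsed.path.lstrip("/"), version)
        if parsed.scheme == "file":
            return cls(None, parsed.path)
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")


versioned = PhysicalKeySketch.from_url("s3://bucket/dir/file.csv?versionId=abc123")
unversioned = PhysicalKeySketch.from_url("s3://bucket/dir/file.csv")
local = PhysicalKeySketch.from_url("file:///tmp/data.csv")
print(versioned, unversioned, local, sep="\n")
```

Callers can then branch on `is_local()` or `version_id` instead of re-parsing URL strings at every call site.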
Changed cache directory location (#1466)
More informative progress bars (#1506)
Improve support for downloading from public buckets (#1503)
Always disable telemetry during tests (#1494)
Bug fix: prevent misleading CLI argument abbreviations (#1481) such as --to referring to
Bug fix: background upload/download threads are now killed if the main thread is interrupted (#1486)
Performance improvements: load JSONL manifest faster (#1480)
Performance improvement: If there is an error when copying files, fail quickly (#1488)
Better package listing UX (#1462)
Improve bucket stats visualization when there are many categories (#1469)
Performance improvements for Packages
Updated landing page
LOCAL mode for running the catalog on localhost
quilt3 catalog command to run the Quilt catalog on your local machine
quilt3 verify compares the state of a directory to the contents of a package version
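Conceptually, such a check compares the files on disk against the hashes recorded for a package version. The sketch below assumes a manifest that maps relative paths to SHA-256 hex digests and treats extra files as a mismatch; both are assumptions for illustration, and `directory_matches` is a hypothetical helper, not the quilt3 command.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def directory_matches(directory: str, manifest: Dict[str, str]) -> bool:
    """True only if the directory holds exactly the manifest's files with matching hashes."""
    root = Path(directory)
    found = {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in root.rglob("*")
        if path.is_file()
    }
    return found == manifest


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.csv"
    target.write_bytes(b"a,b\n1,2\n")
    manifest = {"data.csv": hashlib.sha256(b"a,b\n1,2\n").hexdigest()}
    unchanged = directory_matches(tmp, manifest)
    target.write_bytes(b"a,b\n1,3\n")  # simulate a local edit
    modified = directory_matches(tmp, manifest)

print(unchanged, modified)
```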
Added a local file cache for installed packages
Performance improvements for upload and download
Support for short hashes to identify package versions
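Short-hash lookup is essentially unique-prefix matching. A minimal sketch, assuming the set of full hashes is already known; `resolve_short_hash` and the sample hash values are hypothetical.

```python
from typing import Iterable


def resolve_short_hash(prefix: str, known_hashes: Iterable[str]) -> str:
    """Return the single full hash that starts with the prefix, or raise."""
    matches = [full for full in known_hashes if full.startswith(prefix)]
    if not matches:
        raise KeyError(f"no package version matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"short hash {prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]


hashes = ["a1b2c3d4e5", "a1ffee0011", "0099aabbcc"]
print(resolve_short_hash("a1b", hashes))
```

Rejecting ambiguous prefixes, rather than picking the first match, avoids silently resolving to the wrong version.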
Adding telemetry for API calls
Drop support for object metadata (outside of packages)
Change the number of threads used when installing and pushing from 4 to 10 (S3 default)
Misc bug fixes
Fix package listing for packages with more than 100 revisions
Add stacked area charts for downloads
2-level file-extensions for bucket summary
Fix uploads of very large files
Remove unnecessary copying during push
delete_package for a specific version via
Bug fix: adding a Python object to a package created a temporary file that was deleted when the object was pushed, causing a crash if the package was pushed again (PR #1264)
Added support for adding an in-memory object (such as a pandas.DataFrame) to a package via
Fix to work with pyarrow 0.15.0
Performance improvements for list_packages and delete_package
Adds a feature to allow quilt config to set a registry URL for a private Teams registry.
Adding a hash argument to quilt.push to allow pushing any package version to a registry.
Make object sizes required.
Update urllib3 version for security patch
Improved instructions for running registries.
Fix an ascii decoding issue related to ellipses …
Update Parquet reading code to match the API change in pyarrow 0.11.
Fix downloading of zero-byte files
New helper function quilt.save adds an object (e.g., a Pandas DataFrame) to an existing package by performing a sub-package build and push in a single step
quilt.load now correctly returns sub-packages (fixes issue #741)
Send a welcome email to new users after activation
fixes an issue with packages created on older versions of pyarrow
improves readability for
allow adding a node with metadata using sub-package build/push
adds documentation for running a private registry in AWS
Suppress numpy warnings under Python 2.7
Fix subpackage build and push
Added support for sub-package build and push, allowing nodes to be added to large packages without materializing the whole package
First-class support for
Replaced dependence on external OAuth2 provider with a built-in authentication and session management
Registry support for sub-package push
Updated to support new registry authentication
added Bracket accessor for GroupNodes
asa.plot to show images in packages
asa.torch to convert packages to PyTorch Datasets
Enforce fragment store as read-only
Added source maps and CI for catalog testing
Expands and improves documentation for working with Quilt packages.
Load packages by hash
Choose a custom loader for DataNodes with asa=
Specify Ubuntu version in Dockerfiles
display package traffic stats in catalog
filter packages based on per-node metadata
get/set metadata for package nodes
support custom loaders in the _data method
Metadata-only package install
Build DataFrames from existing Parquet files
Remove HDF5 dependencies
Code cleanup and refactoring
Option for metadata-only package installs
New endpoint for fetching missing fragments (e.g., from partially installed packages)
Improved full-text search
Allow building packages out of other packages and elements from other packages. A new build-file keyword, package, inserts a package (or sub-package) as an element in the package being built.
Upgrade router and other dependencies
Display packages by author
Admin UI for controlling users and access
Allow specifying sets of input files in build.yml
Specify teams packages
Admin commands to create and activate/deactivate users
Version 2.9.1 introduces a better progress bar for installing (downloading) Quilt packages. Quilt push now sends objects' uncompressed size to the registry. The progress bar is now based on the total bytes downloaded instead of the number of files.
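The difference between counting files and counting bytes shows up when file sizes are uneven. A small sketch, with a hypothetical `ByteProgress` class and made-up sizes, not the Quilt progress-bar code:

```python
from typing import Sequence


class ByteProgress:
    """Track progress as bytes downloaded out of the total uncompressed size."""

    def __init__(self, sizes: Sequence[int]) -> None:
        self.total = sum(sizes)
        self.done = 0

    def update(self, num_bytes: int) -> None:
        self.done += num_bytes

    @property
    def fraction(self) -> float:
        return self.done / self.total if self.total else 1.0


sizes = [900, 50, 50]      # one large object and two small ones (sizes in bytes)
progress = ByteProgress(sizes)
progress.update(50)        # first small file finished
progress.update(50)        # second small file finished

files_fraction = 2 / len(sizes)
print(f"by files: {files_fraction:.0%}, by bytes: {progress.fraction:.0%}")
```

With two of three files done, a file-count bar would show about 67% while only 10% of the bytes have arrived, which is why the total size has to be known up front.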
Import packages from shared local directories to save storage overhead and network traffic when sharing packages on the same local network.
Log package installs in the registry to display stats on package use.
Updates to commands and local storage to allow users to connect to different registries to support teams running private registries for internal sharing.
Fixes a bug in download that prevented retrying failed downloads.
Source for the Quilt data catalog is now included in this repository.
Ported the Quilt registry from MySQL to Postgres
Improvements to the Docker configuration that allow running the registry, catalog, database, and authentication service from Docker Compose.
Data fragments can now be downloaded in parallel, leading to much faster installs of large packages.
Quilt data packages are now available wherever you run Python. We recommend that users quilt push all local packages to the registry before upgrading. Further details on migration are here.
Quilt now caches build intermediates. So if you wish to update the README of a multi-gigabyte package, you can rebuild the entire package in one second.
You can now specify build parameters (like transform) for all children of a group in one shot. The updated syntax and docs are here.
You can now express dependencies on multiple packages in a single file. Docs here.
Quilt build now accepts GitHub URLs. If you use data stored on GitHub you can turn it into a Quilt package with quilt build.
Version 2.7.1 includes several minor bug fixes and one new feature, checks. Checks allow a user to specify data integrity checks that are enforced during quilt build.
Support installing subpackages as quilt install usr/pkg/path
Upload fragments in parallel
Use http sessions when accessing S3
This release adds a new command to delete a package including all versions and history from the registry.
Building a package from a directory of input files now skips generating a build file. That speeds up the build process and makes it easier to change the package contents and rebuild.
This release includes support for paid plans on quiltdata.com and is recommended for all individual and business-plan users. It adds a shortcut to push packages and make them public in a single command and improves documentation.