Event-Driven Packaging
EDP is in private preview. Ask your Quilt account manager for details.
Overview
Data tend to be created in logical batches by machines, people, and pipelines. Detecting these logical events from Amazon S3 events alone is complex and requires extensive logic.
Quilt's Event-Driven Packaging (EDP) service intelligently groups one or more Amazon S3 object events into a single batch-level event. You can easily (and if desired, automatically) trigger logical events like data package creation that depend on batches rather than on individual files.
Any AWS service or action that generates S3 object events may trigger the EDP service.
Requirements
A pre-existing VPC that includes either a NAT gateway or the following VPC endpoints:
Amazon S3 (gateway endpoint or interface endpoint).
EventBridge (interface endpoint).
Enable EventBridge S3 Events for all S3 buckets to be monitored by EDP.
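As a rough illustration of the last requirement, the sketch below turns on EventBridge delivery for one bucket using the S3 notification-configuration calls available in boto3. The helper name enable_eventbridge_notifications is hypothetical. The client is passed in as an argument, so the code assumes you create it yourself (for example with boto3.client("s3")). Because the PUT call replaces the bucket's whole notification configuration, the sketch reads the existing configuration first and preserves it.

```python
from typing import Any, Dict


def enable_eventbridge_notifications(s3_client: Any, bucket: str) -> Dict[str, Any]:
    """Turn on EventBridge delivery for `bucket`, keeping existing notification targets.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    current = s3_client.get_bucket_notification_configuration(Bucket=bucket)
    # The GET response includes ResponseMetadata, which is not valid input for the PUT call.
    config = {k: v for k, v in current.items() if k != "ResponseMetadata"}
    # An empty EventBridgeConfiguration is what switches delivery on.
    config["EventBridgeConfiguration"] = {}
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=config,
    )
    return config
```

This is equivalent to switching on "Send notifications to Amazon EventBridge for all events in this bucket" in the S3 console.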
Deployment
EDP deploys Lambda and RDS resources to monitor S3 and generate EventBridge events under user-configurable conditions.
Networking
Lambda and RDS resources are placed in the VPC and Subnets that you provide.
Subnets are normally private and must be able to reach Amazon services such as EventBridge via port 443 (e.g. by means of a NAT gateway or VPC endpoint).
SecurityGroup should allow outbound access to AWS services on port 443. It does not need inbound access.
Parameters
EDP is deployed by a standalone CloudFormation template with the following parameters:
VPC
For EDP resources and Subnets.
Subnets
For EDP Lambda, RDS (see above for configuration).
SecurityGroup
For EDP Lambdas (see above for configuration).
BucketName
Name of the Amazon S3 bucket to monitor.
BucketIgnorePrefixes
Text string of comma-separated bucket path segments to ignore, for example raw/*, scratch/*. Default value is an empty string (i.e. nothing ignored).
BucketPrefixDepth
The number of /-separated path segments at the beginning of an S3 object key that form the common prefix used to group objects. Default value is 2.
BucketThresholdDuration
Trigger a notification when this number of seconds has elapsed since the last object event in the S3 bucket occurred. Default value is 300 seconds.
BucketThresholdEventCount
Trigger a notification when this number of files have been created (since the prior trigger). Default value is 20.
DBUser
Username for EDP RDS instance.
DBPassword
Password for EDP RDS instance.
EventBusName
Name of custom EventBridge event bus that receives events.
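To make the parameter list concrete, here is a minimal sketch that converts a dictionary of values into the Parameters structure that CloudFormation's CreateStack API expects. The parameter names come from the list above; every value shown (resource IDs, bucket, bus name, credentials) is a placeholder, and the assumption that Subnets is passed as a comma-separated string is not confirmed by this page.

```python
from typing import Any, Dict, List


def edp_stack_parameters(values: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert {name: value} into CloudFormation's ParameterKey/ParameterValue list."""
    params = []
    for key, value in values.items():
        # Assumption: list-valued parameters are passed as comma-separated strings.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params.append({"ParameterKey": key, "ParameterValue": str(value)})
    return params


# Placeholder values only; replace each one with values from your own account.
EXAMPLE_VALUES = {
    "VPC": "vpc-EXAMPLE",
    "Subnets": ["subnet-EXAMPLE-A", "subnet-EXAMPLE-B"],
    "SecurityGroup": "sg-EXAMPLE",
    "BucketName": "instrument-bucket",
    "BucketIgnorePrefixes": "raw/*,scratch/*",
    "BucketPrefixDepth": 2,
    "BucketThresholdDuration": 300,
    "BucketThresholdEventCount": 20,
    "DBUser": "edp_example_user",
    "DBPassword": "load-this-from-a-secret-store",
    "EventBusName": "example-edp-bus",
}

if __name__ == "__main__":
    for param in edp_stack_parameters(EXAMPLE_VALUES):
        print(param)
```

The resulting list can be passed as the Parameters argument of create_stack on a boto3 CloudFormation client, together with the stack name and the template location supplied by Quilt. Avoid hard-coding DBPassword in source files.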
How EDP works
EDP monitors S3 object events for s3://bucket-name.
After a fixed number of object events (BucketThresholdEventCount) or a maximum duration within a common prefix (BucketThresholdDuration), EDP creates a package-objects-ready event that signals there is sufficient information to make Quilt data packages from a batch of files:
S3 bucket name
Common prefix
Number of files
Timestamp of event
The event payload is JSON:
EDP publishes the event to an Amazon EventBridge bus. From there the event can be forwarded to any service that EventBridge can target, for additional manual or automatic processing.
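One way to forward the event is an EventBridge rule on the custom bus. The sketch below uses the put_rule and put_targets calls of a boto3 EventBridge client that is passed in as an argument. It assumes, without confirmation from this page, that the event name package-objects-ready appears in the event's detail-type field; check a real event before relying on this pattern. The rule name and target ID are hypothetical.

```python
import json
from typing import Any, Dict


def route_edp_events(
    events_client: Any,
    bus_name: str,
    target_arn: str,
    rule_name: str = "edp-package-objects-ready",  # hypothetical rule name
) -> Dict[str, Any]:
    """Create a rule on `bus_name` that forwards EDP events to `target_arn`.

    `events_client` is assumed to behave like boto3.client("events").
    """
    # Assumption: EDP sets detail-type to the event name.
    pattern = {"detail-type": ["package-objects-ready"]}
    events_client.put_rule(
        Name=rule_name,
        EventBusName=bus_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        EventBusName=bus_name,
        Targets=[{"Id": "edp-target", "Arn": target_arn}],
    )
    return pattern
```

If the target is a Lambda function, the function also needs a resource-based permission that allows events.amazonaws.com to invoke it.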
If configured to do so, EDP may, upon completion, warm its contents to a File Gateway on which it has read permissions, so that new EDP-created Quilt packages are available to Gateway clients like Windows Workspaces.
Users can optionally subscribe directly to the EDP SNS topic. This is useful for both debugging and viewing how events are structured.
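The grouping behaviour described in this section can be approximated in a few lines. This is an illustrative in-memory model, not EDP's implementation (which uses Lambda and RDS). The class and function names are hypothetical, and treating BucketIgnorePrefixes entries as glob patterns is an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Sequence


def common_prefix(key: str, depth: int = 2) -> str:
    """Return the first `depth` directory segments of an object key, with a trailing /."""
    dirs = key.split("/")[:-1]  # drop the file name
    head = dirs[:depth]
    return "/".join(head) + "/" if head else ""


@dataclass
class Batch:
    count: int = 0
    last_event: float = 0.0


class Batcher:
    """Groups object events by common prefix and emits one event per batch."""

    def __init__(self, depth: int = 2, max_count: int = 20,
                 max_idle: float = 300.0, ignore: Sequence[str] = ()):
        self.depth = depth
        self.max_count = max_count  # plays the role of BucketThresholdEventCount
        self.max_idle = max_idle    # plays the role of BucketThresholdDuration
        self.ignore = list(ignore)  # plays the role of BucketIgnorePrefixes
        self.batches: Dict[str, Batch] = {}

    def add(self, key: str, now: float) -> Optional[dict]:
        """Record one object event; return a batch event if the count threshold is hit."""
        if any(fnmatchcase(key, pattern) for pattern in self.ignore):
            return None
        prefix = common_prefix(key, self.depth)
        batch = self.batches.setdefault(prefix, Batch())
        batch.count += 1
        batch.last_event = now
        if batch.count >= self.max_count:
            return self._emit(prefix, now)
        return None

    def flush_idle(self, now: float) -> List[dict]:
        """Emit a batch event for every prefix with no activity for `max_idle` seconds."""
        ready = [p for p, b in self.batches.items() if now - b.last_event >= self.max_idle]
        return [self._emit(p, now) for p in ready]

    def _emit(self, prefix: str, now: float) -> dict:
        batch = self.batches.pop(prefix)
        return {"prefix": prefix, "files": batch.count, "timestamp": now}
```

With the defaults, twenty uploads under instrument-name/experiment-id/ produce one batch event immediately, while a smaller upload produces one only after 300 seconds without new objects under that prefix.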
Example: Lambda function to automatically create data packages
An instrument automatically uploads a folder containing files from a single experiment into s3://instrument-bucket/instrument-name/experiment-id/.
EDP listens for events in s3://instrument-bucket/instrument-name/experiment-id/*. After the specified duration or event count, a package-objects-ready event is generated and sent to EventBridge.
A custom SNS topic (SNS_TOPIC_ARN) is created for monitoring data package creation. Lab and computational scientists subscribe to it.
A custom Lambda function triggered by the package-objects-ready event processes the experiment files and generates a data package. Additional processing includes (but is not limited to):
Enhance the package with documentation, charts, and metadata, such as the following:
README.md: Noting that the package was created by EDP, a custom Lambda function, and validated with a Quilt workflow.
Package metadata creation and validation: Send an SNS notification on metadata validation failure.
If a metadata validation error occurs, an SNS event is sent to SNS_TOPIC_ARN noting that the package was created in the quarantine bucket. The SNS notification is routed to subscribers.
A computational scientist opens the new data package for additional analysis, modeling, and versioning.
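The workflow above could be sketched as the Lambda code below. It relies on several assumptions that this page does not confirm: the event detail is assumed to carry bucket, prefix, and files fields (hypothetical names, since the payload is not shown here), REGISTRY and QUARANTINE_REGISTRY are hypothetical environment variables, and any exception raised by the first push is treated as a validation failure. The quilt3 calls used (Package, set_dir, set, set_meta, push) and the SNS publish call are real, but the overall flow is only one possible design.

```python
import os
import tempfile
from typing import Any, Dict


def package_name(prefix: str) -> str:
    """Map 'instrument-name/experiment-id/' to the package name 'instrument-name/experiment-id'."""
    parts = [p for p in prefix.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"prefix {prefix!r} needs at least two segments")
    return f"{parts[0]}/{parts[1]}"


def create_package(detail: Dict[str, Any], quilt3_module: Any, sns_client: Any,
                   topic_arn: str, registry: str, quarantine_registry: str) -> str:
    """Build a package from a batch and push it; quarantine and notify on failure."""
    bucket = detail["bucket"]  # HYPOTHETICAL field name
    prefix = detail["prefix"]  # HYPOTHETICAL field name
    name = package_name(prefix)

    # Write a README to local scratch space so it can be added to the package.
    readme = os.path.join(tempfile.mkdtemp(), "README.md")
    with open(readme, "w") as fh:
        fh.write(f"# {name}\n\nCreated by EDP and a custom Lambda function.\n")

    pkg = quilt3_module.Package()
    pkg.set_dir(".", f"s3://{bucket}/{prefix}")
    pkg.set("README.md", readme)
    pkg.set_meta({"source_prefix": prefix, "file_count": detail.get("files")})
    try:
        pkg.push(name, registry=registry, message="Created by EDP")
        return registry
    except Exception as err:  # assumption: a failed push means validation failed
        pkg.push(name, registry=quarantine_registry, message="Quarantined by EDP")
        sns_client.publish(
            TopicArn=topic_arn,
            Subject="EDP package quarantined",
            Message=f"{name} was created in {quarantine_registry}: {err}",
        )
        return quarantine_registry


def handler(event, context):
    # Imported here so the helpers above can be exercised without AWS dependencies.
    import boto3
    import quilt3
    return create_package(
        event["detail"], quilt3, boto3.client("sns"),
        os.environ["SNS_TOPIC_ARN"],
        os.environ["REGISTRY"],             # hypothetical variable, e.g. s3://...
        os.environ["QUARANTINE_REGISTRY"],  # hypothetical variable
    )
```

Passing the quilt3 module and SNS client as arguments keeps create_package easy to test with stand-in objects. A production function would catch a narrower exception type than Exception.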
Debugging
EDP includes a CloudWatch dashboard which exposes some metrics useful for debugging:
EDP event bus topic: Displays the number of events emitted by EDP. If EDP is working correctly there should be one or more events received (depending on the time range selected).
Per-bucket metrics:
S3 EventBridge rule: The number of events published to EventBridge from the specified Amazon S3 bucket. If there is no data, there are several possibilities:
Invocations: If this value is zero, the S3 bucket isn't correctly configured (Send notifications to Amazon EventBridge for all events in this bucket is not turned On).
TriggeredRules: If this value is zero, there was a problem with the automated EventBridge rule creation process during deployment. In general, you want the number of invocations to approximately equal the number of triggered rules.
Failed Invocations: This value should be zero. If greater than zero, there is an EDP configuration issue.
Store in DB lambda: If EDP is configured correctly, there should be zero errors and a 100% success rate.
Emit event lambda: If EDP is configured correctly, there should be zero errors and a 100% success rate.
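The per-bucket checks above amount to a small decision table. The sketch below encodes it as a function over three numbers read off the dashboard. The function name and the 10% tolerance used for "approximately equal" are assumptions made for illustration.

```python
from typing import List


def diagnose(invocations: int, triggered_rules: int, failed_invocations: int,
             tolerance: float = 0.10) -> List[str]:
    """Return human-readable hints for the S3 EventBridge rule metrics."""
    hints = []
    if invocations == 0:
        hints.append("Invocations is zero: EventBridge notifications are not turned on for the bucket.")
    if triggered_rules == 0:
        hints.append("TriggeredRules is zero: the EventBridge rule was not created correctly during deployment.")
    if failed_invocations > 0:
        hints.append("Failed Invocations is above zero: there is an EDP configuration issue.")
    # Assumption: "approximately equal" means within `tolerance` of the larger value.
    larger = max(invocations, triggered_rules)
    if invocations and triggered_rules and abs(invocations - triggered_rules) > tolerance * larger:
        hints.append("Invocations and TriggeredRules differ noticeably; they should be roughly equal.")
    return hints
```

An empty list means none of the documented failure conditions were detected for that bucket.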
Limitations
Each EDP stack monitors one S3 bucket.