One of the fun parts of using serverless is that you can try out new ideas and provision them with the flick of a finger. I’ve mentioned more than once that S3 is a powerful tool that can be used as more than an elastic persistence layer.
In this post, I’m going to demonstrate how to use S3 as a scheduling mechanism to execute various tasks.
S3, alongside a Lambda function, creates a simple event-based flow. For example, attach a Lambda function to an S3 PUT event, create a new file, and the function is called. To schedule an event, then, all you have to do is write the file you want to act upon at the designated time. However, AWS only lets you create recurring events, using cron or rate expressions. What happens when you want to schedule a one-time event? You are stuck.
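To make the event flow concrete, here is a minimal sketch of a Lambda handler attached to an S3 PUT event. The handler name and the return value are my own choices for illustration; the record layout is the standard S3 event notification structure.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Entry point for a Lambda function subscribed to S3 object-created events."""
    created = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        created.append((bucket, key))
        print(f"New object: s3://{bucket}/{key}")
    return created
```

Every file written to the bucket results in a call like this, which is why writing a file at the right time is enough to run a task at that time.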
The s3-scheduler library enables you to do just that. Specifically, it uses S3 as a scheduling mechanism that lets you schedule one-time events.
How it works
Each event is a separate file. Behind the scenes, the library uses the recurring mechanism to wake up every minute and scan for the relevant files using S3’s filter capabilities; if a file’s scheduled time has passed, it moves the file to the relevant bucket and key.
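The scan-and-move step can be sketched roughly as follows. This is not the library's actual code, just an illustration under a few assumptions: `s3` behaves like a boto3 S3 client, pending files live under a hypothetical schedule folder, file names follow the time|bucket|key layout described further down, and timestamps are naive ISO 8601 strings.

```python
from datetime import datetime


def move_due_files(s3, schedule_bucket, schedule_folder, now):
    """Copy every due file to its target, then remove it from the schedule folder.

    `s3` is expected to behave like a boto3 S3 client.
    """
    prefix = schedule_folder.rstrip("/") + "/"
    moved = []
    # Prefix filtering limits the listing to pending schedule files.
    # A single call returns at most 1,000 keys; pagination is omitted here.
    response = s3.list_objects_v2(Bucket=schedule_bucket, Prefix=prefix)
    for obj in response.get("Contents", []):
        key = obj["Key"]
        parts = key[len(prefix):].split("|", 2)
        if len(parts) != 3:
            continue  # not a schedule file
        when, target_bucket, target_key = parts
        if datetime.fromisoformat(when) > now:
            continue  # not due yet
        s3.copy_object(
            Bucket=target_bucket,
            Key=target_key,
            CopySource={"Bucket": schedule_bucket, "Key": key},
        )
        s3.delete_object(Bucket=schedule_bucket, Key=key)
        moved.append((target_bucket, target_key))
    return moved
```

One thing to keep in mind with this sketch: `copy_object` raises an `s3:ObjectCreated:Copy` event rather than a Put event, so the target Lambda function would need to subscribe to that event type (or to all object-created events).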
In order to function properly, the library has to know the answer to three questions:
- What content to save,
- Where to save it (bucket + key) → will trigger the appropriate Lambda function, and
- When to move it to the appropriate bucket.
The content to save is left unchanged, while points 2 and 3 (see above) are encoded in the key’s name, with | as a separator between the parts. For example, to copy the relevant content on the 5th of August to a bucket called s3-bucket and a folder named s3_important_files, the scheduler will produce the following file:
2018-08-05|s3-bucket|s3_important_files
By keeping the metadata outside the actual content, we achieve the following benefits:
- It speeds up the process, since there is no need to read the entire content to decide when and where to copy.
- It allows the content to be binary, not only text-based.
- By using S3’s filter capabilities, it reduces the cost of fetching the correct files.
- Debugging is easier: just view the file name to understand when and where the content will be copied.
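The naming scheme is easy to reproduce. Here is a small sketch of encoding and decoding such a name; the function names are hypothetical and the ISO 8601 timestamp format is my assumption, not necessarily what the library uses internally.

```python
from datetime import datetime

SEPARATOR = "|"


def encode_name(when, bucket, key):
    """Build a schedule file name from the time, target bucket and target key."""
    return SEPARATOR.join([when.isoformat(), bucket, key])


def decode_name(name):
    """Split a schedule file name back into (time, target bucket, target key)."""
    # Split at most twice so that a target key containing "|" stays intact.
    when, bucket, key = name.split(SEPARATOR, 2)
    return datetime.fromisoformat(when), bucket, key
```

For instance, `decode_name("2018-08-05|s3-bucket|s3_important_files")` yields midnight on the 5th of August 2018, the bucket `s3-bucket` and the key `s3_important_files`, all without reading a single byte of the file's content.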
To install the library, run:
pip install s3-scheduler
Setting up a recurring flow
The library relies on the built-in AWS capability to run a function every minute. The configuration depends on your framework. For example, for Zappa use the following:
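As a rough sketch, a Zappa `events` entry with a rate expression looks like the dictionary below, which prints the JSON you would place in zappa_settings.json. The stage name `production` and the function path `scheduler_app.run_scheduler` are hypothetical placeholders; point the path at whatever function runs the scheduler in your project.

```python
import json

zappa_settings = {
    "production": {  # hypothetical stage name
        "events": [
            {
                # Hypothetical dotted path to the function that runs the scheduler.
                "function": "scheduler_app.run_scheduler",
                # Rate expression: invoke the function once every minute.
                "expression": "rate(1 minute)",
            }
        ]
    }
}

print(json.dumps(zappa_settings, indent=4))
```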
During initialization, the scheduler requires the bucket and a folder in which the actual scheduling details are kept. Remember that each event is a separate file; therefore, there is a need to save them somewhere. When to schedule is a simple
If you want to cancel the schedule event before it occurs, do the following:
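I am not reproducing the library's own call here. Conceptually, though, since each pending event is just a file, cancelling amounts to deleting that file before the next scan picks it up. A sketch under that assumption, with a hypothetical function name and a boto3-style client:

```python
def cancel_scheduled_event(s3, schedule_bucket, schedule_key):
    """Delete a pending schedule file so the periodic scan never moves it.

    `s3` is expected to behave like a boto3 S3 client; `schedule_key` is the
    full key of the pending file inside the scheduling folder.
    """
    s3.delete_object(Bucket=schedule_bucket, Key=schedule_key)
```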
Scheduling in the AWS serverless world is a bit tricky. Right now, AWS provides only cron-like capabilities, but this post has demonstrated one technique that can be used to create a more robust scheduling capability.