Saturday, 6 October 2018

How to do iterative tasks with a delay in AWS Step Functions

Background

I recently did a video series on building web applications using AWS serverless architecture.
In one of the videos, I covered the AWS Step Functions service, which can be used to create distributed applications using visual workflows.

Step Functions use Lambdas internally to execute the business logic of the provided workflow. But limitations can arise. E.g.
  1. The business logic that the Lambda executes might take longer than the maximum allowed execution time (5 minutes at the time of writing).
  2. The API that you are calling from the Lambda might be rate limited.

One straightforward approach would be to break the Lambda down into multiple Lambdas that execute the business logic between them. But that is not always possible. For example, let's say you are importing some data from a site through an API and then saving it in your DB. You may not always have control over the number of users returned and processed, so it is always a good idea to handle this on your end. Fortunately, Step Functions provide a way to handle such scenarios. In this post, I will try to explain how.

 How to do iterative tasks with a delay in AWS Step Functions

 Let us assume we have the following flow -
  1. Get remote data from an API. Let's say the API that we call to process each item allows only 50 calls per minute, so we can process only 50 items per minute.
  2. Let's assume we get more than 50 items from the fetch API call. Now we have to batch 50 items at a time and process them.
  3. Wait for 60 seconds and then process the next batch of 50.
  4. Do this until all the fetched items are processed.
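
To get a feel for the numbers in the flow above, here is a minimal Python sketch of the batching arithmetic. The helper name plan_batches is hypothetical (it is not part of any AWS API); the batch size of 50 and the 60 second wait come from the steps above.

```python
import math

BATCH_SIZE = 50    # items we may process per minute (from the flow above)
WAIT_SECONDS = 60  # pause between two batches (from the flow above)


def plan_batches(total_items, batch_size=BATCH_SIZE, wait_seconds=WAIT_SECONDS):
    """Hypothetical helper: return (number of batches, total seconds spent waiting)."""
    batches = math.ceil(total_items / batch_size)
    # We only wait between batches, not after the last one.
    waits = max(batches - 1, 0)
    return batches, waits * wait_seconds


print(plan_batches(120))  # (3, 120)
```

So 120 fetched items would need 3 batches and roughly 2 minutes of waiting, on top of the processing time itself.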


To do this we can use the following state machine definition -


 {
    "Comment": "Step function to import external data",
    "StartAt": "FetchDataFromAPI",
    "States": {
        "FetchDataFromAPI": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:499222264523:function:fetch-data",
            "Next": "ProcessData"
        },

        "ProcessData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:499222264523:function:process-data",
            "Next": "ProcessMoreChoiceState"
        },

        "ProcessMoreChoiceState": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.done",
                "BooleanEquals": false,
                "Next": "WaitAndProcessMore"
            }],
            "Default": "Done"
        },

        "WaitAndProcessMore": {
            "Type": "Wait",
            "Seconds": 60,
            "Next": "ProcessData"
        },

        "Done": {
            "Type": "Pass",
            "End": true
        }
    }
}


Visually it looks like below -




If you have gone through the video link shared earlier, most of this will already make sense to you. The only difference here is the "Wait" state, which waits for 60 seconds before the next batch is processed.

The "ProcessData" Lambda has to return the entire array and the number of items processed so far as its output. This output is passed to "ProcessMoreChoiceState" and subsequently to the "WaitAndProcessMore" state, so that it can be sent back to the "ProcessData" state to process the remaining entries. As long as entries remain, the "done" variable must be false. Once all entries are processed, we set "done" to true, which transitions to the "Done" state and finishes the state machine execution.
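
As an illustration, here is a minimal Python sketch of what the "process-data" Lambda could look like. The field names "items" and "processed" and the process_item function are my assumptions for this sketch; only "done" comes from the state machine definition above. It also assumes that the "fetch-data" Lambda returns something like {"items": [...], "processed": 0}.

```python
BATCH_SIZE = 50  # items processed per invocation (rate limit from the flow above)


def process_item(item):
    """Hypothetical placeholder for the rate-limited API call made per item."""
    pass


def handler(event, context):
    """Process the next batch and return the updated state for the Choice state."""
    items = event["items"]                 # assumed field: the entire array
    processed = event.get("processed", 0)  # assumed field: items handled so far

    batch = items[processed:processed + BATCH_SIZE]
    for item in batch:
        process_item(item)

    processed += len(batch)
    return {
        "items": items,
        "processed": processed,
        # "$.done" is what ProcessMoreChoiceState checks.
        "done": processed >= len(items),
    }


def simulate(total_items):
    """Local stand-in for the state machine loop (without the 60 second wait)."""
    state = {"items": list(range(total_items)), "processed": 0}
    invocations = 0
    while True:
        state = handler(state, None)
        invocations += 1
        if state["done"]:
            return invocations


print(simulate(120))  # 3
```

The "Choice" and "Wait" states pass their input through unchanged by default, which is why the handler has to return the complete state every time and not just the "done" flag.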

Hope this helps. If you have any questions, add them in the comments below. Thanks.

