Showing posts with label AWS. Show all posts

Thursday, 28 March 2019

Writing unit tests for AWS Lambda in Node.js

Background

In the last post, we saw how to write unit tests in Node.js using Chai and Mocha -
In this post, we will see an extension of the same: how to write test cases for AWS Lambda functions. We are still going to use Chai and Mocha, but with some additional dependencies. Following is the list of final dependencies we need -


    "devDependencies": {
        "chai": "^4.2.0",
        "lambda-tester": "^3.5.0",
        "mocha": "^6.0.2",
        "mock-require": "^3.0.3"
    }


Also, note that these are dev dependencies - they are required only for development/testing. Here -
  • lambda-tester is used for simulating Lambda behavior. You can use it to send mock events and handle the result or errors.
  • mock-require is used to mock any other dependencies your Lambda might depend on, for example a service or network call.

Writing unit tests for AWS Lambda in Node.js

Let's say following is our lambda code -


exports.handler = (event, context, callback) => {

    // searchId is expected as a path parameter on the incoming event
    var searchId = event.pathParameters.searchId;

    if (searchId) {
        var result = {
            status: "success",
            code: 200,
            data: "Got searchId in the request"
        };
        // First argument null indicates a successful invocation
        callback(null, result);
    }
    else {
        var result = {
            status: "failure",
            code: 400,
            data: "Missing searchId in the request"
        };
        // Passing the result as the first argument signals an error
        callback(result, null);
    }
};



It's just simple code that looks for searchId in the path parameters and returns an error if it is not found. Mocha tests for this Lambda would look like -

const assert = require('chai').assert;
const expect = require('chai').expect;
const lambdaTester = require('lambda-tester');
const mockRequire = require('mock-require');

const index = require("../index");

describe("Lambda Tests", function(){

    describe("Successful Invocation", function(){
        it("Successful Invocation with results", function(done) {

            const mockData = {
                pathParameters : {
                    searchId : 10
                }
            };

            lambdaTester(index.handler).event(mockData)
            .expectResult((result) => {
                expect(result.status).to.exist;
                expect(result.code).to.exist;
                expect(result.data).to.exist;

                assert.equal(result.status, "success");
                assert.equal(result.code, 200);
                assert.equal(result.data, "Got searchId in the request");
            }).verify(done);

        });
       
    });
   
    describe("Failed Invocation", function(){
        it("Unsuccessful invocation", function(done) {

            const mockData = {
                pathParameters : {
                    newSearchId : 5
                }
            };

            lambdaTester(index.handler).event(mockData)
            .expectError((result) => {
                expect(result.status).to.exist;
                expect(result.code).to.exist;
                expect(result.data).to.exist;

                assert.equal(result.status, "failure");
                assert.equal(result.code, 400);
                assert.equal(result.data, "Missing searchId in the request");
            }).verify(done);


        });
    });

})


You need to follow the same conventions we saw in the last post. Create package.json and add the dev dependencies mentioned above. Run npm install. Create a folder called test and add your test file in it (content mentioned above). Once done, you can run mocha at the root level.
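For reference, a minimal package.json for this setup could look like the following - the name and version are just placeholders, and the test script simply invokes mocha so you can also run npm test -

{
    "name": "lambda-unit-tests",
    "version": "1.0.0",
    "scripts": {
        "test": "mocha"
    },
    "devDependencies": {
        "chai": "^4.2.0",
        "lambda-tester": "^3.5.0",
        "mocha": "^6.0.2",
        "mock-require": "^3.0.3"
    }
}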





If you see above, I also mentioned we need "mock-require". This would be used if you actually need to mock a service. Let's say your Lambda uses a custom logger -

const logger = require('custom-logger').logger;

then you can mock this by -


const mockedLogger = {
    logger : {
        log: function(message) {
            console.log("message: " + message);
        }
    }
}

mockRequire('custom-logger', mockedLogger);

This could be a network service or a database query. You can mock it as per your requirement.
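One point worth noting about mock-require (a minimal sketch, assuming your Lambda lives in index.js and requires 'custom-logger'): the mock has to be registered before the module under test is required, otherwise the real dependency gets loaded -

const mockRequire = require('mock-require');

// mockedLogger is the stub object shown above
mockRequire('custom-logger', mockedLogger);

// Require the handler only after the mock is registered
const index = require('../index');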

Related Links


Saturday, 6 October 2018

How to do iterative tasks with a delay in AWS Step Functions

Background

Very recently I did a video series on Building Web application using AWS serverless architecture -
In one of the videos, I covered the AWS Step Functions service, which can be used to create distributed applications using visual workflows.

Step Functions use Lambdas internally to execute business logic within the provided workflow. But there are limitations that can arise, for example:
  1. The business logic that the Lambda is executing might take more than the maximum allowed time of 5 minutes.
  2. The API that you are calling from the Lambda might have rate limiting.
One straightforward way would be to break down the Lambda into multiple Lambdas and execute the business logic. But that is not always possible. For example, let's say you are importing some data from a site through an API and then saving it in your DB. You may not always have control over the number of items returned and processed, so it is always a good idea to handle it on your end. Fortunately, Step Functions provide a way to handle such scenarios. In this post, I will try to explain the same.

 How to do iterative tasks with a delay in AWS Step Functions

 Let us assume we have the following flow -
  1. Get remote data from an API. Let's say the API we call to process each item allows only 50 calls per minute, so we can process only 50 items per minute.
  2. Let's assume we get more than 50 items in the fetch API call. Now we have to batch 50 items at a time and process them.
  3. Wait for 60 seconds and then process the next batch of 50.
  4. Do this till all the fetched items are processed.


To do this we can use the following state machine definition -


 {
    "Comment": "Step function to import external data",
    "StartAt": "FetchDataFromAPI",
    "States": {
        "FetchDataFromAPI": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:499222264523:function:fetch-data",
            "Next": "ProcessData"
        },

        "ProcessData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:499222264523:function:process-data",
            "Next": "ProcessMoreChoiceState"
        },

        "ProcessMoreChoiceState": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.done",
                "BooleanEquals": false,
                "Next": "WaitAndProcessMore"
            }],
            "Default": "Done"
        },

        "WaitAndProcessMore": {
            "Type": "Wait",
            "Seconds": 60,
            "Next": "ProcessData"
        },

        "Done": {
            "Type": "Pass",
            "End": true
        }
    }
}


Visually it looks like below -




If you have gone through the video link shared earlier, most of this would have made sense to you by now. The only difference here is the "Wait" state that waits for 60 seconds before invoking the "ProcessData" state again.

You will have to send the entire array and the number of items processed so far as output to the "ProcessMoreChoiceState" and subsequently to the "WaitAndProcessMore" state, so that it can be sent back to the "ProcessData" state again to process the remaining entries. If all entries are processed, we just set the "done" variable to true, which transitions to the "Done" state, finishing the state machine execution.
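For illustration, here is a rough Node.js sketch of what the process-data Lambda could look like. The names event.items, processedSoFar and processBatch are hypothetical and not from the original post; the only real contract is that the output carries the items, the count and the done flag forward -

exports.handler = (event, context, callback) => {

    // The fetch-data Lambda is assumed to have put the full list on event.items
    var items = event.items || [];
    var processedSoFar = event.processedSoFar || 0;
    var batchSize = 50;

    // Process the next batch of up to 50 items (processBatch is a hypothetical helper
    // that calls the rate-limited API for each item in the batch)
    var batch = items.slice(processedSoFar, processedSoFar + batchSize);
    processBatch(batch, function(err) {
        if (err) {
            callback(err);
            return;
        }

        var newCount = processedSoFar + batch.length;

        // Pass everything forward so ProcessMoreChoiceState / WaitAndProcessMore
        // can loop back into ProcessData until done is true
        callback(null, {
            items: items,
            processedSoFar: newCount,
            done: newCount >= items.length
        });
    });
};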

Hope this helps. If you have any questions, add them in the comments below. Thanks.

Related Links

Friday, 5 October 2018

Building a web application using AWS serverless architecture

Background

Over the last couple of weeks, I have recorded YouTube videos to demonstrate how to build a web application using AWS serverless architecture.




Serverless essentially means you do not have to worry about the physical hardware, the operating system, or the runtimes. It's all taken care of by the service offered under the serverless architecture. This has essentially given rise to "Function as a Service".



In this post, I will try to summarize what I have covered in those videos over the last 2 weeks so that anyone looking to put it all together can refer to it.
If you do not wish to proceed and are just interested in the video links, here's a list of them -

  1. AWS Serverless 101(Building a web application): https://youtu.be/I5bW0Oi0tY4
  2. AWS Serverless 102(Understanding the UI code): https://youtu.be/dQJCr0r_RuM
  3. AWS Serverless 103(AWS Lambda): https://youtu.be/Kn86Lq29IMA
  4. AWS Serverless 104(API Gateway): https://youtu.be/yKI_UCYblio
  5. AWS Serverless 105(CI/CD with code pipeline): https://youtu.be/GEWrpZuBEkQ
  6. AWS Serverless 106(Cloudwatch Events): https://youtu.be/9gUB2n0hV7Q
  7. AWS Serverless 107(Step functions): https://youtu.be/rL6EqaMbC5U

Github repo: https://github.com/aniket91/AWS_Serverless

I will go over each of the topics and explain what has been covered in each of the videos. If a specific topic interests you, you can pick that up. However, note that some videos might have references to setup done in previous videos, so it is recommended to go through them in the order provided.


AWS Serverless 101(Building a web application)




This video is primarily to introduce you to the world of serverless. It covers a bit of cloud deployment history and information about serverless architecture - the what and the why. It also covers the prerequisites for this series and what exactly we are trying to build here. If you are completely new to AWS or the AWS serverless concept, this is a good place to start.

This web application has a UI with a button; on clicking it, an API call is made to get all the image files from an S3 bucket, which are then rendered on the UI. The API call goes to API Gateway, which forwards the request to Lambda. Lambda executes to get all the image files from S3 and returns them to API Gateway, which in turn returns the response to the UI code (JavaScript). The response is then parsed as a JSON array and the images are rendered on the UI.

AWS Serverless 102(Understanding the UI code)




This video covers the UI component of the web application. This is plain HTML and JavaScript. The UI code is hosted as a static website in an S3 bucket.

AWS Serverless 103(AWS Lambda)



AWS Lambda forms the basic unit of AWS serverless architecture. This video talks about AWS Lambda - what Lambda is, how to configure it, how to use it, and finally the code to get the list of files from an S3 bucket.

AWS Serverless 104(API Gateway) 




API Gateway is an AWS service to create and expose APIs. This video talks about the API Gateway service - how you can create APIs, resources, and methods, how we can integrate API Gateway with Lambda, and how to create stages and see the corresponding endpoints. It also talks about CORS and logging in the API Gateway service.

AWS Serverless 105(CI/CD with code pipeline)




 This video shows how we can automate backend code deployment using the AWS CodePipeline service. This includes using the CodePipeline, CodeBuild, and CloudFormation services.

AWS Serverless 106(Cloudwatch Events)



This video shows how you can use CloudWatch Events to trigger or schedule periodic jobs, just like cron jobs on a Linux-based system. We are using this service to do a periodic cleanup of non-image files in the S3 bucket.

 AWS Serverless 107(Step functions)





This video covers how to use state machines of the AWS Step Functions service to build distributed applications using visual workflows. We are using this service to do asynchronous processing of various image files like JPG, PNG, etc.

Related Links


Thursday, 13 September 2018

How to use API Gateway stage variables to call specific Lambda alias?

Background

When you are using API Gateway, you create stages, for example for dev, QA, and production. Sometimes you might want to call different aliases of the same Lambda for the different stages you have created. Stage variables let you do that. Let's see how we can achieve that in this post.


 If you do not wish to read the post below, you can just view the YouTube video that covers the same flow.




How to use API Gateway stage variables to call specific Lambda alias?

I am going to assume you -
  • Have already created an API
  • Have deployed it to a stage called "dev"
  • Have a Lambda created, have published a new version, and have created a new alias pointing to this new version

Once you have the above setup, go to your API Gateway and create a resource/method as you like. Once done, select the integration type as "Lambda" and in place of the Lambda function add -
  • MyTestLambda:${stageVariables.lambdaAlias}

Here MyTestLambda is the Lambda name and stageVariables.lambdaAlias references a stage variable named lambdaAlias. We will see how we can set this later. Once done, save this configuration.


On save, AWS will prompt you saying you are using a stage variable and you need to grant appropriate permission -

You defined your Lambda function as a stage variable. Please ensure that you have the appropriate Function Policy on all functions you will use. You can do this by running the below AWS CLI command for each function, replacing the stage variable in the function-name parameter with the necessary function name. 



Execute this command in your console.


NOTE: You should have the AWS CLI configured along with a profile that corresponds to the AWS account you are trying this on. If you have multiple AWS profiles configured, you might want to use the --profile and --region parameters as well. See the screenshot below for the command and response.
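The command AWS prompts you with is roughly of the following shape - this is only an illustrative sketch; the account id, API id, resource path and statement id below are placeholders, so use the exact command from the console prompt and run it once per alias you want to allow -

aws lambda add-permission \
    --function-name "arn:aws:lambda:us-east-1:<account-id>:function:MyTestLambda:dev" \
    --source-arn "arn:aws:execute-api:us-east-1:<account-id>:<api-id>/*/GET/<resource>" \
    --principal apigateway.amazonaws.com \
    --statement-id <unique-statement-id> \
    --action lambda:InvokeFunction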




You can now test your method invocation by manually providing the lambdaAlias stage variable value, which is "dev" in our case.



You should see an appropriate response once executed. Now let's see how we can add it to our actual stages. First, click on your API and deploy it to the stage you need it in. For me, I have created a stage called "dev". In this, go to "Stage Variables".


 Now create a stage variable with the name "lambdaAlias" and value "dev" and save it. Now you are all set up. You can now invoke the actual API URL and you should get back the response returned by the Lambda alias function.
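For example, assuming a GET method on a resource called /test (the API id, region and resource name here are placeholders, not from the original setup), the invocation would look something like -
  • curl https://<api-id>.execute-api.us-east-1.amazonaws.com/dev/test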




 Related Links




Wednesday, 12 September 2018

How to enable CloudWatch Logs for APIs in API Gateway

Background

AWS API Gateway lets you create APIs that can scale. In this post, I will show you how to turn on CloudWatch logging for your API Gateway.

 If you do not wish to read the post below, you can just view the YouTube video that covers the same flow.



How to enable CloudWatch Logs for APIs in API Gateway

I am assuming you already have an API created in API gateway and have deployed it in a stage. 


Before you turn on CloudWatch logging for your API deployed in a stage, you need to give API Gateway a role with permission to send logs to CloudWatch. To do so, first create a role in the IAM service for API Gateway with permission to send logs to CloudWatch: go to IAM, then Roles, and click on Create role.








Next, select the permission that it shows - the one that allows API Gateway to publish logs to CloudWatch. Click Review and create this role.



Once the role is created open it and copy the role ARN.




Now go to API Gateway and go to Settings. Here you should see the "CloudWatch log role ARN" field. Paste the copied ARN into it and save.





Once this is set up, all that is left is to turn on CloudWatch logging for your API. To turn it on, go to the stage where your API is deployed. Next, go to the "Logs/Tracing" tab and select the checkbox that says "Enable CloudWatch Logs". You can also optionally select "Log full requests/responses data".




Now you can go to CloudWatch -> Logs and see logs corresponding to each stage of your API Gateway.






NOTES:

  1.  If you successfully enabled CloudWatch Logs for API Gateway, you will see the entry /aws/apigateway/welcome listed in the Log Groups section of the right pane.
  2. You might need to redeploy your API after enabling CloudWatch logs from the API Gateway console before your logs are visible in the CloudWatch console.
  3.  Your API will have a Log Group titled API-Gateway-Execution-Logs_api-id/ that contains numerous log streams.


Related Links

Tuesday, 21 August 2018

How to delete private EC2 AMI from AWS

Background

Most of the time, you will want to customize the EC2 image that you run. It helps you quickly spin up a new instance or put it behind an auto scaling group. To do this you can create an image out of your currently running EC2 instance. But this is an incremental process, so you may want to create a new image after making some changes to an EC2 instance spun up from the current image. Once the new image is created, you can delete the previous AMI/image. In this post, I will walk you through the steps to do so.


How to delete private EC2 AMI from AWS

To delete a private AMI follow the steps below -

  1. Open the Amazon EC2 console  (https://console.aws.amazon.com/ec2/).
  2. Select the region from the drop-down at the top based on the region where your AMI is located.
  3. In the navigation pane, click AMIs.
  4. Select the AMI, click Actions, and then click Deregister. When prompted for confirmation, click Continue.
    • NOTE: It may take a few minutes before the console removes the AMI from the list. Choose Refresh to refresh the status.
  5. In the navigation pane, click Snapshots.
  6. Select the snapshot, click Actions, and then click Delete. When prompted for confirmation, click Yes, Delete.
  7. Terminate any EC2 instances that might be running with old AMI.
  8. Delete any EBS volumes for those EC2 instances if "delete on termination" is not set.

When you create an AMI, it creates a snapshot for each volume associated with that image. So let's say your EC2 instance has 2 EBS volumes, a C drive and a D drive of 30GB and 50GB respectively, then you would see 2 snapshots for them under the Snapshots section. Your AMI size will be the sum of the individual snapshot sizes (i.e. 30GB + 50GB = 80GB). To successfully delete the AMI you need to deregister the AMI and then delete the snapshots manually. And finally, when both are done, terminate the EC2 instances that you might have running with the old AMI.
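If you prefer the AWS CLI over the console, the same cleanup can be done roughly as follows - the AMI, snapshot and instance ids are placeholders, and you may need --profile/--region depending on your setup -
  • aws ec2 deregister-image --image-id ami-0123456789abcdef0
  • aws ec2 delete-snapshot --snapshot-id snap-0123456789abcdef0
  • aws ec2 terminate-instances --instance-ids i-0123456789abcdef0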








NOTES:
  1. When you deregister an Amazon EBS-backed AMI, it doesn't affect the snapshot that was created for the root volume of the instance during the AMI creation process. You'll continue to incur storage costs for this snapshot. Therefore, if you are finished with the snapshot, you should delete it.
  2. You can deregister an AMI when you have finished using it. After you deregister an AMI, you can't use it to launch new instances.
  3. When you deregister an AMI, it doesn't affect any instances that you've already launched from the AMI. You'll continue to incur usage costs for these instances. Therefore, if you are finished with these instances, you should terminate them.
  4. AWS will not let you delete a snapshot associated with an AMI before you deregister the AMI.
  5. Delete EBS volumes (unless they are set to delete on termination, in which case they would be removed on terminating the EC2 instance). This isn't necessary for S3-backed instances.


Related Links


Thursday, 16 August 2018

How to send an email using the AWS SES service?

Background

In many business use cases, you need to notify your users over email. AWS provides its own service to do so: you can use AWS SES (Simple Email Service) to send out emails to your users. In this post, we will see how we can configure the AWS SES service and send out emails.

If you want to skip the text below, you can refer to the video I have created covering the same thing -





Setting up configurations in AWS SES service

First and foremost, go to the SES service in your AWS console. There are limited regions where the AWS SES service is available. Please choose one of those regions -



The next thing that you need to do is verify your email address - the address from which you want to send out emails. For this, go to the "Email addresses" section in the left panel. Now click on the "Verify a New Email Address" button. Enter your email address and click "Verify this email address". Once done, you should get an email to confirm and validate your email address.



Click on the confirmation link. Once done you should see your email address as verified in SES "Email addresses" section.




The next thing that we need to set up is the email template. The email template is the content that you want to send in your email. This includes a unique template name, subject, body, etc. Also, this can be parameterized, which means you can have placeholders in the template that are replaced at runtime. A sample template would be -


{
    "Template": {
        "TemplateName": "testtemplate",
        "SubjectPart": "Test email for username {{userName}}",
        "TextPart": "Test email body!",
        "HtmlPart": "<html>\r\n\r\n<head>\r\n    <title>Test Title<\/title>\r\n<\/head>\r\n\r\n<body>\r\n    <h1> Username : {{userName}}<\/h1>\r\n<\/body>\r\n\r\n<\/html>"
    }
}

HtmlPart is the JSON-escaped version of your actual HTML content. TemplateName is the unique name for your template that you will reference later from your code to send the mail. SubjectPart is the email subject. For more details refer -

Notice how we have added {{userName}} in the subject and body. This is a placeholder that will be replaced at runtime. We will see how in some time. Once your template is set up, it is time to add it to AWS SES. To do this you need to run the following CLI command -
  • aws ses create-template --cli-input-json fileb://test_template.json --region us-east-1 --profile osfg

If you have already created the template and want to update it you can execute -

  • aws ses update-template --cli-input-json fileb://test_template.json --region us-east-1 --profile osfg

NOTE: You need to have the AWS CLI configured on your machine and set up with your AWS profile. I have multiple profiles on my machine, which is why you see the --profile argument. If you just have one, then you do not have to mention it.

Once you have executed this API you can see your template under "Email Templates" section of SES service.




Now that you have your email verified and your template set up, you can send an email. Following is the Node.js code to do it -

const aws = require('aws-sdk');
aws.config.loadFromPath('./config.json');
const ses = new aws.SES();

/**
 * author: athakur
 * aws ses create-template --cli-input-json fileb://test_template.json --region us-east-1 --profile osfg
 * aws ses update-template --cli-input-json fileb://test_template.json --region us-east-1 --profile osfg
 * @param {*} callback 
 */
var sendMail = function (callback) {
    var params = {};

    var destination = {
        "ToAddresses": ["opensourceforgeeks@gmail.com"]
    };
    var templateData = {};
    templateData.userName = "athakur";

    params.Source = "opensourceforgeeks@gmail.com";
    params.Destination = destination;
    params.Template = "testtemplate";
    params.TemplateData = JSON.stringify(templateData);



    ses.sendTemplatedEmail(params, function (email_err, email_data) {
        if (email_err) {
            console.error("Failed to send the email : " + email_err);
            callback(email_err, email_data)
        } else {
            console.info("Successfully sent the email : " + JSON.stringify(email_data));
            callback(null, email_data);
        }
    })
}


sendMail(function(err, data){
    if(err){
        console.log("send mail failed");
    }
    else {
        console.log("Send mail succedded");
    }
})


In the above code, you can see that we provide ToAddresses, which is an array field, so you can give multiple addresses here. I have used the same email address as the one we used in the Source (the from email address - the one we verified in the first step of the setup). Also, notice the template name that we have provided - it is the same name we had configured in the JSON template file. Finally, notice the placeholder value we have provided for userName. This will be replaced by "athakur" in the template and the email will be sent.
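For completeness, aws.config.loadFromPath('./config.json') expects a JSON file with your credentials and region, along these lines - the values are placeholders, and you can also rely on environment variables or an IAM role instead of a key pair -

{
    "accessKeyId": "<your-access-key-id>",
    "secretAccessKey": "<your-secret-access-key>",
    "region": "us-east-1"
}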

You can find this code snippet in my GitHub gist-



And you should get an email. In my case, it looks as follows -




NOTE: If you are not able to see the mail, check your spam folder. Since it is sent by the AWS SES service, it might land up there. Mark it as non-spam and you should be good going forward.

Related Links



Monday, 16 July 2018

How to fix "Unable to find a region via the region provider chain" exception with AWS SDK

Background

Recently I was working on a task that needed to upload and download a test file to and from an AWS S3 bucket. For this, I used the AWS Java SDK. The functionality seemed to work fine until the code was deployed in production. In this post, I will try to explain why this exception occurs and how we can fix it.



Code that I had used for the test upload was as follows -

  try {
   BasicAWSCredentials awsCreds = new BasicAWSCredentials("YourAccessKeyId", "YourSecretKey");
   AmazonS3 s3client = AmazonS3ClientBuilder.standard()
                    .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                    .build();
   s3client.putObject(storage.getBucketName(), "test.txt", "Done!");

  } catch (AmazonServiceException ase) {
   logger.error(
     "Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason. storage : {}",
     storage, ase);
   logger.error("Error Message:    {}", ase.getMessage());
   logger.error("HTTP Status Code: {}", ase.getStatusCode());
   logger.error("AWS Error Code:   {}", ase.getErrorCode());
   logger.error("Error Type:       {}", ase.getErrorType());
   logger.error("Request ID:       {}", ase.getRequestId());
  } catch (AmazonClientException ace) {
   logger.error(
     "Caught an AmazonClientException, which means the client encountered an internal error while trying to communicate with S3, such as not being able to access the network storage : {}",
     storage, ace);
   logger.error("Error Message: {}", ace.getMessage());
  } catch (Exception ex) {
   logger.error("Got exception while testing upload to S3", ex);
  }

I had tested this code by deploying it on one of our EC2 instances and it worked fine. I will explain later why it worked on an EC2 instance, but there is a major flaw in the above code that ends up throwing the exception - "Unable to find a region via the region provider chain"

Relevant Stacktrace:

com.amazonaws.SdkClientException: Unable to find a region via the region provider chain.
Must provide an explicit region in the builder or setup environment to supply a region.
at com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:386)
at com.amazonaws.client.builder.AwsClientBuilder.configureMutableProperties(AwsClientBuilder.java:352)



The problem

The problem with the above piece of code is that we are not supplying the AmazonS3ClientBuilder with the region. Even though S3 is a global service, the buckets that you create are region-specific. Each region has its own endpoint, and hence the AWS S3 SDK needs to know the region to figure out the endpoint it needs to make the API call to.

The Solution

The simplest solution is to pass the region to the AmazonS3ClientBuilder builder explicitly as follows -

AmazonS3 s3client = AmazonS3ClientBuilder.standard()
        .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
        .withRegion(Regions.US_EAST_1)
        .build();


NOTE: After you build a client with the builder, it's immutable and the region cannot be changed. If you are working with multiple AWS Regions for the same service, you should create multiple clients—one per region.

Understanding the solution

The problem and the solution might appear simple but there are various aspects you need to understand about setting region in your S3 builder.

There are various ways your AWS SDK can infer the region to use instead of you explicitly providing it yourself.

NOTE: You must use client builders to have the SDK automatically detect the region your code is running in. This is not applicable if you are using a client constructor; in that case the default region from the SDK will be used.

If you don't explicitly set a region using the withRegion methods, the SDK consults the default region provider chain to try and determine the region to use.


  1. Any explicit region set by using withRegion or setRegion on the builder itself takes precedence over anything else.
  2. The AWS_REGION environment variable is checked. If it's set, that region is used to configure the client.
    1. NOTE: This environment variable is set by the Lambda container.
  3. The SDK checks the AWS shared configuration file (usually located at ~/.aws/config). If the region property is present, the SDK uses it.
    1. The AWS_CONFIG_FILE environment variable can be used to customize the location of the shared config file.
    2. The AWS_PROFILE environment variable or the aws.profile system property can be used to customize the profile that is loaded by the SDK.
  4. The SDK attempts to use the Amazon EC2 instance metadata service to determine the region of the currently running Amazon EC2 instance.

If the SDK still hasn't found a region by this point, client creation fails with an exception - and we saw what exception it throws :) It is the reason I am writing this post. Also, you must have realized why it worked for me on the EC2 instance: as per point number 4, the SDK attempts to use the Amazon EC2 instance metadata service to determine the region. Hope this helps!
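As a quick illustration of points 2 and 3 above, outside of EC2/Lambda you could supply the region through the environment or the shared config file; the values below are just examples -

# Option 1: environment variable
export AWS_REGION=us-east-1

# Option 2: ~/.aws/config
[default]
region = us-east-1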



Related Links



Saturday, 23 June 2018

DynamoDB read and write provisioned throughput calculations

Background

Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model, reliable performance, and automatic scaling of throughput capacity make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

In this post, I will show you how to calculate read and write provisioned throughput for DynamoDB. This is a very common question asked in the "AWS Certified Developer - Associate" exam. I will also show you some examples to fully understand the calculations.

AWS allows us to change the read and write capacity units of DynamoDB, which lets us scale the DB based on our requirements.



But the major question is how you come up with these capacity unit values, which is exactly what we are going to see below.

DynamoDB read and write provisioned throughput calculations


Before we head on to the calculation part let's try to understand some details about read and write throughputs in DynamoDB.

  1. Read provisioned throughput:
    • All reads are rounded up to increments of 4 KB
    • One read capacity unit gives you 2 eventually consistent reads per second (the default)
    • One read capacity unit gives you 1 strongly consistent read per second
  2. Write provisioned throughput:
    • All writes are rounded up to increments of 1 KB
    • One write capacity unit gives you 1 write per second

Now let's see how we can compute read provisioned throughput -

DynamoDB read provisioned throughput calculation

  1. Find the read units required per item. For this, round the item size up to the nearest chunk of 4KB and then divide by 4. For example, if each item is 6KB in size, then the nearest 4KB chunk is 8KB and the read units required are 8 / 4 = 2. Another example: if your item size is 1KB, then the nearest 4KB chunk is 4KB and the read units required are 4 / 4 = 1. Let's call this value X.
  2. Now you need to calculate the number of items read per second. For example, if you are reading 120 items per minute, then the number of items read per second is 120 / 60 = 2. Let's call this value Y.
  3. Your read capacity units for strongly consistent reads would be (X*Y).
  4. If you are using eventually consistent reads, then divide the above number by 2 to get the read provisioned throughput, i.e. (X*Y)/2. This is because one read unit supports 2 eventually consistent reads per second.
Let's take some examples to understand this better.


Q1. Let's say you have an application that needs to read 20 items of 2 KB each per second using eventually consistent reads. What read throughput value should be set?

A. Our item size is 2KB. So let's first round it up to the nearest 4KB chunk, which is 4KB. Now to get the read units per item we divide by 4: 4 / 4 = 1. This is our X if you are following the above method. Now the number of items read is 20 per second, which is our Y. So X * Y = 1 * 20 = 20. Finally, the reads are eventually consistent, which means we need to divide the above value further by 2. So the final read throughput is 20 / 2 = 10.

Let's see another example -

Q2. Let's say you have an application that needs to read 10 items of 10 KB each per second using eventually consistent reads. What read throughput value should be set?

A. Our item size is 10KB. So let's first round it up to the nearest 4KB chunk, which is 12KB. Now to get the read units per item we divide by 4: 12 / 4 = 3. This is our X if you are following the above method. Now the number of items read is 10 per second, which is our Y. So X * Y = 3 * 10 = 30. Finally, the reads are eventually consistent, which means we need to divide the above value further by 2. So the final read throughput is 30 / 2 = 15.


Now let's see an example with strong consistency.

Q3. Let's say you have an application that needs to read 5 items of 6 KB each per second using strongly consistent reads. What read throughput value should be set?

A. Our item size is 6KB. So let's first round it up to the nearest 4KB chunk, which is 8KB. Now to get the read units per item we divide by 4: 8 / 4 = 2. This is our X if you are following the above method. Now the number of items read is 5 per second, which is our Y. So X * Y = 2 * 5 = 10. Finally, since the reads are strongly consistent, you do not need to divide the result by 2. So the final read throughput is 10.
 

DynamoDB write provisioned throughput calculation

  1. Find the write units required per item. Since each write unit is 1 KB, you can directly use the actual size of an item in KB as the write units per item. For example, if each item is 6KB in size, then the write units required are 6. Another example: if your item size is 12KB, then the write units required are 12. Let's call this value X.
  2. Now you need to calculate the number of items written per second. For example, if you are writing 120 items per minute, then the number of items written per second is 120 / 60 = 2. Let's call this value Y.
  3. Your write capacity units would be (X*Y). There is no notion of strongly consistent or eventually consistent writes.

Let's see some examples for this.

Q1. You have an application that writes 10 items per second, where each item is 11 KB in size. What should the write throughput be set to?
A. Since the item size is 11 KB and each write unit is 1KB, we need 11 write units per item. This is our X. We also know the application is writing 10 items per second to the DB, which is our Y value. So the write throughput is X * Y = 11 * 10 = 110.


Let's see another example -

Q2. You have an application that writes 100 items per second, where each item is 10 KB in size. What should the write throughput be set to?
A. Since the item size is 10 KB and each write unit is 1KB, we need 10 write units per item. This is our X. We also know the application is writing 100 items per second to the DB, which is our Y value. So the write throughput is X * Y = 10 * 100 = 1000.


NOTE: Each item is nothing but a row in DynamoDB.
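To tie the above rules together, here is a small Node.js sketch (plain arithmetic, not part of any AWS SDK) that reproduces the calculations from the examples above -

// Read capacity units: round item size up to a 4 KB chunk, divide by 4,
// multiply by items/second, halve for eventually consistent reads.
function readCapacityUnits(itemSizeKB, itemsPerSecond, stronglyConsistent) {
    const unitsPerItem = Math.ceil(itemSizeKB / 4);
    const units = unitsPerItem * itemsPerSecond;
    return stronglyConsistent ? units : units / 2;
}

// Write capacity units: round item size up to a 1 KB chunk, multiply by items/second.
function writeCapacityUnits(itemSizeKB, itemsPerSecond) {
    return Math.ceil(itemSizeKB) * itemsPerSecond;
}

console.log(readCapacityUnits(2, 20, false));  // 10 (Q1)
console.log(readCapacityUnits(10, 10, false)); // 15 (Q2)
console.log(readCapacityUnits(6, 5, true));    // 10 (Q3)
console.log(writeCapacityUnits(11, 10));       // 110
console.log(writeCapacityUnits(10, 100));      // 1000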


Related Links



Tuesday, 12 June 2018

AWS service limits asked in "AWS Certified Solutions Architect - Associate" and "AWS Certified Developer - Associate" certifications

Background

I just cleared my "AWS Certified Developer - Associate" certification exam yesterday with 90%. I had already cleared the "AWS Certified Solutions Architect - Associate" exam 6 months back with 89%. You can see my badges below -
While preparing, I realized that there are some questions based on service limits in AWS. These can be straightforward questions or they can be slightly twisted. In either case, knowing the service limits helps out a lot. So I am going to summarize the ones which I feel are important from a certification perspective.




NOTE: AWS service limits can change anytime. So it is best to refer the FAQ sections of corresponding services to confirm. Following limits are as of June 2018.

AWS service limits & constraints

Following are AWS services and their corresponding limits. There are more limits and constraints for each service; I am simply trying to summarise them based on my exam preparation, test quizzes, and actual exam experience. Please let me know in the comments if these limits have changed and I can update accordingly. Thanks.

Consolidated billing


AWS S3

  • By default, customers can provision up to 100 buckets per AWS account. However, you can increase your Amazon S3 bucket limit by visiting AWS Service Limits.
  • The bucket name can be between 3 and 63 characters long and can contain only lower-case characters, numbers, periods, and dashes.
  • Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
  • For more details refer - https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
  • AWS S3 offers unlimited storage
  • Each object on S3, however, can be 0 bytes to 5TB.
  • The largest object that can be uploaded in a single PUT is 5GB
  • For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability.
  • For further details refer - https://aws.amazon.com/s3/faqs/

Glacier

  • There is no maximum limit to the total amount of data that can be stored in Amazon Glacier. 
  • Individual archives are limited to a maximum size of 40 terabytes.
  • For more details refer - https://aws.amazon.com/glacier/faqs/

Redshift


AWS EC2

VPC

Route 53



Cloud watch

Cloud formation

Lambda

Dynamo DB

  • There is an initial limit of 256 tables per region. You can raise a request to increase this limit.
  • You can define a maximum of 5 local secondary indexes and 5 global secondary indexes per table(hard limit) - total 10 secondary indexes
  • The maximum size of item collection is 10GB
  • The minimum amount of reserved capacity that can be bought - 100
  • The maximum item size in DynamoDB is 400 KB, which includes both attribute name binary length (UTF-8 length) and attribute value lengths (again binary length). The attribute name counts towards the size limit. No limit on the number of items.
  • For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
  • A BatchGetItem single operation can retrieve up to 16 MB of data, which can contain as many as 100 items
  • For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html
  • A single Scan operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.
  • For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html

SQS

  • You can create any number of message queues.
  • Max configuration: 14 days retention and 12 hours visibility timeout
  • Default configuration: 4 days retention  and 30 seconds visibility timeout
  • A single request can have up to 1 to 10 messages up to a maximum payload of 256KB.
  • Each 64 kb chunk payload is billed as 1 request. So a single API call with 256kb payload will be billed as 4 requests.
  • To configure the maximum message size, use the console or the SetQueueAttributes method to set the MaximumMessageSize attribute. This attribute specifies the limit on bytes that an Amazon SQS message can contain. Set this limit to a value between 1,024 bytes (1 KB) and 262,144 bytes (256 KB) - see the CLI sketch after this list.
  • For more details refer - https://aws.amazon.com/sqs/faqs/
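As a hedged sketch of the SetQueueAttributes call via the AWS CLI (the queue URL and account id are placeholders) -
  • aws sqs set-queue-attributes --queue-url https://sqs.us-east-1.amazonaws.com/<account-id>/<queue-name> --attributes MaximumMessageSize=262144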

SNS

  • By default, SNS offers 10 million subscriptions per topic and 100,000 topics per account.  To request a higher limit, please contact Support.
  • Topic names are limited to 256 characters.
  • SNS subscription confirmation time period is 3 days

SWF



Again as mentioned before this is obviously not an exhaustive list but merely a summary of what I thought could be best to revise before going to the associate exams. Let me know if you think something else needs to be added here for the benefit of everyone.


Since you have taken time to go through the limits here is a bonus question for you :)

Question: You receive a call from a potential client who explains that one of the many services they offer is a website running on a t2.micro EC2 instance where users can submit requests for customized e-cards to be sent to their friends and family. The e-card website administrator was on a cruise and was shocked when he returned to the office in mid-January to find hundreds of angry emails complaining that customers' loved ones had not received their Christmas cards. He also had several emails from CloudWatch alerting him that the SQS queue for the e-card application had grown to over 500 messages on December 25th. You investigate and find that the problem was caused by a crashed EC2 instance which serves as an application server. What do you advise your client to do first? Choose the correct answer from the options below

Options:
  1. Use an autoscaling group to create as many application servers as needed to access all of the Christmas card SQS messages.
  2. Reboot the application server immediately so that it begins processing the Christmas cards SQS messages.
  3. Redeploy the application server as a larger instance type so that it processes the Christmas card SQS messages faster.
  4. Send an apology to the customer notifying them that their cards will not be delivered.

Answer:
4. Send an apology to the customer notifying them that their cards will not be delivered.

Explanation:
Since 500 message count was as of December 25th and e-card website administrator returned mid-Jan the difference is more than 14 days which is the maximum retention period for SQS messages.

To be honest, I had selected option 1 in my 1st attempt :)


Related Links



Friday, 25 May 2018

Understanding difference between Cognito User Pool vs Identity Pool

Background

In one of the previous posts, we saw how to set up a Cognito identity pool for unauthenticated or authenticated access to AWS resources like S3. A Cognito identity pool is used to federate users into AWS so that they can call AWS services. In this post, we are going to see the difference between a Cognito user pool and an identity pool.


Amazon Cognito User Pools

As per AWS documentation,

A user pool is a user directory in Amazon Cognito. With a user pool, your users can sign in to your web or mobile app through Amazon Cognito. Your users can also sign in through social identity providers like Facebook or Amazon, and through SAML identity providers. Whether your users sign-in directly or through a third party, all members of the user pool have a directory profile that you can access through an SDK.

User pools provide:


  • Sign-up and sign-in services.
  • A built-in, customizable web UI to sign in users.
  • Social sign-in with Facebook, Google, and Login with Amazon, as well as sign-in with SAML identity providers from your user pool.
  • User directory management and user profiles.
  • Security features such as multi-factor authentication (MFA), checks for compromised credentials, account takeover protection, and phone and email verification.
  • Customized workflows and user migration through AWS Lambda triggers.


Source: https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-identity-pools.html

After successfully authenticating a user, Amazon Cognito issues JSON web tokens (JWT) that you can use to secure and authorize access to your own APIs, or exchange for AWS credentials.

Amazon Cognito Identity Pools (Federated Identities)


As per AWS documentation,

Amazon Cognito identity pools (federated identities) enable you to create unique identities for your users and federate them with identity providers. With an identity pool, you can obtain temporary, limited-privilege AWS credentials to access other AWS services.

Source: https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-identity.html

Understanding the difference between Cognito User Pool vs Identity Pool

Above definitions can be very confusing, so let me simplify them for you.

A Cognito user pool is nothing but your user management system, backed by its own user directory. If you are building a new app or a website and you want to add an authentication mechanism to sign in or sign up your users, you should use a Cognito user pool. You can choose to have users sign in with an email address, phone number, username or preferred username plus their password. You can also use social identity providers for authentication and sign-in or sign-up for your app or website. Everything is under the Cognito user pool umbrella. You can use the provided SDK to do this. A Cognito user pool helps you maintain your user base details and their authentication. On successful authentication, it provides a JWT token that can be used to authenticate your custom server APIs as well.



A Cognito identity pool is used when you need access to AWS services. It basically authenticates the user and, if authentication is successful, gives you temporary credentials that you can use to talk to AWS. For example, let's say you want to upload a file to S3 - then you can use this. I had written a post earlier to do just that -
Now how you authenticate users to get these temporary credentials is something you decide. You could authenticate against the Cognito user pool you have already created, or you can again use 3rd party authentication providers like -
  • Amazon
  • Google
  • Facebook etc
The authentication flow is as follows -



So to summarize: if you want to build a user directory with sign-in/sign-up functionality, a Cognito user pool is the way to go, and if you just want access to AWS services without worrying about maintaining a user database of your own, you can use a Cognito identity pool.
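For instance, with the AWS SDK for JavaScript in the browser, wiring an identity pool up for temporary credentials looks roughly like this - the identity pool id is a placeholder, and unauthenticated access must be enabled on the pool for this exact snippet to work -

// Configure the AWS SDK to fetch temporary credentials from a Cognito identity pool
AWS.config.region = 'us-east-1';
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
    IdentityPoolId: 'us-east-1:<identity-pool-id>'
});

// Credentials are fetched lazily; force a refresh to verify the setup
AWS.config.credentials.get(function(err) {
    if (err) {
        console.error('Failed to get Cognito credentials', err);
    } else {
        console.log('Cognito identity id: ' + AWS.config.credentials.identityId);
    }
});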




Related Links

Tuesday, 22 May 2018

How to upload files to S3 from iOS app written in Objective C using AWS Cognito identity pool

Background

This post assumes you have set up a Cognito identity pool as explained in the previous post -
If not, then please refer to the previous post and set that up. Before starting, you should have -
  1. Identity pool ID
  2. AWS region where pool and S3 bucket reside
  3. S3 bucket name
We will use above in configuration and implementation that follows.

How to upload files to S3 from iOS app written in Objective C using AWS Cognito identity pool

  • First, go to your Podfile and update the following dependencies -

# Uncomment the next line to define a global platform for your project
# platform :ios, '9.0'


target 'TestApp' do
  # Uncomment the next line if you're using Swift or would like to use dynamic frameworks
  # use_frameworks!
  # Pods for TestApp


pod 'AWSMobileClient', '~> 2.6.18'  # For AWSMobileClient
pod 'AWSS3', '~> 2.6.18'            # For file transfers
pod 'AWSCognito', '~> 2.6.18'       # For data sync

end

  • Next run "Run pod install --repo-update" from the command line
  • Once you have the dependencies installed we can now write Objective C code to upload a file to S3.


//
//  FileUploader.m
//  TestApp
//
//  Created by Aniket on 22/05/18.
//


#import <Foundation/Foundation.h>


#import "FileUploader.h"
#import <AWSS3/AWSS3.h>
#import <AWSCore/AWSCore.h>




@implementation FileUploader


static AWSS3TransferManager *transferManager;


+ (void) initialize {
    
    if (self == [FileUploader class]) {
        // Credentials come from the Cognito identity pool - replace the pool id and region with your own
        AWSCognitoCredentialsProvider *credentialsProvider = [[AWSCognitoCredentialsProvider alloc] initWithRegionType:AWSRegionUSEast1 identityPoolId:@"us-east-1:f847843f-0162-43c2-b73f-efdc7c69cce2"];
        AWSServiceConfiguration *configuration = [[AWSServiceConfiguration alloc] initWithRegion:AWSRegionUSEast1 credentialsProvider:credentialsProvider];

        // s3TransferManagerKey is a string constant assumed to be defined in FileUploader.h
        [AWSS3TransferManager registerS3TransferManagerWithConfiguration:configuration forKey:s3TransferManagerKey];

        AWSServiceManager.defaultServiceManager.defaultServiceConfiguration = configuration;

        transferManager = [AWSS3TransferManager S3TransferManagerForKey:s3TransferManagerKey];
    }
    
}


+ (void)uploadFile
{
    
    // Replace PATH_TO_FILE with the local path of the file you want to upload
    NSURL *uploadingFileURL = [NSURL fileURLWithPath:@"PATH_TO_FILE"];
    AWSS3TransferManagerUploadRequest *uploadRequest = [AWSS3TransferManagerUploadRequest new];
    
    // s3Bucket is a string constant assumed to be defined in FileUploader.h
    uploadRequest.bucket = s3Bucket;
    int timestamp = [[NSDate date] timeIntervalSince1970];
    uploadRequest.key = [NSString stringWithFormat:@"%@-%d%@",@"testfile",timestamp,@".txt"];
    uploadRequest.body = uploadingFileURL;
    
    [[transferManager upload:uploadRequest] continueWithExecutor:[AWSExecutor mainThreadExecutor]
                                                       withBlock:^id(AWSTask *task) {
                                                           if (task.error) {
                                                               if ([task.error.domain isEqualToString:AWSS3TransferManagerErrorDomain]) {
                                                                   switch (task.error.code) {
                                                                       case AWSS3TransferManagerErrorCancelled:
                                                                       case AWSS3TransferManagerErrorPaused:
                                                                           break;
                                                                           
                                                                       default:
                                                                           NSLog(@"Error uploading file to S3: %@", task.error);
                                                                           break;
                                                                   }
                                                               } else {
                                                                   // Unknown error.
                                                                   NSLog(@"Error uploading file to S3: %@", task.error);
                                                               }
                                                           }
                                                           
                                                           if (task.result) {
                                                               AWSS3TransferManagerUploadOutput *uploadOutput = task.result;
                                                               // The file uploaded successfully.
                                                               NSLog(@"uploading file to S3 was successful: %@", uploadOutput);
                                                           }
                                                           return nil;
                                                       }];
    
}
@end


In the above code, replace the identity pool id and region as per your configuration settings. This is Objective C code; if you want to see Swift or Android - Java, please refer to https://docs.aws.amazon.com/aws-mobile/latest/developerguide/how-to-integrate-an-existing-bucket.html


Related Links
