Saturday, 23 June 2018

DynamoDB read and write provisioned throughput calculations

Background

Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model, reliable performance, and automatic scaling of throughput capacity make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

In this post, I will show you how to calculate read and write provisioned throughput for DynamoDB. This is a very common type of question in the "AWS Certified Developer - Associate" exam. I will also walk through some examples to make the calculations fully clear.

AWS allows us to change the read and write capacity units of DynamoDB, which lets us scale the DB based on our requirements.



But the major question is: how do you come up with these capacity unit values? That is exactly what we are going to see below.

DynamoDB read and write provisioned throughput calculations


Before we get to the calculations, let's understand some details about read and write throughput in DynamoDB.

  1. Read provisioned throughput:
    • All reads are rounded up to increments of 4 KB
    • Eventually consistent reads (the default): one read capacity unit supports 2 such reads per second
    • Strongly consistent reads: one read capacity unit supports 1 such read per second
  2. Write provisioned throughput:
    • All writes are rounded up to increments of 1 KB
    • One write capacity unit supports 1 write per second

Now let's see how we can compute read provisioned throughput -

DynamoDB read provisioned throughput calculation

  1. Find the read units required per item. For this, round the item size up to the nearest 4 KB chunk and then divide by 4. For example, if each item is 6 KB, the nearest 4 KB chunk is 8 KB and the read units required are 8 / 4 = 2. Another example: if your item size is 1 KB, the nearest 4 KB chunk is 4 KB and the read units required are 4 / 4 = 1. Let's call this value X.
  2. Now calculate the number of items read per second. For example, if you are reading 120 items per minute, the number of items read per second is 120 / 60 = 2. Let's call this value Y.
  3. Your read capacity units for strongly consistent reads would be X * Y.
  4. If you are using eventually consistent reads, divide the above number by 2 to get the read provisioned throughput, i.e. (X * Y) / 2. This is because one read capacity unit supports 2 eventually consistent reads per second, so you need only half as many units (see the short Python sketch after this list).
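Here is a minimal Python sketch of the calculation described above (the function name and signature are only for illustration and are not part of any AWS SDK):

import math

def read_capacity_units(item_size_kb, items_per_second, strongly_consistent=False):
    # X: round the item size up to the nearest 4 KB chunk, then divide by 4
    units_per_item = math.ceil(item_size_kb / 4)
    # Y: items read per second; strongly consistent reads need X * Y units
    units = units_per_item * items_per_second
    # eventually consistent reads need only half as many units (rounded up to a whole unit)
    return units if strongly_consistent else math.ceil(units / 2)

# e.g. 20 items of 2 KB per second with eventually consistent reads -> 10
print(read_capacity_units(2, 20))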
Let's take some examples to understand this better.


Q1. Let's say you have an application that needs to read 20 items of 2 KB each per second using eventually consistent reads. What read throughput value should be set?

A. Each item is 2 KB in size, so let's first round it up to the nearest 4 KB chunk, which is 4 KB. To get the read units per item we divide by 4: 4 / 4 = 1. This is our X from the method above. The number of items read per second is 20, which is our Y. So X * Y = 1 * 20 = 20. Finally, since the reads are eventually consistent, we divide this value by 2. So the final read throughput is 20 / 2 = 10.

Let's see another example -

Q2. Let's say you have an application that needs to read 10 items of 10 KB each per second using eventually consistent reads. What read throughput value should be set?

A. Each item is 10 KB in size, so let's first round it up to the nearest 4 KB chunk, which is 12 KB. To get the read units per item we divide by 4: 12 / 4 = 3. This is our X from the method above. The number of items read per second is 10, which is our Y. So X * Y = 3 * 10 = 30. Finally, since the reads are eventually consistent, we divide this value by 2. So the final read throughput is 30 / 2 = 15.


Now let's see an example with strong consistency.

Q3. Let's say you have an application that needs to read 5 items of 6 KB each per second using strongly consistent reads. What read throughput value should be set?

A. Each item is 6 KB in size, so let's first round it up to the nearest 4 KB chunk, which is 8 KB. To get the read units per item we divide by 4: 8 / 4 = 2. This is our X from the method above. The number of items read per second is 5, which is our Y. So X * Y = 2 * 5 = 10. Finally, since the reads are strongly consistent, we do not divide the result by 2. So the final read throughput is 10.
 

DynamoDB write provisioned throughput calculation

  1. Find the write units required per item. Since each write unit covers 1 KB, round the item size up to the nearest 1 KB and use that number as the write units per item. For example, if each item is 6 KB, the write units required would be 6. Another example: if your item size is 12 KB, the write units required are 12. Let's call this value X.
  2. Now calculate the number of items written per second. For example, if you are writing 120 items per minute, the number of items written per second is 120 / 60 = 2. Let's call this value Y.
  3. Your write capacity units would be X * Y. There is no notion of strongly consistent or eventually consistent writes, so there is no further division (see the short Python sketch after this list).
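Similarly, here is a minimal Python sketch for the write side (again, the function name is only for illustration):

import math

def write_capacity_units(item_size_kb, items_per_second):
    # X: round the item size up to the nearest 1 KB
    units_per_item = math.ceil(item_size_kb)
    # Y: items written per second; write capacity units = X * Y
    return units_per_item * items_per_second

# e.g. 10 items of 11 KB per second -> 110
print(write_capacity_units(11, 10))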

Let's see some examples for this.

Q1. You have an application that writes 10 items per second, where each item is 11 KB in size. What should the write throughput be set to?
A. Since each item is 11 KB and each write unit covers 1 KB, we need 11 write units per item. This is our X. The application writes 10 items per second to the DB, which is our Y. So the write throughput is X * Y = 11 * 10 = 110.


Let's see another example -

Q2. You have an application that writes 100 items per second, where each item is 10 KB in size. What should the write throughput be set to?
A. Since each item is 10 KB and each write unit covers 1 KB, we need 10 write units per item. This is our X. The application writes 100 items per second to the DB, which is our Y. So the write throughput is X * Y = 10 * 100 = 1000.


NOTE: Each item is nothing but a row in DynamoDB.



Thursday, 21 June 2018

How to clean and manage "Recent Places" in Mac OS X

Background

Whenever you save a file on your Mac, OS X saves the destination folder in a “Recent Places” category, which will be available the next time you save another file. This feature enables quick access to the folders you commonly use, which saves a lot of time.


But sometimes you need more granular control over this. For example, you may want to clear the recent places or limit the number of folders that are stored under this category. In this post, I am going to show you how to do exactly that.

How to clean and manage "Recent Places" in Mac OS X

Let's start with how we can limit the number of folders that are stored in the "Recent Places" category. For me, by default, it stores 3 folders; this may vary with your OS version.
To change the limit, execute the following command in your terminal (this relies on the NSNavRecentPlacesLimit preference key in the global domain, so double-check it against your OS version) -
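# assumes the NSNavRecentPlacesLimit key in the global (-g) preferences domain
defaults write -g NSNavRecentPlacesLimit -int NUMBER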

Here NUMBER is the number of entries you want. Setting it to zero will disable the recent places list.




To remove this limit you can delete the same preference key (again, verify the key name on your OS version) -
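# assumes the same NSNavRecentPlacesLimit key; deleting it restores the default limit
defaults delete -g NSNavRecentPlacesLimit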



And finally, if you want to clear the recent places list itself, you can delete the list's preference key (assumed here to be NSNavRecentPlaces) -
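# assumes the recent entries are stored under the NSNavRecentPlaces key
defaults delete -g NSNavRecentPlaces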



Tuesday, 12 June 2018

AWS service limits asked in "AWS Certified Solutions Architect - Associate" and "AWS Certified Developer - Associate" certifications

Background

I just cleared my "AWS Certified Developer - Associate" certification exam yesterday with 90%. I had already cleared the "AWS Certified Solutions Architect - Associate" exam 6 months back with 89%. You can see my badges below.

While preparing, I realized that there are some questions based on service limits in AWS. These can be straightforward questions or they can be slightly twisted. In either case, knowing the service limits helps a lot. So I am going to summarize the ones I feel are most important from a certification perspective.




NOTE: AWS service limits can change at any time, so it is best to refer to the FAQ sections of the corresponding services to confirm. The following limits are as of June 2018.

AWS service limits & constraints

Following are AWS services and their corresponding limits. There are more limits and constraints for each service; I am simply trying to summarise them based on my exam preparation, practice quizzes, and actual exam experience. Please let me know in the comments if these limits have changed and I will update accordingly. Thanks.

Consolidated billing


AWS S3

  • By default, customers can provision up to 100 buckets per AWS account. However, you can increase your Amazon S3 bucket limit by visiting AWS Service Limits.
  • The bucket name can be between 3 and 63 characters long and can contain only lower-case characters, numbers, periods, and dashes.
  • Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
  • For more details refer - https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
  • AWS S3 offers unlimited storage
  • Each object on S3, however, can be 0 bytes to 5TB.
  • The largest object that can be uploaded in a single PUT is 5GB
  • For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability.
  • For further details refer - https://aws.amazon.com/s3/faqs/

Glacier

  • There is no maximum limit to the total amount of data that can be stored in Amazon Glacier. 
  • Individual archives are limited to a maximum size of 40 terabytes.
  • For more details refer - https://aws.amazon.com/glacier/faqs/

Redshift


AWS EC2

VPC

Route 53



CloudWatch

CloudFormation

Lambda

DynamoDB

  • There is an initial limit of 256 tables per region. You can raise a request to increase this limit.
  • You can define a maximum of 5 local secondary indexes and 5 global secondary indexes per table (hard limit) - a total of 10 secondary indexes
  • The maximum size of an item collection is 10 GB
  • The minimum amount of reserved capacity that can be bought is 100 capacity units
  • The maximum item size in DynamoDB is 400 KB, which includes both the attribute name lengths (UTF-8 binary length) and the attribute value lengths (also binary length); attribute names count towards the size limit. There is no limit on the number of items.
  • For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
  • A single BatchGetItem operation can retrieve up to 16 MB of data, which can contain as many as 100 items
  • For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html
  • A single Scan operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.
  • For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html

SQS

  • You can create any number of message queues.
  • Max configuration: 14 days retention and 12 hours visibility timeout
  • Default configuration: 4 days retention  and 30 seconds visibility timeout
  • A single request can contain 1 to 10 messages, up to a maximum total payload of 256 KB.
  • Each 64 KB chunk of payload is billed as 1 request, so a single API call with a 256 KB payload is billed as 4 requests (see the small sketch after this list).
  • To configure the maximum message size, use the console or the SetQueueAttributes method to set the MaximumMessageSize attribute. This attribute specifies the limit on bytes that an Amazon SQS message can contain. Set this limit to a value between 1,024 bytes (1 KB), and 262,144 bytes (256 KB).
  • For more details refer - https://aws.amazon.com/sqs/faqs/
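To make the 64 KB billing rule above concrete, here is a tiny Python sketch (the helper name is just for illustration, not an AWS API):

import math

def sqs_billed_requests(payload_kb):
    # each 64 KB chunk of payload is billed as one SQS request
    return math.ceil(payload_kb / 64)

print(sqs_billed_requests(256))  # 4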

SNS

  • By default, SNS offers 10 million subscriptions per topic and 100,000 topics per account.  To request a higher limit, please contact Support.
  • Topic names are limited to 256 characters.
  • SNS subscription confirmation time period is 3 days

SWF



Again, as mentioned before, this is obviously not an exhaustive list but merely a summary of what I thought would be best to revise before taking the associate exams. Let me know if you think something else should be added here for the benefit of everyone.


Since you have taken the time to go through the limits, here is a bonus question for you :)

Question: You receive a call from a potential client who explains that one of the many services they offer is a website running on a t2.micro EC2 instance where users can submit requests for customized e-cards to be sent to their friends and family. The e-card website administrator was on a cruise and was shocked when he returned to the office in mid-January to find hundreds of angry emails complaining that customers' loved ones had not received their Christmas cards. He also had several emails from CloudWatch alerting him that the SQS queue for the e-card application had grown to over 500 messages on December 25th. You investigate and find that the problem was caused by a crashed EC2 instance which serves as an application server. What do you advise your client to do first? Choose the correct answer from the options below

Options:
  1. Use an autoscaling group to create as many application servers as needed to access all of the Christmas card SQS messages.
  2. Reboot the application server immediately so that it begins processing the Christmas cards SQS messages.
  3. Redeploy the application server as a larger instance type so that it processes the Christmas card SQS messages faster.
  4. Send an apology to the customer notifying them that their cards will not be delivered.

Answer:
4. Send an apology to the customer notifying them that their cards will not be delivered.

Explanation:
Since the 500-message count was as of December 25th and the e-card website administrator returned in mid-January, the difference is more than 14 days, which is the maximum retention period for SQS messages. The messages would therefore have been deleted before they could be processed.

To be honest, I had selected option 1 on my first attempt :)


