Open Source For Geeks: June 2018

Saturday 23 June 2018

DynamoDB read and write provisioned throughput calculations

Background

Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model, reliable performance, and automatic scaling of throughput capacity make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

In this post, I will show you how to calculate read and write provisioned throughput for DynamoDB. This is a very common question in the "AWS Certified Developer - Associate" exam. I will also walk through some examples so that the calculations are fully clear.

AWS allows us to change the read and write capacity units of a DynamoDB table, which lets us scale the DB based on our requirements.

But the major question is how you come up with these capacity unit values, which is exactly what we are going to see below.

DynamoDB read and write provisioned throughput calculations

Before we head on to the calculation part let's try to understand some details about read and write throughputs in DynamoDB.

Read provisioned throughput:

All reads are rounded up to increments of 4 KB
One read capacity unit covers 2 eventually consistent reads (the default) per second
One read capacity unit covers 1 strongly consistent read per second

Write provisioned throughput:

All writes are rounded up to increments of 1 KB
One write capacity unit covers 1 write per second

Now let's see how we can compute read provisioned throughput -

DynamoDB read provisioned throughput calculation

Find the read units required per item. For this, round the item size up to the next multiple of 4 KB and then divide by 4. For example, if each item is 6 KB then the next 4 KB multiple is 8 KB and the read units required are 8 / 4 = 2. As another example, if your item size is 1 KB then the next 4 KB multiple is 4 KB and the read units required are 4 / 4 = 1. Let's call this value X.
Now calculate the number of items read per second. For example, if you are reading 120 items per minute then the number of items read per second is 120 / 60 = 2. Let's call this value Y.
Your read capacity units for strongly consistent reads would be X * Y.
If you are using eventually consistent reads, divide the above number by 2 to get the read provisioned throughput, i.e. (X * Y) / 2. This is because one read capacity unit covers 2 eventually consistent reads per second. If the result is not a whole number, round it up.
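The steps above can be sketched in a few lines of Python. This is only a minimal sketch: the function name read_capacity_units and its parameters are made up for illustration and are not part of any AWS SDK.

```python
import math

READ_UNIT_KB = 4  # reads are rounded up to increments of 4 KB


def read_capacity_units(item_size_kb, items_per_second, strongly_consistent=False):
    """Return the read capacity units needed for the given workload."""
    # X: read units per item (size rounded up to the next 4 KB multiple, divided by 4)
    units_per_item = math.ceil(item_size_kb / READ_UNIT_KB)
    # X * Y: units needed when every read is strongly consistent
    total = units_per_item * items_per_second
    if strongly_consistent:
        return math.ceil(total)
    # Eventually consistent reads need half as many units; round up to a whole unit
    return math.ceil(total / 2)
```

With the numbers used in the steps (6 KB items, 120 reads per minute), read_capacity_units(6, 120 / 60, strongly_consistent=True) gives 4, and the eventually consistent default gives 2.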

Let's take some examples to understand this better.

Q1. Let's say you have an application that needs to read 20 items of 2 KB each per second using eventually consistent reads. What read throughput value should be set?

A. Our item size is 2 KB. So let's first round it up to the next 4 KB multiple, which is 4 KB. Now to get read units per item we divide by 4, so 4 / 4 = 1. This is our X in the method above. The number of items read is 20 per second, which is our Y. So X * Y = 1 * 20 = 20. Finally, the reads are eventually consistent, which means we need to divide the above value by 2. So the final read throughput is 20 / 2 = 10.

Let's see another example -

Q2. Let's say you have an application that needs to read 10 items of 10 KB each per second using eventually consistent reads. What read throughput value should be set?

A. Our item size is 10 KB. So let's first round it up to the next 4 KB multiple, which is 12 KB. Now to get read units per item we divide by 4, so 12 / 4 = 3. This is our X in the method above. The number of items read is 10 per second, which is our Y. So X * Y = 3 * 10 = 30. Finally, the reads are eventually consistent, which means we need to divide the above value by 2. So the final read throughput is 30 / 2 = 15.

Now let's see an example with strong consistency.

Q3. Let's say you have an application that needs to read 5 items of 6 KB each per second using strongly consistent reads. What read throughput value should be set?

A. Our item size is 6 KB. So let's first round it up to the next 4 KB multiple, which is 8 KB. Now to get read units per item we divide by 4, so 8 / 4 = 2. This is our X in the method above. The number of items read is 5 per second, which is our Y. So X * Y = 2 * 5 = 10. Finally, since the reads are strongly consistent, you do not need to divide the result by 2. So the final read throughput is 10.

DynamoDB write provisioned throughput calculation

Find the write units required per item. Since each write unit is 1 KB, you can directly use the size of an item in KB (rounded up to the next whole KB) as the write units per item. For example, if each item is 6 KB then the write units required would be 6. As another example, if your item size is 12 KB then the write units required are 12. Let's call this value X.
Now calculate the number of items written per second. For example, if you are writing 120 items per minute then the number of items written per second is 120 / 60 = 2. Let's call this value Y.
Your write capacity units would be X * Y. There is no notion of strongly consistent or eventually consistent writes.
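The write calculation can be sketched the same way. Again this is a minimal sketch, and write_capacity_units is a hypothetical helper name chosen for illustration, not an AWS API.

```python
import math

WRITE_UNIT_KB = 1  # writes are rounded up to increments of 1 KB


def write_capacity_units(item_size_kb, items_per_second):
    """Return the write capacity units needed for the given workload."""
    # X: write units per item (size rounded up to the next whole KB)
    units_per_item = math.ceil(item_size_kb / WRITE_UNIT_KB)
    # X * Y: there is no consistency mode to adjust for on writes
    return math.ceil(units_per_item * items_per_second)
```

For the 6 KB item written 120 times per minute, write_capacity_units(6, 120 / 60) gives 12.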

Let's see some examples for this.

Q1. You have an application that writes 10 items per second, where each item is 11 KB in size. What should the write throughput be set to?

A. Since the item size is 11 KB and each write unit is 1 KB, we need 11 write units per item. This is our X. We also know the application is writing 10 items per second to the DB, which is our Y value. So the write throughput is X * Y = 11 * 10 = 110.

Let's see another example -

Q2. You have an application that writes 100 items per second, where each item is 10 KB in size. What should the write throughput be set to?

A. Since the item size is 10 KB and each write unit is 1 KB, we need 10 write units per item. This is our X. We also know the application is writing 100 items per second to the DB, which is our Y value. So the write throughput is X * Y = 10 * 100 = 1000.

NOTE: Each item is nothing but a row in DynamoDB.

Thursday 21 June 2018

How to clean and manage "Recent Places" in Mac OS X

Background

Whenever you save a file on your Mac, OS X stores the destination folder in a “Recent Places” category, which is available the next time you save another file. This feature gives quick access to the folders you commonly use, which saves a lot of time.

But sometimes you need more granular control over this. For example, you may want to clear recent places or limit the number of folders that are stored under this category. In this post, I am going to show you how to do exactly that.

How to clean and manage "Recent Places" in Mac OS X

Let's start with how we can limit the number of folders that are stored in the "Recent places" category. For me by default, it stores 3 folders. It may vary as per your OS version.

To change the limit, execute the following command in your terminal -

defaults write -g NSNavRecentPlacesLimit -int NUMBER

Here NUMBER is the number of entries you want. Setting it to zero will disable the recent places list.

To remove this limit you can execute the following command -

defaults delete -g NSNavRecentPlacesLimit

And finally, if you want to clear recent places list then you can execute the following command -

defaults delete -g NSNavRecentPlaces
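If you want to script the limit change, a small Python wrapper is one option. This is only a sketch: the function names below are hypothetical, and the wrapper refuses to run anywhere other than macOS because the defaults tool only exists there.

```python
import platform
import subprocess


def recent_places_limit_command(limit):
    """Build the argument list for setting NSNavRecentPlacesLimit to `limit`."""
    if limit < 0:
        raise ValueError("limit must be zero or a positive integer")
    return ["defaults", "write", "-g", "NSNavRecentPlacesLimit", "-int", str(limit)]


def set_recent_places_limit(limit):
    """Run the command, but only on macOS where `defaults` is available."""
    if platform.system() != "Darwin":
        raise RuntimeError("the defaults tool is only available on macOS")
    subprocess.run(recent_places_limit_command(limit), check=True)
```

Calling set_recent_places_limit(10) on a Mac runs the same command as typing it in the terminal with NUMBER replaced by 10.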

Tuesday 12 June 2018

AWS service limits asked in "AWS Certified Solutions Architect - Associate" and "AWS Certified Developer - Associate" certifications

Background

I just cleared my "AWS Certified Developer - Associate" certification exam yesterday with 90%. I have already cleared "AWS Certified Solutions Architect - Associate" exam 6 months back with 89%. You can see my badges below-

While preparing I realized that there are some questions based on service limits in AWS. These can be straightforward questions or they can be slightly twisted. In either case, knowing the service limits helps a lot. So I am going to summarize the ones that I feel are most important from a certification perspective.

NOTE: AWS service limits can change at any time, so it is best to refer to the FAQ sections of the corresponding services to confirm. The following limits are as of June 2018.

AWS service limits & constraints

Following are AWS services and their corresponding limits. Each service has more limits and constraints than the ones listed here. I am simply trying to summarise based on my exam preparation, test quizzes, and actual exam experience. Please let me know in the comments if any of these limits have changed and I will update accordingly. Thanks.

Consolidated billing

There is a soft limit of 20 accounts per organization and a hard limit of one level of billing hierarchy.
For more details refer - https://aws.amazon.com/answers/account-management/aws-multi-account-billing-strategy/

AWS S3

By default, customers can provision up to 100 buckets per AWS account. However, you can increase your Amazon S3 bucket limit by visiting AWS Service Limits.
The bucket name can be between 3 and 63 characters long and can contain only lower-case characters, numbers, periods, and dashes.
Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
For more details refer - https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
AWS S3 offers unlimited storage
Each object on S3, however, can be 0 bytes to 5TB.
The largest object that can be uploaded in a single PUT is 5GB
For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability.
For further details refer - https://aws.amazon.com/s3/faqs/
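The bucket naming rules listed above can be turned into a quick check. This is a minimal sketch that covers only the rules mentioned in this post (length, allowed characters, and not looking like an IP address). The function name is_valid_bucket_name is made up for illustration, and the linked documentation lists further restrictions that are not checked here.

```python
import re

# Lower-case letters, digits, periods, and dashes only; 3 to 63 characters long
_ALLOWED = re.compile(r"[a-z0-9.-]{3,63}")
# Four dot-separated groups of 1 to 3 digits, for example 192.168.5.4
_IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name):
    """Check a bucket name against the rules listed in this post only."""
    if not _ALLOWED.fullmatch(name):
        return False
    if _IP_LIKE.fullmatch(name):
        return False
    return True
```

For example, is_valid_bucket_name("my-bucket.logs") is True, while "My-Bucket" (upper case), "ab" (too short), and "192.168.5.4" (IP address format) are all rejected.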

Glacier

There is no maximum limit to the total amount of data that can be stored in Amazon Glacier.
Individual archives are limited to a maximum size of 40 terabytes.
For more details refer - https://aws.amazon.com/glacier/faqs/

Redshift

Block size for columnar storage is 1 MB (1,024 KB)
For more details refer - https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html

AWS EC2

There is a limit of 20 EC2 instances per region. However, this may vary from region to region. Use the EC2 Service Limits page in the Amazon EC2 console to view the current limits for resources provided by Amazon EC2 on a per-region basis. This limit can be increased on request.
For more details refer - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html
Size limit for a root device for Amazon EBS-Backed AMI is 16 TiB
Size limit for a root device for Amazon Instance Store-Backed AMI is 10 GiB
For more details refer - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ComponentsAMIs.html
When you enable connection draining, you can specify a maximum time for the load balancer to keep connections alive before reporting the instance as de-registered. The maximum timeout value can be set between 1 and 3,600 seconds (the default is 300 seconds).
For more details refer - https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-conn-drain.html

VPC

You can have a maximum of 5 VPCs per region.
You can have a maximum of 200 subnets per VPC
Only one internet gateway can be attached to a VPC at a time.
Only one virtual private gateway can be attached to a VPC at a time.
One subnet always corresponds to 1 AZ.
For more details refer - https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html

Route 53

You can have 50 domain names by default. However, this is a soft limit and can be raised by contacting AWS support.
For more details refer - https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/DNSLimitations.html

CloudWatch

Standard/Basic monitoring - 5 minutes
Detailed monitoring - 1 minute
For more details refer - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html

CloudFormation

Maximum number of AWS CloudFormation stacks that you can create - 200 stacks
Maximum number of parameters that you can declare in your AWS CloudFormation template - 60
Maximum number of outputs that you can declare in your AWS CloudFormation template - 60
For more details refer - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html

Lambda

Maximum configuration: 3GB memory and 5 mins timeout
Default configuration: 128 MB memory and 3 seconds timeout
512 MB of temp space i.e /tmp
For more details refer - https://docs.aws.amazon.com/lambda/latest/dg/limits.html

DynamoDB

There is an initial limit of 256 tables per region. You can raise a request to increase this limit.
You can define a maximum of 5 local secondary indexes and 5 global secondary indexes per table (hard limit), for a total of 10 secondary indexes
The maximum size of item collection is 10GB
The minimum amount of reserved capacity that can be bought - 100
The maximum item size in DynamoDB is 400 KB, which includes both attribute name binary length (UTF-8 length) and attribute value lengths (again binary length). The attribute name counts towards the size limit. No limit on the number of items.
For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
A BatchGetItem single operation can retrieve up to 16 MB of data, which can contain as many as 100 items
For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html
A single Scan operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.
For more details refer - https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html
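To get a feel for the 400 KB item size limit, the size of a simple item can be estimated by adding up the UTF-8 lengths of attribute names and values. The sketch below is a rough approximation that assumes every attribute value is a string and that 400 KB means 400 * 1024 bytes. Other attribute types are sized differently and are not handled, and both function names are hypothetical.

```python
MAX_ITEM_BYTES = 400 * 1024  # assumption: 400 KB taken as 409,600 bytes


def string_item_size_bytes(item):
    """Approximate size of an item whose attribute values are all strings."""
    # Both the attribute name and the attribute value count towards the size
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )


def fits_in_item_limit(item):
    """Return True if the approximate item size is within the 400 KB limit."""
    return string_item_size_bytes(item) <= MAX_ITEM_BYTES
```

For example, the item {"id": "abc"} is counted as 2 bytes for the name plus 3 bytes for the value, so 5 bytes in total.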

SQS

You can create any number of message queues.
Max configuration: 14 days retention and 12 hours visibility timeout
Default configuration: 4 days retention and 30 seconds visibility timeout
A single request can contain from 1 to 10 messages, up to a maximum total payload of 256 KB.
Each 64 KB chunk of payload is billed as 1 request. So a single API call with a 256 KB payload will be billed as 4 requests.
To configure the maximum message size, use the console or the SetQueueAttributes method to set the MaximumMessageSize attribute. This attribute specifies the limit on bytes that an Amazon SQS message can contain. Set this limit to a value between 1,024 bytes (1 KB), and 262,144 bytes (256 KB).
For more details refer - https://aws.amazon.com/sqs/faqs/
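The 64 KB billing rule is easy to check with a small calculation. This is just a sketch of the arithmetic described above. The function name billed_requests is made up for illustration, and 1 KB is assumed to be 1,024 bytes, in line with the 262,144 byte maximum mentioned above.

```python
import math

CHUNK_BYTES = 64 * 1024         # each 64 KB chunk is billed as one request
MAX_PAYLOAD_BYTES = 256 * 1024  # maximum payload of a single request


def billed_requests(payload_bytes):
    """Return how many requests a single API call is billed as."""
    if not 0 < payload_bytes <= MAX_PAYLOAD_BYTES:
        raise ValueError("payload must be between 1 byte and 256 KB")
    # Any partial chunk still counts as a full billed request
    return math.ceil(payload_bytes / CHUNK_BYTES)
```

A full 256 KB payload gives billed_requests(256 * 1024) == 4, while anything up to 64 KB is billed as a single request.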

SNS

By default, SNS offers 10 million subscriptions per topic and 100,000 topics per account. To request a higher limit, please contact Support.
Topic names are limited to 256 characters.
SNS subscription confirmation time period is 3 days

SWF

Maximum registered domains – 100
Maximum workflow and activity types – 10,000 each per domain
Maximum open activity tasks – 1,000 per workflow execution
Maximum workflow execution time – 1 year
For more details refer - https://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-limits.html

Again as mentioned before this is obviously not an exhaustive list but merely a summary of what I thought could be best to revise before going to the associate exams. Let me know if you think something else needs to be added here for the benefit of everyone.

Since you have taken time to go through the limits here is a bonus question for you :)

Question: You receive a call from a potential client who explains that one of the many services they offer is a website running on a t2.micro EC2 instance where users can submit requests for customized e-cards to be sent to their friends and family. The e-card website administrator was on a cruise and was shocked when he returned to the office in mid-January to find hundreds of angry emails complaining that customers' loved ones had not received their Christmas cards. He also had several emails from CloudWatch alerting him that the SQS queue for the e-card application had grown to over 500 messages on December 25th. You investigate and find that the problem was caused by a crashed EC2 instance which serves as an application server. What do you advise your client to do first? Choose the correct answer from the options below

Options:

1. Use an autoscaling group to create as many application servers as needed to access all of the Christmas card SQS messages.
2. Reboot the application server immediately so that it begins processing the Christmas card SQS messages.
3. Redeploy the application server as a larger instance type so that it processes the Christmas card SQS messages faster.
4. Send an apology to the customers notifying them that their cards will not be delivered.

Answer:

4. Send an apology to the customer notifying them that their cards will not be delivered.

Explanation:

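The queue held over 500 messages as of December 25th and the e-card website administrator returned in mid-January, so more than 14 days had passed, which is the maximum retention period for SQS messages. The messages had already been deleted from the queue by then, so rebooting, resizing, or scaling out the application servers cannot recover them.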
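The date arithmetic behind this explanation can be checked with a few lines of Python. This is a small sketch: the helper name message_expired is hypothetical, and the exact dates (25 December 2017 and 15 January 2018) are assumptions picked to match "December 25th" and "mid-January" in the question.

```python
from datetime import date, timedelta

MAX_RETENTION = timedelta(days=14)  # maximum SQS message retention period


def message_expired(sent_on, checked_on, retention=MAX_RETENTION):
    """Return True if a message sent on `sent_on` is gone by `checked_on`."""
    return checked_on - sent_on > retention


# Assumed dates: message queued on 25 Dec 2017, administrator back on 15 Jan 2018
elapsed = date(2018, 1, 15) - date(2017, 12, 25)
```

Here elapsed is 21 days, which is more than the 14 day maximum, so message_expired(date(2017, 12, 25), date(2018, 1, 15)) returns True.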

To be honest, I had selected option 1 in my 1st attempt :)
