Which setup will meet the requirements?

seenagape February 12, 2016

You have recently joined a startup company building sensors to measure street noise and air quality in urban areas.
The company has been running a pilot deployment of around 100 sensors for 3 months. Each sensor uploads 1KB of sensor data every minute to a backend hosted on AWS.
During the pilot, you measured a peak of 10 IOPS on the database, and you stored an average of 3GB of sensor data per month in the database.
The current deployment consists of a load-balanced, auto-scaled ingestion layer using EC2 instances and a PostgreSQL RDS database with 500GB standard storage.
The pilot is considered a success and your CEO has managed to get the attention of some potential investors.
The business plan requires a deployment of at least 100K sensors, which needs to be supported by the backend.
You also need to store sensor data for at least two years to be able to compare year-over-year improvements.
To secure funding, you have to make sure that the platform meets these requirements and leaves room for further scaling.
Which setup will meet the requirements?

A.
Add an SQS queue to the ingestion layer to buffer writes to the RDS instance

B.
Ingest data into a DynamoDB table and move old data to a Redshift cluster

C.
Replace the RDS instance with a 6 node Redshift cluster with 96TB of storage

D.
Keep the current architecture but upgrade RDS storage to 3TB and 10K provisioned IOPS

23 Comments on “Which setup will meet the requirements?”

Vlad says:

June 9, 2016 at 3:32 pm

Does this sound right?

0

0

fun4two says:

June 16, 2016 at 4:57 am

answer c

https://aws.amazon.com/redshift/faqs/

Ideally you bring the data to EMR and then to Redshift, assuming the sensor data is unstructured or semi-structured.

0

0

JK says:

June 22, 2016 at 8:16 am

B is the best solution.

The POC solution is being scaled up by 1000, which means it will require 72TB of Storage to retain 24 months worth of data. This rules out RDS as a possible DB solution which leaves you with RedShift.
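The 72TB figure can be checked with a short calculation. This is only a sketch: it assumes storage grows linearly with the number of sensors, uses decimal units (1TB = 1000GB), and ignores indexes and compression. The function and variable names are made up for illustration.

```python
# Figures taken from the question text.
PILOT_SENSORS = 100
TARGET_SENSORS = 100_000
PILOT_GB_PER_MONTH = 3      # measured average during the pilot
RETENTION_MONTHS = 24       # "at least two years"


def retained_storage_tb(pilot_sensors, target_sensors, pilot_gb_per_month, months):
    """Estimate retained storage in TB, assuming linear growth per sensor."""
    scale = target_sensors / pilot_sensors          # 1000x for this question
    gb_per_month = pilot_gb_per_month * scale       # about 3000 GB per month
    return gb_per_month * months / 1000             # decimal TB


total_tb = retained_storage_tb(
    PILOT_SENSORS, TARGET_SENSORS, PILOT_GB_PER_MONTH, RETENTION_MONTHS
)
print(f"{total_tb:.0f} TB")  # 72 TB
```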

I believe DynamoDB is more cost-effective and scales better for ingest than using EC2 in an Auto Scaling group.

Also, this example solution from AWS is somewhat similar, for reference.
http://media.amazonwebservices.com/architecturecenter/AWS_ac_ra_timeseriesprocessing_16.pdf

0

1

1. Tsao says:
  
  February 6, 2017 at 1:26 am
  
  I think the example you quote is not suitable for this question. The example is typical of a MapReduce job, which uses DynamoDB for preparatory tasks (ETL, data cleansing…).
  
  0
  
  0
  
Chef says:

July 14, 2016 at 8:24 pm

You also need to store sensor data for at least two years to be able to compare year-over-year improvements.

Sounds like a Redshift function to me! DynamoDB is better suited for large ingest!

B

0

0

Bones Cisco says:

August 16, 2016 at 1:06 pm

Why C is wrong.

A six node Redshift architecture cannot have 96TB storage, can it?
Even though compression is possible.
A single node is limited to 160GB of data.

0

0

1. Amit Pande says:
  
  December 1, 2017 at 10:03 am
  
  Wonderful point. C is completely ruled out leaving B as the answer. Thanks Bones Cisco
  
  0
  
  0
  
2. MTL says:
  
  January 20, 2018 at 3:03 am
  
  You mean 16TB
  I’ll let you calculate 6×16
  
  0
  
  0
  
emontario says:

August 19, 2016 at 4:03 am

A single node is not limited to 160 GB.

Dense Storage (DS) nodes are available in two sizes, Extra Large and Eight Extra Large. The Extra Large (XL) has 3 HDDs with a total of 2TB of magnetic storage, whereas Eight Extra Large (8XL) has 24 HDDs with a total of 16TB of magnetic storage

https://aws.amazon.com/redshift/faqs/
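As a rough check of the node arithmetic, here is a small sketch. It assumes the 16TB-per-node figure for 8XL dense storage nodes quoted above and the 72TB estimate from JK's comment; the helper names are hypothetical, and replication or compression are not modeled.

```python
import math

NODE_TB = 16  # 8XL dense storage node, per the FAQ text quoted above


def nodes_needed(data_tb, node_tb=NODE_TB):
    """Smallest number of nodes whose raw capacity covers data_tb."""
    return math.ceil(data_tb / node_tb)


def cluster_capacity_tb(nodes, node_tb=NODE_TB):
    """Raw capacity of a cluster with the given number of nodes."""
    return nodes * node_tb


print(cluster_capacity_tb(6))   # 96, the figure in option C
print(nodes_needed(72))         # 5 nodes would cover the 72TB estimate
```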

2

0

balaji says:

August 22, 2016 at 7:43 am

You can easily create an Amazon Redshift data warehouse cluster by using the AWS Management Console or the Amazon Redshift APIs. You can start with a single node, 160GB data warehouse and scale all the way to a petabyte or more with a few clicks in the AWS Console or a single API call.

extract from
https://aws.amazon.com/redshift/faqs/

0

0

Ashley says:

October 13, 2016 at 11:42 am

B

0

0

donkeynuts says:

October 25, 2016 at 1:56 pm

I would go with B.

0

0

Manish says:

October 25, 2016 at 11:45 pm

Going with option B would cost more than option C, because it uses both DynamoDB and a Redshift cluster. DynamoDB is faster and runs on SSDs, but the IOPS here are very low and we don’t need a high-speed database, so using Redshift alone would be enough.

I would go with option C.

A single Redshift node can support up to 16TB of storage, and we need 96TB to keep 24 months of data. A 6-node Redshift cluster of Eight Extra Large (8XL) nodes, each with 24 HDDs and a total of 16TB of magnetic storage, provides 96TB. So option C should be the right one.

C. Replace the RDS instance with a 6 node Redshift cluster with 96TB of storage

0

0

1. Sail says:
  
  November 19, 2016 at 4:42 pm
  
  It says “replace” RDS, but Redshift is a data warehouse solution. It is not Kinesis to stream live data directly.
  
  I go with B.
  
  0
  
  0
  
  1. Sail says:
    
    November 19, 2016 at 4:43 pm
    
    It says “replace” RDS, but Redshift is a data warehouse solution. It is not Kinesis to stream live data directly.
    
    I go with B.
    
    0
    
    0
    
vladam says:

November 7, 2016 at 11:25 am

The two challenges are to scale ingestion and storage.
Option B is the right answer as it allows scaling ingestion with DynamoDB and storage with Redshift. With these options you’ll be able to scale even past the 1000x POC size.
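The ingestion side can be sized roughly as follows. This is a sketch under stated assumptions: uploads are spread evenly over each minute, each upload becomes one standard (non-transactional) DynamoDB write, and one write capacity unit covers one write per second for an item of up to 1KB. Real items may be slightly larger than 1KB once key attributes are included, which would double the estimate. The function names are illustrative only.

```python
import math


def writes_per_second(sensors, uploads_per_minute=1):
    """Average write rate if uploads are spread evenly over each minute."""
    return sensors * uploads_per_minute / 60


def required_wcu(sensors, item_bytes=1024):
    """Write capacity units needed, rounding item size up to the next 1KB."""
    wcu_per_write = math.ceil(item_bytes / 1024)
    return math.ceil(writes_per_second(sensors) * wcu_per_write)


print(required_wcu(100))       # pilot: 2
print(required_wcu(100_000))   # target: 1667
```

Bursty uploads (for example, all sensors reporting at the top of the minute) would need more headroom than this average suggests.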

2

0

1. JJ says:
  
  August 3, 2017 at 2:36 pm
  
  but isn’t C more suitable ?
  can you explain why B ?
  
  0
  
  0
  
Paul says:

December 2, 2016 at 10:56 am

DynamoDB is right, but the write costs would be expensive! You’d want SQS or Kinesis to buffer the writes…

Redshift isn’t suited for lots of small writes; its ingestion is supposed to be from S3, DynamoDB, EMR or Kinesis (using COPY, not INSERT).
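To illustrate the batching idea, here is a minimal sketch that collects readings into newline-delimited JSON batches which could be written to S3 and then loaded with a single COPY. Everything here is an assumption for illustration: the `ReadingBuffer` class, the table name, the bucket path and the IAM role ARN are hypothetical, and the upload to S3 and the execution of the statement are left out.

```python
import json


class ReadingBuffer:
    """Collects sensor readings and emits them as NDJSON batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._pending = []

    def add(self, reading):
        """Queue one reading; return an NDJSON batch once full, else None."""
        self._pending.append(reading)
        if len(self._pending) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        """Return all pending readings as NDJSON, or None if there are none."""
        if not self._pending:
            return None
        body = "\n".join(json.dumps(r, sort_keys=True) for r in self._pending) + "\n"
        self._pending = []
        return body


def copy_statement(table, s3_uri, iam_role_arn):
    """Build a Redshift COPY statement that loads JSON files from S3."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS JSON 'auto';"
    )


# Hypothetical names, for illustration only.
print(copy_statement(
    "sensor_readings",
    "s3://example-bucket/readings/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
))
```

One COPY per batch of files replaces many single-row INSERT statements, which is the point being made above.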

0

0

gpadukone says:

February 19, 2017 at 3:18 pm

C is the right answer. You cannot go with DynamoDB because the application is currently using PostgreSQL on RDS. Replacing a relational SQL database with a NoSQL database for the sake of scaling is not a sensible option.

Whereas Amazon Redshift allows you to run relational databases.

2

0

Moh says:

April 26, 2017 at 2:45 pm

Will having data in two places allow comparing year over year? So C seems like the answer.

0

0

jaya says:

May 9, 2017 at 2:01 am

I think answer C is incorrect. It stores only 3GB of data a month [“you stored an average of 3GB of sensor data per month in the database”]. So why do you need 96TB of storage? I don’t get it.

0

0

1. ask says:
  
  June 30, 2017 at 12:25 am
  
  You are moving from 100 sensors to 100K sensors; C is the correct answer.
  
  0
  
  0
  
2. ask says:
  
  June 30, 2017 at 12:27 am
  
  You are moving from 100 sensors to 100K sensors; C is the correct answer. And why would you need DynamoDB and then move the data to Redshift?
  
  0
  
  0