How can you reduce the load on your on-premises database resources in the most cost-effective way?

seenagape, February 12, 2016

A customer has a 10 Gbps AWS Direct Connect connection to an AWS region where they have a web application hosted on Amazon Elastic Compute Cloud (EC2). The application has dependencies on an on-premises mainframe database that uses a BASE (Basically Available, Soft state, Eventual consistency) rather than an ACID (Atomicity, Consistency, Isolation, Durability) consistency model. The application is exhibiting undesirable behavior because the database is not able to handle the volume of writes. How can you reduce the load on your on-premises database resources in the most cost-effective way?

A.
Use an Amazon Elastic MapReduce (EMR) S3DistCp as a synchronization mechanism between the on-premises database and a Hadoop cluster on AWS.

B.
Modify the application to write to an Amazon SQS queue and develop a worker process to flush the queue
to the on-premises database.

C.
Modify the application to use DynamoDB to feed an EMR cluster which uses a map function to write to the
on-premises database.

D.
Provision an RDS read-replica database on AWS to handle the writes and synchronize the two databases
using Data Pipeline.

Explanation:
https://aws.amazon.com/blogs/aws/category/amazon-elastic-map-reduce/

16 Comments on “How can you reduce the load on your on-premises database resources in the most cost-effective way?”

Venku says:

April 24, 2016 at 8:21 am

The answer would be B for this question. Option A doesn’t make any sense because we are dealing with a database, and we can’t run a database on S3.

To support my answer, look at the benefits SQS offers. SQS is a pull-based messaging mechanism that delivers each message at least once to its recipient, and it can hold any number of messages, but there is a limit on message size: 256 KB. It will support a high volume of writes. We can attach SQS to the application that generates the high number of write operations.

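A minimal sketch of the producer side of option B in Python ("modify the application to write to an Amazon SQS queue"). It assumes a client object with a boto3-style `send_message(QueueUrl=..., MessageBody=...)` method; the queue URL, the record fields and the `RecordingClient` stand-in are hypothetical, and the 256 KB cap is the figure quoted in the comment above.

```python
import json

# Per-message size cap quoted in the comment above (256 KB).
MAX_MESSAGE_BYTES = 256 * 1024
# Hypothetical queue URL; substitute your own.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/db-writes"


def enqueue_write(client, record: dict) -> str:
    """Send one pending database write to the queue instead of the database.

    `client` is assumed to expose send_message(QueueUrl=..., MessageBody=...)
    and return a dict containing "MessageId", as a boto3 SQS client does.
    """
    body = json.dumps(record, sort_keys=True)
    size = len(body.encode("utf-8"))
    if size > MAX_MESSAGE_BYTES:
        # Oversized records must be split or stored elsewhere before queueing.
        raise ValueError(f"message is {size} bytes; limit is {MAX_MESSAGE_BYTES}")
    response = client.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
    return response["MessageId"]


class RecordingClient:
    """Hypothetical in-memory stand-in so the sketch runs without AWS."""

    def __init__(self):
        self.sent = []

    def send_message(self, QueueUrl, MessageBody):
        self.sent.append(MessageBody)
        return {"MessageId": f"m{len(self.sent)}"}


client = RecordingClient()
print(enqueue_write(client, {"order_id": 42, "status": "shipped"}))  # prints m1
```

The web tier now only pays for one small queue call per write; the database itself is touched later by whatever worker drains the queue.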

Tarun Biswas says:

May 2, 2016 at 3:32 pm

SQS is a queuing mechanism: it organizes the write requests and passes them on to the on-premises DB instead of having all requests arrive together. However, SQS will not change the volume of writes.

1. Naser says:
  
  February 15, 2017 at 4:49 am
  
  SQS will manage the writes so that they do not all happen simultaneously.
  
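Both replies are consistent: a queue does not reduce the total number of writes, it caps how many reach the database at once. A minimal Python simulation of this queue-based load leveling, using made-up arrival numbers and a hypothetical per-tick database capacity, shows both effects:

```python
def level_writes(arrivals, capacity):
    """Return how many writes reach the database at each tick when a queue
    sits in front of it and a worker applies at most `capacity` per tick.

    arrivals[t] is the number of writes the web tier produces at tick t.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    backlog = 0  # writes waiting in the queue
    applied = []
    t = 0
    while t < len(arrivals) or backlog > 0:
        if t < len(arrivals):
            backlog += arrivals[t]  # producer enqueues this tick's writes
        batch = min(capacity, backlog)  # worker drains what the DB can absorb
        backlog -= batch
        applied.append(batch)
        t += 1
    return applied


# A burst of 100 writes against a database that can absorb 30 per tick:
print(level_writes([100, 0, 0, 0, 0], 30))  # prints [30, 30, 30, 10, 0]
```

The total is still 100 writes, as Tarun says, but the database never sees more than 30 in a single tick, as Naser says.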
  
Kumar G says:

June 2, 2016 at 11:24 am

Using S3DistCp, you can efficiently copy a large amount of data from Amazon S3 into the HDFS datastore of your cluster.

So B will be suitable.

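To illustrate that S3DistCp moves files between S3 and HDFS rather than talking to a database, here is a Python sketch that builds an EMR step running `s3-dist-cp` through `command-runner.jar`. The bucket and paths are hypothetical; on a running cluster the dict would be passed to boto3's `add_job_flow_steps(JobFlowId=..., Steps=[...])`.

```python
def s3distcp_step(src: str, dest: str, name: str = "S3DistCp copy") -> dict:
    """Build an EMR step that runs s3-dist-cp through command-runner.jar.

    src/dest are file-system URIs (s3://... or hdfs://...), which is the
    point: S3DistCp copies files, it has no notion of a database.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", f"--src={src}", f"--dest={dest}"],
        },
    }


# Hypothetical bucket and HDFS path; with boto3 this dict would go to
# emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step]).
step = s3distcp_step("s3://example-bucket/logs/", "hdfs:///input/")
print(step["HadoopJarStep"]["Args"])
```

Neither endpoint in the step is the mainframe database, which is why option A does not relieve its write load.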

harry999 says:

June 15, 2016 at 4:51 am

b

fun4two says:

June 16, 2016 at 4:24 am

answer is b

https://aws.amazon.com/sqs/faqs/

Manu says:

July 8, 2016 at 5:48 pm

answer is B

swagata mondal says:

August 26, 2016 at 6:54 am

B

Ashley says:

October 13, 2016 at 11:33 am

B

shubham says:

October 14, 2016 at 3:45 am

Hey, option B relies on a worker, and option A on a synchronization mechanism. Which is more cost-effective?

Dat says:

November 10, 2016 at 3:38 am

B should be the answer.
A & C use AWS Elastic MapReduce technologies, which I could not relate to the question’s requirement.
D uses “synchronize the two databases using Data Pipeline”, but that way the customer needs to store the database on both sides (the on-premises DB and the AWS RDS DB), which violates the stated premise of “…mainframe database that uses a BASE…”

Ryan says:

January 2, 2017 at 12:58 am

B is the answer.

Wajahat says:

April 3, 2017 at 10:21 am

Why not A?

http://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html

Apache DistCp is an open-source tool you can use to copy large amounts of data. DistCp uses MapReduce to copy in a distributed manner—sharing the copy, error handling, recovery, and reporting tasks across several servers. For more information about the Apache DistCp open source project,

1. mutiger91 says:
  
  June 12, 2017 at 3:59 pm
  
  EMR is a solution for processing a large data set by splitting up the work and then merging the results. If EMR is sitting behind your web server, it’s because it is delivering some sort of reporting or analytics. This question appears to describe a transactional system that is having write I/O issues. While data ingest is something that EMR does well, it is built to plow through large data sets and produce results. Maybe with some data sets EMR could be used as part of the ingest process to structure the data so that it can be more easily ingested by the target, but we are not guaranteed that it would even have all of the data needed to process the input data.
  
  The bigger clue may be that it says we should use this as a synchronization mechanism. What synchronization capability exists between EMR and this unnamed legacy database?
  
  In this example, SQS allows the back-end database to ingest at a pace that it can handle and still remain consistent. You still have to assume that there will be a period during which it can catch up.
  
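The catch-up condition in the last paragraph can be made concrete with a little queueing arithmetic: a backlog only drains if the database's sustained write rate exceeds the average arrival rate, and it then takes roughly backlog / (service rate minus arrival rate). A Python sketch with hypothetical rates:

```python
def drain_time(backlog: float, arrival_rate: float, service_rate: float) -> float:
    """Seconds for a worker to empty a queue backlog while new writes keep arriving.

    Rates are writes per second (hypothetical). The backlog only shrinks if the
    database's sustained service rate exceeds the average arrival rate.
    """
    if service_rate <= arrival_rate:
        return float("inf")  # never catches up; the backlog holds steady or grows
    return backlog / (service_rate - arrival_rate)


# e.g. 36,000 queued writes, 50 new writes/s off-peak, DB absorbs 80 writes/s:
print(drain_time(36_000, 50, 80))  # prints 1200.0 (seconds, i.e. 20 minutes)
```

If the database can never outpace the average arrival rate, no amount of queueing helps; the queue only smooths peaks.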
  
Megatron says:

May 27, 2017 at 11:01 pm

A.
Use an Amazon Elastic MapReduce (EMR) S3DistCp as a synchronization mechanism between the on-premises database and a Hadoop cluster on AWS.

– Not cost-effective.

C.
Modify the application to use DynamoDB to feed an EMR cluster which uses a map function to write to the
on-premises database.

– I think DynamoDB here is just a distraction playing on BASE. It may be suitable, but it complicates the design and adds cost.

D.
Provision an RDS read-replica database on AWS to handle the writes and synchronize the two databases
using Data Pipeline.

– RDS read replicas are for MySQL, MariaDB, and PostgreSQL, so they are not applicable here. Easily ruled out.

The correct answer should be:

B.
Modify the application to write to an Amazon SQS queue and develop a worker process to flush the queue
to the on-premises database.

i) It’s BASE, so we can use SQS; there is no hurry to write/read the data (eventual consistency model).
ii) It’s cost-effective, as SQS is the only new component introduced here.

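Option B's "worker process to flush the queue to the on-premises database" might look like the following Python sketch. It assumes a boto3-style client (`receive_message`, `delete_message`), a hypothetical `apply_write` function that writes one record to the mainframe database, and an in-memory set of seen message IDs; because SQS delivers at least once, a real worker would need a durable idempotency key rather than a Python set.

```python
import json


def drain_once(client, queue_url, apply_write, seen_ids):
    """Poll one batch from the queue, apply each write, then delete it.

    `client` exposes receive_message/delete_message like a boto3 SQS client;
    `apply_write` is a hypothetical function that writes one record to the
    on-premises database. Returns how many messages were processed.
    """
    response = client.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    messages = response.get("Messages", [])
    for msg in messages:
        # At-least-once delivery: skip a message whose write was already applied.
        if msg["MessageId"] not in seen_ids:
            apply_write(json.loads(msg["Body"]))
            seen_ids.add(msg["MessageId"])
        # Delete only after the write succeeded; if apply_write raises, the
        # message stays in the queue and is redelivered after its visibility timeout.
        client.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return len(messages)


class FakeQueue:
    """Hypothetical in-memory stand-in so the sketch runs without AWS."""

    def __init__(self, bodies):
        self.messages = [
            {"MessageId": f"m{i}", "ReceiptHandle": f"r{i}", "Body": json.dumps(b)}
            for i, b in enumerate(bodies)
        ]
        self.deleted = []

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1, WaitTimeSeconds=0):
        pending = [m for m in self.messages if m["ReceiptHandle"] not in self.deleted]
        batch = pending[:MaxNumberOfMessages]
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.deleted.append(ReceiptHandle)


queue = FakeQueue([{"id": 1}, {"id": 2}])
written = []
print(drain_once(queue, "queue-url", written.append, set()), written)
# prints 2 [{'id': 1}, {'id': 2}]
```

Running `drain_once` in a loop paces the mainframe to the worker's batch size and polling rate, which is the lever the earlier comments describe.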

lyannabear says:

June 15, 2017 at 11:37 am

Answer is B

Most of the answers at the top are wrong. I’ve gone through the trouble of correcting all 400 of them for my own study purposes. If you would like a digital copy of this dump please send $40 to paypal.me/lyannabear
