Free Practice Questions for AWS Certified Data Analytics - Specialty (AWS-Certified-Data-Analytics-Specialty)

QUESTION 26

A company analyzes its data in an Amazon Redshift data warehouse, which currently has a cluster of three dense storage nodes. Due to a recent business acquisition, the company needs to load an additional 4 TB of user data into Amazon Redshift. The engineering team will combine all the user data and apply complex calculations that require I/O intensive resources. The company needs to adjust the cluster's capacity to support the change in analytical and storage requirements.
Which solution meets these requirements?

A. Resize the cluster using elastic resize with dense compute nodes.
B. Resize the cluster using classic resize with dense compute nodes.
C. Resize the cluster using elastic resize with dense storage nodes.
D. Resize the cluster using classic resize with dense storage nodes.

Correct Answer: C

QUESTION 27

A company stores Apache Parquet-formatted files in Amazon S3. The company uses an AWS Glue Data Catalog to store the table metadata and Amazon Athena to query and analyze the data. The tables have a large number of partitions. The queries are only run on small subsets of data in the table. A data analyst adds new time partitions into the table as new data arrives. The data analyst has been asked to reduce the query runtime.
Which solution will provide the MOST reduction in the query runtime?

A. Convert the Parquet files to the CSV file format. Then attempt to query the data again.
B. Convert the Parquet files to the Apache ORC file format. Then attempt to query the data again.
C. Use partition projection to speed up the processing of the partitioned table.
D. Add more partitions to be used over the table. Then filter over two partitions and put all columns in the WHERE clause.

Correct Answer: C
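
For illustration, partition projection is configured through table properties in Athena. The following is a minimal sketch, assuming a hypothetical table `events` with a date partition column `dt` and a hypothetical bucket path; the helper name `projection_ddl` is made up for this example. It only builds the DDL text and does not call any AWS service.

```python
def projection_ddl(table, column, date_range, location_template,
                   date_format="yyyy/MM/dd"):
    """Build an ALTER TABLE statement that enables date-based partition
    projection for one partition column (illustrative helper only)."""
    props = {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.range": date_range,
        f"projection.{column}.format": date_format,
        "storage.location.template": location_template,
    }
    body = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {body}\n);"


# Hypothetical table, column, and bucket names.
ddl = projection_ddl(
    table="events",
    column="dt",
    date_range="2020/01/01,NOW",
    location_template="s3://example-bucket/events/${dt}/",
)
print(ddl)
```

With projection enabled, Athena computes partition locations from these properties instead of looking them up in the catalog, so newly arriving time partitions do not have to be registered one by one.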

QUESTION 28

A company needs to collect streaming data from several sources and store the data in the AWS Cloud. The dataset is heavily structured, but analysts need to perform several complex SQL queries and need consistent performance. Some of the data is queried more frequently than the rest. The company wants a solution that meets its performance requirements in a cost-effective manner.
Which solution meets these requirements?

A. Use Amazon Managed Streaming for Apache Kafka to ingest the data to save it to Amazon S3. Use Amazon Athena to perform SQL queries over the ingested data.
B. Use Amazon Managed Streaming for Apache Kafka to ingest the data to save it to Amazon Redshift. Enable Amazon Redshift workload management (WLM) to prioritize workloads.
C. Use Amazon Kinesis Data Firehose to ingest the data to save it to Amazon Redshift. Enable Amazon Redshift workload management (WLM) to prioritize workloads.
D. Use Amazon Kinesis Data Firehose to ingest the data to save it to Amazon S3. Load frequently queried data to Amazon Redshift using the COPY command. Use Amazon Redshift Spectrum for less frequently queried data.

Correct Answer: B

QUESTION 29

A company wants to provide its data analysts with uninterrupted access to the data in its Amazon Redshift cluster. All data is streamed to an Amazon S3 bucket with Amazon Kinesis Data Firehose. An AWS Glue job that is scheduled to run every 5 minutes issues a COPY command to move the data into Amazon Redshift.
The amount of data delivered is uneven throughout the day, and cluster utilization is high during certain periods. The COPY command usually completes within a couple of seconds. However, when a load spike occurs, locks can exist and data can be missed. Currently, the AWS Glue job is configured to run without retries, with the timeout at 5 minutes and concurrency at 1.
How should a data analytics specialist configure the AWS Glue job to optimize fault tolerance and improve data availability in the Amazon Redshift cluster?

A. Increase the number of retries. Decrease the timeout value. Increase the job concurrency.
B. Keep the number of retries at 0. Decrease the timeout value. Increase the job concurrency.
C. Keep the number of retries at 0. Decrease the timeout value. Keep the job concurrency at 1.
D. Keep the number of retries at 0. Increase the timeout value. Keep the job concurrency at 1.

Correct Answer: B
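
The three settings that the options vary (retries, timeout, concurrency) correspond to fields in the AWS Glue job definition. As a rough sketch, the helper below (a hypothetical name, `glue_job_settings`) builds just that fragment of a job definition using the Glue API field names `MaxRetries`, `Timeout` (in minutes), and `ExecutionProperty.MaxConcurrentRuns`. It makes no AWS calls, and a real job update would also need the job's role and command.

```python
def glue_job_settings(max_retries, timeout_minutes, max_concurrent_runs):
    """Return the retry, timeout, and concurrency fragment of a Glue job
    definition (illustrative only; not a complete job update)."""
    if max_retries < 0:
        raise ValueError("max_retries must be 0 or greater")
    if timeout_minutes < 1 or max_concurrent_runs < 1:
        raise ValueError("timeout and concurrency must be at least 1")
    return {
        "MaxRetries": max_retries,
        "Timeout": timeout_minutes,  # minutes
        "ExecutionProperty": {"MaxConcurrentRuns": max_concurrent_runs},
    }


# The configuration described in the question:
# no retries, a 5-minute timeout, and a concurrency of 1.
current = glue_job_settings(max_retries=0, timeout_minutes=5, max_concurrent_runs=1)
print(current)
```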

QUESTION 30

A bank is using Amazon Managed Streaming for Apache Kafka (Amazon MSK) to populate real-time data into a data lake. The data lake is built on Amazon S3, and data must be accessible from the data lake within 24 hours. Different microservices produce messages to different topics in the cluster. The cluster is created with 8 TB of Amazon Elastic Block Store (Amazon EBS) storage and a retention period of 7 days.
The customer transaction volume has tripled recently, and disk monitoring has provided an alert that the cluster is almost out of storage capacity.
What should a data analytics specialist do to prevent the cluster from running out of disk space?

A. Use the Amazon MSK console to triple the broker storage and restart the cluster.
B. Create an Amazon CloudWatch alarm that monitors the KafkaDataLogsDiskUsed metric. Automatically flush the oldest messages when the value of this metric exceeds 85%.
C. Create a custom Amazon MSK configuration. Set the log retention hours parameter to 48. Update the cluster with the new configuration file.
D. Triple the number of consumers to ensure that data is consumed as soon as it is added to a topic.

Correct Answer: B
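
As a back-of-the-envelope check on the numbers in this question, the sketch below estimates how disk usage would change if the retention period were cut from 7 days to 48 hours while volume triples. It assumes, as a simplification not stated in the question, that steady-state disk usage scales linearly with both ingest volume and retention time. The function name is hypothetical.

```python
def relative_disk_usage(volume_factor, old_retention_hours, new_retention_hours):
    """Estimate new steady-state disk usage relative to the old level,
    assuming usage is proportional to ingest volume times retention time."""
    return volume_factor * new_retention_hours / old_retention_hours


# Volume tripled; retention reduced from 7 days (168 hours) to 48 hours.
ratio = relative_disk_usage(volume_factor=3,
                            old_retention_hours=7 * 24,
                            new_retention_hours=48)
print(round(ratio, 2))  # 0.86
```

Under that assumption, a 48-hour retention period would bring usage to roughly 6/7 of the level before the volume increase, while still exceeding the 24-hour window in which data must reach the data lake.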
