Free Practice Questions for Databricks Certified Data Analyst Associate Exam (Databricks-Certified-Data-Analyst-Associate)

QUESTION 6

How can a data analyst determine if query results were pulled from the cache?

A. Go to the Query History tab and click on the text of the query. The slideout shows if the results came from the cache.
B. Go to the Alerts tab and check the Cache Status alert.
C. Go to the Queries tab and click on Cache Status. The status will be green if the results from the last run came from the cache.
D. Go to the SQL Warehouse (formerly SQL Endpoints) tab and click on Cache. The Cache file will show the contents of the cache.
E. Go to the Data tab and click Last Query. The details of the query will show if the results came from the cache.

Correct Answer: A
Databricks SQL uses a query cache to store the results of previously executed queries, which speeds up repeated queries. To determine whether a query result was pulled from the cache, go to the Query History tab in the Databricks SQL UI and click on the text of the query. A slideout appears on the right side of the screen showing the query details, including the cache status. If the result came from the cache, the cache status shows "Cached"; otherwise it shows "Not cached". You can also see the cache hit ratio, which is the percentage of queries that were served from the cache.
References: The answer can be verified from the Databricks SQL documentation, which describes how to use the query cache and how to check the cache status. Reference link: Databricks SQL - Query Cache
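The idea of a result cache with a visible cache status can be illustrated with a toy model. This is only a sketch of the general technique, not how Databricks SQL implements its cache: the CachedRunner class and its run method are hypothetical names, the cache is keyed on the exact query text, and Python's built-in sqlite3 module stands in for a SQL warehouse.

```python
import sqlite3


class CachedRunner:
    """Toy result cache keyed on the exact query text (hypothetical helper)."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}

    def run(self, sql):
        """Return (rows, from_cache) so the caller can see the cache status."""
        if sql in self.cache:
            return self.cache[sql], True
        rows = self.conn.execute(sql).fetchall()
        self.cache[sql] = rows
        return rows, False


conn = sqlite3.connect(":memory:")
runner = CachedRunner(conn)

first = runner.run("SELECT 1")   # executed against the database
second = runner.run("SELECT 1")  # same text, served from the cache

for rows, from_cache in (first, second):
    print(rows, "Cached" if from_cache else "Not cached")
```

A real query cache also has to invalidate entries when the underlying data changes, which this sketch deliberately leaves out.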

QUESTION 7

A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The micro-batches are triggered every minute.
A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within one minute or less of new data becoming available within the gold-level tables.
Which of the following cautions should the data analyst share prior to setting up the dashboard to complete this task?

A. The required compute resources could be costly
B. The gold-level tables are not appropriately clean for business reporting
C. The streaming data is not an appropriate data source for a dashboard
D. The streaming cluster is not fault tolerant
E. The dashboard cannot be refreshed that quickly

Correct Answer: A
A Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables every minute requires a high level of compute resources to handle the frequent data ingestion, processing, and writing. This could result in a significant cost for the organization, especially if the data volume and velocity are large. Therefore, the data analyst should share this caution with the project stakeholders before setting up the dashboard and evaluate the trade-offs between the desired refresh rate and the available budget. The other options are not valid cautions because:
✑ B. The gold-level tables are assumed to be appropriately clean for business reporting, as they are the final output of the data engineering pipeline. If the data quality is not satisfactory, the issue should be addressed at the source or silver level, not at the gold level.
✑ C. The streaming data is an appropriate data source for a dashboard, as it can provide near real-time insights and analytics for the business users. Structured Streaming supports various sources and sinks for streaming data, including Delta Lake, which can enable both batch and streaming queries on the same data.
✑ D. The streaming cluster is fault tolerant, as Structured Streaming provides end-to-end exactly-once fault-tolerance guarantees through checkpointing and write-ahead logs. If a query fails, it can be restarted from the last checkpoint and resume processing.
✑ E. The dashboard can be refreshed within one minute or less of new data becoming available in the gold-level tables, as Structured Streaming can trigger micro-batches as fast as possible (every few seconds) and update the results incrementally. However, this may not be necessary or optimal for the business use case, as it could cause frequent changes in the dashboard and consume more resources.
References: Streaming on Databricks, Monitoring Structured Streaming queries on Databricks, A look at the new Structured Streaming UI in Apache Spark 3.0, Run your first Structured Streaming workload
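The cost caution in answer A can be made concrete with some rough arithmetic. The sketch below is a simplified model, not Databricks billing logic: it assumes a SQL warehouse that shuts down after a fixed idle period, and the function names, the 10-minute auto-stop value, and the 30-second query time are all hypothetical.

```python
def refreshes_per_day(interval_minutes):
    """Number of scheduled refreshes in 24 hours."""
    return int(24 * 60 / interval_minutes)


def daily_running_hours(interval_minutes, auto_stop_minutes, query_minutes):
    """Hours per day the warehouse stays up under the simplified model.

    Assumption: after each refresh the warehouse runs the query, then idles
    for auto_stop_minutes before shutting down. If the next refresh arrives
    before that, the warehouse never stops.
    """
    if interval_minutes <= query_minutes + auto_stop_minutes:
        return 24.0
    busy_minutes = refreshes_per_day(interval_minutes) * (query_minutes + auto_stop_minutes)
    return busy_minutes / 60


every_minute = daily_running_hours(1, auto_stop_minutes=10, query_minutes=0.5)
every_hour = daily_running_hours(60, auto_stop_minutes=10, query_minutes=0.5)

print(refreshes_per_day(1), "refreshes per day at a 1-minute interval")
print(every_minute, "running hours per day at a 1-minute interval")
print(round(every_hour, 1), "running hours per day at a 60-minute interval")
```

Under these assumptions a one-minute refresh keeps the warehouse running around the clock, while an hourly refresh lets it stop between runs, which is the trade-off the analyst would raise with stakeholders.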

QUESTION 8

An analyst writes a query that contains a query parameter. They then add an area chart visualization to the query. While adding the area chart visualization to a dashboard, the analyst chooses "Dashboard Parameter" for the query parameter associated with the area chart.
Which of the following statements is true?

A. The area chart will use whatever is selected in the Dashboard Parameter while all of the other visualizations will remain unchanged regardless of their parameter use.
B. The area chart will use whatever is selected in the Dashboard Parameter along with all of the other visualizations in the dashboard that use the same parameter.
C. The area chart will use whatever value is chosen on the dashboard at the time the area chart is added to the dashboard.
D. The area chart will use whatever value is input by the analyst when the visualization is added to the dashboard. The parameter cannot be changed by the user afterwards.
E. The area chart will convert to a Dashboard Parameter.

Correct Answer: B
A Dashboard Parameter is a parameter that is configured for one or more visualizations within a dashboard and appears at the top of the dashboard. The parameter values specified for a Dashboard Parameter apply to all visualizations reusing that particular Dashboard Parameter. Therefore, if the analyst chooses "Dashboard Parameter" for the query parameter associated with the area chart, the area chart will use whatever is selected in the Dashboard Parameter along with all of the other visualizations in the dashboard that use the same parameter. This allows the user to filter the data across multiple visualizations using a single parameter widget.
References: Databricks SQL dashboards, Query parameters
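The shared-parameter behavior can be mimicked outside Databricks. The following is a minimal sketch using Python's sqlite3 module, not the dashboard implementation itself: the sales table, the two queries, the dashboard_params dictionary, and the render function are all made up for illustration. Both queries read the same named parameter, so changing it once changes every result that uses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7)],
)

# Two hypothetical visualizations whose queries use the same parameter.
area_chart_sql = "SELECT SUM(amount) FROM sales WHERE region = :region"
counter_sql = "SELECT COUNT(*) FROM sales WHERE region = :region"

# Stand-in for the single parameter widget at the top of the dashboard.
dashboard_params = {"region": "east"}


def render(sql):
    """Run one visualization's query with the current dashboard parameters."""
    return conn.execute(sql, dashboard_params).fetchone()[0]


east = (render(area_chart_sql), render(counter_sql))

# Changing the one shared value affects both visualizations.
dashboard_params["region"] = "west"
west = (render(area_chart_sql), render(counter_sql))

print("east:", east)
print("west:", west)
```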

QUESTION 9

Which of the following statements about a refresh schedule is incorrect?

A. A query can be refreshed anywhere from 1 minute to 2 weeks.
B. Refresh schedules can be configured in the Query Editor.
C. A query being refreshed on a schedule does not use a SQL Warehouse (formerly known as SQL Endpoint).
D. A refresh schedule is not the same as an alert.
E. You must have workspace administrator privileges to configure a refresh schedule.

Correct Answer: C
Refresh schedules are used to rerun queries at specified intervals, and these queries typically require computational resources to execute. In the context of a cloud data service like Databricks, this would typically involve the use of a SQL Warehouse (or a SQL Endpoint, as they were formerly known) to provide the necessary computational resources. Therefore, the statement is incorrect because scheduled query refreshes would indeed use a SQL Warehouse/Endpoint to execute the query.

QUESTION 10

A business analyst has been asked to create a data entity/object called sales_by_employee. It should always stay up-to-date when new data are added to the sales table. The new entity should have the columns sales_person, which will be the name of the employee from the employees table, and sales, which will be all sales for that particular sales person. Both the sales table and the employees table have an employee_id column that is used to identify the sales person.
Which of the following code blocks will accomplish this task?
[Exhibits for options A) through D): four SQL code blocks, not reproduced in this text.]
A. Option A
B. Option B
C. Option C
D. Option D

Correct Answer: D
The SQL code provided in Option D is the correct way to create a view named sales_by_employee that will always stay up-to-date with the sales and employees tables. The code uses the CREATE OR REPLACE VIEW statement to define a new view that joins the sales and employees tables on the employee_id column. It selects the employee_name as sales_person and all sales for each employee, ensuring that the data entity/object is always up-to-date when new data are added to these tables.
References: The answer can be verified from Databricks SQL documentation which provides insights on creating views using SQL queries, joining tables, and selecting specific columns to be included in the view. Reference link: Databricks SQL
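Because the exhibits are not reproduced here, the following is only an illustrative sketch of why a view stays up to date, not the text of Option D. It uses Python's sqlite3 module as a stand-in for Databricks SQL; SQLite does not support CREATE OR REPLACE VIEW, so plain CREATE VIEW is used instead. The sample rows are invented, and the employee_name column is taken from the explanation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
    CREATE TABLE sales (employee_id INTEGER, sales REAL);

    INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO sales VALUES (1, 100.0), (2, 50.0);

    -- A view stores the query, not the rows, so it is re-evaluated on each read.
    CREATE VIEW sales_by_employee AS
    SELECT e.employee_name AS sales_person, s.sales
    FROM sales AS s
    JOIN employees AS e ON s.employee_id = e.employee_id;
    """
)

count_sql = "SELECT COUNT(*) FROM sales_by_employee"
before = conn.execute(count_sql).fetchone()[0]

# New data added to the base table shows up in the view without any refresh.
conn.execute("INSERT INTO sales VALUES (1, 25.0)")
after = conn.execute(count_sql).fetchone()[0]

rows = conn.execute(
    "SELECT sales_person, sales FROM sales_by_employee ORDER BY sales_person, sales"
).fetchall()
print(before, after)
print(rows)
```

A table created from the same SELECT would instead hold a snapshot of the rows at creation time, which is why a view fits the "always stay up-to-date" requirement.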
