Download Google Cloud Associate Data Practitioner.Associate-Data-Practitioner.VCEplus.2025-03-14.28q.vcex

Vendor: Google
Exam Code: Associate-Data-Practitioner
Exam Name: Google Cloud Associate Data Practitioner
Date: Mar 14, 2025
File Size: 129 KB


Demo Questions

Question 1
Your organization uses a BigQuery table that is partitioned by ingestion time. You need to remove data that is older than one year to reduce your organization's storage costs. You want to use the most efficient approach while minimizing cost. What should you do?
  A. Create a scheduled query that periodically runs an update statement in SQL that sets the 'deleted' column to 'yes' for data that is more than one year old. Create a view that filters out rows that have been marked deleted.
  B. Create a view that filters out rows that are older than one year.
  C. Require users to specify a partition filter using the alter table statement in SQL.
  D. Set the table partition expiration period to one year using the ALTER TABLE statement in SQL.
Correct answer: D
Explanation:
Setting the table partition expiration period to one year using the ALTER TABLE statement is the most efficient and cost-effective approach. This automatically deletes data in partitions older than one year, reducing storage costs without requiring manual intervention or additional queries. It minimizes administrative overhead and ensures compliance with your data retention policy while optimizing storage usage in BigQuery.
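As a rough illustration of option D, the sketch below issues the partition-expiration DDL through the google-cloud-bigquery Python client; the project, dataset, and table names are placeholders, not values from the question.

```python
from google.cloud import bigquery

# Placeholder project, dataset, and table names; substitute your own.
client = bigquery.Client()

ddl = """
ALTER TABLE `my_project.my_dataset.ingestion_partitioned_table`
SET OPTIONS (partition_expiration_days = 365);
"""

# Once set, BigQuery automatically deletes partitions older than the expiration
# period; no scheduled cleanup queries are needed.
client.query(ddl).result()
```

The same ALTER TABLE statement can also be run once directly in the BigQuery console; the client is used here only to keep the example self-contained.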
Question 2
Your company is migrating their batch transformation pipelines to Google Cloud. You need to choose a solution that supports programmatic transformations using only SQL. You also want the technology to support Git integration for version control of your pipelines. What should you do?
  A. Use Cloud Data Fusion pipelines.
  B. Use Dataform workflows.
  C. Use Dataflow pipelines.
  D. Use Cloud Composer operators.
Correct answer: B
Explanation:
Dataform workflows are the ideal solution for migrating batch transformation pipelines to Google Cloud when you want to perform programmatic transformations using only SQL. Dataform allows you to define SQL-based workflows for data transformations and supports Git integration for version control, enabling collaboration and version tracking of your pipelines. This approach is purpose-built for SQL-driven data pipeline management and aligns perfectly with your requirements.
Question 3
You manage a BigQuery table that is used for critical end-of-month reports. The table is updated weekly with new sales data. You want to prevent data loss and reporting issues if the table is accidentally deleted. What should you do?
  A. Configure the time travel duration on the table to be exactly seven days. On deletion, re-create the deleted table solely from the time travel data.
  B. Schedule the creation of a new snapshot of the table once a week. On deletion, re-create the deleted table using the snapshot and time travel data.
  C. Create a clone of the table. On deletion, re-create the deleted table by copying the content of the clone.
  D. Create a view of the table. On deletion, re-create the deleted table from the view and time travel data.
Correct answer: B
Explanation:
Scheduling the creation of a snapshot of the table weekly ensures that you have a point-in-time backup of the table. In case of accidental deletion, you can re-create the table from the snapshot. Additionally, BigQuery's time travel feature allows you to recover data from up to seven days prior to deletion. Combining snapshots with time travel provides a robust solution for preventing data loss and ensuring reporting continuity for critical tables.
This approach minimizes risks while offering flexibility for recovery.
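To make option B concrete, here is a hedged sketch of the weekly snapshot and the restore step, using the Python BigQuery client; the project, dataset, and table names are hypothetical, and the date suffix simply keeps weekly snapshot names from colliding.

```python
from datetime import date
from google.cloud import bigquery

# Placeholder project, dataset, and table names.
client = bigquery.Client()
snapshot = f"my_project.reporting.sales_snapshot_{date.today():%Y%m%d}"

# Weekly job: create a point-in-time snapshot that expires after ~5 weeks.
client.query(f"""
CREATE SNAPSHOT TABLE `{snapshot}`
CLONE `my_project.reporting.sales`
OPTIONS (expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 35 DAY));
""").result()

# Recovery path: if the base table is accidentally deleted, re-create it from the
# snapshot, then use time travel to recover rows written after the snapshot if needed.
# client.query(f"CREATE TABLE `my_project.reporting.sales` CLONE `{snapshot}`;").result()
```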
Question 4
You manage a large amount of data in Cloud Storage, including raw data, processed data, and backups. Your organization is subject to strict compliance regulations that mandate data immutability for specific data types. You want to use an efficient process to reduce storage costs while ensuring that your storage strategy meets retention requirements. What should you do?
  A. Configure lifecycle management rules to transition objects to appropriate storage classes based on access patterns. Set up Object Versioning for all objects to meet immutability requirements.
  B. Move objects to different storage classes based on their age and access patterns. Use Cloud Key Management Service (Cloud KMS) to encrypt specific objects with customer-managed encryption keys (CMEK) to meet immutability requirements.
  C. Create a Cloud Run function to periodically check object metadata, and move objects to the appropriate storage class based on age and access patterns. Use object holds to enforce immutability for specific objects.
  D. Use object holds to enforce immutability for specific objects, and configure lifecycle management rules to transition objects to appropriate storage classes based on age and access patterns.
Correct answer: D
Explanation:
Using object holds and lifecycle management rules is the most efficient and compliant strategy for this scenario because:
Immutability: Object holds (temporary or event-based) ensure that objects cannot be deleted or overwritten, meeting strict compliance regulations for data immutability.
Cost efficiency: Lifecycle management rules automatically transition objects to more cost-effective storage classes based on their age and access patterns.
Compliance and automation: This approach ensures compliance with retention requirements while reducing manual effort, leveraging built-in Cloud Storage features. 
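A minimal sketch of option D using the google-cloud-storage Python client is shown below; the bucket name, object name, and age thresholds are assumptions for illustration only.

```python
from google.cloud import storage

# Placeholder bucket and object names; the age thresholds are illustrative.
client = storage.Client()
bucket = client.get_bucket("my-compliance-bucket")

# Lifecycle management: transition objects to cheaper storage classes as they age.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
bucket.patch()

# Immutability: an event-based hold prevents the object from being deleted or
# overwritten for as long as the hold remains set.
blob = bucket.blob("raw/transactions-2024.avro")
blob.event_based_hold = True
blob.patch()
```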
Question 5
You work for an ecommerce company that has a BigQuery dataset that contains customer purchase history, demographics, and website interactions. You need to build a machine learning (ML) model to predict which customers are most likely to make a purchase in the next month. You have limited engineering resources and need to minimize the ML expertise required for the solution. What should you do?
  A. Use BigQuery ML to create a logistic regression model for purchase prediction.
  B. Use Vertex AI Workbench to develop a custom model for purchase prediction.
  C. Use Colab Enterprise to develop a custom model for purchase prediction.
  D. Export the data to Cloud Storage, and use AutoML Tables to build a classification model for purchase prediction.
Correct answer: A
Explanation:
Using BigQuery ML is the best solution in this case because:
Ease of use: BigQuery ML allows users to build machine learning models using SQL, which requires minimal ML expertise.
Integrated platform: Since the data already exists in BigQuery, there's no need to move it to another service, saving time and engineering resources.
Logistic regression: This is an appropriate model for binary classification tasks like predicting the likelihood of a customer making a purchase in the next month.
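The sketch below shows what a BigQuery ML logistic regression workflow might look like for this use case, run through the Python client; the dataset, tables, feature columns, and label column are hypothetical.

```python
from google.cloud import bigquery

# Placeholder dataset, table, and column names.
client = bigquery.Client()

# Train a logistic regression model entirely in SQL with BigQuery ML.
client.query("""
CREATE OR REPLACE MODEL `my_project.ecommerce.purchase_propensity`
OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['purchased_next_month']) AS
SELECT
  days_since_last_purchase,
  total_past_purchases,
  sessions_last_30d,
  purchased_next_month
FROM `my_project.ecommerce.customer_features`;
""").result()

# Score current customers with ML.PREDICT to find likely buyers next month.
rows = client.query("""
SELECT customer_id, predicted_purchased_next_month_probs
FROM ML.PREDICT(
  MODEL `my_project.ecommerce.purchase_propensity`,
  TABLE `my_project.ecommerce.current_customers`);
""").result()
```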
Question 6
You are designing a pipeline to process data files that arrive in Cloud Storage by 3:00 am each day. Data processing is performed in stages, where the output of one stage becomes the input of the next. Each stage takes a long time to run. Occasionally a stage fails, and you have to address the problem. You need to ensure that the final output is generated as quickly as possible. What should you do?
  A. Design a Spark program that runs under Dataproc. Code the program to wait for user input when an error is detected. Rerun the last action after correcting any stage output data errors.
  B. Design the pipeline as a set of PTransforms in Dataflow. Restart the pipeline after correcting any stage output data errors.
  C. Design the workflow as a Cloud Workflow instance. Code the workflow to jump to a given stage based on an input parameter. Rerun the workflow after correcting any stage output data errors.
  D. Design the processing as a directed acyclic graph (DAG) in Cloud Composer. Clear the state of the failed task after correcting any stage output data errors.
Correct answer: D
Explanation:
Using Cloud Composer to design the processing pipeline as a Directed Acyclic Graph (DAG) is the most suitable approach because:
Fault tolerance: Cloud Composer (based on Apache Airflow) allows for handling failures at specific stages. You can clear the state of a failed task and rerun it without reprocessing the entire pipeline.
Stage-based processing: DAGs are ideal for workflows with interdependent stages where the output of one stage serves as input to the next.
Efficiency: This approach minimizes downtime and ensures that only failed stages are rerun, leading to faster final output generation.
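As an illustration, a Cloud Composer DAG for this kind of staged pipeline might look like the Airflow sketch below; the task commands and schedule are assumptions. If stage_2 fails, you fix its input or output data, clear only that task's state, and the run resumes from stage_2 rather than restarting the whole pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Placeholder stage commands; each stage consumes the previous stage's output.
with DAG(
    dag_id="daily_file_processing",
    start_date=datetime(2025, 1, 1),
    schedule_interval="0 3 * * *",  # files arrive in Cloud Storage by 3:00 am
    catchup=False,
) as dag:
    stage_1 = BashOperator(task_id="stage_1", bash_command="python stage_1.py")
    stage_2 = BashOperator(task_id="stage_2", bash_command="python stage_2.py")
    stage_3 = BashOperator(task_id="stage_3", bash_command="python stage_3.py")

    # Only the cleared (failed) task and its downstream tasks rerun; upstream
    # stages that already succeeded are not reprocessed.
    stage_1 >> stage_2 >> stage_3
```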
Question 7
Another team in your organization is requesting access to a BigQuery dataset. You need to share the dataset with the team while minimizing the risk of unauthorized copying of data. You also want to create a reusable framework in case you need to share this data with other teams in the future. What should you do?
  A. Create authorized views in the team's Google Cloud project that is only accessible by the team.
  B. Create a private exchange using Analytics Hub with data egress restriction, and grant access to the team members.
  C. Enable domain restricted sharing on the project. Grant the team members the BigQuery Data Viewer IAM role on the dataset.
  D. Export the dataset to a Cloud Storage bucket in the team's Google Cloud project that is only accessible by the team.
Correct answer: B
Explanation:
Using Analytics Hub to create a private exchange with data egress restrictions ensures controlled sharing of the dataset while minimizing the risk of unauthorized copying. This approach allows you to provide secure, managed access to the dataset without giving direct access to the raw data. The egress restriction ensures that data cannot be exported or copied outside the designated boundaries. Additionally, this solution provides a reusable framework that simplifies future data sharing with other teams or projects while maintaining strict data governance.
Question 8
Your company has developed a website that allows users to upload and share video files. These files are most frequently accessed and shared when they are initially uploaded. Over time, the files are accessed and shared less frequently, although some old video files may remain very popular.
You need to design a storage system that is simple and cost-effective. What should you do?
  A. Create a single-region bucket with Autoclass enabled.
  B. Create a single-region bucket. Configure a Cloud Scheduler job that runs every 24 hours and changes the storage class based on upload date.
  C. Create a single-region bucket with custom Object Lifecycle Management policies based on upload date.
  D. Create a single-region bucket with Archive as the default storage class.
Correct answer: C
Explanation:
Creating a single-region bucket with custom Object Lifecycle Management policies based on upload date is the most appropriate solution. This approach allows you to automatically transition objects to less expensive storage classes as their access frequency decreases over time. For example, frequently accessed files can remain in the Standard storage class initially, then transition to Nearline, Coldline, or Archive storage as their popularity wanes.
This strategy ensures a cost-effective and efficient storage system while maintaining simplicity by automating the lifecycle management of video files.
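For option C, a lifecycle configuration keyed on object age (which Cloud Storage measures from the object's creation, i.e. upload, time) could be applied as in the sketch below; the bucket name and age thresholds are illustrative assumptions.

```python
from google.cloud import storage

# Placeholder bucket name; the age thresholds are illustrative, not prescriptive.
client = storage.Client()
bucket = client.get_bucket("my-video-uploads")

# Keep new uploads in Standard, then step them down as they age.
bucket.lifecycle_rules = [
    {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
     "condition": {"age": 30}},
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 90}},
    {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
     "condition": {"age": 365}},
]
bucket.patch()
```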
Question 9
You recently inherited a task for managing Dataflow streaming pipelines in your organization and noticed that proper access had not been provisioned to you. You need to request a Google-provided IAM role so you can restart the pipelines. You need to follow the principle of least privilege. What should you do?
  A. Request the Dataflow Developer role.
  B. Request the Dataflow Viewer role.
  C. Request the Dataflow Worker role.
  D. Request the Dataflow Admin role.
Correct answer: A
Explanation:
The Dataflow Developer role provides the necessary permissions to manage Dataflow streaming pipelines, including the ability to restart pipelines. This role adheres to the principle of least privilege, as it grants only the permissions required to manage and operate Dataflow jobs without unnecessary administrative access. Other roles, such as Dataflow Admin, would grant broader permissions, which are not needed in this scenario.
Question 10
You need to create a new data pipeline. You want a serverless solution that meets the following requirements:
  • Data is streamed from Pub/Sub and is processed in real-time.
  • Data is transformed before being stored.
  • Data is stored in a location that will allow it to be analyzed with SQL using Looker.
 
 
Which Google Cloud services should you recommend for the pipeline?
  A. Dataproc Serverless and Bigtable
  B. Cloud Composer and Cloud SQL for MySQL
  C. BigQuery and Analytics Hub
  D. Dataflow and BigQuery
Correct answer: D
Explanation:
To build a serverless data pipeline that processes data in real-time from Pub/Sub, transforms it, and stores it for SQL-based analysis using Looker, the best solution is to use Dataflow and BigQuery. Dataflow is a fully managed service for real-time data processing and transformation, while BigQuery is a serverless data warehouse that supports SQL-based querying and integrates seamlessly with Looker for data analysis and visualization. This combination meets the requirements for real-time streaming, transformation, and efficient storage for analytical queries.
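A minimal Apache Beam (Python SDK) sketch of such a pipeline is shown below; the Pub/Sub topic, BigQuery table, schema, and transformation are hypothetical, and the pipeline options would additionally carry the Dataflow runner, project, and region when deployed.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder topic, table, and schema; add runner/project/region options to run on Dataflow.
options = PipelineOptions(streaming=True)

def to_row(message: bytes) -> dict:
    """Example transformation: parse the Pub/Sub message and keep two fields."""
    event = json.loads(message.decode("utf-8"))
    return {"user_id": event["user_id"], "amount": float(event["amount"])}

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Transform" >> beam.Map(to_row)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.events",
            schema="user_id:STRING,amount:FLOAT",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```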
Question 11
Your team wants to create a monthly report to analyze inventory data that is updated daily. You need to aggregate the inventory counts by using only the most recent month of data, and save the results to be used in a Looker Studio dashboard. What should you do?
  A. Create a materialized view in BigQuery that uses the SUM() function and the DATE_SUB() function.
  B. Create a saved query in the BigQuery console that uses the SUM() function and the DATE_SUB() function. Re-run the saved query every month, and save the results to a BigQuery table.
  C. Create a BigQuery table that uses the SUM() function and the _PARTITIONDATE filter.
  D. Create a BigQuery table that uses the SUM() function and the DATE_DIFF() function.
Correct answer: A
Explanation:
Creating a materialized view in BigQuery with the SUM() function and the DATE_SUB() function is the best approach. Materialized views allow you to pre-aggregate and cache query results, making them efficient for repeated access, such as monthly reporting. By using the DATE_SUB() function, you can filter the inventory data to include only the most recent month. This approach ensures that the aggregation is up-to-date with minimal latency and provides efficient integration with Looker Studio for dashboarding.
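One hedged way to set this up with the Python client is sketched below; the dataset, table, and column names are hypothetical. In this sketch the materialized view pre-aggregates the daily counts by month, and the DATE_SUB() filter for the most recent month is applied in the query that the Looker Studio data source runs against the view.

```python
from google.cloud import bigquery

# Placeholder dataset, table, and column names.
client = bigquery.Client()

# Materialized view: pre-aggregate daily inventory counts per item and month.
client.query("""
CREATE MATERIALIZED VIEW `my_project.inventory.monthly_item_counts_mv` AS
SELECT
  DATE_TRUNC(snapshot_date, MONTH) AS month,
  item_id,
  SUM(inventory_count) AS total_inventory
FROM `my_project.inventory.daily_counts`
GROUP BY month, item_id;
""").result()

# Query used by the Looker Studio data source: keep only the trailing month
# (adjust the window to the exact reporting period you need).
report_sql = """
SELECT item_id, total_inventory
FROM `my_project.inventory.monthly_item_counts_mv`
WHERE month >= DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH);
"""
```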