Professional Data Engineer on Google Cloud Platform
278 questions in total
Question 41
You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service.
What should you do?
Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
Deploy a Cloud Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://
Answer is Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
Cloud Dataproc is the managed, cloud-native service for Hadoop workloads, and standard (HDD) persistent disks with preemptible workers keep the cluster cost-effective.
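For illustration, a minimal PySpark sketch of the script change once the data sits in Cloud Storage; the bucket name and paths are placeholders, and the only real change to the job is the URI scheme:

# Minimal sketch: after migration, only the storage URIs change from hdfs:// to gs://.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nightly-batch").getOrCreate()

# Before migration: spark.read.csv("hdfs:///data/events/*.csv", header=True)
events = spark.read.csv("gs://example-bucket/data/events/*.csv", header=True)

(events.groupBy("event_type")
       .count()
       .write.mode("overwrite")
       .parquet("gs://example-bucket/output/event_counts/"))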
Question 42
You need to choose a database for a new project that has the following requirements:
- Fully managed
- Able to automatically scale up
- Transactionally consistent
- Able to scale up to 6 TB
- Able to be queried using SQL
Which database do you choose?
Cloud SQL
Cloud Bigtable
Cloud Spanner
Cloud Datastore
Answer is Cloud SQL
The requirements call for scaling up, which Cloud SQL supports; it is horizontal scaling (scaling out) that Cloud SQL cannot do. With automatic storage increase up to 30 TB, it also covers the 6 TB requirement.
Automatic storage increase
If you enable this setting, Cloud SQL checks your available storage every 30 seconds. If the available storage falls below a threshold size, Cloud SQL automatically adds additional storage capacity. If the available storage repeatedly falls below the threshold size, Cloud SQL continues to add storage until it reaches the maximum of 30 TB.
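As a rough sketch (not the only way to do it), the setting can also be toggled through the Cloud SQL Admin API with the google-api-python-client library; the project and instance names below are placeholders:

# Rough sketch: enable automatic storage increase on an existing instance.
# Requires Application Default Credentials with Cloud SQL Admin permissions.
from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1beta4")
body = {"settings": {"storageAutoResize": True}}
operation = sqladmin.instances().patch(
    project="example-project", instance="example-instance", body=body).execute()
print(operation["status"])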
Question 43
What are two of the benefits of using denormalized data structures in BigQuery?
A. Reduces the amount of data processed, reduces the amount of storage required
B. Increases query speed, makes queries simpler
C. Reduces the amount of storage required, increases query speed
D. Reduces the amount of data processed, increases query speed
Answer is Increases query speed, makes queries simpler
Cannot be A or C because:
"Denormalized schemas aren't storage-optimal, but BigQuery's low cost of storage addresses concerns about storage inefficiency."
Cannot be D because the amount of data processed is the same.
As for why it is "simpler", I don't see it stated directly, but it is hinted at: "Expressing records by using nested and repeated fields simplifies data load using JSON or Avro files." and "Expressing records using nested and repeated structures can provide a more natural representation of the underlying data."
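A short sketch of what querying such a denormalized table looks like with the BigQuery Python client; the project, dataset, table, and the repeated RECORD column named items are hypothetical:

# Sketch: query a denormalized orders table whose line items live in a
# repeated RECORD column called "items" (all names are placeholders).
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT order_id, item.sku, item.quantity
    FROM `example-project.sales.orders`, UNNEST(items) AS item
    WHERE item.quantity > 1
"""
for row in client.query(query).result():
    print(row.order_id, row.sku, row.quantity)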
Question 44
Which of the following are feature engineering techniques? (Select 2 answers)
A. Hidden feature layers
B. Feature prioritization
C. Crossed feature columns
D. Bucketization of a continuous feature
Answers are:
C. Crossed feature columns
D. Bucketization of a continuous feature
Selecting and crafting the right set of feature columns is key to learning an effective model.
Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive bins/buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into.
Using each base feature column separately may not be enough to explain the data. To learn the differences between different feature combinations, we can add crossed feature columns to the model.
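A short sketch of both techniques using the TensorFlow feature-column API; the feature names, vocabulary, and bucket boundaries are made up for illustration:

import tensorflow as tf

# Bucketization: turn a continuous feature into a categorical one by binning it.
age = tf.feature_column.numeric_column("age")
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 35, 50, 65])

# Feature cross: combine two categorical columns into a single crossed feature.
country = tf.feature_column.categorical_column_with_vocabulary_list(
    "country", ["US", "CA", "MX"])
age_x_country = tf.feature_column.crossed_column(
    [age_buckets, country], hash_bucket_size=1000)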
Question 45
You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?
Both batch and streaming
BigQuery cannot be used as a sink
Only batch
Only streaming
Answer is Both batch and streaming
When you apply a BigQueryIO.Write transform in batch mode to write to a single table, Dataflow invokes a BigQuery load job. When you apply a BigQueryIO.Write transform in streaming mode or in batch mode using a function to specify the destination table, Dataflow uses BigQuery's streaming inserts.
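The quote describes the Java BigQueryIO.Write transform; the Beam Python SDK's counterpart is WriteToBigQuery. A minimal batch-mode sketch, with placeholder bucket, project, dataset, and table names:

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch: read newline-delimited JSON from Cloud Storage and write to BigQuery.
with beam.Pipeline(options=PipelineOptions()) as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://example-bucket/events/*.json")
     | "Parse" >> beam.Map(json.loads)
     | "Write" >> beam.io.WriteToBigQuery(
           "example-project:analytics.events",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))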
Question 46
You have a job that you want to cancel. It is a streaming pipeline, and you want to ensure that any data that is in-flight is processed and written to the output.
Which of the following commands can you use on the Dataflow monitoring console to stop the pipeline job?
Cancel
Drain
Stop
Finish
Answer is Drain
Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state. Your job will immediately stop ingesting new data from input sources, but the Dataflow service will preserve any existing resources (such as worker instances) to finish processing and writing any buffered data in your pipeline.
Question 47
Which of the following statements is NOT true regarding Bigtable access roles?
Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a project.
To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
You can configure access control only at the project level.
To give a user access to only one table in a project, you must configure access through your application.
Answer is To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
For Cloud Bigtable, you can configure access control at the project level. For example, you can grant the ability to: Read from, but not write to, any table within the project. Read from and write to any table within the project, but not manage instances. Read from and write to any table within the project, and manage instances.
Question 48
What is the general recommendation when designing your row keys for a Cloud Bigtable schema?
Include multiple time series values within the row key
Keep the row key as an 8-bit integer
Keep your row key reasonably short
Keep your row key as long as the field permits
Answer is Keep your row key reasonably short
A general guide is to keep your row keys reasonably short. Long row keys take up additional memory and storage and increase the time it takes to get responses from the Cloud Bigtable server.
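For example, a short, human-readable key such as an entity id plus a coarse timestamp is usually enough. A minimal write sketch with the Bigtable Python client, where the project, instance, table, and column family names are placeholders:

from google.cloud import bigtable

# Sketch: write one row using a short, human-readable row key.
client = bigtable.Client(project="example-project")
table = client.instance("example-instance").table("sensor-metrics")

row_key = b"sensor-42#20240101-0000"   # entity id + coarse timestamp
row = table.direct_row(row_key)
row.set_cell("readings", "temp_c", "21.5")
row.commit()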
Question 49
All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a Cloud Bigtable node.
before
after
only if
once
Answer is before
In a Cloud Bigtable architecture all client requests go through a front-end server before they are sent to a Cloud Bigtable node. The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, which is a container for the cluster. Each node in the cluster handles a subset of the requests to the cluster. When additional nodes are added to a cluster, you can increase the number of simultaneous requests that the cluster can handle, as well as the maximum throughput for the entire cluster.