The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster ____.
application node
conditional node
master node
worker node
Answer is master node
The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster master node. The cluster master-host-name is the name of your Cloud Dataproc cluster followed by an -m suffix. For example, if your cluster is named "my-cluster", the master-host-name would be "my-cluster-m".
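As an illustration, the interfaces listen on the standard Hadoop web ports, so for a cluster named "my-cluster" the URLs would look like the following (9870 is the NameNode port on Hadoop 3 images; older Hadoop 2 images use 50070):
  http://my-cluster-m:8088   (YARN ResourceManager)
  http://my-cluster-m:9870   (HDFS NameNode)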
Cloud Dataproc charges you only for what you really use with _____ billing.
month-by-month
minute-by-minute
week-by-week
hour-by-hour
second-by-second
Answer is second-by-second
Although the pricing formula is expressed as an hourly rate, Dataproc is billed by the second, and all Dataproc clusters are billed in one-second clock-time increments, subject to a 1-minute minimum billing. Usage is stated in fractional hours (for example, 30 minutes is expressed as 0.5 hours) in order to apply hourly pricing to second-by-second use.
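As a worked example, assuming the published Dataproc rate of $0.010 per vCPU per hour: a cluster with 24 vCPUs (say, one master and two workers with 8 vCPUs each) that runs for 30 minutes accrues 24 x $0.010 x 0.5 = $0.12 in Dataproc charges. The underlying Compute Engine resources are billed separately.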
Scaling a Cloud Dataproc cluster typically involves ____.
increasing or decreasing the number of worker nodes
increasing or decreasing the number of master nodes
moving memory to run more applications on a single node
deleting applications from unused nodes periodically
Answer is increasing or decreasing the number of worker nodes
After creating a Cloud Dataproc cluster, you can scale the cluster by increasing or decreasing the number of worker nodes at any time, even while jobs are running on the cluster. Cloud Dataproc clusters are typically scaled to: 1) increase the number of workers to make a job run faster, 2) decrease the number of workers to save money, or 3) increase the number of nodes to expand available Hadoop Distributed File System (HDFS) storage.
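A minimal sketch of scaling out with the gcloud CLI (cluster name, region, and worker count are placeholder values):
  gcloud dataproc clusters update my-cluster --region=us-central1 --num-workers=5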
Dataproc clusters contain many configuration files. To update these files, you will need to use the --properties option. The format for the option is: file_prefix:property=_____.
details
value
null
id
Answer is value
To make updating files and properties easy, the --properties flag uses a special format to specify the configuration file along with the property and value within that file that should be updated. The formatting is as follows: file_prefix:property=value.
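Each file prefix maps to a configuration file (for example, spark maps to spark-defaults.conf and core maps to core-site.xml), so a sketch of setting Spark executor memory at cluster creation might look like this (cluster name and region are placeholder values):
  gcloud dataproc clusters create my-cluster --region=us-central1 --properties=spark:spark.executor.memory=4g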
A Cloud Dataproc Viewer is limited in its actions based on its role. A viewer can only list clusters, get cluster details, list jobs, get job details, list operations, and get operation details.
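One sketch of granting that role uses an IAM policy binding with the predefined roles/dataproc.viewer role (project ID and member are placeholder values):
  gcloud projects add-iam-policy-binding my-project --member=user:alice@example.com --role=roles/dataproc.viewer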
Cloud Dataproc is a managed Apache Hadoop and Apache _____ service.
Blaze
Spark
Fire
Ignite
Answer is Spark
Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you use open source data tools for batch processing, querying, streaming, and machine learning.
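As a quick illustration of batch processing on the service, this sketch submits the SparkPi example that ships on Dataproc images (cluster name, region, and the jar path are assumptions based on the default image layout):
  gcloud dataproc jobs submit spark --cluster=my-cluster --region=us-central1 --class=org.apache.spark.examples.SparkPi --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000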
When using Cloud Dataproc clusters, you can access the YARN web interface by configuring a browser to connect through a ____ proxy.
HTTPS
VPN
SOCKS
HTTP
Answer is SOCKS
When using Cloud Dataproc clusters, configure your browser to use the SOCKS proxy. The SOCKS proxy routes data intended for the Cloud Dataproc cluster through an SSH tunnel.
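A minimal sketch of the two-step setup (zone, proxy port, and browser path are placeholder values): first open an SSH tunnel to the master node that acts as the SOCKS proxy, then start a browser that routes through it:
  gcloud compute ssh my-cluster-m --zone=us-central1-b -- -D 1080 -N
  google-chrome --proxy-server="socks5://localhost:1080" --user-data-dir=/tmp/my-cluster-m http://my-cluster-m:8088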
Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2 answers)?
Preemptible workers cannot use persistent disk.
Preemptible workers cannot store data.
If a preemptible worker is reclaimed, then a replacement worker must be added manually.
A Dataproc cluster cannot have only preemptible workers.
Answers are: Preemptible workers cannot store data.
A Dataproc cluster cannot have only preemptible workers.
The following rules apply when you use preemptible workers with a Cloud Dataproc cluster:
Processing only. Since preemptibles can be reclaimed at any time, preemptible workers do not store data. Preemptibles added to a Cloud Dataproc cluster only function as processing nodes.
No preemptible-only clusters. To ensure clusters do not lose all workers, Cloud Dataproc cannot create preemptible-only clusters.
Persistent disk size. As a default, all preemptible workers are created with the smaller of 100GB or the primary worker boot disk size. This disk space is used for local caching of data and is not available through HDFS.
The managed group automatically re-adds workers lost due to reclamation as capacity permits.
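A minimal sketch of adding preemptible workers with the gcloud CLI (cluster name, region, and counts are placeholder values; newer gcloud releases call the flag --num-secondary-workers, while older ones use --num-preemptible-workers):
  gcloud dataproc clusters update my-cluster --region=us-central1 --num-secondary-workers=4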
When creating a new Cloud Dataproc cluster with the projects.regions.clusters.create operation, these four values are required: project, region, name, and ____.
zone
node
label
type
Answer is zone
At a minimum, you must specify four values when creating a new cluster with the projects.regions.clusters.create operation:
The project in which the cluster will be created
The region to use
The name of the cluster
The zone in which the cluster will be created
You can specify many more details beyond these minimum requirements. For example, you can also specify the number of workers, whether preemptible compute should be used, and the network settings.
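Expressed through the gcloud CLI, which calls the same API, a minimal creation command covering those four values might look like this (all values are placeholders; the zone maps to the gceClusterConfig.zoneUri field in the underlying REST request):
  gcloud dataproc clusters create my-cluster --project=my-project --region=us-central1 --zone=us-central1-b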