You plan to create an Azure Data Factory pipeline that will include a mapping data flow.
You have JSON data containing objects that have nested arrays.
You need to transform the JSON-formatted data into a tabular dataset. The dataset must have one row for each item in the arrays.
Which transformation method should you use in the mapping data flow?
new branch
unpivot
alter row
flatten
Answer is flatten
Use the flatten transformation to take array values inside hierarchical structures such as JSON and unroll them into individual rows. This process is known as denormalization.
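For illustration only, since the mapping data flow itself uses the flatten transformation rather than T-SQL: the same denormalization can be sketched with OPENJSON in T-SQL, which likewise produces one row per array item. The JSON document and column names below are hypothetical.
-- Hypothetical JSON document with a nested array of orders.
DECLARE @json NVARCHAR(MAX) = N'{
  "customer": "Contoso",
  "orders": [ { "orderId": 1, "amount": 100 }, { "orderId": 2, "amount": 250 } ]
}';

-- Unroll the nested array: one output row per order item.
SELECT
    JSON_VALUE(@json, '$.customer') AS customer,
    o.orderId,
    o.amount
FROM OPENJSON(@json, '$.orders')
WITH (
    orderId INT   '$.orderId',
    amount  MONEY '$.amount'
) AS o;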
You have an Azure Data Factory pipeline that is triggered hourly.
The pipeline has had 100% success for the past seven days.
The pipeline execution fails, and two retries that occur 15 minutes apart also fail. The third failure returns the following error.
ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'NotFound'.
Account: 'contosoproduksouth'. Filesystem: wwi.
Path: 'BIKES/CARBON/year=2021/month=01/day=10/hour=06'.
ErrorCode: 'PathNotFound'.
Message: 'The specified path does not exist.'.
RequestId: '6d269b78-901f-001b-4924-e7a7bc000000'.
TimeStamp: 'Sun, 10 Jan 2021 07:45:05
What is a possible cause of the error?
The parameter used to generate year=2021/month=01/day=10/hour=06 was incorrect.
From 06:00 to 07:00 on January 10, 2021, there was no data in wwi/BIKES/CARBON.
From 06:00 to 07:00 on January 10, 2021, the file format of data in wwi/BIKES/CARBON was incorrect.
The pipeline was triggered too early.
Answer is From 06:00 to 07:00 on January 10, 2021, there was no data in wwi/BIKES/CARBON.
The error message indicates a missing path, which matches this answer: there was no data for the 06:00 hour in wwi/BIKES/CARBON. The initial failure followed by two retries 15 minutes apart explains why the final error carries a 07:45 timestamp.
Question 83
You have an Azure data factory.
You need to examine the pipeline failures from the last 180 days.
What should you use?
the Activity log blade for the Data Factory resource
Pipeline runs in the Azure Data Factory user experience
the Resource health blade for the Data Factory resource
Azure Data Factory activity runs in Azure Monitor
Answer is Azure Data Factory activity runs in Azure Monitor
Data Factory stores pipeline-run data for only 45 days. Use Azure Monitor if you want to keep that data for a longer time.
Question 84
Which of the following terms refers to the scale of compute being used in an Azure Synapse Analytics dedicated SQL pool?
RTU
DWU
DTU
Answer is DWU
DWU stands for Data Warehouse Unit, the measure of compute scale assigned to an Azure Synapse Analytics dedicated SQL pool (formerly Azure SQL Data Warehouse). DTU is the compute scale unit of Azure SQL Database, while RTU is a distractor: Azure Cosmos DB throughput is measured in Request Units (RUs), not RTUs.
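As a rough illustration of DWUs in practice (not part of the question), the compute scale of a dedicated SQL pool can be changed in T-SQL by setting its service objective. The pool name below is hypothetical, and the statement is run against the master database of the logical server.
-- Scale the hypothetical dedicated SQL pool ContosoDW to 500 DWUs
-- (Gen2 service levels use the DW###c naming).
ALTER DATABASE ContosoDW
MODIFY (SERVICE_OBJECTIVE = 'DW500c');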
Question 85
You have an Azure Synapse Analytics database that contains a dimension table named Stores, which holds store information. There are 263 stores nationwide.
Store information is retrieved in more than half of the queries issued against this database. These queries include staff information per store, sales information per store, and finance information.
You want to improve the performance of these queries by configuring the table geometry of the Stores table. Which is the appropriate table geometry to select for the Stores table?
Round Robin
Non Clustered
Replicated table
Answer is Replicated table
A replicated table is the appropriate table geometry because the Stores table is small (263 rows, well within the roughly 2 GB size guideline for replicated tables), and a full copy of the table is cached on every compute node of the Azure Synapse Analytics dedicated SQL pool, which improves the performance of queries that join to it. A round-robin distribution is a table geometry that is useful for initial data loads. Non-clustered is not a valid table geometry in Azure Synapse Analytics.
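A minimal sketch of how the Stores table could be declared with the replicated table geometry; the column definitions are illustrative assumptions, and only the DISTRIBUTION = REPLICATE option reflects the choice discussed above.
CREATE TABLE dbo.Stores
(
    StoreKey   INT           NOT NULL,
    StoreName  NVARCHAR(100) NOT NULL,
    Region     NVARCHAR(50)  NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,   -- full copy of the table cached on every compute node
    CLUSTERED COLUMNSTORE INDEX
);
For a dimension this small, a clustered index or heap may also be reasonable in place of the clustered columnstore index; the distribution option is what matters for the question.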
Question 86
What is the default port for connecting to an enterprise data warehouse in Azure Synapse Analytics?
TCP port 1344
UDP port 1433
TCP port 1433
Answer is TCP port 1433
The default port for connecting to an Azure Synapse Analytics is TCP port 1433.
Question 87
You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure Data Factory integration runtime would be used in a data copy activity?
Azure - SSIS
Azure IR
Self-hosted
Pipelines
Answer is Azure IR
The Azure integration runtime (Azure IR) is used when copying data between two Azure data platform technologies. The Azure-SSIS IR is used to lift and shift existing SSIS workloads, while the self-hosted IR is used for data movement between private networks and the cloud, in either direction.
Question 88
What is the Recovery Point Objective (RPO) for Azure Synapse Analytics?
4 hours
8 hours
12 hours
16 hours
Answer is 8 hours.
Azure Synapse Analytics has a Recovery Point Objective of 8 hours.
Question 89
You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named Server1.
You need to verify whether the size of the transaction log file for each distribution of DW1 is smaller than 160 GB.
What should you do?
On the master database, execute a query against the sys.dm_pdw_nodes_os_performance_counters dynamic management view.
From Azure Monitor in the Azure portal, execute a query against the logs of DW1.
On DW1, execute a query against the sys.database_files dynamic management view.
Execute a query against the logs of DW1 by using the Get-AzOperationalInsightSearchResult PowerShell cmdlet.
Answer is On the master database, execute a query against the sys.dm_pdw_nodes_os_performance_counters dynamic management view.
The following query returns the transaction log size on each distribution. If one of the log files is reaching 160 GB, you should consider scaling up your instance or limiting your transaction size.
-- Transaction log size per distribution
SELECT
    instance_name AS distribution_db,
    cntr_value * 1.0 / 1048576 AS log_file_size_used_GB, -- counter is reported in KB; 1,048,576 KB = 1 GB
    pdw_node_id
FROM sys.dm_pdw_nodes_os_performance_counters
WHERE instance_name LIKE 'Distribution_%'
  AND counter_name = 'Log File(s) Used Size (KB)'
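As a sketch of the 160 GB check described in the question, the same DMV and counter can be filtered and ordered; the 160 GB threshold comes directly from the requirement above.
-- Distributions whose transaction log usage has reached 160 GB or more.
SELECT
    instance_name AS distribution_db,
    cntr_value * 1.0 / 1048576 AS log_file_size_used_GB,
    pdw_node_id
FROM sys.dm_pdw_nodes_os_performance_counters
WHERE instance_name LIKE 'Distribution_%'
  AND counter_name = 'Log File(s) Used Size (KB)'
  AND cntr_value * 1.0 / 1048576 >= 160
ORDER BY log_file_size_used_GB DESC;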
Question 90
You have an enterprise data warehouse in Azure Synapse Analytics.
You need to monitor the data warehouse to identify whether you must scale up to a higher service level to accommodate the current workloads.
Which is the best metric to monitor?
More than one answer choice may achieve the goal. Select the BEST answer.
CPU percentage
DWU used
DWU percentage
Data IO percentage
Answer is DWU used
DWU used, defined as DWU limit * DWU percentage, is only a high-level representation of usage across the SQL pool and is not meant to be a comprehensive indicator of utilization. For example, a DW1000c pool running at 75 percent DWU percentage reports 750 DWU used. To determine whether to scale up or down, consider all of the factors that DWU can impact, such as concurrency, memory, tempdb, and adaptive cache capacity. We recommend running your workload at different DWU settings to determine what works best to meet your business objectives.