Free DP-100 Exam Practice Questions

  • Exam Code: DP-100
  • Exam Title: Azure Data Scientist Associate
  • Exam Provider: Microsoft
  • Total Exam Questions: 22
  • Last Updated On: 27 May 2024
Exercise: Exam DP-100 Azure Data Scientist Associate MCQ Questions and Answers

Question 1

You use the Azure Machine Learning SDK for Python to create a pipeline that includes the following step:
The output of the step run must be cached and reused on subsequent runs when the source_directory value has not changed.
You need to define the step.
What should you include in the step definition?

A.  
B.  
C.  
D.  

Correct Answer : B. allow_reuse

Description :
No answer description available for this question. Let us discuss.
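To make the reuse condition concrete, here is a toy model in plain Python. It is not the Azure ML SDK's implementation; it only mimics the documented idea that a step whose allow_reuse flag is set is skipped when its source directory contents and arguments are unchanged. The names fingerprint, run_step and _cache are hypothetical.

```python
import hashlib
from pathlib import Path

# Toy model of pipeline-step output reuse (NOT the Azure ML implementation).
# A step's fingerprint combines the files in its source directory with its
# arguments; with allow_reuse=True a previously seen fingerprint returns the
# cached output instead of running the step again.

_cache = {}  # fingerprint -> cached step output


def fingerprint(source_directory, arguments):
    """Hash every file (relative path + bytes) under the directory, plus the arguments."""
    digest = hashlib.sha256()
    root = Path(source_directory)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(root).as_posix().encode())
            digest.update(path.read_bytes())
    digest.update(repr(arguments).encode())
    return digest.hexdigest()


def run_step(source_directory, arguments, compute, allow_reuse=True):
    """Return (output, reused) for a hypothetical pipeline step."""
    key = fingerprint(source_directory, arguments)
    if allow_reuse and key in _cache:
        return _cache[key], True
    output = compute(arguments)
    _cache[key] = output
    return output, False
```

Editing any file in the source directory changes the fingerprint, so the next run recomputes; passing allow_reuse=False always recomputes.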

Question 2

You register a file dataset named csv_folder that references a folder. The folder includes multiple comma-separated values (CSV) files in an Azure storage blob container.
You plan to use the following code to run a script that loads data from the file dataset. You create and instantiate the following variables:
You have the following code:
You need to pass the dataset to ensure that the script can read the files it references.
Which code segment should you insert to replace the code comment?

A.  
B.  
C.  
D.  

Correct Answer : C. inputs=[file_dataset.as_named_input('training_files').as_mount()],

Description :
Example:

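from azureml.train.estimator import Estimator

# Assumes script_folder, compute_target, env and mnist_file_dataset were
# created earlier (source folder, compute target, Environment and FileDataset).
script_params = {
    # Mount the files referenced by the MNIST file dataset; the script
    # receives the mount path through the --data-folder argument.
    '--data-folder': mnist_file_dataset.as_named_input('mnist_opendataset').as_mount(),
    '--regularization': 0.5
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                environment_definition=env,
                entry_script='train.py')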
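On the training-script side, a file dataset passed with as_mount() appears as an ordinary local folder path. The sketch below is a hypothetical train.py for the question's scenario, where the folder holds CSV files. It reuses the --data-folder and --regularization argument names from the example above; load_csv_rows and main are illustrative names, and only the standard library is used.

```python
import argparse
import csv
from pathlib import Path


def load_csv_rows(data_folder):
    """Read every CSV file under data_folder (recursively) and return all data rows."""
    rows = []
    for csv_path in sorted(Path(data_folder).glob("**/*.csv")):
        with open(csv_path, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def main(argv=None):
    parser = argparse.ArgumentParser()
    # The mount path of the file dataset arrives as a plain string argument.
    parser.add_argument("--data-folder", type=str, required=True)
    parser.add_argument("--regularization", type=float, default=0.01)
    args = parser.parse_args(argv)
    rows = load_csv_rows(args.data_folder)
    print(f"Loaded {len(rows)} rows from {args.data_folder}")
    return rows


if __name__ == "__main__":
    main()
```

Because the script only sees a path, the same code works whether the dataset is mounted or downloaded to the compute target.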

Reference:
https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day?view=azureml-api-2

Question 3

You use the Azure Machine Learning Python SDK to create a batch inference pipeline.
You must publish the batch inference pipeline so that business groups in your organization can use the pipeline. Each business group must be able to specify a different location for the data that the pipeline submits to the model for scoring.
You need to publish the pipeline.
What should you do?

A.  
B.  
C.  
D.  

Correct Answer : C. Define an OutputFileDatasetConfig object for the pipeline and use the object to specify the business group-specific input dataset for each pipeline run.

Description :
No answer description available for this question. Let us discuss.

Question 4

You plan to use a Data Science Virtual Machine (DSVM) with the open source deep learning frameworks Caffe2 and PyTorch.
You need to select a pre-configured DSVM to support the frameworks.
What should you create?

A.  
B.  
C.  
D.  
E.  

Correct Answer : A. Data Science Virtual Machine for Linux (Ubuntu)

Description :
Caffe2 and PyTorch are supported by the Data Science Virtual Machine for Linux.
Microsoft offers Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4. Only the DSVM on Ubuntu is preconfigured for Caffe2 and PyTorch.
D: Caffe2 and PyTorch are only supported in the Data Science Virtual Machine for Linux.
Reference:
https://learn.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview?view=azureml-api-2

Question 5

You need to implement a feature engineering strategy for the crowd sentiment local models.
What should you do?

A.  
B.  
C.  
D.  

Correct Answer : C. Apply a linear discriminant analysis.

Description :
The linear discriminant analysis method works only on continuous variables, not categorical or ordinal variables.
Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by comparing the means of the variables.
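Fisher's linear discriminant compares class means relative to the within-class spread, which is why it needs continuous inputs. As a from-scratch sketch (not the Azure ML Studio module), the two-class case with two continuous features finds the projection direction w = S_w^-1 (m1 - m0), where m0 and m1 are the class means and S_w is the summed within-class scatter matrix. The helper names are hypothetical, and a singular S_w is not handled.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def fisher_direction(class0, class1):
    """Two-class Fisher LDA on 2-D continuous features: w = S_w^-1 (m1 - m0)."""
    m0, m1 = mean(class0), mean(class1)
    # Within-class scatter matrix S_w (2x2), summed over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((class0, m0), (class1, m1)):
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    # Closed-form inverse of the 2x2 matrix (assumes det != 0).
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, p):
    """Project a 2-D point onto the discriminant direction w."""
    return w[0] * p[0] + w[1] * p[1]
```

Projecting each row onto w yields a single continuous feature that best separates the two classes under Fisher's criterion.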
Scenario:
Data scientists must build notebooks in a local environment using automatic feature engineering and model building in machine learning pipelines. Experiments for local crowd sentiment models must combine local penalty detection data. All shared features for local models are continuous variables.
B: The Pearson correlation coefficient, sometimes called Pearson's R test, is a statistical value that measures the linear relationship between two variables. The coefficient value indicates how strong the relationship between the two variables is and whether they are positively or negatively correlated.
C: Spearman's correlation coefficient is designed for use with non-parametric and non-normally distributed data. Spearman's coefficient is a nonparametric measure of statistical dependence between two variables, and is sometimes denoted by the Greek letter rho. The Spearman's coefficient expresses the degree to which two variables are monotonically related. It is also called Spearman rank correlation, because it can be used with ordinal variables.
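The difference between the two coefficients can be shown in a few lines of standard-library Python: Pearson works on the raw values, while Spearman's rho is Pearson applied to ranks, with tied values given their average rank. This is an illustrative sketch; the function names are not from any Azure API.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient: linear association between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

For x = [1, 2, 3, 4, 5] and y = x squared, the relationship is monotonic but not linear, so Spearman's rho is 1 while Pearson's r is slightly below 1.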
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/fisher-linear-discriminant-analysis
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/compute-linear-correlation

Question 6

You need to implement a scaling strategy for the local penalty detection data.
Which normalization type should you use?

A.  
B.  
C.  
D.  

Correct Answer : C. Batch

Description :
Post Batch Normalization Statistics (PBN) is the Microsoft Cognitive Toolkit (CNTK) procedure for evaluating the population mean and variance of Batch Normalization layers so that these statistics can be used at inference time (see the original paper).

In CNTK, custom networks are defined using the BrainScriptNetworkBuilder and described in the CNTK network description language "BrainScript."
Scenario:
Local penalty detection models must be written by using BrainScript.
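The distinction PBN addresses can be sketched in plain Python rather than BrainScript: during training each mini-batch is normalized with its own mean and variance, while at inference the population statistics estimated over the training data are used instead. Pooling all training values, as population_stats does here, is a simplifying assumption standing in for CNTK's post-training pass; all names are hypothetical.

```python
import math


def batch_norm(values, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize values with the given statistics, then scale by gamma and shift by beta."""
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def batch_stats(values):
    """Mean and (biased) variance of one mini-batch."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def population_stats(batches):
    """Post-training pass: mean and variance over all training values.

    These population statistics replace per-batch statistics at inference,
    when a single sample would otherwise have zero variance.
    """
    pooled = [v for batch in batches for v in batch]
    return batch_stats(pooled)
```

A single inference sample equal to the population mean normalizes to 0, which would not be well defined if the sample's own (zero) batch variance were used.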
Reference:
https://learn.microsoft.com/en-us/previous-versions/cognitive-toolkit/post-batch-normalization-statistics

Question 7

You need to implement a model development strategy to determine a user's tendency to respond to an ad.
Which technique should you use?

A.  
B.  
C.  
D.  

Correct Answer : A. Use a Relative Expression Split module to partition the data based on centroid distance.

Description :
Split Data partitions the rows of a dataset into two distinct sets.
The Relative Expression Split option in the Split Data module of Azure Machine Learning Studio is helpful when you need to divide a dataset into training and testing datasets using a numerical expression.
Relative Expression Split: Use this option whenever you want to apply a condition to a number column. The number could be a date/time field, a column containing age or dollar amounts, or even a percentage. For example, you might want to divide your dataset depending on the cost of the items, group people by age ranges, or separate data by a calendar date.
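The idea behind Relative Expression Split can be mimicked in plain Python: every row whose numeric column satisfies the condition goes to the first output, and all other rows go to the second. This is a hypothetical helper; the Studio module's own expression syntax is not reproduced here.

```python
import operator

# Comparison operators a relative expression might use (hypothetical subset).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def relative_expression_split(rows, column, op, threshold):
    """Split rows into (matching, non_matching) on `row[column] <op> threshold`."""
    compare = OPS[op]
    first, second = [], []
    for row in rows:
        (first if compare(row[column], threshold) else second).append(row)
    return first, second
```

For example, splitting on age < 40 puts everyone under 40 in the first set and everyone else in the second, preserving row order within each set.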
Scenario:
Local market segmentation models will be applied before determining a user's propensity to respond to an advertisement. The distribution of features across training and production data is not consistent.

Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/split-data

Question 8

You are creating a machine learning model. You have a dataset that contains null rows.
You need to use the Clean Missing Data module in Azure Machine Learning Studio to identify and resolve the null and missing data in the dataset.
Which parameter should you use?

A.  
B.  
C.  
D.  
E.  
F.  

Correct Answer : D. Remove entire row

Description :
Remove entire row: Completely removes any row in the dataset that has one or more missing values. This is useful if the missing value can be considered randomly missing.
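The "Remove entire row" mode can be sketched with standard-library Python: a row is dropped if any selected column holds None or NaN. The optional columns argument loosely mirrors choosing which columns to clean; the function name is hypothetical.

```python
import math


def remove_rows_with_missing(rows, columns=None):
    """Drop every row that has a missing (None or NaN) value in the chosen columns."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept = []
    for row in rows:
        cols = columns if columns is not None else row.keys()
        # A column absent from the row is treated as missing (row.get -> None).
        if not any(is_missing(row.get(c)) for c in cols):
            kept.append(row)
    return kept
```

Restricting the check to a subset of columns keeps rows whose gaps are only in columns you do not care about.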
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/clean-missing-data

Question 9

You need to select a feature extraction method.
Which method should you use?

A.  
B.  
C.  
D.  

Correct Answer : B. Kendall correlation

Description :
In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient (after the Greek letter τ), is a statistic used to measure the ordinal association between two measured quantities. It is one of the scoring methods supported by the feature selection modules in Azure Machine Learning Studio.
Note: Both Spearman's and Kendall's can be formulated as special cases of a more general correlation coefficient, and they are both appropriate in this scenario.
Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format. You need to select a feature selection algorithm to analyze the relationship between the two columns in more detail.
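Kendall's tau counts concordant pairs (both variables move in the same direction) against discordant pairs. A minimal standard-library sketch of the tau-b variant, which corrects for ties, is shown below; it illustrates the statistic only and is not the Azure module's code.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b: ordinal association between x and y, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both counts
        if dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom
```

For x = [1, 2, 3, 4] and y = [1, 3, 2, 4], five of the six pairs are concordant and one is discordant, giving tau = (5 - 1) / 6 = 2/3.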
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/feature-selection-modules

Question 10

You create an MLflow model.
You must deploy the model to Azure Machine Learning for batch inference.
You need to create the batch deployment.
Which two components should you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A.  
B.  
C.  
D.  
E.  

Correct Answer : A. Model files, D. Compute target

Description :
No answer description available for this question. Let us discuss.
