Free DP-100 Exam Practice Questions
Practice for DP-100: Azure Data Scientist Associate exam
- Exam Code: DP-100
- Exam Title: Azure Data Scientist Associate
- Exam Provider: Microsoft
- Total Exam Questions: 22
- Last Updated On: 27 May 2024
Exercise: Exam DP-100 Azure Data Scientist Associate MCQ Questions and Answers
Question 1
You need to implement a feature engineering strategy for the crowd sentiment local models.
What should you do?
Correct Answer : B. Apply a linear discriminant analysis.
Description :
The linear discriminant analysis method works only on continuous variables, not categorical or ordinal variables.
Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by comparing the means of the variables.
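The sketch below makes the comparison of means concrete. It computes a two-class Fisher criterion for a single continuous feature: the squared difference of the class means divided by the sum of the class variances. This is one common form of the quantity Fisher's linear discriminant maximizes. It is an illustration only, not the Studio module's implementation, and the function name `fisher_score` and the sample values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(class_a: list[float], class_b: list[float]) -> float:
    """Two-class Fisher criterion for one continuous feature.

    Uses the variance-based form (m_a - m_b)^2 / (var_a + var_b);
    larger values mean the class means are better separated
    relative to the spread within each class.
    """
    between = (mean(class_a) - mean(class_b)) ** 2
    within = pvariance(class_a) + pvariance(class_b)
    if within == 0:
        raise ValueError("both classes have zero variance")
    return between / within


# Hypothetical feature values for two classes.
well_separated = fisher_score([1.0, 1.2, 0.8], [3.0, 3.2, 2.8])
overlapping = fisher_score([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
print(round(well_separated, 3), round(overlapping, 4))
```

Because the criterion relies on means and variances, it only makes sense for continuous features. This matches the restriction stated above.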
Scenario:
Data scientists must build notebooks in a local environment using automatic feature engineering and model building in machine learning pipelines. Experiments for local crowd sentiment models must combine local penalty detection data. All shared features for local models are continuous variables.
Other options:
The Pearson correlation coefficient, sometimes called Pearson's R test, is a statistical value that measures the linear relationship between two variables. By examining the coefficient values, you can infer the strength of the relationship between the two variables and whether they are positively or negatively correlated.
Spearman's correlation coefficient is designed for use with non-parametric and non-normally distributed data. It is a nonparametric measure of statistical dependence between two variables and is sometimes denoted by the Greek letter rho. Spearman's coefficient expresses the degree to which two variables are monotonically related. It is also called Spearman rank correlation because it can be used with ordinal variables.
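The two correlation measures differ most on data that is monotonic but not linear. The following pure-Python sketch computes Pearson's r on the raw values and Spearman's rho as Pearson's r on average ranks. The helper names are hypothetical and the sample data is made up for illustration.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's r: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Spearman's rho reaches 1 here because the ranks agree perfectly. Pearson's r stays below 1 because the relationship is not a straight line.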
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/fisher-linear-discriminant-analysis
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/compute-linear-correlation
Question 2
You create an Azure Machine Learning pipeline named pipeline1 with two steps that contain Python scripts. Data processed by the first step is passed to the second step.
You must update the content of the downstream data source of pipeline1 and run the pipeline again. You need to ensure the new run of pipeline1 fully processes the updated content.
Solution: Set the allow_reuse parameter of the PythonScriptStep object of both steps to False.
Does the solution meet the goal?
Correct Answer : A. Yes
Description :
No answer description is available for this question.
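A toy model of step reuse may help explain why the answer is Yes. In the Azure Machine Learning SDK (v1), a PythonScriptStep reuses the output of a previous run by default (allow_reuse=True). The sketch below is not the SDK: `ToyPipelineRunner` and `run_step` are hypothetical. It assumes the worst case, where the reuse check sees only the script and the input path. Under that assumption, new content behind an unchanged path goes unnoticed unless reuse is disabled.

```python
import hashlib


class ToyPipelineRunner:
    """Hypothetical stand-in for a pipeline service that caches step outputs."""

    def __init__(self) -> None:
        self.cache: dict[str, list[str]] = {}

    def run_step(self, script_source: str, input_path: str,
                 storage: dict[str, list[str]], allow_reuse: bool = True):
        # Assumption: the reuse key covers the script and the input *path*,
        # not the content stored at that path.
        key = hashlib.sha256(f"{script_source}|{input_path}".encode()).hexdigest()
        if allow_reuse and key in self.cache:
            return self.cache[key], True  # previous output reused, data not reread
        output = [row.upper() for row in storage[input_path]]  # the "processing"
        self.cache[key] = output
        return output, False


storage = {"raw/data.csv": ["a", "b"]}
runner = ToyPipelineRunner()
runner.run_step("prep.py", "raw/data.csv", storage)

storage["raw/data.csv"].append("c")  # update the data source content
stale, reused = runner.run_step("prep.py", "raw/data.csv", storage)
fresh, rerun_reused = runner.run_step("prep.py", "raw/data.csv", storage, allow_reuse=False)
print(stale, reused, fresh, rerun_reused)
```

With reuse enabled, the second run returns the old output. Disabling reuse forces the step to read and process the updated content.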
Question 3
You create an Azure Machine Learning workspace. The workspace contains a dataset named sample.dataset, a compute instance, and a compute cluster. You must create a two-stage pipeline that will prepare data in the dataset and then train and register a model based on the prepared data. The first stage of the pipeline contains the following code:
You need to identify the location containing the output of the first stage of the script that you can use as input for the second stage.
Which storage location should you use?
Correct Answer : A. compute instance
Description :
No answer description is available for this question.
Question 4
You plan to use a Deep Learning Virtual Machine (DLVM) to train deep learning models using Compute Unified Device Architecture (CUDA) computations.
You need to configure the DLVM to support CUDA.
What should you implement?
Correct Answer : A. Graphic Processing Unit (GPU)
Description :
A Deep Learning Virtual Machine is a pre-configured environment for deep learning using GPU instances.
Reference:
https://azure.microsoft.com/en-us/products/virtual-machines/data-science-virtual-machines
Question 5
You are solving a classification task.
The dataset is imbalanced.
You need to select an Azure Machine Learning Studio module to improve the classification accuracy.
Which module should you use?
Correct Answer : B. Synthetic Minority Oversampling Technique (SMOTE)
Description :
Use the SMOTE module in Azure Machine Learning Studio (classic) to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
You connect the SMOTE module to a dataset that is imbalanced. There are many reasons why a dataset might be imbalanced: the category you are targeting might be very rare in the population, or the data might simply be difficult to collect. Typically, you use SMOTE when the class you want to analyze is under-represented.
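The core SMOTE idea is to create each synthetic minority example by interpolating between a minority sample and one of its k nearest minority-class neighbours, rather than copying an existing row. The sketch below is a simplified, pure-Python illustration of that idea, not the Studio module. The function name, the random choice of base samples, the parameters `k` and `seed`, and the sample points are all assumptions made for the example.

```python
import math
import random


def smote(minority: list[tuple[float, ...]], n_synthetic: int,
          k: int = 3, seed: int = 0) -> list[tuple[float, ...]]:
    """Generate n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))
```

Each synthetic point lies on a segment between two real minority samples. This is how SMOTE adds variety that simple duplication does not.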
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/smote
Question 6
You need to implement a scaling strategy for the local penalty detection data.
Which normalization type should you use?
Correct Answer : A. Batch
Description :
Post Batch Normalization Statistics (PBN) is the Microsoft Cognitive Toolkit (CNTK) approach to evaluating the population mean and variance of batch normalization so that they can be used during inference.
In CNTK, custom networks are defined by using the BrainScriptNetworkBuilder and are described in the CNTK network description language, BrainScript.
Scenario:
Local penalty detection models must be written by using BrainScript.
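The role of the population statistics can be shown with a single feature. During training, batch normalization standardizes each value with the mean and variance of its own mini-batch. For inference, a fixed population mean and variance are used instead, and PBN, as described above, evaluates those. The pure-Python sketch below is a simplified illustration, not CNTK or BrainScript code. The function names, `gamma`, `beta`, `eps`, and the sample batches are assumptions.

```python
import math


def normalize(x: float, mean: float, var: float,
              gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5) -> float:
    """Batch-normalization transform: scale and shift the standardized value."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def batch_stats(batch: list[float]) -> tuple[float, float]:
    """Mean and (biased) variance of one mini-batch, as used during training."""
    mean = sum(batch) / len(batch)
    return mean, sum((v - mean) ** 2 for v in batch) / len(batch)


def population_stats(batches: list[list[float]]) -> tuple[float, float]:
    """Mean and variance over all values in all batches, for inference."""
    return batch_stats([v for batch in batches for v in batch])


training_batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Training mode: each batch is standardized with its own statistics.
first_mean, first_var = batch_stats(training_batches[0])
train_out = [normalize(v, first_mean, first_var) for v in training_batches[0]]

# Inference mode: one fixed set of population statistics for every input.
pop_mean, pop_var = population_stats(training_batches)
inference_out = normalize(3.5, pop_mean, pop_var)
print(train_out, pop_mean, round(pop_var, 4), inference_out)
```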
Reference:
https://learn.microsoft.com/en-us/previous-versions/cognitive-toolkit/post-batch-normalization-statistics
Question 7
You are creating a machine learning model. You have a dataset that contains null rows.
You need to use the Clean Missing Data module in Azure Machine Learning Studio to identify and resolve the null and missing data in the dataset.
Which parameter should you use?
Correct Answer : B. Remove entire row
Description :
Remove entire row: Completely removes any row in the dataset that has one or more missing values. This is useful if the missing value can be considered randomly missing.
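The "Remove entire row" behavior amounts to keeping only complete rows. Below is a minimal pure-Python sketch. It assumes that rows are dictionaries and that both `None` and floating-point NaN count as missing. The function names and sample rows are hypothetical.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def remove_rows_with_missing(rows: list[dict]) -> list[dict]:
    """Drop every row that has one or more missing values."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


rows = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 29, "income": float("nan")},
    {"age": 41, "income": 47000.0},
]
clean = remove_rows_with_missing(rows)
print(len(clean))
```

Because whole rows are discarded, the dataset shrinks. This is acceptable mainly when the missing values can be considered randomly missing.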
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/clean-missing-data
Question 8
You need to implement a new cost factor scenario for the ad response models as illustrated in the performance curve exhibit.
Which technique should you use?
A. |
B. |
C. |
D. |
Correct Answer : A. Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.
Description :
Scenario:
Performance curves of current and proposed cost factor scenarios are shown in the following diagram:
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if weighted Kappa deviates from 0.1 by +/- 5%.
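This kind of policy combines two checks. First, scores are converted to class labels with a cut threshold. Then the weighted Kappa of those labels is compared against a target. The pure-Python sketch below rests on stated assumptions: linear Kappa weights, and "+/- 5%" read as 5% of the target value (an absolute 0.05 band is another possible reading). The function names and sample scores are hypothetical.

```python
def weighted_kappa(y_true: list[int], y_pred: list[int], num_classes: int) -> float:
    """Cohen's Kappa with linear disagreement weights |i - j| / (k - 1); needs k >= 2."""
    n = len(y_true)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_totals = [sum(row) for row in observed]
    pred_totals = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]
    weighted_observed = weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = abs(i - j) / (num_classes - 1)
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * true_totals[i] * pred_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("Kappa is undefined when expected disagreement is zero")
    return 1.0 - weighted_observed / weighted_expected


def apply_threshold(scores: list[float], threshold: float) -> list[int]:
    """Label a score as the positive class (1) when it reaches the cut threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def needs_retraining(kappa: float, target: float, tolerance: float = 0.05) -> bool:
    """Assumption: retrain when Kappa leaves the band target * (1 +/- tolerance)."""
    return abs(kappa - target) > tolerance * abs(target)


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical model scores
kappa = weighted_kappa(y_true, apply_threshold(scores, 0.5), num_classes=2)
print(kappa, needs_retraining(kappa, target=0.45))
```

For two classes, linear weighting gives the same value as unweighted Cohen's Kappa. The weights only matter with three or more ordered classes.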
Question 9
You create an MLflow model.
You must deploy the model to Azure Machine Learning for batch inference.
You need to create the batch deployment.
Which two components should you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Correct Answer : A. Compute target, D. Model files
Description :
No answer description is available for this question.
Question 10
You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.
Which module should you use?
Correct Answer : B. Partition and Sample
Description :
Partition and Sample with the Stratified split option outputs multiple datasets, partitioned using the rules you specified.
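A stratified split divides rows so that each class keeps roughly the same proportion in both outputs. The sketch below illustrates the idea in plain Python for two output datasets. It is not the module's implementation, and the function name, the `label` column, the fraction, and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(rows: list[dict], label_key: str, fraction: float, seed: int = 0):
    """Split rows into two datasets, sampling `fraction` of each class into the first."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    first, second = [], []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy so the input order is untouched
        rng.shuffle(group)
        cut = round(len(group) * fraction)
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second


rows = [{"id": i, "label": 1 if i < 20 else 0} for i in range(100)]  # 20% positives
first_part, second_part = stratified_split(rows, "label", fraction=0.75)
print(len(first_part), len(second_part))
```

Both outputs keep the 20% share of positive rows, which a purely random split only achieves on average.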
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/partition-and-sample