Free DP-100 Exam Practice Questions
Practice for DP-100: Azure Data Scientist Associate exam
- Exam Code: DP-100
- Exam Title: Azure Data Scientist Associate
- Exam Provider: Microsoft
- Total Exam Questions: 22
- Last Updated On: 27 May 2024
Exercise : Exam DP 100 Azure Data Scientist Associate MCQ Questions and Answers
Question 1
You create an Azure Machine Learning pipeline named pipeline1 with two steps that contain Python scripts. Data processed by the first step is passed to the second step.
You must update the content of the downstream data source of pipeline1 and run the pipeline again. You need to ensure the new run of pipeline1 fully processes the updated content.
Solution: Set the allow_reuse parameter of the PythonScriptStep object of both steps to False. Does the solution meet the goal?
A. |
B. |
Correct Answer : B. Yes
Description :
No answer description available for this question.
Question 2
You need to implement a model development strategy to determine a user's tendency to respond to an ad.
Which technique should you use?
A. |
B. |
C. |
D. |
Correct Answer : C. Use a Relative Expression Split module to partition the data based on centroid distance.
Description :
Split Data partitions the rows of a dataset into two distinct sets.
The Relative Expression Split option in the Split Data module of Azure Machine Learning Studio is helpful when you need to divide a dataset into training and testing datasets using a numerical expression.
Relative Expression Split: Use this option whenever you want to apply a condition to a number column. The number could be a date/time field, a column containing age or dollar amounts, or even a percentage. For example,
you might want to divide your data set depending on the cost of the items, group people by age ranges, or separate data by a calendar date.
Scenario:
Local market segmentation models will be applied before determining a user's propensity to respond to an advertisement. The distribution of features across training and production data is not consistent.
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/split-data
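The relative-expression idea described above can be illustrated outside Studio with a minimal Python sketch. The `relative_expression_split` helper, the `Age` column, and the sample rows are hypothetical and not part of any Azure API; the sketch only mirrors the described behavior, where rows that satisfy a numeric condition go to one output and all other rows go to the second.

```python
import operator

# Comparison operators a numeric split expression might use.
_OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

def relative_expression_split(rows, column, op, value):
    """Split rows (dicts) into (matching, remaining) on a numeric condition."""
    compare = _OPS[op]
    matching = [r for r in rows if compare(r[column], value)]
    remaining = [r for r in rows if not compare(r[column], value)]
    return matching, remaining

# Hypothetical data: group people by an age boundary.
rows = [{"Age": 23}, {"Age": 41}, {"Age": 35}, {"Age": 67}]
under_40, forty_plus = relative_expression_split(rows, "Age", "<", 40)
```

Every input row lands in exactly one of the two outputs, so the two sets are distinct and together cover the whole dataset.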
Question 3
You use the Azure Machine Learning SDK for Python to create a pipeline that includes the following step:
The output of the step run must be cached and reused on subsequent runs when the source_directory value has not changed.
You need to define the step.
What should you include in the step definition?
A. |
B. |
C. |
D. |
Correct Answer : D. allow_reuse
Description :
No answer description available for this question.
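The reuse behavior in this question can be pictured with a toy cache in plain Python. This is not the Azure Machine Learning SDK, and it makes no claim about how the service decides reuse internally. The `ToyStep` class, and its choice to fingerprint only the script source, are assumptions made purely for illustration. With `allow_reuse=True` and an unchanged fingerprint, the cached output is returned even if the data behind the step has changed. With `allow_reuse=False`, the step always runs again.

```python
import hashlib

class ToyStep:
    """Hypothetical stand-in for a pipeline step with output caching."""

    def __init__(self, script_source, allow_reuse=True):
        self.script_source = script_source
        self.allow_reuse = allow_reuse
        self._cache = {}    # fingerprint -> cached output
        self.run_count = 0  # number of times the step really executed

    def _fingerprint(self):
        # Assumption: only the script contents decide whether to reuse.
        return hashlib.sha256(self.script_source.encode("utf-8")).hexdigest()

    def run(self, data):
        key = self._fingerprint()
        if self.allow_reuse and key in self._cache:
            return self._cache[key]  # served from cache, data is ignored
        self.run_count += 1
        result = sorted(data)        # placeholder for real processing
        self._cache[key] = result
        return result

cached = ToyStep("prep.py contents", allow_reuse=True)
first = cached.run([3, 1, 2])
stale = cached.run([9, 8])    # data changed, but the old output is reused

forced = ToyStep("prep.py contents", allow_reuse=False)
forced.run([3, 1, 2])
fresh = forced.run([9, 8])    # the step executes again on the new data
```

In this toy model, disabling reuse is what guarantees that updated content is fully reprocessed. Leaving it enabled avoids repeated work when nothing relevant has changed.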
Question 4
You create an Azure Machine Learning workspace. The workspace contains a dataset named sample.dataset, a compute instance, and a compute cluster. You must create a two-stage pipeline that will prepare data in the
dataset and then train and register a model based on the prepared data. The first stage of the pipeline contains the following code:
You need to identify the location containing the output of the first stage of the script that you can use as input for the second stage.
Which storage location should you use?
A. |
B. |
C. |
Correct Answer : C. compute instance
Description :
No answer description available for this question.
Question 5
You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.
Which module should you use?
A. |
B. |
C. |
D. |
Correct Answer : C. Partition and Sample
Description :
Partition and Sample with the Stratified split option outputs multiple datasets, partitioned using the rules you specified.
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/partition-and-sample
Question 6
You need to implement a new cost factor scenario for the ad response models as illustrated in the performance curve exhibit.
Which technique should you use?
A. |
B. |
C. |
D. |
Correct Answer : B. Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.
Description :
Scenario:
Performance curves of current and proposed cost factor scenarios are shown in the following diagram:
The ad propensity model uses a cut threshold of 0.45, and retraining occurs if weighted Kappa deviates from 0.1 by +/- 5%.
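A threshold-plus-Kappa retraining rule of this kind can be sketched in a few lines of Python. The scores and labels below are made up, the `needs_retrain` helper is hypothetical, and reading "+/- 5%" as a relative band around the target is an assumption, since the scenario does not say whether the band is relative or absolute. For a two-class problem, weighted Kappa coincides with plain Cohen's Kappa, so that is what the sketch computes.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels encoded as 0/1."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    # Agreement expected by chance from the two marginal rates.
    expected = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    return (observed - expected) / (1 - expected)

def needs_retrain(kappa, target, tolerance=0.05):
    # Assumption: "+/- 5%" is a relative band around the target value.
    return abs(kappa - target) > tolerance * abs(target)

# Made-up model scores and true responses.
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

threshold = 0.5
preds = [1 if s >= threshold else 0 for s in scores]
kappa = cohen_kappa(labels, preds)
retrain = needs_retrain(kappa, target=0.45)
```

Changing `threshold` changes `preds`, and therefore Kappa, which is why the cut threshold and the retraining band are specified together.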
Question 7
You need to select a feature extraction method.
Which method should you use?
A. |
B. |
C. |
D. |
Correct Answer : C. Spearman correlation
Description :
Spearman's rank correlation coefficient assesses how well the relationship between two variables can be described using a monotonic function.
Note: Both Spearman's and Kendall's can be formulated as special cases of a more general correlation coefficient, and they are both appropriate in this scenario.
Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format. You need to select a feature selection algorithm to analyze the relationship between the two columns in more detail.
B: The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not).
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/feature-selection-modules
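The statement that Spearman correlation equals Pearson correlation computed on ranks can be checked with a small pure-Python sketch. The `rooms` and `value` lists are invented stand-ins for the AvgRoomsInHouse and MedianValue columns, not data from the scenario, and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman = Pearson applied to the rank-transformed data.
    return pearson(average_ranks(x), average_ranks(y))

rooms = [1, 2, 3, 4, 5]
value = [r ** 3 for r in rooms]  # monotonic but not linear

rho = spearman(rooms, value)  # perfect monotonic relationship
r = pearson(rooms, value)     # linear fit is weaker
```

Because `value` rises with `rooms` but not in a straight line, Spearman reports a perfect monotonic relationship while Pearson falls below 1.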
Question 8
You plan to use a Deep Learning Virtual Machine (DLVM) to train deep learning models using Compute Unified Device Architecture (CUDA) computations.
You need to configure the DLVM to support CUDA.
What should you implement?
A. |
B. |
C. |
D. |
E. |
Correct Answer : D. Graphic Processing Unit (GPU)
Description :
A Deep Learning Virtual Machine is a pre-configured environment for deep learning using GPU instances.
Reference:
https://azure.microsoft.com/en-us/products/virtual-machines/data-science-virtual-machines
Question 9
You are developing a hands-on workshop to introduce Docker for Windows to attendees.
You need to ensure that workshop attendees can install Docker on their devices.
Which two prerequisite components should attendees install on the devices? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. |
B. |
C. |
D. |
E. |
Correct Answer : A. BIOS-enabled virtualization, B. Windows 10 64-bit Professional
Description :
C: Make sure your Windows system supports Hardware Virtualization Technology and that virtualization is enabled.
Ensure that hardware virtualization support is turned on in the BIOS settings.
E: To run Docker, your machine must have a 64-bit operating system running Windows 7 or higher.
Reference:
https://docs.docker.com/desktop/
https://learn.microsoft.com/en-us/archive/blogs/canitpro/step-by-step-enabling-hyper-v-for-use-on-windows-10
Question 10
You are solving a classification task.
The dataset is imbalanced.
You need to select an Azure Machine Learning Studio module to improve the classification accuracy.
Which module should you use?
A. |
B. |
C. |
D. |
Correct Answer : D. Synthetic Minority Oversampling Technique (SMOTE)
Description :
Use the SMOTE module in Azure Machine Learning Studio (classic) to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than
simply duplicating existing cases.
You connect the SMOTE module to a dataset that is imbalanced. There are many reasons why a dataset might be imbalanced: the category you are targeting might be very rare in the population, or the data might simply be
difficult to collect. Typically, you use SMOTE when the class you want to analyze is under-represented.
Reference:
https://learn.microsoft.com/en-us/previous-versions/azure/machine-learning/studio-module-reference/smote
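The difference between SMOTE and simple duplication can be seen in a minimal Python sketch of the core idea: each synthetic case is interpolated between a minority sample and one of its nearest minority neighbors. This is a simplified illustration, not the Studio module's implementation. The `smote` function, its parameters, and the sample points are all hypothetical.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Made-up minority-class feature vectors.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote(minority, n_new=6)
```

Each new point lies on a line segment between two existing minority cases, so the rare class gains varied examples rather than exact copies.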