
MCSA Machine Learning Certification
OVERVIEW

The innovative MCSA Machine Learning certification belongs to the Microsoft DATA MANAGEMENT & ANALYTICS competency area. Earning this credential certifies your competence for the professional roles of Data Scientist and Data Analyst, demonstrating the ability to use Microsoft R and Azure Cloud Services and to apply Machine Learning technologies to analyze and process Big Data.
To earn the MCSA Machine Learning certification, you must pass both of the following exams:
Exam 70-773 Analyzing Big Data with Microsoft R;
Exam 70-774 Perform Cloud Data Science with Azure Machine Learning;

PREPARATORY COURSES
Tackling a certification exam requires specific preparation.
For this certification we recommend:
Microsoft Machine Learning Course
FORMAT AND DURATION
Exam 70-773: duration 120 minutes, 40-60 questions;
Exam 70-774: duration 120 minutes, 40-60 questions;
The exams contain questions in English in several formats: multiple choice, text completion, drag-and-drop concept matching, and genuine lab simulations.
PREREQUISITES
None.
EXAM TOPICS
Exam 70-773 Analyzing Big Data with Microsoft R
Read data with R Server
Read supported data file formats, such as text files, SAS, and SPSS; convert data to XDF format; identify trade-offs between XDF and flat text files; read data through Open Database Connectivity (ODBC) data sources; read in files from other file systems; use an internal data frame as a data source; process data from sources that cannot be read natively by R Server
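As an illustrative sketch of this objective (the file names and data are hypothetical placeholders), the lines below import a delimited text file into the compressed, chunked XDF format with rxImport and inspect its metadata without loading the full data set into memory:

library(RevoScaleR)  # loaded by default in Microsoft R Server / R Client

csvFile <- "flights.csv"   # hypothetical source file
xdfFile <- "flights.xdf"

# Convert the text file to XDF, overwriting any previous copy
flightsXdf <- rxImport(inData = csvFile, outFile = xdfFile, overwrite = TRUE)

# Variable names, types, and value ranges, read from the XDF header
rxGetInfo(flightsXdf, getVarInfo = TRUE)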
Summarize data
Compute crosstabs and univariate statistics, choose when to use rxCrossTabs versus rxCube, integrate with open source technologies by using packages such as dplyrXdf, use group by functionality, create complex formulas to perform multiple tasks in one pass through the data, extract quantiles by using rxQuantile
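A minimal sketch of the rxCrossTabs versus rxCube distinction, continuing with the hypothetical flights.xdf file from above: both compute the same aggregation, but rxCrossTabs returns a contingency-table layout while rxCube returns long-format results that suit further processing.

# Mean arrival delay by day of week, table layout (DayOfWeek is a factor)
rxCrossTabs(ArrDelay ~ DayOfWeek, data = "flights.xdf", means = TRUE)

# Same aggregation in long format, one row per factor level
cube <- rxCube(ArrDelay ~ DayOfWeek, data = "flights.xdf")
rxResultsDF(cube)

# Deciles extracted in a single pass through the data
rxQuantile("ArrDelay", data = "flights.xdf", probs = seq(0, 1, 0.1))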
Visualize data
Visualize in-memory data with base plotting functions and ggplot2; create custom visualizations with rxSummary and rxCube; visualize data with rxHistogram and rxLinePlot, including faceted plots
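For example, still against the hypothetical flights.xdf, rxHistogram computes its counts out of memory before rendering, and rxLinePlot draws aggregates produced by rxCube:

# Histogram of a single variable
rxHistogram(~ArrDelay, data = "flights.xdf")

# Faceted histogram: one panel per level of DayOfWeek
rxHistogram(~ArrDelay | DayOfWeek, data = "flights.xdf")

# Line plot of mean delay by scheduled departure hour
df <- rxResultsDF(rxCube(ArrDelay ~ F(CRSDepHour), data = "flights.xdf"))
rxLinePlot(ArrDelay ~ CRSDepHour, data = df)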
Process data with rxDataStep
Subset rows of data, modify and create columns by using the Transforms argument, choose when to use on-the-fly transformations versus in-data transform trade-offs, handle missing values through filtering or replacement, generate a data frame or an XDF file, process dates (POSIXct, POSIXlt)
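A sketch of a single rxDataStep pass under the same hypothetical schema: filtering rows, deriving columns, and handling missing values happen together in one traversal of the data.

# One pass over the XDF: drop missing delays, derive two columns
rxDataStep(inData = "flights.xdf", outFile = "flightsClean.xdf",
           rowSelection = !is.na(ArrDelay),        # filter missing values
           transforms = list(
             late    = ArrDelay > 15,              # new logical column
             depHour = as.integer(CRSDepHour)      # modified column
           ),
           overwrite = TRUE)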
Perform complex transforms that use transform functions
Define a transform function; reshape data by using a transform function; use open source packages, such as lubridate; pass in values by using transformVars and transformEnvir; use internal .rx variables and functions for tasks, including cross-chunk communication
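A minimal sketch of a transform function, assuming a hypothetical Distance column: the function receives each chunk as a list of vectors, and transformVars declares which variables it needs.

# The transform function is applied chunk by chunk
addLogDistance <- function(dataList) {
  dataList$logDistance <- log(dataList$Distance)  # open source packages such as
  dataList                                        # lubridate could be used here
}

rxDataStep(inData = "flights.xdf", outFile = "flightsXform.xdf",
           transformFunc = addLogDistance,
           transformVars = c("Distance"),   # variables passed to the function
           overwrite = TRUE)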
Manage data sets
Sort data in various orders, such as ascending and descending; use rxSort deduplication to remove duplicate values; merge data sources using rxMerge(); merge options and types; identify when alternatives to rxSort and rxMerge should be used
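As a sketch (the airports.xdf file and AirportID key are hypothetical):

# Sort descending and drop duplicate keys in one step
rxSort(inData = "flights.xdf", outFile = "flightsSorted.xdf",
       sortByVars = "ArrDelay", decreasing = TRUE,
       removeDupKeys = TRUE, overwrite = TRUE)

# Inner join of two XDF sources on a shared key column
rxMerge(inData1 = "flights.xdf", inData2 = "airports.xdf",
        outFile = "merged.xdf", type = "inner",
        matchVars = "AirportID", overwrite = TRUE)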
Process text using RML packages
Create features using RML functions, such as featurizeText(); create indicator variables and arrays using RML functions, such as categorical() and categoricalHash(); perform feature selection using RML functions
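A hedged sketch of the MicrosoftML (RML) transforms, assuming a hypothetical in-memory reviews data frame with sentiment, reviewText, and productCategory columns; the transforms run as part of model training.

library(MicrosoftML)

# Featurize free text into n-gram features and one-hot encode a category
model <- rxLogisticRegression(
  sentiment ~ reviewFeatures + productCategory,
  data = reviews,
  mlTransforms = list(
    featurizeText(vars = c(reviewFeatures = "reviewText")),
    categorical(vars = "productCategory")   # indicator variables from a factor
  )
)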
Estimate linear models
Use rxLinMod, rxGlm, and rxLogit to estimate linear models; set the family for a generalized linear model by using functions such as rxTweedie; process data on the fly by using the appropriate arguments and functions, such as the F function and Transforms argument; weight observations through frequency or probability weights; choose between different types of automatic variable selections, such as greedy searches, repeated scoring, and byproduct of training; identify the impact of missing values during automatic variable selection
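A minimal sketch of the three estimators against the hypothetical files built above; F() converts a numeric variable to a factor on the fly, and the Tweedie family illustrates rxGlm configuration rather than a recommended model for this data.

# Linear model with an on-the-fly factor conversion via F()
linMod <- rxLinMod(ArrDelay ~ DayOfWeek + F(Month), data = "flights.xdf")
summary(linMod)

# Logistic regression on the derived binary label from rxDataStep
logitMod <- rxLogit(late ~ DayOfWeek, data = "flightsClean.xdf")

# Generalized linear model with a Tweedie family
glmMod <- rxGlm(Distance ~ DayOfWeek, data = "flights.xdf",
                family = rxTweedie(var.power = 1.5))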
Build and use partitioning models
Use rxDTree, rxDForest, and rxBTrees to build partitioning models; adjust the weighting of false positives and misses by using loss; select parameters that affect bias and variance, such as pruning, learning rate, and tree depth; use as.rpart to interact with open source ecosystems
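For example, under the same hypothetical schema, depth and complexity parameters control the bias/variance trade-off, and as.rpart hands the tree to the open source ecosystem:

# Regression tree with depth and pruning controls
tree <- rxDTree(ArrDelay ~ DayOfWeek + Distance, data = "flights.xdf",
                maxDepth = 5, cp = 0.001)

# Convert to an rpart object for open source tooling
library(rpart)
rpartTree <- as.rpart(tree)

# Random forest with a similar interface
forest <- rxDForest(ArrDelay ~ DayOfWeek + Distance, data = "flights.xdf",
                    nTree = 50, maxDepth = 5)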
Generate predictions and residuals
Use rxPredict to generate predictions; perform parallel scoring using rxExec; generate different types of predictions, such as link and response scores for GLM, response, prob, and vote for rxDForest; generate different types of residuals, such as Usual, Pearson, and DBM
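Continuing from the model objects sketched above (output file names are placeholders):

# Predictions and residuals for the linear model, written to an XDF file
rxPredict(linMod, data = "flights.xdf", outData = "scores.xdf",
          computeResiduals = TRUE, overwrite = TRUE)

# Response-scale scores for the logistic model (type = "link" is the alternative)
rxPredict(logitMod, data = "flightsClean.xdf", outData = "logitScores.xdf",
          type = "response", overwrite = TRUE)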
Evaluate models and tuning parameters
Summarize estimated models; run arbitrary code out of process, such as parallel parameter tuning by using rxExec; evaluate tree models by using RevoTreeView and rxVarImpPlot; calculate model evaluation metrics by using built-in functions; calculate model evaluation metrics and visualizations by using custom code, such as mean absolute percentage error and precision recall curves
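A sketch of parallel parameter tuning with rxExec, reusing the hypothetical data above: each element of elemArgs becomes one out-of-process invocation of the function.

rxSetComputeContext(RxLocalParallel())   # run workers in parallel locally

# Fit one tree per candidate depth, out of process
fitOne <- function(depth) {
  rxDTree(ArrDelay ~ DayOfWeek + Distance, data = "flights.xdf",
          maxDepth = depth)
}
models <- rxExec(fitOne, elemArgs = list(2, 4, 6, 8))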
Create additional models using RML packages
Build and use a One-Class Support Vector Machine, build and use linear and logistic regressions that use L1 and L2 regularization, build and use a decision tree by using FastTree, use FastTree as a recommender with ranking loss (NDCG), build and use a simple three-layer feed-forward neural network
Use different compute contexts to run R Server effectively
Change the compute context (rxHadoopMR, rxSpark, rxLocalseq, and rxLocalParallel); identify which compute context to use for different tasks; use different data source objects, depending on the context (RxOdbcData and RxTextData); identify and use appropriate data sources for different data sources and compute contexts (HDFS and SQL Server); debug processes across different compute contexts; identify use cases for RevoPemaR
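A hedged sketch of switching contexts; the cluster connection details and HDFS path are placeholders, and the data source object is chosen to match the active context:

rxSetComputeContext(RxLocalParallel())   # parallel on the local machine

# A Spark context for a Hadoop cluster
sparkCC <- RxSpark(sshUsername = "user", sshHostname = "cluster-head")
rxSetComputeContext(sparkCC)

# HDFS-backed text data, summarized where the data lives
hdfsData <- RxTextData("/data/flights", fileSystem = RxHdfsFileSystem())
rxSummary(~ArrDelay, data = hdfsData)

rxSetComputeContext(RxLocalSeq())        # back to sequential local execution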
Optimize tasks by using local compute contexts
Identify and execute tasks that can be run only in the local compute context, identify tasks that are more efficient to run in the local compute context, choose between rxLocalseq and rxLocalParallel, profile across different compute contexts
Perform in-database analytics by using SQL Server
Choose when to perform in-database versus out-of-database computations, identify limitations of in-database computations, use in-database versus out-of-database compute contexts appropriately, use stored procedures for data processing steps, serialize objects and write back to binary fields in a table, write tables, configure R to optimize SQL Server (chunksize, numtasks, and computecontext), effectively communicate performance properties to SQL administrators and architects (SQL Server Profiler)
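A minimal sketch of in-database execution; the server, database, and table names are placeholders. The computation runs inside SQL Server instead of moving rows to the R client, with numTasks and rowsPerRead as the main throughput knobs.

conStr <- "Driver=SQL Server;Server=myserver;Database=mydb;Trusted_Connection=True"

# Push computation into SQL Server
sqlCC <- RxInSqlServer(connectionString = conStr, numTasks = 4)
rxSetComputeContext(sqlCC)

flightsSql <- RxSqlServerData(connectionString = conStr, table = "dbo.Flights",
                              rowsPerRead = 500000)   # chunk size per read
rxSummary(~ArrDelay, data = flightsSql)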
Implement analysis workflows in the Hadoop ecosystem and Spark
Use appropriate R Server functions in Spark; integrate with Hive, Pig, and Hadoop MapReduce; integrate with the Spark ecosystem of tools, such as SparklyR and SparkR; profile and tune across different compute contexts; use doRSR for parallelizing code that was written using open source foreach
Deploy predictive models to SQL Server and Azure Machine Learning
Deploy predictive models to SQL Server as a stored procedure, deploy an arbitrary function to Azure Machine Learning by using the AzureML R package, identify when to use DeployR
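A hedged sketch of publishing an arbitrary function with the AzureML R package; the workspace ID, token, and scoring logic are all placeholders, not working credentials.

library(AzureML)

# Workspace ID and authorization token come from Azure ML Studio settings
ws <- workspace(id = "your-workspace-id", auth = "your-auth-token")

# Publish an arbitrary scoring function as a web service
scoreFun <- function(DayOfWeek, Distance) {
  Distance * 0.1   # a real function would call rxPredict on a trained model
}
api <- publishWebService(ws, fun = scoreFun, name = "delay-scoring",
                         inputSchema = list(DayOfWeek = "character",
                                            Distance  = "numeric"))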
Exam 70-774 Perform Cloud Data Science with Azure Machine Learning
Import and export data to and from Azure Machine Learning
Import and export data to and from Azure Blob storage, import and export data to and from Azure SQL Database, import and export data via Hive Queries, import data from a website, import data from on-premises SQL
Explore and summarize data
Create univariate summaries, create multivariate summaries, visualize univariate distributions, use existing Microsoft R or Python notebooks for custom summaries and custom visualizations, use zip archives to import external packages for R or Python
Cleanse data for Azure Machine Learning
Apply filters to limit a dataset to the desired rows, identify and address missing data, identify and address outliers, remove columns and rows of datasets
Perform feature engineering
Merge multiple datasets by rows or columns into a single dataset by columns, merge multiple datasets by rows or columns into a single dataset by rows, add columns that are combinations of other columns, manually select and construct features for model estimation, automatically select and construct features for model estimation, reduce dimensions of data through principal component analysis (PCA), manage variable metadata, select standardized variables based on planned analysis
Select an appropriate algorithm or method
Select an appropriate algorithm for predicting continuous label data, select an appropriate algorithm for supervised versus unsupervised scenarios, identify when to select R versus Python notebooks, identify an appropriate algorithm for grouping unlabeled data, identify an appropriate algorithm for classifying label data, select an appropriate ensemble
Initialize and train appropriate models
Tune hyperparameters manually; tune hyperparameters automatically; split data into training and testing datasets, including using routines for cross-validation; build an ensemble using the stacking method
Validate models
Score and evaluate models, select appropriate evaluation metrics for clustering, select appropriate evaluation metrics for classification, select appropriate evaluation metrics for regression, use evaluation metrics to choose between Machine Learning models, compare ensemble metrics against base models
Deploy models using Azure Machine Learning
Publish a model developed inside Azure Machine Learning, publish an externally developed scoring function using an Azure Machine Learning package, use web service parameters, create and publish a recommendation model, create and publish a language understanding model
Manage Azure Machine Learning projects and workspaces
Create projects and experiments, add assets to a project, create new workspaces, invite users to a workspace, switch between different workspaces, create a Jupyter notebook that references an intermediate dataset
Consume Azure Machine Learning models
Connect to a published Machine Learning web service, consume a published Machine Learning model programmatically using a batch execution service, consume a published Machine Learning model programmatically using a request response service, interact with a published Machine Learning model using Microsoft Excel, publish models to the marketplace
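As a sketch of programmatic consumption through the request-response service, using the classic Azure ML Studio payload shape; the endpoint URL, API key, and columns are placeholders taken from a hypothetical service dashboard.

library(httr)
library(jsonlite)

url    <- "https://services.azureml.net/workspaces/ID/services/ID/execute?api-version=2.0"
apiKey <- "your-api-key"

# Classic Azure ML request-response payload
body <- list(
  Inputs = list(input1 = list(
    ColumnNames = list("DayOfWeek", "Distance"),
    Values      = list(list("Mon", "500"))
  )),
  GlobalParameters = setNames(list(), character(0))  # serializes to {}
)

resp <- POST(url,
             add_headers(Authorization = paste("Bearer", apiKey)),
             content_type_json(),
             body = toJSON(body, auto_unbox = TRUE))
content(resp, as = "parsed")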
Consume exemplar Cognitive Services APIs
Consume Vision APIs to process images, consume Language APIs to process text, consume Knowledge APIs to create recommendations
Build and use neural networks with the Microsoft Cognitive Toolkit
Use N-series VMs for GPU acceleration, build and train a three-layer feed forward neural network, determine when to implement a neural network
Streamline development by using existing resources
Clone template experiments from Cortana Intelligence Gallery, use Cortana Intelligence Quick Start to deploy resources, use a data science VM for streamlined development
Perform data science at scale by using HDInsight
Deploy the appropriate type of HDI cluster, perform exploratory data analysis by using Spark SQL, build and use Machine Learning models with Spark on HDI, build and use Machine Learning models using MapReduce, build and use Machine Learning models using Microsoft R Server
Perform database analytics by using SQL Server R Services on Azure
Deploy a SQL Server 2016 Azure VM, configure SQL Server to allow execution of R scripts, execute R scripts inside T-SQL statements