Commit 782aba19ba: Init and have all packages required

53 changed files with 21,896 additions and 0 deletions.

`azuremlpythonsdk-v2/README.md` (new file, 118 lines)

# Azure ML Lesson 2 Lab

## 1. Set environment variables

1. Run VS Code in an Azure ML remote instance as shown before.
2. Press `File > Open Folder` and navigate to `azuremlpythonsdk-v2/` to open the exercise.

**IMPORTANT** Relative paths are assumed to be initialized from the `azuremlpythonsdk-v2` folder.

Open the file `initialize_constants.py`; there are three variables that should be updated:

- AZURE_WORKSPACE_NAME
- AZURE_RESOURCE_GROUP
- AZURE_SUBSCRIPTION_ID

Open your workspace at `https://ml.azure.com`. At the top right, select the workspace name, then copy the workspace name, the subscription ID, and the resource group name.
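
A minimal sketch of what `initialize_constants.py` can look like once filled in; the three default values below are hypothetical placeholders for the ones copied from the workspace page:

```python
# Sketch of initialize_constants.py: replace the placeholder defaults
# with the values copied from https://ml.azure.com
import os

AZURE_WORKSPACE_NAME = os.getenv("AZURE_WORKSPACE_NAME", "my-workspace")  # placeholder
AZURE_RESOURCE_GROUP = os.getenv("AZURE_RESOURCE_GROUP", "my-resource-group")  # placeholder
AZURE_SUBSCRIPTION_ID = os.getenv(
    "AZURE_SUBSCRIPTION_ID", "00000000-0000-0000-0000-000000000000"  # placeholder
)
```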

## 2. Load a workspace

Open the file `ml_client.py` and understand how an ML client object is loaded or created. In this lab, the workspace was already created. Just fill in the names of the variables from `initialize_constants.py`.

When finished, run this file and check that it is executed without errors.
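
A minimal sketch of the pattern used in `ml_client.py`: authenticate with `DefaultAzureCredential` and pass the three constants to `MLClient` (the keyword argument names are the ones from the v2 SDK):

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

from initialize_constants import (
    AZURE_RESOURCE_GROUP,
    AZURE_SUBSCRIPTION_ID,
    AZURE_WORKSPACE_NAME,
)

# handle to the workspace, built from the constants set in step 1
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=AZURE_SUBSCRIPTION_ID,
    resource_group_name=AZURE_RESOURCE_GROUP,
    workspace_name=AZURE_WORKSPACE_NAME,
)
```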

## 3. Load a Compute Cluster

Open the file `compute_aml.py` and understand how a compute cluster is loaded or created. In this lab, the compute cluster was already created, but some variables should be added, which are marked with `XXXX`.

When finished, run this file and check that it is executed without errors.

What would happen if the compute cluster is not present?
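
If the cluster is not present, `compute_aml.py` falls back to creating one. A minimal sketch of that get-or-create pattern, using this lab's default names and sizes:

```python
from azure.ai.ml.entities import AmlCompute


def get_or_create_cluster(ml_client, name="aml-compute", size="STANDARD_DS2_V2"):
    try:
        # reuse the cluster if it already exists in the workspace
        return ml_client.compute.get(name)
    except Exception:
        # otherwise define a new cluster and submit it for creation
        cluster = AmlCompute(name=name, size=size, min_instances=0, max_instances=1)
        return ml_client.compute.begin_create_or_update(cluster)
```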

## 4. Create a tabular dataset

Open the file `data_tabular.py`; several gaps marked with `XXXX` should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. How can you get the names of the datasets already registered in `if name_dataset not in [XXXXX for env in ml_client.data.list()]`?

   Hint: Try to get one object from the class [Data](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.data?view=azure-python) and check its attributes.

3. Which should be the `path` parameter in `path=XXXXX`?

4. Which input should you give in `ml_client.data.create_or_update(XXXXX)`?

When finished, run this file and check that it is executed without errors.
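
A minimal sketch of the URI-file data-asset pattern from the v2 SDK, reusing this lab's dataset name and CSV path; treat it as an illustration rather than the exact required answer:

```python
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

from ml_client import create_or_load_ml_client

ml_client = create_or_load_ml_client()

# names of the data assets already registered in the workspace
registered_names = [d.name for d in ml_client.data.list()]

if "diabetes-dataset" not in registered_names:
    tab_data_set = Data(
        path="./data/diabetes.csv",  # a local file is uploaded on registration
        type=AssetTypes.URI_FILE,
        name="diabetes-dataset",
    )
    ml_client.data.create_or_update(tab_data_set)
```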

## 5. Create and register an environment

Open the file `environment.py`; several gaps marked with `XXXX` should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. Which class should be used to register the environment?

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?tabs=python).

When finished, run this file and check that it is executed without errors.
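
A minimal sketch of registering a conda-based environment with the v2 SDK; the file and image names mirror this lab's `environment.py`:

```python
import os

from azure.ai.ml.entities import Environment

from ml_client import create_or_load_ml_client

ml_client = create_or_load_ml_client()

# environment built from a conda file on top of a curated base image
env_docker_image = Environment(
    name="custom-scikit-learn",
    conda_file=os.path.join("./dependencies", "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
)
ml_client.environments.create_or_update(env_docker_image)
```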

## 6. Train a model from a tabular dataset using a remote compute

Open the file `azml_01_experiment_remote_compute.py`; several gaps marked with `XXXX` should be filled:

1. `ml_client = XXXX()`

   Hint: look into previous files.

2. Complete the `latest_version_dataset` definition.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day#deploy-the-model-to-the-endpoint).

3. Complete the `Input` part.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python).

4. Complete the `command` part.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python).

When finished, run this file and check that it is executed without errors.
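
A minimal sketch of looking up the latest version of a registered data asset and wiring it into a command job, assuming `ml_client` was created as in step 2; the `azureml:<name>:<version>` path format is the one from the v2 SDK docs:

```python
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes

# latest registered version of the data asset created in step 4
latest_version = next(
    d.latest_version for d in ml_client.data.list() if d.name == "diabetes-dataset"
)

job = command(
    inputs=dict(
        data=Input(
            type=AssetTypes.URI_FILE,
            path=f"azureml:diabetes-dataset:{latest_version}",
        ),
    ),
    code="./diabetes_training",
    command="python diabetes_training.py --data ${{inputs.data}}",
    environment="custom-scikit-learn@latest",
    compute="aml-compute",
)
returned_job = ml_client.jobs.create_or_update(job)
```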

## 7. Tune hyperparameters using a remote compute

Open the file `azml_02_hyperparameters_tuning.py`; several gaps marked with `XXXX` should be filled. The hyperparameter search should be defined over the following space:

- learning_rate: one of the values 0.01, 0.1, 1.0
- n_estimators: one of the values 10, 100

Hint: Use the previous file as a template.

Hint: For the `Hyperdrive settings` format, look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline).

Open the file `diabetes_hyperdrive/diabetes_training.py`; several gaps marked with `XXXX` should be filled. A Gradient Boosting classification model should be trained, and the AUC and the accuracy on the test set should be computed.

Hint: Use the file `data/diabetes_training.py` as a template.

When finished, run this file and check that it is executed without errors.
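
A minimal sketch of the sweep pattern from the v2 SDK, assuming `ml_client` and a command job `job` (built as in step 6, with `learning_rate` and `n_estimators` declared as inputs) already exist:

```python
from azure.ai.ml.sweep import Choice

# bind the search space to the command job's inputs
job_for_sweep = job(
    learning_rate=Choice(values=[0.01, 0.1, 1.0]),
    n_estimators=Choice(values=[10, 100]),
)

# grid-sample the two Choice parameters and maximize the logged AUC metric
sweep_job = job_for_sweep.sweep(
    compute="aml-compute",
    sampling_algorithm="grid",
    primary_metric="AUC",
    goal="Maximize",
    max_total_trials=6,
    max_concurrent_trials=2,
)
returned_sweep_job = ml_client.create_or_update(sweep_job)
```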

## 8. Create a real-time inferencing service

Open the file `azml_03_realtime_inference.py`; several gaps marked with `XXXX` should be filled.

Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models?tabs=fromjob%2Cmir%2Csdk).

When finished, run this file and check that it is executed without errors.
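
A minimal sketch of the managed online endpoint and deployment pattern, assuming `ml_client` already exists; the endpoint name and model reference below are hypothetical (the lab derives them from `AZURE_WORKSPACE_NAME` and the model registered in step 7):

```python
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

# create the endpoint and wait for completion
endpoint = ManagedOnlineEndpoint(name="srv-my-workspace", auth_mode="key")  # hypothetical name
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# deploy a registered model behind the endpoint
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model="azureml:best_diabetes_model:1",  # hypothetical model name and version
    instance_type="STANDARD_DS2_V2",
    instance_count=1,
)
ml_client.begin_create_or_update(blue_deployment).result()

# route all the traffic to the new deployment
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()
```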

## 9. Test the inference service

Open the file `azml_04_test_inference.py`; several gaps marked with `XXXX` should be filled.

Hint: Check [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-safely-rollout-online-endpoints?view=azureml-api-2&tabs=python).
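
A minimal sketch of calling the deployed endpoint with the provided request file, assuming `ml_client` exists; `invoke` is the v2 SDK method used in `azml_04_test_inference.py`, and the endpoint and deployment names are placeholders:

```python
output = ml_client.online_endpoints.invoke(
    endpoint_name="srv-my-workspace",    # placeholder: the lab derives this from the workspace name
    deployment_name="srv-my-workspace",  # the lab reuses the endpoint name for the deployment
    request_file="./diabetes_test_inference/request.json",
)
print(output)
```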

`azuremlpythonsdk-v2/azml_01_experiment_remote_compute.py` (new file, 70 lines)

"""
Script to train a model from a tabular dataset using a remote compute
Based on:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
"""
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes

from compute_aml import create_or_load_aml
from data_tabular import create_tabular_dataset, name_dataset
from environment import custom_env_name
from initialize_constants import AML_COMPUTE_NAME
from ml_client import create_or_load_ml_client

experiment_name = "mslearn-train-diabetes"
experiment_folder = "./diabetes_training"
script_name = "diabetes_training.py"
registered_model_name = "diabetes_model"


def main():
    # 1. Create or Load an ML client
    ml_client = XXXX()

    # 2. Create compute resources
    create_or_load_aml()

    # 3. Create and register a File Dataset
    create_tabular_dataset()
    latest_version_dataset = next(
        dataset.latest_version
        for dataset in ml_client.data.XXXX
        if dataset.name == name_dataset
    )

    # 4. Run Job
    job = command(
        inputs=dict(
            script_name=script_name,
            data=Input(
                type=AssetTypes.URI_FILE,
                # @latest doesn't work with dataset paths
                path=XXXX,
            ),
            registered_model_name=registered_model_name,
        ),
        code=experiment_folder,
        command=(
            "python ${{inputs.script_name}}"
            + " --data XXXX"
            + " --registered_model_name XXXX"
        ),
        environment=f"{custom_env_name}@latest",
        compute=AML_COMPUTE_NAME,
        experiment_name=experiment_name,
        display_name=experiment_name,
    )

    # submit the command
    returned_job = ml_client.jobs.create_or_update(job)

    # stream the output and wait until the job is finished
    ml_client.jobs.stream(returned_job.name)

    # refresh the latest status of the job after streaming
    returned_job = ml_client.jobs.get(name=returned_job.name)


if __name__ == "__main__":
    main()

`azuremlpythonsdk-v2/azml_02_hyperparameters_tuning.py` (new file, 113 lines)

"""
Script to tune hyperparameters
Based on:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
"""
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model
from azure.ai.ml.sweep import Choice

from compute_aml import create_or_load_aml
from data_tabular import create_tabular_dataset, name_dataset
from environment import create_docker_environment, custom_env_name
from initialize_constants import AML_COMPUTE_NAME
from ml_client import create_or_load_ml_client

experiment_folder = "diabetes_hyperdrive"
experiment_name = "mslearn-diabetes-hyperdrive"
script_name = "diabetes_training.py"
registered_model_name = "diabetes_model_hyper"
best_model_name = "best_diabetes_model"


def main():
    # 1. Create or Load an ML client
    ml_client = XXXX()

    # 2. Create compute resources
    XXXX()

    # 3. Create and register a File Dataset
    XXXX()
    latest_version_dataset = XXXX()

    # 4. Environment
    environment_names = [env.name for XXXX in ml_client.environments.list()]
    if custom_env_name not in environment_names:
        create_docker_environment()

    # 5. Run Job
    job_for_sweep = command(
        inputs=dict(
            script_name=script_name,
            data=Input(
                type=AssetTypes.URI_FILE,
                # @latest doesn't work with dataset paths
                path=f"azureml:{name_dataset}:{latest_version_dataset}",
            ),
            registered_model_name=registered_model_name,
            learning_rate=XXXX(values=XXXX),
            n_estimators=XXXX(values=XXXX),
        ),
        code=experiment_folder,
        command=(
            "python XXXX"
            + " --data XXXX"
            + " --registered_model_name XXXX"
            + " --learning_rate XXXX"
            + " --n_estimators XXXX"
        ),
        environment=XXXX,
        compute=AML_COMPUTE_NAME,
        experiment_name=experiment_name,
        display_name=experiment_name,
    )

    # Configure hyperdrive settings
    sweep_job = job_for_sweep.XXXX(
        compute=AML_COMPUTE_NAME,
        sampling_algorithm="grid",
        primary_metric="AUC",
        goal="Maximize",
        max_total_trials=6,
        max_concurrent_trials=2,
    )

    # submit the command
    returned_sweep_job = ml_client.create_or_update(sweep_job)

    # stream the output and wait until the job is finished
    ml_client.jobs.stream(returned_sweep_job.name)

    # refresh the latest status of the job after streaming
    returned_sweep_job = ml_client.jobs.get(name=returned_sweep_job.name)

    # Find and register the best model
    if returned_sweep_job.status == "Completed":
        # First let us get the run which gave us the best result
        best_run = returned_sweep_job.properties["best_child_run_id"]

        # let's get the model from this run
        model = Model(
            # the script stores the model as the given name
            path=(
                f"azureml://jobs/{best_run}/outputs/artifacts/paths/"
                + f"{registered_model_name}/"
            ),
            name=best_model_name,
            type="mlflow_model",
        )
    else:
        print(
            f"Sweep job status: {returned_sweep_job.status}. \
            Please wait until it completes"
        )

    # Register best model
    print(f"Registering Model {best_model_name}")
    ml_client.models.XXXX(model=model)


if __name__ == "__main__":
    main()

`azuremlpythonsdk-v2/azml_03_realtime_inference.py` (new file, 49 lines)

"""
Script to create a real-time inferencing service
Based on:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models
"""
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

from azml_02_hyperparameters_tuning import best_model_name
from initialize_constants import AZURE_WORKSPACE_NAME, VM_SIZE
from ml_client import create_or_load_ml_client

online_endpoint_name = ("srv-" + AZURE_WORKSPACE_NAME).lower()


def main():
    # 1. Create or Load an ML client
    ml_client = XXXX()

    # 2. Create an endpoint
    print(f"Creating endpoint {online_endpoint_name}")
    endpoint = XXXX(
        name=online_endpoint_name,
        auth_mode="key",
    )

    # Method `result()` should be added to wait until completion
    ml_client.online_endpoints.XXXX(endpoint).result()

    # 3. Create a deployment
    best_model_latest_version = XXXX

    blue_deployment = XXXX(
        name=online_endpoint_name,
        endpoint_name=online_endpoint_name,
        # @latest doesn't work with model paths
        model=XXXX,
        instance_type=VM_SIZE,
        instance_count=1,
    )

    # Assign all the traffic to this endpoint
    # Method `result()` should be added to wait until completion
    ml_client.begin_create_or_update(blue_deployment).result()
    endpoint.traffic = {online_endpoint_name: 100}
    ml_client.begin_create_or_update(endpoint).result()


if __name__ == "__main__":
    main()

`azuremlpythonsdk-v2/azml_04_test_inference.py` (new file, 23 lines)

"""
Script to use real-time inferencing with online endpoints
"""
from azml_03_realtime_inference import online_endpoint_name
from ml_client import create_or_load_ml_client


def main():
    # 1. Load a Workspace
    ml_client = XXXX()

    # 2. Get predictions
    output = ml_client.online_endpoints.XXXX(
        endpoint_name=XXXX,
        deployment_name=online_endpoint_name,
        request_file="./diabetes_test_inference/request.json",
    )

    print(output)


if __name__ == "__main__":
    main()

`azuremlpythonsdk-v2/compute_aml.py` (new file, 63 lines)

"""
Script to initialize an Azure Machine Learning compute cluster (aml)
"""
from azure.ai.ml.entities import AmlCompute

from initialize_constants import AML_COMPUTE_NAME, MAX_NODES, MIN_NODES, VM_SIZE
from ml_client import create_or_load_ml_client


def create_or_load_aml(
    cpu_compute_target=AML_COMPUTE_NAME,
    vm_size=VM_SIZE,
    min_nodes=MIN_NODES,
    max_nodes=MAX_NODES,
):
    """Create or load an Azure Machine Learning compute cluster (aml) in a
    given Workspace.
    Args:
        cpu_compute_target: Name of the compute resource.
        vm_size: Virtual machine size, VM_SIZE is used as default,
            for example STANDARD_D2_V2. Set to STANDARD_NC6 to get a GPU.
        min_nodes: Minimum number of nodes, MIN_NODES is used as default.
        max_nodes: Maximum number of nodes, MAX_NODES is used as default.

    Returns:
        The loaded or newly created AmlCompute cluster.
    """
    # Create or Load a Workspace
    ml_client = create_or_load_ml_client()
    try:
        # let's see if the compute target already exists
        cpu_cluster = ml_client.compute.get(XXXXX)
        print(
            f"You already have a cluster named {XXXXX},",
            "we'll reuse it.",
        )
    except Exception:
        print("Creating a new cpu compute target...")
        cpu_cluster = AmlCompute(
            name=cpu_compute_target,
            # Azure ML Compute is the on-demand VM service
            type="amlcompute",
            # VM Family
            size=vm_size,
            # Minimum running nodes when there is no job running
            min_instances=min_nodes,
            # Nodes in cluster
            max_instances=max_nodes,
            # How many seconds the node keeps running after the job terminates
            idle_time_before_scale_down=180,
            # Dedicated or LowPriority.
            # The latter is cheaper but there is a chance of job termination
            tier="Dedicated",
        )

        # Now, we pass the object to MLClient's create_or_update method
        cpu_cluster = ml_client.compute.begin_create_or_update(XXXXX)

    return cpu_cluster


if __name__ == "__main__":
    create_or_load_aml()

`azuremlpythonsdk-v2/data/diabetes.csv` (new file, 10,001 lines; diff not shown because the file is too large)

`azuremlpythonsdk-v2/data_tabular.py` (new file, 31 lines)

"""
Script to create and register a file as a URI
"""
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

from ml_client import create_or_load_ml_client

name_dataset = "diabetes-dataset"
data_folder = "./data/diabetes.csv"


def create_tabular_dataset():
    # 1. Create or Load an ML client
    ml_client = XXXXX()

    # 2. Add files
    if name_dataset not in [XXXXX for env in ml_client.data.list()]:
        tab_data_set = Data(
            path=XXXXX,
            type=AssetTypes.URI_FILE,
            name=name_dataset,
        )

        ml_client.data.create_or_update(XXXXX)
    else:
        print("Dataset already registered.")


if __name__ == "__main__":
    create_tabular_dataset()

`azuremlpythonsdk-v2/dependencies/conda.yml` (new file, 11 lines)

name: model-env
dependencies:
  - python=3.8
  - scikit-learn
  - pandas
  - numpy
  - matplotlib
  - pip
  - pip:
      - mlflow
      - azureml-mlflow

`azuremlpythonsdk-v2/diabetes_hyperdrive/diabetes_training.py` (new file, 123 lines)

# Import libraries
import argparse
import os

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def main():
    """Main function of the script."""

    # Input and output arguments

    # Get script arguments
    parser = XXXX()

    # Input dataset
    parser.add_argument(
        "XXXX",
        type=str,
        help="path to input data",
    )

    # Model name
    parser.add_argument("XXXX", type=str, help="model name")

    # Hyperparameters
    parser.add_argument(
        "XXXX",
        type=float,
        dest="learning_rate",
        default=0.1,
        help="learning rate",
    )
    parser.add_argument(
        "XXXX",
        type=int,
        dest="n_estimators",
        default=100,
        help="number of estimators",
    )

    # Add arguments to args collection
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start Logging
    mlflow.XXXX()

    # enable autologging
    mlflow.XXXX()

    # load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = XXXX(
        X, y, test_size=0.30, random_state=0
    )

    # Train a Gradient Boosting classification model
    # with the specified hyperparameters
    print("Training a classification model")
    model = XXXX(
        learning_rate=XXXX, n_estimators=XXXX
    ).fit(X_train, y_train)

    # calculate accuracy
    y_hat = model.XXXX(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # calculate AUC
    y_scores = model.XXXX(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.XXXX(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop Logging
    mlflow.XXXX()


if __name__ == "__main__":
    main()

`azuremlpythonsdk-v2/diabetes_test_inference/request.json` (new file, 4 lines)

{"input_data": [
    [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22],
    [0, 148, 58, 11, 179, 39.19207553, 0.160829008, 45]
]}

`azuremlpythonsdk-v2/diabetes_training/diabetes_training.py` (new file, 115 lines)

# Import libraries
import argparse
import os

import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def main():
    """Main function of the script."""

    # Input and output arguments
    # Get script arguments
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data",
        type=str,
        help="path to input data",
    )
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    # load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    mlflow.log_metric("num_samples", diabetes.shape[0])
    mlflow.log_metric("num_features", diabetes.shape[1] - 1)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # Train a decision tree model
    print("Training a decision tree model")
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # calculate accuracy
    y_hat = model.predict(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # plot ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:, 1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], "k--")
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("ROC Curve")
    fig.savefig("ROC.png")
    mlflow.log_artifact("ROC.png")
    plt.show()

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()

`azuremlpythonsdk-v2/environment.py` (new file, 33 lines)

"""
Script to create and register an environment including scikit-learn
"""
import os

from azure.ai.ml.entities import Environment

from ml_client import create_or_load_ml_client

dependencies_dir = "./dependencies"
custom_env_name = "custom-scikit-learn"


def create_docker_environment():
    # 1. Create or Load an ML client
    ml_client = XXXXX()

    # 2. Create a Python environment for the experiment
    env_docker_image = XXXXX(
        name=custom_env_name,
        conda_file=os.path.join(dependencies_dir, "XXXXX"),
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
    )
    ml_client.environments.create_or_update(env_docker_image)

    print(
        f"Environment with name {env_docker_image.name} is registered to the workspace,",
        f"the environment version is {env_docker_image.version}",
    )


if __name__ == "__main__":
    create_docker_environment()

`azuremlpythonsdk-v2/initialize_constants.py` (new file, 23 lines)

"""
Script to initialize global constants
"""
import os

# Global constants can be set via environment variables
# Remove default values in production
AZURE_RESOURCE_GROUP = os.getenv("AZURE_RESOURCE_GROUP", "itvitae-azure-ml")
AZURE_SUBSCRIPTION_ID = os.getenv(
    "AZURE_SUBSCRIPTION_ID", "34faeead-244d-4ae8-8194-1eeaaffaf5be"
)
AZURE_WORKSPACE_NAME = os.getenv(
    "AZURE_WORKSPACE_NAME",
    "ws-kevin-heimbach",
)
AZURE_LOCATION = os.getenv("AZURE_LOCATION", "westeurope")
# Choose names for your clusters
AML_COMPUTE_NAME = os.getenv("AML_COMPUTE_NAME", "aml-compute")
# General server characteristics
VM_SIZE = os.getenv("VM_SIZE", "STANDARD_DS2_V2")
MIN_NODES = int(os.getenv("MIN_NODES", 0))
MAX_NODES = int(os.getenv("MAX_NODES", 1))
AGENT_COUNT = int(os.getenv("AGENT_COUNT", 2))

`azuremlpythonsdk-v2/ml_client.py` (new file, 46 lines)

"""
Script to initialize MLClient object
"""
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from initialize_constants import (
    AZURE_RESOURCE_GROUP,
    AZURE_SUBSCRIPTION_ID,
    AZURE_WORKSPACE_NAME,
)


def create_or_load_ml_client():
    """Create or load an Azure ML Client based on env variables.
    Args:
        None since information is taken from global constants
        defined in initialize_constants.py.

    Returns:
        An MLClient object for the workspace.
    """
    try:
        credential = DefaultAzureCredential()
        # Check if given credential can get token successfully.
        credential.get_token("https://management.azure.com/.default")
    except Exception as ex:
        # Fall back to InteractiveBrowserCredential
        # in case DefaultAzureCredential is not working
        print(ex)
        credential = InteractiveBrowserCredential()

    # Get a handle to the workspace.
    # You can find the info on the workspace tab on ml.azure.com
    ml_client = MLClient(
        credential=credential,
        subscription_id=XXXXX,
        resource_group_name=XXXXX,
        workspace_name=XXXXX,
    )
    return ml_client


if __name__ == "__main__":
    ml_client = create_or_load_ml_client()
    print(ml_client)

`azuremlpythonsdk-v2/setup.cfg` (new file, 37 lines)

[flake8]
ignore = E203, W503
max-line-length = 99
max-complexity = 18
select = B,C,E,F,W,T4

[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
ensure_newline_before_comments=True
line_length=99

[mypy]
files=refactor,tests
ignore_missing_imports=True

[coverage:run]
source = refactor

[coverage:report]
exclude_lines =
    # exclude pragma again
    pragma: no cover

    # exclude main
    if __name__ == .__main__.:

[coverage:html]
directory = coverage

[coverage:xml]
output = coverage.xml

[tool:pytest]
testpaths=tests/