Init and have all packages required
commit 782aba19ba
Azure_ML-2.pptx (binary, new file; not shown)
README.md (new file, 51 lines)
# Azure ML Lesson 2

## How to install all the tools in a nutshell

A host running **Ubuntu 22.04** is expected. If you have a Windows system or a Mac, download VirtualBox and set up a VM, or use WSL2, with Ubuntu 22.04.

**Anaconda/Miniconda** must be installed. See [here](https://docs.anaconda.com/anaconda/install/) and [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html), respectively.

Run the following commands to install the Azure CLI:

```bash
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
```

Configure the Azure CLI:

```bash
az login
```

Install the Azure ML CLI extension:

```bash
az extension add -n ml -y
```
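
To check that the extension is actually available (assuming `az` is now on your PATH), you can list the CLI version information and print the extension's help:

```bash
# "az version" lists the CLI and installed extensions, including "ml"
az version
az ml -h
```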

Create a Conda environment to work with Azure in. At the time of writing, there are [problems with Python 3.9](https://github.com/Azure/MachineLearningNotebooks/issues/1285), so use Python 3.12.

```bash
conda create --name azure_ml -y python=3.12 pip
conda activate azure_ml

# Install linting, formatting and additional libraries
pip install flake8 black isort joblib azure-ai-ml azure-identity
```

You now have a Conda environment called `azure_ml` containing the Azure ML SDK.
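
A quick sanity check that the SDK is importable from the new environment (a minimal check, assuming `azure_ml` is still activated):

```bash
python -c "import azure.ai.ml, azure.identity; print('Azure ML SDK OK')"
```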

Install Visual Studio Code as shown [here](https://code.visualstudio.com/download).

Once you've installed VS Code, configure its plugins and tell it to use the Python interpreter from the `azure_ml` environment:

- Run VS Code and install the vscode-icons, Python, Code Spell Checker and Azure Machine Learning extensions.
- Go into Azure Machine Learning and log in. Check that you have access to your workspace.
- Install the Flake8, Black Formatter and isort Microsoft extensions.
- Select a Python interpreter:
  - Python is an interpreted language, and in order to run Python code and get Python IntelliSense, you must tell VS Code which interpreter to use.
  - From within VS Code, select a Python 3 interpreter by opening the Command Palette (Ctrl+Shift+P) and searching for `Python: Select Interpreter`...
  - ...then select the environment named `azure_ml`.

azuremlpythonsdk-v2/README.md (new file, 118 lines)
# Azure ML Lesson 2 Lab

## 1. Set environmental variables

1. Run VS Code in an Azure ML remote instance as shown before.
2. Press `File > Open Folder` and navigate to `azuremlpythonsdk-v2/` to open the exercise.

**IMPORTANT:** Relative paths are assumed to be initialized from the `azuremlpythonsdk-v2` folder.

Open the file `initialize_constants.py`; there are three variables that should be updated:

- AZURE_WORKSPACE_NAME
- AZURE_RESOURCE_GROUP
- AZURE_SUBSCRIPTION_ID

Open your workspace at `https://ml.azure.com`. At the top right, select the workspace name, then copy the workspace name, the subscription id and the resource group name.

## 2. Load a workspace

Open the file `ml_client.py` and understand how an ML client object is loaded or created. In this lab, the workspace was already created. Just fill in the names of the variables from `initialize_constants.py`.

When finished, run this file and check that it executes without errors.
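
To check your work, the filled-in client construction (following the pattern already present in `ml_client.py` in this commit) looks roughly like this:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

from initialize_constants import (
    AZURE_RESOURCE_GROUP,
    AZURE_SUBSCRIPTION_ID,
    AZURE_WORKSPACE_NAME,
)

# Authenticate and point the client at your workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=AZURE_SUBSCRIPTION_ID,
    resource_group_name=AZURE_RESOURCE_GROUP,
    workspace_name=AZURE_WORKSPACE_NAME,
)
```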

## 3. Load a Compute Cluster

Open the file `compute_aml.py` and understand how a compute cluster is loaded or created. In this lab, the compute cluster was already created, but some variables, marked with `XXXX`, should be added.

When finished, run this file and check that it executes without errors.

What would happen if the compute cluster were not present?

## 4. Create a tabular dataset

Open the file `data_tabular.py`; several gaps, marked with `XXXX`, should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. How can you get the names of the datasets already registered in `if name_dataset not in [XXXXX for env in ml_client.data.list()]`?

   Hint: try to get one object from the class [Data](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.data?view=azure-python) and check its attributes.

3. Which should be the `path` parameter in `path=XXXXX`?

4. Which input should you give in `ml_client.data.create_or_update(XXXXX)`?

When finished, run this file and check that it executes without errors.
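
For reference, a completed sketch of the registration step, based on the structure of `data_tabular.py` (the constants are the ones defined at the top of that file):

```python
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

from ml_client import create_or_load_ml_client

name_dataset = "diabetes-dataset"
data_folder = "./data/diabetes.csv"

ml_client = create_or_load_ml_client()

# Register the CSV as a URI file asset, unless a data asset
# with this name already exists in the workspace
if name_dataset not in [d.name for d in ml_client.data.list()]:
    tab_data_set = Data(
        path=data_folder,
        type=AssetTypes.URI_FILE,
        name=name_dataset,
    )
    ml_client.data.create_or_update(tab_data_set)
```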

## 5. Create and register an environment

Open the file `environment.py`; several gaps, marked with `XXXX`, should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. Which class should be used to register the environment?

   Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?tabs=python).

When finished, run this file and check that it executes without errors.
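
For reference, a sketch of the environment registration, following `environment.py` and `dependencies/conda.yml` from this commit:

```python
import os

from azure.ai.ml.entities import Environment

from ml_client import create_or_load_ml_client

ml_client = create_or_load_ml_client()

# A Docker base image plus a conda file describing the Python packages
env_docker_image = Environment(
    name="custom-scikit-learn",
    conda_file=os.path.join("./dependencies", "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
)
ml_client.environments.create_or_update(env_docker_image)
```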

## 6. Train a model from a tabular dataset using a remote compute

Open the file `azml_01_experiment_remote_compute.py`; several gaps, marked with `XXXX`, should be filled:

1. `ml_client = XXXX()`

   Hint: look into previous files.

2. Complete the `latest_version_dataset` definition.

   Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day#deploy-the-model-to-the-endpoint).

3. Complete the `Input` part.

   Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python).

4. Complete the `command` part.

   Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python).

When finished, run this file and check that it executes without errors.
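
For reference, the solved job definition in `solution-v2/azml_01_experiment_remote_compute.py` has this shape (a sketch with the module-level constants inlined; the dataset version shown is a placeholder you resolve at runtime):

```python
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes

name_dataset = "diabetes-dataset"
latest_version_dataset = 1  # placeholder; resolve it from ml_client.data.list()

job = command(
    inputs=dict(
        script_name="diabetes_training.py",
        data=Input(
            type=AssetTypes.URI_FILE,
            # @latest doesn't work with dataset paths, so pin a concrete version
            path=f"azureml:{name_dataset}:{latest_version_dataset}",
        ),
        registered_model_name="diabetes_model",
    ),
    code="./diabetes_training",
    command=(
        "python ${{inputs.script_name}}"
        + " --data ${{inputs.data}}"
        + " --registered_model_name ${{inputs.registered_model_name}}"
    ),
    environment="custom-scikit-learn@latest",
    compute="aml-compute",
)
```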

## 7. Tune hyperparameters using a remote compute

Open the file `azml_02_hyperparameters_tuning.py`; several gaps, marked with `XXXX`, should be filled. The hyperparameter search should be defined over the following space:

- learning_rate: one of the values 0.01, 0.1, 1.0
- n_estimators: one of the values 10, 100

Hint: use the previous file as a template.

Hint: for the `Hyperdrive settings` format, look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline).
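
As a sketch of how the search space and the sweep fit together (trimmed from `solution-v2/azml_02_hyperparameters_tuning.py`; the data input is omitted here for brevity):

```python
from azure.ai.ml import command
from azure.ai.ml.sweep import Choice

# Expose the two hyperparameters as sweepable inputs of the command job
job_for_sweep = command(
    inputs=dict(
        learning_rate=Choice(values=[0.01, 0.1, 1.0]),
        n_estimators=Choice(values=[10, 100]),
    ),
    code="diabetes_hyperdrive",
    command=(
        "python diabetes_training.py"
        " --learning_rate ${{inputs.learning_rate}}"
        " --n_estimators ${{inputs.n_estimators}}"
    ),
    environment="custom-scikit-learn@latest",
    compute="aml-compute",
)

# Grid-search the space, maximizing the AUC metric logged by the script
sweep_job = job_for_sweep.sweep(
    compute="aml-compute",
    sampling_algorithm="grid",
    primary_metric="AUC",
    goal="Maximize",
    max_total_trials=6,
    max_concurrent_trials=2,
)
```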

Open the file `diabetes_hyperdrive/diabetes_training.py`; several gaps, marked with `XXXX`, should be filled. A Gradient Boosting classification model should be trained, and the AUC and the accuracy on the test set should be computed.

Hint: use the file `diabetes_training/diabetes_training.py` as a template.

When finished, run this file and check that it executes without errors.
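
As a minimal, self-contained sketch of the model-fitting step (synthetic data stands in for `diabetes.csv` here; in the lab, the features come from the input dataset and the hyperparameters from the parsed script arguments):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the eight diabetes features and binary label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Gradient Boosting with the two tunable hyperparameters
model = GradientBoostingClassifier(
    learning_rate=0.1, n_estimators=100
).fit(X_train, y_train)

# Accuracy and AUC on the held-out test set
y_hat = model.predict(X_test)
print("Accuracy:", np.average(y_hat == y_test))
y_scores = model.predict_proba(X_test)
print("AUC:", roc_auc_score(y_test, y_scores[:, 1]))
```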

## 8. Create a real-time inferencing service

Open the file `azml_03_realtime_inference.py`; several gaps, marked with `XXXX`, should be filled.

Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models?tabs=fromjob%2Cmir%2Csdk).

When finished, run this file and check that it executes without errors.
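
For reference, the endpoint and deployment creation in `solution-v2/azml_03_realtime_inference.py` has this shape (the endpoint name and model version below are placeholders):

```python
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

from ml_client import create_or_load_ml_client

ml_client = create_or_load_ml_client()
online_endpoint_name = "srv-my-workspace"  # derived from the workspace name in the script

endpoint = ManagedOnlineEndpoint(name=online_endpoint_name, auth_mode="key")
# result() blocks until the endpoint is provisioned
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

blue_deployment = ManagedOnlineDeployment(
    name=online_endpoint_name,
    endpoint_name=online_endpoint_name,
    # @latest doesn't work with model paths, so pin a concrete version
    model="azureml:best_diabetes_model:1",
    instance_type="STANDARD_DS2_V2",
    instance_count=1,
)
ml_client.begin_create_or_update(blue_deployment).result()

# Route all the traffic to this deployment
endpoint.traffic = {online_endpoint_name: 100}
ml_client.begin_create_or_update(endpoint).result()
```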

## 9. Test the inference service

Open the file `azml_04_test_inference.py`; several gaps, marked with `XXXX`, should be filled.

Hint: check [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-safely-rollout-online-endpoints?view=azureml-api-2&tabs=python).
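
For reference, the invocation in `solution-v2/azml_04_test_inference.py` has this shape:

```python
from azml_03_realtime_inference import online_endpoint_name
from ml_client import create_or_load_ml_client

ml_client = create_or_load_ml_client()

# Send the JSON payload to the deployed endpoint and print the predictions
output = ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name=online_endpoint_name,
    request_file="./diabetes_test_inference/request.json",
)
print(output)
```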
azuremlpythonsdk-v2/azml_01_experiment_remote_compute.py (new file, 70 lines)
"""
|
||||||
|
Script to train a model from a tabular dataset using a remote compute
|
||||||
|
Based on:
|
||||||
|
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
|
||||||
|
"""
|
||||||
|
from azure.ai.ml import Input, command
|
||||||
|
from azure.ai.ml.constants import AssetTypes
|
||||||
|
|
||||||
|
from compute_aml import create_or_load_aml
|
||||||
|
from data_tabular import create_tabular_dataset, name_dataset
|
||||||
|
from environment import custom_env_name
|
||||||
|
from initialize_constants import AML_COMPUTE_NAME
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
experiment_name = "mslearn-train-diabetes"
|
||||||
|
experiment_folder = "./diabetes_training"
|
||||||
|
script_name = "diabetes_training.py"
|
||||||
|
registered_model_name = "diabetes_model"
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
# 1. Create or Load a ML client
|
||||||
|
ml_client = XXXX()
|
||||||
|
|
||||||
|
# 2. Create compute resources
|
||||||
|
create_or_load_aml()
|
||||||
|
|
||||||
|
# 3. Create and register a File Dataset
|
||||||
|
create_tabular_dataset()
|
||||||
|
latest_version_dataset = next(
|
||||||
|
dataset.latest_version
|
||||||
|
for dataset in ml_client.data.XXXX
|
||||||
|
if dataset.name == name_dataset
|
||||||
|
)
|
||||||
|
|
||||||
|
# 4. Run Job
|
||||||
|
job = command(
|
||||||
|
inputs=dict(
|
||||||
|
script_name=script_name,
|
||||||
|
data=Input(
|
||||||
|
type=AssetTypes.URI_FILE,
|
||||||
|
# @latest doesn't work with dataset paths
|
||||||
|
path=XXXX,
|
||||||
|
),
|
||||||
|
registered_model_name=registered_model_name,
|
||||||
|
),
|
||||||
|
code=experiment_folder,
|
||||||
|
command=(
|
||||||
|
"python ${{inputs.script_name}}"
|
||||||
|
+ " --data XXXX"
|
||||||
|
+ " --registered_model_name XXXX"
|
||||||
|
),
|
||||||
|
environment=f"{custom_env_name}@latest",
|
||||||
|
compute=AML_COMPUTE_NAME,
|
||||||
|
experiment_name=experiment_name,
|
||||||
|
display_name=experiment_name,
|
||||||
|
)
|
||||||
|
|
||||||
|
# submit the command
|
||||||
|
returned_job = ml_client.jobs.create_or_update(job)
|
||||||
|
|
||||||
|
# stream the output and wait until the job is finished
|
||||||
|
ml_client.jobs.stream(returned_job.name)
|
||||||
|
|
||||||
|
# refresh the latest status of the job after streaming
|
||||||
|
returned_job = ml_client.jobs.get(name=returned_job.name)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
azuremlpythonsdk-v2/azml_02_hyperparameters_tuning.py (new file, 113 lines)
"""
|
||||||
|
Script to train tune hyperparameters
|
||||||
|
Based on:
|
||||||
|
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
|
||||||
|
"""
|
||||||
|
from azure.ai.ml import Input, command
|
||||||
|
from azure.ai.ml.constants import AssetTypes
|
||||||
|
from azure.ai.ml.entities import Model
|
||||||
|
from azure.ai.ml.sweep import Choice
|
||||||
|
|
||||||
|
from compute_aml import create_or_load_aml
|
||||||
|
from data_tabular import create_tabular_dataset, name_dataset
|
||||||
|
from environment import create_docker_environment, custom_env_name
|
||||||
|
from initialize_constants import AML_COMPUTE_NAME
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
experiment_folder = "diabetes_hyperdrive"
|
||||||
|
experiment_name = "mslearn-diabetes-hyperdrive"
|
||||||
|
script_name = "diabetes_training.py"
|
||||||
|
registered_model_name = "diabetes_model_hyper"
|
||||||
|
best_model_name = "best_diabetes_model"
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
# 1. Create or Load a ML client
|
||||||
|
ml_client = XXXX()
|
||||||
|
|
||||||
|
# 2. Create compute resources
|
||||||
|
XXXX()
|
||||||
|
|
||||||
|
# 3. Create and register a File Dataset
|
||||||
|
XXXX()
|
||||||
|
latest_version_dataset = XXXX()
|
||||||
|
|
||||||
|
# 4. Environment
|
||||||
|
environment_names = [env.name for XXXX in ml_client.environments.list()]
|
||||||
|
if custom_env_name not in environment_names:
|
||||||
|
create_docker_environment()
|
||||||
|
|
||||||
|
# 5. Run Job
|
||||||
|
job_for_sweep = command(
|
||||||
|
inputs=dict(
|
||||||
|
script_name=script_name,
|
||||||
|
data=Input(
|
||||||
|
type=AssetTypes.URI_FILE,
|
||||||
|
# @latest doesn't work with dataset paths
|
||||||
|
path=f"azureml:{name_dataset}:{latest_version_dataset}",
|
||||||
|
),
|
||||||
|
registered_model_name=registered_model_name,
|
||||||
|
learning_rate=XXXX(values= XXXX),
|
||||||
|
n_estimators=XXXX(values=XXXX),
|
||||||
|
),
|
||||||
|
code=experiment_folder,
|
||||||
|
command=(
|
||||||
|
"python XXXX"
|
||||||
|
+ " --data XXXX"
|
||||||
|
+ " --registered_model_name XXXX"
|
||||||
|
+ " --learning_rate XXXX"
|
||||||
|
+ " --n_estimators XXXX"
|
||||||
|
),
|
||||||
|
environment=XXXX,
|
||||||
|
compute=AML_COMPUTE_NAME,
|
||||||
|
experiment_name=experiment_name,
|
||||||
|
display_name=experiment_name,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Configure hyperdrive settings
|
||||||
|
sweep_job = job_for_sweep.XXXX(
|
||||||
|
compute=AML_COMPUTE_NAME,
|
||||||
|
sampling_algorithm="grid",
|
||||||
|
primary_metric="AUC",
|
||||||
|
goal="Maximize",
|
||||||
|
max_total_trials=6,
|
||||||
|
max_concurrent_trials=2,
|
||||||
|
)
|
||||||
|
|
||||||
|
# submit the command
|
||||||
|
returned_sweep_job = ml_client.create_or_update(sweep_job)
|
||||||
|
|
||||||
|
# stream the output and wait until the job is finished
|
||||||
|
ml_client.jobs.stream(returned_sweep_job.name)
|
||||||
|
|
||||||
|
# refresh the latest status of the job after streaming
|
||||||
|
returned_sweep_job = ml_client.jobs.get(name=returned_sweep_job.name)
|
||||||
|
|
||||||
|
# Find and register the best model
|
||||||
|
if returned_sweep_job.status == "Completed":
|
||||||
|
# First let us get the run which gave us the best result
|
||||||
|
best_run = returned_sweep_job.properties["best_child_run_id"]
|
||||||
|
|
||||||
|
# lets get the model from this run
|
||||||
|
model = Model(
|
||||||
|
# the script stores the model as the given name
|
||||||
|
path=(
|
||||||
|
f"azureml://jobs/{best_run}/outputs/artifacts/paths/"
|
||||||
|
+ f"{registered_model_name}/"
|
||||||
|
),
|
||||||
|
name=best_model_name,
|
||||||
|
type="mlflow_model",
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
print(
|
||||||
|
f"Sweep job status: {returned_sweep_job.status}. \
|
||||||
|
Please wait until it completes"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Register best model
|
||||||
|
print(f"Registering Model {best_model_name}")
|
||||||
|
ml_client.models.XXXX(model=model)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
azuremlpythonsdk-v2/azml_03_realtime_inference.py (new file, 49 lines)
"""
|
||||||
|
Script to create a real-time inferencing service
|
||||||
|
Based on:
|
||||||
|
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models
|
||||||
|
"""
|
||||||
|
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
|
||||||
|
|
||||||
|
from azml_02_hyperparameters_tuning import best_model_name
|
||||||
|
from initialize_constants import AZURE_WORKSPACE_NAME, VM_SIZE
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
online_endpoint_name = ("srv-" + AZURE_WORKSPACE_NAME).lower()
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
# 1. Create or Load a ML client
|
||||||
|
ml_client = XXXX()
|
||||||
|
|
||||||
|
# 2. Create a endpoint
|
||||||
|
print(f"Creating endpoint {online_endpoint_name}")
|
||||||
|
endpoint = XXXX(
|
||||||
|
name=online_endpoint_name,
|
||||||
|
auth_mode="key",
|
||||||
|
)
|
||||||
|
|
||||||
|
# Method `result()` should be added to wait until completion
|
||||||
|
ml_client.online_endpoints.XXXX(endpoint).result()
|
||||||
|
|
||||||
|
# 3. Create a deployment
|
||||||
|
best_model_latest_version = XXXX
|
||||||
|
|
||||||
|
blue_deployment = XXXX(
|
||||||
|
name=online_endpoint_name,
|
||||||
|
endpoint_name=online_endpoint_name,
|
||||||
|
# @latest doesn't work with model paths
|
||||||
|
model=XXXX,
|
||||||
|
instance_type=VM_SIZE,
|
||||||
|
instance_count=1,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assign all the traffic to this endpoint
|
||||||
|
# Method `result()` should be added to wait until completion
|
||||||
|
ml_client.begin_create_or_update(blue_deployment).result()
|
||||||
|
endpoint.traffic = {online_endpoint_name: 100}
|
||||||
|
ml_client.begin_create_or_update(endpoint).result()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
azuremlpythonsdk-v2/azml_04_test_inference.py (new file, 23 lines)
"""
|
||||||
|
Script to use real-time inferencing with online endpoints
|
||||||
|
"""
|
||||||
|
from azml_03_realtime_inference import online_endpoint_name
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
# 1. Load a Workspace
|
||||||
|
ml_client = XXXX()
|
||||||
|
|
||||||
|
# 2. Get predictions
|
||||||
|
output = ml_client.online_endpoints.XXXX(
|
||||||
|
endpoint_name=XXXX,
|
||||||
|
deployment_name=online_endpoint_name,
|
||||||
|
request_file="./diabetes_test_inference/request.json",
|
||||||
|
)
|
||||||
|
|
||||||
|
print(output)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
azuremlpythonsdk-v2/compute_aml.py (new file, 63 lines)
"""
|
||||||
|
Script to initialize an Azure Machine Learning compute cluster (aml)
|
||||||
|
"""
|
||||||
|
from azure.ai.ml.entities import AmlCompute
|
||||||
|
|
||||||
|
from initialize_constants import AML_COMPUTE_NAME, MAX_NODES, MIN_NODES, VM_SIZE
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
|
||||||
|
def create_or_load_aml(
|
||||||
|
cpu_compute_target=AML_COMPUTE_NAME,
|
||||||
|
vm_size=VM_SIZE,
|
||||||
|
min_nodes=MIN_NODES,
|
||||||
|
max_nodes=MAX_NODES,
|
||||||
|
):
|
||||||
|
"""Create or load an Azure Machine Learning compute cluster (aml) in a
|
||||||
|
given Workspace.
|
||||||
|
Args:
|
||||||
|
cpu_compute_target: Name of the compute resource
|
||||||
|
vm_size: Virtual machine size, VM_SIZE is used as default,
|
||||||
|
for example STANDARD_D2_V2. Set to STANDARD_NC6 to get a GPU
|
||||||
|
min_nodes: Minimal number of nodes, MIN_NODES is used as default.
|
||||||
|
max_nodes: Minimal number of nodes, MIN_NODES is used as default.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
An aml and set quick load.
|
||||||
|
"""
|
||||||
|
# Create or Load a Workspace
|
||||||
|
ml_client = create_or_load_ml_client()
|
||||||
|
try:
|
||||||
|
# let's see if the compute target already exists
|
||||||
|
cpu_cluster = ml_client.compute.get(XXXXX)
|
||||||
|
print(
|
||||||
|
f"You already have a cluster named {XXXXX},",
|
||||||
|
"we'll reuse it.",
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
print("Creating a new cpu compute target...")
|
||||||
|
cpu_cluster = AmlCompute(
|
||||||
|
name=cpu_compute_target,
|
||||||
|
# Azure ML Compute is the on-demand VM service
|
||||||
|
type="amlcompute",
|
||||||
|
# VM Family
|
||||||
|
size=vm_size,
|
||||||
|
# Minimum running nodes when there is no job running
|
||||||
|
min_instances=min_nodes,
|
||||||
|
# Nodes in cluster
|
||||||
|
max_instances=max_nodes,
|
||||||
|
# How many seconds will the node running after the job termination
|
||||||
|
idle_time_before_scale_down=180,
|
||||||
|
# Dedicated or LowPriority.
|
||||||
|
# The latter is cheaper but there is a chance of job termination
|
||||||
|
tier="Dedicated",
|
||||||
|
)
|
||||||
|
|
||||||
|
# Now, we pass the object to MLClient's create_or_update method
|
||||||
|
cpu_cluster = ml_client.compute.begin_create_or_update(XXXXX)
|
||||||
|
|
||||||
|
return cpu_cluster
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
create_or_load_aml()
|
azuremlpythonsdk-v2/data/diabetes.csv (new file, 10001 lines; diff suppressed because it is too large)
azuremlpythonsdk-v2/data_tabular.py (new file, 31 lines)
"""
|
||||||
|
Script to create and register file as an uri
|
||||||
|
"""
|
||||||
|
from azure.ai.ml.constants import AssetTypes
|
||||||
|
from azure.ai.ml.entities import Data
|
||||||
|
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
name_dataset = "diabetes-dataset"
|
||||||
|
data_folder = "./data/diabetes.csv"
|
||||||
|
|
||||||
|
|
||||||
|
def create_tabular_dataset():
|
||||||
|
# 1. Create or Load a ML client
|
||||||
|
ml_client = XXXXX()
|
||||||
|
|
||||||
|
# 2. Add files
|
||||||
|
if name_dataset not in [XXXXX for env in ml_client.data.list()]:
|
||||||
|
tab_data_set = Data(
|
||||||
|
path=XXXXX,
|
||||||
|
type=AssetTypes.URI_FILE,
|
||||||
|
name=name_dataset,
|
||||||
|
)
|
||||||
|
|
||||||
|
ml_client.data.create_or_update(XXXXX)
|
||||||
|
else:
|
||||||
|
print("Dataset already registered.")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
create_tabular_dataset()
|
azuremlpythonsdk-v2/dependencies/conda.yml (new file, 11 lines)
name: model-env
dependencies:
  - python=3.8
  - scikit-learn
  - pandas
  - numpy
  - matplotlib
  - pip
  - pip:
      - mlflow
      - azureml-mlflow
azuremlpythonsdk-v2/diabetes_hyperdrive/diabetes_training.py (new file, 123 lines)
# Import libraries
import argparse
import os

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def main():
    """Main function of the script."""

    # Input and output arguments

    # Get script arguments
    parser = XXXX()

    # Input dataset
    parser.add_argument(
        "XXXX",
        type=str,
        help="path to input data",
    )

    # Model name
    parser.add_argument("XXXX", type=str, help="model name")

    # Hyperparameters
    parser.add_argument(
        "XXXX",
        type=float,
        dest="learning_rate",
        default=0.1,
        help="learning rate",
    )
    parser.add_argument(
        "XXXX",
        type=int,
        dest="n_estimators",
        default=100,
        help="number of estimators",
    )

    # Add arguments to args collection
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start Logging
    mlflow.XXXX()

    # enable autologging
    mlflow.XXXX()

    # load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = XXXX(
        X, y, test_size=0.30, random_state=0
    )

    # Train a Gradient Boosting classification model
    # with the specified hyperparameters
    print("Training a classification model")
    model = XXXX(
        learning_rate=XXXX, n_estimators=XXXX
    ).fit(X_train, y_train)

    # calculate accuracy
    y_hat = model.XXXX(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # calculate AUC
    y_scores = model.XXXX(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.XXXX(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop Logging
    mlflow.XXXX()


if __name__ == "__main__":
    main()
azuremlpythonsdk-v2/diabetes_test_inference/request.json (new file, 4 lines)
{"input_data": [
|
||||||
|
[2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22],
|
||||||
|
[0, 148, 58, 11, 179, 39.19207553, 0.160829008, 45]
|
||||||
|
]}
|
azuremlpythonsdk-v2/diabetes_training/diabetes_training.py (new file, 115 lines)
# Import libraries
import argparse
import os

import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def main():
    """Main function of the script."""

    # Input and output arguments
    # Get script arguments
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data",
        type=str,
        help="path to input data",
    )
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    # load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    mlflow.log_metric("num_samples", diabetes.shape[0])
    mlflow.log_metric("num_features", diabetes.shape[1] - 1)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # Train a decision tree model
    print("Training a decision tree model")
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # calculate accuracy
    y_hat = model.predict(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # plot ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:, 1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], "k--")
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("ROC Curve")
    fig.savefig("ROC.png")
    mlflow.log_artifact("ROC.png")
    plt.show()

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
azuremlpythonsdk-v2/environment.py (new file, 33 lines)
"""
|
||||||
|
Script to create and register an environment including SKlearn
|
||||||
|
"""
|
||||||
|
import os
|
||||||
|
|
||||||
|
from azure.ai.ml.entities import Environment
|
||||||
|
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
dependencies_dir = "./dependencies"
|
||||||
|
custom_env_name = "custom-scikit-learn"
|
||||||
|
|
||||||
|
|
||||||
|
def create_docker_environment():
|
||||||
|
# 1. Create or Load a ML client
|
||||||
|
ml_client = XXXXX()
|
||||||
|
|
||||||
|
# 2. Create a Python environment for the experiment
|
||||||
|
env_docker_image = XXXXX(
|
||||||
|
name=custom_env_name,
|
||||||
|
conda_file=os.path.join(dependencies_dir, "XXXXX"),
|
||||||
|
image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
|
||||||
|
)
|
||||||
|
ml_client.environments.create_or_update(env_docker_image)
|
||||||
|
|
||||||
|
print(
|
||||||
|
f"Environment with name {env_docker_image.name} is registered to the workspace,",
|
||||||
|
f"the environment version is {env_docker_image.version}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
create_docker_environment()
|
azuremlpythonsdk-v2/initialize_constants.py (new file, 23 lines)
"""
|
||||||
|
Script to initialize global constants
|
||||||
|
"""
|
||||||
|
import os
|
||||||
|
|
||||||
|
# Global constants can be set via environmental variables
|
||||||
|
# Remove default values in production
|
||||||
|
AZURE_RESOURCE_GROUP = os.getenv("AZURE_RESOURCE_GROUP", "itvitae-azure-ml")
|
||||||
|
AZURE_SUBSCRIPTION_ID = os.getenv(
|
||||||
|
"AZURE_SUBSCRIPTION_ID", "34faeead-244d-4ae8-8194-1eeaaffaf5be"
|
||||||
|
)
|
||||||
|
AZURE_WORKSPACE_NAME = os.getenv(
|
||||||
|
"AZURE_WORKSPACE_NAME",
|
||||||
|
"ws-kevin-heimbach",
|
||||||
|
)
|
||||||
|
AZURE_LOCATION = os.getenv("AZURE_LOCATION", "westeurope")
|
||||||
|
# Choose names for your clusters
|
||||||
|
AML_COMPUTE_NAME = os.getenv("AML_COMPUTE_NAME", "aml-compute")
|
||||||
|
# General Servers Characteristics
|
||||||
|
VM_SIZE = os.getenv("VM_SIZE", "STANDARD_DS2_V2")
|
||||||
|
MIN_NODES = int(os.getenv("MIN_NODES", 0))
|
||||||
|
MAX_NODES = int(os.getenv("MAX_NODES", 1))
|
||||||
|
AGENT_COUNT = int(os.getenv("AGENT_COUNT", 2))
|
azuremlpythonsdk-v2/ml_client.py (new file, 46 lines)
"""
|
||||||
|
Script to initialize MLClient object
|
||||||
|
"""
|
||||||
|
from azure.ai.ml import MLClient
|
||||||
|
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
|
||||||
|
|
||||||
|
from initialize_constants import (
|
||||||
|
AZURE_RESOURCE_GROUP,
|
||||||
|
AZURE_SUBSCRIPTION_ID,
|
||||||
|
AZURE_WORKSPACE_NAME,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def create_or_load_ml_client():
|
||||||
|
"""Create or load an Azure ML Client based on env variables.
|
||||||
|
Args:
|
||||||
|
None since information is taken from global constants
|
||||||
|
defined in initialize_constants.py.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
A workspace and set quick load.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
credential = DefaultAzureCredential()
|
||||||
|
# Check if given credential can get token successfully.
|
||||||
|
credential.get_token("https://management.azure.com/.default")
|
||||||
|
except Exception as ex:
|
||||||
|
# Fall back to InteractiveBrowserCredential
|
||||||
|
# in case DefaultAzureCredential not working
|
||||||
|
print(ex)
|
||||||
|
credential = InteractiveBrowserCredential()
|
||||||
|
|
||||||
|
# Get a handle to the workspace.
|
||||||
|
# You can find the info on the workspace tab on ml.azure.com
|
||||||
|
ml_client = MLClient(
|
||||||
|
credential=credential,
|
||||||
|
subscription_id=XXXXX,
|
||||||
|
resource_group_name=XXXXX,
|
||||||
|
workspace_name=XXXXX,
|
||||||
|
)
|
||||||
|
return ml_client
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
ml_client = create_or_load_ml_client()
|
||||||
|
print(ml_client)
|
azuremlpythonsdk-v2/setup.cfg (new file, 37 lines)
[flake8]
ignore = E203, W503
max-line-length = 99
max-complexity = 18
select = B,C,E,F,W,T4

[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
ensure_newline_before_comments=True
line_length=99

[mypy]
files=refactor,tests
ignore_missing_imports=True

[coverage:run]
source = refactor

[coverage:report]
exclude_lines =
    # exclude pragma again
    pragma: no cover

    # exclude main
    if __name__ == .__main__.:

[coverage:html]
directory = coverage

[coverage:xml]
output = coverage.xml

[tool:pytest]
testpaths=tests/

flake.lock (new file, 23 lines)
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "lastModified": 1717196966,
        "narHash": "sha256-yZKhxVIKd2lsbOqYd5iDoUIwsRZFqE87smE2Vzf6Ck0=",
        "type": "tarball",
        "url": "https://flakehub.com/f/NixOS/nixpkgs/0.1.%2A.tar.gz"
      },
      "original": {
        "type": "tarball",
        "url": "https://flakehub.com/f/NixOS/nixpkgs/0.1.%2A.tar.gz"
      }
    },
    "root": {
      "inputs": {
        "nixpkgs": "nixpkgs"
      }
    }
  },
  "root": "root",
  "version": 7
}
flake.nix (new file, 46 lines)
{
  description = "A Nix-flake-based Jupyter development environment";

  inputs.nixpkgs.url = "https://flakehub.com/f/NixOS/nixpkgs/0.1.*.tar.gz";

  outputs = {
    self,
    nixpkgs,
  }: let
    supportedSystems = ["x86_64-linux" "aarch64-linux" "x86_64-darwin" "aarch64-darwin"];
    forEachSupportedSystem = f:
      nixpkgs.lib.genAttrs supportedSystems (system:
        f {
          pkgs = import nixpkgs {inherit system;};
        });
  in {
    devShells = forEachSupportedSystem ({pkgs}: {
      default = pkgs.mkShell {
        venvDir = "venv";
        packages = with pkgs;
          [python311 virtualenv]
          ++ (with pkgs.python311Packages; [
            pip
            python-lsp-server
            venvShellHook
            requests
            jupyter
            pandas
            numpy
            matplotlib
            mlflow
            seaborn
            scikit-learn
            plotnine
            arrow
            polars
            pyarrow
            ydata-profiling
            pydot
            graphviz
            (python311.pkgs.callPackage ./pkgs/azureml-mlflow/default.nix {})
          ]);
      };
    });
  };
}
pkgs/azureml-mlflow/default.nix (new file, 33 lines)
{
  lib,
  buildPythonPackage,
  fetchPypi,
  setuptools,
  python311,
}:
buildPythonPackage rec {
  pname = "azureml_mlflow";
  version = "1.57.0.post1";
  format = "wheel";

  src = fetchPypi {
    inherit pname version format;
    sha256 = "sha256-uK7vQR9aQjXUQ9RXGXY5o7pPMg5ZmMfqbDt0GTfwx6k=";
    dist = "py3";
    python = "py3";
  };

  nativeBuildInputs = [setuptools];

  propagatedBuildInputs = [];

  doCheck = false; # Package does not contain tests

  meta = with lib; {
    description = "Integration code of AzureML with MLflow; lets MLflow metrics and artifacts be logged to an Azure Machine Learning workspace";
    homepage = "https://docs.microsoft.com/python/api/overview/azure/ml/?view=azure-ml-py";
    license = licenses.mit;
    maintainers = with maintainers; [Lillian-Violet];
  };
}
solution-v2/README.md (new file, 124 lines)
# Azure ML Lesson 2 Lab

## 1. Set environmental variables

1. Run VS Code in an Azure ML remote instance as shown before.
2. Press `File > Open Folder` and navigate to `azuremlpythonsdk-v2/` to open the exercise.

**IMPORTANT:** Relative paths are assumed to be initialized from the `azuremlpythonsdk-v2` folder.

Open the file `initialize_constants.py`; there are three variables that should be updated:

- AZURE_WORKSPACE_NAME
- AZURE_RESOURCE_GROUP
- AZURE_SUBSCRIPTION_ID

Open your workspace at `https://ml.azure.com`. At the top right, select the workspace name, then copy the workspace name, the subscription id and the resource group name.

## 2. Load a workspace

Open the file `ml_client.py` and understand how an ML client object is loaded or created. In this lab, the workspace was already created. Just fill in the names of the variables from `initialize_constants.py`.

When finished, run this file and check that it executes without errors.

## 3. Load a Compute Cluster

Open the file `compute_aml.py` and understand how a compute cluster is loaded or created. In this lab, the compute cluster was already created, but some variables, marked with `XXXX`, should be added.

When finished, run this file and check that it executes without errors.

What would happen if the compute cluster were not present?

## 4. Create a tabular dataset

Open the file `data_tabular.py`; several gaps, marked with `XXXX`, should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. How can you get the names of the datasets already registered in `if name_dataset not in [XXXXX for env in ml_client.data.list()]`?

   Hint: try to get one object from the class [Data](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.data?view=azure-python) and check its attributes.

3. Which should be the `path` parameter in `path=XXXXX`?

4. Which input should you give in `ml_client.data.create_or_update(XXXXX)`?

When finished, run this file and check that it executes without errors.

## 5. Create and register an environment

Open the file `environment.py`; several gaps, marked with `XXXX`, should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. Get a list of environments already registered and modify the following:

   `env_list = XXXXX`

   Hint: look into previous files.

3. Which class should be used to register the environment?

   Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?tabs=python).

When finished, run this file and check that it executes without errors.

## 6. Train a model from a tabular dataset using a remote compute

Open the file `azml_01_experiment_remote_compute.py`; several gaps, marked with `XXXX`, should be filled:

1. `ml_client = XXXX()`

   Hint: look into previous files.

2. Complete the `latest_version_dataset` definition.

   Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day#deploy-the-model-to-the-endpoint).

3. Complete the `Input` part.

   Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python).

4. Complete the `command` part.

   Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python).

When finished, run this file and check that it executes without errors.

## 7. Tune hyperparameters using a remote compute

Open the file `azml_02_hyperparameters_tuning.py`; several gaps, marked with `XXXX`, should be filled. The hyperparameter search should be defined over the following space:

- learning_rate: one of the values 0.01, 0.1, 1.0
- n_estimators: one of the values 10, 100

Hint: use the previous file as a template.

Hint: for the `Hyperdrive settings` format, look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline).

Open the file `diabetes_hyperdrive/diabetes_training.py`; several gaps, marked with `XXXX`, should be filled. A Gradient Boosting classification model should be trained, and the AUC and the accuracy on the test set should be computed.

Hint: use the file `diabetes_training/diabetes_training.py` as a template.

When finished, run this file and check that it executes without errors.

## 8. Create a real-time inferencing service

Open the file `azml_03_realtime_inference.py`; several gaps, marked with `XXXX`, should be filled.

Hint: take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models?tabs=fromjob%2Cmir%2Csdk).

When finished, run this file and check that it executes without errors.

## 9. Test the inference service

Open the file `azml_04_test_inference.py`; several gaps, marked with `XXXX`, should be filled.

Hint: check [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-safely-rollout-online-endpoints?view=azureml-api-2&tabs=python).
solution-v2/__pycache__/compute_aml.cpython-312.pyc (binary, new file; not shown)
solution-v2/__pycache__/compute_aml.cpython-38.pyc (binary, new file; not shown)
solution-v2/__pycache__/data_tabular.cpython-312.pyc (binary, new file; not shown)
solution-v2/__pycache__/data_tabular.cpython-38.pyc (binary, new file; not shown)
solution-v2/__pycache__/environment.cpython-312.pyc (binary, new file; not shown)
solution-v2/__pycache__/environment.cpython-38.pyc (binary, new file; not shown)
solution-v2/__pycache__/initialize_constants.cpython-312.pyc (binary, new file; not shown)
solution-v2/__pycache__/initialize_constants.cpython-38.pyc (binary, new file; not shown)
solution-v2/__pycache__/ml_client.cpython-312.pyc (binary, new file; not shown)
solution-v2/__pycache__/ml_client.cpython-38.pyc (binary, new file; not shown)
solution-v2/azml_01_experiment_remote_compute.py (new file, 70 lines)
"""
|
||||||
|
Script to train a model from a tabular dataset using a remote compute
|
||||||
|
Based on:
|
||||||
|
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
|
||||||
|
"""
|
||||||
|
from azure.ai.ml import Input, command
|
||||||
|
from azure.ai.ml.constants import AssetTypes
|
||||||
|
|
||||||
|
from compute_aml import create_or_load_aml
|
||||||
|
from data_tabular import create_tabular_dataset, name_dataset
|
||||||
|
from environment import custom_env_name
|
||||||
|
from initialize_constants import AML_COMPUTE_NAME
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
experiment_name = "mslearn-train-diabetes"
|
||||||
|
experiment_folder = "./diabetes_training"
|
||||||
|
script_name = "diabetes_training.py"
|
||||||
|
registered_model_name = "diabetes_model"
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
# 1. Create or Load a ML client
|
||||||
|
ml_client = create_or_load_ml_client()
|
||||||
|
|
||||||
|
# 2. Create compute resources
|
||||||
|
create_or_load_aml()
|
||||||
|
|
||||||
|
# 3. Create and register a File Dataset
|
||||||
|
create_tabular_dataset()
|
||||||
|
latest_version_dataset = next(
|
||||||
|
dataset.latest_version
|
||||||
|
for dataset in ml_client.data.list()
|
||||||
|
if dataset.name == name_dataset
|
||||||
|
)
|
||||||
|
print(list(ml_client.data.list()))
|
||||||
|
# 4. Run Job
|
||||||
|
job = command(
|
||||||
|
inputs=dict(
|
||||||
|
script_name=script_name,
|
||||||
|
data=Input(
|
||||||
|
type=AssetTypes.URI_FILE,
|
||||||
|
# @latest doesn't work with dataset paths
|
||||||
|
path=f"azureml:{name_dataset}:{latest_version_dataset}",
|
||||||
|
),
|
||||||
|
registered_model_name=registered_model_name,
|
||||||
|
),
|
||||||
|
code=experiment_folder,
|
||||||
|
command=(
|
||||||
|
"python ${{inputs.script_name}}"
|
||||||
|
+ " --data ${{inputs.data}}"
|
||||||
|
+ " --registered_model_name ${{inputs.registered_model_name}}"
|
||||||
|
),
|
||||||
|
environment=f"{custom_env_name}@latest",
|
||||||
|
compute=AML_COMPUTE_NAME,
|
||||||
|
experiment_name=experiment_name,
|
||||||
|
display_name=experiment_name,
|
||||||
|
)
|
||||||
|
|
||||||
|
# submit the command
|
||||||
|
returned_job = ml_client.jobs.create_or_update(job)
|
||||||
|
|
||||||
|
# stream the output and wait until the job is finished
|
||||||
|
ml_client.jobs.stream(returned_job.name)
|
||||||
|
|
||||||
|
# refresh the latest status of the job after streaming
|
||||||
|
returned_job = ml_client.jobs.get(name=returned_job.name)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
solution-v2/azml_02_hyperparameters_tuning.py (new file, 115 lines)
"""
|
||||||
|
Script to train tune hyperparameters
|
||||||
|
Based on:
|
||||||
|
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
|
||||||
|
"""
|
||||||
|
from azure.ai.ml import Input, command
|
||||||
|
from azure.ai.ml.constants import AssetTypes
|
||||||
|
from azure.ai.ml.entities import Model
|
||||||
|
from azure.ai.ml.sweep import Choice
|
||||||
|
|
||||||
|
from compute_aml import create_or_load_aml
|
||||||
|
from data_tabular import create_tabular_dataset, name_dataset
|
||||||
|
from environment import create_docker_environment, custom_env_name
|
||||||
|
from initialize_constants import AML_COMPUTE_NAME
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
experiment_folder = "diabetes_hyperdrive"
|
||||||
|
experiment_name = "mslearn-diabetes-hyperdrive"
|
||||||
|
script_name = "diabetes_training.py"
|
||||||
|
registered_model_name = "diabetes_model_hyper"
|
||||||
|
best_model_name = "best_diabetes_model"
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
# 1. Create or Load a ML client
|
||||||
|
ml_client = create_or_load_ml_client()
|
||||||
|
|
||||||
|
# 2. Create compute resources
|
||||||
|
create_or_load_aml()
|
||||||
|
|
||||||
|
# 3. Create and register a File Dataset
|
||||||
|
create_tabular_dataset()
|
||||||
|
latest_version_dataset = max(
|
||||||
|
[int(d.version) for d in ml_client.data.list(name=name_dataset)]
|
||||||
|
)
|
||||||
|
|
||||||
|
# 4. Environment
|
||||||
|
environment_names = [env.name for env in ml_client.environments.list()]
|
||||||
|
if custom_env_name not in environment_names:
|
||||||
|
create_docker_environment()
|
||||||
|
|
||||||
|
# 5. Run Job
|
||||||
|
job_for_sweep = command(
|
||||||
|
inputs=dict(
|
||||||
|
script_name=script_name,
|
||||||
|
data=Input(
|
||||||
|
type=AssetTypes.URI_FILE,
|
||||||
|
# @latest doesn't work with dataset paths
|
||||||
|
path=f"azureml:{name_dataset}:{latest_version_dataset}",
|
||||||
|
),
|
||||||
|
registered_model_name=registered_model_name,
|
||||||
|
learning_rate=Choice(values=[0.01, 0.1, 1.0]),
|
||||||
|
n_estimators=Choice(values=[10, 100]),
|
||||||
|
),
|
||||||
|
code=experiment_folder,
|
||||||
|
command=(
|
||||||
|
"python ${{inputs.script_name}}"
|
||||||
|
+ " --data ${{inputs.data}}"
|
||||||
|
+ " --registered_model_name ${{inputs.registered_model_name}}"
|
||||||
|
+ " --learning_rate ${{inputs.learning_rate}}"
|
||||||
|
+ " --n_estimators ${{inputs.n_estimators}}"
|
||||||
|
),
|
||||||
|
environment=f"{custom_env_name}@latest",
|
||||||
|
compute=AML_COMPUTE_NAME,
|
||||||
|
experiment_name=experiment_name,
|
||||||
|
display_name=experiment_name,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Configure hyperdrive settings
|
||||||
|
sweep_job = job_for_sweep.sweep(
|
||||||
|
compute=AML_COMPUTE_NAME,
|
||||||
|
sampling_algorithm="grid",
|
||||||
|
primary_metric="AUC",
|
||||||
|
goal="Maximize",
|
||||||
|
max_total_trials=6,
|
||||||
|
max_concurrent_trials=2,
|
||||||
|
)
|
||||||
|
|
||||||
|
# submit the command
|
||||||
|
returned_sweep_job = ml_client.create_or_update(sweep_job)
|
||||||
|
|
||||||
|
# stream the output and wait until the job is finished
|
||||||
|
ml_client.jobs.stream(returned_sweep_job.name)
|
||||||
|
|
||||||
|
# refresh the latest status of the job after streaming
|
||||||
|
returned_sweep_job = ml_client.jobs.get(name=returned_sweep_job.name)
|
||||||
|
|
||||||
|
# Find and register the best model
|
||||||
|
if returned_sweep_job.status == "Completed":
|
||||||
|
# First let us get the run which gave us the best result
|
||||||
|
best_run = returned_sweep_job.properties["best_child_run_id"]
|
||||||
|
|
||||||
|
# lets get the model from this run
|
||||||
|
model = Model(
|
||||||
|
# the script stores the model as the given name
|
||||||
|
path=(
|
||||||
|
f"azureml://jobs/{best_run}/outputs/artifacts/paths/"
|
||||||
|
+ f"{registered_model_name}/"
|
||||||
|
),
|
||||||
|
name=best_model_name,
|
||||||
|
type="mlflow_model",
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
print(
|
||||||
|
f"Sweep job status: {returned_sweep_job.status}. \
|
||||||
|
Please wait until it completes"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Register best model
|
||||||
|
print(f"Registering Model {best_model_name}")
|
||||||
|
ml_client.models.create_or_update(model=model)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
solution-v2/azml_03_realtime_inference.py (new file, 51 lines)
"""
|
||||||
|
Script to create a real-time inferencing service
|
||||||
|
Based on:
|
||||||
|
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models
|
||||||
|
"""
|
||||||
|
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
|
||||||
|
|
||||||
|
from azml_02_hyperparameters_tuning import best_model_name
|
||||||
|
from initialize_constants import AZURE_WORKSPACE_NAME, VM_SIZE
|
||||||
|
from ml_client import create_or_load_ml_client
|
||||||
|
|
||||||
|
online_endpoint_name = ("srv-" + AZURE_WORKSPACE_NAME).lower()
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
# 1. Create or Load a ML client
|
||||||
|
ml_client = create_or_load_ml_client()
|
||||||
|
|
||||||
|
# 2. Create a endpoint
|
||||||
|
print(f"Creating endpoint {online_endpoint_name}")
|
||||||
|
endpoint = ManagedOnlineEndpoint(
|
||||||
|
name=online_endpoint_name,
|
||||||
|
auth_mode="key",
|
||||||
|
)
|
||||||
|
|
||||||
|
# Method `result()` should be added to wait until completion
|
||||||
|
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
|
||||||
|
|
||||||
|
# 3. Create a deployment
|
||||||
|
best_model_latest_version = max(
|
||||||
|
[int(m.version) for m in ml_client.models.list(name=best_model_name)]
|
||||||
|
)
|
||||||
|
|
||||||
|
blue_deployment = ManagedOnlineDeployment(
|
||||||
|
name=online_endpoint_name,
|
||||||
|
endpoint_name=online_endpoint_name,
|
||||||
|
# @latest doesn't work with model paths
|
||||||
|
model=f"azureml:{best_model_name}:{best_model_latest_version}",
|
||||||
|
instance_type=VM_SIZE,
|
||||||
|
instance_count=1,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assign all the traffic to this endpoint
|
||||||
|
# Method `result()` should be added to wait until completion
|
||||||
|
ml_client.begin_create_or_update(blue_deployment).result()
|
||||||
|
endpoint.traffic = {online_endpoint_name: 100}
|
||||||
|
ml_client.begin_create_or_update(endpoint).result()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
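Managed online endpoints bill while they are up, so it is worth tearing the service down once you are done testing. A minimal cleanup sketch, assuming the same `ml_client` and `online_endpoint_name` as above:

```python
# Sketch: delete the endpoint (and its deployments) to stop incurring costs.
ml_client.online_endpoints.begin_delete(name=online_endpoint_name).result()
```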
23  solution-v2/azml_04_test_inference.py  Normal file
@@ -0,0 +1,23 @@
"""
Script to use real-time inferencing with online endpoints
"""
from azml_03_realtime_inference import online_endpoint_name
from ml_client import create_or_load_ml_client


def main():
    # 1. Create or load an ML client
    ml_client = create_or_load_ml_client()

    # 2. Get predictions
    output = ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        deployment_name=online_endpoint_name,
        request_file="./diabetes_test_inference/request.json",
    )

    print(output)


if __name__ == "__main__":
    main()
63  solution-v2/compute_aml.py  Normal file
@@ -0,0 +1,63 @@
"""
Script to initialize an Azure Machine Learning compute cluster (aml)
"""
from azure.ai.ml.entities import AmlCompute

from initialize_constants import AML_COMPUTE_NAME, MAX_NODES, MIN_NODES, VM_SIZE
from ml_client import create_or_load_ml_client


def create_or_load_aml(
    cpu_compute_target=AML_COMPUTE_NAME,
    vm_size=VM_SIZE,
    min_nodes=MIN_NODES,
    max_nodes=MAX_NODES,
):
    """Create or load an Azure Machine Learning compute cluster (aml) in a
    given Workspace.

    Args:
        cpu_compute_target: Name of the compute resource.
        vm_size: Virtual machine size, VM_SIZE is used as default,
            for example STANDARD_D2_V2. Set to STANDARD_NC6 to get a GPU.
        min_nodes: Minimum number of nodes, MIN_NODES is used as default.
        max_nodes: Maximum number of nodes, MAX_NODES is used as default.

    Returns:
        An AmlCompute object.
    """
    # Create or load a workspace client
    ml_client = create_or_load_ml_client()
    try:
        # Let's see if the compute target already exists
        cpu_cluster = ml_client.compute.get(cpu_compute_target)
        print(
            f"You already have a cluster named {cpu_compute_target},",
            "we'll reuse it.",
        )
    except Exception:
        print("Creating a new cpu compute target...")
        cpu_cluster = AmlCompute(
            name=cpu_compute_target,
            # Azure ML Compute is the on-demand VM service
            type="amlcompute",
            # VM family
            size=vm_size,
            # Minimum running nodes when there is no job running
            min_instances=min_nodes,
            # Maximum nodes in the cluster
            max_instances=max_nodes,
            # How many seconds the node keeps running after the job terminates
            idle_time_before_scale_down=180,
            # Dedicated or LowPriority.
            # The latter is cheaper but there is a chance of job termination
            tier="Dedicated",
        )

        # Now, we pass the object to MLClient's create_or_update method;
        # `result()` waits until provisioning completes and returns the cluster
        cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster).result()

    return cpu_cluster


if __name__ == "__main__":
    create_or_load_aml()
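When the lab is over, the cluster can be removed so its quota is freed. A deletion sketch under the same assumptions (an `ml_client` from `create_or_load_ml_client()`):

```python
# Sketch: tear down the compute cluster once it is no longer needed.
ml_client.compute.begin_delete(name=AML_COMPUTE_NAME).result()
```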
10001  solution-v2/data/diabetes.csv  Normal file
File diff suppressed because it is too large
31  solution-v2/data_tabular.py  Normal file
@@ -0,0 +1,31 @@
"""
Script to create and register a file as a URI data asset
"""
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

from ml_client import create_or_load_ml_client

name_dataset = "diabetes-dataset"
data_folder = "./data/diabetes.csv"


def create_tabular_dataset():
    # 1. Create or load an ML client
    ml_client = create_or_load_ml_client()

    # 2. Add files
    if name_dataset not in [dataset.name for dataset in ml_client.data.list()]:
        tab_data_set = Data(
            path=data_folder,
            type=AssetTypes.URI_FILE,
            name=name_dataset,
        )

        ml_client.data.create_or_update(tab_data_set)
    else:
        print("Dataset already registered.")


if __name__ == "__main__":
    create_tabular_dataset()
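Once registered, the asset can be referenced by name in a job input rather than by a local path. A sketch of the consuming side, assuming the `azure-ai-ml` package and the asset name used above:

```python
# Sketch: reference the registered data asset as a job input.
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

data_input = Input(
    type=AssetTypes.URI_FILE,
    path="azureml:diabetes-dataset@latest",  # or pin a version, e.g. ...:1
)
```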
11  solution-v2/dependencies/conda.yml  Normal file
@@ -0,0 +1,11 @@
name: model-env
dependencies:
  - python=3.8
  - scikit-learn
  - pandas
  - numpy
  - matplotlib
  - pip
  - pip:
      - mlflow
      - azureml-mlflow
123  solution-v2/diabetes_hyperdrive/diabetes_training.py  Normal file
@@ -0,0 +1,123 @@
# Import libraries
import argparse
import os

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def main():
    """Main function of the script."""

    # Input and output arguments

    # Get script arguments
    parser = argparse.ArgumentParser()

    # Input dataset
    parser.add_argument(
        "--data",
        type=str,
        help="path to input data",
    )

    # Model name
    parser.add_argument("--registered_model_name", type=str, help="model name")

    # Hyperparameters
    parser.add_argument(
        "--learning_rate",
        type=float,
        dest="learning_rate",
        default=0.1,
        help="learning rate",
    )
    parser.add_argument(
        "--n_estimators",
        type=int,
        dest="n_estimators",
        default=100,
        help="number of estimators",
    )

    # Add arguments to args collection
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start logging
    mlflow.start_run()

    # Enable autologging
    mlflow.sklearn.autolog()

    # Load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # Train a Gradient Boosting classification model
    # with the specified hyperparameters
    print("Training a classification model")
    model = GradientBoostingClassifier(
        learning_rate=args.learning_rate, n_estimators=args.n_estimators
    ).fit(X_train, y_train)

    # Calculate accuracy
    y_hat = model.predict(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # Calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # Register the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Save the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
4  solution-v2/diabetes_test_inference/request.json  Normal file
@@ -0,0 +1,4 @@
{"input_data": [
    [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22],
    [0, 148, 58, 11, 179, 39.19207553, 0.160829008, 45]
]}
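Each inner list carries the eight feature values in the order the training scripts use (Pregnancies through Age). If you prefer not to hand-edit the file, it can be generated; a sketch using only the standard library, with the two sample rows above:

```python
# Sketch: write the request payload programmatically.
import json

payload = {
    "input_data": [
        [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22],
        [0, 148, 58, 11, 179, 39.19207553, 0.160829008, 45],
    ]
}
with open("./diabetes_test_inference/request.json", "w") as f:
    json.dump(payload, f)
```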
115  solution-v2/diabetes_training/diabetes_training.py  Normal file
@@ -0,0 +1,115 @@
# Import libraries
import argparse
import os

import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def main():
    """Main function of the script."""

    # Input and output arguments
    # Get script arguments
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data",
        type=str,
        help="path to input data",
    )
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start logging
    mlflow.start_run()

    # Enable autologging
    mlflow.sklearn.autolog()

    # Load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    mlflow.log_metric("num_samples", diabetes.shape[0])
    mlflow.log_metric("num_features", diabetes.shape[1] - 1)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # Train a decision tree model
    print("Training a decision tree model")
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Calculate accuracy
    y_hat = model.predict(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # Calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # Plot ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:, 1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], "k--")
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("ROC Curve")
    fig.savefig("ROC.png")
    mlflow.log_artifact("ROC.png")
    plt.show()

    # Register the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Save the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
33  solution-v2/environment.py  Normal file
@@ -0,0 +1,33 @@
"""
Script to create and register an environment including scikit-learn
"""
import os

from azure.ai.ml.entities import Environment

from ml_client import create_or_load_ml_client

dependencies_dir = "./dependencies"
custom_env_name = "custom-scikit-learn"


def create_docker_environment():
    # 1. Create or load an ML client
    ml_client = create_or_load_ml_client()

    # 2. Create a Python environment for the experiment
    env_docker_image = Environment(
        name=custom_env_name,
        conda_file=os.path.join(dependencies_dir, "conda.yml"),
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
    )
    # The returned object carries the version assigned at registration
    env_docker_image = ml_client.environments.create_or_update(env_docker_image)

    print(
        f"Environment with name {env_docker_image.name} is registered to the workspace,",
        f"the environment version is {env_docker_image.version}",
    )


if __name__ == "__main__":
    create_docker_environment()
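To double-check what ended up in the workspace, the registered versions can be listed. A sketch using the same `ml_client` and `custom_env_name`:

```python
# Sketch: list all registered versions of the custom environment.
for env in ml_client.environments.list(name=custom_env_name):
    print(env.name, env.version)
```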
23  solution-v2/initialize_constants.py  Normal file
@@ -0,0 +1,23 @@
"""
Script to initialize global constants
"""
import os

# Global constants can be set via environmental variables
# Remove default values in production
AZURE_RESOURCE_GROUP = os.getenv("AZURE_RESOURCE_GROUP", "itvitae-azure-ml")
AZURE_SUBSCRIPTION_ID = os.getenv(
    "AZURE_SUBSCRIPTION_ID", "34faeead-244d-4ae8-8194-1eeaaffaf5be"
)
AZURE_WORKSPACE_NAME = os.getenv(
    "AZURE_WORKSPACE_NAME",
    "ws-angelsevillacamins",
)
AZURE_LOCATION = os.getenv("AZURE_LOCATION", "westeurope")
# Choose names for your clusters
AML_COMPUTE_NAME = os.getenv("AML_COMPUTE_NAME", "aml-compute")
# General server characteristics
VM_SIZE = os.getenv("VM_SIZE", "STANDARD_DS2_V2")
MIN_NODES = int(os.getenv("MIN_NODES", 0))
MAX_NODES = int(os.getenv("MAX_NODES", 1))
AGENT_COUNT = int(os.getenv("AGENT_COUNT", 2))
46  solution-v2/ml_client.py  Normal file
@@ -0,0 +1,46 @@
"""
Script to initialize MLClient object
"""
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from initialize_constants import (
    AZURE_RESOURCE_GROUP,
    AZURE_SUBSCRIPTION_ID,
    AZURE_WORKSPACE_NAME,
)


def create_or_load_ml_client():
    """Create or load an Azure ML Client based on env variables.

    Args:
        None, since the information is taken from the global constants
        defined in initialize_constants.py.

    Returns:
        An MLClient object for the configured workspace.
    """
    try:
        credential = DefaultAzureCredential()
        # Check if the given credential can get a token successfully
        credential.get_token("https://management.azure.com/.default")
    except Exception as ex:
        # Fall back to InteractiveBrowserCredential
        # in case DefaultAzureCredential does not work
        print(ex)
        credential = InteractiveBrowserCredential()

    # Get a handle to the workspace.
    # You can find the info on the workspace tab on ml.azure.com
    ml_client = MLClient(
        credential=credential,
        subscription_id=AZURE_SUBSCRIPTION_ID,
        resource_group_name=AZURE_RESOURCE_GROUP,
        workspace_name=AZURE_WORKSPACE_NAME,
    )
    return ml_client


if __name__ == "__main__":
    ml_client = create_or_load_ml_client()
    print(ml_client)
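`DefaultAzureCredential` tries several mechanisms in order (environment variables, managed identity, the Azure CLI login, and so on). In a headless setup where that chain misbehaves, an explicit credential can be swapped in; a sketch assuming `az login` has already been run:

```python
# Sketch: use the Azure CLI login explicitly instead of the default chain.
from azure.identity import AzureCliCredential

credential = AzureCliCredential()
# ...then pass `credential` to MLClient as in create_or_load_ml_client()
```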
37  solution-v2/setup.cfg  Normal file
@@ -0,0 +1,37 @@
[flake8]
ignore = E203, W503
max-line-length = 99
max-complexity = 18
select = B,C,E,F,W,T4

[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
ensure_newline_before_comments=True
line_length=99

[mypy]
files=refactor,tests
ignore_missing_imports=True

[coverage:run]
source = refactor

[coverage:report]
exclude_lines =
    # exclude pragma again
    pragma: no cover

    # exclude main
    if __name__ == .__main__.:

[coverage:html]
directory = coverage

[coverage:xml]
output = coverage.xml

[tool:pytest]
testpaths=tests/
12  summary_outline.md  Normal file
@@ -0,0 +1,12 @@
# Azure ML 2

During this lesson you will learn the fundamentals of the Azure ML Python SDK, focusing on version 2 (the azure-ai-ml package). Azure ML is used in machine learning experiments to explore, prepare and manage not only data but also ML models. Additionally, cloud resources can be managed from the code itself (infrastructure as code, IaC), including monitoring and logging. Moreover, machine learning experiments and models can be organized using MLflow, which is incorporated in version 2 of the Python SDK. Finally, the SDK can deploy web services that turn your trained models into RESTful services.

The training includes theory and hands-on exercises. After this training you will have gained knowledge about:

- Fundamentals of Azure ML SDK v2
- Defining workspaces, compute targets, datasets and environments using IaC
- Azure ML best practices for model and data management
- MLflow
- Hyperparameter tuning
- Deploying models as online endpoints
- A lab session to get hands-on experience with these tools