Init and add all required packages

Lillian Violet 2024-09-04 10:15:43 +02:00
commit 782aba19ba
53 changed files with 21896 additions and 0 deletions

@@ -0,0 +1,118 @@
# Azure ML Lesson 2 Lab
## 1. Set environment variables
1. Run VS Code in an Azure ML remote instance as shown before.
2. Press `File > Open Folder` and navigate to `azuremlpythonsdk-v2/` to open the exercise.
**IMPORTANT** Relative paths are assumed to be initialized from the `azuremlpythonsdk-v2` folder.
Open the file `initialize_constants.py`; there are three variables that should be updated:
- AZURE_WORKSPACE_NAME
- AZURE_RESOURCE_GROUP
- AZURE_SUBSCRIPTION_ID
Open your workspace at `https://ml.azure.com`. At the top right, select the workspace name, then copy the workspace name, the subscription ID, and the resource group name.
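The three values are plain strings, for example (illustrative values only, yours will differ):

```python
# Illustrative values only; copy the real ones from ml.azure.com
AZURE_WORKSPACE_NAME = "ws-example"
AZURE_RESOURCE_GROUP = "rg-example"
AZURE_SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
```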
## 2. Load a workspace
Open the file `ml_client.py` and understand how an ML client object is loaded or created. In this lab, the workspace was already created. Fill in the names of the variables from `initialize_constants.py`.
When finished, run this file and check that it runs without errors.
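If you get stuck, the general shape of the client creation is sketched below (not the exact lab code; the constant names come from `initialize_constants.py`):

```python
# Sketch: create an MLClient for an existing workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

from initialize_constants import (
    AZURE_RESOURCE_GROUP,
    AZURE_SUBSCRIPTION_ID,
    AZURE_WORKSPACE_NAME,
)

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=AZURE_SUBSCRIPTION_ID,
    resource_group_name=AZURE_RESOURCE_GROUP,
    workspace_name=AZURE_WORKSPACE_NAME,
)
```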
## 3. Load a Compute Cluster
Open the file `compute_aml.py` and understand how a compute cluster is loaded or created. In this lab, the compute cluster was already created, but some variables, marked with `XXXX`, still need to be filled in.
When finished, run this file and check that it runs without errors.
What would happen if the compute cluster is not present?
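The usual pattern is get-or-create, sketched below with illustrative names (`ml_client` comes from the previous step); the `get` call raises if the cluster does not exist, and the `except` branch then creates one:

```python
# Sketch: load the compute cluster if it exists, otherwise create it
from azure.ai.ml.entities import AmlCompute

try:
    cpu_cluster = ml_client.compute.get("aml-compute")
except Exception:
    cpu_cluster = ml_client.compute.begin_create_or_update(
        AmlCompute(
            name="aml-compute",
            size="STANDARD_DS2_V2",
            min_instances=0,
            max_instances=1,
        )
    )
```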
## 4. Create a tabular dataset
Open the file `data_tabular.py`; several gaps, marked with `XXXX`, should be filled:
1. `ml_client = XXXXX()`
Hint: look into previous files.
2. How can you get the names of the datasets already registered in `if name_dataset not in [XXXXX for env in ml_client.data.list()]`?
Hint: Try to get one object from the class [Data](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.data?view=azure-python) and check its attributes.
3. What should the `path` parameter be in `path=XXXXX`?
4. What input should you pass in `ml_client.data.create_or_update(XXXXX)`?
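For orientation, registering a local file as a URI data asset follows this general pattern (a sketch with illustrative names, not the lab solution — map them to the variables already defined in `data_tabular.py`):

```python
# Sketch: register a local CSV as a URI file data asset
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

my_data = Data(
    path="./some/local/file.csv",  # illustrative path
    type=AssetTypes.URI_FILE,
    name="my-dataset",
)
ml_client.data.create_or_update(my_data)
```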
When finished, run this file and check that it runs without errors.
## 5. Create and register an environment
Open the file `environment.py`; several gaps, marked with `XXXX`, should be filled:
1. `ml_client = XXXXX()`
Hint: look into previous files.
2. Which class should be used to register the environment?
Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?tabs=python)
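The general pattern looks like this (a sketch with illustrative names; the conda file name is part of the exercise, so it is elided here):

```python
# Sketch: build and register an environment from a conda file
from azure.ai.ml.entities import Environment

env_docker_image = Environment(
    name="my-environment",
    conda_file="dependencies/<conda-file>.yml",  # fill in the real file name
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
)
ml_client.environments.create_or_update(env_docker_image)
```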
When finished, run this file and check that it runs without errors.
## 6. Train a model from a tabular dataset using a remote compute
Open the file `azml_01_experiment_remote_compute.py`; several gaps, marked with `XXXX`, should be filled:
1. `ml_client = XXXX()`
Hint: look into previous files.
2. Complete the `latest_version_dataset` definition.
Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day#deploy-the-model-to-the-endpoint)
3. Complete the `Input` part.
Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python)
4. Complete the `command` part.
Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python)
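Putting the hints together, a remote training job generally has this shape (a sketch with illustrative names; note the dataset version is pinned explicitly, since `@latest` doesn't work with dataset paths):

```python
# Sketch: a command job that consumes a registered data asset
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes

job = command(
    inputs=dict(
        data=Input(
            type=AssetTypes.URI_FILE,
            path="azureml:my-dataset:1",  # azureml:<name>:<version>
        ),
    ),
    code="./my_training_folder",
    command="python train.py --data ${{inputs.data}}",
    environment="my-environment@latest",
    compute="aml-compute",
)
ml_client.jobs.create_or_update(job)
```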
When finished, run this file and check that it runs without errors.
## 7. Tune hyperparameters using a remote compute
Open the file `azml_02_hyperparameters_tuning.py`; several gaps, marked with `XXXX`, should be filled. The hyperparameter search should be defined over the following space:
- `learning_rate`: one of the values 0.01, 0.1, 1.0
- `n_estimators`: one of the values 10, 100
Hint: Use the previous file as a template.
Hint: For the `Hyperdrive settings` format, look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline)
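The sweep wrapper generally looks like this (a sketch; `job` is a command job like the one from the previous exercise, extended with `learning_rate` and `n_estimators` inputs):

```python
# Sketch: wrap a command job in a grid-sampling sweep
from azure.ai.ml.sweep import Choice

job_for_sweep = job(
    learning_rate=Choice(values=[0.01, 0.1, 1.0]),
    n_estimators=Choice(values=[10, 100]),
)
sweep_job = job_for_sweep.sweep(
    compute="aml-compute",
    sampling_algorithm="grid",
    primary_metric="AUC",  # must match a metric logged by the script
    goal="Maximize",
    max_total_trials=6,
    max_concurrent_trials=2,
)
ml_client.create_or_update(sweep_job)
```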
Open the file `diabetes_hyperdrive/diabetes_training.py`; several gaps, marked with `XXXX`, should be filled. A Gradient Boosting classification model should be trained, and the AUC and accuracy on the test set should be computed.
Hint: Use as a template the file `data/diabetes_training.py`.
When finished, run this file and check that it runs without errors.
## 8. Create a real-time inferencing service
Open the file `azml_03_realtime_inference.py`; several gaps, marked with `XXXX`, should be filled.
Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models?tabs=fromjob%2Cmir%2Csdk)
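At a high level, a managed online endpoint plus a deployment is created like this (a sketch with illustrative names; the model version is pinned explicitly, since `@latest` doesn't work with model paths):

```python
# Sketch: create an endpoint, deploy a model, route all traffic to it
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(name="my-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-endpoint",
    model="azureml:my-model:1",  # azureml:<name>:<version>
    instance_type="Standard_DS2_v2",
    instance_count=1,
)
ml_client.begin_create_or_update(deployment).result()

endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()
```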
When finished, run this file and check that it runs without errors.
## 9. Test the inference service
Open the file `azml_04_test_inference.py`; several gaps, marked with `XXXX`, should be filled.
Hint: Check [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-safely-rollout-online-endpoints?view=azureml-api-2&tabs=python)
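Scoring against the deployed endpoint boils down to one call (a sketch with illustrative endpoint and deployment names; the request file is already provided in the repository):

```python
# Sketch: send a JSON request file to the deployed endpoint
output = ml_client.online_endpoints.invoke(
    endpoint_name="my-endpoint",
    deployment_name="blue",
    request_file="./diabetes_test_inference/request.json",
)
print(output)
```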

azml_01_experiment_remote_compute.py

@@ -0,0 +1,70 @@
"""
Script to train a model from a tabular dataset using a remote compute
Based on:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
"""
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes
from compute_aml import create_or_load_aml
from data_tabular import create_tabular_dataset, name_dataset
from environment import custom_env_name
from initialize_constants import AML_COMPUTE_NAME
from ml_client import create_or_load_ml_client
experiment_name = "mslearn-train-diabetes"
experiment_folder = "./diabetes_training"
script_name = "diabetes_training.py"
registered_model_name = "diabetes_model"
def main():
# 1. Create or Load a ML client
ml_client = XXXX()
# 2. Create compute resources
create_or_load_aml()
# 3. Create and register a File Dataset
create_tabular_dataset()
latest_version_dataset = next(
dataset.latest_version
for dataset in ml_client.data.XXXX
if dataset.name == name_dataset
)
# 4. Run Job
job = command(
inputs=dict(
script_name=script_name,
data=Input(
type=AssetTypes.URI_FILE,
# @latest doesn't work with dataset paths
path=XXXX,
),
registered_model_name=registered_model_name,
),
code=experiment_folder,
command=(
"python ${{inputs.script_name}}"
+ " --data XXXX"
+ " --registered_model_name XXXX"
),
environment=f"{custom_env_name}@latest",
compute=AML_COMPUTE_NAME,
experiment_name=experiment_name,
display_name=experiment_name,
)
# submit the command
returned_job = ml_client.jobs.create_or_update(job)
# stream the output and wait until the job is finished
ml_client.jobs.stream(returned_job.name)
# refresh the latest status of the job after streaming
returned_job = ml_client.jobs.get(name=returned_job.name)
if __name__ == "__main__":
main()

azml_02_hyperparameters_tuning.py

@@ -0,0 +1,113 @@
"""
Script to tune hyperparameters
Based on:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
"""
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model
from azure.ai.ml.sweep import Choice
from compute_aml import create_or_load_aml
from data_tabular import create_tabular_dataset, name_dataset
from environment import create_docker_environment, custom_env_name
from initialize_constants import AML_COMPUTE_NAME
from ml_client import create_or_load_ml_client
experiment_folder = "diabetes_hyperdrive"
experiment_name = "mslearn-diabetes-hyperdrive"
script_name = "diabetes_training.py"
registered_model_name = "diabetes_model_hyper"
best_model_name = "best_diabetes_model"
def main():
# 1. Create or Load a ML client
ml_client = XXXX()
# 2. Create compute resources
XXXX()
# 3. Create and register a File Dataset
XXXX()
latest_version_dataset = XXXX()
# 4. Environment
environment_names = [env.name for XXXX in ml_client.environments.list()]
if custom_env_name not in environment_names:
create_docker_environment()
# 5. Run Job
job_for_sweep = command(
inputs=dict(
script_name=script_name,
data=Input(
type=AssetTypes.URI_FILE,
# @latest doesn't work with dataset paths
path=f"azureml:{name_dataset}:{latest_version_dataset}",
),
registered_model_name=registered_model_name,
learning_rate=XXXX(values= XXXX),
n_estimators=XXXX(values=XXXX),
),
code=experiment_folder,
command=(
"python XXXX"
+ " --data XXXX"
+ " --registered_model_name XXXX"
+ " --learning_rate XXXX"
+ " --n_estimators XXXX"
),
environment=XXXX,
compute=AML_COMPUTE_NAME,
experiment_name=experiment_name,
display_name=experiment_name,
)
# Configure hyperdrive settings
sweep_job = job_for_sweep.XXXX(
compute=AML_COMPUTE_NAME,
sampling_algorithm="grid",
primary_metric="AUC",
goal="Maximize",
max_total_trials=6,
max_concurrent_trials=2,
)
# submit the command
returned_sweep_job = ml_client.create_or_update(sweep_job)
# stream the output and wait until the job is finished
ml_client.jobs.stream(returned_sweep_job.name)
# refresh the latest status of the job after streaming
returned_sweep_job = ml_client.jobs.get(name=returned_sweep_job.name)
# Find and register the best model
if returned_sweep_job.status == "Completed":
# First let us get the run which gave us the best result
best_run = returned_sweep_job.properties["best_child_run_id"]
# lets get the model from this run
model = Model(
# the script stores the model as the given name
path=(
f"azureml://jobs/{best_run}/outputs/artifacts/paths/"
+ f"{registered_model_name}/"
),
name=best_model_name,
type="mlflow_model",
)
# Register best model; this must happen inside the
# "Completed" branch, since `model` is only defined there
print(f"Registering Model {best_model_name}")
ml_client.models.XXXX(model=model)
else:
print(
f"Sweep job status: {returned_sweep_job.status}. \
Please wait until it completes"
)
if __name__ == "__main__":
main()

azml_03_realtime_inference.py

@@ -0,0 +1,49 @@
"""
Script to create a real-time inferencing service
Based on:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models
"""
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azml_02_hyperparameters_tuning import best_model_name
from initialize_constants import AZURE_WORKSPACE_NAME, VM_SIZE
from ml_client import create_or_load_ml_client
online_endpoint_name = ("srv-" + AZURE_WORKSPACE_NAME).lower()
def main():
# 1. Create or Load a ML client
ml_client = XXXX()
# 2. Create an endpoint
print(f"Creating endpoint {online_endpoint_name}")
endpoint = XXXX(
name=online_endpoint_name,
auth_mode="key",
)
# Method `result()` should be added to wait until completion
ml_client.online_endpoints.XXXX(endpoint).result()
# 3. Create a deployment
best_model_latest_version = XXXX
blue_deployment = XXXX(
name=online_endpoint_name,
endpoint_name=online_endpoint_name,
# @latest doesn't work with model paths
model=XXXX,
instance_type=VM_SIZE,
instance_count=1,
)
# Route all the traffic to this deployment
# Method `result()` should be added to wait until completion
ml_client.begin_create_or_update(blue_deployment).result()
endpoint.traffic = {online_endpoint_name: 100}
ml_client.begin_create_or_update(endpoint).result()
if __name__ == "__main__":
main()

azml_04_test_inference.py

@@ -0,0 +1,23 @@
"""
Script to use real-time inferencing with online endpoints
"""
from azml_03_realtime_inference import online_endpoint_name
from ml_client import create_or_load_ml_client
def main():
# 1. Create or Load a ML client
ml_client = XXXX()
# 2. Get predictions
output = ml_client.online_endpoints.XXXX(
endpoint_name=XXXX,
deployment_name=online_endpoint_name,
request_file="./diabetes_test_inference/request.json",
)
print(output)
if __name__ == "__main__":
main()

compute_aml.py

@@ -0,0 +1,63 @@
"""
Script to initialize an Azure Machine Learning compute cluster (aml)
"""
from azure.ai.ml.entities import AmlCompute
from initialize_constants import AML_COMPUTE_NAME, MAX_NODES, MIN_NODES, VM_SIZE
from ml_client import create_or_load_ml_client
def create_or_load_aml(
cpu_compute_target=AML_COMPUTE_NAME,
vm_size=VM_SIZE,
min_nodes=MIN_NODES,
max_nodes=MAX_NODES,
):
"""Create or load an Azure Machine Learning compute cluster (aml) in a
given Workspace.
Args:
cpu_compute_target: Name of the compute resource
vm_size: Virtual machine size, VM_SIZE is used as default,
for example STANDARD_D2_V2. Set to STANDARD_NC6 to get a GPU
min_nodes: Minimal number of nodes, MIN_NODES is used as default.
max_nodes: Maximal number of nodes, MAX_NODES is used as default.
Returns:
The created or loaded AmlCompute cluster.
"""
# Create or Load a Workspace
ml_client = create_or_load_ml_client()
try:
# let's see if the compute target already exists
cpu_cluster = ml_client.compute.get(XXXXX)
print(
f"You already have a cluster named {XXXXX},",
"we'll reuse it.",
)
except Exception:
print("Creating a new cpu compute target...")
cpu_cluster = AmlCompute(
name=cpu_compute_target,
# Azure ML Compute is the on-demand VM service
type="amlcompute",
# VM Family
size=vm_size,
# Minimum running nodes when there is no job running
min_instances=min_nodes,
# Nodes in cluster
max_instances=max_nodes,
# How many seconds the node will keep running after job termination
idle_time_before_scale_down=180,
# Dedicated or LowPriority.
# The latter is cheaper but there is a chance of job termination
tier="Dedicated",
)
# Now, we pass the object to MLClient's create_or_update method
cpu_cluster = ml_client.compute.begin_create_or_update(XXXXX)
return cpu_cluster
if __name__ == "__main__":
create_or_load_aml()

File diff suppressed because it is too large

data_tabular.py

@@ -0,0 +1,31 @@
"""
Script to create and register a file as a URI data asset
"""
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data
from ml_client import create_or_load_ml_client
name_dataset = "diabetes-dataset"
data_folder = "./data/diabetes.csv"
def create_tabular_dataset():
# 1. Create or Load a ML client
ml_client = XXXXX()
# 2. Add files
if name_dataset not in [XXXXX for env in ml_client.data.list()]:
tab_data_set = Data(
path=XXXXX,
type=AssetTypes.URI_FILE,
name=name_dataset,
)
ml_client.data.create_or_update(XXXXX)
else:
print("Dataset already registered.")
if __name__ == "__main__":
create_tabular_dataset()

@@ -0,0 +1,11 @@
name: model-env
dependencies:
- python=3.8
- scikit-learn
- pandas
- numpy
- matplotlib
- pip
- pip:
- mlflow
- azureml-mlflow

diabetes_hyperdrive/diabetes_training.py

@@ -0,0 +1,123 @@
# Import libraries
import argparse
import os
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
def main():
"""Main function of the script."""
# Input and output arguments
# Get script arguments
parser = XXXX()
# Input dataset
parser.add_argument(
"XXXX",
type=str,
help="path to input data",
)
# Model name
parser.add_argument("XXXX", type=str, help="model name")
# Hyperparameters
parser.add_argument(
"XXXX",
type=float,
dest="learning_rate",
default=0.1,
help="learning rate",
)
parser.add_argument(
"XXXX",
type=int,
dest="n_estimators",
default=100,
help="number of estimators",
)
# Add arguments to args collection
args = parser.parse_args()
print(" ".join(f"{k}={v}" for k, v in vars(args).items()))
# Start Logging
mlflow.XXXX()
# enable autologging
mlflow.XXXX()
# load the diabetes data (passed as an input dataset)
print("input data:", args.data)
diabetes = pd.read_csv(args.data)
# Separate features and labels
X, y = (
diabetes[
[
"Pregnancies",
"PlasmaGlucose",
"DiastolicBloodPressure",
"TricepsThickness",
"SerumInsulin",
"BMI",
"DiabetesPedigree",
"Age",
]
].values,
diabetes["Diabetic"].values,
)
# Split data into training set and test set
X_train, X_test, y_train, y_test = XXXX(
X, y, test_size=0.30, random_state=0
)
# Train a Gradient Boosting classification model
# with the specified hyperparameters
print("Training a classification model")
model = XXXX(
learning_rate=XXXX, n_estimators=XXXX
).fit(X_train, y_train)
# calculate accuracy
y_hat = model.XXXX(X_test)
accuracy = np.average(y_hat == y_test)
print("Accuracy:", accuracy)
mlflow.log_metric("Accuracy", float(accuracy))
# calculate AUC
y_scores = model.XXXX(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
print("AUC: " + str(auc))
mlflow.log_metric("AUC", float(auc))
# Registering the model to the workspace
print("Registering the model via MLFlow")
mlflow.XXXX(
sk_model=model,
registered_model_name=args.registered_model_name,
artifact_path=args.registered_model_name,
)
# Saving the model to a file
mlflow.sklearn.save_model(
sk_model=model,
path=os.path.join(args.registered_model_name, "trained_model"),
)
# Stop Logging
mlflow.XXXX()
if __name__ == "__main__":
main()

diabetes_test_inference/request.json

@@ -0,0 +1,4 @@
{"input_data": [
[2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22],
[0, 148, 58, 11, 179, 39.19207553, 0.160829008, 45]
]}

diabetes_training/diabetes_training.py

@@ -0,0 +1,115 @@
# Import libraries
import argparse
import os
import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
def main():
"""Main function of the script."""
# Input and output arguments
# Get script arguments
parser = argparse.ArgumentParser()
parser.add_argument(
"--data",
type=str,
help="path to input data",
)
parser.add_argument("--registered_model_name", type=str, help="model name")
args = parser.parse_args()
print(" ".join(f"{k}={v}" for k, v in vars(args).items()))
# Start Logging
mlflow.start_run()
# enable autologging
mlflow.sklearn.autolog()
# load the diabetes data (passed as an input dataset)
print("input data:", args.data)
diabetes = pd.read_csv(args.data)
mlflow.log_metric("num_samples", diabetes.shape[0])
mlflow.log_metric("num_features", diabetes.shape[1] - 1)
# Separate features and labels
X, y = (
diabetes[
[
"Pregnancies",
"PlasmaGlucose",
"DiastolicBloodPressure",
"TricepsThickness",
"SerumInsulin",
"BMI",
"DiabetesPedigree",
"Age",
]
].values,
diabetes["Diabetic"].values,
)
# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.30, random_state=0
)
# Train a decision tree model
print("Training a decision tree model")
model = DecisionTreeClassifier().fit(X_train, y_train)
# calculate accuracy
y_hat = model.predict(X_test)
accuracy = np.average(y_hat == y_test)
print("Accuracy:", accuracy)
mlflow.log_metric("Accuracy", float(accuracy))
# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
print("AUC: " + str(auc))
mlflow.log_metric("AUC", float(auc))
# plot ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores[:, 1])
fig = plt.figure(figsize=(6, 4))
# Plot the diagonal 50% line
plt.plot([0, 1], [0, 1], "k--")
# Plot the FPR and TPR achieved by our model
plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
fig.savefig("ROC.png")
mlflow.log_artifact("ROC.png")
plt.show()
# Registering the model to the workspace
print("Registering the model via MLFlow")
mlflow.sklearn.log_model(
sk_model=model,
registered_model_name=args.registered_model_name,
artifact_path=args.registered_model_name,
)
# Saving the model to a file
mlflow.sklearn.save_model(
sk_model=model,
path=os.path.join(args.registered_model_name, "trained_model"),
)
# Stop Logging
mlflow.end_run()
if __name__ == "__main__":
main()

environment.py

@@ -0,0 +1,33 @@
"""
Script to create and register an environment including scikit-learn
"""
import os
from azure.ai.ml.entities import Environment
from ml_client import create_or_load_ml_client
dependencies_dir = "./dependencies"
custom_env_name = "custom-scikit-learn"
def create_docker_environment():
# 1. Create or Load a ML client
ml_client = XXXXX()
# 2. Create a Python environment for the experiment
env_docker_image = XXXXX(
name=custom_env_name,
conda_file=os.path.join(dependencies_dir, "XXXXX"),
image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
)
ml_client.environments.create_or_update(env_docker_image)
print(
f"Environment with name {env_docker_image.name} is registered to the workspace,",
f"the environment version is {env_docker_image.version}"
)
if __name__ == "__main__":
create_docker_environment()

initialize_constants.py

@@ -0,0 +1,23 @@
"""
Script to initialize global constants
"""
import os
# Global constants can be set via environmental variables
# Remove default values in production
AZURE_RESOURCE_GROUP = os.getenv("AZURE_RESOURCE_GROUP", "itvitae-azure-ml")
AZURE_SUBSCRIPTION_ID = os.getenv(
"AZURE_SUBSCRIPTION_ID", "34faeead-244d-4ae8-8194-1eeaaffaf5be"
)
AZURE_WORKSPACE_NAME = os.getenv(
"AZURE_WORKSPACE_NAME",
"ws-kevin-heimbach",
)
AZURE_LOCATION = os.getenv("AZURE_LOCATION", "westeurope")
# Choose names for your clusters
AML_COMPUTE_NAME = os.getenv("AML_COMPUTE_NAME", "aml-compute")
# General Servers Characteristics
VM_SIZE = os.getenv("VM_SIZE", "STANDARD_DS2_V2")
MIN_NODES = int(os.getenv("MIN_NODES", 0))
MAX_NODES = int(os.getenv("MAX_NODES", 1))
AGENT_COUNT = int(os.getenv("AGENT_COUNT", 2))

ml_client.py

@@ -0,0 +1,46 @@
"""
Script to initialize MLClient object
"""
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from initialize_constants import (
AZURE_RESOURCE_GROUP,
AZURE_SUBSCRIPTION_ID,
AZURE_WORKSPACE_NAME,
)
def create_or_load_ml_client():
"""Create or load an Azure ML Client based on env variables.
Args:
None since information is taken from global constants
defined in initialize_constants.py.
Returns:
An MLClient handle to the workspace.
"""
try:
credential = DefaultAzureCredential()
# Check if given credential can get token successfully.
credential.get_token("https://management.azure.com/.default")
except Exception as ex:
# Fall back to InteractiveBrowserCredential
# in case DefaultAzureCredential not working
print(ex)
credential = InteractiveBrowserCredential()
# Get a handle to the workspace.
# You can find the info on the workspace tab on ml.azure.com
ml_client = MLClient(
credential=credential,
subscription_id=XXXXX,
resource_group_name=XXXXX,
workspace_name=XXXXX,
)
return ml_client
if __name__ == "__main__":
ml_client = create_or_load_ml_client()
print(ml_client)

@@ -0,0 +1,37 @@
[flake8]
ignore = E203, W503
max-line-length = 99
max-complexity = 18
select = B,C,E,F,W,T4
[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
ensure_newline_before_comments=True
line_length=99
[mypy]
files=refactor,tests
ignore_missing_imports=True
[coverage:run]
source = refactor
[coverage:report]
exclude_lines =
# exclude pragma again
pragma: no cover
# exclude main
if __name__ == .__main__.:
[coverage:html]
directory = coverage
[coverage:xml]
output = coverage.xml
[tool:pytest]
testpaths=tests/