Init and have all packages required

commit 782aba19ba

53 changed files with 21896 additions and 0 deletions
.envrc (new file, 1 line)
@@ -0,0 +1 @@
use flake
Azure_ML-2.pptx (new binary file)
Binary file not shown.
README.md (new file, 51 lines)
@@ -0,0 +1,51 @@
# Azure ML Lesson 2

## How to install all the tools in a nutshell

A host running **Ubuntu 22.04** is expected. If you have a Windows system or a Mac, download VirtualBox and set up a VM with Ubuntu 22.04, or (on Windows) use WSL2.

**Anaconda/Miniconda** must be installed. See [here](https://docs.anaconda.com/anaconda/install/) and [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html), respectively.

Run the following command to install the Azure CLI:

```bash
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
```

Configure the Azure CLI:

```bash
az login
```

Install the Azure ML CLI extension:

```bash
az extension add -n ml -y
```

Create a Conda environment to work with Azure in. At the moment there are [problems with Python 3.9](https://github.com/Azure/MachineLearningNotebooks/issues/1285), so use Python 3.12.

```bash
conda create --name azure_ml -y python=3.12 pip
conda activate azure_ml

# Install linting, formatting and additional libraries
pip install flake8 black isort joblib azure-ai-ml azure-identity
```

You now have a Conda environment called `azure_ml` containing the Azure ML SDK.
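As a quick sanity check (a minimal sketch, assuming the `azure_ml` environment is active), both SDK packages should import cleanly:

```python
# Minimal import check for the new environment; run with `azure_ml` active.
from azure.ai.ml import MLClient  # noqa: F401 -- imported only to verify install
from azure.identity import DefaultAzureCredential  # noqa: F401

print("azure-ai-ml and azure-identity are importable")
```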
Install Visual Studio Code as shown [here](https://code.visualstudio.com/download).

Once you've installed VS Code, configure its plugins and point it at the Python interpreter of the `azure_ml` environment:

- Run VS Code and install the vscode-icons, Python, Code Spell Checker and Azure Machine Learning extensions.
- Go into Azure Machine Learning and log in. Check that you have access to your workspace.
- Install the Flake8, Black Formatter and isort Microsoft extensions.
- Select a Python interpreter:
  - Python is an interpreted language; to run Python code and get IntelliSense, you must tell VS Code which interpreter to use.
  - From within VS Code, select a Python 3 interpreter by opening the Command Palette (Ctrl+Shift+P) and searching for `Python: Select Interpreter`...
  - ...then select the environment named `azure_ml`.
azuremlpythonsdk-v2/README.md (new file, 118 lines)
@@ -0,0 +1,118 @@
# Azure ML Lesson 2 Lab

## 1. Set environment variables

1. Run VS Code in an Azure ML remote instance as shown before.
2. Press `File > Open Folder` and navigate to `azuremlpythonsdk-v2/` to open the exercise.

**IMPORTANT**: Relative paths are assumed to start from the `azuremlpythonsdk-v2` folder.

Open the file `initialize_constants.py`; there are three variables that should be updated:

- AZURE_WORKSPACE_NAME

- AZURE_RESOURCE_GROUP

- AZURE_SUBSCRIPTION_ID

Open your workspace at `https://ml.azure.com`. At the top right, select the workspace name, then copy the workspace name, the subscription ID and the resource group name.

## 2. Load a workspace

Open the file `ml_client.py` and understand how an ML client object is loaded or created. In this lab, the workspace was already created; just fill in the variables from `initialize_constants.py`.

When finished, run this file and check that it executes without errors.
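For orientation, a minimal sketch of such a client in the v2 SDK looks as follows; the angle-bracket placeholders are assumptions, and the lab file fills them from `initialize_constants.py` instead:

```python
# Minimal sketch of creating an MLClient (Azure ML Python SDK v2).
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<AZURE_SUBSCRIPTION_ID>",
    resource_group_name="<AZURE_RESOURCE_GROUP>",
    workspace_name="<AZURE_WORKSPACE_NAME>",
)
print(ml_client.workspace_name)
```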
## 3. Load a Compute Cluster

Open the file `compute_aml.py` and understand how a compute cluster is loaded or created. In this lab, the compute cluster was already created, but some variables, marked with `XXXX`, should be added.

When finished, run this file and check that it executes without errors.

What would happen if the compute cluster is not present? (See the get-or-create sketch below.)
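In outline, the pattern the lab file follows is get-or-create (a hedged sketch; the cluster name and sizes echo the repo defaults, not required values):

```python
# Get-or-create sketch for a compute cluster; all names are assumptions.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace>"
)

try:
    cluster = ml_client.compute.get("aml-compute")  # reuse it if it exists
except Exception:
    cluster = ml_client.compute.begin_create_or_update(  # otherwise create it
        AmlCompute(
            name="aml-compute",
            size="STANDARD_DS2_V2",
            min_instances=0,
            max_instances=1,
        )
    )
```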
## 4. Create a tabular dataset

Open the file `data_tabular.py`; several gaps, marked with `XXXX`, should be filled (a generic registration sketch follows the questions):

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. How can you get the names of the datasets already registered in `if name_dataset not in [XXXXX for env in ml_client.data.list()]`?

   Hint: Try to get one object of the class [Data](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.data?view=azure-python) and check its attributes.

3. Which should be the `path` parameter in `path=XXXXX`?

4. Which input should you give in `ml_client.data.create_or_update(XXXXX)`?

When finished, run this file and check that it executes without errors.
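For context, registering a URI-file data asset generally has the following shape (a sketch with a hypothetical path and asset name, not the lab's answers; `ml_client` is the client from step 2):

```python
# Registering a local CSV file as a URI-file data asset (sketch).
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

my_data = Data(
    path="./data/some_file.csv",  # hypothetical local path
    type=AssetTypes.URI_FILE,
    name="some-dataset",          # hypothetical asset name
)
ml_client.data.create_or_update(my_data)
```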
## 5. Create and register an environment

Open the file `environment.py`; several gaps, marked with `XXXX`, should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. Which class should be used to register the environment?

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?tabs=python); a generic sketch also follows below.

When finished, run this file and check that it executes without errors.
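Generically, registering a conda-based environment on a curated base image looks like this (a sketch; the environment name is hypothetical, while the base image string is the one used in `environment.py`):

```python
# Registering an environment built from a conda file (sketch).
from azure.ai.ml.entities import Environment

env = Environment(
    name="my-custom-env",  # hypothetical name
    conda_file="./dependencies/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
)
ml_client.environments.create_or_update(env)
```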
## 6. Train a model from a tabular dataset using a remote compute

Open the file `azml_01_experiment_remote_compute.py`; several gaps, marked with `XXXX`, should be filled (a generic `command` job sketch follows the questions):

1. `ml_client = XXXX()`

   Hint: look into previous files.

2. Complete the `latest_version_dataset` definition.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day#deploy-the-model-to-the-endpoint)

3. Complete the `Input` part.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python)

4. Complete the `command` part.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python)

When finished, run this file and check that it executes without errors.
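For reference, a self-contained `command` job has this shape (a sketch; folder, script, asset, environment and compute names are assumptions):

```python
# Submitting a training script as a command job (sketch).
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes

job = command(
    code="./src",  # hypothetical folder containing train.py
    command="python train.py --data ${{inputs.data}}",
    inputs={
        "data": Input(type=AssetTypes.URI_FILE, path="azureml:some-dataset:1"),
    },
    environment="my-custom-env@latest",
    compute="aml-compute",
)
returned_job = ml_client.jobs.create_or_update(job)
ml_client.jobs.stream(returned_job.name)  # wait and stream logs
```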
## 7. Tune hyperparameters using a remote compute

Open the file `azml_02_hyperparameters_tuning.py`; several gaps, marked with `XXXX`, should be filled. The hyperparameter search should be defined over the following space:

- learning_rate: one of the values 0.01, 0.1, 1.0

- n_estimators: one of the values 10, 100

Hint: Use the previous file as a template.

Hint: For the `Hyperdrive settings` format, look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline); a sketch follows below.
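A minimal sketch of that discrete search space and the sweep configuration (the parameter names follow the lab; the sweep settings mirror those already present in `azml_02_hyperparameters_tuning.py`):

```python
# Discrete search space plus sweep settings for a command job (sketch).
from azure.ai.ml.sweep import Choice

# 'job' is a command job (as in the previous sketch) that declares
# learning_rate and n_estimators among its inputs.
job_for_sweep = job(
    learning_rate=Choice(values=[0.01, 0.1, 1.0]),
    n_estimators=Choice(values=[10, 100]),
)

sweep_job = job_for_sweep.sweep(
    compute="aml-compute",
    sampling_algorithm="grid",
    primary_metric="AUC",
    goal="Maximize",
    max_total_trials=6,
    max_concurrent_trials=2,
)
ml_client.create_or_update(sweep_job)
```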
Open the file `diabetes_hyperdrive/diabetes_training.py`; several gaps, marked with `XXXX`, should be filled. A Gradient Boosting classification model should be trained, and the AUC and the accuracy on the test set should be computed.

Hint: Use the file `diabetes_training/diabetes_training.py` as a template.

When finished, run this file and check that it executes without errors.

## 8. Create a real-time inferencing service

Open the file `azml_03_realtime_inference.py`; several gaps, marked with `XXXX`, should be filled.

Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models?tabs=fromjob%2Cmir%2Csdk); an endpoint-plus-deployment sketch follows below.

When finished, run this file and check that it executes without errors.
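Schematically, the endpoint and its deployment are created in two steps (a sketch; endpoint, deployment and model references are assumptions):

```python
# Managed online endpoint plus an MLflow-model deployment (sketch).
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(name="srv-my-workspace", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model="azureml:best_diabetes_model:1",  # hypothetical model version
    instance_type="STANDARD_DS2_V2",
    instance_count=1,
)
ml_client.begin_create_or_update(blue_deployment).result()

endpoint.traffic = {"blue": 100}  # route all traffic to this deployment
ml_client.begin_create_or_update(endpoint).result()
```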
## 9. Test the inference service

Open the file `azml_04_test_inference.py`; several gaps, marked with `XXXX`, should be filled.

Hint: Check [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-safely-rollout-online-endpoints?view=azureml-api-2&tabs=python); an invocation sketch follows.
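Invoking the deployed service then reduces to a single call (a sketch; the endpoint and deployment names are assumptions, while the request file path matches the repo):

```python
# Scoring an online endpoint with a JSON request file (sketch).
response = ml_client.online_endpoints.invoke(
    endpoint_name="srv-my-workspace",
    deployment_name="blue",
    request_file="./diabetes_test_inference/request.json",
)
print(response)
```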
azuremlpythonsdk-v2/azml_01_experiment_remote_compute.py (new file, 70 lines)
@@ -0,0 +1,70 @@
"""
 | 
			
		||||
    Script to train a model from a tabular dataset using a remote compute
 | 
			
		||||
    Based on:
 | 
			
		||||
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml import Input, command
 | 
			
		||||
from azure.ai.ml.constants import AssetTypes
 | 
			
		||||
 | 
			
		||||
from compute_aml import create_or_load_aml
 | 
			
		||||
from data_tabular import create_tabular_dataset, name_dataset
 | 
			
		||||
from environment import custom_env_name
 | 
			
		||||
from initialize_constants import AML_COMPUTE_NAME
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
experiment_name = "mslearn-train-diabetes"
 | 
			
		||||
experiment_folder = "./diabetes_training"
 | 
			
		||||
script_name = "diabetes_training.py"
 | 
			
		||||
registered_model_name = "diabetes_model"
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def main():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = XXXX()
 | 
			
		||||
 | 
			
		||||
    # 2. Create compute resources
 | 
			
		||||
    create_or_load_aml()
 | 
			
		||||
 | 
			
		||||
    # 3. Create and register a File Dataset
 | 
			
		||||
    create_tabular_dataset()
 | 
			
		||||
    latest_version_dataset = next(
 | 
			
		||||
        dataset.latest_version
 | 
			
		||||
        for dataset in ml_client.data.XXXX
 | 
			
		||||
        if dataset.name == name_dataset
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # 4. Run Job
 | 
			
		||||
    job = command(
 | 
			
		||||
        inputs=dict(
 | 
			
		||||
            script_name=script_name,
 | 
			
		||||
            data=Input(
 | 
			
		||||
                type=AssetTypes.URI_FILE,
 | 
			
		||||
                # @latest doesn't work with dataset paths
 | 
			
		||||
                path=XXXX,
 | 
			
		||||
            ),
 | 
			
		||||
            registered_model_name=registered_model_name,
 | 
			
		||||
        ),
 | 
			
		||||
        code=experiment_folder,
 | 
			
		||||
        command=(
 | 
			
		||||
            "python ${{inputs.script_name}}"
 | 
			
		||||
            + " --data XXXX"
 | 
			
		||||
            + " --registered_model_name XXXX"
 | 
			
		||||
        ),
 | 
			
		||||
        environment=f"{custom_env_name}@latest",
 | 
			
		||||
        compute=AML_COMPUTE_NAME,
 | 
			
		||||
        experiment_name=experiment_name,
 | 
			
		||||
        display_name=experiment_name,
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # submit the command
 | 
			
		||||
    returned_job = ml_client.jobs.create_or_update(job)
 | 
			
		||||
 | 
			
		||||
    # stream the output and wait until the job is finished
 | 
			
		||||
    ml_client.jobs.stream(returned_job.name)
 | 
			
		||||
 | 
			
		||||
    # refresh the latest status of the job after streaming
 | 
			
		||||
    returned_job = ml_client.jobs.get(name=returned_job.name)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    main()
 | 
			
		||||
azuremlpythonsdk-v2/azml_02_hyperparameters_tuning.py (new file, 113 lines)
@@ -0,0 +1,113 @@
"""
 | 
			
		||||
    Script to train tune hyperparameters
 | 
			
		||||
    Based on:
 | 
			
		||||
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml import Input, command
 | 
			
		||||
from azure.ai.ml.constants import AssetTypes
 | 
			
		||||
from azure.ai.ml.entities import Model
 | 
			
		||||
from azure.ai.ml.sweep import Choice
 | 
			
		||||
 | 
			
		||||
from compute_aml import create_or_load_aml
 | 
			
		||||
from data_tabular import create_tabular_dataset, name_dataset
 | 
			
		||||
from environment import create_docker_environment, custom_env_name
 | 
			
		||||
from initialize_constants import AML_COMPUTE_NAME
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
experiment_folder = "diabetes_hyperdrive"
 | 
			
		||||
experiment_name = "mslearn-diabetes-hyperdrive"
 | 
			
		||||
script_name = "diabetes_training.py"
 | 
			
		||||
registered_model_name = "diabetes_model_hyper"
 | 
			
		||||
best_model_name = "best_diabetes_model"
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def main():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = XXXX()
 | 
			
		||||
 | 
			
		||||
    # 2. Create compute resources
 | 
			
		||||
    XXXX()
 | 
			
		||||
 | 
			
		||||
    # 3. Create and register a File Dataset
 | 
			
		||||
    XXXX()
 | 
			
		||||
    latest_version_dataset =  XXXX()
 | 
			
		||||
 | 
			
		||||
    # 4. Environment
 | 
			
		||||
    environment_names = [env.name for XXXX in ml_client.environments.list()]
 | 
			
		||||
    if custom_env_name not in environment_names:
 | 
			
		||||
        create_docker_environment()
 | 
			
		||||
 | 
			
		||||
    # 5. Run Job
 | 
			
		||||
    job_for_sweep = command(
 | 
			
		||||
        inputs=dict(
 | 
			
		||||
            script_name=script_name,
 | 
			
		||||
            data=Input(
 | 
			
		||||
                type=AssetTypes.URI_FILE,
 | 
			
		||||
                # @latest doesn't work with dataset paths
 | 
			
		||||
                path=f"azureml:{name_dataset}:{latest_version_dataset}",
 | 
			
		||||
            ),
 | 
			
		||||
            registered_model_name=registered_model_name,
 | 
			
		||||
            learning_rate=XXXX(values= XXXX),
 | 
			
		||||
            n_estimators=XXXX(values=XXXX),
 | 
			
		||||
        ),
 | 
			
		||||
        code=experiment_folder,
 | 
			
		||||
        command=(
 | 
			
		||||
            "python XXXX"
 | 
			
		||||
            + " --data XXXX"
 | 
			
		||||
            + " --registered_model_name XXXX"
 | 
			
		||||
            + " --learning_rate XXXX"
 | 
			
		||||
            + " --n_estimators XXXX"
 | 
			
		||||
        ),
 | 
			
		||||
        environment=XXXX,
 | 
			
		||||
        compute=AML_COMPUTE_NAME,
 | 
			
		||||
        experiment_name=experiment_name,
 | 
			
		||||
        display_name=experiment_name,
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # Configure hyperdrive settings
 | 
			
		||||
    sweep_job = job_for_sweep.XXXX(
 | 
			
		||||
        compute=AML_COMPUTE_NAME,
 | 
			
		||||
        sampling_algorithm="grid",
 | 
			
		||||
        primary_metric="AUC",
 | 
			
		||||
        goal="Maximize",
 | 
			
		||||
        max_total_trials=6,
 | 
			
		||||
        max_concurrent_trials=2,
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # submit the command
 | 
			
		||||
    returned_sweep_job = ml_client.create_or_update(sweep_job)
 | 
			
		||||
 | 
			
		||||
    # stream the output and wait until the job is finished
 | 
			
		||||
    ml_client.jobs.stream(returned_sweep_job.name)
 | 
			
		||||
 | 
			
		||||
    # refresh the latest status of the job after streaming
 | 
			
		||||
    returned_sweep_job = ml_client.jobs.get(name=returned_sweep_job.name)
 | 
			
		||||
 | 
			
		||||
    # Find and register the best model
 | 
			
		||||
    if returned_sweep_job.status == "Completed":
 | 
			
		||||
        # First let us get the run which gave us the best result
 | 
			
		||||
        best_run = returned_sweep_job.properties["best_child_run_id"]
 | 
			
		||||
 | 
			
		||||
        # lets get the model from this run
 | 
			
		||||
        model = Model(
 | 
			
		||||
            # the script stores the model as the given name
 | 
			
		||||
            path=(
 | 
			
		||||
                f"azureml://jobs/{best_run}/outputs/artifacts/paths/"
 | 
			
		||||
                + f"{registered_model_name}/"
 | 
			
		||||
            ),
 | 
			
		||||
            name=best_model_name,
 | 
			
		||||
            type="mlflow_model",
 | 
			
		||||
        )
 | 
			
		||||
    else:
 | 
			
		||||
        print(
 | 
			
		||||
            f"Sweep job status: {returned_sweep_job.status}. \
 | 
			
		||||
                Please wait until it completes"
 | 
			
		||||
        )
 | 
			
		||||
 | 
			
		||||
    # Register best model
 | 
			
		||||
    print(f"Registering Model {best_model_name}")
 | 
			
		||||
    ml_client.models.XXXX(model=model)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    main()
 | 
			
		||||
azuremlpythonsdk-v2/azml_03_realtime_inference.py (new file, 49 lines)
@@ -0,0 +1,49 @@
"""
 | 
			
		||||
    Script to create a real-time inferencing service
 | 
			
		||||
    Based on:
 | 
			
		||||
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
 | 
			
		||||
 | 
			
		||||
from azml_02_hyperparameters_tuning import best_model_name
 | 
			
		||||
from initialize_constants import AZURE_WORKSPACE_NAME, VM_SIZE
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
online_endpoint_name = ("srv-" + AZURE_WORKSPACE_NAME).lower()
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def main():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = XXXX()
 | 
			
		||||
 | 
			
		||||
    # 2. Create a endpoint
 | 
			
		||||
    print(f"Creating endpoint {online_endpoint_name}")
 | 
			
		||||
    endpoint = XXXX(
 | 
			
		||||
        name=online_endpoint_name,
 | 
			
		||||
        auth_mode="key",
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # Method `result()` should be added to wait until completion
 | 
			
		||||
    ml_client.online_endpoints.XXXX(endpoint).result()
 | 
			
		||||
 | 
			
		||||
    # 3. Create a deployment
 | 
			
		||||
    best_model_latest_version = XXXX
 | 
			
		||||
 | 
			
		||||
    blue_deployment = XXXX(
 | 
			
		||||
        name=online_endpoint_name,
 | 
			
		||||
        endpoint_name=online_endpoint_name,
 | 
			
		||||
        # @latest doesn't work with model paths
 | 
			
		||||
        model=XXXX,
 | 
			
		||||
        instance_type=VM_SIZE,
 | 
			
		||||
        instance_count=1,
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # Assign all the traffic to this endpoint
 | 
			
		||||
    # Method `result()` should be added to wait until completion
 | 
			
		||||
    ml_client.begin_create_or_update(blue_deployment).result()
 | 
			
		||||
    endpoint.traffic = {online_endpoint_name: 100}
 | 
			
		||||
    ml_client.begin_create_or_update(endpoint).result()
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    main()
 | 
			
		||||
azuremlpythonsdk-v2/azml_04_test_inference.py (new file, 23 lines)
@@ -0,0 +1,23 @@
"""
 | 
			
		||||
    Script to use real-time inferencing with online endpoints
 | 
			
		||||
"""
 | 
			
		||||
from azml_03_realtime_inference import online_endpoint_name
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def main():
 | 
			
		||||
    # 1. Load a Workspace
 | 
			
		||||
    ml_client = XXXX()
 | 
			
		||||
 | 
			
		||||
    # 2. Get predictions
 | 
			
		||||
    output = ml_client.online_endpoints.XXXX(
 | 
			
		||||
        endpoint_name=XXXX,
 | 
			
		||||
        deployment_name=online_endpoint_name,
 | 
			
		||||
        request_file="./diabetes_test_inference/request.json",
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    print(output)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    main()
 | 
			
		||||
azuremlpythonsdk-v2/compute_aml.py (new file, 63 lines)
@@ -0,0 +1,63 @@
"""
 | 
			
		||||
    Script to initialize an Azure Machine Learning compute cluster (aml)
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml.entities import AmlCompute
 | 
			
		||||
 | 
			
		||||
from initialize_constants import AML_COMPUTE_NAME, MAX_NODES, MIN_NODES, VM_SIZE
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def create_or_load_aml(
 | 
			
		||||
    cpu_compute_target=AML_COMPUTE_NAME,
 | 
			
		||||
    vm_size=VM_SIZE,
 | 
			
		||||
    min_nodes=MIN_NODES,
 | 
			
		||||
    max_nodes=MAX_NODES,
 | 
			
		||||
):
 | 
			
		||||
    """Create or load an Azure Machine Learning compute cluster (aml) in a
 | 
			
		||||
        given Workspace.
 | 
			
		||||
    Args:
 | 
			
		||||
        cpu_compute_target: Name of the compute resource
 | 
			
		||||
        vm_size: Virtual machine size, VM_SIZE is used as default,
 | 
			
		||||
            for example STANDARD_D2_V2. Set to STANDARD_NC6 to get a GPU
 | 
			
		||||
        min_nodes: Minimal number of nodes, MIN_NODES is used as default.
 | 
			
		||||
        max_nodes: Minimal number of nodes, MIN_NODES is used as default.
 | 
			
		||||
 | 
			
		||||
    Returns:
 | 
			
		||||
        An aml and set quick load.
 | 
			
		||||
    """
 | 
			
		||||
    # Create or Load a Workspace
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
    try:
 | 
			
		||||
        # let's see if the compute target already exists
 | 
			
		||||
        cpu_cluster = ml_client.compute.get(XXXXX)
 | 
			
		||||
        print(
 | 
			
		||||
            f"You already have a cluster named {XXXXX},",
 | 
			
		||||
            "we'll reuse it.",
 | 
			
		||||
        )
 | 
			
		||||
    except Exception:
 | 
			
		||||
        print("Creating a new cpu compute target...")
 | 
			
		||||
        cpu_cluster = AmlCompute(
 | 
			
		||||
            name=cpu_compute_target,
 | 
			
		||||
            # Azure ML Compute is the on-demand VM service
 | 
			
		||||
            type="amlcompute",
 | 
			
		||||
            # VM Family
 | 
			
		||||
            size=vm_size,
 | 
			
		||||
            # Minimum running nodes when there is no job running
 | 
			
		||||
            min_instances=min_nodes,
 | 
			
		||||
            # Nodes in cluster
 | 
			
		||||
            max_instances=max_nodes,
 | 
			
		||||
            # How many seconds will the node running after the job termination
 | 
			
		||||
            idle_time_before_scale_down=180,
 | 
			
		||||
            # Dedicated or LowPriority.
 | 
			
		||||
            # The latter is cheaper but there is a chance of job termination
 | 
			
		||||
            tier="Dedicated",
 | 
			
		||||
        )
 | 
			
		||||
 | 
			
		||||
        # Now, we pass the object to MLClient's create_or_update method
 | 
			
		||||
        cpu_cluster = ml_client.compute.begin_create_or_update(XXXXX)
 | 
			
		||||
 | 
			
		||||
    return cpu_cluster
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    create_or_load_aml()
 | 
			
		||||
azuremlpythonsdk-v2/data/diabetes.csv (new file, 10001 lines)
File diff suppressed because it is too large.
azuremlpythonsdk-v2/data_tabular.py (new file, 31 lines)
@@ -0,0 +1,31 @@
"""
 | 
			
		||||
    Script to create and register file as an uri
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml.constants import AssetTypes
 | 
			
		||||
from azure.ai.ml.entities import Data
 | 
			
		||||
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
name_dataset = "diabetes-dataset"
 | 
			
		||||
data_folder = "./data/diabetes.csv"
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def create_tabular_dataset():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = XXXXX()
 | 
			
		||||
 | 
			
		||||
    # 2. Add files
 | 
			
		||||
    if name_dataset not in [XXXXX for env in ml_client.data.list()]:
 | 
			
		||||
        tab_data_set = Data(
 | 
			
		||||
            path=XXXXX,
 | 
			
		||||
            type=AssetTypes.URI_FILE,
 | 
			
		||||
            name=name_dataset,
 | 
			
		||||
        )
 | 
			
		||||
 | 
			
		||||
        ml_client.data.create_or_update(XXXXX)
 | 
			
		||||
    else:
 | 
			
		||||
        print("Dataset already registered.")
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    create_tabular_dataset()
 | 
			
		||||
azuremlpythonsdk-v2/dependencies/conda.yml (new file, 11 lines)
@@ -0,0 +1,11 @@
name: model-env
dependencies:
  - python=3.8
  - scikit-learn
  - pandas
  - numpy
  - matplotlib
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
azuremlpythonsdk-v2/diabetes_hyperdrive/diabetes_training.py (new file, 123 lines)
@@ -0,0 +1,123 @@
# Import libraries
import argparse
import os

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def main():
    """Main function of the script."""

    # Input and output arguments

    # Get script arguments
    parser = XXXX()

    # Input dataset
    parser.add_argument(
        "XXXX",
        type=str,
        help="path to input data",
    )

    # Model name
    parser.add_argument("XXXX", type=str, help="model name")

    # Hyperparameters
    parser.add_argument(
        "XXXX",
        type=float,
        dest="learning_rate",
        default=0.1,
        help="learning rate",
    )
    parser.add_argument(
        "XXXX",
        type=int,
        dest="n_estimators",
        default=100,
        help="number of estimators",
    )

    # Add arguments to args collection
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start Logging
    mlflow.XXXX()

    # enable autologging
    mlflow.XXXX()

    # load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = XXXX(
        X, y, test_size=0.30, random_state=0
    )

    # Train a Gradient Boosting classification model
    # with the specified hyperparameters
    print("Training a classification model")
    model = XXXX(
        learning_rate=XXXX, n_estimators=XXXX
    ).fit(X_train, y_train)

    # calculate accuracy
    y_hat = model.XXXX(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # calculate AUC
    y_scores = model.XXXX(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.XXXX(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop Logging
    mlflow.XXXX()


if __name__ == "__main__":
    main()
azuremlpythonsdk-v2/diabetes_test_inference/request.json (new file, 4 lines)
@@ -0,0 +1,4 @@
{"input_data": [
 | 
			
		||||
    [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22],
 | 
			
		||||
    [0, 148, 58, 11, 179, 39.19207553, 0.160829008, 45]
 | 
			
		||||
]}
 | 
			
		||||
azuremlpythonsdk-v2/diabetes_training/diabetes_training.py (new file, 115 lines)
@@ -0,0 +1,115 @@
# Import libraries
import argparse
import os

import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def main():
    """Main function of the script."""

    # Input and output arguments
    # Get script arguments
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data",
        type=str,
        help="path to input data",
    )
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    # load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    mlflow.log_metric("num_samples", diabetes.shape[0])
    mlflow.log_metric("num_features", diabetes.shape[1] - 1)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # Train a decision tree model
    print("Training a decision tree model")
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # calculate accuracy
    y_hat = model.predict(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # plot ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:, 1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], "k--")
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("ROC Curve")
    fig.savefig("ROC.png")
    mlflow.log_artifact("ROC.png")
    plt.show()

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
azuremlpythonsdk-v2/environment.py (new file, 33 lines)
@@ -0,0 +1,33 @@
"""
 | 
			
		||||
    Script to create and register an environment including SKlearn
 | 
			
		||||
"""
 | 
			
		||||
import os
 | 
			
		||||
 | 
			
		||||
from azure.ai.ml.entities import Environment
 | 
			
		||||
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
dependencies_dir = "./dependencies"
 | 
			
		||||
custom_env_name = "custom-scikit-learn"
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def create_docker_environment():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client =  XXXXX()
 | 
			
		||||
 | 
			
		||||
    # 2. Create a Python environment for the experiment
 | 
			
		||||
    env_docker_image = XXXXX(
 | 
			
		||||
        name=custom_env_name,
 | 
			
		||||
        conda_file=os.path.join(dependencies_dir, "XXXXX"),
 | 
			
		||||
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
 | 
			
		||||
    )
 | 
			
		||||
    ml_client.environments.create_or_update(env_docker_image)
 | 
			
		||||
 | 
			
		||||
    print(
 | 
			
		||||
        f"Environment with name {env_docker_image.name} is registered to the workspace,",
 | 
			
		||||
        f"the environment version is {env_docker_image.version}"
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    create_docker_environment()
 | 
			
		||||
azuremlpythonsdk-v2/initialize_constants.py (new file, 23 lines)
@@ -0,0 +1,23 @@
"""
 | 
			
		||||
    Script to initialize global constants
 | 
			
		||||
"""
 | 
			
		||||
import os
 | 
			
		||||
 | 
			
		||||
# Global constants can be set via environmental variables
 | 
			
		||||
# Remove default values in production
 | 
			
		||||
AZURE_RESOURCE_GROUP = os.getenv("AZURE_RESOURCE_GROUP", "itvitae-azure-ml")
 | 
			
		||||
AZURE_SUBSCRIPTION_ID = os.getenv(
 | 
			
		||||
    "AZURE_SUBSCRIPTION_ID", "34faeead-244d-4ae8-8194-1eeaaffaf5be"
 | 
			
		||||
)
 | 
			
		||||
AZURE_WORKSPACE_NAME = os.getenv(
 | 
			
		||||
    "AZURE_WORKSPACE_NAME",
 | 
			
		||||
    "ws-kevin-heimbach",
 | 
			
		||||
)
 | 
			
		||||
AZURE_LOCATION = os.getenv("AZURE_LOCATION", "westeurope")
 | 
			
		||||
# Choose names for your clusters
 | 
			
		||||
AML_COMPUTE_NAME = os.getenv("AML_COMPUTE_NAME", "aml-compute")
 | 
			
		||||
# General Servers Characteristics
 | 
			
		||||
VM_SIZE = os.getenv("VM_SIZE", "STANDARD_DS2_V2")
 | 
			
		||||
MIN_NODES = int(os.getenv("MIN_NODES", 0))
 | 
			
		||||
MAX_NODES = int(os.getenv("MAX_NODES", 1))
 | 
			
		||||
AGENT_COUNT = int(os.getenv("AGENT_COUNT", 2))
 | 
			
		||||
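Because every constant above funnels through `os.getenv`, a run can be reconfigured without editing the file. A hypothetical sketch:

```python
# Hypothetical override: environment variables set before this module is
# imported take precedence over the hard-coded defaults.
import os

os.environ["AML_COMPUTE_NAME"] = "my-cpu-cluster"  # hypothetical name
os.environ["MAX_NODES"] = "4"

import initialize_constants

print(initialize_constants.AML_COMPUTE_NAME)  # -> my-cpu-cluster
print(initialize_constants.MAX_NODES)         # -> 4
```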
azuremlpythonsdk-v2/ml_client.py (new file, 46 lines)
@@ -0,0 +1,46 @@
"""
 | 
			
		||||
    Script to initialize MLClient object
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml import MLClient
 | 
			
		||||
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
 | 
			
		||||
 | 
			
		||||
from initialize_constants import (
 | 
			
		||||
    AZURE_RESOURCE_GROUP,
 | 
			
		||||
    AZURE_SUBSCRIPTION_ID,
 | 
			
		||||
    AZURE_WORKSPACE_NAME,
 | 
			
		||||
)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def create_or_load_ml_client():
 | 
			
		||||
    """Create or load an Azure ML Client based on env variables.
 | 
			
		||||
    Args:
 | 
			
		||||
        None since information is taken from global constants
 | 
			
		||||
            defined in initialize_constants.py.
 | 
			
		||||
 | 
			
		||||
    Returns:
 | 
			
		||||
        A workspace and set quick load.
 | 
			
		||||
    """
 | 
			
		||||
    try:
 | 
			
		||||
        credential = DefaultAzureCredential()
 | 
			
		||||
        # Check if given credential can get token successfully.
 | 
			
		||||
        credential.get_token("https://management.azure.com/.default")
 | 
			
		||||
    except Exception as ex:
 | 
			
		||||
        # Fall back to InteractiveBrowserCredential
 | 
			
		||||
        # in case DefaultAzureCredential not working
 | 
			
		||||
        print(ex)
 | 
			
		||||
        credential = InteractiveBrowserCredential()
 | 
			
		||||
 | 
			
		||||
    # Get a handle to the workspace.
 | 
			
		||||
    # You can find the info on the workspace tab on ml.azure.com
 | 
			
		||||
    ml_client = MLClient(
 | 
			
		||||
        credential=credential,
 | 
			
		||||
        subscription_id=XXXXX,
 | 
			
		||||
        resource_group_name=XXXXX,
 | 
			
		||||
        workspace_name=XXXXX,
 | 
			
		||||
    )
 | 
			
		||||
    return ml_client
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
    print(ml_client)
 | 
			
		||||
azuremlpythonsdk-v2/setup.cfg (new file, 37 lines)
@@ -0,0 +1,37 @@
[flake8]
ignore = E203, W503
max-line-length = 99
max-complexity = 18
select = B,C,E,F,W,T4

[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
ensure_newline_before_comments=True
line_length=99

[mypy]
files=refactor,tests
ignore_missing_imports=True

[coverage:run]
source = refactor

[coverage:report]
exclude_lines =
    # exclude pragma again
    pragma: no cover

    # exclude main
    if __name__ == .__main__.:

[coverage:html]
directory = coverage

[coverage:xml]
output = coverage.xml

[tool:pytest]
testpaths=tests/
flake.lock (new generated file, 23 lines)
@@ -0,0 +1,23 @@
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "lastModified": 1717196966,
        "narHash": "sha256-yZKhxVIKd2lsbOqYd5iDoUIwsRZFqE87smE2Vzf6Ck0=",
        "type": "tarball",
        "url": "https://flakehub.com/f/NixOS/nixpkgs/0.1.%2A.tar.gz"
      },
      "original": {
        "type": "tarball",
        "url": "https://flakehub.com/f/NixOS/nixpkgs/0.1.%2A.tar.gz"
      }
    },
    "root": {
      "inputs": {
        "nixpkgs": "nixpkgs"
      }
    }
  },
  "root": "root",
  "version": 7
}
flake.nix (new file, 46 lines)
@@ -0,0 +1,46 @@
{
  description = "A Nix-flake-based Jupyter development environment";

  inputs.nixpkgs.url = "https://flakehub.com/f/NixOS/nixpkgs/0.1.*.tar.gz";

  outputs = {
    self,
    nixpkgs,
  }: let
    supportedSystems = ["x86_64-linux" "aarch64-linux" "x86_64-darwin" "aarch64-darwin"];
    forEachSupportedSystem = f:
      nixpkgs.lib.genAttrs supportedSystems (system:
        f {
          pkgs = import nixpkgs {inherit system;};
        });
  in {
    devShells = forEachSupportedSystem ({pkgs}: {
      default = pkgs.mkShell {
        venvDir = "venv";
        packages = with pkgs;
          [python311 virtualenv]
          ++ (with pkgs.python311Packages; [
            pip
            python-lsp-server
            venvShellHook
            requests
            jupyter
            pandas
            numpy
            matplotlib
            mlflow
            seaborn
            scikit-learn
            plotnine
            arrow
            polars
            pyarrow
            ydata-profiling
            pydot
            graphviz
            (python311.pkgs.callPackage ./pkgs/azureml-mlflow/default.nix {})
          ]);
      };
    });
  };
}
pkgs/azureml-mlflow/default.nix (new file, 33 lines)
@@ -0,0 +1,33 @@
{
  lib,
  buildPythonPackage,
  fetchPypi,
  setuptools,
  python311,
}:
buildPythonPackage rec {
  pname = "azureml_mlflow";
  version = "1.57.0.post1";
  format = "wheel";

  src = fetchPypi {
    inherit pname version format;
    sha256 = "sha256-uK7vQR9aQjXUQ9RXGXY5o7pPMg5ZmMfqbDt0GTfwx6k=";
    dist = "py3";
    python = "py3";
  };

  nativeBuildInputs = [setuptools];

  propagatedBuildInputs = [
  ];

  doCheck = false; # Package does not contain tests

  meta = with lib; {
    description = "Integration code of Azure ML with MLflow, an open-source platform for tracking machine learning experiments and managing models";
    homepage = "https://docs.microsoft.com/python/api/overview/azure/ml/?view=azure-ml-py";
    license = licenses.mit;
    maintainers = with maintainers; [Lillian-Violet];
  };
}
solution-v2/README.md (new file, 124 lines)
@@ -0,0 +1,124 @@
# Azure ML Lesson 2 Lab

## 1. Set environment variables

1. Run VS Code in an Azure ML remote instance as shown before.
2. Press `File > Open Folder` and navigate to `azuremlpythonsdk-v2/` to open the exercise.

**IMPORTANT**: Relative paths are assumed to start from the `azuremlpythonsdk-v2` folder.

Open the file `initialize_constants.py`; three variables should be updated:

- AZURE_WORKSPACE_NAME

- AZURE_RESOURCE_GROUP

- AZURE_SUBSCRIPTION_ID

Open your workspace at `https://ml.azure.com`. At the top right, select the workspace name, then copy the workspace name, the subscription id and the resource group name.

## 2. Load a workspace

Open the file `ml_client.py` and understand how an ML client object is loaded or created. In this lab, the workspace was already created. Just fill in the names of the variables from `initialize_constants.py`.

When finished, run this file and check that it executes without errors. A minimal sketch of the finished file follows.
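The core of the file boils down to this sketch (the full solution, including the interactive-browser credential fallback, is `solution-v2/ml_client.py` in this commit):

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

from initialize_constants import (
    AZURE_RESOURCE_GROUP,
    AZURE_SUBSCRIPTION_ID,
    AZURE_WORKSPACE_NAME,
)

# Build a handle to the existing workspace; nothing is created here.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=AZURE_SUBSCRIPTION_ID,
    resource_group_name=AZURE_RESOURCE_GROUP,
    workspace_name=AZURE_WORKSPACE_NAME,
)
```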
## 3. Load a Compute Cluster

Open the file `compute_aml.py` and understand how a compute cluster is loaded or created. In this lab, the compute cluster was already created, but some variables marked with `XXXX` should still be added.

When finished, run this file and check that it executes without errors.

What would happen if the compute cluster were not present?

## 4. Create a tabular dataset

Open the file `data_tabular.py`; several gaps marked with `XXXX` should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. How can you get the names of the datasets already registered in `if name_dataset not in [XXXXX for env in ml_client.data.list()]`?

   Hint: Try to get one object from the class [Data](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.data?view=azure-python) and check its attributes.

3. Which should be the `path` parameter in `path=XXXXX`?

4. Which input should you give in `ml_client.data.create_or_update(XXXXX)`?

When finished, run this file and check that it executes without errors. A minimal sketch of the registration step follows.
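The gist of the solution, mirroring `data_tabular.py`:

```python
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

from ml_client import create_or_load_ml_client

ml_client = create_or_load_ml_client()

# Register the local CSV as a URI-file data asset.
tab_data_set = Data(
    path="./data/diabetes.csv",
    type=AssetTypes.URI_FILE,
    name="diabetes-dataset",
)
ml_client.data.create_or_update(tab_data_set)
```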
## 5. Create and register an environment

Open the file `environment.py`; several gaps marked with `XXXX` should be filled:

1. `ml_client = XXXXX()`

   Hint: look into previous files.

2. Get a list of environments already registered and modify the following:

   `env_list = XXXXX`

   Hint: look into previous files.

3. Which class should be used to register the environment?

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?tabs=python)

When finished, run this file and check that it executes without errors. One way to fill the `env_list` gap is sketched below.
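The same pattern used later in `azml_02_hyperparameters_tuning.py` works here:

```python
from ml_client import create_or_load_ml_client

ml_client = create_or_load_ml_client()

# Names of all environments already registered in the workspace.
env_list = [env.name for env in ml_client.environments.list()]
```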
## 6. Train a model from a tabular dataset using a remote compute

Open the file `azml_01_experiment_remote_compute.py`; several gaps marked with `XXXX` should be filled:

1. `ml_client = XXXX()`

   Hint: look into previous files.

2. Complete the `latest_version_dataset` definition.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day#deploy-the-model-to-the-endpoint)

3. Complete the `Input` part.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python)

4. Complete the `command` part.

   Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?tabs=python)

When finished, run this file and check that it executes without errors. A sketch of the dataset-version lookup follows.
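For the `latest_version_dataset` gap, the solution file resolves the latest registered version of the dataset by name:

```python
from data_tabular import name_dataset
from ml_client import create_or_load_ml_client

ml_client = create_or_load_ml_client()

# Walk the registered data assets and pick the latest version of ours.
latest_version_dataset = next(
    dataset.latest_version
    for dataset in ml_client.data.list()
    if dataset.name == name_dataset
)
```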
## 7. Tune hyperparameters using a remote compute

Open the file `azml_02_hyperparameters_tuning.py`; several gaps marked with `XXXX` should be filled. The hyperparameter search should be defined over the following space:

- learning_rate: one of the values 0.01, 0.1, 1.0

- n_estimators: one of the values 10, 100

Hint: Use the previous file as a template.

Hint: For the `Hyperdrive settings` format, look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline)

Open the file `diabetes_hyperdrive/diabetes_training.py`; several gaps marked with `XXXX` should be filled. A Gradient Boosting classification model should be trained, and the AUC and the accuracy on the test set should be computed.

Hint: Use the file `diabetes_training/diabetes_training.py` as a template.

When finished, run this file and check that it executes without errors. The search space for `azml_02_hyperparameters_tuning.py` can be declared as sketched below.
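The requested search space maps onto `Choice` inputs of the sweep job, as in the solution file:

```python
from azure.ai.ml.sweep import Choice

# Passed as extra inputs to the training command; grid sampling then
# enumerates 3 learning rates x 2 estimator counts = 6 trials.
learning_rate = Choice(values=[0.01, 0.1, 1.0])
n_estimators = Choice(values=[10, 100])
```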
## 8. Create a real-time inferencing service

Open the file `azml_03_realtime_inference.py`; several gaps marked with `XXXX` should be filled.

Hint: Take a look [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models?tabs=fromjob%2Cmir%2Csdk)

When finished, run this file and check that it executes without errors.

## 9. Test the inference service

Open the file `azml_04_test_inference.py`; several gaps marked with `XXXX` should be filled.

Hint: Check [here](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-safely-rollout-online-endpoints?view=azureml-api-2&tabs=python)
										
BIN  solution-v2/__pycache__/compute_aml.cpython-312.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/compute_aml.cpython-38.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/data_tabular.cpython-312.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/data_tabular.cpython-38.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/environment.cpython-312.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/environment.cpython-38.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/initialize_constants.cpython-312.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/initialize_constants.cpython-38.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/ml_client.cpython-312.pyc  Normal file  (Binary file not shown)
BIN  solution-v2/__pycache__/ml_client.cpython-38.pyc  Normal file  (Binary file not shown)
										
									
								
							
							
								
								
									
70  solution-v2/azml_01_experiment_remote_compute.py  Normal file
@@ -0,0 +1,70 @@
"""
 | 
			
		||||
    Script to train a model from a tabular dataset using a remote compute
 | 
			
		||||
    Based on:
 | 
			
		||||
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml import Input, command
 | 
			
		||||
from azure.ai.ml.constants import AssetTypes
 | 
			
		||||
 | 
			
		||||
from compute_aml import create_or_load_aml
 | 
			
		||||
from data_tabular import create_tabular_dataset, name_dataset
 | 
			
		||||
from environment import custom_env_name
 | 
			
		||||
from initialize_constants import AML_COMPUTE_NAME
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
experiment_name = "mslearn-train-diabetes"
 | 
			
		||||
experiment_folder = "./diabetes_training"
 | 
			
		||||
script_name = "diabetes_training.py"
 | 
			
		||||
registered_model_name = "diabetes_model"
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def main():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
 | 
			
		||||
    # 2. Create compute resources
 | 
			
		||||
    create_or_load_aml()
 | 
			
		||||
 | 
			
		||||
    # 3. Create and register a File Dataset
 | 
			
		||||
    create_tabular_dataset()
 | 
			
		||||
    latest_version_dataset = next(
 | 
			
		||||
        dataset.latest_version
 | 
			
		||||
        for dataset in ml_client.data.list()
 | 
			
		||||
        if dataset.name == name_dataset
 | 
			
		||||
    )
 | 
			
		||||
    print(list(ml_client.data.list()))
 | 
			
		||||
    # 4. Run Job
 | 
			
		||||
    job = command(
 | 
			
		||||
        inputs=dict(
 | 
			
		||||
            script_name=script_name,
 | 
			
		||||
            data=Input(
 | 
			
		||||
                type=AssetTypes.URI_FILE,
 | 
			
		||||
                # @latest doesn't work with dataset paths
 | 
			
		||||
                path=f"azureml:{name_dataset}:{latest_version_dataset}",
 | 
			
		||||
            ),
 | 
			
		||||
            registered_model_name=registered_model_name,
 | 
			
		||||
        ),
 | 
			
		||||
        code=experiment_folder,
 | 
			
		||||
        command=(
 | 
			
		||||
            "python ${{inputs.script_name}}"
 | 
			
		||||
            + " --data ${{inputs.data}}"
 | 
			
		||||
            + " --registered_model_name ${{inputs.registered_model_name}}"
 | 
			
		||||
        ),
 | 
			
		||||
        environment=f"{custom_env_name}@latest",
 | 
			
		||||
        compute=AML_COMPUTE_NAME,
 | 
			
		||||
        experiment_name=experiment_name,
 | 
			
		||||
        display_name=experiment_name,
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # submit the command
 | 
			
		||||
    returned_job = ml_client.jobs.create_or_update(job)
 | 
			
		||||
 | 
			
		||||
    # stream the output and wait until the job is finished
 | 
			
		||||
    ml_client.jobs.stream(returned_job.name)
 | 
			
		||||
 | 
			
		||||
    # refresh the latest status of the job after streaming
 | 
			
		||||
    returned_job = ml_client.jobs.get(name=returned_job.name)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    main()
 | 
			
		||||
							
								
								
									
115  solution-v2/azml_02_hyperparameters_tuning.py  Normal file
@@ -0,0 +1,115 @@
"""
 | 
			
		||||
    Script to train tune hyperparameters
 | 
			
		||||
    Based on:
 | 
			
		||||
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml import Input, command
 | 
			
		||||
from azure.ai.ml.constants import AssetTypes
 | 
			
		||||
from azure.ai.ml.entities import Model
 | 
			
		||||
from azure.ai.ml.sweep import Choice
 | 
			
		||||
 | 
			
		||||
from compute_aml import create_or_load_aml
 | 
			
		||||
from data_tabular import create_tabular_dataset, name_dataset
 | 
			
		||||
from environment import create_docker_environment, custom_env_name
 | 
			
		||||
from initialize_constants import AML_COMPUTE_NAME
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
experiment_folder = "diabetes_hyperdrive"
 | 
			
		||||
experiment_name = "mslearn-diabetes-hyperdrive"
 | 
			
		||||
script_name = "diabetes_training.py"
 | 
			
		||||
registered_model_name = "diabetes_model_hyper"
 | 
			
		||||
best_model_name = "best_diabetes_model"
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def main():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
 | 
			
		||||
    # 2. Create compute resources
 | 
			
		||||
    create_or_load_aml()
 | 
			
		||||
 | 
			
		||||
    # 3. Create and register a File Dataset
 | 
			
		||||
    create_tabular_dataset()
 | 
			
		||||
    latest_version_dataset = max(
 | 
			
		||||
        [int(d.version) for d in ml_client.data.list(name=name_dataset)]
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # 4. Environment
 | 
			
		||||
    environment_names = [env.name for env in ml_client.environments.list()]
 | 
			
		||||
    if custom_env_name not in environment_names:
 | 
			
		||||
        create_docker_environment()
 | 
			
		||||
 | 
			
		||||
    # 5. Run Job
 | 
			
		||||
    job_for_sweep = command(
 | 
			
		||||
        inputs=dict(
 | 
			
		||||
            script_name=script_name,
 | 
			
		||||
            data=Input(
 | 
			
		||||
                type=AssetTypes.URI_FILE,
 | 
			
		||||
                # @latest doesn't work with dataset paths
 | 
			
		||||
                path=f"azureml:{name_dataset}:{latest_version_dataset}",
 | 
			
		||||
            ),
 | 
			
		||||
            registered_model_name=registered_model_name,
 | 
			
		||||
            learning_rate=Choice(values=[0.01, 0.1, 1.0]),
 | 
			
		||||
            n_estimators=Choice(values=[10, 100]),
 | 
			
		||||
        ),
 | 
			
		||||
        code=experiment_folder,
 | 
			
		||||
        command=(
 | 
			
		||||
            "python ${{inputs.script_name}}"
 | 
			
		||||
            + " --data ${{inputs.data}}"
 | 
			
		||||
            + " --registered_model_name ${{inputs.registered_model_name}}"
 | 
			
		||||
            + " --learning_rate ${{inputs.learning_rate}}"
 | 
			
		||||
            + " --n_estimators ${{inputs.n_estimators}}"
 | 
			
		||||
        ),
 | 
			
		||||
        environment=f"{custom_env_name}@latest",
 | 
			
		||||
        compute=AML_COMPUTE_NAME,
 | 
			
		||||
        experiment_name=experiment_name,
 | 
			
		||||
        display_name=experiment_name,
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # Configure hyperdrive settings
 | 
			
		||||
    sweep_job = job_for_sweep.sweep(
 | 
			
		||||
        compute=AML_COMPUTE_NAME,
 | 
			
		||||
        sampling_algorithm="grid",
 | 
			
		||||
        primary_metric="AUC",
 | 
			
		||||
        goal="Maximize",
 | 
			
		||||
        max_total_trials=6,
 | 
			
		||||
        max_concurrent_trials=2,
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # submit the command
 | 
			
		||||
    returned_sweep_job = ml_client.create_or_update(sweep_job)
 | 
			
		||||
 | 
			
		||||
    # stream the output and wait until the job is finished
 | 
			
		||||
    ml_client.jobs.stream(returned_sweep_job.name)
 | 
			
		||||
 | 
			
		||||
    # refresh the latest status of the job after streaming
 | 
			
		||||
    returned_sweep_job = ml_client.jobs.get(name=returned_sweep_job.name)
 | 
			
		||||
 | 
			
		||||
    # Find and register the best model
 | 
			
		||||
    if returned_sweep_job.status == "Completed":
 | 
			
		||||
        # First let us get the run which gave us the best result
 | 
			
		||||
        best_run = returned_sweep_job.properties["best_child_run_id"]
 | 
			
		||||
 | 
			
		||||
        # lets get the model from this run
 | 
			
		||||
        model = Model(
 | 
			
		||||
            # the script stores the model as the given name
 | 
			
		||||
            path=(
 | 
			
		||||
                f"azureml://jobs/{best_run}/outputs/artifacts/paths/"
 | 
			
		||||
                + f"{registered_model_name}/"
 | 
			
		||||
            ),
 | 
			
		||||
            name=best_model_name,
 | 
			
		||||
            type="mlflow_model",
 | 
			
		||||
        )
 | 
			
		||||
    else:
 | 
			
		||||
        print(
 | 
			
		||||
            f"Sweep job status: {returned_sweep_job.status}. \
 | 
			
		||||
                Please wait until it completes"
 | 
			
		||||
        )
 | 
			
		||||
 | 
			
		||||
    # Register best model
 | 
			
		||||
    print(f"Registering Model {best_model_name}")
 | 
			
		||||
    ml_client.models.create_or_update(model=model)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    main()
 | 
			
		||||
							
								
								
									
51  solution-v2/azml_03_realtime_inference.py  Normal file
@@ -0,0 +1,51 @@
"""
 | 
			
		||||
    Script to create a real-time inferencing service
 | 
			
		||||
    Based on:
 | 
			
		||||
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
 | 
			
		||||
 | 
			
		||||
from azml_02_hyperparameters_tuning import best_model_name
 | 
			
		||||
from initialize_constants import AZURE_WORKSPACE_NAME, VM_SIZE
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
online_endpoint_name = ("srv-" + AZURE_WORKSPACE_NAME).lower()
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def main():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
 | 
			
		||||
    # 2. Create a endpoint
 | 
			
		||||
    print(f"Creating endpoint {online_endpoint_name}")
 | 
			
		||||
    endpoint = ManagedOnlineEndpoint(
 | 
			
		||||
        name=online_endpoint_name,
 | 
			
		||||
        auth_mode="key",
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # Method `result()` should be added to wait until completion
 | 
			
		||||
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()
 | 
			
		||||
 | 
			
		||||
    # 3. Create a deployment
 | 
			
		||||
    best_model_latest_version = max(
 | 
			
		||||
        [int(m.version) for m in ml_client.models.list(name=best_model_name)]
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    blue_deployment = ManagedOnlineDeployment(
 | 
			
		||||
        name=online_endpoint_name,
 | 
			
		||||
        endpoint_name=online_endpoint_name,
 | 
			
		||||
        # @latest doesn't work with model paths
 | 
			
		||||
        model=f"azureml:{best_model_name}:{best_model_latest_version}",
 | 
			
		||||
        instance_type=VM_SIZE,
 | 
			
		||||
        instance_count=1,
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    # Assign all the traffic to this endpoint
 | 
			
		||||
    # Method `result()` should be added to wait until completion
 | 
			
		||||
    ml_client.begin_create_or_update(blue_deployment).result()
 | 
			
		||||
    endpoint.traffic = {online_endpoint_name: 100}
 | 
			
		||||
    ml_client.begin_create_or_update(endpoint).result()
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    main()
 | 
			
		||||
							
								
								
									
23  solution-v2/azml_04_test_inference.py  Normal file
@@ -0,0 +1,23 @@
"""
 | 
			
		||||
    Script to use real-time inferencing with online endpoints
 | 
			
		||||
"""
 | 
			
		||||
from azml_03_realtime_inference import online_endpoint_name
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def main():
 | 
			
		||||
    # 1. Load a Workspace
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
 | 
			
		||||
    # 2. Get predictions
 | 
			
		||||
    output = ml_client.online_endpoints.invoke(
 | 
			
		||||
        endpoint_name=online_endpoint_name,
 | 
			
		||||
        deployment_name=online_endpoint_name,
 | 
			
		||||
        request_file="./diabetes_test_inference/request.json",
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
    print(output)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    main()
 | 
			
		||||
							
								
								
									
63  solution-v2/compute_aml.py  Normal file
@@ -0,0 +1,63 @@
"""
 | 
			
		||||
    Script to initialize an Azure Machine Learning compute cluster (aml)
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml.entities import AmlCompute
 | 
			
		||||
 | 
			
		||||
from initialize_constants import AML_COMPUTE_NAME, MAX_NODES, MIN_NODES, VM_SIZE
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def create_or_load_aml(
 | 
			
		||||
    cpu_compute_target=AML_COMPUTE_NAME,
 | 
			
		||||
    vm_size=VM_SIZE,
 | 
			
		||||
    min_nodes=MIN_NODES,
 | 
			
		||||
    max_nodes=MAX_NODES,
 | 
			
		||||
):
 | 
			
		||||
    """Create or load an Azure Machine Learning compute cluster (aml) in a
 | 
			
		||||
        given Workspace.
 | 
			
		||||
    Args:
 | 
			
		||||
        cpu_compute_target: Name of the compute resource
 | 
			
		||||
        vm_size: Virtual machine size, VM_SIZE is used as default,
 | 
			
		||||
            for example STANDARD_D2_V2. Set to STANDARD_NC6 to get a GPU
 | 
			
		||||
        min_nodes: Minimal number of nodes, MIN_NODES is used as default.
 | 
			
		||||
        max_nodes: Minimal number of nodes, MIN_NODES is used as default.
 | 
			
		||||
 | 
			
		||||
    Returns:
 | 
			
		||||
        An aml and set quick load.
 | 
			
		||||
    """
 | 
			
		||||
    # Create or Load a Workspace
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
    try:
 | 
			
		||||
        # let's see if the compute target already exists
 | 
			
		||||
        cpu_cluster = ml_client.compute.get(cpu_compute_target)
 | 
			
		||||
        print(
 | 
			
		||||
            f"You already have a cluster named {cpu_compute_target},",
 | 
			
		||||
            "we'll reuse it.",
 | 
			
		||||
        )
 | 
			
		||||
    except Exception:
 | 
			
		||||
        print("Creating a new cpu compute target...")
 | 
			
		||||
        cpu_cluster = AmlCompute(
 | 
			
		||||
            name=cpu_compute_target,
 | 
			
		||||
            # Azure ML Compute is the on-demand VM service
 | 
			
		||||
            type="amlcompute",
 | 
			
		||||
            # VM Family
 | 
			
		||||
            size=vm_size,
 | 
			
		||||
            # Minimum running nodes when there is no job running
 | 
			
		||||
            min_instances=min_nodes,
 | 
			
		||||
            # Nodes in cluster
 | 
			
		||||
            max_instances=max_nodes,
 | 
			
		||||
            # How many seconds will the node running after the job termination
 | 
			
		||||
            idle_time_before_scale_down=180,
 | 
			
		||||
            # Dedicated or LowPriority.
 | 
			
		||||
            # The latter is cheaper but there is a chance of job termination
 | 
			
		||||
            tier="Dedicated",
 | 
			
		||||
        )
 | 
			
		||||
 | 
			
		||||
        # Now, we pass the object to MLClient's create_or_update method
 | 
			
		||||
        cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster)
 | 
			
		||||
 | 
			
		||||
    return cpu_cluster
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    create_or_load_aml()
 | 
			
		||||
							
								
								
									
10001  solution-v2/data/diabetes.csv  Normal file
(File diff suppressed because it is too large)
											
										
									
								
							
							
								
								
									
31  solution-v2/data_tabular.py  Normal file
@@ -0,0 +1,31 @@
"""
 | 
			
		||||
    Script to create and register file as an uri
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml.constants import AssetTypes
 | 
			
		||||
from azure.ai.ml.entities import Data
 | 
			
		||||
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
name_dataset = "diabetes-dataset"
 | 
			
		||||
data_folder = "./data/diabetes.csv"
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def create_tabular_dataset():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
 | 
			
		||||
    # 2. Add files
 | 
			
		||||
    if name_dataset not in [dataset.name for dataset in ml_client.data.list()]:
 | 
			
		||||
        tab_data_set = Data(
 | 
			
		||||
            path=data_folder,
 | 
			
		||||
            type=AssetTypes.URI_FILE,
 | 
			
		||||
            name=name_dataset,
 | 
			
		||||
        )
 | 
			
		||||
 | 
			
		||||
        ml_client.data.create_or_update(tab_data_set)
 | 
			
		||||
    else:
 | 
			
		||||
        print("Dataset already registered.")
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    create_tabular_dataset()
 | 
			
		||||
							
								
								
									
11  solution-v2/dependencies/conda.yml  Normal file
@@ -0,0 +1,11 @@
name: model-env
dependencies:
  - python=3.8
  - scikit-learn
  - pandas
  - numpy
  - matplotlib
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
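This conda file is what `environment.py` (below) hands to the `Environment` entity, layered on top of a curated base image; a condensed sketch:

```python
import os

from azure.ai.ml.entities import Environment

# Curated base image + this conda file = the lab's custom environment.
env_docker_image = Environment(
    name="custom-scikit-learn",
    conda_file=os.path.join("./dependencies", "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
)
```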
							
								
								
									
123  solution-v2/diabetes_hyperdrive/diabetes_training.py  Normal file
@@ -0,0 +1,123 @@
# Import libraries
import argparse
import os

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def main():
    """Main function of the script."""

    # Input and output arguments

    # Get script arguments
    parser = argparse.ArgumentParser()

    # Input dataset
    parser.add_argument(
        "--data",
        type=str,
        help="path to input data",
    )

    # Model name
    parser.add_argument("--registered_model_name", type=str, help="model name")

    # Hyperparameters
    parser.add_argument(
        "--learning_rate",
        type=float,
        dest="learning_rate",
        default=0.1,
        help="learning rate",
    )
    parser.add_argument(
        "--n_estimators",
        type=int,
        dest="n_estimators",
        default=100,
        help="number of estimators",
    )

    # Add arguments to args collection
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    # load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # Train a Gradient Boosting classification model
    # with the specified hyperparameters
    print("Training a classification model")
    model = GradientBoostingClassifier(
        learning_rate=args.learning_rate, n_estimators=args.n_estimators
    ).fit(X_train, y_train)

    # calculate accuracy
    y_hat = model.predict(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
							
								
								
									
4  solution-v2/diabetes_test_inference/request.json  Normal file
@@ -0,0 +1,4 @@
{"input_data": [
 | 
			
		||||
    [2, 180, 74, 24, 21, 23.9091702, 1.488172308, 22],
 | 
			
		||||
    [0, 148, 58, 11, 179, 39.19207553, 0.160829008, 45]
 | 
			
		||||
]}
 | 
			
		||||
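Each row carries the eight feature columns in training order (Pregnancies through Age). The file is passed verbatim to the deployed endpoint, as `azml_04_test_inference.py` does:

```python
# Score the two sample rows against the deployed endpoint
# (ml_client and online_endpoint_name as defined in the scripts above).
output = ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name=online_endpoint_name,
    request_file="./diabetes_test_inference/request.json",
)
print(output)
```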
							
								
								
									
115  solution-v2/diabetes_training/diabetes_training.py  Normal file
@@ -0,0 +1,115 @@
# Import libraries
import argparse
import os

import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def main():
    """Main function of the script."""

    # Input and output arguments
    # Get script arguments
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data",
        type=str,
        help="path to input data",
    )
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    # Start Logging
    mlflow.start_run()

    # enable autologging
    mlflow.sklearn.autolog()

    # load the diabetes data (passed as an input dataset)
    print("input data:", args.data)

    diabetes = pd.read_csv(args.data)

    mlflow.log_metric("num_samples", diabetes.shape[0])
    mlflow.log_metric("num_features", diabetes.shape[1] - 1)

    # Separate features and labels
    X, y = (
        diabetes[
            [
                "Pregnancies",
                "PlasmaGlucose",
                "DiastolicBloodPressure",
                "TricepsThickness",
                "SerumInsulin",
                "BMI",
                "DiabetesPedigree",
                "Age",
            ]
        ].values,
        diabetes["Diabetic"].values,
    )

    # Split data into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # Train a decision tree model
    print("Training a decision tree model")
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # calculate accuracy
    y_hat = model.predict(X_test)
    accuracy = np.average(y_hat == y_test)
    print("Accuracy:", accuracy)
    mlflow.log_metric("Accuracy", float(accuracy))

    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test, y_scores[:, 1])
    print("AUC: " + str(auc))
    mlflow.log_metric("AUC", float(auc))

    # plot ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:, 1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], "k--")
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("ROC Curve")
    fig.savefig("ROC.png")
    mlflow.log_artifact("ROC.png")
    plt.show()

    # Registering the model to the workspace
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=model,
        registered_model_name=args.registered_model_name,
        artifact_path=args.registered_model_name,
    )

    # Saving the model to a file
    mlflow.sklearn.save_model(
        sk_model=model,
        path=os.path.join(args.registered_model_name, "trained_model"),
    )

    # Stop Logging
    mlflow.end_run()


if __name__ == "__main__":
    main()
							
								
								
									
33  solution-v2/environment.py  Normal file
@@ -0,0 +1,33 @@
"""
 | 
			
		||||
    Script to create and register an environment including SKlearn
 | 
			
		||||
"""
 | 
			
		||||
import os
 | 
			
		||||
 | 
			
		||||
from azure.ai.ml.entities import Environment
 | 
			
		||||
 | 
			
		||||
from ml_client import create_or_load_ml_client
 | 
			
		||||
 | 
			
		||||
dependencies_dir = "./dependencies"
 | 
			
		||||
custom_env_name = "custom-scikit-learn"
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def create_docker_environment():
 | 
			
		||||
    # 1. Create or Load a ML client
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
 | 
			
		||||
    # 2. Create a Python environment for the experiment
 | 
			
		||||
    env_docker_image = Environment(
 | 
			
		||||
        name=custom_env_name,
 | 
			
		||||
        conda_file=os.path.join(dependencies_dir, "conda.yml"),
 | 
			
		||||
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest",
 | 
			
		||||
    )
 | 
			
		||||
    ml_client.environments.create_or_update(env_docker_image)
 | 
			
		||||
 | 
			
		||||
    print(
 | 
			
		||||
        f"Environment with name {env_docker_image.name} is registered to the workspace,",
 | 
			
		||||
        f"the environment version is {env_docker_image.version}"
 | 
			
		||||
    )
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    create_docker_environment()
 | 
			
		||||
							
								
								
									
23  solution-v2/initialize_constants.py  Normal file
@@ -0,0 +1,23 @@
"""
 | 
			
		||||
    Script to initialize global constants
 | 
			
		||||
"""
 | 
			
		||||
import os
 | 
			
		||||
 | 
			
		||||
# Global constants can be set via environmental variables
 | 
			
		||||
# Remove default values in production
 | 
			
		||||
AZURE_RESOURCE_GROUP = os.getenv("AZURE_RESOURCE_GROUP", "itvitae-azure-ml")
 | 
			
		||||
AZURE_SUBSCRIPTION_ID = os.getenv(
 | 
			
		||||
    "AZURE_SUBSCRIPTION_ID", "34faeead-244d-4ae8-8194-1eeaaffaf5be"
 | 
			
		||||
)
 | 
			
		||||
AZURE_WORKSPACE_NAME = os.getenv(
 | 
			
		||||
    "AZURE_WORKSPACE_NAME",
 | 
			
		||||
    "ws-angelsevillacamins",
 | 
			
		||||
)
 | 
			
		||||
AZURE_LOCATION = os.getenv("AZURE_LOCATION", "westeurope")
 | 
			
		||||
# Choose names for your clusters
 | 
			
		||||
AML_COMPUTE_NAME = os.getenv("AML_COMPUTE_NAME", "aml-compute")
 | 
			
		||||
# General Servers Characteristics
 | 
			
		||||
VM_SIZE = os.getenv("VM_SIZE", "STANDARD_DS2_V2")
 | 
			
		||||
MIN_NODES = int(os.getenv("MIN_NODES", 0))
 | 
			
		||||
MAX_NODES = int(os.getenv("MAX_NODES", 1))
 | 
			
		||||
AGENT_COUNT = int(os.getenv("AGENT_COUNT", 2))
 | 
			
		||||
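Since every constant falls back to `os.getenv`, the lab values can be overridden without editing the file. A sketch with placeholder values (the resource names below are hypothetical):

```python
import os

# Hypothetical values -- replace with your own workspace details.
os.environ["AZURE_WORKSPACE_NAME"] = "my-workspace"
os.environ["AZURE_RESOURCE_GROUP"] = "my-resource-group"

# Import after setting the variables so the os.getenv defaults are bypassed.
import initialize_constants  # noqa: E402

print(initialize_constants.AZURE_WORKSPACE_NAME)  # -> my-workspace
```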
							
								
								
									
46  solution-v2/ml_client.py  Normal file
@@ -0,0 +1,46 @@
"""
 | 
			
		||||
    Script to initialize MLClient object
 | 
			
		||||
"""
 | 
			
		||||
from azure.ai.ml import MLClient
 | 
			
		||||
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
 | 
			
		||||
 | 
			
		||||
from initialize_constants import (
 | 
			
		||||
    AZURE_RESOURCE_GROUP,
 | 
			
		||||
    AZURE_SUBSCRIPTION_ID,
 | 
			
		||||
    AZURE_WORKSPACE_NAME,
 | 
			
		||||
)
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
def create_or_load_ml_client():
 | 
			
		||||
    """Create or load an Azure ML Client based on env variables.
 | 
			
		||||
    Args:
 | 
			
		||||
        None since information is taken from global constants
 | 
			
		||||
            defined in initialize_constants.py.
 | 
			
		||||
 | 
			
		||||
    Returns:
 | 
			
		||||
        A workspace and set quick load.
 | 
			
		||||
    """
 | 
			
		||||
    try:
 | 
			
		||||
        credential = DefaultAzureCredential()
 | 
			
		||||
        # Check if given credential can get token successfully.
 | 
			
		||||
        credential.get_token("https://management.azure.com/.default")
 | 
			
		||||
    except Exception as ex:
 | 
			
		||||
        # Fall back to InteractiveBrowserCredential
 | 
			
		||||
        # in case DefaultAzureCredential not working
 | 
			
		||||
        print(ex)
 | 
			
		||||
        credential = InteractiveBrowserCredential()
 | 
			
		||||
 | 
			
		||||
    # Get a handle to the workspace.
 | 
			
		||||
    # You can find the info on the workspace tab on ml.azure.com
 | 
			
		||||
    ml_client = MLClient(
 | 
			
		||||
        credential=credential,
 | 
			
		||||
        subscription_id=AZURE_SUBSCRIPTION_ID,
 | 
			
		||||
        resource_group_name=AZURE_RESOURCE_GROUP,
 | 
			
		||||
        workspace_name=AZURE_WORKSPACE_NAME,
 | 
			
		||||
    )
 | 
			
		||||
    return ml_client
 | 
			
		||||
 | 
			
		||||
 | 
			
		||||
if __name__ == "__main__":
 | 
			
		||||
    ml_client = create_or_load_ml_client()
 | 
			
		||||
    print(ml_client)
 | 
			
		||||
							
								
								
									
37  solution-v2/setup.cfg  Normal file
@@ -0,0 +1,37 @@
[flake8]
ignore = E203, W503
max-line-length = 99
max-complexity = 18
select = B,C,E,F,W,T4

[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
ensure_newline_before_comments=True
line_length=99

[mypy]
files=refactor,tests
ignore_missing_imports=True

[coverage:run]
source = refactor

[coverage:report]
exclude_lines =
    # exclude pragma again
    pragma: no cover

    # exclude main
    if __name__ == .__main__.:

[coverage:html]
directory = coverage

[coverage:xml]
output = coverage.xml

[tool:pytest]
testpaths=tests/
							
								
								
									
12  summary_outline.md  Normal file
@@ -0,0 +1,12 @@
# Azure ML 2

During this lesson you will learn the fundamentals of the Azure ML Python SDK, focusing on version 2 (the azure-ai-ml package). Azure ML is used in machine learning experiments to explore, prepare and manage not only data but also ML models. Additionally, cloud resources can be managed from the code itself (infrastructure as code, IaC), including monitoring and logging. Moreover, machine learning experiments and models can be organized using MLflow, which is incorporated in version 2 of the Python SDK. Finally, this SDK can deploy web services that turn your trained models into RESTful services.

The training includes theory and hands-on exercises. After this training you will have gained knowledge about:

- Fundamentals of Azure ML SDK v2
- Defining workspaces, compute targets, datasets and environments using IaC
- Azure ML best practices for model and data management
- MLflow
- Hyperparameter tuning
- Deploying models as online endpoints
- Lab session to get hands-on experience with these tools