August 31, 2021

MLOps – Deploying a recommender system in a production environment



  • Sofia Larsson
  • Puya Sharif
  • Max Nihlén Ramström
  • Rikard Gard (Ahlsell)
  • Jacob Westman (Ahlsell)


Most of the time, the least challenging part of a machine learning project is developing the actual model. The real challenge is often to build an integrated ML system and continuously operate it in a production environment. 

In this blog post, we will give an overview of how we deployed an integrated recommender system at Ahlsell, the leading distributor of installation products, tools, and supplies in the Nordics.

Figure 0: Examples of products provided by Ahlsell.

What is MLOps?

When developing an integrated ML system, surprisingly little time is spent on actual model development. Most of the time goes into creating the right prerequisites for model deployment, and that work is MLOps.

The term MLOps combines “machine learning” with the continuous development practices of DevOps. Its purpose is to make the deployment of a machine learning system as reliable and efficient as possible. MLOps is therefore about creating automated workflows, monitoring and maintaining the entire pipeline, and reporting back on pre-decided business metrics.

Two central terms in DevOps are Continuous Integration (CI) and Continuous Delivery (CD). CI is about frequently committing changed or updated code to the main code base and building the code on every commit to ensure that nothing breaks on integration. This step is usually combined with automated tests, which must pass before the code is merged into the main repository. CD is an approach where teams produce software in short cycles and ensure that the software can be reliably released at any time, manually or automatically.

Although a machine learning system is a software system, applying software engineering best practices to it is not as straightforward as one might expect. First, machine learning is experimental, so the system needs to version-control every experiment configuration, including the data. Second, besides unit and integration tests, a machine learning pipeline ought to have data and model validation tests. Third, because the model has to be retrained on fresh data, it is not enough to deploy a model that was trained offline; the entire pipeline needs to be deployed so that it can retrain the model automatically. Finally, data drift and training-serving skew must be monitored, since both eventually degrade model performance.
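Drift monitoring can take many forms. As one illustration, and not a description of what runs at Ahlsell, the sketch below computes the Population Stability Index (PSI) between a training-time sample and a fresh sample of a single numeric feature. The function name, the bin count, and the 0.2 alert level used in the example lines are assumptions chosen for the sketch.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare two samples of one feature; 0.0 means identical binned distributions."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data: the "fresh" sample is the training sample shifted upwards.
training_sample = [i / 100 for i in range(100)]
fresh_sample = [x + 0.5 for x in training_sample]

psi = population_stability_index(training_sample, fresh_sample)
drift_detected = psi > 0.2  # assumed alert level, to be tuned per feature
```

A value like this could be logged per feature on every retraining run, so that a slow shift in the input data becomes visible before it shows up in the recommendation quality.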

Thus, for an ML system, CI has to test and validate the data as well as the code and components of the system, and CD has to deliver an entire training pipeline that automatically retrains the model on a schedule or whenever new data is fetched.
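To make the idea of a data validation test concrete, here is a minimal sketch of a check that could run before training. The column names and rules are hypothetical, since the post does not describe the actual schema or the tests used in the project.

```python
from typing import Iterable, List, Mapping

# Hypothetical schema for an order-history extract (not taken from the project).
REQUIRED_COLUMNS = {"customer_id": str, "product_id": str, "quantity": int}


def validate_rows(rows: Iterable[Mapping[str, object]]) -> List[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    problems: List[str] = []
    n_rows = 0
    for i, row in enumerate(rows):
        n_rows += 1
        for column, expected_type in REQUIRED_COLUMNS.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {i}: missing {column}")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: {column} should be {expected_type.__name__}")
        quantity = row.get("quantity")
        if isinstance(quantity, int) and quantity <= 0:
            problems.append(f"row {i}: quantity must be positive")
    if n_rows == 0:
        problems.append("batch is empty")
    return problems


good_batch = [{"customer_id": "c1", "product_id": "p1", "quantity": 2}]
bad_batch = [{"customer_id": "c1", "quantity": 0}]

good_report = validate_rows(good_batch)
bad_report = validate_rows(bad_batch)
```

In a CI run, a non-empty report would fail the job, in the same way a failing unit test does.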

MLOps at Ahlsell

The following sections describe how we work with MLOps at Ahlsell. We’ll go through the data ingestion and processing, modeling, and evaluation pipelines, as well as how we put it all together in an automated CI/CD pipeline.

Model training pipeline

The model training pipeline was developed using an Azure service called Azure Machine Learning. The top-level resource in Azure Machine Learning is called a workspace. In the workspace, it is possible to manage resources such as compute instances and clusters, and also to control environments, experiments, pipelines, models, and inference endpoints. Other resources, such as an Azure Storage account and Azure Key Vault, are connected to the workspace; this is where the data and access keys are stored for secure and easy access.

The pipeline is defined by specifying pipeline steps. Each step states which script should be run, which source directory the script is in, and on which compute instance and in which environment it should run. It is therefore possible to run the different steps of the pipeline on different compute instances or clusters and in different environments. The arguments parameter of a step is used both to pass arguments to the script and to pass data between steps.
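The shape of such a step definition can be sketched as follows. In the Azure ML Python SDK (v1), a step like this is commonly declared with PythonScriptStep, whose parameters include script_name, source_directory, compute_target, runconfig, and arguments; the post does not say which class was used, so treat that as an assumption. To stay runnable without an Azure subscription, the sketch models the same fields with a plain dataclass. All script names, compute names, and argument names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepSpec:
    """Stand-in for a pipeline step definition; this is not an Azure ML class."""
    name: str
    script_name: str       # the script to run
    source_directory: str  # where the script lives
    compute_target: str    # compute instance or cluster to run on
    environment: str       # environment the script runs in
    arguments: List[str] = field(default_factory=list)


# Hypothetical intermediate location used to hand data from one step to the next.
processed_data = "intermediate/processed"

prepare = StepSpec(
    name="prepare_data",
    script_name="prepare.py",
    source_directory="src/prepare",
    compute_target="cpu-cluster",
    environment="prep-env",
    arguments=["--output", processed_data],
)

train = StepSpec(
    name="train_model",
    script_name="train.py",
    source_directory="src/train",
    compute_target="gpu-cluster",
    environment="train-env",
    arguments=["--input", processed_data, "--factors", "64"],
)

pipeline_steps = [prepare, train]


def depends_on(consumer: StepSpec, producer: StepSpec) -> bool:
    """A step depends on another if it reads the path the other one writes."""
    if "--output" not in producer.arguments:
        return False
    written = producer.arguments[producer.arguments.index("--output") + 1]
    return written in consumer.arguments
```

The two steps deliberately use different compute targets and environments, and the shared path in their arguments is what links the output of the first step to the input of the second.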

The workspace version-controls the data and models and keeps track of which version was used for a specific run. Each run also logs metadata about the run (such as its timestamp and duration), metrics, and a snapshot of the code taken before the run. Furthermore, if the source directory of the script is a local Git repository, information about the repository is stored with the run as well.

Once a model is registered in the model registry, it is version-controlled. A registered model can be deployed as a service endpoint or a real-time REST endpoint. In this project, we created a pipeline endpoint, exposed as a REST endpoint, that runs the entire pipeline automatically and produces new recommendations. As the final step in our pipeline, we upload the recommendations to Azure Cosmos DB, a non-relational database from which the front-end API can fetch recommendations for a specific customer to show on their site.
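Triggering a pipeline endpoint over REST amounts to an authenticated POST request. The sketch below only builds the request, using the JSON body shape that Azure Machine Learning documents for published pipelines (ExperimentName and ParameterAssignments). The URL, token, experiment name, and the output_target parameter are placeholders invented for the example.

```python
import json
import urllib.request
from typing import Dict


def build_pipeline_request(
    endpoint_url: str,
    access_token: str,
    experiment_name: str,
    parameters: Dict[str, str],
) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a pipeline run."""
    body = {
        "ExperimentName": experiment_name,
        "ParameterAssignments": parameters,
    }
    return urllib.request.Request(
        url=endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_pipeline_request(
    endpoint_url="https://example.invalid/pipelines/recommender",  # placeholder URL
    access_token="<token>",                                         # placeholder token
    experiment_name="recommender-retraining",                       # hypothetical name
    parameters={"output_target": "dev"},                            # hypothetical parameter
)
# Passing the request to urllib.request.urlopen would submit the run.
```

Because the caller only needs the URL and a valid token, any scheduler or orchestration service that can issue an HTTP request can start a retraining run.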


Data Factory is Microsoft’s ELT (Extract, Load, Transform) service, which allows for building complex pipelines. We used this service for data ingestion, data processing, and triggering the machine learning pipeline. On a schedule, an ELT pipeline runs, fetches data from an on-premises database, processes it, and stores it in a SQL database. Once this is done, the machine learning pipeline is triggered to retrain the model on the new data.

Certain metrics about each pipeline run can be tracked and monitored using the Activity monitor in Data Factory. If one of these metrics crosses a predefined threshold, an alert is raised.

Model Evaluation

Earlier, we said that the last step of the machine learning pipeline saves the recommendations to Azure Cosmos DB. That was not the whole truth. Depending on which service triggers the pipeline, the recommendations are saved to Azure Cosmos DB in either the development environment (dev-cosmos) or the production environment (prod-cosmos). We will go deeper into this in the next section.

If a developer submits the pipeline locally, the recommendations are saved in dev-cosmos.

A well-known challenge with recommender systems is that it is extremely difficult to measure what we really want to optimize: how much does the customer like these products? For model selection and parameter tuning, we optimize AUC or mean average precision at K, but those metrics do not tell the full story. It is not until customers can interact with the recommender system that we can truly evaluate it.
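For reference, mean average precision at K can be computed as in the sketch below. Several normalizations of the metric are in use; this version divides by min(K, number of relevant items) and assumes that a recommendation list contains no duplicates. The function names and the toy data are made up for the example and are not taken from the project.

```python
from typing import Dict, List, Set


def average_precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Average of precision@rank over the ranks (up to k) where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(k, len(relevant))


def mean_average_precision_at_k(
    recommendations: Dict[str, List[str]],
    purchases: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean of the per-customer average precision at k."""
    if not recommendations:
        return 0.0
    total = sum(
        average_precision_at_k(items, purchases.get(customer, set()), k)
        for customer, items in recommendations.items()
    )
    return total / len(recommendations)


# Toy data: ranked recommendations and the items each customer actually bought.
recommendations = {"c1": ["a", "b", "c"], "c2": ["x", "y", "z"]}
purchases = {"c1": {"a", "c"}, "c2": {"y"}}

# c1: (1/1 + 2/3) / 2 = 5/6, c2: (1/2) / 1 = 1/2, so the mean is 2/3.
map_at_3 = mean_average_precision_at_k(recommendations, purchases, k=3)
```

The metric rewards placing purchased items near the top of the list, but it can only score items the customer has already bought, which is one reason offline numbers do not tell the full story.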

To make the recommendations available for qualitative evaluation during the development phase, we built a simple GUI hosted on an Azure virtual machine, see Figure 1. Through an API, the GUI fetches the recommendations in dev-cosmos and displays them for a particular customer, enabling us to qualitatively evaluate the recommendations after each model iteration or code update.

Figure 1: The GUI used to view and qualitatively evaluate recommendations.

CI/CD pipeline 

So, how have we worked with CI/CD practices? The project's CI/CD pipeline is implemented using GitHub Actions. GitHub Actions is event-driven and can be set to trigger a run when, for example, a pull request is merged into the master branch. A pipeline in GitHub Actions is defined as a so-called workflow, which is made up of one or more jobs. A job is made up of steps that all run on the same runner, where a runner is a server that executes the workflow, here one hosted by GitHub. Each job runs in a fresh virtual environment, so a common first step in a job is to install all the packages needed to execute the remaining steps. A step is an individual task that can be either an action in itself or a shell command. Since all steps in the same job execute on the same runner, they can share data.

Figure 2: A schematic of the recommendation system.

In the repository, the workflow is designed to be triggered whenever a change to the master branch is attempted. As the first steps, the necessary packages and dependencies are installed. Thereafter, the workflow connects to Azure using a service principal and runs the tests. If a test fails, none of the following steps will run. Subsequently, the workflow triggers the machine learning pipeline in Azure Machine Learning, passing a pipeline argument that updates the pipeline endpoint after execution so that it points to the updated machine learning pipeline.

Once this is done, the code is merged into master. The recommendations from this run are also stored in dev-cosmos. The only time the recommendations are saved in prod-cosmos is when the pipeline is triggered by Data Factory. Data Factory uses the latest version of the pipeline and passes a pipeline argument that allows the recommendations to be saved in the production environment.
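One way the final pipeline step could turn such a pipeline argument into a storage target is sketched below. The flag name and the mapping are hypothetical; the post only states that an argument decides between dev-cosmos and prod-cosmos.

```python
import argparse
from typing import List, Optional

# Hypothetical mapping from the pipeline argument to the two environments named in the post.
COSMOS_TARGETS = {"dev": "dev-cosmos", "prod": "prod-cosmos"}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Upload recommendations")
    parser.add_argument(
        "--output-target",
        choices=sorted(COSMOS_TARGETS),
        default="dev",  # local and CI runs never write to production by accident
    )
    return parser.parse_args(argv)


def resolve_target(argv: Optional[List[str]] = None) -> str:
    """Return the name of the environment the recommendations should be written to."""
    return COSMOS_TARGETS[parse_args(argv).output_target]


local_run_target = resolve_target([])                              # developer or CI run
scheduled_run_target = resolve_target(["--output-target", "prod"])  # scheduled production run
```

Defaulting to the development environment means that production writes require an explicit argument, which only the scheduled trigger needs to supply.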

Putting all the components together, our system is described by the schematic in Figure 2 above.


So there you have it: one way to work with MLOps in Microsoft Azure. We will continue refining our tests and implement data drift monitoring, which might cause this blog post to be updated. If you have any questions regarding this post, or just want to discuss MLOps over a cup of coffee, don't hesitate to reach out! It is always enjoyable to discuss an interesting topic and gain more perspectives on the matter.
