December 09, 2024

ML for time and cost prediction of aid shipments

Authors:

Pratima Rao

Editors:

Puya Sharif

Introduction

This post explores how machine learning can address uncertainties in supply chain management, focusing on humanitarian aid. By leveraging predictive models and publicly available data, the project aimed to optimize shipment costs and delivery times. Using a specific region in Africa as a test case, we developed a prototype that combines machine learning, heuristics, and data-driven insights to improve the efficiency of aid delivery and logistical planning. The content of the post is based on a joint project between Modulai and Design for Life, performed in the spring of 2024.

Supply Chain

A “supply chain” is a complex logistics system of facilities that transform raw materials into finished products and then distribute them to end consumers or customers. It includes all the steps and places involved in making and moving the products. In other words, it is the entire system of producing and delivering a product or service to the end user. Almost all industries use supply chains to manage the production and delivery of their products.

A typical supply chain consists of the following steps:

Planning
- Determining the demand for products and creating a plan for production, procurement, and logistics.
Sourcing
- Finding and selecting suppliers for raw materials and components needed for production.
Manufacturing
- Converting raw materials into finished products through various production processes.
Warehousing
- Storing raw materials, work-in-progress items, and finished products until they are needed.
Transportation
- Moving raw materials to manufacturing facilities and finished products to distribution centres or retailers.
Distribution
- Delivering finished products to end consumers or retailers through a network of warehouses and transportation systems.

Examples of use cases:

Manufacturing: Producing goods from raw materials.
Food and beverage: Growing, processing, and distributing food.
Logistics: Transporting goods for various industries.

Humanitarian Aid and Supply Chains

Another sector where supply chain management plays a vital role is Humanitarian Aid. Aid shipment refers to the process of sending essential supplies to people in need during emergencies such as natural disasters, conflicts, or other crises. These shipments aim to provide relief and support to affected populations, and can include a wide range of supplies ranging from food and medicines to clothing and tools.

Key challenges

When organizing aid shipments, numerous uncertainties arise regarding the total cost, arrival time, and exact destinations. These uncertainties complicate the planning of aid programs, making it challenging to establish accurate budgets and timelines.

In this context, “uncertainties” refers to information that is missing from the dataset but essential for planning aid programs. This could, for example, be the total cost of the shipment, the exact location of the health centre, or the time required to manufacture the ordered products.

Prototype

Modulai, together with Design for Life, built a prototype aimed at addressing these issues by leveraging predictive machine learning to reduce uncertainties using historical data and heuristics.

During this project, we focused specifically on a region in East Africa. This part of East Africa has good public datasets and well-developed road networks, which makes it an optimal testing ground. The rationale was that if the prototype failed to perform effectively here, it was unlikely to succeed in other regions.

Objective

An ideal system would:

Handle uncertainties for all steps in a supply chain.
Fill in potentially missing information, using:
- API calls
- Machine learning models
- Heuristic methods.

This approach would be a significant step towards more reliable and efficient aid program planning.

The image below illustrates what such a system could look like.

Challenges and Scope

The general “complete missing data”-problem involves reducing uncertainties in several steps along the supply chain, making it a broad and complex problem to solve. Moreover, not all of these uncertainties require ML/AI algorithms – for example, in the image above, the gaps in the Long and Lat columns can be filled in by straightforward API querying. Other problems, such as estimating the cost of a shipment or the delivery time of a shipment, require prediction. These problems can be solved by machine learning, assuming the necessary data is available.
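As a minimal sketch of the non-ML case, gaps in the Lat and Long columns can be filled by looking up each facility name through a geocoding service. The `geocode` callable, the column names, and the stub lookup table below are assumptions for illustration; any geocoding API could be plugged in behind the same interface.

```python
from typing import Callable, Optional

# Hypothetical type: a geocoder maps a place name to (lat, long), or None if unknown.
Geocoder = Callable[[str], Optional[tuple[float, float]]]


def fill_missing_coordinates(rows: list[dict], geocode: Geocoder) -> list[dict]:
    """Return copies of rows with empty 'Lat'/'Long' filled where the geocoder finds a match."""
    completed = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row.get("Lat") is None or row.get("Long") is None:
            result = geocode(row["Facility"])
            if result is not None:
                row["Lat"], row["Long"] = result
        completed.append(row)
    return completed


# Stub geocoder standing in for a real API call (names and coordinates are made up).
_STUB_LOOKUP = {"Health Centre A": (-1.0, 30.0)}


def stub_geocode(name: str) -> Optional[tuple[float, float]]:
    return _STUB_LOOKUP.get(name)


rows = [
    {"Facility": "Health Centre A", "Lat": None, "Long": None},
    {"Facility": "Health Centre B", "Lat": -2.0, "Long": 29.5},
    {"Facility": "Health Centre C", "Lat": None, "Long": None},  # unknown to the stub
]
completed_rows = fill_missing_coordinates(rows, stub_geocode)
```

Rows the geocoder cannot resolve keep their gaps, so they can still be flagged for manual follow-up.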

Within the scope of addressing uncertainties in the aid supply chain datasets, we focused our efforts on predicting the time and cost estimates for aid shipments.

The following table illustrates what the prototype should be able to accomplish

Dataset and Modelling

Breaking down the supply chain

The first step of the project was to divide the complete supply chain into smaller segments and research datasets for each segment. We divided the complete supply chain into seven stages, which fall into two groups:

Procurement Stages (P):
- Handle order placement and manufacturing.
Delivery Stages (D):
- Manage the transportation of the shipments.

Challenges in the Procurement Stages

The procurement stages posed two main challenges:

Shipments differ depending on the manufacturing site. This required building separate models for each manufacturing country that ships to our selected region, making the task highly complex.
There is a lack of publicly available data to address these issues.

Focus on Delivery Stages

For the delivery stages, we were able to find publicly available datasets for the D1 stage. We also found the exact names and locations of the NDCs (National Delivery Centers), RDCs (Regional Delivery Centers), and various health centers in this part of East Africa, enabling us to utilize any mapping service for an initial estimate of shipping time and distance. These could be supplemented with reasonable heuristics to build heuristic models that can predict the time and cost of shipments.

Owing to this, we decided to focus on the delivery stages of the supply chain. For each of the segments, we aimed to build models that could predict both time and cost estimates.

Datasets (D1)

Primary Datasets:

Supply Chain Shipment Pricing Dataset – For HIV/AIDS aid programs
- “This dataset provides supply chain health commodity shipment and pricing data. Specifically, the data set identifies Antiretroviral (ARV) and HIV lab shipments to supported countries. In addition, the data set provides the commodity pricing and associated supply chain expenses necessary to move the commodities to countries for use.”
USAID GHSC-PSM Health Commodity Delivery Dataset
- “Dataset containing detailed information about all health commodity orders delivered through the USAID Global Health Supply Chain Program – Procurement and Management (GHSC-PSM) project.” [2]
PMI DELIVER dataset – From the “President’s Malaria Initiative”
- “The President’s Malaria Initiative (PMI) is a U.S. Government initiative designed to reduce malaria deaths and illnesses in target countries in sub-Saharan Africa with a long-term vision of a world without malaria.”

Supplementary Datasets:

Monthly gas prices: Obtained from U.S. All Grades All Formulations Retail Gasoline Prices Dollars per Gallon
Monthly weather data: Obtained from Average Mean Surface Air Temperature
Monthly CPI Data: Used to adjust historic values for inflation (obtained from U.S. CPI data).

These datasets pertained to the shipment of goods from the manufacturing site to the NDCs, meaning that these datasets could be used to model the D1 stage.
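The inflation adjustment mentioned for the CPI data can be sketched as follows: a historic USD amount is scaled by the ratio of the CPI in a reference month to the CPI in the month the cost was recorded. The function name and the index values are placeholders for illustration, not figures from the project.

```python
def adjust_for_inflation(nominal_usd: float, cpi_at_time: float, cpi_reference: float) -> float:
    """Express a historic USD amount in reference-period dollars."""
    if cpi_at_time <= 0:
        raise ValueError("CPI must be positive")
    return nominal_usd * cpi_reference / cpi_at_time


# Placeholder index values, NOT real CPI figures.
cpi_by_month = {"2010-01": 200.0, "2024-01": 250.0}

freight_cost_2010 = 1000.0
freight_cost_in_2024_usd = adjust_for_inflation(
    freight_cost_2010, cpi_by_month["2010-01"], cpi_by_month["2024-01"]
)
```

Putting all historic costs on the same price level keeps a model from learning inflation as if it were a shipment effect.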

Datasets (D2 and D3)

For the D2 and D3 stages, we found a few data sources that tracked aid shipments within the region in East Africa, but we were unable to access them. Consequently, we opted to construct heuristic models for these stages.

Approach:

To determine travel distance and time
- Estimated using OpenStreetMaps (OSM).
For estimating shipping costs
- Calculated based on travel distance, supplemented with information on fuel costs, fuel efficiency, etc.
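One way to obtain OSM-based driving distance and time is a routing engine built on OSM data. The sketch below assumes the public OSRM demo server and its `route` service. The source does not say which OSM tool was used, so the endpoint and function names here are illustrative only.

```python
import json
from urllib.request import urlopen

# Assumption: the public OSRM demo server; a self-hosted instance would work the same way.
OSRM_BASE_URL = "https://router.project-osrm.org"


def build_route_url(origin: tuple[float, float], destination: tuple[float, float]) -> str:
    """Build an OSRM route URL from (lat, lon) pairs. OSRM expects lon,lat order."""
    (lat1, lon1), (lat2, lon2) = origin, destination
    return (
        f"{OSRM_BASE_URL}/route/v1/driving/"
        f"{lon1},{lat1};{lon2},{lat2}?overview=false"
    )


def parse_route(response: dict) -> tuple[float, float]:
    """Return (distance_km, duration_hours) for the first route in an OSRM response."""
    if response.get("code") != "Ok" or not response.get("routes"):
        raise ValueError("no route found")
    route = response["routes"][0]
    # OSRM reports distance in metres and duration in seconds.
    return route["distance"] / 1000.0, route["duration"] / 3600.0


def fetch_route(origin: tuple[float, float], destination: tuple[float, float]) -> tuple[float, float]:
    """Query the routing server (network call) and return (distance_km, duration_hours)."""
    with urlopen(build_route_url(origin, destination), timeout=30) as resp:
        return parse_route(json.load(resp))


# Offline example with a hand-written response in OSRM's format.
sample_response = {"code": "Ok", "routes": [{"distance": 120000.0, "duration": 7200.0}]}
distance_km, duration_hours = parse_route(sample_response)
```

Keeping URL construction and response parsing separate from the network call makes the heuristic easy to check without depending on a live server.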

Assumptions:

We made certain assumptions in order to supplement the predictions (in the case of D1 stage) and calculate time and cost estimates (in the case of D2 and D3 stages). These were:

Fuel price: 1.36 (USD per litre)
Fuel efficiency: 4 (km per litre)
Weight capacity per truck: 1800 (kg per truck)
Cost per truck per km: 3 (USD per km)
Cold chain vs. Ambient Shipping:
1. Cold chain twice as expensive as Ambient
2. Cold Chain 30% faster than Ambient
Base charge: 800 USD was applied to all D2 and D3 shipments.
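To make the heuristics concrete, here is a minimal sketch of how these assumptions could combine into D2 and D3 estimates. The formula itself is an assumption: the number of trucks follows from the weight capacity, each truck pays the per-km cost plus fuel, the cold chain multiplier applies to the distance-dependent part, and the base charge is added once. "30% faster" is read here as 30% less travel time. The project's actual formula may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumptions:
    fuel_price_usd_per_litre: float = 1.36
    fuel_efficiency_km_per_litre: float = 4.0
    truck_capacity_kg: float = 1800.0
    cost_per_truck_km_usd: float = 3.0
    cold_chain_cost_factor: float = 2.0  # cold chain twice as expensive as ambient
    cold_chain_time_factor: float = 0.7  # assumed reading of "30% faster"
    base_charge_usd: float = 800.0


def estimate_cost_usd(distance_km, weight_kg, cold_chain=False, a=Assumptions()):
    """Heuristic shipment cost: base charge plus a distance-dependent part per truck."""
    trucks = max(1, math.ceil(weight_kg / a.truck_capacity_kg))
    fuel_per_km = a.fuel_price_usd_per_litre / a.fuel_efficiency_km_per_litre
    variable = trucks * distance_km * (a.cost_per_truck_km_usd + fuel_per_km)
    if cold_chain:
        variable *= a.cold_chain_cost_factor
    return a.base_charge_usd + variable


def estimate_time_hours(ambient_drive_hours, cold_chain=False, a=Assumptions()):
    """Heuristic travel time: scale the ambient driving time for cold chain shipments."""
    return ambient_drive_hours * (a.cold_chain_time_factor if cold_chain else 1.0)


ambient_cost = estimate_cost_usd(distance_km=100, weight_kg=2000)
cold_cost = estimate_cost_usd(distance_km=100, weight_kg=2000, cold_chain=True)
```

With these defaults, a 2,000 kg ambient shipment over 100 km needs two trucks and comes to roughly 1,468 USD under this formula. Because every assumption is a field on `Assumptions`, each one can be overridden per request.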

The image below shows the mapping of the different data sources to the stages they were utilised in.

Modelling and results

For the D1 stage, we used LightGBM, a popular gradient boosting framework created by Microsoft. The Supply Chain Shipment Pricing Dataset was used for cost prediction, and both the USAID GHSC-PSM Health Commodity Delivery Dataset and the PMI DELIVER dataset were evaluated for time prediction.

Because it gave better performance on the validation and test data, the USAID GHSC-PSM Health Commodity Delivery Dataset was the one used for time prediction.

User Interface

In addition to the models, we also built a UI where the model predictions could be visualized. The UI has two components in the form of tabs:

Feature Completion: This functionality aims to address the initial challenge of filling in missing data within the dataset by leveraging predictions generated by machine learning models.
Predictions: This functionality enables users to input details about a planned shipment and receive estimated time and cost projections. Additionally, this tab provides users with guidance on selecting the optimal RDC based on a specified NDC and health centre for delivery.

Data Completion

This tab allows the user to upload an incomplete Excel or CSV file. The gaps in the file are the uncertainties in the dataset, which are what this project aimed to resolve.
It fills in the gaps in the dataset using the models discussed in the previous sections:
- It returns a completed dataset
- It allows the user to download the finished dataset
- It also plots the uncertainties in the model predictions

Predictions

Within this tab, users can input various parameters regarding the shipments, adjust the assumptions for the heuristic models, and receive predicted time and cost estimates for the different stages.

The image below shows the different values the user can input.

Upon clicking “Predict,” users are presented with predictions for the different stages of the shipment process.
Additionally, they can view the number of RDCs in the province to which the health centre belongs, along with the closest RDC to the health centre.
Furthermore, users have the option to explore the various data sources and heuristics used in generating the predictions by selecting the “View sources used for predictions” button.
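Finding the closest RDC to a health centre can be sketched with a great-circle (haversine) distance. This is an assumption for illustration: the prototype may well rank RDCs by road distance instead, and the RDC names and coordinates below are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest_rdc(health_centre: tuple[float, float], rdcs: dict[str, tuple[float, float]]) -> str:
    """Return the name of the RDC nearest to the health centre."""
    return min(rdcs, key=lambda name: haversine_km(health_centre, rdcs[name]))


# Hypothetical RDC names and coordinates, for illustration only.
rdcs = {"RDC North": (-1.0, 30.0), "RDC South": (-2.5, 29.7)}
nearest = closest_rdc((-2.3, 29.8), rdcs)
```

Straight-line distance is cheap to compute for every candidate; a routing query could then refine the choice among the few nearest RDCs.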

We also implemented a feedback system in which users can either “Approve” the predictions or “Provide Feedback”.

The user can provide feedback in one of two ways:

By modifying individual values in the prediction table
By providing feedback as free text, for example “The predicted costs should be higher by 10%”

The user’s feedback is stored on disk upon clicking the “Submit Feedback” button. This feedback could be used either for model re-training or for feedback-based data completion. Neither was done during this project, but using feedback to refine predictions would be a valuable next step.
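As a minimal sketch, storing feedback on disk could be as simple as appending one JSON record per submission to a file. The record fields, function names, and file layout below are assumptions, not the prototype's actual format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def store_feedback(path: Path, shipment_id: str, corrected_values: dict, comment: str = "") -> dict:
    """Append one feedback record as a JSON line and return the stored record."""
    record = {
        "shipment_id": shipment_id,
        "corrected_values": corrected_values,
        "comment": comment,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_feedback(path: Path) -> list[dict]:
    """Read all stored feedback records, e.g. as input for later re-training."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

An append-only log keeps both kinds of feedback (edited table values and free text) in one place, ready to be turned into training labels later.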

Conclusion

This prototype marks an initial attempt at reducing uncertainties in supply chain management within the aid shipment sector. The work conducted during this phase demonstrates that with the right data, uncertainties can be mitigated, enabling more efficient planning and management of aid where it’s needed most.

There is significant scope for improving this prototype. A key factor for enhancement would be incorporating additional data sources, such as real-time traffic, weather conditions, road condition data, and granular datasets related to aid shipment deliveries. Access to datasets that track procurement times and costs will also allow us to build models that predict the procurement stages (P1), providing more comprehensive time and cost estimates.

Ultimately, by refining this prototype and integrating diverse data sources, we can create a robust system that not only optimizes aid delivery but also ensures that resources are utilized more effectively, maximizing the impact of humanitarian efforts.

Thank you, Design for Life, for this project.