Challenge | Address the lack of information managers have when estimating the duration of a project |
Solution | A microservice exposing an API that estimates project duration more accurately than managers could |
Technologies and tools | Python, PostgreSQL, LightGBM, Docker |
The Client is developing a project management system for a thousand enterprises (mainly from North America) covering various business areas: IT, marketing, education, healthcare, etc. Over 10 years, the product has become overwhelmingly successful and accumulated a substantial amount of data on completed projects.
The Client aims to continuously improve the product's performance to meet user needs. To enhance the quality of the project management system, the Client decided to leverage its data and apply machine learning.
People are often wrong in their estimations, but twice as often, employees fail to track the actual status of tasks in the project management system. Therefore, top management faces significant difficulties trying to predict the project end date and whether or not a project will be completed on time.
A system capable of providing a more accurate estimation of the actual project end date would come in handy for managers and would increase the overall effectiveness of project management processes within the Client’s system.
The work of the InData Labs team was split into the following stages:
The InData Labs team researched popular enhancements in project management tools. To match user needs and market demands, the list of product features includes the following:
The Client picked the project duration prediction as the most useful feature to develop.
The overall development process comprises 4 stages.
Stage | Scope of work |
1. Data understanding and validation | Acquiring, processing, and validating the Client’s data. |
2. Feature engineering | Converting raw project data into features. |
3. Modeling | Training the model on the prepared dataset. |
4. Deployment | Delivery and deployment of the model; providing the Client with a user-friendly interface to access the trained model. |
The solution “road map” is in Figure 1.
A gradient boosting model was chosen because of the volume of the data (enough to abandon linear models but insufficient for neural networks) and its heterogeneous nature.
As a result, the InData Labs team delivered a model capable of predicting the number of calendar days left before the expected project end.
The service delivered by InData Labs provides access to the API, through which it is possible to effectively evaluate the duration of a project. Predictions can be generated at the planning stage or on any given day during the development process.
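To illustrate what "calendar days left" means as a prediction target, here is a minimal sketch. It assumes each completed project has a known start date and actual end date, and that training examples are taken at regular snapshot days; the function names and the 7-day step are hypothetical and not taken from the delivered service.

```python
from datetime import date, timedelta


def days_left(snapshot_day, actual_end):
    """Calendar days remaining from the snapshot day to the actual end date."""
    if snapshot_day > actual_end:
        raise ValueError("snapshot day is after the project end")
    return (actual_end - snapshot_day).days


def snapshot_targets(start, actual_end, step_days=7):
    """Build (snapshot_day, days_left) pairs from project start to project end.

    The step between snapshots is an illustrative assumption.
    """
    rows = []
    day = start
    while day <= actual_end:
        rows.append((day, days_left(day, actual_end)))
        day += timedelta(days=step_days)
    return rows
```

A snapshot taken on the start date corresponds to a planning-stage prediction, while later snapshots correspond to predictions made during development.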
The quality of the model was assessed with the SMAPE (symmetric mean absolute percentage error) metric. In terms of accuracy, the model performs 50-60% better than the estimates that managers explicitly indicated.
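For reference, SMAPE can be computed as sketched below. Several variants of the metric exist, and the case study does not state which one was used; this sketch assumes the common form that averages 2|predicted - actual| / (|actual| + |predicted|) and reports it as a percentage. The function name is hypothetical.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        if denom == 0:
            # Both values are zero: treat the error for this pair as zero.
            continue
        total += 2.0 * abs(p - a) / denom
    return 100.0 * total / len(actual)
```

Lower values are better. Because the error is scaled by the size of the values being compared, short and long projects contribute on a comparable scale.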
For projects without manager estimates, baselines were calculated (simple methods of estimating the target variable). The quality of the model exceeds that of the baseline models by up to 15%, depending on the segment.
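The case study does not describe how the baselines were built. As one illustration of a "simple method", the sketch below assumes a per-segment median of historical durations, with a fallback to the overall median for unseen segments. The field names `segment` and `duration_days` and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fit_median_baseline(projects):
    """Return (median duration per segment, overall median duration)."""
    by_segment = defaultdict(list)
    for project in projects:
        by_segment[project["segment"]].append(project["duration_days"])
    all_durations = [d for durations in by_segment.values() for d in durations]
    per_segment = {seg: median(ds) for seg, ds in by_segment.items()}
    return per_segment, median(all_durations)


def predict_baseline(segment, per_segment, overall):
    """Predict the segment median, or the overall median for unknown segments."""
    return per_segment.get(segment, overall)
```

A baseline like this gives a reference score to compare the model against when no manager estimate is available.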
The detailed comparison is in Figure 2.
The model is successfully integrated into the Client’s project management system and allows the users to receive real-time predictions.
Top management receives a powerful tool for monitoring the “health” of a project. The model also provides an opportunity to more effectively assess the progress of a project or anticipate possible risks.