This notebook illustrates how to combine predictive and Decision Optimization techniques.
While predictive models can be trained to accurately predict the failure distribution for assets, this alone does not let you plan the predictive maintenance of these assets, especially when there are operational constraints to consider, such as the availability of parts or a limited maintenance crew size.
The combination of machine learning and Decision Optimization is essential in helping you solve this problem.
A complete description of the problem can be found in the article Predictive Maintenance Scheduling with IBM Cloud Pak for Data and Decision Optimization.
In this notebook, you can use sample data to train a predictive failure model and then build an optimal maintenance plan with Decision Optimization.
This notebook requires the Commercial Edition of the CPLEX engines. It runs on Python with Decision Optimization (DO).
First import some of the packages you need to use.
import sys
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3
!pip install altair
import altair as alt
Collecting altair Downloading altair-5.2.0-py3-none-any.whl (996 kB) Installing collected packages: altair Successfully installed altair-5.2.0
Load historical data, remove irrelevant data, and merge it to be used for model training.
First load the machine data frame from historical data.
Machines have different characteristics, such as capacity, remaining life, and the costs and production losses associated with maintenance and repair.
df_historical_machine = pd.read_csv('https://raw.githubusercontent.com/achabrier/data/master/historical-machine.csv')
df_historical_machine.head()
id | capacity | remaining life | cost of maintenance | maintenance loss | cost of repair | repair loss | loss per life day | production value unit | |
---|---|---|---|---|---|---|---|---|---|
0 | M-01 | 130 | 5 | 50 | 50 | 100 | 100 | 20 | 10 |
1 | M-02 | 100 | 9 | 50 | 50 | 100 | 100 | 20 | 10 |
2 | M-03 | 140 | 14 | 50 | 50 | 100 | 100 | 20 | 10 |
3 | M-04 | 140 | 13 | 50 | 50 | 100 | 100 | 20 | 10 |
4 | M-05 | 130 | 15 | 50 | 50 | 100 | 100 | 20 | 10 |
For the predictive algorithm training you only need a subset of these columns, so first do some cleaning.
The column 'life' represents the number of days before failure, according to the vendor's specifications.
df_historical_machine = df_historical_machine[['id', 'remaining life']]
df_historical_machine.columns = ['id', 'life']
df_historical_machine = df_historical_machine.set_index(['id'])
df_historical_machine.head()
life | |
---|---|
id | |
M-01 | 5 |
M-02 | 9 |
M-03 | 14 |
M-04 | 13 |
M-05 | 15 |
Next, load the production for these machines.
df_historical_production = pd.read_csv('https://raw.githubusercontent.com/achabrier/data/master/historical-production.csv')
df_historical_production.head()
machine | day | production | |
---|---|---|---|
0 | M-01 | Day-01 | 106 |
1 | M-01 | Day-02 | 104 |
2 | M-01 | Day-03 | 130 |
3 | M-01 | Day-04 | 130 |
4 | M-01 | Day-05 | 130 |
The following chart shows the historical production for machine M-01.
df_production_m01 = df_historical_production[df_historical_production.machine == 'M-01']
alt.Chart(df_production_m01).mark_trail().encode(
x='day',
    y='production:Q',
).properties(
width=800,
height=300
)
Reshape the data so that it can be used in training the predictive model.
Production is also used as an input for the predictive model, because the level of production is likely to have an impact on failures.
df_historical_production.columns = ['id', 'day', 'production']
df_historical_production = df_historical_production.pivot(index='id', columns='day', values='production')
df_historical_production['total'] = df_historical_production.values[:, 1:].sum(1)
df_historical_production.head()
day | Day-01 | Day-02 | Day-03 | Day-04 | Day-05 | Day-06 | Day-07 | Day-08 | Day-09 | Day-10 | ... | Day-12 | Day-13 | Day-14 | Day-15 | Day-16 | Day-17 | Day-18 | Day-19 | Day-20 | total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
M-01 | 106 | 104 | 130 | 130 | 130 | 130 | 104 | 104 | 102 | 130 | ... | 130 | 130 | 122 | 120 | 130 | 110 | 130 | 105 | 100 | 2240 |
M-02 | 56 | 64 | 57 | 56 | 73 | 80 | 78 | 81 | 76 | 82 | ... | 84 | 79 | 77 | 63 | 64 | 65 | 72 | 79 | 65 | 1358 |
M-03 | 116 | 140 | 138 | 118 | 140 | 140 | 127 | 108 | 110 | 140 | ... | 123 | 132 | 107 | 140 | 126 | 137 | 124 | 134 | 121 | 2445 |
M-04 | 86 | 105 | 84 | 83 | 111 | 84 | 97 | 95 | 92 | 116 | ... | 90 | 114 | 87 | 85 | 100 | 109 | 102 | 109 | 81 | 1861 |
M-05 | 89 | 85 | 78 | 87 | 102 | 99 | 108 | 111 | 81 | 109 | ... | 86 | 89 | 88 | 86 | 107 | 76 | 97 | 105 | 89 | 1770 |
5 rows × 21 columns
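As a minimal illustration of the reshaping above (with made-up numbers), `pivot` turns long-format (machine, day, production) rows into one row per machine with one column per day:

```python
import pandas as pd

# Synthetic long-format production records (hypothetical values)
long_df = pd.DataFrame({
    'id': ['M-01', 'M-01', 'M-02', 'M-02'],
    'day': ['Day-01', 'Day-02', 'Day-01', 'Day-02'],
    'production': [106, 104, 56, 64],
})

# One row per machine, one column per day
wide_df = long_df.pivot(index='id', columns='day', values='production')
```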
Next load the historical failure data for these machines.
The column 'mid' represents the number of days before failure, according to historical records.
df_historical_mid = pd.read_csv('https://raw.githubusercontent.com/achabrier/data/master/historical-mid.csv')
df_historical_mid.columns = ['id', 'mid']
df_historical_mid = df_historical_mid.set_index(['id'])
df_historical_mid.head()
mid | |
---|---|
id | |
M-01 | 5 |
M-02 | 7 |
M-03 | 12 |
M-04 | 12 |
M-05 | 13 |
Now merge all data required for model training into one data frame.
# merge all
df_historical = df_historical_machine.join(df_historical_production).join(df_historical_mid)
df_historical.head()
life | Day-01 | Day-02 | Day-03 | Day-04 | Day-05 | Day-06 | Day-07 | Day-08 | Day-09 | ... | Day-13 | Day-14 | Day-15 | Day-16 | Day-17 | Day-18 | Day-19 | Day-20 | total | mid | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
M-01 | 5 | 106 | 104 | 130 | 130 | 130 | 130 | 104 | 104 | 102 | ... | 130 | 122 | 120 | 130 | 110 | 130 | 105 | 100 | 2240 | 5 |
M-02 | 9 | 56 | 64 | 57 | 56 | 73 | 80 | 78 | 81 | 76 | ... | 79 | 77 | 63 | 64 | 65 | 72 | 79 | 65 | 1358 | 7 |
M-03 | 14 | 116 | 140 | 138 | 118 | 140 | 140 | 127 | 108 | 110 | ... | 132 | 107 | 140 | 126 | 137 | 124 | 134 | 121 | 2445 | 12 |
M-04 | 13 | 86 | 105 | 84 | 83 | 111 | 84 | 97 | 95 | 92 | ... | 114 | 87 | 85 | 100 | 109 | 102 | 109 | 81 | 1861 | 12 |
M-05 | 15 | 89 | 85 | 78 | 87 | 102 | 99 | 108 | 111 | 81 | ... | 89 | 88 | 86 | 107 | 76 | 97 | 105 | 89 | 1770 | 13 |
5 rows × 23 columns
Comparing the remaining life given by the vendor with the historical failures, you can see that there is indeed a significant difference, which would be valuable to predict.
for m in range(1, 5):
    id = "M-0" + str(m)
    print(id)
    print("Remaining life for ", id, ": ", df_historical.life[id])
    print("Historical failure for ", id, ": ", df_historical.mid[id])
df_historical
M-01 Remaining life for M-01 : 5 Historical failure for M-01 : 5 M-02 Remaining life for M-02 : 9 Historical failure for M-02 : 7 M-03 Remaining life for M-03 : 14 Historical failure for M-03 : 12 M-04 Remaining life for M-04 : 13 Historical failure for M-04 : 12
life | Day-01 | Day-02 | Day-03 | Day-04 | Day-05 | Day-06 | Day-07 | Day-08 | Day-09 | ... | Day-13 | Day-14 | Day-15 | Day-16 | Day-17 | Day-18 | Day-19 | Day-20 | total | mid | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
M-01 | 5 | 106 | 104 | 130 | 130 | 130 | 130 | 104 | 104 | 102 | ... | 130 | 122 | 120 | 130 | 110 | 130 | 105 | 100 | 2240 | 5 |
M-02 | 9 | 56 | 64 | 57 | 56 | 73 | 80 | 78 | 81 | 76 | ... | 79 | 77 | 63 | 64 | 65 | 72 | 79 | 65 | 1358 | 7 |
M-03 | 14 | 116 | 140 | 138 | 118 | 140 | 140 | 127 | 108 | 110 | ... | 132 | 107 | 140 | 126 | 137 | 124 | 134 | 121 | 2445 | 12 |
M-04 | 13 | 86 | 105 | 84 | 83 | 111 | 84 | 97 | 95 | 92 | ... | 114 | 87 | 85 | 100 | 109 | 102 | 109 | 81 | 1861 | 12 |
M-05 | 15 | 89 | 85 | 78 | 87 | 102 | 99 | 108 | 111 | 81 | ... | 89 | 88 | 86 | 107 | 76 | 97 | 105 | 89 | 1770 | 13 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
M-996 | 9 | 70 | 80 | 79 | 78 | 90 | 90 | 90 | 67 | 82 | ... | 67 | 71 | 88 | 66 | 87 | 72 | 76 | 89 | 1502 | 7 |
M-997 | 6 | 94 | 113 | 91 | 96 | 100 | 118 | 90 | 83 | 105 | ... | 115 | 100 | 85 | 89 | 109 | 80 | 108 | 116 | 1898 | 3 |
M-998 | 6 | 57 | 58 | 61 | 57 | 58 | 63 | 53 | 49 | 54 | ... | 66 | 52 | 60 | 56 | 51 | 53 | 49 | 61 | 1085 | 5 |
M-999 | 5 | 107 | 114 | 98 | 89 | 103 | 86 | 98 | 83 | 95 | ... | 102 | 89 | 102 | 81 | 82 | 113 | 99 | 90 | 1797 | 2 |
M-1000 | 7 | 68 | 69 | 89 | 98 | 77 | 78 | 72 | 71 | 79 | ... | 103 | 78 | 92 | 84 | 94 | 79 | 73 | 78 | 1617 | 4 |
1000 rows × 23 columns
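One way to quantify that difference is the mean absolute deviation between the vendor figure and the observed failure day. A small sketch using just the five machines shown in the table above:

```python
import pandas as pd

# The five machines shown above: vendor remaining life vs. observed failure day
df = pd.DataFrame({
    'life': [5, 9, 14, 13, 15],
    'mid':  [5, 7, 12, 12, 13],
}, index=['M-01', 'M-02', 'M-03', 'M-04', 'M-05'])

# Mean absolute deviation between the vendor figure and reality
mad = (df['life'] - df['mid']).abs().mean()
```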
The following chart shows the deviation between the vendor's remaining-life prediction and the real failure, caused by the production trend.
alt.Chart(df_historical).mark_point(filled=True).encode(
x='life:O',
y='mid:Q'
).properties(
width=800,
height=300
)
You can now train a simple linear regression model to predict failure, using the vendor's remaining-life indication and the planned production for the machine as features.
from sklearn import linear_model
X_train = df_historical.iloc[: , :-1]
y_train = df_historical.iloc[: , -1]
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(X_train, y_train)
LinearRegression()
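Under the hood, `LinearRegression` solves an ordinary least-squares problem. A minimal NumPy sketch of the same idea, on hypothetical rows with only two features (remaining life and total production) instead of the per-day columns used above:

```python
import numpy as np

# Made-up training rows: [remaining life, total production] -> failure day
X = np.array([[5.0, 2240.0], [9.0, 1358.0], [14.0, 2445.0],
              [13.0, 1861.0], [15.0, 1770.0]])
y = np.array([5.0, 7.0, 12.0, 12.0, 13.0])

# Append an intercept column and solve min ||A @ coef - y||^2
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
```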
Now load new machines, with known characteristics, including the remaining life prediction from the vendor, and predict their failure using the linear regression model.
Load the machine table and perform some transformation.
df_machine = pd.read_csv('https://raw.githubusercontent.com/achabrier/data/master/machine.csv')
df_machine.head()
id | capacity | remaining life | cost of maintenance | maintenance loss | cost of repair | repair loss | loss per life day | production value unit | |
---|---|---|---|---|---|---|---|---|---|
0 | M-01 | 130 | 14 | 50 | 50 | 100 | 100 | 20 | 10 |
1 | M-02 | 120 | 13 | 50 | 50 | 100 | 100 | 20 | 10 |
2 | M-03 | 130 | 9 | 50 | 50 | 100 | 100 | 20 | 10 |
3 | M-04 | 110 | 10 | 50 | 50 | 100 | 100 | 20 | 10 |
4 | M-05 | 110 | 17 | 50 | 50 | 100 | 100 | 20 | 10 |
# Keep only useful columns
df_machine_x = df_machine[['id', 'remaining life']];
df_machine_x.columns = ['id', 'life']
df_machine_x = df_machine_x.set_index(['id'])
df_machine_x.head()
life | |
---|---|
id | |
M-01 | 14 |
M-02 | 13 |
M-03 | 9 |
M-04 | 10 |
M-05 | 17 |
Load the planned production and perform a simple transformation.
df_planned_production = pd.read_csv('https://raw.githubusercontent.com/achabrier/data/master/planned_production.csv')
df_planned_production.columns = ['id', 'day', 'production']
df_planned_production.head()
id | day | production | |
---|---|---|---|
0 | M-01 | Day-01 | 103 |
1 | M-01 | Day-02 | 108 |
2 | M-01 | Day-03 | 105 |
3 | M-01 | Day-04 | 130 |
4 | M-01 | Day-05 | 130 |
df_planned_production_x = df_planned_production.pivot(index='id', columns='day', values='production')
df_planned_production_x['total'] = df_planned_production_x.values[:, 1:].sum(1)
df_planned_production_x.head()
day | Day-01 | Day-02 | Day-03 | Day-04 | Day-05 | Day-06 | Day-07 | Day-08 | Day-09 | Day-10 | ... | Day-12 | Day-13 | Day-14 | Day-15 | Day-16 | Day-17 | Day-18 | Day-19 | Day-20 | total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
M-01 | 103 | 108 | 105 | 130 | 130 | 130 | 104 | 130 | 120 | 115 | ... | 129 | 102 | 130 | 122 | 130 | 120 | 119 | 110 | 114 | 2278 |
M-02 | 87 | 118 | 89 | 120 | 92 | 87 | 88 | 116 | 120 | 115 | ... | 91 | 87 | 115 | 108 | 113 | 95 | 120 | 116 | 90 | 1973 |
M-03 | 130 | 115 | 113 | 130 | 121 | 130 | 130 | 126 | 121 | 130 | ... | 108 | 130 | 130 | 114 | 130 | 130 | 130 | 130 | 130 | 2365 |
M-04 | 86 | 92 | 100 | 94 | 103 | 89 | 80 | 105 | 92 | 85 | ... | 74 | 95 | 108 | 105 | 101 | 73 | 95 | 80 | 93 | 1759 |
M-05 | 93 | 83 | 73 | 73 | 93 | 104 | 108 | 88 | 92 | 86 | ... | 96 | 104 | 74 | 94 | 102 | 93 | 83 | 77 | 87 | 1700 |
5 rows × 21 columns
Merge both data frames to get the structure that can be used with a machine learning model.
X_test = df_machine_x.join(df_planned_production_x)
X_test.head()
life | Day-01 | Day-02 | Day-03 | Day-04 | Day-05 | Day-06 | Day-07 | Day-08 | Day-09 | ... | Day-12 | Day-13 | Day-14 | Day-15 | Day-16 | Day-17 | Day-18 | Day-19 | Day-20 | total | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
M-01 | 14 | 103 | 108 | 105 | 130 | 130 | 130 | 104 | 130 | 120 | ... | 129 | 102 | 130 | 122 | 130 | 120 | 119 | 110 | 114 | 2278 |
M-02 | 13 | 87 | 118 | 89 | 120 | 92 | 87 | 88 | 116 | 120 | ... | 91 | 87 | 115 | 108 | 113 | 95 | 120 | 116 | 90 | 1973 |
M-03 | 9 | 130 | 115 | 113 | 130 | 121 | 130 | 130 | 126 | 121 | ... | 108 | 130 | 130 | 114 | 130 | 130 | 130 | 130 | 130 | 2365 |
M-04 | 10 | 86 | 92 | 100 | 94 | 103 | 89 | 80 | 105 | 92 | ... | 74 | 95 | 108 | 105 | 101 | 73 | 95 | 80 | 93 | 1759 |
M-05 | 17 | 93 | 83 | 73 | 73 | 93 | 104 | 108 | 88 | 92 | ... | 96 | 104 | 74 | 94 | 102 | 93 | 83 | 77 | 87 | 1700 |
5 rows × 22 columns
Predict the 'mid' column, corresponding to the most probable failure day.
y_pred = regr.predict(X_test)
X_test['mid'] = y_pred
X_test.head()
life | Day-01 | Day-02 | Day-03 | Day-04 | Day-05 | Day-06 | Day-07 | Day-08 | Day-09 | ... | Day-13 | Day-14 | Day-15 | Day-16 | Day-17 | Day-18 | Day-19 | Day-20 | total | mid | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
M-01 | 14 | 103 | 108 | 105 | 130 | 130 | 130 | 104 | 130 | 120 | ... | 102 | 130 | 122 | 130 | 120 | 119 | 110 | 114 | 2278 | 10.045138 |
M-02 | 13 | 87 | 118 | 89 | 120 | 92 | 87 | 88 | 116 | 120 | ... | 87 | 115 | 108 | 113 | 95 | 120 | 116 | 90 | 1973 | 9.844368 |
M-03 | 9 | 130 | 115 | 113 | 130 | 121 | 130 | 130 | 126 | 121 | ... | 130 | 130 | 114 | 130 | 130 | 130 | 130 | 130 | 2365 | 6.320415 |
M-04 | 10 | 86 | 92 | 100 | 94 | 103 | 89 | 80 | 105 | 92 | ... | 95 | 108 | 105 | 101 | 73 | 95 | 80 | 93 | 1759 | 7.265803 |
M-05 | 17 | 93 | 83 | 73 | 73 | 93 | 104 | 108 | 88 | 92 | ... | 104 | 74 | 94 | 102 | 93 | 83 | 77 | 87 | 1700 | 11.722816 |
5 rows × 23 columns
alt.Chart(X_test).mark_point(filled=True).encode(
x='life:O',
y='mid:Q'
).properties(
width=800,
height=300
)
df_day = pd.read_csv('https://raw.githubusercontent.com/achabrier/data/master/day.csv')
df_day.head()
id | |
---|---|
0 | Day-01 |
1 | Day-02 |
2 | Day-03 |
3 | Day-04 |
4 | Day-05 |
Transform the 'mid' most probable failure day into a failure probability distribution over time.
import random
import numpy as np

n_days = df_day['id'].count()

data_failure = []
for machine in df_machine['id']:
    # Most probable failure day predicted by the regression model
    mid = int(X_test.mid[machine])
    # Random spread to simulate uncertainty around the predicted day
    spread = random.randint(2, 6)
    # Sample n failure days from a normal distribution centered on mid
    n = 1000
    s = np.random.normal(mid, spread/4., n)
    for i, day in np.ndenumerate(df_day['id']):
        # Share of samples falling on this day, as an integer percentage
        t = sum(1 for j in range(n) if int(s[j]) == i[0])
        data_failure.append((machine, day, int(100.*t/n)))
df_predicted_failure = pd.DataFrame(data=data_failure, columns=['machine', 'day', 'failure'])
df_predicted_failure.head()
machine | day | failure | |
---|---|---|---|
0 | M-01 | Day-01 | 0 |
1 | M-01 | Day-02 | 0 |
2 | M-01 | Day-03 | 0 |
3 | M-01 | Day-04 | 0 |
4 | M-01 | Day-05 | 0 |
And display, for example, the predicted failure probability for each day for machine M-01.
df_failure_m01 = df_predicted_failure[df_predicted_failure.machine == 'M-01']
alt.Chart(df_failure_m01).mark_trail().encode(
x='day',
y='failure',
).properties(
width=800,
height=300
)
Perform some transformations, for example creating a structure with the cumulative probability of failure before a given day.
df_predicted_failure.reset_index(inplace=True)
df_predicted_failure = df_predicted_failure.set_index(['machine', 'day'])
df_planned_production.rename(columns={'id':'machine'}, inplace=True)
df_planned_production.reset_index(inplace=True)
df_planned_production = df_planned_production.set_index(['machine', 'day'])
# first global collections to iterate upon
all_machines = df_machine['id'].values
all_days = df_day['id'].values
data_cumul_failure = []
for machine in all_machines:
for i, d in np.ndenumerate(all_days):
cumul = 0
for i2, d2 in np.ndenumerate(all_days):
if i2==i:
break
cumul += int(df_predicted_failure.failure[machine, d2])
data_cumul_failure.append((machine, d, cumul))
df_cumul_failure = pd.DataFrame(data_cumul_failure, columns=['machine', 'day', 'cumul_failure'])
df_cumul_failure=df_cumul_failure.set_index(['machine', 'day'])
df_cumul_failure.head()
cumul_failure | ||
---|---|---|
machine | day | |
M-01 | Day-01 | 0 |
Day-02 | 0 | |
Day-03 | 0 | |
Day-04 | 0 | |
Day-05 | 0 |
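The double loop above can also be expressed, under the same semantics (probability of having failed strictly before each day), as a shifted group-wise cumulative sum. A sketch on synthetic numbers:

```python
import pandas as pd

# Synthetic per-day failure probabilities (percent) for one machine
df = pd.DataFrame({
    'machine': ['M-01'] * 4,
    'day': ['Day-01', 'Day-02', 'Day-03', 'Day-04'],
    'failure': [0, 10, 30, 40],
})

# Cumulative probability of having failed strictly before each day:
# group-wise cumulative sum, minus the current day's own probability
df['cumul_failure'] = df.groupby('machine')['failure'].cumsum() - df['failure']
```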
Display this cumulative failure for the same M-01 machine.
Taken individually, machine M-01 should certainly be maintained shortly before Day-10.
df_cumul = df_cumul_failure.reset_index()
df_cumul_m01 = df_cumul[df_cumul.machine == 'M-01']
alt.Chart(df_cumul_m01).mark_trail().encode(
x='day',
y='cumul_failure',
).properties(
width=800,
height=300
)
Now you will create an optimization model to build the optimal maintenance plan, taking into account some operational constraints.
One last input data frame you need is the parameters table.
df_parameters = pd.read_csv('https://raw.githubusercontent.com/achabrier/data/master/parameters.csv')
df_parameters.head()
id | value | |
---|---|---|
0 | strategy | 1 |
1 | maintenance crew size | 2 |
You will use the docplex Python package to formulate the optimization model.
from docplex.mp.environment import Environment
env = Environment()
env.print_information()
* system is: Linux 64bit * Python version 3.10.13, located at: /opt/conda/envs/Python-RT23.1-Premium/bin/python * docplex is present, version is 2.25.236 * CPLEX library is present, version is 22.1.1.0, located at: /opt/conda/envs/Python-RT23.1-Premium/lib/python3.10/site-packages * pandas is present, version is 1.5.3
Create a new optimization model.
from docplex.mp.model import Model
mdl = Model(name="PredictiveMaintenance")
Create the decision variables: the production level for each machine on each day, and a binary variable indicating whether a machine is maintained on a given day.
production = mdl.continuous_var_matrix(keys1=all_machines, keys2=all_days, name=lambda ns: "Production_%s_%s" % (ns[0],ns[1]))
df_production = pd.DataFrame({'production': production})
df_production.index.names=['all_machines', 'all_days']
maintenance = mdl.binary_var_matrix(keys1=all_machines, keys2=all_days, name=lambda ns: "Maintenance_%s_%s" % (ns[0],ns[1]))
df_maintenance = pd.DataFrame({'maintenance': maintenance})
df_maintenance.index.names=['all_machines', 'all_days']
Add constraints linking the actual production to the planned production and the maintenance decisions.
for machine in all_machines:
maintenance_loss = int(df_machine[df_machine.id==machine]['maintenance loss'])/100.
capacity = int(df_machine[df_machine.id==machine]['capacity'])
for day in all_days:
prod = df_planned_production.production[machine, day]
#mdl.add_if_then( maintenance[machine, day] == 1, production[machine, day]== 0 )
#mdl.add_if_then( maintenance[machine, day] == 0, production[machine, day]== df_production[df_production.machine==machine][df_production.day==day] )
if (prod <= capacity*(1-maintenance_loss)):
mdl.add_constraint( production[machine, day] == prod )
else:
mdl.add_constraint( production[machine, day] == prod - (prod-capacity*(1-maintenance_loss))*maintenance[machine, day])
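In plain Python, the production rule these constraints encode (as read from the code above) is: during maintenance, the machine's capacity drops by the maintenance-loss fraction, and any planned production above the reduced capacity is lost. A small sketch of that rule (hypothetical helper, not part of the docplex model):

```python
def actual_production(planned, capacity, maintenance_loss, in_maintenance):
    """Production actually achieved on a day, given the maintenance decision."""
    reduced_capacity = capacity * (1 - maintenance_loss)
    if planned <= reduced_capacity or not in_maintenance:
        # Maintenance does not bite: the plan fits under the reduced capacity
        return planned
    # Maintenance caps production at the reduced capacity
    return reduced_capacity
```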
Add other constraints:
# One maintenance per machine
for machine in all_machines:
mdl.add_constraint( mdl.sum(maintenance[machine, day] for day in all_days) == 1)
maintenance_crew_size = int(df_parameters[df_parameters.id=='maintenance crew size']['value'])
# One maintenance at a time
for day in all_days:
mdl.add_constraint( mdl.sum(maintenance[machine, day] for machine in all_machines) <= maintenance_crew_size)
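The two scheduling rules above (one maintenance per machine, at most `maintenance crew size` maintenances per day) can be checked on a candidate plan with a plain-Python sketch (hypothetical helper, independent of the docplex model):

```python
def plan_is_feasible(plan, machines, days, crew_size):
    """plan maps each machine to the single day it is maintained."""
    # Rule 1: exactly one maintenance per machine
    if set(plan.keys()) != set(machines):
        return False
    # Rule 2: at most crew_size maintenances on any given day
    per_day = {d: 0 for d in days}
    for day in plan.values():
        per_day[day] += 1
    return all(count <= crew_size for count in per_day.values())
```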
Create some cost structures to be used for objectives.
data_cost_maintenance = []
cost_kpis = []
# Cost of repair
for machine in all_machines:
#print machine
life = int(df_machine[df_machine.id==machine]['remaining life'])
capacity = int(df_machine[df_machine.id==machine]['capacity'])
cost_of_maintenance = int(df_machine[df_machine.id==machine]['cost of maintenance'])
maintenance_loss = int(df_machine[df_machine.id==machine]['maintenance loss'])/100.
cost_of_repair = int(df_machine[df_machine.id==machine]['cost of repair'])
repair_loss = int(df_machine[df_machine.id==machine]['repair loss'])/100.
loss_per_life_day = int(df_machine[df_machine.id==machine]['loss per life day'])
production_value_unit = int(df_machine[df_machine.id==machine]['production value unit'])
previous_day = None
for i, day in np.ndenumerate(all_days):
cost = 0;
prob_break_before = 0
if (previous_day != None):
prob_break_before = int(df_cumul_failure.cumul_failure[machine, previous_day])/100.
previous_day = day
#print prob_break_before
# Cost of lost production if failure before maintenance
for i2, day2 in np.ndenumerate(all_days):
if (i2==i):
break
prob_break_day2 = int(df_predicted_failure.failure[machine, day2])/100.
production_day2 = int(df_planned_production.production[machine, day2])
if (production_day2 > capacity*(1-repair_loss)):
cost += production_value_unit*prob_break_day2*(production_day2 - capacity*(1-repair_loss))
# Cost of repair if breaking before maintenance
cost += cost_of_repair*prob_break_before
# Cost of maintenance
cost += cost_of_maintenance*(1-prob_break_before)
# Cost of lost production for maintenance
production_day = int(df_planned_production.production[machine, day])
if (production_day > capacity*(1-maintenance_loss)):
cost += production_value_unit*(production_day - capacity*(1-maintenance_loss))
# Cost of maintenance too early
cost += loss_per_life_day*max(life-i[0], 0)
#print cost
data_cost_maintenance.append((machine, day, cost))
cost_kpis.append(cost*maintenance[machine, day])
cost_kpi = mdl.sum(cost_kpis)
mdl.add_kpi(cost_kpi, "Cost")
df_cost_maintenance = pd.DataFrame(data_cost_maintenance, columns=['machine', 'day', 'cost_maintenance'])
#print df_cost_maintenance
total_planned_production = mdl.sum(df_planned_production.production)
mdl.add_kpi(total_planned_production, "Total Planned Production")
total_production = mdl.sum(df_production.production)
mdl.add_kpi(total_production, "Total Production")
DecisionKPI(name=Total Production,expr=Production_M-01_Day-01+Production_M-01_Day-02+Production_M-01_Da..)
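The repair-versus-maintenance part of the cost above is an expected value over the probability that the machine has already failed. A minimal sketch, using the default costs from the machine table (100 for repair, 50 for maintenance):

```python
def expected_fix_cost(p_broken_before, cost_of_repair=100, cost_of_maintenance=50):
    """Expected cost on the maintenance day: repair if the machine already
    failed (probability p_broken_before), planned maintenance otherwise."""
    return (cost_of_repair * p_broken_before
            + cost_of_maintenance * (1 - p_broken_before))
```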
The objective depends on the strategy.
strategy = int(df_parameters[df_parameters.id=='strategy']['value'])
if (strategy == 1):
mdl.minimize(cost_kpi)
else:
early = 10
late = 1000
temp = []
for machine in all_machines:
last_day = None
for i, day in np.ndenumerate(all_days):
last_day = day;
cumul_failure = int(df_cumul_failure.cumul_failure[machine, day])
if (cumul_failure > 0):
temp.append(late * maintenance[machine, day] )
else:
temp.append(early * i[0] * maintenance[machine, day] )
late_kpi = mdl.sum(temp)
mdl.minimize(late_kpi)
Print information about the model.
Even with this small didactic data set, the number of variables exceeds 1,000, so the Commercial Edition of CPLEX is required.
mdl.print_information()
Model: PredictiveMaintenance - number of variables: 1200 - binary=600, integer=0, continuous=600 - number of constraints: 650 - linear=650 - parameters: defaults - objective: minimize - problem type is: MILP
You can now solve the model.
The engine log shows how fast the model is solved.
s = mdl.solve(log_output=True)
assert s, "solve failed"
mdl.report()
all_kpis = [(kp.name, kp.compute()) for kp in mdl.iter_kpis()]
df_kpis = pd.DataFrame(all_kpis, columns=['kpi', 'value'])
WARNING: Number of workers has been reduced to 2 to comply with platform limitations. Version identifier: 22.1.1.0 | 2022-11-28 | 9160aff4d CPXPARAM_Read_DataCheck 1 CPXPARAM_Threads 2 Tried aggregator 1 time. MIP Presolve eliminated 600 rows and 600 columns. Reduced MIP has 50 rows, 600 columns, and 1200 nonzeros. Reduced MIP has 600 binaries, 0 generals, 0 SOSs, and 0 indicators. Presolve time = 0.01 sec. (1.34 ticks) Found incumbent of value 34824.000000 after 0.02 sec. (2.90 ticks) Probing time = 0.00 sec. (0.82 ticks) Tried aggregator 1 time. Detecting symmetries... Reduced MIP has 50 rows, 600 columns, and 1200 nonzeros. Reduced MIP has 600 binaries, 0 generals, 0 SOSs, and 0 indicators. Presolve time = 0.01 sec. (1.10 ticks) Probing time = 0.00 sec. (0.82 ticks) Clique table members: 30. MIP emphasis: balance optimality and feasibility. MIP search method: dynamic search. Parallel mode: deterministic, using up to 2 threads. Root relaxation solution time = 0.00 sec. (0.59 ticks) Nodes Cuts/ Node Left Objective IInf Best Integer Best Bound ItCnt Gap * 0+ 0 34824.0000 0.0000 100.00% * 0+ 0 23966.7000 0.0000 100.00% * 0+ 0 23886.7000 0.0000 100.00% * 0 0 integral 0 19795.9000 19795.9000 50 0.00% Elapsed time = 0.07 sec. (6.99 ticks, tree = 0.00 MB, solutions = 4) Root node processing (before b&c): Real time = 0.07 sec. (7.05 ticks) Parallel b&c, 2 threads: Real time = 0.00 sec. (0.00 ticks) Sync time (average) = 0.00 sec. Wait time (average) = 0.00 sec. ------------ Total (root+branch&cut) = 0.07 sec. (7.05 ticks) * model PredictiveMaintenance solved with objective = 19795.900 * KPI: Cost = 19795.900 * KPI: Total Planned Production = 57832.000 * KPI: Total Production = 56956.000
You can now access the solution value and create useful pandas data frames.
df_production = df_production.production.apply(lambda v: v.solution_value)
df_production.head()
all_machines all_days M-01 Day-01 103.0 Day-02 108.0 Day-03 105.0 Day-04 130.0 Day-05 130.0 Name: production, dtype: float64
df_maintenance = df_maintenance.maintenance.apply(lambda v: int(v.solution_value))
df_maintenance.head()
all_machines all_days M-01 Day-01 0 Day-02 0 Day-03 0 Day-04 0 Day-05 0 Name: maintenance, dtype: int64
df_production = df_production.to_frame()
df_production['machine'] = df_production.index.get_level_values('all_machines')
df_production['day'] = df_production.index.get_level_values('all_days')
df_production.columns = ['production', 'machine', 'day']
df_production = df_production.reset_index(drop=True)
df_maintenance = df_maintenance.to_frame()
df_maintenance['machine'] = df_maintenance.index.get_level_values('all_machines')
df_maintenance['day'] = df_maintenance.index.get_level_values('all_days')
df_maintenance.columns = ['maintenance', 'machine', 'day']
df_maintenance = df_maintenance.reset_index(drop=True)
df_maintenance.head()
maintenance | machine | day | |
---|---|---|---|
0 | 0 | M-01 | Day-01 |
1 | 0 | M-01 | Day-02 |
2 | 0 | M-01 | Day-03 |
3 | 0 | M-01 | Day-04 |
4 | 0 | M-01 | Day-05 |
Display the maintenance plan.
alt.Chart(df_maintenance).mark_rect().encode(
x='day:O',
y='machine:O',
color='maintenance:Q'
).properties(
width=1200,
height=500
)
Alain Chabrier, IBM, France.