Predictive Maintenance using Python and Machine Learning

Growthbotics
5 min read · Nov 7, 2020

There has been a lot of talk about how Python can replace many of the existing programming languages out there, and Python has become the default language for machine learning. So why not use Python to build a predictive maintenance model?

Predictive maintenance, or preventive maintenance, is a technique for forecasting breakdowns of a fixed asset, such as a motor or a CNC machine. It helps a company, a factory, or even a small business with a single machine, such as a printer or a fax machine, plan the useful lives of its assets.

Predictive maintenance can also save lives in hazardous industries by predicting gas leaks or the likelihood of equipment failure.

How AI and Python Scripts can build a Predictive Maintenance Model

Automating predictive maintenance is the crux of the matter. We want to automate the identification of, and response to, asset breakdowns.

With the advent of AI, machines can learn to find patterns and even anticipate new issues before they arise.

In this article, we will build a predictive maintenance model for the automobile industry, predicting anomalies before they affect a car's performance.

Data Collection from both Open-source and Private Data

Data is the most crucial part of any AI model. We may use open-source data or private data from factories, manufacturers, and companies.
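The snippets below assume the data has already been loaded into pandas DataFrames: df_train and df_test for the LSTM section, and a single df for the classical models. As a minimal sketch (the file names here are hypothetical placeholders for your own dataset):

import pandas as pd

# Hypothetical file names -- substitute the paths to your own dataset
df = pd.read_csv('meter_readings.csv')
df_train = pd.read_csv('meter_readings_train.csv')
df_test = pd.read_csv('meter_readings_test.csv')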

Machine Learning using various methods

There are four machine learning methods we can apply to our predictive maintenance model: LSTM, Random Forests, Decision Trees, and Logistic Regression. We first predict the failure status using classification; the remaining useful life can then be predicted with regression. Random Forests and Decision Trees will also be used for prediction.

Predictive Maintenance using LSTM

LSTM is a popular classification method in the machine learning domain. Let's set our target variable to "Faulty" and treat it as a binary variable (1, 0). The input training data 'X' may have different features, depending on which numerical data you use.

Below is a Python script to predict failure status using LSTM, Random Forests, Decision Trees, and Logistic Regression.

After setting up all the variables and dependencies, we perform a basic ETL pass to extract, organize, and scale our data:

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix, accuracy_score
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline

# Feature columns used as model inputs
features_col_name = ['Meter_No', 'KVA Rating', 'month', 'Current_L1', 'Current_L2',
                     'Current_L3', 'voltage_L1', 'Voltage_L2', 'Voltage_L3', 'PF_L1',
                     'PF_L2', 'PF_L3', 'Avg_Current', 'Avg_Voltage', 'Average_PF',
                     'Avg KVA Monthly', 'Avg Peaks Amps', 'Current_Unb', 'DTS_jobs']
target_col_name = 'Faulty'

# Scale the features to [0, 1]; fit the scaler on train, reuse it on test
sc = MinMaxScaler()
df_train[features_col_name] = sc.fit_transform(df_train[features_col_name])
df_test[features_col_name] = sc.transform(df_test[features_col_name])

# Build sliding-window sequences for the LSTM
def gen_sequence(id_df, seq_length, seq_cols):
    # Pad with zero rows so early timestamps still yield full-length windows
    df_zeros = pd.DataFrame(np.zeros((seq_length-1, id_df.shape[1])), columns=id_df.columns)
    id_df = pd.concat([df_zeros, id_df], ignore_index=True)
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    lstm_array = []
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        lstm_array.append(data_array[start:stop, :])
    return np.array(lstm_array)

We then generate labels for our model:

def gen_label(id_df, seq_length, seq_cols, label):
    df_zeros = pd.DataFrame(np.zeros((seq_length-1, id_df.shape[1])), columns=id_df.columns)
    id_df = pd.concat([df_zeros, id_df], ignore_index=True)
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    y_label = []
    # Each window is labelled with the target value at the window's end
    for start, stop in zip(range(0, num_elements-seq_length), range(seq_length, num_elements)):
        y_label.append(id_df[label][stop])
    return np.array(y_label)
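To make the windowing concrete, here is a quick sanity check on a toy DataFrame (purely illustrative, not part of the original dataset):

# 5 rows, 3 features, window length 3
toy = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])
toy['Faulty'] = [0, 0, 1, 0, 1]
seqs = gen_sequence(toy, 3, ['a', 'b', 'c'])
labels = gen_label(toy, 3, ['a', 'b', 'c'], 'Faulty')
print(seqs.shape, labels.shape)  # -> (4, 3, 3) (4,): 4 windows of 3 timesteps x 3 features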

Next, we set the timestamp, or window size, and generate the training sequences:

seq_length = 50
seq_cols = features_col_name
# Generate X_train: one set of sequences per month, concatenated
X_train = np.concatenate(list(list(gen_sequence(df_train[df_train['month']==id], seq_length, seq_cols)) for id in df_train['month'].unique()))
print(X_train.shape)
# Generate y_train
y_train = np.concatenate(list(list(gen_label(df_train[df_train['month']==id], seq_length, seq_cols, 'Faulty')) for id in df_train['month'].unique()))
print(y_train.shape)

Then, we write a script to generate X_test:

X_test = np.concatenate(list(list(gen_sequence(df_test[df_test['month']==id], seq_length, seq_cols)) for id in df_test['month'].unique()))
print(X_test.shape)

We also generate y_test:

y_test = np.concatenate(list(list(gen_label(df_test[df_test['month']==id], seq_length, seq_cols, 'Faulty')) for id in df_test['month'].unique()))
print(y_test.shape)

We then build out the LSTM model:

nb_features = X_train.shape[2]
timestamp = seq_length

model = Sequential()
model.add(LSTM(
    input_shape=(timestamp, nb_features),
    units=100,
    return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(
    units=50,
    return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

model.fit(X_train, y_train, epochs=100, batch_size=200, validation_split=0.05, verbose=1,
          callbacks=[EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto')])

Lastly, we evaluate the model and output the accuracy scores and confusion matrix:

scores = model.evaluate(X_train, y_train, verbose=1, batch_size=200)
print('Accuracy: {}'.format(scores[1]))
y_pred = model.predict_classes(X_test)
print('Accuracy of model on test data: ', accuracy_score(y_test, y_pred))
print('Confusion Matrix: \n', confusion_matrix(y_test, y_pred))
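One caveat: predict_classes was removed from Keras in TensorFlow 2.6, so on newer versions you would threshold the sigmoid output yourself:

# Equivalent to predict_classes on newer Keras versions
y_pred = (model.predict(X_test) > 0.5).astype('int32')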

Let's build out a Logistic Regression model, again using Python. We drop the Faulty column and assign the rest to X:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn import metrics

X = df.drop('Faulty', axis=1)
X.Meter_No = pd.to_numeric(X.Meter_No)

We then assign the Faulty column to the target variable y:

y = df['Faulty']

Let's set up a train/test split for the model:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)

We set the model object to logistic regression:

log_reg = LogisticRegression()

We fit the model on the training set:

log_reg.fit(X_train,y_train)

We predict the results on the testing set:

y_pred = log_reg.predict(X_test)

We then calculate the F1 score between our predictions and the actual test labels:

f1score = f1_score(y_test, y_pred, average='weighted')
print("Logistic regression f1_score is : %f" % f1score)
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix
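For a fuller picture than a single weighted F1 score, sklearn's classification_report breaks precision, recall, and F1 out per class (an optional addition to the original script):

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))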

Let’s build a Random Forest Classifier object

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(class_weight="balanced")

We fit the model on X_train and y_train. Setting class_weight to "balanced" reweights classes inversely to their frequency, which helps when faulty samples are much rarer than healthy ones.

rfc.fit(X_train, y_train)

We then predict on the testing set X_test:

rfc_predict = rfc.predict(X_test)
rfc_predict
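The original script stops at the raw predictions; as a small optional addition, the same metrics used for the logistic model apply here:

print("Random forest f1_score is : %f" % f1_score(y_test, rfc_predict, average='weighted'))
print("Confusion Matrix:\n", metrics.confusion_matrix(y_test, rfc_predict))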

The last machine learning method we are going to use is Decision Trees. We start by writing a function to split the dataset:

from sklearn.tree import DecisionTreeClassifier

def splitdataset(df):

    # Features are the first 19 columns; the target (Faulty) is the last column
    X = df.iloc[:, 0:19]
    Y = df.iloc[:, -1]

We then split the dataset into train and test sets (still inside the function):

    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)

    return X, Y, X_train, X_test, y_train, y_test

Next, we write a function to perform training with the Gini index:

def train_using_gini(X_train, X_test, y_train):

We then create a classifier object

    clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                      max_depth=3, min_samples_leaf=5)

Let’s start the training

    clf_gini.fit(X_train, y_train)
    return clf_gini

Next, we define a similar function to train with entropy:

def train_using_entropy(X_train, X_test, y_train):

We use a decision tree with the entropy criterion:

    clf_entropy = DecisionTreeClassifier(
        criterion="entropy", random_state=100,
        max_depth=3, min_samples_leaf=5)

We then fit it on the training data:

    clf_entropy.fit(X_train, y_train)
    return clf_entropy

We then write a function to make predictions:

def prediction(X_test, clf_object):

It predicts with whichever classifier object is passed in:

    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred

Let’s add a function to calculate accuracy and print out the results

from sklearn.metrics import classification_report

def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ",
          confusion_matrix(y_test, y_pred))
    print("Accuracy : ",
          accuracy_score(y_test, y_pred)*100)
    print("Report : ",
          classification_report(y_test, y_pred))

Home stretch: we tie the phases together in a main function and predict with the Gini-trained tree.

def main():
    # importdata() is assumed to load the dataset into a DataFrame (its definition is not shown in the original)
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)
    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)
    print("Results Using Gini Index:")
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)

if __name__ == "__main__":
    main()
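Note that main() only reports the Gini results even though the entropy tree is trained as well; to report both, add these lines inside main(), just before the __main__ guard:

    print("Results Using Entropy:")
    y_pred_entropy = prediction(X_test, clf_entropy)
    cal_accuracy(y_test, y_pred_entropy)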

Conclusion

After applying four different machine learning methods, LSTM, Random Forests, Decision Trees, and Logistic Regression, we have built out a predictive maintenance system. Our findings show that time-series data helps improve accuracy across these methods.
