Predictive Maintenance using Python and Machine Learning
There has been a lot of talk about whether Python can replace every other programming language out there. Whatever the outcome of that debate, Python has become the default language for machine learning, so why not use it to build a predictive maintenance model?
Predictive maintenance, a data-driven refinement of preventive maintenance, is a technique for forecasting breakdowns of fixed assets such as motors or CNC machines. It helps plan the useful lives of assets for a company, a factory, or even a small business with a single machine, such as a printer or a fax machine.
Predictive maintenance can also save lives in hazardous industries by predicting gas leaks or the likelihood of equipment failure.
How AI and Python Scripts Can Build a Predictive Maintenance Model
Automation is the crux of the matter: we want to automatically identify impending breakdowns of assets and act on them before they happen.
With the advent of AI, machines can learn to find patterns in sensor data and even flag new kinds of issues before they arise.
In this article, we will build a predictive maintenance model for the automobile industry, one that can predict an anomaly before it affects the vehicle's performance.
Data Collection from Both Open-Source and Private Data
Data is the most crucial part of any AI model. We may use open-source data or private data from factories, manufacturers, and companies.
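As a sketch of this step, the training and test frames used throughout the example below could be loaded from CSV exports of meter readings. The file names here are hypothetical placeholders, not part of any specific dataset:

import pandas as pd

# Hypothetical file names; point these at your own open-source or factory data
df_train = pd.read_csv('meter_readings_train.csv')
df_test = pd.read_csv('meter_readings_test.csv')
print(df_train.shape, df_test.shape)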
Machine Learning Using Various Methods
There are four machine learning methods we can apply to our predictive maintenance model: LSTM, Random Forests, Decision Trees, and Logistic Regression. We first predict the failure status using classification, and can then estimate the remaining useful life using regression, as sketched below.
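The regression half of that pipeline is not shown in the walkthrough below, so here is a minimal sketch of what it could look like; the 'RUL' (remaining useful life) column is an assumed label that your dataset would need to provide, for example as hours until the next recorded failure:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Assumed: df holds the sensor features plus 'Faulty' and 'RUL' columns
X = df.drop(['Faulty', 'RUL'], axis=1)
y = df['RUL']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)

rul_model = RandomForestRegressor(n_estimators=100, random_state=100)
rul_model.fit(X_train, y_train)
print(rul_model.predict(X_test)[:5])  # estimated remaining useful life for five test units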
Predictive Maintenance Using LSTM
LSTM (long short-term memory) networks are a popular choice for classifying sequential data. Let's set our target variable to "Faulty" and encode it as a binary variable (1 = faulty, 0 = healthy). The training input 'X' may have different features, depending on which numerical data you use.
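How the "Faulty" flag gets onto the frame depends on your source data. As one hedged example, if the raw logs carried a fault-code column (the column name below is hypothetical), the binary target could be derived like this:

# 'fault_code' is a hypothetical raw column; any nonzero code counts as a failure
df_train['Faulty'] = (df_train['fault_code'] != 0).astype(int)
df_test['Faulty'] = (df_test['fault_code'] != 0).astype(int)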
Below is a Python script to predict failure status using LSTM, Random Forests, Decision Trees, and Logistic Regression.
After setting up the imports and dependencies, we perform a basic ETL pass to extract, organize, and scale our data:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix, accuracy_score
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline

features_col_name = ['Meter_No', 'KVA Rating', 'month', 'Current_L1', 'Current_L2',
                     'Current_L3', 'voltage_L1', 'Voltage_L2', 'Voltage_L3', 'PF_L1',
                     'PF_L2', 'PF_L3', 'Avg_Current', 'Avg_Voltage', 'Average_PF',
                     'Avg KVA Monthly', 'Avg Peaks Amps', 'Current_Unb', 'DTS_jobs']
target_col_name = 'Faulty'

# Scale every feature to [0, 1]; fit on the training set only to avoid leakage
sc = MinMaxScaler()
df_train[features_col_name] = sc.fit_transform(df_train[features_col_name])
df_test[features_col_name] = sc.transform(df_test[features_col_name])

def gen_sequence(id_df, seq_length, seq_cols):
    # Pad with zero rows so early records still yield full-length windows
    df_zeros = pd.DataFrame(np.zeros((seq_length - 1, id_df.shape[1])), columns=id_df.columns)
    id_df = pd.concat([df_zeros, id_df], ignore_index=True)  # DataFrame.append is deprecated
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    lstm_array = []
    for start, stop in zip(range(0, num_elements - seq_length), range(seq_length, num_elements)):
        lstm_array.append(data_array[start:stop, :])
    return np.array(lstm_array)
We then generate the labels for our model:
def gen_label(id_df, seq_length, seq_cols, label):
    # Same zero-padding as gen_sequence so windows and labels stay aligned
    df_zeros = pd.DataFrame(np.zeros((seq_length - 1, id_df.shape[1])), columns=id_df.columns)
    id_df = pd.concat([df_zeros, id_df], ignore_index=True)
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    y_label = []
    # Each window is labelled with the target value at its end point
    for start, stop in zip(range(0, num_elements - seq_length), range(seq_length, num_elements)):
        y_label.append(id_df[label][stop])
    return np.array(y_label)
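To make the windowing concrete, here is a quick sanity check on a toy frame. With 5 rows and a window length of 3, both helpers pad with 2 zero rows and then slide, so we get 4 aligned window/label pairs:

toy = pd.DataFrame({'Avg_Current': [1.0, 2.0, 3.0, 4.0, 5.0],
                    'Faulty': [0, 0, 0, 1, 1]})
seqs = gen_sequence(toy, 3, ['Avg_Current'])
labels = gen_label(toy, 3, ['Avg_Current'], 'Faulty')
print(seqs.shape, labels.shape)  # (4, 3, 1) (4,)
print(labels)                    # [0. 0. 1. 1.] (each window labelled by its end row)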
Next, we set the timestamp, or window size, and generate the training sequences:
seq_length = 50
seq_cols = features_col_name

# Generate X_train: windows are built per month, then stacked
X_train = np.concatenate(list(list(gen_sequence(df_train[df_train['month'] == id], seq_length, seq_cols))
                              for id in df_train['month'].unique()))
print(X_train.shape)

# Generate y_train the same way
y_train = np.concatenate(list(list(gen_label(df_train[df_train['month'] == id], seq_length, seq_cols, 'Faulty'))
                              for id in df_train['month'].unique()))
print(y_train.shape)
Then, we generate X_test the same way:
X_test = np.concatenate(list(list(gen_sequence(df_test[df_test['month'] == id], seq_length, seq_cols))
                             for id in df_test['month'].unique()))
print(X_test.shape)
We also generate y_test:
y_test = np.concatenate(list(list(gen_label(df_test[df_test['month'] == id], seq_length, seq_cols, 'Faulty'))
                             for id in df_test['month'].unique()))
print(y_test.shape)
We then build the LSTM model: two stacked LSTM layers with dropout, followed by a sigmoid output for the binary fault flag.
nb_features = X_train.shape[2]
timestamp = seq_length

model = Sequential()
# The first LSTM layer returns full sequences so the second LSTM layer can consume them
model.add(LSTM(
    input_shape=(timestamp, nb_features),
    units=100,
    return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(
    units=50,
    return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(units=1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

model.fit(X_train, y_train, epochs=100, batch_size=200, validation_split=0.05, verbose=1,
          callbacks=[EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto')])
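One caveat worth flagging: patience=0 stops training the first time validation loss fails to improve, which can cut training short on noisy data. A common variation (an alternative to what this walkthrough uses, not a replacement of it) gives the model some slack and keeps the best weights seen:

# A more forgiving stopping rule: wait 5 epochs and roll back to the best checkpoint
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)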
Lastly, we evaluate the model and report the accuracy and confusion matrix:
scores = model.evaluate(X_train, y_train, verbose=1, batch_size=200)
print('Accuracy: {}'.format(scores[1]))

# predict_classes was removed from recent Keras releases; threshold the sigmoid output instead
y_pred = (model.predict(X_test) > 0.5).astype('int32').flatten()
print('Accuracy of model on test data: ', accuracy_score(y_test, y_pred))
print('Confusion Matrix: \n', confusion_matrix(y_test, y_pred))
Next, let's build a Logistic Regression model. We need a few more scikit-learn imports, then we drop the 'Faulty' column and assign the remaining features to X:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn import metrics

# df is the full, unsequenced dataset (assumed loaded the same way as df_train and df_test)
X = df.drop('Faulty', axis=1)
X.Meter_No = pd.to_numeric(X.Meter_No)
We then assign the 'Faulty' column to the target variable y:
y = df['Faulty']
Let's set up the train/test split for the model:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)
We create the logistic regression model object:
log_reg = LogisticRegression()
We fit the model on the training set:
log_reg.fit(X_train,y_train)
We predict the results on the test set:
y_pred = log_reg.predict(X_test)
We then calculate the F1 score between our predictions and the actual test labels, and build the confusion matrix:
f1score = f1_score(y_test, y_pred, average='weighted')
print("Logistic regression f1_score is : %f" % f1score)

cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix
Let's build a Random Forest classifier object. We set class_weight to "balanced" to compensate for the imbalance between faulty and healthy samples:
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(class_weight="balanced")
We fit the model on X_train and y_train:
rfc.fit(X_train, y_train)
We then predict on the test set X_test:
rfc_predict = rfc.predict(X_test)
rfc_predict
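The walkthrough stops at the raw predictions here; to put the forest on the same footing as the other models, you can score it with the metrics already imported for the LSTM:

print('Random Forest accuracy: ', accuracy_score(y_test, rfc_predict))
print('Confusion Matrix: \n', confusion_matrix(y_test, rfc_predict))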
The last machine learning method we will use is Decision Trees. We start with a helper that splits the dataset:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

def splitdataset(balance_data):
    # The first 19 columns are features; the last column is the 'Faulty' target
    X = balance_data.iloc[:, 0:19]
    Y = balance_data.iloc[:, -1]
We split the dataset into train and test sets:
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)
    return X, Y, X_train, X_test, y_train, y_test
Next, we make a function to perform training with the Gini index:
def train_using_gini(X_train, X_test, y_train):
Inside it, we create the classifier object:
    clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                      max_depth=3, min_samples_leaf=5)
Then we train it and return the fitted model:
    clf_gini.fit(X_train, y_train)
    return clf_gini
Next, we define the same training step using the entropy criterion:
def train_using_entropy(X_train, X_test, y_train):
We create a decision tree classifier with entropy:
    clf_entropy = DecisionTreeClassifier(
        criterion="entropy", random_state=100,
        max_depth=3, min_samples_leaf=5)
Then we fit and return it:
    clf_entropy.fit(X_train, y_train)
    return clf_entropy
We then write a function to make predictions:
def prediction(X_test, clf_object):
It calls predict on whichever classifier object is passed in:
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred
Let’s add a function to calculate accuracy and print out the results
def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ",
          confusion_matrix(y_test, y_pred))
    print("Accuracy : ",
          accuracy_score(y_test, y_pred) * 100)
    print("Report : ",
          classification_report(y_test, y_pred))
Home stretch: we wire everything together in main() and predict using the Gini-based tree.
def main():
    # importdata() is assumed to load the dataset; its definition is not shown in this article
    df = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(df)
    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)
    print("Results Using Gini Index:")
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)

if __name__ == "__main__":
    main()
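One advantage of decision trees over the LSTM is that their rules are human-readable. As an optional follow-up (not part of the original walkthrough), scikit-learn can print the learned splits as text, reusing the train/test split from the logistic regression section:

from sklearn.tree import export_text

# Retrain on the earlier split and dump the tree's decision rules
clf = train_using_gini(X_train, X_test, y_train)
print(export_text(clf, feature_names=list(X.columns)))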
Conclusion
Using four different machine learning methods, LSTM, Random Forests, Decision Trees, and Logistic Regression, we built out a predictive maintenance system. Our findings show that framing the problem as time series data helps improve the accuracy of these methods.