FASHION MNIST with Python (DAY 10)¶
DATA SOURCE : https://www.kaggle.com/zalando-research/fashionmnist (Kaggle, Fashion MNIST)
FASHION MNIST with Python (DAY 1) : http://deepstat.tistory.com/35
FASHION MNIST with Python (DAY 2) : http://deepstat.tistory.com/36
FASHION MNIST with Python (DAY 3) : http://deepstat.tistory.com/37
FASHION MNIST with Python (DAY 4) : http://deepstat.tistory.com/38
FASHION MNIST with Python (DAY 5) : http://deepstat.tistory.com/39
FASHION MNIST with Python (DAY 6) : http://deepstat.tistory.com/40
FASHION MNIST with Python (DAY 7) : http://deepstat.tistory.com/41
FASHION MNIST with Python (DAY 8) : http://deepstat.tistory.com/42
FASHION MNIST with Python (DAY 9) : http://deepstat.tistory.com/43
Datasets¶
Importing numpy, pandas, pyplot¶
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Loading datasets¶
In [2]:
data_train = pd.read_csv("../datasets/fashion-mnist_train.csv")
data_test = pd.read_csv("../datasets/fashion-mnist_test.csv")
In [3]:
data_train_y = data_train.label
y_test = data_test.label
In [4]:
data_train_x = data_train.drop("label",axis=1)/256
x_test = data_test.drop("label",axis=1)/256
Splitting validation and training sets¶
In [5]:
np.random.seed(0)
valid2_idx = np.random.choice(60000,10000,replace = False)
valid1_idx = np.random.choice(list(set(range(60000)) - set(valid2_idx)),10000,replace=False)
train_idx = list(set(range(60000))-set(valid1_idx)-set(valid2_idx))
x_train = data_train_x.iloc[train_idx,:]
y_train = data_train_y.iloc[train_idx]
x_valid1 = data_train_x.iloc[valid1_idx,:]
y_valid1 = data_train_y.iloc[valid1_idx]
x_valid2 = data_train_x.iloc[valid2_idx,:]
y_valid2 = data_train_y.iloc[valid2_idx]
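The split above draws two validation sets of 10,000 rows each and leaves the remaining 40,000 rows for training. As a minimal sketch of the same idea (not the code used in this post), a single permutation can be sliced into three disjoint index arrays. The helper name `three_way_split` is hypothetical, and for a given seed it will not reproduce the indices drawn by `np.random.choice` above.

```python
import numpy as np

def three_way_split(n_rows, n_valid, seed=0):
    """Return disjoint (train, valid1, valid2) index arrays from one shuffle."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_rows)          # every row index exactly once
    valid2 = perm[:n_valid]
    valid1 = perm[n_valid:2 * n_valid]
    train = perm[2 * n_valid:]              # whatever is left over
    return train, valid1, valid2

train_idx_demo, valid1_idx_demo, valid2_idx_demo = three_way_split(60000, 10000)

# The three index sets should cover all rows without overlapping.
all_idx_demo = np.concatenate([train_idx_demo, valid1_idx_demo, valid2_idx_demo])
print(len(train_idx_demo), len(valid1_idx_demo), len(valid2_idx_demo))
```

Because the slices come from one permutation, disjointness holds by construction and no set arithmetic is needed.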
Multinomial Logistic Regression¶
Importing LogisticRegression¶
In [6]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
Fitting Logistic Regression¶
In [7]:
LR_model = LogisticRegression().fit(x_train, y_train)
In [8]:
LR_model_pred_valid1 = LR_model.predict(x_valid1)
LR_model_pred_valid2 = LR_model.predict(x_valid2)
Validation Accuracy¶
In [9]:
LR_model_valid1_acc = (LR_model_pred_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",LR_model_valid1_acc)
LR_model_valid2_acc = (LR_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",LR_model_valid2_acc)
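`confusion_matrix` is imported above but never called. A small sketch of how it could be used alongside the accuracy figure, on toy labels that are made up for illustration (they are not Fashion MNIST predictions):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels, invented purely for illustration.
y_true_demo = np.array([0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0])

# Rows are true classes, columns are predicted classes.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1, 2])

# Accuracy is the diagonal (correct predictions) over the total count.
acc_demo = np.trace(cm_demo) / cm_demo.sum()
print(cm_demo)
print("ACCURACY =", acc_demo)
```

On the real data the same call would take `y_valid1` and `LR_model_pred_valid1`, which shows which clothing classes get confused with each other rather than only the overall hit rate.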
Decision Tree¶
Importing DecisionTreeClassifier¶
In [10]:
from sklearn.tree import DecisionTreeClassifier
Smaller Tree (to avoid overfitting)¶
In [11]:
TR_model2 = DecisionTreeClassifier(min_samples_leaf = 5, max_depth = 12).fit(x_train, y_train)
In [12]:
TR_model2_pred_valid1 = TR_model2.predict(x_valid1)
TR_model2_pred_valid2 = TR_model2.predict(x_valid2)
Validation Accuracy¶
In [13]:
TR_model2_valid1_acc = (TR_model2_pred_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",TR_model2_valid1_acc)
TR_model2_valid2_acc = (TR_model2_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",TR_model2_valid2_acc)
Bagging¶
Importing BaggingClassifier¶
In [14]:
from sklearn.ensemble import BaggingClassifier
Fitting Bagging¶
In [15]:
BG_model = BaggingClassifier(n_estimators=2000,n_jobs=-1).fit(x_train, y_train)
In [16]:
BG_model_pred_valid1 = BG_model.predict(x_valid1)
BG_model_pred_valid2 = BG_model.predict(x_valid2)
Validation Accuracy¶
In [17]:
BG_model_valid1_acc = (BG_model_pred_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",BG_model_valid1_acc)
BG_model_valid2_acc = (BG_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",BG_model_valid2_acc)
Random Forest¶
Importing RandomForestClassifier¶
In [18]:
from sklearn.ensemble import RandomForestClassifier
Fitting Random Forest¶
In [19]:
RF_model = RandomForestClassifier(n_estimators=2000,n_jobs=-1).fit(x_train, y_train)
In [20]:
RF_model_predict_valid1 = RF_model.predict(x_valid1)
RF_model_predict_valid2 = RF_model.predict(x_valid2)
Validation Accuracy¶
In [21]:
RF_model_valid1_acc = (RF_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",RF_model_valid1_acc)
RF_model_valid2_acc = (RF_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",RF_model_valid2_acc)
Gradient Boosting¶
Importing GradientBoostingClassifier¶
In [22]:
from sklearn.ensemble import GradientBoostingClassifier
Fitting Gradient Boosting¶
In [23]:
GBST_model = GradientBoostingClassifier(n_estimators=2000,learning_rate=0.5).fit(x_train, y_train)
In [24]:
GBST_model_predict_valid1 = GBST_model.predict(x_valid1)
GBST_model_predict_valid2 = GBST_model.predict(x_valid2)
Validation Accuracy¶
In [25]:
GBST_model_valid1_acc = (GBST_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",GBST_model_valid1_acc)
GBST_model_valid2_acc = (GBST_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",GBST_model_valid2_acc)
AdaBoost¶
Importing AdaBoostClassifier¶
In [26]:
from sklearn.ensemble import AdaBoostClassifier
Fitting AdaBoost¶
In [27]:
ABST_model = AdaBoostClassifier(n_estimators=2000,learning_rate=0.5).fit(x_train, y_train)
In [28]:
ABST_model_predict_valid1 = ABST_model.predict(x_valid1)
ABST_model_predict_valid2 = ABST_model.predict(x_valid2)
Validation Accuracy¶
In [29]:
ABST_model_valid1_acc = (ABST_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",ABST_model_valid1_acc)
ABST_model_valid2_acc = (ABST_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",ABST_model_valid2_acc)
Support Vector Machine¶
Importing SVC¶
In [30]:
from sklearn.svm import SVC
Fitting SVC with cost 100¶
In [31]:
SVM_model = SVC(C=100).fit(x_train, y_train)
In [32]:
SVM_model_predict_valid1 = SVM_model.predict(x_valid1)
SVM_model_predict_valid2 = SVM_model.predict(x_valid2)
Validation Accuracy¶
In [33]:
SVM_model_valid1_acc = (SVM_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",SVM_model_valid1_acc)
SVM_model_valid2_acc = (SVM_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",SVM_model_valid2_acc)
K-Nearest Neighbors¶
Importing KNeighborsClassifier¶
In [34]:
from sklearn.neighbors import KNeighborsClassifier
Fitting KNN with k=8¶
In [35]:
KNN_model = KNeighborsClassifier(n_neighbors=8).fit(x_train, y_train)
In [36]:
KNN_model_predict_valid1 = KNN_model.predict(x_valid1)
KNN_model_predict_valid2 = KNN_model.predict(x_valid2)
Validation Accuracy¶
In [37]:
KNN_model_valid1_acc = (KNN_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",KNN_model_valid1_acc)
KNN_model_valid2_acc = (KNN_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",KNN_model_valid2_acc)
Linear Discriminant Analysis (LDA)¶
Importing LinearDiscriminantAnalysis¶
In [38]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
Fitting LinearDiscriminantAnalysis¶
In [39]:
LDA_model = LinearDiscriminantAnalysis().fit(x_train, y_train)
In [40]:
LDA_model_predict_valid1 = LDA_model.predict(x_valid1)
LDA_model_predict_valid2 = LDA_model.predict(x_valid2)
Validation Accuracy¶
In [41]:
LDA_model_valid1_acc = (LDA_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",LDA_model_valid1_acc)
LDA_model_valid2_acc = (LDA_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",LDA_model_valid2_acc)
Multilayer Perceptron (MLP)¶
Importing TensorFlow¶
In [42]:
import tensorflow as tf
Restoring MLP structure¶
In [43]:
def weight_variables(shape):
    initial = tf.truncated_normal(shape)
    return tf.Variable(initial)
def bias_variables(shape):
    initial = tf.truncated_normal(shape)
    return tf.Variable(initial)
In [44]:
x = tf.placeholder("float", [None,784])
y = tf.placeholder("int64", [None,])
y_dummies = tf.one_hot(y,depth = 10)
drop_prob = tf.placeholder("float")
training = tf.placeholder("bool")
l1_w = weight_variables([784,640])
l1_b = bias_variables([640])
l1_inner_product = tf.matmul(x, l1_w) + l1_b
l1_leaky_relu = tf.nn.leaky_relu(l1_inner_product)
l1_dropout = tf.layers.dropout(l1_leaky_relu,rate = drop_prob, training = training)
l2_w = weight_variables([640,640])
l2_b = bias_variables([640])
l2_batch_normalization = tf.layers.batch_normalization(l1_dropout, training = training)
l2_inner_product = tf.matmul(l2_batch_normalization, l2_w) + l2_b
l2_reshape = tf.reshape(l2_inner_product,[-1,80,8])
l2_maxout = tf.reshape(
    tf.contrib.layers.maxout(l2_reshape,num_units=1),
    [-1,80])
l2_dropout = tf.layers.dropout(l2_maxout,rate = drop_prob, training = training)
l3_w = weight_variables([80,80])
l3_b = bias_variables([80])
l3_inner_product = tf.matmul(l2_dropout, l3_w) + l3_b
l3_leaky_relu = tf.nn.leaky_relu(l3_inner_product)
l3_dropout = tf.layers.dropout(l3_leaky_relu,rate = drop_prob, training = training)
l4_w = weight_variables([80,80])
l4_b = bias_variables([80])
l4_batch_normalization = tf.layers.batch_normalization(l3_dropout, training = training)
l4_inner_product = tf.matmul(l4_batch_normalization, l4_w) + l4_b
l4_reshape = tf.reshape(l4_inner_product,[-1,10,8])
l4_maxout = tf.reshape(
    tf.contrib.layers.maxout(l4_reshape,num_units=1),
    [-1,10])
l4_log_softmax = tf.nn.log_softmax(l4_maxout)
xent_loss = -tf.reduce_sum( tf.multiply(y_dummies,l4_log_softmax) )
pred_labels = tf.argmax(l4_log_softmax,axis=1)
acc = tf.reduce_mean(tf.cast(tf.equal(y, pred_labels),"float"))
saver = tf.train.Saver()
sess = tf.Session()
saver.restore(sess, "./MLP/model.ckpt")
print("MLP restored.")
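Layers 2 and 4 of the graph above reshape their output into groups of 8 units and keep the maximum of each group, which takes 640 units down to 80 and 80 units down to 10. A NumPy sketch of that grouping, assuming the maxout call reduces over the last axis of the reshaped tensor (the helper `maxout_groups` is hypothetical and is not part of TensorFlow):

```python
import numpy as np

def maxout_groups(z, group_size):
    """Max over consecutive groups of `group_size` units in each row."""
    batch, width = z.shape
    grouped = z.reshape(batch, width // group_size, group_size)
    return grouped.max(axis=2)

# One row of 16 "units" with values 0..15, split into two groups of 8.
z_demo = np.arange(16, dtype=float).reshape(1, 16)
out_demo = maxout_groups(z_demo, 8)
print(out_demo)        # the largest value of each group of 8
print(out_demo.shape)  # (1, 2)
```

Units 0 to 7 form the first group and units 8 to 15 the second, which mirrors how `tf.reshape(..., [-1,80,8])` groups consecutive columns.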
In [45]:
feed_dict = {x : x_valid1, y : y_valid1, drop_prob : .15, training : False}
MLP_predict_valid1, MLP_valid1_acc = sess.run([pred_labels,acc], feed_dict = feed_dict)
feed_dict = {x : x_valid2, y : y_valid2, drop_prob : .15, training : False}
MLP_predict_valid2, MLP_valid2_acc = sess.run([pred_labels,acc], feed_dict = feed_dict)
Validation Accuracy¶
In [46]:
print("VALIDATION ACCURACY =",MLP_valid1_acc)
print("VALIDATION ACCURACY =",MLP_valid2_acc)
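For reference, the `xent_loss` and `pred_labels` tensors in the MLP graph amount to a summed cross-entropy over one-hot labels and an argmax over log-softmax scores. A NumPy sketch of that arithmetic on made-up logits (the function names here are illustrative and the inputs are not model outputs):

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def summed_xent(logits, labels, n_classes=10):
    """Cross-entropy summed over the batch, as -sum(one_hot * log_softmax)."""
    one_hot = np.eye(n_classes)[labels]
    return -(one_hot * log_softmax(logits)).sum()

# Made-up logits: row 0 strongly favours class 3, row 1 is uniform.
logits_demo = np.zeros((2, 10))
logits_demo[0, 3] = 5.0
labels_demo = np.array([3, 7])

loss_demo = summed_xent(logits_demo, labels_demo)
pred_demo = log_softmax(logits_demo).argmax(axis=1)
print(loss_demo, pred_demo)
```

Since the loss is summed rather than averaged, its scale grows with the batch size.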
STACKING¶
In [47]:
x_stck_valid1 = pd.concat([
    pd.get_dummies(LR_model_pred_valid1),
    pd.get_dummies(TR_model2_pred_valid1),
    pd.get_dummies(BG_model_pred_valid1),
    pd.get_dummies(RF_model_predict_valid1),
    pd.get_dummies(GBST_model_predict_valid1),
    pd.get_dummies(ABST_model_predict_valid1),
    pd.get_dummies(SVM_model_predict_valid1),
    pd.get_dummies(KNN_model_predict_valid1),
    pd.get_dummies(LDA_model_predict_valid1),
    pd.get_dummies(MLP_predict_valid1)
],axis=1)
In [48]:
x_stck_valid2 = pd.concat([
    pd.get_dummies(LR_model_pred_valid2),
    pd.get_dummies(TR_model2_pred_valid2),
    pd.get_dummies(BG_model_pred_valid2),
    pd.get_dummies(RF_model_predict_valid2),
    pd.get_dummies(GBST_model_predict_valid2),
    pd.get_dummies(ABST_model_predict_valid2),
    pd.get_dummies(SVM_model_predict_valid2),
    pd.get_dummies(KNN_model_predict_valid2),
    pd.get_dummies(LDA_model_predict_valid2),
    pd.get_dummies(MLP_predict_valid2)
],axis=1)
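One caveat with the cells above: `pd.get_dummies` only creates columns for labels that actually occur, so if a base model never predicts some class on one split, the stacked matrices for different splits could end up with different column counts. A hedged sketch of one way to guard against that, by declaring all ten classes up front with `pd.Categorical`. The helper `one_hot_fixed` and the toy predictions are hypothetical.

```python
import numpy as np
import pandas as pd

def one_hot_fixed(pred, model_name, n_classes=10):
    """One-hot encode predictions with a column for every class, seen or not."""
    cat = pd.Categorical(pred, categories=list(range(n_classes)))
    return pd.get_dummies(cat, prefix=model_name)

# Toy predictions from two imaginary base models (most classes never appear).
preds_demo = {
    "LR": np.array([0, 3, 3, 9]),
    "RF": np.array([1, 1, 2, 9]),
}
x_stck_demo = pd.concat(
    [one_hot_fixed(p, name) for name, p in preds_demo.items()], axis=1
)
print(x_stck_demo.shape)  # (4, 20): 10 columns per model regardless of the data
```

The model-name prefix also keeps column names unique, which the plain concatenation above does not.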
STACKING(Multinomial Logistic Regression)¶
Importing LogisticRegression¶
In [49]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
Fitting Logistic Regression¶
In [50]:
stck_LR_model = LogisticRegression().fit(x_stck_valid1, y_valid1)
In [51]:
stck_LR_model_pred_valid1 = stck_LR_model.predict(x_stck_valid1)
stck_LR_model_pred_valid2 = stck_LR_model.predict(x_stck_valid2)
Training Accuracy¶
In [52]:
stck_LR_model_valid1_acc = (stck_LR_model_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_LR_model_valid1_acc)
Validation Accuracy¶
In [53]:
stck_LR_model_valid2_acc = (stck_LR_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LR_model_valid2_acc)
STACKING(Random Forest)¶
Importing RandomForestClassifier¶
In [54]:
from sklearn.ensemble import RandomForestClassifier
Fitting Random Forest¶
In [55]:
stck_RF_model = RandomForestClassifier(n_estimators=2000,n_jobs=-1).fit(x_stck_valid1, y_valid1)
In [56]:
stck_RF_model_pred_valid1 = stck_RF_model.predict(x_stck_valid1)
stck_RF_model_pred_valid2 = stck_RF_model.predict(x_stck_valid2)
Training Accuracy¶
In [57]:
stck_RF_model_valid1_acc = (stck_RF_model_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_RF_model_valid1_acc)
Validation Accuracy¶
In [58]:
stck_RF_model_valid2_acc = (stck_RF_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_RF_model_valid2_acc)
STACKING(LDA)¶
Importing LinearDiscriminantAnalysis¶
In [59]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
Fitting LinearDiscriminantAnalysis¶
In [60]:
stck_LDA_model = LinearDiscriminantAnalysis().fit(x_stck_valid1, y_valid1)
In [61]:
stck_LDA_model_pred_valid1 = stck_LDA_model.predict(x_stck_valid1)
stck_LDA_model_pred_valid2 = stck_LDA_model.predict(x_stck_valid2)
Training Accuracy¶
In [62]:
stck_LDA_model_valid1_acc = (stck_LDA_model_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_LDA_model_valid1_acc)
Validation Accuracy¶
In [63]:
stck_LDA_model_valid2_acc = (stck_LDA_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LDA_model_valid2_acc)
Fitting LinearDiscriminantAnalysis with Shrinkage and Solver 'lsqr'¶
In [64]:
stck_LDA_model_2 = LinearDiscriminantAnalysis(solver='lsqr',shrinkage='auto').fit(x_stck_valid1, y_valid1)
In [65]:
stck_LDA_model_2_pred_valid1 = stck_LDA_model_2.predict(x_stck_valid1)
stck_LDA_model_2_pred_valid2 = stck_LDA_model_2.predict(x_stck_valid2)
Training Accuracy¶
In [66]:
stck_LDA_model_2_valid1_acc = (stck_LDA_model_2_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_LDA_model_2_valid1_acc)
Validation Accuracy¶
In [67]:
stck_LDA_model_2_valid2_acc = (stck_LDA_model_2_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LDA_model_2_valid2_acc)
Fitting LinearDiscriminantAnalysis with Shrinkage and Solver 'eigen'¶
In [68]:
stck_LDA_model_3 = LinearDiscriminantAnalysis(solver='eigen',shrinkage='auto').fit(x_stck_valid1, y_valid1)
In [69]:
stck_LDA_model_3_pred_valid1 = stck_LDA_model_3.predict(x_stck_valid1)
stck_LDA_model_3_pred_valid2 = stck_LDA_model_3.predict(x_stck_valid2)
Training Accuracy¶
In [70]:
stck_LDA_model_3_valid1_acc = (stck_LDA_model_3_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_LDA_model_3_valid1_acc)
Validation Accuracy¶
In [71]:
stck_LDA_model_3_valid2_acc = (stck_LDA_model_3_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LDA_model_3_valid2_acc)
BETTER MODEL¶
STACKING(Multinomial Logistic Regression)¶
Validation Accuracy¶
In [72]:
stck_LR_model_valid2_acc = (stck_LR_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LR_model_valid2_acc)
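The stacker is chosen here by its accuracy on the second validation set, which none of the stacked models were fitted on. A minimal sketch of making that comparison explicit, using toy labels and predictions invented for illustration (they are not results from this post); `pick_best` is a hypothetical helper.

```python
import numpy as np

def pick_best(preds_by_model, y_true):
    """Return the name with the highest held-out accuracy and all accuracies."""
    acc = {name: float((p == y_true).mean()) for name, p in preds_by_model.items()}
    best = max(acc, key=acc.get)
    return best, acc

# Toy held-out labels and predictions, invented for illustration only.
y_demo = np.array([0, 1, 2, 2])
preds_by_model_demo = {
    "stck_LR": np.array([0, 1, 2, 2]),
    "stck_RF": np.array([0, 1, 2, 0]),
    "stck_LDA": np.array([1, 1, 2, 0]),
}
best_name, acc_by_model = pick_best(preds_by_model_demo, y_demo)
print(best_name, acc_by_model)
```

With the real objects, the dictionary would hold the `stck_*_pred_valid2` arrays and `y_true` would be `y_valid2`. The test set stays untouched until the choice is made.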
In [73]:
LR_model_pred_test = LR_model.predict(x_test)
TR_model2_pred_test = TR_model2.predict(x_test)
BG_model_pred_test = BG_model.predict(x_test)
RF_model_predict_test = RF_model.predict(x_test)
GBST_model_predict_test = GBST_model.predict(x_test)
ABST_model_predict_test = ABST_model.predict(x_test)
SVM_model_predict_test = SVM_model.predict(x_test)
KNN_model_predict_test = KNN_model.predict(x_test)
LDA_model_predict_test = LDA_model.predict(x_test)
feed_dict = {x : x_test, y : y_test, drop_prob : .15, training : False}
MLP_predict_test = sess.run(pred_labels, feed_dict = feed_dict)
In [74]:
x_stck_test = pd.concat([
    pd.get_dummies(LR_model_pred_test),
    pd.get_dummies(TR_model2_pred_test),
    pd.get_dummies(BG_model_pred_test),
    pd.get_dummies(RF_model_predict_test),
    pd.get_dummies(GBST_model_predict_test),
    pd.get_dummies(ABST_model_predict_test),
    pd.get_dummies(SVM_model_predict_test),
    pd.get_dummies(KNN_model_predict_test),
    pd.get_dummies(LDA_model_predict_test),
    pd.get_dummies(MLP_predict_test)
],axis=1)
stck_LR_model_pred_test = stck_LR_model.predict(x_stck_test)
TEST ACCURACY (FINAL)¶
In [75]:
stck_LR_model_test_acc = (stck_LR_model_pred_test == y_test).mean()
print("TEST ACCURACY =",stck_LR_model_test_acc)