FASHION_MNIST_DAY10_with_Python

FASHION MNIST with Python (DAY 10)

DATA SOURCE : https://www.kaggle.com/zalando-research/fashionmnist (Kaggle, Fashion MNIST)

FASHION MNIST with Python (DAY 1) : http://deepstat.tistory.com/35

FASHION MNIST with Python (DAY 2) : http://deepstat.tistory.com/36

FASHION MNIST with Python (DAY 3) : http://deepstat.tistory.com/37

FASHION MNIST with Python (DAY 4) : http://deepstat.tistory.com/38

FASHION MNIST with Python (DAY 5) : http://deepstat.tistory.com/39

FASHION MNIST with Python (DAY 6) : http://deepstat.tistory.com/40

FASHION MNIST with Python (DAY 7) : http://deepstat.tistory.com/41

FASHION MNIST with Python (DAY 8) : http://deepstat.tistory.com/42

FASHION MNIST with Python (DAY 9) : http://deepstat.tistory.com/43

Datasets

Importing numpy, pandas, pyplot

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Loading datasets

In [2]:
data_train = pd.read_csv("../datasets/fashion-mnist_train.csv")
data_test = pd.read_csv("../datasets/fashion-mnist_test.csv")
In [3]:
data_train_y = data_train.label
y_test = data_test.label
In [4]:
data_train_x = data_train.drop("label",axis=1)/256
x_test = data_test.drop("label",axis=1)/256

Splitting validation and training sets

In [5]:
np.random.seed(0)
valid2_idx = np.random.choice(60000,10000,replace = False)
valid1_idx = np.random.choice(list(set(range(60000)) - set(valid2_idx)),10000,replace=False)
train_idx = list(set(range(60000))-set(valid1_idx)-set(valid2_idx))

x_train = data_train_x.iloc[train_idx,:]
y_train = data_train_y.iloc[train_idx]

x_valid1 = data_train_x.iloc[valid1_idx,:]
y_valid1 = data_train_y.iloc[valid1_idx]

x_valid2 = data_train_x.iloc[valid2_idx,:]
y_valid2 = data_train_y.iloc[valid2_idx]
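
The three index sets above are meant to be disjoint and to cover all 60,000 training rows together. A minimal sketch of the same idea, using a single shuffled permutation, is shown below. The function name `three_way_split` and its parameters are hypothetical, and it will not reproduce the exact indices drawn in the cell above.

```python
import numpy as np

def three_way_split(n_total, n_valid1, n_valid2, seed=0):
    """Split range(n_total) into disjoint train / valid1 / valid2 index arrays."""
    if n_valid1 + n_valid2 >= n_total:
        raise ValueError("validation sets leave no rows for training")
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_total)          # every index exactly once, shuffled
    valid2 = perm[:n_valid2]
    valid1 = perm[n_valid2:n_valid2 + n_valid1]
    train = np.sort(perm[n_valid2 + n_valid1:])
    return train, valid1, valid2

train_idx_demo, valid1_idx_demo, valid2_idx_demo = three_way_split(60000, 10000, 10000)
print(len(train_idx_demo), len(valid1_idx_demo), len(valid2_idx_demo))
```

Because all three pieces are slices of one permutation, no row can land in two sets, so no set arithmetic is needed to keep them apart.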

Multinomial Logistic Regression

Importing LogisticRegression

In [6]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

Fitting Logistic Regression

In [7]:
LR_model = LogisticRegression().fit(x_train, y_train)
In [8]:
LR_model_pred_valid1 = LR_model.predict(x_valid1)
LR_model_pred_valid2 = LR_model.predict(x_valid2)

Validation Accuracy

In [9]:
LR_model_valid1_acc = (LR_model_pred_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",LR_model_valid1_acc)

LR_model_valid2_acc = (LR_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",LR_model_valid2_acc)
VALIDATION ACCURACY = 0.8501
VALIDATION ACCURACY = 0.8487
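
The same accuracy expression is repeated for every model below. A small helper could factor it out; `accuracy` is a hypothetical name, not something this notebook defines. Converting both inputs to NumPy arrays makes the comparison purely positional, which is what is wanted here because the feature rows and the labels of each split were selected with the same index list.

```python
import numpy as np

def accuracy(pred, truth):
    """Fraction of positions where the predicted label equals the true label."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")
    return float((pred == truth).mean())

# Toy check: three of four labels match.
print("VALIDATION ACCURACY =", accuracy([0, 1, 2, 3], [0, 1, 2, 9]))
```

With the notebook's variables this would be called as `accuracy(LR_model_pred_valid1, y_valid1)`.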

Decision Tree

Importing DecisionTreeClassifier

In [10]:
from sklearn.tree import DecisionTreeClassifier

Smaller Tree (to avoid overfitting)

In [11]:
TR_model2 = DecisionTreeClassifier(min_samples_leaf = 5, max_depth = 12).fit(x_train, y_train)
In [12]:
TR_model2_pred_valid1 = TR_model2.predict(x_valid1)
TR_model2_pred_valid2 = TR_model2.predict(x_valid2)

Validation Accuracy

In [13]:
TR_model2_valid1_acc = (TR_model2_pred_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",TR_model2_valid1_acc)

TR_model2_valid2_acc = (TR_model2_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",TR_model2_valid2_acc)
VALIDATION ACCURACY = 0.8172
VALIDATION ACCURACY = 0.8163

Bagging

Importing BaggingClassifier

In [14]:
from sklearn.ensemble import BaggingClassifier

Fitting Bagging

In [15]:
BG_model = BaggingClassifier(n_estimators=2000,n_jobs=-1).fit(x_train, y_train)
In [16]:
BG_model_pred_valid1 = BG_model.predict(x_valid1)
BG_model_pred_valid2 = BG_model.predict(x_valid2)

Validation Accuracy

In [17]:
BG_model_valid1_acc = (BG_model_pred_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",BG_model_valid1_acc)

BG_model_valid2_acc = (BG_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",BG_model_valid2_acc)
VALIDATION ACCURACY = 0.8779
VALIDATION ACCURACY = 0.8777

Random Forest

Importing RandomForestClassifier

In [18]:
from sklearn.ensemble import RandomForestClassifier

Fitting Random Forest

In [19]:
RF_model = RandomForestClassifier(n_estimators=2000,n_jobs=-1).fit(x_train, y_train)
In [20]:
RF_model_predict_valid1 = RF_model.predict(x_valid1)
RF_model_predict_valid2 = RF_model.predict(x_valid2)

Validation Accuracy

In [21]:
RF_model_valid1_acc = (RF_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",RF_model_valid1_acc)

RF_model_valid2_acc = (RF_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",RF_model_valid2_acc)
VALIDATION ACCURACY = 0.883
VALIDATION ACCURACY = 0.8806

Gradient Boosting

Importing GradientBoostingClassifier

In [22]:
from sklearn.ensemble import GradientBoostingClassifier

Fitting Gradient Boosting

In [23]:
GBST_model = GradientBoostingClassifier(n_estimators=2000,learning_rate=0.5).fit(x_train, y_train)
In [24]:
GBST_model_predict_valid1 = GBST_model.predict(x_valid1)
GBST_model_predict_valid2 = GBST_model.predict(x_valid2)

Validation Accuracy

In [25]:
GBST_model_valid1_acc = (GBST_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",GBST_model_valid1_acc)

GBST_model_valid2_acc = (GBST_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",GBST_model_valid2_acc)
VALIDATION ACCURACY = 0.8667
VALIDATION ACCURACY = 0.8676

AdaBoost

Importing AdaBoostClassifier

In [26]:
from sklearn.ensemble import AdaBoostClassifier

Fitting AdaBoost

In [27]:
ABST_model = AdaBoostClassifier(n_estimators=2000,learning_rate=0.5).fit(x_train, y_train)
In [28]:
ABST_model_predict_valid1 = ABST_model.predict(x_valid1)
ABST_model_predict_valid2 = ABST_model.predict(x_valid2)

Validation Accuracy

In [29]:
ABST_model_valid1_acc = (ABST_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",ABST_model_valid1_acc)

ABST_model_valid2_acc = (ABST_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",ABST_model_valid2_acc)
VALIDATION ACCURACY = 0.5567
VALIDATION ACCURACY = 0.5525

Support Vector Machine

Importing SVC

In [30]:
from sklearn.svm import SVC

Fitting SVC with cost 100

In [31]:
SVM_model = SVC(C=100).fit(x_train, y_train)
In [32]:
SVM_model_predict_valid1 = SVM_model.predict(x_valid1)
SVM_model_predict_valid2 = SVM_model.predict(x_valid2)

Validation Accuracy

In [33]:
SVM_model_valid1_acc = (SVM_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",SVM_model_valid1_acc)

SVM_model_valid2_acc = (SVM_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",SVM_model_valid2_acc)
VALIDATION ACCURACY = 0.884
VALIDATION ACCURACY = 0.8821

K-Nearest Neighbors

Importing KNeighborsClassifier

In [34]:
from sklearn.neighbors import KNeighborsClassifier

Fitting KNN with k=8

In [35]:
KNN_model = KNeighborsClassifier(n_neighbors=8).fit(x_train, y_train)
In [36]:
KNN_model_predict_valid1 = KNN_model.predict(x_valid1)
KNN_model_predict_valid2 = KNN_model.predict(x_valid2)

Validation Accuracy

In [37]:
KNN_model_valid1_acc = (KNN_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",KNN_model_valid1_acc)

KNN_model_valid2_acc = (KNN_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",KNN_model_valid2_acc)
VALIDATION ACCURACY = 0.8519
VALIDATION ACCURACY = 0.851

Linear Discriminant Analysis (LDA)

Importing LinearDiscriminantAnalysis

In [38]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

Fitting LinearDiscriminantAnalysis

In [39]:
LDA_model = LinearDiscriminantAnalysis().fit(x_train, y_train)
/usr/local/lib/python3.6/dist-packages/sklearn/discriminant_analysis.py:442: UserWarning: The priors do not sum to 1. Renormalizing
  UserWarning)
In [40]:
LDA_model_predict_valid1 = LDA_model.predict(x_valid1)
LDA_model_predict_valid2 = LDA_model.predict(x_valid2)

Validation Accuracy

In [41]:
LDA_model_valid1_acc = (LDA_model_predict_valid1 == y_valid1).mean()
print("VALIDATION ACCURACY =",LDA_model_valid1_acc)

LDA_model_valid2_acc = (LDA_model_predict_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",LDA_model_valid2_acc)
VALIDATION ACCURACY = 0.8188
VALIDATION ACCURACY = 0.8207

Multilayer Perceptron (MLP)

Importing TensorFlow

In [42]:
import tensorflow as tf

Restoring MLP structure

In [43]:
def weight_variables(shape):
    initial = tf.truncated_normal(shape)
    return tf.Variable(initial)

def bias_variables(shape):
    initial = tf.truncated_normal(shape)
    return tf.Variable(initial)    
In [44]:
x = tf.placeholder("float", [None,784])
y = tf.placeholder("int64", [None,])
y_dummies = tf.one_hot(y,depth = 10)

drop_prob = tf.placeholder("float")
training = tf.placeholder("bool")

l1_w = weight_variables([784,640])
l1_b = bias_variables([640])
l1_inner_product = tf.matmul(x, l1_w) + l1_b
l1_leaky_relu = tf.nn.leaky_relu(l1_inner_product)
l1_dropout = tf.layers.dropout(l1_leaky_relu,rate = drop_prob, training = training)

l2_w = weight_variables([640,640])
l2_b = bias_variables([640])
l2_batch_normalization = tf.layers.batch_normalization(l1_dropout, training = training)
l2_inner_product = tf.matmul(l2_batch_normalization, l2_w) + l2_b
l2_reshape = tf.reshape(l2_inner_product,[-1,80,8])
l2_maxout = tf.reshape(
    tf.contrib.layers.maxout(l2_reshape,num_units=1),
    [-1,80])
l2_dropout = tf.layers.dropout(l2_maxout,rate = drop_prob, training = training)

l3_w = weight_variables([80,80])
l3_b = bias_variables([80])
l3_inner_product = tf.matmul(l2_dropout, l3_w) + l3_b
l3_leaky_relu = tf.nn.leaky_relu(l3_inner_product)
l3_dropout = tf.layers.dropout(l3_leaky_relu,rate = drop_prob, training = training)

l4_w = weight_variables([80,80])
l4_b = bias_variables([80])
l4_batch_normalization = tf.layers.batch_normalization(l3_dropout, training = training)
l4_inner_product = tf.matmul(l4_batch_normalization, l4_w) + l4_b
l4_reshape = tf.reshape(l4_inner_product,[-1,10,8])
l4_maxout = tf.reshape(
    tf.contrib.layers.maxout(l4_reshape,num_units=1),
    [-1,10])
l4_log_softmax = tf.nn.log_softmax(l4_maxout)

xent_loss = -tf.reduce_sum( tf.multiply(y_dummies,l4_log_softmax) )

pred_labels = tf.argmax(l4_log_softmax,axis=1)
acc = tf.reduce_mean(tf.cast(tf.equal(y, pred_labels),"float"))

saver = tf.train.Saver()
sess = tf.Session()
saver.restore(sess, "./MLP/model.ckpt")
print("MLP restored.")
INFO:tensorflow:Restoring parameters from ./MLP/model.ckpt
MLP restored.
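
Layers 2 and 4 of the network above use maxout: the 640 (or 80) pre-activations are reshaped into groups of 8, and only the largest value in each group is kept, leaving 80 (or 10) outputs. The NumPy sketch below illustrates that reshape-and-max step on a toy array. It is not the TensorFlow implementation, and the function name `maxout` and its arguments are chosen here for illustration.

```python
import numpy as np

def maxout(z, num_groups, group_size):
    """Split each row into num_groups consecutive groups and keep each group's max."""
    batch = z.shape[0]
    if z.shape[1] != num_groups * group_size:
        raise ValueError("row width must equal num_groups * group_size")
    grouped = z.reshape(batch, num_groups, group_size)
    return grouped.max(axis=2)

# Toy input: 2 rows, 8 units, grouped as 2 groups of 4.
z = np.arange(16).reshape(2, 8)
print(maxout(z, num_groups=2, group_size=4))
```

For the shapes used in this post, the second layer corresponds to `num_groups=80, group_size=8` and the fourth to `num_groups=10, group_size=8`.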
In [45]:
feed_dict = {x : x_valid1, y : y_valid1, drop_prob : .15, training : False}
MLP_predict_valid1, MLP_valid1_acc = sess.run([pred_labels,acc], feed_dict = feed_dict)

feed_dict = {x : x_valid2, y : y_valid2, drop_prob : .15, training : False}
MLP_predict_valid2, MLP_valid2_acc = sess.run([pred_labels,acc], feed_dict = feed_dict)

Validation Accuracy

In [46]:
print("VALIDATION ACCURACY =",MLP_valid1_acc)
print("VALIDATION ACCURACY =",MLP_valid2_acc)
VALIDATION ACCURACY = 0.8823
VALIDATION ACCURACY = 0.8784

STACKING

In [47]:
x_stck_valid1 = pd.concat([
    pd.get_dummies(LR_model_pred_valid1),
    pd.get_dummies(TR_model2_pred_valid1),
    pd.get_dummies(BG_model_pred_valid1),
    pd.get_dummies(RF_model_predict_valid1),
    pd.get_dummies(GBST_model_predict_valid1),
    pd.get_dummies(ABST_model_predict_valid1),
    pd.get_dummies(SVM_model_predict_valid1),
    pd.get_dummies(KNN_model_predict_valid1),
    pd.get_dummies(LDA_model_predict_valid1),
    pd.get_dummies(MLP_predict_valid1)
],axis=1)
In [48]:
x_stck_valid2 = pd.concat([
    pd.get_dummies(LR_model_pred_valid2),
    pd.get_dummies(TR_model2_pred_valid2),
    pd.get_dummies(BG_model_pred_valid2),
    pd.get_dummies(RF_model_predict_valid2),
    pd.get_dummies(GBST_model_predict_valid2),
    pd.get_dummies(ABST_model_predict_valid2),
    pd.get_dummies(SVM_model_predict_valid2),
    pd.get_dummies(KNN_model_predict_valid2),
    pd.get_dummies(LDA_model_predict_valid2),
    pd.get_dummies(MLP_predict_valid2)
],axis=1)
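
`pd.get_dummies` creates a column only for labels that actually occur in its input. If a base model never predicts one of the ten classes on a given split, the stacked matrices for `valid1`, `valid2`, and the test set could end up with different column counts. One possible guard, sketched below, is to encode against a fixed category list. `one_hot_fixed` and `N_CLASSES` are hypothetical names, not part of this notebook.

```python
import numpy as np
import pandas as pd

N_CLASSES = 10  # Fashion MNIST labels are 0..9

def one_hot_fixed(pred, prefix, n_classes=N_CLASSES):
    """One-hot encode predicted labels with one column per class, seen or not."""
    cat = pd.Categorical(pred, categories=list(range(n_classes)))
    return pd.get_dummies(cat, prefix=prefix)

# Toy predictions that never mention most classes.
pred_a = np.array([0, 2, 2, 9])
pred_b = np.array([1, 1, 1, 1])

plain = pd.get_dummies(pred_a)            # only 3 columns: 0, 2, 9
fixed = pd.concat(
    [one_hot_fixed(pred_a, "A"), one_hot_fixed(pred_b, "B")],
    axis=1,
)
print(plain.shape, fixed.shape)
```

With this encoding every split yields the same 10 columns per base model, in the same order, regardless of which labels a model happens to predict.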

STACKING(Multinomial Logistic Regression)

Importing LogisticRegression

In [49]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

Fitting Logistic Regression

In [50]:
stck_LR_model = LogisticRegression().fit(x_stck_valid1, y_valid1)
In [51]:
stck_LR_model_pred_valid1 = stck_LR_model.predict(x_stck_valid1)
stck_LR_model_pred_valid2 = stck_LR_model.predict(x_stck_valid2)

Training Accuracy

In [52]:
stck_LR_model_valid1_acc = (stck_LR_model_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_LR_model_valid1_acc)
TRAINING ACCURACY = 0.9081

Validation Accuracy

In [53]:
stck_LR_model_valid2_acc = (stck_LR_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LR_model_valid2_acc)
VALIDATION ACCURACY = 0.8993

STACKING(Random Forest)

Importing RandomForestClassifier

In [54]:
from sklearn.ensemble import RandomForestClassifier

Fitting Random Forest

In [55]:
stck_RF_model = RandomForestClassifier(n_estimators=2000,n_jobs=-1).fit(x_stck_valid1, y_valid1)
In [56]:
stck_RF_model_pred_valid1 = stck_RF_model.predict(x_stck_valid1)
stck_RF_model_pred_valid2 = stck_RF_model.predict(x_stck_valid2)

Training Accuracy

In [57]:
stck_RF_model_valid1_acc = (stck_RF_model_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_RF_model_valid1_acc)
TRAINING ACCURACY = 0.9618

Validation Accuracy

In [58]:
stck_RF_model_valid2_acc = (stck_RF_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_RF_model_valid2_acc)
VALIDATION ACCURACY = 0.8949

STACKING(LDA)

Importing LinearDiscriminantAnalysis

In [59]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

Fitting LinearDiscriminantAnalysis

In [60]:
stck_LDA_model = LinearDiscriminantAnalysis().fit(x_stck_valid1, y_valid1)
/usr/local/lib/python3.6/dist-packages/sklearn/discriminant_analysis.py:388: UserWarning: Variables are collinear.
  warnings.warn("Variables are collinear.")
In [61]:
stck_LDA_model_pred_valid1 = stck_LDA_model.predict(x_stck_valid1)
stck_LDA_model_pred_valid2 = stck_LDA_model.predict(x_stck_valid2)

Training Accuracy

In [62]:
stck_LDA_model_valid1_acc = (stck_LDA_model_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_LDA_model_valid1_acc)
TRAINING ACCURACY = 0.9065

Validation Accuracy

In [63]:
stck_LDA_model_valid2_acc = (stck_LDA_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LDA_model_valid2_acc)
VALIDATION ACCURACY = 0.897

Fitting LinearDiscriminantAnalysis with Shrinkage and Solver 'lsqr'

In [64]:
stck_LDA_model_2 = LinearDiscriminantAnalysis(solver='lsqr',shrinkage='auto').fit(x_stck_valid1, y_valid1)
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)
In [65]:
stck_LDA_model_2_pred_valid1 = stck_LDA_model_2.predict(x_stck_valid1)
stck_LDA_model_2_pred_valid2 = stck_LDA_model_2.predict(x_stck_valid2)

Training Accuracy

In [66]:
stck_LDA_model_2_valid1_acc = (stck_LDA_model_2_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_LDA_model_2_valid1_acc)
TRAINING ACCURACY = 0.8948

Validation Accuracy

In [67]:
stck_LDA_model_2_valid2_acc = (stck_LDA_model_2_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LDA_model_2_valid2_acc)
VALIDATION ACCURACY = 0.8926

Fitting LinearDiscriminantAnalysis with Shrinkage and Solver 'eigen'

In [68]:
stck_LDA_model_3 = LinearDiscriminantAnalysis(solver='eigen',shrinkage='auto').fit(x_stck_valid1, y_valid1)
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)
In [69]:
stck_LDA_model_3_pred_valid1 = stck_LDA_model_3.predict(x_stck_valid1)
stck_LDA_model_3_pred_valid2 = stck_LDA_model_3.predict(x_stck_valid2)

Training Accuracy

In [70]:
stck_LDA_model_3_valid1_acc = (stck_LDA_model_3_pred_valid1 == y_valid1).mean()
print("TRAINING ACCURACY =",stck_LDA_model_3_valid1_acc)
TRAINING ACCURACY = 0.8932

Validation Accuracy

In [71]:
stck_LDA_model_3_valid2_acc = (stck_LDA_model_3_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LDA_model_3_valid2_acc)
VALIDATION ACCURACY = 0.8909
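
The accuracies reported above can be collected to make the choice of meta-model explicit. The sketch below only restates numbers printed in this post; the dictionary name and its keys are illustrative. It ranks the stackers by `valid2` accuracy and also computes the gap between accuracy on `valid1` (the data each stacker was fit on) and on `valid2`.

```python
# (accuracy on valid1, accuracy on valid2), copied from the outputs above
stacker_acc = {
    "Logistic Regression": (0.9081, 0.8993),
    "Random Forest": (0.9618, 0.8949),
    "LDA (default)": (0.9065, 0.8970),
    "LDA (lsqr, shrinkage)": (0.8948, 0.8926),
    "LDA (eigen, shrinkage)": (0.8932, 0.8909),
}

# Pick the stacker with the highest held-out (valid2) accuracy.
best = max(stacker_acc, key=lambda name: stacker_acc[name][1])

# Gap between fit-set accuracy and held-out accuracy for each stacker.
gaps = {name: round(fit - held, 4) for name, (fit, held) in stacker_acc.items()}

for name, (fit, held) in stacker_acc.items():
    print(f"{name:24s} valid1={fit:.4f} valid2={held:.4f} gap={gaps[name]:.4f}")
print("best on valid2:", best)
```

The random forest stacker shows by far the largest gap, which would be consistent with it overfitting `valid1`, while logistic regression has the highest `valid2` accuracy.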

BEST MODEL

STACKING(Multinomial Logistic Regression)

Validation Accuracy

In [72]:
stck_LR_model_valid2_acc = (stck_LR_model_pred_valid2 == y_valid2).mean()
print("VALIDATION ACCURACY =",stck_LR_model_valid2_acc)
VALIDATION ACCURACY = 0.8993
In [73]:
LR_model_pred_test = LR_model.predict(x_test)
TR_model2_pred_test = TR_model2.predict(x_test)
BG_model_pred_test = BG_model.predict(x_test)
RF_model_predict_test = RF_model.predict(x_test)
GBST_model_predict_test = GBST_model.predict(x_test)
ABST_model_predict_test = ABST_model.predict(x_test)
SVM_model_predict_test = SVM_model.predict(x_test)
KNN_model_predict_test = KNN_model.predict(x_test)
LDA_model_predict_test = LDA_model.predict(x_test)
feed_dict = {x : x_test, y : y_test, drop_prob : .15, training : False}
MLP_predict_test = sess.run(pred_labels, feed_dict = feed_dict)
In [74]:
x_stck_test = pd.concat([
    pd.get_dummies(LR_model_pred_test),
    pd.get_dummies(TR_model2_pred_test),
    pd.get_dummies(BG_model_pred_test),
    pd.get_dummies(RF_model_predict_test),
    pd.get_dummies(GBST_model_predict_test),
    pd.get_dummies(ABST_model_predict_test),
    pd.get_dummies(SVM_model_predict_test),
    pd.get_dummies(KNN_model_predict_test),
    pd.get_dummies(LDA_model_predict_test),
    pd.get_dummies(MLP_predict_test)
],axis=1)

stck_LR_model_pred_test = stck_LR_model.predict(x_stck_test)

TEST ACCURACY (FINAL)

In [75]:
stck_LR_model_test_acc = (stck_LR_model_pred_test == y_test).mean()
print("TEST ACCURACY =",stck_LR_model_test_acc)
TEST ACCURACY = 0.9002

