
FASHION_MNIST_DAY3_with_Python

FASHION MNIST with Python (DAY 3)

DATA SOURCE : https://www.kaggle.com/zalando-research/fashionmnist (Kaggle, Fashion MNIST)

FASHION MNIST with Python (DAY 1) : http://deepstat.tistory.com/35

FASHION MNIST with Python (DAY 2) : http://deepstat.tistory.com/36

Datasets

Importing numpy, pandas, pyplot

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Loading datasets

In [2]:
# Forward slashes work on Windows as well as Linux/macOS
data_train = pd.read_csv("../datasets/fashion-mnist_train.csv")
data_test = pd.read_csv("../datasets/fashion-mnist_test.csv")
In [3]:
data_train_y = data_train.label
y_test = data_test.label
In [4]:
# Drop the label column and scale pixel intensities (0..255) into [0, 1)
data_train_x = data_train.drop("label",axis=1)/256
x_test = data_test.drop("label",axis=1)/256
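The two cells above split each CSV into a label vector and a scaled pixel matrix. Below is a small sketch of the same steps on a synthetic frame, so it runs without the Kaggle files. The frame is hypothetical: it assumes the CSV layout is a `label` column followed by 784 pixel columns (named `pixel1` to `pixel784` here) holding integers from 0 to 255.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for fashion-mnist_train.csv (assumed layout):
# a "label" column followed by 784 pixel columns with values 0..255.
rng = np.random.default_rng(0)
n_rows = 5
pixels = rng.integers(0, 256, size=(n_rows, 784))
frame = pd.DataFrame(pixels, columns=[f"pixel{i}" for i in range(1, 785)])
frame.insert(0, "label", rng.integers(0, 10, size=n_rows))

labels = frame.label                           # target vector
features = frame.drop("label", axis=1) / 256   # scaled pixels, at most 255/256

print(features.shape)  # (5, 784)
```

Both models fitted in this post are tree ensembles, whose splits depend only on the ordering of feature values, so dividing by 256 instead of 255 should not change the learned partitions.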

Splitting into training and validation sets

In [5]:
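np.random.seed(0)
# Two disjoint validation sets of 10,000 images each; only valid1 is used in this post
valid2_idx = np.random.choice(60000,10000,replace = False)
valid1_idx = np.random.choice(list(set(range(60000)) - set(valid2_idx)),10000,replace=False)
# The remaining 40,000 images form the training set
train_idx = list(set(range(60000))-set(valid1_idx)-set(valid2_idx))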

x_train = data_train_x.iloc[train_idx,:]
y_train = data_train_y.iloc[train_idx]

x_valid1 = data_train_x.iloc[valid1_idx,:]
y_valid1 = data_train_y.iloc[valid1_idx]

x_valid2 = data_train_x.iloc[valid2_idx,:]
y_valid2 = data_train_y.iloc[valid2_idx]
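The split above draws `valid2`, then draws `valid1` from what is left, then takes the rest as training data. A sketch of an equivalent three-way split shuffles the indices once and slices the permutation. `three_way_split` is a hypothetical helper, and it will pick different indices than the post's `np.random.choice` calls, so the accuracies below would not reproduce exactly.

```python
import numpy as np

def three_way_split(n_total, n_valid, seed=0):
    """Hypothetical helper: shuffle once, then cut into train, valid1 and valid2."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_total)
    valid2_idx = perm[:n_valid]
    valid1_idx = perm[n_valid:2 * n_valid]
    train_idx = np.sort(perm[2 * n_valid:])  # sorted only for readability
    return train_idx, valid1_idx, valid2_idx

train_idx, valid1_idx, valid2_idx = three_way_split(60000, 10000)
print(len(train_idx), len(valid1_idx), len(valid2_idx))  # 40000 10000 10000
```

Slicing one permutation guarantees the three index sets are disjoint without the set arithmetic.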

Gradient Boosting

Importing GradientBoostingClassifier

In [6]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix

Fitting Gradient Boosting

In [7]:
GBST_model = GradientBoostingClassifier(n_estimators=1000,learning_rate=0.5).fit(x_train, y_train)
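The model above is fitted with a fixed `n_estimators=1000`. One way to see how validation accuracy changes with the number of boosting stages is `GradientBoostingClassifier.staged_predict`, which yields the predictions after each stage from a single fit. The sketch below uses a small synthetic dataset from `make_classification` as an assumed stand-in for the Fashion-MNIST split, so it runs quickly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): 20 features and 4 classes instead of 784 and 10.
X, y = make_classification(n_samples=1500, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.5,
                                   random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., n_estimators stages,
# so one fit produces the whole validation-accuracy curve.
valid_curve = np.array([(pred == y_va).mean() for pred in model.staged_predict(X_va)])
best_n = int(valid_curve.argmax()) + 1
print("best n_estimators:", best_n, "valid acc:", valid_curve.max())
```

scikit-learn 0.20 and later also offer built-in early stopping for this estimator through the `n_iter_no_change` and `validation_fraction` parameters.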

Training Accuracy

In [8]:
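# confusion_matrix expects (y_true, y_pred); with (predicted, true) as passed here, rows are predicted labels and columns are true labels
confusion_matrix(GBST_model.predict(x_train),y_train)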
Out[8]:
array([[3992,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0, 3990,    1,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0, 3905,    0,   47,    0,   46,    0,    1,    0],
       [   0,    0,    8, 3924,    6,    0,    8,    0,    4,    0],
       [   0,    0,   72,    1, 3920,    0,   45,    0,    5,    0],
       [   0,    0,    0,    0,    0, 3921,    0,    2,    4,    0],
       [   1,    0,   54,    2,   32,   10, 3886,    0,  108,   99],
       [   0,    0,    0,    0,    0,    1,    0, 4100,    3,    0],
       [   1,    0,   16,    2,   11,    0,   20,    0, 3821,    0],
       [   0,    0,    0,    0,    0,    0,    0,    1,    0, 3930]],
      dtype=int64)
In [9]:
GBST_model_train_acc = (GBST_model.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",GBST_model_train_acc)
TRAINING ACCURACY = 0.984725
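The accuracy printed above can also be read off the confusion matrix: correct predictions lie on the diagonal whichever way rows and columns are oriented, so accuracy is the trace divided by the total. The sketch below copies `Out[8]` verbatim; `accuracy_from_confusion` is a hypothetical helper name.

```python
import numpy as np

# Out[8] copied from above (rows = predicted label, columns = true label).
cm_train = np.array([
    [3992,    0,    0,    0,    0,    0,    0,    0,    0,    0],
    [   0, 3990,    1,    0,    0,    0,    0,    0,    0,    0],
    [   0,    0, 3905,    0,   47,    0,   46,    0,    1,    0],
    [   0,    0,    8, 3924,    6,    0,    8,    0,    4,    0],
    [   0,    0,   72,    1, 3920,    0,   45,    0,    5,    0],
    [   0,    0,    0,    0,    0, 3921,    0,    2,    4,    0],
    [   1,    0,   54,    2,   32,   10, 3886,    0,  108,   99],
    [   0,    0,    0,    0,    0,    1,    0, 4100,    3,    0],
    [   1,    0,   16,    2,   11,    0,   20,    0, 3821,    0],
    [   0,    0,    0,    0,    0,    0,    0,    1,    0, 3930],
])

def accuracy_from_confusion(cm):
    """Hypothetical helper: fraction of samples on the diagonal."""
    return np.trace(cm) / cm.sum()

print(cm_train.sum())                     # 40000 training images
print(accuracy_from_confusion(cm_train))  # 0.984725
```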

Validation Accuracy

In [10]:
# Arguments are (predicted, true), so rows are predicted labels and columns are true labels
confusion_matrix(GBST_model.predict(x_valid1),y_valid1)
Out[10]:
array([[ 851,    7,   14,   29,    2,    0,  116,    0,    9,    2],
       [   4, 1000,    0,    4,    1,    1,    3,    0,    4,    0],
       [   6,    1,  745,   10,   95,    0,  100,    0,    5,    0],
       [  35,   11,    9,  912,   39,    1,   27,    0,    9,    0],
       [   3,    1,   99,   28,  788,    0,   84,    0,    7,    0],
       [   0,    0,    1,    1,    1, 1002,    1,   11,    7,    9],
       [ 111,    5,   72,   25,   62,    8,  640,    0,   41,   26],
       [   0,    0,    0,    0,    0,   15,    0,  899,    9,   38],
       [   5,    1,    5,    3,    7,    5,   16,    1,  943,    4],
       [   0,    0,    0,    0,    0,   28,    0,   37,    0,  899]],
      dtype=int64)
In [11]:
GBST_model_valid1_acc = (GBST_model.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",GBST_model_valid1_acc)
VALIDATION ACCURACY = 0.8679
In [12]:
{"TRAIN_ACC" : GBST_model_train_acc , "VALID_ACC" : GBST_model_valid1_acc}
Out[12]:
{'TRAIN_ACC': 0.984725, 'VALID_ACC': 0.8679}
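The gap between 98.5% training and 86.8% validation accuracy is not spread evenly over the classes. A sketch of per-class recall from `Out[10]`: since the true labels are on the columns here, recall for class j is `C[j, j] / C[:, j].sum()`. The class names follow the Fashion-MNIST label mapping published with the dataset.

```python
import numpy as np

# Out[10] copied from above (GBST_model on valid1; rows = predicted, columns = true).
cm_valid = np.array([
    [ 851,    7,   14,   29,    2,    0,  116,    0,    9,    2],
    [   4, 1000,    0,    4,    1,    1,    3,    0,    4,    0],
    [   6,    1,  745,   10,   95,    0,  100,    0,    5,    0],
    [  35,   11,    9,  912,   39,    1,   27,    0,    9,    0],
    [   3,    1,   99,   28,  788,    0,   84,    0,    7,    0],
    [   0,    0,    1,    1,    1, 1002,    1,   11,    7,    9],
    [ 111,    5,   72,   25,   62,    8,  640,    0,   41,   26],
    [   0,    0,    0,    0,    0,   15,    0,  899,    9,   38],
    [   5,    1,    5,    3,    7,    5,   16,    1,  943,    4],
    [   0,    0,    0,    0,    0,   28,    0,   37,    0,  899],
])

CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Column sums are the true class counts, so this is recall per true class.
recall = np.diag(cm_valid) / cm_valid.sum(axis=0)
for name, r in sorted(zip(CLASS_NAMES, recall), key=lambda t: t[1]):
    print(f"{name:12s} {r:.3f}")
```

Shirt (label 6) comes out lowest, at about 0.65; most of its errors land in T-shirt/top, Pullover and Coat.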

AdaBoost

Importing AdaBoostClassifier

In [13]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix

Fitting AdaBoost

In [14]:
ABST_model = AdaBoostClassifier(n_estimators=1000,learning_rate=0.5).fit(x_train, y_train)
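The call above keeps `AdaBoostClassifier`'s default weak learner, which is a depth-1 decision tree (a stump), while `GradientBoostingClassifier` defaults to trees of `max_depth=3`. That difference is a plausible reason for the gap in the results below. The sketch compares stumps with depth-3 trees on a synthetic dataset (an assumption, not the Fashion-MNIST data); `make_adaboost` is a hypothetical helper. The weak-learner parameter was renamed from `base_estimator` to `estimator` in scikit-learn 1.2, so the helper tries the new name first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 20 features, 5 classes.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

def make_adaboost(max_depth, n_estimators=100):
    """Hypothetical helper: AdaBoost over depth-limited decision trees."""
    base = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    try:
        # scikit-learn >= 1.2 calls the weak learner `estimator`
        return AdaBoostClassifier(estimator=base, n_estimators=n_estimators,
                                  learning_rate=0.5, random_state=0)
    except TypeError:
        # older releases, like the one used in this post, call it `base_estimator`
        return AdaBoostClassifier(base_estimator=base, n_estimators=n_estimators,
                                  learning_rate=0.5, random_state=0)

results = {}
for depth in (1, 3):
    model = make_adaboost(depth).fit(X_tr, y_tr)
    results[depth] = {"TRAIN_ACC": model.score(X_tr, y_tr),
                      "VALID_ACC": model.score(X_va, y_va)}
print(results)
```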

Training Accuracy

In [15]:
# Arguments are (predicted, true), so rows are predicted labels and columns are true labels
confusion_matrix(ABST_model.predict(x_train),y_train)
Out[15]:
array([[ 384,   14,   51,   17,   24,    1,  285,    0,  437,    2],
       [ 176, 2876,   11,  770,   59,    1,   99,    0,    4,    0],
       [2903,  885, 3767, 1315, 3095,    0, 2533,    0,  219,    1],
       [ 496,  207,   85, 1764,  319,    0,  680,    0,   15,    0],
       [   4,    3,  127,   45,  504,    0,  359,    0,    2,    0],
       [   2,    0,    0,    0,    0, 3500,    1,  901,   38,  789],
       [  14,    1,    7,   14,    9,    0,   37,    0,  112,    0],
       [   1,    0,    0,    0,    0,  195,    0, 3144,   66,  834],
       [  14,    4,    8,    4,    6,  220,   11,    8, 3052,    3],
       [   0,    0,    0,    0,    0,   15,    0,   50,    1, 2400]],
      dtype=int64)
In [16]:
ABST_model_train_acc = (ABST_model.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",ABST_model_train_acc)
TRAINING ACCURACY = 0.5357

Validation Accuracy

In [17]:
# Arguments are (predicted, true), so rows are predicted labels and columns are true labels
confusion_matrix(ABST_model.predict(x_valid1),y_valid1)
Out[17]:
array([[ 82,   2,  15,   3,   7,   1,  64,   0, 100,   1],
       [ 45, 738,   1, 190,  21,   0,  16,   0,   4,   0],
       [737, 216, 872, 321, 760,   0, 652,   0,  59,   0],
       [140,  67,  23, 479,  86,   0, 150,   0,   4,   0],
       [  1,   2,  28,  13, 114,   0,  92,   0,   0,   0],
       [  0,   0,   0,   0,   0, 921,   1, 195,  12, 181],
       [  6,   0,   3,   2,   2,   0,  10,   0,  32,   0],
       [  0,   0,   0,   0,   0,  63,   0, 741,  17, 203],
       [  4,   1,   3,   4,   5,  63,   2,   4, 806,   7],
       [  0,   0,   0,   0,   0,  12,   0,   8,   0, 586]], dtype=int64)
In [18]:
ABST_model_valid1_acc = (ABST_model.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",ABST_model_valid1_acc)
VALIDATION ACCURACY = 0.5349
In [19]:
{"TRAIN_ACC" : ABST_model_train_acc , "VALID_ACC" : ABST_model_valid1_acc}
Out[19]:
{'TRAIN_ACC': 0.5357, 'VALID_ACC': 0.5349}
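Training and validation accuracy are both around 53%, which points to underfitting rather than overfitting. Comparing how often each label is predicted (row sums of `Out[17]`) with how often it actually occurs (column sums) shows where the AdaBoost model collapses. The matrix below is copied verbatim from the output above.

```python
import numpy as np

# Out[17] copied from above (ABST_model on valid1; rows = predicted, columns = true).
cm_ada = np.array([
    [ 82,   2,  15,   3,   7,   1,  64,   0, 100,   1],
    [ 45, 738,   1, 190,  21,   0,  16,   0,   4,   0],
    [737, 216, 872, 321, 760,   0, 652,   0,  59,   0],
    [140,  67,  23, 479,  86,   0, 150,   0,   4,   0],
    [  1,   2,  28,  13, 114,   0,  92,   0,   0,   0],
    [  0,   0,   0,   0,   0, 921,   1, 195,  12, 181],
    [  6,   0,   3,   2,   2,   0,  10,   0,  32,   0],
    [  0,   0,   0,   0,   0,  63,   0, 741,  17, 203],
    [  4,   1,   3,   4,   5,  63,   2,   4, 806,   7],
    [  0,   0,   0,   0,   0,  12,   0,   8,   0, 586],
])

predicted_counts = cm_ada.sum(axis=1)  # how often each label was predicted
true_counts = cm_ada.sum(axis=0)       # how often each label actually occurs
for label, (p, t) in enumerate(zip(predicted_counts, true_counts)):
    print(f"label {label}: predicted {p:5d} times, true {t:5d}")
```

Label 2 (Pullover) is predicted 3,617 times against 945 true Pullovers, while label 6 (Shirt) is predicted only 55 times against 987 true Shirts.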
