
FASHION_MNIST_DAY6_with_Python

FASHION MNIST with Python (DAY 6)

DATA SOURCE : https://www.kaggle.com/zalando-research/fashionmnist (Kaggle, Fashion MNIST)

FASHION MNIST with Python (DAY 1) : http://deepstat.tistory.com/35

FASHION MNIST with Python (DAY 2) : http://deepstat.tistory.com/36

FASHION MNIST with Python (DAY 3) : http://deepstat.tistory.com/37

FASHION MNIST with Python (DAY 4) : http://deepstat.tistory.com/38

FASHION MNIST with Python (DAY 5) : http://deepstat.tistory.com/39

Datasets

Importing numpy, pandas, pyplot

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Loading datasets

In [2]:
data_train = pd.read_csv("../datasets/fashion-mnist_train.csv")  # forward slashes also work on Windows
data_test = pd.read_csv("../datasets/fashion-mnist_test.csv")
In [3]:
data_train_y = data_train.label
y_test = data_test.label
In [4]:
data_train_x = data_train.drop("label",axis=1)/256
x_test = data_test.drop("label",axis=1)/256

Splitting into training and validation sets

In [5]:
np.random.seed(0)
# two disjoint validation sets of 10,000 rows each; the remaining 40,000 rows form the training set
valid2_idx = np.random.choice(60000,10000,replace = False)
valid1_idx = np.random.choice(list(set(range(60000)) - set(valid2_idx)),10000,replace=False)
train_idx = list(set(range(60000))-set(valid1_idx)-set(valid2_idx))

x_train = data_train_x.iloc[train_idx,:]
y_train = data_train_y.iloc[train_idx]

x_valid1 = data_train_x.iloc[valid1_idx,:]
y_valid1 = data_train_y.iloc[valid1_idx]

x_valid2 = data_train_x.iloc[valid2_idx,:]
y_valid2 = data_train_y.iloc[valid2_idx]
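The same 40,000 / 10,000 / 10,000 split can also be drawn from a single shuffled permutation, which makes it obvious that the three index sets are disjoint. This is a minimal sketch assuming NumPy's `default_rng` generator; the helper name `three_way_split` is hypothetical, and because it uses a different random stream it will not reproduce the exact indices above.

```python
import numpy as np


def three_way_split(n, n_valid1, n_valid2, seed=0):
    """Return disjoint (train, valid1, valid2) index arrays that together cover range(n)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)                          # shuffle all row indices once
    valid2_idx = perm[:n_valid2]                       # first block -> second validation set
    valid1_idx = perm[n_valid2:n_valid2 + n_valid1]    # next block -> first validation set
    train_idx = np.sort(perm[n_valid2 + n_valid1:])    # everything else, in ascending order
    return train_idx, valid1_idx, valid2_idx


train_idx, valid1_idx, valid2_idx = three_way_split(60000, 10000, 10000)
print(len(train_idx), len(valid1_idx), len(valid2_idx))
```

Slicing one permutation avoids the repeated set differences, and the resulting arrays can be passed to `.iloc` exactly like the lists above.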

Linear Discriminant Analysis (LDA)

Importing LinearDiscriminantAnalysis

In [6]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

Fitting LinearDiscriminantAnalysis

In [7]:
LDA_model = LinearDiscriminantAnalysis().fit(x_train, y_train)
c:\users\stat413server1\appdata\local\programs\python\python36\lib\site-packages\sklearn\discriminant_analysis.py:442: UserWarning: The priors do not sum to 1. Renormalizing
  UserWarning)
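For reference, LDA assumes all classes share one covariance matrix Sigma and assigns x to the class with the largest linear score delta_k(x) = x' Sigma^-1 mu_k - 0.5 mu_k' Sigma^-1 mu_k + log(pi_k), where mu_k is the class mean and pi_k the class prior. The NumPy sketch below fits that textbook rule on synthetic Gaussian data; `fit_lda` and `predict_lda` are hypothetical helpers for illustration, not scikit-learn's implementation (whose default solver is `'svd'`).

```python
import numpy as np


def fit_lda(X, y):
    """Estimate class means, priors and the pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([(y == k).mean() for k in classes])
    # subtract each row's own class mean, then pool over classes (divide by n - K)
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    return classes, means, priors, cov


def predict_lda(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, priors, cov = model
    cov_inv_means = np.linalg.solve(cov, means.T)  # column k is Sigma^-1 mu_k, shape (p, K)
    # delta_k(x) = x' Sigma^-1 mu_k - 0.5 * mu_k' Sigma^-1 mu_k + log(pi_k)
    scores = (X @ cov_inv_means
              - 0.5 * np.sum(means.T * cov_inv_means, axis=0)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]


# toy data: three well-separated 2-D Gaussian classes with a shared (identity) covariance
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in centers])
y = np.repeat(np.arange(3), 200)

lda = fit_lda(X, y)
print("toy training accuracy:", (predict_lda(lda, X) == y).mean())
```

Because the quadratic term x' Sigma^-1 x is the same for every class, it cancels, which is why the decision boundaries are linear.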

Training Accuracy

In [8]:
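# predictions are passed first, so rows = predicted label and columns = true label
# (the transpose of scikit-learn's confusion_matrix(y_true, y_pred) convention)
confusion_matrix(LDA_model.predict(x_train),y_train)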
Out[8]:
array([[3131,    8,   68,   92,    4,    0,  510,    0,    6,    0],
       [   1, 3741,    0,    8,    4,    0,    4,    0,    0,    0],
       [  61,   27, 2938,   51,  325,    0,  448,    0,   33,    1],
       [ 258,  171,   27, 3406,  118,    0,  150,    0,   43,    0],
       [  23,   10,  600,  109, 3124,    0,  385,    0,   22,    0],
       [  12,    2,    7,    4,    0, 3580,    9,  262,   56,  134],
       [ 467,   27,  403,  251,  437,    4, 2437,    0,   99,    1],
       [   0,    0,    0,    0,    0,  238,    1, 3634,   14,  184],
       [  40,    4,   13,    8,    4,   25,   61,   12, 3671,    0],
       [   1,    0,    0,    0,    0,   85,    0,  195,    2, 3709]],
      dtype=int64)
In [9]:
LDA_model_train_acc = (LDA_model.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",LDA_model_train_acc)
TRAINING ACCURACY = 0.834275
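The accuracy can be read straight off the confusion matrix: it is the diagonal sum divided by the total count. Since the predictions were passed to `confusion_matrix` first, rows are predicted labels and columns are true labels, so per-class recall is the diagonal divided by the column sums and per-class precision is the diagonal divided by the row sums. The sketch below recomputes these from the training matrix printed above.

```python
import numpy as np

# LDA training confusion matrix from Out[8] (rows = predicted label, columns = true label)
cm_train = np.array([
    [3131,    8,   68,   92,    4,    0,  510,    0,    6,    0],
    [   1, 3741,    0,    8,    4,    0,    4,    0,    0,    0],
    [  61,   27, 2938,   51,  325,    0,  448,    0,   33,    1],
    [ 258,  171,   27, 3406,  118,    0,  150,    0,   43,    0],
    [  23,   10,  600,  109, 3124,    0,  385,    0,   22,    0],
    [  12,    2,    7,    4,    0, 3580,    9,  262,   56,  134],
    [ 467,   27,  403,  251,  437,    4, 2437,    0,   99,    1],
    [   0,    0,    0,    0,    0,  238,    1, 3634,   14,  184],
    [  40,    4,   13,    8,    4,   25,   61,   12, 3671,    0],
    [   1,    0,    0,    0,    0,   85,    0,  195,    2, 3709],
])

accuracy = np.trace(cm_train) / cm_train.sum()          # correct predictions / all predictions
recall = np.diag(cm_train) / cm_train.sum(axis=0)       # per true class (column sums)
precision = np.diag(cm_train) / cm_train.sum(axis=1)    # per predicted class (row sums)

print("accuracy:", accuracy)
print("recall per class:   ", np.round(recall, 3))
print("precision per class:", np.round(precision, 3))
```

Here 33,371 of the 40,000 training images lie on the diagonal, which gives the 0.834275 reported above.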

Validation Accuracy

In [10]:
confusion_matrix(LDA_model.predict(x_valid1),y_valid1)
Out[10]:
array([[795,   4,  17,  32,   2,   0, 143,   0,   3,   0],
       [  1, 953,   0,   1,   1,   0,   0,   0,   0,   0],
       [ 12,  11, 666,  10,  90,   0, 130,   0,   6,   0],
       [ 72,  42,   8, 879,  40,   0,  37,   0,  10,   0],
       [  3,   4, 141,  22, 728,   0, 104,   0,   8,   0],
       [  0,   1,   2,   4,   0, 935,   4,  56,  20,  32],
       [122,  11, 110,  62, 126,   3, 551,   1,  33,   0],
       [  0,   0,   0,   0,   0,  74,   1, 834,   4,  48],
       [ 10,   0,   1,   2,   8,  15,  17,   4, 950,   1],
       [  0,   0,   0,   0,   0,  33,   0,  53,   0, 897]], dtype=int64)
In [11]:
LDA_model_valid1_acc = (LDA_model.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",LDA_model_valid1_acc)
VALIDATION ACCURACY = 0.8188
In [12]:
{"TRAIN_ACC" : LDA_model_train_acc , "VALID_ACC" : LDA_model_valid1_acc}
Out[12]:
{'TRAIN_ACC': 0.834275, 'VALID_ACC': 0.8188}
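To see which classes LDA mixes up most, one can rank the off-diagonal cells of the validation confusion matrix. The sketch below does this for the matrix in Out[10], using the Fashion-MNIST label names (0 = T-shirt/top through 9 = Ankle boot); as in the calls above, rows are predicted labels and columns are true labels.

```python
import numpy as np

# Fashion-MNIST label names, indexed by label value
labels = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# LDA validation confusion matrix from Out[10] (rows = predicted label, columns = true label)
cm_valid = np.array([
    [795,   4,  17,  32,   2,   0, 143,   0,   3,   0],
    [  1, 953,   0,   1,   1,   0,   0,   0,   0,   0],
    [ 12,  11, 666,  10,  90,   0, 130,   0,   6,   0],
    [ 72,  42,   8, 879,  40,   0,  37,   0,  10,   0],
    [  3,   4, 141,  22, 728,   0, 104,   0,   8,   0],
    [  0,   1,   2,   4,   0, 935,   4,  56,  20,  32],
    [122,  11, 110,  62, 126,   3, 551,   1,  33,   0],
    [  0,   0,   0,   0,   0,  74,   1, 834,   4,  48],
    [ 10,   0,   1,   2,   8,  15,  17,   4, 950,   1],
    [  0,   0,   0,   0,   0,  33,   0,  53,   0, 897],
])

off_diag = cm_valid.copy()
np.fill_diagonal(off_diag, 0)                        # keep only the misclassified counts
top = np.argsort(off_diag, axis=None)[::-1][:5]      # flat indices of the 5 largest error cells
pairs = list(zip(*np.unravel_index(top, off_diag.shape)))
for pred, true in pairs:
    print(f"true {labels[true]:<12} predicted as {labels[pred]:<12} {off_diag[pred, true]}")
```

On this matrix the largest error cell is 143 shirts predicted as T-shirt/top, followed by 141 pullovers predicted as coats; the upper-body garment classes (0, 2, 4, 6) account for most of the confusions.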

Fitting LinearDiscriminantAnalysis with shrinkage and solver 'lsqr'

In [13]:
LDA_model_with_shrinkage_lsqr = LinearDiscriminantAnalysis(solver='lsqr',shrinkage="auto").fit(x_train, y_train)
c:\users\stat413server1\appdata\local\programs\python\python36\lib\site-packages\sklearn\discriminant_analysis.py:442: UserWarning: The priors do not sum to 1. Renormalizing
  UserWarning)
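Shrinkage replaces the covariance estimate S with the blend (1 - alpha) * S + alpha * (trace(S) / p) * I, which pulls the eigenvalues toward their average and keeps the matrix invertible even when some pixels are nearly constant. This is the form scikit-learn's `shrunk_covariance` uses; with `shrinkage="auto"`, LDA chooses alpha with the Ledoit-Wolf estimate instead. The sketch below fixes alpha by hand (an assumption, for illustration only) to show the effect on a rank-deficient covariance; `shrink_covariance` is a hypothetical helper.

```python
import numpy as np


def shrink_covariance(S, alpha):
    """Blend S with a scaled identity: (1 - alpha) * S + alpha * (trace(S) / p) * I."""
    p = S.shape[0]
    mu = np.trace(S) / p                 # average eigenvalue of S
    return (1.0 - alpha) * S + alpha * mu * np.eye(p)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))             # fewer samples (5) than features (20)
S = np.cov(X, rowvar=False)              # 20 x 20 sample covariance, rank at most 4
S_shrunk = shrink_covariance(S, alpha=0.1)

print("rank of S:       ", np.linalg.matrix_rank(S))
print("rank of S_shrunk:", np.linalg.matrix_rank(S_shrunk))
print("smallest eigenvalue of S_shrunk:", np.linalg.eigvalsh(S_shrunk).min())
```

The blend leaves the trace unchanged while raising every eigenvalue to at least alpha * trace(S) / p, so the shrunk matrix is always positive definite.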

Training Accuracy

In [14]:
confusion_matrix(LDA_model_with_shrinkage_lsqr.predict(x_train),y_train)
Out[14]:
array([[3126,    9,   64,  108,    4,    0,  528,    0,    6,    0],
       [   1, 3704,    0,    7,    3,    0,    2,    0,    0,    0],
       [  60,   40, 2911,   54,  325,    0,  445,    0,   28,    1],
       [ 251,  192,   26, 3352,  117,    0,  144,    0,   42,    0],
       [  22,   12,  615,  121, 3111,    0,  389,    0,   19,    0],
       [  14,    2,    8,    1,    0, 3566,    7,  285,   58,  140],
       [ 479,   27,  417,  276,  450,    5, 2421,    0,  111,    1],
       [   0,    0,    0,    0,    0,  236,    1, 3571,   13,  178],
       [  40,    4,   15,   10,    6,   27,   68,   12, 3668,    0],
       [   1,    0,    0,    0,    0,   98,    0,  235,    1, 3709]],
      dtype=int64)
In [15]:
LDA_model_with_shrinkage_lsqr_train_acc = (LDA_model_with_shrinkage_lsqr.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",LDA_model_with_shrinkage_lsqr_train_acc)
TRAINING ACCURACY = 0.828475

Validation Accuracy

In [16]:
confusion_matrix(LDA_model_with_shrinkage_lsqr.predict(x_valid1),y_valid1)
Out[16]:
array([[794,   5,  16,  35,   2,   0, 144,   0,   2,   0],
       [  1, 947,   0,   2,   1,   0,   0,   0,   0,   0],
       [ 12,  13, 661,  13,  89,   0, 129,   0,   5,   0],
       [ 67,  45,   7, 874,  38,   0,  36,   0,  11,   0],
       [  5,   3, 142,  26, 733,   0, 108,   0,   7,   0],
       [  0,   1,   2,   2,   0, 942,   2,  59,  20,  34],
       [126,  12, 116,  58, 125,   3, 549,   1,  33,   0],
       [  0,   0,   0,   0,   0,  65,   1, 822,   3,  44],
       [ 10,   0,   1,   2,   7,  16,  18,   4, 953,   1],
       [  0,   0,   0,   0,   0,  34,   0,  62,   0, 899]], dtype=int64)
In [17]:
LDA_model_with_shrinkage_lsqr_valid1_acc = (LDA_model_with_shrinkage_lsqr.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",LDA_model_with_shrinkage_lsqr_valid1_acc)
VALIDATION ACCURACY = 0.8174
In [18]:
{"TRAIN_ACC" : LDA_model_with_shrinkage_lsqr_train_acc , "VALID_ACC" : LDA_model_with_shrinkage_lsqr_valid1_acc}
Out[18]:
{'TRAIN_ACC': 0.828475, 'VALID_ACC': 0.8174}

Fitting LinearDiscriminantAnalysis with shrinkage and solver 'eigen'

In [19]:
LDA_model_with_shrinkage_eigen = LinearDiscriminantAnalysis(solver='eigen',shrinkage="auto").fit(x_train, y_train)
c:\users\stat413server1\appdata\local\programs\python\python36\lib\site-packages\sklearn\discriminant_analysis.py:442: UserWarning: The priors do not sum to 1. Renormalizing
  UserWarning)
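The `'eigen'` solver is built around the generalized eigenvalue problem S_B w = lambda * S_W w, where S_W is the within-class scatter and S_B the between-class scatter. Because S_B is assembled from K class means, its rank is at most K - 1, so at most K - 1 eigenvalues are nonzero (9 for the 10 Fashion-MNIST classes). The NumPy sketch below solves this problem on synthetic data by whitening with a Cholesky factor of S_W; `fisher_directions` is a hypothetical helper, not scikit-learn's own code.

```python
import numpy as np


def fisher_directions(X, y):
    """Solve S_B w = lambda * S_W w; return eigenvalues (largest first) and directions."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    p = X.shape[1]
    S_W = np.zeros((p, p))  # within-class scatter
    S_B = np.zeros((p, p))  # between-class scatter
    for k in classes:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)
        d = (mk - overall_mean)[:, None]
        S_B += len(Xk) * (d @ d.T)
    # whiten with the Cholesky factor S_W = L L^T, turning the problem into the
    # ordinary symmetric eigenproblem M v = lambda v with M = L^-1 S_B L^-T
    L_inv = np.linalg.inv(np.linalg.cholesky(S_W))
    M = L_inv @ S_B @ L_inv.T
    evals, V = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]
    W = L_inv.T @ V[:, order]  # map back: w = L^-T v
    return evals[order], W


# toy data: K = 3 classes in p = 5 dimensions with randomly placed class centres
rng = np.random.default_rng(1)
K, p = 3, 5
centers = rng.normal(scale=3.0, size=(K, p))
X = np.vstack([rng.normal(c, 1.0, size=(100, p)) for c in centers])
y = np.repeat(np.arange(K), 100)

evals, W = fisher_directions(X, y)
print(np.round(evals, 6))  # at most K - 1 = 2 eigenvalues are nonzero
```

The columns of `W` that belong to nonzero eigenvalues are the discriminant directions; projecting onto them is what LDA's dimensionality reduction does.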

Training Accuracy

In [20]:
confusion_matrix(LDA_model_with_shrinkage_eigen.predict(x_train),y_train)
Out[20]:
array([[3134,    7,   79,  112,    3,    0,  610,    0,   11,    0],
       [   1, 3706,    0,    5,    2,    0,    2,    0,    0,    0],
       [  58,   47, 2930,   47,  358,    0,  485,    0,   28,    1],
       [ 237,  177,   24, 3310,  114,    0,  125,    0,   42,    0],
       [  15,   11,  584,  106, 3124,    0,  468,    0,   16,    0],
       [  11,    0,    7,    2,    0, 3538,    5,  311,   48,  149],
       [ 499,   37,  417,  337,  411,    4, 2245,    0,  110,    1],
       [   0,    0,    0,    0,    0,  258,    1, 3551,   12,  181],
       [  39,    5,   14,   10,    4,   30,   63,   11, 3678,    0],
       [   0,    0,    1,    0,    0,  102,    1,  230,    1, 3697]],
      dtype=int64)
In [21]:
LDA_model_with_shrinkage_eigen_train_acc = (LDA_model_with_shrinkage_eigen.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",LDA_model_with_shrinkage_eigen_train_acc)
TRAINING ACCURACY = 0.822825

Validation Accuracy

In [22]:
confusion_matrix(LDA_model_with_shrinkage_eigen.predict(x_valid1),y_valid1)
Out[22]:
array([[802,   2,  22,  40,   2,   0, 160,   0,   3,   0],
       [  1, 950,   0,   2,   1,   0,   0,   0,   0,   0],
       [ 12,  12, 669,  12,  94,   0, 134,   0,   3,   0],
       [ 65,  42,   7, 864,  37,   0,  33,   0,   9,   0],
       [  0,   3, 139,  20, 739,   0, 135,   0,   4,   0],
       [  0,   1,   2,   2,   0, 928,   2,  60,  15,  37],
       [127,  16, 105,  70, 116,   2, 506,   1,  44,   0],
       [  0,   0,   0,   0,   0,  77,   0, 821,   3,  46],
       [  8,   0,   1,   2,   6,  14,  17,   4, 953,   1],
       [  0,   0,   0,   0,   0,  39,   0,  62,   0, 894]], dtype=int64)
In [23]:
LDA_model_with_shrinkage_eigen_valid1_acc = (LDA_model_with_shrinkage_eigen.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",LDA_model_with_shrinkage_eigen_valid1_acc)
VALIDATION ACCURACY = 0.8126
In [24]:
{"TRAIN_ACC" : LDA_model_with_shrinkage_eigen_train_acc , "VALID_ACC" : LDA_model_with_shrinkage_eigen_valid1_acc}
Out[24]:
{'TRAIN_ACC': 0.822825, 'VALID_ACC': 0.8126}

Quadratic Discriminant Analysis (QDA)

Importing QuadraticDiscriminantAnalysis

In [25]:
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

Fitting QDA

In [26]:
QDA_model = QuadraticDiscriminantAnalysis().fit(x_train, y_train)
c:\users\stat413server1\appdata\local\programs\python\python36\lib\site-packages\sklearn\discriminant_analysis.py:682: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
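The "Variables are collinear" warning means at least one class covariance matrix is numerically singular. With 784 pixels, many of which are almost always 0 within a given class, this is to be expected, and it is likely part of why QDA scores poorly below. A common remedy is to regularize each class covariance toward the identity; scikit-learn's `QuadraticDiscriminantAnalysis` has a `reg_param` argument for this, of the form (1 - reg_param) * S_k + reg_param * I. The NumPy sketch below shows the idea with hypothetical helpers `fit_qda` / `predict_qda` and a made-up strength `reg=0.1`, on toy data where one feature is constant within each class.

```python
import numpy as np


def fit_qda(X, y, reg=0.1):
    """Per-class means and covariances, each regularized as (1 - reg) * S_k + reg * I."""
    classes = np.unique(y)
    p = X.shape[1]
    params = []
    for k in classes:
        Xk = X[y == k]
        S_k = np.cov(Xk, rowvar=False)
        S_k = (1.0 - reg) * S_k + reg * np.eye(p)      # positive definite even if S_k was singular
        _, logdet = np.linalg.slogdet(S_k)
        params.append((Xk.mean(axis=0), np.linalg.inv(S_k), logdet, np.log(len(Xk) / len(X))))
    return classes, params


def predict_qda(model, X):
    """Pick the class with the largest quadratic discriminant score."""
    classes, params = model
    scores = []
    for mean, S_inv, logdet, log_prior in params:
        d = X - mean
        # delta_k(x) = -0.5 log|S_k| - 0.5 (x - mu_k)' S_k^-1 (x - mu_k) + log(pi_k)
        scores.append(-0.5 * logdet - 0.5 * np.einsum("ij,jk,ik->i", d, S_inv, d) + log_prior)
    return classes[np.argmax(np.column_stack(scores), axis=1)]


rng = np.random.default_rng(0)
X0 = np.column_stack([rng.normal(0.0, 1.0, 200), np.zeros(200)])  # second feature constant in class 0
X1 = np.column_stack([rng.normal(3.0, 2.0, 200), np.zeros(200)])  # ... and in class 1
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 200)

print("unregularized class-0 covariance rank:", np.linalg.matrix_rank(np.cov(X0, rowvar=False)))
model = fit_qda(X, y, reg=0.1)
print("toy training accuracy with reg=0.1:", (predict_qda(model, X) == y).mean())
```

Without the `reg * I` term the class covariances here have rank 1, so their inverses and log-determinants do not exist; the regularization strength would normally be tuned on a validation set such as `x_valid1`.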

Training Accuracy

In [27]:
confusion_matrix(QDA_model.predict(x_train),y_train)
Out[27]:
array([[2639,    0,    6,    1,    1,    0,  325,    0,    1,    0],
       [ 200, 3990,   11, 1637,   70,    0,   78,    0,   58,    0],
       [  28,    0,  980,    0,    9,    0,   69,    0,    0,    0],
       [ 668,    0, 1181, 2290, 1571,    6, 1236,    0,  429,    4],
       [ 304,    0, 1777,    0, 2344,    0, 1280,    0,  186,    0],
       [   0,    0,    0,    0,    0,  894,    0,    0,    9,    1],
       [ 108,    0,   57,    0,   18,    2,  945,    0,    9,    3],
       [   0,    0,    0,    0,    0, 2415,    0, 4098,    9, 1016],
       [  47,    0,   44,    1,    3,    9,   72,    0, 3241,    0],
       [   0,    0,    0,    0,    0,  606,    0,    5,    4, 3005]],
      dtype=int64)
In [28]:
QDA_model_train_acc = (QDA_model.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",QDA_model_train_acc)
TRAINING ACCURACY = 0.61065

Validation Accuracy

In [29]:
confusion_matrix(QDA_model.predict(x_valid1),y_valid1)
Out[29]:
array([[601,   0,   1,   8,   1,   0,  92,   0,   1,   0],
       [ 63, 974,   3, 478,  17,   0,  17,   0,  11,   0],
       [  7,   0, 120,   1,  19,   0,  27,   0,   1,   0],
       [175,  39, 283, 499, 406,   2, 310,   0, 111,   1],
       [ 73,   0, 441,  10, 519,   0, 339,   0,  61,   0],
       [  0,   0,   0,   0,   0, 143,   0,   6,   1,   3],
       [ 59,  12,  78,  12,  23,   1, 138,   0,   6,   0],
       [  0,   0,   0,   0,   0, 660,   0, 919,   2, 249],
       [ 37,   1,  19,   4,  10,  22,  64,   5, 840,   9],
       [  0,   0,   0,   0,   0, 232,   0,  18,   0, 716]], dtype=int64)
In [30]:
QDA_model_valid1_acc = (QDA_model.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",QDA_model_valid1_acc)
VALIDATION ACCURACY = 0.5469
In [31]:
{"TRAIN_ACC" : QDA_model_train_acc , "VALID_ACC" : QDA_model_valid1_acc}
Out[31]:
{'TRAIN_ACC': 0.61065, 'VALID_ACC': 0.5469}
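Reading the QDA validation matrix by column (true class) shows where the accuracy is lost: several classes are recovered for only a small fraction of their items, while predictions pile up in a few rows. The sketch below computes per-class recall and the number of predictions per class from the matrix in Out[29], again treating rows as predicted labels and columns as true labels.

```python
import numpy as np

labels = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# QDA validation confusion matrix from Out[29] (rows = predicted label, columns = true label)
cm_qda_valid = np.array([
    [601,   0,   1,   8,   1,   0,  92,   0,   1,   0],
    [ 63, 974,   3, 478,  17,   0,  17,   0,  11,   0],
    [  7,   0, 120,   1,  19,   0,  27,   0,   1,   0],
    [175,  39, 283, 499, 406,   2, 310,   0, 111,   1],
    [ 73,   0, 441,  10, 519,   0, 339,   0,  61,   0],
    [  0,   0,   0,   0,   0, 143,   0,   6,   1,   3],
    [ 59,  12,  78,  12,  23,   1, 138,   0,   6,   0],
    [  0,   0,   0,   0,   0, 660,   0, 919,   2, 249],
    [ 37,   1,  19,   4,  10,  22,  64,   5, 840,   9],
    [  0,   0,   0,   0,   0, 232,   0,  18,   0, 716],
])

recall = np.diag(cm_qda_valid) / cm_qda_valid.sum(axis=0)   # correct / items of that true class
predicted_counts = cm_qda_valid.sum(axis=1)                   # how often each class was predicted

for name, r, n_pred in zip(labels, recall, predicted_counts):
    print(f"{name:<12} recall={r:.3f} predicted {n_pred} times")
```

For example, 660 of the 1,060 validation sandals are predicted as sneakers, and only 120 of the 945 pullovers are classified correctly.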

