Real Data Analysis
FASHION MNIST with Python (DAY 3) - 1. Gradient Boosting, 2. AdaBoost
딥스탯
2018. 8. 16. 15:56
FASHION MNIST with Python (DAY 3)¶
DATA SOURCE : https://www.kaggle.com/zalando-research/fashionmnist (Kaggle, Fashion MNIST)
FASHION MNIST with Python (DAY 1) : http://deepstat.tistory.com/35
FASHION MNIST with Python (DAY 2) : http://deepstat.tistory.com/36
Datasets¶
Importing numpy, pandas, pyplot¶
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Loading datasets¶
In [2]:
# Forward slashes work on Windows as well as on Linux/macOS.
data_train = pd.read_csv("../datasets/fashion-mnist_train.csv")
data_test = pd.read_csv("../datasets/fashion-mnist_test.csv")
In [3]:
data_train_y = data_train.label
y_test = data_test.label
In [4]:
# Scale the 8-bit pixel values (0 to 255) into [0, 1).
data_train_x = data_train.drop("label",axis=1)/256
x_test = data_test.drop("label",axis=1)/256
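Dividing by 256 maps the pixel intensities into [0, 1), whereas dividing by 255 would map them into [0, 1]. The short sketch below illustrates the difference, assuming the pixels are 8-bit values from 0 to 255; the array `pixels` is a made-up stand-in for one row of the real data.

```python
import numpy as np

# Hypothetical stand-in for a few pixel intensities (8-bit, 0..255).
pixels = np.array([0, 1, 128, 255], dtype=np.int64)

scaled_256 = pixels / 256   # scaling used above: values fall in [0, 1)
scaled_255 = pixels / 255   # alternative: values fall in [0, 1]

print(scaled_256.max())  # largest value stays just below 1
print(scaled_255.max())  # largest value is exactly 1

# Both scalings preserve the ordering of the pixel values, so a
# threshold-based tree split can separate the same samples either way.
assert np.array_equal(np.argsort(scaled_256), np.argsort(scaled_255))
```

For the tree-based boosting models used in this post the choice should make little practical difference, since both scalings are monotone.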
Splitting into validation and training sets¶
In [5]:
np.random.seed(0)
# Hold out two disjoint validation sets of 10,000 rows each (by row position).
valid2_idx = np.random.choice(60000,10000,replace=False)
valid1_idx = np.random.choice(list(set(range(60000)) - set(valid2_idx)),10000,replace=False)
# The remaining 40,000 rows form the training set.
train_idx = list(set(range(60000))-set(valid1_idx)-set(valid2_idx))
x_train = data_train_x.iloc[train_idx,:]
y_train = data_train_y.iloc[train_idx]
x_valid1 = data_train_x.iloc[valid1_idx,:]
y_valid1 = data_train_y.iloc[valid1_idx]
x_valid2 = data_train_x.iloc[valid2_idx,:]
y_valid2 = data_train_y.iloc[valid2_idx]
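The split above relies on the three index sets being disjoint and together covering all 60,000 rows. As a rough sketch, the same kind of split can be obtained by slicing one shuffled permutation, which makes those properties easy to check. Note that this is an alternative construction, so it does not reproduce the exact indices drawn above; `rng` and `perm` are names introduced only for this illustration.

```python
import numpy as np

n_total, n_valid = 60000, 10000  # sizes used in the split above

rng = np.random.RandomState(0)   # local generator; leaves the global seed alone
perm = rng.permutation(n_total)  # one shuffled ordering of all row positions

# Slice the shuffled ordering into three non-overlapping blocks.
valid2_idx = perm[:n_valid]
valid1_idx = perm[n_valid:2 * n_valid]
train_idx = perm[2 * n_valid:]

# Sanity checks: the blocks are pairwise disjoint and cover every row once.
assert len(train_idx) == n_total - 2 * n_valid
assert not set(valid1_idx) & set(valid2_idx)
assert not set(train_idx) & (set(valid1_idx) | set(valid2_idx))
assert len(set(perm)) == n_total
```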
Gradient Boosting¶
Importing GradientBoostingClassifier¶
In [6]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
Fitting Gradient Boosting¶
In [7]:
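# 1000 boosting stages with learning rate 0.5; fitting 40,000 rows x 784 features can take a long time.
GBST_model = GradientBoostingClassifier(n_estimators=1000,learning_rate=0.5).fit(x_train, y_train)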
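The number of boosting stages is fixed in advance here. One way to see how many stages are actually useful is `staged_predict`, which yields the predictions after each stage, so that the validation accuracy can be tracked stage by stage. The sketch below is only an illustration on a small synthetic dataset (an assumption, chosen so that it runs in seconds); the names `X_tr`, `X_va`, `valid_acc`, and `best_n` are not from the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic 3-class problem standing in for Fashion MNIST (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, y_tr = X[:400], y[:400]
X_va, y_va = X[400:], y[400:]

model = GradientBoostingClassifier(n_estimators=50, learning_rate=0.5,
                                   random_state=0).fit(X_tr, y_tr)

# staged_predict yields the predicted labels after each boosting stage.
valid_acc = [np.mean(pred == y_va) for pred in model.staged_predict(X_va)]

best_n = int(np.argmax(valid_acc)) + 1  # stages are counted from 1
print("best number of stages:", best_n)
print("validation accuracy at that stage:", valid_acc[best_n - 1])
```

Applied to the real data, the same loop over `GBST_model.staged_predict(x_valid1)` would show whether all 1000 stages are needed.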
Training Accuracy¶
In [8]:
# scikit-learn expects the true labels first: rows are true labels, columns are predictions.
confusion_matrix(y_train, GBST_model.predict(x_train))
In [9]:
GBST_model_train_acc = (GBST_model.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",GBST_model_train_acc)
Validation Accuracy¶
In [10]:
# scikit-learn expects the true labels first: rows are true labels, columns are predictions.
confusion_matrix(y_valid1, GBST_model.predict(x_valid1))
In [11]:
GBST_model_valid1_acc = (GBST_model.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",GBST_model_valid1_acc)
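The accuracy printed above and the confusion matrix are two views of the same counts: correct predictions lie on the diagonal, so accuracy equals the trace divided by the total. The sketch below shows this on made-up labels (hypothetical values, not results from the models in this post), with the true labels passed first as in scikit-learn's documented signature. The trace is the same whichever argument order is used.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 3-class problem (not results from the models above).
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

cm = confusion_matrix(y_true, y_pred)  # rows: true labels, columns: predictions

acc_from_mean = (y_pred == y_true).mean()        # same formula as in the cells above
acc_from_cm = np.trace(cm) / cm.sum()            # diagonal holds the correct predictions
per_class_recall = np.diag(cm) / cm.sum(axis=1)  # share of each true class recovered

print(cm)
print(acc_from_mean, acc_from_cm)
print(per_class_recall)
```

The per-class recall is often more informative than the overall accuracy when some classes are confused with each other far more often than others.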
In [12]:
{"TRAIN_ACC" : GBST_model_train_acc , "VALID_ACC" : GBST_model_valid1_acc}
AdaBoost¶
Importing AdaBoostClassifier¶
In [13]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix
Fitting AdaBoost¶
In [14]:
ABST_model = AdaBoostClassifier(n_estimators=1000,learning_rate=0.5).fit(x_train, y_train)
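No base learner is passed to `AdaBoostClassifier` here, so scikit-learn falls back to its default weak learner, a decision tree of depth 1 (a "stump"). The sketch below checks this on a small synthetic dataset (an assumption, used only to keep the run short) and tracks the validation accuracy per boosting iteration with `staged_score`; the names `X_tr`, `X_va`, and `valid_acc` are introduced for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Small synthetic 3-class problem standing in for Fashion MNIST (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, y_tr = X[:400], y[:400]
X_va, y_va = X[400:], y[400:]

model = AdaBoostClassifier(n_estimators=50, learning_rate=0.5,
                           random_state=0).fit(X_tr, y_tr)

# With no base learner given, each weak learner is a depth-1 decision tree.
first = model.estimators_[0]
print(type(first).__name__, first.max_depth)

# staged_score yields the accuracy after each boosting iteration.
valid_acc = list(model.staged_score(X_va, y_va))
print("iterations fitted:", len(valid_acc))
print("final validation accuracy:", valid_acc[-1])
```

Because stumps are much weaker than the depth-3 trees that `GradientBoostingClassifier` uses by default, the two models in this post are not compared at equal base-learner complexity.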
Training Accuracy¶
In [15]:
# scikit-learn expects the true labels first: rows are true labels, columns are predictions.
confusion_matrix(y_train, ABST_model.predict(x_train))
In [16]:
ABST_model_train_acc = (ABST_model.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",ABST_model_train_acc)
Validation Accuracy¶
In [17]:
# scikit-learn expects the true labels first: rows are true labels, columns are predictions.
confusion_matrix(y_valid1, ABST_model.predict(x_valid1))
In [18]:
ABST_model_valid1_acc = (ABST_model.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",ABST_model_valid1_acc)
In [19]:
{"TRAIN_ACC" : ABST_model_train_acc , "VALID_ACC" : ABST_model_valid1_acc}