FASHION MNIST with Python (DAY 1) - 1. training/validation set, 2. logistic regression, 3. decision tree
딥스탯 2018. 8. 12. 23:02
FASHION MNIST with Python (DAY 1)¶
DATA SOURCE : https://www.kaggle.com/zalando-research/fashionmnist (Kaggle, Fashion MNIST)
Datasets¶
Importing numpy, pandas, pyplot¶
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Loading datasets¶
In [2]:
data_train = pd.read_csv("../datasets/fashion-mnist_train.csv")  # forward slashes work on Windows, macOS and Linux
data_test = pd.read_csv("../datasets/fashion-mnist_test.csv")
In [3]:
data_train.head()
Out[3]:
In [4]:
data_train.shape
Out[4]:
In [5]:
data_test.shape
Out[5]:
In [6]:
data_train_y = data_train.label
y_test = data_test.label
In [7]:
# Scale pixel values (0-255) into [0, 1); dividing by 255 is the more common convention.
data_train_x = data_train.drop("label",axis=1)/256
x_test = data_test.drop("label",axis=1)/256
In [8]:
plt.imshow(data_train_x.iloc[0,:].values.reshape([28,28])) ; data_train_y.iloc[0]
Out[8]:
In [9]:
plt.imshow(data_train_x.iloc[1,:].values.reshape([28,28])) ; data_train_y.iloc[1]
Out[9]:
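The integer printed next to each image is a class label. A minimal sketch of a lookup from label to clothing category, assuming the standard Fashion MNIST label order published with the dataset; the names `CLASS_NAMES` and `class_name` are illustrative and not part of the original notebook.

```python
# Standard Fashion MNIST label order (0-9), as documented with the dataset.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def class_name(label):
    """Return the clothing category for an integer label in 0..9."""
    return CLASS_NAMES[int(label)]

print(class_name(9))
```

With the variables defined above, something like `plt.title(class_name(data_train_y.iloc[0]))` would caption the first image.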
Splitting validation and training sets¶
In [10]:
np.random.seed(0)
# Hold out two disjoint validation sets of 10,000 rows each; the remaining 40,000 rows are used for training.
valid2_idx = np.random.choice(60000,10000,replace = False)
valid1_idx = np.random.choice(list(set(range(60000)) - set(valid2_idx)),10000,replace=False)
train_idx = list(set(range(60000))-set(valid1_idx)-set(valid2_idx))
x_train = data_train_x.iloc[train_idx,:]
y_train = data_train_y.iloc[train_idx]
x_valid1 = data_train_x.iloc[valid1_idx,:]
y_valid1 = data_train_y.iloc[valid1_idx]
x_valid2 = data_train_x.iloc[valid2_idx,:]
y_valid2 = data_train_y.iloc[valid2_idx]
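The same kind of three-way split can be drawn from a single shuffled index array, which avoids the set arithmetic. This is only a sketch: it assumes 60,000 rows and two validation sets of 10,000 each, as above, and it uses a local generator (`np.random.default_rng`), so it will not reproduce the exact indices obtained with `np.random.seed(0)`. The names `n_rows`, `n_valid`, `rng` and `perm` are illustrative.

```python
import numpy as np

n_rows = 60000          # number of rows in the training CSV, as used above
n_valid = 10000         # size of each validation set

rng = np.random.default_rng(0)      # assumption: a local generator instead of np.random.seed
perm = rng.permutation(n_rows)      # every row index exactly once, in random order

valid2_idx = perm[:n_valid]
valid1_idx = perm[n_valid:2 * n_valid]
train_idx = perm[2 * n_valid:]

# The three index sets are disjoint by construction and together cover all rows.
print(len(train_idx), len(valid1_idx), len(valid2_idx))  # 40000 10000 10000
```

The resulting arrays can be passed to `.iloc[...]` in the same way as the index lists above.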
Multinomial Logistic Regression¶
Importing LogisticRegression¶
In [11]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
Fitting Logistic Regression¶
In [12]:
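# Defaults vary by scikit-learn version: older releases fit one-vs-rest, newer ones a multinomial model.
LR_model = LogisticRegression().fit(x_train, y_train)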
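As background, a multinomial logistic regression scores each class with a linear function of the pixels and turns the scores into probabilities with a softmax. The NumPy sketch below illustrates only that prediction step, with made-up inputs and weights; `softmax`, `X`, `W` and `b` are illustrative names and do not correspond to scikit-learn internals or to the fitted `LR_model`.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.random((5, 784))                      # 5 fake images, 784 scaled pixels each
W = rng.normal(scale=0.01, size=(784, 10))    # hypothetical weights, one column per class
b = np.zeros(10)                              # hypothetical intercepts

probs = softmax(X @ W + b)    # shape (5, 10); each row sums to 1
pred = probs.argmax(axis=1)   # predicted class = most probable class
```

Fitting the model amounts to choosing `W` and `b` so that these probabilities match the training labels as closely as possible.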
Training Accuracy¶
In [13]:
confusion_matrix(y_train, LR_model.predict(x_train))  # (y_true, y_pred): rows are true labels, columns are predictions
Out[13]:
In [14]:
LR_model_train_acc = (LR_model.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",LR_model_train_acc)
Validation Accuracy¶
In [15]:
confusion_matrix(y_valid1, LR_model.predict(x_valid1))  # (y_true, y_pred): rows are true labels, columns are predictions
Out[15]:
In [16]:
LR_model_valid1_acc = (LR_model.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",LR_model_valid1_acc)
In [17]:
{"TRAIN_ACC" : LR_model_train_acc , "VALID_ACC" : LR_model_valid1_acc}
Out[17]:
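The accuracy printed above can also be read off a confusion matrix, since the correct predictions sit on its diagonal. A small sketch with a made-up 3-class matrix (the numbers are illustrative, not results from this notebook), assuming rows are true labels and columns are predictions, which is what scikit-learn returns for `confusion_matrix(y_true, y_pred)`:

```python
import numpy as np

# Made-up 3-class confusion matrix: rows = true labels, columns = predictions.
cm = np.array([
    [50,  3,  2],
    [ 4, 45,  6],
    [ 1,  5, 44],
])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)       # per true class: fraction recovered
precision = np.diag(cm) / cm.sum(axis=0)    # per predicted class: fraction correct

print("accuracy =", accuracy)
```

The accuracy does not depend on the orientation of the matrix, but recall and precision swap roles if rows and columns are exchanged.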
Decision Tree¶
Importing DecisionTreeClassifier¶
In [18]:
from sklearn.tree import DecisionTreeClassifier
Huge Tree¶
In [19]:
TR_model1 = DecisionTreeClassifier().fit(x_train, y_train)
Training Accuracy1¶
In [20]:
confusion_matrix(y_train, TR_model1.predict(x_train))  # (y_true, y_pred): rows are true labels, columns are predictions
Out[20]:
In [21]:
TR_model1_train_acc = (TR_model1.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",TR_model1_train_acc)
Validation Accuracy1¶
In [22]:
confusion_matrix(y_valid1, TR_model1.predict(x_valid1))  # (y_true, y_pred): rows are true labels, columns are predictions
Out[22]:
In [23]:
TR_model1_valid1_acc = (TR_model1.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",TR_model1_valid1_acc)
In [24]:
{"TRAIN_ACC" : TR_model1_train_acc , "VALID_ACC" : TR_model1_valid1_acc}
Out[24]:
Smaller Tree (to avoid overfitting)¶
In [25]:
TR_model2 = DecisionTreeClassifier(min_samples_leaf = 5, max_depth = 12).fit(x_train, y_train)
Training Accuracy2¶
In [26]:
confusion_matrix(y_train, TR_model2.predict(x_train))  # (y_true, y_pred): rows are true labels, columns are predictions
Out[26]:
In [27]:
TR_model2_train_acc = (TR_model2.predict(x_train) == y_train).mean()
print("TRAINING ACCURACY =",TR_model2_train_acc)
Validation Accuracy2¶
In [28]:
confusion_matrix(y_valid1, TR_model2.predict(x_valid1))  # (y_true, y_pred): rows are true labels, columns are predictions
Out[28]:
In [29]:
TR_model2_valid1_acc = (TR_model2.predict(x_valid1) == y_valid1).mean()
print("VALIDATION ACCURACY =",TR_model2_valid1_acc)
In [30]:
{"TRAIN_ACC" : TR_model2_train_acc , "VALID_ACC" : TR_model2_valid1_acc}
Out[30]:
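The train/validation comparison repeated for each model above can be wrapped in one helper. This is a sketch that assumes only that the model exposes a `predict` method, as scikit-learn estimators do; `accuracy_report` and the stand-in `MajorityModel` are hypothetical names used for illustration, and the tiny arrays are made-up data.

```python
import numpy as np

def accuracy_report(model, x_train, y_train, x_valid, y_valid):
    """Return training and validation accuracy for any object with .predict()."""
    train_acc = float(np.mean(model.predict(x_train) == np.asarray(y_train)))
    valid_acc = float(np.mean(model.predict(x_valid) == np.asarray(y_valid)))
    return {"TRAIN_ACC": train_acc, "VALID_ACC": valid_acc, "GAP": train_acc - valid_acc}

class MajorityModel:
    """Stand-in model for the demo: always predicts one fixed class."""
    def __init__(self, label):
        self.label = label

    def predict(self, x):
        return np.full(len(x), self.label)

# Made-up data: 4 training rows and 2 validation rows.
x_tr, y_tr = np.zeros((4, 2)), np.array([1, 1, 1, 0])
x_va, y_va = np.zeros((2, 2)), np.array([1, 0])

report = accuracy_report(MajorityModel(1), x_tr, y_tr, x_va, y_va)
print(report)
```

With the notebook's variables, `accuracy_report(TR_model2, x_train, y_train, x_valid1, y_valid1)` would return the same two accuracies as the cells above. A large `GAP` between training and validation accuracy suggests overfitting.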