Multilayer Perceptron (ver.R)
딥스탯
2017. 9. 30. 11:22
Data set source
https://www.kaggle.com/ludobenistant/hr-analytics/data (Kaggle, Human Resources Analysis) (This data set has since been removed from Kaggle.)
References
https://www.tensorflow.org (TensorFlow)
https://tensorflow.rstudio.com (TensorFlow for R)
Multilayer Perceptron (ver.R)
The Human Resources Analysis data set
In [1]:
HR_data_set <- read.csv("HR_comma_sep.csv")   # load the Kaggle HR data set
In [2]:
str(HR_data_set)
In [3]:
summary(HR_data_set)
In [4]:
table(HR_data_set$number_project)
In [5]:
table(HR_data_set$time_spend_company)
In [6]:
table(HR_data_set$sales)
Handling the data set
In [7]:
set.seed(1)
test_obs <- sample(nrow(HR_data_set), 4999)   # hold out 4,999 of the 14,999 rows as the test set
In [8]:
training_set <- HR_data_set[-test_obs,]
testing_set <- HR_data_set[test_obs,]
In [9]:
training_y <- training_set$satisfaction_level
# model.matrix expands the factor columns (sales, salary) into 0/1 dummy variables
training_X <- model.matrix(satisfaction_level ~ ., data = training_set)
In [10]:
testing_y <- testing_set$satisfaction_level
testing_X <- model.matrix(satisfaction_level ~ ., data = testing_set)
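As a small aside (a toy illustration, not from the original post), model.matrix is what turns factor columns such as sales and salary into 0/1 dummy columns, which is where the extra predictor columns come from:
In [ ]:
# hypothetical 3-row example: the factor 'salary' becomes dummy columns
toy <- data.frame(y = c(0.5, 0.7, 0.2),
                  salary = factor(c("low", "medium", "high")))
model.matrix(y ~ ., data = toy)
# -> an (Intercept) column plus salarylow and salarymedium (baseline level: "high")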
In [11]:
# column 4 of the model matrix is average_montly_hours; standardize it with training-set statistics
sd_amh <- sd(training_X[,4]) ; mean_amh <- mean(training_X[,4])
In [12]:
training_X[,4] <- (training_X[,4] - mean_amh)/sd_amh
testing_X[,4] <- (testing_X[,4] - mean_amh)/sd_amh
# drop the intercept column added by model.matrix
training_X <- training_X[,-1]
testing_X <- testing_X[,-1]
In [13]:
dim(training_X)   # 10000 x 18: 7 numeric predictors plus 9 sales dummies and 2 salary dummies
A detailed explanation of the model is omitted.
MLP (multilayer perceptron), ELU, ReLU, sigmoid, Adam
input -> [inner product -> ELU] -> dropout -> [inner product -> ReLU] -> dropout -> [inner product -> sigmoid] -> output
Loss: squared error loss; Optimizer: Adam
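For reference (not part of the original post), the three activations and the loss can be written in plain R as follows; ELU is shown with alpha = 1, the default used by tf$nn$elu:
In [ ]:
# plain-R reference definitions, for illustration only
elu <- function(z) ifelse(z > 0, z, exp(z) - 1)       # alpha = 1
relu <- function(z) pmax(z, 0)
sigmoid <- function(z) 1 / (1 + exp(-z))
sse <- function(y_hat, y) sum((y_hat - y)^2)          # squared error loss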
In [14]:
require(tensorflow)
In [15]:
x <- tf$placeholder("float", shape(NULL, 18L))   # the 18 predictor columns of the model matrix
y_ <- tf$placeholder("float", shape(NULL, 1L))   # satisfaction_level
Function definition: weight_variable creates a weight tensor of the desired shape, initialized with random draws from a truncated normal distribution.
In [16]:
weight_variable <- function(shape){
  initial <- tf$truncated_normal(as.integer(shape))
  return(tf$Variable(initial))
}
Function definition: bias_variable creates a bias tensor of the desired shape.
In [17]:
bias_variable <- function(shape){
  initial <- tf$constant(rep(1, shape))
  return(tf$Variable(initial))
}
Model specification
[inner product -> ELU]
In [18]:
W1 <- weight_variable(c(18, 2^5))   # layer 1: 18 inputs -> 32 hidden units
b1 <- bias_variable(2^5)
elu1 <- tf$nn$elu(tf$matmul(x, W1) + b1)
dropout 1
In [19]:
keep_prob1 <- tf$placeholder("float")
layer1 <- tf$nn$dropout(elu1, keep_prob1)
[inner product -> ReLU]
In [20]:
W2 <- weight_variable(c(2^5, 2^3))   # layer 2: 32 -> 8 hidden units
b2 <- bias_variable(2^3)
relu2 <- tf$nn$relu(tf$matmul(layer1, W2) + b2)
dropout 2
In [21]:
keep_prob2 <- tf$placeholder("float")
layer2 <- tf$nn$dropout(relu2, keep_prob2)
[inner product -> sigmoid]
In [22]:
W3 <- weight_variable(c(2^3, 1))   # output layer: 8 -> 1 (satisfaction_level lies in [0, 1])
b3 <- bias_variable(1)
sigmoid3 <- tf$nn$sigmoid(tf$matmul(layer2, W3) + b3)
Setting the loss and the optimizer
In [23]:
SSE <- tf$reduce_sum((sigmoid3 - y_)^2)                    # sum of squared errors
train_step <- tf$train$AdamOptimizer(1e-4)$minimize(SSE)   # Adam with learning rate 1e-4
Running the session (training loop)
In [24]:
sess <- tf$Session()
sess$run(tf$global_variables_initializer())
In [25]:
for(i in 0:2000){
  # mini-batch of 5,000 rows drawn from the 10,000 training rows
  batch_obs <- sample(10000, 5000)
  sess$run(train_step, feed_dict =
    dict(x = training_X[batch_obs,], y_ = as.matrix(training_y[batch_obs]),
         keep_prob1 = .5, keep_prob2 = .7))
  if(i %% 100 == 0){
    # loss on the full training set, with dropout disabled
    train_SSE <- sess$run(SSE, feed_dict =
      dict(x = training_X, y_ = as.matrix(training_y),
           keep_prob1 = 1, keep_prob2 = 1))
    cat("step ", i, "training SSE ", train_SSE, "\n")
  }
  if(i %% 500 == 0){
    cat("test SSE ", sess$run(SSE, feed_dict =
      dict(x = testing_X, y_ = as.matrix(testing_y), keep_prob1 = 1, keep_prob2 = 1)),
      "step ", i, "\n")
  }
}
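As a possible follow-up (a minimal sketch using only the objects defined above, not part of the original post), predictions for the test set can be pulled from sigmoid3 with dropout disabled and compared with the held-out satisfaction_level values:
In [ ]:
# sketch: predicted satisfaction_level for the test set (dropout disabled)
test_pred <- sess$run(sigmoid3, feed_dict =
  dict(x = testing_X, keep_prob1 = 1, keep_prob2 = 1))
mean((test_pred - testing_y)^2)   # test mean squared error (= test SSE / 4999)
sess$close()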