RHADOOP MAPREDUCE -1. MAP (17/09/21 Lecture Note)

딥스탯 (DeepStat) 2017. 11. 28. 22:55

Thursday, September 21

Goal: understand the map step of mapreduce.

RHADOOP EXAMPLE 1

# RHadoop code that generates the numbers 1 to 100 and squares each of them.

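require(rhdfs)   # R interface to HDFS
hdfs.init()      # connect to HDFS
require(rmr2)    # R interface to Hadoop MapReduce

# Write the integers 1 to 100 to HDFS
small_ints <- to.dfs(1:100)

# Map-only job: each value v becomes a key, and v^2 becomes its value
result <- mapreduce(input = small_ints,
                    map = function(k, v) {
                      keyval(v, v^2)
                    })

# Read the key/value pairs back into the R session
out <- from.dfs(result)
out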
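To see what the map step does without a cluster, the sketch below emulates it in plain R. It is not rmr2 code: `keyval_sketch()`, `run_map_locally()` and the chunk size of 25 are hypothetical stand-ins chosen for illustration. What it shows is that the map function receives a whole chunk of values as the vector `v` (the key `k` is typically NULL for input written with `to.dfs(1:100)`), so the function should be vectorized. The per-chunk outputs are then concatenated into the `$key`/`$val` structure that `from.dfs()` returns.

```r
# Plain-R emulation of a map-only job; NOT the rmr2 API.
# keyval_sketch() and run_map_locally() are hypothetical helpers.

keyval_sketch <- function(key, val) list(key = key, val = val)

run_map_locally <- function(input, map, chunk_size = 25) {
  # Split the input into chunks, the way Hadoop splits input across map tasks
  chunks <- split(input, ceiling(seq_along(input) / chunk_size))
  # Call the map function once per chunk, with a NULL key
  results <- lapply(chunks, function(v) map(NULL, v))
  # Concatenate the per-chunk outputs into one key vector and one value vector
  keyval_sketch(
    key = unlist(lapply(results, `[[`, "key"), use.names = FALSE),
    val = unlist(lapply(results, `[[`, "val"), use.names = FALSE)
  )
}

# Same map function as Example 1, with keyval() swapped for keyval_sketch()
out <- run_map_locally(1:100, function(k, v) keyval_sketch(v, v^2))
head(out$key)
head(out$val)
```

The chunk boundaries are arbitrary, so a correct map function must give the same result for any chunk size. That is why `keyval(v, v^2)` works on the whole vector `v` at once. rmr2 also offers a local backend, `rmr.options(backend = "local")`, for trying jobs without a cluster.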

EXERCISES 1

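1. Generate the numbers 0 to 9 and compute the factorial of each.
2. Generate the numbers 1 to 20 and multiply each by 5.
3. Generate the numbers 0 to 10 and compute 2^0, 2^1, 2^2, …, 2^10.
4. Generate the numbers 2 to 6 and compute the square root, cube root, …, 6th root of 64.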

RHADOOP EXAMPLE 2

# Draw 1000 random numbers from the Uniform(0, 1) distribution and compute their mean; repeat this 1000 times and draw a histogram of the 1000 resulting means.

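require(rhdfs)
hdfs.init()
require(rmr2)

# Write the integers 1 to 1000 to HDFS; each one serves as the key of one replicate
small_ints <- to.dfs(1:1000)

result <- mapreduce(input = small_ints,
                    map = function(k, v) {
                      key <- v
                      # v is a chunk of keys: draw a fresh Uniform(0, 1) sample
                      # of size 1000 for each key and keep its mean
                      value <- sapply(v, function(x) {
                        a <- runif(1000)
                        return(mean(a))
                      })
                      keyval(key, value)
                    })

out <- from.dfs(result)
hist(out$val)   # histogram of the 1000 sample means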
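The same kind of plain-R check works for Example 2. Each map call receives a chunk `v` of keys, which is why the original code uses `sapply()` to draw a new sample of 1000 values for every element of `v`. Calling `mean(runif(1000))` once without `sapply()` would compute only one mean per chunk rather than one per key.

For Uniform(0, 1) draws the mean is 1/2 and the variance is 1/12. By the central limit theorem, each sample mean is therefore approximately Normal with mean 0.5 and standard deviation sqrt(1/12)/sqrt(1000), about 0.0091. The histogram should look like a narrow bell curve centred at 0.5. The seed below is an arbitrary assumption, added only for reproducibility.

```r
# Plain-R emulation of Example 2's map step; no Hadoop or rmr2 needed.
# The seed is an arbitrary choice so that the run is reproducible.
set.seed(2017)

keys <- 1:1000
# As in the map function: one fresh Uniform(0, 1) sample of size 1000 per key
means <- sapply(keys, function(x) {
  a <- runif(1000)
  mean(a)
})

mean(means)  # close to 0.5
sd(means)    # close to sqrt(1/12) / sqrt(1000), about 0.0091

# Histogram bins and counts, computed without opening a graphics device
h <- hist(means, plot = FALSE)
```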

EXERCISES 2

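1. Generate the random numbers from the Normal distribution with mean 0 and variance 1.
2. Generate the random numbers from the Poisson distribution with mean 2, and compute the min instead of the mean.
3. Generate the random numbers from the t distribution with 1 degree of freedom, and compute the median instead of the mean.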

2017/09/28 SOLUTIONS

require(rhdfs)

## Loading required package: rhdfs

## Loading required package: rJava

## 
## HADOOP_CMD=/home/stat/hadoop/hadoop-2.7.4/bin/hadoop

## 
## Be sure to run hdfs.init()

hdfs.init()
require(rmr2)

## Loading required package: rmr2

## Please review your hadoop settings. See help(hadoop.settings)
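The rmr2 start-up message above points to help(hadoop.settings). As far as we know, that page covers two environment variables: HADOOP_CMD, which rhdfs also reads and which is shown as set above, and HADOOP_STREAMING, the path to the Hadoop streaming jar. The helper below is a hypothetical convenience for checking which of the two are set. The jar path in the comment is an assumed example based on the HADOOP_CMD shown above, not a verified location.

```r
# Hypothetical helper: report which Hadoop-related environment variables are set.
hadoop_env_status <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  values <- Sys.getenv(vars)         # "" for variables that are not set
  setNames(nzchar(values), vars)     # TRUE if set to a non-empty string
}

hadoop_env_status()

# If HADOOP_STREAMING is missing, it can be set before require(rmr2).
# The path below is a hypothetical example for a Hadoop 2.7.4 install:
# Sys.setenv(HADOOP_STREAMING =
#   "/home/stat/hadoop/hadoop-2.7.4/share/hadoop/tools/lib/hadoop-streaming-2.7.4.jar")
```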

Exercises 1.1

small.ints<-to.dfs(0:9)

cal_<-mapreduce(
  input=small.ints,
  map=function(k,v) keyval(v,factorial(v))
  )

out<-from.dfs(cal_)
out

## $key
##  [1] 0 1 2 3 4 5 6 7 8 9
## 
## $val
##  [1]      1      1      2      6     24    120    720   5040  40320 362880

Exercises 1.2

small.ints<-to.dfs(1:20)

cal_<-mapreduce(
  input=small.ints,
  map=function(k,v) keyval(v,5*v)
  )

out<-from.dfs(cal_)
out

## $key
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
## 
## $val
##  [1]   5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85
## [18]  90  95 100

Exercises 1.3

small.ints<-to.dfs(0:10)

cal_<-mapreduce(
  input=small.ints,
  map=function(k,v) keyval(v,2^v)
  )

out<-from.dfs(cal_)
out

## $key
##  [1]  0  1  2  3  4  5  6  7  8  9 10
## 
## $val
##  [1]    1    2    4    8   16   32   64  128  256  512 1024

Exercises 1.4

small.ints<-to.dfs(2:6)

cal_<-mapreduce(
  input=small.ints,
  map=function(k,v) keyval(v,64^(1/v))
  )

out<-from.dfs(cal_)
out

## $key
## [1] 2 3 4 5 6
## 
## $val
## [1] 8.000000 4.000000 2.828427 2.297397 2.000000
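The printed 4.000000 for the cube root hides a rounding detail. 1/3 cannot be stored exactly in double precision, so `64^(1/3)` may come out a hair below 4; on typical IEEE 754 platforms it is 3.9999999999999996. Printing with more digits shows the stored values. `all.equal()`, which compares with a small tolerance, is the usual way to check such results.

```r
# Same computation as the Exercise 1.4 map function, run locally
roots <- 64^(1 / (2:6))

# Print with more digits than the default 7 to see the stored values
print(roots, digits = 17)

# Exact equality against the mathematical answers can fail by one rounding step
roots[c(1, 2, 5)] == c(8, 4, 2)

# all.equal() compares with a small tolerance and is the safer check
isTRUE(all.equal(roots[c(1, 2, 5)], c(8, 4, 2)))
```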

Exercises 2.1

small.ints<-to.dfs(1:1000)

cal_<-mapreduce(
  input=small.ints,
  map=function(k,v){
    key <- v
    value <- sapply(v,function(x){
      a <- rnorm(1000)
      return(mean(a))}
    )
    keyval(key,value)
  }
)

out<-from.dfs(cal_)
hist(out$val)
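The expected picture for Exercise 2.1 follows from a standard fact: the mean of n independent N(0, 1) draws is exactly N(0, 1/n). With n = 1000, the histogram should be a bell curve centred at 0 with standard deviation 1/sqrt(1000), about 0.0316. The plain-R sketch below repeats the map function's computation locally to check this; the seed is an arbitrary assumption.

```r
# Local check for Exercise 2.1 in plain R (no Hadoop); the seed is arbitrary.
set.seed(2017)
# One mean of 1000 standard Normal draws per key, as in the map function
means <- sapply(1:1000, function(x) mean(rnorm(1000)))

mean(means)  # close to 0
sd(means)    # close to 1 / sqrt(1000), about 0.0316
```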

Exercises 2.2

small.ints<-to.dfs(1:1000)

cal_<-mapreduce(
  input=small.ints,
  map=function(k,v){
    key <- v
    value <- sapply(v,function(x){
      a <- rpois(1000,2)
      return(min(a))}
    )
    keyval(key,value)
  }
)

out<-from.dfs(cal_)
hist(out$val)
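The histogram for Exercise 2.2 is likely to look degenerate, and a short calculation shows why. For X ~ Poisson(2), P(X = 0) = exp(-2), about 0.135. The probability that all 1000 draws are positive is therefore (1 - exp(-2))^1000, which is on the order of 1e-63. In practice every replicate has minimum 0, and the histogram is a single bar at 0. The sketch below computes this probability with `dpois()` and repeats the simulation locally; the seed is an arbitrary assumption.

```r
# Probability that a single Poisson(2) draw equals 0: exp(-2), about 0.135
p0 <- dpois(0, lambda = 2)
# Probability that the minimum of 1000 independent draws is above 0
p_min_positive <- (1 - p0)^1000
p_min_positive

# Local repeat of the Exercise 2.2 map computation; the seed is arbitrary
set.seed(2017)
mins <- sapply(1:1000, function(x) min(rpois(1000, 2)))
table(mins)  # in practice, every minimum is 0
```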

Exercises 2.3

small.ints<-to.dfs(1:1000)

cal_<-mapreduce(
  input=small.ints,
  map=function(k,v){
    key <- v
    value <- sapply(v,function(x){
      a <- rt(1000,1)
      return(median(a))}
    )
    keyval(key,value)
  }
)

out<-from.dfs(cal_)
hist(out$val)
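Exercise 2.3 is the most interesting case. The t distribution with 1 degree of freedom is the standard Cauchy distribution, which has no mean. The mean of n Cauchy draws is again standard Cauchy, so averaging never settles down, while the sample median does.

Asymptotically, the sample median is approximately Normal with standard deviation 1/(2 f(0) sqrt(n)), where f(0) = 1/pi is the Cauchy density at its median. With n = 1000 this gives pi/(2 sqrt(1000)), about 0.0497. The plain-R sketch below checks this locally and, for contrast, also computes the means; the seed is an arbitrary assumption.

```r
# Local check for Exercise 2.3 in plain R (no Hadoop); the seed is arbitrary.
set.seed(2017)
# t with 1 degree of freedom is the standard Cauchy distribution
medians <- sapply(1:1000, function(x) median(rt(1000, df = 1)))
# For contrast: the mean, which is the statistic used in Example 2
means <- sapply(1:1000, function(x) mean(rt(1000, df = 1)))

sd(medians)     # close to pi / (2 * sqrt(1000)), about 0.0497
range(medians)  # narrow, centred near 0
range(means)    # typically very wide, since each mean is again standard Cauchy
```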

License: Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)