10 12 Lecture Note
Choi Youngtae
Hadoop Streaming
Conceptually, Hadoop Streaming defines where the data comes from, how it flows, and where it ends up being collected.
Within Hadoop Streaming, the data passes through the mapper and reducer that we define, is processed into the form we need, and is then stored at the location we choose.
The mapper and reducer can be written in a variety of programming languages such as C, C++, Python, Java, and R, and the way they are used does not differ much from one language to another.
In this exercise we work with a mapper and reducer written in R. How to write a mapper and reducer for a specific task of our own is not covered in detail here.
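For reference, this is how such a pipeline is handed to Hadoop: the hadoop jar command for the streaming jar is given an input location, an output location, a mapper, and a reducer. The sketch below wraps that command in R's system(), in the same style as the rest of this note; the jar path, HDFS paths, and local script path are placeholder assumptions, not values used in this lecture.
# Sketch only: submitting an R mapper/reducer (e.g. the hsWordCnt.R used below)
# as a Hadoop Streaming job. All paths here are assumptions for illustration.
streamingJar <- "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"   # assumed location
system(paste("hadoop jar", streamingJar,
             "-input /user/me/anna.txt",            # where the data comes from
             "-output /user/me/wordcount_out",      # where the results are collected
             "-mapper 'hsWordCnt.R --mapper'",      # how each record is transformed
             "-reducer 'hsWordCnt.R --reducer'",    # how values are aggregated per key
             "-file ./hsWordCnt.R"))                # ship the script to the cluster nodes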
Hadoop Streaming using R
Source: CRAN - Package HadoopStreaming - R Project
https://cran.r-project.org/web/packages/HadoopStreaming/index.html
require(HadoopStreaming)
## Loading required package: HadoopStreaming
## Loading required package: getopt
lp <- .libPaths()[1]
directory <- paste(lp,"HadoopStreaming","wordCntDemo",sep="/")
# Make sure hsWordCnt.R is executable
cat(system(paste("chmod +x ",
directory,"/hsWordCnt.R"
,sep=""),intern = TRUE),sep = '\n')
# Take a look at the file we're going to be word counting
cat(system(paste("head ",
directory,"/anna.txt",
sep=""),intern = TRUE),sep = '\n')
## Happy families are all alike; every unhappy family is unhappy in
## its own way.
##
## Everything was in confusion in the Oblonskys' house. The wife
## had discovered that the husband was carrying on an intrigue with
## a French girl, who had been a governess in their family, and she
## had announced to her husband that she could not go on living in
## the same house with him. This position of affairs had now lasted
## three days, and not only the husband and wife themselves, but all
## the members of their family and household, were painfully
# Run the mapper on the first 5 lines of anna.txt:
cat(system(paste("head -n 5 ",
directory,"/anna.txt | ",
directory,"/hsWordCnt.R --mapper",
sep=""),intern = TRUE),sep = '\n')
## Happy 1
## families 1
## are 1
## all 1
## alike 1
## every 1
## unhappy 1
## family 1
## is 1
## unhappy 1
## in 1
## its 1
## own 1
## way 1
## Everything 1
## was 1
## in 1
## confusion 1
## in 1
## the 1
## Oblonskys 1
## house 1
## The 1
## wife 1
## had 1
## discovered 1
## that 1
## the 1
## husband 1
## was 1
## carrying 1
## on 1
## an 1
## intrigue 1
## with 1
# Run the mapper and reducer on the whole file:
cat(system(paste("cat ",
directory,"/anna.txt | ",
directory,"/hsWordCnt.R --mapper | sort | ",
directory,"/hsWordCnt.R --reducer",
sep=""),intern = TRUE),sep = '\n')
## Ah 1
## Alabin 2
## America 1
## American 1
## And 6
## Arkadyevitch 5
## But 1
## Catching 1
## Darmstadt 3
## Dolly 2
## English 1
## Every 1
## Everything 1
## French 1
## Happy 1
## He 2
## I 1
## Il 2
## Instead 1
## It 1
## Most 1
## Now 1
## Oblonsky 1
## Oblonskys 2
## Oh 1
## Oo 1
## Prince 1
## She 1
## Since 1
## Stepan 5
## Stiva 1
## That 1
## The 3
## There 2
## This 2
## Three 1
## To 1
## What 2
## Yes 5
## a 13
## about 2
## action 1
## acutely 1
## adapting 1
## affairs 1
## after 1
## again 2
## ah 2
## alike 1
## all 10
## always 1
## an 2
## and 26
## announced 1
## annoyed 1
## another 1
## answer 1
## any 1
## anything 1
## are 2
## as 8
## asked 1
## asking 1
## assumed 1
## at 13
## awake 1
## awful 1
## be 3
## bedroom 3
## been 3
## before 1
## begging 1
## being 1
## beside 1
## better 2
## birthday 1
## blame 2
## broke 1
## brought 1
## brows 1
## buried 1
## but 7
## by 4
## called 1
## can 1
## cared 1
## carrying 1
## case 1
## caught 1
## caused 1
## chance 1
## characteristic 1
## cheerfully 1
## children 1
## clock 1
## coachman 1
## colored 1
## coming 1
## common 1
## confusion 1
## conscious 1
## considered 1
## cook 1
## could 2
## covered 1
## cruel 1
## curtains 1
## day 2
## days 3
## deal 1
## decanters 1
## defending 1
## delightful 1
## denying 1
## despair 3
## detail 1
## details 1
## did 3
## dinner 3
## discovered 1
## discovery 1
## disgraceful 1
## do 1
## does 1
## done 3
## drawing 1
## dream 1
## dressing 1
## dropped 1
## edge 1
## eight 1
## either 1
## embraced 1
## even 2
## every 3
## everything 2
## expressing 1
## expression 1
## eyes 2
## face 4
## fact 1
## families 1
## family 4
## fashionable 1
## fault 4
## feet 1
## felt 2
## first 1
## flood 1
## fond 1
## for 9
## forever 1
## forgive 3
## forgiveness 1
## found 3
## friend 1
## from 2
## fussing 1
## gaily 1
## getting 1
## girl 1
## given 1
## giving 2
## glass 1
## gleam 1
## go 1
## going 1
## gold 1
## good 2
## governess 2
## gown 1
## great 1
## habitual 1
## had 14
## hand 4
## happen 1
## happened 2
## happy 1
## have 1
## he 24
## heat 1
## her 12
## him 5
## himself 3
## his 33
## home 1
## hopelessness 1
## horror 1
## hour 1
## house 4
## household 3
## housekeeper 1
## how 2
## huge 1
## humored 2
## hung 1
## hurt 1
## husband 5
## ideas 1
## idiotic 3
## imagination 1
## in 33
## indifferent 1
## indignation 1
## inn 1
## instant 1
## instead 1
## into 3
## intrigue 1
## involuntarily 2
## is 4
## it 10
## its 2
## itself 1
## jumped 1
## just 1
## kept 1
## kitchen 1
## knitted 1
## last 3
## lasted 1
## leather 1
## leave 1
## letter 3
## light 1
## limited 1
## little 1
## living 2
## long 1
## look 1
## looking 1
## m 1
## maid 1
## man 1
## me 2
## members 2
## met 1
## minute 1
## mio 2
## more 2
## morning 1
## morocco 1
## most 1
## much 1
## muttered 1
## my 2
## new 1
## nice 2
## nine 1
## no 4
## not 14
## noticing 1
## now 2
## o 1
## of 22
## off 1
## often 1
## oh 2
## on 11
## once 1
## one 3
## only 2
## opened 1
## or 1
## other 1
## out 4
## over 5
## own 3
## pain 1
## painful 1
## painfully 1
## pear 1
## peeping 1
## people 2
## perfectly 1
## person 2
## physical 1
## physiology 1
## pillow 1
## place 1
## placed 1
## point 1
## pointing 1
## pondered 1
## position 3
## present 2
## putting 1
## quarrel 3
## quarreled 1
## ran 1
## recalling 1
## recollection 1
## reflected 2
## reflex 1
## refused 1
## remaining 1
## remembered 3
## repeating 1
## revealed 1
## room 4
## rushed 1
## s 13
## said 1
## same 1
## sang 1
## sat 1
## saw 1
## see 1
## sensations 1
## sense 1
## serge 1
## she 6
## shuddered 1
## side 1
## sight 1
## sink 1
## sitting 1
## situation 2
## sleep 1
## sleeping 1
## slippers 1
## smile 6
## so 2
## sofa 4
## some 1
## something 3
## sort 1
## spinal 1
## springy 1
## still 1
## stout 1
## stray 1
## stretched 1
## study 3
## succeed 1
## suddenly 1
## sure 1
## surprise 1
## t 2
## table 1
## tables 2
## tesoro 2
## than 2
## that 14
## the 52
## theater 1
## their 3
## them 1
## themselves 1
## then 2
## there 3
## therefore 1
## thereupon 1
## they 3
## thing 1
## this 4
## though 4
## thought 2
## thoughts 1
## three 2
## time 1
## to 15
## together 2
## too 1
## towards 2
## turned 1
## twinkled 1
## unexpectedly 1
## unhappy 2
## unlucky 1
## unpleasant 1
## up 4
## usual 1
## utterly 2
## vanished 1
## very 2
## vigorously 1
## walked 1
## warning 1
## was 19
## way 2
## well 1
## were 3
## what 3
## when 2
## where 1
## which 2
## who 2
## whole 1
## why 1
## wife 11
## wild 1
## with 12
## without 1
## woke 1
## women 1
## won 1
## words 3
## worked 1
## world 1
## worrying 1
## worst 1
## would 2
## wrote 1
## years 1
## yes 1
# Run the mapper and reducer on the first 5 lines, and put headers on the final output:
cat(system(paste("head -n 5 ", directory, "/anna.txt | ",
                 directory, "/hsWordCnt.R --mapper | sort | ",
                 directory, "/hsWordCnt.R --reducer --reducecols", sep = ""),
           intern = TRUE), sep = '\n')
## word cnt
## Everything 1
## Happy 1
## Oblonskys 1
## The 1
## alike 1
## all 1
## an 1
## are 1
## carrying 1
## confusion 1
## discovered 1
## every 1
## families 1
## family 1
## had 1
## house 1
## husband 1
## in 3
## intrigue 1
## is 1
## its 1
## on 1
## own 1
## that 1
## the 2
## unhappy 2
## was 2
## way 1
## wife 1
## with 1
The commands above are not ordinary R syntax; they follow Linux command (shell) syntax. By passing them to the R function system(), we get the same result as if we had run them in a Linux terminal.
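For example, the same map, sort, reduce pattern can be imitated with ordinary shell tools; the one-liner below is only an illustration and is not part of the package demo.
# Count words in a sentence with plain shell tools, captured back into R:
cat(system("echo 'Happy families are all alike' | tr ' ' '\\n' | sort | uniq -c",
           intern = TRUE), sep = '\n')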
HadoopStreaming Package in R
require(HadoopStreaming)
hsTableReader
HadoopStreaming Table Reader.
As the name says, it is a function that reads in a table.
str <- "key1\t1.91\nkey1\t2.1\nkey1\t20.2\nkey1\t3.2\nkey2\t1.2\nkey2\t10\nkey3\t2.5\nkey3\t2.1\nkey4\t1.2\n"
cat(str)
## key1 1.91
## key1 2.1
## key1 20.2
## key1 3.2
## key2 1.2
## key2 10
## key3 2.5
## key3 2.1
## key4 1.2
cols <- list(key = '', val = 0)  # column prototypes: key is character, val is numeric
con <- textConnection(str, open = "r")
hsTableReader(con, cols, chunkSize=3, FUN=print, ignoreKey=TRUE)
## key val
## 1 key1 1.91
## 2 key1 2.10
## 3 key1 20.20
## key val
## 1 key1 3.2
## 2 key2 1.2
## 3 key2 10.0
## key val
## 1 key3 2.5
## 2 key3 2.1
## 3 key4 1.2
con <- textConnection(str, open = "r")
hsTableReader(con, cols, chunkSize=-1, FUN=print, ignoreKey=TRUE)
## key val
## 1 key1 1.91
## 2 key1 2.10
## 3 key1 20.20
## 4 key1 3.20
## 5 key2 1.20
## 6 key2 10.00
## 7 key3 2.50
## 8 key3 2.10
## 9 key4 1.20
chunkSize is the parameter that sets how many rows go into each chunk; chunkSize = -1 reads the entire input as a single chunk, as in the second call above.
With FUN = print, each chunk is printed as soon as it is read.
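FUN does not have to be print; it can be any function that accepts one chunk, delivered as a data frame with the columns named in cols. A small sketch reusing str and cols from above (summariseChunk is just an illustrative name, not part of the package):
summariseChunk <- function(d) {
  # d is one chunk: a data frame with a character column key and a numeric column val
  cat(sprintf("chunk of %d rows, mean val = %.2f\n", nrow(d), mean(d$val)))
}
con <- textConnection(str, open = "r")
hsTableReader(con, cols, chunkSize = 3, FUN = summariseChunk, ignoreKey = TRUE)
close(con)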
hsKeyValReader
HadoopStreaming Key Value Reader
As the name says, it is a function that reads key/value pairs, chunkSize pairs at a time.
printFn <- function(k, v) {
  cat('A chunk: \n')
  cat(paste(k, v, sep = ': '), sep = '\n')
}
str <- "key1\t1.91\nkey1\t2.1\nkey1\t20.2\nkey1\t3.2\nkey2\t1.2\nkey2\t10\nkey3\t2.5\nkey3\t2.1\nkey4\t1.2\n"
con <- textConnection(str, open = "r")
hsKeyValReader(con, chunkSize = 1, FUN = printFn)
## A chunk:
## key1: 1.91
## A chunk:
## key1: 2.1
## A chunk:
## key1: 20.2
## A chunk:
## key1: 3.2
## A chunk:
## key2: 1.2
## A chunk:
## key2: 10
## A chunk:
## key3: 2.5
## A chunk:
## key3: 2.1
## A chunk:
## key4: 1.2
hsLineReader
HadoopStreaming Line Reader
As the name says, it is a function that reads lines, chunkSize lines at a time, without splitting them into fields.
str <- "This is HadoopStreaming!!\n here are,\n examples for chunk dataset!!\n in R\n ?"
cat(str)
## This is HadoopStreaming!!
## here are,
## examples for chunk dataset!!
## in R
## ?
con <- textConnection(str, open = "r")
hsLineReader(con, chunkSize = 2, FUN = print)
## [1] "This is HadoopStreaming!!" " here are,"
## [1] " examples for chunk dataset!!" " in R"
## [1] " ?"