Introduction to DataFrames
Bogumił Kamiński, Apr 21, 2017
Reference
Series
- https://deepstat.tistory.com/69 (01. constructors)(in English)
- https://deepstat.tistory.com/70 (01. constructors)(in Korean)
- https://deepstat.tistory.com/71 (02. basicinfo)(in English)
- https://deepstat.tistory.com/72 (02. basicinfo)(in Korean)
- https://deepstat.tistory.com/73 (03. missingvalues)(in English)
- https://deepstat.tistory.com/74 (03. missingvalues)(in Korean)
- https://deepstat.tistory.com/75 (04. loadsave)(in English)
- https://deepstat.tistory.com/76 (04. loadsave)(in Korean)
- https://deepstat.tistory.com/77 (05. columns)(in English)
- https://deepstat.tistory.com/78 (05. columns)(in Korean)
- https://deepstat.tistory.com/79 (06. rows)(in English)
- https://deepstat.tistory.com/80 (06. rows)(in Korean)
- https://deepstat.tistory.com/81 (07. factors)(in English)
- https://deepstat.tistory.com/82 (07. factors)(in Korean)
- https://deepstat.tistory.com/83 (08. joins)(in English)
- https://deepstat.tistory.com/84 (08. joins)(in Korean)
In [1]:
using DataFrames # load package
Joining DataFrames
Preparing DataFrames for a join
In [2]:
x = DataFrame(ID=[1,2,3,4,missing], name = ["Alice", "Bob", "Conor", "Dave","Zed"])
y = DataFrame(id=[1,2,5,6,missing], age = [21,22,23,24,99])
println(x)
println(y)
In [3]:
rename!(x, :ID=>:id) # names of columns on which we want to join must be the same
Out[3]:
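The rename is needed by the `join` call used in this notebook. As a hedged aside, and assuming DataFrames.jl 1.0 or later (not the version this notebook was written against), the key columns can keep different names if a pair is passed to `on`. The tables `people` and `ages` are illustrative copies of `x` and `y`; `matchmissing = :equal` is included because newer releases raise an error on `missing` keys by default.

```julia
using DataFrames

# Illustrative copies of x and y, with the key columns named differently.
people = DataFrame(ID = [1, 2, 3, 4, missing],
                   name = ["Alice", "Bob", "Conor", "Dave", "Zed"])
ages = DataFrame(id = [1, 2, 5, 6, missing],
                 age = [21, 22, 23, 24, 99])

# :ID => :id pairs the left key column with the right key column,
# so neither table has to be renamed first.
# Assumption: DataFrames.jl 1.0+ (innerjoin and matchmissing are not in older releases).
joined = innerjoin(people, ages, on = :ID => :id, matchmissing = :equal)
println(joined)
```

In this sketch the key column of the result carries the left-hand name, `ID`.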
Standard joins: inner, left, right, outer, semi, anti
In [4]:
join(x, y, on=:id) # kind=:inner is the default; rows whose key is missing are matched with each other
Out[4]:
In [5]:
join(x, y, on=:id, kind=:left)
Out[5]:
In [6]:
join(x, y, on=:id, kind=:right)
Out[6]:
In [7]:
join(x, y, on=:id, kind=:outer)
Out[7]:
In [8]:
join(x, y, on=:id, kind=:semi)
Out[8]:
In [9]:
join(x, y, on=:id, kind=:anti)
Out[9]:
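The six `kind` values above correspond to six separate functions in DataFrames.jl 1.0 and later, where `join(...; kind=...)` is no longer available. The following is a minimal sketch under that version assumption, not the notebook's original code. `matchmissing = :equal` is passed so that `missing` keys match each other, as they do in the cells above.

```julia
using DataFrames

x = DataFrame(id = [1, 2, 3, 4, missing],
              name = ["Alice", "Bob", "Conor", "Dave", "Zed"])
y = DataFrame(id = [1, 2, 5, 6, missing],
              age = [21, 22, 23, 24, 99])

# Assumption: DataFrames.jl 1.0+; one function per join kind.
inner = innerjoin(x, y, on = :id, matchmissing = :equal) # keys present in both tables
left  = leftjoin(x, y, on = :id, matchmissing = :equal)  # every row of x, age filled where matched
right = rightjoin(x, y, on = :id, matchmissing = :equal) # every row of y, name filled where matched
outer = outerjoin(x, y, on = :id, matchmissing = :equal) # every key from either table
semi  = semijoin(x, y, on = :id, matchmissing = :equal)  # rows of x that have a match in y
anti  = antijoin(x, y, on = :id, matchmissing = :equal)  # rows of x that have no match in y

for (label, df) in ["inner" => inner, "left" => left, "right" => right,
                    "outer" => outer, "semi" => semi, "anti" => anti]
    println(label, ": ", nrow(df), " rows")
end
```

With this data the inner and semi joins keep the keys 1, 2 and `missing` (3 rows), the anti join keeps the keys 3 and 4 (2 rows), and the outer join has 7 rows, one per distinct key.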
Cross join
In [10]:
# cross-join does not require on argument
# it produces a Cartesian product of its arguments
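function expand_grid(; xs...) # a simple replacement for expand.grid in R
    cols = collect(xs) # keyword arguments as a vector of name => values pairs (Julia 1.0+)
    # Julia 1.0+ takes the starting value of reduce as the init keyword
    reduce((df, col) -> join(df, DataFrame(col), kind=:cross),
           cols[2:end]; init=DataFrame(cols[1]))
end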
expand_grid(a=[1,2], b=["a","b","c"], c=[true,false])
In [11]:
?reduce
Out[11]:
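For readers on newer releases, the same idea can be sketched with `crossjoin`, assuming DataFrames.jl 1.0 or later. `expand_grid_crossjoin` is a hypothetical name chosen here so that it does not replace the `expand_grid` defined above. Because `crossjoin` takes two tables and returns one, `reduce` can fold it over a vector of one-column tables without an explicit starting value.

```julia
using DataFrames

# Hypothetical helper: build one single-column DataFrame per keyword argument,
# then fold crossjoin over them to get every combination of values.
# Assumption: DataFrames.jl 1.0+ (crossjoin is not available in older releases).
function expand_grid_crossjoin(; cols...)
    tables = [DataFrame(name => vals) for (name, vals) in pairs(cols)]
    return reduce(crossjoin, tables)
end

grid = expand_grid_crossjoin(a = [1, 2], b = ["a", "b", "c"], c = [true, false])
println(grid)

# crossjoin also accepts more than two tables directly.
direct = crossjoin(DataFrame(a = [1, 2]),
                   DataFrame(b = ["a", "b", "c"]),
                   DataFrame(c = [true, false]))
```

Both `grid` and `direct` have 2 × 3 × 2 = 12 rows and the columns `a`, `b`, `c`.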
Complex cases of joins
In [12]:
x = DataFrame(id1=[1,1,2,2,missing,missing],
              id2=[1,11,2,21,missing,99],
              name = ["Alice", "Bob", "Conor", "Dave","Zed", "Zoe"])
y = DataFrame(id1=[1,1,3,3,missing,missing],
              id2=[11,1,31,3,missing,999],
              age = [21,22,23,24,99, 100])
println(x)
println(y)
In [13]:
join(x, y, on=[:id1, :id2]) # joining on two columns
Out[13]:
In [14]:
join(x, y, on=[:id1], makeunique=true) # duplicate keys produce every pairing of matching rows (here an :inner join); makeunique renames the clashing id2 column
Out[14]:
In [15]:
join(x, y, on=[:id1], kind=:semi) # a :semi join never duplicates rows of x, even when keys repeat in y
Out[15]:
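The three cases above can be reproduced on newer releases as follows. This is a sketch assuming DataFrames.jl 1.0 or later, with `matchmissing = :equal` added so that `missing` keys match each other as they do in the cells above.

```julia
using DataFrames

x = DataFrame(id1 = [1, 1, 2, 2, missing, missing],
              id2 = [1, 11, 2, 21, missing, 99],
              name = ["Alice", "Bob", "Conor", "Dave", "Zed", "Zoe"])
y = DataFrame(id1 = [1, 1, 3, 3, missing, missing],
              id2 = [11, 1, 31, 3, missing, 999],
              age = [21, 22, 23, 24, 99, 100])

# Assumption: DataFrames.jl 1.0+.
# Join on two columns: a row matches only if both id1 and id2 agree.
both = innerjoin(x, y, on = [:id1, :id2], matchmissing = :equal)

# Join on id1 only: keys repeat in both tables, so every pairing is produced.
# makeunique = true renames the second id2 column instead of raising an error.
dups = innerjoin(x, y, on = :id1, makeunique = true, matchmissing = :equal)

# Semi join: each row of x appears at most once, however many matches it has.
semi = semijoin(x, y, on = :id1, matchmissing = :equal)

println(nrow(both), " ", nrow(dups), " ", nrow(semi))
```

With this data the two-column join has 3 rows, the join on `id1` alone has 8 rows (2 × 2 for key 1 and 2 × 2 for `missing`), and the semi join keeps 4 rows of `x`. The clashing column appears in `dups` as `id2_1`.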