Introduction to DataFrames

Bogumił Kamiński, Apr 21, 2017

Source

https://github.com/JuliaComputing/JuliaBoxTutorials/tree/master/introductory-tutorials/broader-topics-and-ecosystem/intro-to-julia-DataFrames

See also

https://deepstat.tistory.com/69 (01. constructors)(in English)
https://deepstat.tistory.com/70 (01. constructors)(in Korean)
https://deepstat.tistory.com/71 (02. basicinfo)(in English)
https://deepstat.tistory.com/72 (02. basicinfo)(in Korean)
https://deepstat.tistory.com/73 (03. missingvalues)(in English)
https://deepstat.tistory.com/74 (03. missingvalues)(in Korean)
https://deepstat.tistory.com/75 (04. loadsave)(in English)
https://deepstat.tistory.com/76 (04. loadsave)(in Korean)
https://deepstat.tistory.com/77 (05. columns)(in English)
https://deepstat.tistory.com/78 (05. columns)(in Korean)
https://deepstat.tistory.com/79 (06. rows)(in English)
https://deepstat.tistory.com/80 (06. rows)(in Korean)
https://deepstat.tistory.com/81 (07. factors)(in English)
https://deepstat.tistory.com/82 (07. factors)(in Korean)
https://deepstat.tistory.com/83 (08. joins)(in English)
https://deepstat.tistory.com/84 (08. joins)(in Korean)

using DataFrames # load package

Joining DataFrames

Preparing DataFrames for a join

x = DataFrame(ID=[1,2,3,4,missing], name = ["Alice", "Bob", "Conor", "Dave","Zed"])
y = DataFrame(id=[1,2,5,6,missing], age = [21,22,23,24,99])
println(x)
println(y)

5×2 DataFrame
│ Row │ ID      │ name   │
│     │ Int64⍰  │ String │
├─────┼─────────┼────────┤
│ 1   │ 1       │ Alice  │
│ 2   │ 2       │ Bob    │
│ 3   │ 3       │ Conor  │
│ 4   │ 4       │ Dave   │
│ 5   │ missing │ Zed    │
5×2 DataFrame
│ Row │ id      │ age   │
│     │ Int64⍰  │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 1       │ 21    │
│ 2   │ 2       │ 22    │
│ 3   │ 5       │ 23    │
│ 4   │ 6       │ 24    │
│ 5   │ missing │ 99    │

rename!(x, :ID=>:id) # the key column to join on must have the same name in both data frames

Standard joins: inner, left, right, outer, semi, anti

join(x, y, on=:id) # performs an inner join by default; rows whose key is missing are matched too

join(x, y, on=:id, kind=:left) # left join

join(x, y, on=:id, kind=:right) # right join

join(x, y, on=:id, kind=:outer) # outer join

join(x, y, on=:id, kind=:semi) # semi join

join(x, y, on=:id, kind=:anti) # anti join
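
To make the difference between the join kinds concrete, the following Base-only sketch works out which `id` values each kind keeps for the `x` and `y` defined above. It does not call DataFrames at all. The helper `has_key` and the variable names are hypothetical, and the sketch assumes that keys are compared with `isequal`, which is consistent with the note above that `missing` keys are matched.

```julia
# Key columns of x and y from the example above (after renaming ID to id).
x_ids = [1, 2, 3, 4, missing]
y_ids = [1, 2, 5, 6, missing]

# Hypothetical helper: isequal(missing, missing) is true, whereas
# missing == missing evaluates to missing, so isequal is used for matching.
has_key(k, ks) = any(isequal(k), ks)

inner_keys = [k for k in x_ids if has_key(k, y_ids)]   # kept by inner and semi joins
anti_keys  = [k for k in x_ids if !has_key(k, y_ids)]  # kept by an anti join
left_keys  = x_ids                                     # a left join keeps every key of x
right_keys = y_ids                                     # a right join keeps every key of y
right_only = [k for k in y_ids if !has_key(k, x_ids)]  # keys that appear only in y
outer_keys = vcat(x_ids, right_only)                   # an outer join keeps keys from both

println(inner_keys)
println(anti_keys)
println(outer_keys)
```

A semi join keeps the same keys as an inner join but returns only the columns of `x`, while an anti join returns the rows of `x` that have no partner in `y`.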

Cross join

# A cross join does not need the "on" argument.
# It produces the Cartesian product of the rows of its arguments.
function expand_grid(; xs...) # a simple version of expand.grid from R
    # Keyword arguments arrive as name => values pairs; collect them so they
    # can be indexed and sliced by position.
    cols = collect(xs)
    reduce((df, col) -> join(df, DataFrame(col), kind=:cross),
           cols[2:end], init=DataFrame(cols[1]))
end

expand_grid(a=[1,2], b=["a","b","c"], c=[true,false])
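
As a rough illustration of what "Cartesian product" means here, the sketch below builds every combination of the three argument vectors with `Iterators.product` from Base Julia. It is not how DataFrames implements a cross join, and the order of the combinations may differ from the row order a cross join returns. The variable names are chosen only for this example.

```julia
# Base-only sketch of the Cartesian product that a cross join produces.
a = [1, 2]
b = ["a", "b", "c"]
c = [true, false]

# Iterators.product yields one tuple per combination of elements;
# collect gives a 2×3×2 array, which vec flattens into a vector.
combos = vec(collect(Iterators.product(a, b, c)))

println(length(combos))        # number of combinations: 2 * 3 * 2
foreach(println, combos[1:3])  # show a few of them
```

The number of combinations is the product of the input lengths, which is why cross joins grow quickly as more columns are added.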

?reduce

search: reduce mapreduce

reduce(op, itr; [init])

Examples
julia> reduce(*, [2; 3; 4])
24

julia> reduce(*, [2; 3; 4]; init=-1)
-24

reduce(f, A; dims=:, [init])

Examples
julia> a = reshape(Vector(1:16), (4,4))
4×4 Array{Int64,2}:
 1  5   9  13
 2  6  10  14
 3  7  11  15
 4  8  12  16

julia> reduce(max, a, dims=2)
4×1 Array{Int64,2}:
 13
 14
 15
 16

julia> reduce(max, a, dims=1)
1×4 Array{Int64,2}:
 4  8  12  16

Complex cases of joins

x = DataFrame(id1=[1,1,2,2,missing,missing],
              id2=[1,11,2,21,missing,99],
              name = ["Alice", "Bob", "Conor", "Dave","Zed", "Zoe"])
y = DataFrame(id1=[1,1,3,3,missing,missing],
              id2=[11,1,31,3,missing,999],
              age = [21,22,23,24,99, 100])
println(x)
println(y)

6×3 DataFrame
│ Row │ id1     │ id2     │ name   │
│     │ Int64⍰  │ Int64⍰  │ String │
├─────┼─────────┼─────────┼────────┤
│ 1   │ 1       │ 1       │ Alice  │
│ 2   │ 1       │ 11      │ Bob    │
│ 3   │ 2       │ 2       │ Conor  │
│ 4   │ 2       │ 21      │ Dave   │
│ 5   │ missing │ missing │ Zed    │
│ 6   │ missing │ 99      │ Zoe    │
6×3 DataFrame
│ Row │ id1     │ id2     │ age   │
│     │ Int64⍰  │ Int64⍰  │ Int64 │
├─────┼─────────┼─────────┼───────┤
│ 1   │ 1       │ 11      │ 21    │
│ 2   │ 1       │ 1       │ 22    │
│ 3   │ 3       │ 31      │ 23    │
│ 4   │ 3       │ 3       │ 24    │
│ 5   │ missing │ missing │ 99    │
│ 6   │ missing │ 999     │ 100   │

join(x, y, on=[:id1, :id2]) # join on two columns

join(x, y, on=[:id1], makeunique=true) # with duplicated keys, every matching combination of rows is produced (this example is an inner join)

join(x, y, on=[:id1], kind=:semi) # the exception is a semi join, which does not produce all combinations
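
The row counts behind these three calls can be worked out by hand. The sketch below does so with Base Julia only, using the key columns of the `x` and `y` defined above. It assumes that join keys are compared with `isequal` (so `missing` matches `missing`). The variable names are hypothetical, and nothing here calls DataFrames.

```julia
# Key columns of x and y from the complex-case example above.
x_id1 = [1, 1, 2, 2, missing, missing]
x_id2 = [1, 11, 2, 21, missing, 99]
y_id1 = [1, 1, 3, 3, missing, missing]
y_id2 = [11, 1, 31, 3, missing, 999]

# A two-column key is the pair (id1, id2) of each row.
x_keys = collect(zip(x_id1, x_id2))
y_keys = collect(zip(y_id1, y_id2))

# Joining on both columns: count (x row, y row) pairs whose full keys match.
both_cols = count(isequal(kx, ky) for kx in x_keys, ky in y_keys)

# Joining on id1 only: duplicated keys pair up in every combination.
first_col = count(isequal(a, b) for a in x_id1, b in y_id1)

# Semi join on id1: each row of x counts at most once, however many partners it has.
semi_rows = count(a -> any(isequal(a), y_id1), x_id1)

println((both_cols, first_col, semi_rows))  # (3, 8, 4)
```

Under that assumption, the inner join on `id1` alone yields 8 rows because the two rows with `id1 == 1` and the two rows with a `missing` key each pair with two rows of `y`, whereas the semi join returns only the 4 rows of `x` that have at least one partner.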


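Results for the first pair of data frames (x and y from the preparation step, after the rename):

join(x, y, on=:id, kind=:left)

5×3 DataFrame
│ Row │ id      │ name   │ age     │
│     │ Int64⍰  │ String │ Int64⍰  │
├─────┼─────────┼────────┼─────────┤
│ 1   │ 1       │ Alice  │ 21      │
│ 2   │ 2       │ Bob    │ 22      │
│ 3   │ 3       │ Conor  │ missing │
│ 4   │ 4       │ Dave   │ missing │
│ 5   │ missing │ Zed    │ 99      │

join(x, y, on=:id, kind=:right)

5×3 DataFrame
│ Row │ id      │ name    │ age   │
│     │ Int64⍰  │ String⍰ │ Int64 │
├─────┼─────────┼─────────┼───────┤
│ 1   │ 1       │ Alice   │ 21    │
│ 2   │ 2       │ Bob     │ 22    │
│ 3   │ missing │ Zed     │ 99    │
│ 4   │ 5       │ missing │ 23    │
│ 5   │ 6       │ missing │ 24    │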