Introduction to DataFrames¶

Bogumił Kamiński, Apr 21, 2017

Reference¶

https://github.com/JuliaComputing/JuliaBoxTutorials/tree/master/introductory-tutorials/broader-topics-and-ecosystem/intro-to-julia-DataFrames

Series¶

https://deepstat.tistory.com/69 (01. constructors)(in English)
https://deepstat.tistory.com/70 (01. constructors)(in Korean)
https://deepstat.tistory.com/71 (02. basicinfo)(in English)
https://deepstat.tistory.com/72 (02. basicinfo)(in Korean)
https://deepstat.tistory.com/73 (03. missingvalues)(in English)
https://deepstat.tistory.com/74 (03. missingvalues)(in Korean)
https://deepstat.tistory.com/75 (04. loadsave)(in English)
https://deepstat.tistory.com/76 (04. loadsave)(in Korean)
https://deepstat.tistory.com/77 (05. columns)(in English)
https://deepstat.tistory.com/78 (05. columns)(in Korean)
https://deepstat.tistory.com/79 (06. rows)(in English)
https://deepstat.tistory.com/80 (06. rows)(in Korean)
https://deepstat.tistory.com/81 (07. factors)(in English)
https://deepstat.tistory.com/82 (07. factors)(in Korean)
https://deepstat.tistory.com/83 (08. joins)(in English)
https://deepstat.tistory.com/84 (08. joins)(in Korean)

using DataFrames # load package

Joining DataFrames¶

Preparing DataFrames for a join¶

x = DataFrame(ID=[1,2,3,4,missing], name = ["Alice", "Bob", "Conor", "Dave","Zed"])
y = DataFrame(id=[1,2,5,6,missing], age = [21,22,23,24,99])
println(x)
println(y)

5×2 DataFrame
│ Row │ ID      │ name   │
│     │ Int64⍰  │ String │
├─────┼─────────┼────────┤
│ 1   │ 1       │ Alice  │
│ 2   │ 2       │ Bob    │
│ 3   │ 3       │ Conor  │
│ 4   │ 4       │ Dave   │
│ 5   │ missing │ Zed    │
5×2 DataFrame
│ Row │ id      │ age   │
│     │ Int64⍰  │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 1       │ 21    │
│ 2   │ 2       │ 22    │
│ 3   │ 5       │ 23    │
│ 4   │ 6       │ 24    │
│ 5   │ missing │ 99    │

rename!(x, :ID=>:id) # names of columns on which we want to join must be the same
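
Renaming is what the DataFrames version used in this notebook requires. As a hedged aside: later DataFrames releases (1.x, which this notebook does not use) also accept a `left => right` pair in `on`, so key columns with different names can be joined without renaming. A minimal sketch under that assumption; the frames `people` and `ages` are made up for illustration:

```julia
using DataFrames  # assumes DataFrames 1.x, not the version used in the notebook

# hypothetical example data with differently named key columns
people = DataFrame(ID=[1, 2, 3], name=["Alice", "Bob", "Conor"])
ages = DataFrame(id=[1, 2, 5], age=[21, 22, 23])

# match people.ID against ages.id; the key column keeps the left-hand name
joined = innerjoin(people, ages, on = :ID => :id)
names(joined)
```

Whether to rename or to pass a pair is mostly a matter of taste; renaming keeps later code that refers to the key column uniform.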

Standard joins: inner, left, right, outer, semi, anti¶

join(x, y, on=:id) # :inner join by default; rows whose id is missing are matched with each other

join(x, y, on=:id, kind=:left)

join(x, y, on=:id, kind=:right)

join(x, y, on=:id, kind=:outer)

join(x, y, on=:id, kind=:semi)

join(x, y, on=:id, kind=:anti)
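
The `join(...; kind=...)` calls above reflect the DataFrames version this notebook was written for. In DataFrames 1.x there is instead one function per join kind, and `missing` keys raise an error unless `matchmissing=:equal` is passed. A minimal sketch of the same six joins, assuming DataFrames 1.x is installed:

```julia
using DataFrames  # assumes DataFrames 1.x

x = DataFrame(id=[1, 2, 3, 4, missing], name=["Alice", "Bob", "Conor", "Dave", "Zed"])
y = DataFrame(id=[1, 2, 5, 6, missing], age=[21, 22, 23, 24, 99])

# matchmissing=:equal treats missing keys as equal to each other,
# which reproduces the "missing is joined" behaviour described above
inner = innerjoin(x, y, on=:id, matchmissing=:equal)
left  = leftjoin(x, y, on=:id, matchmissing=:equal)
right = rightjoin(x, y, on=:id, matchmissing=:equal)
outer = outerjoin(x, y, on=:id, matchmissing=:equal)
semi  = semijoin(x, y, on=:id, matchmissing=:equal)   # rows of x that have a match in y
anti  = antijoin(x, y, on=:id, matchmissing=:equal)   # rows of x that have no match in y

# number of rows in each result
map(nrow, (inner, left, right, outer, semi, anti))
```

Comparing the row counts is a quick way to check which rows each join kind keeps.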

Cross join¶

# a cross join does not require the on argument
# it produces the Cartesian product of its arguments
# note: indexing the keyword arguments as xs[1] returns only the value ([1,2]), not a
# name => value pair, so the original Pair(xs[1]...) call failed on Julia 1.0 with
# "ArgumentError: unable to construct DataFrame from Pair{Int64,Int64}"
function expand_grid(;xs...) # a simple replacement for expand.grid in R
    # turn each keyword argument (name => values) into a one-column DataFrame
    dfs = [DataFrame(k => v) for (k, v) in xs]
    # combine them pairwise with cross joins (crossjoin(a, b) in later DataFrames releases)
    reduce((a, b) -> join(a, b, kind=:cross), dfs)
end

expand_grid(a=[1,2], b=["a","b","c"], c=[true,false])

?reduce

search: reduce mapreduce

reduce(op, itr; [init])

julia> reduce(*, [2; 3; 4])
24

julia> reduce(*, [2; 3; 4]; init=-1)
-24

reduce(f, A; dims=:, [init])

julia> a = reshape(Vector(1:16), (4,4))
4×4 Array{Int64,2}:
 1  5   9  13
 2  6  10  14
 3  7  11  15
 4  8  12  16

julia> reduce(max, a, dims=2)
4×1 Array{Int64,2}:
 13
 14
 15
 16

julia> reduce(max, a, dims=1)
1×4 Array{Int64,2}:
 4  8  12  16
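
The help text matters here because `expand_grid` relies on folding a collection pairwise. The idea can be sketched without DataFrames at all: the hypothetical helper `cartesian` below (not part of the original notebook) builds a Cartesian product of plain vectors. It uses `foldl`, the variant of `reduce` with guaranteed left-to-right order, because the accumulator and the next element play different roles.

```julia
# Hypothetical helper for illustration: Cartesian product of several vectors.
function cartesian(vs...)
    # start from a single empty tuple and extend every partial tuple
    # with each element of the next vector
    foldl((acc, v) -> [(t..., e) for t in acc for e in v], vs; init=[()])
end

grid = cartesian([1, 2], ["a", "b", "c"], [true, false])
length(grid)  # one tuple per combination of the inputs
first(grid)
```

Each fold step multiplies the number of partial tuples by the length of the next vector, which is the same growth a chain of cross joins produces.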

Complex cases of joins¶

x = DataFrame(id1=[1,1,2,2,missing,missing],
              id2=[1,11,2,21,missing,99],
              name = ["Alice", "Bob", "Conor", "Dave","Zed", "Zoe"])
y = DataFrame(id1=[1,1,3,3,missing,missing],
              id2=[11,1,31,3,missing,999],
              age = [21,22,23,24,99, 100])
println(x)
println(y)

6×3 DataFrame
│ Row │ id1     │ id2     │ name   │
│     │ Int64⍰  │ Int64⍰  │ String │
├─────┼─────────┼─────────┼────────┤
│ 1   │ 1       │ 1       │ Alice  │
│ 2   │ 1       │ 11      │ Bob    │
│ 3   │ 2       │ 2       │ Conor  │
│ 4   │ 2       │ 21      │ Dave   │
│ 5   │ missing │ missing │ Zed    │
│ 6   │ missing │ 99      │ Zoe    │
6×3 DataFrame
│ Row │ id1     │ id2     │ age   │
│     │ Int64⍰  │ Int64⍰  │ Int64 │
├─────┼─────────┼─────────┼───────┤
│ 1   │ 1       │ 11      │ 21    │
│ 2   │ 1       │ 1       │ 22    │
│ 3   │ 3       │ 31      │ 23    │
│ 4   │ 3       │ 3       │ 24    │
│ 5   │ missing │ missing │ 99    │
│ 6   │ missing │ 999     │ 100   │

join(x, y, on=[:id1, :id2]) # joining on two columns

join(x, y, on=[:id1], makeunique=true) # with duplicates all combinations are produced (here :inner join)

join(x, y, on=[:id1], kind=:semi) # a :semi join does not: each matching row of x is returned only once
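
For readers on DataFrames 1.x, the same three cases can be sketched with the per-kind join functions. This is an assumption-laden translation rather than the notebook's own code: it assumes DataFrames 1.x and passes `matchmissing=:equal` so that `missing` keys match each other as they do above.

```julia
using DataFrames  # assumes DataFrames 1.x

x = DataFrame(id1=[1, 1, 2, 2, missing, missing],
              id2=[1, 11, 2, 21, missing, 99],
              name=["Alice", "Bob", "Conor", "Dave", "Zed", "Zoe"])
y = DataFrame(id1=[1, 1, 3, 3, missing, missing],
              id2=[11, 1, 31, 3, missing, 999],
              age=[21, 22, 23, 24, 99, 100])

# joining on two columns: a row matches only if both keys agree
both = innerjoin(x, y, on=[:id1, :id2], matchmissing=:equal)

# joining on id1 alone: duplicated keys yield every combination of matching rows;
# makeunique=true renames the clashing id2 column coming from y
combos = innerjoin(x, y, on=:id1, makeunique=true, matchmissing=:equal)

# a semi join only filters x, so no rows are duplicated
semi = semijoin(x, y, on=:id1, matchmissing=:equal)

(nrow(both), nrow(combos), nrow(semi))
```

The difference between `combos` and `semi` shows why duplicated keys matter: the inner join multiplies rows, while the semi join can never return more rows than `x` has.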


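Rendered result of the :left join, join(x, y, on=:id, kind=:left):

5×3 DataFrame
│ Row │ id      │ name   │ age     │
│     │ Int64⍰  │ String │ Int64⍰  │
├─────┼─────────┼────────┼─────────┤
│ 1   │ 1       │ Alice  │ 21      │
│ 2   │ 2       │ Bob    │ 22      │
│ 3   │ 3       │ Conor  │ missing │
│ 4   │ 4       │ Dave   │ missing │
│ 5   │ missing │ Zed    │ 99      │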

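Rendered result of the :right join, join(x, y, on=:id, kind=:right):

5×3 DataFrame
│ Row │ id      │ name    │ age   │
│     │ Int64⍰  │ String⍰ │ Int64 │
├─────┼─────────┼─────────┼───────┤
│ 1   │ 1       │ Alice   │ 21    │
│ 2   │ 2       │ Bob     │ 22    │
│ 3   │ missing │ Zed     │ 99    │
│ 4   │ 5       │ missing │ 23    │
│ 5   │ 6       │ missing │ 24    │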