Introduction to DataFrames

Bogumił Kamiński, Apr 21, 2018

Reference

https://github.com/JuliaComputing/JuliaBoxTutorials/tree/master/introductory-tutorials/broader-topics-and-ecosystem/intro-to-julia-DataFrames

Series

https://deepstat.tistory.com/69 (01. constructors)(in English)
https://deepstat.tistory.com/70 (01. constructors)(in Korean)
https://deepstat.tistory.com/71 (02. basicinfo)(in English)
https://deepstat.tistory.com/72 (02. basicinfo)(in Korean)
https://deepstat.tistory.com/73 (03. missingvalues)(in English)
https://deepstat.tistory.com/74 (03. missingvalues)(in Korean)
https://deepstat.tistory.com/75 (04. loadsave)(in English)
https://deepstat.tistory.com/76 (04. loadsave)(in Korean)
https://deepstat.tistory.com/77 (05. columns)(in English)
https://deepstat.tistory.com/78 (05. columns)(in Korean)
https://deepstat.tistory.com/79 (06. rows)(in English)
https://deepstat.tistory.com/80 (06. rows)(in Korean)
https://deepstat.tistory.com/81 (07. factors)(in English)
https://deepstat.tistory.com/82 (07. factors)(in Korean)
https://deepstat.tistory.com/83 (08. joins)(in English)
https://deepstat.tistory.com/84 (08. joins)(in Korean)
https://deepstat.tistory.com/85 (09. reshaping)(in English)
https://deepstat.tistory.com/86 (09. reshaping)(in Korean)

using DataFrames # load package

Reshaping DataFrames

Wide to long

x = DataFrame(id=[1,2,3,4], id2=[1,1,2,2], M1=[11,12,13,14], M2=[111,112,113,114])

melt(x, :id, [:M1, :M2]) # pass the id variables first, then the measure variables; meltdf returns a view instead of a copy

# optionally rename the output columns; melt and stack are equivalent, but their argument order is reversed
stack(x, [:M1, :M2], :id, variable_name=:key, value_name=:observed) # measure variables first, then id variables; stackdf creates a view

# if the second argument is omitted in melt or stack, all other columns are used as the second argument,
# but measure variables are selected only if their element type is <: AbstractFloat
melt(x, [:id, :id2])

melt(x, [1, 2]) # you can use index instead of symbol
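The calls above use the DataFrames API as it was in 2018. As a hedged sketch, assuming a DataFrames.jl 1.x release (where `melt` is no longer available and `stack` is the wide-to-long function), the renamed-column example might be written as follows. The data are the same `x` as above; nothing else is taken from the original notebook.

```julia
using DataFrames

x = DataFrame(id=[1,2,3,4], id2=[1,1,2,2], M1=[11,12,13,14], M2=[111,112,113,114])

# measure variables first, then id variables (same argument order as the old stack)
long = stack(x, [:M1, :M2], :id, variable_name=:key, value_name=:observed)

# assumption: DataFrames.jl 1.x; each of the 4 rows contributes one row per measure variable
nrow(long)       # 8 rows: 4 for M1 followed by 4 for M2
long.observed    # the M1 values followed by the M2 values
```

Under the same assumption, the column holding the variable names contains strings by default rather than symbols.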

bigx = DataFrame(rand(10^6, 10)) # a test comparing creation of new DataFrame and a view
bigx[:id] = 1:10^6
@time melt(bigx, :id)
@time melt(bigx, :id)
@time meltdf(bigx, :id)
@time meltdf(bigx, :id);

  0.255109 seconds (172.28 k allocations: 237.679 MiB, 34.60% gc time)
  0.203728 seconds (144 allocations: 228.889 MiB, 53.30% gc time)
  0.386479 seconds (633.47 k allocations: 32.617 MiB, 15.71% gc time)
  0.000075 seconds (117 allocations: 6.453 KiB)
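The timings above suggest that, once compiled, building a view is far cheaper than copying. As a minimal sketch, assuming DataFrames.jl 1.x (where `meltdf` and `stackdf` are not available), a view can be requested with the `view=true` keyword of `stack`. The table here is smaller than `bigx` above so that it runs quickly; the names `copied` and `viewed` are illustrative only.

```julia
using DataFrames

# :auto generates the column names x1 ... x10 (assumption: DataFrames.jl 1.x)
bigx = DataFrame(rand(10^4, 10), :auto)
bigx.id = collect(1:10^4)

copied = stack(bigx, Not(:id), :id)              # allocates new stacked columns
viewed = stack(bigx, Not(:id), :id, view=true)   # columns are lazy views over bigx

# both results describe the same long table
size(copied) == size(viewed)
copied.value == viewed.value
```

Because the view refers back to the columns of `bigx`, it is best treated as read-only while `bigx` is still in use.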

x = DataFrame(id = [1,1,1], id2=['a','b','c'], a1 = rand(3), a2 = rand(3))

melt(x)

melt(DataFrame(rand(3,2))) # by default stack and melt treat float columns as value columns

df = DataFrame(rand(3,2))
df[:key] = [1,1,1]
mdf = melt(df) # duplicates in key are silently accepted

Long to wide

x = DataFrame(id = [1,1,1], id2=['a','b','c'], a1 = rand(3), a2 = rand(3))

y = melt(x, [1,2])
display(x)
display(y)

unstack(y, :id2, :variable, :value) # standard unstack with a unique key

unstack(y, :variable, :value) # all other columns are treated as keys
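The two `unstack` forms above can be checked with a small round trip. This is a sketch assuming DataFrames.jl 1.x, with `stack` standing in for `melt` and fixed illustrative values in place of `rand(3)` so the result is reproducible; `wide` and `wide2` are hypothetical names.

```julia
using DataFrames

x = DataFrame(id=[1,1,1], id2=['a','b','c'], a1=[0.1,0.2,0.3], a2=[1.0,2.0,3.0])

# long form with columns id, id2, variable, value
y = stack(x, [:a1, :a2], [:id, :id2])

# id2 is the only row key, so the id column is dropped from the result
wide = unstack(y, :id2, :variable, :value)

# no row key given: every column other than variable and value acts as a key
wide2 = unstack(y, :variable, :value)

names(wide)    # id2, a1, a2
names(wide2)   # id, id2, a1, a2
```

Since `id2` is unique per row here, both calls recover the original measurements without duplicate-key problems.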

# by default :id, :variable and :value names are assumed; in this case it produces duplicate keys
unstack(y)

┌ Warning: In the future `unstack(df)` will call `unstack(df, :variable, :value)`. use `unstack(df, :id, :variable, :value)` to treat `:id` as the only `rowkeys` column
│   caller = top-level scope at In[16]:1
└ @ Core In[16]:1
┌ Warning: Duplicate entries in unstack at row 2 for key 1 and variable a1.
└ @ DataFrames /home/yt/.julia/packages/DataFrames/1PqZ3/src/abstractdataframe/reshape.jl:244

df = stack(DataFrame(rand(3,2)))

unstack(df, :variable, :value) # unable to unstack when no key column is present

ArgumentError: No key column found

Stacktrace:
 [1] unstack(::DataFrame, ::Array{Symbol,1}, ::Int64, ::Int64) at /home/yt/.julia/packages/DataFrames/1PqZ3/src/abstractdataframe/reshape.jl:279
 [2] unstack(::DataFrame, ::Int64, ::Int64) at /home/yt/.julia/packages/DataFrames/1PqZ3/src/abstractdataframe/reshape.jl:269
 [3] unstack(::DataFrame, ::Symbol, ::Symbol) at /home/yt/.julia/packages/DataFrames/1PqZ3/src/abstractdataframe/reshape.jl:265
 [4] top-level scope at In[18]:1

	id	id2	a1	a2
	Int64	Char	Float64	Float64
1	1	'a'	0.446038	0.735251
2	1	'b'	0.508045	0.783346
3	1	'c'	0.874669	0.724064

	variable	value
	Symbol	Float64
1	x1	0.407512
2	x1	0.958294
3	x1	0.993427
4	x2	0.121015
5	x2	0.987261
6	x2	0.438873

	variable	value	key
	Symbol	Float64	Int64
1	x1	0.0148016	1
2	x1	0.0783944	1
3	x1	0.794611	1
4	x2	0.113415	1
5	x2	0.966621	1
6	x2	0.0950933	1

	id	id2	a1	a2
	Int64	Char	Float64	Float64
1	1	'a'	0.982001	0.765671
2	1	'b'	0.00268151	0.780911
3	1	'c'	0.333175	0.0896065

	id	id2	a1	a2
	Int64	Char	Float64	Float64
1	1	'a'	0.982001	0.765671
2	1	'b'	0.00268151	0.780911
3	1	'c'	0.333175	0.0896065

Output of unstack(y, :id2, :variable, :value):

	id2	a1	a2
	Char	Float64⍰	Float64⍰
1	'a'	0.982001	0.765671
2	'b'	0.00268151	0.780911
3	'c'	0.333175	0.0896065

	variable	value
	Symbol	Float64
1	x1	0.524652
2	x1	0.990633
3	x1	0.419322
4	x2	0.583264
5	x2	0.0647236
6	x2	0.0752103

Output of stack(x, [:M1, :M2], :id, variable_name=:key, value_name=:observed):

	key	observed	id
	Symbol	Int64	Int64
1	M1	11	1
2	M1	12	2
3	M1	13	3
4	M1	14	4
5	M2	111	1
6	M2	112	2
7	M2	113	3
8	M2	114	4
