Introduction to DataFrames
Bogumił Kamiński, Apr 21, 2018
Reference
Series
- https://deepstat.tistory.com/69 (01. constructors) (in English)
- https://deepstat.tistory.com/70 (01. constructors) (in Korean)
- https://deepstat.tistory.com/71 (02. basicinfo) (in English)
- https://deepstat.tistory.com/72 (02. basicinfo) (in Korean)
- https://deepstat.tistory.com/73 (03. missingvalues) (in English)
- https://deepstat.tistory.com/74 (03. missingvalues) (in Korean)
- https://deepstat.tistory.com/75 (04. loadsave) (in English)
- https://deepstat.tistory.com/76 (04. loadsave) (in Korean)
- https://deepstat.tistory.com/77 (05. columns) (in English)
- https://deepstat.tistory.com/78 (05. columns) (in Korean)
- https://deepstat.tistory.com/79 (06. rows) (in English)
- https://deepstat.tistory.com/80 (06. rows) (in Korean)
- https://deepstat.tistory.com/81 (07. factors) (in English)
- https://deepstat.tistory.com/82 (07. factors) (in Korean)
- https://deepstat.tistory.com/83 (08. joins) (in English)
- https://deepstat.tistory.com/84 (08. joins) (in Korean)
- https://deepstat.tistory.com/85 (09. reshaping) (in English)
- https://deepstat.tistory.com/86 (09. reshaping) (in Korean)
In [1]:
using DataFrames # load package
Reshaping DataFrames
Wide to long
In [2]:
x = DataFrame(id=[1,2,3,4], id2=[1,1,2,2], M1=[11,12,13,14], M2=[111,112,113,114])
Out[2]:
In [3]:
melt(x, :id, [:M1, :M2]) # pass the id variables first, then the measure variables; meltdf returns a view instead of a copy
Out[3]:
In [4]:
# optionally you can rename the output columns; melt and stack are identical except that the order of arguments is reversed
stack(x, [:M1, :M2], :id, variable_name=:key, value_name=:observed) # measure variables first, then id variables; stackdf creates a view
Out[4]:
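To make the reshaping above concrete, here is a minimal sketch of the wide-to-long idea on plain column vectors. It assumes Julia 1.x and does not use DataFrames at all; `melt_sketch` is a hypothetical helper written for illustration, not the library's implementation, and the column layout of its result is just one reasonable choice.

```julia
# Hypothetical helper (not part of DataFrames): wide-to-long on plain columns.
# Each measure column contributes one block of rows; id values repeat per block.
function melt_sketch(wide::NamedTuple, id::Symbol, measures::Vector{Symbol})
    ids = getproperty(wide, id)
    n = length(ids)
    variable = repeat(measures, inner = n)               # name of the source column
    value = reduce(vcat, [getproperty(wide, m) for m in measures])  # stacked values
    idcol = repeat(ids, outer = length(measures))        # id repeated for each block
    return (variable = variable, value = value, id = idcol)
end

wide = (id = [1, 2, 3, 4], M1 = [11, 12, 13, 14], M2 = [111, 112, 113, 114])
long = melt_sketch(wide, :id, [:M1, :M2])
```

Four wide rows with two measure columns become eight long rows, one per (id, measure) pair.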
In [5]:
# if the second argument is omitted in melt or stack, all other columns are assumed to be the second argument
# but measure variables are selected only if they are <: AbstractFloat
melt(x, [:id, :id2])
Out[5]:
In [6]:
melt(x, [1, 2]) # you can use column indices instead of symbols
Out[6]:
In [7]:
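bigx = DataFrame(rand(10^6, 10)) # a test comparing creation of a new DataFrame with creation of a view
bigx[:id] = 1:10^6
@time melt(bigx, :id)   # first call includes compilation time
@time melt(bigx, :id)   # second call measures the copy itself
@time meltdf(bigx, :id) # first call includes compilation time
@time meltdf(bigx, :id); # second call measures creation of the view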
In [8]:
x = DataFrame(id = [1,1,1], id2=['a','b','c'], a1 = rand(3), a2 = rand(3))
Out[8]:
In [9]:
melt(x)
Out[9]:
In [10]:
melt(DataFrame(rand(3,2))) # by default stack and melt treat float columns as value columns
Out[10]:
In [11]:
df = DataFrame(rand(3,2))
df[:key] = [1,1,1]
mdf = melt(df) # duplicates in key are silently accepted
Out[11]:
Long to wide
In [12]:
x = DataFrame(id = [1,1,1], id2=['a','b','c'], a1 = rand(3), a2 = rand(3))
Out[12]:
In [13]:
y = melt(x, [1,2])
display(x)
display(y)
In [14]:
unstack(y, :id2, :variable, :value) # standard unstack with a unique key
Out[14]:
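The long-to-wide step can be sketched the same way on plain vectors. This assumes Julia 1.x and no DataFrames; `unstack_sketch` is a hypothetical helper for illustration only, and how it handles a repeated (key, variable) pair is a choice made for this sketch, not a statement about what `unstack` does.

```julia
# Hypothetical helper (not part of DataFrames): long-to-wide on plain vectors.
# `rowkeys` picks the output row, `variable` picks the output column.
function unstack_sketch(rowkeys::AbstractVector, variable::AbstractVector{Symbol},
                        value::AbstractVector)
    rows = unique(rowkeys)        # one output row per distinct key, first-seen order
    cols = unique(variable)       # one output column per distinct variable name
    rowindex = Dict(k => i for (i, k) in enumerate(rows))
    T = Union{Missing, eltype(value)}
    wide = Dict(c => Vector{T}(missing, length(rows)) for c in cols)
    for (k, var, val) in zip(rowkeys, variable, value)
        # In this sketch a repeated (key, variable) pair overwrites the earlier value.
        wide[var][rowindex[k]] = val
    end
    return rows, wide
end

rows, wide = unstack_sketch(['a', 'b', 'c', 'a', 'b', 'c'],
                            [:a1, :a1, :a1, :a2, :a2, :a2],
                            [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
```

Cells that never receive a value stay `missing`, which is why a unique key matters: without one, several long rows compete for the same wide cell.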
In [15]:
unstack(y, :variable, :value) # all other columns are treated as keys
Out[15]:
In [16]:
# by default :id, :variable and :value names are assumed; in this case it produces duplicate keys
unstack(y)
Out[16]:
In [17]:
df = stack(DataFrame(rand(3,2)))
Out[17]:
In [18]:
unstack(df, :variable, :value) # unable to unstack when no key column is present