Introduction to DataFrames
Bogumił Kamiński, Apr 21, 2018
Reference
Series
- https://deepstat.tistory.com/69 (01. constructors)(in English)
- https://deepstat.tistory.com/70 (01. constructors)(in Korean)
- https://deepstat.tistory.com/71 (02. basicinfo)(in English)
- https://deepstat.tistory.com/72 (02. basicinfo)(in Korean)
- https://deepstat.tistory.com/73 (03. missingvalues)(in English)
- https://deepstat.tistory.com/74 (03. missingvalues)(in Korean)
- https://deepstat.tistory.com/75 (04. loadsave)(in English)
- https://deepstat.tistory.com/76 (04. loadsave)(in Korean)
- https://deepstat.tistory.com/77 (05. columns)(in English)
- https://deepstat.tistory.com/78 (05. columns)(in Korean)
- https://deepstat.tistory.com/79 (06. rows)(in English)
- https://deepstat.tistory.com/80 (06. rows)(in Korean)
- https://deepstat.tistory.com/81 (07. factors)(in English)
- https://deepstat.tistory.com/82 (07. factors)(in Korean)
- https://deepstat.tistory.com/83 (08. joins)(in English)
- https://deepstat.tistory.com/84 (08. joins)(in Korean)
- https://deepstat.tistory.com/85 (09. reshaping)(in English)
- https://deepstat.tistory.com/86 (09. reshaping)(in Korean)
- https://deepstat.tistory.com/87 (10. transforms)(in English)
- https://deepstat.tistory.com/88 (10. transforms)(in Korean)
In [1]:
using DataFrames # load package
Split-apply-combine
In [2]:
x = DataFrame(id=[1,2,3,4,1,2,3,4], id2=[1,2,1,2,1,2,1,2], v=rand(8))
Out[2]:
In [3]:
gx1 = groupby(x, :id)
Out[3]:
In [4]:
gx2 = groupby(x, [:id, :id2])
Out[4]:
In [5]:
vcat(gx2...) # back to a single DataFrame with all the original rows (ordered group by group)
Out[5]:
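Since the cell outputs are not shown here, the following is a small sketch of what grouping produces. It assumes a recent DataFrames.jl (1.x) and swaps the random `v` column for fixed values (my choice, not part of the original notebook) so the result can be checked by eye.

```julia
using DataFrames

# Fixed values instead of rand(8) (an assumption made for reproducibility)
x = DataFrame(id=[1,2,3,4,1,2,3,4], id2=[1,2,1,2,1,2,1,2], v=collect(1.0:8.0))

gx2 = groupby(x, [:id, :id2])

n_groups = length(gx2)                 # number of distinct (id, id2) pairs
group_sizes = [nrow(g) for g in gx2]   # each group is a SubDataFrame view into x

# Concatenating the groups gives back every row, ordered group by group
restored = vcat(gx2...)
```

Note that `restored` holds the same rows as `x`, but rows belonging to the same group end up next to each other, so the row order can differ from the original.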
In [6]:
x = DataFrame(id = [missing, 5, 1, 3, missing], x = 1:5)
Out[6]:
In [7]:
show(groupby(x, :id), allgroups=true) # by default groups include missing values and are not sorted
In [8]:
show(groupby(x, :id, sort=true, skipmissing=true), allgroups=true) # but we can change it :)
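To make the effect of the two keyword arguments concrete, here is a minimal sketch (assuming DataFrames.jl 1.x) that counts the groups with and without `skipmissing` and reads off the group keys after sorting. The variable names are mine.

```julia
using DataFrames

x = DataFrame(id = [missing, 5, 1, 3, missing], x = 1:5)

g_default = groupby(x, :id)                              # missing forms its own group
g_clean   = groupby(x, :id, sort=true, skipmissing=true) # rows with missing id are dropped

n_default = length(g_default)
n_clean   = length(g_clean)

# The key of each group, in the order the groups are stored
ids_sorted = [first(g.id) for g in g_clean]
```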
In [9]:
x = DataFrame(id=rand('a':'d', 100), v=rand(100));
using Statistics
by(x, :id, y->mean(y[:v])) # apply a function to each group of a data frame
Out[9]:
In [10]:
by(x, :id, y->mean(y[:v]), sort=true) # we can sort the output
Out[10]:
In [11]:
by(x, :id, y->DataFrame(res=mean(y[:v]))) # this way we can set a name for the result column; the DataFramesMeta @by macro is more convenient
Out[11]:
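The `by` calls above reflect the DataFrames.jl API at the time of writing (2018). In later releases (1.x, as far as I know) `by` is no longer available and the same operation is written as `combine` over a `groupby`. A minimal sketch under that assumption, using a small fixed table instead of random data:

```julia
using DataFrames, Statistics

# Small fixed table (an assumption, so the group means are easy to verify)
x = DataFrame(id=['a', 'b', 'a', 'b'], v=[1.0, 2.0, 3.0, 4.0])

# source column => function => name of the output column
res = combine(groupby(x, :id, sort=true), :v => mean => :res)
```

Here `res` has one row per group, with the group mean stored in a column named `res`, which corresponds to the `DataFrame(res=...)` trick in the last cell.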
In [12]:
x = DataFrame(id=rand('a':'d', 100), x1=rand(100), x2=rand(100))
aggregate(x, :id, sum) # apply a function over all columns of a data frame in groups given by id
Out[12]:
In [13]:
aggregate(x, :id, sum, sort=true) # also can be sorted
Out[13]:
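`aggregate` also belongs to the 2018 API. Assuming DataFrames.jl 1.x, a comparable result can be sketched by broadcasting a `=>` pair over the value columns inside `combine`. The fixed data below is my own, chosen so the sums are easy to check.

```julia
using DataFrames

x = DataFrame(id=['a', 'b', 'a', 'b'],
              x1=[1.0, 2.0, 3.0, 4.0],
              x2=[10.0, 20.0, 30.0, 40.0])

# Apply sum to every listed column within each id group;
# output columns get automatic names such as x1_sum
res = combine(groupby(x, :id, sort=true), [:x1, :x2] .=> sum)
```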
We omit the discussion of map/combine, as I do not find them very useful (it is better to use by).
In [14]:
x = DataFrame(rand(3, 5))
Out[14]:
In [15]:
map(mean, eachcol(x)) # map a function over each column and return a data frame
Out[15]:
In [16]:
foreach(c -> println(c[1], ": ", mean(c[2])), eachcol(x)) # a raw iteration returns a tuple with column name and values
In [17]:
colwise(mean, x) # colwise is similar, but produces a vector
Out[17]:
In [18]:
x[:id] = [1,1,2]
colwise(mean,groupby(x, :id)) # and works on GroupedDataFrame
Out[18]:
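`colwise`, `DataFrame(rand(3, 5))` without column names, and `x[:id] = ...` are all 2018-era forms. As a hedged sketch of how the same steps might look in DataFrames.jl 1.x (my assumption about the target version), with a small fixed matrix instead of random numbers:

```julia
using DataFrames, Statistics

# :auto asks for generated column names x1, x2, ...
x = DataFrame([1.0 10.0; 2.0 20.0; 3.0 30.0], :auto)

# Column-wise means as a plain vector (roughly what colwise returned)
col_means = map(mean, eachcol(x))

# Pair each column name with its mean
named_means = Dict(zip(names(x), col_means))

# Add a grouping column, then take column means within each group
x.id = [1, 1, 2]
group_means = combine(groupby(x, :id), [:x1, :x2] .=> mean)
```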
In [19]:
map(r -> r[:x1]/r[:x2], eachrow(x)) # now each element passed to the function is a DataFrameRow, which works similarly to a one-row DataFrame
Out[19]:
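A small sketch of row-wise iteration with fixed numbers (my own data, assuming DataFrames.jl 1.x, where a column of a `DataFrameRow` can be read as `r.x1`). The broadcast version at the end is included for comparison, since for simple arithmetic it avoids iterating over rows at all.

```julia
using DataFrames

x = DataFrame(x1=[1.0, 4.0, 9.0], x2=[2.0, 2.0, 3.0])

# Each r is a DataFrameRow; r.x1 reads the x1 value of that row
ratios = map(r -> r.x1 / r.x2, eachrow(x))

# Same result computed column-wise with broadcasting
ratios_vec = x.x1 ./ x.x2
```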