Introduction to DataFrames¶
Bogumił Kamiński, May 23, 2018
Reference¶
Series¶
- https://deepstat.tistory.com/69 (01. constructors)(in English)
- https://deepstat.tistory.com/70 (01. constructors)(in Korean)
- https://deepstat.tistory.com/71 (02. basicinfo)(in English)
- https://deepstat.tistory.com/72 (02. basicinfo)(in Korean)
- https://deepstat.tistory.com/73 (03. missingvalues)(in English)
- https://deepstat.tistory.com/74 (03. missingvalues)(in Korean)
- https://deepstat.tistory.com/75 (04. loadsave)(in English)
- https://deepstat.tistory.com/76 (04. loadsave)(in Korean)
- https://deepstat.tistory.com/77 (05. columns)(in English)
- https://deepstat.tistory.com/78 (05. columns)(in Korean)
using DataFrames # load package
Manipulating columns of DataFrame¶
Renaming columns¶
Let's start with a DataFrame of Bools that has default column names.
x = DataFrame(Bool, 3, 4)
With rename, we create a new DataFrame; here we rename the column :x1 to :A. (rename also accepts collections of Pairs.)
rename(x, :x1 => :A)
With rename! we do an in-place transformation. This time we apply a function to every column name.
rename!(c -> Symbol(string(c)^2), x)
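To see what the renaming function above does to each name, here is a small sketch using only Base Julia. The vector names_before is a made-up stand-in for the column names at that point, not something taken from the DataFrame itself. The key fact is that ^ on a string repeats it.

```julia
# Hypothetical stand-in for the column names of x after the first rename.
names_before = [:A, :x2, :x3, :x4]

# The same function that is passed to rename!: convert the Symbol to a String,
# repeat it twice with ^, and convert back to a Symbol.
double_name = c -> Symbol(string(c)^2)

names_after = map(double_name, names_before)
println(names_after)
```

So a column called :x2 ends up as :x2x2.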
We can also change the name of a particular column without knowing the original name. Here we change the name of the third column, creating a new DataFrame.
rename(x, names(x)[3] => :third)
With names!, we can change the names of all variables.
names!(x, [:a, :b, :c, :d])
We get an error when we try to provide duplicate names
names!(x, fill(:a, 4))
unless we pass makeunique=true, which allows us to handle duplicates in passed names.
names!(x, fill(:a, 4), makeunique=true)
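The deduplicated names are :a, :a_1, :a_2 and :a_3, as the permutecols! call later in this post shows. The sketch below imitates that suffixing scheme in plain Julia. The helper make_unique_names is hypothetical and written only for illustration; it is not the DataFrames implementation and may differ from it in edge cases.

```julia
# Hypothetical helper: append _1, _2, ... to names that were already used.
function make_unique_names(names::Vector{Symbol})
    seen = Set{Symbol}()
    result = Symbol[]
    for name in names
        candidate = name
        k = 1
        # Keep trying suffixes until the candidate has not been used yet.
        while candidate in seen
            candidate = Symbol(name, "_", k)
            k += 1
        end
        push!(seen, candidate)
        push!(result, candidate)
    end
    return result
end

unique_names = make_unique_names(fill(:a, 4))
println(unique_names)
```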
Reordering columns¶
We can reorder the names(x) vector as needed, creating a new DataFrame.
using Random
Random.seed!(1234) # formerly srand(1234)
x[shuffle(names(x))]
We can also reorder a DataFrame in place with permutecols!.
permutecols!(x, [2, 1, 3, 4])
permutecols!(x, [:a, :a_1, :a_2, :a_3])
Merging/adding columns¶
x = DataFrame([(i,j) for i in 1:3, j in 1:4])
With hcat we can merge two DataFrames. The [x y] syntax is also supported, but only when the DataFrames have unique column names.
hcat(x, x, makeunique=true)
[x x]
We can also use hcat to add a new column; a default name :x1 will be used for this column, so makeunique=true is needed.
y = hcat(x, [1,2,3], makeunique=true)
[x [1,2,3]]
You can also prepend a vector with hcat.
hcat([1,2,3], x, makeunique=true)
[[1,2,3] x]
Alternatively you could append a vector with the following syntax. This is a bit more verbose but cleaner.
y = [x DataFrame(A=[1,2,3])]
Here we do the same but add column :A to the front.
y = [DataFrame(A=[1,2,3]) x]
A column can also be added in the middle. Here a brute-force method is used and a new DataFrame is created.
using BenchmarkTools
@btime [$x[1:2] DataFrame(A=[1,2,3]) $x[3:4]]
We could also do this with a specialized in-place method, insert!. Let's add :newcol to the DataFrame y.
insert!(y, 2, [1,2,3], :newcol)
If you want to insert the same column name several times, makeunique=true is needed as usual.
insert!(y, 2, [1,2,3], :newcol, makeunique=true)
Using @btime, we can see how much faster it is to insert a column with insert! than with hcat.
@btime insert!(copy($x), 3, [1,2,3], :A)
Let's use insert! to append a column in place,
insert!(x, ncol(x)+1, [1,2,3], :A)
and to prepend a column in place.
insert!(x, 1, [1,2,3], :B)
With merge!, let's merge the second DataFrame into the first, overwriting duplicates.
df1 = DataFrame(x=1:3, y=4:6)
df2 = DataFrame(x='a':'c', z = 'd':'f', new=11:13)
df1, df2, merge!(df1, df2)
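As a rough analogy only, merge on Base NamedTuples follows the same "right-hand side wins on duplicate names" rule. The sketch below uses no DataFrames code at all; nt1 and nt2 are illustrative stand-ins that mirror the columns of df1 and df2.

```julia
# NamedTuples standing in for the columns of df1 and df2 (illustration only).
nt1 = (x = 1:3, y = 4:6)
nt2 = (x = 'a':'c', z = 'd':'f', new = 11:13)

# Fields present in both take the value from the right-hand argument;
# fields only in nt2 are appended after the fields of nt1.
merged = merge(nt1, nt2)

println(keys(merged))
println(collect(merged.x))
```

Here :x keeps its position from nt1 but takes its values from nt2, while :z and :new are added at the end.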
For comparison: merge two DataFrames, but renaming duplicate names via hcat.
df1 = DataFrame(x=1:3, y=4:6)
df2 = DataFrame(x='a':'c', z = 'd':'f', new=11:13)
println(df1)
println(df2)
hcat(df1, df2, makeunique=true)
merge!(df1,df2)
Subsetting/removing columns¶
Let's create a new DataFrame x and show a few ways to create DataFrames with a subset of x's columns.
x = DataFrame([(i,j) for i in 1:3, j in 1:5])
First we could do this by index
x[[1,2,4,5]]
or by column name.
x[[:x1, :x4]]
We can also choose to keep or exclude columns by Bool. (We need a vector whose length is the number of columns in the original DataFrame.)
x[[true, false, true, false, true]]
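The Bool selection works like logical indexing on an ordinary Julia vector. A minimal sketch on a plain vector of names, where cols is an assumed stand-in for names(x):

```julia
# Assumed stand-in for names(x) of the five-column DataFrame.
cols = [:x1, :x2, :x3, :x4, :x5]

# One Bool per column: true keeps the column, false drops it.
mask = [true, false, true, false, true]

kept = cols[mask]
println(kept)
```

The mask above therefore keeps the first, third and fifth columns.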
Here we create a single column DataFrame,
x[[:x1]]
and here we access the vector contained in column :x1.
x[:x1]
We could grab the same vector by column number
x[1]
and remove everything from a DataFrame with empty!.
empty!(y)
Here we create a copy of x and delete the 3rd column from the copy with delete!.
z = copy(x)
x, delete!(z, 3)
Modify column by name¶
x = DataFrame([(i,j) for i in 1:3, j in 1:5])
With the following syntax, the existing column is modified without performing any copying.
x[:x1] = x[:x2]
x
We can also use the following syntax to add a new column at the end of a DataFrame.
x[:A] = [1,2,3]
x
A new column will be added to our DataFrame with the following syntax as well (7 is equal to ncol(x)+1).
x[7] = 11:13
x
Find column name¶
x = DataFrame([(i,j) for i in 1:3, j in 1:5])
We can check if a column with a given name exists via
:x1 in names(x)
and determine its index via
findfirst(names(x) .== :x2)
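The two lookups can be tried on a plain vector of Symbols, without a DataFrame. In this sketch cols is an assumed stand-in for names(x), and the predicate form findfirst(isequal(...), ...) is an alternative to the broadcasted comparison used above. Note that findfirst returns nothing when there is no match.

```julia
# Assumed stand-in for names(x).
cols = [:x1, :x2, :x3, :x4, :x5]

has_x1 = :x1 in cols                        # membership test
idx_x2 = findfirst(isequal(:x2), cols)      # index of the first match
idx_none = findfirst(isequal(:zz), cols)    # no such name

println((has_x1, idx_x2, idx_none))
```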