Introduction to DataFrames¶

Bogumił Kamiński, Apr 21, 2018

Reference¶

https://github.com/JuliaComputing/JuliaBoxTutorials/tree/master/introductory-tutorials/broader-topics-and-ecosystem/intro-to-julia-DataFrames

Series¶

https://deepstat.tistory.com/69 (01. constructors)(in English)
https://deepstat.tistory.com/70 (01. constructors)(in Korean)
https://deepstat.tistory.com/71 (02. basicinfo)(in English)
https://deepstat.tistory.com/72 (02. basicinfo)(in Korean)
https://deepstat.tistory.com/73 (03. missingvalues)(in English)
https://deepstat.tistory.com/74 (03. missingvalues)(in Korean)
https://deepstat.tistory.com/75 (04. loadsave)(in English)
https://deepstat.tistory.com/76 (04. loadsave)(in Korean)
https://deepstat.tistory.com/77 (05. columns)(in English)
https://deepstat.tistory.com/78 (05. columns)(in Korean)
https://deepstat.tistory.com/79 (06. rows)(in English)
https://deepstat.tistory.com/80 (06. rows)(in Korean)
https://deepstat.tistory.com/81 (07. factors)(in English)
https://deepstat.tistory.com/82 (07. factors)(in Korean)
https://deepstat.tistory.com/83 (08. joins)(in English)
https://deepstat.tistory.com/84 (08. joins)(in Korean)
https://deepstat.tistory.com/85 (09. reshaping)(in English)
https://deepstat.tistory.com/86 (09. reshaping)(in Korean)
https://deepstat.tistory.com/87 (10. transforms)(in English)
https://deepstat.tistory.com/88 (10. transforms)(in Korean)
https://deepstat.tistory.com/89 (11. performance)(in English)
https://deepstat.tistory.com/90 (11. performance)(in Korean)
https://deepstat.tistory.com/91 (12. pitfalls)(in English)
https://deepstat.tistory.com/92 (12. pitfalls)(in Korean)

using DataFrames

Possible pitfalls¶

Know what is copied when creating a `DataFrame`¶

x = DataFrame(rand(3, 5))

y = DataFrame(x)
x === y # no copying performed

┌ Warning: In the future DataFrame constructor called with a `DataFrame` argument will return a copy. Use `convert(DataFrame, df)` to avoid copying if `df` is a `DataFrame`.
│   caller = top-level scope at In[3]:1
└ @ Core In[3]:1

true

y = copy(x)
x === y # not the same object

false

all(x[i] === y[i] for i in 1:ncol(x)) # but the column vectors are the same objects

true

x = 1:3; y = [1, 2, 3]; df = DataFrame(x=x, y=y) # columns passed to the constructor or assigned later are not copied either, except ranges

y === df[:y] # the same object

true

typeof(x), typeof(df[:x]) # range is converted to a vector

(UnitRange{Int64}, Array{Int64,1})
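
The warning above says that the copying behaviour of the constructor was due to change. The following is a minimal sketch of how copying can be controlled explicitly, assuming a recent DataFrames.jl release (1.x) rather than the version that produced the outputs above. The `copycols` keyword and the `df.v` property access are taken from that newer API; the names `v`, `df_copy`, `df_alias`, `df2` and `df_r` are only illustrative.

```julia
using DataFrames

v = [1, 2, 3]

# Assumption: DataFrames.jl 1.x, where the constructor copies columns by default.
df_copy  = DataFrame(v = v)                    # column is a copy of v
df_alias = DataFrame(v = v; copycols = false)  # column is v itself

df_copy.v === v        # false: the data frame owns its own vector
df_alias.v === v       # true: same object, mutations are shared

# In 1.x, copy(df) also copies the column vectors.
df2 = copy(df_alias)
df2.v === df_alias.v   # false

# A range is materialized into a Vector when stored as a column.
df_r = DataFrame(r = 1:3)
df_r.r isa Vector{Int} # true
```

Asking for `copycols = false` is a deliberate opt-in to aliasing, so it is worth doing only when the source vector will not be modified elsewhere.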

Do not modify the parent of `GroupedDataFrame`¶

x = DataFrame(id=repeat([1,2], outer=3), x=1:6)
g = groupby(x, :id)

GroupedDataFrame with 2 groups based on key: :id
First Group: 3 rows
│ Row │ id    │ x     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 1     │ 3     │
│ 3   │ 1     │ 5     │
⋮
Last Group: 3 rows
│ Row │ id    │ x     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 2     │ 2     │
│ 2   │ 2     │ 4     │
│ 3   │ 2     │ 6     │

x[1:3, 1] = [2, 2, 2]
g # g is wrong now: it is only a view of x, and the groups were not recomputed

GroupedDataFrame with 2 groups based on key: :id
First Group: 3 rows
│ Row │ id    │ x     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 2     │ 1     │
│ 2   │ 2     │ 3     │
│ 3   │ 1     │ 5     │
⋮
Last Group: 3 rows
│ Row │ id    │ x     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 2     │ 2     │
│ 2   │ 2     │ 4     │
│ 3   │ 2     │ 6     │
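
One way to avoid the stale groups shown above is to group a copy of the table, or to call `groupby` again after changing the parent. This is a sketch only, assuming DataFrames.jl 1.x, where `copy` also copies the column vectors; `g_new` and `first_ids` are illustrative names.

```julia
using DataFrames

x = DataFrame(id = repeat([1, 2], outer = 3), x = 1:6)

# Assumption: DataFrames.jl 1.x, where copy(x) also copies the columns,
# so the grouped copy does not share data with x.
g = groupby(copy(x), :id)

x[1:3, :id] = [2, 2, 2]   # mutate the original table only

# The groups still reflect the data as it was when g was built.
first_ids = g[1].id

# To group the modified table, call groupby again.
g_new = groupby(x, :id)
```

After the mutation, `g` still holds two groups of three rows each, while `g_new` reflects the changed `id` column.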

Remember that you can filter columns of a `DataFrame` using booleans¶

using Random
Random.seed!(1)
x = DataFrame(rand(5, 5))

x[x[:x1] .< 0.25] # oops: a Boolean vector used as the only index selects columns, so we filtered columns, not rows

x[x[:x1] .< 0.25, :] # this is probably what we wanted: filter rows and keep all columns
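
The same distinction can be made explicit in newer code. The sketch below assumes DataFrames.jl 1.x, where a row selector and a column selector are given together. The table `df` is a small hypothetical example, not the random data used above.

```julia
using DataFrames

# Hypothetical small table, not the random data used above.
df = DataFrame(x1 = [0.1, 0.5, 0.2], x2 = [1.0, 2.0, 3.0])

# Assumption: DataFrames.jl 1.x indexing with both a row and a column selector.
rows_a = df[df.x1 .< 0.25, :]               # Boolean mask in the row position
rows_b = filter(:x1 => v -> v < 0.25, df)   # the same rows selected with filter
cols   = df[:, [true, false]]               # Boolean mask in the column position
```

Writing the mask in an explicit position, or using `filter`, leaves no doubt about whether rows or columns are being selected.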

Column selection for `DataFrame` creates aliases unless explicitly copied¶

x = DataFrame(a=1:3)
x[:b] = x[1] # alias
x[:c] = x[:, 1] # also alias
x[:d] = x[1][:] # copy
x[:e] = copy(x[1]) # explicit copy
display(x)
x[1,1] = 100
display(x)

┌ Warning: indexing with colon as row will create a copy in the future use df[col_inds] to get the columns without copying
│   caller = top-level scope at In[14]:3
└ @ Core In[14]:3
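
The warning above announces that indexing with a colon as the row selector will copy in the future. As a hedged sketch of how aliasing and copying look under that newer rule, assuming DataFrames.jl 1.x (where `!` as the row selector returns the stored vector and `:` returns a copy):

```julia
using DataFrames

x = DataFrame(a = [1, 2, 3])

# Assumption: DataFrames.jl 1.x indexing rules.
x.b = x.a            # alias: stores the same vector as column a
x.c = x[!, :a]       # alias: `!` returns the stored vector without copying
x.d = x[:, :a]       # copy: `:` as the row index copies the column
x.e = copy(x.a)      # explicit copy

x[1, :a] = 100       # write through column a
```

After the assignment, the first entry of `b` and `c` is 100 as well, because they share storage with `a`, while `d` and `e` keep the value 1.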

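Output of `x = DataFrame(rand(3, 5))` (first section):

│ Row │ x1         │ x2       │ x3       │ x4       │ x5        │
│     │ Float64    │ Float64  │ Float64  │ Float64  │ Float64   │
├─────┼────────────┼──────────┼──────────┼──────────┼───────────┤
│ 1   │ 0.379061   │ 0.53744  │ 0.776147 │ 0.253968 │ 0.0953152 │
│ 2   │ 0.536293   │ 0.907652 │ 0.371186 │ 0.283286 │ 0.0397663 │
│ 3   │ 0.00914099 │ 0.546185 │ 0.498417 │ 0.364973 │ 0.453046  │

Output of `x = DataFrame(rand(5, 5))` (Boolean filtering section):

│ Row │ x1         │ x2       │ x3       │ x4        │ x5        │
│     │ Float64    │ Float64  │ Float64  │ Float64   │ Float64   │
├─────┼────────────┼──────────┼──────────┼───────────┼───────────┤
│ 1   │ 0.236033   │ 0.210968 │ 0.555751 │ 0.209472  │ 0.0769509 │
│ 2   │ 0.346517   │ 0.951916 │ 0.437108 │ 0.251379  │ 0.640396  │
│ 3   │ 0.312707   │ 0.999905 │ 0.424718 │ 0.0203749 │ 0.873544  │
│ 4   │ 0.00790928 │ 0.251662 │ 0.773223 │ 0.287702  │ 0.278582  │
│ 5   │ 0.488613   │ 0.986666 │ 0.28119  │ 0.859512  │ 0.751313  │

Output of `x[x[:x1] .< 0.25]` (columns `x1` and `x4` selected):

│ Row │ x1         │ x4        │
│     │ Float64    │ Float64   │
├─────┼────────────┼───────────┤
│ 1   │ 0.236033   │ 0.209472  │
│ 2   │ 0.346517   │ 0.251379  │
│ 3   │ 0.312707   │ 0.0203749 │
│ 4   │ 0.00790928 │ 0.287702  │
│ 5   │ 0.488613   │ 0.859512  │

Output of `x[x[:x1] .< 0.25, :]` (rows 1 and 4 selected):

│ Row │ x1         │ x2       │ x3       │ x4       │ x5        │
│     │ Float64    │ Float64  │ Float64  │ Float64  │ Float64   │
├─────┼────────────┼──────────┼──────────┼──────────┼───────────┤
│ 1   │ 0.236033   │ 0.210968 │ 0.555751 │ 0.209472 │ 0.0769509 │
│ 2   │ 0.00790928 │ 0.251662 │ 0.773223 │ 0.287702 │ 0.278582  │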
