티스토리 뷰
Introduction to DataFrames¶
Bogumił Kamiński, May 23, 2018
Reference¶
Series¶
- https://deepstat.tistory.com/69 (01. constructors)(in English)
- https://deepstat.tistory.com/70 (01. constructors)(한글)
Let's get started by loading the DataFrames package.
using DataFrames
Constructors and conversion¶
Constructors¶
In this section, you'll see many ways to create a DataFrame using the DataFrame() constructor.
First, we could create an empty DataFrame,
DataFrame() # empty DataFrame
Or we could call the constructor using keyword arguments to add columns to the DataFrame.
DataFrame(A=1:3, B=rand(3), C=randstring.([3,3,3]))
using Random
DataFrame(A=1:3, B=rand(3), C=Random.randstring.([3,3,3]))
We can create a DataFrame from a dictionary, in which case keys from the dictionary will be sorted to create the DataFrame columns.
x = Dict("A" => [1,2], "B" => [true, false], "C" => ['a', 'b'])
DataFrame(x)
Rather than explicitly creating a dictionary first, as above, we could pass DataFrame arguments with the syntax of dictionary key-value pairs.
Note that in this case, we use symbols to denote the column names and arguments are not sorted. For example, :A, the symbol, produces A, the name of the first column here:
DataFrame(:A => [1,2], :B => [true, false], :C => ['a', 'b'])
Here we create a DataFrame from a vector of vectors, and each vector becomes a column.
DataFrame([rand(3) for i in 1:3])
For now we can construct a single DataFrame from a Vector of atoms, creating a DataFrame with a single row. this will throw an error.
DataFrame(rand(3))
Instead use a transposed vector if you have a vector of atoms (in this way you effectively pass a two dimensional array to the constructor which is supported).
DataFrame(transpose([1, 2, 3]))
Pass a second argument to give the columns names.
DataFrame([1:3, 4:6, 7:9], [:A, :B, :C])
Here we create a DataFrame from a matrix,
DataFrame(rand(3,4))
and here we do the same but also pass column names.
DataFrame(rand(3,4), Symbol.('a':'d'))
We can also construct an uninitialized DataFrame.
Here we pass column types, names and number of rows; we get missing in column :C because Any >: Missing.
DataFrame([Int, Float64, Any], [:A, :B, :C], 1)
Here we create a DataFrame, but column :C is #undef.
DataFrame([Int, Float64, String], [:A, :B, :C], 1)
To initialize a DataFrame with column names, but no rows use
DataFrame([Int, Float64, String], [:A, :B, :C], 0)
This syntax gives us a quick way to create homogenous DataFrame.
DataFrame(Int, 3, 5)
This example is similar, but has nonhomogenous columns.
DataFrame([Int, Float64], 4)
Finally, we can create a DataFrame by copying an existing DataFrame.
Note that copy creates a shallow copy.
y = DataFrame(x)
z = copy(x)
(x === y), (x === z), isequal(x, z)
Conversion to a matrix¶
Let's start by creating a DataFrame with two rows and two columns.
x = DataFrame(x=1:2, y=["A", "B"])
We can create a matrix by passing this DataFrame to Matrix.
Matrix(x)
This would work even if the DataFrame had some missings:
x = DataFrame(x=1:2, y=[missing,"B"])
Matrix(x)
In the two previous matrix examples, Julia created matrices with elements of type Any. We can see more clearly that the type of matrix is inferred when we pass, for example, a DataFrame of integers to Matrix, creating a 2D Array of Int64s:
x = DataFrame(x=1:2, y=3:4)
Matrix(x)
In this next example, Julia correctly identifies that Union is needed to express the type of the resulting Matrix (which contains missings).
x = DataFrame(x=1:2, y=[missing,4])
Matrix(x)
Note that we can't force a conversion of missing values to Ints!
Matrix{Int}(x)
Handling of duplicate column names¶
We cannot use duplicate names in DataFrame. We can pass the makeunique keyword argument to get deduplicated names.
df = DataFrame(:a=>1, :a=>2, :a_1=>3; makeunique=true)
Otherwise, duplicates will not be allowed.
df = DataFrame(:a=>1, :a=>2, :a_1=>3)
A constructor that is passed column names as keyword arguments is a corner case.
You cannot pass makeunique to allow duplicates here.
df = DataFrame(a=1, a=2, makeunique=true)
'Flux in Julia > Learning Julia (Intro_to_Julia_DFs)' 카테고리의 다른 글
| 03. missingvalues (한글) (0) | 2018.10.09 |
|---|---|
| 03. missingvalues (0) | 2018.10.09 |
| 02. basicinfo (한글) (0) | 2018.10.08 |
| 02. basicinfo (0) | 2018.10.08 |
| 01. Constructors (한글) (0) | 2018.10.07 |
