딥스탯 2018. 10. 8. 20:00
02_basicinfo
In [1]:
using DataFrames # load package

Getting basic information about a data frame

Let's start by creating a DataFrame object, x, so that we can learn how to get information on that data frame.

In [2]:
x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])
Out[2]:
     A      B         C
     Int64  Float64⍰  String
1    1      1.0       a
2    2      missing   b

The standard size function works to get dimensions of the DataFrame,

In [3]:
size(x), size(x, 1), size(x, 2)
Out[3]:
((2, 3), 2, 3)

as do nrow and ncol, familiar from R; length gives the number of columns.

In [4]:
nrow(x), ncol(x), length(x)
Out[4]:
(2, 3, 3)

describe gives basic summary statistics of data in your DataFrame.

In [5]:
describe(x)
Out[5]:
     variable  mean    min  median  max  nunique  nmissing  eltype
     Symbol    Union…  Any  Union…  Any  Union…   Union…    DataType
1    A         1.5     1    1.5     2                       Int64
2    B         1.0     1.0  1.0     1.0           1         Float64
3    C                 a            b    2                  String

Use showcols to get information about the columns stored in a DataFrame.

In [6]:
showcols(x)
┌ Warning: `showcols(df::AbstractDataFrame, all::Bool=false, values::Bool=true)` is deprecated, use `describe(df, stats=[:eltype, :nmissing, :first, :last])` instead.
│   caller = showcols(::DataFrame) at deprecated.jl:54
└ @ DataFrames ./deprecated.jl:54
Out[6]:
     variable  eltype    nmissing  first  last
     Symbol    DataType  Union…    Any    Any
1    A         Int64               1      2
2    B         Float64   1         1.0    missing
3    C         String              a      b
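
The deprecation warning above names describe as the replacement for showcols. A minimal sketch of that call follows. The positional form of the statistics arguments is an assumption about newer DataFrames releases; the keyword form quoted in the warning is the one that applies to the release used in this notebook.

```julia
using DataFrames

x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

# Keyword form, as quoted in the deprecation warning (release used in this notebook):
# describe(x, stats = [:eltype, :nmissing, :first, :last])

# Positional form (assumed: a newer DataFrames release)
info = describe(x, :eltype, :nmissing, :first, :last)
```

The result is itself a DataFrame with one row per column of x, so it can be indexed and filtered like any other data frame.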

names will return the names of all columns,

In [7]:
names(x)
Out[7]:
3-element Array{Symbol,1}:
 :A
 :B
 :C

and eltypes returns their types.

In [8]:
eltypes(x)
Out[8]:
3-element Array{Type,1}:
 Int64                  
 Union{Missing, Float64}
 String                 
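
In later DataFrames releases (an assumption about the installed version, not something this notebook ran), names returns strings rather than symbols and eltypes is no longer available. A hedged sketch of calls that give the same information there:

```julia
using DataFrames

x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

# Column names as symbols, regardless of what names(x) returns
col_symbols = propertynames(x)

# Element type of every column, without relying on eltypes
col_types = eltype.(eachcol(x))
```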

Here we create a larger DataFrame

In [9]:
y = DataFrame(rand(1:10, 1000, 10));

and then we can use head to peek into its top rows

In [10]:
head(y)
Out[10]:
     x1     x2     x3     x4     x5     x6     x7     x8     x9     x10
     Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64
1    7      1      2      1      4      3      5      1      5      10
2    10     2      2      5      9      6      2      10     9      3
3    8      10     8      10     4      7      4      4      10     3
4    6      3      4      8      1      3      2      8      7      10
5    4      10     2      3      10     3      4      3      4      3
6    7      5      6      10     6      7      9      5      7      8

and tail to see its bottom rows.

In [11]:
tail(y, 3)
Out[11]:
     x1     x2     x3     x4     x5     x6     x7     x8     x9     x10
     Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64
1    4      1      8      7      1      1      3      9      8      7
2    10     10     3      2      10     5      9      1      10     9
3    6      6      2      7      8      7      3      3      7      2
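
Later DataFrames releases dropped head and tail in favor of first and last from Base. The sketch below assumes such a release; the :auto argument, which asks the matrix constructor to generate the column names x1, x2, ..., is likewise part of the newer API and not of the version used in this notebook.

```julia
using DataFrames

# Matrix constructor with automatically generated column names (newer API, assumed)
y = DataFrame(rand(1:10, 1000, 10), :auto)

top = first(y, 6)     # first six rows, like head(y)
bottom = last(y, 3)   # last three rows, like tail(y, 3)
```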

Most elementary get and set operations

Given the DataFrame, x, here are three ways to grab one of its columns as a Vector:

In [12]:
x[1], x[:A], x[:, 1]
┌ Warning: indexing with colon as row will create a copy in the future use df[col_inds] to get the columns without copying
│   caller = top-level scope at In[12]:1
└ @ Core In[12]:1
Out[12]:
([1, 2], [1, 2], [1, 2])
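
The warning above signals that x[:, 1] is meant to return a copy in future releases. A sketch of how the copy/no-copy distinction looks in newer DataFrames releases, where the x[!, col] syntax is available (an assumption about the installed version; it does not exist in the release used in this notebook):

```julia
using DataFrames

x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

col_property = x.A        # the stored column itself, no copy
col_nocopy   = x[!, :A]   # the same stored column, no copy
col_copy     = x[:, :A]   # a freshly allocated copy of the column
```

Mutating col_property or col_nocopy changes x, while mutating col_copy leaves x untouched.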

To grab one row as a DataFrame, we can index as follows.

In [13]:
x[1, :]
Out[13]:
     A      B         C
     Int64  Float64⍰  String
1    1      1.0       a

We can grab a single cell with the same syntax used to index an element of an array.

In [14]:
x[1, 1]
Out[14]:
1

A range of cells can be set to a scalar,

In [15]:
x[1:2, 1:2] = 1
x
Out[15]:
     A      B         C
     Int64  Float64⍰  String
1    1      1.0       a
2    1      1.0       b

to a vector of length equal to the number of assigned rows,

In [16]:
x[1:2, 1:2] = [1,2]
x
Out[16]:
     A      B         C
     Int64  Float64⍰  String
1    1      1.0       a
2    2      2.0       b

or to another data frame of matching size.

In [17]:
x[1:2, 1:2] = DataFrame([5 6; 7 8])
x
Out[17]:
     A      B         C
     Int64  Float64⍰  String
1    5      6.0       a
2    7      8.0       b
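
The three assignments above rely on behavior of the DataFrames release used in this notebook. As a hedged sketch, assuming a newer release in which scalars and vectors must be broadcast with .= and an assigned data frame must carry matching column names, the same steps could look like this:

```julia
using DataFrames

x = DataFrame(A = [1, 2], B = [1.0, missing], C = ["a", "b"])

# Scalar: broadcast into the selected block of cells
x[1:2, 1:2] .= 1
after_scalar = copy(x)

# Vector: broadcast down each selected column
x[1:2, 1:2] .= [1, 2]
after_vector = copy(x)

# Data frame: column names must match the target columns (assumed newer behavior)
x[1:2, 1:2] = DataFrame(A = [5, 7], B = [6, 8])
```

Column C is never selected, so it keeps its original values throughout.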