딥스탯 2018. 10. 1. 19:08
9_Julia_is_fast (한글)

Julia is fast

출처

https://github.com/JuliaComputing/JuliaBoxTutorials/tree/master/introductory-tutorials/intro-to-julia (github : JuliaComputing/JuliaBoxTutorials/introductory-tutorials/intro-to-julia/)

Topics:

  1. 함수 sum의 정의
  2. sum의 구현 및 벤치마킹
    1. C (직접 작성)
    2. C (-ffast-math 이용)
    3. python (내장함수)
    4. python (numpy 이용)
    5. python (직접 작성)
    6. Julia (내장함수)
    7. Julia (직접 작성)
    8. Julia (SIMD 이용)
  3. 벤치마킹 결론

함께보기

종종, 벤치마크를 이용해서 언어들을 비교한다. 이런 벤치마크를 통해서, 벤치마킹 대상을 더 잘 파악하게되고, 무엇이 차이인지 알게된다.

이 notebook의 목적은 간단한 벤치마크를 보여주기 위함이다.

(이 자료는 MIT의 Steven Johnson의 훌륭한 강의로부터 시작되었다: https://github.com/stevengj/18S096/blob/master/lectures/lecture1/Boxes-and-registers.ipynb.)

함수 sum의 정의

sum: 이해하기 쉬운 연산

sum(a) 라는 숫자를 합하는 함수를 생각해보자. 이는 아래와 같은 수식으로 표현된다.$$sum(a) = \sum_{i=1}^n a_i$$ 단, $n$은 a의 길이이다.

In [1]:
a = rand(10^7)
Out[1]:
10000000-element Array{Float64,1}:
 0.8764393962070791 
 0.53427827890005   
 0.4150486162292266 
 0.1622462878305566 
 0.49160883341727346
 0.4557914860681669 
 0.7135174388683116 
 0.12944909127630067
 0.3940247342295471 
 0.4140105339224238 
 0.4262468310919194 
 0.5840212988440989 
 0.9983949516598687 
 ⋮                  
 0.15353075451314924
 0.08236678960318766
 0.9855194225754351 
 0.7695038205642308 
 0.3111062654296637 
 0.40754803998149813
 0.1251454815669082 
 0.9868489650864021 
 0.6327599808807194 
 0.7865294384772965 
 0.6516479263353987 
 0.5068312780916264 
In [2]:
sum(a)
Out[2]:
4.999529889971698e6

각 원소가 평균 0.5인 분포에서 생성되는 난수이므로, 기대되는 결과는 0.5 * 10^7이다.

sum의 구현 및 벤치마킹

In [3]:
@time sum(a)
  0.008004 seconds (5 allocations: 176 bytes)
Out[3]:
4.999529889971698e6
In [4]:
@time sum(a)
  0.010160 seconds (5 allocations: 176 bytes)
Out[4]:
4.999529889971698e6
In [5]:
@time sum(a)
  0.008193 seconds (5 allocations: 176 bytes)
Out[5]:
4.999529889971698e6

@time 매크로로부터 얻어지는 결과는 조금씩 다르기 때문에, 벤치마킹 하기에 최적의 선택은 아니다.

운 좋게도, Julia는 BenchmarkTools.jl 패키지가 있어서 쉽고 정확한 벤치마킹을 할 수 있다.

In [6]:
using Pkg
Pkg.add("BenchmarkTools")
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
  Updating `~/.julia/environments/v1.0/Project.toml`
 [no changes]
  Updating `~/.julia/environments/v1.0/Manifest.toml`
 [no changes]
In [7]:
using BenchmarkTools

2.A C (직접 작성)

C는 보통 좋은 표준이라고 한다. 왜냐하면 사람에게 어렵고, 컴퓨터에게 좋은 언어이기 때문이다. 그럼에도 불구하고, C 사용자는 좋든 나쁘든 많은 종류의 최적화를 사용할 수 있다.

이 notebook을 만든 사람은 C에 대해서 말하려는 것도 아니고, 아래 코드를 읽을 것도 아니지만, Julia session에서 C 코드를 돌릴 수 있다는 것을 아는 것만으로도 충분하다. 참고로 """ 기호는 여러 줄의 문자를 넣으려고 쓰는 것이다.

In [8]:
using Libdl
C_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
"""

const Clib = tempname()   # 임시 파일을 만든다.

# gcc에 C_code를 넣어서 공유 라이브러리를 컴파일한다.
# (gcc가 설치돼 있을때만 작동한다.):

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# C 함수를 불러오는 Julia 함수를 정의한다.
c_sum(X::Array{Float64}) = ccall(("c_sum", Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X)
Out[8]:
c_sum (generic function with 1 method)
In [9]:
c_sum(a)
Out[9]:
4.999529889971053e6
In [10]:
c_sum(a)  sum(a) # \approx 를 치고 <TAB>을 누르면 ≈ 기호를 쓸 수 있다.
Out[10]:
true
In [11]:
c_sum(a) - sum(a)
Out[11]:
-6.444752216339111e-7
In [12]:
  # `isapprox` 함수의 별명이라는 것을 알 수 있다.
Out[12]:
isapprox (generic function with 8 methods)
In [13]:
?isapprox
search: isapprox

Out[13]:
isapprox(x, y; rtol::Real=atol>0 ? 0 : √eps, atol::Real=0, nans::Bool=false, norm::Function)

Inexact equality comparison: true if norm(x-y) <= max(atol, rtol*max(norm(x), norm(y))). The default atol is zero and the default rtol depends on the types of x and y. The keyword argument nans determines whether or not NaN values are considered equal (defaults to false).

For real or complex floating-point values, if an atol > 0 is not specified, rtol defaults to the square root of eps of the type of x or y, whichever is bigger (least precise). This corresponds to requiring equality of about half of the significand digits. Otherwise, e.g. for integer arguments or if an atol > 0 is supplied, rtol defaults to zero.

x and y may also be arrays of numbers, in which case norm defaults to vecnorm but may be changed by passing a norm::Function keyword argument. (For numbers, norm is the same thing as abs.) When x and y are arrays, if norm(x-y) is not finite (i.e. ±Inf or NaN), the comparison falls back to checking whether all elements of x and y are approximately equal component-wise.

The binary operator is equivalent to isapprox with the default arguments, and x ≉ y is equivalent to !isapprox(x,y).

Note that x ≈ 0 (i.e., comparing to zero with the default tolerances) is equivalent to x == 0 since the default atol is 0. In such cases, you should either supply an appropriate atol (or use norm(x) ≤ atol) or rearrange your code (e.g. use x ≈ y rather than x - y ≈ 0). It is not possible to pick a nonzero atol automatically because it depends on the overall scaling (the "units") of your problem: for example, in x - y ≈ 0, atol=1e-9 is an absurdly small tolerance if x is the radius of the Earth in meters, but an absurdly large tolerance if x is the radius of a Hydrogen atom in meters.

Examples

jldoctest
julia> 0.1 ≈ (0.1 - 1e-10)
true

julia> isapprox(10, 11; atol = 2)
true

julia> isapprox([10.0^9, 1.0], [10.0^9, 2.0])
true

julia> 1e-10 ≈ 0
false

julia> isapprox(1e-10, 0, atol=1e-8)
true

이제 C 코드를 Julia에서 바로 벤치마크 할 수 있다.

In [14]:
c_bench = @benchmark c_sum($a)
Out[14]:
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.737 ms (0.00% GC)
  median time:      9.369 ms (0.00% GC)
  mean time:        11.719 ms (0.00% GC)
  maximum time:     25.851 ms (0.00% GC)
  --------------
  samples:          427
  evals/sample:     1
In [15]:
println("C: Fastest time was $(minimum(c_bench.times) / 1e6) msec")
C: Fastest time was 8.736966 msec
In [16]:
d = Dict()  # a "dictionary", i.e. an associative array
d["C"] = minimum(c_bench.times) / 1e6  # in milliseconds
d
Out[16]:
Dict{Any,Any} with 1 entry:
  "C" => 8.73697

2.B C (-ffast-math 이용)

만일 C가 부동 소수점 연산을 재정렬하도록 허용하면 SIMD (single instruction, multiple data) instruction으로 벡터화될거다.

In [17]:
const Clib_fastmath = tempname()   # make a temporary file

# The same as above but with a -ffast-math flag added
open(`gcc -fPIC -O3 -msse3 -xc -shared -ffast-math -o $(Clib_fastmath * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
c_sum_fastmath(X::Array{Float64}) = ccall(("c_sum", Clib_fastmath), Float64, (Csize_t, Ptr{Float64}), length(X), X)
Out[17]:
c_sum_fastmath (generic function with 1 method)
In [18]:
c_fastmath_bench = @benchmark $c_sum_fastmath($a)
Out[18]:
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.579 ms (0.00% GC)
  median time:      5.793 ms (0.00% GC)
  mean time:        5.856 ms (0.00% GC)
  maximum time:     9.699 ms (0.00% GC)
  --------------
  samples:          853
  evals/sample:     1
In [19]:
d["C -ffast-math"] = minimum(c_fastmath_bench.times) / 1e6  # in milliseconds
Out[19]:
5.578917

2.C python (내장함수)

PyCall 패키지를 이용해서 Python을 Julia에서 사용할 수 있다:

In [20]:
using Pkg; Pkg.add("PyCall")
using PyCall
 Resolving package versions...
  Updating `~/.julia/environments/v1.0/Project.toml`
 [no changes]
  Updating `~/.julia/environments/v1.0/Manifest.toml`
 [no changes]
In [21]:
# Python 내장 "sum"함수를 불러온다:
pysum = pybuiltin("sum")
Out[21]:
PyObject <built-in function sum>
In [22]:
pysum(a)
Out[22]:
4.999529889971053e6
In [23]:
pysum(a)  sum(a)
Out[23]:
true
In [24]:
py_list_bench = @benchmark $pysum($a)
Out[24]:
BenchmarkTools.Trial: 
  memory estimate:  368 bytes
  allocs estimate:  8
  --------------
  minimum time:     925.317 ms (0.00% GC)
  median time:      937.152 ms (0.00% GC)
  mean time:        938.926 ms (0.00% GC)
  maximum time:     961.045 ms (0.00% GC)
  --------------
  samples:          6
  evals/sample:     1
In [25]:
d["Python 내장"] = minimum(py_list_bench.times) / 1e6
d
Out[25]:
Dict{Any,Any} with 3 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "C -ffast-math" => 5.57892

2.D python (numpy 이용)

하드웨어 "SIMD"를 활용하지만 작동 할 때만 작동한다.

numpy는 Python에서 호출 할 수 있는 최적화된 C 라이브러리다. 다음과 같이 Julia 내에 불러올 수 있다.

In [26]:
numpy_sum = pyimport("numpy")["sum"]

py_numpy_bench = @benchmark $numpy_sum($a)
Out[26]:
BenchmarkTools.Trial: 
  memory estimate:  368 bytes
  allocs estimate:  8
  --------------
  minimum time:     5.346 ms (0.00% GC)
  median time:      5.432 ms (0.00% GC)
  mean time:        5.456 ms (0.00% GC)
  maximum time:     6.372 ms (0.00% GC)
  --------------
  samples:          915
  evals/sample:     1
In [27]:
numpy_sum(a)
Out[27]:
4.999529889971708e6
In [28]:
numpy_sum(a)  sum(a)
Out[28]:
true
In [29]:
d["Python numpy"] = minimum(py_numpy_bench.times) / 1e6
d
Out[29]:
Dict{Any,Any} with 4 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "C -ffast-math" => 5.57892

2.E python (직접 작성)

In [30]:
py"""
def py_sum(A):
    s = 0.0
    for a in A:
        s += a
    return s
"""

sum_py = py"py_sum"
Out[30]:
PyObject <function py_sum at 0x7f7c6c9b9ea0>
In [31]:
py_hand = @benchmark $sum_py($a)
Out[31]:
BenchmarkTools.Trial: 
  memory estimate:  368 bytes
  allocs estimate:  8
  --------------
  minimum time:     1.018 s (0.00% GC)
  median time:      1.133 s (0.00% GC)
  mean time:        1.140 s (0.00% GC)
  maximum time:     1.278 s (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1
In [32]:
sum_py(a)
Out[32]:
4.999529889971053e6
In [33]:
sum_py(a)  sum(a)
Out[33]:
true
In [34]:
d["Python 직접 작성"] = minimum(py_hand.times) / 1e6
d
Out[34]:
Dict{Any,Any} with 5 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "Python 직접 작…   => 1017.68
  "C -ffast-math" => 5.57892

2.F Julia (내장함수)

C가 아니라 Julia로 바로 써졌다!

In [35]:
@which sum(a)
Out[35]:
sum(a::AbstractArray) in Base at reducedim.jl:645
In [36]:
j_bench = @benchmark sum($a)
Out[36]:
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.333 ms (0.00% GC)
  median time:      5.497 ms (0.00% GC)
  mean time:        6.234 ms (0.00% GC)
  maximum time:     16.190 ms (0.00% GC)
  --------------
  samples:          801
  evals/sample:     1
In [37]:
d["Julia 내장"] = minimum(j_bench.times) / 1e6
d
Out[37]:
Dict{Any,Any} with 6 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "Python 직접 작…   => 1017.68
  "C -ffast-math" => 5.57892
  "Julia 내장"      => 5.33261

2.G Julia (직접 작성)

In [38]:
function mysum(A)   
    s = 0.0 # s = zero(eltype(a))
    for a in A
        s += a
    end
    s
end
Out[38]:
mysum (generic function with 1 method)
In [39]:
j_bench_hand = @benchmark mysum($a)
Out[39]:
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.276 ms (0.00% GC)
  median time:      9.822 ms (0.00% GC)
  mean time:        10.908 ms (0.00% GC)
  maximum time:     24.930 ms (0.00% GC)
  --------------
  samples:          458
  evals/sample:     1
In [40]:
d["Julia 직접 작성"] = minimum(j_bench_hand.times) / 1e6
d
Out[40]:
Dict{Any,Any} with 7 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "Python 직접 작…   => 1017.68
  "Julia 직접 작…    => 9.27648
  "C -ffast-math" => 5.57892
  "Julia 내장"      => 5.33261

2.H Julia (SIMD 이용)

In [41]:
function mysum_simd(A)   
    s = 0.0 # s = zero(eltype(A))
    @simd for a in A
        s += a
    end
    s
end
Out[41]:
mysum_simd (generic function with 1 method)
In [42]:
j_bench_hand_simd = @benchmark mysum_simd($a)
Out[42]:
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.306 ms (0.00% GC)
  median time:      5.438 ms (0.00% GC)
  mean time:        5.972 ms (0.00% GC)
  maximum time:     17.020 ms (0.00% GC)
  --------------
  samples:          836
  evals/sample:     1
In [43]:
mysum_simd(a)
Out[43]:
4.9995298899717005e6
In [44]:
d["Julia SIMD이용"] = minimum(j_bench_hand_simd.times) / 1e6
d
Out[44]:
Dict{Any,Any} with 8 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "Python 직접 작…   => 1017.68
  "Julia 직접 작…    => 9.27648
  "Julia SIMD이용…  => 5.30645
  "C -ffast-math" => 5.57892
  "Julia 내장"      => 5.33261

벤치마킹 결론

In [45]:
for (key, value) in sort(collect(d), by=last)
    println(rpad(key, 25, "."), lpad(round(value; digits=2), 6, "."))
end
Julia SIMD이용...............5.31
Julia 내장...................5.33
Python numpy...............5.35
C -ffast-math..............5.58
C..........................8.74
Julia 직접 작성................9.28
Python 내장................925.32
Python 직접 작성.............1017.68