Julia is fast

Source

https://github.com/JuliaComputing/JuliaBoxTutorials/tree/master/introductory-tutorials/intro-to-julia (github : JuliaComputing/JuliaBoxTutorials/introductory-tutorials/intro-to-julia/)

Topics:

Defining the sum function
Implementing and benchmarking sum
Benchmark conclusions

See also

http://deepstat.tistory.com/45 (01. Getting started)(in English)
http://deepstat.tistory.com/46 (01. Getting started)(in Korean)
http://deepstat.tistory.com/47 (02. Strings)(in English)
http://deepstat.tistory.com/48 (02. Strings)(in Korean)
http://deepstat.tistory.com/49 (03. Data structures)(in English)
http://deepstat.tistory.com/50 (03. Data structures)(in Korean)
http://deepstat.tistory.com/51 (04. Loops)(in English)
http://deepstat.tistory.com/52 (04. Loops)(in Korean)
http://deepstat.tistory.com/53 (05. Conditionals)(in English)
http://deepstat.tistory.com/54 (05. Conditionals)(in Korean)
http://deepstat.tistory.com/55 (06. Functions)(in English)
http://deepstat.tistory.com/56 (06. Functions)(in Korean)
http://deepstat.tistory.com/57 (07. Packages)(in English)
http://deepstat.tistory.com/58 (07. Packages)(in Korean)
http://deepstat.tistory.com/59 (08. Plotting)(in English)
http://deepstat.tistory.com/60 (08. Plotting)(in Korean)
http://deepstat.tistory.com/61 (09. Julia is fast)(in English)

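Benchmarks are often used to compare languages. Working through one makes it clearer what exactly is being measured and what explains the differences.

The purpose of this notebook is to show a simple benchmark.

(This material began life as a wonderful lecture by Steven Johnson at MIT: https://github.com/stevengj/18S096/blob/master/lectures/lecture1/Boxes-and-registers.ipynb.)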

Defining the sum function

sum: an easy-to-understand operation

Consider the function sum(a), which adds up the numbers in a. It is given by $$\mathrm{sum}(a) = \sum_{i=1}^n a_i,$$ where $n$ is the length of a.

a = rand(10^7)

10000000-element Array{Float64,1}:
 0.8764393962070791 
 0.53427827890005   
 0.4150486162292266 
 0.1622462878305566 
 0.49160883341727346
 0.4557914860681669 
 0.7135174388683116 
 0.12944909127630067
 0.3940247342295471 
 0.4140105339224238 
 0.4262468310919194 
 0.5840212988440989 
 0.9983949516598687 
 ⋮                  
 0.15353075451314924
 0.08236678960318766
 0.9855194225754351 
 0.7695038205642308 
 0.3111062654296637 
 0.40754803998149813
 0.1251454815669082 
 0.9868489650864021 
 0.6327599808807194 
 0.7865294384772965 
 0.6516479263353987 
 0.5068312780916264

sum(a)

4.999529889971698e6

Each element is a random number drawn from a distribution with mean 0.5 (rand samples uniformly from [0, 1)), so the expected result is 0.5 * 10^7.
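As a quick sanity check of that expectation, the sketch below draws a smaller sample with a fixed seed and compares its sum with n/2. The seed, the sample size, and the names `rng`, `n`, `x`, `expected`, and `spread` are choices made for this illustration and are not part of the original notebook. For independent uniform draws on [0, 1), the standard deviation of the sum is sqrt(n/12), so the deviation should be a small multiple of that.

```julia
using Random

rng = MersenneTwister(1234)     # fixed seed (arbitrary) so the run is repeatable
n = 10^6                        # smaller than the 10^7 used above, to keep it quick
x = rand(rng, n)                # uniform draws on [0, 1), mean 0.5

expected = 0.5 * n              # expected value of the sum
spread = sqrt(n / 12)           # standard deviation of the sum of n uniform draws

println("sum      = ", sum(x))
println("expected = ", expected)
println("deviation in standard deviations = ", (sum(x) - expected) / spread)
```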

Implementing and benchmarking sum

@time sum(a)

  0.008004 seconds (5 allocations: 176 bytes)

4.999529889971698e6

@time sum(a)

  0.010160 seconds (5 allocations: 176 bytes)

4.999529889971698e6

@time sum(a)

  0.008193 seconds (5 allocations: 176 bytes)

4.999529889971698e6

The results from the @time macro vary a little from run to run, so it is not the best choice for benchmarking.

Luckily, Julia has the BenchmarkTools.jl package, which makes benchmarking easy and accurate.
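To illustrate why repeated measurement helps, here is a rough sketch using only Base Julia: time the call several times with `@elapsed` and keep the minimum, which is the sample least disturbed by noise from the rest of the system. The helper name `min_elapsed` and the repetition count are hypothetical; BenchmarkTools does this kind of repeated sampling far more carefully.

```julia
# Hypothetical helper (not part of BenchmarkTools): run `f(x)` `reps` times
# and return the smallest wall-clock time in seconds.
function min_elapsed(f, x; reps = 5)
    f(x)                          # warm-up call so compilation is not timed
    best = Inf
    for _ in 1:reps
        t = @elapsed f(x)
        best = min(best, t)
    end
    return best
end

v = rand(10^6)
println("fastest of 5 runs: ", min_elapsed(sum, v) * 1e3, " ms")
```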

using Pkg
Pkg.add("BenchmarkTools")

  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
  Updating `~/.julia/environments/v1.0/Project.toml`
 [no changes]
  Updating `~/.julia/environments/v1.0/Manifest.toml`
 [no changes]

using BenchmarkTools

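2.A C (hand-written)

C is often regarded as the gold standard, because it is hard on the human and nice for the machine. Even so, there are many kinds of optimizations in C that a programmer may or may not end up taking advantage of.

The author of this notebook is not going to discuss C or walk through the code below; it is enough to know that C code can be compiled and run from within a Julia session. Note that the """ delimiters wrap a multi-line string.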

using Libdl
C_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
"""

const Clib = tempname()   # generate a temporary file name

# Compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# Define a Julia function that calls the C function:
c_sum(X::Array{Float64}) = ccall(("c_sum", Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X)

c_sum (generic function with 1 method)

c_sum(a)

4.999529889971053e6

c_sum(a) ≈ sum(a) # type \approx and then <TAB> to get the ≈ symbol

true

c_sum(a) - sum(a)

-6.444752216339111e-7
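The tiny nonzero difference is expected rather than a bug: floating-point addition is not associative, so adding the same numbers in a different order can change the last few digits. The C loop adds strictly left to right, whereas Julia's `sum` on arrays uses pairwise summation. A minimal illustration with three numbers (the variable names are chosen for this example):

```julia
left  = (0.1 + 0.2) + 0.3     # add left to right
right = 0.1 + (0.2 + 0.3)     # same numbers, different grouping

println(left == right)        # false: the two groupings round differently
println(left ≈ right)         # true: they agree to within rounding error
```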

≈  # this shows that ≈ is an alias for the `isapprox` function

isapprox (generic function with 8 methods)

?isapprox

search: isapprox

isapprox(x, y; rtol::Real=atol>0 ? 0 : √eps, atol::Real=0, nans::Bool=false, norm::Function)

julia> 0.1 ≈ (0.1 - 1e-10)
true

julia> isapprox(10, 11; atol = 2)
true

julia> isapprox([10.0^9, 1.0], [10.0^9, 2.0])
true

julia> 1e-10 ≈ 0
false

julia> isapprox(1e-10, 0, atol=1e-8)
true
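To make the tolerances concrete: for plain numbers, `isapprox(x, y)` checks `abs(x - y) <= max(atol, rtol * max(abs(x), abs(y)))`. With the defaults in the signature above, `rtol` is `sqrt(eps(Float64))`, roughly 1.5e-8, and `atol` is zero, which is why a comparison against exactly 0 fails unless `atol` is given. The values below are picked only for illustration.

```julia
rtol_default = sqrt(eps(Float64))          # about 1.5e-8

println(isapprox(1.0, 1.0 + 1e-9))         # true: relative difference below rtol
println(isapprox(1.0, 1.0 + 1e-7))         # false: relative difference above rtol
println(isapprox(1e-10, 0.0))              # false: a relative test cannot pass against 0
println(isapprox(1e-10, 0.0; atol = 1e-8)) # true: the absolute tolerance covers it
```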

We can now benchmark the C code directly from Julia.

c_bench = @benchmark c_sum($a)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.737 ms (0.00% GC)
  median time:      9.369 ms (0.00% GC)
  mean time:        11.719 ms (0.00% GC)
  maximum time:     25.851 ms (0.00% GC)
  --------------
  samples:          427
  evals/sample:     1

println("C: Fastest time was $(minimum(c_bench.times) / 1e6) msec")

C: Fastest time was 8.736966 msec

d = Dict()  # a "dictionary", i.e. an associative array
d["C"] = minimum(c_bench.times) / 1e6  # in milliseconds
d

Dict{Any,Any} with 1 entry:
  "C" => 8.73697

2.B C (with -ffast-math)

If we allow C to re-order floating-point operations, the loop will be vectorized with SIMD (single instruction, multiple data) instructions.

const Clib_fastmath = tempname()   # make a temporary file

# The same as above but with a -ffast-math flag added
open(`gcc -fPIC -O3 -msse3 -xc -shared -ffast-math -o $(Clib_fastmath * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
c_sum_fastmath(X::Array{Float64}) = ccall(("c_sum", Clib_fastmath), Float64, (Csize_t, Ptr{Float64}), length(X), X)

c_sum_fastmath (generic function with 1 method)

c_fastmath_bench = @benchmark $c_sum_fastmath($a)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.579 ms (0.00% GC)
  median time:      5.793 ms (0.00% GC)
  mean time:        5.856 ms (0.00% GC)
  maximum time:     9.699 ms (0.00% GC)
  --------------
  samples:          853
  evals/sample:     1

d["C -ffast-math"] = minimum(c_fastmath_bench.times) / 1e6  # in milliseconds

5.578917
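To picture what re-ordering buys, the sketch below imitates a 4-wide vector unit in plain Julia: four independent partial sums are accumulated side by side and combined at the end. This is only an illustration of the idea, not what gcc actually emits; the function name `sum_4lanes` and the lane count of four are assumptions made for this example. Because the additions happen in a different order than in the left-to-right loop, the result can differ in the last digits.

```julia
# Hypothetical illustration: sum with four independent accumulators ("lanes").
function sum_4lanes(x::Vector{Float64})
    s1 = s2 = s3 = s4 = 0.0
    n = length(x)
    i = 1
    # Main loop: each lane adds every fourth element
    @inbounds while i + 3 <= n
        s1 += x[i]; s2 += x[i+1]; s3 += x[i+2]; s4 += x[i+3]
        i += 4
    end
    s = (s1 + s2) + (s3 + s4)     # combine the lanes
    # Remainder loop: the last 0 to 3 elements
    @inbounds while i <= n
        s += x[i]
        i += 1
    end
    return s
end

v = rand(10^6)
println(sum_4lanes(v) ≈ foldl(+, v))   # same sum up to rounding error
```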

2.C Python (built-in)

The PyCall package lets you use Python from Julia:

using Pkg; Pkg.add("PyCall")
using PyCall

 Resolving package versions...
  Updating `~/.julia/environments/v1.0/Project.toml`
 [no changes]
  Updating `~/.julia/environments/v1.0/Manifest.toml`
 [no changes]

# Get the Python built-in "sum" function:
pysum = pybuiltin("sum")

PyObject <built-in function sum>

pysum(a)

4.999529889971053e6

pysum(a) ≈ sum(a)

true

py_list_bench = @benchmark $pysum($a)

BenchmarkTools.Trial: 
  memory estimate:  368 bytes
  allocs estimate:  8
  --------------
  minimum time:     925.317 ms (0.00% GC)
  median time:      937.152 ms (0.00% GC)
  mean time:        938.926 ms (0.00% GC)
  maximum time:     961.045 ms (0.00% GC)
  --------------
  samples:          6
  evals/sample:     1

d["Python 내장"] = minimum(py_list_bench.times) / 1e6  # key means "Python built-in"; in milliseconds
d

Dict{Any,Any} with 3 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "C -ffast-math" => 5.57892

2.D Python (numpy)

Takes advantage of hardware "SIMD", but only in the cases numpy supports.

numpy is an optimized C library that can be called from Python. It can be loaded into Julia as follows.

numpy_sum = pyimport("numpy")["sum"]

py_numpy_bench = @benchmark $numpy_sum($a)

BenchmarkTools.Trial: 
  memory estimate:  368 bytes
  allocs estimate:  8
  --------------
  minimum time:     5.346 ms (0.00% GC)
  median time:      5.432 ms (0.00% GC)
  mean time:        5.456 ms (0.00% GC)
  maximum time:     6.372 ms (0.00% GC)
  --------------
  samples:          915
  evals/sample:     1

numpy_sum(a)

4.999529889971708e6

numpy_sum(a) ≈ sum(a)

true

d["Python numpy"] = minimum(py_numpy_bench.times) / 1e6
d

Dict{Any,Any} with 4 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "C -ffast-math" => 5.57892

2.E Python (hand-written)

py"""
def py_sum(A):
    s = 0.0
    for a in A:
        s += a
    return s
"""

sum_py = py"py_sum"

PyObject <function py_sum at 0x7f7c6c9b9ea0>

py_hand = @benchmark $sum_py($a)

BenchmarkTools.Trial: 
  memory estimate:  368 bytes
  allocs estimate:  8
  --------------
  minimum time:     1.018 s (0.00% GC)
  median time:      1.133 s (0.00% GC)
  mean time:        1.140 s (0.00% GC)
  maximum time:     1.278 s (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1

sum_py(a)

4.999529889971053e6

sum_py(a) ≈ sum(a)

true

d["Python 직접 작성"] = minimum(py_hand.times) / 1e6  # key means "Python hand-written"; in milliseconds
d

Dict{Any,Any} with 5 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "Python 직접 작…   => 1017.68
  "C -ffast-math" => 5.57892

2.F Julia (built-in)

Written directly in Julia, not in C!

@which sum(a)

j_bench = @benchmark sum($a)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.333 ms (0.00% GC)
  median time:      5.497 ms (0.00% GC)
  mean time:        6.234 ms (0.00% GC)
  maximum time:     16.190 ms (0.00% GC)
  --------------
  samples:          801
  evals/sample:     1

d["Julia 내장"] = minimum(j_bench.times) / 1e6  # key means "Julia built-in"; in milliseconds
d

Dict{Any,Any} with 6 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "Python 직접 작…   => 1017.68
  "C -ffast-math" => 5.57892
  "Julia 내장"      => 5.33261

2.G Julia (hand-written)

function mysum(A)
    s = 0.0 # s = zero(eltype(A))
    for a in A
        s += a
    end
    s
end

mysum (generic function with 1 method)
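The commented-out alternative in `mysum` hints at a more generic version. Starting the accumulator at `0.0` forces a `Float64` result whatever the element type, while `zero(eltype(A))` starts it in the array's own element type. A sketch (the name `mysum_generic` is made up for this example and is not in the original notebook):

```julia
function mysum_generic(A)
    s = zero(eltype(A))   # accumulator starts in the element type of A
    for x in A
        s += x
    end
    return s
end

println(mysum_generic([1, 2, 3]))                    # integer in, integer out
println(typeof(mysum_generic(Float32[1.5, 2.5])))    # stays Float32
```

For the `Float64` array benchmarked here the two versions behave the same; the difference only shows up for other element types.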

j_bench_hand = @benchmark mysum($a)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.276 ms (0.00% GC)
  median time:      9.822 ms (0.00% GC)
  mean time:        10.908 ms (0.00% GC)
  maximum time:     24.930 ms (0.00% GC)
  --------------
  samples:          458
  evals/sample:     1

d["Julia 직접 작성"] = minimum(j_bench_hand.times) / 1e6  # key means "Julia hand-written"; in milliseconds
d

Dict{Any,Any} with 7 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "Python 직접 작…   => 1017.68
  "Julia 직접 작…    => 9.27648
  "C -ffast-math" => 5.57892
  "Julia 내장"      => 5.33261

2.H Julia (with SIMD)

function mysum_simd(A)   
    s = 0.0 # s = zero(eltype(A))
    @simd for a in A
        s += a
    end
    s
end

mysum_simd (generic function with 1 method)

j_bench_hand_simd = @benchmark mysum_simd($a)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.306 ms (0.00% GC)
  median time:      5.438 ms (0.00% GC)
  mean time:        5.972 ms (0.00% GC)
  maximum time:     17.020 ms (0.00% GC)
  --------------
  samples:          836
  evals/sample:     1

mysum_simd(a)

4.9995298899717005e6

d["Julia SIMD이용"] = minimum(j_bench_hand_simd.times) / 1e6  # key means "Julia with SIMD"; in milliseconds
d

Dict{Any,Any} with 8 entries:
  "Python 내장"     => 925.317
  "C"             => 8.73697
  "Python numpy"  => 5.34619
  "Python 직접 작…   => 1017.68
  "Julia 직접 작…    => 9.27648
  "Julia SIMD이용…  => 5.30645
  "C -ffast-math" => 5.57892
  "Julia 내장"      => 5.33261

Benchmark conclusions

for (key, value) in sort(collect(d), by=last)
    println(rpad(key, 25, "."), lpad(round(value; digits=2), 6, "."))
end

Julia SIMD이용...............5.31
Julia 내장...................5.33
Python numpy...............5.35
C -ffast-math..............5.58
C..........................8.74
Julia 직접 작성................9.28
Python 내장................925.32
Python 직접 작성.............1017.68
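To read the table as relative speeds, each time can be divided by the fastest one. The sketch below hard-codes the minimum times (in milliseconds) reported above, with English labels substituted for the keys; the name `times_ms` is introduced for this example, and the numbers are specific to the machine the notebook was run on.

```julia
# Minimum times in milliseconds, copied from the results above
times_ms = Dict(
    "Julia SIMD"          => 5.30645,
    "Julia built-in"      => 5.33261,
    "Python numpy"        => 5.34619,
    "C -ffast-math"       => 5.57892,
    "C"                   => 8.73697,
    "Julia hand-written"  => 9.27648,
    "Python built-in"     => 925.317,
    "Python hand-written" => 1017.68,
)

fastest = minimum(values(times_ms))

# Print each entry as a multiple of the fastest time, fastest first
for (name, t) in sort(collect(times_ms), by = last)
    println(rpad(name, 22, "."), lpad(round(t / fastest; digits = 1), 7, "."), "x")
end
```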
