[python] Project Euler와의 속도 비교 : C vs Python vs Erlang vs Haskell

Project Euler의 문제 # 12 를 프로그래밍 연습으로 취하고 C, Python, Erlang 및 Haskell의 ( 실제로는 최적이 아닌) 구현을 비교했습니다. 더 높은 실행 시간을 얻기 위해 원래 문제에서 설명한대로 500 대신 1000이 아닌 제수가있는 첫 번째 삼각형 수를 검색합니다.

결과는 다음과 같습니다.

씨:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

파이썬 :

lorenzo@enzo:~/erlang$ time ./euler12.py
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

PyPy가 포함 된 Python :

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

얼랭 :

lorenzo@enzo:~/erlang$ erlc euler12.erl
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

하스켈 :

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

요약:

C : 100 %
Python : 692 % (PyPy 사용시 118 %)
얼랭 : 436 % (RichardC 덕분에 135 %)
하스켈 : 1421 %

C는 계산에 오래 사용하고 다른 세 정수는 임의의 길이 정수가 아니기 때문에 큰 이점이 있다고 가정합니다. 또한 런타임을 먼저로드 할 필요가 없습니다 (다른 작업을 수행합니까?).

질문 1 :
Erlang, Python 및 Haskell은 임의의 길이 정수를 사용하여 속도를 잃 MAXINT습니까?

질문 2 :
Haskell은 왜 그렇게 느립니까? 브레이크를 끄는 컴파일러 플래그가 있습니까, 아니면 내 구현입니까? (후자는 Haskell이 나에게 7 개의 인장이있는 책이므로 가능성이 높다.)

질문 3 :
요소를 결정하는 방식을 변경하지 않고 이러한 구현을 최적화하는 방법에 대한 힌트를 제공 할 수 있습니까? 어떤 방식 으로든 최적화 : 언어에 대해 더 좋고 빠르며 “네이티브”합니다.

편집하다:

질문 4 :
기능 구현에서 LCO (마지막 호출 최적화, 일명 테일 재귀 제거)를 허용하므로 호출 스택에 불필요한 프레임을 추가하지 않습니까?

나는 Haskell과 Erlang 지식이 매우 제한되어 있음을 인정해야하지만 4 가지 언어에서 가능한 한 비슷한 알고리즘을 구현하려고했습니다.

사용 된 소스 코드 :

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

답변

사용하여 GHC 7.0.3, gcc 4.4.6, Linux 2.6.29, x86_64의 코어 2 듀오 (2.5GHz의) 시스템에서 사용 컴파일 ghc -O2 -fllvm -fforce-recomp하스켈과 gcc -O3 -lmC.에 대한

C 루틴은 8.4 초 만에 실행됩니다 (으로 인해 달리는 것보다 빠름 -O3)
Haskell 솔루션은 -O2플래그 로 인해 36 초 안에 실행됩니다.
귀하의 factorCount'코드는 명시 적으로 입력되지 않았으며 기본값으로 설정되지 않았습니다 Integer(여기서 오진 진단을 해준 Daniel에게 감사드립니다!). 사용하여 명시 적 유형 서명 (어쨌든 표준 연습)을 제공 Int하고 시간이 11.1 초로 변경됩니다.
에 factorCount'당신 불필요하게 불렀다 fromIntegral. 그래도 수정 사항은 변경되지 않습니다 (컴파일러는 똑똑하고 운이 좋습니다).
더 빠르고 충분한 mod곳을 사용했습니다 rem. 시간이 8.5 초로 변경됩니다 .
factorCount'절대 변하지 않는 두 개의 추가 인수를 지속적으로 적용하고 있습니다 ( number, sqrt). 작업자 / 래퍼 변환은 다음을 제공합니다.

 $ time ./so
 842161320

 real    0m7.954s
 user    0m7.944s
 sys     0m0.004s

맞습니다, 7.95 초 . C 솔루션보다 0.5 초 빠르게 일관 됩니다. -fllvm플래그가 없으면 여전히 8.182 seconds이므로 NCG 백엔드 도이 경우에도 잘 수행됩니다.

결론 : 하스켈은 대단합니다.

결과 코드

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

편집 : 이제 우리는 그것을 탐구 했으므로 질문을 다루겠습니다.

질문 1 : 임의의 길이 정수를 사용하여 erlang, python 및 haskell이 속도를 잃습니까?

Haskell에서는 사용 Integer속도가 느리지 Int만 수행 속도는 계산 속도에 따라 다릅니다. 운 좋게도 (64 비트 시스템의 경우) Int충분합니다. 이식성을 위해서 당신은 아마 사용하려면 코드를 다시 작성해야 Int64또는 Word64(C는 유일한 언어가 아닙니다 long).

질문 2 : 왜 하스켈이 그렇게 느립니까? 브레이크를 끄는 컴파일러 플래그가 있습니까, 아니면 내 구현입니까? (하스켈은 나에게 7 개의 인장이있는 책이기 때문에 후자는 상당히 가능성이 높다.)

질문 3 : 요소를 결정하는 방식을 변경하지 않고 이러한 구현을 최적화하는 방법에 대한 힌트를 제공 할 수 있습니까? 어떤 방식 으로든 최적화 : 언어에 대해 더 좋고 빠르며 “네이티브”합니다.

그것이 내가 위에서 대답 한 것입니다. 대답은

0) 통해 최적화 사용 -O2
1) 가능하면 빠른 (특히 : unboxable) 유형을 사용하십시오.
2) rem하지 않음 mod(자주 잊어 버린 최적화)
3) 작업자 / 래퍼 변환 (아마도 가장 일반적인 최적화).

질문 4 : 기능 구현으로 LCO가 허용되므로 호출 스택에 불필요한 프레임을 추가하지 않습니까?

예, 그게 문제가 아니 었습니다. 잘하고 당신이 이것을 고려 기쁘게 생각합니다.

답변

Erlang 구현에는 몇 가지 문제점이 있습니다. 다음 기준으로 수정되지 않은 Erlang 프로그램의 측정 실행 시간은 47.6 초로 C 코드의 12.7 초와 비교되었습니다.

계산 집약적 인 Erlang 코드를 실행하려면 가장 먼저해야 할 일은 기본 코드를 사용하는 것입니다. 컴파일 erlc +native euler12하면 시간이 41.3 초로 줄었습니다. 그러나 이것은 이런 종류의 코드에 대한 네이티브 컴파일에서 예상되는 것보다 훨씬 느린 속도 향상 (15 %)이며 문제는을 사용하는 것입니다 -compile(export_all). 이것은 실험에 유용하지만 외부에서 모든 기능에 도달 할 수 있다는 사실은 원시 컴파일러를 매우 보수적으로 만듭니다. 일반적인 BEAM 에뮬레이터는 그다지 영향을받지 않습니다.이 선언을 -export([solve/0]).바꾸면 31.5 초 (기준선에서 거의 35 %)의 속도가 훨씬 향상됩니다.

그러나 코드 자체에는 문제가 있습니다. factorCount 루프의 각 반복 에 대해 다음 테스트를 수행하십시오.

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

C 코드는 이것을하지 않습니다. 일반적으로 동일한 코드의 다른 구현간에, 특히 알고리즘이 수치 인 경우, 실제로 동일한 작업을 수행하고 있는지 확인해야하기 때문에 공정하게 비교하는 것은 까다로울 수 있습니다. 어딘가에 어떤 타입 캐스트로 인해 한 구현에서 약간의 반올림 오류가 발생하면 둘 다 동일한 결과에 도달하더라도 다른 것보다 더 많은 반복을 수행 할 수 있습니다.

이 가능한 오류 소스를 제거하고 각 반복에서 추가 테스트를 제거하기 위해 C 코드에서 자세히 모델링 한 다음과 같이 factorCount 함수를 다시 작성했습니다.

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

이 재 작성 no export_all및 기본 컴파일은 다음 런타임을 제공했습니다.

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

C 코드에 비해 그리 나쁘지 않습니다.

$ time ./a.out
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

Erlang이 숫자 코드를 작성하는 데 전혀 적합하지 않다는 것을 고려할 때, 이와 같은 프로그램에서 C보다 50 % 느리다는 것은 꽤 좋습니다.

마지막으로 귀하의 질문에 대해

질문 1 : 임의의 길이 정수를 사용하여 erlang, python 및 haskell 속도가 느리거나 값이 MAXINT보다 작지 않으면 속도가 느립니까?

예, 다 소요 Erlang에는 “wrap-around와 함께 32/64 비트 산술 사용”이라고 말하는 방법이 없으므로, 컴파일러가 정수에 대한 일부 한계를 증명할 수 없으면 (그리고 일반적으로 할 수없는 경우) 모든 계산을 확인해야합니다. 단일 태그 단어에 들어갈 수 있거나 힙 할당 큰 숫자로 바꾸어야하는 경우. 실제로 런타임에 큰 숫자를 사용하지 않더라도 이러한 검사를 수행해야합니다. 반면에, 그 의미는 당신이 알고 갑자기 이전보다 그에게 더 큰 입력을 주면 알고리즘 때문에 예기치 않은 정수 랩 어라운드 실패하지 않을 것입니다.

질문 4 : 기능 구현으로 LCO가 허용되므로 호출 스택에 불필요한 프레임을 추가하지 않습니까?

예, Erlang 코드는 마지막 통화 최적화와 관련하여 정확합니다.

답변

Python 최적화와 관련하여 PyPy를 사용하는 것 외에도 (코드를 전혀 변경하지 않고 상당히 인상적인 속도 향상을 위해) PyPy의 번역 도구 체인 을 사용하여 RPython 호환 버전을 컴파일하거나 Cython 에서 확장 모듈을 빌드 할 수 있습니다. Cython 모듈은 거의 두 배 빠른 내 테스트에서 C 버전보다 빠릅니다 . 참고로 C 및 PyPy 벤치 마크 결과도 포함합니다.

C (로 컴파일 됨 gcc -O3 -lm)

% time ./euler12-c
842161320

./euler12-c  11.95s
 user 0.00s
 system 99%
 cpu 11.959 total

파이 파이 1.5

% time pypy euler12.py
842161320
pypy euler12.py
16.44s user
0.01s system
99% cpu 16.449 total

RPython (최신 PyPy 개정판 사용 c2f583445aee)

% time ./euler12-rpython-c
842161320
./euler12-rpy-c
10.54s user 0.00s
system 99%
cpu 10.540 total

Cython 0.15

% time python euler12-cython.py
842161320
python euler12-cython.py
6.27s user 0.00s
system 99%
cpu 6.274 total

RPython 버전에는 몇 가지 주요 변경 사항이 있습니다. 독립형 프로그램으로 변환하려면을 정의해야 합니다. target이 경우 main함수입니다. sys.argv인수로만 받아 들일 것으로 예상되며 int를 반환해야합니다. translate.py를 사용하여 % translate.py euler12-rpython.py번역하면 C로 변환되고 컴파일됩니다.

# euler12-rpython.py

import math, sys

def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in xrange(1, isquare + 1):
        if not n % candidate: count += 2
    return count

def main(argv):
    triangle = 1
    index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle
    return 0

if __name__ == '__main__':
    main(sys.argv)

def target(*args):
    return main, None

Cython 버전은 확장 _euler12.pyx파이썬 모듈로 다시 작성되었으며 , 일반적인 파이썬 파일에서 가져 와서 호출합니다. 는 _euler12.pyx기본적으로 몇 가지 추가 정적 유형 선언으로, 버전과 동일합니다. setup.py에는을 사용하여 확장을 빌드하는 일반적인 상용구가 python setup.py build_ext --inplace있습니다.

# _euler12.pyx
from libc.math cimport sqrt

cdef int factorCount(int n):
    cdef int candidate, isquare, count
    cdef double square
    square = sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in range(1, isquare + 1):
        if not n % candidate: count += 2
    return count

cpdef main():
    cdef int triangle = 1, index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle

# euler12-cython.py
import _euler12
_euler12.main()

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("_euler12", ["_euler12.pyx"])]

setup(
  name = 'Euler12-Cython',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

나는 솔직히 RPython이나 Cython에 대한 경험이 거의 없으며 결과에 즐겁게 놀랐습니다. CPython을 사용하는 경우 Cython 확장 모듈에서 CPU를 많이 사용하는 코드 비트를 작성하면 프로그램을 최적화하는 쉬운 방법 인 것 같습니다.

답변

질문 3 : 요소를 결정하는 방식을 변경하지 않고 이러한 구현을 최적화하는 방법에 대한 힌트를 제공 할 수 있습니까? 어떤 방식 으로든 최적화 : 언어에 대해 더 좋고 빠르며 “네이티브”합니다.

C 구현은 차선책이며 (Thomas M. DuBuisson이 암시 한 바와 같이) 버전은 64 비트 정수 (예 : 긴 데이터 유형)를 사용합니다. 나중에 어셈블리 목록을 조사 할 것이지만, 교육을받은 추측으로 컴파일 된 코드에서 일부 메모리 액세스가 진행되어 64 비트 정수 사용이 상당히 느려집니다. 그것은 코드이거나 생성 된 코드입니다 (SSE 레지스터에 64 비트 정수를 적게 넣거나 64 비트 정수로 반올림 할 수 있다는 사실이 느리다는 것입니다).

다음은 수정 된 코드입니다 ( 길이 를 int로 간단히 바꾸고 명시 적으로 인라인 된 factorCount를 사용하지만 gcc -O3에서는 필요하다고 생각하지 않습니다).

#include <stdio.h>
#include <math.h>

static inline int factorCount(int n)
{
    double square = sqrt (n);
    int isquare = (int)square;
    int count = isquare == square ? -1 : 0;
    int candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    int triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index++;
        triangle += index;
    }
    printf ("%d\n", triangle);
}

실행 + 타이밍 :

$ gcc -O3 -lm -o euler12 euler12.c; time ./euler12
842161320
./euler12  2.95s user 0.00s system 99% cpu 2.956 total

참고로 이전 답변에서 Thomas의 haskell 구현은 다음과 같습니다.

$ ghc -O2 -fllvm -fforce-recomp euler12.hs; time ./euler12                                                                                      [9:40]
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12 ...
842161320
./euler12  9.43s user 0.13s system 99% cpu 9.602 total

결론 : 훌륭한 컴파일러 인 ghc에서 아무것도 빼앗지 않지만 gcc는 일반적으로 더 빠른 코드를 생성합니다.

답변

이 블로그를 살펴보십시오 . 지난 1 년 동안 그는 Haskell과 Python에서 몇 가지 프로젝트 오일러 문제를 수행했으며 일반적으로 Haskell 이 훨씬 빠릅니다. 나는 그 언어들 사이에서 유창함과 코딩 스타일과 더 관련이 있다고 생각합니다.

파이썬 속도와 관련하여 잘못된 구현을 사용하고 있습니다! PyPy를 사용해보십시오. 이와 같은 것들이 훨씬 빠릅니다.

답변

Haskell 패키지의 일부 기능을 사용하여 Haskell 구현을 크게 가속화 할 수 있습니다. 이 경우에는 ‘cabal install primes’와 함께 설치된 소수를 사용했습니다.

import Data.Numbers.Primes
import Data.List

triangleNumbers = scanl1 (+) [1..]
nDivisors n = product $ map ((+1) . length) (group (primeFactors n))
answer = head $ filter ((> 500) . nDivisors) triangleNumbers

main :: IO ()
main = putStrLn $ "First triangle number to have over 500 divisors: " ++ (show answer)

타이밍 :

원래 프로그램 :

PS> measure-command { bin\012_slow.exe }

TotalSeconds      : 16.3807409
TotalMilliseconds : 16380.7409

개선 된 구현

PS> measure-command { bin\012.exe }

TotalSeconds      : 0.0383436
TotalMilliseconds : 38.3436

보시다시피,이 기계는 16 초 동안 실행 된 동일한 컴퓨터에서 38 밀리 초 안에 실행됩니다

컴파일 명령 :

ghc -O2 012.hs -o bin\012.exe
ghc -O2 012_slow.hs -o bin\012_slow.exe

답변

재미로. 다음은보다 ‘기본’Haskell 구현입니다.

import Control.Applicative
import Control.Monad
import Data.Either
import Math.NumberTheory.Powers.Squares

isInt :: RealFrac c => c -> Bool
isInt = (==) <$> id <*> fromInteger . round

intSqrt :: (Integral a) => a -> Int
--intSqrt = fromIntegral . floor . sqrt . fromIntegral
intSqrt = fromIntegral . integerSquareRoot'

factorize :: Int -> [Int]
factorize 1 = []
factorize n = first : factorize (quot n first)
  where first = (!! 0) $ [a | a <- [2..intSqrt n], rem n a == 0] ++ [n]

factorize2 :: Int -> [(Int,Int)]
factorize2 = foldl (\ls@((val,freq):xs) y -> if val == y then (val,freq+1):xs else (y,1):ls) [(0,0)] . factorize

numDivisors :: Int -> Int
numDivisors = foldl (\acc (_,y) -> acc * (y+1)) 1 <$> factorize2

nextTriangleNumber :: (Int,Int) -> (Int,Int)
nextTriangleNumber (n,acc) = (n+1,acc+n+1)

forward :: Int -> (Int, Int) -> Either (Int, Int) (Int, Int)
forward k val@(n,acc) = if numDivisors acc > k then Left val else Right (nextTriangleNumber val)

problem12 :: Int -> (Int, Int)
problem12 n = (!!0) . lefts . scanl (>>=) (forward n (1,1)) . repeat . forward $ n

main = do
  let (n,val) = problem12 1000
  print val

를 사용하면 ghc -O3내 컴퓨터 (1.73GHz Core i7)에서 0.55-0.58 초 동안 일관되게 실행됩니다.

C 버전에 대한보다 효율적인 factorCount 함수 :

int factorCount (int n)
{
  int count = 1;
  int candidate,tmpCount;
  while (n % 2 == 0) {
    count++;
    n /= 2;
  }
    for (candidate = 3; candidate < n && candidate * candidate < n; candidate += 2)
    if (n % candidate == 0) {
      tmpCount = 1;
      do {
        tmpCount++;
        n /= candidate;
      } while (n % candidate == 0);
       count*=tmpCount;
      }
  if (n > 1)
    count *= 2;
  return count;
}

를 사용하여 main에서 long을 int로 변경하면 gcc -O3 -lm0.31-0.35 초 동안 일관되게 실행됩니다.

n 번째 삼각형 수 = n * (n + 1) / 2이고, n과 (n + 1)이 완전히 다른 소수 분해를 가지고 있다는 사실을 이용하면 둘 다 더 빠르게 실행되도록 할 수 있습니다. 전체의 요인 수를 구하기 위해 각 절반의 곱셈을 곱할 수 있습니다. 다음과 같은:

int main ()
{
  int triangle = 0,count1,count2 = 1;
  do {
    count1 = count2;
    count2 = ++triangle % 2 == 0 ? factorCount(triangle+1) : factorCount((triangle+1)/2);
  } while (count1*count2 < 1001);
  printf ("%lld\n", ((long long)triangle)*(triangle+1)/2);
}

c 코드 실행 시간을 0.17-0.19 초로 줄이고 훨씬 더 큰 검색을 처리 할 수 있으며 10000 개 이상의 요소는 내 컴퓨터에서 약 43 초가 걸립니다. 나는 관심있는 독자들에게 비슷한 속력을 남긴다.