[sql] Calculate a running total in SQL Server

Imagine the following table (called TestTable):

id     somedate    somevalue
--     --------    ---------
45     01/Jan/09   3
23     08/Jan/09   5
12     02/Feb/09   0
77     14/Feb/09   7
39     20/Feb/09   34
33     02/Mar/09   6

I would like a query that returns a running total in date order, like this:

id     somedate    somevalue  runningtotal
--     --------    ---------  ------------
45     01/Jan/09   3          3
23     08/Jan/09   5          8
12     02/Feb/09   0          8
77     14/Feb/09   7          15
39     20/Feb/09   34         49
33     02/Mar/09   6          55

I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.

I am particularly interested in this sort of method, which uses the aggregating-set-statement trick:

INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal)
   SELECT id, somedate, somevalue, null
   FROM TestTable
   ORDER BY somedate

DECLARE @RunningTotal int
SET @RunningTotal = 0

UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl

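... this is very efficient, but I have heard there are issues with it because you cannot necessarily guarantee that the UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.

But maybe there are other ways that people can suggest?

Edit: there is now a SqlFiddle with the setup and the 'update trick' example above.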

Answer

Update: if you are running SQL Server 2012, see https://stackoverflow.com/a/10309947

The problem is that the SQL Server implementation of the OVER clause is somewhat limited.

Oracle (and ANSI SQL) allow you to do things like:

 SELECT somedate, somevalue,
  SUM(somevalue) OVER(ORDER BY somedate
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
          AS RunningTotal
  FROM Table

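SQL Server gives you no clean solution to this problem. My gut tells me this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.

The update trick is handy, but I feel it is fairly fragile. It seems that if you are updating a full table, it proceeds in primary key order. So if you make the date an ascending primary key you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also, I wonder what happens if the query ends up being performed by two processors; see MAXDOP).

Full working sample: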

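-- drop the temp table only if it is left over from a previous run
if object_id('tempdb..#t') is not null drop table #t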
create table #t ( ord int primary key, total int, running_total int)

insert #t(ord,total)  values (2,20)
-- notice the malicious re-ordering 
insert #t(ord,total) values (1,10)
insert #t(ord,total)  values (3,10)
insert #t(ord,total)  values (4,1)

declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total

select * from #t
order by ord

ord         total       running_total
----------- ----------- -------------
1           10          10
2           20          30
3           10          40
4           1           41
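
The MAXDOP worry above can be narrowed, though not eliminated, by forcing a serial plan and taking an exclusive table lock. What follows is a minimal sketch against the #t table from the sample. The choice of hints is an assumption, not something the answer prescribes, and the statement still relies on undocumented ordering behaviour.

```sql
-- Sketch only: the hints reduce, but do not remove, the ordering risk.
declare @total int
set @total = 0

update #t with (tablockx)                       -- keep concurrent writers out while totals are built
set @total = running_total = @total + total    -- SET @variable = column = expression form
option (maxdop 1)                               -- force a serial (single-threaded) plan

select * from #t
order by ord
```

The three-part assignment writes the same new value to both the variable and the column, which avoids depending on the order in which separate SET items are evaluated.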

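You asked for a benchmark; here is the lowdown.

The fastest SAFE way of doing this is the cursor. It is an order of magnitude faster than either the correlated subquery or the cross join.

The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain the update will proceed in a linear way under all circumstances. Nothing in the query explicitly says so.

Bottom line: for production code I would go with the cursor.

Test data: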

create table #t ( ord int primary key, total int, running_total int)

set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
   insert #t (ord, total) values (@i,  rand() * 100)
    set @i = @i +1
end
commit

Test 1:

SELECT ord,total,
    (SELECT SUM(total)
        FROM #t b
        WHERE b.ord <= a.ord) AS b
FROM #t a

-- CPU 11731, Reads 154934, Duration 11135

Test 2:

SELECT a.ord, a.total, SUM(b.total) AS RunningTotal
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord

-- CPU 16053, Reads 154935, Duration 4647

Test 3:

DECLARE @TotalTable table(ord int primary key, total int, running_total int)

DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord


OPEN forward_cursor

DECLARE @running_total int,
    @ord int,
    @total int
SET @running_total = 0

FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
     SET @running_total = @running_total + @total
     INSERT @TotalTable VALUES(@ord, @total, @running_total)
     FETCH NEXT FROM forward_cursor INTO @ord, @total
END

CLOSE forward_cursor
DEALLOCATE forward_cursor

SELECT * FROM @TotalTable

-- CPU 359, Reads 30392, Duration 496

Test 4:

declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total

select * from #t

-- CPU 0, Reads 58, Duration 139
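
The CPU, Reads and Duration figures in the comments above look like trace output, but the answer does not say how they were captured. One way to collect comparable numbers in a query window, assuming the #t test table already exists, is to use the session-level statistics switches:

```sql
set statistics io on     -- report logical and physical reads per table
set statistics time on   -- report CPU time and elapsed time in milliseconds

-- statement under test (here: the correlated subquery from Test 1)
select ord, total,
    (select sum(total)
        from #t b
        where b.ord <= a.ord) as running_total
from #t a

set statistics time off
set statistics io off
```

The figures appear in the messages output rather than in the result grid. They will differ from the numbers quoted above, since those depend on the original author's hardware.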

Answer

In SQL Server 2012 you can use SUM() with the OVER() clause.

select id,
       somedate,
       somevalue,
       sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal
from TestTable

SQL Fiddle
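
One detail worth noting: with a ROWS frame, rows that share the same somedate are accumulated in whatever order the engine happens to pick, so their individual running totals are not deterministic. If the frame is omitted, the default is RANGE, which treats rows with equal somedate as peers and gives them all the same total. Here is a sketch of both behaviours, assuming id is unique and therefore usable as a tiebreaker:

```sql
select id,
       somedate,
       somevalue,
       -- deterministic: ties on somedate are broken by id
       sum(somevalue) over(order by somedate, id
                           rows unbounded preceding) as runningtotal,
       -- default RANGE frame: rows sharing a somedate all get the same total
       sum(somevalue) over(order by somedate) as runningtotal_by_date
from TestTable
order by somedate, id
```

For the sample data, where every date is distinct, the two columns return identical values.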

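Answer

Sam Saffron did great work on this, but he did not provide recursive common table expression code for the problem. For those of us working with SQL Server 2008 R2 rather than Denali, it is still the fastest way to get a running total: on my work computer it is about 10 times faster than the cursor for 100,000 rows, and it is also an inline query.
So here it is (I am assuming the table has an ord column holding sequential numbers without gaps; for fast processing there should also be a unique constraint on this column):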

;with
CTE_RunningTotal
as
(
    select T.ord, T.total, T.total as running_total
    from #t as T
    where T.ord = 0
    union all
    select T.ord, T.total, T.total + C.running_total as running_total
    from CTE_RunningTotal as C
        inner join #t as T on T.ord = C.ord + 1
)
select C.ord, C.total, C.running_total
from CTE_RunningTotal as C
option (maxrecursion 0)

-- CPU 140, Reads 110014, Duration 132

sql fiddle demo
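
The recursive query above depends on a gapless sequence, which the TestTable from the question does not have. One possible way to get one is to number the rows first. This is a sketch, not part of the original answer: the temp table #seq, the index name and the column rn are hypothetical, and somevalue is assumed to be an int so that the anchor and recursive column types match.

```sql
-- Number the rows 1..n in date order (id breaks ties between equal dates)
select row_number() over (order by somedate, id) as rn,
       id, somedate, somevalue
into #seq
from TestTable;

-- A unique index on rn lets each recursion step seek a single row
create unique clustered index IX_seq_rn on #seq (rn);

with CTE_RunningTotal
as
(
    select S.rn, S.id, S.somedate, S.somevalue, S.somevalue as runningtotal
    from #seq as S
    where S.rn = 1
    union all
    select S.rn, S.id, S.somedate, S.somevalue, C.runningtotal + S.somevalue
    from CTE_RunningTotal as C
        inner join #seq as S on S.rn = C.rn + 1
)
select id, somedate, somevalue, runningtotal
from CTE_RunningTotal
order by rn
option (maxrecursion 0);
```

The maxrecursion 0 hint removes the default limit of 100 recursion levels, which is needed as soon as the table has more than about a hundred rows.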

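Update
I was also curious about this update with a variable, or quirky update. It usually works fine, but how can we be sure it works every time? Well, here is a little trick (found at http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258): check the current and previous ord, and use a 1/0 assignment to force an error if they differ from what you expect: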

declare @total int, @ord int

select @total = 0, @ord = -1

update #t set
    @total = @total + total,
    @ord = case when ord <> @ord + 1 then 1/0 else ord end,
    ------------------------
    running_total = @total

select * from #t

-- CPU 0, Reads 58, Duration 139

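From what I have seen, if the table has a proper clustered index / primary key (in our case an index on ord_id), the update proceeds in a linear way every time (I have never hit the divide by zero). That said, it is up to you to decide whether to use it in production code 🙂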
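
If the guard is used, the divide-by-zero error has to be handled somewhere. The sketch below shows one way to do that with TRY...CATCH (available from SQL Server 2005). It assumes the #t table from the benchmark and the default session settings: with both ARITHABORT and ANSI_WARNINGS set to OFF, 1/0 returns NULL instead of raising an error, and the guard would silently stop working.

```sql
begin try
    begin tran;

    declare @total int, @ord int;
    select @total = 0, @ord = -1;

    update #t set
        @total = @total + total,
        @ord = case when ord <> @ord + 1 then 1/0 else ord end,
        running_total = @total;

    commit;
end try
begin catch
    if @@trancount > 0 rollback;

    if error_number() = 8134   -- divide by zero: rows were not visited in ord order
        print 'quirky update ran out of order; fall back to the cursor';
    else
    begin
        -- any other error is re-raised unchanged in meaning
        declare @msg nvarchar(2048);
        set @msg = error_message();
        raiserror(@msg, 16, 1);
    end
end catch
```

Because a failed UPDATE is rolled back as a whole, running_total is left untouched when the guard fires, so a slower but safe method can be run afterwards.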

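Update 2: I am linking this answer because it includes some useful information about how reliable the quirky update is: nvarchar concatenation / index / nvarchar(max) inexplicable behavior.

Answer

The APPLY operator in SQL 2005 and higher works for this: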

select
    t.id ,
    t.somedate ,
    t.somevalue ,
    rt.runningTotal
from TestTable t
 cross apply (select sum(somevalue) as runningTotal
                from TestTable
                where somedate <= t.somedate
            ) as rt
order by t.somedate

Answer

SELECT TOP 25   amount,
    (SELECT SUM(amount)
    FROM time_detail b
    WHERE b.time_detail_id <= a.time_detail_id) AS Total
FROM time_detail a

You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison in the inner SELECT statement.
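
The ROW_NUMBER() variant just described might look roughly like this when applied to the TestTable from the question. The temp table #numbered and the column rn are hypothetical names chosen for illustration.

```sql
-- Materialize an arbitrary, strictly increasing ordering column
SELECT ROW_NUMBER() OVER (ORDER BY somedate, id) AS rn,
       id, somedate, somevalue
INTO #numbered
FROM TestTable;

-- Compare on rn instead of on a natural key
SELECT a.id, a.somedate, a.somevalue,
    (SELECT SUM(b.somevalue)
     FROM #numbered b
     WHERE b.rn <= a.rn) AS runningtotal
FROM #numbered a
ORDER BY a.rn;
```

Because rn is unique, rows that share a date still get distinct, ordered totals. The cost remains quadratic in the number of rows, as with any correlated subquery of this shape.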

Answer

Use a correlated subquery. Very simple, here you go:

SELECT
somedate,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
GROUP BY somedate
ORDER BY somedate

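The code might not be exactly correct, but I am sure the idea is.

The GROUP BY is there in case a date appears more than once; you would only want to see it once in the result set.

If you do not mind seeing repeated dates, or you want to see the original value and id, then the following is what you want: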

SELECT
id,
somedate,
somevalue,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
ORDER BY somedate
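
With the query above, two rows that share a somedate both receive the total that includes both of their values. If each row should get its own step in the running total, a tiebreaker can be added to the comparison. Here is a sketch, assuming id is unique:

```sql
SELECT
t1.id,
t1.somedate,
t1.somevalue,
(SELECT SUM(t2.somevalue)
 FROM TestTable t2
 WHERE t2.somedate < t1.somedate
    OR (t2.somedate = t1.somedate AND t2.id <= t1.id)) AS running_total
FROM TestTable t1
ORDER BY t1.somedate, t1.id
```

The ORDER BY uses the same two columns as the comparison so that the displayed order matches the order in which values are accumulated.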

Answer

You can also denormalize by storing the running totals in the same table:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

Selects work much faster than with any other solution, but modifications may be slower.
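
To make the trade-off concrete, here is a minimal sketch of a table that carries its own running total. It is not the constraint-based design from the linked article. The table name TestTableDenorm, the column types, and the new row (id 99, 15 January 2009, value 4) are assumptions for illustration, and dates are assumed to be unique.

```sql
CREATE TABLE TestTableDenorm (
    id           int      NOT NULL PRIMARY KEY,
    somedate     datetime NOT NULL,
    somevalue    int      NOT NULL,
    runningtotal int      NOT NULL
);

-- One-off population from the original table
INSERT INTO TestTableDenorm (id, somedate, somevalue, runningtotal)
SELECT t.id, t.somedate, t.somevalue,
    (SELECT SUM(t2.somevalue) FROM TestTable t2 WHERE t2.somedate <= t.somedate)
FROM TestTable t;

-- Reading is now a plain select with no aggregation
SELECT id, somedate, somevalue, runningtotal
FROM TestTableDenorm
ORDER BY somedate;

-- Writing is where the cost moves: a back-dated row shifts every later total
DECLARE @newdate datetime, @newvalue int;
SELECT @newdate = '20090115', @newvalue = 4;

BEGIN TRAN;
    UPDATE TestTableDenorm
    SET runningtotal = runningtotal + @newvalue
    WHERE somedate > @newdate;

    INSERT INTO TestTableDenorm (id, somedate, somevalue, runningtotal)
    SELECT 99, @newdate, @newvalue,
        COALESCE((SELECT SUM(somevalue)
                  FROM TestTableDenorm
                  WHERE somedate <= @newdate), 0) + @newvalue;
COMMIT;
```

Nothing in this sketch stops a stray UPDATE from leaving the stored totals inconsistent. Enforcing that with constraints is what the linked article is about.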