Wednesday, August 3, 2011

Sequential Numbering of Group of Events with T-SQL

An example to illustrate the title is winning and losing streaks in a football game. Suppose we have the below set of data about a number of a series games.


mnummatch_dateresult
107/01/2011L[oss]
2 07/02/2011L
3 07/03/2011 W[in]
4 07/04/2011W
507/05/2011 L
607/06/2011W
707/07/2011 W

The goal is to sequentially number each losing or winning streak. So for the above example, the number would look like the below:
mnummatch_dateresult streak
107/01/2011L -1
2 07/02/2011L -2
3 07/03/2011 W 1
4 07/04/2011W 2
507/05/2011 L -1
607/06/2011W 1
707/07/2011 W 2
We chose to number the two events in two different directions (negative for losses and positive for wins) . Below is an approach I took a while ago (and needed recently) using T-SQL to solve this problem:

First let's create a temporary table (a table variable) that adds the streak column. To do this we basically self-join the table so that each game is compared to the following game. The streak is then 1 if two consecutive games have different results (W/L or L/W). If, however, two consecutive games have the same result we use row_number() ranking function to denote the streak number. We do this operation once for the losses and once for the wins as the code illustrates below

declare @temp_results table(row_num int,match_date datetime,result char(1),streak int);
insert into @temp_results (row_num,match_date,result,streak)
select row_number() over (order by match_date) as row_num, match_date,result,streak
from(
select m1.match_date,m1.result
,(case when m1.result = m2.result then row_number() over(order by m1.match_date) else 1 end)
as streak
from @matches m1 left join @matches m2
on m1.mnum = m2.mnum+1
where m1.result= 'W'
UNION ALL
select m1.match_date,m1.result
,(case when m1.result = m2.result then -1 * (row_number() over(order by m1.match_date)) else -1 end )
as streak
from @matches m1 left join @matches m2
on m1.mnum = m2.mnum+1
where m1.result= 'L'
) as chld
The above code is not enough because row_number will not generate sequential streaks. In fact the data so far looks like the table below

mnummatch_dateresult streak
107/01/2011L -1
2 07/02/2011L -2
3 07/03/2011 W 1
4 07/04/2011W 2
507/05/2011 L -1
607/06/2011W 1
707/07/2011 W 4

The second step is to get rid of the gaps between sequence numbers. To do this we update the table (the table variable) so that for each streak if it's not 1 or -1 we find the last game (max) that has the same result as the current place but it took place before the current game and then subtract that game's row_number for current row_number. Code explains it better:
-- we need this update as a fix to get rid of the gaps between successive wins or successive losses
update t1
set t1.streak =
(case when t1.result = 'W' then 1+t1.row_num - (select max(row_num)
from @temp_results t
where t.streak = 1
and t.row_num < t1.row_num ) else -1 * (1+t1.row_num - (select max(row_num) from @temp_results t where t.streak = -1 and t.row_num < t1.row_num )) end) from @temp_results t1 --left join @temp_results t2 on t1.row_num = t2.row_num where t1.streak not in (1,-1)