Samstag, 25. April 2015

SQL-Query to compute start- and endpoints of consecutive integers in a given list of distinct integers

In this post we want to compute the starting and ending points of consecutive integers in a given list of distinct integers.
Consider the following table:
drop table if exists p;
create table p ( proj_start integer primary key);
insert into p values (1),(2),(3),(4),(6),(16),(17),(18),(19),(21),(26),(27),(28),(29);
In this table we have the following sequences of consecutive integers:
1,2,3,4; 
6 ; 
16,17,18,19; 
21;
26, 27, 28, 29
We want to compute the starting and the end points of those sequences, which are:

1,4
6,6
16,19
21,21
26,29

We will do this, by first computing the starting points, then the end points and then do a join on them:
select
   startPoints.proj_start start, min(endPoints.proj_start) end
 from
 (
   select min(proj_start) proj_start from p
     union
   select p2.proj_start from p p1, p p2 where p1.proj_start < p2.proj_start group by p2.proj_start having p2.proj_start > max(p1.proj_start)+1
 ) as startPoints, # compute the starting points of the sequences
 (
  select p1.proj_start from p p1, p p2 where p1.proj_start < p2.proj_start group by p1.proj_start having min(p2.proj_start)>p1.proj_start+1
    union # we have to take the last point because it is not in the above query:
  select max(proj_start) as proj_start from p
  ) as endPoints  # compute the end points of the sequences
  where startPoints.proj_start <= endPoints.proj_start group by startPoints.proj_start
 ;

There are also other solutions to this problem (See for example Joe Celko's , SQL For Smarties, Advanced SQL Programming, Fourth Edition, Chapter 32.3, page 600). One solution makes use of the following observation for sequences of consecutive integers:
A sequence of distinct integers are consecutive if and only if
( maximal number in sequence - minimal number in sequence + 1 ) = ( number of integers in sequence)
For example consider the consecutive sequence:
5,6,7,8
 We have:
 8 - 5 + 1  = 4
On the other hand consider the following sequence which is not consecutive:
1,2,3,10
We have:
10-1+1 = 10 != 4
Here is the solution found at Joe Celko adopted to the above table with minor modification:
select min(proj_start) start, item end from
   (
     select p1.proj_start, max(p2.proj_start)     item   from p as p1
                 inner join      p as p2
            on p1.proj_start <= p2.proj_start        and
                  (p2.proj_start - p1.proj_start +1)
                     = (select count(*)  from p as p3  where p3.proj_start between p1.proj_start and p2.proj_start)
          group by p1.proj_start
   ) x group by item;