Mittwoch, 30. September 2015

Computing polynomials with cyclic Galois groups in SAGE

Here is some SAGE code to compute a polynomial with degree \(n\) which has cyclic Galois group. The computation is done as in the example https://en.wikipedia.org/wiki/Inverse_Galois_problem#Worked_example:_the_cyclic_group_of_order_three
#!/usr/bin/env spython

import sys
from sage.all import *
from random import sample
from collections import Counter

def t(n,k):
  return binomial(n-floor((k+1)/2),floor(k/2))

def alpha(n,k):
  return (-1)**(k*(k-1)/2)

def pbin(n):
  return sum([alpha(n,k)*t(n,k)*x**(n-k) for k in range(n+1)])


def genCyclicPoly(n):
  x = var('x')
  k = 0
  while True:
    p = k*n+1
    k += 1
    if is_prime(p):
      break
  y = primitive_root(p)
  K = CyclotomicField(p,names='a')
  a = K.gen()
  o = Mod(y**n,p).multiplicative_order()
  s = sum([a**((y**n)**i%p) for i in range(o)])
  pol = s.minpoly('x')
  return (pol,k-1,p,y)

for n in range(1,40+1):
  pol,k,p,y = genCyclicPoly(n)
  print n
  print pol
  print 

For example for \(n = \frac{17-1}{2} = 8 \) we get the following polynomial: \[ x^8 + x^7 - 7x^6 - 6x^5 + 15x^4 + 10x^3 - 10x^2 - 4x + 1 \] Looking at the coefficients of the polynomial and searching for them in OEIS we find the sequence A065941. Consider the following polynomial: \[ p(n,x) = \sum_{k=0}^{n} \binom{n-\left \lfloor \frac{k+1}{2} \right \rfloor}{\left \lfloor \frac{k}{2} \right \rfloor} \cdot (-1)^{\frac{k(k+1)}{2}}\cdot x^{n-k} \] We conjecture that the following is true (let \(p>2\) be prime): \[ Gal(p(n,x)) = C_n \Leftrightarrow n = \frac{p-1}{2} \Leftrightarrow p(n,x) \text{ is irreducible }\] If someone finds a proof or a counterexample to this, please let me know.

An heuristic argument why factoring is difficult

Let \(n=pq\) be the prime decomposition of \(n\) and let \(n=2^e\). Let us suppose that \(p = q = \sqrt{n}\). Let \(p_n\) be the probability to have a number \(1 \le x \le n\) with \(1 < gcd(x,n) < n\) and as such to also have a non-trivial factor of \(n\). Then we have \[p_n = 1-\frac{\phi(n)}{n}=1-\prod_{p|n}{(1-\frac{1}{p})} = 1-(1-\frac{1}{p})(1-\frac{1}{q}) \] \[ = 1-(1-\frac{1}{\sqrt{n}})^2 = 1-(1-\frac{1}{\sqrt{2^e}})^2 \] Suppose there exists a factoring-algorithm which runs in polynomial time \(f(e)\) and which draws a number \(x\) at every step and computes \(gcd(x,n)\). Let \(X\) be the number of numbers \(x\) with \(1 \lt x \lt n\) and \(1 \lt gcd(x,n) \lt n\) which the algorithm finds after \(f(e)\) steps. Then by definition of the algorithm we must have \[ 1 = P(X \ge 1) \] But on the other hand we have \[ P(X\ge 1) = 1 - P(X=0) = 1 - (1-p_n)^{f(e)} \] \[= 1-(1-\frac{1}{\sqrt{2^e}})^{2 \cdot f(e)} \] The last equality is by definition of the algorithm valid for every \(e\). But for \(e \rightarrow \infty\) we have \[1 = P(X \ge 1) = 1-(1-\frac{1}{\sqrt{2^e}})^{2 \cdot f(e)} \rightarrow_{e \rightarrow \infty} 0\] hence for \(e \rightarrow \infty\) we have the contradiction \(1 = 0\).

Candidate One Way Function

The construction of the function goes like this: The main idea is the construction of a "random" permutation on words of fixed size and then applying this permutation to the given word. Let \[w = a_0 \cdot a_1 \cdots a_{n-1} \] be a word from \( \{0,1\}^n \), \(|w| = n\) Let \[m = \sum_{i=0}^{n-1}{ a_i \cdot 2 ^ {n-1-i} } \] be the corresponding binary number constructed from the word. Let \(k= \left \lfloor \frac{n!}{2^n} \right \rfloor \cdot (m+1)\) , then \( 1 \le k \le n! \). Compute the Lehmer-Permutation \(\pi_k\) from \(k\) on \(n\) numbers. ( https://en.wikipedia.org/wiki/Factorial_number_system, https://en.wikipedia.org/wiki/Lehmer_code ) Set \[ x := \pi_k \cdot w = a_{\pi_k(0)} \cdot a_{\pi_k(1)} \cdots a_{\pi_k(n-1)} \] Then \[f(w) := x\]. So the function permutes the digits in the word \(w\) and the permutation is determined by \(w\). One might first ask, why the construction with \(k\) is needed, and why one might not just compute the Lehmer-Permutation of \(m\) instead. This is done so that the permutation is better distributed among the numbers \(1 ,\ldots, n!\). Otherwise since \(2^n << n!\) the permutation \(\pi_m\) will have to many fixed points \(\pi_m(i) = i\) for \(1 \le i \le L\) for some \(L\) . One possible attack would be given a word \(x = f(w)\) with \(l\) zeros and \(n-l\) ones, to consider all words \(y\) with \(l\) zeros and \(n-l\) ones and then compute \(f(y)\) of such words. There are exactly \(\binom{n}{l}\) such words. This attack would function if \(l\) is either too small or too large. Since as is known the \(\binom{n}{l}\) takes the maximum value at the central binomial coefficient (the middle term), we might conjecture, that words with \(l=\frac{n}{2}\) zeros and \(n-l=\frac{n}{2}\) ones are hard to attack. But for some random word \(w\), the expeted number of zeros and ones is equal to \(l = n-l = \frac{n}{2}\). Hence since the central binomial coefficient fullfills the inequality: \[\binom{n}{n/2} \ge \frac{2^n }{ \sqrt{2 \cdot n} } \] we might conjecture, that the work to be done to invert such words is exponential in \(n\). Here are some small examples, all words of length \(n = 4\):
m=0, n=4, k=1
Lehmer-Code = (Factorial-Digits of m) = [0, 0, 0, 0]
pi_k = [0, 1, 2, 3]
w= [0, 0, 0, 0]
pi_k*w= [0, 0, 0, 0]
m=1, n=4, k=2
Lehmer-Code = (Factorial-Digits of m) = [0, 0, 1, 0]
pi_k = [0, 1, 3, 2]
w= [0, 0, 0, 1]
pi_k*w= [0, 0, 1, 0]
m=2, n=4, k=3
Lehmer-Code = (Factorial-Digits of m) = [0, 1, 0, 0]
pi_k = [0, 2, 1, 3]
w= [0, 0, 1, 0]
pi_k*w= [0, 1, 0, 0]
m=3, n=4, k=4
Lehmer-Code = (Factorial-Digits of m) = [0, 1, 1, 0]
pi_k = [0, 2, 3, 1]
w= [0, 0, 1, 1]
pi_k*w= [0, 1, 1, 0]
m=4, n=4, k=5
Lehmer-Code = (Factorial-Digits of m) = [0, 2, 0, 0]
pi_k = [0, 3, 1, 2]
w= [0, 1, 0, 0]
pi_k*w= [0, 0, 1, 0]
m=5, n=4, k=6
Lehmer-Code = (Factorial-Digits of m) = [0, 2, 1, 0]
pi_k = [0, 3, 2, 1]
w= [0, 1, 0, 1]
pi_k*w= [0, 1, 0, 1]
m=6, n=4, k=7
Lehmer-Code = (Factorial-Digits of m) = [1, 0, 0, 0]
pi_k = [1, 0, 2, 3]
w= [0, 1, 1, 0]
pi_k*w= [1, 0, 1, 0]
m=7, n=4, k=8
Lehmer-Code = (Factorial-Digits of m) = [1, 0, 1, 0]
pi_k = [1, 0, 3, 2]
w= [0, 1, 1, 1]
pi_k*w= [1, 0, 1, 1]
m=8, n=4, k=9
Lehmer-Code = (Factorial-Digits of m) = [1, 1, 0, 0]
pi_k = [1, 2, 0, 3]
w= [1, 0, 0, 0]
pi_k*w= [0, 0, 1, 0]
m=9, n=4, k=10
Lehmer-Code = (Factorial-Digits of m) = [1, 1, 1, 0]
pi_k = [1, 2, 3, 0]
w= [1, 0, 0, 1]
pi_k*w= [0, 0, 1, 1]
m=10, n=4, k=11
Lehmer-Code = (Factorial-Digits of m) = [1, 2, 0, 0]
pi_k = [1, 3, 0, 2]
w= [1, 0, 1, 0]
pi_k*w= [0, 0, 1, 1]
m=11, n=4, k=12
Lehmer-Code = (Factorial-Digits of m) = [1, 2, 1, 0]
pi_k = [1, 3, 2, 0]
w= [1, 0, 1, 1]
pi_k*w= [0, 1, 1, 1]
m=12, n=4, k=13
Lehmer-Code = (Factorial-Digits of m) = [2, 0, 0, 0]
pi_k = [2, 0, 1, 3]
w= [1, 1, 0, 0]
pi_k*w= [0, 1, 1, 0]
m=13, n=4, k=14
Lehmer-Code = (Factorial-Digits of m) = [2, 0, 1, 0]
pi_k = [2, 0, 3, 1]
w= [1, 1, 0, 1]
pi_k*w= [0, 1, 1, 1]
m=14, n=4, k=15
Lehmer-Code = (Factorial-Digits of m) = [2, 1, 0, 0]
pi_k = [2, 1, 0, 3]
w= [1, 1, 1, 0]
pi_k*w= [1, 1, 1, 0]
m=15, n=4, k=16
Lehmer-Code = (Factorial-Digits of m) = [2, 1, 1, 0]
pi_k = [2, 1, 3, 0]
w= [1, 1, 1, 1]
pi_k*w= [1, 1, 1, 1]

As one can see for \(m=9\) and \(m=10\) (we have in both cases \(\pi_k \cdot w = 0011\)), the function is in general neither surjective nor injective. I have implemented this idea in python, and here is a nontrivial example on \(150\) bits which runs in under one second even though implemented in python:
w = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
m=1279607855208070195064338423740863454312235238, n=150, k=51223701589084024400750942404902242134696092112154131410359938801887859702545711950279545972412289530642923770897655341449893740303460239109720926509319367414721427119543767321207725214681463099908033034573794457858117619817707577286035424763678710865742056652800
pi_k = [134, 72, 4, 119, 108, 25, 91, 73, 112, 107, 32, 29, 54, 58, 80, 8, 94, 123, 38, 36, 69, 135, 127, 51, 141, 20, 124, 106, 83, 97, 82, 44, 55, 27, 3, 138, 52, 113, 118, 43, 147, 59, 34, 132, 45, 26, 121, 24, 18, 2, 11, 6, 126, 130, 81, 12, 48, 139, 1, 142, 46, 47, 120, 71, 88, 140, 104, 100, 23, 40, 143, 63, 9, 122, 136, 5, 67, 105, 33, 57, 84, 15, 49, 76, 98, 102, 111, 70, 41, 60, 86, 16, 13, 89, 22, 77, 17, 61, 129, 75, 21, 131, 62, 19, 95, 117, 78, 101, 109, 56, 90, 68, 7, 137, 53, 110, 146, 30, 125, 14, 96, 28, 50, 31, 114, 37, 148, 93, 144, 145, 133, 115, 92, 149, 87, 64, 42, 65, 79, 66, 39, 85, 74, 35, 128, 116, 103, 99, 10, 0]
pi_k * w = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
Here is the python code, to compute the function:
#!/usr/bin/python
import math
from itertools import permutations
import random

def getPermFromLehmer(fnr,n):
  # gives the permutation from lehmer-code: 
  # for example 4041000 -> (4,0,6,2,1,3,5)
  l = range(n)
  perm = []
  for d in fnr:
    perm.append(l.pop(d))
  return perm

def getPermutations(n):
  perms = permutations(range(0,n))
  return list(perms)

def getFactoradicDigits(num,d):
  digits = []
  i = 1
  while num > 0:
    rem = num % i
    num = num / i
    i += 1
    digits.append(rem)
  digits.reverse()
  diff = d - len(digits)
  newDigits = diff * [0]
  newDigits.extend(digits)
  return newDigits

def getLehmerCode(p):
  if len(set(p)) != len(p):
    return "is not a permutation"
  else:
    l = []
    for i in range(len(p)):
      pi = p[i]
      di = len([1 for pj in p[(i+1):] if pj < pi ])
      l.append(di)
    return l

def getFactoradicNumber(perm):
  lehmerCode = getLehmerCode(perm)
  n = len(perm)
  return sum([lehmerCode[i]*math.factorial(n-1-i) for i in range(n)])+1

def getPermFromNr(n,d):
  return getPermFromLehmer(getFactoradicDigits(n-1,d),d)


def getRandomBits(n):
  return [random.randint(0,1) for i in range(n)]

def computeNFromBits(r):
  l = len(r)
  return sum([2**(l-1-i)*r[i] for i in range(l)])


def applyPerm(perm,l):
  if not len(perm)==len(l):
    return None
  else:
    return [l[perm[i]] for i in range(len(perm))]


def f(bits):
  m = computeNFromBits(bits)
  if m == 0:
    m = 1
  n = len(bits)
  k = int(math.floor(math.factorial(n)/(2**n*1.0)))*(m+1)
  print "m=%s, n=%s, k=%s" % (m,n, k)
  perm = getPermFromNr(k,len(bits))
  print "pi_k = %s" % perm
  bitsPerm = applyPerm(perm,bits)
  return bitsPerm

def digits(n,d):
  k = n
  dig = []
  while k> 0:
    dig.append(k%d)
    k = k/d
  dig.reverse()   
  return dig


def computeAll(m):
  p = getPermutations(m)
  for i in range(0,2**m):
    r = digits(i,2)
    if len(r) < m:
       diff = (m-len(r))*[0]
       r.reverse()
       r.extend(diff)
       r.reverse()
    rPerm = f(r)
    print "w= %s" % r
    print "pi_k*w= %s" % rPerm
  for x in p:
    print x, getLehmerCode(x), getFactoradicNumber(x), getPermFromNr(getFactoradicNumber(x),m)


if __name__=='__main__':
  computeAll(3)
  r = getRandomBits(150)
  rPerm = f(r)
  print "w= %s" % r
  print "pi_k*w= %s" % rPerm

If someone has an idea on how to invert the function, please let me know.

Directed Graphs in PostgreSQL

In this post we want to make use of CTE in PostgreSQL to compute paths in directed graphs. We consider only the edges of the graphs, this means, that if some graph has a vertex without any edge in or out of this vertex, it is not shown here.
Given the edges of some graph, we want to construct the startpoints and endpoints only of paths, which are not contained in other paths.
First we want to construct some different graphs and we want also to note the desired output of the query when applied to the specific graph, to clarify what our solution should do.
Lets start with a graph wich consinsts of three components:

drop table if exists p;
create table p(
  startPoint integer,
  endPoint integer);

-- example 1:
delete from p;
insert into p(startPoint, endPoint) values (4,3),(2,5),(5,20),(1,10),(10,15),(15,16),(16,18);

-- desired output of example 1:
-- startpoint | endpoint | distance |     path
--------------+----------+----------+---------------
--          1 |       18 |        4 | 1,10,15,16,18
--          2 |       20 |        2 | 2,5,20
--          4 |        3 |        1 | 4,3
In the above example, we do not want to see for example startpoint=10, endpoint=16, distance=2 and path=10,15,16 because this path is already contained in the first path of the above table, which starts at 1 and ends at 18.
Now we want to take a look at a more complex example. In the following graph, we have two components and also paths of interest which are of length two, namely 1,4.
-- example 2:
delete from p;
insert into p(startPoint,endPoint) values (3,9),(9,2),(9,10),(10,7),(3,17),(1,4),(6,5),(5,9);

-- desired output of example 2:
-- startpoint | endpoint | distance |    path
--------------+----------+----------+------------
--          1 |        4 |        1 | 1,4
--          3 |        2 |        2 | 3,9,2
--          3 |        7 |        3 | 3,9,10,7
--          3 |       17 |        1 | 3,17
--          6 |        2 |        3 | 6,5,9,2
--          6 |        7 |        4 | 6,5,9,10,7
In this example we have several starting points occuring more than once, namely 3 and 6 which is ok, since they correspond to different paths.
What about cycles in a graph?
Should the query go in an infinte loop?
Should it terminate, and if it terminates, what should the output be?
Here is the most simple cyclic graph with more than one vertex:
It makes sense, that the query terminates, and if some component of the graph contains a loop, than this component should not be shown. In the above graph, the result would be an empty table:
-- example 3: (graph contains cycles)
delete from p;
insert into p(startPoint, endPoint) values (1,2),(2,1);

-- desired output of example 3:
-- startpoint | endpoint | distance | path
--------------+----------+----------+------

-- notice that in this example the desired output is an empty table, since it makes no sense to define the starting and the end poin of a cycle;
What happens if a graph contains multiple routes from the same starting and endpoints? Here a small graph :
In this case all paths should be shown. (One might consider an alternative solution for example, to take only the shortest or the longest path.):
-- example 4: (graph contains multiple routes)
delete from p;
insert into p(startPoint, endPoint) values (1,2),(2,3),(1,3);

-- desired output of example 3:
-- startpoint | endpoint | distance | path
--------------+----------+----------+-------
--          1 |        3 |        1 | 1,3
--          1 |        3 |        2 | 1,2,3
In the last example, we want to consider a graph which has more than one component one of which contains a cycle and the other not:
Here we want to show only the paths of the component which has no cycle:
-- example 5: (graph contains two components, one of which contains a cycle, the other not)
delete from p;
insert into p(startPoint, endPoint) values (1,2),(2,1),(3,4),(4,5);

-- notice that in this example, it makes sense to ignore the component of the graph which contains a cycle and to just show the paths of the other components which do not have a cycle

-- desired output of example 4:
-- startpoint | endpoint | distance | path
--------------+----------+----------+-------
--          3 |        5 |        2 | 3,4,5
All this can be done with CTE in PostgreSQL, which at this time is not available in MySQL. The basic idea to recursively define a table is the same as the definition of a recursive function. In the recursive function you need some starting value. This starting value is implemented through a "starting table", or "starting query". This query is then "UNION [ALL]" to the recursive part.
One solution is:
with recursive t(intermediatePoint,startPoint,endPoint,distance,path) as (
    select
        ( select greatest(max(startPoint),max(endPoint)) from p)+1 as  intermediatePoint,
        startPoint, endPoint, 1 as distance, cast(startPoint as text) || ',' || cast(endPoint as text) from p
      union all
    select t2.startPoint as intermediatePoint, t1.startPoint, t2.endPoint,1+t2.distance, cast(t1.startPoint as text) || ',' || t2.path from p t1, t t2
     where t1.endPoint = t2.startPoint
      and 1+t2.distance <= (select count(*) from p)
) select startpoint, endpoint,distance,path from t
     where
       startPoint not in (select intermediatePoint from t)
       and endPoint not in (select intermediatePoint from t)
     order by startpoint, endpoint;
To keep track if some path is contained in some other path, we keep track of intermediate points, when joining together two paths. In this way we can later recognize if some edge is an intermediate point or not. If the starting or ending points have been considered to be intermediate points, than the corresponding entry is not shown, because this means, that this path is contained in some other path.
Since paths of length two (edges) do not have intermediate points, but we have to assign them also a value, because of "UNION [ALL]", on idea would be to assign them the value NULL. Unfortunotely this did not work. I guess this is becaus in the definiton of the recursive table t(intermediatPoint,startPoint,endPoint,distance,path) we have nothing said about the data-types of the columns involved, for example of the column intermediatePoint. So I guess, to determine the data-type of this column, the data-type of the corresponding column before the "UNION-ALL" is taken, but this guess might be wrong. So in order to have some value of which is distinct from all edges, I decided to take the maximum over all edges and add one to it.
To avoid the query to go into an infinte loop when it tries to join paths of a cycle, we make use of the following observation:
The start and endpoints with corresponding paths which are of interest to us can have at most a length which is equal to the number of edges in the graph. This would correspond to, a graph of consecutive integers where there is an edge from every integer n to n+1 for example 1,2,3,..,100,101.

Samstag, 25. April 2015

SQL-Query to compute start- and endpoints of consecutive integers in a given list of distinct integers

In this post we want to compute the starting and ending points of consecutive integers in a given list of distinct integers.
Consider the following table:
drop table if exists p;
create table p ( proj_start integer primary key);
insert into p values (1),(2),(3),(4),(6),(16),(17),(18),(19),(21),(26),(27),(28),(29);
In this table we have the following sequences of consecutive integers:
1,2,3,4; 
6 ; 
16,17,18,19; 
21;
26, 27, 28, 29
We want to compute the starting and the end points of those sequences, which are:

1,4
6,6
16,19
21,21
26,29

We will do this, by first computing the starting points, then the end points and then do a join on them:
select
   startPoints.proj_start start, min(endPoints.proj_start) end
 from
 (
   select min(proj_start) proj_start from p
     union
   select p2.proj_start from p p1, p p2 where p1.proj_start < p2.proj_start group by p2.proj_start having p2.proj_start > max(p1.proj_start)+1
 ) as startPoints, # compute the starting points of the sequences
 (
  select p1.proj_start from p p1, p p2 where p1.proj_start < p2.proj_start group by p1.proj_start having min(p2.proj_start)>p1.proj_start+1
    union # we have to take the last point because it is not in the above query:
  select max(proj_start) as proj_start from p
  ) as endPoints  # compute the end points of the sequences
  where startPoints.proj_start <= endPoints.proj_start group by startPoints.proj_start
 ;

There are also other solutions to this problem (See for example Joe Celko's , SQL For Smarties, Advanced SQL Programming, Fourth Edition, Chapter 32.3, page 600). One solution makes use of the following observation for sequences of consecutive integers:
A sequence of distinct integers are consecutive if and only if
( maximal number in sequence - minimal number in sequence + 1 ) = ( number of integers in sequence)
For example consider the consecutive sequence:
5,6,7,8
 We have:
 8 - 5 + 1  = 4
On the other hand consider the following sequence which is not consecutive:
1,2,3,10
We have:
10-1+1 = 10 != 4
Here is the solution found at Joe Celko adopted to the above table with minor modification:
select min(proj_start) start, item end from
   (
     select p1.proj_start, max(p2.proj_start)     item   from p as p1
                 inner join      p as p2
            on p1.proj_start <= p2.proj_start        and
                  (p2.proj_start - p1.proj_start +1)
                     = (select count(*)  from p as p3  where p3.proj_start between p1.proj_start and p2.proj_start)
          group by p1.proj_start
   ) x group by item;