Dienstag, 4. Juni 2024
Das Kartenspiel "2 oder 4"
Ich möchte heute mein neues Kartenspiel "2 oder 4" vorstellen. Es ist auf MeinSpiel erschienen als Print-On-Demand und wird so gespielt: Auf je zwei Karten gibt es genau 2 oder 4 gemeinsame Bilder. Das Ziel des Spieles ist es schnell so viele Karten wie möglich zu gewinnen, indem man die 2 oder 4 Bilder findet und benennt. Wer das Spiel zuerst einmal selbst ausprobieren möchte und gerne bastelt, der kann hier das Spiel als pdf herunterladen und selbst drucken und ausschneiden.
Es eignet sich gut als Partyspiel oder als Familienspiel. Dass es bei 36 Karten und genau zwei Karten je genau je 2 oder genau je 4 Bilder gibt, die gemeinsam sind, liegt an der Existenz eines mathematischen Objektes.
Montag, 4. März 2024
The abc conjecture and positive definite kernel on the natural numbers with inductive Cholesky decomposition.
This is a transcription of the following MathOverflow question, found here.
One formulation of the abc-conjecture is:
\[\forall a,b \in \mathbb{N}: \frac{a+b}{\gcd(a,b)} < \operatorname{rad}\left ( \frac{ab(a+b)}{\gcd(a,b)^3}\right )^2 \]
Let us define:
\[K(a,b) := \frac{2(a+b)}{\gcd(a,b)\operatorname{rad}\left ( \frac{ab(a+b)}{\gcd(a,b)^3}\right )^2} \]
I have checked in Sagemath up to \(n=100\) that the matrix
\[(K(i,j))_{1 \le i,j \le n} \text{ is positive definite }\]
and up to \(1\le a,b \le 100\) that :
\[K(a,b)\le 1\]
In the Encyclopedia of Distances the word "similarity" is described as:
A similarity \(s:X\times X \rightarrow \mathbb{R}\) is defined in the Encyclopedia of Distances as:
Edit: I think that \(4.\) can be proved by considering that if \(K(x,y)=1\) and assuming there exists a prime \(p\) which divides \(x\) but not \(y\), leads to a contradiction.
Second edit: The abc conjecture (in the version above) would follow, if one could prove that the function \[K(a,b) := \frac{2(a+b)}{\gcd(a,b)\operatorname{rad}\left ( \frac{ab(a+b)}{\gcd(a,b)^3}\right )^2} \] is positive definite over the natural numbers.
Proof: If \(K\) is a positive definite function over the natural numbers, then by the Moore-Aronszajn theorem , there exists a Hilbert space \(H\) and a feature mapping \(\phi: \mathbb{N} \rightarrow H\) such that: \[K(a,b) = \left < \phi(a),\phi(b) \right >_{H}\] But by Cauchy-Schwarz inequality we get: \[K(a,b) = |K(a,b)| = |\left < \phi(a), \phi(b) \right >_{H}|\] \[\le |\phi(a)||\phi(b)| = \sqrt{K(a,a)K(b,b)} = \sqrt{1 \cdot 1} = 1\] But then we get by division of \(2\) and application of the last inequality: \[\frac{(a+b)}{\gcd(a,b)\operatorname{rad}\left ( \frac{ab(a+b)}{\gcd(a,b)^3}\right )^2} \le \frac{1}{2} < 1 \] hence it would follow that: \[\forall a,b \in \mathbb{N}: \frac{a+b}{\gcd(a,b)} < \operatorname{rad}\left ( \frac{ab(a+b)}{\gcd(a,b)^3}\right )^2 \] Modified question: Is there a way to speed up the computations to try and test if the Gram matrix for \(n=1,\cdots,10000 \) is positive definite? Here is my code so far, but it is rather slow:
It is based on inductive Cholesky decomposition as described below and keeps track in a dictionary if the \(\phi(n)\) or \(C_n\) have already been computed. It uses float approximation for greater speed and in the last step we check if each component of the vector \(\phi(n)\) is a real number. If it is for all \(1 \le k \le n\) then we conclude that the Gram matrix of \(n\) is positive definite. If it is not, then at least the Gram matrix for \(n\) is not positive definite:
Inductive Cholesky decomposition to find an embedding \(\phi(n)\) for each number \(n\):
Equation 1): \[\phi(n+1) = \left(\begin{array}{r} {C_{n}^{-1} v_n} \\ \sqrt{k(n+1,n+1)-{\left| {C_{n}^{-1} v_n} \right|}^{2} } \end{array}\right)\] Equation 2): \[v_n = \left(\begin{array}{r} k(1,n+1) \\ k(2,n+1) \\ \cdots\\ k(n,n+1)\\ \end{array}\right)\] Equation 3): \[C_n = \left(\begin{array}{r} \phi(1)^T \\ \phi(2)^T \\ \cdots\\ \phi(n)^T\\ \end{array}\right)\] Equation 4): \[\phi(1) := \sqrt{k(1,1)} e_1\] If we assume that the function \(k\) is positive definite, then this inductive method, should give us the vectors \(\phi(a),\phi(b)\) such that under the normal inner product we have: \[k(a,b) = \left < \phi(a), \phi(b) \right > \]. Edit (11.03.2024): This conjecture about the positive semidefiniteness of \(K\) is wrong as the determinant of \(G_{192}\) is < \(0\)
- \(s(x,y) \ge 0 \forall x,y \in X\)
- \(s(x,y) = s(y,x) \forall x,y \in X\)
- \(s(x,y) \le s(x,x) \forall x,x \in X\)
- \(s(x,y) = s(x,x) \iff x=y\)
Edit: I think that \(4.\) can be proved by considering that if \(K(x,y)=1\) and assuming there exists a prime \(p\) which divides \(x\) but not \(y\), leads to a contradiction.
Second edit: The abc conjecture (in the version above) would follow, if one could prove that the function \[K(a,b) := \frac{2(a+b)}{\gcd(a,b)\operatorname{rad}\left ( \frac{ab(a+b)}{\gcd(a,b)^3}\right )^2} \] is positive definite over the natural numbers.
Proof: If \(K\) is a positive definite function over the natural numbers, then by the Moore-Aronszajn theorem , there exists a Hilbert space \(H\) and a feature mapping \(\phi: \mathbb{N} \rightarrow H\) such that: \[K(a,b) = \left < \phi(a),\phi(b) \right >_{H}\] But by Cauchy-Schwarz inequality we get: \[K(a,b) = |K(a,b)| = |\left < \phi(a), \phi(b) \right >_{H}|\] \[\le |\phi(a)||\phi(b)| = \sqrt{K(a,a)K(b,b)} = \sqrt{1 \cdot 1} = 1\] But then we get by division of \(2\) and application of the last inequality: \[\frac{(a+b)}{\gcd(a,b)\operatorname{rad}\left ( \frac{ab(a+b)}{\gcd(a,b)^3}\right )^2} \le \frac{1}{2} < 1 \] hence it would follow that: \[\forall a,b \in \mathbb{N}: \frac{a+b}{\gcd(a,b)} < \operatorname{rad}\left ( \frac{ab(a+b)}{\gcd(a,b)^3}\right )^2 \] Modified question: Is there a way to speed up the computations to try and test if the Gram matrix for \(n=1,\cdots,10000 \) is positive definite? Here is my code so far, but it is rather slow:
# inductive cholesky decomposition def rad(n): return prod(prime_divisors(n)) def kk(a,b): return (2*(a+b)*a**1*b**1/gcd(a,b))*1/rad(a*b*(a+b)/gcd(a,b)**3)**2 def notPositiveDefinite(a,b): return rad(a*b/gcd(a,b)) #kk = notPositiveDefinite def grammat(kk,n): return matrix([[kk(a,b) for a in range(1,n+1)] for b in range(1,n+1)],ring=QQ) PHI = dict([]) def phi(n,useN=False): if n in PHI.keys(): return PHI[n] if n==1: if not useN: x = vector([sqrt(kk(1,1))]) else: x = vector([sqrt(kk(1,1)+0.0)]) PHI[n] = x return x else: w = CC(n-1)**(-1)*vv(n-1) lw = list(w) if not useN: lw.append(sqrt(kk(n,n)-w.norm()**2)) else: lw.append(sqrt(kk(n,n)-w.norm().n()**2).n()) x = vector(lw) PHI[n]= x return x def vv(n): return vector([kk(i,n+1) for i in range(1,n+1)]) CCDict = dict([]) def CC(n): if n in CCDict.keys(): return matrix(CCDict[n]) mm = [] for i in range(1,n+1): vi = list(phi(i)) if len(vi) < n: vi.extend((n-i)*[0]) mm.append(vi) M= matrix(mm) CCDict[n] = mm return M for n in range(1,200): print(n,all(x in RR for x in phi(n,useN=True))) #print(grammat(kk,n).rational_form()) ```Comment on code:
It is based on inductive Cholesky decomposition as described below and keeps track in a dictionary if the \(\phi(n)\) or \(C_n\) have already been computed. It uses float approximation for greater speed and in the last step we check if each component of the vector \(\phi(n)\) is a real number. If it is for all \(1 \le k \le n\) then we conclude that the Gram matrix of \(n\) is positive definite. If it is not, then at least the Gram matrix for \(n\) is not positive definite:
Inductive Cholesky decomposition to find an embedding \(\phi(n)\) for each number \(n\):
Equation 1): \[\phi(n+1) = \left(\begin{array}{r} {C_{n}^{-1} v_n} \\ \sqrt{k(n+1,n+1)-{\left| {C_{n}^{-1} v_n} \right|}^{2} } \end{array}\right)\] Equation 2): \[v_n = \left(\begin{array}{r} k(1,n+1) \\ k(2,n+1) \\ \cdots\\ k(n,n+1)\\ \end{array}\right)\] Equation 3): \[C_n = \left(\begin{array}{r} \phi(1)^T \\ \phi(2)^T \\ \cdots\\ \phi(n)^T\\ \end{array}\right)\] Equation 4): \[\phi(1) := \sqrt{k(1,1)} e_1\] If we assume that the function \(k\) is positive definite, then this inductive method, should give us the vectors \(\phi(a),\phi(b)\) such that under the normal inner product we have: \[k(a,b) = \left < \phi(a), \phi(b) \right > \]. Edit (11.03.2024): This conjecture about the positive semidefiniteness of \(K\) is wrong as the determinant of \(G_{192}\) is < \(0\)
Samstag, 2. März 2024
Application of Ehrhart polynomials for counting of numbers and lattices for natural numbers.
This is a transcription of the MO question written here:
It is known that \(\log(p)\) is independent as a vector over the rational numbers where \(p\) is a prime.
We have \(\log(ab) = \log(a)+\log(b)\) and \(\log(n) = \sum_{p |n}v_p(n) \log(p)\).
We define \(\psi(n) := \sum_{p |n}v_p(n) e_p\) where \(e_p\) is the \(p\)-th standard basis vector of the Hilbert space of sequences.
for \(q = a/b \in \mathbb{Q}_{>0}\) we define:
\[\psi(a/b) := \psi(a/\gcd(a,b))-\psi(b/\gcd(a,b))\].
We have for \(n \in \mathbb{N}\):
\[\|\psi(n)\|^2 = \sum_{p|n} v_p(n)^2 \text{ is a natural number}\]
Furthermore the function
\[K(a,b) = \left < \psi(a),\psi(b)\right > = \sum_{p|\gcd(a,b)} v_p(a)v_p(b)\]
is a positive definite kernel on the natural numbers.
Let us define the \(\infty\)-dimensional lattice \(\Gamma\) as:
\(\Gamma:= \{\psi(q)| q \in \mathbb{Q}_{>0}, \|\psi(q)\|^2 \equiv 0 \mod(2)\}\)
and let us also define:
\[\eta(n) := (-1)^{\|\psi(n)\|^2}= \prod_{p|n}(-1)^{v_p(n)^2}\]
For \(\psi(a),\psi(b) \in \Gamma\) we have:
\[\|\psi(ab)\|^2 = \|\psi(a)+\psi(b)\|^2 = \|\psi(a)\|^2+\|\psi(b)\|^2+2K(a,b) \equiv 0+0+0 \equiv 0 \mod(2)\]
Therefore \(\psi(a)+\psi(b) \in \Gamma\) and \(\psi(1) \in \Gamma\).
The lattice shares some properties with the Leech-lattice and the E8 lattice:
1. It is unimodular in the sense that: For a finite set of primes, the gram matrix \((K(p,q))_{p,q \text{ is prime}}\) has \(\det=1\).
2. It is even: \(\forall x \in \Gamma: |x|^2 \equiv 0 \mod(2)\)
3. For all nonzero vectors in the lattice the squared norm is at least \(2\).
My questions are these:
1) Probably an easy question:
\[\eta(mn) = \eta(m) \eta(n) \forall m,n\in \mathbb{N}\]
2) Does it satisfy a Riemann-Hypothesis type of random variable equality:
\[\forall \epsilon > 0 : \lim_{N\rightarrow \infty} \frac{1}{N^{1/2+\epsilon}}\sum_{n=1}^N \eta(n) = 0\]
3) Is there a relationship to the Riemann Hypothesis and this lattice?
Here is some Sagemath code for experiments.
And here is some picture concerning the 2. question:
**Edit**:
Let \(\lambda(n)=(-1)^{\Omega(n)}\) be the Liouville function.
Then because \(v_p(n)^2\equiv v_p(n) \mod(2)\), we have indeed:
\[\eta(n)= \lambda(n)\]
This answers the first question since it is known that \(\lambda\) is multiplicative.
This also answers the second and third question.
Modified question: Is this lattice known in literature?
Thanks for your help.
-------------------------------------------------------------------------------------------------
A maybe useful point of view, is to use the Ehrhart polynomial of a polytope, namely the simplex, to count the numbers as points in a simplex:
Let \(Q_N:=\) Polytope of \((\{\psi(p)|1\le p \le N, p \text{ prime }\})\)
Then the polytope generated by the primes is the simplex, and it is known ( "Computing the continuous discretely" Matthias Beck, Sinai Robins, Second edition, p.31 ff) that the Ehrhart polynomial is given by:
\[L(Q_N,t) = \operatorname{binomial}(d+t,d)\]
where \(d= \pi(N)\) is the dimension of the simplex, and it is equal to the number of primes \(\le N\), which is denoted by \(\pi(N)\). Let \(\hat{\pi}(n):=\{p \text{ prime}: p|n \}\) and let \(p_n:=n\)-th prime.
Since \(L(Q_N,t)\) counts the points \(\psi(n)=(x_1,\cdots,x_d)\) in the dilated polytope \(t Q_N\) and those points have coordinates \(\ge 0\) with \(x_1+\cdots+x_d \le t\), we conclude that, by observing that the sum of the coordinates correspond to \(\Omega(n) := \sum_{p|n} v_p(n)\):
\[L(Q_N,t) = |\{\psi(n) : \Omega(n) \le t , \hat{\pi}(n) \subset \hat{\pi}(N!), 1 \le n \le p_{\pi(N)}^t\}|\]
The condition \(\hat{\pi}(n) \subset \hat{\pi}(N!)\) is for making sure we look at those numbers which have only prime divisors \(\le N\).
We also observe that:
\[L(Q_N,t)-L(Q_N,t-1) = |\{ n : 1 \le n \le p_d^t, \Omega(n)=t, \hat{\pi}(n) \subset \hat{\pi}(N!) \}| = |A_{N,t}|\]
where I have defined:
\[A_{N,t} = \{ n : 1 \le n \le p_d^t, \Omega(n)=t, \hat{\pi}(n) \subset \hat{\pi}(N!) \}\]
We can now define and try to evaluate the following sum:
\[F(N,t):= \sum_{k=0}^t \sum_{n \in A_{N,k}} \lambda(n)\]
\[=\sum_{k=0}^t (-1)^k ( L(Q_N,k)-L(Q_N,k-1) )\]
\[=\sum_{k=0}^t (-1)^k ( \operatorname{binomial}(d+k,d)-\operatorname{binomial}(d+k-1,d) )\]
which according to Wolfram Alpha is equal to:
\[=((-1)^t \operatorname{binomial}(d+t+1,d) F_{2,
1}(1,d+t+2;t+2;-1)+2^{-d-1})-2^{-d-1}(2^{d+1}(-1)^t \operatorname{binomial}(d+t,d)F_{2,1}(1,d+t+1;t+1;-1))\]
\[=((-1)^t \operatorname{binomial}(\pi(N)+t+1,\pi(N)) F_{2,1}(1,\pi(N)+t+2;t+2;-1)+2^{-\pi(N)-1})-2^{-\pi(N)-1}(2^{\pi(N)+1}(-1)^t \operatorname{binomial}(\pi(N)+t,\pi(N))F_{2,1}(1,\pi(N)+t+1;t+1;-1))\]
where \(F_{2,1}(a,b;c;z)\) is the hypergeometric function.
I must admit that the formula is not really "nice" but I think it should be useful, because using the RHS of the last equality with \(d\) could maybe be extended to other values for \(d,t\) other than the natural numbers.
I can also not see, how to relate the sum \(F(N,t)\) to the Liouville sum:
\[\sum_{n=1}^N \lambda(n)\]
The naive idea is that in the limit \(N\rightarrow \infty, t \rightarrow \infty\), the two sets:
\[\mathbb{N}:= \lim_{N\rightarrow \infty}\{1,\cdots,N\} \text{ and } \lim_{N,t\rightarrow \infty} A_{N,t} \]
should be intuitively equal.
There is also an equivalent formulation of the Riemann Hypothesis with the points on a different polytope, but while it uses the Ehrhardt polynomials, which in this case are very complicated at \(t=1\) I can not see how to derive an analytic formula in this case.
Here you can find some sanity check in SageMath.
Let \(B_{N,t} = \{n : \Omega(n) \le t , \hat{\pi}(n) \subset \hat{\pi}(N!), 1 \le n \le p_{\pi(N)}^t\}\), so that
\[\operatorname{binomial}(d+t,d) = L(Q_N,t) = |B_{N,t}|\]
The arguments above show that:
\[\sum_{n \in B_{N,t}} \lambda(n) = F(N,t) = \]
\[=((-1)^t \operatorname{binomial}(d+t+1,d) F_{2,1}(1,d+t+2;t+2;-1)+2^{-d-1})-2^{-d-1}(2^{d+1}(-1)^t \operatorname{binomial}(d+t,d)F_{2,1}(1,d+t+1;t+1;-1))\]
hence dividing by the number of points under which the sum runs, we get:
\[\frac{1}{|B_{N,t}|} \sum_{n \in B_{N,t}} \lambda(n) = \frac{1}{|B_{N,t}|} F(N,t) = \]
\[=(-1)^t \frac{d}{t+1}F_{2,1}(1,d+t+2;t+2;-1)+(-1)^t F_{2,1}(1,d+t+2;t+2;-1)+\frac{1}{2^{d+1} \operatorname{binomial}(d+t,d)}+(-1)^{t+1}F_{2,1}(1,d+t+1;t+1;-1)\]
where \(d:=\pi(N)\).
Notice the similarity to the prime number theorem:
We have in both cases sets \(C_N\) such that:
\[\lim_{N\rightarrow \infty} \frac{1}{|C_N|} \sum_{n \in C_N} \lambda(n) = 0\]
The case \(C_N := \{1,\cdots,N\}\) corresponds by Landau equivalently to the prime number theorem.
We observe that:
\[\lim_{N \rightarrow \infty} B_{N,t} = \{n \in \mathbb{N} | \Omega(n) \le t \} =: B_{\infty,t}\]
and that
\[\lim_{t \rightarrow \infty}(\lim_{N\rightarrow \infty} B_{N,t} ) = \lim_{t \rightarrow \infty} B_{\infty,t} = \lim_{t
\rightarrow \infty} \{n \in \mathbb{N} | \Omega(n) \le t \} = \mathbb{N}\]
Similarly:
\[\lim_{N \rightarrow \infty}(\lim_{t\rightarrow \infty} B_{N,t} ) = \lim_{N \rightarrow \infty} B_{N,\infty} = \lim_{N \rightarrow \infty} \{n \in \mathbb{N} | \hat{\pi}(n) \subset \hat{\pi}(N!)\} = \mathbb{N}\]
Hence we have:
\[\lim_{t \rightarrow \infty, N \rightarrow \infty} B_{N,t} = \mathbb{N}\]
(Here we have used that there are infinitely many primes.)
Now let us put \(C_N := B_{N,\pi(N)}\).
Then one empirical observation, which I was not able to prove directly, is:
\[\lim_{N \rightarrow \infty} \frac{1}{|C_N|} \sum_{n \in C_N} \lambda(n) =^? 0\]
which reminds a little bit on the prime number theorem where \(C_N:=\{1,\cdots,N\}\).
Sonntag, 17. Dezember 2023
A class of number theoretic finite semirings
Abstract: This work presents a novel structure of finite semirings constructed from factorization trees. We first establish the foundational elements of our theory by defining the largest and smallest prime divisors of a natural number. Utilizing these divisors, we construct factorization trees that inherently determine the lexicographic ordering of the factorizations. These trees enable us to define a projection function, which reinterprets factorization trees as decision trees for the projection of larger numbers.
We introduce two recursive definitions for the projection function, $\pi_n(N)$, one based on the decision tree interpretation and another on divisors and lexicographic sorting. Tables of values for $\pi_i(N)$ provide insight into the behavior of these functions. Moreover, we identify the properties of $\pi_n(N)$, which are pivotal to the semiring operations defined on the set $[n]$.
The addition and multiplication operations, denoted as $\oplus$ and $\otimes$ respectively, are defined through the iteration of the successor function. This leads to the establishment of $([n], \oplus, \otimes)$ as an abelian semiring. We also propose a notation to express congruence under projection and demonstrate the properties that validate the semiring definition.
The abstract concludes with conjectures to simplify the computation of $\oplus$ and $\otimes$, supported by visual representations of addition and multiplication tables and successor graphs for various $n$ values. These conjectures, if proven, can significantly enhance computational efficiency and confirm the structure as an abelian semiring.
Download-Link (pdf).
Sonntag, 26. November 2023
A note in inverse Galois theory.
Abstract:
The note explores connections between inverse Galois theory and Hilbert irreducibility, presenting results in the form of theorems and lemmas. The main focus is on establishing conditions under which a finite group can be realized as a Galois group over the rational numbers. The note introduces a corresponding polynomial associated with a finite group and explores its irreducibility over specific varieties. The main results include Theorem 1, which establishes conditions for a group to be a Galois group, and Theorem 2, which demonstrates the equivalence between the applicability of Hilbert irreducibility to the corresponding polynomial and the realizability of every finite group as a Galois group over the rational numbers. The note concludes with corollaries and lemmas supporting the main theorems. It can be downloaded as pdf from here.
The note explores connections between inverse Galois theory and Hilbert irreducibility, presenting results in the form of theorems and lemmas. The main focus is on establishing conditions under which a finite group can be realized as a Galois group over the rational numbers. The note introduces a corresponding polynomial associated with a finite group and explores its irreducibility over specific varieties. The main results include Theorem 1, which establishes conditions for a group to be a Galois group, and Theorem 2, which demonstrates the equivalence between the applicability of Hilbert irreducibility to the corresponding polynomial and the realizability of every finite group as a Galois group over the rational numbers. The note concludes with corollaries and lemmas supporting the main theorems. It can be downloaded as pdf from here.
Freitag, 2. November 2018
A marketplace for machine learning models? onnx, pmml or pfa?
A simple search for "marketplace for machine learning models" reveals that a company called Algorithmia has built such a thing and has over 5000 models in the marketplace, most of which are machine learning related. However, if one wants to use/buy a model, one has to trust a third party, namely Algorithmia, to run that model. Also, the cost to use a model is based on the number of request. Hence the buyer does not own the model. There is another option, which does not exist yet to my knowledge, namely to directly sell the machine learning model to the buyer. This option is proposed here: https://arxiv.org/pdf/1805.11450.pdf. Although the paper is very technical, the main idea is to directly sell the machine learning model. This decouples the process of buying and running a model. Since models are trained on data and later scored on new data, this means, that the seller of the model and the buyer of the model, must have access to some similar dataset. This could be achieved by using open data. This is also the reason, why most of the models on Algorithmia are based on commond data, such as text and images. The data scientist trains his model on open data to solve a specific problem and then uploads his model to a marketplace. The buyer, could then upload open train data to the marketplace and let the model score the data to see how well the model performs. So he does not have to trust the seller of the model in terms of model accurracy. Based on the performance of different models, the buyer of the model, can choose the model, with the best accurracy and then the buyer can decide where to run the model, either on the cloud or on premise. This raises the question: What actually is a machine learning model? You could say, that a machine learning model is a specification of how to produce an output given some input. The next question is, how to transport a model from seller to buyer? This could be achieved using a model-format such as ONNX, PMML or PFA (portable format for analytics). The next question is , which of these model-format will be used? This is a wrong question to be posed. Why? Because as there are many document formats ( doc, docx, pdf, odf etc.) there can be many model-based formats. The analogy to document formats makes one thing clear: There are producers of documents and consumer of documents. The producers can nowadays choose from a set of marketplaces [just google: marketplace sell pdf], where to sell the document. Similarily the buyer can choose where to buy. This is only possible by decoupling the process of creating a document and viewing the document. Thus, it should be possible to do the same with machine learning models, using model-formats. A company comes to mind, where already a lot of data scientists, compete for producing the best model: Kaggle. Thus Kaggle or similar websites, might be candidates for offering such a marketplace for "opend-data-ml-models". The only thing they have to do, is offering a service for data scientists to export their "kernels" to a model-format such as ONNX, PMML or PFA and then let companys find the best model which solves their problem.
Edit: I wrote a request to Kaggle, but did not get a response, so I decided to start an experiment, to see how the idea eveolves:
mlmarketplace, buy and sell onnx, pmml, pfa
What is your opinion on this topic? Comments are welcome!
Please share this article if you like the idea.
Dienstag, 24. Januar 2017
A note in Galois Theory
Theorem 1:
Let $$\mathbb{G} = \{ \sigma_1,\cdots,\sigma_n\}$$ be a finite group, $$x_1,\cdots,x_n$$ be indeterminates
and let $$\rho$$ be the regular representation of $$\mathbb{G}$$. Let $$x = (x_1,\cdots,x_n)^T$$ and $$G = (\rho(\sigma_1)x,\cdots,\rho(\sigma_n)x)$$ , $$A' = diag(x_1,\cdots,x_n)$$.
Set $$A=GA'G^{-1}$$. Suppose there exists $$\alpha:=(\alpha_1,\cdots,\alpha_n) \in \mathbb{C}^n$$ such that:
1) $$det(G(\alpha)) \neq 0$$
2) $$A = A(\alpha)$$ has rational coefficients
3) the characteristic polynomial $$\chi_A(t)$$ is irreducible over $$\mathbb{Q}$$.
Then $$Gal(\chi_A(t)/\mathbb{Q}) = \mathbb{G}$$.
Proof:
On the one hand we have
\[ GA' = AG = GA' = (x_1\rho(\sigma_1)x,\cdots,x_n\rho(\sigma_n)x) \]
On the other hand we have
\[ GA' = AG = A (\rho(\sigma_1)x,\cdots,\rho(\sigma_n)x) = (A\rho(\sigma_1)x,\cdots,A\rho(\sigma_n)x) \]
and it follows that
\[
(A\rho(\sigma_1)x,\cdots,A\rho(\sigma_n)x) = GA' = (x_1\rho(\sigma_1)x,\cdots,x_n\rho(\sigma_n)x)
\]
and so we get
\begin{equation}\label{eq:1}
A\rho(\sigma_i)x = x_i \rho(\sigma_i)x \text{ for } i=1,\cdots,n
\end{equation}
Let $$K:=\mathbb{Q}(\alpha_1,\cdots,\alpha_n)$$. From $$A\rho(\sigma_i) \alpha = \alpha_i \rho(\sigma_i)\alpha$$ follows
for $$i=1$$, $$\rho(\sigma_1) = 1$$ that $$A\alpha = \alpha_1 \alpha$$. Application of $$\tau \in Gal(K/\mathbb{Q})$$
at the coefficients of the last equation gives us with $$\tau(A) = A$$ (since $$A$$ has rational coefficients)
that
\begin{equation}\label{eq:2}
A\tau(\alpha) = \tau(\alpha_1)\tau(\alpha) = \alpha_{\tau} \tau(\alpha)
\end{equation}
where we have set $$\alpha_{\tau} = \tau(\alpha_1)$$. This means that $$\tau(\alpha)$$ is an eigenvector of $$A$$ to the
eigenvalue $$\alpha_{\tau}$$. But because of equation $$(1)$$ we have:
\begin{equation}\label{eq:3}
A\rho(\sigma_{\tau}) \alpha = \alpha_{\tau} \rho(\sigma_{\tau}) \alpha
\end{equation}
Since the eigenvectors $$\rho(\sigma_1)\alpha,\cdots,\rho(\sigma_n)\alpha$$ of $$A$$ build a basis of $$K^n$$, it means
that every eigenspace has dimension $$1$$. Especially there exists $$a_{\tau} \in K$$ such that
\begin{equation}\label{eq:4}
\tau(\alpha) = a_{\tau} \rho(\sigma_{\tau}) \alpha
\end{equation}
If we build the polynomials with roots from the coefficients of the left and right side of $$(4)$$, then they must
be because of $$(4)$$ be equal:
\begin{equation}\label{eq:5}
\chi_A(t) = \prod_{i=1}^n (t -\alpha_i) =^{(4)} \prod_{j=1}^n (t - a_{\tau} \alpha_j)
\end{equation}
This means that the coefficients of both polynomials must be equal, especially we get
\begin{equation}\label{eq:6}
\sum_{i=1}^n \alpha_i = a_{\tau} \sum_{i=1}^n \alpha_i
\end{equation}
By a theorem of Frobenius concerning the factorization of the group determinant, we have
\begin{equation}\label{eq:7}
det( G(x_1,\cdots,x_n)) = (x_1+\cdots+x_n) p(x_1,\cdots,x_n)
\end{equation}
for some polynomial $$p$$. From $$(7)$$ follows by assumption
\[
0 \neq det(G(\alpha_1,\cdots,\alpha_n)) = (\alpha_1+\cdots+\alpha_n) p(\alpha_1,\cdots,\alpha_n)
\]
which means that $$0 \neq \alpha_1+\cdots+\alpha_n$$ and because of $$(6)$$ we get
\begin{equation}\label{eq:8}
a_{\tau} = 1
\end{equation}
From $$(4)$$ and $$(8)$$ it follows that
\begin{equation}\label{eq:9}
P_{\tau} \alpha = \tau(\alpha) = \rho(\sigma_{\tau}) \alpha
\end{equation}
where $$P_{\tau}$$ is the permutation matrix which corresponds to $$\tau$$. Since
\[
\chi_A(t) = \prod_{i=1}^n t-\alpha_i
\]
is separable, it follows that $$\alpha_k \neq \alpha_l$$ for $$k \neq l$$.
This means that the two permutation matrices $$P_{\tau}$$ and $$\rho(\sigma_{\tau})$$ from $$(9)$$ must be equal,
that is
\begin{equation}\label{eq:10}
P_{\tau} = \rho(\sigma_{\tau})
\end{equation}
From this we get a mapping:
\[
\phi: Gal(K/\mathbb{Q}) \rightarrow \mathbb{G}, \tau \mapsto \sigma_{\tau}
\]
$$\phi$$ is injective:
Let $$\sigma_{\tau} = \phi(\tau) = \phi(\tau') = \sigma_{\tau'}$$. Then we get
\[
P_{\tau} =^{(10)} \rho(\sigma_{\tau}) = \rho(\sigma_{\tau'}) = ^{(10)} P_{\tau'}
\]
and since the regular representation $$P$$ is faithful, it follows that $$\tau = \tau'$$.
$$\phi$$ is surjective:
Let $$m = |Gal(K/\mathbb{Q})|$$, $$n=|\mathbb{G}|=deg(\chi_A(t))$$.
Since $$\chi_A(t)$$ is irreducible, $$deg(\chi_A(t)) = n$$ is a divisor of $$m = |Gal(K/\mathbb{Q})|$$ and
it follows that $$n \le m$$. Since $$\phi$$ is injective it follows that $$m \le n$$, which means that $$m=n$$.
Since the mapping $$\phi$$ is injective on two finite sets of equal size, this means that $$\phi$$ must also
be surjective.
$$\phi$$ is an antihomomorphism of groups:
\[ \tau(\alpha) = \rho(\sigma_m)\alpha \rightarrow \phi(\tau) = \sigma_m\]
\[ \tau'(\alpha) = \rho(\sigma_k)\alpha \rightarrow \phi(\tau') = \sigma_k\]
\[ (\tau \tau')(\alpha) = \rho(\sigma_l)\alpha \rightarrow \phi(\tau \tau') = \sigma_l\]
From this it follows that:
\[
\rho(\sigma_l)\alpha = (\tau \tau')(\alpha) = \tau(\tau'(\alpha)) = \tau(\rho(\sigma_k) \alpha) = \rho(\sigma_k) \tau(\alpha) = \rho(\sigma_k) \rho(\sigma_m)\alpha
\]
Since $$\alpha_i$$ are separable, it follows that the permutation matrices $$\rho(\sigma_l)$$ and $$\rho(\sigma_k \sigma_m)$$ must be equal:
\[
\rho(\sigma_l) = \rho(\sigma_k \sigma_m)
\]
and from this, since the regular representation is faithful, it follows that $$\sigma_l = \sigma_k \sigma_m$$.
From this we get:
\[
\phi(\tau \tau') = \sigma_l = \sigma_k \sigma_m = \phi(\tau') \phi(\tau)
\]
Since for two finite groups which are antiisomorph it follows that they are isomorph (if $$\phi(xy)= \phi(y)\phi(x)$$ is an antihomomorphism, then $$\Phi(x):=\phi(x)^{-1}$$ is a homomorphism),
we conclude that
\[
Gal(K/\mathbb{Q}) = \mathbb{G}
\]
Definition:
Let $$\mathbb{G}$$ be a finite group with $$n$$ elements. Let $$\mathbb{Q}[x_1,\cdots,x_n]^{\mathbb{G}} = \mathbb{Q}[g_1,\cdots,g_m]$$. Then there exist polynomials $$s_j \in \mathbb{Q}[y_1,\cdots,y_m]$$ for $$j=1,\cdots,n$$ such that
$$e_j(x_1,\cdots,x_n) = s_j(g_1(x_1,\cdots,x_n), \cdots, g_m(x_1,\cdots,x_n))$$ for $$j=1,\cdots,n$$ where $$e_j$$ is the $$j$$-elementary symmetric polynomial in $$x_k$$. Consider the polynomial
$$p(t,y_1,\cdots,y_m) = t^n - s_1(y_1,\cdots,y_m) t^{n-1}+\cdots+(-1)^n s_n(y_1,\cdots,y_m)$$ which is a polynomial in $$\mathbb{Q}[t,y_1,\cdots,y_m]$$. We call $$p$$ the corresponding polynomial of the group $$\mathbb{G}$$ to the invariants
$$g_1,\cdots,g_m$$. Let $$I=\{h \in \mathbb{Q}[y_1,\cdots,y_m] | h(g_1,\cdots,g_m) = 0 \}$$. Define the variety $$V_{\mathbb{G}} = \{ a \in \mathbb{Q}^m | h(a) = 0 \forall h \in I \}$$. We say that a polynomial is Hilbert irreducible over some variety,
if it is irreducible and there exists a point in the variety such that the polynomial remains irreducible after specializing to the point of the variety and that other polynomials are not zero on this point.
Lemma 1:
The corresponding polynomial is irreducible in $$\mathbb{Q}[t,y_1,\cdots,y_m]$$.
Proof:
If the polynomial factored in a non-trivial way, then because of the $$t^n$$ term the factors must have degree less than $$n$$ in $$t$$ (consider the factorization in $$R=\mathbb{Q}(y_1,y_2,\ldots,y_n)[t]$$; note also that we can assume that the factors over $$R$$ are monic polynomials in $$t$$). Now specialise via $$y_i\mapsto g_i$$ and we get a non-trivial factorization of the specialised polynomial in $$\mathbb{Q}[g_1,\ldots,g_m][t]$$ and hence in $$\mathbb{Q}[x_1,\ldots,x_n][t]$$ and so in $$\mathbb{Q}(x_1,\ldots,x_n)[t]$$. But we know the complete factorization in this ring, it's just $$\prod(t-x_i)$$, so our given factorization must specialise into factors of the form $$\prod_{i\in I}(t-x_i)$$ for some subsets $$I$$ of $$\{1,2,\ldots,n\}$$ (with each $$I$$ not empty or the whole thing), and the constant term of each factor must be in $$\mathbb{Q}[g_1,\ldots,g_m]$$ and hence $$G$$-invariant. This is a contradiction.
Lemma 2:
Let $$f_1,\cdots,f_r \in \mathbb{Q}(x_1,\cdots,x_n)^{\mathbb{G}}$$ and suppose that the corresponding polynomial of $$\mathbb{G}$$ is Hilbert irreducible over the variety $$V_G$$.
Then there exists $$\alpha = (\alpha_1,\cdots,\alpha_n) \in \mathbb{C}^n$$ such that:
1) $$f_1(\alpha),\cdots,f_r(\alpha)$$ are rational numbers.
2) $$\prod_{i=1}^n t - \alpha_i$$ is irreducible in $$\mathbb{Q}[t]$$.
Proof:
We can write
\[
f_i = \frac{P_i(g_1,\cdots,g_m)}{Q_i(g_1,\cdots,g_m)} \text{ for } P_i,Q_i \in \mathbb{Q}[y_1,\cdots,y_m]
\]
The corresponding polynomial $$p(t,y_1,\cdots,y_m)$$ is irreducible in $$\mathbb{Q}[t,y_1,\cdots,y_m]$$ by Lemma 1. By assumption, we can find $$(a_1,\cdots,a_m)$$ in $$V_{\mathbb{G}}$$ such that:
1) $$p(t,a_1,\cdots,a_m)$$ is irreducible in $$\mathbb{Q}[t]$$.
2) $$Q_i(a_1,\cdots,a_m) \neq 0$$ for $$i=1,\cdots,r$$.
By application of the Hilbert Nullstellensatz we can find $$\alpha =( \alpha_1,\cdots,\alpha_n) \in \mathbb{C}^n$$ such that $$g_i(\alpha) = a_i$$ for $$i=1,\cdots,m$$.
Then
\[
e_j(\alpha) = s_j(g_1(\alpha),\cdots,g_m(\alpha)) = s_j(a_1,\cdots, a_m)
\]
and the polynomial $$p(t,a_1,\cdots,a_m)$$ factors over $$\mathbb{C}$$ as
$$p(t,a_1,\cdots,a_m) = \prod_{i=1}^n t - \alpha_i $$.
Then we get
\[
f_i(\alpha) = \frac{P_i(g_1(\alpha),\cdots,g_m(\alpha))}{Q_i(g_1(\alpha),\cdots,g_m(\alpha))} = \frac{P_i(a_1,\cdots,a_m)}{Q_i(a_1,\cdots,a_m)}
\]
is a rational number, which was to be shown.
Lemma 3:
Let the notation be as in Theorem 1. Then the entries $$a_{ij}$$ of the matrix $$A$$ are elements of $$\mathbb{Q}(x_1,\cdots,x_n)^{\mathbb{G}}$$.
Proof:
We have $$A(x) = G(x) A'(x) G(x)^{-1}$$ and want to show that $$A(\rho(\sigma)x) = A(x)$$ for all $$\sigma \in \mathbb{G}$$.
For this, it is sufficient to show that
\[
A'(\rho(\sigma)x) = \rho(\sigma)A'(x)\rho(\sigma)^{-1}
\]
and
\[
G(\rho(\sigma)x) = G(x) \rho(\sigma)^{-1}
\]
as it follows that
\[
A(\rho(\sigma) x) = G(\rho(\sigma)x) A'(\rho(\sigma) x ) G(\rho(\sigma) x) ^{-1}
\]
\[
= G(x) \rho(\sigma)^{-1} \rho(\sigma) A'(x) \rho(\sigma)^{-1} \rho(\sigma) G(x)^{-1}
\]
\[
= G(x) A'(x) G(x)^{-1} = A(x)
\]
Corollary 1:
Let $$\mathbb{G}$$ be a finite group such that the correspoding polynomial is Hilbert irreducible over $$V_{\mathbb{G}}$$.
Then $$\mathbb{G}$$ is realizable over $$\mathbb{Q}$$ as a Galois group.
Proof:
Consider the regular representation of $$\mathbb{G}$$ and let $$A = (a_{ij})$$ be defined as in Theorem 1.
Then by Lemma 3 we have $$a_{ij} \in \mathbb{Q}(x_1,\cdots,x_n)^{\mathbb{G}}$$. Hence
by Lemma 2 there exists $$\alpha \in \mathbb{C}^n$$ such that
1) $$a_{ij}(\alpha)$$ are rational numbers.
2) The characteristic polynomial $$\chi_A(t)=\prod_{i=1}^n t-\alpha_i$$ is irreducible in $$\mathbb{Q}[t]$$.
Since $$a_{ij}(\alpha)$$ are rational numbers, the matrix $$A(\alpha) = G(\alpha) A'(\alpha) G(\alpha)^{-1}$$
is defined for $$\alpha$$, hence $$det(G(\alpha)) \neq 0$$.
By Theorem 1 therefore it follows that $$Gal(\chi_A(t) / \mathbb{Q}) = \mathbb{G}$$.
Theorem 2:
The following are equivalent:
1) Hilbert Irreducibility can be applied to the corresponding polynomial $$p(t,y_1,\cdots,y_m)$$ over the variety $$V_\mathbb{G}$$
2) Every finite group $$\mathbb{G}$$ is Galois over $$\mathbb{Q}$$.
Proof:
1) $$\rightarrow$$ 2): Corollary 1.
2) $$\rightarrow$$ 1)
Let $$\mathbb{G} = Gal(K/\mathbb{Q}) = \{ \sigma_1,\cdots,\sigma_n\}$$ be a Galois group. By the normal
basis theorem there exists a $$\theta \in K$$ such that $$\sigma_1(\theta),\cdots,\sigma_n(\theta)$$ build a
basis of $$K$$ over $$\mathbb{Q}$$. Let $$\mathbb{Q}[x_1,\cdots,x_n]^\mathbb{G} = \mathbb{Q}[g_1,\cdots,g_m]$$ where
$$\mathbb{G}$$ acts through the regular representation. Let $$I=\{h \in \mathbb{Q}[y_1,\cdots,y_m] | h(g_1,\cdots,g_m)=0\}$$
and $$V_\mathbb{G} = \{ a \in \mathbb{Q}^m | h(a) = 0 \forall h \in I \}$$.
Let now $$Q_1,\cdots,Q_r \in \mathbb{Q}[y_1,\cdots,y_m]$$ be given, such that $$Q_i \notin I$$ for $$i=1,\cdots,r$$. Let $$Q:=Q_1\cdots Q_r$$.
Since $$I$$ is a prime ideal and $$Q_i \notin I$$, we must have $$Q \notin I$$, which means that there exists $$b \in \mathbb{Q}^n$$ such that
\[ Q(g_1(b),\cdots,g_m(b)) \neq 0 \]
that is $$\hat{f}(x_1,\cdots,x_n) = Q(g_1(x_1,\cdots,x_n),\cdots,g_m(x_1,\cdots,x_n))$$ is a reduced polynomial ( since $$\mathbb{Q}$$ has infinite elements)
(see Algebra, Lang, p. 177, p. 312).
Since the $$\sigma_i$$ by Artin (p. 311, Th. 12.2 Algebra, Lang) are algebraically independent and $$\hat{f}$$ is reduced,
there exists $$\alpha \in K$$, such that
\[
0 \neq Q(g_1(\sigma_1(\alpha),\cdots,\sigma_n(\alpha)),\cdots,g_m(\sigma_1(\alpha),\cdots,\sigma_n(\alpha)))
\]
Let $$\alpha = \sum_{i=1}^n{\alpha_i \sigma_i(\theta)}$$ with $$\alpha_i \in \mathbb{Q}$$
Then we have for $$i=1,\cdots,m$$ that $$g_i(\sigma_1(\alpha),\cdots,\sigma_n(\alpha)) = p_i(\alpha_1,\cdots,\alpha_n)$$ is a rational number,
where $$p_i$$ is some polynomial in $$\mathbb{Q}[x_1,\cdots,x_n]$$.
But then also
\[ 0 \neq Q(g_1(\sigma_1(\alpha),\cdots,\sigma_n(\alpha)),\cdots,g_m(\sigma_1(\alpha),\cdots,\sigma_n(\alpha)))
\]
\[
= Q(p_1(\alpha_1,\cdots,\alpha_n),\cdots, p_m(\alpha_1,\cdots,\alpha_n))
\]
\[
= q(\alpha_1,\cdots,\alpha_n)
\]
for some polynomial $$q \in \mathbb{Q}[x_1,\cdots,x_n]$$. We can choose the $$\alpha_i$$ such that $$\alpha_i \neq \alpha_j$$ for $$i \neq j$$,
otherwise replace $$\hat{\alpha}_j = \alpha_i + u$$ for some rational number $$u \neq 0$$ and we get the polynomial in u:
\[
q(u) := q(\alpha_1,\cdots,\alpha_{j-1}, \alpha_i + u , \alpha_{j+1}, \cdots, \alpha_n)
\]
Since $$q(0) \neq 0$$, $$q$$ is not the zero polynomial in $$\mathbb{Q}[u]$$. By choosing $$u \neq 0$$ as a rational number to avoid the
finitely many roots of $$q$$, we get $$q(u) \neq 0$$ and $$\hat{\alpha}_j = \alpha_i + u \neq \alpha_i$$.
But then also $$\sigma_i(\alpha) \neq \sigma_j(\alpha)$$ for $$i \neq j$$, since $$\alpha$$ has pairwise distinct coordinates $$\alpha_k \neq \alpha_l$$ to
the given basis $$\sigma_1(\theta),\cdots,\sigma_n(\theta)$$. The rational and separable polynomial
\[
\prod_{i=1}^n (x -\sigma_i(\alpha))
\]
is irreducible in $$\mathbb{Q}[x]$$, since the Galois group operates transitively on the roots.
Let $$a_i := g_i(\sigma_1(\alpha),\cdots,\sigma_n(\alpha))$$ for $$i=1,\cdots,m$$.
Then we have $$a:=(a_1,\cdots,a_m)$$ is an element of $$V_\mathbb{G}$$ and also
\[
p(t,a_1,\cdots,a_m) = \sum_{i=0}^n (-1)^i s_i(a_1,\cdots,a_m) t^{n-i}
\]
\[
= \sum_{i=0}^n (-1)^i s_i(g_1(\sigma_1(\alpha),\cdots,\sigma_n(\alpha)),\cdots,g_m(\sigma_1(\alpha),\cdots,\sigma_n(\alpha))) t^{n-i}
\]
\[
= \sum_{i=0}^n (-1)^i e_i(\sigma_1(\alpha),\cdots,\sigma_n(\alpha)) t^{n-i}
\]
\[
= \prod_{i=1}^n (t -\sigma_i(\alpha))
\]
is an irreducible polynomial in $$\mathbb{Q}[t]$$. Furthermore we have
$$Q(a_1,\cdots,a_m) \neq 0$$, which means that $$Q_j(a_1,\cdots,a_m) \neq 0$$ for all $$j=1,\cdots,r$$, which was to be shown.
This note can be downloaded from here (A note in Galois Theory, Orges Leka) as a pdf file.
Abonnieren
Posts (Atom)