There is plenty of room in hyperspace
How much space? What does “amount of space” even mean? In our three-dimensional world, in what sense does a square meter of grass contain more room than one meter of string, or one cubic meter of water more than the patch of grass? In their respective dimension, they are all just “one something”. Yet, a cube is clearly “more” than a square in the intuitive sense. How to approach this? To compare something two-dimensional against a one-dimensional entity, one can look at the boundary of the two-dimensional region. Similarly, to compare a tree-dimensional object against a two-dimensional, we can look at the surface area. You can arrange 4 lines along the boundary of the square, and 6 squares along the surface of the cube. Now let us extrapolate this argument using combinatorics. Consider the number of $n$-facets you can arrange on a $n$-cube. An $n$-facet is defined as a subset of a hyperplane in $n$-dimension, that is, it is an entity of $n-1$ dimensions (note that some might define this is an $(n-1)$-facet). To select one side of the $n$-cube, you choose $n-1$ of the axes and form an $n$-facet from it. The number of such choices grows by ${n \choose n-1} = n$. This forms the surface of half the $n$-cube, so the total surface can be covered with $2n$ facets. This means that, in some sense, the “accumulated surface area” as you construct an object in each dimension from facets in one dimensions below is
$$2 (2 \times 2) (2 \times 4) … (2 \times n) = 2^n n!$$
which is a lot of space!
A more general question is: how many $k$-facets can we fit onto the surface of the $n$-cube? A quick and dirty heuristic is just the “size of n-space divided the size by k-space”
$$ \frac{2^nn!}{2^kk!} $$
However, that double counts $k$-facets (think overlapping square boundaries on the cube). The proper equation is not hard to derive though. Given a $k$-facet, we can translate it along an $l$-facet picked from the orthogonal $(n-k)$-subspace, for any $0 \leq l \leq n - k$ (you can imagine the lines orthogonal to each square on the cube for example). Counting the number of ways to select a $k$-facet in $n$ dimensions times the number of translations (that is, way to pick $l$-facets in $n-k$ dimensions), we get
$$ \sum_{l=0}^{n - k} {n - k \choose l}{n \choose k} = 2^{n-k}{n \choose k} = \frac{2^n n!}{2^k k! (n-k)!} $$
So the number of double counts in the heuristic equation grows by $(n-k)!$ This gives us a direct way to compare the “amount of space” of two entities of different dimensionality. We can check the equation in physical dimensions for lines on a square, squares on a cube and lines on a cube respectively
what | n | k | count |
lines on square | $2$ | $1$ | $2 {2 \choose 1} = 2 \times 2 = 4$ |
squares on cube | $3$ | $2$ | $2 {3 \choose 2} = 2 \times 3 = 6$ |
lines on cube | $3$ | $1$ | $2^2 {3 \choose 1} = 4 \times 3 = 12$ |
which is indeed correct.
This article explores the topic of surface areas in higher dimensions, partly to gain some intuition for it, but also to derive some practical heuristics for developing “hyper-dimensional software”. When going into higher dimensions one has to thread carefully, intuition is quickly lost as analogies in lower dimensions tend to break (there are plenty of examples that demonstrates geometrical paradoxes for cubes and spheres). There is no red thread here really, just a bunch of thoughts that has come to mind and I decided to formalize them a bit.
Contents
Hypercube geostatistics
Given two random vectors $x$ and $y$ on the hypercube $H_n = \{-1,+1\}^n$, what is the distribution of their inner product $x^T y$? Since the components of the vectors are drawn uniformly at random, we can easily compute this distribution using combinatorics. To begin with, the vectors are discrete and the inner product can only have certain values. To simplify the analysis, assume $n$ is even, then
$$ x^T y \in \{ -n, -n + 2, …, -2, 0, +2, …, n - 2, n \} $$
For $x$ fixed (but chosen at random), define the set $Y_m = \{ x^T y = m \}$. It is obvious that $Y_{-n} = \{ -x \}$ and $Y_n = \{ x \}$. Any inner product value $n-2k$ is obtained by “flipping” $k$ number of bits, which can be chosen arbitrarily. Hence the size of $Y_{n-2k}$ is
$$ |Y_{n-2k}| = {n \choose k} $$
Since the bits are drawn uniformly at random, the “bit-flip distribution” is the binomial distribution $B(n,1/2)$. For increasing dimension $n$, this distribution have an increasingly narrow maxima at $k = n/2$ (the standard deviation grows by $\sqrt{n}/2$). The conclusion from this is that in higher dimensions, most vectors are orthogonal against each other! This insight is taken advantage of in the field of hyperdimensional computing, a.k.a. vector symbolic architectures (VSA). 1 VSA define an algebra based on the insight of almost guaranteed orthogonality in the probabilistic setting. The algebra formalize storage and retrieval of information in associative memories, something that has recently sparked renewed interest in certain fields of research and industry.
Hypersphere geometry
It is a well known results that in higher dimensions, directions play a bigger role than position, and that is why analyzing the hypersphere is of big interest. A couple of years ago I worked on building a vector database for matching biometric markers. Biomarkers are usually represented as unit vector embedding in $n$ dimensions, that is, they belong to the $n$-sphere $S_n$, defined as
$$ S_n = \{ x \in \mathbb{R}^n ; ||x|| = 1 \} $$
Note that some define the $n$-sphere as the $n$-dimensional manifold embedded in $n+1$ ambient space. However, I find it more intuitive to talk about $n$-balls and $n$-spheres as the volume and surface of some entity in $n$-dimensions, even though the surface is an $(n-1)$-dimensional manifold. Distance between two points is given by the geodesic on the hypersphere. Note that this distance is an angle, and the maximum distance is $\pi$ rad $= 180^\circ$. When making a query against the database, several questions come to mind. What is the minimum distance between any two points? When can a match be considered close to the query? These questions are interesting to answer in order to gain intuition to these constructs, and as a practical tool to set thresholds in real matching algorithms.
Optimal point allocation
To answer the question regarding minimum distance between any two points, assume $k$ points are placed optimally on the sphere. We define optimal allocation of a set $Q$ points on a closed manifold $M$ equipped with a metric $d$ as follows: Assume a point $x \in Q$ is picked uniformly at random, and let $y$ be the nearest point to $x$. The allocation $Q$ is optimal if it maximizes the expected minimum distance
$$ Q^* = \max_Q E_{x}[d(x,\arg\min_{y \neq x} d(x,y))] $$
Finding such an allocation is not just hard, it is not even unique if the manifold $M$ have symmetries. For example, allocations on the hypersphere can be rotated around any hyperplane while maintaining optimality.
Ball-parking with the hypercube
With the definition of optimal point allocation in mind, let us continue by constructing one (not necessarily optimal) allocation and calculate some properties to get a better understanding of the problem. Create a base set $A_n$ as the set of unit norm axis aligned vectors defined as
$$ \newcommand\rad[1]{#1 \text{ rad}} \newcommand\deg[1]{#1^\circ} \newcommand\setof[1]{ \{ #1 \} } A_n = \setof{(\pm 1, 0_{n-1}), (0, \pm 1, 0_{n-2}), …, (0_{n-1}, \pm 1} $$
such that $|A_n| = 2n$. This means that in $n$ dimensions we can always pick $2n$ points with $\deg{90}$ separation. Given the base set, we can construct a derived set $Q_{nm}$ as the set created by interpolating the points of all “non-canceling subsets” of $A_n$ of size $m$. A subset is non-canceling if no points sum to 0 within it. In other words, subsets a constructed by picking one axis and sign at a time and compensate for double counts. Hence
$$ |Q_{nm}| = \frac{2n \times 2(n-1) \times … \times 2(n-m+1)}{m!} = \frac{2^mn!}{m!(n-m)!} = 2^m {n \choose m} $$
Let us confirm the formulate for some values of $n$ and $m$
$n$ | $m$ | $Q_{nm}$ |
$2$ | $1$ | $2^1 {2 \choose 1} = 4 $ (axes) |
$2$ | $2$ | $2^2 {2 \choose 2} = 4 $ (quadrants) |
$3$ | $1$ | $2^1 {3 \choose 1} = 2 \times 3 = 6 $ (axes) |
$3$ | $2$ | $2^2 {3 \choose 2} = 4 \times 3 = 12 $ (plane aligned quadrants) |
$3$ | $3$ | $2^3 {3 \choose 3} = 8 \times 1 = 8 $ (cube octants) |
It looks correct for the physical dimensions which is easy to think about at least!
Interpolation is done by addition followed by renormalization. What is the minimum distance between any two points in $Q_{nm}$? A point $q \in Q_{nm}$ have $m$ non-zero coordinates. For any two points $p \neq q$, they have at most $m-1$ coordinates in common, and the last with reversed sign. Hence, the maximal inner product is $(p,q)_{\text{max}} = \frac{(m-2)}{m}$ and the minimum distance is $d_\text{min} = \cos^{-1}(\frac{m-2}{m})$. Tabulated for some values of $m$ we get
m | deg |
2 | 90 |
3 | 71 |
4 | 60 |
5 | 53 |
6 | 48 |
7 | 44 |
8 | 41 |
9 | 39 |
10 | 37 |
… | … |
100 | 11 |
1000 | 4 |
10000 | 1 |
Note that the numbers are independent of the dimension n, and that numbers falls of slower and slower towards zero due to the slowly converging series $\frac{m-2}{m}$.
We now define the set $P_{n} = \bigcup_{m=1}^{n} Q_{nm}$ as the union of all derived sets, starting with $Q_{n,1} = A_n$ up to $Q_{nn} = H_n$ where $H_n$ is the discrete hypercube projected onto the hypersphere. What is the minimum distance between any two points $p \neq q \in P_{nm}$? Given two derived sets $Q_{na}$ and $Q_{nb}$, where $a < b$, for any $p \in Q_{na}$, one point $q \in Q_{nb}$ closest to $p$ is has non-zero coordinates where $p$ is non-zero. Note that the components of $p_i = \pm \frac{1}{\sqrt{a}}$ and $q_i = \pm \frac{1}{\sqrt{b}}$, so matching all $a$ non-zero components of $p$ gives $(p,q)_{\text{max}} = \frac{a}{\sqrt{a}\sqrt{b}} = \sqrt\frac{a}{b}$. For $a = 1$ and $b=2$ we have $d_\text{min} = \cos^{-1}\sqrt\frac{1}{2} = \deg{45}$ as expected (in two dimensions this is the minimum angle between axis aligned and hypercube vectors)!
Note that with this set construction, the distance between two consecutive derived sets are less than within a single derived set. The series $\sqrt{a/b}$ is monotonically increasing, hence the smallest distance withing $P_n$ is $d_\min = \cos^{-1}\sqrt\frac{n-1}{n}$. Tabulated for the first dimensions we get
n | deg |
2 | 45 |
3 | 35 |
4 | 30 |
5 | 27 |
6 | 24 |
7 | 22 |
8 | 21 |
9 | 19 |
10 | 18 |
… | … |
100 | 5 |
1000 | 2 |
10000 | 0.5 |
This gives an explicit construction with computable point allocation properties.
Small angle approximation on the tangent plane
Two answer the theoretical question regarding closeness of a query point to a nearest match in the general case, let us transist from combinatorics and hypercube approximation to hyperspherical geometry. The volume of the $n$-ball with radius $r$ is
$$ V_n(r) = \frac{\pi^{n/2}}{\Gamma(1 + n/2)} r^n $$
The area of the $n$-sphere with radius $r$ is
$$ S_n(r) = \frac{2 \pi^{n/2}}{\Gamma(n/2)} r^{n-1} $$
Around any point on the $n$-sphere we can make a small angle approximation such that the geodesic distance to another point is approximately the same as the radius of a $(n-1)$-ball on the tangent plane. In three dimensions, this is equivalent of calculating the area of a circle centered around a point on the sphere. Hence, with this approximation we can set up the equation
$$\begin{eqnarray} V_{n-1}(r) &=& \frac{S_n(1)}{k} \newline \frac{\pi^{(n-1)/2} r^{n-1}}{\Gamma(n/2)} &=& \frac{2 \pi^{n/2}}{k \Gamma(n/2)} \newline r^{n-1} &=& \frac{\sqrt{4\pi}}{k} \newline r &=& (\sqrt{4\pi}/k)^{1/(n-1)} = (\alpha / k)^{1/(n-1)} \approx (\alpha / k)^{1/n} \end{eqnarray}$$
For $n = 3$ we have $r = \sqrt{\alpha / k}$. Let us say we place 10 points on the sphere, that gives $r \approx 34^\circ$. For 100 points it gives $r \approx 11^\circ$. Is that a reasonable approximation to the average distance between any two points? Looking back at the explicitly constructed set $P_n$, it has size $|P_n| = 26$, and the minimum distance within it is computed as $\cos^{-1}\sqrt{25/26} \approx \deg{11}$. The number computed with the tangent plane approximation is $\approx \deg{21}$. This does not tell us that much since we neither know if $P_n$ is optimal nor do we know how accurate the tangent plane approximation is. Let us tabulate for varying number of points $k$ in 3 dimensions just to see how the numbers behave
k | deg |
2 | 74 |
3 | 62 |
4 | 54 |
5 | 48 |
6 | 44 |
7 | 41 |
8 | 38 |
9 | 36 |
10 | 34 |
… | … |
100 | 11 |
1000 | 5 |
10000 | 1 |
which looks very similar to previous tabulations (though those are for increasing dimension and not number of points!).
Failing to provide an answer to the accuracy of the tangent plane approximation, let us at least derive some bounds on it. For increasing number of dimensions $n$, with $k$ fixed the formula converges to $1$ radians ($\approx 57^\circ$), and becomes increasingly insensitive to the number of points. Obviously, for any $k \leq 2n$, we can always place the points along each axis and hence have a separation of at least 90 degrees. This is not a surprise since we rely on a small angle approximation, which is invalid for angles that large. Hence, the approximation is only reasonable to use $k$ greater than $2n$, and it becomes increasingly better for higher $k$. Can we parameterize $k$ as a function of $n$, such that the small angle approximation always holds? Let us try $k = n^n$
$$ r \approx (\alpha/n^n)^{1/n} = \frac{\alpha^{1/n}}{n^{n/n}} \rightarrow \frac{1}{n} $$
Now that is good news! As we increase the number of dimensions, we can fit “hyper-exponentially” many more points on the hypersphere while only shrinking the average distance inversely proportional to the dimension!
Just to illustrate just how quickly this grows
$$\begin{eqnarray} n = 1 &\rightarrow& k = 1 \newline n = 2 &\rightarrow& k = 4 \newline n = 3 &\rightarrow& k = 27 \newline n = 4 &\rightarrow& k = 256 \newline n = 5 &\rightarrow& k = 3125 \newline n = 6 &\rightarrow& k = 46656 \newline \end{eqnarray}$$
However, if we parameterize it as $k = n^{\log n}$ we get
$$ r \approx (\alpha/n^{\log n})^{1/n} = \frac{\alpha^{1/n}}{n^{\log n/n}} \rightarrow 1 $$
which indicate that the approximation fails. Hence $\log^2 n < \log k < n \log n$ is a bound on the validity of the approximation.
We can also reverse the question and ask ourselves “how many points $k$ do can I fit on a sphere in $n$ dimensions with at least $r$ degree separation”? Answer is approximately
$$ k = \frac{\alpha}{r^n} $$
Which also grows exponentially with the number of dimensions for $r < 1$ rad $\approx 57^\circ$ (but is probably only valid for even less separation due to the small angle approximation).
Solid angle of $n$-cone
Let us tackle the problem without resorting to approximations at all. The area of the $n$-cone can be derived by integration of the area of the $n-1$-sphere over a great arc
$$ C_n = \int_0^\theta S_{n-1}(\sin(\theta)) d\theta = S_{n-1} I_{n-2}(\theta) $$
The integral $I_n(\theta)$ can be solved recursively through integration by parts for $n \geq 2$
$$\begin{eqnarray} I_n(\theta) &=& \int_0^\theta \sin^n(\theta) d\theta \newline &=& \Big[ - \sin^{n-1}(\theta) \cos(\theta) \Big]_0^\theta + \int_0^\theta (n-1) \sin^{n-2}(\theta) \cos^2(\theta) d\theta \newline &=& - \sin^{n-1}(\theta) \cos(\theta) + \int_0^\theta (n-1) \sin^{n-2}(\theta) (1 - \sin^2(\theta)) d\theta \newline &=& - \sin^{n-1}(\theta) \cos(\theta) + (n-1) I_{n-2}(\theta) - (n-1) I_n(\theta) \end{eqnarray}$$
Rearranging gives
$$ I_n(\theta) = \frac{(n-1) I_{n-2}(\theta) - \sin^{n-1}(\theta)\cos(\theta)}{n} $$
Tabulating for the some values of $n$, with $n=0$ and $n=1$ being trivial to do by hand a
$n$ | $I_n(\theta)$ |
0 | $\theta$ |
1 | $1 - \cos(\theta)$ |
2 | $(\theta - \sin(\theta)\cos(\theta))/2$ |
3 | $(2 (1 - \cos(\theta)) - \sin^2(\theta)\cos(\theta))/3 = (2 - 3\cos(\theta) + \cos^3(\theta))/3$ |
where $\sin^2 = 1 - \cos^2$ have been used for $n=3$. Naturally, in two dimensions, the cone surface area is proportional to the angle, that is, the arc length along the circle. In three dimensions, the two-dimensional surface drawn out by the cone have an area proportional to a $\cos$, aligned with the intuition that “you get more solid angle per radial angle”, reaching a peak at $\deg{90}$. Let us verify the cone equations by evaluating at $\theta=\pi$, which should yield the sphere area $S_n = C_n(\pi)$.
$n$ | $S_{n-1}$ | $I_{n-2}$ | $C_n$ |
2 | $2$ | $\pi$ | $2\pi$ |
3 | $2\pi$ | $2$ | $4\pi$ |
4 | $4\pi$ | $\pi/2$ | $2\pi^2$ |
5 | $2\pi^2$ | $4/3$ | $8\pi^2/3$ |
Looks correct so far. If we to place $k$ points on the $n$-sphere, assuming the placement is optimal, then we have an upper bound on the average minimum distance between any two points by solving for the angle that yields the cone with the expected area
$$ d_\max \leq C_n^{-1} \Bigg( \frac{S_n}{k} \Bigg) $$
I will not try to solve this analytically, but this algorithm could be implemented on a computer to solve numerically. This formula provides a means to assess how close a point allocation is to the (unknown) optimal solution, which can be useful in practice.