Coverage for moptipyapps / order1d / __init__.py: 100%
0 statements
« prev ^ index » next coverage.py v7.13.0, created at 2025-12-11 04:40 +0000
« prev ^ index » next coverage.py v7.13.0, created at 2025-12-11 04:40 +0000
1"""
2A set of tools for ordering objects in 1 dimension.
4Let's assume that we have `n` objects and a distance metric that can compute
5the distance between two objects. We do not know and also do not care about in
6how many dimension the objects exist - we just have objects and a distance
7metric.
9Now we want to find a one-dimensional order of the objects that reflects their
10original distance-based topology. For each object `a`, we want that its
11closest neighbor in the order is also its actual closest neighbor according to
12the distance metric. It's second-closest neighbor should be the actual
13second-closest neighbor according to the distance metric. And so on.
15Since we only care about the object order and do not want to metrically map
16the distances to one dimension, we can represent the solution as permutation
17of natural numbers.
19Of course, in a one-dimensional order, each object has exactly two closest
20neighbors (the one on its left and the one on its right) unless it is situated
21either at the beginning or end of the order, in which case it has exactly one
22closest neighbor. Based on the actual distance metric, an object may have any
23number of closest neighbors, maybe only one, or maybe three equally-far away
24objects. So it is not clear whether a perfect mapping to the one-dimensional
25permutations even exists. Probably it will not, but this shall not bother us.
27Either way, we can try to find one that comes as close as possible to the real
28deal.
30Another way to describe this problem is as follows:
31Imagine that you have `n` objects and only know their mutual distances.
32You want to arrange them on a one-dimensional axis in a way that does sort of
33reflect their neighborhood structure in whatever space they are originally
34located in.
36The goal of solving this one-dimensional ordering problem is then to arrange
37`n` objects on a 1-dimensional (e.g., horizontal) axis given a distance matrix
38describing their location in a potentially high-dimensional or unstructured
39space.
40The objects should be arranged in such a way that, for each object,
42- the nearest neighbors on the 1-dimensional axis are also the nearest
43 neighbors in the original space (according to the distance matrix
44 provided),
45- the second nearest neighbors on the 1-dimensional axis are also the
46 second nearest neighbors in the original space (according to the distance
47 matrix provided),
48- the third nearest neighbors on the 1-dimensional axis are also the third
49 nearest neighbors in the original space (according to the distance matrix
50 provided),
51- and so on; with (e.g., quadratically) decreasing weights of neighbor
52 distance ranks.
54We do not care about the actual precise distances (e.g., something like
55"0.001") between the objects on either the one-dimensional nor the original
56space. We only care about the distance ranks, i.e., about "2nd nearest
57neighbor," but not "0.012 distance units away." The solutions of this problem
58are thus permutations (orders) of the objects. Of course, if we really want
59to plot the objects, such a permutation can easily be translated to
60`x`-coordinates, say, by dividing the index of an object by the number of
61objects, which nets values in `[0,1]`. But basically, we reduce the task to
62finding permutations of objects that reflect the neighbor structure of the
63original space as closely as possible.
65If such a problem is solved correctly, then the arrangement on the
66one-dimensional axis should properly reflect the arrangement of the objects in
67the original space. Of course, solving this problem exactly may not actually
68be possible, since an object on a one-dimensional axis may either have exactly
69two `i`-nearest-neighbors (if it is at least `i` slots away from either end of
70the permutation) or exactly `1` such neighbor, if it is closer that `i` units.
71The object directly at the start of the permutation has only 1 nearest
72neighbor (the object that comes next). That next object, however, has two,
73namely the first object and the third object. In the original space where the
74objects come from, however, there may be any number of "nearest neighbors."
75Imagine a two-dimensional space where one object sits at the center of a
76circle of other objects. Then all other objects are its nearest neighbors,
77whereas an object on the circle either has exactly two nearest neighbors or,
78maybe, in the odd situation that the radius equals a multiple of the
79distance to the neighbors on the circle, three. Such a structure cannot be
80represented exactly in one dimension.
82But that's OK.
83Because we mainly do this for visualization purposes anyway.
85Now here comes the cool thing: We can cast this problem to a Quadratic
86Assignment Problem (:mod:`~moptipyapps.qap`). A QAP is defined by a distance
87matrix and a flow matrix. The idea is to assign `n` facilities to locations.
88The distances between the locations are given by the distance matrix. At the
89same time, the flow matrix defines the flows between the facilities. The goal
90is to find the assignment of facilities to locations that minimizes the
91overall "flow times distance" product sum.
93We can translate the "One-Dimensional Ordering Problem" to the QAP as follows:
94Imagine that we have `n` unique objects and know the (symmetric) distances
95between them.
97Let's say that we have four numbers as objects, `1`, `2`, `3`, `4`.
99>>> data = [1, 2, 3, 4]
100>>> n = len(data)
102The distance between two numbers `a` and `b` be `abs(a - b)`.
104>>> def dist(a, b):
105... return abs(a - b)
107The **original** distance matrix would be
109>>> dm = [[dist(i, j) for j in range(n)] for i in range(n)]
110>>> print(dm)
111[[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
113Now from this matrix, we can find the nearest neighbors as follows:
115>>> from scipy.stats import rankdata # type: ignore
116>>> import numpy as np
117>>> rnks = rankdata(dm, axis=1, method="average") - 1
118>>> print(rnks)
119[[0. 1. 2. 3. ]
120 [1.5 0. 1.5 3. ]
121 [3. 1.5 0. 1.5]
122 [3. 2. 1. 0. ]]
124From the perspective of "`1`", "`2`" is the first nearest neighbor,
125"`3`" is the second-nearest neighbor, and "`4`" is the third-nearest neighbor.
126From the perspective of "`2`", both "`1`" and "`3`" are nearest neighbors, so
127they share rank `1.5`.
128And so on.
130Let's multiply this by `2` to get integers:
131[If no fractional ranks appear, then we do not need to multiply by `2`.]
133>>> rnks = np.array(rnks * 2, dtype=int)
134>>> print(rnks)
135[[0 2 4 6]
136 [3 0 3 6]
137 [6 3 0 3]
138 [6 4 2 0]]
140Now we want to translate this to a flow matrix (NOT a distance matrix).
141What we want is that there is a big flow from each object to its nearest
142neighbor, a smaller flow from each object to its second-nearest neighbor,
143a yet smaller flow to the third nearest neighbor, and so on.
144So all we need to do is to subtract the elements off the diagonal from
145`8 = 2*n`.
146[If no fractional elements were there, then we would use `n` instead of
147`2*n`.]
148Anyway, we get:
150>>> rnks = np.array([[0 if x == 0 else 2*n - x for x in a]
151... for a in rnks], int)
152>>> print(rnks)
153[[0 6 4 2]
154 [5 0 5 2]
155 [2 5 0 5]
156 [2 4 6 0]]
158We can see:
159The flow from "`1`" to "`2`" is `6`, which higher than the flow from "`1`" to
160"`3`" (namely `4`), and so on.
161The low from "`2`" to "`1`" is `5`, which is a bit lower than `6` but higher
162than `4`, because "`1`" is the "1.5th nearest neighbor of `2`".
164Just for good measures, we can weight this qudratically, so we get:
166>>> rnks = rnks ** 2
167>>> print(rnks)
168[[ 0 36 16 4]
169 [25 0 25 4]
170 [ 4 25 0 25]
171 [ 4 16 36 0]]
173This way, we give much more importance to "getting the nearest neighbors
174right" than getting "far-away neighbors" right. Nice.
176OK, but how about the distance matrix for the QAP?
177What's the distance?
178Well, it's the distance that I would get by arranging the objects in a
179specific order.
180It is the difference of the indices in the permutation:
182>>> print(np.array([[abs(i - j) for j in range(n)] for i in range(n)], int))
183[[0 1 2 3]
184 [1 0 1 2]
185 [2 1 0 1]
186 [3 2 1 0]]
188This means:
189The object that I will place at index `1` has a distance of "`1`" to the
190object at index `2`.
191It has a distance of "`2`" to the object at index `3`.
192It has a distance of "`3`" to the object at index `4`.
193The object at index `2` will have a distance of "`1`" to both the objects
194at indices `1` and `3` and a distance of "`2`" to the object at index `4`.
195And so on.
197In other words, objects that are close in their original space have a big
198flow between each other.
199If we put them at directly neighboring indexes, then we will multiply this
200big flow with a small number.
201If we put them at distant indices, then we will multiply this big flow with
202a large number.
204Objects that are far away from each other in their original space have a small
205flow between each other.
206If we put them at indices that are far apart, we will multiply the larger
207index-difference = distance with a small number.
208If we put them close, then we multiply a small distance with a small flow -
209but we will have to pay for this elsewhere, because then we may need to put
210objects with a larger flow between each other farther apart.
212>>> from moptipyapps.order1d.instance import Instance
213>>> the_instance = Instance.from_sequence_and_distance(
214... [1, 2, 3, 4], dist, 2, 100, ("bla", ), lambda i: str(i))
215>>> print(the_instance.flows)
216[[ 0 36 16 4]
217 [25 0 25 4]
218 [ 4 25 0 25]
219 [ 4 16 36 0]]
220>>> print(the_instance.distances)
221[[0 1 2 3]
222 [1 0 1 2]
223 [2 1 0 1]
224 [3 2 1 0]]
226There is one little thing that needs to be done:
227If we have many objects, then the QAP objective values can get very large.
228So it makes sense to define a "horizon" after which the relationship for
229objects are ignored.
230Above, we chose "100", which had no impact.
231If we choose "2", then only the two nearest neighbors would be considered and
232we get:
234>>> the_instance = Instance.from_sequence_and_distance(
235... [1, 2, 3, 4], dist, 2, 2, ("bla", ), lambda i: str(i))
236>>> print(the_instance.flows)
237[[ 0 16 4 0]
238 [ 9 0 9 0]
239 [ 0 9 0 9]
240 [ 0 4 16 0]]
241>>> print(the_instance.distances)
242[[0 1 2 3]
243 [1 0 1 2]
244 [2 1 0 1]
245 [3 2 1 0]]
247The distance matrix of the QAP stays the same, but the flow matrix now has
248several zeros.
249Anything farther away than the second-nearest neighbor will be ignored, i.e.,
250get a flow of `0`.
251"""