Coverage for moptipyapps/order1d/__init_

1"""

2A set of tools for ordering objects in 1 dimension.

4Let's assume that we have `n` objects and a distance metric that can compute

5the distance between two objects. We do not know and also do not care about in

6how many dimension the objects exist - we just have objects and a distance

7metric.

9Now we want to find a one-dimensional order of the objects that reflects their

10original distance-based topology. For each object `a`, we want that its

11closest neighbor in the order is also its actual closest neighbor according to

12the distance metric. It's second-closest neighbor should be the actual

13second-closest neighbor according to the distance metric. And so on.

15Since we only care about the object order and do not want to metrically map

16the distances to one dimension, we can represent the solution as permutation

17of natural numbers.

19Of course, in a one-dimensional order, each object has exactly two closest

20neighbors (the one on its left and the one on its right) unless it is situated

21either at the beginning or end of the order, in which case it has exactly one

22closest neighbor. Based on the actual distance metric, an object may have any

23number of closest neighbors, maybe only one, or maybe three equally-far away

24objects. So it is not clear whether a perfect mapping to the one-dimensional

25permutations even exists. Probably it will not, but this shall not bother us.

27Either way, we can try to find one that comes as close as possible to the real

28deal.

30Another way to describe this problem is as follows:

31Imagine that you have `n` objects and only know their mutual distances.

32You want to arrange them on a one-dimensional axis in a way that does sort of

33reflect their neighborhood structure in whatever space they are originally

34located in.

36The goal of solving this one-dimensional ordering problem is then to arrange

37`n` objects on a 1-dimensional (e.g., horizontal) axis given a distance matrix

38describing their location in a potentially high-dimensional or unstructured

39space.

40The objects should be arranged in such a way that, for each object,

42- the nearest neighbors on the 1-dimensional axis are also the nearest

43 neighbors in the original space (according to the distance matrix

44 provided),

45- the second nearest neighbors on the 1-dimensional axis are also the

46 second nearest neighbors in the original space (according to the distance

47 matrix provided),

48- the third nearest neighbors on the 1-dimensional axis are also the third

49 nearest neighbors in the original space (according to the distance matrix

50 provided),

51- and so on; with (e.g., quadratically) decreasing weights of neighbor

52 distance ranks.

54We do not care about the actual precise distances (e.g., something like

55"0.001") between the objects on either the one-dimensional nor the original

56space. We only care about the distance ranks, i.e., about "2nd nearest

57neighbor," but not "0.012 distance units away." The solutions of this problem

58are thus permutations (orders) of the objects. Of course, if we really want

59to plot the objects, such a permutation can easily be translated to

60`x`-coordinates, say, by dividing the index of an object by the number of

61objects, which nets values in `[0,1]`. But basically, we reduce the task to

62finding permutations of objects that reflect the neighbor structure of the

63original space as closely as possible.

65If such a problem is solved correctly, then the arrangement on the

66one-dimensional axis should properly reflect the arrangement of the objects in

67the original space. Of course, solving this problem exactly may not actually

68be possible, since an object on a one-dimensional axis may either have exactly

69two `i`-nearest-neighbors (if it is at least `i` slots away from either end of

70the permutation) or exactly `1` such neighbor, if it is closer that `i` units.

71The object directly at the start of the permutation has only 1 nearest

72neighbor (the object that comes next). That next object, however, has two,

73namely the first object and the third object. In the original space where the

74objects come from, however, there may be any number of "nearest neighbors."

75Imagine a two-dimensional space where one object sits at the center of a

76circle of other objects. Then all other objects are its nearest neighbors,

77whereas an object on the circle either has exactly two nearest neighbors or,

78maybe, in the odd situation that the radius equals a multiple of the

79distance to the neighbors on the circle, three. Such a structure cannot be

80represented exactly in one dimension.

82But that's OK.

83Because we mainly do this for visualization purposes anyway.

85Now here comes the cool thing: We can cast this problem to a Quadratic

86Assignment Problem (:mod:`~moptipyapps.qap`). A QAP is defined by a distance

87matrix and a flow matrix. The idea is to assign `n` facilities to locations.

88The distances between the locations are given by the distance matrix. At the

89same time, the flow matrix defines the flows between the facilities. The goal

90is to find the assignment of facilities to locations that minimizes the

91overall "flow times distance" product sum.

93We can translate the "One-Dimensional Ordering Problem" to the QAP as follows:

94Imagine that we have `n` unique objects and know the (symmetric) distances

95between them.

97Let's say that we have four numbers as objects, `1`, `2`, `3`, `4`.

99>>> data = [1, 2, 3, 4]

100>>> n = len(data)

101

102The distance between two numbers `a` and `b` be `abs(a - b)`.

103

104>>> def dist(a, b):

105... return abs(a - b)

106

107The **original** distance matrix would be

108

109>>> dm = [[dist(i, j) for j in range(n)] for i in range(n)]

110>>> print(dm)

111[[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]

112

113Now from this matrix, we can find the nearest neighbors as follows:

114

115>>> from scipy.stats import rankdata # type: ignore

116>>> import numpy as np

117>>> rnks = rankdata(dm, axis=1, method="average") - 1

118>>> print(rnks)

119[[0. 1. 2. 3. ]

120 [1.5 0. 1.5 3. ]

121 [3. 1.5 0. 1.5]

122 [3. 2. 1. 0. ]]

123

124From the perspective of "`1`", "`2`" is the first nearest neighbor,

125"`3`" is the second-nearest neighbor, and "`4`" is the third-nearest neighbor.

126From the perspective of "`2`", both "`1`" and "`3`" are nearest neighbors, so

127they share rank `1.5`.

128And so on.

129

130Let's multiply this by `2` to get integers:

131[If no fractional ranks appear, then we do not need to multiply by `2`.]

132

133>>> rnks = np.array(rnks * 2, dtype=int)

134>>> print(rnks)

135[[0 2 4 6]

136 [3 0 3 6]

137 [6 3 0 3]

138 [6 4 2 0]]

139

140Now we want to translate this to a flow matrix (NOT a distance matrix).

141What we want is that there is a big flow from each object to its nearest

142neighbor, a smaller flow from each object to its second-nearest neighbor,

143a yet smaller flow to the third nearest neighbor, and so on.

144So all we need to do is to subtract the elements off the diagonal from

145`8 = 2*n`.

146[If no fractional elements were there, then we would use `n` instead of

147`2*n`.]

148Anyway, we get:

149

150>>> rnks = np.array([[0 if x == 0 else 2*n - x for x in a]

151... for a in rnks], int)

152>>> print(rnks)

153[[0 6 4 2]

154 [5 0 5 2]

155 [2 5 0 5]

156 [2 4 6 0]]

157

158We can see:

159The flow from "`1`" to "`2`" is `6`, which higher than the flow from "`1`" to

160"`3`" (namely `4`), and so on.

161The low from "`2`" to "`1`" is `5`, which is a bit lower than `6` but higher

162than `4`, because "`1`" is the "1.5th nearest neighbor of `2`".

163

164Just for good measures, we can weight this qudratically, so we get:

165

166>>> rnks = rnks ** 2

167>>> print(rnks)

168[[ 0 36 16 4]

169 [25 0 25 4]

170 [ 4 25 0 25]

171 [ 4 16 36 0]]

172

173This way, we give much more importance to "getting the nearest neighbors

174right" than getting "far-away neighbors" right. Nice.

175

176OK, but how about the distance matrix for the QAP?

177What's the distance?

178Well, it's the distance that I would get by arranging the objects in a

179specific order.

180It is the difference of the indices in the permutation:

181

182>>> print(np.array([[abs(i - j) for j in range(n)] for i in range(n)], int))

183[[0 1 2 3]

184 [1 0 1 2]

185 [2 1 0 1]

186 [3 2 1 0]]

187

188This means:

189The object that I will place at index `1` has a distance of "`1`" to the

190object at index `2`.

191It has a distance of "`2`" to the object at index `3`.

192It has a distance of "`3`" to the object at index `4`.

193The object at index `2` will have a distance of "`1`" to both the objects

194at indices `1` and `3` and a distance of "`2`" to the object at index `4`.

195And so on.

196

197In other words, objects that are close in their original space have a big

198flow between each other.

199If we put them at directly neighboring indexes, then we will multiply this

200big flow with a small number.

201If we put them at distant indices, then we will multiply this big flow with

202a large number.

203

204Objects that are far away from each other in their original space have a small

205flow between each other.

206If we put them at indices that are far apart, we will multiply the larger

207index-difference = distance with a small number.

208If we put them close, then we multiply a small distance with a small flow -

209but we will have to pay for this elsewhere, because then we may need to put

210objects with a larger flow between each other farther apart.

211

212>>> from moptipyapps.order1d.instance import Instance

213>>> the_instance = Instance.from_sequence_and_distance(

214... [1, 2, 3, 4], dist, 2, 100, ("bla", ), lambda i: str(i))

215>>> print(the_instance.flows)

216[[ 0 36 16 4]

217 [25 0 25 4]

218 [ 4 25 0 25]

219 [ 4 16 36 0]]

220>>> print(the_instance.distances)

221[[0 1 2 3]

222 [1 0 1 2]

223 [2 1 0 1]

224 [3 2 1 0]]

225

226There is one little thing that needs to be done:

227If we have many objects, then the QAP objective values can get very large.

228So it makes sense to define a "horizon" after which the relationship for

229objects are ignored.

230Above, we chose "100", which had no impact.

231If we choose "2", then only the two nearest neighbors would be considered and

232we get:

233

234>>> the_instance = Instance.from_sequence_and_distance(

235... [1, 2, 3, 4], dist, 2, 2, ("bla", ), lambda i: str(i))

236>>> print(the_instance.flows)

237[[ 0 16 4 0]

238 [ 9 0 9 0]

239 [ 0 9 0 9]

240 [ 0 4 16 0]]

241>>> print(the_instance.distances)

242[[0 1 2 3]

243 [1 0 1 2]

244 [2 1 0 1]

245 [3 2 1 0]]

246

247The distance matrix of the QAP stays the same, but the flow matrix now has

248several zeros.

249Anything farther away than the second-nearest neighbor will be ignored, i.e.,

250get a flow of `0`.

251"""

Coverage for moptipyapps / order1d / init.py: 100%

0 statements