Coverage for moptipyapps / order1d / __init__.py: 100%

0 statements  

« prev     ^ index     » next       coverage.py v7.13.0, created at 2025-12-11 04:40 +0000

1""" 

2A set of tools for ordering objects in 1 dimension. 

3 

4Let's assume that we have `n` objects and a distance metric that can compute 

5the distance between two objects. We do not know and also do not care about in 

6how many dimension the objects exist - we just have objects and a distance 

7metric. 

8 

9Now we want to find a one-dimensional order of the objects that reflects their 

10original distance-based topology. For each object `a`, we want that its 

11closest neighbor in the order is also its actual closest neighbor according to 

12the distance metric. It's second-closest neighbor should be the actual 

13second-closest neighbor according to the distance metric. And so on. 

14 

15Since we only care about the object order and do not want to metrically map 

16the distances to one dimension, we can represent the solution as permutation 

17of natural numbers. 

18 

19Of course, in a one-dimensional order, each object has exactly two closest 

20neighbors (the one on its left and the one on its right) unless it is situated 

21either at the beginning or end of the order, in which case it has exactly one 

22closest neighbor. Based on the actual distance metric, an object may have any 

23number of closest neighbors, maybe only one, or maybe three equally-far away 

24objects. So it is not clear whether a perfect mapping to the one-dimensional 

25permutations even exists. Probably it will not, but this shall not bother us. 

26 

27Either way, we can try to find one that comes as close as possible to the real 

28deal. 

29 

30Another way to describe this problem is as follows: 

31Imagine that you have `n` objects and only know their mutual distances. 

32You want to arrange them on a one-dimensional axis in a way that does sort of 

33reflect their neighborhood structure in whatever space they are originally 

34located in. 

35 

36The goal of solving this one-dimensional ordering problem is then to arrange 

37`n` objects on a 1-dimensional (e.g., horizontal) axis given a distance matrix 

38describing their location in a potentially high-dimensional or unstructured 

39space. 

40The objects should be arranged in such a way that, for each object, 

41 

42- the nearest neighbors on the 1-dimensional axis are also the nearest 

43 neighbors in the original space (according to the distance matrix 

44 provided), 

45- the second nearest neighbors on the 1-dimensional axis are also the 

46 second nearest neighbors in the original space (according to the distance 

47 matrix provided), 

48- the third nearest neighbors on the 1-dimensional axis are also the third 

49 nearest neighbors in the original space (according to the distance matrix 

50 provided), 

51- and so on; with (e.g., quadratically) decreasing weights of neighbor 

52 distance ranks. 

53 

54We do not care about the actual precise distances (e.g., something like 

55"0.001") between the objects on either the one-dimensional nor the original 

56space. We only care about the distance ranks, i.e., about "2nd nearest 

57neighbor," but not "0.012 distance units away." The solutions of this problem 

58are thus permutations (orders) of the objects. Of course, if we really want 

59to plot the objects, such a permutation can easily be translated to 

60`x`-coordinates, say, by dividing the index of an object by the number of 

61objects, which nets values in `[0,1]`. But basically, we reduce the task to 

62finding permutations of objects that reflect the neighbor structure of the 

63original space as closely as possible. 

64 

65If such a problem is solved correctly, then the arrangement on the 

66one-dimensional axis should properly reflect the arrangement of the objects in 

67the original space. Of course, solving this problem exactly may not actually 

68be possible, since an object on a one-dimensional axis may either have exactly 

69two `i`-nearest-neighbors (if it is at least `i` slots away from either end of 

70the permutation) or exactly `1` such neighbor, if it is closer that `i` units. 

71The object directly at the start of the permutation has only 1 nearest 

72neighbor (the object that comes next). That next object, however, has two, 

73namely the first object and the third object. In the original space where the 

74objects come from, however, there may be any number of "nearest neighbors." 

75Imagine a two-dimensional space where one object sits at the center of a 

76circle of other objects. Then all other objects are its nearest neighbors, 

77whereas an object on the circle either has exactly two nearest neighbors or, 

78maybe, in the odd situation that the radius equals a multiple of the 

79distance to the neighbors on the circle, three. Such a structure cannot be 

80represented exactly in one dimension. 

81 

82But that's OK. 

83Because we mainly do this for visualization purposes anyway. 

84 

85Now here comes the cool thing: We can cast this problem to a Quadratic 

86Assignment Problem (:mod:`~moptipyapps.qap`). A QAP is defined by a distance 

87matrix and a flow matrix. The idea is to assign `n` facilities to locations. 

88The distances between the locations are given by the distance matrix. At the 

89same time, the flow matrix defines the flows between the facilities. The goal 

90is to find the assignment of facilities to locations that minimizes the 

91overall "flow times distance" product sum. 

92 

93We can translate the "One-Dimensional Ordering Problem" to the QAP as follows: 

94Imagine that we have `n` unique objects and know the (symmetric) distances 

95between them. 

96 

97Let's say that we have four numbers as objects, `1`, `2`, `3`, `4`. 

98 

99>>> data = [1, 2, 3, 4] 

100>>> n = len(data) 

101 

102The distance between two numbers `a` and `b` be `abs(a - b)`. 

103 

104>>> def dist(a, b): 

105... return abs(a - b) 

106 

107The **original** distance matrix would be 

108 

109>>> dm = [[dist(i, j) for j in range(n)] for i in range(n)] 

110>>> print(dm) 

111[[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]] 

112 

113Now from this matrix, we can find the nearest neighbors as follows: 

114 

115>>> from scipy.stats import rankdata # type: ignore 

116>>> import numpy as np 

117>>> rnks = rankdata(dm, axis=1, method="average") - 1 

118>>> print(rnks) 

119[[0. 1. 2. 3. ] 

120 [1.5 0. 1.5 3. ] 

121 [3. 1.5 0. 1.5] 

122 [3. 2. 1. 0. ]] 

123 

124From the perspective of "`1`", "`2`" is the first nearest neighbor, 

125"`3`" is the second-nearest neighbor, and "`4`" is the third-nearest neighbor. 

126From the perspective of "`2`", both "`1`" and "`3`" are nearest neighbors, so 

127they share rank `1.5`. 

128And so on. 

129 

130Let's multiply this by `2` to get integers: 

131[If no fractional ranks appear, then we do not need to multiply by `2`.] 

132 

133>>> rnks = np.array(rnks * 2, dtype=int) 

134>>> print(rnks) 

135[[0 2 4 6] 

136 [3 0 3 6] 

137 [6 3 0 3] 

138 [6 4 2 0]] 

139 

140Now we want to translate this to a flow matrix (NOT a distance matrix). 

141What we want is that there is a big flow from each object to its nearest 

142neighbor, a smaller flow from each object to its second-nearest neighbor, 

143a yet smaller flow to the third nearest neighbor, and so on. 

144So all we need to do is to subtract the elements off the diagonal from 

145`8 = 2*n`. 

146[If no fractional elements were there, then we would use `n` instead of 

147`2*n`.] 

148Anyway, we get: 

149 

150>>> rnks = np.array([[0 if x == 0 else 2*n - x for x in a] 

151... for a in rnks], int) 

152>>> print(rnks) 

153[[0 6 4 2] 

154 [5 0 5 2] 

155 [2 5 0 5] 

156 [2 4 6 0]] 

157 

158We can see: 

159The flow from "`1`" to "`2`" is `6`, which higher than the flow from "`1`" to 

160"`3`" (namely `4`), and so on. 

161The low from "`2`" to "`1`" is `5`, which is a bit lower than `6` but higher 

162than `4`, because "`1`" is the "1.5th nearest neighbor of `2`". 

163 

164Just for good measures, we can weight this qudratically, so we get: 

165 

166>>> rnks = rnks ** 2 

167>>> print(rnks) 

168[[ 0 36 16 4] 

169 [25 0 25 4] 

170 [ 4 25 0 25] 

171 [ 4 16 36 0]] 

172 

173This way, we give much more importance to "getting the nearest neighbors 

174right" than getting "far-away neighbors" right. Nice. 

175 

176OK, but how about the distance matrix for the QAP? 

177What's the distance? 

178Well, it's the distance that I would get by arranging the objects in a 

179specific order. 

180It is the difference of the indices in the permutation: 

181 

182>>> print(np.array([[abs(i - j) for j in range(n)] for i in range(n)], int)) 

183[[0 1 2 3] 

184 [1 0 1 2] 

185 [2 1 0 1] 

186 [3 2 1 0]] 

187 

188This means: 

189The object that I will place at index `1` has a distance of "`1`" to the 

190object at index `2`. 

191It has a distance of "`2`" to the object at index `3`. 

192It has a distance of "`3`" to the object at index `4`. 

193The object at index `2` will have a distance of "`1`" to both the objects 

194at indices `1` and `3` and a distance of "`2`" to the object at index `4`. 

195And so on. 

196 

197In other words, objects that are close in their original space have a big 

198flow between each other. 

199If we put them at directly neighboring indexes, then we will multiply this 

200big flow with a small number. 

201If we put them at distant indices, then we will multiply this big flow with 

202a large number. 

203 

204Objects that are far away from each other in their original space have a small 

205flow between each other. 

206If we put them at indices that are far apart, we will multiply the larger 

207index-difference = distance with a small number. 

208If we put them close, then we multiply a small distance with a small flow - 

209but we will have to pay for this elsewhere, because then we may need to put 

210objects with a larger flow between each other farther apart. 

211 

212>>> from moptipyapps.order1d.instance import Instance 

213>>> the_instance = Instance.from_sequence_and_distance( 

214... [1, 2, 3, 4], dist, 2, 100, ("bla", ), lambda i: str(i)) 

215>>> print(the_instance.flows) 

216[[ 0 36 16 4] 

217 [25 0 25 4] 

218 [ 4 25 0 25] 

219 [ 4 16 36 0]] 

220>>> print(the_instance.distances) 

221[[0 1 2 3] 

222 [1 0 1 2] 

223 [2 1 0 1] 

224 [3 2 1 0]] 

225 

226There is one little thing that needs to be done: 

227If we have many objects, then the QAP objective values can get very large. 

228So it makes sense to define a "horizon" after which the relationship for 

229objects are ignored. 

230Above, we chose "100", which had no impact. 

231If we choose "2", then only the two nearest neighbors would be considered and 

232we get: 

233 

234>>> the_instance = Instance.from_sequence_and_distance( 

235... [1, 2, 3, 4], dist, 2, 2, ("bla", ), lambda i: str(i)) 

236>>> print(the_instance.flows) 

237[[ 0 16 4 0] 

238 [ 9 0 9 0] 

239 [ 0 9 0 9] 

240 [ 0 4 16 0]] 

241>>> print(the_instance.distances) 

242[[0 1 2 3] 

243 [1 0 1 2] 

244 [2 1 0 1] 

245 [3 2 1 0]] 

246 

247The distance matrix of the QAP stays the same, but the flow matrix now has 

248several zeros. 

249Anything farther away than the second-nearest neighbor will be ignored, i.e., 

250get a flow of `0`. 

251"""