The major bottleneck of the current matching algorithm is the computation of the "expensive" pairwise similarity scores. The implemented heuristic tries to overcome this problem by applying a search window which reduces the number of necessary pairwise similarity score calculations on the one hand, but imposes several restrictions and the need for further processing on the other hand (e.g. how to detect model changes like moving an element out of the search window).
Developing and implementing another strategy, based on an index for the faster retrieval of neighboring model elements, is the main objective of my GSoC project. In its current form, this indexing process leverages the following concept:
- transform the problem to a metric space where a notion of distance is defined and "cheaper" to compute → k-dimensional space with Euclidian distance
- provide an algorithm to map the model elements into this metric space while keeping domain-specific-similarity → FastMap
- use an appropriate data structure for the metric space enabling efficient nearest neighbor search → KDtree
- for each provided model element of version 1, find the immediate neighbors (all model elements with minimum Euclidian distance) of version 2
- for each of these immediate neighbors of version 2, compute their immediate neighborhood and verify that the model element of version 1 is also an (the) immediate neighbor
- if there are several "candidates" with the same minimum distance, delegate to the EMFSimilarity score for final retrieval of the "best matching" neighbor
... SpaceThe full source code of the matcher component can be found in the implementation of the FastMapMatcher.space1 = new TreeBackedSpace (); for (EObject object1 : objects1) { space1.add(object1, provideCoordinates(object1)); } Space space2 = new TreeBackedSpace (); for (EObject object2 : objects2) { space2.add(object2, provideCoordinates(object2)); } NeighborhoodComputor neighborhoodComputor1 = space1.provideNeighborhoodComputor(); NeighborhoodComputor neighborhoodComputor2 = space2.provideNeighborhoodComputor(); for (EObject object1 : objects1) { double[] coordinates1 = space1.getCoordinates(object1); Neighborhood neighborhood2 = new ImmediateNeighborhood (); neighborhoodComputor2.computeNeighborhood(neighborhood2, coordinates1, METRIC); if (neighborhood2.size() == 0) continue; EObject bestObject2 = null; double bestDistance = Double.POSITIVE_INFINITY; for (EObject object2 : neighborhood2) { double[] coordinates2 = space2.getCoordinates(object2); Neighborhood neighborhood1 = new ImmediateNeighborhood (); neighborhoodComputor1.computeNeighborhood(neighborhood1, coordinates2, METRIC); if (!neighborhood1.contains(object1)) continue; double distance = fastMap.distance(object1, object2); if (bestObject2 == null || Double.compare(distance, bestDistance) < 0) { bestObject2 = object2; bestDistance = distance; } } if (bestObject2 == null) continue; space1.remove(object1); space2.remove(bestObject2); handler.handleMatch(object1, bestObject2, bestDistance); } for (EObject object1 : space1) { handler.handleNoMatch1(object1); } for (EObject object2 : space2) { handler.handleNoMatch2(object2); } } ...
If you think that the code is the most understandable part of this blog post, then I know I haven't completely messed it up :-) Feedback welcome!
Hey, that's really awesome!
ReplyDeleteCan't wait to see the final results! Good luck!