August 10, 2010

Cooking 2.0

In my last post I proposed some refactorings on the current implemention of the EMF Compare match engine and tried to explain (from my point of view) the reasoning behind it. In this post I'll show how another matching strategy - based on a neighborhood search instead of a full iteration with pairwise matchings - fits into this adopted structure.

The major bottleneck of the current matching algorithm is the computation of the "expensive" pairwise similarity scores. The implemented heuristic tries to overcome this problem by applying a search window which reduces the number of necessary pairwise similarity score calculations on the one hand, but imposes several restrictions and the need for further processing on the other hand (e.g. how to detect model changes like moving an element out of the search window).

Developing and implementing another strategy, based on an index for the faster retrieval of neighboring model elements, is the main objective of my GSoC project. In its current form, this indexing process leverages the following concept:
  1. transform the problem to a metric space where a notion of distance is defined and "cheaper" to compute → k-dimensional space with Euclidian distance
  2. provide an algorithm to map the model elements into this metric space while keeping domain-specific-similarity → FastMap
  3. use an appropriate data structure for the metric space enabling efficient nearest neighbor search → KDtree
Based on such indexed model elements - i.e. model elements represented by coordinates in a k-dimensional space - the matcher component can follow this path:
  • for each provided model element of version 1, find the immediate neighbors (all model elements with minimum Euclidian distance) of version 2
  • for each of these immediate neighbors of version 2, compute their immediate neighborhood and verify that the model element of version 1 is also an (the) immediate neighbor
  • if there are several "candidates" with the same minimum distance, delegate to the EMFSimilarity score for final retrieval of the "best matching" neighbor
...

Space space1 = new TreeBackedSpace();
  for (EObject object1 : objects1) {
   space1.add(object1, provideCoordinates(object1));
  }

  Space space2 = new TreeBackedSpace();
  for (EObject object2 : objects2) {
   space2.add(object2, provideCoordinates(object2));
  }

  NeighborhoodComputor neighborhoodComputor1 = space1.provideNeighborhoodComputor();
  NeighborhoodComputor neighborhoodComputor2 = space2.provideNeighborhoodComputor();

  for (EObject object1 : objects1) {
   double[] coordinates1 = space1.getCoordinates(object1);
   Neighborhood neighborhood2 = new ImmediateNeighborhood();
   neighborhoodComputor2.computeNeighborhood(neighborhood2, coordinates1, METRIC);

   if (neighborhood2.size() == 0)
    continue;

   EObject bestObject2 = null;
   double bestDistance = Double.POSITIVE_INFINITY;
   for (EObject object2 : neighborhood2) {
    double[] coordinates2 = space2.getCoordinates(object2);
    Neighborhood neighborhood1 = new ImmediateNeighborhood();
    neighborhoodComputor1.computeNeighborhood(neighborhood1, coordinates2, METRIC);

    if (!neighborhood1.contains(object1))
     continue;

    double distance = fastMap.distance(object1, object2);
    if (bestObject2 == null || Double.compare(distance, bestDistance) < 0) {
     bestObject2 = object2;
     bestDistance = distance;
    }
   }

   if (bestObject2 == null)
    continue;

   space1.remove(object1);
   space2.remove(bestObject2);
   handler.handleMatch(object1, bestObject2, bestDistance);
  }

  for (EObject object1 : space1) {
   handler.handleNoMatch1(object1);
  }

  for (EObject object2 : space2) {
   handler.handleNoMatch2(object2);
  }
 }
...
The full source code of the matcher component can be found in the implementation of the FastMapMatcher.

If you think that the code is the most understandable part of this blog post, then I know I haven't completely messed it up :-) Feedback welcome!

1 comment:

  1. Hey, that's really awesome!
    Can't wait to see the final results! Good luck!

    ReplyDelete