M-tree

M-trees are tree data structures that are similar to R-trees and B-trees. It is constructed using a metric and relies on the triangle inequality for efficient range and k-NN queries. While M-trees can perform well in many conditions, the tree can also have large overlap and there is no clear strategy on how to best avoid overlap. In addition, it can only be used for distance functions that satisfy the triangle inequality, while many advanced dissimilarity functions used in information retrieval do not satisfy this.^[1]

1 Overview
2 M-Tree construction
3 M-Tree Queries
- 3.1 Range queries
- 3.2 k-NN queries
4 See also
5 References

Overview

2D M-Tree visualized using ELKI. The tree has a single level of leaf nodes. Due to a suboptimal split heuristic, there is a large overlap.

As in any Tree-based data structure, the M-Tree is composed of Nodes and Leaves. In each node there is a data object that identifies it uniquely and a pointer to a sub-tree where its children reside. Every leaf has several data objects. For each node there is a radius r that defines a Ball in the desired metric space. Thus, every node $n$ and leaf $l$ residing in a particular node $N$ is at most distance $r$ from $N$ , and every node n and leaf l with node parent N keep the distance from it.

M-Tree construction

Components

An M-Tree has these components and sub-components:

Root.
1. Routing object O.
2. Radius O_r.
3. A set of Nodes or (exclusive) Leafs.
Node.
1. Routing object O.
2. Radius O_r.
3. Distance from this node to its parent O_p.
4. A set of Nodes or (exclusive) Leafs.
Leaf.
1. Routing object O.
2. Radius O_r.
3. Distance from this leaf to its parent O_p.
4. Objects.
Objects.
1. Feature value (usually a d-dimensional vector).

Insert

The main idea is first to find a leaf node $N$ where the new object $O$ belongs. If $N$ is not full then just attach it to $N$ . If $N$ is full then invoke a method to split $N$ . The algorithm is as follows:

Algorithm Insert  Input: Node   of M-Tree , Entry   Output: A new instance of  containing all entries in original  plus

   ← Entries of node   if  is not a leaf then  {   /*Look for entries that the new object fits into*/   let  be entries such that    if  is not empty then   {  /*If there are one or more entry, then look for an entry such that is closer to the new object*/  let be  such that  is minimum   }   else   {  /*If there are no such entry, then look for an entry with minimal distance from its edge to the new object*/  let be  such that  is minimum  /*Upgrade the new radii of the entry*/   =    }   /*Continue inserting in the next level*/   return insert(, );  else  {   /*If the node has capacity then just insert the new object*/   if  is not full then   {  store(, )   }   /*The node is at full capacity, then it is needed to do a new split in this level*/   else   {  split(, ) }  }

"←" is a shorthand for "changes to". For instance, "largest ← item" means that the value of largest changes to the value of item.
"return" terminates the algorithm and outputs the value that follows.

Split

If the split method arrives to the root of the tree, then it choose two routing objects from $N$ , and creates two new nodes containing all the objects in original $N$ , and store them into the new root. If split methods arrives to a node $N$ that is not the root of the tree, the method choose two new routing objects from $N$ , re-arrange every routing object in $N$ in two new nodes $N_{1}$ and $N_{2}$ , and store this new nodes in the parent node $N_{p}$ of original $N$ . The split must be repeated if $N_{p}$ has not enough capacity to store $N_{2}$ . The algorithm is as follow:

Algorithm Split  Input: Node   of M-Tree , Entry   Output: A new instance of  containing a new partition.

  /*The new routing objects are now all those in the node plus the new routing object*/  let be  entries of   if  is not the root then  { /*Get the parent node and the parent routing object*/ let  be the parent routing object of  let  be the parent node of   }  /*This node will contain part of the objects of the node to be split*/  Create a new node   /*Promote two routing objects from the node to be split, to be new routing objects*/  Promote(, , )  /*Choose which objects from the node being split will act as new routing objects*/  Partition(, , , , )  /*Store entries in each new routing object*/  Store 's entries in  and 's entries in   if  is the current root then  {  /*Create a new node and set it as new root and store the new routing objects*/  Create a new root node   Store  and  in   }  else  {  /*Now use the parent rouing object to store one of the new objects*/  Replace entry  with entry  in   if  is no full then  {  /*The second routinb object is stored in the parent only if it has free capacity*/  Store  in   }  else  {   /*If there is no free capacity then split the level up*/   split(, )  }  }

"←" is a shorthand for "changes to". For instance, "largest ← item" means that the value of largest changes to the value of item.
"return" terminates the algorithm and outputs the value that follows.

M-Tree Queries

Range queries

k-NN queries

References

^ Ciaccia, Paolo; Patella, Marco; Zezula, Pavel (1997). "M-tree An Efficient Access Method for Similarity Search in Metric Spaces". Proceedings of the 23rd VLDB Conference Athens, Greece, 1997. IBM Almaden Research Center: Very Large Databases Endowment Inc.. pp. 426–435. p426. http://www.vldb.org/conf/1997/P426.PD F. Retrieved 2010-09-07.

Trees in computer science

Binary trees	Binary search tree (BST) Cartesian tree MVP Tree Top tree T-tree

Self-balancing binary search trees	AA tree AVL tree LLRB tree Red–black tree Scapegoat tree Splay tree Treap

B-trees	B+ tree B*-tree B^x-tree UB-tree 2-3 tree 2-3-4 tree (a,b)-tree Dancing tree Htree

Tries	Suffix tree Radix tree Ternary search tree X-fast trie Y-fast trie

Binary space partitioning (BSP) trees	Quadtree Octree k-d tree Implicit k-d tree VP tree

Non-binary trees	Exponential tree Fusion tree Interval tree PQ tree Range tree SPQR tree Van Emde Boas tree

Spatial data partitioning trees	R-tree R+ tree R* tree X-tree M-tree Segment tree Hilbert R-tree Priority R-tree

Other trees	Heap Hash calendar Hash tree Finger tree Order statistic tree Metric tree Cover tree BK-tree Doubly chained tree iDistance Link-cut tree Fenwick tree Log-structured merge-tree

Contents