High-dimensional problems arising from robot motion planning, biology,
data mining, and geographic information systems often require the
computation of k nearest neighbor (knn) graphs. The knn graph of a
data set is obtained by connecting each point to its k closest
points. As research in these fields addresses increasingly complex
problems, the demand for computing knn graphs based on arbitrary
distance metrics and large high-dimensional data sets grows, exceeding
the resources available to a single machine. In this work we
efficiently distribute the computation of knn graphs across clusters
of processors that communicate by message passing. Extensions to our
distributed framework include the
computation of graphs based on other proximity queries, such as
approximate knn or range queries. Our experiments show nearly linear
speedup with over one hundred processors and indicate that similar
speedup can be obtained with several hundred processors.
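
As an illustration of the definition above, the following is a minimal
brute-force sketch of a knn graph under an arbitrary distance metric.
It is not the distributed algorithm of this work. The names knn_rows
and knn_graph, the num_workers parameter, and the round-robin split of
query points are assumptions made for illustration only; the loop over
workers runs sequentially and merely mimics how query points could be
divided among processors.

```python
import heapq
import math


def euclidean(p, q):
    """Default metric; any function dist(p, q) -> float can be used instead."""
    return math.dist(p, q)


def knn_rows(points, query_indices, k, dist=euclidean):
    """Return {i: [indices of the k closest other points]} for each query i."""
    rows = {}
    for i in query_indices:
        # Pair every other point with its distance to point i.
        candidates = (
            (dist(points[i], points[j]), j)
            for j in range(len(points))
            if j != i
        )
        # nsmallest returns the k closest pairs, nearest first.
        rows[i] = [j for _, j in heapq.nsmallest(k, candidates)]
    return rows


def knn_graph(points, k, dist=euclidean, num_workers=1):
    """Build the knn graph by brute force.

    Hypothetical work split: worker w handles query points w, w + W, ...
    Here the workers are simulated one after another in a single process.
    """
    graph = {}
    for w in range(num_workers):
        chunk = range(w, len(points), num_workers)
        graph.update(knn_rows(points, chunk, k, dist))
    return graph


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 3), (5, 5)]
    print(knn_graph(pts, k=2, num_workers=2))
```

Each query point is processed independently of the others, so the
result does not depend on how the query points are split. The sketch
assumes every worker can read the full point set, an assumption that
stops holding once the data exceeds the resources of a single machine.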