Galago is a toolkit for experimenting with text search. It is based on small, pluggable components that are easy to replace and change, both during indexing and during retrieval. It includes TupleFlow, which is a distributed computation framework like MapReduce or Dryad. TupleFlow manages the difficult parts of processing text: serializing data, sorting it, and distributing processing. The IndexReader and IndexWriter classes manage storing key/value pairs like inverted lists. This makes it possible to make your own kinds of index structures without starting from scratch.
by
otis
2009-08-12 16:01
java
·
software
·
search
·
library
·
information retrieval
·
distributed computing