geoLucene is an extension of Lucene that allows to effectively index and search documents that contain locational information (longitude/latitude). It uses R-tree as a spacial index.
Semantic Vector indexes, created by applying a Random Projection algorithm to term-document matrices created using Apache Lucene. The package creates a WordSpace model, of the kind developed by Stanford University's Infomap Project and other researchers during the 1990s and early 2000s. Such models are designed to represent words and documents in terms of underlying concepts, and as such can be used for many semantic (concept-aware) matching tasks such as automatic thesaurus generation, knowledge representation, and concept matching. The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis, similar to Latent Semantic Analysis (LSA) and its variants like Probabilistic Latent Semantic Analysis (PLSA).
Java-based framework designed to support the development of applications for unsupervised machine learning tasks, with a particular focus on their application to text data