Behemoth makes it possible to deploy GATE or UIMA applications on a Hadoop cluster for very large-scale document analysis. It uses a simple representation format that serves as common ground between annotations generated by UIMA and by GATE, which makes the two systems compatible. Because it is built on Hadoop, it inherits Hadoop's features, namely scalability and fault tolerance, and, most notably, the backing of a thriving open source community. Quite a few Apache projects fit in naturally alongside it: Nutch, Tika, Mahout, HBase, etc.
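To make the "common ground" idea concrete, here is a minimal sketch, in Python, of what a tool-neutral standoff annotation format and a MapReduce-style job over it might look like. The `Document` and `Annotation` classes, their fields, and the `map_doc`, `reduce_counts` and `run_local` functions are hypothetical illustrations, not Behemoth's actual API. `run_local` only simulates Hadoop's shuffle step in-process; it does not run on a cluster.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class Annotation:
    """Hypothetical standoff annotation: a typed span over the document text."""
    type: str
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    features: Dict[str, str] = field(default_factory=dict)
    source: str = ""  # which tool produced it, e.g. "gate" or "uima"


@dataclass
class Document:
    """Hypothetical neutral document: raw text plus annotations from any tool."""
    url: str
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        # Offsets index into the shared text, so GATE- and UIMA-produced
        # annotations can be read the same way.
        return self.text[ann.start:ann.end]


def map_doc(doc: Document) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (annotation type, 1) for every annotation,
    # regardless of which tool produced it.
    for ann in doc.annotations:
        yield ann.type, 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    # Reducer: total the counts for one annotation type.
    return key, sum(values)


def run_local(docs: Iterable[Document]) -> Dict[str, int]:
    # Local stand-in for Hadoop's shuffle: group mapper output by key,
    # then call the reducer once per key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return dict(reduce_counts(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    doc = Document("http://example.com/a", "Alice lives in Brooklyn.")
    doc.annotations.append(Annotation("Person", 0, 5, source="gate"))
    doc.annotations.append(Annotation("Location", 15, 23, source="uima"))
    print(run_local([doc]))
```

The design choice this illustrates is that once both tools write into the same offset-based format, downstream jobs such as the counting mapper and reducer above never need to know whether an annotation came from GATE or UIMA.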
by otis, 2009-12-01 22:35
UIMA · gate · hadoop · text mining · text analysis · MapReduce · distributed computing · NLP