Bixo is an open-source Java crawler that runs as a series of Cascading pipes. It is designed as a toolkit for building customized crawlers, so each Cascading pipe implements one discrete operation. By combining these pipes into your own Cascading pipe assembly, you can quickly create specialized crawlers optimized for a particular use case.
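To make the "one pipe, one discrete operation" idea concrete, here is a minimal sketch in plain Java. It does not use the Bixo or Cascading APIs: `Stage`, `assemble`, and the three example stages are hypothetical names invented for illustration, and each stage simply maps one list of URLs to another. The normalization step is deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Illustrative only: hypothetical stand-ins, NOT the Bixo or Cascading API.
class PipeAssemblySketch {

    // A "pipe" in this sketch is a function from one list of URLs to another.
    interface Stage extends Function<List<String>, List<String>> {}

    // Discrete operation 1: trim and lowercase each URL (a toy normalizer).
    static final Stage NORMALIZE = urls -> {
        List<String> out = new ArrayList<>();
        for (String u : urls) {
            out.add(u.trim().toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Discrete operation 2: drop duplicates, keeping first-seen order.
    static final Stage DEDUPLICATE = urls -> new ArrayList<>(new LinkedHashSet<>(urls));

    // Discrete operation 3: keep only http and https URLs.
    static final Stage HTTP_ONLY = urls -> {
        List<String> out = new ArrayList<>();
        for (String u : urls) {
            if (u.startsWith("http://") || u.startsWith("https://")) {
                out.add(u);
            }
        }
        return out;
    };

    // Chain the given stages in order into a single composite stage.
    static Stage assemble(Stage... stages) {
        return urls -> {
            List<String> current = urls;
            for (Stage stage : stages) {
                current = stage.apply(current);
            }
            return current;
        };
    }

    public static void main(String[] args) {
        Stage crawlerFrontEnd = assemble(NORMALIZE, DEDUPLICATE, HTTP_ONLY);
        List<String> seeds = Arrays.asList(
                " HTTP://Example.com/a ", "http://example.com/a", "ftp://example.com/b");
        System.out.println(crawlerFrontEnd.apply(seeds));
    }
}
```

Swapping, removing, or reordering the arguments to `assemble` yields a different specialized pipeline without touching the individual stages, which is the kind of customization the description above refers to.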
by otis 2009-05-24 00:38
crawl · fetch · spider · java · MapReduce · katta