Simpy
 
Search Everyone: "heritrix"
1 - 2 of 2
 
YouSeer is an open source search engine framework built on top of other open source components. It uses Heritrix as its crawler and Solr as its indexing system. The framework provides software to ingest the documents harvested by Heritrix into Solr; this ingestion software is flexible and allows user-specific data extraction implementations. YouSeer also provides a simple interface for querying the index and another for retrieving cached versions of the documents.
by otis 2009-11-19 13:18 crawl · index · search · Heritrix · nutch · information retrieval
http://youseer.sourceforge.net/
Train: crawl, parse, and create clusters. Then: crawl and classify new pages into the predefined classes/clusters.
by otis 2009-02-26 23:54 Heritrix · classification · cluster · crawl · vertical search · focused crawl · information retrieval · NLP
http://webteam.archive.org/confluence/display/SOC06/Crawl-by-example