Simpy
 
Search Everyone: "information retrieval"

Top "information retrieval" experts: otis, j_h_scheufen, cpaulse, glukac, mthomure, paulovn

Groups about "information retrieval": IR NLP ML and CL, Lucene & Solr

1 - 93 of 93
 
YouSeer is an open source search engine framework built on top of other open source components. YouSeer uses Heritrix as a crawler and Solr as an indexing system. The framework provides software to ingest the documents harvested by Heritrix into Solr. The ingesting software is very flexible and allows for user-specific data extraction implementations. Further, YouSeer provides a simple interface to query the index and another interface to retrieve cached versions of the documents.
by otis 2009-11-19 13:18 crawl · index · search · Heritrix · nutch · information retrieval
http://youseer.sourceforge.net/ - cached - mail it - history
by otis 2009-11-18 17:36 NLP · information retrieval · semantic · gate · software
http://www.semanticsoftware.info/ - cached - mail it - history
MuNPEx is a multi-lingual noun phrase (NP) extraction component developed for the GATE architecture, implemented in JAPE. It currently supports English, German, French, and Spanish (in beta). MuNPEx requires a part-of-speech (POS) tagger to work and can additionally use detected named entities (NEs) to improve chunking performance. Please read the documentation (or source code) for more details.
by otis 2009-11-18 17:30 NLP · information retrieval · key phrases · information extraction · computational linguistics · software · gate
http://www.semanticsoftware.info/munpex - cached - mail it - history
by otis 2009-11-02 13:59 software · information retrieval · NLP · perl · corpus · text mining · dataset
http://www.drni.de/wac-tk/ - cached - mail it - history
C++ (but has Java API), GPL
by otis 2009-11-02 13:51 information retrieval · NLP · software · library · api
http://www.lsi.upc.edu/~nlp/freeling/ - cached - mail it - history
by otis 2009-11-02 13:33 NLP · information retrieval · computational linguistics · java · software · api · library
http://herd.ida.liu.se:8180/nlpfarm/ - cached - mail it - history
Platform where anyone can share and mash open data on any subject
by otis 2009-10-29 12:24 data · information retrieval · NLP · machine learning · mashup
http://www.factual.com/ - cached - mail it - history
Common English misspellings from Wikipedia; 4107 misspellings as of 2009-10-29.
by otis 2009-10-29 12:20 wikipedia · spell · english · language · search · information retrieval · NLP
http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines - cached - mail it - history
Wunder's progressive reranking explanation
by otis 2009-10-22 12:41 search · information retrieval · rank · score
http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html - cached - mail it - history
Sen is the first open source morphological analyzer written in pure Java.
by otis 2009-10-16 23:39 japanese · morphology · analysis · lucene · search · index · information retrieval · NLP · library
https://sen.dev.java.net/ - cached - mail it - history
OPUS is an attempt to collect translated texts from the web, to convert and align the entire collection, to add linguistic annotation, and to provide the community with a publicly available parallel corpus. OPUS is based on open source products and is also delivered as an open source package.
by otis 2009-10-07 10:54 corpus · information retrieval · NLP
http://urd.let.rug.nl/tiedeman/OPUS/ - cached - mail it - history
Galago is a toolkit for experimenting with text search. It is based on small, pluggable components that are easy to replace and change, both during indexing and during retrieval. It includes TupleFlow, which is a distributed computation framework like MapReduce or Dryad. TupleFlow manages the difficult parts of processing text: serializing data, sorting it, and distributing processing. The IndexReader and IndexWriter classes manage storing key/value pairs like inverted lists. This makes it possible to make your own kinds of index structures without starting from scratch.
by otis 2009-08-12 16:01 java · software · search · library · information retrieval · distributed computing
http://www.galagosearch.org/ - cached - mail it - history
Ivory is a Hadoop toolkit for Web-scale information retrieval research that features a retrieval engine based on Markov Random Fields
by otis 2009-08-12 15:56 hadoop · MapReduce · information retrieval · search
http://www.umiacs.umd.edu/~jimmylin/ivory/docs/index.html - cached - mail it - history
by otis 2009-06-23 23:49 perl · wordnet · similar · software · information retrieval · NLP
http://wn-similarity.sourceforge.net/ - cached - mail it - history
Default dictionary break iterator for Chinese, Japanese, Korean
by otis 2009-06-03 00:15 CJK · japan · chinese · korean · computational linguistics · NLP · information retrieval · search · analysis · word segmentation
http://bugs.icu-project.org/trac/ticket/2229 - cached - mail it - history
by otis 2009-05-28 23:40 chinese · dictionary · information retrieval · search
http://www.mdbg.net/chindict/chindict.php?page=cc-cedict - cached - mail it - history
by otis 2009-05-28 14:34 search · software · python · django · lucene · solr · information retrieval
http://haystacksearch.org/ - cached - mail it - history
by otis 2009-05-28 14:27 solr · ruby · ruby on rails · search · information retrieval
http://outoftime.github.com/sunspot/ - cached - mail it - history
by otis 2009-05-24 21:31 rdf · solr · software · java · search · information retrieval
http://fgiasson.com/blog/index.php/2009/04/29/rdf-aggregates-and-full-text-search-on-steroids-with-solr/ - cached - mail it - history
by otis 2009-05-18 14:07 chinese · dictionary · english · word · word segmentation · NLP · information retrieval · computational linguistics
http://usa.mdbg.net/chindict/chindict.php?page=cc-cedict - cached - mail it - history
by otis 2009-05-17 22:15 web · index · crawl · dataset · corpus · linguistics · computational linguistics · NLP · information retrieval
http://webascorpus.org/ - cached - mail it - history
WordnetAPI is a Java interface to the famous WordNet database of lexical relationships.
by otis 2009-05-15 10:07 wordnet · morphology · lexical · synonyms · NLP · information retrieval · library · api
http://code.google.com/p/wordnetapi/ - cached - mail it - history
WordNet visualization using Force-Directed Graphs
by otis 2009-05-15 10:01 wordnet · synonyms · visual · graph · NLP · information retrieval
http://code.google.com/p/synonym/ - cached - mail it - history
by otis 2009-05-14 17:13 django · solr · python · information retrieval · search
http://code.google.com/p/django-solr-search/ - cached - mail it - history
The Linguistic Data Consortium supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.
by otis 2009-05-03 17:59 language · NLP · information retrieval · computational linguistics · model · data mining
http://www.ldc.upenn.edu/ - cached - mail it - history
A tool for the estimation, representation, and computation of statistical language models.
by otis 2009-05-03 17:54 NLP · information retrieval · language · computational linguistics · tool · software
http://sourceforge.net/projects/irstlm/ - cached - mail it - history
Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. All you need is a collection of translated texts (parallel corpus).
by otis 2009-05-03 17:47 NLP · machine translation · information retrieval · language · software · tool
http://www.statmt.org/moses/ - cached - mail it - history
Packages to facilitate natural language processing under Ubuntu Linux and other Debian-based platforms. The goal of Ubuntu NLP is to provide up-to-date packages for commonly used tools that can be easily installed and smoothly integrated into existing systems.
by otis 2009-05-03 17:45 ubuntu · linux · NLP · tool · information retrieval
http://cl.naist.jp/~eric-n/ubuntu-nlp/ - cached - mail it - history
by otis 2009-03-08 00:36 lucene · search · query expansion · information retrieval
http://grasia.fdi.ucm.es/jose/query-expansion/ - cached - mail it - history
Word-aligned compression library for Java
by otis 2009-03-02 11:59 java · api · library · compress · information retrieval · encode
http://code.google.com/p/javaewah/ - cached - mail it - history
UIMA NLP Components
by otis 2009-02-27 12:59 java · UIMA · pipeline · NLP · information retrieval · software
http://www.julielab.de/Resources/Software/Tools.html - cached - mail it - history
Lucas is a UIMA CAS consumer component which bridges the UIMA framework with the Lucene search engine library. Lucas maps CASes to Lucene index documents according to a mapping file.
by otis 2009-02-27 12:57 java · UIMA · lucene · index · search · pipeline · software · information retrieval
https://www.coling.uni-jena.de/sites/lucas/index.html - cached - mail it - history
Train: crawl, parse, create clusters. Then: crawl, classify new pages into predefined classes/clusters.
by otis 2009-02-26 23:54 Heritrix · classification · cluster · crawl · vertical search · focused crawl · information retrieval · NLP
http://webteam.archive.org/confluence/display/SOC06/Crawl-by-example - cached - mail it - history
by otis 2009-02-17 02:55 .net · solr · client · software · search · library · information retrieval
http://code.google.com/p/solrnet/ - cached - mail it - history
Set operation implementations for SortedIntegerSegments, for inverted list caching in search engines. The implementations also include a DocIdSet based on the P4Delta compression algorithm, for iterating over DocIdSets in compressed form.
by otis 2009-02-09 01:25 lucene · search · index · compress · information retrieval · set · java
http://code.google.com/p/lucene-ext/ - cached - mail it - history
by otis 2009-01-08 17:46 Daniel Tunkelang · information retrieval · facet · navigate · results · search · endeca · set · presentation
http://yahoo.hosted.panopto.com/CourseCast/Viewer/Default.aspx?id=6d0a6847-be51-4d29-8c1c-f961274b5343 - cached - mail it - history
by otis 2008-12-23 14:10 collocations · term · summary · NLP · information retrieval · search · keywords · key phrases
http://www.extractor.com/ - cached - mail it - history
WebLA is a Java package for handling Web Graphs, implementing popular algorithms such as PageRank, HITS, CoCitation Similarity and SimRank. It is of particular interest for research in Information Retrieval, since it provides a set of APIs (Application Programming Interfaces) that allow one to easily experiment with such algorithms.
by otis 2008-12-21 01:54 information retrieval · search · algorithm · pagerank · graph · api · library · java
http://webla.sourceforge.net/ - cached - mail it - history
by otis 2008-12-08 22:46 sentence detection · word segmentation · unicode · java · api · NLP · information retrieval · language
http://icu-project.org/userguide/boundaryAnalysis.html - cached - mail it - history
by otis 2008-10-19 22:51 java · api · string · similar · metrics · computational linguistics · NLP · information retrieval · machine learning
http://www.dcs.shef.ac.uk/~sam/simmetrics.html - cached - mail it - history
by otis 2008-09-29 11:03 information extraction · information retrieval · NLP · howto · term
http://chungwon.blogspot.com/2007/08/term-clustering-for-domain-ontology_02.html - cached - mail it - history
by otis 2008-08-18 13:30 search · search engine · information retrieval · vector space · linear algebra
http://mathdl.maa.org/mathDL/4/?pa=content&sa=viewDocument&nodeId=636&pf=1 - cached - mail it - history
Interviews with "Search Wizards" - people from the world of IR, NLP...
by otis 2008-06-11 12:06 search · people · interview · information retrieval · NLP
http://www.arnoldit.com/search-wizards-speak/ - cached - mail it - history
by otis 2008-06-07 01:05 Doug Cutting · video · lecture · lucene · nutch · information retrieval
http://videolectures.net/iiia06_cutting_ense/ - cached - mail it - history
by otis 2008-06-05 13:14 search · information retrieval · facet · explore · discover · search results · Peter Morville
http://www.slideshare.net/morville/search-patterns/ - cached - mail it - history
by otis 2008-05-29 15:31 perl · module · library · api · NLP · information retrieval · ngram
http://ngram.sourceforge.net/ - cached - mail it - history
Search Engine with a web crawler that can be trained to classify pages and crawl only "interesting" pages. Uses Lucene under the hood. Fully distributed and capable of large scale crawling and searching.
by otis 2008-05-22 17:27 search · search engine · crawl · java · software · bayes · classification · index · plugin · lucene · information retrieval
http://hounder.org/ - cached - mail it - history
Dr. Porter's solution finds significant terms in a document with respect to the rest of the corpus and can build user profiles from the documents users are viewing, which can help with ad targeting, etc.
by otis 2008-04-25 17:36 Martin Porter · information retrieval · summary · document
http://www.grapeshot.co.uk/ - cached - mail it - history
by otis 2008-04-20 16:22 vietnam · information retrieval · index · search · word segmentation · dictionary · analysis · tokenizer · language
http://www-users.cs.umn.edu/~thnguyen/Publication/RIVF06_Word_Segmentation_for_Vietnamese_Text_Categorization_An_online_... - cached - mail it - history
by otis 2008-04-20 16:15 vietnam · information retrieval · index · search · word segmentation · dictionary · analysis · tokenizer · language
http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings6/EVIA/17.pdf - cached - mail it - history
by otis 2008-04-13 22:23 search · information retrieval · personalization
http://sifaka.cs.uiuc.edu/xshen/publication.html - cached - mail it - history
A sandbox for collecting search examples, patterns, and anti-patterns.
by otis 2008-03-31 02:37 search · information retrieval · facet · explore · discover · search results · Peter Morville · screenshot · ui
http://flickr.com/photos/morville/collections/72157603785835882/ - cached - mail it - history
This is a collection of resources in a variety of fields related to text, speech and language processing. These include computational linguistics, information retrieval and machine learning. Here you can find pointers to useful Web sites, as well as lists of relevant books, newsgroups and mailing lists, and much more.
by otis 2008-02-17 16:05 NLP · information retrieval · data mining · information extraction · computational linguistics · resource
http://www.cs.technion.ac.il/~gabr/resources/resources.html - cached - mail it - history
Tagged datasets for named entity recognition tasks
by otis 2008-02-17 16:04 NLP · machine learning · computational linguistics · information retrieval · information extraction · named entity extraction · resource
http://www.cs.technion.ac.il/~gabr/resources/data/ne_datasets.html - cached - mail it - history
Semantic Vector indexes, created by applying a Random Projection algorithm to term-document matrices created using Apache Lucene. The package creates a WordSpace model, of the kind developed by Stanford University's Infomap Project and other researchers during the 1990s and early 2000s. Such models are designed to represent words and documents in terms of underlying concepts, and as such can be used for many semantic (concept-aware) matching tasks such as automatic thesaurus generation, knowledge representation, and concept matching. The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis, similar to Latent Semantic Analysis (LSA) and its variants like Probabilistic Latent Semantic Analysis (PLSA).
by otis 2008-01-14 02:06 semantic · LSA · PLSA · NLP · information retrieval · java · api
http://code.google.com/p/semanticvectors/ - cached - mail it - history
by otis 2008-01-12 23:51 classification · algorithm · NLP · information retrieval
http://nlpers.blogspot.com/2007/09/bootstrapping.html - cached - mail it - history
Java-based framework designed to support the development of applications for unsupervised machine learning tasks, with a particular focus on their application to text data
by otis 2008-01-12 11:35 java · api · cluster · library · NLP · information retrieval
http://mlg.ucd.ie/content/view/18/ - cached - mail it - history
LETOR is a benchmark dataset for research on learning to rank, released by Microsoft Research Asia.
by otis 2008-01-09 01:56 information retrieval · evaluation · tool · rank
http://research.microsoft.com/users/LETOR/ - cached - mail it - history
An Information Retrieval Tutorial on Cosine Similarity Measures, Dot Products and Term Weight Calculations.
by otis 2007-12-02 18:10 cosine similarity · information retrieval · term vector · algorithm
http://www.miislita.com/information-retrieval-tutorial/cosine-similarity-tutorial.html - cached - mail it - history
by otis 2007-11-29 03:24 information retrieval · NLP · paper · howto · article
http://www.basistech.com/knowledge-center/ - cached - mail it - history
by otis 2007-10-14 22:13 book · information retrieval · search · index
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html - cached - mail it - history
The Clair library is a suite of open-source Perl modules intended to simplify a number of generic tasks in natural language processing (NLP), information retrieval (IR), and network analysis (NA). Its architecture also allows for external software to be plugged in with very little effort.
by otis 2007-09-24 15:47 perl · NLP · information retrieval · library
http://belobog.si.umich.edu/mediawiki/index.php/Main_Page - cached - mail it - history
In this paper, we attempt to build query networks from web search engine query logs, with the nodes representing queries and the edges exhibiting the semantic relatedness between queries. To build the network, users’ query histories are extracted from query logs and are then segmented into query sessions. Semantic relatedness of queries is modeled using three different statistical measures: collocation, weighted dependence, and mutual information. We compare the constructed query networks with comparable random networks and conclude that query networks have small-world properties. In addition, we propose a method for identifying the community structures, which are representative of semantic taxonomies, by applying Newman clustering to query networks. The experimental evaluation proves the effectiveness of our proposed method against a baseline model.
by otis 2007-09-17 19:41 NLP · information retrieval · paper · search
http://www-personal.umich.edu/~ladamic/si708w07/projects/qla.pdf - cached - mail it - history
Rich NLP resource
by otis 2007-05-22 18:22 NLP · linguistics · resource · information retrieval
http://www-nlp.stanford.edu/links/statnlp.html - cached - mail it - history
IBM's experiences building a web-scale CMS with billions of documents
by otis 2007-04-17 22:42 search · index · information retrieval · text mining · text analysis · paper
http://sites.computer.org/debull/A06dec/main1.ps - cached - mail it - history
by otis 2007-01-19 03:35 china · search · information retrieval · data mining · research
http://apex.sjtu.edu.cn/apex_wiki - cached - mail it - history
Grant Ingersoll's analysis of academic CS papers, focused on IR, NLP, ML, and such
by otis 2007-01-15 02:20 paper · information retrieval · NLP · machine learning · Grant Ingersoll
http://paperoftheweek.com/ - cached - mail it - history
by otis 2006-09-27 18:49 hrvatski · croatia · information retrieval · morphology · search · index
http://www.hnk.ffzg.hr/jthj/ - cached - mail it - history
Croatian (computational) linguistics resource
by otis 2006-09-27 18:33 hrvatski · croatia · rjecnik · morphology · information retrieval · search · index
http://www.hnk.ffzg.hr/ - cached - mail it - history
by otis 2006-09-18 17:54 information retrieval · document frequency · word
http://elib.cs.berkeley.edu/docfreq/index.html - cached - mail it - history
by otis 2006-09-07 13:39 UIMA · java · search · information retrieval
http://uima.lti.cs.cmu.edu:8080/UCR/Welcome.do - cached - mail it - history
by otis 2006-08-07 22:55 arabic · information retrieval · index · search · analysis · lucene · java
http://www.nongnu.org/aramorph/english/index.html - cached - mail it - history
by otis 2006-08-07 17:26 information retrieval · book · free · pdf
http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html - cached - mail it - history
by otis 2006-08-07 17:07 information retrieval · speech · language · NLP · book
http://www.cs.colorado.edu/~martin/SLP/Updates/newtoc.html - cached - mail it - history
Kea: analyzes text and extracts key phrases
by otis 2006-08-04 16:39 java · api · information retrieval · data mining
http://www.nzdl.org/Kea/ - cached - mail it - history
The Lemur Toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models.
by otis 2006-08-04 16:34 search · index · api · information retrieval · indri · C · C++
http://www.lemurproject.org/ - cached - mail it - history
Written in Java, free, Mozilla Public License
by otis 2006-08-04 16:31 java · api · information retrieval · search · index
http://ir.dcs.gla.ac.uk/terrier/ - cached - mail it - history
by otis 2006-07-21 13:07 information retrieval · tutorial · howto · search · term vector · index
http://www.miislita.com/information-retrieval-tutorial/information-retrieval-tutorials.html - cached - mail it - history
Company provides a number of linguistics products: language id, advanced CJK handling, entity extraction, etc.
by otis 2006-07-03 13:19 language · linguistics · information retrieval · CJK · company
http://www.basistech.com/ - cached - mail it - history
Java API with a collection of string matching, similarity, and distance measures
by otis 2005-08-02 16:40 string · matching · NLP · library · java · api · information retrieval · compare · distance · similarity
http://secondstring.sourceforge.net/ - cached - mail it - history
NLP software from the Stanford NLP Group, written in Java, GPL-licensed
by otis 2005-05-12 14:56 information retrieval · java · framework · gpl · NLP · api
http://www-nlp.stanford.edu/software/index.shtml - cached - mail it - history
Java API for text categorization and other NLP stuff
by otis 2005-05-08 09:16 java · text · information · visual · NLP · api · pattern · category · information retrieval · machine learning · data mining
http://minorthird.sourceforge.net/ - cached - mail it - history
by otis 2005-01-26 13:06 java · matrix · software · library · free · text mining · lucene · api · cluster · information retrieval · document
http://www.trist.de/CV/Text-Mining/ - cached - mail it - history
by otis 2004-07-21 10:49 clucene · lucene · port · c++ · ben · index · search · software · library · information retrieval · LIA
http://sourceforge.net/projects/clucene/ - cached - mail it - history
by otis 2004-04-09 08:39 wordnet · download · free · information retrieval · word
http://xwn.hlt.utdallas.edu/ - cached - mail it - history
by otis 2004-03-29 23:11 java · wordnet · library · information retrieval
http://sourceforge.net/projects/jwordnet - cached - mail it - history