Search Everyone: "search",


1 - 100 of 114
 
Log aggregation, parsing, and indexing
by otis 2009-11-21 20:51 logging · log analysis · index · search · server · parse · software
http://code.google.com/p/logstash/ - cached - mail it - history
YouSeer is an open source search engine framework built on top of other open source components. YouSeer uses Heritrix as its crawler and Solr as its indexing system. The framework provides software to ingest the documents harvested by Heritrix into Solr. The ingestion software is flexible and allows for user-specific data extraction implementations. YouSeer also provides a simple interface for querying the index and another for retrieving cached versions of the documents.
by otis 2009-11-19 13:18 crawl · index · search · Heritrix · nutch · information retrieval
http://youseer.sourceforge.net/ - cached - mail it - history
by otis 2009-11-05 14:51 health care · medicine · search · visual · dictionary
http://www.curehunter.com/public/dictionary.do - cached - mail it - history
Common English misspellings from Wikipedia 4107 misspellings as of 2009-10-29
by otis 2009-10-29 12:20 wikipedia · spell · english · language · search · information retrieval · NLP
http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines - cached - mail it - history
by otis 2009-10-29 12:06 ajax · solr · javascript · search · widget
http://evolvingweb.github.com/ajax-solr/ - cached - mail it - history
Wunder's progressive reranking explanation
by otis 2009-10-22 12:41 search · information retrieval · rank · score
http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html - cached - mail it - history
Sen is the first open source morphological analyzer written in pure Java.
by otis 2009-10-16 23:39 japanese · morphology · analysis · lucene · search · index · information retrieval · NLP · library
https://sen.dev.java.net/ - cached - mail it - history
OpenGrok is a fast and usable source code search and cross reference engine. It helps you search, cross-reference and navigate your source tree. It can understand various program file formats and version control histories like Mercurial, Git, SCCS, RCS, CVS, Subversion, Teamware, ClearCase, Perforce and Bazaar. In other words it lets you grok (profoundly understand) the open source, hence the name OpenGrok. It is written in Java.
by otis 2009-08-26 03:45 code · source code · search · browse · reference · java · subversion
http://www.opensolaris.org/os/project/opengrok/ - cached - mail it - history
by otis 2009-08-12 22:24 hbase · katta · solr · social media · shard · search · scalability
http://www.slideshare.net/lusciouspear/building-a-business-on-hadoop-hbase-and-open-source-distributed-computing?src=rel... - cached - mail it - history
Galago is a toolkit for experimenting with text search. It is based on small, pluggable components that are easy to replace and change, both during indexing and during retrieval. It includes TupleFlow, which is a distributed computation framework like MapReduce or Dryad. TupleFlow manages the difficult parts of processing text: serializing data, sorting it, and distributing processing. The IndexReader and IndexWriter classes manage storing key/value pairs like inverted lists. This makes it possible to make your own kinds of index structures without starting from scratch.
by otis 2009-08-12 16:01 java · software · search · library · information retrieval · distributed computing
http://www.galagosearch.org/ - cached - mail it - history
Ivory is a Hadoop toolkit for Web-scale information retrieval research that features a retrieval engine based on Markov Random Fields
by otis 2009-08-12 15:56 hadoop · MapReduce · information retrieval · search
http://www.umiacs.umd.edu/~jimmylin/ivory/docs/index.html - cached - mail it - history
by otis 2009-08-10 13:32 search · saas · lucene · AWS · ec2 · amazon
http://www.searchblox.com/searchbloxami.html - cached - mail it - history
Zemberek is an open source, platform-independent, general-purpose Natural Language Processing library and toolset designed for Turkic languages, especially Turkish. Zemberek is officially used as the spell checker in the Turkish version of Open Office and in Pardus, the Turkish national Linux distribution. Google Code will host the Zemberek-2, Zemberek Corpus, and Wordnet projects. These projects are released under the Mozilla Public License.
by otis 2009-07-24 09:41 turkish · language · analysis · search · tokenizer · stemming · NLP · library
http://code.google.com/p/zemberek/ - cached - mail it - history
by otis 2009-07-10 16:03 lucene · filesystem · index · search · desktop · desktop search
http://regain.sourceforge.net/ - cached - mail it - history
Default dictionary break iterator for Chinese, Japanese, Korean
by otis 2009-06-03 00:15 CJK · japan · chinese · korean · computational linguistics · NLP · information retrieval · search · analysis · word segmentation
http://bugs.icu-project.org/trac/ticket/2229 - cached - mail it - history
by otis 2009-05-28 23:40 chinese · dictionary · information retrieval · search
http://www.mdbg.net/chindict/chindict.php?page=cc-cedict - cached - mail it - history
by otis 2009-05-28 14:34 search · software · python · django · lucene · solr · information retrieval
http://haystacksearch.org/ - cached - mail it - history
by otis 2009-05-28 14:27 solr · ruby · ruby on rails · search · information retrieval
http://outoftime.github.com/sunspot/ - cached - mail it - history
by otis 2009-05-24 21:31 rdf · solr · software · java · search · information retrieval
http://fgiasson.com/blog/index.php/2009/04/29/rdf-aggregates-and-full-text-search-on-steroids-with-solr/ - cached - mail it - history
by otis 2009-05-14 17:13 django · solr · python · information retrieval · search
http://code.google.com/p/django-solr-search/ - cached - mail it - history
by otis 2009-05-14 15:07 taxonomy · ontology · facet · NLP · search
http://www.ideaeng.com/tabId/98/itemId/199/Whats-the-difference-between-Taxonomies-and-Ontol.aspx - cached - mail it - history
A WordPress plugin that interacts with an instance of the Solr search engine. This plugin allows you to index pages and posts, perform advanced queries and enable faceting on fields such as tags, categories, and author. Adds special template tags so you can create your own custom result pages to match your theme. Configuration options allow you to select pages to ignore, features to enable/disable, and what type of result information you want output.
by otis 2009-04-22 10:02 solr · wordpress · blog · search · plugin
https://launchpad.net/solr4wordpress - cached - mail it - history
REPLAY is an open source solution, developed in Java, that automates the workflow of audiovisual lecture recordings from production in the classroom to distribution on various channels. It also provides comprehensive functionality for existing audiovisual archives, repositories, or collections.
by otis 2009-03-24 12:57 audio · video · index · search · archive · free · software · java
http://www.replay.ethz.ch/ - cached - mail it - history
Sedna is a free native XML database which provides a full range of core database services - persistent storage, ACID transactions, security, indices, hot backup. Flexible XML processing facilities include W3C XQuery implementation, tight integration of XQuery with full-text search facilities and a node-level update language.
by otis 2009-03-14 16:46 xml · database · xquery · full-text · search
http://modis.ispras.ru/sedna/ - cached - mail it - history
WikiXMLDB provides a way of querying Wikipedia with XQuery.
by otis 2009-03-14 16:41 wikipedia · xml · xquery · search · knowledge · structure · NLP · data mining
http://wikixmldb.dyndns.org/ - cached - mail it - history
by otis 2009-03-08 00:36 lucene · search · query expansion · information retrieval
http://grasia.fdi.ucm.es/jose/query-expansion/ - cached - mail it - history
Lucas is a UIMA CAS consumer component which bridges the UIMA framework with the Lucene search engine library. Lucas maps CASes to Lucene index documents according to a mapping file.
by otis 2009-02-27 12:57 java · UIMA · lucene · index · search · pipeline · software · information retrieval
https://www.coling.uni-jena.de/sites/lucas/index.html - cached - mail it - history
by otis 2009-02-20 00:05 geolocation · geocode · geography · search · lucene · solr · latitude · longitude
http://www.gissearch.com/ - cached - mail it - history
by otis 2009-02-17 02:55 .net · solr · client · software · search · library · information retrieval
http://code.google.com/p/solrnet/ - cached - mail it - history
Set operation implementations for SortedIntegerSegments, intended for inverted list caching in search engines. The implementations also include a DocIdSet based on the P4Delta compression algorithm, which allows iterating over DocIdSets in compressed form.
by otis 2009-02-09 01:25 lucene · search · index · compress · information retrieval · set · java
http://code.google.com/p/lucene-ext/ - cached - mail it - history
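The two ideas behind this entry, set operations over sorted document ID lists and delta-based compression, can be sketched in a few lines. This is only an illustration and not the lucene-ext API: the function names are hypothetical, and the compression part shows just the gap (delta) encoding step that P4Delta-style schemes build on, not the bit packing itself.

```python
def intersect_sorted(a, b):
    """Intersect two ascending doc ID lists with a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def delta_encode(doc_ids):
    """Replace each ascending doc ID with its gap from the previous ID."""
    prev = 0
    gaps = []
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def delta_decode(gaps):
    """Rebuild the original doc IDs by accumulating the gaps."""
    total = 0
    doc_ids = []
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids
```

Gaps between neighboring IDs in a dense inverted list tend to be small numbers, which is what makes them cheaper to store than the raw IDs.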
by otis 2009-02-04 16:01 search · search results · ui · interface · design · usability
http://patterntap.com/tap/collection/search - cached - mail it - history
by otis 2009-01-08 17:46 Daniel Tunkelang · information retrieval · facet · navigate · results · search · endeca · set · presentation
http://yahoo.hosted.panopto.com/CourseCast/Viewer/Default.aspx?id=6d0a6847-be51-4d29-8c1c-f961274b5343 - cached - mail it - history
by otis 2008-12-23 14:10 collocations · term · summary · NLP · information retrieval · search · keywords · key phrases
http://www.extractor.com/ - cached - mail it - history
WebLA is a Java package for handling Web Graphs, implementing popular algorithms such as PageRank, HITS, CoCitation Similarity and SimRank. It is of particular interest for research in Information Retrieval, since it provides a set of APIs (Application Programming Interfaces) that allow one to easily experiment with such algorithms.
by otis 2008-12-21 01:54 information retrieval · search · algorithm · pagerank · graph · api · library · java
http://webla.sourceforge.net/ - cached - mail it - history
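For readers unfamiliar with the algorithms named above, here is a minimal PageRank sketch using plain power iteration. It is not WebLA code (WebLA is a Java package); the function name, the dictionary-based graph format, and the default parameters are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of out-links."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

Calling `pagerank({"a": ["c"], "b": ["c"], "c": []})` gives `c` the highest score, since both other pages link to it.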
by otis 2008-12-19 12:20 search · search engine · log analysis · log · query
http://glinden.blogspot.com/2008/11/finding-task-boundaries-in-search-logs.html - cached - mail it - history
by otis 2008-12-05 00:46 pubsub · prospective search · paper · reference · research · publish · subscribe · query · search
http://www.seas.upenn.edu/~svilen/publications/subscribe.pdf - cached - mail it - history
Database->Lucene command-line indexing tool
by otis 2008-11-17 13:13 lucene · database · index · search · command line
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql - cached - mail it - history
by otis 2008-10-09 13:14 maven · java · search · repository
http://mvnrepository.com/ - cached - mail it - history
Flax is a powerful enterprise search solution platform, open source licensed under the GPL.
by otis 2008-09-25 00:41 search · open source · software · library
http://www.flax.co.uk/index.shtml - cached - mail it - history
Presents files in a calendar view, supports search by name or content and filtering/narrowing by file type.
by otis 2008-09-25 00:37 filesystem · calendar · search · desktop
http://iola.dk/nemo/ - cached - mail it - history
Recoll is a personal full text search tool for Unix/Linux.
by otis 2008-09-25 00:29 desktop search · search · linux · xapian · software
http://www.lesbonscomptes.com/recoll/ - cached - mail it - history
by otis 2008-09-25 00:24 desktop search · search · linux · software
http://beagle-project.org/Main_Page - cached - mail it - history
Strigi is a daemon that uses a very fast and efficient crawler to index data on your hard drive. Indexing operations are performed without hammering your system, which makes Strigi the fastest and smallest desktop searching program. Strigi can index different file formats, including the contents of archive files.
by otis 2008-09-25 00:10 desktop search · search · linux · application
http://strigi.sourceforge.net/ - cached - mail it - history
by otis 2008-09-24 13:37 lucene · search · index · distributed search · facet · autocomplete · autosuggest · java · software · spell
http://www.statsbiblioteket.dk/summa/features-text-in-english - cached - mail it - history
by otis 2008-09-18 16:18 geo · search · lucene · solr
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html - cached - mail it - history
An indexing package that makes a best effort to abstract away which Indexer implementation you are using. It introduces the DocumentIndexer interface, which uses java.util.Map instead of the proprietary Lucene Document.
by otis 2008-09-11 17:29 java · software · api · search · index · lucene · solr · cluster
http://dev.tailsweep.com/projects/haloe/ - cached - mail it - history
by otis 2008-08-18 13:30 search · search engine · information retrieval · vector space · linear algebra
http://mathdl.maa.org/mathDL/4/?pa=content&sa=viewDocument&nodeId=636&pf=1 - cached - mail it - history
Interviews with "Search Wizards" - people from the world of IR, NLP...
by otis 2008-06-11 12:06 search · people · interview · information retrieval · NLP
http://www.arnoldit.com/search-wizards-speak/ - cached - mail it - history
Subject-Verb-Object extraction, idea navigation
by otis 2008-06-06 12:53 search · explore · navigate · discover · collocations · NLP · POS · concept · video · lecture
http://videolectures.net/chi08_zelevinsky_ins/ - cached - mail it - history
by otis 2008-06-05 13:14 search · information retrieval · facet · explore · discover · search results · Peter Morville
http://www.slideshare.net/morville/search-patterns/ - cached - mail it - history
by otis 2008-05-30 00:20 patricia tree · radix tree · prefix tree · trie · tree · data structure · string · search · NLP
http://code.google.com/p/radixtree/ - cached - mail it - history
by otis 2008-05-29 23:59 suffix tree · patricia tree · trie · tree · data structure · search · string · video · lecture · NLP
http://www.cs.umd.edu/class/fall2004/cmsc132/suffixTree.mov - cached - mail it - history
Search Engine with a web crawler that can be trained to classify pages and crawl only "interesting" pages. Uses Lucene under the hood. Fully distributed and capable of large scale crawling and searching.
by otis 2008-05-22 17:27 search · search engine · crawl · java · software · bayes · classification · index · plugin · lucene · information retrieval
http://hounder.org/ - cached - mail it - history
by otis 2008-05-20 11:38 cluster · search · manage · crawl · distributed search · java
http://opensource.flaptor.com/clusterfest/index.html - cached - mail it - history
by otis 2008-04-29 22:34 lucene · index · search · shard · grid · distributed search · hadoop · java · master · slave
http://katta.wiki.sourceforge.net/ - cached - mail it - history
by otis 2008-04-20 16:22 vietnam · information retrieval · index · search · word segmentation · dictionary · analysis · tokenizer · language
http://www-users.cs.umn.edu/~thnguyen/Publication/RIVF06_Word_Segmentation_for_Vietnamese_Text_Categorization_An_online_... - cached - mail it - history
by otis 2008-04-20 16:15 vietnam · information retrieval · index · search · word segmentation · dictionary · analysis · tokenizer · language
http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings6/EVIA/17.pdf - cached - mail it - history
A software package for developing and applying grammars used to analyze the words and sentences of natural languages. It contains a programming language for modelling morphology and syntax grammars.
by otis 2008-04-18 12:11 search · index · library · morphology · analysis · language · lucene · stemming · lemmatization
http://home.arcor.de/bjoern-beutel/malaga/ - cached - mail it - history
by otis 2008-04-13 22:23 search · information retrieval · personalization
http://sifaka.cs.uiuc.edu/xshen/publication.html - cached - mail it - history
by otis 2008-04-03 12:28 patent · intelectual property · search
http://www.priorsmart.com/ - cached - mail it - history
A sandbox for collecting search examples, patterns, and anti-patterns.
by otis 2008-03-31 02:37 search · information retrieval · facet · explore · discover · search results · Peter Morville · screenshot · ui
http://flickr.com/photos/morville/collections/72157603785835882/ - cached - mail it - history
SCAN (Smart Content Aggregation and Navigation) is a universal semantic content aggregator. It combines search, text analysis, tagging and metadata functions to provide new user experience of desktop navigation and document management.
by otis 2008-02-19 14:47 desktop search · application · search · document · aggregator
http://scan.sourceforge.net/ - cached - mail it - history
geoLucene is an extension of Lucene that makes it possible to effectively index and search documents that contain locational information (longitude/latitude). It uses an R-tree as a spatial index.
by otis 2008-02-17 03:12 lucene · java · library · search · index · geolocation · geocode · query · spatial
https://sourceforge.net/projects/geolucene/ - cached - mail it - history
A Lucene extension providing geography-based searching: bounding box and radius queries.
by otis 2008-02-17 00:33 lucene · search · index · java · api · geocode · geolocation
http://www.nsshutdown.com/viewcvs/viewcvs.cgi/locallucene/ - cached - mail it - history
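The bounding box and radius queries mentioned above are commonly combined: a cheap box test discards most candidates, and an exact great-circle distance check confirms the rest. The sketch below illustrates that pattern only. It is not the locallucene or geoLucene API, all names are hypothetical, and it assumes a spherical Earth and a search area that does not cross a pole or the 180° meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is assumed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle."""
    angular = radius_km / EARTH_RADIUS_KM
    dlat = math.degrees(angular)
    # Longitude extent grows with latitude; clamp to keep asin in its domain.
    ratio = min(1.0, math.sin(angular) / math.cos(math.radians(lat)))
    dlon = math.degrees(math.asin(ratio))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def radius_search(points, lat, lon, radius_km):
    """Filter (name, lat, lon) tuples: box pre-filter, then exact distance."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for name, plat, plon in points:
        if min_lat <= plat <= max_lat and min_lon <= plon <= max_lon:
            if haversine_km(lat, lon, plat, plon) <= radius_km:
                hits.append(name)
    return hits
```

In an index, the box test maps naturally onto range queries over latitude and longitude fields, so the more expensive distance computation runs only on documents inside the box.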
Searches broken down by category -- manually categorized AOL search dataset.
by otis 2008-01-17 17:03 search · category · aol · query
http://www.skrenta.com/2008/01/long_tail_in_a_short_table.html - cached - mail it - history
Chinese segmentation based on the Apache Lucene Analyzer
by otis 2008-01-05 22:15 java · api · lucene · chinese · analysis · segment · index · search
http://code.google.com/p/hickwall-analyzer/ - cached - mail it - history
Lucene indexer for Wikipedia dumps
by otis 2007-12-13 02:34 java · lucene · wikipedia · index · search · dump
http://schmidt.devlib.org/software/lucene-wikipedia.html - cached - mail it - history
by otis 2007-11-28 02:27 lucene · mailing list · search · archive
http://lucene.markmail.org/ - cached - mail it - history
by otis 2007-11-01 23:30 video · presentation · search · similar · NLP
http://glinden.blogspot.com/2007/10/google-tech-talk-on-similarities.html - cached - mail it - history
by otis 2007-10-14 22:13 book · information retrieval · search · index
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html - cached - mail it - history
In this paper, we attempt to build query networks from web search engine query logs, with the nodes representing queries and the edges representing the semantic relatedness between queries. To build the network, users’ query histories are extracted from query logs and then segmented into query sessions. Semantic relatedness of queries is modeled using three different statistical measures: collocation, weighted dependence, and mutual information. We compare the constructed query networks with comparable random networks and conclude that query networks have small-world properties. In addition, we propose a method for identifying community structures, which are representative of semantic taxonomies, by applying Newman clustering to query networks. The experimental evaluation demonstrates the effectiveness of our proposed method against a baseline model.
by otis 2007-09-17 19:41 NLP · information retrieval · paper · search
http://www-personal.umich.edu/~ladamic/si708w07/projects/qla.pdf - cached - mail it - history
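One of the relatedness measures named in the abstract, mutual information, can be illustrated with pointwise mutual information (PMI) over query sessions. The paper's exact formulation is not given here, so this is a generic sketch under stated assumptions: each session is a list of query strings, co-occurrence means appearing in the same session, and the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def query_pmi(sessions):
    """Return PMI for each pair of queries that co-occur in at least one session."""
    n = len(sessions)
    single = Counter()
    pair = Counter()
    for session in sessions:
        unique = sorted(set(session))  # count each query once per session
        single.update(unique)
        pair.update(combinations(unique, 2))
    return {
        (q1, q2): math.log((count / n) / ((single[q1] / n) * (single[q2] / n)))
        for (q1, q2), count in pair.items()
    }
```

Pairs with high PMI would become strongly weighted edges in a query network; pairs that never share a session get no edge at all.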
by otis 2007-08-31 21:51 java · lucene · software · search · index · cluster · grid
http://staff.science.uva.nl/~emeij/research.html - cached - mail it - history
Tracker is a tool designed to extract information and metadata about your personal data so that it can be searched easily and quickly.
by otis 2007-08-01 21:22 linux · search · application · desktop search
http://www.gnome.org/projects/tracker/ - cached - mail it - history
by otis 2007-05-31 04:02 search · lucene · java
http://www.browseengine.com/ - cached - mail it - history
by otis 2007-04-26 13:26 lucene · database · rdbms · search · index
http://www.dbsight.net/ - cached - mail it - history
Keep Lucene indices in sync with the DB
by otis 2007-04-25 12:50 hibernate · orm · lucene · search · database
http://www.hibernate.org/410.html - cached - mail it - history
IBM's experiences building a web-scale CMS with billions of documents
by otis 2007-04-17 22:42 search · index · information retrieval · text mining · text analysis · paper
http://sites.computer.org/debull/A06dec/main1.ps - cached - mail it - history
Parses common office file formats with native Java file parsers: MS Office, Outlook, PDF, HTML, TXT, ZIP, tar.gz, PST, pictures, scanned images, etc.
by otis 2007-03-26 19:30 java · lucene · search · index · server · application · parse
http://www.enhydra.org/apps/snapper/index.html - cached - mail it - history
Solr-like search REST service written in Python
by otis 2007-03-16 15:02 python · lucene · search · index · solr
http://code.google.com/p/grassyknoll/ - cached - mail it - history
Wikia Search mailing list
by otis 2007-02-14 03:07 search · wikia · Jimmy Wales · mailing list
http://lists.wikia.com/pipermail/search-l/ - cached - mail it - history
PyLucene-based Lucene Shell command line tool
by otis 2007-02-02 09:42 python · lucene · command line · shell · index · search
http://cheeseshop.python.org/pypi/plush - cached - mail it - history
by otis 2007-01-19 03:35 china · search · information retrieval · data mining · research
http://apex.sjtu.edu.cn/apex_wiki - cached - mail it - history
by otis 2006-11-07 11:36 nutch · google · index · search · resource
http://www.google.com/coop/cse?cx=018284732555568122338%3A2hvzpgnzt8w - cached - mail it - history
by otis 2006-11-03 13:11 Manu Konchady · text mining · cluster · search · information extraction
http://textmine.sourceforge.net/ - cached - mail it - history
by otis 2006-10-15 01:49 lucene · java · server · search · index · xml · solr
http://www.cdlib.org/inside/projects/xtf/ - cached - mail it - history
by otis 2006-10-07 11:20 patent · search
http://www.freepatentsonline.com/ - cached - mail it - history
Search engine for academic papers. Uses Lucene.
by otis 2006-10-04 21:37 paper · search · academic · science
http://rexa.info/ - cached - mail it - history
by otis 2006-09-27 18:49 hrvatski · croatia · information retrieval · morphology · search · index
http://www.hnk.ffzg.hr/jthj/ - cached - mail it - history
Croatian (computational) linguistics resource
by otis 2006-09-27 18:33 hrvatski · croatia · rjecnik · morphology · information retrieval · search · index
http://www.hnk.ffzg.hr/ - cached - mail it - history
Lucene-based search engine which powers SourceForge.net's search and Software Map features.
by otis 2006-09-08 21:44 lucene · SourceForge · search · software
http://sourceforge.net/projects/syracuse - cached - mail it - history
by otis 2006-09-07 13:39 UIMA · java · search · information retrieval
http://uima.lti.cs.cmu.edu:8080/UCR/Welcome.do - cached - mail it - history
by otis 2006-08-07 22:55 arabic · information retrieval · index · search · analysis · lucene · java
http://www.nongnu.org/aramorph/english/index.html - cached - mail it - history
The toolkit supports indexing of large-scale text databases; the construction of simple language models for documents, queries, or subcollections; and the implementation of retrieval systems based on language models as well as a variety of other retrieval models.
by otis 2006-08-04 16:34 search · index · api · information retrieval · indri · C · C++
http://www.lemurproject.org/ - cached - mail it - history
Written in Java, free, Mozilla Public License
by otis 2006-08-04 16:31 java · api · information retrieval · search · index
http://ir.dcs.gla.ac.uk/terrier/ - cached - mail it - history
Mailing list manager written in Java. Includes index, search, and archive functionality.
by otis 2006-07-26 16:04 java · mailing list · index · search · archive · lucene · jboss · hibernate · manage
http://subetha.tigris.org/ - cached - mail it - history
by otis 2006-07-21 13:07 information retrieval · tutorial · howto · search · term vector · index
http://www.miislita.com/information-retrieval-tutorial/information-retrieval-tutorials.html - cached - mail it - history
Lucene-based indexing and search solution that can pull data from various storage types and handle a number of different file formats
by otis 2006-06-05 12:37 java · software · search · index · lucene · database · free
http://www.kneobase.com/en/ - cached - mail it - history