Simpy
 
1 - 50 of 1138
 
by otis 2009-12-09 13:22 hadoop · hdfs · hbase · python · json · data warehouse · storage
http://github.com/zohmg/zohmg
A Java library that determines which portion of a given hostname or domain name string is the effective TLD (or Public Suffix).
by otis 2009-12-04 01:41
http://publicsuffix.sourceforge.net/
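To illustrate what "effective TLD" means, here is a minimal Python sketch of longest-suffix matching. It is not the API of the library linked above: the function name `split_public_suffix` and the tiny `TOY_SUFFIXES` set are assumptions made for this example, and the real Public Suffix List is much larger and also has wildcard and exception rules that are ignored here.

```python
# Toy suffix set, for illustration only (hypothetical, not the real list).
TOY_SUFFIXES = {"com", "org", "uk", "co.uk"}


def split_public_suffix(hostname, suffixes=TOY_SUFFIXES):
    """Return (public_suffix, registrable_domain) via longest-suffix match.

    registrable_domain is the public suffix plus one more label to its left,
    or None when the hostname is itself a public suffix.
    Returns (None, None) when no known suffix matches.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Try the longest candidate first, then drop labels from the left.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            registrable = ".".join(labels[i - 1:]) if i > 0 else None
            return candidate, registrable
    return None, None


if __name__ == "__main__":
    for host in ("www.example.co.uk", "example.com", "co.uk", "localhost"):
        print(host, split_public_suffix(host))
```

Trying the longest candidate first is what makes `co.uk` win over `uk` for a host such as `www.example.co.uk`.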
A Java library for removing chrome / boilerplate from documents such as web pages
by otis 2009-12-03 23:16 java · library · api · information extraction · software
http://code.google.com/p/boilerpipe/
by otis 2009-12-03 21:20 crawl · dataset
http://bixolabs.com/2009/12/02/public-web-crawler-projects/
by otis 2009-12-03 13:02 java · software · library · api · machine learning
http://java-ml.sourceforge.net/
by otis 2009-12-03 10:59 machine learning · course · pdf · lecture · statistics · probability · linear algebra
http://www.stanford.edu/class/cs229/materials.html
by otis 2009-12-03 10:42 machine learning · journal · research · paper
http://jmlr.csail.mit.edu/
Set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms. These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.
by otis 2009-12-03 10:29 machine learning · data mining · algorithm · tutorial · presentation · reference · statistics · probability
http://www.autonlab.org/tutorials/
by otis 2009-12-03 10:20 machine learning · book · free · download · pdf
http://robotics.stanford.edu/people/nilsson/mlbook.html
by otis 2009-12-02 12:54 math · questions · answers · forum
http://mathoverflow.net/
Behemoth lets you deploy GATE or UIMA applications over a Hadoop cluster in order to do very large scale document analysis. It uses a simple representation format which can be used as a common ground between UIMA and GATE-generated annotations, hence achieving compatibility between both systems. Since it is Hadoop-based it benefits from all of Hadoop's features, namely scalability, fault-tolerance and most notably the backing of a thriving open source community. Quite a few Apache resources will fit into it: Nutch, Tika, Mahout, HBase, etc.
by otis 2009-12-01 22:35 UIMA · gate · hadoop · text mining · text analysis · MapReduce · distributed computing · NLP
http://code.google.com/p/behemoth-pebble/
The purpose of this project is to develop a set of reusable Java components that implement functionality common to any web crawler. These components would benefit from collaboration among various existing web crawler projects, and reduce duplication of effort.
by otis 2009-12-01 22:33 crawl · robots.txt · fetch · java · software · information extraction
http://code.google.com/p/crawler-commons/
by otis 2009-12-01 09:52 hadoop · gui · browser · script
http://dev.bizo.com/2009/11/quick-script-open-hadoop-jobtracker-ui.html
Semantically Annotated Snapshot of the English Wikipedia
by otis 2009-11-28 01:29 wikipedia · semantic · annotate · knowledge
http://www.yr-bcn.es/dokuwiki/doku.php?id=semantically_annotated_snapshot_of_wikipedia
by otis 2009-11-27 23:05 performance · java · logging · statistics · chart · software · library
http://perf4j.codehaus.org/
Little spinning "wait" gifs for ajax apps
by otis 2009-11-24 15:57 ajax · load · asynchronous · icon · picture · gif
http://www.ajaxload.info/
Framework for writing event-based, non-blocking, fast HTTP servers in JavaScript
by otis 2009-11-24 15:54 javascript · server · client · NIO · event · framework · http
http://nodejs.org/
by otis 2009-11-23 17:05 twitter · popularity · trust · trust metrics
http://tweetlevel.edelman.com/
Speedi.ly takes a piece of content, or grabs the content from a URL, and analyzes it. It does this very quickly and outputs some key data: Speedi.ly tells you the language of the content, categorizes it (topics, keywords), and returns additional metadata.
by otis 2009-11-23 12:06 classification · service · saas · NLP · named entity extraction
http://www.techcrunch.com/2009/11/20/getting-to-the-supertweet-speedi-ly-classifies-the-real-time-web/
by otis 2009-11-23 12:05 classification · service · saas · windows · NLP
http://uclassify.com/
Log aggregation, parsing, and indexing
by otis 2009-11-21 20:51 logging · log analysis · index · search · server · parse · software
http://code.google.com/p/logstash/
List and Javadoc of all JE-BDB properties
by otis 2009-11-20 11:23 bdb · properties · configure · java · database · performance · tuning · software · javadoc
http://www.oracle.com/technology/documentation/berkeley-db/je/java/constant-values.html
YouSeer is an open source search engine framework built on top of other open source components. YouSeer uses Heritrix as a crawler and Solr as an indexing system. The framework provides software to ingest the documents harvested by Heritrix into Solr. The ingesting software is very flexible and allows for user-specific data extraction implementations. Further, YouSeer provides a simple interface to query the index and another interface to retrieve cached versions of the documents.
by otis 2009-11-19 13:18 crawl · index · search · Heritrix · nutch · information retrieval
http://youseer.sourceforge.net/
by otis 2009-11-18 17:36 NLP · information retrieval · semantic · gate · software
http://www.semanticsoftware.info/
by otis 2009-11-18 17:33 morphix · linux · NLP · software
http://morphix-nlp.berlios.de/
MuNPEx is a multi-lingual noun phrase (NP) extraction component developed for the GATE architecture, implemented in JAPE. It currently supports English, German, French, and Spanish (in beta). MuNPEx requires a part-of-speech (POS) tagger to work and can additionally use detected named entities (NEs) to improve chunking performance. Please read the documentation (or source code) for more details.
by otis 2009-11-18 17:30 NLP · information retrieval · key phrases · information extraction · computational linguistics · software · gate
http://www.semanticsoftware.info/munpex
by otis 2009-11-18 12:26 hbase · architecture · java · hadoop
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
by otis 2009-11-17 15:47 hadoop · hbase · rdbms · sql · presentation · powerpoint · compare · database
http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS
by otis 2009-11-11 21:27 open calais · screenshot · java · howto · tutorial · video
http://philippeadjiman.com/blog/2009/09/16/open-calais-from-java-with-eclipse-extract-entities-facts-and-events-in-4-min...
by otis 2009-11-11 21:25 eclipse · java · ide · video · tips · shortcuts · howto · tutorial · screencast
http://philippeadjiman.com/blog/2009/10/11/5-video-tutorials-of-small-to-killer-eclipse-shortcuts/
Google's Java libraries
by otis 2009-11-11 20:56 java · library · api · collection · concurrent · google
http://code.google.com/p/guava-libraries/
by otis 2009-11-07 01:34 wikipedia · dump · extract · text · data mining · text mining · corporation · NLP
http://evanjones.ca/software/wikipedia2text.html
by otis 2009-11-05 14:51 health care · medicine · search · visual · dictionary
http://www.curehunter.com/public/dictionary.do
Excellent and spot-on presentation and video on work burnout
by otis 2009-11-04 13:32 work · presentation · video · personal
http://www.jonobacon.org/2009/07/29/burnout-presentation-slides/
Ubuntu Enterprise Cloud is the product, powered by Eucalyptus, that allows you to easily run your own Amazon-EC2-like private cloud. N.B.: the series continues in parts 2 and 3.
by otis 2009-11-04 12:32 ubuntu · linux · AWS · ec2 · s3 · cloud computing · software · server · cluster
http://fnords.wordpress.com/2009/10/04/run-your-own-uec-part-1/
A modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
by otis 2009-11-04 00:02 data mining · gui · workflow · analysis · visual
http://www.knime.org/
PROBABILITIES, STATISTICS AND DATA MODELING
by otis 2009-11-03 16:27 statistics · probability · math · matrix · tutorial · reference · ebook · pdf · NLP
http://www.aiaccess.net/English/Glossaries/Shop/bookstore.htm
by otis 2009-11-02 13:59 software · information retrieval · NLP · perl · corpus · text mining · dataset
http://www.drni.de/wac-tk/
C++ (but has Java API), GPL
by otis 2009-11-02 13:51 information retrieval · NLP · software · library · api
http://www.lsi.upc.edu/~nlp/freeling/
by otis 2009-11-02 13:33 NLP · information retrieval · computational linguistics · java · software · api · library
http://herd.ida.liu.se:8180/nlpfarm/
log4j appender for Scribe
by otis 2009-10-31 01:31 scribe · logging · java · software
http://github.com/alexlod/scribe-log4j-appender
MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model.
by otis 2009-10-29 16:20 machine learning · parse · computational linguistics · NLP · java · software · api · library
http://maltparser.org/
Platform where anyone can share and mash open data on any subject
by otis 2009-10-29 12:24 data · information retrieval · NLP · machine learning · mashup
http://www.factual.com/
by otis 2009-10-29 12:22 data · visual · statistics · graph
http://flowingdata.com/
Common English misspellings from Wikipedia; 4107 misspellings as of 2009-10-29.
by otis 2009-10-29 12:20 wikipedia · spell · english · language · search · information retrieval · NLP
http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines
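A list like this can be loaded into a lookup table for spell correction in a search front end. The Python sketch below assumes one `misspelling->correction` entry per line, with multiple corrections separated by commas; that format, the function name `parse_misspellings`, and the sample lines are assumptions for illustration, so check the page itself before relying on them.

```python
def parse_misspellings(text):
    """Parse 'wrong->right[, right2]' lines into {wrong: [right, ...]}.

    Lines without '->' (blank lines, stray markup) are skipped.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if "->" not in line:
            continue
        wrong, right = line.split("->", 1)
        corrections = [c.strip() for c in right.split(",") if c.strip()]
        if wrong.strip() and corrections:
            table[wrong.strip().lower()] = corrections
    return table


# Hypothetical sample in the assumed format, not copied from the page.
SAMPLE = """abandonned->abandoned
accomodate->accommodate
accension->accession, ascension
"""

if __name__ == "__main__":
    table = parse_misspellings(SAMPLE)
    for word in ("accomodate", "accension", "correct"):
        print(word, table.get(word, [word]))
```

Falling back to the original word when there is no entry keeps correctly spelled query terms unchanged.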
by otis 2009-10-29 12:06 ajax · solr · javascript · search · widget
http://evolvingweb.github.com/ajax-solr/
by otis 2009-10-27 15:29 bdb · howto · tutorial · reference · database · java · library
http://www.oracle.com/technology/documentation/berkeley-db/je/GettingStartedGuide/backuprestore.html
Wunder's progressive reranking explanation
by otis 2009-10-22 12:41 search · information retrieval · rank · score
http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html
Sen is the first open source morphological analyzer written in pure Java.
by otis 2009-10-16 23:39 japanese · morphology · analysis · lucene · search · index · information retrieval · NLP · library
https://sen.dev.java.net/