Simpy
 
1 - 100 of 1137 next »   Watch otis
 
A Java library that rules on which portion of a given hostname / domain name string is the effective TLD (or Public Suffix).
by otis 2009-12-04 01:41
http://publicsuffix.sourceforge.net/ - cached - mail it - history
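A minimal sketch of what an effective-TLD ruling involves, in Python rather than Java. It is not the library's API: the function name `public_suffix` and the tiny rule list are hypothetical, and the matching follows the general Public Suffix List conventions (longest matching rule wins, `*` matches one label, a `!` rule is an exception, unknown names fall back to the last label).

```python
def public_suffix(hostname, rules):
    """Return the public suffix of hostname under the given rules (illustrative only)."""
    labels = hostname.lower().rstrip(".").split(".")
    best = None  # number of labels in the longest matching normal rule
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = rule.lstrip("!").split(".")
        if len(rule_labels) > len(labels):
            continue
        tail = labels[-len(rule_labels):]
        if all(r == "*" or r == h for r, h in zip(rule_labels, tail)):
            if is_exception:
                # An exception rule wins outright; the suffix is the
                # rule with its leftmost label removed.
                return ".".join(tail[1:])
            if best is None or len(rule_labels) > best:
                best = len(rule_labels)
    # No rule matched: fall back to the implicit "*" rule (last label only).
    n = best if best is not None else 1
    return ".".join(labels[-n:])


# Hypothetical sample rules, not the real list.
SAMPLE_RULES = ["com", "uk", "co.uk", "*.ck", "!www.ck"]

print(public_suffix("www.example.co.uk", SAMPLE_RULES))
print(public_suffix("foo.bar.ck", SAMPLE_RULES))
print(public_suffix("www.ck", SAMPLE_RULES))
```

A real implementation would load the full published rule list and handle internationalized names, which this sketch ignores.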
A Java library for removing chrome / boilerplate from documents such as web pages
by otis 2009-12-03 23:16 java · library · api · information extraction · software
http://code.google.com/p/boilerpipe/ - cached - mail it - history
by otis 2009-12-03 21:20 crawl · dataset
http://bixolabs.com/2009/12/02/public-web-crawler-projects/ - cached - mail it - history
by otis 2009-12-03 13:02 java · software · library · api · machine learning
http://java-ml.sourceforge.net/ - cached - mail it - history
by otis 2009-12-03 10:59 machine learning · course · pdf · lecture · statistics · probability · linear algebra
http://www.stanford.edu/class/cs229/materials.html - cached - mail it - history
by otis 2009-12-03 10:42 machine learning · journal · research · paper
http://jmlr.csail.mit.edu/ - cached - mail it - history
Set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms. These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.
by otis 2009-12-03 10:29 machine learning · data mining · algorithm · tutorial · presentation · reference · statistics · probability
http://www.autonlab.org/tutorials/ - cached - mail it - history
by otis 2009-12-03 10:20 machine learning · book · free · download · pdf
http://robotics.stanford.edu/people/nilsson/mlbook.html - cached - mail it - history
by otis 2009-12-02 12:54 math · questions · answers · forum
http://mathoverflow.net/ - cached - mail it - history
Behemoth lets you deploy GATE or UIMA applications over a Hadoop cluster in order to do very large scale document analysis. It uses a simple representation format that serves as common ground between UIMA- and GATE-generated annotations, making the two systems compatible. Because it is Hadoop-based, it benefits from all of Hadoop's features, namely scalability, fault tolerance and, most notably, the backing of a thriving open source community. Quite a few Apache resources will fit into it: Nutch, Tika, Mahout, HBase, etc.
by otis 2009-12-01 22:35 UIMA · gate · hadoop · text mining · text analysis · MapReduce · distributed computing · NLP
http://code.google.com/p/behemoth-pebble/ - cached - mail it - history
The purpose of this project is to develop a set of reusable Java components that implement functionality common to any web crawler. These components would benefit from collaboration among various existing web crawler projects, and reduce duplication of effort.
by otis 2009-12-01 22:33 crawl · robots.txt · fetch · java · software · information extraction
http://code.google.com/p/crawler-commons/ - cached - mail it - history
by otis 2009-12-01 09:52 hadoop · gui · browser · script
http://dev.bizo.com/2009/11/quick-script-open-hadoop-jobtracker-ui.html - cached - mail it - history
Semantically Annotated Snapshot of the English Wikipedia
by otis 2009-11-28 01:29 wikipedia · semantic · annotate · knowledge
http://www.yr-bcn.es/dokuwiki/doku.php?id=semantically_annotated_snapshot_of_wikipedia - cached - mail it - history
by otis 2009-11-27 23:05 performance · java · logging · statistics · chart · software · library
http://perf4j.codehaus.org/ - cached - mail it - history
Little spinning "wait" gifs for ajax apps
by otis 2009-11-24 15:57 ajax · load · asynchronous · icon · picture · gif
http://www.ajaxload.info/ - cached - mail it - history
Framework for writing event-based, non-blocking, fast, HTTP servers in JavaScript
by otis 2009-11-24 15:54 javascript · server · client · NIO · event · framework · http
http://nodejs.org/ - cached - mail it - history
by otis 2009-11-23 17:05 twitter · popularity · trust · trust metrics
http://tweetlevel.edelman.com/ - cached - mail it - history
Speedi.ly takes a piece of content, or grabs the content from a URL, and analyzes it. It does this very fast and outputs some key data: the language of the content, its categories (topics, keywords), and additional metadata.
by otis 2009-11-23 12:06 classification · service · saas · NLP · named entity extraction
http://www.techcrunch.com/2009/11/20/getting-to-the-supertweet-speedi-ly-classifies-the-real-time-web/ - cached - mail it - history
by otis 2009-11-23 12:05 classification · service · saas · windows · NLP
http://uclassify.com/ - cached - mail it - history
Log aggregation, parsing, and indexing
by otis 2009-11-21 20:51 logging · log analysis · index · search · server · parse · software
http://code.google.com/p/logstash/ - cached - mail it - history
List and Javadoc of all JE-BDB properties
by otis 2009-11-20 11:23 bdb · properties · configure · java · database · performance · tuning · software · javadoc
http://www.oracle.com/technology/documentation/berkeley-db/je/java/constant-values.html - cached - mail it - history
YouSeer is an open source search engine framework built on top of other open source components. YouSeer uses Heritrix as a crawler and Solr as an indexing system. The framework provides software to ingest the documents harvested by Heritrix into Solr. The ingesting software is very flexible and allows for user-specific data extraction implementations. Further, YouSeer provides a simple interface to query the index and another interface to retrieve cached versions of the documents.
by otis 2009-11-19 13:18 crawl · index · search · Heritrix · nutch · information retrieval
http://youseer.sourceforge.net/ - cached - mail it - history
by otis 2009-11-18 17:36 NLP · information retrieval · semantic · gate · software
http://www.semanticsoftware.info/ - cached - mail it - history
by otis 2009-11-18 17:33 morphix · linux · NLP · software
http://morphix-nlp.berlios.de/ - cached - mail it - history
MuNPEx is a multi-lingual noun phrase (NP) extraction component developed for the GATE architecture, implemented in JAPE. It currently supports English, German, French, and Spanish (in beta). MuNPEx requires a part-of-speech (POS) tagger to work and can additionally use detected named entities (NEs) to improve chunking performance. Please read the documentation (or source code) for more details.
by otis 2009-11-18 17:30 NLP · information retrieval · key phrases · information extraction · computational linguistics · software · gate
http://www.semanticsoftware.info/munpex - cached - mail it - history
by otis 2009-11-18 12:26 hbase · architecture · java · hadoop
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html - cached - mail it - history
by otis 2009-11-17 15:47 hadoop · hbase · rdbms · sql · presentation · powerpoint · compare · database
http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS - cached - mail it - history
by otis 2009-11-11 21:27 open calais · screenshot · java · howto · tutorial · video
http://philippeadjiman.com/blog/2009/09/16/open-calais-from-java-with-eclipse-extract-entities-facts-and-events-in-4-min... - cached - mail it - history
by otis 2009-11-11 21:25 eclipse · java · ide · video · tips · shortcuts · howto · tutorial · screencast
http://philippeadjiman.com/blog/2009/10/11/5-video-tutorials-of-small-to-killer-eclipse-shortcuts/ - cached - mail it - history
Google's Java libraries
by otis 2009-11-11 20:56 java · library · api · collection · concurrent · google
http://code.google.com/p/guava-libraries/ - cached - mail it - history
by otis 2009-11-07 01:34 wikipedia · dump · extract · text · data mining · text mining · corporation · NLP
http://evanjones.ca/software/wikipedia2text.html - cached - mail it - history
by otis 2009-11-05 14:51 health care · medicine · search · visual · dictionary
http://www.curehunter.com/public/dictionary.do - cached - mail it - history
Excellent and spot-on presentation and video on work burnout
by otis 2009-11-04 13:32 work · presentation · video · personal
http://www.jonobacon.org/2009/07/29/burnout-presentation-slides/ - cached - mail it - history
Ubuntu Enterprise Cloud is the product, powered by Eucalyptus, that allows you to easily run your own Amazon-EC2-like private cloud. N.B.: this series also has parts 2 and 3.
by otis 2009-11-04 12:32 ubuntu · linux · AWS · ec2 · s3 · cloud computing · software · server · cluster
http://fnords.wordpress.com/2009/10/04/run-your-own-uec-part-1/ - cached - mail it - history
A modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
by otis 2009-11-04 00:02 data mining · gui · workflow · analysis · visual
http://www.knime.org/ - cached - mail it - history
PROBABILITIES, STATISTICS AND DATA MODELING
by otis 2009-11-03 16:27 statistics · probability · math · matrix · tutorial · reference · ebook · pdf · NLP
http://www.aiaccess.net/English/Glossaries/Shop/bookstore.htm - cached - mail it - history
by otis 2009-11-02 13:59 software · information retrieval · NLP · perl · corpus · text mining · dataset
http://www.drni.de/wac-tk/ - cached - mail it - history
C++ (but has Java API), GPL
by otis 2009-11-02 13:51 information retrieval · NLP · software · library · api
http://www.lsi.upc.edu/~nlp/freeling/ - cached - mail it - history
by otis 2009-11-02 13:33 NLP · information retrieval · computational linguistics · java · software · api · library
http://herd.ida.liu.se:8180/nlpfarm/ - cached - mail it - history
log4j appender for Scribe
by otis 2009-10-31 01:31 scribe · logging · java · software
http://github.com/alexlod/scribe-log4j-appender - cached - mail it - history
MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model.
by otis 2009-10-29 16:20 machine learning · parse · computational linguistics · NLP · java · software · api · library
http://maltparser.org/ - cached - mail it - history
Platform where anyone can share and mash open data on any subject
by otis 2009-10-29 12:24 data · information retrieval · NLP · machine learning · mashup
http://www.factual.com/ - cached - mail it - history
by otis 2009-10-29 12:22 data · visual · statistics · graph
http://flowingdata.com/ - cached - mail it - history
Common English misspellings from Wikipedia; 4107 misspellings as of 2009-10-29
by otis 2009-10-29 12:20 wikipedia · spell · english · language · search · information retrieval · NLP
http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines - cached - mail it - history
by otis 2009-10-29 12:06 ajax · solr · javascript · search · widget
http://evolvingweb.github.com/ajax-solr/ - cached - mail it - history
by otis 2009-10-27 15:29 bdb · howto · tutorial · reference · database · java · library
http://www.oracle.com/technology/documentation/berkeley-db/je/GettingStartedGuide/backuprestore.html - cached - mail it - history
Wunder's progressive reranking explanation
by otis 2009-10-22 12:41 search · information retrieval · rank · score
http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html - cached - mail it - history
Sen is the first opensource morphological analyzer written in pure Java.
by otis 2009-10-16 23:39 japanese · morphology · analysis · lucene · search · index · information retrieval · NLP · library
https://sen.dev.java.net/ - cached - mail it - history
Introduction to Statistical Thought
by otis 2009-10-12 22:43 statistics · book · pdf · download · free · textbook · reference · math
http://www.math.umass.edu/~lavine/Book/book.html - cached - mail it - history
Highly available NameNode setup with DRBD
by otis 2009-10-08 12:47 hadoop · DRBD · high availability · cloudera
http://www.cloudera.com/blog/2009/07/22/hadoop-ha-configuration/ - cached - mail it - history
Easily and reliably automate tasks that used to require login after login and a small army of custom shell scripts.
by otis 2009-10-07 23:14 deploy · remote · server
http://www.capify.org/index.php/Capistrano - cached - mail it - history
BETWEEN BIOACOUSTICS AND MUSIC - cicadas, monkeys, insects
by otis 2009-10-07 22:57 sound · audio · animal
http://www2.arnes.si/~ljprirodm3/okvir.html - cached - mail it - history
by otis 2009-10-07 21:55 NLP · sentiment · reference
http://www.cs.uic.edu/~liub/FBS/NLP-handbook-sentiment-analysis.pdf - cached - mail it - history
OPUS is an attempt to collect translated texts from the web, to convert and align the entire collection, to add linguistic annotation, and to provide the community with a publicly available parallel corpus. OPUS is based on open source products and is also delivered as an open source package.
by otis 2009-10-07 10:54 corpus · information retrieval · NLP
http://urd.let.rug.nl/tiedeman/OPUS/ - cached - mail it - history
by otis 2009-10-05 11:24 statistics · reference · world · number · realtime
http://www.worldometers.info/ - cached - mail it - history
Used by Cloudera Desktop
by otis 2009-10-03 23:00 javascript · ajax · framework
http://mootools.net/ - cached - mail it - history
by otis 2009-10-02 17:31 hadoop
http://nexr.co.kr/products/ - cached - mail it - history
Repository of MapReduce and other Hadoop programs / code
by otis 2009-10-02 17:30 hadoop
http://www.hadoopsource.com/ - cached - mail it - history
Hvar weather forecast
by otis 2009-09-26 22:31 hvar · croatia
http://www.hvarinfo.com/hr/prognoza-vremena/ - cached - mail it - history
by otis 2009-09-15 17:23 css · tricks · howto · tutorial · html · webdev
http://net.tutsplus.com/tutorials/html-css-techniques/11-classic-css-techniques-made-simple-with-css3/ - cached - mail it - history
by otis 2009-09-15 12:37 voldemort · hadoop
http://project-voldemort.com/blog/2009/06/building-a-1-tb-data-cycle-at-linkedin-with-hadoop-and-project-voldemort/ - cached - mail it - history
Text2Onto is the official successor of TextToOnto, a framework for ontology learning from text.
by otis 2009-09-12 22:36 ontology · corpus · NLP · semantic
http://ontoware.org/projects/text2onto/ - cached - mail it - history
A collection of extremely large matrix decomposition algorithm implementations, in Java.
by otis 2009-09-11 16:20 matrix · library · api · java · software
http://code.google.com/p/decomposer/ - cached - mail it - history
by otis 2009-09-07 08:11 school · Zagreb · croatia
http://www.waldorfska-skola-zg.skole.hr/ - cached - mail it - history
by otis 2009-09-07 08:10 school · Zagreb · croatia
http://www.montessori-skola.hr/ - cached - mail it - history
by otis 2009-09-07 08:08 Zagreb · croatia · rent · apartment
http://www.rentinzagreb.com/ - cached - mail it - history
OpenGrok is a fast and usable source code search and cross reference engine. It helps you search, cross-reference and navigate your source tree. It can understand various program file formats and version control histories like Mercurial, Git, SCCS, RCS, CVS, Subversion, Teamware, ClearCase, Perforce and Bazaar. In other words it lets you grok (profoundly understand) the open source, hence the name OpenGrok. It is written in Java.
by otis 2009-08-26 03:45 code · source code · search · browse · reference · java · subversion
http://www.opensolaris.org/os/project/opengrok/ - cached - mail it - history
A simple, asynchronous, single-threaded memcached client written in Java.
by otis 2009-08-23 04:31 java · memcached · client
http://code.google.com/p/spymemcached/ - cached - mail it - history
by otis 2009-08-14 23:40 hadoop · port · cheatsheet · reference
http://www.cloudera.com/blog/2009/08/14/hadoop-default-ports-quick-reference/ - cached - mail it - history
by otis 2009-08-12 23:47 asf · apache · maven · repository · jar · java
https://repository.apache.org/index.html - cached - mail it - history
Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase. Presentation: http://static.last.fm/johan/huguk-20090414/fredrik-hypercubes-in-hbase.pdf
by otis 2009-08-12 22:54 hbase · analytics · distributed computing · distributed filesystem · report
http://github.com/zohmg/zohmg/tree/master - cached - mail it - history
by otis 2009-08-12 22:24 hbase · katta · solr · social media · shard · search · scalability
http://www.slideshare.net/lusciouspear/building-a-business-on-hadoop-hbase-and-open-source-distributed-computing?src=rel... - cached - mail it - history
Galago is a toolkit for experimenting with text search. It is based on small, pluggable components that are easy to replace and change, both during indexing and during retrieval. It includes TupleFlow, which is a distributed computation framework like MapReduce or Dryad. TupleFlow manages the difficult parts of processing text: serializing data, sorting it, and distributing processing. The IndexReader and IndexWriter classes manage storing key/value pairs like inverted lists. This makes it possible to make your own kinds of index structures without starting from scratch.
by otis 2009-08-12 16:01 java · software · search · library · information retrieval · distributed computing
http://www.galagosearch.org/ - cached - mail it - history
Ivory is a Hadoop toolkit for Web-scale information retrieval research that features a retrieval engine based on Markov Random Fields
by otis 2009-08-12 15:56 hadoop · MapReduce · information retrieval · search
http://www.umiacs.umd.edu/~jimmylin/ivory/docs/index.html - cached - mail it - history
by otis 2009-08-10 14:59 java · pattern · design · software · builder · object
http://forums.amd.com/devblog/blogpost.cfm?threadid=108340&catid=313 - cached - mail it - history
by otis 2009-08-10 13:32 search · saas · lucene · AWS · ec2 · amazon
http://www.searchblox.com/searchbloxami.html - cached - mail it - history
by otis 2009-08-10 12:52 hadoop · hbase · video · tutorial · lecture · distributed computing
http://huguk.org/2009/04/huguk-2-wrap-up.html - cached - mail it - history
by otis 2009-08-10 12:44 hadoop · video · log analysis · distributed computing · pig · tutorial · cloudera
http://www.cloudera.com/blog/2009/06/17/analyzing-apache-logs-with-pig/ - cached - mail it - history
by otis 2009-08-10 12:40 hbase · hadoop · video · train · tutorial · cloud computing
http://skillsmatter.com/podcast/cloud-grid/apache-hbase - cached - mail it - history
by otis 2009-08-10 12:40 hadoop · video · train · tutorial · cloud computing
http://www.vimeo.com/4211288 - cached - mail it - history
by otis 2009-08-10 12:40 hadoop · video · train · tutorial · cloud computing · cloudera
http://www.cloudera.com/hadoop-training-hive-introduction - cached - mail it - history
by otis 2009-08-10 12:39 hadoop · video · train · tutorial · cloud computing · cloudera
http://www.cloudera.com/hadoop-training-mapreduce-hdfs - cached - mail it - history
by otis 2009-08-07 23:19 project management · collaboration · howto · guide
http://lordsauron.wordpress.com/2008/06/18/zero-to-redmine-in-22-steps/ - cached - mail it - history
by otis 2009-08-07 23:19 project management · collaboration
http://bitnami.org/stack/redmine - cached - mail it - history
by otis 2009-08-06 12:56 recommendation engine · software
http://www.directededge.com/ - cached - mail it - history
Scarab Research is your cutting-edge partner for delivering high quality personalized recommendations to your website visitors
by otis 2009-08-06 12:55 recommendation engine · software · saas · service
http://www.scarabresearch.com/ - cached - mail it - history
by otis 2009-08-06 12:42 collaboration
http://www.teamapart.com/ - cached - mail it - history
KAKASI is a language processing filter that converts Kanji characters to Hiragana, Katakana or Romaji, and may be helpful for reading Japanese documents.
by otis 2009-08-04 22:42 japanese · language · convert · kanji · katakana · romaji
http://kakasi.namazu.org/ - cached - mail it - history
How it is implemented: http://www.cloudera.com/blog/2009/07/31/tracking-trends-with-hadoop-and-hive-on-ec2
by otis 2009-08-01 00:58 wikipedia · statistics · traffic · popularity · trend
http://www.trendingtopics.org/ - cached - mail it - history
by otis 2009-08-01 00:55 wikipedia · statistics · traffic · popularity
http://stats.grok.se/ - cached - mail it - history
Zemberek is an open source, platform independent, general purpose Natural Language Processing library and toolset designed for Turkic languages, especially Turkish. Zemberek is officially used as the spell checker in the Turkish version of Open Office and in the Turkish national Linux distribution Pardus. Google Code will host the Zemberek-2, Zemberek Corpus and Wordnet projects. These projects are released under the Mozilla Public License.
by otis 2009-07-24 09:41 turkish · language · analysis · search · tokenizer · stemming · NLP · library
http://code.google.com/p/zemberek/ - cached - mail it - history
Zohmg is a data store for aggregation of multi-dimensional time series data. It is built on top of Hadoop, Dumbo and HBase. The core idea is to pre-compute aggregates and store them in a read-efficient manner – Zohmg is wasteful with storage in order to answer queries faster.
by otis 2009-07-22 00:13 hadoop · hbase · time · aggregator · analysis · software
http://github.com/zohmg/zohmg/blob/master/README - cached - mail it - history
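A rough sketch of the "pre-compute aggregates" idea from the note above, assuming a simple in-memory dictionary in place of HBase. It does not reflect Zohmg's actual schema or API: the function `precompute`, the field names (`day`, `country`, `browser`, `count`) and the key layout are all hypothetical. Every event is written under every combination of dimensions, which wastes storage but turns each query into a single lookup.

```python
from collections import defaultdict
from itertools import combinations


def precompute(events, dimensions):
    """Sum event counts per day for every subset of the given dimensions."""
    cube = defaultdict(int)
    for event in events:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                # Key: the time unit followed by (dimension, value) pairs.
                key = (event["day"],) + tuple((d, event[d]) for d in dims)
                cube[key] += event["count"]
    return cube


# Hypothetical sample data.
events = [
    {"day": "2009-07-21", "country": "SE", "browser": "firefox", "count": 3},
    {"day": "2009-07-21", "country": "SE", "browser": "safari", "count": 2},
    {"day": "2009-07-21", "country": "US", "browser": "firefox", "count": 5},
]
cube = precompute(events, ["country", "browser"])

# Reads are plain lookups; no scan over the raw events is needed.
print(cube[("2009-07-21", ("country", "SE"))])
print(cube[("2009-07-21", ("browser", "firefox"))])
```

The number of stored keys grows with the number of dimension subsets, which is the storage cost the note describes.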
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
by otis 2009-07-21 17:07 hadoop · database · postgresql · parallel computing · distributed computing
http://db.cs.yale.edu/hadoopdb/hadoopdb.html - cached - mail it - history
Parallel BASH is a modified version of BASH intended for text processing on computer clusters. It enables use of common UNIX text processing tools (e.g., awk, perl, grep) across multicore or distributed systems. It is particularly suited for scalable processing of large (multi-GB or larger) files.
by otis 2009-07-21 16:50 bash · shell · script · MapReduce · hdfs · parallel computing · distributed computing
http://code.google.com/p/parbash/ - cached - mail it - history
by otis 2009-07-20 15:17 MapReduce · bash · parallel computing · distributed computing · shell · haveliwala
http://www.linux-mag.com/cache/7407/1.html - cached - mail it - history
by otis 2009-07-18 10:00 chinese · english · dictionary · translation · convert · character
http://reganmian.net/blog/2009/02/16/release-early-release-often-english-chinese-dictionary-based-on-wikipedia/ - cached - mail it - history