Simpy
Search Everyone: "hadoop"

Top "hadoop" experts: otis, rubenluengas

1 - 50 of 52
The Hadoop Online Prototype (HOP) is a modified version of Hadoop MapReduce that allows data to be pipelined between tasks and between jobs. This can enable better cluster utilization and increased parallelism, and allows new functionality: online aggregation (approximate answers as a job runs), and stream processing (MapReduce jobs that run continuously, processing new data as it arrives).
by otis 2009-12-23 14:56 hadoop · MapReduce · pipeline · java · job
http://code.google.com/p/hop/
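
The online aggregation idea mentioned in the HOP description (approximate answers while a job is still running) can be illustrated with a minimal sketch in plain Python. This is not HOP's API and does not use Hadoop at all; the function name `online_mean` and the `report_every` parameter are hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple


def online_mean(values: Iterable[float],
                report_every: int = 2) -> Iterator[Tuple[int, float]]:
    """Yield (records_seen, running_mean) snapshots while consuming values.

    Illustrative only: a real pipelined MapReduce job would emit such
    snapshots from reducers as map output arrives, not from one loop.
    """
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        # Emit an approximate answer every `report_every` records.
        if count % report_every == 0:
            yield count, total / count
    # Emit the exact answer at the end if it was not just reported.
    if count and count % report_every != 0:
        yield count, total / count


snapshots = list(online_mean([4.0, 6.0, 8.0, 10.0, 2.0]))
for seen, estimate in snapshots:
    print(f"after {seen} records: mean ~ {estimate}")
```

Each snapshot is an estimate based only on the records seen so far; the last one is the exact mean of the full input.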
by otis 2009-12-19 23:52 hadoop · MapReduce · cloudera · reference · debug
http://www.cloudera.com/blog/2009/12/17/7-tips-for-improving-mapreduce-performance/
by otis 2009-12-09 13:22 hadoop · hdfs · hbase · python · json · data warehouse · storage · cube
http://github.com/zohmg/zohmg
Behemoth lets you deploy GATE or UIMA applications over a Hadoop cluster in order to do very large scale document analysis. It uses a simple representation format that serves as common ground between UIMA- and GATE-generated annotations, making the two systems compatible. Because it is Hadoop-based, it benefits from all of Hadoop's features, namely scalability, fault tolerance and, most notably, the backing of a thriving open source community. Quite a few Apache resources fit into it: Nutch, Tika, Mahout, HBase, etc.
by otis 2009-12-01 22:35 UIMA · gate · hadoop · text mining · text analysis · MapReduce · distributed computing · NLP
http://code.google.com/p/behemoth-pebble/
by otis 2009-12-01 09:52 hadoop · gui · browser · script
http://dev.bizo.com/2009/11/quick-script-open-hadoop-jobtracker-ui.html
by otis 2009-11-18 12:26 hbase · architecture · java · hadoop
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
by otis 2009-11-17 15:47 hadoop · hbase · rdbms · sql · presentation · powerpoint · compare · database
http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS
Highly available NameNode setup with DRBD
by otis 2009-10-08 12:47 hadoop · DRBD · high availability · cloudera
http://www.cloudera.com/blog/2009/07/22/hadoop-ha-configuration/
by otis 2009-10-02 17:31 hadoop
http://nexr.co.kr/products/
Repository of MapReduce and other Hadoop programs / code
by otis 2009-10-02 17:30 hadoop
http://www.hadoopsource.com/
by otis 2009-09-15 12:37 voldemort · hadoop
http://project-voldemort.com/blog/2009/06/building-a-1-tb-data-cycle-at-linkedin-with-hadoop-and-project-voldemort/
by otis 2009-08-14 23:40 hadoop · port · cheatsheet · reference
http://www.cloudera.com/blog/2009/08/14/hadoop-default-ports-quick-reference/
Ivory is a Hadoop toolkit for Web-scale information retrieval research that features a retrieval engine based on Markov Random Fields
by otis 2009-08-12 15:56 hadoop · MapReduce · information retrieval · search
http://www.umiacs.umd.edu/~jimmylin/ivory/docs/index.html
by otis 2009-08-10 12:52 hadoop · hbase · video · tutorial · lecture · distributed computing
http://huguk.org/2009/04/huguk-2-wrap-up.html
by otis 2009-08-10 12:44 hadoop · video · log analysis · distributed computing · pig · tutorial · cloudera
http://www.cloudera.com/blog/2009/06/17/analyzing-apache-logs-with-pig/
by otis 2009-08-10 12:40 hbase · hadoop · video · train · tutorial · cloud computing
http://skillsmatter.com/podcast/cloud-grid/apache-hbase
by otis 2009-08-10 12:40 hadoop · video · train · tutorial · cloud computing
http://www.vimeo.com/4211288
by otis 2009-08-10 12:40 hadoop · video · train · tutorial · cloud computing · cloudera
http://www.cloudera.com/hadoop-training-hive-introduction
by otis 2009-08-10 12:39 hadoop · video · train · tutorial · cloud computing · cloudera
http://www.cloudera.com/hadoop-training-mapreduce-hdfs
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
by otis 2009-07-21 17:07 hadoop · database · postgresql · parallel computing · distributed computing
http://db.cs.yale.edu/hadoopdb/hadoopdb.html
by otis 2009-06-02 13:00 hadoop · hdfs · tips · distributed computing · distributed filesystem
http://www.cloudera.com/blog/2009/05/18/10-mapreduce-tips/
by otis 2009-06-01 13:21 hadoop · MapReduce · ruby · toolkit · stream
http://open.blogs.nytimes.com/2009/05/11/announcing-the-mapreduce-toolkit/
by otis 2009-06-01 12:43 hadoop · MapReduce · video · tutorial
http://www.cloudera.com/hadoop-training-thinking-at-scale
by otis 2009-06-01 12:18 hadoop · pig · screencast · video · tutorial · distributed computing · analytics
http://www.cloudera.com/hadoop-training-pig-tutorial
by otis 2009-06-01 11:40 hadoop · eclipse · video · screencast · tutorial
http://www.cloudera.com/blog/2009/04/20/configuring-eclipse-for-hadoop-development-a-screencast/
Java package that integrates the R environment with Hadoop's MapReduce
by otis 2009-05-05 16:35 hadoop · distributed computing · statistics · probability · software · MapReduce
http://ml.stat.purdue.edu/rhipe/
VERY much like Apache Mahout - same goals, it seems
by otis 2009-04-22 10:57 machine learning · hadoop · library
http://code.google.com/p/redpoll/
by otis 2009-04-18 23:11 hadoop · database · java · software
http://www.cloudera.com/blog/2009/03/06/database-access-with-hadoop/
The 'hamake' utility lets you automate incremental processing of datasets stored on HDFS using Hadoop tasks written in Java or using PigLatin scripts. Datasets can be either individual files or directories containing groups of files. New files may be added (or removed) at arbitrary locations, which may trigger recalculation of the data that depends on them.
by otis 2009-04-15 15:14 hadoop · hdfs · file · process · software
http://code.google.com/p/hamake/
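
The incremental processing that the hamake description refers to resembles a make-style staleness check: an output is recomputed when it is missing or older than one of its inputs. The following is a minimal sketch of that general idea in plain Python, not hamake's actual rule format or implementation. The function `stale_outputs`, the dataset names, and the timestamps are all hypothetical, and the sketch does not model recalculation triggered by removed files.

```python
from typing import Dict, List


def stale_outputs(deps: Dict[str, List[str]],
                  mtimes: Dict[str, float]) -> List[str]:
    """Return outputs that are missing or older than any known input.

    deps maps an output dataset to its input datasets; mtimes maps a
    dataset to its last-modified time (absent means it does not exist).
    """
    stale = []
    for output, inputs in deps.items():
        out_time = mtimes.get(output)
        # Newest modification time among the inputs that exist.
        newest_input = max((mtimes[i] for i in inputs if i in mtimes),
                           default=None)
        if out_time is None:
            stale.append(output)  # output was never produced
        elif newest_input is not None and newest_input > out_time:
            stale.append(output)  # an input changed after the output
    return sorted(stale)


# Hypothetical datasets and timestamps, for illustration only.
deps = {"daily_report": ["logs/a", "logs/b"], "index": ["crawl"]}
mtimes = {"logs/a": 100.0, "logs/b": 250.0, "daily_report": 200.0,
          "crawl": 50.0, "index": 60.0}
print(stale_outputs(deps, mtimes))
```

Here only `daily_report` would be recomputed, because `logs/b` was modified after the report was last produced.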
Graphical user interface client for Hadoop
by otis 2009-04-03 17:56 hadoop · hdfs · MapReduce · ui · gui
http://code.google.com/p/hadoop-ui/
Neptune is a distributed, large-scale structured data store and an open source project implementing Google's Bigtable. HBase-like.
by otis 2009-01-07 17:26 bigtable · distributed filesystem · hadoop · storage · scalability
http://www.openneptune.com/
by otis 2008-12-29 12:30 hadoop · hdfs · distributed computing · cluster
http://wiki.smartfrog.org/wiki/display/sf/Patterns+of+Hadoop+Deployment
by otis 2008-12-12 14:47 hadoop · cluster · distributed computing · hdfs · s3 · tutorial · guide · reference
http://www.umiacs.umd.edu/~jimmylin/cloud9/umd-hadoop-dist/cloud9-docs/
by otis 2008-12-11 22:20 hadoop · tutorial · hdfs · java · MapReduce · distributed computing · distributed filesystem
http://public.yahoo.com/gogate/hadoop-tutorial/start-tutorial.html
by otis 2008-11-07 16:49 java · hbase · hadoop · framework · orm · index
http://www.pigi-project.org/
by otis 2008-10-31 23:29 hadoop · video · presentation
http://blog.rapleaf.com/dev/?p=35
SmartFrog is a technology for describing distributed software systems as collections of cooperating components, and then activating and managing them. The core SmartFrog framework is released under LGPL.
by otis 2008-10-29 22:16 java · distributed computing · manage · hadoop · cluster
http://smartfrog.org/
by otis 2008-09-29 12:42 hadoop · python · distributed computing · MapReduce
http://code.google.com/p/happy/
Goodbye MapReduce, Hello Cascading
by otis 2008-09-22 13:26 MapReduce · hadoop · distributed computing
http://blog.rapleaf.com/dev/?p=33
See: http://wiki.apache.org/hadoop/DistributedLucene
by otis 2008-06-09 23:18 distributed indexer · distributed search · lucene · hadoop · MapReduce · java
http://www.hpl.hp.com/techreports/2008/HPL-2008-64.html
by otis 2008-05-20 13:18 hadoop · MapReduce · algorithm · cluster · rank · graph
http://code.google.com/p/ceteri-mapred/
by otis 2008-04-29 22:34 lucene · index · search · shard · grid · distributed search · hadoop · java · master · slave
http://katta.wiki.sourceforge.net/
by otis 2008-02-28 00:21 hadoop · distributed computing · cluster · tutorial
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
by otis 2008-02-28 00:17 hadoop · distributed computing · tutorial
http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
MapReduce library for Hadoop designed to serve as both a teaching tool and a repository for code that may be broadly useful for a variety of research problems in human language technology (information retrieval, natural language processing, etc.)
by otis 2008-02-27 17:57 MapReduce · hadoop · distributed computing · library
http://umiacs.umd.edu/~jimmylin/cloud9/umd-hadoop-dist/cloud9-docs/
Productivity layer over Hadoop that aims to simplify the development of complex distributed processing routines on large data sets. Cascading is for those who process volumes of log or event data, build indexes of unstructured data from web crawls or internal content, or apply data-mining or machine learning techniques, and who subsequently suffer either from more data than CPU capacity or from the general complexity of managing the workflow of the data processing routines and artifacts.
by otis 2008-01-27 02:08 hadoop · MapReduce · distributed computing · job · manage
http://www.cascading.org/
Google's News Personalization System Clone Project.
by otis 2008-01-14 00:15 hadoop · personalization
http://wiki.apache.org/lucene-hadoop/NewsPersonalizationSystem
by otis 2007-07-22 20:27 java · amazon · ec2 · s3 · hadoop · howto · article · Tom White
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112