productivity layer over Hadoop that aims to simplify the development of complex distributed processing routines on large data sets. Cascading is for those who process volumes of log or event data, build indexes of unstructured data from web crawls or internal content, or apply data-mining or machine learning techniques, and who as a result struggle either with more data than CPU capacity or with the general complexity of managing the workflow of the data processing routines and artifacts.
by otis, 2008-01-27 02:08
hadoop · MapReduce · distributed computing · job · manage