What is Greenplum HD?

Greenplum HD is an enterprise-ready distribution of Apache Hadoop from EMC. It lets users write distributed processing applications for large data sets across a cluster of commodity servers using a simple programming model. The framework automatically parallelizes MapReduce jobs to handle data at scale, so developers do not have to write the parallelization and distribution logic themselves.

Greenplum HD packages an open source Apache stack that includes the following components:

- Hadoop Distributed File System (HDFS): file system that distributes files across the cluster.
- MapReduce: framework for writing scalable data-processing applications.
- Pig: procedural language that abstracts lower-level MapReduce.
- Hive: data warehouse infrastructure built on top of Hadoop.
- HBase: database for random, real-time read/write access.
- Mahout: scalable machine learning and data mining library.
- ZooKeeper: Hadoop centralized servi...
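The "simple programming model" behind MapReduce can be illustrated with the classic word count. The following is a minimal in-memory sketch in plain Python, not Greenplum HD or Hadoop code: the function names map_phase, shuffle, reduce_phase and word_count are hypothetical, and every step runs sequentially in one process, whereas Hadoop would run many map and reduce tasks in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group every emitted value by its key (the framework does this in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Map step: sequential here; Hadoop would run mappers in parallel over input splits.
    emitted = (pair for line in lines for pair in map_phase(line))
    # Shuffle step: bring all values for the same key together.
    grouped = shuffle(emitted)
    # Reduce step: sequential here; Hadoop would spread keys over parallel reducers.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
    print(word_count(docs))
```

In this model the developer supplies only the mapper and the reducer; the grouping performed by shuffle stands in for the work the framework does between the two phases, which is what allows it to parallelize the job without changes to the user's code.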