Introducing Apache Hadoop: The Modern Data Operating System
What Is Apache Hadoop?
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
It supports running applications on large clusters of commodity hardware. Hadoop's design was derived from Google’s MapReduce and Google File System (GFS) papers.
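To make the MapReduce model mentioned above concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. A real Hadoop job would run the map and reduce steps in parallel across cluster nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every whitespace-separated word.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all emitted values by key, as the framework
    # would do between the map and reduce stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Illustrative driver: runs the three stages sequentially in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

The three functions mirror the stages of the model: map emits key/value pairs, shuffle groups them by key, and reduce aggregates each group. Because each map call and each reduce call is independent of the others, the work can be spread across many machines.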
Limitations of Existing Data Analytics Architecture
Flexibility: Complex Data Processing (Java MapReduce, Streaming MapReduce, Crunch, Pig Latin, Hive, Oozie)
Scalability: Scalable Software Development
MapReduce: Computational Framework
CDH: Cloudera’s Distribution (Built upon Apache Hadoop)
Books: Hadoop and HBase