Hadoop – Introducing Apache Hadoop (Stanford lecture)

Introducing Apache Hadoop: The Modern Data Operating System

What Is Apache Hadoop?

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
It runs applications on large clusters of commodity hardware. Hadoop's design was derived from Google’s MapReduce and Google File System (GFS) papers.

Limitations of Existing Data Analytics Architecture

big_data_stanford001.jpg

Flexibility: Complex Data Processing (Java MapReduce, Streaming MapReduce, Crunch, Pig Latin, Hive, Oozie)

big_data_stanford003.jpg

Scalability: Scalable Software Development

big_data_stanford004.jpg

MapReduce: Computational Framework

big_data_stanford007.jpg
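
To make the MapReduce model concrete, here is a minimal sketch of a word count that imitates the three phases (map, shuffle, reduce) in a single Python process. It does not use Hadoop itself; the function names (map_words, shuffle, reduce_counts, word_count) are made up for illustration and are not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: collapse the values for one key into a single total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    # Keys are processed in sorted order, as a reducer would receive them.
    return dict(reduce_counts(w, c) for w, c in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

On a real cluster the framework performs the shuffle and runs many map and reduce tasks in parallel on different machines. With Hadoop Streaming, the mapper and reducer would instead be separate executables that read lines from standard input and write tab-separated key/value lines to standard output.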

CDH: Cloudera’s Distribution (Built upon Apache Hadoop)

big_data_stanford012.jpg

Books: Hadoop and HBase

big_data_stanford013.jpg

ref: http://www.youtube.com/watch?v=d2xeNpfzsYI&list=PLXSSyz98b4Gw_l63SaDuxMmR3dUBjNUVW

http://hadoop.apache.org/docs/stable/streaming.html

http://en.wikipedia.org/wiki/Apache_Hadoop

http://hadoop.apache.org/
