How Hadoop Works

Video: http://youtube.com/watch?v=NeqC6t1J1dw

How Hadoop Works (2018). In this video you will learn how Hadoop works: the architecture of Hadoop, its core components, and what the NameNode, DataNode, Secondary NameNode, JobTracker and TaskTracker are.

In this session, let us try to understand how Hadoop works. The Hadoop framework comprises the Hadoop Distributed File System (HDFS) and the MapReduce framework. Let us look at how data is managed and processed by the Hadoop framework.

Hadoop divides the data into smaller chunks and stores each part of the data on a separate node within the cluster. Say we have around 4 terabytes of data and a 4-node Hadoop cluster. HDFS would divide this data into 4 parts of 1 terabyte each. By doing this, the time taken to write the data to disk is significantly reduced: the total time to store the entire data set is roughly equal to the time to store one part, because all the parts are written simultaneously on different machines.

To provide high availability, Hadoop replicates each part of the data onto other machines in the cluster. The number of copies depends on the replication factor, which is set to 3 by default. With the default replication factor, there are 3 copies of each part of the data on 3 different machines. To reduce bandwidth use and latency, HDFS stores 2 copies of a given part on nodes within the same rack, and the last copy on a node in a different rack. Say Node 1 and Node 2 are on Rack 1, and Node 3 and Node 4 are on Rack 2. Then the first 2 copies of part 1 are stored on Node 1 and Node 2, and the third copy of part 1 is stored on either Node 3 or Node 4. The same process is followed for the remaining parts of the data.
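The rack-aware placement described above can be sketched in a few lines of Python. This is a toy simulation, not the actual HDFS placement code: the node names, the `RACKS` map, and the `place_replicas` helper are all hypothetical, and the policy is simplified to exactly what the transcript states (two replicas on the local rack, one on a remote rack).

```python
# Hypothetical sketch of the rack-aware replica placement described above,
# assuming a 4-node cluster split across 2 racks and the default
# replication factor of 3. Names and layout are illustrative only.
RACKS = {"node1": "rack1", "node2": "rack1",
         "node3": "rack2", "node4": "rack2"}
REPLICATION_FACTOR = 3

def place_replicas(writer_node):
    """Pick nodes for one block's replicas: the writer itself, one
    neighbour on the same rack, and one node on a different rack."""
    local_rack = RACKS[writer_node]
    same_rack = [n for n, r in RACKS.items() if r == local_rack]
    other_rack = [n for n, r in RACKS.items() if r != local_rack]
    # Two copies stay on the local rack; the third goes to a remote rack.
    neighbour = next(n for n in same_rack if n != writer_node)
    return [writer_node, neighbour, other_rack[0]]

# Example: replicas for a block written from node1.
print(place_replicas("node1"))  # → ['node1', 'node2', 'node3']
```

Placing the third copy on a different rack means the block survives the loss of an entire rack, at the cost of one cross-rack transfer per block.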
Since this data is distributed across the cluster, HDFS takes care of the networking these nodes need to communicate with each other. Another advantage of distributing the data across the cluster is that processing time is greatly reduced, because the parts can be processed simultaneously. This was an overview of how Hadoop works; we will learn how data is written to and read from the Hadoop cluster in later sessions.
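The "process the parts simultaneously" idea can be illustrated with a toy word count in the MapReduce style. This is a sketch, not real Hadoop: each "node" is just a thread counting words in its own chunk, and the `distributed_word_count` function (a name invented here) merges the partial counts, the way a reduce step would.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy MapReduce-style word count: each chunk of the data is processed
# in parallel (standing in for the nodes of a cluster), then the
# partial results are merged into one total.
def count_words(chunk):
    """Map step: count the words in one chunk."""
    return Counter(chunk.split())

def distributed_word_count(chunks):
    """Process all chunks in parallel, then reduce the partial counts."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(count_words, chunks)
    total = Counter()
    for partial in partials:
        total.update(partial)  # Reduce step: merge the per-chunk counts.
    return total

if __name__ == "__main__":
    chunks = ["big data big cluster", "data node data rack"]
    print(distributed_word_count(chunks))
```

Because each chunk is independent, the map step scales with the number of workers, which is exactly why splitting the data across nodes speeds up processing.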
