Continuing from the previous post (Hadoop Architecture-Hadoop Distributed File System), a Hadoop cluster is made up of the following main nodes:
1. NameNode
2. DataNode
3. JobTracker
4. TaskTracker
The list above describes the logical architecture of the Hadoop nodes. Physically, however, a DataNode and a TaskTracker can be placed on the same machine, as shown in the diagram below.
There are also a few secondary nodes: the Secondary NameNode, the Backup Node and the Checkpoint Node. The diagram above shows some of the communication paths between the different types of nodes in a Hadoop cluster. A client communicates with the JobTracker, with the NameNode and with any DataNode.
There is only one NameNode in the cluster. A redundant NameNode can be planned for, but it has to be switched on manually. While a file's data is stored in blocks on the DataNodes, the metadata for the file is stored on the NameNode. If there is one node in the cluster that deserves the best enterprise hardware for maximum reliability, it is the NameNode. The NameNode should also have as much RAM as possible because it keeps the entire filesystem metadata in memory, whereas the DataNodes can run on commodity hardware.
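To make the split between data and metadata more concrete, here is a minimal Python sketch of a toy NameNode. It is not Hadoop code: the class ToyNameNode, its methods, the 64 MB block size, the replication factor of 3 and the round-robin placement are all assumptions chosen for illustration (real HDFS placement is rack-aware). The sketch only shows that the NameNode holds the file-to-block and block-to-DataNode mappings in memory, while the block contents would live on the DataNodes.

```python
import math
from dataclasses import dataclass, field

# Assumed block size for this illustration only.
BLOCK_SIZE = 64 * 1024 * 1024


@dataclass
class ToyNameNode:
    """Hypothetical in-memory metadata store; it holds no file data."""
    datanodes: list                                       # names of the DataNodes
    replication: int = 3                                  # assumed replication factor
    file_blocks: dict = field(default_factory=dict)       # path -> [block ids]
    block_locations: dict = field(default_factory=dict)   # block id -> [DataNode names]
    next_block_id: int = 0

    def add_file(self, path, size_bytes):
        """Record metadata for a file: split it into blocks and pick replicas."""
        n_blocks = math.ceil(size_bytes / BLOCK_SIZE)
        n_replicas = min(self.replication, len(self.datanodes))
        block_ids = []
        for _ in range(n_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Simple round-robin placement, used here instead of the
            # rack-aware policy of a real cluster.
            replicas = [
                self.datanodes[(block_id + r) % len(self.datanodes)]
                for r in range(n_replicas)
            ]
            self.block_locations[block_id] = replicas
            block_ids.append(block_id)
        self.file_blocks[path] = block_ids
        return block_ids

    def locate(self, path):
        """Tell a client which DataNodes hold each block of a file."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]

    def metadata_objects(self):
        """Count the entries kept in memory (files plus blocks)."""
        return len(self.file_blocks) + len(self.block_locations)


if __name__ == "__main__":
    nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
    nn.add_file("/logs/a.txt", 200 * 1024 * 1024)  # 200 MB -> 4 blocks
    for block_id, nodes in nn.locate("/logs/a.txt"):
        print(block_id, nodes)
    print("metadata objects in memory:", nn.metadata_objects())
```

In this sketch a client would first ask the toy NameNode where the blocks are and then read them from the listed DataNodes. The number of in-memory entries grows with every file and block, which is why the RAM of the NameNode matters far more than that of a DataNode.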