Too Long; Didn't Read
Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. To take care of these issues, we need to move towards splitting calculations into a few machines. The objective is to store data, utilizing as many resources of each machine as possible at the same time. We can simply divide all input records by 4 (number of our nodes) so each machine would accept ¼ of input files.
Share Your Thoughts