Simple Hadoop Clusters
I’m excited to announce Pallet-Hadoop, a configuration library written in Clojure for Apache’s Hadoop. In the tutorial, we’re going to see how to create a three node Hadoop cluster on EC2, and run a word count on MapReduce. We’ll be following along with Pallet-Hadoop example project for the introduction; for a more in-depth discussion of the design of pallet-hadoop, see the project wiki. Background Hadoop is an Apache java framework that allows for distributed processing of enormous datasets across large clusters. It combines a computation engine based on MapReduce with HDFS, a distributed filesystem based on the Google File System. ...