Cascalog Testing 2.0

A few months ago I announced Midje-Cascalog [http://sritchie.github.com/2011/09/30/testing-cascalog-with-midje.html], my layer of Midje testing macros over the Cascalog MapReduce DSL. These allow you to write tests for your Cascalog jobs in a style that mimics Cascalog's own query execution syntax. In »

Introducing Cascalog-Contrib

I've had the pleasure of working with Cascalog [https://github.com/nathanmarz/cascalog] for about ten months now, and have seen the community produce some fantastic work. A number of businesses [https://www.assembla.com/spaces/cascalog/wiki/Who's_using_Cascalog] are using Cascalog in production; »

Getting Creative with MapReduce

One problem with many existing MapReduce abstraction layers is the utter difficulty of testing queries and workflows. End-to-end tests are maddening to craft in vanilla Hadoop and frustrating at best in Pig and Hive. The difficulty of testing MapReduce workflows makes it scary to change code, and destroys your desire »

Cascalog 1.8.1 Released

Nathan Marz [http://nathanmarz.com/] and I are releasing Cascalog 1.8.1 today! We've added a few interesting features, and I thought I'd provide a bit more detail here for anyone interested. Cross Join: cascalog.api now includes support for cross-joins [http://en.wikipedia.org/ »

Haskell in Emacs

I spent some time today getting my Emacs config set up to learn Haskell, and ran into a few issues; I figured I'd go ahead and document the process here for everyone's enjoyment. We're going to install and configure Haskell mode, then add a »

Simple Hadoop Clusters

I'm excited to announce Pallet-Hadoop [https://github.com/pallet/pallet-hadoop], a configuration library written in Clojure for Apache's Hadoop [http://hadoop.apache.org/]. In the tutorial, we're going to see how to create a three-node Hadoop cluster on EC2, and run a word »
