Adding MathJax to your (SBT-)Microsite

I'm obsessed with sbt-microsites. Sbt-microsites is a fantastic plugin for SBT (the Scala Build Tool) that makes it easy to generate a beautiful sidecar site for your software project, full of code checked by your CI! I recently built a microsite for ScalaRL, my in-progress functional Reinforcement Learning library, and »

Moving to Spacemacs for Scala and Python

I've just finished retooling my development environment, and the process was annoying enough that I thought I'd write it up here, for myself in the future, and for you in the present. tl;dr; I ended up porting my old Emacs config, based on the literate emacs24-starter-kit, over to Spacemacs, »

Cascalog + Hadoop Counters, Finally!

I've just merged a Cascalog pull request of mine that gives Cascalog operations access to the statistics that Cascading generates at the end of each job. I've also added global inc! and inc-by! functions that let you increment custom Hadoop counters from within your functions and operations without having to »

Cascalog 2.0 In Depth

Cascalog 2.0 has been out for over a year now, and outside of a post to the mailing list and a talk at Clojure/Conj 2013 (slides here), I've never written up the startlingly long list of new features brought by that release. So shameful. This post fixes that. »

API Authentication with Liberator and Friend

I've just finished rewriting a number of PaddleGuru's internal APIs using two great open-source libraries: Liberator and Friend. Liberator is a library for writing RESTful resources in Clojure. Friend is an authorization and authentication library written by the prolific Chas Emerick, Dominator, Esquire. You've certainly seen his stuff around if »

Upcoming Talks in 2013

This is the year I teach myself to become a better public speaker. I've spent the past year coding up a number of powerful Scala and Clojure projects, all the while avoiding the important and difficult work of teaching and writing about the import and use of those projects. Well, »

Cascalog Testing 2.0

A few months ago I announced Midje-Cascalog, my layer of Midje testing macros over the Cascalog MapReduce DSL. These allow you to write tests for your Cascalog jobs in a style that mimics Cascalog's own query execution syntax. In this post I discuss midje-cascalog's 0.4.0 release, which brings »

Introducing Cascalog-Contrib

I've had the pleasure of working with Cascalog for about ten months now, and have seen the community produce some fantastic work. A number of businesses are using Cascalog in production; I use Cascalog at Twitter every day to write MapReduce queries for the new Twitter Web Analytics product. One »

Testing Cascalog with Midje

I've been working on a Cascalog testing suite these past few weeks, an extension to Brian Marick's Midje, that eases much of the pain of testing MapReduce workflows. I think a lot of the dull work we see in the Hadoop community is a direct result of fear. Without proper »

Getting Creative with MapReduce

One problem with many existing MapReduce abstraction layers is the utter difficulty of testing queries and workflows. End-to-end tests are maddening to craft in vanilla Hadoop and frustrating at best in Pig and Hive. The difficulty of testing MapReduce workflows makes it scary to change code, and destroys your desire »

Cascalog 1.8.1 Released

Nathan Marz and I are releasing Cascalog 1.8.1 today! We've added a few interesting features, and I thought I'd provide a bit more detail here for anyone interested. Cross Join cascalog.api now includes support for cross-joins; just add (cross-join) to your query as its own predicate. Think »

Haskell in Emacs

I spent some time today getting my Emacs config set up to learn Haskell, and ran into a few issues; I figured I'd go ahead and document the process here for everyone's enjoyment. We're going to install and configure Haskell mode, then add a few extensions that'll make learning Haskell »

Simple Hadoop Clusters

I'm excited to announce Pallet-Hadoop, a configuration library written in Clojure for Apache's Hadoop. In the tutorial, we're going to see how to create a three-node Hadoop cluster on EC2, and run a word count on MapReduce. We'll be following along with the Pallet-Hadoop example project for the introduction; for »
