How to Publish CLJSJS Jars to Clojars

I’ve been doing a lot of work in ClojureScript lately, and the time finally came to pull in my first vanilla JavaScript dependency. The default way to do this seems to be the CLJSJS project. CLJSJS publishes many JavaScript packages in a form that you can consume from a Clojure project. For projects like React, you’ll find the latest versions of the JS libraries, packaged up and ready to go. For less active libraries like bignumber.js, you might have to go bump a version and open up a pull request against CLJSJS’s packages repository… or maybe package and add the library from scratch, as I did recently with Complex.js. ...

July 28, 2020 · 4 min

Cascalog + Hadoop Counters, Finally!

I’ve just merged a Cascalog pull request of mine that gives Cascalog operations access to the statistics that Cascading generates at the end of each job. I’ve also added global inc! and inc-by! functions that let you increment custom Hadoop counters from within your functions and operations without having to deal with all that prepfn nastiness we introduced in Cascalog 2.0. Here’s a link to the code. If you want to follow along, or just want to get the hell away from this blog and start playing with the code now, get yourself a copy of the new snapshot: ...

February 21, 2015 · 3 min

Cascalog 2.0 In Depth

Cascalog 2.0 has been out for over a year now, and outside of a post to the mailing list and a talk at Clojure/Conj 2013 (slides here), I’ve never written up the startlingly long list of new features brought by that release. So shameful. This post fixes that. 2.0 was a big deal. Anonymous functions make it easy to reuse your existing, non-Cascalog code. The interop story with vanilla Clojure is much better, which is huge for testing. Finally, users can access the JobConf, Cascading’s counters and other Cascading guts during operations. ...

January 3, 2015 · 10 min

Hardcore Cascalog: Dynamic Queries

A little side note before I get started - pivoting from my last post on ski mountaineering racing to this post on advanced Cascalog patterns has made me realize that I’m a full-fledged connoisseur of the esoteric. I’m embracing it! This is the first in a series of posts on hardcore Cascalog. If you’re stoked, leave me a comment telling me what you want to learn more about and we’ll go from there. ...

January 1, 2015 · 9 min

API Authentication with Liberator and Friend

I’ve just finished rewriting a number of PaddleGuru’s internal APIs using two great open-source libraries: Liberator and Friend. Liberator is a library for writing RESTful resources in Clojure. Friend is an authorization and authentication library written by the prolific Chas Emerick, Dominator, Esquire. You’ve certainly seen his stuff around if you’ve played with Clojure(Script) at any level of detail. Authentication and authorization are both really important in RESTful APIs. These libraries are made for each other, I thought to myself. I’ll just use them together and life will be wonderful. Right? ...

January 18, 2014 · 14 min

Cascalog Testing 2.0

A few months ago I announced Midje-Cascalog, my layer of Midje testing macros over the Cascalog MapReduce DSL. These allow you to write tests for your Cascalog jobs in a style that mimics Cascalog’s own query execution syntax. In this post I discuss midje-cascalog’s 0.4.0 release, which brings tighter Midje integration and a number of new ways to write tests. I’ll start with a refresher on the old syntax before debuting the new. If you’re eager, add the following to your project.clj: ...

January 23, 2012 · 6 min

Introducing Cascalog-Contrib

I’ve had the pleasure of working with Cascalog for about ten months now, and have seen the community produce some fantastic work. A number of businesses are using Cascalog in production; I use Cascalog at Twitter every day to write MapReduce queries for the new Twitter Web Analytics product. One thing Cascalog doesn’t yet have is a community repository for generic queries and operations. To fill this gap we’ve created cascalog-contrib. Cascalog-contrib will be home to any higher-level abstractions over Cascalog that the community is willing to submit. If you have an idea for a module, file a pull request on GitHub or bring it up on the mailing list for discussion. ...

November 16, 2011 · 4 min

Testing Cascalog with Midje

I’ve been working on a Cascalog testing suite these past few weeks, an extension to Brian Marick’s Midje, that eases much of the pain of testing MapReduce workflows. I think a lot of the dull work we see in the Hadoop community is a direct result of fear. Without proper tests, Hadoop developers can’t help but be scared of making changes to production code. When creativity might bring down a workflow, it’s easiest to get it working once and leave it alone. ...

September 30, 2011 · 15 min

Cascalog 1.8.1 Released

Nathan Marz and I are releasing Cascalog 1.8.1 today! We’ve added a few interesting features, and I thought I’d provide a bit more detail here for anyone interested. Cross Join: cascalog.api now includes support for cross-joins; just add (cross-join) to your query as its own predicate. Think of a cross-join as a “tuple comprehension”, or Cartesian product, with similar results to clojure.core/for; it’s not very efficient, as it forces all tuples through a single reducer (and causes a massive blowup in the number of tuples!). Here’s an example: ...

September 26, 2011 · 3 min