Notable Projects

The following is a list of notable projects I've worked on or am currently engaged with, in and out of code.

SICMUtils

Along with its creator, Colin Smith, I'm currently developing SICMUtils, a Clojure(Script) computer algebra system based on Gerald Jay Sussman's legendary "scmutils" library.

Caliban

Caliban is a Docker-based job runner for AI research.

Road to Reality Newsletter

I run a weekly-to-monthly physics and AI newsletter over at https://roadtoreality.substack.com/. Follow along!

Structure and Interpretation of Classical Mechanics Manual

I'm working over at the SICM repository on best-in-class tools for sharing research.

ScalaRL

This is a functional reimplementation of the core of reinforcement learning. Check out www.scalarl.com for more info.

Personal PhD

I'm currently working on leveling up my knowledge of Deep Reinforcement Learning with an eye toward working at one of the world's top AI research labs by the end of 2020. More updates on this goal to come. (Update — it worked!)

Ryca CS-1 Cafe Racer

We're also building a motorcycle. More details to come. We're currently tearing down a Suzuki Savage S40 that I bought with cash from a guy named Senator. I did not verify that this is his real name. Look forward to updates here.

RV-10 Experimental Airplane Build

My wife Jenna and I are building an RV-10! The RV-10 is a four-seat, 260-horsepower airplane that'll cruise at 200 miles per hour. I've posted a few pictures of the build to Twitter, but haven't done the project justice. I'll be writing about it and updating this section as the spring proceeds.

PaddleGuru + RaceHub

I spent two years, from November 2013 to November 2015, building PaddleGuru and RaceHub with Tim and Dave, good friends and teammates from the kayaking days.

Om-Bootstrap

Om-Bootstrap is a ClojureScript library of Bootstrap 3 components built on top of Om. This is my first big client-side library! Definitely a change of scenery.

Summingbird

Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations. So, while a word-counting aggregation in pure Scala might look like this:

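import scala.collection.mutable.{Map => MutableMap}

// toWords is assumed to split a sentence into its words.
def wordCount(source: Iterable[String], store: MutableMap[String, Long]) =
  source.flatMap { sentence =>
    toWords(sentence).map(_ -> 1L)
  }.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }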

Counting words in Summingbird looks like this:

def wordCount[P <: Platform[P]]
  (source: Producer[P, String], store: P#Store[String, Long]) =
    source.flatMap { sentence =>
      toWords(sentence).map(_ -> 1L)
    }.sumByKey(store)

The logic is exactly the same, and the code is almost the same. The main difference is that you can execute the Summingbird program in "batch mode" (using Scalding), in "realtime mode" (using Storm), or on both Scalding and Storm in a hybrid batch/realtime mode that offers your application very attractive fault-tolerance properties.

Summingbird provides you with the primitives you need to build rock-solid production systems.

Storehaus

Storehaus is a Scala library that makes it easy to work with asynchronous key-value stores.
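
To make "asynchronous key-value store" concrete, here is a minimal sketch of the idea in plain Scala. The names AsyncReadableStore and InMemoryStore are hypothetical and are not Storehaus's API; the sketch also assumes the standard library's Future, which may differ from what Storehaus itself uses.

```scala
import scala.concurrent.Future

object AsyncStoreSketch {
  // Hypothetical trait: a lookup returns immediately with a Future that
  // eventually holds Some(value), or None if the key is absent.
  trait AsyncReadableStore[K, V] {
    def get(key: K): Future[Option[V]]

    // Default multi-key lookup: one independent Future per requested key.
    def multiGet(keys: Set[K]): Map[K, Future[Option[V]]] =
      keys.map(k => k -> get(k)).toMap
  }

  // Hypothetical in-memory implementation, handy for tests. A real store
  // would talk to a remote service and complete the Future later.
  class InMemoryStore[K, V](backing: Map[K, V]) extends AsyncReadableStore[K, V] {
    def get(key: K): Future[Option[V]] = Future.successful(backing.get(key))
  }
}
```

Because callers only ever see Futures, the same calling code could run against an in-memory map in tests and a networked store in production.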

Bijection

A Bijection is a function that can be inverted. Practically, in Scala, Bijections are used to tell the type system about equivalent concepts that may have been defined in different libraries (scala.Int vs. java.lang.Integer, for example). The ability to declare these equivalences is hugely valuable.

Injection, a related trait included in the library, is a function that can only sometimes be inverted. (For example, every item of a given type might convert to a byte array, but not every byte array converts back into an item.) Injection and Bijection turn out to be wonderful at describing serializations. We use the concept heavily in Summingbird and other distributed systems at Twitter.
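
The difference between the two ideas can be sketched in a few lines of plain Scala. The traits SimpleBijection and SimpleInjection below are hypothetical stand-ins written for illustration, not the library's own definitions, and the Long-to-bytes encoding is just one assumed example of a partial inverse.

```scala
import scala.util.Try
import java.nio.ByteBuffer

object BijectionSketch {
  // Hypothetical trait: converts A to B and back without losing information.
  trait SimpleBijection[A, B] { self =>
    def apply(a: A): B
    def invert(b: B): A

    // Swapping the two directions yields the inverse bijection.
    def inverse: SimpleBijection[B, A] = new SimpleBijection[B, A] {
      def apply(b: B): A = self.invert(b)
      def invert(a: A): B = self.apply(a)
    }
  }

  // Hypothetical trait: A always converts to B, but the way back may fail.
  trait SimpleInjection[A, B] {
    def apply(a: A): B
    def invert(b: B): Try[A]
  }

  // scala.Int and java.lang.Integer represent exactly the same values.
  val intToBoxed: SimpleBijection[Int, java.lang.Integer] =
    new SimpleBijection[Int, java.lang.Integer] {
      def apply(a: Int): java.lang.Integer = java.lang.Integer.valueOf(a)
      def invert(b: java.lang.Integer): Int = b.intValue
    }

  // Every Long encodes to 8 bytes, but only 8-byte arrays decode to a Long.
  val longToBytes: SimpleInjection[Long, Array[Byte]] =
    new SimpleInjection[Long, Array[Byte]] {
      def apply(a: Long): Array[Byte] =
        ByteBuffer.allocate(8).putLong(a).array()
      def invert(b: Array[Byte]): Try[Long] = Try {
        require(b.length == 8, s"expected 8 bytes, got ${b.length}")
        ByteBuffer.wrap(b).getLong
      }
    }
}
```

Returning a Try from invert is one way to make the "sometimes" explicit in the type: a round trip through longToBytes succeeds, while decoding an arbitrary two-byte array yields a failure instead of a bogus value.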

Algebird

Algebird is an abstract algebra library for Scala. It is designed with streaming aggregations in mind and implements a number of types and combinators that are useful in a streaming MapReduce environment. The Monoid, for example, is a core concept of Summingbird, Twitter's streaming MapReduce library.
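
As a rough illustration of why a Monoid suits streaming aggregation, here is a minimal sketch in plain Scala. SimpleMonoid, longSum, mapMonoid and sumAll are hypothetical names chosen for this example and are not Algebird's API.

```scala
object MonoidSketch {
  // Hypothetical trait: an associative `plus` with an identity element `zero`.
  trait SimpleMonoid[T] {
    def zero: T
    def plus(left: T, right: T): T
  }

  // Longs form a monoid under addition.
  val longSum: SimpleMonoid[Long] = new SimpleMonoid[Long] {
    def zero: Long = 0L
    def plus(left: Long, right: Long): Long = left + right
  }

  // Maps form a monoid too: merge key by key using the value monoid.
  def mapMonoid[K, V](values: SimpleMonoid[V]): SimpleMonoid[Map[K, V]] =
    new SimpleMonoid[Map[K, V]] {
      def zero: Map[K, V] = Map.empty
      def plus(left: Map[K, V], right: Map[K, V]): Map[K, V] =
        right.foldLeft(left) { case (acc, (k, v)) =>
          acc.updated(k, values.plus(acc.getOrElse(k, values.zero), v))
        }
    }

  // Fold any collection of values down to a single aggregate.
  def sumAll[T](items: Iterable[T], monoid: SimpleMonoid[T]): T =
    items.foldLeft(monoid.zero)(monoid.plus)

  // Two partial word counts, e.g. from two batches of a stream.
  val batchA: Map[String, Long] = Map("a" -> 2L, "b" -> 1L)
  val batchB: Map[String, Long] = Map("a" -> 1L)
  val merged: Map[String, Long] =
    sumAll(List(batchA, batchB), mapMonoid[String, Long](longSum))
}
```

Because plus is associative, partial aggregates can be computed separately and merged in any grouping, which is what lets the same aggregation logic run over batches or over a live stream.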

Here are some of the more exotic data structures and algorithms in Algebird:

  • CountMinSketch
  • SketchMap
  • HyperLogLog
  • Stochastic Gradient Descent

Chill

Chill provides a number of enhancements to the Kryo JVM serialization library; notably, serializers for all Scala primitives and collection types, and plugins that make it easy to use Kryo in Hadoop and Storm jobs. Scalding, Cascalog, Spark and many other projects use Chill to manage serialization across their various distributed system implementations.

Tormenta

Tormenta provides a type-safe Scala DSL over Storm, along with Scala-friendly implementations of Kafka, Kestrel and Twitter Streaming API spouts for Storm.

FORMA

The Forest Monitoring for Action (FORMA) project provides free and open forest clearing alert data derived from MODIS satellite imagery every 16 days beginning in December 2005. I was the lead developer of FORMA's Clojure codebase from January 2011 to mid-2012.

Large Contributions

Here are other people's projects I've contributed to in large ways.

Cascalog

Cascalog is a Datalog implementation in Clojure that compiles queries down to Hadoop jobs. I've maintained Cascalog since late 2011 and authored many core features and modules, including midje-cascalog and cascalog-contrib. I'm currently working on Cascalog 2, which will allow Cascalog's query language to compile down to targets other than Hadoop (like Spark or Storm).

Scalding

Scalding is a Hadoop DSL written in Scala. I've contributed a number of designs and constructs to the codebase; many of these can be found in the scalding-commons project. Some examples are

ElephantDB

ElephantDB is a distributed read-only key-value store designed to be populated by Hadoop. I maintained ElephantDB during the first half of 2012 and performed a major rewrite that went into production at Twitter for a time.

Pallet

Pallet is a cloud provisioning system written in Clojure. I contributed a Hadoop cluster deploy tool called pallet-hadoop.

iOS Games

I developed the following games for iOS: