This literate essay develops an implementation of a type called Differential. A Differential is a generalization of a type called a "dual number", and the glowing, pulsing core of the SICMUtils implementation of forward-mode automatic differentiation.
As we'll discuss, passing these numbers as arguments to some function \(f\) built out of the sicmutils.generic operators allows us to build up the derivative of \(f\) in parallel to our evaluation of \(f\). Complex programs are built out of simple pieces that we know how to evaluate; we can build up derivatives of entire programs in a similar way, building them out of the derivatives of the smaller pieces of those programs.
(ns sicmutils.differential
"This namespace contains an implementation of [[Differential]], a generalized
dual number type that forms the basis for the forward-mode automatic
differentiation implementation in sicmutils.
See `sicmutils.calculus.derivative` for a fleshed-out derivative
implementation using [[Differential]]."
(:refer-clojure :rename {compare core-compare}
#?@(:cljs [:exclude [compare]]))
(:require [clojure.string :refer [join]]
[sicmutils.function :as f]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[sicmutils.util.stream :as us]
[sicmutils.util.vector-set :as uv]
[sicmutils.value :as v])
#?(:clj
(:import (clojure.lang AFn IFn))))
For many scientific computing applications, it's valuable to be able to generate a "derivative" of a function; given some wiggle in the inputs, how much wobble does the output produce?
We know how to take derivatives of many of the generic functions exposed by SICMUtils, like `+`, `*`, `sin` and friends. It turns out that we can take the derivatives of large, complicated functions by combining the derivatives of these smaller functions, using the chain rule as a clever bookkeeping device.
The technique of evaluating a function and its derivative in parallel is called "forward-mode Automatic Differentiation". The SICMUtils wiki has more information on the history of this technique, and links to the many other implementations you'll find in different languages. See the cljdocs Automatic Differentiation page for "how do I use this?"-style questions.
NOTE: The other flavor of automatic differentiation (AD) is "reverse-mode AD". See sicmutils.tape for an implementation of this style, coming soon!
Our goal is to build up derivatives of complex functions out of the derivatives of small pieces. A dual number is a relatively simple piece of machinery that will help us accomplish this goal.
A dual number is a pair of numbers of the form
\[
a+b\varepsilon
\]
where \(a\) and \(b\) are real numbers, and \(\varepsilon\) is an abstract thing, with the property that \(\varepsilon^2 = 0\).
NOTE: This might remind you of the definition of a complex number of the form \(a + bi\), where \(i\) is also a new thing with the property that \(i^2 = -1\). You are very wise! The bigger idea lurking here is the "generalized complex number", and of course mathematicians have pushed this very far. You might explore the "Split-complex numbers" too, which arise when you set \(i^2 = 1\).
Why are dual numbers useful (in SICMUtils)? If you pass \(a+b\varepsilon\) in to a function \(f\), the result is a dual number \(f(a) + Df(a) b \varepsilon\); you get both the function and its derivative evaluated at the same time!
To see why, look at what happens when you pass a dual number into the Taylor series expansion of some arbitrary function \(f\). As a reminder, the Taylor series expansion of \(f\) around some point \(a\) is:
\[
f(x) = f(a)+\frac{Df(a)}{1!}(x-a)+\frac{D^2f(a)}{2!}(x-a)^{2}+\frac{D^3f(a)}{3!}(x-a)^{3}+\cdots
\]
NOTE: See this nice overview of Taylor series expansion by Andrew Chamberlain if you want to understand this idea and why we can approximate (smooth) functions this way.
If you evaluate the expansion of \(f(x)\) around \(a\) with a dual number argument whose first component is \(a\) – take \(x=a+b\varepsilon\), for example – watch how the expansion simplifies:
\[
f(a+b\varepsilon) = f(a)+\frac{Df(a)}{1!}(b\varepsilon)+\frac{D^2f(a)}{2!}(b\varepsilon)^2+\cdots
\]
Since \(\varepsilon^2=0\) we can ignore all terms beyond the first two:
\[
f(a+b\varepsilon) = f(a)+ (Df(a)b)\varepsilon
\]
NOTE: See lift-1 for an implementation of this idea.
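As a quick sanity check, take \(f(x) = x^2\) (so \(Df(x) = 2x\)) and evaluate it directly at the dual number \(3 + \varepsilon\):

\[
(3+\varepsilon)^2 = 9 + 6\varepsilon + \varepsilon^2 = 9 + 6\varepsilon
\]

The first component is \(f(3) = 9\) and the \(\varepsilon\) coefficient is \(Df(3) = 6\), exactly as the truncated Taylor expansion predicts.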
Interesting! This justifies our claim above: applying a function to some dual number \(a+\varepsilon\) returns a new dual number, where the first component is \(f(a)\) and the \(\varepsilon\) coefficient is \(Df(a)\).
If we do this twice, the second component of the returned dual number beautifully recreates the Chain Rule:
\begin{aligned}
g(f(a+\varepsilon)) & = g(f(a) + Df(a)\varepsilon) \\
& = g(f(a)) + (Dg(f(a)))(Df(a))\varepsilon
\end{aligned}
A "dual number" is a very general idea. Because we're interested in dual numbers as a bookkeeping device for derivatives, we're going to specialize our terminology. From now on, we'll rename \(a\) and \(b\) to \(x\) and \(x'\). Given a dual number of the form \(x+x'\varepsilon\), we'll refer to:

- \(x\) as the "primal" part
- \(x'\) as the "tangent" part
- \(\varepsilon\) as the "tag"
NOTE: "primal" means \(x\) is tracking the "primal", or "primary", part of the computation. "tangent" is a synonym for "derivative". "tag" is going to make more sense shortly, when we start talking about mixing together multiple \(\varepsilon_1\), \(\varepsilon_2\) from different computations.
What about functions of more than one variable? We can use the same approach by leaning on the multivariable Taylor series expansion. Take \(f(x, y)\) as a binary example. If we pass dual numbers in to the Taylor series expansion of \(f\), the \(\varepsilon\) multiplication rule will erase all higher-order terms, leaving us with:
\[
f(x+x'\varepsilon, y+y'\varepsilon) = f(x,y) + \left(\partial_1 f(x,y)x' + \partial_2 f(x,y) y'\right)\varepsilon
\]
NOTE: See lift-2 for an implementation of this idea.
This expansion generalizes for n-ary functions; every new argument \(x_n + x'_n\varepsilon\) contributes \(\partial_n f(\ldots)x'_n\) to the coefficient of \(\varepsilon\) in the result.
We can check this with the simple cases of addition, subtraction and multiplication.
The real parts of a dual number add commutatively, so we can rearrange the components of a sum to get a new dual number:
\[
(x+x'\varepsilon)+(y+y'\varepsilon) = (x+y)+(x'+y')\varepsilon
\]
This matches the sum rule of differentiation, since the partials of \(x + y\) with respect to either \(x\) or \(y\) both equal 1.
Subtraction is almost identical and agrees with the subtraction rule:
\[
(x+x'\varepsilon)-(y+y'\varepsilon) = (x-y)+(x'-y')\varepsilon
\]
Multiplying out the components of two dual numbers again gives us a new dual number, whose tangent component agrees with the product rule:
\begin{aligned}
(x+x'\varepsilon)(y+y'\varepsilon) &= xy+(xy')\varepsilon+(x'y)\varepsilon+(x'y')\varepsilon^2 \\
&= xy+(xy'+x'y)\varepsilon
\end{aligned}
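As a numerical check of the multiplication rule, take \(x = 3, x' = 1, y = 5, y' = 2\):

\[
(3+\varepsilon)(5+2\varepsilon) = 15 + (3 \cdot 2 + 1 \cdot 5)\varepsilon = 15 + 11\varepsilon
\]

and indeed \(\partial_1(xy)x' + \partial_2(xy)y' = yx' + xy' = 5 \cdot 1 + 3 \cdot 2 = 11\).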
Stare at these smaller derivations and convince yourself that they agree with the Taylor series expansion method for binary functions.
The upshot is that, armed with these techniques, we can implement a higher-order `derivative` function (almost!) as simply as this:
(defn derivative [f]
(fn [x]
(extract-tangent
(f (make-dual x 1)))))
As long as `f` is built out of functions that know how to apply themselves to dual numbers, this will all Just Work.
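To make this concrete, here is a hypothetical trace of that `derivative` applied to `g/square`, writing the dual number \(a + b\varepsilon\) as `[a b]` for brevity (this bracket notation is illustrative only, not the actual representation):

```clojure
;; Illustrative trace; `[a b]` stands in for the dual number a + bε.
;;
;; ((derivative g/square) 3)
;; => (extract-tangent (g/square [3 1]))   ;; make-dual attached the 1ε
;; => (extract-tangent (g/* [3 1] [3 1])) ;; squaring multiplies x by itself
;; => (extract-tangent [9 6])             ;; product rule: 3·1 + 1·3 = 6
;; => 6                                   ;; Df(3) for f(x) = x², ie, 2·3
```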
All of the examples above are about first-order derivatives. Taking higher-order derivatives is, in theory, straightforward:
(derivative
(derivative f))
But this guess hits one of many subtle problems with the implementation of forward-mode AD. The double-call to `derivative` will expand out to this:
(fn [x]
(letfn [(inner-d [x]
(extract-tangent
(f (make-dual x 1))))]
(extract-tangent
(inner-d
(make-dual x 1)))))
The `x` received by `inner-d` will ALREADY be a dual number \(x+\varepsilon\)! This will cause two immediate problems:

- `(make-dual x 1)` will return \((x+\varepsilon)+\varepsilon = x+2\varepsilon\), which is not what we want
- The `extract-tangent` call inside `inner-d` will return the \(Df(x)\) component of the dual number… which, remember, is no longer a dual number! So the SECOND call to `extract-tangent` will have nothing to extract, and can only sensibly return 0.
The problem here is called "perturbation confusion", and is covered beautifully in "Confusion of Tagged Perturbations in Forward Automatic Differentiation of Higher-Order Functions", by Manzyuk et al. (2019).
The solution is to introduce a new \(\varepsilon\) for every level, and allow different \(\varepsilon\) instances to multiply without annihilating. Each \(\varepsilon\) is called a "tag". Differential (implemented below) is a generalized dual number that can track many tags at once, allowing nested derivatives like the one described above to work.
This implies that `extract-tangent` needs to take a tag, to determine which tangent to extract:
(defn derivative [f]
(let [tag (fresh-tag)]
(fn [x]
(-> (f (make-dual x 1 tag))
(extract-tangent tag)))))
This is close to the final form you'll find in derivative.
Before we discuss the implementation of dual numbers (called Differential), lift-1, lift-2 and the rest of the machinery that makes this all possible: what sorts of objects is `f` allowed to return?
The dual number approach is beautiful because we can bring to bear all sorts of operations in Clojure that never even see dual numbers. For example, `square-and-cube` called with a dual number returns a PAIR of dual numbers:
(defn square-and-cube [x]
(let [x2 (g/square x)
x3 (g/cube x)]
[x2 x3]))
Vectors don't care what they hold! We want the derivative of `square-and-cube` to also return a vector, whose entries represent the derivative of that entry with respect to the function's input.
But this implies that `extract-tangent` from the example above needs to know how to handle vectors and other collections; in the case of a vector `v`, by returning `(mapv extract-tangent v)`.
The dual number implementation is called Differential; the way that Differential instances interact with container types in SICMUtils makes it easy for these captures to occur all over. Whenever we multiply a Differential by a structure, a function, a vector, any of those things, our implementation of the SICMUtils generics pushes the Differential inside those objects, rather than forming a Differential with, for example, a vector in the primal and tangent parts.
Functions… interesting. What about higher-order functions?
(defn offset-fn
"Returns a function that takes a single-argument function `g`, and returns a new
function like `g` that offsets its input by `offset`."
[offset]
(fn [g]
(fn [x]
(g (+ x offset)))))
`(derivative offset-fn)` here returns a function! Ideally we'd like the returned function to act exactly like:

(derivative
 (fn [offset] (g (+ x offset))))

for some known `g` and `x`, but with the ability to store `(derivative offset-fn)` and call it later with many different `g`.
We might accomplish this by composing `extract-tangent` with the returned function, so that the extraction happens later, when the function's called.
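We might sketch that composition like this (a hypothetical helper; as the note below explains, the real machinery also has to juggle tag replacement):

```clojure
(defn- extract-tangent-fn
  "Sketch: wraps the function-valued `f` so that tangent extraction (with
  respect to `tag`) is deferred until the returned function is called."
  [f tag]
  (fn [& args]
    (extract-tangent (apply f args) tag)))
```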
NOTE: The real implementation is more subtle! See the `sicmutils.calculus.derivative` namespace for the actual implementation of IPerturbed for functions and multimethods.
All of this suggests that we need to make extract-tangent an open function that other folks can extend for other container-like types (functors, specifically).
The IPerturbed protocol accomplishes this, along with two other functions that we'll use later:
(defprotocol IPerturbed
(perturbed? [this]
"Returns true if the supplied object has some known non-zero tangent to be
extracted via [[extract-tangent]], false otherwise. (Return `false` by
default if you can't detect a perturbation.)")
(replace-tag [this old-tag new-tag]
"If `this` is perturbed, returns a similar object with the perturbation
modified by replacing any appearance of `old-tag` with `new-tag`. Else,
return `this`.")
(extract-tangent [this tag]
"If `this` is perturbed, return the tangent component paired with the
supplied tag. Else, returns `([[sicmutils.value/zero-like]] this)`."))
`replace-tag` exists to handle subtle bugs that can arise in the case of functional return values. See the "Amazing Bug" sections in `sicmutils.calculus.derivative-test` for detailed examples on how this might bite you.
The default implementations are straightforward, and match the docstrings:
(extend-protocol IPerturbed
#?(:clj Object :cljs default)
(perturbed? [_] false)
(replace-tag [this _ _] this)
(extract-tangent [this _] (v/zero-like this)))
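With those defaults in place, any plain (non-perturbed) value responds sensibly:

```clojure
(perturbed? "cake")     ;; => false
(replace-tag 10 0 1)    ;; => 10, unchanged
(extract-tangent 10 0)  ;; => 0, ie, (v/zero-like 10)
```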
We now have a template for how to implement `derivative`. What's left? We need a dual number type that we can build and split back out into primal and tangent components, given some tag. We'll call this type a Differential.
A Differential is a generalized dual number with a single primal component, and potentially many tagged terms. Identical tags cancel to 0 when multiplied, but are allowed to multiply by each other:
\[
a + b\varepsilon_1 + c\varepsilon_2 + d\varepsilon_1 \varepsilon_2 + \cdots
\]
Alternatively, you can view a Differential as a dual number with a specific tag, that's able to hold dual numbers with some other tag in its primal and tangent slots. You can turn a Differential into a dual number by specifying one of its tags. Here are the primal and tangent components for \(\varepsilon_2\):
\[
(a + b\varepsilon_1) + (c + d\varepsilon_1)\varepsilon_2
\]
And for \(\varepsilon_1\):
\[
(a + c\varepsilon_2) + (b + d \varepsilon_2) \varepsilon_1
\]
A differential term is implemented as a pair whose first element is a set of tags and whose second is the coefficient.
(def ^:private tags first)
(def ^:private coefficient peek)
The set of tags is implemented as a "vector set", from sicmutils.util.vector-set. This is a sorted set data structure, backed by a Clojure vector. Vector sets provide cheap "max" and "min" operations, and acceptable union, intersection and difference performance.
(defn- make-term
"Returns a [[Differential]] term with the supplied vector-set of `tags` paired
with coefficient `coef`.
`tags` defaults to [[uv/empty-set]]"
([coef] [uv/empty-set coef])
([tags coef] [tags coef]))
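For example, a term representing \(3\varepsilon_0\varepsilon_1\) looks like this (REPL-style illustration; `make-term`, `tags` and `coefficient` are private to this namespace):

```clojure
(def term (make-term (uv/make [0 1]) 3))
;; => [[0 1] 3]

(tags term)        ;; => [0 1]
(coefficient term) ;; => 3
```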
Since the only use of a tag is to distinguish each unnamed \(\varepsilon_n\), we'll assign a new, unique positive integer for each new tag:
(let [next-tag (atom 0)]
(defn fresh-tag
"Returns a new, unique tag for use inside of a [[Differential]] term list."
[]
(swap! next-tag inc)))
(defn- tag-in-term?
"Return true if `t` is in the tag-set of the supplied `term`, false otherwise."
[term t]
(uv/contains? (tags term) t))
The discussion above about Taylor series expansions revealed that for single variable functions, we can pass a dual number into any function whose derivative we already know:
\[
f(a+b\varepsilon) = f(a) + (Df(a)b)\varepsilon
\]
Because we can split a Differential into a primal and tangent component with respect to some tag, we can reuse this result. We'll default to splitting Differential instances by the highest-index tag:
\begin{aligned}
f(a &+ b\varepsilon_1 + c\varepsilon_2 + d\varepsilon_1 \varepsilon_2) \\
&= f((a + b\varepsilon_1)+(c + d\varepsilon_1)\varepsilon_2) \\
&= f(a + b\varepsilon_1)+Df(a + b\varepsilon_1)(c + d\varepsilon_1)\varepsilon_2
\end{aligned}
Note that \(f\) and \(Df\) both received a dual number! One more expansion, this time in \(\varepsilon_1\), completes the evaluation (and makes abundantly clear why we want the computer doing this, not pencil-and-paper):
\begin{aligned}
f(a &+ b\varepsilon_1)+Df(a+b\varepsilon_1)(c+d\varepsilon_1)\varepsilon_2 \\
&= (f(a)+Df(a)b\varepsilon_1)+(Df(a)+D^2f(a)b\varepsilon_1)(c + d\varepsilon_1)\varepsilon_2 \\
&= f(a)+Df(a)b\varepsilon_1+Df(a)c\varepsilon_2+(Df(a)d+D^2f(a)bc)\varepsilon_1\varepsilon_2
\end{aligned}
The only operations we need to implement between lists of terms are addition and multiplication.
To efficiently add two Differential instances (represented as vectors of terms), we keep all terms in sorted order, sorted first by the length of each tag list (the "order" of the differential term), and secondarily by the values of the tags inside the tag list.
NOTE: Clojure vectors already implement this ordering properly, so we can use `clojure.core/compare` to determine an ordering on a tag list.
(defn- terms:+
"Returns the sum of the two supplied sequences of differential terms; any terms
in the result with a zero coefficient will be removed.
Each input must be sequence of `[tag-set, coefficient]` pairs, sorted by
`tag-set`."
[xs ys]
(loop [xs xs, ys ys, result []]
(cond (empty? xs) (into result ys)
(empty? ys) (into result xs)
:else (let [[x-tags x-coef :as x] (first xs)
[y-tags y-coef :as y] (first ys)
compare-flag (core-compare x-tags y-tags)]
(cond
;; If the terms have the same tag set, add the coefficients
;; together. Include the term in the result only if the new
;; coefficient is non-zero.
(zero? compare-flag)
(let [sum (g/+ x-coef y-coef)]
(recur (rest xs)
(rest ys)
(if (v/zero? sum)
result
(conj result (make-term x-tags sum)))))
;; Else, pass the smaller term on unchanged and proceed.
(neg? compare-flag)
(recur (rest xs) ys (conj result x))
:else
(recur xs (rest ys) (conj result y)))))))
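For example, using bare vectors as stand-ins for vector-sets (`terms:+` is private, so treat this as a REPL-style illustration):

```clojure
;; (1 + 2ε₀) + (10 + 3ε₁) = 11 + 2ε₀ + 3ε₁
(terms:+ [[[] 1] [[0] 2]]
         [[[] 10] [[1] 3]])
;; => [[[] 11] [[0] 2] [[1] 3]]
```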
Because we've decided to store terms as a vector, we can multiply two vectors of terms by:

- taking the cartesian product of the two term lists
- discarding any pair of terms that share a tag (since \(\varepsilon^2 = 0\))
- multiplying the coefficients and taking the union of the tag sets for each remaining pair
- grouping the resulting terms by tag set, summing the coefficients in each group, and sorting the result
This final step is required by a number of different operations later, so we break it out into its own `collect-terms` function:
(defn- collect-terms
"Build a term list up of pairs of tags => coefficients by grouping together and
summing coefficients paired with the same term list.
The result will be sorted by term list, and contain no duplicate term lists."
[tags->coefs]
(let [terms (for [[tags tags-coefs] (group-by tags tags->coefs)
:let [c (transduce (map coefficient) g/+ tags-coefs)]
:when (not (v/zero? c))]
[tags c])]
(into [] (sort-by tags terms))))
`terms:*` implements the first three steps, and calls `collect-terms` on the resulting sequence:
(defn- terms:*
"Returns a vector of non-zero [[Differential]] terms that represent the product
of the differential term lists `xs` and `ys`."
[xs ys]
(collect-terms
(for [[x-tags x-coef] xs
[y-tags y-coef] ys
:when (empty? (uv/intersection x-tags y-tags))]
(make-term (uv/union x-tags y-tags)
(g/* x-coef y-coef)))))
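For example (again with bare vectors standing in for vector-sets, as an illustration of this private function), multiplying \(1 + 2\varepsilon_0\) by \(3 + 4\varepsilon_0\) erases the \(\varepsilon_0^2\) cross term:

```clojure
(terms:* [[[] 1] [[0] 2]]
         [[[] 3] [[0] 4]])
;; => [[[] 3] [[0] 10]]
;; ie, (1 + 2ε₀)(3 + 4ε₀) = 3 + (1·4 + 2·3)ε₀, with the 8ε₀² term dropped.
```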
Armed with our term list arithmetic operations, we can finally implement our Differential type and implement a number of important Clojure and SICMUtils protocols.
A Differential will respond to `v/kind` with `::differential`. Because we want Differential instances to work in any place that real numbers or symbolic arguments work, let's make `::differential` derive from `::v/scalar`:
(derive ::differential ::v/scalar)
Now the actual type. The `terms` field is a term-list vector that will remain (contractually!) sorted by its list of tags.
(declare d:apply compare equiv from-terms one?)
(deftype Differential [terms]
;; A [[Differential]] as implemented can act as a chain-rule accounting device
;; for all sorts of types, not just numbers. A [[Differential]] is
;; only [[v/numerical?]] if its coefficients are numerical.
v/Numerical
(numerical? [_]
(v/numerical? (coefficient (first terms))))
IPerturbed
(perturbed? [_] true)
;; There are 3 cases to consider when replacing some tag in a term, annotated
;; below:
(replace-tag [_ oldtag newtag]
(letfn [(process [term]
(let [tagv (tags term)]
(if-not (uv/contains? tagv oldtag)
;; if the term doesn't contain the old tag, ignore it.
[term]
(if (uv/contains? tagv newtag)
;; if the term _already contains_ the new tag
;; $\varepsilon_{new}$, then replacing $\varepsilon_{old}$
;; with a new instance of $\varepsilon_{new}$ would cause a
;; clash. Since $\varepsilon_{new}^2=0$, the term should be
;; removed.
[]
;; else, perform the replacement.
[(make-term (-> tagv
(uv/disj oldtag)
(uv/conj newtag))
(coefficient term))]))))]
(from-terms
(mapcat process terms))))
;; To extract the tangent (with respect to `tag`) from a differential, return
;; all terms that contain the tag (with the tag removed!) This can create
;; duplicate terms, so use [[from-terms]] to massage the result into
;; well-formedness again.
(extract-tangent [_ tag]
(from-terms
(mapcat (fn [term]
(let [tagv (tags term)]
(if (uv/contains? tagv tag)
[(make-term (uv/disj tagv tag)
(coefficient term))]
[])))
terms)))
v/Value
(zero? [this]
(every? (comp v/zero? coefficient) terms))
(one? [this] (one? this))
(identity? [this] (one? this))
(zero-like [_] 0)
(one-like [_] 1)
(identity-like [_] 1)
(freeze [_] `[~'Differential ~@terms])
(exact? [_] false)
(kind [_] ::differential)
Object
;; Comparing [[Differential]] objects using `equals` defaults to [[equiv]],
;; which compares instances only using their non-tagged ('finite') components.
;; If you want to compare two instances using their full term lists,
;; See [[eq]].
#?(:clj (equals [a b] (equiv a b)))
(toString [_] (str "D[" (join " " (map #(join " → " %) terms)) "]"))
;; Because a [[Differential]] is an accounting device that augments other
;; operations with the ability to carry around derivatives, it's possible that
;; the coefficient slots could be occupied by function objects. If so, then it
;; becomes possible to "apply" a [[Differential]] to some arguments (apply
;; each coefficient to the arguments).
;; TODO the arity, if anyone cares to ask, might be better implemented as the
;; joint arity of all coefficients; but my guess here is that the tangent
;; terms all have to be tracking derivatives of the primal term, which have to
;; have the same arity by definition.
f/IArity
(arity [_]
(f/arity (coefficient (first terms))))
#?@(:clj
;; This one is slightly subtle. To participate in control flow operations,
;; like comparison with both [[Differential]] and non-[[Differential]]
;; numbers, [[Differential]] instances should compare using ONLY their
;; non-tagged ("finite") terms. This means that comparison will totally
;; ignore any difference in tags.
[Comparable
(compareTo [a b] (compare a b))
IFn
(invoke [this]
(d:apply this []))
(invoke [this a]
(d:apply this [a]))
(invoke [this a b]
(d:apply this [a b]))
(invoke [this a b c]
(d:apply this [a b c]))
(invoke [this a b c d]
(d:apply this [a b c d]))
(invoke [this a b c d e]
(d:apply this [a b c d e]))
(invoke [this a b c d e f]
(d:apply this [a b c d e f]))
(invoke [this a b c d e f g]
(d:apply this [a b c d e f g]))
(invoke [this a b c d e f g h]
(d:apply this [a b c d e f g h]))
(invoke [this a b c d e f g h i]
(d:apply this [a b c d e f g h i]))
(invoke [this a b c d e f g h i j]
(d:apply this [a b c d e f g h i j]))
(invoke [this a b c d e f g h i j k]
(d:apply this [a b c d e f g h i j k]))
(invoke [this a b c d e f g h i j k l]
(d:apply this [a b c d e f g h i j k l]))
(invoke [this a b c d e f g h i j k l m]
(d:apply this [a b c d e f g h i j k l m]))
(invoke [this a b c d e f g h i j k l m n]
(d:apply this [a b c d e f g h i j k l m n]))
(invoke [this a b c d e f g h i j k l m n o]
(d:apply this [a b c d e f g h i j k l m n o]))
(invoke [this a b c d e f g h i j k l m n o p]
(d:apply this [a b c d e f g h i j k l m n o p]))
(invoke [this a b c d e f g h i j k l m n o p q]
(d:apply this [a b c d e f g h i j k l m n o p q]))
(invoke [this a b c d e f g h i j k l m n o p q r]
(d:apply this [a b c d e f g h i j k l m n o p q r]))
(invoke [this a b c d e f g h i j k l m n o p q r s]
(d:apply this [a b c d e f g h i j k l m n o p q r s]))
(invoke [this a b c d e f g h i j k l m n o p q r s t]
(d:apply this [a b c d e f g h i j k l m n o p q r s t]))
(applyTo [this xs] (AFn/applyToHelper this xs))]
:cljs
[IEquiv
(-equiv [a b] (equiv a b))
IComparable
(-compare [a b] (compare a b))
IPrintWithWriter
(-pr-writer [x writer _]
(write-all writer (.toString x)))
IFn
(-invoke [this]
(d:apply this []))
(-invoke [this a]
(d:apply this [a]))
(-invoke [this a b]
(d:apply this [a b]))
(-invoke [this a b c]
(d:apply this [a b c]))
(-invoke [this a b c d]
(d:apply this [a b c d]))
(-invoke [this a b c d e]
(d:apply this [a b c d e]))
(-invoke [this a b c d e f]
(d:apply this [a b c d e f]))
(-invoke [this a b c d e f g]
(d:apply this [a b c d e f g]))
(-invoke [this a b c d e f g h]
(d:apply this [a b c d e f g h]))
(-invoke [this a b c d e f g h i]
(d:apply this [a b c d e f g h i]))
(-invoke [this a b c d e f g h i j]
(d:apply this [a b c d e f g h i j]))
(-invoke [this a b c d e f g h i j k]
(d:apply this [a b c d e f g h i j k]))
(-invoke [this a b c d e f g h i j k l]
(d:apply this [a b c d e f g h i j k l]))
(-invoke [this a b c d e f g h i j k l m]
(d:apply this [a b c d e f g h i j k l m]))
(-invoke [this a b c d e f g h i j k l m n]
(d:apply this [a b c d e f g h i j k l m n]))
(-invoke [this a b c d e f g h i j k l m n o]
(d:apply this [a b c d e f g h i j k l m n o]))
(-invoke [this a b c d e f g h i j k l m n o p]
(d:apply this [a b c d e f g h i j k l m n o p]))
(-invoke [this a b c d e f g h i j k l m n o p q]
(d:apply this [a b c d e f g h i j k l m n o p q]))
(-invoke [this a b c d e f g h i j k l m n o p q r]
(d:apply this [a b c d e f g h i j k l m n o p q r]))
(-invoke [this a b c d e f g h i j k l m n o p q r s]
(d:apply this [a b c d e f g h i j k l m n o p q r s]))
(-invoke [this a b c d e f g h i j k l m n o p q r s t]
(d:apply this [a b c d e f g h i j k l m n o p q r s t]))
(-invoke [this a b c d e f g h i j k l m n o p q r s t rest]
(d:apply this (concat [a b c d e f g h i j k l m n o p q r s t] rest)))]))
#?(:clj
(defmethod print-method Differential
[^Differential s ^java.io.Writer w]
(.write w (.toString s))))
(defn differential?
"Returns true if the supplied object is an instance of `Differential`, false
otherwise."
[dx]
(instance? Differential dx))
(defn- bare-terms
"Returns the `-terms` field of the supplied `Differential` object. Errors if any
other type is supplied."
[dx]
{:pre [(differential? dx)]}
(.-terms ^Differential dx))
Because a Differential is really a wrapper around the idea of a generalized dual number represented as a term-list, we need to be able to get to and from the term list format from other types, not just Differential instances.
(defn- ->terms
"Returns a vector of terms that represent the supplied [[Differential]]; any
term with a [[v/zero?]] coefficient will be filtered out before return.
If you pass a non-[[Differential]], [[->terms]] will return a singleton term
list (or `[]` if the argument was zero)."
[dx]
(cond (differential? dx)
(filterv (fn [term]
(not (v/zero? (coefficient term))))
(bare-terms dx))
(v/zero? dx) []
:else [(make-term dx)]))
(defn- terms->differential
"Returns a differential instance generated from a vector of terms. This method
will do some mild cleanup, or canonicalization:
- any empty term list will return 0
- a singleton term list with no tags will return its coefficient
NOTE this method assumes that the input is properly sorted, and contains no
zero coefficients."
[terms]
{:pre [(vector? terms)]}
(cond (empty? terms) 0
(and (= (count terms) 1)
(empty? (tags (first terms))))
(coefficient (first terms))
:else (->Differential terms)))
(defn from-terms
"Accepts a sequence of terms (pairs of [tag-list, coefficient]), and returns:
- a well-formed [[Differential]] instance, if the terms resolve to a
differential with non-zero infinitesimal terms
- the original input otherwise
Duplicate (by tag list) terms are allowed; their coefficients will be summed
together and removed if they sum to zero."
[tags->coefs]
(terms->differential
(collect-terms tags->coefs)))
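A few REPL-style examples of this canonicalization in action (bare vectors stand in for vector-sets):

```clojure
(from-terms [])                ;; => 0, the empty term list
(from-terms [[[] 5]])          ;; => 5, a lone tag-free term collapses
(from-terms [[[] 5] [[] -5]])  ;; => 0, duplicate terms sum to zero
(from-terms [[[0] 1]])         ;; => a Differential holding the term [[0] 1]
```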
This next section lifts slightly-augmented versions of `terms:+` and `terms:*` up to operate on Differential instances. These work just as before, but handle wrapping and unwrapping the term list.
(defn d:+
"Returns an object representing the sum of the two objects `dx` and `dy`. This
works by summing the coefficients of all terms with the same list of tags.
Works with non-[[Differential]] instances on either or both sides, and returns
a [[Differential]] only if it contains any non-zero tangent components."
[dx dy]
(terms->differential
(terms:+ (->terms dx)
(->terms dy))))
(defn d:*
"Returns an object representing the product of the two objects `dx` and `dy`.
This works by multiplying out all terms:
$$(dx1 + dx2 + dx3 + ...)(dy1 + dy2 + dy3...)$$
and then collecting any duplicate terms by summing their coefficients.
Works with non-[[Differential]] instances on either or both sides, and returns
a [[Differential]] only if it contains any non-zero tangent components."
[dx dy]
(terms->differential
(terms:* (->terms dx)
(->terms dy))))
(defn- d:apply
"Accepts a [[Differential]] and a sequence of `args`, interprets each
coefficient as a function and returns a new [[Differential]] generated by
applying the coefficient to `args`."
[diff args]
(terms->differential
(into [] (mapcat (fn [term]
(let [result (apply (coefficient term) args)]
(if (v/zero? result)
[]
[(make-term (tags term) result)]))))
(->terms diff))))
Finally, the function we've been waiting for! `bundle` allows you to augment some non-Differential thing with a tag and push it through the generic arithmetic system, where it will accumulate the derivative of your original input (tagged with `tag`).
(defn bundle
"Generate a new [[Differential]] object with the supplied `primal` and `tangent`
components, and the supplied internal `tag` that this [[Differential]] will
carry around to prevent perturbation confusion.
If the `tangent` component is `0`, acts as identity on `primal`. `tangent`
defaults to 1.
`tag` defaults to a side-effecting call to [[fresh-tag]]; you can retrieve
this unknown tag by calling [[max-order-tag]]."
([primal]
(bundle primal 1 (fresh-tag)))
([primal tag]
(bundle primal 1 tag))
([primal tangent tag]
(let [term (make-term (uv/make [tag]) tangent)]
(d:+ primal (->Differential [term])))))
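For example, squaring the dual number \(3 + \varepsilon_0\) with these operations (an illustrative REPL session):

```clojure
(let [x (bundle 3 1 0)]  ;; the generalized dual number 3 + ε₀
  (d:* x x))
;; => a Differential representing 9 + 6ε₀:
;;    primal 9 = 3², tangent coefficient 6 = D(x²) evaluated at 3
```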
These functions give higher-level access to the components of a Differential you're typically interested in.
(defn max-order-tag
"Given one or more well-formed [[Differential]] objects, returns the
maximum ('highest order') tag found in the highest-order term of any of
the [[Differential]] instances.
If there is NO maximal tag (ie, if you provide [[Differential]] instances with
no non-zero tangent parts, or all non-[[Differential]]s), returns nil."
([dx]
(when (differential? dx)
(let [last-term (peek (->terms dx))
highest-tag (peek (tags last-term))]
highest-tag)))
([dx & dxs]
(letfn [(max-termv [dx]
(if-let [max-order (max-order-tag dx)]
[max-order]
[]))]
(when-let [orders (seq (mapcat max-termv (cons dx dxs)))]
(apply max orders)))))
A reminder: the `primal-part` of a Differential is all terms except for terms containing `max-order-tag`, and `tangent-part` is a Differential built out of the remaining terms, all of which contain that tag.
(defn primal-part
"Returns a [[Differential]] containing only the terms of `dx` that do NOT
contain the supplied `tag`, ie, the primal component of `dx` with respect to
`tag`.
If no tag is supplied, defaults to `([[max-order-tag]] dx)`.
NOTE: every [[Differential]] can be factored into a dual number of the form
primal + (tangent * tag)
For each tag in any of its terms. [[primal-part]] returns this first piece,
potentially simplified into a non-[[Differential]] if the primal part contains
no other tags."
([dx] (primal-part dx (max-order-tag dx)))
([dx tag]
(if (differential? dx)
(let [sans-tag? #(not (tag-in-term? % tag))]
(->> (->terms dx)
(filterv sans-tag?)
(terms->differential)))
dx)))
(defn tangent-part
"Returns a [[Differential]] containing only the terms of `dx` that contain the
supplied `tag`, ie, the tangent component of `dx` with respect to `tag`.
If no tag is supplied, defaults to `([[max-order-tag]] dx)`.
NOTE: Every [[Differential]] can be factored into a dual number of the form
primal + (tangent * tag)
For each tag in any of its terms. [[tangent-part]] returns a [[Differential]]
representing `(tangent * tag)`, or 0 if `dx` contains no terms with the
supplied `tag`.
NOTE: the 2-arity case is similar to `(extract-tangent dx tag)`; the only
difference is that `extract-tangent` drops the `dx` tag from all terms in the
returned value. Call `extract-tangent` if you want to drop `tag`."
([dx] (tangent-part dx (max-order-tag dx)))
([dx tag]
(if (differential? dx)
(->> (->terms dx)
(filterv #(tag-in-term? % tag))
(terms->differential))
0)))
(defn primal-tangent-pair
"Returns a pair of the primal and tangent components of the supplied `dx`, with
respect to the supplied `tag`. See the docs for [[primal-part]]
and [[tangent-part]] for more details.
[[primal-tangent-pair]] is equivalent to
`[([[primal-part]] dx tag) ([[tangent-part]] dx tag)]`
but slightly more efficient if you need both."
([dx] (primal-tangent-pair dx (max-order-tag dx)))
([dx tag]
(if-not (differential? dx)
[dx 0]
(let [[tangent-terms primal-terms]
(us/separatev #(tag-in-term? % tag)
(->terms dx))]
[(terms->differential primal-terms)
(terms->differential tangent-terms)]))))
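The split that primal-tangent-pair performs is easy to see on a toy term list. Here's a minimal standalone sketch (in Python, independent of SICMUtils) where a term pairs a set of tags with a coefficient; the names are illustrative simplifications of the real implementation:

```python
# Splitting on a tag mirrors primal-tangent-pair: terms WITHOUT the tag
# form the primal part; terms WITH the tag form the tangent part.

def primal_tangent_pair(terms, tag):
    primal = [(tags, c) for (tags, c) in terms if tag not in tags]
    tangent = [(tags, c) for (tags, c) in terms if tag in tags]
    return primal, tangent

# 5 + 2e0 + 3e1 + 4e0e1, written as a term list:
terms = [(frozenset(), 5), (frozenset({0}), 2),
         (frozenset({1}), 3), (frozenset({0, 1}), 4)]

primal, tangent = primal_tangent_pair(terms, 1)
# primal keeps 5 and 2e0; tangent keeps 3e1 and 4e0e1.
```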
(defn finite-term
"Returns the term of the supplied [[Differential]] `dx` that has no tags
attached to it, `0` otherwise.
  [[Differential]] instances with many tags can be decomposed many times
into [[primal-part]] and [[tangent-part]]. Repeated calls
to [[primal-part]] (with different tags!) will eventually yield a
non-[[Differential]] value. If you know you want this, [[finite-term]] will
get you there in one shot.
NOTE that this will only work with a well-formed [[Differential]], ie, an
instance with all terms sorted by their list of tags."
[dx]
(if (differential? dx)
(let [[head] (bare-terms dx)
ts (tags head)]
(if (= [] ts)
(coefficient head)
0))
dx))
Functions like =, < and friends don't have derivatives; instead, they're used for control flow inside of Clojure functions. To play nicely with these functions, the Differential API exposes a number of methods for comparing numbers on ONLY their finite parts.
Why? If x is a Differential instance, (< x 10) needs to return true whenever a non-Differential x would return true. To make this work, these operations look only at the finite-part.
HOWEVER! v/one? and v/zero? are examples of SICMUtils functions that are used to skip operations that we want to happen, like multiplication. (g/* x y) will return y if (v/one? x) is true… but to propagate the derivative through we need this multiplication to occur. The compromise is:
- v/one? and v/zero? return true only when ALL tangent-parts are zero and the finite-part is either v/one? or v/zero? respectively
- eq and compare-full similarly look at every component of the Differential supplied to both sides
- while equiv and compare only examine the finite-part of either side.
(defn one?
"Returns true if the supplied instance has a [[finite-part]] that responds true
to [[sicmutils.value/one?]], and zero coefficients on any of its tangent
components; false otherwise.
NOTE: This means that [[one?]] will not do what you expect as a conditional
inside some function. If you want to branch inside some function you're taking
the derivative of, prefer `(= 1 dx)`. This will only look at
the [[finite-part]] and ignore the values of the tangent parts."
[dx]
(let [[p t] (primal-tangent-pair dx)]
(and (v/one? p)
(v/zero? t))))
(defn eq
"For non-differentials, this is identical to [[clojure.core/=]].
For [[Differential]] instances, equality acts on tangent components too.
If you want to ignore the tangent components, use [[equiv]]."
([_] true)
([a b]
(= (->terms a)
(->terms b)))
([a b & more]
(reduce eq (eq a b) more)))
(defn compare-full
"Comparator that compares [[Differential]] instances with each other or
non-differentials using all tangent terms each instance. Matches the response
of [[eq]].
Acts as [[clojure.core/compare]] for non-differentials."
[a b]
(core-compare
(->terms a)
(->terms b)))
(defn equiv
"Returns true if all of the supplied objects have equal [[finite-part]]s, false
otherwise.
Use [[equiv]] if you want to compare non-differentials with
[[Differential]]s and ignore all tangent components. If you _do_ want to take
the tangent components into account, prefer [[eq]]."
([_] true)
([a b]
(= (finite-term a)
(finite-term b)))
([a b & more]
(reduce equiv (equiv a b) more)))
(defn compare
"Comparator that compares [[Differential]] instances with each other or
non-differentials using only the [[finite-part]] of each instance. Matches the
response of [[equiv]].
Acts as [[clojure.core/compare]] for non-differentials."
[a b]
(core-compare
(finite-term a)
(finite-term b)))
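The difference between the two comparison styles is easy to demonstrate with a tiny standalone sketch (Python, not the SICMUtils API), representing a dual as a (finite, tangent) pair:

```python
# eq-style compares every component; equiv-style compares finite parts only.

def eq_full(a, b):
    return a == b            # tangent components count

def equiv(a, b):
    return a[0] == b[0]      # only the finite part counts

x = (10, 1)   # 10 + 1*eps
y = (10, 0)   # the number 10, viewed as a dual

assert not eq_full(x, y)  # differs in the tangent component
assert equiv(x, y)        # finite parts agree
```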
Finally, we come to the heart of it! lift-1 and lift-2 "lift", or augment, unary or binary functions with the ability to handle Differential instances in addition to whatever other types they previously supported.
These functions are implementations of the single and multivariable Taylor series expansion methods discussed at the beginning of the namespace.
There is yet another subtlety here, noted in the docstrings below. lift-1 and lift-2 really are able to lift functions like clojure.core/+ that can't accept Differentials. But the first-order derivatives that you have to supply DO have to be able to take Differential instances. This is because the tangent-part of a Differential might still be a Differential, and for Df to handle this we need to be able to take the second-order derivative. Magically this will all Just Work if you pass an already-lifted function, or a function built out of already-lifted components, as df:dx or df:dy.
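The chain-rule bookkeeping that lift-1 performs can be sketched over first-order, single-tag duals (a deliberate simplification of the general Differential) in standalone Python:

```python
import math

# A dual is a (primal, tangent) pair; since eps**2 == 0,
# f(p + t*eps) == f(p) + f'(p) * t * eps.

def lift1(f, df):
    def call(x):
        if not isinstance(x, tuple):
            return f(x)
        p, t = x
        return (call(p), df(p) * t)  # value, chain-rule tangent
    return call

sin = lift1(math.sin, math.cos)
fx, dfx = sin((0.5, 1.0))
# fx is sin(0.5); dfx is cos(0.5), the derivative of sin at 0.5.
```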
(defn- lift-1
"Given:
- some unary function `f`
- a function `df:dx` that computes the derivative of `f` with respect to its
single argument
Returns a new unary function that operates on both the original type of `f`
and [[Differential]] instances.
NOTE: `df:dx` has to ALREADY be able to handle [[Differential]] instances. The
best way to accomplish this is by building `df:dx` out of already-lifted
functions, and declaring them by forward reference if you need to."
[f df:dx]
(fn call [x]
(if-not (differential? x)
(f x)
(let [[px tx] (primal-tangent-pair x)
fx (call px)]
(if (and (v/number? tx) (v/zero? tx))
fx
(d:+ fx (d:* (df:dx px) tx)))))))
(defn- lift-2
"Given:
- some binary function `f`
- a function `df:dx` that computes the derivative of `f` with respect to its
single argument
- a function `df:dy`, similar to `df:dx` for the second arg
Returns a new binary function that operates on both the original type of `f`
and [[Differential]] instances.
NOTE: `df:dx` and `df:dy` have to ALREADY be able to handle [[Differential]]
instances. The best way to accomplish this is by building `df:dx` and `df:dy`
out of already-lifted functions, and declaring them by forward reference if
you need to."
[f df:dx df:dy]
(fn call [x y]
(if-not (or (differential? x)
(differential? y))
(f x y)
(let [tag (max-order-tag x y)
[xe dx] (primal-tangent-pair x tag)
[ye dy] (primal-tangent-pair y tag)
a (call xe ye)
b (if (and (v/number? dx) (v/zero? dx))
a
(d:+ a (d:* dx (df:dx xe ye))))]
(if (and (v/number? dy) (v/zero? dy))
b
(d:+ b (d:* (df:dy xe ye) dy)))))))
(defn- lift-n
"Given:
- some function `f` that can handle 0, 1 or 2 arguments
- `df:dx`, a fn that returns the derivative wrt the single arg in the unary case
- `df:dx1` and `df:dx2`, fns that return the derivative with respect to the
first and second args in the binary case
Returns a new any-arity function that operates on both the original type of
`f` and [[Differential]] instances.
NOTE: The n-ary case of `f` is populated by nested calls to the binary case.
That means that this is NOT an appropriate lifting method for an n-ary
function that isn't built out of associative binary calls. If you need this
ability, please file an issue at the [sicmutils issue
tracker](https://github.com/sicmutils/sicmutils/issues)."
[f df:dx df:dx1 df:dx2]
(let [f1 (lift-1 f df:dx)
f2 (lift-2 f df:dx1 df:dx2)]
(fn call
([] (f))
([x] (f1 x))
([x y] (f2 x y))
([x y & more]
(reduce call (call x y) more)))))
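The n-ary reduce pattern at the end of lift-n works because the lifted binary case already threads tangents through each pairwise call. A first-order, single-tag Python sketch (again a simplification, not the library's representation):

```python
import operator
from functools import reduce

def lift2(f, dfdx, dfdy):
    def call(x, y):
        xp, xd = x if isinstance(x, tuple) else (x, 0.0)
        yp, yd = y if isinstance(y, tuple) else (y, 0.0)
        # total derivative: df/dx * dx + df/dy * dy
        return (f(xp, yp), dfdx(xp, yp) * xd + dfdy(xp, yp) * yd)
    return call

mul = lift2(operator.mul, lambda x, y: y, lambda x, y: x)

# Reduce the lifted binary op over [x, 2, 4] with x = 3 + 1*eps:
product = reduce(mul, [(3.0, 1.0), 2.0, 4.0])
# product == (24.0, 8.0): the value of 3*2*4 and d/dx (x*2*4) at x=3.
```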
One more treat before we augment the generic arithmetic system. The derivative operation is linear, so:
\[
D(x+x'\varepsilon) = D(x)+D(x')\varepsilon
\]
This implementation is valid because the coefficients of a Differential can be functions.
(defmethod g/partial-derivative [::differential v/seqtype] [a selectors]
(let [tag (max-order-tag a)
px (primal-part a tag)
tx (extract-tangent a tag)]
(d:+ (g/partial-derivative px selectors)
(d:* (g/partial-derivative tx selectors)
(bundle 0 1 tag)))))
Armed with lift-1 and lift-2, we can install Differential into the SICMUtils generic arithmetic system.
Any function built out of these components will work with the D operator.
(defn- defunary
"Given:
- a generic unary multimethod `generic-op`
- a corresponding single-arity lifted function `differential-op`
installs an appropriate unary implementation of `generic-op` for
`::differential` instances."
[generic-op differential-op]
(defmethod generic-op [::differential] [a] (differential-op a)))
(defn- defbinary
"Given:
- a generic binary multimethod `generic-op`
- a corresponding 2-arity lifted function `differential-op`
installs an appropriate binary implementation of `generic-op` between
  `::differential` and `::v/scalar` instances."
[generic-op differential-op]
(doseq [signature [[::differential ::differential]
[::v/scalar ::differential]
[::differential ::v/scalar]]]
(defmethod generic-op signature [a b] (differential-op a b))))
And now we're off to the races. The rest of the namespace provides defunary and defbinary calls for all of the generic operations for which we know how to declare partial derivatives.
(defbinary g/add
(lift-2 g/add
(fn [_ _] 1)
(fn [_ _] 1)))
(defunary g/negate
(lift-1 g/negate (fn [_] -1)))
(defunary g/negative?
(fn [x] (g/negative? (finite-term x))))
(defbinary g/sub
(lift-2 g/sub
(fn [_ _] 1)
(fn [_ _] -1)))
(let [mul (lift-2
g/mul
(fn [_ y] y)
(fn [x _] x))]
(defbinary g/mul mul)
(defunary g/square (fn [x] (mul x x)))
(defunary g/cube (fn [x] (mul x (mul x x))))
(defbinary g/dot-product mul))
(defunary g/invert
(lift-1 g/invert
(fn [x] (g/div -1 (g/square x)))))
(defbinary g/div
(lift-2 g/div
(fn [_ y] (g/div 1 y))
(fn [x y] (g/div (g/negate x)
(g/square y)))))
(defunary g/abs
(fn [x]
(let [f (finite-term x)
func (cond (< f 0) (lift-1 (fn [x] (g/negate x)) (fn [_] -1))
(> f 0) (lift-1 (fn [x] x) (fn [_] 1))
(= f 0) (u/illegal "Derivative of g/abs undefined at zero")
:else (u/illegal (str "error! derivative of g/abs at" x)))]
(func x))))
(defunary g/sqrt
(lift-1 g/sqrt
(fn [x]
(g/invert
(g/mul (g/sqrt x) 2)))))
This first case of g/expt, where the exponent itself is non-Differential, is special-cased and slightly simpler. The second partial derivative throws, since the more general definition below should always override.
(let [power (lift-2
g/expt
(fn [x y]
(g/mul y (g/expt x (g/sub y 1))))
(fn [_ _]
(u/illegal "can't get there from here")))]
(defmethod g/expt [::differential ::v/scalar] [d n] (power d n)))
The remaining two cases allow for a differential exponent.
NOTE: I took this implementation-split from scmutils, but I'm not sure that it matters… if the second partial never gets called, why is this a good optimization?
(let [expt (lift-2
g/expt
(fn [x y]
(g/mul y (g/expt x (g/sub y 1))))
(fn [x y]
(if (and (v/number? x) (v/zero? y))
(if (v/number? y)
(if (not (g/negative? y))
0
(u/illegal "Derivative undefined: expt"))
0)
(g/* (g/log x) (g/expt x y)))))]
(defmethod g/expt [::differential ::differential] [d n] (expt d n))
(defmethod g/expt [::v/scalar ::differential] [d n] (expt d n)))
(defunary g/log
(lift-1 g/log g/invert))
(defunary g/exp
(lift-1 g/exp g/exp))
(defunary g/sin
(lift-1 g/sin g/cos))
(defunary g/cos
(lift-1 g/cos
(fn [x] (g/negate (g/sin x)))))
(defunary g/tan
(lift-1 g/tan
(fn [x]
(g/invert
(g/square (g/cos x))))))
(defunary g/asin
(lift-1 g/asin
(fn [x]
(g/invert
(g/sqrt (g/sub 1 (g/square x)))))))
(defunary g/acos
(lift-1 g/acos
(fn [x]
(g/negate
(g/invert
(g/sqrt (g/sub 1 (g/square x))))))))
(defunary g/atan
(lift-1 g/atan (fn [x]
(g/invert
(g/add 1 (g/square x))))))
(defbinary g/atan
(lift-2 g/atan
(fn [y x]
(g/div x (g/add (g/square x)
(g/square y))))
(fn [y x]
(g/div (g/negate y)
(g/add (g/square x)
(g/square y))))))
(defunary g/sinh
(lift-1 g/sinh g/cosh))
(defunary g/cosh
(lift-1 g/cosh g/sinh))
(defunary g/tanh
(lift-1 g/tanh
(fn [x]
(g/sub 1 (g/square (g/tanh x))))))
This is the source of the definite-integral package I built into SICMUtils, a Clojure(script) based computer algebra system based on Gerald Sussman's scmutils. (Beware the buzzwords...) I expect that this post will fission into a full series, with a bit of explanation about how each piece fits into the whole. Until then, know that you've found the source of what might be the most hardcore definite integration library out there, certainly the only one written like this, in functional style with unashamed explanation of how everything works.
If you're into Clojure, you can use all of this code by importing version 0.13.0
of the SICMUtils library.
Enjoy!
This document is an exploration of the definite-integral
implementation in SICMUtils, a port of the wonderful scmutils
system.
This interface is the target. The goal of the document is to make this definite-integral
interface come to life.
(ns quadrature
(:require [sicmutils.numerical.compile :as c]
[quadrature.adaptive :as qa]
[quadrature.boole :as boole]
[quadrature.common :as qc]
[quadrature.bulirsch-stoer :as bs]
[quadrature.infinite :as qi]
[quadrature.midpoint :as mid]
[quadrature.milne :as milne]
[quadrature.riemann :as riemann]
[quadrature.romberg :as romberg]
[quadrature.simpson :as simp]
[quadrature.simpson38 :as simp38]
[quadrature.trapezoid :as trap]
[sicmutils.util :as u]))
This namespace unites all of the work inside quadrature behind a single interface, fronted by the all-powerful definite-integral function.
The interface takes f, an integrand, along with bounds a and b:
(definite-integral f a b)
Optionally, you can provide a dictionary of customizing options. These are passed down to whatever method you supply via the :method key:
(definite-integral f a b opts)
The keys in quadrature-methods below define the full range of integration methods available in the package. Each entry in this dictionary is either:
- a function with the same signature as definite-integral (possibly created with qc/defintegrator), or
- a dictionary of extra options that includes its own :method key.
This latter style is used when the method itself is a specialization of a more general method.
(def ^:private quadrature-methods
{:open {:method :adaptive-bulirsch-stoer
:interval qc/open}
:closed {:method :adaptive-bulirsch-stoer
:interval qc/closed}
:closed-open {:method :adaptive-bulirsch-stoer
:interval qc/closed-open}
:open-closed {:method :adaptive-bulirsch-stoer
:interval qc/open-closed}
:bulirsch-stoer-open bs/open-integral
:bulirsch-stoer-closed bs/closed-integral
:adaptive-bulirsch-stoer (qa/adaptive bs/open-integral bs/closed-integral)
:left-riemann riemann/left-integral
:right-riemann riemann/right-integral
:lower-riemann riemann/lower-integral
:upper-riemann riemann/upper-integral
:midpoint mid/integral
:trapezoid trap/integral
:boole boole/integral
:milne milne/integral
:simpson simp/integral
:simpson38 simp38/integral
:romberg romberg/closed-integral
:romberg-open romberg/open-integral})
(def available-methods
(into #{} (keys quadrature-methods)))
The user can specify a method by providing the :method key in their options with:
- a keyword naming one of the methods above,
- a function with the proper integrator signature, or
- a dictionary of extra options with a :method key.
The latter two are the allowed value types in quadrature-methods.
(defn- extract-method
"Attempts to turn the supplied argument into an integration method; returns nil
if method doesn't exist."
[method]
(cond (fn? method)
[method {}]
(keyword? method)
(extract-method
(quadrature-methods method))
(map? method)
(let [[f m] (extract-method
(:method method))]
[f (merge (dissoc method :method) m)])))
(defn get-integrator
"Takes:
- An integration method, specified as either:
- a keyword naming one of the available methods in `available-methods`
- a function with the proper integrator signature
- a dictionary of integrator options with a `:method` key
- `a` and `b` integration endpoints
- an optional dictionary of options `m`
And returns a pair of an integrator function and a possibly-enhanced options
dictionary.
(Some integration functions require extra options, so the returned dictionary
may have more entries than the `m` you pass in.)
If either endpoint is infinite, the returned integrator is wrapped in
`qi/improper` and able to handle infinite endpoints (as well as non-infinite
endpoints by passing through directly to the underlying integrator)."
([method a b] (get-integrator method a b {}))
([method a b m]
(when-let [[integrate opts] (extract-method method)]
(let [integrate (if (or (qc/infinite? a)
(qc/infinite? b))
(qi/improper integrate)
integrate)]
[integrate (dissoc (merge opts m) :method)]))))
Here we are! The one function you need to care about if you're interested in definite integrals. Learn to use this, and then dig in to the details of individual methods if you run into trouble or want to learn more. Enjoy!
(defn definite-integral
"Evaluates the definite integral of integrand `f` across the interval $a, b$.
Optionally accepts a dictionary `opts` of customizing options; All `opts` will
be passed through to the supplied `integrate` functions.
If you'd like more control, or to retrieve the integration function directly
without looking it up via `:method` each time, see `get-integrator`.
All supplied options are passed through to the underlying integrator; see the
specific integrator for information on what options are available.
## Keyword arguments:
`:method`: Specifies the integration method used. Must be
- a keyword naming one of the available methods in `available-methods`
- a function with the proper integrator signature
- a dictionary of integrator options with a `:method` key
Defaults to `:open`, which specifies an adaptive bulirsch-stoer quadrature method.
`:compile?` If true, the generic function will be simplified and compiled
before execution. (Clojure only for now.) Defaults to false.
`:info?` If true, `definite-integral` will return a map of integration
information returned by the underlying integrator. Else, returns an estimate
of the definite integral."
([f a b] (definite-integral f a b {}))
([f a b {:keys [method compile? info?]
:or {method :open
compile? false
info? false}
:as opts}]
(if-let [[integrate m] (get-integrator method a b opts)]
(let [f #?(:clj (if compile? (c/compile-univariate-fn f) f)
:cljs f)
result (integrate f a b m)]
(if info? result (:result result)))
(u/illegal (str "Unknown method: " method
". Try one of: "
available-methods)))))
(ns quadrature.riemann
(:require [quadrature.interpolate.richardson :as ir]
[quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]
[sicmutils.numsymb]))
This namespace includes functions for calculating the Riemann integral of a single-variable function. These are probably not methods that you'll want to use; see the documentation and defaults in quadrature
for good recommendations. But they're clear and understandable. The goal of this namespace is to lay the groundwork for visualizable routines that you can use to step toward understanding of the tougher methods.
"Quadrature", in this context, means "numerical integration". The word is a historical term for calculating the area inside of some geometric shape. Riemann sums are a group of methods for numerical integration that use this strategy:
- partition the area under the curve into n "slices",
- estimate the area of each slice, and
- add up the slice areas.
The Riemann integral of a function \(f\) is the limit of this process as \(n \to \infty\).
How do you estimate the area of a slice? All of these methods estimate the area by forming a rectangle. For the base, use \(x_r - x_l\). For the height, you might use:
- the function value at the left endpoint, \(f(x_l)\),
- the value at the right endpoint, \(f(x_r)\),
- the max or min of those two values, or
- the value at the midpoint of the slice.
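The rectangle strategy is compact enough to sketch in a few lines of standalone Python (names here are illustrative, not the library's):

```python
def riemann_estimate(f, a, b, n, height):
    """Integrate f over [a, b] with n rectangular slices; `height`
    picks the rectangle height from the two slice endpoints."""
    h = (b - a) / n
    return h * sum(height(f, a + i * h, a + (i + 1) * h) for i in range(n))

left  = lambda f, l, r: f(l)             # left Riemann sum
right = lambda f, l, r: f(r)             # right Riemann sum
upper = lambda f, l, r: max(f(l), f(r))  # upper Riemann sum
lower = lambda f, l, r: min(f(l), f(r))  # lower Riemann sum

# Left and right sums bracket the true area of x^2 on [0, 10]:
est_l = riemann_estimate(lambda x: x * x, 0, 10, 4, left)   # 218.75
est_r = riemann_estimate(lambda x: x * x, 0, 10, 4, right)  # 468.75
```

These two values match the 4-slice entries in the left-sum and right-sum estimate sequences shown later in this namespace.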
This namespace builds up to implementations for left-integral, right-integral, upper-integral and lower-integral. midpoint.cljc holds an implementation of the Midpoint method.
A closely related method involves forming a trapezoid for each slice. This is equivalent to averaging the left and right Riemann sums. The trapezoid method lives in trapezoid.cljc.
We'll start with an inefficient-but-easily-understandable version of these methods. To form a Riemann sum we need to:
- partition the interval \([a, b]\) into n slices,
- generate an area estimate for each slice, and
- add all of the estimates together.
windowed-sum implements this pattern:
(defn windowed-sum
"Takes:
- `area-fn`, a function of the left and right endpoints of some integration
slice
- definite integration bounds `a` and `b`
and returns a function of `n`, the number of slices to use for an integration
estimate.
`area-fn` should return an estimate of the area under some curve between the
`l` and `r` bounds it receives."
[area-fn a b]
(fn [n]
(let [width (/ (- b a) n)
grid-points (concat (range a b width) [b])]
(ua/sum
(map area-fn grid-points (rest grid-points))))))
Test this out with a function that returns 2 for every slice, and we get back an estimate (from the function returned by windowed-sum) of 2x the number of slices:
(let [area-fn (fn [l r] 2)
estimator (windowed-sum area-fn 0 10)]
[(estimator 10)
(estimator 20)])
[20.0 40.0]
Now, let's implement the four classic "Riemann Integral" methods.
Let's say we want to integrate a function \(f\). The left and right Riemann sums estimate a slice's area as a rectangle with:
- base \(x_r - x_l\), and
- height \(f(x_l)\) for the left sum, or \(f(x_r)\) for the right sum.
left-sum is simple to implement, given windowed-sum:
(defn- left-sum* [f a b]
(-> (fn [l r] (* (f l) (- r l)))
(windowed-sum a b)))
Every internal slice has the same width, so we can make the sum slightly more efficient by pulling out the constant and multiplying by it a single time.
Internally, we also generate all of the internal "left" points directly from the slice index, instead of pre-partitioning the range. This is fine since we don't need \(x_r\).
(defn- left-sum
"Returns a function of `n`, some number of slices of the total integration
range, that returns an estimate for the definite integral of $f$ over the
range $[a, b)$ using a left Riemann sum."
[f a b]
(let [width (- b a)]
(fn [n]
(let [h (/ width n)
fx (fn [i] (f (+ a (* i h))))]
(* h (ua/sum fx 0 n))))))
right-sum is almost identical, except that it uses \(f(x_r)\) as the estimate of each rectangle's height:
(defn- right-sum* [f a b]
(-> (fn [l r] (* (f r) (- r l)))
(windowed-sum a b)))
Same trick here to get a more efficient version. This implementation also generates an internal function fx of the window index. The only difference from the left-sum implementation is an initial offset of h, pushing every point to the right side of the window.
(defn- right-sum
"Returns a function of `n`, some number of slices of the total integration
range, that returns an estimate for the definite integral of $f$ over the
range $(a, b]$ using a right Riemann sum."
[f a b]
(let [width (- b a)]
(fn [n]
(let [h (/ width n)
start (+ a h)
fx (fn [i] (f (+ start (* i h))))]
(* h (ua/sum fx 0 n))))))
The upper Riemann sum generates a slice estimate by taking the maximum of \(f(x_l)\) and \(f(x_r)\):
(defn- upper-sum
"Returns an estimate for the definite integral of $f$ over the range $[a, b]$
using an upper Riemann sum.
This function may or may not make an evaluation at the endpoints $a$ or $b$,
depending on whether or not the function is increasing or decreasing at the
endpoints."
[f a b]
(-> (fn [l r] (* (- r l)
(max (f l) (f r))))
(windowed-sum a b)))
Similarly, the lower Riemann sum uses the minimum of \(f(x_l)\) and \(f(x_r)\):
(defn- lower-sum
"Returns an estimate for the definite integral of $f$ over the range $[a, b]$
using a lower Riemann sum.
This function may or may not make an evaluation at the endpoints $a$ or $b$,
depending on whether or not the function is increasing or decreasing at the
endpoints."
[f a b]
(-> (fn [l r] (* (- r l)
(min (f l) (f r))))
(windowed-sum a b)))
Given the tools above, let's attempt to estimate the integral of \(f(x) = x^2\) using the left and right Riemann sum methods. (The actual equation for the integral is \(x^3 \over 3\)).
The functions above return functions of n
, the number of slices. We can use (us/powers 2)
to return a sequence of (1, 2, 4, 8, ...)
and map the function of n
across this sequence to obtain successively better estimates for \(\int_0^{10} x^2\). The true value is $10^{3} \over 3 = 333.333…$.
Here's the left-sum
estimate:
(take 5 (map (left-sum g/square 0 10)
(us/powers 2)))
(0.0 125.0 218.75 273.4375 302.734375)
And the right-sum
estimate:
(take 5 (map (right-sum g/square 0 10)
(us/powers 2)))
(1000.0 625.0 468.75 398.4375 365.234375)
Both estimates are bad at 32 slices and don't seem to be getting better. Even up to \(2^{16} = 65,536\) slices we haven't converged, and are still far from the true value:
(-> (map (left-sum g/square 0 10)
(us/powers 2))
(us/seq-limit {:maxterms 16}))
{:converged? false, :terms-checked 16, :result 333.31807469949126}
This bad convergence behavior is why common wisdom states that you should never use left and right Riemann sums for real work.
But maybe we can do better.
One answer to this problem is to use "sequence acceleration" via Richardson extrapolation, as described in richardson.cljc.
ir/richardson-sequence takes a sequence of estimates of some function and "accelerates" the sequence by combining successive estimates.
The estimates have to be functions of some parameter \(n\) that decreases by a factor of \(t\) for each new element. In the example above, \(n\) doubles each time; this is equivalent to thinking about the window width \(h\) halving each time, so \(t = 2\).
This library's functional style lets us accelerate a sequence of estimates xs by simply wrapping it in a call to (ir/richardson-sequence xs 2). Amazing!
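To see the acceleration at work outside the library, here is a standalone Python sketch of Richardson extrapolation applied, column by column, to left-sum estimates of \(\int_0^{10} x^2\). (The real ir/richardson-sequence builds the full tableau lazily; this is a simplified eager version.)

```python
def left_sum(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def richardson_column(xs, t, p):
    """Cancel the O(h^p) error term between adjacent estimates, where
    the step size shrinks by a factor of t from one entry to the next."""
    return [(t**p * b - a) / (t**p - 1) for a, b in zip(xs, xs[1:])]

# Estimates at n = 1, 2, 4, 8, 16, 32 slices:
xs = [left_sum(lambda x: x * x, 0, 10, 2**k) for k in range(6)]
for p in (1, 2):                       # kill the h and h^2 error terms
    xs = richardson_column(xs, 2, p)
# For a quadratic integrand the h and h^2 terms are the whole error,
# so xs[0] is now (numerically) the exact answer, 1000/3.
```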
Does Richardson extrapolation help?
(let [f (fn [x] (* x x))]
(-> (map (left-sum f 0 10)
(us/powers 2))
(ir/richardson-sequence 2)
(us/seq-limit)))
{:converged? true, :terms-checked 4, :result 333.3333333333333}
We now converge to the actual, true value of the integral in 4 terms!
This is going to be useful for each of our Riemann sums, so let's make a function that can accelerate a generic sequence of estimates. The following function takes:
- estimate-seq: a sequence of numerical integral estimates, and
- opts: a dictionary of user-supplied options.
This library is going to adopt an interface that allows the user to configure a potentially very complex integration function by sending a single dictionary of options down to each of its layers. Adopting that style now is going to allow this function to grow to accommodate other methods of sequence acceleration, like polynomial or rational function extrapolation.
For now, {:accelerate? true} configures Richardson extrapolation iff the user hasn't specified a custom sequence of integration slices using the :n option.
(defn- accelerate
"NOTE - this is only appropriate for Richardson-accelerating sequences with t=2,
p=q=1.
This only applies to the Riemann sequences in this namespace!"
[estimate-seq {:keys [n accelerate?] :or {n 1}}]
(if (and accelerate? (number? n))
(ir/richardson-sequence estimate-seq 2 1 1)
estimate-seq))
Check that this works:
(let [f (fn [x] (* x x))]
(-> (map (left-sum f 0 10)
(us/powers 2))
(accelerate {:accelerate? true})
(us/seq-limit)))
{:converged? true, :terms-checked 4, :result 333.3333333333333}
Excellent!
The results look quite nice; but notice how much redundant computation we're doing.
Consider the evaluation points of a left Riemann sum with 4 slices, next to a left sum with 8 slices:
x---x---x---x----
x-x-x-x-x-x-x-x--
Every time we double our number of evaluations, half of the windows share a left endpoint. The same is true for a right sum:
----x---x---x---x
--x-x-x-x-x-x-x-x
In both cases, the new points are simply the midpoints of the existing slices.
This suggests a strategy for incrementally updating a left or right Riemann sum when doubling the number of points:
- generate a midpoint estimate using the current n slices,
- add this midpoint estimate to the previous estimate, and
- divide the sum by 2 to scale each NEW slice width down by 2 (since we're doubling the number of slices).
First, implement midpoint-sum. This is very close to the implementation for left-sum; internally the function adds an offset of \(h \over 2\) to each slice before sampling its function value.
(defn midpoint-sum
"Returns a function of `n`, some number of slices of the total integration
range, that returns an estimate for the definite integral of $f$ over the
range $(a, b)$ using midpoint estimates."
[f a b]
(let [width (- b a)]
(fn [n]
(let [h (/ width n)
offset (+ a (/ h 2.0))
fx (fn [i] (f (+ offset (* i h))))]
(* h (ua/sum fx 0 n))))))
The next function returns a function that can perform the incremental update to a left or right Riemann sum (and to a midpoint method estimate, as we'll see in midpoint.cljc
):
(defn Sn->S2n
"Returns a function of:
- `Sn`: a sum estimate for `n` partitions, and
- `n`: the number of partitions
And returns a new estimate for $S_{2n}$ by sampling the midpoints of each
slice. This incremental update rule is valid for left and right Riemann sums,
as well as the midpoint method."
[f a b]
(let [midpoints (midpoint-sum f a b)]
(fn [Sn n]
(-> (+ Sn (midpoints n))
(/ 2.0)))))
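The update rule encodes the identity \(S_{2n} = (S_n + M_n) / 2\), where \(M_n\) is the midpoint estimate on the same n slices. A standalone Python check of that identity:

```python
def left_sum(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def midpoint_sum(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

f = lambda x: x * x
s_n = left_sum(f, 0, 10, 4)                       # 218.75
s_2n = (s_n + midpoint_sum(f, 0, 10, 4)) / 2.0    # incremental update
# s_2n equals the direct left sum with 8 slices: 273.4375.
```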
After using left-sum to generate an initial estimate, we can use Sn->S2n to generate all successive estimates, as long as we always double our slices. This suggests a function that takes an initial number of slices, n0, and then uses reductions to scan across (us/powers 2 n0) with the function returned by Sn->S2n:
(defn- left-sequence* [f a b n0]
(let [first-S ((left-sum f a b) n0)
steps (us/powers 2 n0)]
(reductions (Sn->S2n f a b) first-S steps)))
Verify that this function returns an equivalent sequence of estimates to the non-incremental left-sum
, when mapped across powers of 2:
(let [f (fn [x] (* x x))]
(= (take 10 (left-sequence* f 0 10 1))
(take 10 (map (left-sum f 0 10)
(us/powers 2 1)))))
true
We need to use the same style for right-sum, so let's try and extract the pattern above, of:
- generating an initial estimate of n0 slices using some function, S-fn, and
- generating successive estimates of n0 => 2 * n0 slices using some incremental updater, next-S-fn.
In fact, because methods like the Midpoint method from midpoint.cljc can only incrementally update from n => 3 * n slices, let's make the factor general too.
geometric-estimate-seq
captures the pattern above:
(defn geometric-estimate-seq
"Accepts:
- `S-fn`: a function of `n` that generates a numerical integral estimate from
`n` slices of some region, and
- `next-S-fn`: a function of (previous estimate, previous `n`) => new estimate
- `factor`: the factor by which `n` increases for successive estimates
- `n0`: the initial `n` to pass to `S-fn`
  The new estimate returned by `next-S-fn` should be of `factor * n` slices."
[S-fn next-S-fn factor n0]
(let [first-S (S-fn n0)
steps (us/powers factor n0)]
(reductions next-S-fn first-S steps)))
And another version of left-sequence
, implemented using the new function:
(defn left-sequence**
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed-open interval $a, b$ by taking left-Riemann sums with
n0, 2n0, 4n0, ...
slices."
([f a b] (left-sequence** f a b 1))
([f a b n0]
(geometric-estimate-seq (left-sum f a b)
(Sn->S2n f a b)
2
n0)))
What if we want to combine the ability to reuse old results with the ability to take successively refined estimates that don't look like geometric series? The series 1, 2, 3… of natural numbers is an obvious choice of windows… but only the even entries are able to reuse estimates.
Integration methods like the Bulirsch-Stoer approach depend on sequences like 2, 3, 4, 6…
We absolutely want to be able to save potentially-expensive function evaluations.
One way to do this is to memoize the function f
that you pass in to any of the methods above.
Alternatively, we could implement a version of geometric-estimate-seq that takes any sequence of estimates and maintains a sort of internal memoization cache.
For every n, check the cache for prev == n/factor. If it exists in the cache, use next-S-fn; else, use S-fn, just like we did in geometric-estimate-seq for the initial value.
general-estimate-seq
does this:
(defn- general-estimate-seq
"Accepts:
- `S-fn`: a function of `n` that generates a numerical integral estimate from
`n` slices of some region, and
- `next-S-fn`: a function of (previous estimate, previous `n`) => new estimate
- `factor`: the factor by which `next-S-fn` increases `n` in its returned estimate
- `n-seq`: a monotonically increasing sequence of `n` slices to use.
Returns a sequence of the estimates returned by either function for each `n` in
`n-seq`. Internally decides whether or not to use `S-fn` or `next-S-fn` to
generate successive estimates."
[S-fn next-S-fn factor n-seq]
(let [f (fn [[cache _] n]
(let [Sn (if (zero? (rem n factor))
(let [prev (quot n factor)]
(if-let [S-prev (get cache prev)]
(next-S-fn S-prev prev)
(S-fn n)))
(S-fn n))]
[(assoc cache n Sn) Sn]))]
(->> (reductions f [{} nil] n-seq)
(map second)
(rest))))
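Here's a sketch of the same caching idea in Python (names are mine, for illustration only): for each `n`, reuse the cached estimate at `n/factor` when it exists, else compute from scratch.

```python
def left_sum(f, a, b):
    def S(n):
        h = (b - a) / n
        return h * sum(f(a + i * h) for i in range(n))
    return S

def Sn_to_S2n(f, a, b):
    def next_S(Sn, n):
        h = (b - a) / n
        # add midpoints of the old slices, then halve the slice width
        return (Sn + h * sum(f(a + (i + 0.5) * h) for i in range(n))) / 2
    return next_S

def general_estimate_seq(S_fn, next_S_fn, factor, n_seq):
    """For each n, reuse the cached estimate at n/factor if present."""
    cache, out = {}, []
    for n in n_seq:
        prev = n // factor
        if n % factor == 0 and prev in cache:
            Sn = next_S_fn(cache[prev], prev)  # incremental reuse
        else:
            Sn = S_fn(n)                       # evaluate from scratch
        cache[n] = Sn
        out.append(Sn)
    return out

f = lambda x: x
ests = general_estimate_seq(left_sum(f, 0.0, 10.0),
                            Sn_to_S2n(f, 0.0, 10.0),
                            2, [2, 3, 4, 6])
# n=4 reuses the cached n=2 estimate; n=6 reuses the cached n=3
```

The Clojure version threads the cache through `reductions` instead of mutating a dict, but the decision logic is the same.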
We can combine `general-estimate-seq` and `geometric-estimate-seq` into a final method that decides which implementation to call, based on the type of the `n0` argument. If it's a number, use it as the `n0` seed for a geometrically increasing series of estimates. Else, assume it's a sequence and pass it to `general-estimate-seq`.
(defn incrementalize
"Function that generalizes the ability to create successively-refined estimates
of an integral, given:
- `S-fn`: a function of `n` that generates a numerical integral estimate from
`n` slices of some region, and
- `next-S-fn`: a function of (previous estimate, previous `n`) => new estimate
- `factor`: the factor by which `next-S-fn` increases `n` in its returned estimate
- `n`: EITHER a number, or a monotonically increasing sequence of `n` slices to use.
If `n` is a sequence, returns a (lazy) sequence of estimates generated for
each entry in `n`.
If `n` is a number, returns a lazy sequence of estimates generated for each
entry in a geometrically increasing series of inputs $n, n(factor),
n(factor^2), ....$
Internally decides whether or not to use `S-fn` or `next-S-fn` to generate
successive estimates."
[S-fn next-S-fn factor n]
(let [f (if (number? n)
geometric-estimate-seq
general-estimate-seq)]
(f S-fn next-S-fn factor n)))
We can use `incrementalize` to write our final version of `left-sequence`, along with a matching version for `right-sequence`.
Notice that we're using `accelerate` from above. The interface should make more sense now:
(defn left-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed-open interval $a, b$ by taking left-Riemann sums.
## Optional Arguments
`:n`: If `n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (left-sequence f a b {}))
([f a b opts]
(let [S (left-sum f a b)
next-S (Sn->S2n f a b)]
(-> (incrementalize S next-S 2 (:n opts 1))
(accelerate opts)))))
(defn right-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed-open interval $a, b$ by taking right-Riemann sums.
## Optional Arguments
`:n`: If `n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (right-sequence f a b {}))
([f a b opts]
(let [S (right-sum f a b)
next-S (Sn->S2n f a b)]
(-> (incrementalize S next-S 2 (:n opts 1))
(accelerate opts)))))
`lower-sequence` and `upper-sequence` are similar. They can't take advantage of any incremental speedup, so we generate a sequence of `n`s internally and map `lower-sum` and `upper-sum` directly across it.
(defn lower-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ by taking lower-Riemann sums.
## Optional Arguments
`:n`: If `n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (lower-sequence f a b {}))
([f a b {:keys [n] :or {n 1} :as opts}]
(let [n-seq (if (number? n)
(us/powers 2 n)
n)]
(-> (map (lower-sum f a b) n-seq)
(accelerate opts)))))
(defn upper-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ by taking upper-Riemann sums.
## Optional Arguments
`:n`: If `n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (upper-sequence f a b {}))
([f a b {:keys [n] :or {n 1} :as opts}]
(let [n-seq (if (number? n)
(us/powers 2 n)
n)]
(-> (map (upper-sum f a b) n-seq)
(accelerate opts)))))
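As a sanity check on the idea, here's a minimal Python sketch (my own names, not the library's) showing that lower and upper sums bracket the true integral, using the same per-slice min/max of the endpoint values that the `:area-fn`s below use:

```python
def lower_sum(f, a, b, n):
    """Sum of slice areas using the smaller endpoint value of each slice."""
    h = (b - a) / n
    return h * sum(min(f(a + i * h), f(a + (i + 1) * h)) for i in range(n))

def upper_sum(f, a, b, n):
    """Sum of slice areas using the larger endpoint value of each slice."""
    h = (b - a) / n
    return h * sum(max(f(a + i * h), f(a + (i + 1) * h)) for i in range(n))

f = lambda x: x * x  # true integral over [0, 1] is 1/3
lo, hi = lower_sum(f, 0, 1, 100), upper_sum(f, 0, 1, 100)
# lo < 1/3 < hi, and the bracket tightens as n grows
```

For a monotonic slice the min/max of the endpoints really is the infimum/supremum; for wilder functions this is only an approximation to the true Darboux sums.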
Finally, we expose four API methods for each of the {left, right, lower, upper}-Riemann sums.
Each of these makes use of a special `qc/defintegrator` "macro". This style allows us to adopt one final improvement: if the interval \((a, b)\) is narrower than some threshold, the integral API takes a single slice using the supplied `:area-fn` below and doesn't attempt to converge. See `common.cljc` for more details.
These API interfaces are necessarily limiting; they bake in assumptions about which slice counts you'll use and how you'll accelerate the resulting estimates. I can imagine a better API, where it's much easier to configure generic sequence acceleration! This will almost certainly show up in the library at some point. For now, here are some notes:
- If you use a geometric factor other than 2 with `left-sequence` or `right-sequence`, you can still accelerate with Richardson. Just pass your new factor as `t`, with `p=1` and `q=1`, to `richardson-sequence`. `accelerate` does this above.
- Check out `polynomial.cljc` and `rational.cljc`, which are more general forms of sequence acceleration that use polynomial or rational function extrapolation. Your sequence of `xs` for each of those methods should be `n-seq`.
(qc/defintegrator left-integral
"Returns an estimate of the integral of `f` across the closed-open interval $a,
b$ using a left-Riemann sum with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `left-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn (fn [f a b] (* (f a) (- b a)))
:seq-fn left-sequence)
(qc/defintegrator right-integral
"Returns an estimate of the integral of `f` across the closed-open interval $a,
b$ using a right-Riemann sum with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `right-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn (fn [f a b] (* (f b) (- b a)))
:seq-fn right-sequence)
Upper and lower Riemann sums have the same interface; internally, they're not able to take advantage of incremental summation, since it's not possible to know in advance whether or not the left or right side of the interval should get reused.
(qc/defintegrator lower-integral
"Returns an estimate of the integral of `f` across the closed-open interval $a,
b$ using a lower-Riemann sum with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `lower-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn (fn [f a b] (* (min (f a) (f b)) (- b a)))
:seq-fn lower-sequence)
(qc/defintegrator upper-integral
"Returns an estimate of the integral of `f` across the closed-open interval $a,
b$ using an upper-Riemann sum with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `upper-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn (fn [f a b] (* (max (f a) (f b)) (- b a)))
:seq-fn upper-sequence)
For a discussion and implementation of the more advanced methods (the workhorse methods that you should actually use!), see `midpoint.cljc` and `trapezoid.cljc`. The midpoint method is the standard choice for open intervals, where you can't evaluate the function at its endpoints. The trapezoid method is standard for closed intervals.
(ns quadrature.midpoint
(:require [quadrature.interpolate.richardson :as ir]
[quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.riemann :as qr]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]))
This namespace builds on the ideas introduced in `riemann.cljc`.
`riemann.cljc` described four different integration schemes ({left, right, upper, lower} Riemann sums) that were each conceptually simple, but aren't often used in practice, even in their "accelerated" forms.
One reason for this is that their error terms fall off as \(h, h^2, h^3\), where \(h\) is the width of an integration slice. Each order of sequence acceleration can cancel out one of these terms at a time; but still, the performance is not great.
It turns out that by taking the midpoint of each interval, instead of either side, you can reduce the order of the error series to \(O(h^2)\). This is too good to pass up.
Additionally, because the error terms fall off as $h^{2}, h^{4}, h^{6}, …$, each order of acceleration is worth quite a bit more than in the Riemann sum case.
This namespace follows the same development as riemann.cljc
:
Here's an implementation of a function that can take the midpoint of a single slice:
(defn single-midpoint [f a b]
(let [width (g/- b a)
half-width (g// width 2)
midpoint (g/+ a half-width)]
(g/* width (f midpoint))))
And a full (though inefficient) integrator using windowed-sum
:
(defn- midpoint-sum* [f a b]
(let [area-fn (partial single-midpoint f)]
(qr/windowed-sum area-fn a b)))
Let's integrate a triangle!
((midpoint-sum* identity 0.0 10.0) 10)
50.0
It turns out that we already had to implement an efficient version of `midpoint-sum` in `riemann.cljc`; the incremental version of the left and right Riemann sums added the midpoints of each interval when doubling the number of slices.
We can check our implementation against qr/midpoint-sum
:
(= ((midpoint-sum* identity 0.0 100.0) 10)
((qr/midpoint-sum identity 0.0 100.0) 10))
true
We'll use qr/midpoint-sum
in the upcoming functions.
Unlike the left and right Riemann sums, the Midpoint method can't reuse function evaluations when the number of slices doubles. This is because each evaluation point, on a doubling, becomes the new border between slices:
n = 1 |-------x-------|
n = 2 |---x---|---x---|
If you triple the number of slices from \(n\) to \(3n\), you can in fact reuse the previous \(n\) evaluations:
n = 1 |--------x--------|
n = 3 |--x--|--x--|--x--|
We do this by scaling \(S_n\) down by a factor of 3, and adding it to a new sum that only includes the new points (using the new slice width).
BTW: The only place I found this idea mentioned is in Section 4.4 of Press's "Numerical Recipes". I haven't found other references to this trick, or implementations. I'd love to hear about them (via a Github issue) if you find any!
We'll follow the interface we used for `qr/Sn->S2n` and write `Sn->S3n`. This function of \(f, a, b\) will return a function that performs the incremental update.
The returned function generates \(S_{3n}\) across \((a, b)\) with \(n\) intervals by picking out two new points in each old interval, at \(h \over 6\) and \(5h \over 6\) of the way across it. These are the midpoints of the first and third of the new slices of width \(h \over 3\); the old midpoint serves as the middle one.
Sum them all up and add them to \(S_n \over 3\) to generate \(S_{3n}\):
(defn- Sn->S3n [f a b]
(let [width (- b a)]
(fn [Sn n]
(let [h (/ width n)
delta (/ h 6)
l-offset (+ a delta)
r-offset (+ a (* 5 delta))
fx (fn [i]
(let [ih (* i h)]
(+ (f (+ l-offset ih))
(f (+ r-offset ih)))))]
(-> (+ Sn (* h (ua/sum fx 0 n)))
(/ 3.0))))))
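We can check the tripling trick numerically. This Python sketch (names mine, for illustration) confirms that the incremental update from \(S_1\) reproduces the direct 3-slice midpoint sum:

```python
def midpoint_sum(f, a, b):
    def S(n):
        h = (b - a) / n
        return h * sum(f(a + (i + 0.5) * h) for i in range(n))
    return S

def Sn_to_S3n(f, a, b):
    width = b - a
    def next_S(Sn, n):
        h = width / n
        delta = h / 6
        # two new midpoints per old slice, at h/6 and 5h/6 across it
        new = sum(f(a + delta + i * h) + f(a + 5 * delta + i * h)
                  for i in range(n))
        # Sn/3 reweights the old midpoints for the new width h/3
        return (Sn + h * new) / 3
    return next_S

f = lambda x: 4 / (1 + x * x)
S = midpoint_sum(f, 0.0, 1.0)
S3_incremental = Sn_to_S3n(f, 0.0, 1.0)(S(1), 1)
S3_direct = S(3)
```

Only the two new points per old slice are evaluated; the third midpoint rides along inside \(S_n\).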
Now we can write `midpoint-sequence`, analogous to `qr/left-sequence`. This implementation reuses all the tricks from `qr/incrementalize`; this means it will be smart about using the new incremental logic any time it sees an \(n\) that's triple some previously-seen \(n\), just as the docstring describes.
(defn midpoint-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the open interval $(a, b)$ using the Midpoint method.
## Optional arguments:
`:n`: If `:n` is a number, returns estimates with $n, 3n, 9n, ...$ slices,
geometrically increasing by a factor of 3 with each estimate.
If `:n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (midpoint-sequence f a b {:n 1}))
([f a b {:keys [n accelerate?] :or {n 1}}]
(let [S (qr/midpoint-sum f a b)
next-S (Sn->S3n f a b)
xs (qr/incrementalize S next-S 3 n)]
(if (and accelerate? (number? n))
(ir/richardson-sequence xs 3 2 2)
xs))))
The following example shows that for the sequence $2, 3, 4, 6, …$ (used in the Bulirsch-Stoer method!), the incrementally-augmented `midpoint-sequence` only performs 253 function evaluations, vs the 315 of the non-incremental `(midpoint-sum f2 0 1)` mapped across the points.
(let [f (fn [x] (/ 4 (+ 1 (* x x))))
[counter1 f1] (u/counted f)
[counter2 f2] (u/counted f)
n-seq (interleave
(iterate (fn [x] (* 2 x)) 2)
(iterate (fn [x] (* 2 x)) 3))]
(doall (take 12 (midpoint-sequence f1 0 1 {:n n-seq})))
(doall (take 12 (map (qr/midpoint-sum f2 0 1) n-seq)))
[@counter1 @counter2])
[253 315]
The final version is analogous to `qr/left-integral` and friends, including an option to `:accelerate?` the final sequence with Richardson extrapolation.
I'm not sure what to call this accelerated method. Accelerating the trapezoid method in this way is called "Romberg integration". Using an \(n\) sequence of powers of 2 and accelerating the midpoint method by a single step - taking the second column (index 1) of the Richardson tableau - produces "Milne's method".
The ability to combine these methods makes it easy to produce powerful methods without known names. Beware, and enjoy!
Note on Richardson Extrapolation
We noted above that the terms of the error series for the midpoint method fall off as \(h^2, h^4, h^6 \ldots\). Because of this, we pass \(p = q = 2\) into `ir/richardson-sequence` below. Additionally, `integral` hardcodes the factor of 3 and doesn't currently allow for a custom sequence of \(n\). This requires passing \(t = 3\) into `ir/richardson-sequence`.
If you want to accelerate some other geometric sequence, call `ir/richardson-sequence` with some other value of `t`.
To accelerate an arbitrary sequence of midpoint evaluations, investigate `polynomial.cljc` or `rational.cljc`. The "Bulirsch-Stoer" method uses either of these to extrapolate the midpoint method using a non-geometric sequence.
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the open interval $(a, b)$
using the Midpoint method with $1, 3, 9 ... 3^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `midpoint-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn single-midpoint
:seq-fn midpoint-sequence)
If you start with the midpoint method, one single step of Richardson extrapolation (taking the second column of the Richardson tableau) is equivalent to "Milne's rule" (see `milne.cljc`).
The full Richardson-accelerated midpoint method is an open-interval variant of "Romberg integration" (see `romberg.cljc`).
See the wikipedia entry on Open Newton-Cotes Formulas for more details.
The trapezoid method, developed next, applies the same idea to closed intervals.
(ns quadrature.trapezoid
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.riemann :as qr]
[quadrature.interpolate.richardson :as ir]
[sicmutils.function :as f]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[sicmutils.simplify]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]))
This namespace builds on the ideas introduced in `riemann.cljc` and `midpoint.cljc`, and follows the pattern of those namespaces. Let's begin.
A nice integration scheme related to the Midpoint method is the "Trapezoid" method. The idea here is to estimate the area of each slice by fitting a trapezoid between the function values at the left and right sides of the slice.
Alternatively, you can think of drawing a line between \(f(x_l)\) and \(f(x_r)\) and taking the area under the line.
What's the area of a trapezoid? Call the two slice endpoints \(x_l\) and \(x_r\). The trapezoid consists of a lower rectangle and a capping triangle. The lower rectangle's area is:
\[(x_r - x_l) f(x_l)\]
just like in the left Riemann sum. The upper triangle's area is one half base times height:
\[ {1 \over 2} (x_r - x_l) (f(x_r) - f(x_l))\]
The sum of these simplifies to:
\[{1 \over 2} {(x_r - x_l) (f(x_l) + f(x_r))}\]
Or, in Clojure:
(defn single-trapezoid [f xl xr]
(g// (g/* (g/- xr xl)
(g/+ (f xl) (f xr)))
2))
We can use the symbolic algebra facilities in the library to show that this simplification is valid:
(let [f (f/literal-function 'f)
square (g/* (f 'x_l)
(g/- 'x_r 'x_l))
triangle (g/* (g// 1 2)
(g/- 'x_r 'x_l)
(g/- (f 'x_r) (f 'x_l)))]
(g/simplify
(g/- (single-trapezoid f 'x_l 'x_r)
(g/+ square triangle))))
0
We can use qr/windowed-sum
to turn this function into an (inefficient) integrator:
(defn- trapezoid-sum* [f a b]
(qr/windowed-sum (partial single-trapezoid f)
a b))
Fitting triangles is easy:
((trapezoid-sum* identity 0.0 10.0) 10)
50.0
In fact, we can even use our estimator to estimate \(\pi\):
(def ^:private pi-estimator*
(let [f (fn [x] (/ 4 (+ 1 (* x x))))]
(trapezoid-sum* f 0.0 1.0)))
The accuracy is not bad, for 10 slices:
(pi-estimator* 10)
3.1399259889071587
Explicit comparison:
(- Math/PI (pi-estimator* 10))
0.0016666646826344333
10,000 slices gets us closer:
(< (- Math/PI (pi-estimator* 10000))
1e-8)
true
Fun fact: the trapezoid method is equal to the average of the left and right Riemann sums. You can see that in the equation, but let's verify:
(defn- basically-identical? [l-seq r-seq]
(every? #(< (Math/abs %) 1e-15)
(map - l-seq r-seq)))
(let [points (take 5 (iterate inc 1))
average (fn [l r]
(/ (+ l r) 2))
f (fn [x] (/ 4 (+ 1 (* x x))))
[a b] [0 1]
left-estimates (qr/left-sequence f a b {:n points})
right-estimates (qr/right-sequence f a b {:n points})]
(basically-identical? (map (trapezoid-sum f a b) points)
(map average
left-estimates
right-estimates)))
true
Next let's attempt a more efficient implementation. Looking at single-trapezoid
, it's clear that each slice evaluates both of its endpoints. This means that each point on a border between two slices earns a contribution of \(f(x) \over 2\) from each slice.
A more efficient implementation would evaluate both endpoints once and then sum (without halving) each interior point.
This interior sum is identical to a left Riemann sum (without the \(f(a)\) evaluation), or a right Riemann sum (without \(f(b)\)).
Here is this idea implemented in Clojure:
(defn trapezoid-sum
"Returns a function of `n`, some number of slices of the total integration
range, that returns an estimate for the definite integral of $f$ over the
range $(a, b)$ using the trapezoid method."
[f a b]
(let [width (- b a)]
(fn [n]
(let [h (/ width n)
fx (fn [i] (f (+ a (* i h))))]
(* h (+ (/ (+ (f a) (f b)) 2)
(ua/sum fx 1 n)))))))
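Here's a quick numerical sketch in Python (names mine, and flattened to take `n` directly) confirming that this endpoint-halving form agrees with summing each trapezoid separately:

```python
def trapezoid_naive(f, a, b, n):
    """Sum each slice's trapezoid separately: every interior point
    gets evaluated twice, once by each neighboring slice."""
    h = (b - a) / n
    return sum((f(a + i * h) + f(a + (i + 1) * h)) * h / 2
               for i in range(n))

def trapezoid_sum(f, a, b, n):
    """Efficient form: endpoints once at half weight, each interior
    point evaluated exactly once at full weight."""
    h = (b - a) / n
    return h * ((f(a) + f(b)) / 2
                + sum(f(a + i * h) for i in range(1, n)))

f = lambda x: 4 / (1 + x * x)
```

The naive form costs `2n` evaluations per estimate; the efficient form costs `n + 1`.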
We can define a new pi-estimator
and check it against our less efficient version:
(def ^:private pi-estimator
(let [f (fn [x] (/ 4 (+ 1 (* x x))))]
(trapezoid-sum f 0.0 1.0)))
(basically-identical?
(map pi-estimator (range 1 100))
(map pi-estimator* (range 1 100)))
true
Next let's develop an incremental updater for the Trapezoid rule that lets us reuse evaluation points as we increase the number of slices.
Because interior points of the Trapezoid method mirror the interior points of the left and right Riemann sums, we can piggyback on the incremental implementations for those two methods in developing an incremental Trapezoid implementation.
Consider the evaluation points of the trapezoid method with 2 slices, next to the points of a 4 slice pass:
x-------x-------x
x---x---x---x---x
The new points are simply the midpoints of the existing slices, just like we had for the left (and right) Riemann sums. This means that we can reuse qr/Sn->S2n
in our definition of the incrementally-enabled trapezoid-sequence
:
(defn trapezoid-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the open interval $(a, b)$ using the Trapezoid method.
## Optional arguments:
`:n`: If `:n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `:n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (trapezoid-sequence f a b {:n 1}))
([f a b {:keys [n accelerate?] :or {n 1}}]
(let [S (trapezoid-sum f a b)
next-S (qr/Sn->S2n f a b)
xs (qr/incrementalize S next-S 2 n)]
(if (and accelerate? (number? n))
(ir/richardson-sequence xs 2 2 2)
xs))))
The following example shows that for the sequence \(1, 2, 4, 8, ..., 2^n\), the incrementally-augmented trapezoid-sequence
only performs \(2^n + 1\) function evaluations; ie, the same number of evaluations as the non-incremental (trapezoid-sum f2 0 1)
would perform for \(2^n\) slices. (why \(2^n + 1\)? each interior point is shared, so each trapezoid contributes one evaluation, plus a final evaluation for the right side.)
The example also shows that evaluating every \(n\) in the sequence costs \(\sum_{i=0}^n{2^i + 1} = 2^{n+1} + n\) evaluations. As \(n\) gets large, this is roughly twice what the incremental implementation costs.
When \(n=11\), the incremental implementation uses 2049 evaluations, while the non-incremental takes 4107.
(let [n-elements 11
f (fn [x] (/ 4 (+ 1 (* x x))))
[counter1 f1] (u/counted f)
[counter2 f2] (u/counted f)
[counter3 f3] (u/counted f)
n-seq (take (inc n-elements)
(iterate (fn [x] (* 2 x)) 1))]
;; Incremental version evaluating every `n` in the sequence $1, 2, 4, ...$:
(doall (trapezoid-sequence f1 0 1 {:n n-seq}))
;; Non-incremental version evaluating every `n` in the sequence $1, 2, 4, ...$:
(doall (map (trapezoid-sum f2 0 1) n-seq))
;; A single evaluation of the final `n`
((trapezoid-sum f3 0 1) (last n-seq))
(let [two**n+1 (inc (g/expt 2 n-elements))
n+2**n (+ n-elements (g/expt 2 (inc n-elements)))]
(= [2049 4107 2049]
[two**n+1 n+2**n two**n+1]
[@counter1 @counter2 @counter3])))
true
Another short example that hints of work to come. The incremental implementation is useful in cases where the sequence includes doublings nested in among other values.
For the sequence $2, 3, 4, 6, …$ (used in the Bulirsch-Stoer method!), the incrementally-augmented trapezoid-sequence
only performs 162 function evaluations, vs the 327 of the non-incremental (trapezoid-sum f2 0 1)
mapped across the points.
This is a good bit more efficient than the Midpoint method's incremental savings, since factors of 2 come up more often than factors of 3.
(let [f (fn [x] (/ 4 (+ 1 (* x x))))
[counter1 f1] (u/counted f)
[counter2 f2] (u/counted f)
n-seq (take 12 (interleave
(iterate (fn [x] (* 2 x)) 2)
(iterate (fn [x] (* 2 x)) 3)))]
(doall (trapezoid-sequence f1 0 1 {:n n-seq}))
(doall (map (trapezoid-sum f2 0 1) n-seq))
[@counter1 @counter2])
[162 327]
The final version is analogous to `qr/left-integral` and friends, including an option to `:accelerate?` the final sequence with Richardson extrapolation. (Accelerating the trapezoid method in this way is called "Romberg integration".)
Note on Richardson Extrapolation
The terms of the error series for the trapezoid method fall off as \(h^2, h^4, h^6 \ldots\) (see https://en.wikipedia.org/wiki/Trapezoidal_rule#Error_analysis). Because of this, we pass \(p = q = 2\) into `ir/richardson-sequence` below. Additionally, `integral` hardcodes the factor of 2 and doesn't currently allow for a custom sequence of \(n\). This is configured by passing \(t = 2\) into `ir/richardson-sequence`.
If you want to accelerate some other geometric sequence, call `ir/richardson-sequence` with some other value of `t`.
To accelerate an arbitrary sequence of trapezoid evaluations, investigate `polynomial.cljc` or `rational.cljc`. The "Bulirsch-Stoer" method uses either of these to extrapolate the trapezoid method using a non-geometric sequence.
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
using the Trapezoid method with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `trapezoid-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn single-trapezoid
:seq-fn trapezoid-sequence)
If you start with the trapezoid method, one single step of Richardson extrapolation (taking the second column of the Richardson tableau) is equivalent to "Simpson's rule". One step using `t=3`, ie, when you triple the number of integration slices per step, gets you "Simpson's 3/8 rule". Two steps of Richardson extrapolation gives you "Boole's rule".
The full Richardson-accelerated trapezoid method is also known as "Romberg integration" (see `romberg.cljc`).
These methods will appear in their respective namespaces in the `quadrature` package.
See the wikipedia entry on Closed Newton-Cotes Formulas for more details.
Richardson extrapolation is a special case of polynomial interpolation, where we get more efficient by assuming that the x values for the polynomial interpolation go $1, 1/2, 1/4, …$ and that we're extrapolating to 0.
(ns quadrature.interpolate.richardson
"Richardson interpolation is a special case of polynomial interpolation; knowing
the ratios of successive `x` coordinates in the point sequence allows a more
efficient calculation."
(:require [quadrature.interpolate.polynomial :as ip]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]
[sicmutils.value :as v]))
This approach (and much of this numerical library!) was inspired by Gerald Sussman's "Abstraction in Numerical Methods" paper.
That paper builds up to Richardson interpolation as a method of "series acceleration". The initial example concerns a series of the side lengths of an N-sided polygon inscribed in a unit circle.
The paper derives this relationship between the side length of an $N$- and $2N$-sided polygon:
(defn- refine-by-doubling
"`s` is the side length of an N-sided polygon inscribed in the unit circle. The
return value is the side length of a 2N-sided polygon."
[s]
(/ s (g/sqrt (+ 2 (g/sqrt (- 4 (g/square s)))))))
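Since the side length of an inscribed \(N\)-gon is \(2 \sin {\pi \over N}\), we can sanity-check the doubling formula numerically. A small Python sketch (names mine):

```python
import math

def refine_by_doubling(s):
    """Side length of a 2N-sided polygon inscribed in the unit circle,
    given the side length s of the N-sided polygon."""
    return s / math.sqrt(2 + math.sqrt(4 - s * s))

# a square inscribed in the unit circle has side sqrt(2) = 2*sin(pi/4);
# one doubling should produce the octagon side, 2*sin(pi/8)
s4 = math.sqrt(2)
s8 = refine_by_doubling(s4)
```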
If we increase the number of sides toward infinity, we should approach a circle. The "semi-perimeter" of an $N$-sided polygon is
\[P_n = {n \over 2} S_n\]
In code:
(defn- semi-perimeter
"Returns the semi-perimeter length of an `n`-sided regular polygon with side
length `side-len`."
[n side-len]
(* (/ n 2) side-len))
so as \(n \to \infty\), \(P_n\) should approach \(\pi\), the half-perimeter of a circle.
Let's start with a square, ie, \(n = 4\) and \(s_4 = \sqrt{2}\). Clojure's iterate
function will let us create an infinite sequence of side lengths:
(def ^:private side-lengths
(iterate refine-by-doubling (Math/sqrt 2)))
and an infinite sequence of the number of sides:
(def ^:private side-numbers
(iterate #(* 2 %) 4))
Mapping a function across two sequences at once generates a new infinite sequence, of semi-perimeter lengths in this case:
(def ^:private archimedean-pi-sequence
(map semi-perimeter side-numbers side-lengths))
The following code will print the first 10 terms:
(us/pprint 10 archimedean-pi-sequence)
2.8284271247461903
3.0614674589207183
3.1214451522580524
3.1365484905459393
3.140331156954753
3.141277250932773
3.1415138011443013
3.141572940367092
3.14158772527716
3.1415914215112
Unfortunately (for Archimedes, by hand!), as the paper notes, it takes 26 iterations to converge to machine precision:
(-> archimedean-pi-sequence
(us/seq-limit {:tolerance v/machine-epsilon}))
{:converged? true, :terms-checked 26, :result 3.1415926535897944}
Enter Sussman:
"Imagine poor Archimedes doing the arithmetic by hand: square roots without even the benefit of our place value system! He would be interested in knowing that full precision can be reached on the fifth term, by forming linear combinations of the early terms that allow the limit to be seized by extrapolation." (p4, Abstraction in Numerical Methods)
Sussman does this by noting that you can also write the side length as:
\[S_n = 2 \sin {\pi \over n}\]
Then the taylor series expansion for \(P_n\) becomes:
\[ P_n = {n \over 2} S_n = {n \over 2} \cdot 2 \sin {\pi \over n} = \pi + {A \over n^2} + {B \over n^4} + \cdots \]
A couple things to note:
The big idea is to multiply \(P_{2n}\) by 4 and subtract \(P_n\) (then divide by 3 to cancel out the extra factor). This will erase the \(A \over n^2\) term and leave a new sequence with \(B \over n^4\) as the dominant error term.
Now keep going and watch the error terms drain away.
Before we write code, let's follow the paper's example and imagine instead some general sequence of $R(h), R(h/t), R(h/t^{2})…$ (where \(t = 2\) in the example above), with a power series expansion that looks like
\[R(h) = A + B h^{p_1} + C h^{p_2}...\]
where the exponents $p_{1}, p_{2}, \ldots$ are some OTHER series of error growth. (In the example above, because the taylor series expansion of \(P_n\) contains only even powers of \(1 \over n\), the sequence was the even numbers.)
In that case, the general way to cancel the leading error term between successive terms is:
\[t^{p_1} R(h/t) - R(h) = (t^{p_1} - 1) A + C_1 h^{p_2} + \cdots\]
or:
\[\frac{t^{p_1} R(h/t) - R(h)}{t^{p_1} - 1} = A + C_2 h^{p_2} + \cdots\]
Let's write this in code:
(defn- accelerate-sequence
"Generates a new sequence by combining each term in the input sequence `xs`
pairwise according to the rules for richardson acceleration.
`xs` is a sequence of evaluations of some function of $A$ with its argument
smaller by a factor of `t` each time:
$$A(h), A(h/t), ...$$
`p` is the order of the dominant error term for the sequence."
[xs t p]
(let [t**p (Math/pow t p)
t**p-1 (dec t**p)]
(map (fn [ah ah-over-t]
(/ (- (* t**p ah-over-t) ah)
t**p-1))
xs
(rest xs))))
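Here's the same acceleration step sketched in Python (names mine, for illustration), applied to the Archimedean \(\pi\) sequence with the even `p` series \(2, 4, 6, 8\):

```python
import math

def refine_by_doubling(s):
    """Side of a 2N-gon from the side of an N-gon in the unit circle."""
    return s / math.sqrt(2 + math.sqrt(4 - s * s))

def semi_perimeter(n, side):
    return n / 2 * side

# the first six Archimedean estimates of pi: 4, 8, ..., 128 sides
seq, side, n = [], math.sqrt(2), 4
for _ in range(6):
    seq.append(semi_perimeter(n, side))
    side, n = refine_by_doubling(side), 2 * n

def accelerate_sequence(xs, t, p):
    """One tableau column: combine neighbors to cancel the h^p term."""
    tp = t ** p
    return [(tp * b - a) / (tp - 1) for a, b in zip(xs, xs[1:])]

# error terms here are even powers, so p runs over 2, 4, 6, 8
col = seq
for p in (2, 4, 6, 8):
    col = accelerate_sequence(col, 2, p)
# col[0] lands dramatically closer to pi than the raw sixth estimate
```

Four accelerations of six raw terms recover \(\pi\) to roughly machine precision, matching the paper's claim that full precision arrives around the fifth accelerated term.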
If we start with the original sequence, we can implement Richardson extrapolation by using Clojure's iterate
with the accelerate-sequence
function to generate successive columns in the "Richardson Tableau". (This is starting to sound familiar to the scheme for polynomial interpolation, isn't it?)
To keep things general, let's take a general sequence ps
, defaulting to the sequence of natural numbers.
(defn- make-tableau
"Generates the 'tableau' of succesively accelerated Richardson interpolation
columns."
([xs t] (make-tableau xs t (iterate inc 1)))
([xs t ps]
(->> (iterate (fn [[xs [p & ps]]]
[(accelerate-sequence xs t p) ps])
[xs ps])
(map first)
(take-while seq))))
All we really care about are the FIRST terms of each sequence. These approximate the sequence's final value with smaller and smaller error (see the paper for details).
Polynomial interpolation in polynomial.cljc
has a similar tableau structure (not by coincidence!), so we can use ip/first-terms
in the implementation below to fetch this first row.
Now we can put it all together into a sequence transforming function, with nice docs:
(defn richardson-sequence
"Takes:
- `xs`: a (potentially lazy) sequence of points representing function values
generated by inputs continually decreasing by a factor of `t`. For example:
`[f(x), f(x/t), f(x/t^2), ...]`
- `t`: the ratio between successive inputs that generated `xs`.
And returns a new (lazy) sequence of 'accelerated' terms, generated
using [Richardson
extrapolation](https://en.wikipedia.org/wiki/Richardson_extrapolation) to
cancel out error terms in the Taylor series expansion of `f(x)` around the
value to which the series is trying to converge.
Each term in the returned sequence cancels one of the error terms through a
linear combination of neighboring terms in the sequence.
### Custom P Sequence
The three-arity version takes one more argument:
- `p-sequence`: the orders of the error terms in the taylor series expansion
of the function that `xs` is estimating. For example, if `xs` is generated
from some `f(x)` trying to approximate `A`, then `[p_1, p_2...]` etc are the
correction terms:
$$f(x) = A + B x^{p_1} + C x^{p_2}...$$
The two-arity version uses a default `p-sequence` of `[1, 2, 3, ...]`
### Arithmetic Progression
The FOUR arity version takes `xs` and `t` as before, but instead of
`p-sequence` makes the assumption that `p-sequence` is an arithmetic
progression of the form `p + iq`, customized by:
- `p`: the exponent on the highest-order error term
- `q`: the step size on the error term exponent for each new seq element
## Notes
Richardson extrapolation is a special case of polynomial extrapolation,
implemented in `polynomial.cljc`.
Instead of a sequence of `xs`, if you generate an explicit series of points of
the form `[x (f x)]` with successively smaller `x` values and
polynomial-extrapolate it forward to x == 0 (with,
say, `(polynomial/modified-neville xs 0)`) you'll get the exact same result.
Richardson extrapolation is more efficient since it can make assumptions about
the spacing between points and pre-calculate a few quantities. See the
namespace for more discussion.
References:
- Wikipedia: https://en.wikipedia.org/wiki/Richardson_extrapolation
- GJS, 'Abstraction in Numerical Methods': https://dspace.mit.edu/bitstream/handle/1721.1/6060/AIM-997.pdf?sequence=2"
([xs t]
(ip/first-terms
(make-tableau xs t)))
([xs t p-sequence]
(ip/first-terms
(make-tableau xs t p-sequence)))
([xs t p q]
(let [arithmetic-p-q (iterate #(+ q %) p)]
(richardson-sequence xs t arithmetic-p-q))))
We can now call this function, combined with us/seq-limit
(a general-purpose tool that takes elements from a sequence until they converge), to see how much acceleration we can get:
(-> (richardson-sequence archimedean-pi-sequence 2 2 2)
(us/seq-limit {:tolerance v/machine-epsilon}))
{:converged? true, :terms-checked 7, :result 3.1415926535897936}
Much faster!
Richardson extrapolation works by cancelling successive error terms in a function's Taylor expansion about 0. To cancel the nth error term, the nth derivative has to be defined. Non-smooth functions aren't going to play well with richardson-sequence
above.
The solution is to look at specific columns of the Richardson tableau. Each column is a sequence with one further error term cancelled.
rational.cljc
and polynomial.cljc
both have this feature in their tableau-based interpolation functions. The feature here requires a different function, because the argument vector is a bit crowded already in richardson-sequence
above.
(defn richardson-column
"Function with an identical interface to `richardson-sequence` above, except for
an additional second argument `col`.
`richardson-column` will return that /column/ of the interpolation tableau
instead of the first row. This will give you a sequence of nth-order
Richardson accelerations taken between point `i` and the next `n` points.
As a reminder, this is the shape of the Richardson tableau:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
So supplying a `column` of `1` gives a single acceleration by combining points
from column 0; `2` kills two terms from the error sequence, etc.
NOTE Given a better interface for `richardson-sequence`, this function could
be merged with that function."
([xs col t]
(nth (make-tableau xs t) col))
([xs col t p-seq]
(nth (make-tableau xs t p-seq) col))
([xs col t p q]
(let [arithmetic-p-q (iterate #(+ q %) p)]
(richardson-column xs col t arithmetic-p-q))))
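A toy example (the inputs are illustrative, and this assumes the definitions above are loaded): samples of \(R(h) = 1 + h\) at \(h = 1, 1/2, 1/4, 1/8\) carry a single first-order error term, so column `1` of the tableau, with \(t = 2\), converges immediately:

```clojure
(richardson-column [2.0 1.5 1.25 1.125] 1 2)
;; => (1.0 1.0 1.0)
```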
It turns out that the Richardson extrapolation is a special case of polynomial extrapolation using Neville's algorithm (as described in polynomial/neville
), evaluated at x == 0.
Neville's algorithm looks like this:
\[P(x) = [(x - x_r) P_l(x) - (x - x_l) P_r(x)] / [x_l - x_r]\]
Where \(P_l\) and \(P_r\) are the polynomials fit through the left and right subsets of the points, and \(x_l\) and \(x_r\) are the leftmost and rightmost points.
Fill in \(x = 0\) and rearrange:
\[P(0) = {[x_l P_r(0) - x_r P_l(0)] \over [x_l - x_r]}\]
In the Richardson extrapolation scheme, one of our parameters was t
, the ratio between successive elements in the sequence. Now multiply through by \(1 = {1 \over x_r} \over {1 \over x_r}\) so that our formula contains ratios:
\[P(0) = [({x_l \over x_r} P_r(0)) - P_l(x)] \over [{x_l \over x_r} - 1]\]
Because the sequence of \(x_i\) elements looks like \(x, x/t, x/t^2\), every recursive step separates \(x_l\) and \(x_r\) by another factor of \(t\). So
\[{x_l \over x_r} = {x \over {x \over t^n}} = t^n\]
Where \(n\) is the difference between the positions of \(x_l\) and \(x_r\). So the formula simplifies further to:
\[P(0) = {[{t^n} P_r(0) - P_l(0)] \over [{t^n} - 1]}\]
Now it looks exactly like Richardson extrapolation. The only difference is that Richardson extrapolation leaves n
general (and calls it \(p_1, p_2\) etc), so that you can customize the jumps in the error series. (I'm sure there is some detail I'm missing here, so please feel free to make a PR and jump in!)
For the example above, we used an arithmetic progression with \(p = q = 2\) to fit the archimedean \(\pi\) sequence. Another way to think about this is that we're fitting a polynomial to the SQUARE of h
(the side length), not to the actual side length.
Let's confirm that polynomial extrapolation to 0 gives the same result, if we generate squared \(x\) values:
(let [h**2 (fn [i]
;; (1/t^{i + 1})^2
(-> (/ 1 (Math/pow 2 (inc i)))
(Math/pow 2)))
xs (map-indexed (fn [i fx] [(h**2 i) fx])
archimedean-pi-sequence)]
(= (us/seq-limit
(richardson-sequence archimedean-pi-sequence 4 1 1))
(us/seq-limit
(ip/modified-neville xs 0.0))))
true
Success!
Next up is the general thing that "Richardson extrapolation" is a special case of: polynomial interpolation. It's historically cool, and it's used to accelerate arbitrary integration sequences.
(ns quadrature.interpolate.polynomial
"This namespace contains a discussion of polynomial interpolation, and different
methods for fitting a polynomial of degree N-1 to N points and evaluating that
polynomial at some different `x`."
(:require [sicmutils.generic :as g]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]))
First, a Lagrange interpolation polynomial:
(defn lagrange
"Generates a lagrange interpolating polynomial that fits every point in the
supplied sequence `points` (of form `[x (f x)]`) and returns the value of the
polynomial evaluated at `x`.
The Lagrange polynomial has this form:
g(x) = (f(a) * [(x-b)(x-c)...] / [(a-b)(a-c)...])
+ (f(b) * [(x-a)(x-c)...] / [(b-a)(b-c)...])
+ ...
for points `[a f(a)], [b f(b)], [c f(c)]` etc.
This particular method of interpolating `x` into the polynomial is
inefficient; any new calculation requires fully recomputing. Takes O(n^2)
operations in the number of points.
"
[points x]
(let [points (vec points)
n (count points)
build-term (fn [i [a fa]]
(let [others (for [j (range n) :when (not= i j)]
(get-in points [j 0]))
p (reduce g/* (map #(g/- x %) others))
q (reduce g/* (map #(g/- a %) others))]
(g// (g/* fa p) q)))]
(transduce (map-indexed build-term)
g/+
points)))
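As a concrete check (the points here are made up, and this assumes the namespace above is loaded): three points sampled from \(f(x) = x^2\) determine that quadratic uniquely, so evaluating the interpolant anywhere recovers \(x^2\):

```clojure
;; points on y = x^2; the unique degree-2 fit is x^2 itself:
(lagrange [[1 1] [2 4] [3 9]] 4)
;; => 16
```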
Lagrange's interpolating polynomial is straightforward, but not terribly efficient; every time we change points
or x
we have to redo the entire calculation. Ideally we'd like to be able to perform:
1. some preprocessing of points
that would let us efficiently evaluate the fitted polynomial for different values of x
in O(n) time, or
2. some preprocessing for a given x
that would let us efficiently add new points to the set we use to generate the interpolating polynomial.
"Neville's algorithm" lets us generate the same interpolating polynomial recursively. By flipping the recursion around and generating values from the bottom up, we can achieve goal #2 and add new points incrementally.
Start the recursion with a single point. Any point \((x, f(x))\) has a unique 0th order polynomial passing through it - the constant function \(P(x) = f(x)\). For points \(x_a\), \(x_b\), let's call this \(P_a\), \(P_b\), etc.
\(P_{ab}\) is the unique FIRST order polynomial (ie, a line) going through points \(x_a\) and \(x_b\).
This first recursive step gives us this rule:
\[P_{ab}(x) = [(x - x_b) P_a(x) - (x - x_a) P_b(x)] / [x_a - x_b]\]
For higher order terms like \(P_{abcd}\), let's call \(P_{abc}\) 'P_{l}', and \(P_{bcd}\) 'P_{r}' (the polynomial fitted through the left and right set of points).
Similarly, the left and rightmost inputs - \(x_a\) and \(x_d\) - will be \(x_l\) and \(x_r\).
Neville's algorithm states that:
\[P(x) = [(x - x_r) P_l(x) - (x - x_l) P_r(x)] / [x_l - x_r]\]
This recurrence works because the two parents \(P_l\) and \(P_r\) already agree at all points except \(x_l\) and \(x_r\).
(defn neville-recursive
"Top-down implementation of Neville's algorithm.
Returns the value of `P(x)`, where `P` is a polynomial fit (using Neville's
algorithm) to every point in the supplied sequence `points` (of form `[x (f
x)]`)
The efficiency and results should be identical to
`quadrature.interpolate/lagrange`. This function represents a step on
the journey toward more incremental methods of polynomial interpolation.
References:
- Press's Numerical Recipes (p103), chapter 3: http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-1.pdf
- Wikipedia: https://en.wikipedia.org/wiki/Neville%27s_algorithm"
[points x]
(letfn [(evaluate [points]
(if (= 1 (count points))
(let [[[_ y]] points]
y)
(let [l-branch (pop points)
r-branch (subvec points 1)
[xl] (first points)
[xr] (peek points)]
(g// (g/+ (g/* (g/- x xr) (evaluate l-branch))
(g/* (g/- xl x) (evaluate r-branch)))
(g/- xl xr)))))]
(evaluate (vec points))))
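Both functions fit the same unique polynomial, so they should agree at any evaluation point. A quick numeric spot check with arbitrary points (assuming both definitions above are loaded):

```clojure
(let [points [[1 1] [2 4] [3 9]]]
  (= (lagrange points 4)
     (neville-recursive points 4)))
;; => true
```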
Neville's algorithm generates each new polynomial from \(P_l\) and \(P_r\), using this recursion to incorporate the full set of points.
You can write out these relationships in a "tableau":
p0
\
p01
/ \
p1 p012
\ / \
p12 p0123
/ \ / \
p2 p123 p01234
\ / \ /
p23 p1234
/ \ /
p3 p234
\ /
p34
/
p4
The next few functions will discuss "rows" and "columns" of the tableau. That refers to the rows and columns of this representation:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
. . . . .
. . . . .
. . . . .
The first column here is the initial set of points. Each entry in each successive column is generated through some operation between the entry to its left, and the entry one left and one up.
Look again at Neville's algorithm:
\[P(x) = [(x - x_r) P_l(x) - (x - x_l) P_r(x)] / [x_l - x_r]\]
\(l\) refers to the entry in the same row, previous column, while \(r\) is one row higher, previous column.
If each cell in the above tableau tracked:
- \(x_l\), the leftmost point in the cell's range
- \(x_r\), the rightmost point in the cell's range
- \(p\), the polynomial estimate itself
we could build up Neville's rule incrementally. Let's attempt to build a function of this signature:
(defn neville-incremental*
"Takes a potentially lazy sequence of `points` and a point `x` and generates a
lazy sequence of approximations of P(x).
entry N in the returned sequence is the estimate using a polynomial generated
from the first N points of the input sequence."
[points x]
,,,)
First, write a function to process each initial point into a vector that contains each of those required elements:
(defn- neville-prepare
"Processes each point of the form [x, (f x)] into:
$$[x_l, x_r, p]$$
where $p$ is the polynomial that spans all points from $l$ to $r$. The
recursion starts with $p = f(x)$.
"
[[x fx]]
[x x fx])
Next, a function that generates the next entry, given l and r:
(defn- neville-combine-fn
"Given some value $x$, returns a function that combines $l$ and $r$ entries in
the tableau, arranged like this:
l -- return
/
/
/
r
generates the `return` entry of the form
$$[x_l, x_r, p]$$."
[x]
(fn [[xl _ pl] [_ xr pr]]
(let [plr (g// (g/+ (g/* (g/- x xr) pl)
(g/* (g/- xl x) pr))
(g/- xl xr))]
[xl xr plr])))
We can use higher-order functions to turn this function into a NEW function that can transform an entire column:
(defn- neville-next-column
"This function takes some point $x$, and returns a new function that takes some
column in the tableau and generates the next column."
[x]
(fn [prev-column]
(map (neville-combine-fn x)
prev-column
(rest prev-column))))
neville-tableau
will generate the entire tableau:
(defn- neville-tableau [points x]
(->> (map neville-prepare points)
(iterate (neville-next-column x))
(take-while seq)))
Really, we're only interested in the first row:
\[p_0, p_{01}, p_{012}, p_{0123}, p_{01234}\]
So define a function to grab that:
(defn first-terms [tableau]
(map first tableau))
The final piece we need is a function that will extract the estimate from our row of \([x_l, x_r, p]\) vectors:
(defn- neville-present [row]
(map (fn [[_ _ p]] p) row))
Putting it all together:
(defn neville-incremental*
"Takes a potentially lazy sequence of `points` and a point `x` and generates a
lazy sequence of approximations of P(x).
entry N in the returned sequence is the estimate using a polynomial generated
from the first N points of the input sequence."
[points x]
(neville-present
(first-terms
(neville-tableau points x))))
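Running this on points drawn from \(f(x) = x^2\) (the points are illustrative, and this assumes the definitions above are loaded), each estimate folds in one more point: the constant \(p_0\), the secant line \(p_{01}\), then the exact quadratic \(p_{012}\):

```clojure
(neville-incremental* [[1 1] [2 4] [3 9]] 4)
;; => (1 10 16)
```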
How do we know this works? We can prove it by using generic arithmetic to compare the full symbolic lagrange polynomial to each entry in the successive approximation.
(defn- lagrange-incremental
"Generates a sequence of estimates of `x` to polynomials fitted to `points`;
each entry uses one more point, just like `neville-incremental*`."
[points x]
(let [n (count points)]
(map (fn [i]
(lagrange (take i points) x))
(range 1 (inc n)))))
Every point is the same:
(let [points [['x_1 'y_1] ['x_2 'y_2] ['x_3 'y_3] ['x_4 'y_4]]]
(map (fn [neville lagrange]
(g/simplify
(g/- neville lagrange)))
(neville-incremental* points 'x)
(lagrange-incremental points 'x)))
(0 0 0 0)
The above pattern, of processing tableau entries, is general enough that we can abstract it out into a higher order function that takes a prepare
and merge
function and generates a tableau. Any method generating a tableau can use a present
function to extract the first row, OR to process the tableau in any other way that they like.
This is necessarily more abstract! But we'll specialize it shortly, and rebuild neville-incremental
into its final form.
I'm keeping points
in the argument vector for now, vs returning a new function; if you want to do this yourself, curry the function with (partial tableau-fn prepare merge present)
.
(defn tableau-fn
"Returns a Newton-style approximation tableau, given:
- `prepare`: a fn that processes each element of the supplied `points` into
the state necessary to calculate future tableau entries.
- `merge`: a fn of `l` and `r`, the tableau entries:
l -- return
/
/
/
r
the inputs are of the same form returned by `prepare`. `merge` should return a
new structure of the same form.
- `points`: the (potentially lazy) sequence of points used to generate the
first column of the tableau.
"
[prepare merge points]
(let [next-col (fn [previous-col]
(map merge
previous-col
(rest previous-col)))]
(->> (map prepare points)
(iterate next-col)
(take-while seq))))
Redefine neville-merge
to make it slightly more efficient, with baked-in native operations:
(defn- neville-merge
"Returns a tableau merge function. Identical to `neville-combine-fn` but uses
native operations instead of generic operations."
[x]
(fn [[xl _ pl] [_ xr pr]]
(let [p (/ (+ (* (- x xr) pl)
(* (- xl x) pr))
(- xl xr))]
[xl xr p])))
And now, neville
, identical to neville-incremental*
except using the generic tableau generator.
The form of the tableau also makes it easy to select a particular column instead of just the first row. Columns are powerful because they allow you to successively interpolate between pairs, triplets etc of points, instead of moving onto very high order polynomials.
I'm not sure it's the best interface, but we'll add that arity here.
(defn neville
"Takes:
- a (potentially lazy) sequence of `points` of the form `[x (f x)]` and
- a point `x` to interpolate
and generates a lazy sequence of approximations of P(x). Each entry in the
return sequence incorporates one more point from `points` into the P(x)
estimate.
Said another way: the Nth entry in the returned sequence is the estimate using a
polynomial generated from the first N points of the input sequence:
p0 p01 p012 p0123 p01234
This function generates each estimate using Neville's algorithm:
$$P(x) = [(x - x_r) P_l(x) - (x - x_l) P_r(x)] / [x_l - x_r]$$
## Column
If you supply an integer for the third `column` argument, `neville` will
return that /column/ of the interpolation tableau instead of the first row.
This will give you a sequence of nth-order polynomial approximations taken
between point `i` and the next `n` points.
As a reminder, this is the shape of the tableau:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
So supplying a `column` of `1` gives a sequence of linear approximations
between pairs of points; `2` gives quadratic approximations between successive
triplets, etc.
References:
- Press's Numerical Recipes (p103), chapter 3: http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-1.pdf
- Wikipedia: https://en.wikipedia.org/wiki/Neville%27s_algorithm
"
([points x]
(neville-present
(first-terms
(tableau-fn neville-prepare
(neville-merge x)
points))))
([points x column]
(-> (tableau-fn neville-prepare
(neville-merge x)
points)
(nth column)
(neville-present))))
Press's Numerical Recipes, chapter 3 (p103) ( http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-1.pdf ) describes a modified version of Neville's algorithm that is slightly more efficient than the version above.
Allan Macleod, in "A comparison of algorithms for polynomial interpolation", discusses this variation under the name "Modified Neville".
By generating the delta from each previous estimate in the tableau, Modified Neville is able to swap one of the multiplications above for an addition.
To make this work, instead of tracking the previous \(p\) estimate, we track two quantities:
- \(C\), the difference between the current estimate and its "left parent" in the tableau (the entry in the same row, previous column)
- \(D\), the difference between the current estimate and its "right parent" (one row up, previous column)
We can recover the estimates generated by the original Neville's algorithm by summing C values across the first tableau row.
Equation 3.1.5 in Numerical recipes gives us the equations we need:
\[C_{abc} = {[(x_a - x)(C_{bc} - D_{ab})] \over [x_a - x_c]} = {[(x_l - x)(C_r - D_l)] \over [x_l - x_r]}\]
\[D_{abc} = {[(x_c - x)(C_{bc} - D_{ab})] \over [x_a - x_c]} = {[(x_r - x)(C_r - D_l)] \over [x_l - x_r]}\]
These equations describe a merge
function for a tableau processing scheme, with state == [x_l, x_r, C, D]
.
Let's implement each method, and then combine them into final form. The following methods use the prefix mn
for "Modified Neville".
(defn- mn-prepare
"Processes an initial point [x (f x)] into the required state:
[x_l, x_r, C, D]
The recursion starts with $C = D = f(x)$."
[[x fx]]
[x x fx fx])
(defn- mn-merge
"Implements the recursion rules described above to generate x_l, x_r, C and D
for a tableau node, given the usual left and left-up tableau entries."
[x]
(fn [[xl _ _ dl] [_ xr cr _]]
(let [diff (- cr dl)
den (- xl xr)
factor (/ diff den)
c (* factor (- xl x))
d (* factor (- xr x))]
[xl xr c d])))
(defn mn-present
"Returns a (lazy) sequence of estimates by successively adding C values from the
first entry of each tableau column. Each C value is the delta from the
previous estimate."
[row]
(ua/scanning-sum
(map (fn [[_ _ c _]] c) row)))
tableau-fn
allows us to assemble these pieces into a final function that has an interface identical to neville
above. The implementation is more obfuscated but slightly more efficient.
(defn modified-neville
"Similar to `neville` (the interface is identical) but slightly more efficient.
Internally this builds up its estimates by tracking the delta from the
previous estimate.
This non-obvious change lets us swap an addition in for a multiplication,
making the algorithm slightly more efficient.
See the `neville` docstring for usage information, and info about the required
structure of the arguments.
The structure of the `modified-neville` algorithm makes it difficult to select
a particular column. See `neville` if you'd like to generate polynomial
approximations between successive sequences of points.
References:
- \"A comparison of algorithms for polynomial interpolation\", A. Macleod,
https://www.sciencedirect.com/science/article/pii/0771050X82900511
- Press's Numerical Recipes (p103), chapter 3: http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-1.pdf
"
[points x]
(mn-present
(first-terms
(tableau-fn mn-prepare
(mn-merge x)
points))))
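Both algorithms compute the same sequence of polynomial estimates, so on the same exact (integer or rational) inputs they should agree term for term. A spot check with made-up points, assuming the definitions above are loaded:

```clojure
(let [points [[1 1] [2 4] [3 9]]]
  (= (neville points 4)
     (modified-neville points 4)))
;; => true
```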
The advantage of the method described above, where we generate an entire tableau and lazily pull the first entry off of each column, is that we can pass a lazy sequence in as points
and get a lazy sequence of successive estimates back. If we don't pull from the result sequence, no computation will occur.
One problem with that structure is that we have to have our sequence of points available when we call a function like neville
. What if we want to pause, save the current estimate and pick up later where we left off?
Look at the tableau again:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
. . . . .
. . . . .
. . . . .
If you stare at this for a while, you might notice that it should be possible to use the merge
and present
functions we already have to build the tableau one row at a time, given ONLY the previous row:
(f [p1 p12 p123 p1234] [x0 fx0]) ;; => [p0 p01 p012 p0123 p01234]
Here's something close, using our previous merge
and prepare
definitions:
(defn- generate-new-row* [prepare merge]
(fn [prev-row point]
;; the new point, once it's prepared, is the first entry in the new row.
;; From there, we can treat the previous row as a sequence of "r" values.
(reduce merge (prepare point) prev-row)))
There's a problem here. reduce
only returns the FINAL value of the aggregation:
(let [f (generate-new-row* prepare merge)]
(f [p1 p12 p123 p1234] [x0 fx0]))
;; => p01234
We want the entire new row! Lucky for us, Clojure has a version of reduce
, called reductions
, that returns each intermediate aggregation result:
(defn- generate-new-row [prepare merge]
(fn [prev-row point]
(reductions merge (prepare point) prev-row)))
(let [f (generate-new-row prepare merge)]
(f [p1 p12 p123 p1234] [x0 fx0]))
;; => [p0 p01 p012 p0123 p01234]
Quick aside here, as we've stumbled across a familiar pattern. The discussion above suggests the idea of a "fold" from functional programming: https://en.wikipedia.org/wiki/Fold_(higher-order_function)
A fold consists of:
- `init`, an initial piece of state called an "accumulator"
- a `merge` function that combines ("folds") a new element `x` into the accumulator, and returns a value of the same shape / type as `init`
- a `present` function that transforms the accumulator into a final value.
In Clojure, you perform a fold on a sequence with the `reduce` function:
(reduce merge init xs)
For example:
(reduce + 0.0 (range 10))
45.0
Our `generate-new-row` function from above is exactly the `merge` function of a fold. The accumulator is the latest tableau row:
- `init` = `[]`, the initial empty row
- `present` = the same present function as before (`neville-present` or `mn-present`)
Now that we've identified this new pattern, redefine generate-new-row
with a new name:
(defn tableau-fold-fn
"Transforms the supplied `prepare` and `merge` functions into a new function
that can merge a new point into a tableau row (generating the next tableau
row).
More detail on the arguments:
- `prepare`: a fn that processes each element of the supplied `points` into
the state necessary to calculate future tableau entries.
- `merge`: a fn of `l` and `r`, the tableau entries:
l -- return
/
/
/
r
the inputs are of the same form returned by `prepare`. `merge` should return a
new structure of the same form."
[prepare merge]
(fn [prev-row point]
(reductions merge (prepare point) prev-row)))
Next, we can use this to generate specialized fold functions for our two incremental algorithms above - neville
and modified-neville
:
(defn- neville-fold-fn
"Returns a function that accepts:
- `previous-row`: previous row of an interpolation tableau
- a new point of the form `[x (f x)]`
and returns the next row of the tableau using the algorithm described in
`neville`."
[x]
(tableau-fold-fn neville-prepare
(neville-merge x)))
(defn- modified-neville-fold-fn
"Returns a function that accepts:
- `previous-row`: previous row of an interpolation tableau
- a new point of the form `[x (f x)]`
and returns the next row of the tableau using the algorithm described in
`modified-neville`."
[x]
(tableau-fold-fn mn-prepare
(mn-merge x)))
This final function brings back in the notion of present
. It returns a function that consumes an entire sequence of points, and then passes the final row into the exact present-fn
we used above:
(defn tableau-fold
"Returns a function that accepts a sequence of points and processes them into a
tableau by generating successive rows, one at a time.
The final row is passed into `present-fn`, which generates the final return
value.
This is NOT appropriate for lazy sequences! Fully consumes the input."
[fold-fn present-fn]
(fn [points]
(present-fn
(reduce fold-fn [] points))))
Note that these folds process points in the OPPOSITE order from the column-wise tableau functions! Because you build up one row at a time, each new point is PRE-pended to the interpolations in the previous row.
The advantage is that you can save the current row, and then come back and absorb further points later.
The disadvantage is that if you present
p123, you'll see successive estimates for [p1, p12, p123]...
but if you then prepend 0
, you'll see estimates for [p0, p01, p012, p0123]
. These don't share any elements, so they'll be totally different.
If you reverse the incoming point sequence, the final row of the fold will in fact equal the first row of the column-based method.
If you want a true incremental version of the above code, reverse points! We don't do this automatically in case points is an infinite sequence.
tableau-scan
below will return a function that acts identically to the non-fold, column-wise version of the interpolators. It does this by folding in one point at a time, but processing EVERY intermediate value through the presentation function.
(defn tableau-scan
"Takes a folding function and a final presentation function (of accumulator type
=> return value) and returns a NEW function that:
- accepts a sequence of incoming points
- returns the result of calling `present` on each successive row."
[fold-fn present-fn]
(fn [xs]
(->> (reductions fold-fn [] xs)
(map present-fn)
(rest))))
And finally, we specialize to our two incremental methods.
(defn neville-fold
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of successive approximations of `x` using polynomials fitted to the
points in reverse order.
This function uses the `neville` algorithm internally."
[x]
(tableau-fold (neville-fold-fn x)
neville-present))
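As a numeric check of the ordering caveat above (the points are arbitrary, and this assumes the definitions in this namespace are loaded): folding the REVERSED points reproduces the column-wise `neville` sequence exactly:

```clojure
(let [points [[1 1] [2 4] [3 9]]]
  (= (neville points 4)
     ((neville-fold 4) (reverse points))))
;; => true
```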
(defn neville-scan
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of SEQUENCES of successive polynomial approximations of `x`; one
for each of the supplied points.
For a sequence a, b, c... you'll see:
[(neville [a] x)
(neville [b a] x)
(neville [c b a] x)
...]"
[x]
(tableau-scan (neville-fold-fn x)
neville-present))
(defn modified-neville-fold
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of successive approximations of `x` using polynomials fitted to the
points in reverse order.
This function uses the `modified-neville` algorithm internally."
[x]
(tableau-fold (modified-neville-fold-fn x)
mn-present))
(defn modified-neville-scan
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of SEQUENCES of successive polynomial approximations of `x`; one
for each of the supplied points.
For a sequence a, b, c... you'll see:
[(modified-neville [a] x)
(modified-neville [b a] x)
(modified-neville [c b a] x)
...]"
[x]
(tableau-scan (modified-neville-fold-fn x)
mn-present))
Next, check out:
- rational.cljc
to learn how to interpolate rational functions
- richardson.cljc
for a specialized implementation of polynomial interpolation, when you know something about the ratios between successive x
elements in the point sequence.
(ns quadrature.interpolate.rational
"This namespace contains a discussion of rational function interpolation, and
different methods for fitting rational functions N points and evaluating them
at some value `x`."
(:require [quadrature.interpolate.polynomial :as ip]
[sicmutils.generic :as g]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]
[taoensso.timbre :as log]))
This namespace contains implementations of rational function interpolation methods. The ALGLib user guide has a nice page on rational function interpolation, which suggests that the Bulirsch-Stoer method, included here, is NOT great, and that there are better methods. We'd love implementations of the others if you agree!
The main method in this package is an incremental version of the Bulirsch-Stoer algorithm.
Just like with polynomial interpolation, let's start with a straightforward implementation of the non-incremental recursive algorithm.
(defn bulirsch-stoer-recursive
"Returns the value of `P(x)`, where `P` is rational function fit (using the
Bulirsch-Stoer algorithm, of similar style to Neville's algorithm described in
`polynomial.cljc`) to every point in the supplied sequence `points`.
`points`: is a sequence of pairs of the form `[x (f x)]`
\"The Bulirsch-Stoer algorithm produces the so-called diagonal rational
function, with the degrees of numerator and denominator equal (if m is even)
or with the degree of the denominator larger by one if m is odd.\" ~ Press,
Numerical Recipes, p105
The implementation follows Equation 3.2.3 on on page 105 of Press:
http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf.
References:
- Stoer & Bulirsch, 'Introduction to Numerical Analysis': https://www.amazon.com/Introduction-Numerical-Analysis-Applied-Mathematics/dp/144193006X
- PDF of the same: http://www.math.uni.wroc.pl/~olech/metnum2/Podreczniki/(eBook)%20Introduction%20to%20Numerical%20Analysis%20-%20J.Stoer,R.Bulirsch.pdf
- Press's Numerical Recipes (p105), Section 3.2 http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf"
[points x]
(letfn [(evaluate [points x]
(cond (empty? points) 0
(= 1 (count points))
(let [[[_ y]] points]
y)
:else
(let [l-branch (pop points)
r-branch (subvec points 1)
center (pop r-branch)
[xl] (first points)
[xr] (peek points)
rl (evaluate l-branch x)
rr (evaluate r-branch x)
rc (evaluate center x)
p (g/- rr rl)
q (-> (/ (g/- x xl)
(g/- x xr))
(g/* (g/- 1 (g// p (g/- rr rc))))
(g/- 1))]
(g/+ rr (g// p q)))))]
(let [point-array (vec points)]
(evaluate point-array x))))
We can be a bit more clever, if we reuse the idea of the "tableau" described in the polynomial namespace.
(defn bulirsch-stoer
"Takes
- a (potentially lazy) sequence of `points` of the form `[x (f x)]` and
- a point `x` to interpolate
and generates a lazy sequence of approximations of `P(x)`. Each entry in the
return sequence incorporates one more point from `points` into the P(x)
estimate.
`P(x)` is rational function fit (using the Bulirsch-Stoer algorithm, of
similar style to Neville's algorithm described in `polynomial.cljc`) to every
point in the supplied sequence `points`.
\"The Bulirsch-Stoer algorithm produces the so-called diagonal rational
function, with the degrees of numerator and denominator equal (if m is even)
or with the degree of the denominator larger by one if m is odd.\" ~ Press,
Numerical Recipes, p105
The implementation follows Equation 3.2.3 on page 105 of Press:
http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf.
## Column
If you supply an integer for the third (optional) `column` argument,
`bulirsch-stoer` will return that /column/ offset into the interpolation tableau
instead of the first row. This will give you a sequence of nth-order rational
function approximations taken between point `i` and the next `n` points.
As a reminder, this is the shape of the tableau:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
So supplying a `column` of `1` gives a sequence of 2-point approximations
between pairs of points; `2` gives 3-point approximations between successive
triplets, etc.
References:
- Stoer & Bulirsch, 'Introduction to Numerical Analysis': https://www.amazon.com/Introduction-Numerical-Analysis-Applied-Mathematics/dp/144193006X
- PDF of the same: http://www.math.uni.wroc.pl/~olech/metnum2/Podreczniki/(eBook)%20Introduction%20to%20Numerical%20Analysis%20-%20J.Stoer,R.Bulirsch.pdf
- Press's Numerical Recipes (p105), Section 3.2 http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf"
[points x & [column]]
(let [prepare (fn [[x fx]] [x x 0 fx])
merge (fn [[xl _ _ rl] [_ xr rc rr]]
(let [p (- rr rl)
q (-> (/ (- x xl)
(- x xr))
(* (- 1 (/ p (- rr rc))))
(- 1))]
[xl xr rl (+ rr (/ p q))]))
present (fn [row] (map (fn [[_ _ _ r]] r) row))
tableau (ip/tableau-fn prepare merge points)]
(present
(if column
(nth tableau column)
(ip/first-terms tableau)))))
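To make the tableau bookkeeping concrete, here's a hedged Python sketch of the incremental first-row traversal (a pure-Python stand-in for `ip/tableau-fn` plus `ip/first-terms`, with invented names): each new point becomes a fresh "prepared" state, then merges leftward through the previous row.

```python
def bs_first_terms(points, x):
    """Incremental Bulirsch-Stoer: each new point extends the previous
    tableau row; yields the upper-left diagonal of the tableau, i.e.
    successive rational approximations of P(x)."""
    prev_row = []
    for px, fx in points:
        state = (px, px, 0.0, fx)      # "prepare": [xl, xr, rc, rr]
        row = [state]
        for xl, _, _, rl in prev_row:  # "merge" each previous-row entry
            _, xr, rc, rr = state
            p = rr - rl
            q = ((x - xl) / (x - xr)) * (1 - p / (rr - rc)) - 1
            state = (xl, xr, rl, rr + p / q)
            row.append(state)
        prev_row = row
        yield row[-1][3]

# Successive estimates of f(x) = 1/x at x = 1.5 from 1, 2, then 3 points:
approximations = list(bs_first_terms([(1.0, 1.0), (2.0, 0.5), (3.0, 1/3)], 1.5))
```

Since \(1/x\) is a low-degree diagonal rational function, the estimates lock onto the true value \(2/3\) after the second point.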
Press, in Numerical Recipes section 3.2, describes a modification of the Bulirsch-Stoer algorithm that lets you track the differences from the left and left-up entries in the tableau, just like the modified Neville method in polynomial.cljc. The algorithm is implemented below.
(defn bs-prepare
"Processes an initial point [x (f x)] into the required state:
[x_l, x_r, C, D]
The recursion starts with $C = D = f(x)$."
[[x fx]] [x x fx fx])
(defn bs-merge
"Implements the recursion rules described in Press's Numerical Recipes, section
3.2 http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf to generate x_l, x_r, C
and D for a tableau node, given the usual left and left-up tableau entries.
This merge function ALSO includes a 'zero denominator fix used by Bulirsch and
Stoer and Henon', in the words of Sussman from `rational.scm` in the scmutils
package.
If the denominator is 0, we pass along C from the up-left node and D from the
previous entry in the row. Otherwise, we use the algorithm to calculate.
TODO understand why this works, or where it helps!"
[x]
(fn [[xl _ _ dl] [_ xr cr _]]
(let [c-d (- cr dl)
d*ratio (-> (/ (- x xl)
(- x xr))
(* dl))
den (- d*ratio cr)]
(if (zero? den)
(do (log/info "zero denominator!")
[xl xr cr dl])
(let [cnum (* d*ratio c-d)
dnum (* cr c-d)]
[xl xr (/ cnum den) (/ dnum den)])))))
(defn modified-bulirsch-stoer
"Similar to `bulirsch-stoer` (the interface is identical) but slightly more efficient.
Internally this builds up its estimates by tracking the delta from the
previous estimate.
This non-obvious change lets us swap an addition in for a division,
making the algorithm slightly more efficient.
See the `bulirsch-stoer` docstring for usage information, and info about the
required structure of the arguments.
References:
- Press's Numerical Recipes (p105), Section 3.2 http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf"
[points x]
(ip/mn-present
(ip/first-terms
(ip/tableau-fn bs-prepare
(bs-merge x)
points))))
Just like in polynomial.cljc, we can write rational interpolation in the style of a functional fold:
(defn modified-bulirsch-stoer-fold-fn
"Returns a function that accepts:
- `previous-row`: previous row of an interpolation tableau
- a new point of the form `[x (f x)]`
and returns the next row of the tableau using the algorithm described in
`modified-bulirsch-stoer`."
[x]
(ip/tableau-fold-fn
bs-prepare
(bs-merge x)))
(defn modified-bulirsch-stoer-fold
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of successive approximations of `x` using rational functions fitted
to the points in reverse order."
[x]
(ip/tableau-fold
(modified-bulirsch-stoer-fold-fn x)
ip/mn-present))
(defn modified-bulirsch-stoer-scan
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of SEQUENCES of successive rational function approximations of `x`;
one for each of the supplied points.
For a sequence a, b, c... you'll see:
[(modified-bulirsch-stoer [a] x)
(modified-bulirsch-stoer [b a] x)
(modified-bulirsch-stoer [c b a] x)
...]"
[x]
(ip/tableau-scan
(modified-bulirsch-stoer-fold-fn x)
ip/mn-present))
Derivatives using three kinds of central difference formulas… accelerated using Richardson extrapolation, with a nice technique for guarding against underflow.
(ns quadrature.derivative
"Different numerical derivative implementations."
(:require [sicmutils.calculus.derivative :as d]
[quadrature.interpolate.richardson :as r]
[sicmutils.function :as f]
[sicmutils.generic :as g]
[sicmutils.infix :as if]
[sicmutils.util :as u]
[quadrature.util.stream :as us]
[sicmutils.value :as v]))
This module builds up to an implementation of numerical derivatives. The final function, D-numeric, uses Richardson extrapolation to speed up convergence of successively tighter estimates of \(f^{\prime}(x)\) using a few different methods.
The inspiration for this style was Sussman's "Abstraction in Numerical Methods", starting on page 10: https://dspace.mit.edu/bitstream/handle/1721.1/6060/AIM-997.pdf?sequence=2
We'll proceed by deriving the methods symbolically, and then implement them numerically.
First, a function that will print nicely rendered infix versions of (simplified) symbolic expressions:
(defn show [e]
(let [eq (if/->TeX (g/simplify e))]
(println
(str "\\begin{equation}\n"
eq
"\n\\end{equation}"))))
And a function to play with:
(def ^:private func
(f/literal-function 'f))
The key to all of these methods involves the Taylor series expansion of an arbitrary function \(f\) around a point \(x\); we know the Taylor series will include a term for \(f^{\prime}(x)\), so the goal is to see if we can isolate it.
Here's the Taylor series expansion of \(f(x + h)\):
(def ^:private fx+h
(->> (d/taylor-series func 'x 'h)
(take 5)
(reduce g/+)))
Use show to print out its infix representation:
(show fx+h)
\begin{equation}
\frac{1}{24}\,{h}^{4}\,{D}^{4}f\left(x\right) + \frac{1}{6}\,{h}^{3}\,{D}^{3}f\left(x\right) + \frac{1}{2}\,{h}^{2}\,{D}^{2}f\left(x\right) + h\,Df\left(x\right) + f\left(x\right)
\end{equation}
We can solve this for \(Df(x)\) by subtracting \(f(x)\) and dividing out \(h\):
(show (g// (g/- fx+h (func 'x)) 'h))
\begin{equation}
\frac{1}{24}\,{h}^{3}\,{D}^{4}f\left(x\right) + \frac{1}{6}\,{h}^{2}\,{D}^{3}f\left(x\right) + \frac{1}{2}\,h\,{D}^{2}f\left(x\right) + Df\left(x\right)
\end{equation}
Voila! The remaining terms include \(D f(x)\) along with a series of progressively-smaller "error terms" (since \(h \to 0\)). The first of these terms is \({1 \over 2} h D^2 f(x)\). It will come to dominate the error as \(h \to 0\), so we say that the approximation we've just derived has error of \(O(h)\).
This particular formula, in the limit as \(h \to 0\), is called the "forward difference approximation" to \(Df(x)\). Here's the Clojure implementation:
(defn forward-difference
"Returns a single-variable function of a step size `h` that calculates the
forward-difference estimate of the first derivative of `f` at point `x`:
f'(x) = [f(x + h) - f(x)] / h
Optionally accepts a third argument `fx == (f x)`, in case you've already
calculated it elsewhere and would like to save a function evaluation."
([f x] (forward-difference f x (f x)))
([f x fx]
(fn [h]
(/ (- (f (+ x h)) fx) h))))
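The \(O(h)\) error is easy to check numerically. Here's a quick Python translation of the same idea (a sketch, not the library's API):

```python
def forward_difference(f, x, fx=None):
    """Returns a function of step size h computing [f(x + h) - f(x)] / h."""
    fx = f(x) if fx is None else fx
    return lambda h: (f(x + h) - fx) / h

# Estimate d/dx x^2 at x = 3 (true answer: 6). The leading error term is
# (1/2) h D^2 f(x) = h here, so halving h should roughly halve the error.
est = forward_difference(lambda t: t * t, 3.0)
err1 = abs(est(1e-3) - 6.0)
err2 = abs(est(5e-4) - 6.0)
```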
We could also expand \(f(x - h)\):
(def ^:private fx-h
(->> (d/taylor-series func 'x (g/negate 'h))
(take 5)
(reduce g/+)))
(show fx-h)
\begin{equation}
\frac{1}{24}\,{h}^{4}\,{D}^{4}f\left(x\right) + \frac{-1}{6}\,{h}^{3}\,{D}^{3}f\left(x\right) + \frac{1}{2}\,{h}^{2}\,{D}^{2}f\left(x\right) - h\,Df\left(x\right) + f\left(x\right)
\end{equation}
and solve for \(Df(x)\):
(show (g// (g/- (func 'x) fx-h) 'h))
\begin{equation}
\frac{-1}{24}\,{h}^{3}\,{D}^{4}f\left(x\right) + \frac{1}{6}\,{h}^{2}\,{D}^{3}f\left(x\right) + \frac{-1}{2}\,h\,{D}^{2}f\left(x\right) + Df\left(x\right)
\end{equation}
This gives a similar method, called the "backward difference" formula. Here's the implementation:
(defn backward-difference
"Returns a single-variable function of a step size `h` that calculates the
backward-difference estimate of the first derivative of `f` at point `x`:
f'(x) = [f(x) - f(x - h)] / h
Optionally accepts a third argument `fx == (f x)`, in case you've already
calculated it elsewhere and would like to save a function evaluation."
([f x] (backward-difference f x (f x)))
([f x fx]
(fn [h]
(/ (- fx (f (- x h))) h))))
Notice that the two expansions, of \(f(x + h)\) and \(f(x - h)\), share every term paired with an even power of \(h\). The terms associated with odd powers of \(h\) alternate in sign (because of the \(-h\) in the expansion of \(f(x - h)\)).
We can find yet another method for approximating \(Df(x)\) if we subtract these two series. We're trying to solve for \(Df(x)\), and \(Df(x)\) appears paired with \(h\), an odd-powered term… so subtracting \(f(x-h)\) should double that term, not erase it. Let's see:
(show (g/- fx+h fx-h))
\begin{equation}
\frac{1}{3}\,{h}^{3}\,{D}^{3}f\left(x\right) + 2\,h\,Df\left(x\right)
\end{equation}
Amazing! Now solve for \(Df(x)\):
(show (g// (g/- fx+h fx-h)
(g/* 2 'h)))
\begin{equation}
\frac{1}{6}\,{h}^{2}\,{D}^{3}f\left(x\right) + Df\left(x\right)
\end{equation}
We're left with \(Df(x) + O(h^2)\), a quadratic error term in \(h\). (Of course if we'd expanded more than the initial five terms of the Taylor series we'd see a long error series with only even powers.)
This formula is called the "central difference" approximation to the first derivative. Here's the implementation:
(defn central-difference
"Returns a single-variable function of a step size `h` that calculates the
central-difference estimate of the first derivative of `f` at point `x`:
f'(x) = [f(x + h) - f(x - h)] / 2h"
[f x]
(fn [h]
(/ (- (f (+ x h)) (f (- x h)))
(* 2 h))))
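We can confirm the quadratic error numerically, too: halving \(h\) should cut the central-difference error by roughly a factor of four. A Python sketch (invented names, not the library's API):

```python
import math

def central_difference(f, x):
    """Returns a function of step size h: [f(x + h) - f(x - h)] / 2h."""
    return lambda h: (f(x + h) - f(x - h)) / (2 * h)

# Estimate d/dx sin(x) at x = 0.5 (true answer: cos(0.5)).
est = central_difference(math.sin, 0.5)
err1 = abs(est(1e-2) - math.cos(0.5))
err2 = abs(est(5e-3) - math.cos(0.5))
ratio = err1 / err2  # close to 4, since the error is O(h^2)
```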
There's one more approximation we can extract from these two expansions. We noted earlier that the terms associated with odd powers of \(h\) alternate in sign. If we add the two series, these odd terms should all cancel out. Let's see:
(show (g/+ fx-h fx+h))
\begin{equation}
\frac{1}{12}\,{h}^{4}\,{D}^{4}f\left(x\right) + {h}^{2}\,{D}^{2}f\left(x\right) + 2\,f\left(x\right)
\end{equation}
Interesting. The \(Df(x)\) term is gone. Remember that we have \(f(x)\) available; the first unknown term in the series is now \(D^2 f(x)\). Solve for that term:
(show (g// (g/- (g/+ fx-h fx+h) (g/* 2 (func 'x)))
(g/square 'h)))
\begin{equation}
\frac{1}{12}\,{h}^{2}\,{D}^{4}f\left(x\right) + {D}^{2}f\left(x\right)
\end{equation}
This is the "central difference" approximation to the second derivative of \(f\). Note that the error term here is quadratic in \(h\). Here it is in code:
(defn central-difference-d2
"Returns a single-variable function of a step size `h` that calculates the
central-difference estimate of the second derivative of `f` at point `x`:
f''(x) = [f(x + h) - 2f(x) + f(x - h)] / h^2
Optionally accepts a third argument `fx == (f x)`, in case you've already
calculated it elsewhere and would like to save a function evaluation."
([f x] (central-difference-d2 f x (f x)))
([f x fx]
(let [fx*2 (* 2 fx)]
(fn [h]
(/ (- (+ (f (+ x h))
(f (- x h)))
fx*2)
(* h h))))))
Let's attempt to use these estimates and see how accurate they are. (This section follows Sussman starting on page 10.)
The following function returns a new function that approximates \(Df(x)\) using the central difference method, with a fixed value of \(h = 0.00001\):
(defn- make-derivative-fn
[f]
(fn [x]
(let [h 1e-5]
((central-difference f x) h))))
The error here is not great, even for a simple function:
((make-derivative-fn g/square) 3)
6.000000000039306
Let's experiment instead with letting \(h \to 0\). This next function takes a function \(f\), a value of \(x\) and an initial \(h\), and generates a stream of central difference approximations to \(Df(x)\) using successively halved values of \(h\), i.e., \((h, h/2, h/4, h/8, \ldots)\)
(defn- central-diff-stream [f x h]
(map (central-difference f x)
(us/zeno 2 h)))
Let's print 20 of the first 60 terms (taking every 3 so we see any pattern):
(->> (central-diff-stream g/sqrt 1 0.1)
(take-nth 3)
(us/pprint 20))
0.5006277505981893
0.5000097662926306
0.5000001525880649
0.5000000023844109
0.5000000000381988
0.5000000000109139
0.4999999998835847
0.5000000004656613
0.5000000074505806
0.49999989569187164
0.5000001192092896
0.4999971389770508
0.500030517578125
0.49957275390625
0.50048828125
0.48828125
0.625
0.0
0.0
0.0
At first, the series converges toward the proper value. But as \(h\) gets smaller, \(f(x + h)\) and \(f(x - h)\) get so close together that their difference is less than the minimum epsilon allowed by the system's floating point representation.
As Sussman states: "Hence we are in a race between truncation error, which starts out large and gets smaller, and roundoff error, which starts small and gets larger." ~Sussman, p12
We can actually analyze and quantify how many halvings we can apply to \(h\) before roundoff error swamps our calculation.
Why does roundoff error occur? From Sussman: "Any real number \(x\), represented in the machine, is rounded to a value \(x(1 + e)\), where \(e\) is effectively a random variable whose absolute value is on the order of the machine epsilon \(\epsilon\): that smallest positive number for which 1.0 and \(1.0 + \epsilon\) can be distinguished."
In the current library, v/machine-epsilon holds this value.
Our goal, then, is to see if we can figure out when the error due to roundoff grows so large that it exceeds the tolerance we want to apply to our calculation.
For the central difference formula:
\[f^{\prime}(x) = {{f(x + h) - f(x - h)} \over {2h}}\]
without any roundoff error, the numerator should be equal to \(2h f'(x)\). In reality, for small values of \(h\), \(f(x + h)\) and \(f(x - h)\) both have machine representations in error by about \(f(x) \epsilon\). Their difference doesn't change the order, so we can say that their difference also has error of \(f(x) \epsilon\).
Dividing this error by the expected numerator value \(2 h f^{\prime}(x)\), the relative error is:
\[\epsilon\left|\frac{f(x)}{2 h f^{\prime}(x)}\right|\]
The relative error doubles each time \(h\) is halved. This is technically just the relative error of the numerator of the central difference method, but we know the denominator \(2h\) to full precision, so we can ignore it here.
If we actually calculate this ratio, we'll find the INITIAL relative error due to roundoff for a given h. Also, because we want to make sure that we're working in integer multiples of machine epsilon, let's actually take the next-highest-integer of the ratio above. The following method takes the ratio above as an argument, and returns:
\[1 + \lfloor \lvert ratio \rvert \rfloor\]
(defn- roundoff-units
"Returns the number of 'roundoff units', ie, multiples of the machine epsilon,
that roundoff error contributes to the total relative error, given a relative
error percentage estimated for some initial step size $h$."
[rel-error-ratio]
(inc
(Math/floor
(Math/abs
(double rel-error-ratio)))))
That calculation, as the documentation states, returns the number of "roundoff units". Let's call it \(r\).
Each iteration doubles the relative error contributed by roundoff. Given some tolerance, how many roundoff error doublings (or, equivalently, halvings of \(h\)) can we tolerate before roundoff error swamps our calculation?
Here's the solution:
(defn- max-iterations
"Solution for `n`, in:
`initial-error` * 2^n <= `tolerance`"
[units tolerance]
(let [initial-error (* v/machine-epsilon units)]
(Math/floor
(/ (Math/log (/ tolerance initial-error))
(Math/log 2)))))
Let's combine these two ideas into a final function, terms-before-roundoff, that calculates how many items we can pull from a sequence like central-diff-stream above before roundoff swamps us. (This is 1 + max iterations, because we need to include the first point.)
(defn- terms-before-roundoff
"Generates a default max number of terms, based on roundoff error estimates."
[ratio tolerance]
(inc
(max-iterations (roundoff-units ratio)
tolerance)))
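Here's the same roundoff arithmetic sketched in Python (hypothetical helper names mirroring the Clojure, with the IEEE double machine epsilon standing in for `v/machine-epsilon`):

```python
import math

MACHINE_EPS = 2.0 ** -52  # machine epsilon for IEEE 754 doubles

def roundoff_units(rel_error_ratio):
    """Multiples of machine epsilon contributed by roundoff at the initial h."""
    return 1 + math.floor(abs(rel_error_ratio))

def max_iterations(units, tolerance):
    """Largest n satisfying (units * eps) * 2^n <= tolerance."""
    initial_error = MACHINE_EPS * units
    return math.floor(math.log(tolerance / initial_error) / math.log(2))

def terms_before_roundoff(ratio, tolerance):
    return 1 + max_iterations(roundoff_units(ratio), tolerance)

# The sqrt example below: f = sqrt, x = 1, h = 0.1, tolerance 1e-13.
ratio = math.sqrt(1.0) / (math.sqrt(1.1) - math.sqrt(0.9))
terms = terms_before_roundoff(ratio, 1e-13)
```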
How many terms are we allowed to examine for an estimate of the derivative of \(f(x) = \sqrt{x}\), with an initial \(h = 0.1\)?
(let [f g/sqrt
x 1
h 0.1
tolerance 1e-13
ratio (/ (f x)
(- (f (+ x h))
(f (- x h))))]
(terms-before-roundoff ratio tolerance))
6.0
6 terms, or 5 halvings, down to \(h = {{0.1} \over {2^5}} = 0.003125\). How many terms does the sequence take to converge?
(-> (central-diff-stream g/sqrt 1 0.1)
(us/seq-limit {:tolerance 1e-13}))
{:converged? true, :terms-checked 15, :result 0.5000000000109139}
15 is far beyond the level where roundoff error has rendered our results untrustworthy.
We need a way to converge more quickly. richardson.cljc
lays out a general method of "sequence acceleration" that we can use here, since we know the arithmetic progression of the terms in the error series for each of our methods above.
For the central difference method, our highest-order error term has an exponent of \(p = 2\), and successive terms are all even. r/richardson-sequence
takes p
and q
for an arithmetic sequence of error exponents $p, p + q, p + 2q…$
It also needs the initial size \(h\) of our sequence progression.
Given that information, we can transform our existing sequence of estimates into an accelerated sequence of estimates using Richardson extrapolation. Does it converge in fewer terms?
(let [h 0.1, p 2, q 2]
(-> (central-diff-stream g/sqrt 1 h)
(r/richardson-sequence h p q)
(us/seq-limit {:tolerance 1e-13})))
{:converged? true, :terms-checked 6, :result 0.5006325594766895}
Happily, it does, in only 6 terms instead of 15! This brings convergence just within our limit of 6 total terms.
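To see the acceleration mechanics outside the library, here's a self-contained Python sketch (a hedged stand-in for r/richardson-sequence, sweeping with error exponents \(p = q = 2\) over a halving sequence of \(h\) values):

```python
import math

def central_difference(f, x):
    return lambda h: (f(x + h) - f(x - h)) / (2 * h)

def richardson_sweep(xs, p):
    """One Richardson sweep over estimates whose h halves at each entry,
    cancelling the O(h^p) error term."""
    t = 2.0 ** p
    return [(t * hi - lo) / (t - 1) for lo, hi in zip(xs, xs[1:])]

# Central-difference estimates of d/dx sqrt(x) at x = 1 with h = 0.1, 0.05, ...
est = central_difference(math.sqrt, 1.0)
raw = [est(0.1 / 2 ** i) for i in range(5)]

# The error exponents run 2, 4, 6, ... (p = 2, q = 2), so sweep repeatedly:
xs, p = raw, 2
while len(xs) > 1:
    xs, p = richardson_sweep(xs, p), p + 2
accelerated = xs[0]  # far closer to 0.5 than any raw estimate
```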
If you're interested in more details of Richardson extrapolation, please see richardson.cljc! For now we'll proceed.
We're ready to write our final numeric differentiation routine, D-numeric. First, some supporting structure. We support four methods, so let's describe them using keywords in a set:
(def valid-methods
#{:central :central-d2 :forward :backward})
To apply one of the methods, we need the method's estimate as a function of \(h\), the ratio used above for the roundoff estimate, and the error-series exponents \(p\) and \(q\). (Once again, see richardson.cljc for a discussion of \(p\) and \(q\).)
This configs function bundles all of this together. I don't know that this is the best abstraction, but I don't yet know of any other methods for numeric differentiation, so it'll do for now.
Note here that \(p = q = 2\) for both central difference methods, just like we determined above. The forward and backward difference methods both keep all of the remaining terms of the Taylor expansion in their error series, so they only get \(p = q = 1\).
(defn- configs [method f x fx]
(case method
:forward
{:p 1
:q 1
:function (forward-difference f x fx)
:ratio-fn (fn [h] (/ fx (- (f (+ x h)) fx)))}
:central
{:p 2
:q 2
:function (central-difference f x)
:ratio-fn (fn [h]
(/ fx (- (f (+ x h))
(f (- x h)))))}
:backward
{:p 1
:q 1
:function (backward-difference f x fx)
:ratio-fn (fn [h]
(/ fx (- fx (f (- x h)))))}
:central-d2
{:p 2
:q 2
:function (central-difference-d2 f x fx)
:ratio-fn (fn [h]
(- (+ (f (+ x h))
(f (- x h)))
(* 2 fx)))}
(u/illegal
(str "Invalid method: " method ". Please try one of " valid-methods))))
(defn- fill-defaults
"Fills in default values required by `D-numeric`. Any option not used by
`D-numeric` gets passed on to `us/seq-limit`."
[m]
(let [defaults {:tolerance (Math/sqrt v/machine-epsilon)
:method :central}
{:keys [method] :as opts} (merge defaults m)]
(assert (contains? valid-methods method)
(str method " is not a valid method. Please try one of: " valid-methods))
opts))
(defn D-numeric
"Takes a function `f: R => R` (function of a single real variable), and returns
a new function of `x` that approximates the derivative $Df(x)$ (or $D^2f(x)$
if you pass `:method :central-d2`).
Returns the estimated value of the derivative at `x`. If you pass `:info?
true`, the fn returns a dictionary of the results of `us/seq-limit`:
{:converged? <boolean>
:terms-checked <int>
:result <derivative estimate>}
Make sure to visit `sicmutils.calculus.derivative/D` if you want symbolic or
automatic differentiation.
## Roundoff Estimate
The returned function will attempt to estimate how many times it can halve the
step size used to estimate the derivative before roundoff error swamps the
calculation, and force the function to return (with `:converged? false`, if
you pass `:info?`)
## Optional Arguments
`D-numeric` takes optional args as its second param. Any of these can be
overridden by passing a second argument to the function returned by
`D-numeric`; helpful for setting defaults and then overriding them later.
The returned function passes through these and any other options to
`us/seq-limit`, where they control the sequence of
Richardson-extrapolation-accelerated estimates.
Options:
- `:method`: one of `:central`, `:central-d2`, `:forward` or `:backward`.
`:central-d2` forces a second derivative estimate; the other methods configure
a first derivative estimator.
- `:info?` if false (default), returns the estimated value of the derivative at `x`. If true,
returns a dictionary with more information (see `D-numeric`'s docstring for
more info.)
- `:initial-h`: the initial `h` to use for derivative estimates before $h \to
0$. Defaults to 0.1 * abs(x).
- `:tolerance`: see `us/seq-limit` for a discussion of how this value
handles relative vs absolute tolerance. $\\sqrt(\\epsilon)$ by default, where
$\\epsilon$ = machine tolerance.
- `:maxterms`: the maximum number of terms to consider when hunting for a
derivative estimate. This defaults to an estimate generated internally,
designed to prevent roundoff error from swamping the result. If you want to
disable this feature, set `:maxterms` to something moderately large, like
`:maxterms 100`. But do so carefully! See the surrounding namespace for a
larger discussion."
([f] (D-numeric f {}))
([f opts]
(let [opts (fill-defaults opts)]
(fn df
([x] (df x {}))
([x overrides]
(let [{:keys [maxterms tolerance initial-h method info?] :as opts} (merge opts overrides)
{:keys [ratio-fn function p q]} (configs method f x (f x))
h (or initial-h (* 0.1 (g/abs x)))
n (or maxterms (terms-before-roundoff
(ratio-fn h)
tolerance))
estimates (map function (us/zeno 2 h))
result (-> (r/richardson-sequence estimates 2 p q)
(us/seq-limit (assoc opts :maxterms n)))]
(if info? result (:result result))))))))
Fit a parabola to every slice. Or, "accelerate" the trapezoid method with one step of Richardson extrapolation!
(ns quadrature.simpson
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.trapezoid :as qt]
[quadrature.interpolate.richardson :as ir]))
This numerical integration method is a closed Newton-Cotes formula; for each integral slice, Simpson's rule samples each endpoint and the midpoint and combines them into an area estimate for this slice using the following formula:
\[{{h} \over 3} (f_0 + 4f_1 + f_2)\]
Given a window of \([a, b]\) and a "step size" of \(h = {{b - a} \over 2}\). The point \(f_i\) is the point \(i\) steps into the window.
There are a few simpler ways to understand this. Simpson's rule is the trapezoid method (see trapezoid.cljc), subject to a single refinement of "Richardson extrapolation"; equivalently, it's a weighted average of the midpoint (\(M\)) and trapezoid (\(T\)) estimates:
\[S = {{2M + T} \over 3}\]
The test namespace contains a symbolic proof that the Richardson-extrapolated Trapezoid method is equivalent to using the formula above to calculate Simpson's rule directly.
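Both equivalences are easy to spot-check numerically. Here's a Python sketch (hypothetical pure-Python helpers; the essay's own proof is symbolic): one Richardson step over trapezoid sums, the weighted average \((2M + T)/3\), and composite Simpson all agree.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n slices."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def midpoint(f, a, b, n):
    """Composite midpoint rule with n slices."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def simpson(f, a, b, n):
    """Composite Simpson's rule over 2n slices of width h = (b - a) / 2n."""
    h = (b - a) / (2 * n)
    odd = sum(f(a + (2 * i + 1) * h) for i in range(n))
    even = sum(f(a + 2 * i * h) for i in range(1, n))
    return h * (f(a) + f(b) + 4 * odd + 2 * even) / 3

f, a, b, n = math.sin, 0.0, 1.0, 4
t_n, t_2n, m_n = trapezoid(f, a, b, n), trapezoid(f, a, b, 2 * n), midpoint(f, a, b, n)

richardson_step = (4 * t_2n - t_n) / 3  # one Richardson refinement of trapezoid
weighted_avg = (2 * m_n + t_n) / 3      # S = (2M + T) / 3
simpson_value = simpson(f, a, b, n)
```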
(defn simpson-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ using Simpson's rule.
Simpson's rule is equivalent to the trapezoid method subject to one refinement
of Richardson extrapolation. The trapezoid method fits a line to each
integration slice. Simpson's rule fits a quadratic to each slice.
Returns estimates with $n, 2n, 4n, ...$ slices, geometrically increasing by a
factor of 2 with each estimate.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use."
([f a b] (simpson-sequence f a b {:n 1}))
([f a b {:keys [n] :or {n 1}}]
{:pre [(number? n)]}
(-> (qt/trapezoid-sequence f a b n)
(ir/richardson-column 1 2 2 2))))
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
using Simpson's rule with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `simpson-sequence` for more information about Simpson's rule, caveats that
might apply when using this integration method and information on the optional
args in `opts` that customize this function's behavior."
:area-fn (comp first simpson-sequence)
:seq-fn simpson-sequence)
(ns quadrature.simpson38
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.trapezoid :as qt]
[quadrature.interpolate.richardson :as ir]
[quadrature.util.stream :as us]))
This numerical integration method is a closed Newton-Cotes formula; for each integral slice, Simpson's 3/8 rule samples each endpoint and TWO interior, equally spaced points, and combines them into an area estimate for this slice using the following formula:
\[{{3h} \over 8} (f_0 + 3f_1 + 3f_2 + f_3)\]
Given a window of \([a, b]\) and a "step size" of \(h = {{b - a} \over 3}\). The point \(f_i\) is the point \(i\) steps into the window.
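In Python, the single-window formula looks like this (a hedged sketch with an invented name; the library never computes it directly, preferring the extrapolated trapezoid sums below):

```python
def simpson38_slice(f, a, b):
    """Simpson's 3/8 rule on one window [a, b]:
    (3h/8)(f0 + 3 f1 + 3 f2 + f3), with h = (b - a) / 3."""
    h = (b - a) / 3
    f0, f1, f2, f3 = (f(a + i * h) for i in range(4))
    return (3 * h / 8) * (f0 + 3 * f1 + 3 * f2 + f3)

# The rule is exact for cubics: the integral of x^3 over [0, 1] is 1/4.
result = simpson38_slice(lambda x: x ** 3, 0.0, 1.0)
```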
There is a simpler way to understand this: Simpson's 3/8 rule is the trapezoid method (see trapezoid.cljc), subject to a single refinement of "Richardson extrapolation", with a threefold increase of integration slices at each step, from \(n \to 3n\).
The test namespace contains a symbolic proof that the Richardson-extrapolated Trapezoid method is equivalent to using the formula above to calculate Simpson's 3/8 rule directly.
(defn simpson38-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ using Simpson's 3/8 rule.
Simpson's 3/8 rule is equivalent to the trapezoid method subject to:
- one refinement of Richardson extrapolation, and
- a geometric increase of integration slices by a factor of 3 for each
sequence element. (the Trapezoid method increases by a factor of 2 by
default.)
The trapezoid method fits a line to each integration slice. Simpson's 3/8 rule
fits a cubic to each slice.
Returns estimates with $n, 3n, 9n, \ldots$ slices, geometrically increasing by a
factor of 3 with each estimate.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use.
NOTE: the Trapezoid method is able to reuse function evaluations as its
windows narrow /only/ when increasing the number of integration slices by 2.
Simpson's 3/8 rule increases the number of slices geometrically by a factor of
3 each time, so it will never hit the incremental path. You may want to
memoize your function before calling `simpson38-sequence`."
([f a b] (simpson38-sequence f a b {:n 1}))
([f a b {:keys [n] :or {n 1}}]
{:pre [(number? n)]}
(-> (qt/trapezoid-sequence f a b (us/powers 3 n))
(ir/richardson-column 1 3 2 2))))
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
using Simpson's 3/8 rule with $1, 3, 9 ... 3^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `simpson38-sequence` for more information about Simpson's 3/8 rule, caveats
that might apply when using this integration method and information on the
optional args in `opts` that customize this function's behavior."
:area-fn (comp first simpson38-sequence)
:seq-fn simpson38-sequence)
The trapezoid method plus two steps of Richardson extrapolation. (Are you starting to see the pattern?)
(ns quadrature.boole
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.trapezoid :as qt]
[quadrature.interpolate.richardson :as ir]))
NOTE: Boole's Rule is commonly mis-spelled as "Bode's Rule"!
This numerical integration method is a closed Newton-Cotes formula; for each integral slice, Boole's rule samples each endpoint and three interior, equally spaced points, and combines them into an area estimate for this slice using the following formula:
\[{{2h} \over 45} (7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4)\]
Given a window of \([a, b]\) and a "step size" of \(h = {{b - a} \over 4}\). The point \(f_i\) is the point \(i\) steps into the window.
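Here's the single-window formula as a Python sketch (invented name, not the library's API):

```python
def boole_slice(f, a, b):
    """Boole's rule on one window [a, b]:
    (2h/45)(7 f0 + 32 f1 + 12 f2 + 32 f3 + 7 f4), with h = (b - a) / 4."""
    h = (b - a) / 4
    f0, f1, f2, f3, f4 = (f(a + i * h) for i in range(5))
    return (2 * h / 45) * (7 * f0 + 32 * f1 + 12 * f2 + 32 * f3 + 7 * f4)

# The rule is exact for quintics: the integral of x^4 over [0, 1] is 1/5.
result = boole_slice(lambda x: x ** 4, 0.0, 1.0)
```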
There is a simpler way to understand this: Boole's rule is the trapezoid method (see trapezoid.cljc), subject to two refinements of "Richardson extrapolation".
The test namespace contains a symbolic proof that the Richardson-extrapolated Trapezoid method is equivalent to using the formula above to calculate Boole's rule directly.
(defn boole-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ using Boole's rule.
Boole's rule is equivalent to the trapezoid method subject to two refinements
of Richardson extrapolation. The trapezoid method fits a line to each
integration slice. Boole's rule fits a quartic to each slice.
Returns estimates with $n, 2n, 4n, ...$ slices, geometrically increasing by a
factor of 2 with each estimate.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use."
([f a b] (boole-sequence f a b {:n 1}))
([f a b {:keys [n] :or {n 1}}]
{:pre [(number? n)]}
(-> (qt/trapezoid-sequence f a b n)
(ir/richardson-column 2 2 2 2))))
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
using Boole's rule with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `boole-sequence` for more information about Boole's rule, caveats that
might apply when using this integration method and information on the optional
args in `opts` that customize this function's behavior."
:area-fn (comp first boole-sequence)
:seq-fn boole-sequence)
(ns quadrature.romberg
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.midpoint :as qm]
[quadrature.trapezoid :as qt]
[quadrature.interpolate.richardson :as ir]))
Romberg's method is a technique for estimating a definite integral over a closed (or open) range \(a, b\):
\[\int_{a}^{b} f(x) dx\]
by applying Richardson extrapolation (see richardson.cljc) to either the Trapezoid method or the Midpoint method.
The implementation of Richardson extrapolation in this library can be applied to any such method; many of the numerical quadrature methods (Simpson, Simpson's 3/8, Milne, Boole) involve a single step of Richardson extrapolation.
Romberg integration goes all the way, applying successive Richardson refinements to every new integral estimate.
For a wonderful reference that builds up to the ideas of Richardson extrapolation and Romberg integration, see Sussman's "Abstraction in Numerical Methods".
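As a concrete sketch of "going all the way" (hypothetical pure-Python helpers, not the library's API): build trapezoid estimates with 1, 2, 4, ... slices, then extrapolate the whole triangle with error exponents \(2, 4, 6, \ldots\) (\(p = 2\), \(q = 2\)).

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n slices."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def romberg(f, a, b, depth):
    """Full Romberg triangle over trapezoid sums with 1, 2, 4, ... slices.
    Each Richardson sweep cancels the next even-order error term."""
    xs = [trapezoid(f, a, b, 2 ** i) for i in range(depth)]
    p = 2
    while len(xs) > 1:
        t = 2.0 ** p
        xs = [(t * hi - lo) / (t - 1) for lo, hi in zip(xs, xs[1:])]
        p += 2
    return xs[0]

result = romberg(math.sin, 0.0, math.pi, 6)  # integral of sin over [0, pi] is 2
```

With only 32 slices at the finest level, the extrapolated result is accurate to many more digits than any raw trapezoid estimate.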
(defn open-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the open interval $(a, b)$ by applying Richardson extrapolation to
successive integral estimates from the Midpoint rule.
Returns estimates formed by combining $n, 3n, 9n, ...$ slices, geometrically
increasing by a factor of 3 with each estimate. This factor of 3 is because,
internally, the Midpoint method is able to recycle old function evaluations
through this factor of 3.
Romberg integration converges quite fast by cancelling one error term in the
Taylor series expansion of $f$ with each examined term. If your function is
/not/ smooth this may cause you trouble, and you may want to investigate a
lower-order method.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use."
([f a b] (open-sequence f a b {}))
([f a b {:keys [n] :or {n 1} :as opts}]
{:pre [(number? n)]}
(-> (qm/midpoint-sequence f a b opts)
(ir/richardson-sequence 3 2 2))))
(defn closed-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ by applying Richardson extrapolation to
successive integral estimates from the Trapezoid rule.
Returns estimates formed by combining $n, 2n, 4n, ...$ slices, geometrically
increasing by a factor of 2 with each estimate.
Romberg integration converges quite fast by cancelling one error term in the
Taylor series expansion of $f$ with each examined term. If your function is
/not/ smooth this may cause you trouble, and you may want to investigate a
lower-order method.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use."
([f a b] (closed-sequence f a b {}))
([f a b {:keys [n] :or {n 1} :as opts}]
{:pre [(number? n)]}
(-> (qt/trapezoid-sequence f a b opts)
(ir/richardson-sequence 2 2 2))))
(defn romberg-sequence
"Higher-level abstraction over `closed-sequence` and `open-sequence`. Identical
to those functions (see their docstrings), but internally chooses either
implementation based on the interval specified inside of `opts`.
Defaults to the same behavior as `open-sequence`."
([f a b] (romberg-sequence f a b {}))
([f a b opts]
(let [seq-fn (if (qc/closed?
(qc/interval opts))
closed-sequence
open-sequence)]
(seq-fn f a b opts))))
(qc/defintegrator open-integral
"Returns an estimate of the integral of `f` over the open interval $(a, b)$
generated by applying Richardson extrapolation to successive integral
estimates from the Midpoint rule.
Considers $1, 3, 9 ... 3^n$ windows into $(a, b)$ for each successive
estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `open-sequence` for more information about Romberg integration, caveats
that might apply when using this integration method and information on the
optional args in `opts` that customize this function's behavior."
:area-fn qm/single-midpoint
:seq-fn open-sequence)
(qc/defintegrator closed-integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
generated by applying Richardson extrapolation to successive integral
estimates from the Trapezoid rule.
Considers $1, 2, 4 ... 2^n$ windows into $[a, b]$ for each successive
estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `closed-sequence` for more information about Romberg integration, caveats
that might apply when using this integration method and information on the
optional args in `opts` that customize this function's behavior."
:area-fn qt/single-trapezoid
:seq-fn closed-sequence)
(ns quadrature.milne
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.midpoint :as qm]
[quadrature.interpolate.richardson :as ir]
[quadrature.util.stream :as us]))
This numerical integration method is an open Newton-Cotes formula; for each integral slice, Milne's rule samples three interior points (not the endpoints!) and combines them into an area estimate for this slice using the following formula:
\[{{4h} \over 3} (2f_1 - f_2 + 2f_3)\]
Given a window of \((a, b)\) and a "step size" of \(h = {{b - a} \over 4}\). The point \(f_i\) is the point \(i\) steps into the window.
There is a simpler way to understand this! Milne's method is, in fact, just the midpoint method (see `midpoint.cljc`), subject to a single refinement of "Richardson extrapolation".
The test namespace contains a symbolic proof that the Richardson-extrapolated Midpoint method is equivalent to using the formula above to calculate Milne's rule directly.
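The same equivalence is easy to spot-check numerically. This sketch (in Python, with names of our own choosing) compares Milne's formula applied directly against one Richardson step over midpoint estimates with 1 and 2 slices:

```python
def midpoint(f, a, b, n):
    """Midpoint-rule estimate of the integral of f over (a, b) with n slices."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def milne(f, a, b):
    """Milne's rule applied directly: (4h/3)(2 f1 - f2 + 2 f3), h = (b - a)/4."""
    h = (b - a) / 4
    f1, f2, f3 = f(a + h), f(a + 2 * h), f(a + 3 * h)
    return (4 * h / 3) * (2 * f1 - f2 + 2 * f3)

def midpoint_richardson(f, a, b):
    """One Richardson step over midpoint estimates with 1 and 2 slices.
    The midpoint rule's error leads with h^2, so with a refinement ratio
    of 2 the accelerated estimate is (4 M(2) - M(1)) / 3."""
    m1, m2 = midpoint(f, a, b, 1), midpoint(f, a, b, 2)
    return (4 * m2 - m1) / 3

f = lambda x: x ** 3 + 2 * x
print(milne(f, 0.0, 1.0), midpoint_richardson(f, 0.0, 1.0))  # both ≈ 1.25
```

Both agree with the exact value \(5/4\), since Milne's rule is exact for cubics.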
(defn milne-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the open interval $(a, b)$ using Milne's rule.
Milne's rule is equivalent to the midpoint method subject to one refinement of
Richardson extrapolation.
Returns estimates with $n, 2n, 4n, ...$ slices, geometrically increasing by a
factor of 2 with each estimate.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use.
NOTE: the Midpoint method is able to reuse function evaluations as its windows
narrow /only/ when increasing the number of integration slices by 3. Milne's
method increases the number of slices geometrically by a factor of 2 each
time, so it will never hit the incremental path. You may want to memoize your
function before calling `milne-sequence`."
([f a b] (milne-sequence f a b {:n 1}))
([f a b {:keys [n] :or {n 1} :as opts}]
{:pre [(number? n)]}
(-> (qm/midpoint-sequence f a b (assoc opts :n (us/powers 2 n)))
(ir/richardson-column 1 2 2 2))))
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the open interval $(a, b)$
using Milne's rule with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `milne-sequence` for more information about Milne's rule, caveats that
might apply when using this integration method and information on the optional
args in `opts` that customize this function's behavior."
:area-fn (comp first milne-sequence)
:seq-fn milne-sequence)
(ns quadrature.bulirsch-stoer
(:require [quadrature.interpolate.polynomial :as poly]
[quadrature.interpolate.rational :as rat]
[quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.midpoint :as mid]
[quadrature.trapezoid :as trap]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[quadrature.util.stream :as us]))
This quadrature method comes from the scmutils package that inspired this library.
The idea is similar to Romberg integration:
Romberg integration does this by fitting a polynomial to integral estimates over a geometric series of slices - $1, 2, 4, \ldots$, for example - using Richardson extrapolation, and extrapolating to the limit of infinitely many slices. The Bulirsch-Stoer algorithm is exactly this, with two changes: it fits a rational function rather than a polynomial, and it uses the slightly denser sequence of slice counts below.
Here are the default step sizes:
(def bulirsch-stoer-steps
(interleave
(us/powers 2 2)
(us/powers 2 3)))
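For reference, the interleaved sequence looks like this (a Python sketch of the same construction; `powers` and `interleave` mirror the utilities used above):

```python
from itertools import islice

def powers(base, start):
    """Yields start, start*base, start*base^2, ..."""
    while True:
        yield start
        start *= base

def interleave(xs, ys):
    """Alternates elements from the two (possibly infinite) sequences."""
    for x, y in zip(xs, ys):
        yield x
        yield y

steps = list(islice(interleave(powers(2, 2), powers(2, 3)), 8))
print(steps)  # [2, 3, 4, 6, 8, 12, 16, 24]
```

This is the classic Bulirsch-Stoer step sequence: each entry is roughly \(\sqrt 2\) times the previous one, denser than Romberg's strict doubling.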
The more familiar algorithm named "Bulirsch-Stoer" applies the same ideas to the solution of ODEs, as described on Wikipedia. scmutils adapted this into the methods you see here.
NOTE - The Wikipedia page states that "Hairer, Nørsett & Wanner (1993, p. 228), in their discussion of the method, say that rational extrapolation in this case is nearly never an improvement over polynomial interpolation (Deuflhard 1983)."
We can do this too! Passing {:bs-extrapolator :polynomial}
enables polynomial extrapolation in the sequence and integration functions implemented below.
One more detail is important to understand. You could apply the ideas above to any function that approximates an integral, but this namespace focuses on accelerating the midpoint and trapezoid methods.
As discussed in `midpoint.cljc` and `trapezoid.cljc`, the error series for these methods contains only even powers of the integration slice width \(h\):
\[h^2, h^4, h^6, \ldots\]
or, written as successive powers of \(h^2\):
\[h^2, (h^2)^2, (h^2)^3, \ldots\]
This means that the rational function approximation needs to fit the integral estimates to points of the form
\[(h^2, A(h))\]
(where \(A(h)\) is the estimate computed with slice width \(h\))
to take advantage of the acceleration. This trick is baked into Richardson extrapolation through the ability to specify a geometric series. richardson_test.cljc
shows that Richardson extrapolation is indeed equivalent to a polynomial fit using \(h^2\)… the idea here is the same.
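To illustrate, here's a standalone sketch (in Python; all names are ours, and `neville_at_zero` stands in for the library's extrapolators) that fits midpoint estimates against \(h^2\) and extrapolates the fit to \(h^2 = 0\):

```python
import math

def midpoint(f, a, b, n):
    """Midpoint-rule estimate of the integral of f over (a, b) with n slices."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def neville_at_zero(points):
    """Neville's algorithm, specialized to evaluate the interpolating
    polynomial at x = 0. `points` is a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    for level in range(1, n):
        for i in range(n - level):
            j = i + level
            # P_{i..j}(0) in terms of the two overlapping lower-level fits:
            ys[i] = (xs[j] * ys[i] - xs[i] * ys[i + 1]) / (xs[j] - xs[i])
    return ys[0]

# Fit the midpoint estimates against h^2, then extrapolate to h^2 = 0.
a, b = 0.0, math.pi
slices = [2, 3, 4, 6, 8, 12]  # the interleaved Bulirsch-Stoer counts
pts = [(((b - a) / n) ** 2, midpoint(math.sin, a, b, n)) for n in slices]
print(neville_at_zero(pts))  # ≈ 2.0, far better than any single estimate
```

Because the abscissae are \(h^2\) rather than \(h\), each extrapolation level cancels a full even-order error term, just as Richardson extrapolation does.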
The following two functions generate a sequence of NON-squared \(h\) slice widths. bs-sequence-fn
below squares each entry.
(defn- slice-width [a b]
(let [width (- b a)]
(fn [n] (/ width n))))
(defn- h-sequence
"Defines the sequence of slice widths, given a sequence of `n` (number of
slices) in the interval $(a, b)$."
([a b] (h-sequence a b bulirsch-stoer-steps))
([a b n-seq]
(map (slice-width a b) n-seq)))
The next group of functions generates `open-sequence` and `closed-sequence` methods, analogous to all other quadrature methods in the library.
(defn- extrapolator-fn
"Allows the user to specify polynomial or rational function extrapolation via
the `:bs-extrapolator` option."
[opts]
(if (= :polynomial (:bs-extrapolator opts))
poly/modified-neville
rat/modified-bulirsch-stoer))
(defn- bs-sequence-fn
"Accepts some function (like `mid/midpoint-sequence`) that returns a sequence of
successively better estimates to the integral, and returns a new function with
interface `(f a b opts)` that accelerates the sequence with either
- polynomial extrapolation
- rational function extrapolation
By default, the `:n` in `opts` (passed on to `integrator-seq-fn`) is set to
the sequence of step sizes suggested by Bulirsch-Stoer,
`bulirsch-stoer-steps`."
[integrator-seq-fn]
(fn call
([f a b]
(call f a b {:n bulirsch-stoer-steps}))
([f a b opts]
{:pre [(not (number? (:n opts)))]}
(let [{:keys [n] :as opts} (-> {:n bulirsch-stoer-steps}
(merge opts))
extrapolate (extrapolator-fn opts)
square (fn [x] (* x x))
xs (map square (h-sequence a b n))
ys (integrator-seq-fn f a b opts)]
(-> (map vector xs ys)
(extrapolate 0))))))
(def ^{:doc "Returns a (lazy) sequence of successively refined estimates of the
integral of `f` over the open interval $(a, b)$ by applying rational
polynomial extrapolation to successive integral estimates from the Midpoint
rule.
Returns estimates formed from the same estimates used by the Bulirsch-Stoer
ODE solver, stored in `bulirsch-stoer-steps`.
## Optional arguments:
`:n`: If supplied, `n` (sequence) overrides the sequence of steps to use.
`:bs-extrapolator`: Pass `:polynomial` to override the default rational
function extrapolation and enable polynomial extrapolation using the modified
Neville's algorithm implemented in `poly/modified-neville`."}
open-sequence
(bs-sequence-fn mid/midpoint-sequence))
(def ^{:doc "Returns a (lazy) sequence of successively refined estimates of the
integral of `f` over the closed interval $[a, b]$ by applying rational
polynomial extrapolation to successive integral estimates from the Trapezoid
rule.
Returns estimates formed from the same estimates used by the Bulirsch-Stoer
ODE solver, stored in `bulirsch-stoer-steps`.
## Optional arguments:
`:n`: If supplied, `:n` (sequence) overrides the sequence of steps to use.
`:bs-extrapolator`: Pass `:polynomial` to override the default rational
function extrapolation and enable polynomial extrapolation using the modified
Neville's algorithm implemented in `poly/modified-neville`."}
closed-sequence
(bs-sequence-fn trap/trapezoid-sequence))
Finally, two separate functions that use the sequence functions above to converge quadrature estimates.
(qc/defintegrator open-integral
"Returns an estimate of the integral of `f` over the open interval $(a, b)$
generated by applying rational polynomial extrapolation to successive integral
estimates from the Midpoint rule.
Considers successive numbers of windows into $(a, b)$ specified by
`bulirsch-stoer-steps`.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `open-sequence` for more information about Bulirsch-Stoer quadrature,
caveats that might apply when using this integration method and information on
the optional args in `opts` that customize this function's behavior."
:area-fn mid/single-midpoint
:seq-fn open-sequence)
(qc/defintegrator closed-integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
generated by applying rational polynomial extrapolation to successive integral
estimates from the Trapezoid rule.
Considers successive numbers of windows into $[a, b]$ specified by
`bulirsch-stoer-steps`.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `closed-sequence` for more information about Bulirsch-Stoer quadrature,
caveats that might apply when using this integration method and information on
the optional args in `opts` that customize this function's behavior."
:area-fn trap/single-trapezoid
:seq-fn closed-sequence)
The substitutions below are implemented as functional wrappers that take an integrator and return a modified integrator.
(ns quadrature.substitute
"## U Substitution and Variable Changes
This namespace provides implementations of functions that accept an
`integrator` and perform a variable change to address some singularity, like
an infinite endpoint, in the definite integral.
The strategies currently implemented were each described by Press, et al. in
section 4.4 of ['Numerical
Recipes'](http://phys.uri.edu/nigh/NumRec/bookfpdf/f4-4.pdf)."
(:require [clojure.core.match :refer [match]]
[quadrature.common :as qc]))
This first function, `infinitize`, transforms some integrator into a new integrator with the same interface that can handle an infinite endpoint.
This implementation can only handle one infinite endpoint at a time, and, the way it's written, both endpoints must have the same sign. For an easier interface to this transformation, see `infinite/evaluate-infinite-integral` in `infinite.cljc`.
(defn infinitize
"Performs a variable substitution targeted at turning a single infinite endpoint
of an improper integral into an (open) endpoint at 0 by applying the
following substitution:
$$u(t) = {1 \\over t}$$ $$du = {-1 \\over t^2}$$
This works when the integrand `f` falls off at least as fast as $1 \\over t^2$
as it approaches the infinite limit.
The returned function requires that `a` and `b` have the same sign, ie:
$$ab > 0$$
Transform the bounds with $u(t)$, and cancel the negative sign by changing
their order:
$$\\int_{a}^{b} f(x) d x=\\int_{1 / b}^{1 / a} \\frac{1}{t^{2}} f\\left(\\frac{1}{t}\\right) dt$$
References:
- Mathworld, \"Improper Integral\": https://mathworld.wolfram.com/ImproperIntegral.html
- Press, Numerical Recipes, Section 4.4: http://phys.uri.edu/nigh/NumRec/bookfpdf/f4-4.pdf"
[integrate]
(fn call
([f a b] (call f a b {}))
([f a b opts]
{:pre [(not
(and (qc/infinite? a)
(qc/infinite? b)))]}
(let [f' (fn [t]
(/ (f (/ 1.0 t))
(* t t)))
a' (if (qc/infinite? b) 0.0 (/ 1.0 b))
b' (if (qc/infinite? a) 0.0 (/ 1.0 a))
opts (qc/update-interval opts qc/flip)]
(integrate f' a' b' opts)))))
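A standalone numeric check of the identity (in Python; the `midpoint` and `infinitize` here are illustrative stand-ins, not the library's versions):

```python
import math

def midpoint(f, a, b, n=1000):
    """Open midpoint rule: never samples the endpoints, which matters here
    because the substitution maps the infinite endpoint to t = 0."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def infinitize(integrate):
    """Wraps `integrate` so it can handle one infinite endpoint, assuming
    0 < a < b: u = 1/t maps [a, inf) onto (0, 1/a]."""
    def call(f, a, b):
        fp = lambda t: f(1.0 / t) / (t * t)
        ap = 0.0 if math.isinf(b) else 1.0 / b
        return integrate(fp, ap, 1.0 / a)
    return call

improper = infinitize(midpoint)
result = improper(lambda x: 1.0 / (1.0 + x * x), 1.0, math.inf)
print(result)  # ∫_1^∞ dx/(1+x²) = π/4 ≈ 0.785398...
```

Note that the inner integrator must treat the transformed endpoint at 0 as open: the midpoint rule never samples its endpoints, so the singular point is never evaluated.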
"To deal with an integral that has an integrable power-law singularity at its lower limit, one also makes a change of variable." (Press, p138)
A "power-law singularity" means that the integrand diverges as \((x - a)^{-\gamma}\) near \(x=a\).
We implement the following identity (from Press) if the singularity occurs at the lower limit:
\[\int_{a}^{b} f(x) d x=\frac{1}{1-\gamma} \int_{0}^{(b-a)^{1-\gamma}} t^{\frac{\gamma}{1-\gamma}} f\left(t^{\frac{1}{1-\gamma}}+a\right) d t \quad(b>a)\]
And this similar identity if the singularity occurs at the upper limit:
\[\int_{a}^{b} f(x) d x=\frac{1}{1-\gamma} \int_{0}^{(b-a)^{1-\gamma}} t^{\frac{\gamma}{1-\gamma}} f\left(b-t^{\frac{1}{1-\gamma}}\right) d t \quad(b>a)\]
If you have singularities at both sides, divide the interval at some interior breakpoint, take separate integrals for both sides and add the values back together.
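Here's a numeric check of the lower-endpoint identity with \(\gamma = 1/2\) (a Python sketch with illustrative names; the open midpoint rule avoids the singular endpoint):

```python
import math

def midpoint(f, a, b, n=1000):
    """Open midpoint rule; avoids evaluating at the (singular) endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def inverse_power_law_lower(integrate, gamma):
    """Press's change of variables for an integrand that diverges like
    (x - a)^(-gamma), 0 <= gamma < 1, at the lower endpoint a."""
    def call(f, a, b):
        inner = 1.0 / (1.0 - gamma)  # the exponent 1/(1 - gamma)
        fp = lambda t: t ** (gamma * inner) * f(a + t ** inner)
        return inner * integrate(fp, 0.0, (b - a) ** (1.0 - gamma))
    return call

integrate = inverse_power_law_lower(midpoint, 0.5)
result = integrate(lambda x: 1.0 / math.sqrt(x), 0.0, 1.0)
print(result)  # ∫_0^1 x^(-1/2) dx = 2
```

For this integrand the transformed function is identically 1, so the estimate recovers the exact value 2 to roundoff.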
(defn- inverse-power-law
"Implements a change of variables to address a power law singularity at the
lower or upper integration endpoint.
An \"inverse power law singularity\" means that the integrand diverges as
$$(x - a)^{-\\gamma}$$
near $x=a$. Pass `true` for `lower?` to specify a singularity at the lower
endpoint, `false` to signal an upper-endpoint singularity.
References:
- Mathworld, \"Improper Integral\": https://mathworld.wolfram.com/ImproperIntegral.html
- Press, Numerical Recipes, Section 4.4: http://phys.uri.edu/nigh/NumRec/bookfpdf/f4-4.pdf
- Wikipedia, \"Finite-time Singularity\": https://en.wikipedia.org/wiki/Singularity_(mathematics)#Finite-time_singularity
"
[integrate gamma lower?]
{:pre [(<= 0 gamma) (< gamma 1)]}
(fn call
([f a b] (call f a b {}))
([f a b opts]
(let [inner-pow (/ 1 (- 1 gamma))
gamma-pow (* gamma inner-pow)
a' 0
b' (Math/pow (- b a) (- 1 gamma))
t->t' (if lower?
(fn [t] (+ a (Math/pow t inner-pow)))
(fn [t] (- b (Math/pow t inner-pow))))
f' (fn [t] (* (Math/pow t gamma-pow)
(f (t->t' t))))]
(-> (integrate f' a' b' opts)
(update-in [:result] (partial * inner-pow)))))))
(defn inverse-power-law-lower [integrate gamma]
(inverse-power-law integrate gamma true))
(defn inverse-power-law-upper [integrate gamma]
(inverse-power-law integrate gamma false))
The next two functions specialize the `inverse-power-law-*` functions to the common case of an inverse square root singularity, ie, $\gamma = 1/2$.
(defn inverse-sqrt-lower
"Implements a change of variables to address an inverse square root singularity
at the lower integration endpoint. Use this when the integrand diverges as
$$1 \\over {\\sqrt{x - a}}$$
near the lower endpoint $a$."
[integrate]
(fn call
([f a b] (call f a b {}))
([f a b opts]
(let [f' (fn [t] (* t (f (+ a (* t t)))))]
(-> (integrate f' 0 (Math/sqrt (- b a)) opts)
(update-in [:result] (partial * 2)))))))
(defn inverse-sqrt-upper
"Implements a change of variables to address an inverse square root singularity
at the upper integration endpoint. Use this when the integrand diverges as
$$1 \\over {\\sqrt{x - b}}$$
near the upper endpoint $b$."
[integrate]
(fn call
([f a b] (call f a b {}))
([f a b opts]
(let [f' (fn [t] (* t (f (- b (* t t)))))]
(-> (integrate f' 0 (Math/sqrt (- b a)) opts)
(update-in [:result] (partial * 2)))))))
From Press, section 4.4: "Suppose the upper limit of integration is infinite, and the integrand falls off exponentially. Then we want a change of variable that maps
\[e^{-x} dx\]
into \(\pm dt\) (with the sign chosen to keep the upper limit of the new variable larger than the lower limit)."
The required identity is:
\[\int_{x=a}^{x=\infty} f(x) d x=\int_{t=0}^{t=e^{-a}} f(-\log t) \frac{d t}{t}\]
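Numerically, the identity checks out; here's a Python sketch (the names are ours, not the library's):

```python
import math

def midpoint(f, a, b, n=1000):
    """Open midpoint rule; never evaluates at the transformed endpoint t = 0."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def exponential_upper(integrate):
    """Press's substitution t = exp(-x): maps [a, inf) onto (0, exp(-a)],
    turning the infinite endpoint into an open endpoint at 0."""
    def call(f, a):
        fp = lambda t: f(-math.log(t)) / t
        return integrate(fp, 0.0, math.exp(-a))
    return call

improper = exponential_upper(midpoint)
result = improper(lambda x: math.exp(-x), 0.0)
# For f(x) = e^(-x) the transformed integrand is identically 1, so the
# estimate recovers ∫_0^∞ e^(-x) dx = 1 essentially exactly.
print(result)
```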
(defn exponential-upper
  "Implements a change of variables to address an infinite upper integration
  endpoint. Use this when the integrand `f` falls off exponentially as its
  argument approaches `##Inf`."
  [integrate]
  (fn call
    ([f a b] (call f a b {}))
    ([f a b opts]
     {:pre [(qc/infinite? b)]}
     (let [f' (fn [t] (/ (f (- (Math/log t)))
                         t))
           opts (qc/update-interval opts qc/flip)]
       (integrate f' 0 (Math/exp (- a)) opts)))))
A template for a combinator that enables infinite endpoints on any integrator, using variable substitution on an appropriate, tunable range.
(ns quadrature.infinite
(:require [clojure.core.match :refer [match]]
[quadrature.common :as qc]
[quadrature.substitute :as qs]))
This namespace holds an implementation of an "improper" integral combinator (for infinite endpoints) usable with any quadrature method in the library.
The implementation was inspired by `evaluate-improper-integral` in the `numerics/quadrature/quadrature.scm` file of the scmutils package.
To evaluate an improper integral with an infinite endpoint, the improper
combinator applies an internal change of variables.
\[u(t) = {1 \over t}\] \[du = {-1 \over t^2}\]
This has the effect of mapping the endpoints from \(a, b\) to \({1 \over b}, {1 \over a}\). Here's the identity we implement:
\[\int_{a}^{b} f(x) d x=\int_{1 / b}^{1 / a} \frac{1}{t^{2}} f\left(\frac{1}{t}\right) dt\]
This is implemented by `substitute/infinitize`.
The variable change only works as written when both endpoints have the same sign; so, internally, `improper` only applies the variable change to the segments of \((a, b)\) running from `##-Inf` to `(- (:infinite-breakpoint opts))` and from `(:infinite-breakpoint opts)` to `##Inf`, where `:infinite-breakpoint` is an argument the user can specify in the returned integrator's options map.
Any segment of \(a, b\) not in those regions is evaluated normally.
NOTE: The ideas in this namespace could be implemented for other variable substitutions (see `substitute.cljc`) that only need to apply to certain integration intervals. The code below automatically cuts the range \((a, b)\) to accommodate this for the particular variable change we've baked in, but there is a more general abstraction lurking.
(defn improper
"Accepts:
- An `integrator` (function of `f`, `a`, `b` and `opts`)
- `a` and `b`, the endpoints of an integration interval, and
- (optionally) `opts`, a dict of integrator-configuring options
And returns a new integrator that's able to handle infinite endpoints. (If you
don't specify `##-Inf` or `##Inf`, the returned integrator will fall through
to the original `integrator` implementation.)
All `opts` will be passed through to the supplied `integrator`.
## Optional arguments relevant to `improper`:
`:infinite-breakpoint`: If either `a` or `b` is equal to `##Inf` or `##-Inf`,
this function will internally perform a change of variables on the regions
from:
`(:infinite-breakpoint opts) => ##Inf`
or
`##-Inf => (- (:infinite-breakpoint opts))`
using $u(t) = {1 \\over t}$, as described in the `infinitize` method of
`substitute.cljc`. This has the effect of mapping the infinite endpoint to an
open interval endpoint of 0.
Where should you choose the breakpoint? According to Press in Numerical
Recipes, section 4.4: \"At a sufficiently large positive value so that the
function funk is at least beginning to approach its asymptotic decrease to
zero value at infinity.\"
References:
- Press, Numerical Recipes (p138), Section 4.4: http://phys.uri.edu/nigh/NumRec/bookfpdf/f4-4.pdf"
[integrator]
(fn rec
([f a b] (rec f a b {}))
([f a b opts]
(let [{:keys [infinite-breakpoint] :as opts} (-> {:infinite-breakpoint 1}
(merge opts))
call (fn [integrate l r interval]
(let [m (qc/with-interval opts interval)]
(let [result (integrate f l r m)]
(:result result))))
ab-interval (qc/interval opts)
integrate (partial call integrator)
inf-integrate (partial call (qs/infinitize integrator))
r-break (Math/abs infinite-breakpoint)
l-break (- r-break)]
(match [[a b]]
[(:or [##-Inf ##-Inf] [##Inf ##Inf])]
{:converged? true
:terms-checked 0
:result 0.0}
[(:or [_ ##-Inf] [##Inf _])]
(-> (rec f b a opts)
(update-in [:result] -))
;; Break the region up into three pieces: a central closed core
;; and two open endpoints where we create a change of variables,
;; letting the boundary go to infinity. We use an OPEN interval on
;; the infinite side.
[[##-Inf ##Inf]]
(let [-inf->l (inf-integrate a l-break qc/open-closed)
l->r (integrate l-break r-break qc/closed)
r->+inf (inf-integrate r-break b qc/closed-open)]
{:converged? true
:result (+ -inf->l l->r r->+inf)})
;; If `b` lies to the left of the negative breakpoint, don't cut.
;; Else, cut the integral into two pieces at the negative
;; breakpoint and variable-change the left piece.
[[##-Inf _]]
(if (<= b l-break)
(inf-integrate a b ab-interval)
(let [-inf->l (inf-integrate a l-break qc/open-closed)
l->b (integrate l-break b (qc/close-l ab-interval))]
{:converged? true
:result (+ -inf->l l->b)}))
;; If `a` lies to the right of the positive breakpoint, don't cut.
;; Else, cut the integral into two pieces at the positive breakpoint
;; and variable-change the right piece.
[[_ ##Inf]]
(if (>= a r-break)
(inf-integrate a b ab-interval)
(let [a->r (integrate a r-break (qc/close-r ab-interval))
r->+inf (inf-integrate r-break b qc/closed-open)]
{:converged? true
:result (+ a->r r->+inf)}))
;; This is a lot of machinery to use with NON-infinite endpoints;
;; but for completeness, if you reach this level the fn will attempt
;; to integrate the full range directly using the original
;; integrator.
:else (integrator f a b opts))))))
The current implementation does not pass convergence information back up the line! Ideally we would merge the sub-results by combining their `:converged?` entries with `and`.
(ns quadrature.adaptive
(:require [quadrature.common :as qc]
[quadrature.util.aggregate :as ua]))
This namespace holds an implementation of "adaptive quadrature" usable with any quadrature method in the library.
The implementation was inspired by the numerics/quadrature/rational.scm
file in the scmutils package. In that library, adaptive quadrature was special-cased to the Bulirsch-Stoer algorithm, ported in bulirsch_stoer.cljc
.
Most of the integrators in quadrature
work by successively refining an integration interval \(a, b\) down into evenly-spaced integration slices. Some functions are very well behaved in some regions, and then oscillate wildly in others.
Adaptive quadrature methods partition \(a, b\) into intervals of different sizes, allowing the integrator to drill in more closely to areas that need attention.
The `adaptive` implementation below works like this:
- Attempt to converge the `integrate` function on the full \((a, b)\), capping the iterations at some configurable limit (`*adaptive-maxterms*` below).
- If `integrate` fails to converge, split \((a, b)\) into two intervals and attempt to converge both sides using `integrate`.
- Continue this process until every interval either converges or is narrow enough to pass `common/narrow-slice?`, at which point `integrate` returns an estimate for a single slice.

The `*adaptive-maxterms*` variable is dynamic, which means you can adjust the behavior of `adaptive` by supplying the `:maxterms` option directly, or by wrapping your call in `(binding [*adaptive-maxterms* 8] ,,,)`.
(def ^:dynamic *adaptive-maxterms* 10)
Symmetric intervals like \((-1, 1)\) often show up with integrands that have singularities right at the midpoint of the interval. For this reason, `adaptive` is able to customize its splitting behavior via the `*neighborhood-width*` dynamic variable.
By default, when partitioning an interval, `adaptive` will choose a split point within 5% of the midpoint. Override this behavior with the `:adaptive-neighborhood-width` key in the options dict, or by binding this dynamic variable.
(def ^:dynamic *neighborhood-width* 0.05)
(defn- split-point
"Returns a point within `fuzz-factor` of the midpoint of the interval $[a, b]$.
`fuzz-factor` defaults to 0 (ie, `split-point` returns the midpoint)."
([a b] (split-point a b 0))
([a b fuzz-factor]
{:pre [(>= fuzz-factor 0)
(< fuzz-factor 1)]}
(let [width (- b a)
offset (if (zero? fuzz-factor)
0.5
(+ 0.5 (* fuzz-factor (dec (rand 2.0)))))]
(+ a (* offset width)))))
The implementation below takes two integrate functions, not the one described above. This allows us to handle open and closed intervals, instead of introducing open endpoints at every subdivision. All internal intervals that don't touch an open endpoint are considered closed.
(defn- fill-defaults
"Populates the supplied `opts` dictionary with defaults required by `adaptive`.
Two of these have values controlled by dynamic variables in `adaptive.cljc`."
[opts]
(merge {:maxterms *adaptive-maxterms*
:adaptive-neighborhood-width *neighborhood-width*
:interval qc/open}
opts))
(defn adaptive
"Accepts one or two 'integrators', ie, functions of:
- `f`: some integrand
- `a` and `b`: the lower and upper endpoints of integration
- `opts`, a dictionary of configuration options
And returns a new integrator that adaptively subdivides the region $a, b$ into
intervals if integration fails to converge. If two integrators are supplied,
the first is applied to any interval that's not explicitly closed on both
ends, and the second integrator is applied to explicitly closed intervals.
This behavior depends on the interval set in the supplied `opts` dictionary.
All `opts` will be passed through to the supplied `integrate` functions.
## Optional arguments relevant to `adaptive`:
`:maxterms`: defaults to `*adaptive-maxterms*`. This is passed to the
underlying integrators, and determines how long each interval attempts to
converge before a subdivision.
`:adaptive-neighborhood-width`: When dividing an interval, the split point
will be within this factor of the midpoint. Set `:adaptive-neighborhood-width`
to 0 for deterministic splitting."
([integrator] (adaptive integrator integrator))
([open-integrator closed-integrator]
(fn rec
([f a b] (rec f a b {}))
([f a b opts]
(let [opts (fill-defaults opts)
integrate (fn [l r interval]
(if (qc/closed? interval)
(closed-integrator f l r opts)
(open-integrator f l r opts)))]
(loop [stack [[a b (:interval opts)]]
sum (ua/kahan-sum)
iteration 0]
(if (empty? stack)
{:converged? true
:iterations iteration
:result (first sum)}
(let [[l r interval] (peek stack)
remaining (pop stack)
{:keys [converged? result]} (integrate l r interval)]
(if converged?
(recur remaining
(ua/kahan-sum sum result)
(inc iteration))
(let [midpoint (split-point l r (:adaptive-neighborhood-width opts))]
(recur (conj remaining
[midpoint r (qc/close-l interval)]
[l midpoint (qc/close-r interval)])
sum
(inc iteration))))))))))))
Iteration Limit
adaptive
runs until it completes, with no facility available to bail out of computation. An iteration limit would be a great addition… but it won't be efficient without some way of prioritizing high-error subintervals for refinement, as discussed next.
Priority Queue on Error
Another nice upgrade would be a version of adaptive
that is able to return an actual sequence of successive refinements of the estimate. The current implementation uses an internal stack to track the subdivided intervals it needs to evaluate. If the integrator were able to provide an error estimate, we could instead use a priority queue, prioritized by error.
adaptive-sequence
could then return a sequence where every element processes a single refinement. Even without this upgrade, the priority queue idea would allow the estimate to converge quickly and be more accurate if we bailed out at some max number of iterations.
This article holds a related implementation.
Conclusion coming.
(ns quadrature.common
"Implements utilities shared by all integrators, for example:
- code to wrap a sequence of progressively better estimates in a common `integrator` interface
- data structures implementing various integration intervals."
#?(:cljs (:refer-clojure :exclude [infinite?]
:rename {infinite? core-infinite?}))
(:require [quadrature.util.stream :as us]
[taoensso.timbre :as log]))
This dynamic variable holds the default "roundoff cutoff" used by all integrators to decide when to return an estimate based on a single slice, versus attempting to converge a sequence of progressively finer estimates. When this condition is satisfied:
\[{|b - a| \over {|a| + |b|}} \leq \text{cutoff}\]
an integrator will estimate a single slice directly. Otherwise, it will attempt to converge a sequence.
(def ^:dynamic *roundoff-cutoff* 1e-14)
NOTE - we don't have an interface yet to bind this dynamic variable. Bind it manually to modify the cutoff for a specific call to some integrator:
(binding [*roundoff-cutoff* 1e-6]
(integrate f a b))
Implementations of the various intervals used by the adaptive integral interface. By default, integration endpoints are considered open.
(def open [::open ::open])
(def closed [::closed ::closed])
(def open-closed [::open ::closed])
(def closed-open [::closed ::open])
(def infinities #{##Inf ##-Inf})
(def infinite?
#?(:cljs core-infinite?
:clj (comp boolean infinities)))
(defn closed?
"Returns true if the argument represents an explicit `closed` interval, false
otherwise."
[x] (= x closed))
(def open? (complement closed?))
These functions modify an interval by opening or closing either of its endpoints.
(defn close-l [[_ r]] [::closed r])
(defn close-r [[l _]] [l ::closed])
(defn open-l [[_ r]] [::open r])
(defn open-r [[l _]] [l ::open])
(defn flip [[l r]] [r l])
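A quick check of these helpers. This is a standalone sketch: bare keywords stand in for the namespaced `::open` / `::closed` used in the real code.

```clojure
;; Standalone sketch of the interval helpers above, with bare keywords
;; standing in for the namespaced ::open / ::closed.
(def open   [:open :open])
(def closed [:closed :closed])

(defn close-l [[_ r]] [:closed r])
(defn open-r  [[l _]] [l :open])
(defn flip    [[l r]] [r l])

(close-l open)        ;;=> [:closed :open]
(open-r closed)       ;;=> [:closed :open]
(flip (close-l open)) ;;=> [:open :closed]
```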
(defn interval
"Extracts the interval (or `open` as a default) from the supplied integration
options dict."
[opts]
(get opts :interval open))
(defn with-interval
"Sets the specified interval to a key inside the supplied `opts` map of arbitrary
integration options."
[opts interval]
(assoc opts :interval interval))
(defn update-interval
"Accepts:
- a dictionary of arbitrary options
- one of the 4 interval modification functions
and returns a dict of options with `f` applied to the contained interval (or
`open` if no interval is set).
"
[opts f]
(let [k :interval]
(assoc opts k (f (interval opts)))))
The following two functions define a shared interface that integration namespaces can use to create an "integrator" from a pair of functions:
The first function is called when the integration range \((a, b)\) (open or closed) is too narrow for further subdivision. The second function takes over in all other (most!) cases.
(defn- narrow-slice?
"Returns true if the range $[a, b]$ is narrow enough to pass the following
test:
|b - a| / (|a| + |b|) <= `cutoff`
False otherwise. This inequality measures how close the two floating point
values are, scaled by the sum of their magnitudes."
[a b cutoff]
(let [sum (+ (Math/abs a)
(Math/abs b))]
(or (<= sum cutoff)
(<= (Math/abs (- b a))
(* cutoff sum)))))
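For intuition, here's the check at work on two ranges (a standalone copy of `narrow-slice?`, using the default cutoff value from above):

```clojure
(defn narrow-slice?
  "True if [a, b] is narrow relative to the magnitudes of a and b."
  [a b cutoff]
  (let [sum (+ (Math/abs a) (Math/abs b))]
    (or (<= sum cutoff)
        (<= (Math/abs (- b a))
            (* cutoff sum)))))

;; A slice roughly one ulp wide passes; a unit-width slice does not.
(narrow-slice? 1.0 (+ 1.0 1e-15) 1e-14) ;;=> true
(narrow-slice? 0.0 1.0 1e-14)           ;;=> false
```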
(defn make-integrator-fn
"Generates an `integrator` function from two functions with the following
signatures and descriptions:
- `(area-fn f a b)` estimates the integral of `f` over the interval `(a, b)`
with no subdivision, nothing clever at all.
- `(seq-fn f a b opts)` returns a sequence of successively refined estimates
of the integral of `f` over `(a, b)`. `opts` can contain kv pairs that
configure the behavior of the sequence function (a sequence of the number of
integration slices to use, for example.)
The returned function has the signature:
`(f a b opts)`
All `opts` are passed on to `seq-fn`, /and/ to `us/seq-limit` internally,
where the options configure the checks on sequence convergence."
[area-fn seq-fn]
(fn call
([f a b] (call f a b {}))
([f a b {:keys [roundoff-cutoff]
:or {roundoff-cutoff *roundoff-cutoff*}
:as opts}]
(if (narrow-slice? a b roundoff-cutoff)
(do (log/info "Integrating narrow slice: " a b)
{:converged? true
:terms-checked 1
:result (area-fn f a b)})
(-> (seq-fn f a b opts)
(us/seq-limit opts))))))
(defn- name-with-attributes
"Taken from `clojure.tools.macro/name-with-attributes`.
Handles optional docstrings and attribute maps for a name to be defined in a
list of macro arguments. If the first macro argument is a string, it is added
as a docstring to name and removed from the macro argument list. If afterwards
the first macro argument is a map, its entries are added to the name's
metadata map and the map is removed from the macro argument list. The return
value is a vector containing the name with its extended metadata map and the
list of unprocessed macro arguments."
([name body] (name-with-attributes name body {}))
([name body meta]
(let [[docstring body] (if (string? (first body))
[(first body) (next body)]
[nil body])
[attr body] (if (map? (first body))
[(first body) (next body)]
[{} body])
attr (merge meta attr)
attr (if docstring
(assoc attr :doc docstring)
attr)
attr (if (meta name)
(conj (meta name) attr)
attr)]
[(with-meta name attr) body])))
(defmacro defintegrator
"Helper macro for defining integrators."
[sym & body]
(let [meta {:arglists (list 'quote '([f a b] [f a b opts]))}
[sym body] (name-with-attributes sym body meta)
{:keys [area-fn seq-fn]} (apply hash-map body)]
(assert seq-fn (str "defintegrator " sym ": seq-fn cannot be nil"))
(assert area-fn (str "defintegrator " sym ": area-fn cannot be nil"))
`(def ~sym
(make-integrator-fn ~area-fn ~seq-fn))))
(ns quadrature.util.stream
"This namespace contains various standard sequences, as well as utilities for
working with strict and lazy sequences."
(:require [clojure.pprint :as pp]
[sicmutils.generic :as g]
[sicmutils.value :as v]))
(defn pprint
"Realizes and pretty-prints `n` elements from the supplied sequence `xs`."
[n xs]
(doseq [x (take n xs)]
(pp/pprint x)))
(defn powers
"Returns an infinite sequence of `x * n^i`, starting with i == 0. `x` defaults
to 1."
([n] (powers n 1))
([n x] (iterate #(* n %) x)))
(defn zeno
"Returns an infinite sequence of x / n^i, starting with i == 0. `x` defaults to
1."
([n] (zeno n 1))
([n x] (iterate #(/ % n) x)))
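Neither function has an example above, so here's what each generates: geometric growth for `powers`, geometric decay (Clojure ratios) for `zeno`.

```clojure
(defn powers
  "Infinite sequence x, x*n, x*n^2, ... `x` defaults to 1."
  ([n] (powers n 1))
  ([n x] (iterate #(* n %) x)))

(defn zeno
  "Infinite sequence x, x/n, x/n^2, ... `x` defaults to 1."
  ([n] (zeno n 1))
  ([n x] (iterate #(/ % n) x)))

(take 5 (powers 2)) ;;=> (1 2 4 8 16)
(take 5 (zeno 2))   ;;=> (1 1/2 1/4 1/8 1/16)
```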
(defn scan
"Returns a function that accepts a sequence `xs`, and performs a scan by:
- Aggregating each element of `xs` into
- the initial value `init`
- transforming each intermediate result by `present` (defaults to `identity`).
The returned sequence contains every intermediate result, after passing it
through `present` (not `init`, though).
This is what distinguishes a scan from a 'fold'; a fold would return the final
result. A fold isn't appropriate for aggregating infinite sequences, while a
scan works great.
Keyword arguments:
- `:init` is the initial aggregation value. If missing, `f` is called with no
arguments to generate the initial aggregator.
- `:present` transforms each intermediate result; defaults to `identity`.
"
[f & {:keys [present init]
:or {present identity}}]
(let [init (or init (f))]
(fn [xs]
(->> (reductions f init xs)
(map present)
(rest)))))
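For example, scanning with `+` produces a running sum (a sketch using the definition above):

```clojure
(defn scan
  "Returns a fn of a sequence `xs` that yields every intermediate result of
  folding `f` over `xs`, passed through `present`."
  [f & {:keys [present init]
        :or {present identity}}]
  (let [init (or init (f))]
    (fn [xs]
      (->> (reductions f init xs)
           (map present)
           (rest)))))

((scan + :init 0) [1 2 3 4]) ;;=> (1 3 6 10)
```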
This convergence tester comes from Gerald Sussman's "Abstraction in Numerical Methods".
We're planning on adding a number of these here and consolidating all of the available ideas about relative and maximum tolerance so we can share (and combine) them across different stream functions.
(defn- close-enuf?
"relative closeness, transitioning to absolute closeness when we get
significantly smaller than 1."
[tolerance]
(fn [h1 h2]
(<= (g/abs (- h1 h2))
(* 0.5 tolerance (+ 2 (g/abs h1) (g/abs h2))))))
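Concretely, two estimates within the scaled tolerance pass, while distant ones fail. This standalone sketch substitutes `Math/abs` for the generic `g/abs`:

```clojure
(defn close-enuf?
  "Relative closeness, transitioning to absolute closeness when we get
  significantly smaller than 1."
  [tolerance]
  (fn [h1 h2]
    (<= (Math/abs (- h1 h2))
        (* 0.5 tolerance (+ 2 (Math/abs h1) (Math/abs h2))))))

((close-enuf? 1e-8) 1.0 1.0000000001) ;;=> true
((close-enuf? 1e-8) 1.0 2.0)          ;;=> false
```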
I have a dream that a function like `seq-limit` could service most of the numerical methods in this library. Function minimization, root finding, definite integrals and numerical derivatives can all be expressed as successive approximations, with convergence tests (or other stopping conditions) checked and applied between iterations.
As of October 2020 we use this exclusively for various numerical integration routines. But it has more promise than this!
(defn seq-limit
"Accepts a sequence, iterates through it and returns a dictionary of this form:
{:converged? <boolean>
:terms-checked <int>
:result <sequence element>}
`:converged?` is true if the sequence reached convergence by passing the tests
described below, false otherwise.
`:terms-checked` will be equal to the number of items examined in the
sequence.
`:result` holds the final item examined in the sequence.
## Optional keyword args:
`:convergence-fn` user-supplied function of two successive elements in `xs`
that stops iteration and signals convergence if it returns true.
`:fail-fn` user-supplied function of two successive elements in `xs` that
stops iteration and signals NO convergence (failure!) if it returns true.
`:minterms` `seq-limit` won't return until at least this many terms from the
sequence have been processed.
`:maxterms` `seq-limit` will return (with `:converged? false`) after
processing this many elements without passing any other checks.
`:tolerance` A combination of relative and absolute tolerance; defaults to
`sqrt(machine epsilon)`."
([xs] (seq-limit xs {}))
([xs {:keys [minterms
maxterms
tolerance
convergence-fn
fail-fn]
:or {minterms 2
tolerance (Math/sqrt v/machine-epsilon)
convergence-fn (close-enuf? tolerance)}}]
(if (empty? xs)
{:converged? false
:terms-checked 0
:result nil}
(let [stop? (if maxterms
(fn [i] (>= i maxterms))
(constantly false))]
(loop [[x1 & [x2 :as more]] xs
terms-checked 1]
(if (empty? more)
{:converged? false
:terms-checked terms-checked
:result x1}
(let [terms-checked (inc terms-checked)
converged? (convergence-fn x1 x2)]
(if (and (>= terms-checked minterms)
(or converged?
(stop? terms-checked)))
{:converged? converged?
:terms-checked terms-checked
:result x2}
(recur more terms-checked)))))))))
(ns quadrature.util.aggregate
"Utilities for aggregating sequences.")
I learned about "Kahan's summation trick" from `rational.scm` in the `scmutils` package, where it shows up in the `sigma` function.
(defn kahan-sum
"Implements a fold that tracks the summation of a sequence of floating point
numbers, using Kahan's trick for maintaining stability in the face of
accumulating floating point errors."
([] [0.0 0.0])
([[sum c] x]
(let [y (- x c)
t (+ sum y)]
[t (- (- t sum) y)])))
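To see the trick pay off, compare a naive `reduce` with the compensated fold when adding many tiny increments to 1.0. (An illustrative sketch; the compensated result depends on float rounding, so we only check that the tiny terms survive.)

```clojure
(defn kahan-sum
  "Fold that tracks [running-sum, compensation] for stable summation."
  ([] [0.0 0.0])
  ([[sum c] x]
   (let [y (- x c)
         t (+ sum y)]
     [t (- (- t sum) y)])))

(let [xs (cons 1.0 (repeat 10000 1e-16))]
  ;; naive: each 1e-16 is rounded away against 1.0, so the result is 1.0.
  ;; kahan: the compensation term accumulates the lost low-order bits.
  [(reduce + xs)
   (first (reduce kahan-sum [0.0 0.0] xs))])
```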
(defn sum
"Sums either:
- a series `xs` of numbers, or
- the result of mapping function `f` to `(range low high)`
Using Kahan's summation trick behind the scenes to keep floating point errors
under control."
([xs]
(first
(reduce kahan-sum [0.0 0.0] xs)))
([f low high]
(sum (map f (range low high)))))
(defn scanning-sum
"Returns every intermediate summation from summing either:
- a series `xs` of numbers, or
- the result of mapping function `f` to `(range low high)`
Using Kahan's summation trick behind the scenes to keep floating point errors
under control."
([xs]
(->> (reductions kahan-sum [0.0 0.0] xs)
(map first)
(rest)))
([f low high]
(scanning-sum
(map f (range low high)))))
SICMUtils is the engine behind the wonderful "Structure and Interpretation of Classical Mechanics", an advanced physics textbook by Gerald Sussman of SICP fame. I'm trying to get the textbook running in the browser as a Clojurescript library, and as part of that effort I've had to re-implement quite a bit of numerical code in Clojure.
Can I admit that I've gone deeper than necessary? After I'd gotten to know the library, I noted that the power series implementation could use some work. Jack Rusher pointed me at a gorgeous power series implementation by Doug McIlroy (my uncle's boss at Bell Labs!). See his website for a 10-line version of beautiful Haskell.
Doug's paper, "Power Series, Power Serious", goes into more detail on the implementation, which leans hard on Haskell's lazy evaluation. Clojure has lazy sequences too! I implemented the paper here, then went further and made power series executable, and extended them to play with the whole generic arithmetic system in this namespace.
This post is a rendered, literate programming version of `impl.cljc`, the core power series implementation based on Doug's work. It's exactly the same code we use in the library.
Enjoy!
The following code builds up a power series implementation that backs two sicmutils types:
- `Series`, which represents a generic infinite series of arbitrary values, and
- `PowerSeries`, a series that represents a power series in a single variable; in other words, a series where the nth entry is interpreted as the coefficient of \(x^n\):
\begin{equation}
\label{eq:4}
[a\ b\ c\ d\ ...] = a + bx + cx^2 + dx^3 + ...
\end{equation}
We'll proceed by building up implementations of the arithmetic operations `+`, `-`, `*`, `/` and a few others using bare Clojure lazy sequences.
The implementation follows Doug McIlroy's beautiful paper, "Power Series, Power Serious". Doug also has a 10-line version in Haskell on his website.
Let's go!
A 'series' is an infinite sequence of numbers, represented by Clojure's lazy sequence. First, a function `->series` that takes some existing sequence, finite or infinite, and coerces it to an infinite seq by concatenating it with an infinite sequence of zeros. (We use `v/zero-like` so that everything plays nicely with generic arithmetic.)
(defn ->series
"Form the infinite sequence starting with the supplied values. The
remainder of the series will be filled with the zero-value
corresponding to the first of the given values."
[xs]
(lazy-cat xs (repeat (v/zero-like (first xs)))))
(take 10 (->series [1 2 3 4]))
(1 2 3 4 0 0 0 0 0 0)
The core observation we'll use in the following definitions (courtesy of McIlroy) is that a power series \(F\) in a variable \(x\):
\begin{equation}
F(x)=f_{0}+x f_{1}+x^{2} f_{2}+\cdots
\end{equation}
Decomposes into a head element \(f_0\) plus a tail series, multiplied by \(x\):
\begin{equation}
\label{eq:3}
F(x) = F_0(x) = f_0 + x F_1(x)
\end{equation}
We'll use this observation to derive the more complicated sequence operations below.
To negate a series, negate each element:
(defn negate [xs]
(map g/negate xs))
Example:
(take 7 (negate (->series [1 2 3 4])))
(-1 -2 -3 -4 0 0 0)
We can derive series addition by expanding the series \(F\) and \(G\) into head and tail and rearranging terms:
\begin{equation}
\label{eq:5}
F+G=\left(f+x F_{1}\right)+\left(g+xG_{1}\right)=(f+g)+x\left(F_{1}+G_{1}\right)
\end{equation}
This is particularly straightforward in Clojure, where map
already merges sequences elementwise:
(defn seq:+ [f g]
(map g/+ f g))
(take 5 (seq:+ (range) (range)))
(0 2 4 6 8)
A constant is a series with its first element populated, all zeros otherwise. To add a constant to another series we only need add it to the first element. Here are two versions, constant-on-left vs constant-on-right:
(defn c+seq [c f]
(lazy-seq
(cons (g/+ c (first f)) (rest f))))
(defn seq+c [f c]
(lazy-seq
(cons (g/+ (first f) c) (rest f))))
(let [series (->series [1 2 3 4])]
[(take 6 (seq+c series 10))
(take 6 (c+seq 10 series))])
[(11 2 3 4 0 0) (11 2 3 4 0 0)]
Subtraction comes for free from the two definitions above:
(defn seq:- [f g]
(seq:+ f (negate g)))
(take 5 (seq:- (range) (range)))
(0 0 0 0 0)
We should get equivalent results from mapping `g/-` over both sequences, and in almost all cases we do… but until we understand and fix this bug, that method would return different results.
Subtract a constant from a sequence by subtracting it from the first element:
(defn seq-c [f c]
(lazy-seq
(cons (g/- (first f) c) (rest f))))
(take 5 (seq-c (range) 10))
(-10 1 2 3 4)
To subtract a sequence from a constant, subtract the first element as before, but negate the tail of the sequence:
(defn c-seq [c f]
(lazy-seq
(cons (g/- c (first f)) (negate (rest f)))))
(take 5 (c-seq 10 (range)))
(10 -1 -2 -3 -4)
What does it mean to multiply two infinite sequences? As McIlroy notes, multiplication is where the lazy-sequence-based approach really comes into its own.
First, the simple cases of multiplication by a scalar on either side of a sequence:
(defn seq*c [f c] (map #(g/mul % c) f))
(defn c*seq [c f] (map #(g/mul c %) f))
To multiply sequences, first recall from above that we can decompose each sequence \(F\) and \(G\) into a head and tail.
Multiply the expanded representations out and rearrange terms:
\begin{equation}
\label{eq:6}
F \times G=\left(f+x F_{1}\right) \times\left(g+x G_{1}\right)=f g+x\left(f G_{1}+F_{1} \times G\right)
\end{equation}
\(G\) appears on the left and the right, so use an inner function that closes over \(g\) to simplify matters, and rewrite the above definition in Clojure:
(defn seq:* [f g]
(letfn [(step [f]
(lazy-seq
(let [f*g (g/mul (first f) (first g))
f*G1 (c*seq (first f) (rest g))
F1*G (step (rest f))]
(cons f*g (seq:+ f*G1 F1*G)))))]
(step f)))
This works just fine on two infinite sequences:
(take 10 (seq:* (range) (->series [4 3 2 1])))
(0 4 11 20 30 40 50 60 70 80)
NOTE This is also called the "Cauchy Product" of the two sequences. The description on the Wikipedia page has complicated index tracking that simply doesn't come in to play with the stream-based approach. Amazing!
The quotient \(Q\) of \(F\) and \(G\) should satisfy:
\begin{equation}
\label{eq:7}
F = Q \times G
\end{equation}
From McIlroy, first expand out \(F\), \(Q\) and one instance of \(G\):
\begin{equation}
\begin{aligned}
f+x F_{1} &=\left(q+x Q_{1}\right) \times G \cr
&=q G+x Q_{1} \times G=q\left(g+x G_{1}\right)+x Q_{1} \times G \cr
&=q g+x\left(q G_{1}+Q_{1} \times G\right)
\end{aligned}
\end{equation}
Look at just the constant terms and note that \(q = \frac{f}{g}\).
Consider the terms multiplied by \(x\) and solve for \(Q_1\):
\begin{equation}
\label{eq:8}
Q_1 = \frac{(F_1 - qG_1)}{G}
\end{equation}
There are two special cases to consider:
- If both \(f\) and \(g\) have zero constant terms, the quotient is \(F_1 / G_1\), so divide the tails directly.
- If only \(f\)'s constant term is zero, the quotient picks up a zero constant term, and the tail comes from dividing \(F_1\) by \(G\).
A zero constant term in \(g\) alone is an error, since no power series reciprocal exists.
Encoded in Clojure:
(defn div [f g]
(lazy-seq
(let [f0 (first f) fs (rest f)
g0 (first g) gs (rest g)]
(cond (and (v/nullity? f0) (v/nullity? g0))
(div fs gs)
(v/nullity? f0)
(cons f0 (div fs g))
(v/nullity? g0)
(u/arithmetic-ex "ERROR: denominator has a zero constant term")
:else (let [q (g/div f0 g0)]
(cons q (-> (seq:- fs (c*seq q gs))
(div g))))))))
A simple example shows success:
(let [series (->series [0 0 0 4 3 2 1])]
(take 5 (div series series)))
(1 0 0 0 0)
We could generate the reciprocal of \(F\) by dividing \((1, 0, 0, ...)\) by \(F\). Page 21 of an earlier paper by McIlroy gives us a more direct formula.
We want \(R\) such that \(FR = 1\). Expand \(F\):
\begin{equation}
\label{eq:9}
(f + xF_1)R = 1
\end{equation}
Solve for R:
\begin{equation}
\label{eq:10}
R = \frac{1}{f} (1 - x(F_1 R))
\end{equation}
A recursive definition is no problem in the stream abstraction:
(defn invert [f]
(lazy-seq
(let [finv (g/invert (first f))
F1*Finv (seq:* (rest f) (invert f))
tail (c*seq finv (negate F1*Finv))]
(cons finv tail))))
This definition of `invert` matches the more straightforward division implementation:
(let [series (iterate inc 3)]
(= (take 5 (invert series))
(take 5 (div (->series [1]) series))))
true
An example:
(let [series (iterate inc 3)]
[(take 5 (seq:* series (invert series)))
(take 5 (div series series))])
[(1N 0N 0N 0N 0N) (1 0 0 0 0)]
Division of a constant by a series comes easily from our previous multiplication definitions and `invert`:
(defn c-div-seq [c f]
(c*seq c (invert f)))
It's not obvious that this works:
(let [nats (iterate inc 1)]
(take 6 (c-div-seq 4 nats)))
(4 -8 4 0 0 0)
But we can recover the initial series:
(let [nats (iterate inc 1)
divided (c-div-seq 4 nats)
seq-over-4 (invert divided)
original (seq*c seq-over-4 4)]
(take 5 original))
(1N 2N 3N 4N 5N)
To divide a series by a constant, divide each element of the series:
(defn seq-div-c [f c]
(map #(g// % c) f))
Division by a constant undoes multiplication by a constant:
(let [nats (iterate inc 1)]
(take 5 (seq-div-c (seq*c nats 2) 2)))
(1 2 3 4 5)
To compose two series \(F(x)\) and \(G(x)\) means to create a new series \(F(G(x))\). Derive this by substituting \(G\) for \(x\) in the expansion of \(F\):
\begin{equation}
\begin{aligned}
F(G)&=f+G \times F_{1}(G) \cr
&=f+\left(g+x G_{1}\right) \times F_{1}(G) \cr
&=\left(f+g F_{1}(G)\right)+x G_{1} \times F_{1}(G)
\end{aligned}
\end{equation}
For the stream-based calculation to work, we need to be able to calculate the head element and attach it to an infinite tail; unless \(g=0\) above the head element depends on \(F_1\), an infinite sequence.
If \(g=0\) the calculation simplifies:
\begin{equation}
\label{eq:12}
F(G)=f + x G_{1} \times F_{1}(G)
\end{equation}
In Clojure, using an inner function that captures \(G\):
(defn compose [f g]
(letfn [(step [f]
(lazy-seq
;; NOTE I don't understand why we get a StackOverflow if I move
;; this assertion out of the ~letfn~.
(assert (zero? (first g)))
(let [[f0 & fs] f
gs (rest g)
tail (seq:* gs (step fs))]
(cons f0 tail))))]
(step f)))
Composing with \(x^2 = (0, 0, 1, 0, 0, ...)\) should square all \(x\)s, and give us a sequence of only even powers:
(take 10 (compose (repeat 1)
(->series [0 0 1])))
(1 0 1 0 1 0 1 0 1 0)
The functional inverse of a power series \(F\) is a series \(R\) that satisfies \(F(R(x)) = x\).
Following McIlroy, we expand \(F\) (substituting \(R\) for \(x\)) and one occurrence of \(R\):
\begin{equation}
\label{eq:13}
F(R(x))=f+R \times F_{1}(R)=f+\left(r+x R_{1}\right) \times F_{1}(R)=x
\end{equation}
Just like in the composition derivation, in the general case the head term depends on an infinite sequence. Set \(r=0\) to address this:
\begin{equation}
\label{eq:14}
f+x R_{1} \times F_{1}(R)=x
\end{equation}
For this to work, the constant \(f\) must be 0 as well, hence
\begin{equation}
\label{eq:15}
R_1 = \frac{1}{F_1(R)}
\end{equation}
This works as an implementation because \(r=0\). \(R_1\) is allowed to reference \(R\) thanks to the stream-based approach:
(defn revert [f]
{:pre [(zero? (first f))]}
(letfn [(step [f]
(lazy-seq
(let [F1 (rest f)
R (step f)]
(cons 0 (invert
(compose F1 R))))))]
(step f)))
An example, inverting a series starting with 0:
(let [f (cons 0 (iterate inc 1))]
(take 5 (compose f (revert f))))
(0 1 0 0 0)
Derivatives of power series are simple and mechanical:
\begin{equation}
\label{eq:16}
D(a x^n) = aD(x^n) = a n x^{n-1}
\end{equation}
This implies that all entries shift left by 1, and each entry gets multiplied by its former index (i.e., its new index plus 1).
(defn deriv [f]
(map g/* (rest f) (iterate inc 1)))
(take 6 (deriv (repeat 1)))
(1 2 3 4 5 6)
Which of course we interpret as
\begin{equation}
\label{eq:17}
1 + 2x + 3x^2 + ...
\end{equation}
The definite integral \(\int_0^{x}F(t)dt\) is similar. To take the anti-derivative of each term, shift the sequence right by prepending a constant term, and divide each element by its new position:
(defn integral
([s] (integral s 0))
([s constant-term]
(cons constant-term
(map g/div s (iterate inc 1)))))
With a custom constant term:
(take 6 (integral (iterate inc 1) 5))
(5 1 1 1 1 1)
By default, the constant term is 0:
(take 6 (integral (iterate inc 1)))
(0 1 1 1 1 1)
Exponentiation of a power series by some integer is simply repeated multiplication. The implementation here is more efficient than iterating `seq:*`, and handles negative exponents by inverting the original sequence.
(defn expt [s e]
(letfn [(expt [base pow]
(loop [n pow
y (->series [1])
z base]
(let [t (even? n)
n (quot n 2)]
(cond
t (recur n y (seq:* z z))
(zero? n) (seq:* z y)
:else (recur n (seq:* z y) (seq:* z z))))))]
(cond (pos? e) (expt s e)
(zero? e) (->series [1])
:else (invert (expt s (g/negate e))))))
We can use `expt` to verify that \((1+x)^3\) expands to \(1 + 3x + 3x^2 + x^3\):
(take 5 (expt (->series [1 1]) 3))
(1 3 3 1 0)
The square root of a series \(F\) is a series \(Q\) such that \(Q^2 = F\). We can find this using our calculus methods from above:
\begin{equation}
\label{eq:18}
D(F) = 2Q D(Q)
\end{equation}
or
\begin{equation}
\label{eq:19}
D(Q) = \frac{D(F)}{2Q}
\end{equation}
When the head term of \(F\) is nonzero, ie, \(f \neq 0\), the head of \(Q = \sqrt{F}\) must be \(\sqrt{f}\) for the multiplication to work out.
Integrate both sides:
\begin{equation}
\label{eq:20}
Q = \sqrt{f} + \int_0^x \frac{D(F)}{2Q}
\end{equation}
One optimization appears if the first two terms of \(F\) vanish, ie, \(F=x^2F_2\). In this case \(Q = 0 + x \sqrt{F_2}\).
Here it is in Clojure:
(defn sqrt [[f1 & [f2 & fs] :as f]]
(if (and (v/nullity? f1)
(v/nullity? f2))
(cons f1 (sqrt fs))
(let [const (g/sqrt f1)
step (fn step [g]
(lazy-seq
(-> (div (deriv g)
(c*seq 2 (step g)))
(integral const))))]
(step f))))
And a test that we can recover the naturals:
(let [xs (iterate inc 1)]
(take 6 (seq:* (sqrt xs)
(sqrt xs))))
(1 2 3 4 5 6)
We can maintain precision if the first element is the square of a rational number:
(let [xs (iterate inc 9)]
(take 6 (seq:* (sqrt xs)
(sqrt xs))))
(9 10N 11N 12N 13N 14N)
We get a correct result if the sequence starts with \(0, 0\):
(let [xs (concat [0 0] (iterate inc 9))]
(take 6 (seq:* (sqrt xs)
(sqrt xs))))
(0 0 9 10N 11N 12N)
Power series computations can encode polynomial computations. Encoding \((1-2x^2)^3\) as a power series returns the correct result:
(take 10 (expt (->series [1 0 -2]) 3))
(1 0 -6 0 12 0 -8 0 0 0)
Encoding \(\frac{1}{(1-x)}\) returns the power series $1 + x + x^{2} + …$ which sums to that value in its region of convergence:
(take 10 (div (->series [1])
(->series [1 -1])))
(1 1 1 1 1 1 1 1 1 1)
\(\frac{1}{(1-x)^2}\) is the derivative of the above series:
(take 10 (div (->series [1])
(-> (->series [1 -1])
(expt 2))))
(1 2 3 4 5 6 7 8 9 10)
With the above primitives we can define a number of series with somewhat astonishing brevity.
\(e^x\) is its own derivative, so \(e^x = 1 + \int_0^x e^t\,dt\):
(def expx
(lazy-seq
(integral expx 1)))
This bare definition is enough to generate the power series for \(e^x\):
(take 10 expx)
(1 1 1/2 1/6 1/24 1/120 1/720 1/5040 1/40320 1/362880)
\(\sin\) and \(\cos\) afford recursive definitions. \(D(\sin) = \cos\) and \(D(\cos) = -\sin\), so (with the appropriate constant terms added) we can define each by integrating the other:
(declare cosx)
(def sinx (lazy-seq (integral cosx)))
(def cosx (lazy-seq (c-seq 1 (integral sinx))))
(take 10 sinx)
(0 1 0 -1/6 0 1/120 0 -1/5040 0 1/362880)
(take 10 cosx)
(1 0 -1/2 0 1/24 0 -1/720 0 1/40320 0)
Tangent and secant come easily from these:
(def tanx (div sinx cosx))
(def secx (invert cosx))
Reversion lets us define arcsine from sine:
(def asinx (revert sinx))
(def atanx (integral (cycle [1 0 -1 0])))
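The `atanx` definition works because \(D(\arctan x) = {1 \over 1+x^2} = 1 - x^2 + x^4 - \cdots\), whose coefficients are exactly `(cycle [1 0 -1 0])`; integrating recovers the Gregory series. A standalone sketch, with core `/` standing in for `g/div`:

```clojure
(defn integral
  "Prepends `constant-term` and divides each element by its new position."
  ([s] (integral s 0))
  ([s constant-term]
   (cons constant-term
         (map / s (iterate inc 1)))))

;; Integrating 1 - x^2 + x^4 - ... gives x - x^3/3 + x^5/5 - ...
(def atanx (integral (cycle [1 0 -1 0])))

(take 8 atanx) ;;=> (0 1 0 -1/3 0 1/5 0 -1/7)
```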
These two are less elegant, perhaps:
(def acosx (c-seq (g/div 'pi 2) asinx))
(def acotx (c-seq (g/div 'pi 2) atanx))
The hyperbolic trig functions are defined in a similar way:
(declare sinhx)
(def coshx (lazy-seq (integral sinhx 1)))
(def sinhx (lazy-seq (integral coshx)))
(def tanhx (div sinhx coshx))
(def asinhx (revert sinhx))
(def atanhx (revert tanhx))
(def log1-x
(integral (repeat -1)))
;; https://en.wikipedia.org/wiki/Mercator_series
(def log1+x
(integral (cycle [1 -1])))
These are a few more examples from McIlroy's "Power Serious" paper, presented here without context.
(def catalan
(lazy-cat [1] (seq:* catalan catalan)))
(take 10 catalan)
(1 1 2 5 14 42 132 429 1430 4862)
ordered trees…
(declare tree' forest' list')
(def tree' (lazy-cat [0] forest'))
(def list' (lazy-cat [1] list'))
(def forest' (compose list' tree'))
(take 10 tree')
(0 1 1 2 5 14 42 132 429 1430)
The Catalan numbers again!
(def fib (lazy-cat [0 1] (map + fib (rest fib))))
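The Fibonacci definition above stands alone in core Clojure:

```clojure
;; Each entry past the first two is the sum of the previous two.
(def fib (lazy-cat [0 1] (map + fib (rest fib))))

(take 10 fib) ;;=> (0 1 1 2 3 5 8 13 21 34)
```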
See here for the recurrence relation: https://en.wikipedia.org/wiki/Binomial_coefficient#Multiplicative_formula
(defn binomial* [n]
(letfn [(f [acc prev n k]
(if (zero? n)
acc
(let [next (/ (* prev n) k)
acc' (conj! acc next)]
(recur acc' next (dec n) (inc k)))))]
(persistent!
(f (transient [1]) 1 n 1))))
(defn binomial
"The coefficients of (1+x)^n"
[n]
(->series (binomial* n)))
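For example, row 4 of Pascal's triangle, i.e. the coefficients of \((1+x)^4\) (a standalone sketch of `binomial*`):

```clojure
(defn binomial*
  "Coefficients of (1+x)^n via the multiplicative recurrence
  C(n,k+1) = C(n,k) * (n-k) / (k+1)."
  [n]
  (letfn [(f [acc prev n k]
            (if (zero? n)
              acc
              (let [next (/ (* prev n) k)
                    acc' (conj! acc next)]
                (recur acc' next (dec n) (inc k)))))]
    (persistent!
     (f (transient [1]) 1 n 1))))

(binomial* 4) ;;=> [1 4 6 4 1]
```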
CLJSJS publishes many Javascript packages in a form that you can consume from a Clojure project. For projects like React, you'll find the latest versions of the JS libraries, packaged up and ready to go. For less active libraries like bignumber.js you might have to go bump a version and open up a pull request against CLJSJS's packages repository... or maybe package and add the library from scratch, as I did recently with Complex.js.
I have to say that this was a complete pain. I'll eventually write a guide describing what I had to do to upgrade or add six JS libraries, since this was not easy and I know it's blocking other people.
For now, I want to describe the steps I had to take to publish a namespaced version of `cljsjs/complex` to Clojars, so that I could start using it immediately without waiting for the maybe-weeks-long turnaround time of a new CLJSJS release.
Clojars.org is the place of record for all Clojure dependencies. It's the easiest way I know of to get a library published for consumption in other projects.
The goal here is to publish a CLJSJS library that you'll eventually depend on like this:
[cljsjs/complex "2.0.11-0"]
With a prefix, so that you can use it today, like this:
[org.clojars.sritchie09/complex "2.0.11-0"]
(The prefixing is important because you can't publish with the `cljsjs` prefix unless you're authorized by the CLJSJS maintainers.)
You'll publish the new dependency to Clojars. Head to https://clojars.org and make an account.
Click on "Deploy Tokens" page at the top right:
Follow the instructions and make a token (the name doesn't matter). It will appear on the screen looking something like `CLOJARS_1313123123`. Save it (and your Clojars username) by adding two entries like this to your `~/.bashrc` or `~/.bash_profile`:
export CLOJARS_USER=sritchie09
export CLOJARS_PASS=CLOJARS_1313123123
source ~/.bashrc
to pick up the new environment variables.
The CLJSJS packages live in this repository. Get it onto your machine by running this command in some directory:
git clone git@github.com:cljsjs/packages.git && cd packages
Now find the dependency you want to release. If you're doing this, you're probably living on a git branch, like I was for the Complex.js pull request. Check out the branch:
git checkout sritchie/complex_dep2
Each JS library has its own folder with a `build.boot` file in it. Open up, for example, `complex/build.boot` and locate the entry that looks like this:
(task-options!
push {:ensure-clean false}
pom {:project 'cljsjs/complex
:version +version+
:description "A well tested JavaScript library to work with complex number arithmetic in JavaScript."
:url "https://github.com/infusion/Complex.js"
:license {"MIT" "http://opensource.org/licenses/MIT"}
:scm {:url "https://github.com/cljsjs/packages"}})
Change the value referenced by `:project` from `'cljsjs/complex` to `'org.clojars.YOUR_CLOJARS_USERNAME/complex`. This was `org.clojars.sritchie09/complex`, in my case.
We're so close.
The latest version of Boot won't let you deploy to Clojars (dependency problem!). You can force yourself back to a working version by creating a file called `boot.properties` in the `packages` directory with these contents:
BOOT_VERSION=2.8.2
Finally, run this command in the subfolder containing the dependency you want to release (`complex`, in this case):
boot package push --ensure-release --repo clojars \
--repo-map "{:url \"https://clojars.org/repo/\" :username \"$CLOJARS_USER\" :password \"$CLOJARS_PASS\"}"
If you've set the environment variables correctly, you should see output ending with this:
Writing pom.xml and pom.properties...
Writing complex-2.0.11-0.jar...
Checksums match
Deploying complex-2.0.11-0.jar...
Sending complex-2.0.11-0.jar to https://clojars.org/repo/ (13k)
Sending complex-2.0.11-0.pom to https://clojars.org/repo/ (1k)
Sending maven-metadata.xml to https://clojars.org/repo/ (1k)
That's it! Your dependency will now be live at a URL like https://clojars.org/org.clojars.sritchie09/complex, and you'll be able to include it as a dependency using any of the forms that Clojars displays on that page. For example:
[org.clojars.sritchie09/complex "2.0.11-0"]
Hopefully this has saved you some trouble. Enjoy!
If you see output like this:
clojure.lang.ExceptionInfo: Failed to deploy artifacts: Could not transfer artifact org.clojars.sritchie09:complex:jar:2.0.11-0 from/to clojars (https://clojars.org/repo/): Failed to transfer file: https://clojars.org/repo/org/clojars/sritchie09/complex/2.0.11-0/complex-2.0.11-0.jar. Return code is: 401, ReasonPhrase: Unauthorized.
Make sure that you've properly set the environment variables `$CLOJARS_USER` and `$CLOJARS_PASS`.
(This is a writeup of Exercise 1.3 from Sussman and Wisdom's "Structure and Interpretation of Classical Mechanics". See the solutions repository for more.)
The problem explores some consequences for optics of the principle of least time. The exercise states:
Fermat observed that the laws of reflection and refraction could be accounted for by the following facts: Light travels in a straight line in any particular medium with a velocity that depends upon the medium. The path taken by a ray from a source to a destination through any sequence of media is a path of least total time, compared to neighboring paths. Show that these facts imply the laws of reflection and refraction.
The law of reflection is described in the footnote:
For reflection the angle of incidence is equal to the angle of reflection.
Here's the setup. The horizontal line is a mirror. The law states that \(\theta_1 = \theta_2\).
We have to show that if we consider all possible paths from a given starting point to a given endpoint, the path of minimum time will give us the law of reflection.
The actual path of minimum time is the straight line that avoids the mirror, of course. If we force the light to bounce off of the mirror, then we have to find the landing point \(x_p\) that minimizes the time between the start and end points.
There are two ways to solve this problem. We can use geometry and visual intuition, or we can use calculus.
First, recall this fact from the problem text:
Light travels in a straight line in any particular medium with a velocity that depends upon the medium.
There's no medium change, so if there were no mirror in its path, the light beam would continue in a straight line. Instead of figuring out what the beam will do when it hits the mirror, reflect the endpoint across the mirror and draw a straight line between the start and "end" points:
The angle that the beam makes with the plane of the mirror is the same on both sides of the mirror.
Now reflect the "end" point and the segment of the beam that's crossed the mirror back up. By symmetry, \(\theta_1 = \theta_2\), and we've proved the law of reflection.
We can also solve this with calculus. Because the beam doesn't change media, its speed \(v\) stays constant, so minimizing the total distance \(d\) is equivalent to minimizing the time \(t = {d \over v}\).
Set \(x_1 = 0\) for convenience, and write the total distance the light travels as a function of \(x_p\):
\begin{equation}
d(x_p) = \sqrt{y_1^2 + x_p^2} + \sqrt{(x_2 - x_p)^2 + y_2^2}
\end{equation}
For practice, we can also define this function in Scheme.
(define ((total-distance x1 y1 x2 y2) xp)
(+ (sqrt (+ (square (+ x1 xp))
(square y1)))
(sqrt (+ (square (- x2 (+ x1 xp)))
(square y2)))))
Here's the function again, generated from code, with general \(x_1\):
(->tex-equation
((total-distance 'x_1 'y_1 'x_2 'y_2) 'x_p))
\begin{equation}
\sqrt{{{x}_{1}}^{2} + 2 {x}_{1} {x}_{p} + {{x}_{p}}^{2} + {{y}_{1}}^{2}} + \sqrt{{{x}_{1}}^{2} - 2 {x}_{1} {x}_{2} + 2 {x}_{1} {x}_{p} + {{x}_{2}}^{2} - 2 {x}_{2} {x}_{p} + {{x}_{p}}^{2} + {{y}_{2}}^{2}}
\end{equation}
To find the \(x_p\) that minimizes the total distance, take the derivative with respect to \(x_p\) and set it equal to zero.
The derivative will look cleaner in code if we keep the components of the sum separate and prevent Scheme from "simplifying". Redefine the function to return a tuple:
(define ((total-distance* x1 y1 x2 y2) xp)
(up (sqrt (+ (square (+ x1 xp))
(square y1)))
(sqrt (+ (square (- x2 (+ x1 xp)))
(square y2)))))
Here are the sum components:
(->tex-equation
((total-distance* 0 'y_1 'x_2 'y_2) 'x_p))
\begin{equation}
\begin{pmatrix} \displaystyle{ \sqrt{{{x}_{p}}^{2} + {{y}_{1}}^{2}}} \cr \cr \displaystyle{ \sqrt{{{x}_{2}}^{2} - 2 {x}_{2} {x}_{p} + {{x}_{p}}^{2} + {{y}_{2}}^{2}}}\end{pmatrix}
\end{equation}
Taking a derivative is easy with scmutils. Just wrap the function in D:
(let* ((distance-fn (total-distance* 0 'y_1 'x_2 'y_2))
(derivative (D distance-fn)))
(->tex-equation
(derivative 'x_p)))
\begin{equation}
\begin{pmatrix} \displaystyle{ {{{x}_{p}}\over {\sqrt{{{x}_{p}}^{2} + {{y}_{1}}^{2}}}}} \cr \cr \displaystyle{ {{ - {x}_{2} + {x}_{p}}\over {\sqrt{{{x}_{2}}^{2} - 2 {x}_{2} {x}_{p} + {{x}_{p}}^{2} + {{y}_{2}}^{2}}}}}\end{pmatrix}
\end{equation}
The first component is the base \(x_p\) of the left triangle over the length of the left segment of the beam. This ratio is equal to \(\cos \theta_1\):
The bottom component is \(-\cos \theta_2\), or \({- (x_2 - x_p)}\) over the length of the right segment. Add these terms together, set the sum equal to 0 and rearrange:
\begin{equation}
\label{eq:reflect-laws}
\cos \theta_1 = \cos \theta_2 \implies \theta_1 = \theta_2
\end{equation}
This description in terms of the two incident angles isn't so obvious from the Scheme code. Still, you can use Scheme to check this result.
If the two angles are equal, then the left and right triangles are similar, and the ratio of each base to height is equal:
\begin{equation}
\label{eq:reflect-ratio}
{x_p \over y_1} = {{x_2 - x_p} \over y_2}
\end{equation}
Solve for \(x_p\) and rearrange:
\begin{equation}
\label{eq:reflect-ratio2}
x_p = {{y_1 x_2} \over {y_1 + y_2}}
\end{equation}
Plug this into the derivative of the original total-distance function, and we find that the derivative equals 0, as expected:
(let* ((distance-fn (total-distance 0 'y_1 'x_2 'y_2))
(derivative (D distance-fn)))
(->tex-equation
(derivative (/ (* 'y_1 'x_2) (+ 'y_1 'y_2)))))
\begin{equation}
0
\end{equation}
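The same result is easy to check numerically. Here's a quick sketch in Python (rather than scmutils); the helper total_distance and the sample values for \(y_1\), \(x_2\), \(y_2\) are my own:

```python
import math

def total_distance(y1, x2, y2):
    """Path length from (0, y1) down to the mirror at (xp, 0), then up to (x2, y2)."""
    def d(xp):
        return math.sqrt(y1**2 + xp**2) + math.sqrt((x2 - xp)**2 + y2**2)
    return d

# Arbitrary sample geometry:
y1, x2, y2 = 3.0, 10.0, 5.0
d = total_distance(y1, x2, y2)

# The claimed minimizer:
xp = (y1 * x2) / (y1 + y2)

# Central-difference approximation of the derivative at xp:
h = 1e-6
deriv = (d(xp + h) - d(xp - h)) / (2 * h)
assert abs(deriv) < 1e-6  # the derivative vanishes at xp

# ...and the two angles against the mirror are equal:
theta1 = math.atan2(y1, xp)
theta2 = math.atan2(y2, x2 - xp)
assert abs(theta1 - theta2) < 1e-12
```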
If a beam of light travels in a way that minimizes total distance (and therefore time in a constant medium), then it will reflect off of a mirror with the same angle at which it arrived. The law of reflection holds.
The law of refraction is also called Snell's law. Here's the description from the footnote:
Refraction is described by Snell's law: when light passes from one medium to another, the ratio of the sines of the angles made to the normal to the interface is the inverse of the ratio of the refractive indices of the media. The refractive index is the ratio of the speed of light in the vacuum to the speed of light in the medium.
First we'll tackle this with calculus.
The setup here is slightly different. We have a light beam traveling from one medium to another and changing speeds at a boundary located \(a\) to the right of the starting point. The goal is to figure out the point where the light will hit the boundary, if we assume that the light will take the path of least time.
The refractive index \(n_i = {c \over v_i}\), the speed of light \(c\) in a vacuum over the speed in the material. Rearranging, \(v_i = {c \over n_i}\).
Time is distance over speed, so the total time that the beam spends between the start and end points as a function of \(y_p\), the point of contact with the boundary, is:
\begin{equation}
\begin{split}
t(y_p) & = {\sqrt{a^2 + y_p^2} \over v_1} + {\sqrt{(x_2 - a)^2 + (y_2 - y_p)^2} \over v_2} \\
& = {n_1 \over c} \sqrt{a^2 + y_p^2} + {n_2 \over c} \sqrt{(x_2 - a)^2 + (y_2 - y_p)^2}
\end{split}
\end{equation}
Take the derivative:
\begin{equation}
Dt(y_p) = {1 \over c} \left({n_1 y_p \over \sqrt{a^2 + y_p^2}} - {n_2 (y_2 - y_p) \over \sqrt{(x_2 - a)^2 + (y_2 - y_p)^2}}\right)
\end{equation}
Set the derivative equal to 0 and split terms:
\begin{equation}
\label{eq:almost-snell}
{n_1 y_p \over \sqrt{a^2 + y_p^2}} = {n_2 (y_2 - y_p) \over \sqrt{(x_2 - a)^2 + (y_2 - y_p)^2}}
\end{equation}
Similar to the law of reflection's result, each term (up to its \(n_i\) multiple) is equal to the height of the left or right triangle over the length of the beam's path on the left or right of the boundary.
Equation \eqref{eq:almost-snell} simplifies to:
\begin{equation}
n_1 \sin \theta_1 = n_2 \sin \theta_2
\end{equation}
Rearranging yields Snell's law:
\begin{equation}
{n_1 \over n_2} = {\sin \theta_2 \over \sin \theta_1}
\end{equation}
I won't recreate this here, but the Feynman Lectures on Physics, in Lecture 26, has a fantastic discussion about, and derivation of, the law of refraction using no calculus, just geometry. I highly recommend you check out that lecture. Feynman lays out a number of examples of how the principle of least time is not just a restatement of the optical rules we already knew.
You can use the idea to guess what shape of mirror you'd want to build to focus many light rays on a single point (a parabola), or how you might force all light rays coming out of a single point to meet up again at another point (build a converging lens).
This whole area of optics and least time has obsessed scientists for hundreds of years. Spend a few minutes poking around and see what you find.
I've been reading the lovely Visual Complex Analysis by Tristan Needham, and the visual-style proofs he's been throwing down have been wonderful and refreshing. I'll write more about this book and its goals later, but I was inspired this AM to write up a proof of the half angle identities from trigonometry using some of the tools from the book.
Here's the half angle identity for cosine:
\begin{equation}
\label{eq:half-angle}
\cos {\theta \over 2} = \sqrt{{\cos \theta + 1} \over 2}
\end{equation}
This is an equation that lets you express the cosine of half of some angle \(\theta\) in terms of the cosine of the angle itself. As you can imagine, there are double-angle, triple-angle, all sorts of identities that you can sweat out next time you find yourself in a 9th grade classroom.
It turns out that these derivations all become much more fun with Euler's Formula:
\begin{equation}
\label{eq:euler}
e^{i\theta} = \cos\theta + i \sin\theta
\end{equation}
Let's say you want to figure out the half-angle identity. You have some function of cosine of half of an angle, and you want to pull the \(1 \over 2\) out of the cosine.
(This came up for me a couple of days ago in a problem Dave and I ran into working on Hephaestus, our quantum mechanics simulator in Clojure. More on that later.)
If you fiddle with the \(e^{i \theta}\) representation of complex numbers, a very clear relationship appears:
\begin{equation}
\label{eq:expansion}
e^{i\theta} = e^{2 i{\theta \over 2}} = (e^{i{\theta \over 2}})^2
\end{equation}
Next, use Euler's equation \eqref{eq:euler} and observe that we can rewrite the right side of equation \eqref{eq:expansion} as \((e^{i {\theta \over 2}})^2 = (\cos{\theta \over 2} + i \sin{\theta \over 2})^2\). Expand this out:
\begin{equation}
\label{eq:newnum}
\cos^2 {\theta \over 2} - \sin^2 {\theta \over 2} + i (2 \sin {\theta \over 2} \cos{\theta \over 2})
\end{equation}
Remember, from equation \eqref{eq:expansion}, that this is a new complex number that equals \(e^{i\theta} = \cos\theta + i \sin\theta\). The real parts of both sides are equal:
\begin{equation}
\cos \theta = \cos^2 {\theta \over 2} - \sin^2 {\theta \over 2}
\label{eq:real}
\end{equation}
and the imaginary parts are equal:
\begin{equation}
\label{eq:imag}
\sin \theta = 2 \sin {\theta \over 2} \cos{\theta \over 2}
\end{equation}
Interesting.
Next, use the \(1 - \sin^2 \theta = \cos^2 \theta\) identity on the right side of \eqref{eq:real} and simplify:
\begin{equation}
\label{eq:real-simple}
\cos \theta = 2 \cos^2{\theta \over 2} - 1
\end{equation}
Rearrange and take the square root to get (gasp!) the half-angle identity:
\begin{equation}
\label{eq:cos-half}
\cos {\theta \over 2} = \sqrt{{\cos \theta + 1} \over 2}
\end{equation}
The familiar half angle identity is a nice consequence of equation \eqref{eq:real}.
We still have equation \eqref{eq:imag}. Could that lead us to the half-angle identity for sine?
Here's the imaginary component again:
\begin{equation}
\label{eq:imag2}
\sin \theta = 2 \sin {\theta \over 2} \cos{\theta \over 2}
\end{equation}
Substitute what we just derived for the cosine half-angle:
\begin{equation}
\label{eq:5}
\sin {\theta} = 2 \sin {\theta \over 2} \sqrt{{\cos \theta + 1} \over 2}
\end{equation}
Square each side and cancel:
\begin{equation}
\label{eq:6}
\sin^2 {\theta} = 2 \sin^2 {\theta \over 2} (\cos \theta + 1)
\end{equation}
Use our identity from before on the left side:
\begin{equation}
\label{eq:10}
\sin^2 \theta = 1 - \cos^2 \theta = (1 + \cos \theta)(1 - \cos \theta)
\end{equation}
Divide through, and take the square root:
\begin{equation}
\label{eq:sin-half}
\sin {\theta \over 2} = \sqrt{{1 - \cos \theta} \over 2}
\end{equation}
And, boom, there it is: the half-angle identity for sine.
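Both identities are easy to spot-check numerically. Here's a small Python sketch (the sample angles and tolerance are arbitrary choices of mine):

```python
import math

# Spot-check both half-angle identities for a few angles in [0, pi),
# where theta/2 lands in the first quadrant and the positive roots apply.
for theta in [0.1, 0.5, 1.0, 2.0, 3.0]:
    cos_half = math.sqrt((math.cos(theta) + 1) / 2)
    sin_half = math.sqrt((1 - math.cos(theta)) / 2)
    assert abs(cos_half - math.cos(theta / 2)) < 1e-12
    assert abs(sin_half - math.sin(theta / 2)) < 1e-12
```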
If you look back at equation \eqref{eq:expansion} you'll see that this trick would work for any factor inside the exponent, not just the \(1 \over 2\). A similar trick can help you figure out any of the sum and difference identities, using observations like this:
\begin{equation}
\label{eq:sumdiff}
e^{i(\theta + \phi)} = e^{i\theta}e^{i\phi} = (\cos \theta + i \sin \theta)(\cos \phi + i \sin \phi)
\end{equation}
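The same move checks out numerically. In this Python sketch (the sample angles are mine), matching real and imaginary parts of the product of exponentials recovers the angle-sum identities:

```python
import cmath
import math

# Arbitrary sample angles:
theta, phi = 0.7, 1.1

# e^{i(theta + phi)} = e^{i theta} * e^{i phi}:
lhs = cmath.exp(1j * (theta + phi))
rhs = cmath.exp(1j * theta) * cmath.exp(1j * phi)
assert abs(lhs - rhs) < 1e-12

# Matching real and imaginary parts recovers the sum identities:
assert abs(lhs.real - (math.cos(theta) * math.cos(phi)
                       - math.sin(theta) * math.sin(phi))) < 1e-12
assert abs(lhs.imag - (math.cos(theta) * math.sin(phi)
                       + math.sin(theta) * math.cos(phi))) < 1e-12
```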
The derivation above was much easier for me to understand and push through than the usual geometric derivations I've seen. And, eerily, in going after one of the half angle identities, the other one came along for the ride.
As Tristan Needham says, "every complex equation says two things at once." (p15, Visual Complex Analysis).
Check out the book, leave it by the bedside, and bend your mind a little by digging into the mysteries of complex numbers.
I feel like I'm never going to publish this damned thing... so here it is, in bullet point form. Maybe I'll tidy it up some day.
Niwot's Challenge, a very fun race out in South Platte. Jason invited me to this damned thing last week...
Text from Jason: "Free Sat?"
All I could find was this one race report from Erik Sanders: https://gearjunkie.com/colorado-ultra-mountain-race-niwots-challenge
Prep for the Infinity Loop later this summer. Race details, what does it entail... I'm totally detached from how hard it might be. "It's only 45 miles", says Jenna.
On Wednesday (I think) we hung out to talk about what gear we'd need and how to prepare. We got as far as discussing gaiters and tights, then ran out of items. We both kind of know how to do these things, right? I'd never been in a proper adventure race where you use only map and compass, so I was relying on Jason for that. Jenna was adamant that I make Jason understand how ill prepared I was at navigating.
"I'm totally dialed. Just a note - I'm totally dialed at finding out where I am in an emergency... I've never had to navigate while MOVING, but how hard could it be?"
Okay, prep, Jenna drops me off at Jason's house on Friday, we head on up. Jason has gatorade, a satchel of little debbie pies, SPKs, amazing food... I've got a bunch of BS I bought at Lucky's, empanadas, bladder, etc, stuff that's worked in running races in the past. We head on up.
We hit some mexican food, had a beer each, then left. On the way we do a dramatic reading of the course guide. You'll have to enter to see it, but it's full of curious details that we were using to try and get some insight into "Sherpa" John's mind. There was an obsessive focus on his favorite pizza, but no details. Everything we read pointed to an unhinged mind. Not a good sign for our health out on the course.
We arrive at the El Dorado hotel, a dead old building with a big flagpole and a spraypainted skull and cross bones with a circle-slash "NO" sign over it, above the word "BATS!". No death to bats? What does this mean? Beautiful spot by the river, with beautiful weather.
We get there, build the table, get sorted and go off to join the crew. Miguel is cooking cheese-filled fried doughnuts.
We get to bed. At this point neither of us had read the description of the course carefully, let alone "studied". We read the course description, noting that it just did not sound that bad. If you read something dramatically, have you learned it?
Get up, super cold, get dressed. "Watch out in the ladies room... there's a black widow posted up. I'm gonna smash it," yelled a racer.
"Ceremony" where we're allocated to the starting groups, then allowed to trade. I trade into J's group. Final prep, it's light... the peace pipe's lit and we head out.
There are only two of us heading in our direction. What the heck? Bill and (woman's name). Bill's clearly skeptical of our tights, Capris in Jason's case, and our obvious lack of knowledge of the terrain. He's done the race before and disapproved of our fast speed. We left Bill behind pretty quickly, feeling great about how fast and fit we were.
"I don't want to speak too soon, buttttt I think we might be crushing this thing!"
Theme of the day - mapping things out on foot vs trusting the map. We tried to lean heavily on our stoke and athleticism, and put in a ton more effort than we probably needed to.
Next book went well... we were getting nervous and settled into checking ALL drainages for cairns, not relaxing. Head home, no big deal.
refueled, took in a massive amount of gatorade, downed an ensure, probably 500 calories. Total shock to the system, nausea rising.
So no headlamps :)
NO ONE had come back yet from Suffer, even with our two hour fuckup. This is with 6.5 hours elapsed. Bill came in two hours ahead of us then quit - it turned out he had never planned on going more than one lap. Maybe he was even smarter than we'd given him credit for...
In our minds this was a huge opportunity to make it back by the 13 hour cutoff. 6.5 more hours to do a shorter lap? Easy, especially if we nail the navigation!
Trudge out, actually crush the first book.
We met one guy coming down who was fully shell shocked - he'd been wandering around for three hours trying to find the descent that he was now on. He told us that he'd filled up his waterbottles six times, the heat was so bad.
STICK TO THE HEADINGS - 70, 90. Do not forget! More nailing it.
We get to a 5th class wall. Fuck, that is definitely not safe to ascend, and there's no way that's the course. What can we do? We think we know where we are... we decide to go down the drainage to the river. The idea is that we'll just traverse the shore. We get there, so rough, getting scraped to pieces and... sure enough, full deep water solo cliffs.
Okay, we'll swim. Oh shit, it's a half mile.
We scramble up and out and scramble-skirt the contour to the right inlet. Takes forever.
We get the next book, evaluate our time and we're totally hosed by now. It's going to get dark, I have no headlamps, remember.
We reverse. Things are going better. We get water.
I snap a pole. Boom. That's great! Now I have a free hand to hold the iPhone that I'm using as a light. Following J, crashing through walls of scrub. With the headlamp they look like impenetrable walls. A mile goes forever.
We make it to the first intermediate checkpoint we wanted, then the second... we might be nailing this!
7 people left on the chief's loop and none survived.
Some of the drop reasons:
Oh well. We'll have to go back next year and try this thing out, get it done for real.
(UPDATE - we did, and we finished! Read the 2019 Race Report here.)
Jason's Story: https://www.instagram.com/stories/highlights/17929181812103193/
I ran this race in 2019 for the second time, finishing in 27 hours and 17 minutes with Matt Fackrell and Jason Antin, my long-time collaborator on such beauties as "The Rainier Infinity Loop!" and the quite absurd "Tahoe 200, 2018 edition". We ran a good portion of the race with Todd Salzer; Todd guided us through the first two loops, but unfortunately didn't finish the 2019 edition. I know he'll be back in 2020, this year.
In lieu of a race report, please accept this photo and video tour of our glory.
We did the usual things. Nothing to note here. No black widows at the start.
First book!
We collected the next two books without any notable events.
We nailed Suffer with almost no scrub oak... except for the big climb up to "Jason's Knob", as Sherpa John likes to mutter late at night, mouth full of the fried-chicken nightguard he pops in each evening before bed.
Here's Todd climbing the hill. He commented around here that Niwot's was a "young man's game"... that was before chowing caffeine gum and snatching the map from Jason's hands, late in the night.
Look at this beast, so casual, wearing climbing pants on a run.
Up a steep hill, dominating, sending the gnar. Why am I making this face?
We met up with Matt on the climb up to the Chief's loop. Here we are, trying to find the first book, which eluded us for over an hour. We almost left before doubling back and finally finding the thing.
We snagged that book, the next, then bombed down a hill with a good bit of snow to a great refill at a stream in the cold. It took forever; I had forgotten my BeFree bottle, and Jason's was completely clogged from months of silt-sifting.
One mile down the road, hit the bridge crossing the stream and start climbing up. If we'd had a green laser pointer we would have been able to see the powerlines above.
Here's the tower, with no book!
A huddle, trying to find out what the hell happened:
I don't want to remember the madness of Book 14, where we wandered for hours. Or the Wall, climbing carefully up a 2k+ foot extremely steep hill full of dinner plate rocks. Thank god no one kicked anything down.
Crawling through scrub oak. Jason, sobbing like a bitch. Or was that me? Or the ghost of Sherpa John? It'll be forever unclear.
Far later, the next morning, Matt and I sit looking at the map, trying to figure out the final long bomb down the hill.
Here we are before the descent down the powerlines. It was so steep, so fun.
After a long, hot slog in the sun, we were done! We were Chiefs! I was dubbed "Chief Many Coves", after the events of 2018.
That's it for now. A 600 word race report, 14,500 words shorter than my Tahoe 200.
Check it out below, or click here for a direct image link to stare at a big version in the browser.
I had a hard time with this book; it showed up in the TED book club mailing months ago, and I went into my reading primed to love the message, which is concise and persuasive... but maybe so persuasive on its own that the book feels like 300 pages of filler.
David Brooks's "The Second Mountain" is about the sense he's developed, later in his life, that our lives play out on a landscape populated by peaks whose summits, we're told, offer peace, happiness, alleviation from the anxiety of modern life. If you get to the top, earn the PhD, become CEO, you can stop.
But then you reach the peak, and you realize that the gnawing discomfort you have with your own life was there because, just maybe, your whole viewpoint of the world as a peak-covered landscape where gladiators fight for zero-sum... what, Strava CRs? is a broken model.
David Brooks has looked at his own life and the lives of other folks he's found who have achieved a deep sense of fulfillment and found that the pattern is the same. These people reached the top, or gave up on the attempt, and set their sights on a second mountain - some objective or life goal not about them, anymore, but characterized by a thick web of connections to other people, and to a life of service.
I almost think of this like a decision to distribute your sense of self out among as many people as you can. By confining your sense of who you are into one body, you end up feeling, predictably, like you're an island, alone in a big empty world. By dedicating yourself to some cause that ignores you completely and can only be fulfilled by making other people fulfilled - by climbing the second mountain - you can cut past a whole tangled mess of self-doubt.
I love that message. But the book, as a vehicle for this message, has a big problem.
The problem is that if you already believe in this idea, or haven't yet made the leap but get the shape of what I've sketched out above, the book doesn't add anything. I didn't find anything in "The Second Mountain" that I could use to persuade, say, myself as a college student, worrying about grades and internships, that I was playing a fool's game.
Brooks has filled the book with examples of people that have found their second mountain, and are happier as a result. But why? If you take it as self-evident that this is a better way to live, why do you need examples?
What I wanted was a kick in the ass that would help me get over the current anxiety I've been feeling about job security, money and finances — anxieties that I intellectually believe are stupid, given my bank account and family situation, but that I emotionally can't quite let go of. I would be financially fine for at least, say, 10 years if I started giving away half of my income. Why am I not doing that?
Maybe my reaction to this book is unfair; I found that the stories in the book were too vague to motivate me to make any change in my own life. Maybe it's a character flaw that I was looking for a manual.
The goal of reading, as Mortimer Adler states in How to Read a Book, is to achieve enlightenment of a sort. With a practical book, if you come to the end of the book and find that you agree with the premises, and that the author's conclusions make sense, that there are no logical errors... well, then you're obligated to do what the author suggests. A practical book is trying to persuade you of something.
I'm persuaded, but I wanted my hand held. I wanted Brooks to give me a guide for how to find the second mountain, and how to just turn back and forget about tagging the first summit. I didn't find it here, and I suspect you won't either... but maybe the fact that I'm so annoyed by this means the book has done its job.
Here's the link to "The Second Mountain", by David Brooks. If you want to hear him speaking about his previous work, especially "The Road to Character", check him out on Sam Harris's podcast.
My knowledge of the history here is probably at the level of a Just-So Story; still, I've been finding it helpful to have some vague idea of why folks started developing some area of mathematics before digging into the details, so I'll pass on what I've got to you.
My understanding (backed up by Wikipedia) is that the idea of $i$ came up as a sort of naughty, wtf-is-going-on way of finding solutions to equations like this:
$$x^2 + 1 = 0$$
If you look at the graph of $y = x^2 + 1$, you can see that there is no point where some $x$ gives you 0 back out.
But what if you just went into overdrive and solved the equation anyway? $x^2=-1$, so just make up a new thing, call it $i$. What is $i$? Well, it's something that, when you square it, gives you back $-1$. Now that equation has two solutions, like any good quadratic function: $i$ and $-i$.
According to Wikipedia, Descartes was the first person to call this number imaginary:
[...] sometimes only imaginary, that is one can imagine as many as I said in each equation, but sometimes there exists no quantity that matches that which we imagine. (Wikipedia)
This name is a mathematical micro-aggression. It's another step in the centuries-long discomfort mathematicians have had with opening up the idea of what a "number" is, from natural numbers, to rational, to real, and now, with $i$, the first step toward the idea of a complex number of the form $a + bi$, where $a$ and $b$ are both real numbers.
$i = \sqrt{-1}$ as an accounting device didn't work for me at all. Why does this matter?
What did work well was Gauss's realization that you can think of a complex number — again, a number of the form $a + bi$ — as a point in a two dimensional plane (the "complex plane").
It turns out that you can visualize multiplication by $i$ as a 90 degree rotation in the complex plane.
You can see from this that if you have some number $a$, and you multiply it by $i$, $ai$ now lives on the y axis of the graph, 90 degrees from where you started.
If you multiply $ai$ by $i$, you get $ai^2 = -a$, which lives back on the x axis, another 90 degrees around, or 180 degrees from where you began. ($i^2 = -1$, a 180 degree rotation, or a reflection across the y axis. See?)
Keep going around to get $-ai$:
And a final multiplication by $i$ gets you back to $-ai \cdot i = -ai^2 = -(-a) = a$, a full 360 degrees around from where you started.
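Those four quarter-turns are easy to check in Python, whose complex literal 1j plays the role of $i$ (the sample value $a = 3$ is mine):

```python
# Python's complex literal 1j plays the role of i.
a = 3.0

z = a * 1j        # 90 degrees: onto the positive imaginary axis
assert z == complex(0, 3)

z = z * 1j        # 180 degrees: back on the real axis, negated
assert z == complex(-3, 0)

z = z * 1j        # 270 degrees: the negative imaginary axis
assert z == complex(0, -3)

z = z * 1j        # 360 degrees: home again
assert z == complex(3, 0)
```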
Here's Gauss, expressing his frustration at the "imaginary" terminology:
If this subject has hitherto been considered from the wrong viewpoint and thus enveloped in mystery and surrounded by darkness, it is largely an unsuitable terminology which should be blamed. Had $+1$, $-1$ and $\sqrt{-1}$, instead of being called positive, negative and imaginary (or worse still, impossible) unity, been given the names, say, of direct, inverse and lateral unity, there would hardly have been any scope for such obscurity." - Gauss, via Wikipedia
If you imagine $ai$ as a number along the y axis of a complex plane, and $a$ as some number along the x axis, the next natural step is to imagine how you'd represent arbitrary points on the plane.
If you call the axes "1" and "$i$" instead of $\hat{i}$ and $\hat{j}$, you'll see that you can write every point as $a + bi$.
You should also be able to see from the pictures above that you could represent each point, each complex number, as a vector, a line segment stretching from the origin of the complex plane to the point $(a, b)$.
Okay. We saw above that multiplication by $i$ is the same as rotating a number around the origin by 90 degrees.
I'm going to claim that not only is that fact true... but that multiplying any complex number $x$ by another complex number $y$ transforms the vector represented by $x$ by rotating it through $y$'s angle and scaling its length by $y$'s magnitude.
First, let's do it symbolically, then with a picture.
First, expand the product of two complex numbers in rectangular form:
\begin{equation}
(a + bi)(c + di) = ac + adi + bci + bd(i^2) = (ac - bd) + (ad + bc)i
\end{equation}
Now write each factor in polar form and multiply:
\begin{equation}
r_1(\cos \theta_1 + i \sin \theta_1)r_2(\cos \theta_2 + i \sin \theta_2)
\label{eq:angles}
\end{equation}
Expanding the product gives:
\begin{equation}
r_1 r_2 ((\cos \theta_1 \cos \theta_2 - \sin \theta_1 \sin \theta_2) + i (\cos \theta_1 \sin \theta_2 + \sin \theta_1 \cos \theta_2))
\end{equation}
Next, go look up a table of trig identities and realize that the orderly arrangements here of $\sin$ and $\cos$ look very much like the angle sum identities:
\begin{equation}
\cos \theta_1 \cos \theta_2 - \sin \theta_1 \sin \theta_2 = \cos(\theta_1 + \theta_2)
\end{equation}
and:
\begin{equation}
\cos \theta_1 \sin \theta_2 + \sin \theta_1 \cos \theta_2 = \sin(\theta_1 + \theta_2)
\end{equation}
Using those, the original product collapses down to:
\begin{equation}
r_1 r_2 (\cos(\theta_1 + \theta_2) + i \sin(\theta_1 + \theta_2))
\label{eq:confirmangles}
\end{equation}
Might be more obvious from this picture:
This picture shows visually all of the relationships we need to verify that you can think of complex number multiplication as rotation and scaling in the complex plane. That is, if you have the patience to go verify all the triangle relationships.
The two complex numbers that this picture represents are $c_1 = \cos \beta + i \sin \beta$ and $c_2 = \cos \alpha + i \sin \alpha$.
Imagine the red triangle rotated up by an additional angle of $\alpha$. We can work out the same math as in \eqref{eq:angles} above, with $\alpha$ and $\beta$ subbed in, and $r_1 = r_2 = 1$:
\begin{equation}
(\cos \alpha + i \sin \alpha)(\cos \beta + i \sin \beta)
\end{equation}
If you look at the white triangle on the left, you can verify that the final triangle has dimensions:
\begin{equation}
(\cos(\alpha + \beta) + i \sin(\alpha + \beta))
\end{equation}
Just like we verified in \eqref{eq:confirmangles}.
The algebra above is a mess. Is there a better way to write these things?
Yes!
It turns out that you can use the amazing Euler's Formula:
\begin{equation}
e^{ix} = \cos x + i \sin x
\label{eq:euler}
\end{equation}
to write any complex number $a + bi$ as $r (\cos \theta + i \sin \theta) = r e^{i \theta}$.
This feels fairly spooky. It does make the math easier, though. When you multiply powers, you simply add the exponents, so the product of two complex numbers becomes easy, and checks out with what we found above:
\begin{equation}
(a + bi)(c + di) = r_1 e^{i \theta_1} r_2 e^{i \theta_2} = r_1 r_2 e^{i (\theta_1 + \theta_2)}
\end{equation}
The new complex number has magnitude $r_1 r_2$ and an angle equal to $\theta_1 + \theta_2$.
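Here's a tiny numeric sanity check in Scala. The `Complex` case class is hypothetical, written just for this example (not a library type): it multiplies two complex numbers in rectangular form and confirms that the magnitudes multiply and the angles add.

```scala
// A hypothetical Complex type, just for checking the polar-form claim.
case class Complex(re: Double, im: Double) {
  // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
  def *(other: Complex): Complex =
    Complex(re * other.re - im * other.im, re * other.im + im * other.re)
  def magnitude: Double = math.hypot(re, im)
  def angle: Double = math.atan2(im, re)
}

val c1 = Complex(1, 1) // magnitude sqrt(2), angle pi/4
val c2 = Complex(0, 2) // magnitude 2, angle pi/2
val product = c1 * c2  // (1 + i)(2i) = -2 + 2i

// magnitudes multiply: sqrt(2) * 2 ~ 2.828
println(product.magnitude)
// angles add: pi/4 + pi/2 = 3*pi/4 ~ 2.356
println(product.angle)
```

Multiply any two complex numbers you like and the same two relationships hold (up to the usual wraparound of angles past $\pi$).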
But why does this work? Where did $e$ come from, and how is it linked to angles, cosine and sine?
To answer that question we need our final piece of machinery: the Taylor series approximation.
(These notes are from memory, though originally I found this page helpful.)
The goal here is to motivate why we can talk about complex numbers as $e^{i\theta}$.
Suppose we have some smooth function $f$ that we'd like to approximate near a point $c$ with a polynomial in $(x - c)$:
\begin{equation}
f(x) = a_0 + a_1 (x - c) + a_2 (x - c)^2 + ... + a_n (x - c)^n
\end{equation}
What can we say about the coefficients? Figuring out the coefficients is the key to figuring out the Taylor series expansion of some function.
To get $a_0$, calculate $f(c)$; every term with a factor of $(x - c)$ goes to 0, leaving:
\begin{equation}
f(c) = a_0
\end{equation}
How about $a_1$? If the function is "smooth", then every derivative exists. That's a clue. Take the derivative of both sides:
\begin{equation}
f'(x) = a_1 + 2 a_2 (x - c) + ... + n a_n (x - c)^{n-1}
\end{equation}
Taking the derivative subtracts $1$ from each exponent, conveniently setting the constant term to zero and bringing the next coefficient within our grasp.
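To make the pattern concrete (spelling out intermediate steps the text skips), differentiate repeatedly and evaluate everything at $x = c$; each differentiation kills the new constant term and peels off one more coefficient:

\begin{equation}
f'(c) = a_1, \quad f''(c) = 2 a_2, \quad f'''(c) = 3 \cdot 2 \cdot a_3, \quad ... \quad f^n(c) = n! \, a_n
\end{equation}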
In general, the coefficient of the $(x-c)^n$ term has to be:
\begin{equation}
a_n = \frac{f^n(c)}{n!}
\end{equation}
So the whole Taylor series expansion is:
\begin{equation}
f(x) = f(c) + f'(c) (x - c) + \frac{f''(c)}{2} (x - c)^2 + ... + \frac{f^n(c)}{n!} (x - c)^n
\end{equation}
Or, more compactly:
\begin{equation}
f(x) = \sum_{j = 0}^{n} \frac{f^j(c)}{j!} (x - c)^j
\end{equation}
How does this help us?
Well... $e^x$ is its own derivative! $f(x) = f^n(x) = e^x$.
The Taylor series expansion of $e^x$ around $c=0$ is:
\begin{equation}
e^x = 1 + x + \frac{1}{2} x^2 + ... + \frac{1}{n!} x^n = \sum_{j= 0}^{n} \frac{1}{j!} x^j
\end{equation}
Then, for $\cos x$ around 0:
\begin{equation}
\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - ... = \sum_{j = 0}^{n} \frac{(-1)^j}{(2j)!} x^{2j}
\end{equation}
Then $\sin x$ around 0:
\begin{equation}
\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - ... = \sum_{j = 0}^{n} \frac{(-1)^j}{(2j+1)!} x^{2j+1}
\end{equation}
Stare at these two series next to the expansion of $e^x$: between them, cosine picks up the even powers of $x$ and sine picks up the odd ones, with alternating signs.
Now substitute $ix$ for $x$ in the series for $e^x$, which is its own derivative, and group the real and imaginary terms.
That gets us back to the amazing equation \eqref{eq:euler}, repeated here:
\begin{equation}
e^{ix} = \cos x + i \sin x
\end{equation}
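To see where that comes from (these intermediate steps are my own filling-in), substitute $ix$ for $x$ in the Taylor series for $e^x$:

\begin{equation}
e^{ix} = 1 + ix + \frac{(ix)^2}{2!} + \frac{(ix)^3}{3!} + \frac{(ix)^4}{4!} + \frac{(ix)^5}{5!} + ...
\end{equation}

Since $i^2 = -1$, the powers of $i$ cycle through $i, -1, -i, 1$. Grouping the real and imaginary terms:

\begin{equation}
e^{ix} = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - ...\right) + i \left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - ...\right) = \cos x + i \sin x
\end{equation}

The real part is exactly the series for $\cos x$, and the imaginary part is the series for $\sin x$.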
Stare at the terms until the correspondence makes sense: every term of the $e^{ix}$ expansion lands in either the cosine series or the sine series.
This means that multiplication by a number that looks like $e^{i \theta}$ is equivalent to a rotation of $\theta$ in the complex plane.
Where can you use this?
In my ongoing quest to lay a more solid foundation for this new, strange life as a machine learning "researcher", I've been going through various foundational concepts and ideas and trying to build up rock solid intuitions that I can lean on for years. (Why the hell didn't I do this back in school??)
Entropy is my latest obsession - thermodynamic entropy, and information entropy, and the ways in which these two things are similar.
This is the first of a short series of posts building up the mathematical concepts and tools required for a solid understanding of what entropy is. The rough plan is to cover:
Once the series is complete, I'll come back and fill in some reasons why you might be interested in this stuff and generally polish things up. If you've found this post in its current state, read on for an unmotivated account of how many different ways there are to pluck items out of a bag.
This post will cover:
For each, I'll present the formula you'll typically see, and then show off some intuitive ways to think about the problem that should let you figure out the formulas on your own whenever you need to use them.
Let's say you have some set of items:
The permutations of the set are all of the possible ways that you can arrange, in order, the items in that set. There are 6 possible permutations of the set above:
Some examples of permutations you might care about are:
Is there some link between the number of items in a set of size $n$ and the number of possible permutations of those items?
Of course there is! The number of permutations is
\begin{equation}
\text{permutations}(n) = n!
\label{eq:permutations}
\end{equation}
where $n!$ is the factorial function:
\begin{equation}
n! = n \cdot (n-1) \cdot (n-2) \cdot ... \cdot 1 = \prod_{i=1}^n i
\label{eq:factorial}
\end{equation}
This checks out for a few examples:
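Here's a quick brute-force check in Scala (a sketch written for this section; the same `permutations` function gets developed in detail below):

```scala
// Brute-force every permutation of a small set, then compare the count
// against n!.
def permutations[A](items: Set[A]): Set[List[A]] =
  if (items.isEmpty) Set(List.empty)
  else items.flatMap(x => permutations(items - x).map(x :: _))

def factorial(n: Int): Int =
  if (n == 0) 1 else n * factorial(n - 1)

println(permutations(Set(1, 2)).size)       // 2 = 2!
println(permutations(Set(1, 2, 3)).size)    // 6 = 3!
println(permutations(Set(1, 2, 3, 4)).size) // 24 = 4!
```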
Why is this relationship generally true? Let's first look at a visual example of how you'd generate permutations from some set, then look at some code.
If we have a set of $n$ items, to generate a permutation we have to start by making a single choice. We have $n$ different choices we can make, each of which is a valid start to a permutation. Each of those choices will leave $n-1$ items left in the set.
Here's a drawing of each possible choice, linked to the remaining items.
For each of the remaining sets we can play the same game. We have $n-1$ choices in each set; after each choice, we'll have $n-2$ remaining items.
Stay focused on those first two levels for a second. Each of the $n-1$ choices above is for the second item in a permutation; because we had $n$ original choices, each of which resulted in $n-1$ more possible decisions, we've learned that there are $n(n -1)$ possible ways to make our first two choices.
This game continues all the way down until each branch has a single choice left. By that time, we've made $n!$ total choices:
\begin{equation}
n! = n \cdot (n-1) \cdot (n-2) \cdot ... \cdot 1
\label{eq:fac2}
\end{equation}
Factorials show up when you're counting up groups of things where each choice takes an item out of the mix.
You can also visualize permutations as the number of total possible paths through a tree, where each branch plucks some item off of the set. The diagram above is a tree:
This post by Shawn O'Mara has a wonderful set of diagrams and descriptions of permutations and other combinatoric goodies using trees. Here's a diagram similar to mine for the set $\{a, b, c, d\}$:
Each path through the tree represents a permutation.
If you count the number of branches at each level of the tree, you'll see that they slowly descend from 4 branches at the first level, to 3, then 2, then 1. Each level of branching represents the "many worlds" opened up by a choice among that level's branches.
You can also count the number of items at each level. Each path has to end in a final "leaf", so counting leaves is the same as counting paths. The tree above has $4 \cdot 3 \cdot 2 \cdot 1 = 24 = 4!$ leaves, just like we expected.
How do you generate permutations in code? Here's one attempt in Scala:
/**
 * This function generates the set of all possible permutations of the items
 * in the input set.
 */
def permutations[A](items: Set[A]): Set[List[A]] =
  if (items.isEmpty) Set(List.empty)
  else {
    items.flatMap { x =>
      // For every item in the input set:
      //
      // - generate all possible permutations of the set WITHOUT that item
      //   present; we expect (n - 1)! total permutations.
      // - loop through all of those permutations and stick the removed item
      //   onto the beginning to generate (n - 1)! permutations, each with
      //   the removed item on the front.
      // - the "flatMap" above calls this inner function once for each of the
      //   n items, then sews together all of the returned sets... this gives
      //   us a final set of size n * (n - 1)! = n!
      permutations(items - x).map(x :: _)
    }
  }
Again, in English:
The function generates the actual permutations, but if you track the size of the sets that are passed down into recursive calls of permutations, you'll notice that each recursive call removes one item from the set. If permutations receives an empty input set, it returns a set with 1 item in it.
Think of a "loop counter" associated with each call to permutations. The function gets called $n$ times on each loop; on each call, permutations is called with $n - 1$ items, then the results are all fused together.
We know that each item is distinct because sets can't contain duplicates, and we add the removed item back on only after the recursive call.
The function above is very close to a proof that the total number of permutations for a set of size $n$ equals $n!$.
Here's another implementation in Scala that generates permutations in a different way:
def permutationsTwo[A](input: Set[A]): Set[List[A]] = {
  // we use an inner function called "loop" so that we can hide the fact that
  // we're converting the input set into a list. Sets don't have ordering,
  // but we need to enforce one for this approach.
  def loop(items: List[A]): Set[List[A]] =
    items match {
      // the base case returns a set with 1 item.
      case Nil => Set(List.empty)
      // this pattern match breaks the items into the first item in the list
      // and the remaining items; xs has size n - 1, 1 fewer than the size of
      // `items`.
      case x :: xs =>
        // loop is called recursively here with a "loop counter" of n - 1.
        loop(xs).flatMap { permutation =>
          // Each of those (n - 1)! permutations looks something like
          // (b,a,c). For each of these, the next block of code generates new
          // lists by:
          //
          // - inserting the element that was NOT passed down into the loop -
          //   say, "d" - into the slot AFTER every one of the existing
          //   elements, and
          // - adding one extra list with "d" at the beginning
          //
          // for a total of n new permutations for each of the (n - 1)!
          // passed back up through the loop.
          //
          // n * permutations(n - 1) = n! total permutations, since we bottom
          // out at 1.
          (0 to permutation.size).map { i =>
            val (pre, post) = permutation.splitAt(i)
            pre ++ List(x) ++ post
          }
        }
    }
  loop(input.toList)
}
This is a different approach to generating the permutations, but the same logic applies for tracking how the size of the returned set grows with each new item in the input set.
What happens if you pass the empty set into the functions above? How many permutations can you take from a set with nothing in it?
You'd think it would be $0$, and that $0! = 0$. The problem is that $0$ annihilates products; if the definition of factorial let you go all the way down to $0$, as in $3 \cdot 2 \cdot 1 \cdot 0$, the $0$ would destroy everything and make every factorial equal to $0$.
It's obvious in the code examples above that to make everything work, you have to return a set that contains ONE permutation - the empty list! $0! = 1$ for any implementation I can think of in Scala.
The mathematical argument for why this has to be true feels silly, but there's no reason it can't hold.
The permutations of a set are all of the possible distinct orderings that contain the same number of items as the original set. Well, there is ONE list (an ordered data structure) that contains the same number of items... the empty list! There are no other lists with 0 items, so we can rule out every ordered thing... except for the empty list. If we can't rule out a permutation, we have to include it, so there it is: the permutations of $\{\} = \{()\}$, and $0! = 1$.
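You can check this directly (repeating the permutations function from above so the snippet stands alone):

```scala
// Same brute-force permutations function as earlier in the post.
def permutations[A](items: Set[A]): Set[List[A]] =
  if (items.isEmpty) Set(List.empty)
  else items.flatMap(x => permutations(items - x).map(x :: _))

// The empty set has exactly ONE permutation: the empty list.
println(permutations(Set.empty[Int]))      // Set(List())
println(permutations(Set.empty[Int]).size) // 1
```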
https://en.wikipedia.org/wiki/Permutation#k-permutations_of_n
What if you want to stop early, after choosing only $k$ of the $n$ items? These are the "k-permutations" of $n$.
The counting game is the same as before; we simply stop once we've made $k$ choices, leaving $n - k$ items in the set. That gives the formula:
\begin{equation}
n \cdot (n-1) \cdot ... \cdot (n - k + 1) = \frac{n!}{(n-k)!}
\end{equation}
The code is very similar, but this version threads through a counter $k$ and bottoms out when it reaches 0:
def kPermutations[A](items: Set[A], k: Int): Set[List[A]] = {
  assert(k >= 0 && k <= items.size)
  if (k == 0) Set(List.empty)
  else {
    items.flatMap { x =>
      kPermutations(items - x, k - 1).map(x :: _)
    }
  }
}
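As a quick check that the count comes out to $n! / (n-k)!$, here's a sketch reusing the definitions above:

```scala
// k-permutations: same counting game as full permutations, but we stop
// after k choices.
def kPermutations[A](items: Set[A], k: Int): Set[List[A]] = {
  assert(k >= 0 && k <= items.size)
  if (k == 0) Set(List.empty)
  else items.flatMap(x => kPermutations(items - x, k - 1).map(x :: _))
}

def factorial(n: Int): Int =
  if (n == 0) 1 else n * factorial(n - 1)

// 4 items, choose 2 in order: 4 * 3 = 12 = 4! / 2!
println(kPermutations(Set(1, 2, 3, 4), 2).size) // 12
println(factorial(4) / factorial(4 - 2))        // 12
```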
Then combinations: now you're plucking subsets instead of ordered lists. The count is the number of k-permutations divided by $k!$; we divide out the $k!$ orderings of each subset buried inside the k-permutation count:
\begin{equation}
\binom{n}{k} = \frac{n!}{(n-k)! \, k!}
\end{equation}
def combinations[A](items: Set[A], k: Int): Set[Set[A]] =
  kPermutations(items, k).map(_.toSet)

def factorial(n: Int): Int =
  if (n == 0) 1 else n * factorial(n - 1)

def numCombinations(n: Int, k: Int): Int =
  factorial(n) / (factorial(n - k) * factorial(k))
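Here's a sketch that checks the brute-force combinations against the formula (definitions repeated so the snippet runs on its own):

```scala
// Brute-force k-permutations, collapsed into sets to get combinations.
def kPermutations[A](items: Set[A], k: Int): Set[List[A]] =
  if (k == 0) Set(List.empty)
  else items.flatMap(x => kPermutations(items - x, k - 1).map(x :: _))

def combinations[A](items: Set[A], k: Int): Set[Set[A]] =
  kPermutations(items, k).map(_.toSet)

def factorial(n: Int): Int =
  if (n == 0) 1 else n * factorial(n - 1)

def numCombinations(n: Int, k: Int): Int =
  factorial(n) / (factorial(n - k) * factorial(k))

// 4 choose 2: the 12 ordered pairs collapse into 6 subsets.
println(combinations(Set("a", "b", "c", "d"), 2).size) // 6
println(numCombinations(4, 2))                         // 6
```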
Why does the binomial coefficient equal the number of combinations here? That seemed weird to me at first. The intuition is that kPermutations over-counts each subset exactly $k!$ times, once for each ordering of its elements, so dividing by $k!$ collapses each group of orderings down to a single set.
Side note... if you add up the number of combinations of size $k$ for every $k$ from 0 to $n$, you get the size of what's called the "power set", which has cardinality $2^n$.
WHY IS THAT?
Well... imagine a bit-mask, a series of 1s and 0s that you'll lay over the items in the set; a 1 means the item is in the subset, a 0 means it's out. How many possible strings of 1s and 0s of length $n$ can you make?
We have two choices for the first bit... then we keep going, and every time we make a choice we multiply, for $2^n$ total.
Wikipedia has a note about how we can add up all of the combination counts to get the total power set size: https://en.wikipedia.org/wiki/Binomial_coefficient#Sums_of_the_binomial_coefficients
// This gets ALL combinations, i.e. the full power set...
import scala.annotation.tailrec

def powerset[A](items: Set[A]): Set[Set[A]] = {
  @tailrec
  def loop(remaining: List[A], ret: Set[Set[A]]): Set[Set[A]] =
    remaining match {
      case Nil => ret
      case x :: xs => loop(xs, ret ++ ret.map(_ + x))
    }
  loop(items.toList, Set(Set.empty))
}
// nice short way that uses foldLeft to accumulate.
def powerset[A](items: Set[A]): Set[Set[A]] =
  items.foldLeft(Set(Set.empty[A])) { case (acc, a) =>
    acc ++ acc.map(_ + a)
  }
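A quick sketch of the doubling argument in action:

```scala
// Each folded-in item doubles the accumulator: every existing subset comes
// back both with and without the new item.
def powerset[A](items: Set[A]): Set[Set[A]] =
  items.foldLeft(Set(Set.empty[A])) { case (acc, a) =>
    acc ++ acc.map(_ + a)
  }

// Starting from 1 subset (the empty set) and doubling 4 times: 2^4 = 16.
println(powerset(Set(1, 2, 3, 4)).size) // 16
```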
You can sort of see the cardinality argument in the code: the accumulator starts with one subset (the empty set), and each item folded in doubles the number of subsets, since every existing subset comes back both with and without the new item. After $n$ items, that's $2^n$ subsets.
A natural next question: how many ways can I put $n$ items into $k$ bins? The multinomial theorem has a fantastic derivation of the answer: https://en.wikipedia.org/wiki/Multinomial_theorem#Interpretations
What a strange, fantastic little book! I recently read Douglas Harding's "On Having No Head" after hearing Sam Harris mention the book in one of his "Waking Up" meditations, and then many times on subsequent podcasts.
The book is a long look at an insight that blew Harding's mind — from the first-person perspective, as a matter of subjective experience, there was no evidence around that he had a head. Put another way — if you try not to read into what you're seeing, and just describe exactly what's in front of you, not only is there no head on display... but the whole picture of who you are, of the person doing the looking, looks quite strange indeed.
Harding had been thinking about Life's Big Questions for years, but the Headless insight crystallized when he saw this "self portrait" by Ernst Mach:
He's right — there's no head! There's a field of view, limned with a massive nose and odd protruding arms and legs that stick out into the visual field, obscuring the rest of the world. If you take literally what you see, the scales are all out of whack. The nose is huge! The hand is about the same size as the foot, and the hands are different sizes.
Well. So what? The key to the whole thing is, as I'll discuss below, noticing the effortless remapping of these strange-seeming signals onto a coherent body-map that makes it "feel" like I am a thing of the same shape, size and dimension as other people I see walking around.
Okay. I had trouble with this book at first, and I predict that you'll find it as hokey on a first reading as I did. But there is something special here. I'm going to try and describe what it was that I found so fascinating about the Headless insight, and why you should care about, and spend some time recreating, this insight for yourself.
First, some background on what it means to "be present".
If you sit and meditate, calmly observing the thoughts that arise in your mind and noticing what pops up, you'll notice that there is a whole bunch of chaos bubbling around up there. Your consciousness is a soupy combination of direct sensory experience, abstractions, emotions, thoughts, and idea bubbles floating in and fading away.
There are many layers of processing your brain applies to the little electrical pulses that your various sense organs are pinging out. I contend that "being present" is the process of peeling back those layers, one by one, and identifying with whatever theater is left when they're gone.
That theater is consciousness — that theater is "you", and "being present" is all about identifying with the theater itself, and not with the layer of autobiographical thought bubbles.
Given all that, what are these layers, and what does it feel like to peel each back — or, equivalently, to start ignoring each one? I'll switch terms here and present each of these phases as "levels" of enlightenment.
The first level involves realizing that when you get lost in thought, much of what you're reacting to isn't even happening right now. You're being assaulted with memories and getting lost in those, sure... but even the act of getting pissed off in traffic is an act of mild obsession with the past. The car cuts you off, and you freeze-frame the moment and start looping on it, even as the moment that annoyed you slips away.
Your face is itching? Great, that's a nice thing to notice. You want it to stop? Boom, lost in thought. You've taken your mind off of the itch, and you're now anticipating the next itch you'll feel. You're in your head again, ignoring external input, looping away.
To break through the first level, you've got to notice that these feelings, thoughts, anxieties are just suggestions that float up from somewhere, and that will float away if you make room for the next bubbles and don't obsess. Watch and enjoy.
Reaching the second level is all about realizing that your itches, your aches, your proprioception — that is, your awareness of your body in space — are all interpretations of sense data coming in to that theater of awareness.
Your forehead itches? How do you know? What does an itch feel like? What's really happening is that you notice some... warmth? Heat? Tingling? And you: interpret that raw signal as an "itch", and map it onto the spot where your body-map says your forehead lives.
Spatial sense is like the emotion of getting pissed at the car ahead of you. What really happened? The car moved in front of you — some tingling warmth happened "somewhere". Then what? The lightning-fast invisible interpretation, and suddenly the other driver is an asshole, and your forehead itches.
I started to write the following: "What's really happening? Photons are hitting the 2d plane of your eyes; molecules are locking in to your smell receptors, these are causing neurons to fire; all of these things result in neurons inside your head spiking and firing. From all of that, "you", whatever that means, are inferring facts about the objects and spatial relationships out in the world."
But those are all interpretations too, and statements about some other mental model! To break through level two, notice what you can notice before the whatever-it-is in your head whips the raw sensation into the abstraction you're used to, like my body or the person I know is in that car.
Just a guess at level three, since I haven't been able to get here myself. The third level is probably — for the visual field — the machinery of object detection itself, the translation of the two dimensional pixel map in front of your face into what feels like a virtual reality landscape populated by objects. (Isn't the sense of distance so odd? You feel how far away something is. How? Spatial sense is an emotion.)
Breaking through the third level would require ignoring object detection completely and treating as equal all of the sensations and thoughts popping into the oval-shaped visual theater. No more distance, no more trees and tables and chairs, no edges, no emotion... just tingles and pixels.
Now, the Headless insight. At Level Zero, when your object detection is in full sway and you're making predictions about the world, you definitely "have a head". "You" exist! Objects exist! Sure, you can't see your head... but you know you have one, since you know you're a human, and humans have heads. I can even make a prediction about what will happen when I bring my hands up just outside the big oval I'm seeing out of and pat whatever's there. A head!
"Having a head", and feeling like it's silly to say that you don't, reveals you to be so obsessed with the abstraction-world that you can't recognize how much preprocessing is going on; or even that there's any preprocessing at all. If you don't think there's any pre-processing, then of course you can't imagine a world where it's turned off, even for the sake of discussion.
If you can turn it off and break through level two, then you're forced to admit that, without your pre-processing to help you, there's no "head" of your own in evidence. At level two, you still are detecting objects and seeing spatial relationships... but you're not thinking about what the objects and relationships imply.
I look down. I see hands and arms coming up toward... well, out toward the edges of some big field of vision, the movie screen on which everything is playing. Is there a head there? (Whoops, I'm not present anymore!)
Kwatz! Unask the question.
You can see that OTHER people have heads! (I'm assuming that you haven't turned off your object detector completely, so you see people, not pixels.)
What about when I look in the mirror? Well, there's a head in the mirror. Is that... my head? Whoops, at Level Two, that question makes no sense. By calling the pixels in the mirror "your head", you're making inferences that you could test... but not at Level Two. Inferences occur in the little lab in your brain where you go to figure out what things mean. In the present, who cares what things mean? That's so Level Zero.
Inspect your field of vision, now. There's no head available to see, so why worry about it? If you can lock in viscerally the sense that you have no head, then you've managed to break free of your looping obsession with what's going to happen next, and with preferring your predictions and abstractions over the choice to just relax and notice what's on offer. And there is certainly no head on the menu.
That's all I've got. The Headless insight is a little checkpoint, totally obvious at the level of experience. The only reason you wouldn't find it obvious is that you're refusing to separate out these levels of awareness in your mind, probably because you haven't accepted that these different levels exist.
If you can get out of the lab and just notice, and you try to notice what direct experience is available in the spot where your body-map tells you your head should be... you won't find a head. Instead you'll find an oval-shaped thing that contains, in fact, the entire world on view. At Level Two, the world itself is right in the center of things, right where your arms terminate.
The fact that there's a scene being perceived at all is the big mystery. Stop worrying about the abstractions and enjoy the pixels.
You need the book itself, of course: On Having No Head, by Douglas Harding. Sam Harris has a wonderful interview with Richard Lang (Stitcher link), a student of Harding's, on his Making Sense Podcast.
Richard maintains the website http://www.headless.org, which is full of materials and exercises designed to help you see the obvious insights I discuss above. Richard's on Twitter as well!
If you're interested in meditation at any level, I highly recommend Sam Harris's Waking Up app.
The RV10 has windows! And a windshield! I've got everything attached and mostly cleaned up.
Here's what it looks like now:
I used three tubes of Lord 7545 A/E adhesive from Aerosport Products, one for each pair of side windows and a full tube for the windshield. The process was stressful; I've been mildly dreading this for a couple of years now. Compared to much of the metal work, this step is one-shot with permanent results.
I wrote up the process of preparing to glue these in my last post; the spring clips are absolutely the right way to go. I can tell that in the spots where the clips were touching the window, there are very mild ripples (is that what people call "crazing" on the forums?) but it's barely there, and only in the zone that I expect to fiberglass over with deck cloth anyway. I'm not sure how I would have held the windows on without that tip.
What did I learn?
Here's a shot of the two windows on the left side of the plane curing with my heat lamp. I disabled the safety switch on the bottom of the lamp with tape and tilted it up precariously so it could point at the right angle. Not recommended!
Now, the windshield. I used a boat strap to hold the sides down tight, and used a couple of weights on top to hold the top of the plexiglass flush. (I used a bunch of little aluminum cylinders, the handles for my epoxy brushes, to focus the pressure down on a few spots).
I did have one almost-disaster happen here. I waited a couple of hours until I believed that the glue had cured, then went out and took the weight off of the top and moved a heat lamp up above so I could get direct light on the piece and cure it well.
I came out an hour later to check on the piece and, holy shit, the top had sprung up, ripping free of the tacky glue. Nightmare at 10 at night, way past my bedtime.
I still had some glue left over, so I did what I could to force more glue into the gap between the plexiglass and the cabin top. Once I'd stuffed it full and gotten a bunch of mess on the plexiglass, I added the weights back on and got the heat lamp back up, pointing down at the whole stack. 12 hours later everything looks and feels very solid.
Once again, a shot of the final windshield and windows after I trimmed the excess glue off with a razor blade and pulled all plastic:
And here's inspectors Juno and Jenna, checking out my interior work:
In the immediate future, I'll:
One task at a time.
I'm out here working on the plane today - I haven't made it out much, lately, but I'm in the garage spending the time. This project will end up taking roughly five years, three years longer than I thought it might when I started.
I've gone through so many oscillations of excitement and frustration out here. Some days I feel like I'm flying through tasks, and I think about how close I am to the end. Some days it is just such a grind and I feel exhausted, sanding away yet another coat of epoxy, trying to get the curve of the cabin top just right, knowing that I'm probably aiming for a level of craftsmanship that I'll never notice once the plane is up in the air.
The sweet spot is when I'm able to remember how unusual this whole project is, and how I'll almost certainly never do it again. Each task I complete is the last time I'll ever do something like this in my life. Why not be curious, and pay close attention to what's happening?
I started my day in the shop with a short, handwritten list of straightforward tasks I could accomplish today. These were:
I started with the spring tabs. These are small pieces of aluminum that I'll "cleco" to the fuselage of the plane all around the perimeter of each window. The aluminum is thin enough to flex; each rectangle is like a small, springy clip that will put light pressure on the window in one spot. The alternative is to use duct tape, or boat straps, or weights, or any number of other ideas that feel less elegant than the one I've chosen and absorbed after reading as many other build logs as I could find.
I used my shears to cut a long strip of .050" aluminum off the edge of a large sheet of scrap, snipped off a piece about 1.5" long and used that as a template to cut successive pieces off of the strip. I took the pile to my grinder and knocked the sharp burrs off of the cut edges with the scotchbrite wheel, rotating and flipping each metal rectangle by hand. Next, to the drill press to put a #40 hole off to the side of each clip, then over to the bench vice to bend each rectangular tab somewhere in the middle. I didn't measure anything, as the exact dimensions here don't matter. I do a lot more eyeballing now, so many years into this project.
Here's one of the final clips:
And here they are in action, holding on the window:
As I was standing at the grinding wheel knocking the burrs off of these little tabs, I had the strangest feeling; I had the sense that I was in the middle of a performance, in the middle of some episode of a show I've been studying and watching for some time now. I was on a stage, performing my way through the spring clips.
I've spent hundreds of hours researching this build, and have read so many build logs... I've seen dozens of examples of each step of the project. Attaching the windows has been an impossibly-far-off-task for so long, and I've had a good deal of anxiety thinking about how far off it is.
But here I was, in technicolor, standing in a virtual reality version of those build logs. I hadn't struggled in any particular way to get here... I'd just continued to tick along, putting in the time. And now, with no thunderclap, here I was, making the simple little tabs I'd thought about for years. An hour later I was done and on to the next task. Everything is like this on the plane; maybe there's a bigger lesson, too. I'll think about it when I'm done.
If you're curious about the specifics more than Life Lessons at the Grinder, Justin Twilbeck's blog is the best example of one of the window installs I've been staring at. Bob Leffler has a picture of his janky-looking spring tabs at this Matronics post; spring clips don't have to look good to accomplish the task, and the compulsion to apply the same standard of "perfection" to every piece regardless of where it lives, or whether or not it's an actual plane part vs a jig, is a disease.
If you're building anything big, you need to come to terms with that disease. If you don't have the disease, you shouldn't be building anything like an airplane.
Finally, a side shot of the current state of affairs:
Just a couple of months to go.