definite-integral
package I built into SICMUtils, a Clojure(script) based computer algebra system based on Gerald Sussman's scmutils
. (Beware the buzzwords...)I expect that this post will fission into a full series, with a bit of explanation about how each piece fits
]]>definite-integral
package I built into SICMUtils, a Clojure(script) based computer algebra system based on Gerald Sussman's scmutils
. (Beware the buzzwords...)I expect that this post will fission into a full series, with a bit of explanation about how each piece fits into the whole. Until then, know that you've found the source of what might be the most hardcore definite integration library out there, certainly the only one written like this, in functional style with unashamed explanation of how everything works.
If you're into Clojure, you can use all of this code by importing version 0.13.0
of the SICMUtils library.
Enjoy!
This document is an exploration of the definite-integral
implementation in SICMUtils, a port of the wonderful scmutils
system.
This interface is the target. The goal of the document is to make this definite-integral
interface come to life.
(ns quadrature
(:require [sicmutils.numerical.compile :as c]
[quadrature.adaptive :as qa]
[quadrature.boole :as boole]
[quadrature.common :as qc]
[quadrature.bulirsch-stoer :as bs]
[quadrature.infinite :as qi]
[quadrature.midpoint :as mid]
[quadrature.milne :as milne]
[quadrature.riemann :as riemann]
[quadrature.romberg :as romberg]
[quadrature.simpson :as simp]
[quadrature.simpson38 :as simp38]
[quadrature.trapezoid :as trap]
[sicmutils.util :as u]))
This namespace unites all of the work inside quadrature
behind a single interface, fronted by the all-powerful definite-integral
function.
The interface takes f
, an integrand, along with bounds a
and b
:
(definite-integral f a b)
Optionally, you can provide a dictionary of customizing options. These are passed down to whatever method you supply via the :method
key.
(definite-integral f a b opts)
The keys in quad-methods
below define the full range of integration methods available in the package. Each entry in this dictionary is either:
definite-integral
(possibly created with qc/defintegrator
):method
key.This latter style is used when the method itself is a specialization of a more general method.
(def ^:private quadrature-methods
{:open {:method :adaptive-bulirsch-stoer
:interval qc/open}
:closed {:method :adaptive-bulirsch-stoer
:interval qc/closed}
:closed-open {:method :adaptive-bulirsch-stoer
:interval qc/closed-open}
:open-closed {:method :adaptive-bulirsch-stoer
:interval qc/open-closed}
:bulirsch-stoer-open bs/open-integral
:bulirsch-stoer-closed bs/closed-integral
:adaptive-bulirsch-stoer (qa/adaptive bs/open-integral bs/closed-integral)
:left-riemann riemann/left-integral
:right-riemann riemann/right-integral
:lower-riemann riemann/lower-integral
:upper-riemann riemann/upper-integral
:midpoint mid/integral
:trapezoid trap/integral
:boole boole/integral
:milne milne/integral
:simpson simp/integral
:simpson38 simp38/integral
:romberg romberg/closed-integral
:romberg-open romberg/open-integral})
(def available-methods
(into #{} (keys quadrature-methods)))
The user can specify a method by providing the :method
key in their options with:
The latter two are the allowed value types in quadrature-methods
.
(defn- extract-method
"Attempts to turn the supplied argument into an integration method; returns nil
if method doesn't exist."
[method]
(cond (fn? method)
[method {}]
(keyword? method)
(extract-method
(quadrature-methods method))
(map? method)
(let [[f m] (extract-method
(:method method))]
[f (merge (dissoc method :method) m)])))
(defn get-integrator
"Takes:
- An integration method, specified as either:
- a keyword naming one of the available methods in `available-methods`
- a function with the proper integrator signature
- a dictionary of integrator options with a `:method` key
- `a` and `b` integration endpoints
- an optional dictionary of options `m`
And returns a pair of an integrator function and a possibly-enhanced options
dictionary.
(Some integration functions require extra options, so the returned dictionary
may have more entries than the `m` you pass in.)
If either endpoint is infinite, the returned integrator is wrapped in
`qi/improper` and able to handle infinite endpoints (as well as non-infinite
endpoints by passing through directly to the underlying integrator)."
([method a b] (get-integrator method a b {}))
([method a b m]
(when-let [[integrate opts] (extract-method method)]
(let [integrate (if (or (qc/infinite? a)
(qc/infinite? b))
(qi/improper integrate)
integrate)]
[integrate (dissoc (merge opts m) :method)]))))
Here we are! The one function you need care about if you're interested in definite integrals. Learn to use this, and then dig in to the details of individual methods if you run inton trouble or want to learn more. Enjoy!
(defn definite-integral
"Evaluates the definite integral of integrand `f` across the interval $a, b$.
Optionally accepts a dictionary `opts` of customizing options; All `opts` will
be passed through to the supplied `integrate` functions.
If you'd like more control, or to retrieve the integration function directly
without looking it up via `:method` each time, see `get-integrator`.
All supplied options are passed through to the underlying integrator; see the
specific integrator for information on what options are available.
## Keyword arguments:
`:method`: Specifies the integration method used. Must be
- a keyword naming one of the available methods in `available-methods`
- a function with the proper integrator signature
- a dictionary of integrator options with a `:method` key
Defaults to `:open`, which specifies an adaptive bulirsch-stoer quadrature method.
`:compile?` If true, the generic function will be simplified and compiled
before execution. (Clojure only for now.) Defaults to false.
`:info?` If true, `definite-integral` will return a map of integration
information returned by the underlying integrator. Else, returns an estimate
of the definite integral."
([f a b] (definite-integral f a b {}))
([f a b {:keys [method compile? info?]
:or {method :open
compile? false
info? false}
:as opts}]
(if-let [[integrate m] (get-integrator method a b opts)]
(let [f #?(:clj (if compile? (c/compile-univariate-fn f) f)
:cljs f)
result (integrate f a b m)]
(if info? result (:result result)))
(u/illegal (str "Unknown method: " method
". Try one of: "
available-methods)))))
(ns quadrature.riemann
(:require [quadrature.interpolate.richardson :as ir]
[quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]
[sicmutils.numsymb]))
This namespace includes functions for calculating the Riemann integral of a single-variable function. These are probably not methods that you'll want to use; see the documentation and defaults in quadrature
for good recommendations. But they're clear and understandable. The goal of this namespace is to lay the groundwork for visualizable routines that you can use to step toward understanding of the tougher methods.
"Quadrature", in this context, means "numerical integration". The word is a historical term for calculating the area inside of some geometry shape. Riemann sums are a group of methods for numerical integration that use this strategy:
The Riemann integral of a function \(f\) is the limit of this process as \(n \to \infty\).
How do you estimate the area of a slice? All of these methods estimate the area by forming a rectangle. For the base, use \(x_r - x_l\). For the height, you might use:
This namespace builds up to implementations for left-integral
, right-integral
, upper-integral
and lower-integral
. midpoint.cljc
holds an implementation of the Midpoint method.
A closely related method involves forming a trapezoid for each slice. This is equivalent to averaging the left and right Riemann sums. The trapezoid method lives in trapezoid.cljc
.
We'll start with an inefficient-but-easily-understandable version of these methods. To form a Riemann sum we need to:
n
sliceswindowed-sum
implements this pattern:
(defn windowed-sum
"Takes:
- `area-fn`, a function of the left and right endpoints of some integration
slice
- definite integration bounds `a` and `b`
and returns a function of `n`, the number of slices to use for an integration
estimate.
`area-fn` should return an estimate of the area under some curve between the
`l` and `r` bounds it receives."
[area-fn a b]
(fn [n]
(let [width (/ (- b a) n)
grid-points (concat (range a b width) [b])]
(ua/sum
(map area-fn grid-points (rest grid-points))))))
Test this out with a function that returns 2
for every slice, and we get back an estimate (from the function returned by windowed-sum
) of 2x the number of slices:
(let [area-fn (fn [l r] 2)
estimator (windowed-sum area-fn 0 10)]
[(estimator 10)
(estimator 20)])
[20.0 40.0]
Now, let's implement the four classic "Riemann Integral" methods.
Let's say we want to integrate a function \(f\). The left and right Riemann sums estimate a slice's area as a rectangle with:
left-sum
is simple to implement, given windowed-sum
:
(defn- left-sum* [f a b]
(-> (fn [l r] (* (f l) (- r l)))
(windowed-sum a b)))
Every internal slice has the same width, so we can make the sum slightly more efficient by pulling out the constant and multiplying by it a single time.
Internally, we also generate all of the internal "left" points directly from the slice index, instead of pre-partitioning the range. This is fine since we don't need \(x_r\).
(defn- left-sum
"Returns a function of `n`, some number of slices of the total integration
range, that returns an estimate for the definite integral of $f$ over the
range $[a, b)$ using a left Riemann sum."
[f a b]
(let [width (- b a)]
(fn [n]
(let [h (/ width n)
fx (fn [i] (f (+ a (* i h))))]
(* h (ua/sum fx 0 n))))))
right-sum
is almost identical, except that it uses \(f(x_r)\) as the estimate of each rectangle's height:
(defn- right-sum* [f a b]
(-> (fn [l r] (* (f r) (- r l)))
(windowed-sum a b)))
Same trick here to get a more efficient version. This implementation also generates an internal function fx
of the window index. The only difference from the left-sum
implementation is an initial offset of h
, pushing every point to the right side of the window.
(defn- right-sum
"Returns a function of `n`, some number of slices of the total integration
range, that returns an estimate for the definite integral of $f$ over the
range $(a, b]$ using a right Riemann sum."
[f a b]
(let [width (- b a)]
(fn [n]
(let [h (/ width n)
start (+ a h)
fx (fn [i] (f (+ start (* i h))))]
(* h (ua/sum fx 0 n))))))
The upper Riemann sum generates a slice estimate by taking the maximum of \(f(x_l)\) and \(f(x_r)\):
(defn- upper-sum
"Returns an estimate for the definite integral of $f$ over the range $[a, b]$
using an upper Riemann sum.
This function may or may not make an evaluation at the endpoints $a$ or $b$,
depending on whether or not the function is increasing or decreasing at the
endpoints."
[f a b]
(-> (fn [l r] (* (- r l)
(max (f l) (f r))))
(windowed-sum a b)))
Similarly, the lower Riemann sum uses the minimum of \(f(x_l)\) and \(f(x_r)\):
(defn- lower-sum
"Returns an estimate for the definite integral of $f$ over the range $[a, b]$
using a lower Riemann sum.
This function may or may not make an evaluation at the endpoints $a$ or $b$,
depending on whether or not the function is increasing or decreasing at the
endpoints."
[f a b]
(-> (fn [l r] (* (- r l)
(min (f l) (f r))))
(windowed-sum a b)))
Given the tools above, let's attempt to estimate the integral of \(f(x) = x^2\) using the left and right Riemann sum methods. (The actual equation for the integral is \(x^3 \over 3\)).
The functions above return functions of n
, the number of slices. We can use (us/powers 2)
to return a sequence of (1, 2, 4, 8, ...)
and map the function of n
across this sequence to obtain successively better estimates for \(\int_0^{10} x^2\). The true value is $10^{3} \over 3 = 333.333…$.
Here's the left-sum
estimate:
(take 5 (map (left-sum g/square 0 10)
(us/powers 2)))
(0.0 125.0 218.75 273.4375 302.734375)
And the left-sum
estimate:
(take 5 (map (right-sum g/square 0 10)
(us/powers 2)))
(1000.0 625.0 468.75 398.4375 365.234375)
Both estimates are bad at 32 slices and don't seem to be getting better. Even up to \(2^16 = 65,536\) slices we haven't converged, and are still far from the true estimate:
(-> (map (left-sum g/square 0 10)
(us/powers 2))
(us/seq-limit {:maxterms 16}))
{:converged? false, :terms-checked 16, :result 333.31807469949126}
This bad convergence behavior is why common wisdom states that you should never use left and right Riemann sums for real work.
But maybe we can do better.
One answer to this problem is to use "sequence acceleration" via Richardson extrapolation, as described in richardson.cljc
.
ir/richardson-sequence
takes a sequence of estimates of some function and "accelerates" the sequence by combining successive estimates.
The estimates have to be functions of some parameter \(n\) that decreases by a factor of \(t\) for each new element. In the example above, \(n\) doubles each time; this is equivalent to thinking about the window width \(h\) halving each time, so \(t = 2\).
This library's functional style lets us accelerate a sequence of estimates xs
by simply wrapping it in a call to (ir/richardson-sequence xs 2)
. Amazing!
Does Richardson extrapolation help?
(let [f (fn [x] (* x x))]
(-> (map (left-sum f 0 10)
(us/powers 2))
(ir/richardson-sequence 2)
(us/seq-limit)))
{:converged? true, :terms-checked 4, :result 333.3333333333333}
We now converge to the actual, true value of the integral in 4 terms!
This is going to be useful for each of our Riemann sums, so let's make a function that can accelerate a generic sequence of estimates. The following function takes:
estimate-seq
This library is going to adopt an interface that allows the user to configure a potentially very complex integration function by sending a single dictionary of options down to each of its layers. Adopting that style now is going to allow this function to grow to accomodate other methods of sequence acceleration, like polynomial or rational function extrapolation.
For now, {:accelerate? true}
configures Richardson extrapolation iff the user hasn't specified a custom sequence of integration slices using the :n
option.
(defn- accelerate
"NOTE - this is only appropriate for Richardson-accelerating sequences with t=2,
p=q=1.
This only applies to the Riemann sequences in this namespace!"
[estimate-seq {:keys [n accelerate?] :or {n 1}}]
(if (and accelerate? (number? n))
(ir/richardson-sequence estimate-seq 2 1 1)
estimate-seq))
Check that this works:
(let [f (fn [x] (* x x))]
(-> (map (left-sum f 0 10)
(us/powers 2))
(accelerate {:accelerate? true})
(us/seq-limit)))
{:converged? true, :terms-checked 4, :result 333.3333333333333}
Excellent!
The results look quite nice; but notice how much redundant computation we're doing.
Consider the evaluation points of a left Riemann sum with 4 slices, next to a left sum with 8 slices:
x---x---x---x----
x-x-x-x-x-x-x-x--
Every time we double our number of number of evaluations, half of the windows share a left endpoint. The same is true for a right sum:
----x---x---x---x
--x-x-x-x-x-x-x-x
In both cases, the new points are simply the midpoints of the existing slices.
This suggests a strategy for incrementally updating a left or right Riemann sum when doubling the number of points:
n
slices2
to scale each NEW slice width down by 2 (since we're doubling the number of slices)First, implement midpoint-sum
. This is very close to the implementation for left-sum
; internally the function adds an offset of \(h \over 2\) to each slice before sampling its function value.
(defn midpoint-sum
"Returns a function of `n`, some number of slices of the total integration
range, that returns an estimate for the definite integral of $f$ over the
range $(a, b)$ using midpoint estimates."
[f a b]
(let [width (- b a)]
(fn [n]
(let [h (/ width n)
offset (+ a (/ h 2.0))
fx (fn [i] (f (+ offset (* i h))))]
(* h (ua/sum fx 0 n))))))
The next function returns a function that can perform the incremental update to a left or right Riemann sum (and to a midpoint method estimate, as we'll see in midpoint.cljc
):
(defn Sn->S2n
"Returns a function of:
- `Sn`: a sum estimate for `n` partitions, and
- `n`: the number of partitions
And returns a new estimate for $S_{2n}$ by sampling the midpoints of each
slice. This incremental update rule is valid for left and right Riemann sums,
as well as the midpoint method."
[f a b]
(let [midpoints (midpoint-sum f a b)]
(fn [Sn n]
(-> (+ Sn (midpoints n))
(/ 2.0)))))
After using left-sum
to generate an initial estimate, we can use Sn->S2n
to generate all successive estimates, as long as we always double our slices. This suggests a function that takes an initial number of slices, n0
, and then uses reductions
to scan across (us/powers 2 n0)
with the function returned by Sn->S2n
:
(defn- left-sequence* [f a b n0]
(let [first-S ((left-sum f a b) n0)
steps (us/powers 2 n0)]
(reductions (Sn->S2n f a b) first-S steps)))
Verify that this function returns an equivalent sequence of estimates to the non-incremental left-sum
, when mapped across powers of 2:
(let [f (fn [x] (* x x))]
(= (take 10 (left-sequence* f 0 10 1))
(take 10 (map (left-sum f 0 10)
(us/powers 2 1)))))
true
We need to use the same style for right-sum
, so let's try and extract the pattern above, of:
n0
slices using some function S-fn
n0
slices => n0 / 2
slices using some incremental updater, next-S-fn
In fact, because methods like the Midpoint method from midpoint.cljc
can only incrementally update from n
=> n/3
, let's make the factor general too.
geometric-estimate-seq
captures the pattern above:
(defn geometric-estimate-seq
"Accepts:
- `S-fn`: a function of `n` that generates a numerical integral estimate from
`n` slices of some region, and
- `next-S-fn`: a function of (previous estimate, previous `n`) => new estimate
- `factor`: the factor by which `n` increases for successive estimates
- `n0`: the initial `n` to pass to `S-fn`
The new estimate returned b `next-S-fn` should be of `factor * n` slices."
[S-fn next-S-fn factor n0]
(let [first-S (S-fn n0)
steps (us/powers factor n0)]
(reductions next-S-fn first-S steps)))
And another version of left-sequence
, implemented using the new function:
(defn left-sequence**
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed-open interval $a, b$ by taking left-Riemann sums with
n0, 2n0, 4n0, ...
slices."
([f a b] (left-sequence** f a b 1))
([f a b n0]
(geometric-estimate-seq (left-sum f a b)
(Sn->S2n f a b)
2
n0)))
What if we want to combine the ability to reuse old results with the ability to take successively refined estimates that don't look like geometric series? The series 1, 2, 3… of natural numbers is an obvious choice of windows… but only the even powers are able to reuse estimates.
Integration methods like the Bulirsch-Stoer approach depend on sequences like 2, 3, 4, 6…
We absolutely want to be able to save potentially-expensive function evaluations.
One way to do this is to memoize the function f
that you pass in to any of the methods above.
Alternatively, we could implement a version of geometric-estimate-seq
that takes any sequence of estimate,s and maintains a sort of internal memoization cache.
For every n
, check the cache for prev == n/factor
. If it exists in the cache, use next-S-fn
; else, use S-fn
, just like we did in geometric-estimate-seq
for the initial value.
general-estimate-seq
does this:
(defn- general-estimate-seq
"Accepts:
- `S-fn`: a function of `n` that generates a numerical integral estimate from
`n` slices of some region, and
- `next-S-fn`: a function of (previous estimate, previous `n`) => new estimate
- `factor`: the factor by which `next-S-fn` increases `n` in its returned estimate
- `n-seq`: a monotonically increasing sequence of `n` slices to use.
Returns a sequence of estimates of returned by either function for each `n` in
`n-seq`. Internally decides whether or not to use `S-fn` or `next-S-fn` to
generate successive estimates."
[S-fn next-S-fn factor n-seq]
(let [f (fn [[cache _] n]
(let [Sn (if (zero? (rem n factor))
(let [prev (quot n factor)]
(if-let [S-prev (get cache prev)]
(next-S-fn S-prev prev)
(S-fn n)))
(S-fn n))]
[(assoc cache n Sn) Sn]))]
(->> (reductions f [{} nil] n-seq)
(map second)
(rest))))
We can combine general-estimate-seq
and geometric-estimate-seq
into a final method that decides which implementation to call, based on the type of the n0
argument.
If it's a number, use it as the n0
seed for a geometrically increasing series of estimates. Else, assume it's a sequence and pass it to general-estimate-seq
.
(defn incrementalize
"Function that generalizes the ability to create successively-refined estimates
of an integral, given:
- `S-fn`: a function of `n` that generates a numerical integral estimate from
`n` slices of some region, and
- `next-S-fn`: a function of (previous estimate, previous `n`) => new estimate
- `factor`: the factor by which `next-S-fn` increases `n` in its returned estimate
- `n`: EITHER a number, or a monotonically increasing sequence of `n` slices to use.
If `n` is a sequence, returns a (lazy) sequence of estimates generated for
each entry in `n`.
If `n` is a number, returns a lazy sequence of estimates generated for each
entry in a geometrically increasing series of inputs $n, n(factor),
n(factor^2), ....$
Internally decides whether or not to use `S-fn` or `next-S-fn` to generate
successive estimates."
[S-fn next-S-fn factor n]
(let [f (if (number? n)
geometric-estimate-seq
general-estimate-seq)]
(f S-fn next-S-fn factor n)))
We can use incrementalize
to write our final version of left-sequence
, along with a matching version for right-sequence
.
Notice that we're using accelerate
from above. The interface should make more sense now:
(defn left-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed-open interval $a, b$ by taking left-Riemann sums.
## Optional Arguments
`:n`: If `n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (left-sequence f a b {}))
([f a b opts]
(let [S (left-sum f a b)
next-S (Sn->S2n f a b)]
(-> (incrementalize S next-S 2 (:n opts 1))
(accelerate opts)))))
(defn right-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed-open interval $a, b$ by taking right-Riemann sums.
## Optional Arguments
`:n`: If `n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (right-sequence f a b {}))
([f a b opts]
(let [S (right-sum f a b)
next-S (Sn->S2n f a b)]
(-> (incrementalize S next-S 2 (:n opts 1))
(accelerate opts)))))
lower-sequence
and upper-sequence
are similar. They can't take advantage of any incremental speedup, so we generate a sequence of n~s internally and map ~lower-sum
and upper-sum
directly across these.
(defn lower-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $(a, b)$ by taking lower-Riemann sums.
## Optional Arguments
`:n`: If `n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (lower-sequence f a b {}))
([f a b {:keys [n] :or {n 1} :as opts}]
(let [n-seq (if (number? n)
(us/powers 2 n)
n)]
(-> (map (lower-sum f a b) n-seq)
(accelerate opts)))))
(defn upper-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $(a, b)$ by taking upper-Riemann sums.
## Optional Arguments
`:n`: If `n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (upper-sequence f a b {}))
([f a b {:keys [n] :or {n 1} :as opts}]
(let [n-seq (if (number? n)
(us/powers 2 n)
n)]
(-> (map (upper-sum f a b) n-seq)
(accelerate opts)))))
Finally, we expose four API methods for each of the {left, right, lower, upper}-Riemann sums.
Each of these makes use a special qc/defintegrator
"macro"; This style allows us to adopt one final improvement. If the interval \(a, b\) is below some threshold, the integral API will take a single slice using the supplied :area-fn
below and not attempt to converge. See common.cljc
for more details.
These API interfaces are necessarily limiting. They force the assumptions that you:
I can imagine a better API, where it's much easier to configure generic sequence acceleration! This will almost certainly show up in the library at some point. For now, here are some notes:
left-sequence
or right-sequence
, you can still accelerate with Richardson. Just pass your new factor as t
.p=1
and q=1
to richardson-sequence
. accelerate
does this above.polynomial.cljc
and rational.cljc
, which are more general forms of sequence acceleration that use polynomial or rational function extrapolation. Your sequence of xs
for each of those methods should be n-seq
.(qc/defintegrator left-integral
"Returns an estimate of the integral of `f` across the closed-open interval $a,
b$ using a left-Riemann sum with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `left-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn (fn [f a b] (* (f a) (- b a)))
:seq-fn left-sequence)
(qc/defintegrator right-integral
"Returns an estimate of the integral of `f` across the closed-open interval $a,
b$ using a right-Riemann sum with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `right-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn (fn [f a b] (* (f b) (- b a)))
:seq-fn right-sequence)
Upper and lower Riemann sums have the same interface; internally, they're not able to take advantage of incremental summation, since it's not possible to know in advance whether or not the left or right side of the interval should get reused.
(qc/defintegrator lower-integral
"Returns an estimate of the integral of `f` across the closed-open interval $a,
b$ using a lower-Riemann sum with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `lower-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn (fn [f a b] (* (min (f a) (f b)) (- b a)))
:seq-fn lower-sequence)
(qc/defintegrator upper-integral
"Returns an estimate of the integral of `f` across the closed-open interval $a,
b$ using an upper-Riemann sum with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `upper-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn (fn [f a b] (* (max (f a) (f b)) (- b a)))
:seq-fn upper-sequence)
For a discussion and implementation of the more advanced methods (the workhorse methods that you should actually use!), see midpoint.cljc
and trapezoid.cljc
. The midpoint method is the standard choice for open intervals, where you can't evaluate the function at its endpoints. The trapezoid method is standard for closed intervals.
(ns quadrature.midpoint
(:require [quadrature.interpolate.richardson :as ir]
[quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.riemann :as qr]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]))
This namespace builds on the ideas introduced in riemann.cljc
.
riemann.cljc
described four different integration schemes ({left, right, upper, lower} Riemann sums) that were each conceptually simple, but aren't often used in practice, even in their "accelerated" forms.
One reason for this is that their error terms fall off as \(h, h^2, h^3\), where \(h\) is the width of an integration slice. Each order of sequence acceleration can cancel out one of these terms at a time; but still, the performance is not great.
It turns out that by taking the midpoint if each interval, instead of either side, you can reduce the order of the error series to \(O(h^2)\). This is too good to pass up.
Additionally, because the error terms fall off as $h^{2}, h^{4}, h^{6}, …$, each order of acceleration is worth quite a bit more than in the Riemann sum case.
This namespace follows the same development as riemann.cljc
:
Here's an implementation of a function that can take the midpoint of a single slice:
(defn single-midpoint [f a b]
(let [width (g/- b a)
half-width (g// width 2)
midpoint (g/+ a half-width)]
(g/* width (f midpoint))))
And a full (though inefficient) integrator using windowed-sum
:
(defn- midpoint-sum* [f a b]
(let [area-fn (partial single-midpoint f)]
(qr/windowed-sum area-fn a b)))
Let's integrate a triangle!
((midpoint-sum* identity 0.0 10.0) 10)
50.0
It turns out that we already had to implement an efficient version of midpoint-sum
in riemann.cljc
; the incremental version of left and right Riemann sums added the midpoints of each interval when doubling the number of slices.
We can check our implementation against qr/midpoint-sum
:
(= ((midpoint-sum* identity 0.0 100.0) 10)
((qr/midpoint-sum identity 0.0 100.0) 10))
true
We'll use qr/midpoint-sum
in the upcoming functions.
Unlike the left and right Riemann sums, the Midpoint method can't reuse function evaluations when the number of slices doubles. This is because each evaluation point, on a doubling, becomes the new border between slices:
n = 1 |-------x-------|
n = 2 |---x---|---x---|
If you triple the number of slices from \(n\) to \(3n\), you can in fact reuse the previous \(n\) evaluations:
n = 1 |--------x--------|
n = 3 |--x--|--x--|--x--|
By scaling Sn
down by a factor of 3, and adding it to a new sum that only includes the new points (using the new slice width).
BTW: The only place I found this idea mentioned is in Section 4.4 of Press's "Numerical Recipes". I haven't found other references to this trick, or implementations. I'd love to hear about them (via a Github issue) if you find any!
We'll follow the interface we used for qr/Sn->S2n
and write Sn->S3n
. This function of \(f, a, b\) will return a function that performs the incremental update.
The returned function generates S3n
across \((a, b)\) with \(n\) intervals, and picking out two new points at \(h \over 6\) and \(5h \over 6\) of the way across the old interval. These are the midpoints of the two new slices with width \(h \over 3\).
Sum them all up and add them to \(S_n \over 3\) to generate \(S_{3n}\):
(defn- Sn->S3n [f a b]
(let [width (- b a)]
(fn [Sn n]
(let [h (/ width n)
delta (/ h 6)
l-offset (+ a delta)
r-offset (+ a (* 5 delta))
fx (fn [i]
(let [ih (* i h)]
(+ (f (+ l-offset ih))
(f (+ r-offset ih)))))]
(-> (+ Sn (* h (ua/sum fx 0 n)))
(/ 3.0))))))
Now we can write midpoint-sequence
, analogous to qr/left-sequence
. This implementation reuses all the tricks from qr/incrementalize
; this means it will be smart about using the new incremental logic any time it sees any \(n\) multiple of 3, just as the docstring describes.
(defn midpoint-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the open interval $(a, b)$ using the Midpoint method.
## Optional arguments:
`:n`: If `:n` is a number, returns estimates with $n, 3n, 9n, ...$ slices,
geometrically increasing by a factor of 3 with each estimate.
If `:n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (midpoint-sequence f a b {:n 1}))
([f a b {:keys [n accelerate?] :or {n 1}}]
(let [S (qr/midpoint-sum f a b)
next-S (Sn->S3n f a b)
xs (qr/incrementalize S next-S 3 n)]
(if (and accelerate? (number? n))
(ir/richardson-sequence xs 3 2 2)
xs))))
The following example shows that for the sequence $2, 3, 4, 6, …$ (used in the Bulirsch-Stoer method!), the incrementally-augmented midpoint-sequence
only performs 253 function evaluations, vs the 315 of the non-incremental (midpoint-sum f2 0 1)
mapped across the points.
(let [f (fn [x] (/ 4 (+ 1 (* x x))))
[counter1 f1] (u/counted f)
[counter2 f2] (u/counted f)
n-seq (interleave
(iterate (fn [x] (* 2 x)) 2)
(iterate (fn [x] (* 2 x)) 3))]
(doall (take 12 (midpoint-sequence f1 0 1 {:n n-seq})))
(doall (take 12 (map (qr/midpoint-sum f2 0 1) n-seq)))
[@counter1 @counter2])
[253 315]
The final version is analogous the qr/left-integral
and friends, including an option to :accelerate?
the final sequence with Richardson extrapolation.
I'm not sure what to call this accelerated method. Accelerating the trapezoid method in this way is called "Romberg integration". Using an \(n\) sequence of powers of 2 and accelerating the midpoint method by a single step - taking the second column (index 1) of the Richardson tableau - produces "Milne's method".
The ability to combine these methods makes it easy to produce powerful methods without known names. Beware, and enjoy!
Note on Richardson Extrapolation
We noted above that the the terms of the error series for the midpoint method increase as \(h^2, h^4, h^6\)… Because of this, we pass \(p = q = 2\) into ir/richardson-sequence
below. Additionally, integral
hardcodes the factor of 3
and doesn't currently allow for a custom sequence of \(n\). This requires passing \(t = 3\) into ir/richardson-sequence
.
If you want to accelerate some other geometric sequence, call ir/richardson-sequence
with some other value of t.
To accelerate an arbitrary sequence of midpoint evaluations, investigate polynomial.cljc
or rational.cljc
. The "Bulirsch-Stoer" method uses either of these to extrapolate the midpoint method using a non-geometric sequence.
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the open interval $(a, b)$
using the Midpoint method with $1, 3, 9 ... 3^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `midpoint-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn single-midpoint
:seq-fn midpoint-sequence)
If you start with the midpoint method, one single step of Richardson extrapolation (taking the second column of the Richardson tableau) is equivalent to "Milne's rule" (see milne.cljc
).
The full Richardson-accelerated Midpoint method is an open-interval variant of "Romberg integration" (see romberg.cljc
).
See the wikipedia entry on Open Newton-Cotes Formulas for more details.
same idea but for closed intervals.
(ns quadrature.trapezoid
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.riemann :as qr]
[quadrature.interpolate.richardson :as ir]
[sicmutils.function :as f]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[sicmutils.simplify]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]))
This namespace builds on the ideas introduced in riemann.cljc
and midpoint.cljc
, and follows the pattern of those namespaces:
Let's begin.
A nice integration scheme related to the Midpoint method is the "Trapezoid" method. The idea here is to estimate the area of each slice by fitting a trapezoid between the function values at the left and right sides of the slice.
Alternatively, you can think of drawing a line between \(f(x_l)\) and \(f(x_r)\) and taking the area under the line.
What's the area of a trapezoid? The two slice endpoints are
The trapezoid consists of a lower rectangle and a capping triangle. The lower rectangle's area is:
\[(b - a) f(a)\].
Just like in the left Riemann sum. The upper triangle's area is one half base times height:
\[ {1 \over 2} (x_r - x_l) (f(x_r) - f(x_l))\]
The sum of these simplifies to:
\[{1 \over 2} {(x_r - x_l) (f(x_l) + f(x_r))}\]
Or, in Clojure:
(defn single-trapezoid [f xl xr]
(g// (g/* (g/- xr xl)
(g/+ (f xl) (f xr)))
2))
We can use the symbolic algebra facilities in the library to show that this simplification is valid:
(let [f (f/literal-function 'f)
square (g/* (f 'x_l)
(g/- 'x_r 'x_l))
triangle (g/* (g// 1 2)
(g/- 'x_r 'x_l)
(g/- (f 'x_r) (f 'x_l)))]
(g/simplify
(g/- (single-trapezoid f 'x_l 'x_r)
(g/+ square triangle))))
0
We can use qr/windowed-sum
to turn this function into an (inefficient) integrator:
(defn- trapezoid-sum* [f a b]
(qr/windowed-sum (partial single-trapezoid f)
a b))
Fitting triangles is easy:
((trapezoid-sum* identity 0.0 10.0) 10)
50.0
In fact, we can even use our estimator to estimate \(\pi\):
(def ^:private pi-estimator*
(let [f (fn [x] (/ 4 (+ 1 (* x x))))]
(trapezoid-sum* f 0.0 1.0)))
The accuracy is not bad, for 10 slices:
(pi-estimator* 10)
3.1399259889071587
Explicit comparison:
(- Math/PI (pi-estimator* 10))
0.0016666646826344333
10,000 slices gets us closer:
(< (- Math/PI (pi-estimator* 10000))
1e-8)
true
Fun fact: the trapezoid method is equal to the average of the left and right Riemann sums. You can see that in the equation, but lets verify:
(defn- basically-identical? [l-seq r-seq]
(every? #(< % 1e-15)
(map - l-seq r-seq)))
(let [points (take 5 (iterate inc 1))
average (fn [l r]
(/ (+ l r) 2))
f (fn [x] (/ 4 (+ 1 (* x x))))
[a b] [0 1]
left-estimates (qr/left-sequence f a b {:n points})
right-estimates (qr/right-sequence f a b {:n points})]
(basically-identical? (map (trapezoid-sum f a b) points)
(map average
left-estimates
right-estimates)))
true
Next let's attempt a more efficient implementation. Looking at single-trapezoid
, it's clear that each slice evaluates both of its endpoints. This means that each point on a border between two slices earns a contribution of \(f(x) \over 2\) from each slice.
A more efficient implementation would evaluate both endpoints once and then sum (without halving) each interior point.
This interior sum is identical to a left Riemann sum (without the \(f(a)\) evaluation), or a right Riemann sum (without \(f(b)\)).
Here is this idea implemented in Clojure:
(defn trapezoid-sum
"Returns a function of `n`, some number of slices of the total integration
range, that returns an estimate for the definite integral of $f$ over the
range $(a, b)$ using the trapezoid method."
[f a b]
(let [width (- b a)]
(fn [n]
(let [h (/ width n)
fx (fn [i] (f (+ a (* i h))))]
(* h (+ (/ (+ (f a) (f b)) 2)
(ua/sum fx 1 n)))))))
We can define a new pi-estimator
and check it against our less efficient version:
(def ^:private pi-estimator
(let [f (fn [x] (/ 4 (+ 1 (* x x))))]
(trapezoid-sum* f 0.0 1.0)))
(basically-identical?
(map pi-estimator (range 1 100))
(map pi-estimator* (range 1 100)))
true
Next let's develop an incremental updater for the Trapezoid rule that lets us reuse evaluation points as we increase the number of slices.
Because interior points of the Trapezoid method mirror the interior points of the left and right Riemann sums, we can piggyback on the incremental implementations for those two methods in developing an incremental Trapezoid implementation.
Consider the evaluation points of the trapezoid method with 2 slices, next to the points of a 4 slice pass:
x-------x-------x
x---x---x---x---x
The new points are simply the midpoints of the existing slices, just like we had for the left (and right) Riemann sums. This means that we can reuse qr/Sn->S2n
in our definition of the incrementally-enabled trapezoid-sequence
:
(defn trapezoid-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the open interval $(a, b)$ using the Trapezoid method.
## Optional arguments:
`:n`: If `:n` is a number, returns estimates with $n, 2n, 4n, ...$ slices,
geometrically increasing by a factor of 2 with each estimate.
If `:n` is a sequence, the resulting sequence will hold an estimate for each
integer number of slices in that sequence.
`:accelerate?`: if supplied (and `n` is a number), attempts to accelerate
convergence using Richardson extrapolation. If `n` is a sequence this option
is ignored."
([f a b] (trapezoid-sequence f a b {:n 1}))
([f a b {:keys [n accelerate?] :or {n 1}}]
(let [S (trapezoid-sum f a b)
next-S (qr/Sn->S2n f a b)
xs (qr/incrementalize S next-S 2 n)]
(if (and accelerate? (number? n))
(ir/richardson-sequence xs 2 2 2)
xs))))
The following example shows that for the sequence \(1, 2, 4, 8, ..., 2^n\), the incrementally-augmented trapezoid-sequence
only performs \(2^n + 1\) function evaluations; ie, the same number of evaluations as the non-incremental (trapezoid-sum f2 0 1)
would perform for \(2^n\) slices. (why \(2^n + 1\)? each interior point is shared, so each trapezoid contributes one evaluation, plus a final evaluation for the right side.)
The example also shows that evaluating every \(n\) in the sequence costs \(\sum_{i=0}^n{2^i + 1} = 2^{n+1} + n\) evaluations. As \(n\) gets large, this is roughly twice what the incremental implementation costs.
When \(n=11\), the incremental implementation uses 2049 evaluations, while the non-incremental takes 4017.
(let [n-elements 11
f (fn [x] (/ 4 (+ 1 (* x x))))
[counter1 f1] (u/counted f)
[counter2 f2] (u/counted f)
[counter3 f3] (u/counted f)
n-seq (take (inc n-elements)
(iterate (fn [x] (* 2 x)) 1))]
;; Incremental version evaluating every `n` in the sequence $1, 2, 4, ...$:
(doall (trapezoid-sequence f1 0 1 {:n n-seq}))
;; Non-incremental version evaluating every `n` in the sequence $1, 2, 4, ...$:
(doall (map (trapezoid-sum f2 0 1) n-seq))
;; A single evaluation of the final `n`
((trapezoid-sum f3 0 1) (last n-seq))
(let [two**n+1 (inc (g/expt 2 n-elements))
n+2**n (+ n-elements (g/expt 2 (inc n-elements)))]
(= [2049 4107 2049]
[two**n+1 n+2**n two**n+1]
[@counter1 @counter2 @counter3])))
true
Another short example that hints of work to come. The incremental implementation is useful in cases where the sequence includes doublings nested in among other values.
For the sequence $2, 3, 4, 6, …$ (used in the Bulirsch-Stoer method!), the incrementally-augmented trapezoid-sequence
only performs 162 function evaluations, vs the 327 of the non-incremental (trapezoid-sum f2 0 1)
mapped across the points.
This is a good bit more efficient than the Midpoint method's incremental savings, since factors of 2 come up more often than factors of 3.
(let [f (fn [x] (/ 4 (+ 1 (* x x))))
[counter1 f1] (u/counted f)
[counter2 f2] (u/counted f)
n-seq (take 12 (interleave
(iterate (fn [x] (* 2 x)) 2)
(iterate (fn [x] (* 2 x)) 3)))]
(doall (trapezoid-sequence f1 0 1 {:n n-seq}))
(doall (map (trapezoid-sum f2 0 1) n-seq))
[@counter1 @counter2])
[162 327]
The final version is analogous the qr/left-integral
and friends, including an option to :accelerate?
the final sequence with Richardson extrapolation. (Accelerating the trapezoid method in this way is called "Romberg integration".)
Note on Richardson Extrapolation
The terms of the error series for the Trapezoid method increase as \(h^2, h^4, h^6\)… (see https://en.wikipedia.org/wiki/Trapezoidal_rule#Error_analysis). Because of this, we pass \(p = q = 2\) into ir/richardson-sequence
below. Additionally, integral
hardcodes the factor of 2
and doesn't currently allow for a custom sequence of \(n\). This is configured by passing \(t = 2\) into ir/richardson-sequence
.
If you want to accelerate some other geometric sequence, call ir/richardson-sequence
with some other value of t.
To accelerate an arbitrary sequence of trapezoid evaluations, investigate polynomial.cljc
or rational.cljc
. The "Bulirsch-Stoer" method uses either of these to extrapolate the Trapezoid method using a non-geometric sequence.
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
using the Trapezoid method with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `trapezoid-sequence` for information on the optional args in `opts` that
customize this function's behavior."
:area-fn single-trapezoid
:seq-fn trapezoid-sequence)
If you start with the trapezoid method, one single step of Richardson extrapolation (taking the second column of the Richardson tableau) is equivalent to "Simpson's rule". One step using t=3
, ie, when you triple the number of integration slices per step, gets you "Simpson's 3/8 Rule". Two steps of Richardson extrapolation gives you "Boole's rule".
The full Richardson-accelerated Trapezoid method is also known as "Romberg integration" (see romberg.cljc
).
These methods will appear in their respective namespaces in the quadrature
package.
See the wikipedia entry on Closed Newton-Cotes Formulas for more details.
is a special case, where we get more efficient by assuming that the x values for the polynomial interpolation go $1, 1/2, 1/4, …$ and that we're extrapolating to 0.
(ns quadrature.interpolate.richardson
"Richardson interpolation is a special case of polynomial interpolation; knowing
the ratios of successive `x` coordinates in the point sequence allows a more
efficient calculation."
(:require [quadrature.interpolate.polynomial :as ip]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]
[sicmutils.value :as v]))
This approach (and much of this numerical library!) was inspired by Gerald Sussman's "Abstraction in Numerical Methods" paper.
That paper builds up to Richardson interpolation as a method of "series acceleration". The initial example concerns a series of the side lengths of an N-sided polygon inscribed in a unit circle.
The paper derives this relationship between the sidelength of an $N$- and $2N$-sided polygon:
(defn- refine-by-doubling
"`s` is the side length of an N-sided polygon inscribed in the unit circle. The
return value is the side length of a 2N-sided polygon."
[s]
(/ s (g/sqrt (+ 2 (g/sqrt (- 4 (g/square s)))))))
If we can increase the number of sides => infinity, we should reach a circle. The "semi-perimeter" of an $N$-sided polygon is
\[P_n = {n \over 2} S_n\]
In code:
(defn- semi-perimeter
"Returns the semi-perimeter length of an `n`-sided regular polygon with side
length `side-len`."
[n side-len]
(* (/ n 2) side-len))
so as \(n \to \infty\), \(P_n\) should approach \(\pi\), the half-perimeter of a circle.
Let's start with a square, ie, \(n = 4\) and \(s_4 = \sqrt{2}\). Clojure's iterate
function will let us create an infinite sequence of side lengths:
(def ^:private side-lengths
(iterate refine-by-doubling (Math/sqrt 2)))
and an infinite sequence of the number of sides:
(def ^:private side-numbers
(iterate #(* 2 %) 4))
Mapping a function across two sequences at once generates a new infinite sequence, of semi-perimeter lengths in this case:
(def ^:private archimedean-pi-sequence
(map semi-perimeter side-numbers side-lengths))
The following code will print the first 20 terms:
(us/pprint 10 archimedean-pi-sequence)
2.8284271247461903
3.0614674589207183
3.1214451522580524
3.1365484905459393
3.140331156954753
3.141277250932773
3.1415138011443013
3.141572940367092
3.14158772527716
3.1415914215112
Unfortunately (for Archimedes, by hand!), as the paper notes, it takes 26 iterations to converge to machine precision:
(-> archimedean-pi-sequence
(us/seq-limit {:tolerance v/machine-epsilon}))
{:converged? true, :terms-checked 26, :result 3.1415926535897944}
Enter Sussman:
"Imagine poor Archimedes doing the arithmetic by hand: square roots without even the benefit of our place value system! He would be interested in knowing that full precision can be reached on the fifth term, by forming linear combinations of the early terms that allow the limit to be seized by extrapolation." (p4, Abstraction in Numerical Methods)
Sussman does this by noting that you can also write the side length as:
\[S_n = 2 \sin {\pi \over n}\]
Then the taylor series expansion for \(P_n\) becomes:
\[ P_n = {n \over 2} S_n \ = {n \over 2} 2 \sin {\pi \over n} \ = \pi + {A\ over n^2} + B \over n^4 ... \]
A couple things to note:
The big idea is to multiply \(P_{2n}\) by 4 and subtract \(P_n\) (then divide by 3 to cancel out the extra factor). This will erase the \(A \over n^2\) term and leave a new sequence with \(B \over n^4\) as the dominant error term.
Now keep going and watch the error terms drain away.
Before we write code, let's follow the paper's example and imagine instead some general sequence of $R(h), R(h/t), R(h/t^{2})…$ (where \(t = 2\) in the example above), with a power series expansion that looks like
\[R(h) = A + B h^{p_1} + C h^{p_2}...\]
where the exponents $p_{1}, p_{2}, …$ are some OTHER series of error growth. (In the example above, because the taylor series expanson of \(n \sin n\) only has even factors, the sequence was the even numbers.)
In that case, the general way to cancel error between successive terms is:
\[{R(h/t) - t^{p_1} R(h)} = {t^{p_1} - 1} A + C_1 h^{p_2} + ...\]
or:
\[\frac{R(h/t) - t^{p_1} R(h)}{t^{p_1} - 1} = A + C_2 h^{p_2} + ...\]
Let's write this in code:
(defn- accelerate-sequence
"Generates a new sequence by combining each term in the input sequence `xs`
pairwise according to the rules for richardson acceleration.
`xs` is a sequence of evaluations of some function of $A$ with its argument
smaller by a factor of `t` each time:
$$A(h), A(h/t), ...$$
`p` is the order of the dominant error term for the sequence."
[xs t p]
(let [t**p (Math/pow t p)
t**p-1 (dec t**p)]
(map (fn [ah ah-over-t]
(/ (- (* t**p ah-over-t) ah)
t**p-1))
xs
(rest xs))))
If we start with the original sequence, we can implement Richardson extrapolation by using Clojure's iterate
with the accelerate-sequence
function to generate successive columns in the "Richardson Tableau". (This is starting to sound familiar to the scheme for polynomial interpolation, isn't it?)
To keep things general, let's take a general sequence ps
, defaulting to the sequence of natural numbers.
(defn- make-tableau
"Generates the 'tableau' of succesively accelerated Richardson interpolation
columns."
([xs t] (make-tableau xs t (iterate inc 1)))
([xs t ps]
(->> (iterate (fn [[xs [p & ps]]]
[(accelerate-sequence xs t p) ps])
[xs ps])
(map first)
(take-while seq))))
All we really care about are the FIRST terms of each sequence. These approximate the sequence's final value with small and smaller error (see the paper for details).
Polynomial interpolation in polynomial.cljc
has a similar tableau structure (not by coincidence!), so we can use ip/first-terms
in the implementation below to fetch this first row.
Now we can put it all together into a sequence transforming function, with nice docs:
(defn richardson-sequence
"Takes:
- `xs`: a (potentially lazy) sequence of points representing function values
generated by inputs continually decreasing by a factor of `t`. For example:
`[f(x), f(x/t), f(x/t^2), ...]`
- `t`: the ratio between successive inputs that generated `xs`.
And returns a new (lazy) sequence of 'accelerated' using [Richardson
extrapolation](https://en.wikipedia.org/wiki/Richardson_extrapolation) to
cancel out error terms in the taylor series expansion of `f(x)` around the
value the series to which the series is trying to converge.
Each term in the returned sequence cancels one of the error terms through a
linear combination of neighboring terms in the sequence.
### Custom P Sequence
The three-arity version takes one more argument:
- `p-sequence`: the orders of the error terms in the taylor series expansion
of the function that `xs` is estimating. For example, if `xs` is generated
from some `f(x)` trying to approximate `A`, then `[p_1, p_2...]` etc are the
correction terms:
$$f(x) = A + B x^{p_1} + C x^{p_2}...$$
The two-arity version uses a default `p-sequence` of `[1, 2, 3, ...]`
### Arithmetic Progression
The FOUR arity version takes `xs` and `t` as before, but instead of
`p-sequence` makes the assumption that `p-sequence` is an arithmetic
progression of the form `p + iq`, customized by:
- `p`: the exponent on the highest-order error term
- `q`: the step size on the error term exponent for each new seq element
## Notes
Richardson extrapolation is a special case of polynomial extrapolation,
implemented in `polynomial.cljc`.
Instead of a sequence of `xs`, if you generate an explicit series of points of
the form `[x (f x)]` with successively smaller `x` values and
polynomial-extrapolate it forward to x == 0 (with,
say, `(polynomial/modified-neville xs 0)`) you'll get the exact same result.
Richardson extrapolation is more efficient since it can make assumptions about
the spacing between points and pre-calculate a few quantities. See the
namespace for more discussion.
References:
- Wikipedia: https://en.wikipedia.org/wiki/Richardson_extrapolation
- GJS, 'Abstraction in Numerical Methods': https://dspace.mit.edu/bitstream/handle/1721.1/6060/AIM-997.pdf?sequence=2"
([xs t]
(ip/first-terms
(make-tableau xs t)))
([xs t p-sequence]
(ip/first-terms
(make-tableau xs t p-sequence)))
([xs t p q]
(let [arithmetic-p-q (iterate #(+ q %) p)]
(richardson-sequence xs t arithmetic-p-q))))
We can now call this function, combined with us/seq-limit
(a general-purpose tool that takes elements from a sequence until they converge), to see how much acceleration we can get:
(-> (richardson-sequence archimedean-pi-sequence 2 2 2)
(us/seq-limit {:tolerance v/machine-epsilon}))
{:converged? true, :terms-checked 7, :result 3.1415926535897936}
Much faster!
Richardson extrapolation works by cancelling terms in the error terms of a function's taylor expansion about 0
. To cancel the nth error term, the nth derivative has to be defined. Non-smooth functions aren't going to play well with richardson-sequence
above.
The solution is to look at specific columns of the Richardson tableau. Each column is a sequence with one further error term cancelled.
rational.cljc
and polynomial.cljc
both have this feature in their tableau-based interpolation functions. The feature here requires a different function, because the argument vector is a bit crowded already in richardson-sequence
above.
(defn richardson-column
"Function with an identical interface to `richardson-sequence` above, except for
an additional second argument `col`.
`richardson-column` will return that /column/ offset the interpolation tableau
instead of the first row. This will give you a sequence of nth-order
Richardson accelerations taken between point `i` and the next `n` points.
As a reminder, this is the shape of the Richardson tableau:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
So supplying a `column` of `1` gives a single acceleration by combining points
from column 0; `2` kills two terms from the error sequence, etc.
NOTE Given a better interface for `richardson-sequence`, this function could
be merged with that function."
([xs col t]
(nth (make-tableau xs t) col))
([xs col t p-seq]
(nth (make-tableau xs t p-seq) col))
([xs col t p q]
(let [arithmetic-p-q (iterate #(+ q %) p)]
(richardson-column xs col t arithmetic-p-q))))
It turns out that the Richardson extrapolation is a special case of polynomial extrapolation using Neville's algorithm (as described in polynomial/neville
), evaluated at x == 0.
Neville's algorithm looks like this:
\[P(x) = [(x - x_r) P_l(x) - (x - x_l) P_r(x)] / [x_l - x_r]\]
Where:
Fill in \(x = 0\) and rearrange:
\[P(0) = [(x_l P_r(0)) - (x_r P_l(x))] \over [x_l - x_r]\]
In the Richardson extrapolation scheme, one of our parameters was t
, the ratio between successive elements in the sequence. Now multiply through by \(1 = {1 \over x_r} \over {1 \over x_r}\) so that our formula contains ratios:
\[P(0) = [({x_l \over x_r} P_r(0)) - P_l(x)] \over [{x_l \over x_r} - 1]\]
Because the sequence of \(x_i\) elements looks like \(x, x/t, x/t^2\), every recursive step separates \(x_l\) and \(x_r\) by another factor of \(t\). So
\[{x_l \over x_r} = {x \over {x \over t^n}} = t^n\]
Where \(n\) is the difference between the positions of \(x_l\) and \(x_r\). So the formula simplifies further to:
\[P(0) = [({t^n} P_r(0)) - P_l(x)] \over [{t^n} - 1]\]
Now it looks exactly like Richardson extrapolation. The only difference is that Richardson extrapolation leaves n
general (and calls it \(p_1, p_2\) etc), so that you can customize the jumps in the error series. (I'm sure there is some detail I'm missing here, so please feel free to make a PR and jump in!)
For the example above, we used a geometric series with \(p, q = 2\) to fit the archimedean \(\pi\) sequence. Another way to think about this is that we're fitting a polynomial to the SQUARE of h
(the side length), not to the actual side length.
Let's confirm that polynomial extrapolation to 0 gives the same result, if we generate squared \(x\) values:
(let [h**2 (fn [i]
;; (1/t^{i + 1})^2
(-> (/ 1 (Math/pow 2 (inc i)))
(Math/pow 2)))
xs (map-indexed (fn [i fx] [(h**2 i) fx])
archimedean-pi-sequence)]
(= (us/seq-limit
(richardson-sequence archimedean-pi-sequence 4 1 1))
(us/seq-limit
(ip/modified-neville xs 0.0))))
true
Success!
The general thing that "richardson extrapolation" is doing below. Historically cool and used to accelerate arbitrary integration sequences.
(ns quadrature.interpolate.polynomial
"This namespace contains a discussion of polynomial interpolation, and different
methods for fitting a polynomial of degree N-1 to N points and evaluating that
polynomial at some different `x`."
(:require [sicmutils.generic :as g]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]))
First, a Lagrange interpolation polynomial:
(defn lagrange
"Generates a lagrange interpolating polynomial that fits every point in the
supplied sequence `points` (of form `[x (f x)]`) and returns the value of the
polynomial evaluated at `x`.
The Lagrange polynomial has this form:
g(x) = (f(a) * [(x-b)(x-c)...] / [(a-b)(a-c)...])
+ (f(b) * [(x-a)(x-c)...] / [(b-a)(b-c)...])
+ ...
for points `[a f(a)], [b f(b)], [c f(c)]` etc.
This particular method of interpolating `x` into the polynomial is
inefficient; any new calculation requires fully recomputing. Takes O(n^2)
operations in the number of points.
"
[points x]
(let [points (vec points)
n (count points)
build-term (fn [i [a fa]]
(let [others (for [j (range n) :when (not= i j)]
(get-in points [j 0]))
p (reduce g/* (map #(g/- x %) others))
q (reduce g/* (map #(g/- a %) others))]
(g// (g/* fa p) q)))]
(transduce (map-indexed build-term)
g/+
points)))
Lagrange's interpolating polynomial is straightforward, but not terribly efficient; every time we change points
or x
we have to redo the entire calculation. Ideally we'd like to be able to perform:
points
that would let us efficiently evaluate the fitted polynomial for different values of x
in O(n) time, orx
that would let us efficiently add new points to the set we use to generate the interpolating polynomial."Neville's algorithm" lets us generate the same interpolating polynomial recursively. By flipping the recursion around and generating values from the bottom up, we can achieve goal #2 and add new points incrementally.
Start the recursion with a single point. Any point \((x, f(x))\) has a unique 0th order polynomial passing through it - the constant function \(P(x) = f(x)\). For points \(x_a\), \(x_b\), let's call this \(P_a\), \(P_b\), etc.
\(P_{ab}\) is the unique FIRST order polynomial (ie, a line) going through points \(x_a\) and \(x_b\).
this first recursive step gives us this rule:
\[P_{ab}(x) = [(x - x_b) P_a(x) - (x - x_a) P_b(x)] / [x_a - x_b]\]
For higher order terms like \(P_{abcd}\), let's call \(P_{abc}\) 'P_{l}', and \(P_{bcd}\) 'P_{r}' (the polynomial fitted through the left and right set of points).
Similarly, the left and rightmost inputs - \(x_a\) and \(x_b\) - will be \(x_l\) and \(x_r\).
Neville's algorithm states that:
\[P(x) = [(x - x_r) P_l(x) - (x - x_l) P_r(x)] / [x_l - x_r]\]
This recurrence works because the two parents \(P_l\) and \(P_r\) already agree at all points except \(x_l\) and \(x_r\).
(defn neville-recursive
"Top-down implementation of Neville's algorithm.
Returns the value of `P(x)`, where `P` is a polynomial fit (using Neville's
algorithm) to every point in the supplied sequence `points` (of form `[x (f
x)]`)
The efficiency and results should be identical to
`quadrature.interpolate/lagrange`. This function represents a step on
the journey toward more incremental methods of polynomial interpolation.
References:
- Press's Numerical Recipes (p103), chapter 3: http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-1.pdf
- Wikipedia: https://en.wikipedia.org/wiki/Neville%27s_algorithm"
[points x]
(letfn [(evaluate [points]
(if (= 1 (count points))
(let [[[_ y]] points]
y)
(let [l-branch (pop points)
r-branch (subvec points 1)
[xl] (first points)
[xr] (peek points)]
(g// (g/+ (g/* (g/- x xr) (evaluate l-branch))
(g/* (g/- xl x) (evaluate r-branch)))
(g/- xl xr)))))]
(evaluate (vec points))))
Neville's algorithm generates each new polynomial from \(P_l\) and \(P_r\), using this recursion to incorporate the full set of points.
You can write these out these relationships in a "tableau":
p0
\
p01
/ \
p1 p012
\ / \
p12 p0123
/ \ / \
p2 p123 p01234
\ / \ /
p23 p1234
/ \ /
p3 p234
\ /
p34
/
p4
The next few functions will discuss "rows" and "columns" of the tableau. That refers to the rows and columns of this representation;
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
. . . . .
. . . . .
. . . . .
The first column here is the initial set of points. Each entry in each successive column is generated through some operation between the entry to its left, and the entry one left and one up.
Look again at Neville's algorithm:
\[P(x) = [(x - x_r) P_l(x) - (x - x_l) P_r(x)] / [x_l - x_r]\]
\(l\) refers to the entry in the same row, previous column, while \(r\) is one row higher, previous column.
If each cell in the above tableau tracked:
we could build up Neville's rule incrementally. Let's attempt to build a function of this signature:
(defn neville-incremental*
"Takes a potentially lazy sequence of `points` and a point `x` and generates a
lazy sequence of approximations of P(x).
entry N in the returned sequence is the estimate using a polynomial generated
from the first N points of the input sequence."
[points x]
,,,)
First, write a function to process each initial point into a vector that contains each of those required elements:
(defn- neville-prepare
"Processes each point of the form [x, (f x)] into:
$$[x_l, x_r, p]$$
where $p$ is the polynomial that spans all points from $l$ to $r$. The
recursion starts with $p = f(x)$.
"
[[x fx]]
[x x fx])
Next, a function that generates the next entry, given l and r:
(defn- neville-combine-fn
"Given some value $x$, returns a function that combines $l$ and $r$ entries in
the tableau, arranged like this:
l -- return
/
/
/
r
generates the `return` entry of the form
$$[x_l, x_r, p]$$."
[x]
(fn [[xl _ pl] [_ xr pr]]
(let [plr (g// (g/+ (g/* (g/- x xr) pl)
(g/* (g/- xl x) pr))
(g/- xl xr))]
[xl xr plr])))
We can use higher-order functions to turn this function into a NEW function that can transform an entire column:
(defn- neville-next-column
"This function takes some point $x$, and returns a new function that takes some
column in the tableau and generates the next column."
[x]
(fn [prev-column]
(map (neville-combine-fn x)
prev-column
(rest prev-column))))
neville-tableau
will generate the entire tableau:
(defn- neville-tableau [points x]
(->> (map neville-prepare points)
(iterate (neville-next-column x))
(take-while seq)))
Really, we're only interested in the first row:
\[p_0, p_{01}, p_{012}, p_{0123}, p_{01234}\]
So define a function to grab that:
(defn first-terms [tableau]
(map first tableau))
the final piece we need is a function that will extract the estimate from our row of \([x_l, x_r, p]\) vectors:
(defn- neville-present [row]
(map (fn [[_ _ p]] p) row))
Putting it all together:
(defn neville-incremental*
"Takes a potentially lazy sequence of `points` and a point `x` and generates a
lazy sequence of approximations of P(x).
entry N in the returned sequence is the estimate using a polynomial generated
from the first N points of the input sequence."
[points x]
(neville-present
(first-terms
(neville-tableau points x))))
How do we know this works? We can prove it by using generic arithmetic to compare the full symbolic lagrange polynomial to each entry in the successive approximation.
(defn- lagrange-incremental
"Generates a sequence of estimates of `x` to polynomials fitted to `points`;
each entry uses one more point, just like `neville-incremental*`."
[points x]
(let [n (count points)]
(map (fn [i]
(lagrange (take i points) x))
(range 1 (inc n)))))
Every point is the same:
(let [points [['x_1 'y_1] ['x_2 'y_2] ['x_3 'y_3] ['x_4 'y_4]]]
(map (fn [neville lagrange]
(g/simplify
(g/- neville lagrange)))
(neville-incremental* points 'x)
(lagrange-incremental points 'x)))
(0 0 0 0)
The above pattern, of processing tableau entries, is general enough that we can abstract it out into a higher order function that takes a prepare
and merge
function and generates a tableau. Any method generating a tableau can use a present
function to extract the first row, OR to process the tableau in any other way that they like.
This is necessarily more abstract! But we'll specialize it shortly, and rebuild neville-incremental
into its final form.
I'm keeping points
in the argument vector for now, vs returning a new function; if you want to do this yourself, curry the function with (partial tableau-fn prepare merge present)
.
(defn tableau-fn
"Returns a Newton-style approximation tableau, given:
- `prepare`: a fn that processes each element of the supplied `points` into
the state necessary to calculate future tableau entries.
- `merge`: a fn of `l`and `r` the tableau entries:
l -- return
/
/
/
r
the inputs are of the same form returned by `prepare`. `merge` should return a
new structure of the same form.
- `points`: the (potentially lazy) sequence of points used to generate the
first column of the tableau.
"
[prepare merge points]
(let [next-col (fn [previous-col]
(map merge
previous-col
(rest previous-col)))]
(->> (map prepare points)
(iterate next-col)
(take-while seq))))
Redefine neville-merge
to make it slightly more efficient, with baked-in native operations:
(defn- neville-merge
"Returns a tableau merge function. Identical to `neville-combine-fn` but uses
native operations instead of generic operations."
[x]
(fn [[xl _ pl] [_ xr pr]]
(let [p (/ (+ (* (- x xr) pl)
(* (- xl x) pr))
(- xl xr))]
[xl xr p])))
And now, neville
, identical to neville-incremental*
except using the generic tableau generator.
The form of the tableau also makes it easy to select a particular column instead of just the first row. Columns are powerful because they allow you to successively interpolate between pairs, triplets etc of points, instead of moving onto very high order polynomials.
I'm not sure it's the best interface, but we'll add that arity here.
(defn neville
"Takes:
- a (potentially lazy) sequence of `points` of the form `[x (f x)]` and
- a point `x` to interpolate
and generates a lazy sequence of approximations of P(x). Each entry in the
return sequence incorporates one more point from `points` into the P(x)
estimate.
Said another way: the Nth in the returned sequence is the estimate using a
polynomial generated from the first N points of the input sequence:
p0 p01 p012 p0123 p01234
This function generates each estimate using Neville's algorithm:
$$P(x) = [(x - x_r) P_l(x) - (x - x_l) P_r(x)] / [x_l - x_r]$$
## Column
If you supply an integer for the third `column` argument, `neville` will
return that /column/ of the interpolation tableau instead of the first row.
This will give you a sequence of nth-order polynomial approximations taken
between point `i` and the next `n` points.
As a reminder, this is the shape of the tableau:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
So supplying a `column` of `1` gives a sequence of linear approximations
between pairs of points; `2` gives quadratic approximations between successive
triplets, etc.
References:
- Press's Numerical Recipes (p103), chapter 3: http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-1.pdf
- Wikipedia: https://en.wikipedia.org/wiki/Neville%27s_algorithm
"
([points x]
(neville-present
(first-terms
(tableau-fn neville-prepare
(neville-merge x)
points))))
([points x column]
(-> (tableau-fn neville-prepare
(neville-merge x)
points)
(nth column)
(neville-present))))
Press's Numerical Recipes, chapter 3 (p103) ( http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-1.pdf ) describes a modified version of Neville's algorithm that is slightly more efficient than the version above.
Allan Macleod, in "A comparison of algorithms for polynomial interpolation", discusses this variation under the name "Modified Neville".
By generating the delta from each previous estimate in the tableau, Modified Neville is able to swap one of the multiplications above for an addition.
To make this work, instead of tracking the previous \(p\) estimate, we track two quantities:
We can recover the estimates generated by the original Neville's algorithm by summing C values across the first tableau row.
Equation 3.1.5 in Numerical recipes gives us the equations we need:
\[ C_{abc} = [(x_a - x)(C_{bc} - D_{ab})] / [x_a - x_c] &\ = [(x_l - x)(C_r - D_l)] / [x_l - x_r] \]
\[ D_{abc} = [(x_c - x)(C_{bc} - D_{ab})] / [x_a - x_c] &\ = [(x_r - x)(C_r - D_l)] / [x_l - x_r] \]
These equations describe a merge
function for a tableau processing scheme, with state == [x_l, x_r, C, D]
.
Let's implement each method, and then combine them into final form. The following methods use the prefix mn
for "Modified Neville".
(defn- mn-prepare
"Processes an initial point [x (f x)] into the required state:
[x_l, x_r, C, D]
The recursion starts with $C = D = f(x)$."
[[x fx]]
[x x fx fx])
(defn- mn-merge
"Implements the recursion rules described above to generate x_l, x_r, C and D
for a tableau node, given the usual left and left-up tableau entries."
[x]
(fn [[xl _ _ dl] [_ xr cr _]]
(let [diff (- cr dl)
den (- xl xr)
factor (/ diff den)
c (* factor (- xl x))
d (* factor (- xr x))]
[xl xr c d])))
(defn mn-present
"Returns a (lazy) sequence of estimates by successively adding C values from the
first entry of each tableau column. Each C value is the delta from the
previous estimate."
[row]
(ua/scanning-sum
(map (fn [[_ _ c _]] c) row)))
tableau-fn
allows us to assemble these pieces into a final function that has an interface identical to neville
above. The implementation is more obfuscated but slightly more efficient.
(defn modified-neville
"Similar to `neville` (the interface is identical) but slightly more efficient.
Internally this builds up its estimates by tracking the delta from the
previous estimate.
This non-obvious change lets us swap an addition in for a multiplication,
making the algorithm slightly more efficient.
See the `neville` docstring for usage information, and info about the required
structure of the arguments.
The structure of the `modified-neville` algorithm makes it difficult to select
a particular column. See `neville` if you'd like to generate polynomial
approximations between successive sequences of points.
References:
- \"A comparison of algorithms for polynomial interpolation\", A. Macleod,
https://www.sciencedirect.com/science/article/pii/0771050X82900511
- Press's Numerical Recipes (p103), chapter 3: http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-1.pdf
"
[points x]
(mn-present
(first-terms
(tableau-fn mn-prepare
(mn-merge x)
points))))
The advantage of the method described above, where we generate an entire tableau and lazily pull the first entry off of each column, is that we can pass a lazy sequence in as points
and get a lazy sequence of successive estimates back. If we don't pull from the result sequence, no computation will occur.
One problem with that structure is that we have to have our sequence of points available when we call a function like neville
. What if we want to pause, save the current estimate and pick up later where we left off?
Look at the tableau again:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
. . . . .
. . . . .
. . . . .
If you stare at this for a while, you might notice that it should be possible to use the merge
and present
functions we already have to build the tableau one row at a time, given ONLY the previous row:
(f [p1 p12 p123 p1234] [x0 fx0]) ;; => [p0 p01 p012 p0123 p01234]
Here's something close, using our previous merge
and prepare
definitions:
(defn- generate-new-row* [prepare merge]
(fn [prev-row point]
;; the new point, once it's prepared, is the first entry in the new row.
;; From there, we can treat the previous row as a sequence of "r" values.
(reduce merge (prepare point) prev-row)))
there's a problem here. reduce
only returns the FINAL value of the aggregation:
(let [f (generate-new-row* prepare present)]
(f [p1 p12 p123 p1234] [x0 fx0]))
;; => p01234
We want the entire new row! Lucky for us, Clojure has a version of reduce
, called reductions
, that returns each intermediate aggregation result:
(defn- generate-new-row [prepare merge]
(fn [prev-row point]
(reductions merge (prepare point) prev-row)))
(let [f (generate-new-row prepare present)]
(f [p1 p12 p123 p1234] [x0 fx0]))
;; => [p0 p01 p012 p0123 p01234]
Quick aside here, as we've stumbled across a familiar pattern. The discussion above suggests the idea of a "fold" from functional programming: https://en.wikipedia.org/wiki/Fold_{(higher-orderfunction)}
A fold consists of:
init
, an initial piece of state called an "accumulator"merge
function that combines ("folds") a new element x
into the accumulator and returns a value of the same shape / type as init
.present
function that transforms the accumulator into a final value.In Clojure, you perform a fold on a sequence with the reduce
function:
(reduce merge init xs)
For example:
(reduce + 0.0 (range 10))
45.0
Our generate-new-row
function from above is exactly the merge
function of a fold. The accumulator is the latest tableau row:
init
= [], the initial empty row. ~present~ =
the same present function as before (neville-present
or mn-present
)
Now that we've identified this new pattern, redefine generate-new-row
with a new name:
(defn tableau-fold-fn
"Transforms the supplied `prepare` and `merge` functions into a new function
that can merge a new point into a tableau row (generating the next tableau
row).
More detail on the arguments:
- `prepare`: a fn that processes each element of the supplied `points` into
the state necessary to calculate future tableau entries.
- `merge`: a fn of `l`and `r` the tableau entries:
l -- return
/
/
/
r
the inputs are of the same form returned by `prepare`. `merge` should return a
new structure of the same form."
[prepare merge]
(fn [prev-row point]
(reductions merge (prepare point) prev-row)))
Next, we can use this to generate specialized fold functions for our two incremental algorithms above - neville
and modified-neville
:
(defn- neville-fold-fn
"Returns a function that accepts:
- `previous-row`: previous row of an interpolation tableau
- a new point of the form `[x (f x)]`
and returns the next row of the tableau using the algorithm described in
`neville`."
[x]
(tableau-fold-fn neville-prepare
(neville-merge x)))
(defn- modified-neville-fold-fn
"Returns a function that accepts:
- `previous-row`: previous row of an interpolation tableau
- a new point of the form `[x (f x)]`
and returns the next row of the tableau using the algorithm described in
`modified-neville`."
[x]
(tableau-fold-fn mn-prepare
(mn-merge x)))
This final function brings back in the notion of present
. It returns a function that consumes an entire sequence of points, and then passes the final row into the exact present-fn
we used above:
(defn tableau-fold
"Returns a function that accepts a sequence of points and processes them into a
tableau by generating successive rows, one at a time.
The final row is passed into `present-fn`, which generates the final return
value.
This is NOT appropriate for lazy sequences! Fully consumes the input."
[fold-fn present-fn]
(fn [points]
(present-fn
(reduce fold-fn [] points))))
Note that these folds process points in the OPPOSITE order as the column-wise tableau functions! Because you build up one row at a time, each new point is PRE-pended to the interpolations in the previous row.
The advantage is that you can save the current row, and then come back and absorb further points later.
The disadvantage is that if you present
p123, you'll see successive estimates for [p1, p12, p123]...
but if you then prepend 0
, you'll see estimates for [p0, p01, p012, p0123]
. These don't share any elements, so they'll be totally different.
If you reverse the incoming point sequence, the final row of the fold will in fact equal the row of the column-based method.
If you want a true incremental version of the above code, reverse points! We don't do this automatically in case points is an infinite sequence.
tableau-scan
below will return a function that acts identically to the non-fold, column-wise version of the interpolators. It does this by folding in one point at a time, but processing EVERY intermediate value through the presentation function.
(defn tableau-scan
"Takes a folding function and a final presentation function (of accumulator type
=> return value) and returns a NEW function that:
- accepts a sequence of incoming points
- returns the result of calling `present` on each successive row."
[fold-fn present-fn]
(fn [xs]
(->> (reductions fold-fn [] xs)
(map present-fn)
(rest))))
And finally, we specialize to our two incremental methods.
(defn neville-fold
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of successive approximations of `x` using polynomials fitted to the
points in reverse order.
This function uses the `neville` algorithm internally."
[x]
(tableau-fold (neville-fold-fn x)
neville-present))
(defn neville-scan
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of SEQUENCES of successive polynomial approximations of `x`; one
for each of the supplied points.
For a sequence a, b, c... you'll see:
[(neville [a] x)
(neville [b a] x)
(neville [c b a] x)
...]"
[x]
(tableau-scan (neville-fold-fn x)
neville-present))
(defn modified-neville-fold
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of successive approximations of `x` using polynomials fitted to the
points in reverse order.
This function uses the `modified-neville` algorithm internally."
[x]
(tableau-fold (modified-neville-fold-fn x)
mn-present))
(defn modified-neville-scan
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of SEQUENCES of successive polynomial approximations of `x`; one
for each of the supplied points.
For a sequence a, b, c... you'll see:
[(modified-neville [a] x)
(modified-neville [b a] x)
(modified-neville [c b a] x)
...]"
[x]
(tableau-scan (modified-neville-fold-fn x)
mn-present))
Next, check out:
rational.cljc
to learn how to interpolate rational functionsrichardson.cljc
for a specialized implementation of polynomial interpolation, when you know something about the ratios between successive x
elements in the point sequence.(ns quadrature.interpolate.rational
"This namespace contains a discussion of rational function interpolation, and
different methods for fitting rational functions N points and evaluating them
at some value `x`."
(:require [quadrature.interpolate.polynomial :as ip]
[sicmutils.generic :as g]
[quadrature.util.aggregate :as ua]
[quadrature.util.stream :as us]
[taoensso.timbre :as log]))
This namespace contains implementations of rational function interpolation methods. The ALGLib user guide has a nice page on rational function interpolation, which suggests that the Bulirsch-Stoer method, included here, is NOT great, and that there are better methods. We'd love implementations of the others if you agree!
The main method in this package is an incremental version of the Bulirsch-Stoer algorithm.
Just like with polynomial interpolation, let's start with a straightforward implementation of the non-incremental recursive algorithm.
(defn bulirsch-stoer-recursive
"Returns the value of `P(x)`, where `P` is rational function fit (using the
Bulirsch-Stoer algorithm, of similar style to Neville's algorithm described in
`polynomial.cljc`) to every point in the supplied sequence `points`.
`points`: is a sequence of pairs of the form `[x (f x)]`
\"The Bulirsch-Stoer algorithm produces the so-called diagonal rational
function, with the degrees of numerator and denominator equal (if m is even)
or with the degree of the denominator larger by one if m is odd.\" ~ Press,
Numerical Recipes, p105
The implementation follows Equation 3.2.3 on on page 105 of Press:
http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf.
References:
- Stoer & Bulirsch, 'Introduction to Numerical Analysis': https://www.amazon.com/Introduction-Numerical-Analysis-Applied-Mathematics/dp/144193006X
- PDF of the same: http://www.math.uni.wroc.pl/~olech/metnum2/Podreczniki/(eBook)%20Introduction%20to%20Numerical%20Analysis%20-%20J.Stoer,R.Bulirsch.pdf
- Press's Numerical Recipes (p105), Section 3.2 http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf"
[points x]
(letfn [(evaluate [points x]
(cond (empty? points) 0
(= 1 (count points))
(let [[[_ y]] points]
y)
:else
(let [l-branch (pop points)
r-branch (subvec points 1)
center (pop r-branch)
[xl] (first points)
[xr] (peek points)
rl (evaluate l-branch x)
rr (evaluate r-branch x)
rc (evaluate center x)
p (g/- rr rl)
q (-> (/ (g/- x xl)
(g/- x xr))
(g/* (g/- 1 (g// p (g/- rr rc))))
(g/- 1))]
(g/+ rr (g// p q)))))]
(let [point-array (vec points)]
(evaluate point-array x))))
We can be a bit more clever, if we reuse the idea of the "tableau" described in the polynomial namespace.
(defn bulirsch-stoer
"Takes
- a (potentially lazy) sequence of `points` of the form `[x (f x)]` and
- a point `x` to interpolate
and generates a lazy sequence of approximations of `P(x)`. Each entry in the
return sequence incorporates one more point from `points` into the P(x)
estimate.
`P(x)` is rational function fit (using the Bulirsch-Stoer algorithm, of
similar style to Neville's algorithm described in `polynomial.cljc`) to every
point in the supplied sequence `points`.
\"The Bulirsch-Stoer algorithm produces the so-called diagonal rational
function, with the degrees of numerator and denominator equal (if m is even)
or with the degree of the denominator larger by one if m is odd.\" ` Press,
Numerical Recipes, p105
The implementation follows Equation 3.2.3 on on page 105 of Press:
http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf.
## Column
If you supply an integer for the third (optional) `column` argument,
`bulirsch-stoer` will return that /column/ offset the interpolation tableau
instead of the first row. This will give you a sequence of nth-order
polynomial approximations taken between point `i` and the next `n` points.
As a reminder, this is the shape of the tableau:
p0 p01 p012 p0123 p01234
p1 p12 p123 p1234 .
p2 p23 p234 . .
p3 p34 . . .
p4 . . . .
So supplying a `column` of `1` gives a sequence of 2-point approximations
between pairs of points; `2` gives 3-point approximations between successive
triplets, etc.
References:
- Stoer & Bulirsch, 'Introduction to Numerical Analysis': https://www.amazon.com/Introduction-Numerical-Analysis-Applied-Mathematics/dp/144193006X
- PDF of the same: http://www.math.uni.wroc.pl/~olech/metnum2/Podreczniki/(eBook)%20Introduction%20to%20Numerical%20Analysis%20-%20J.Stoer,R.Bulirsch.pdf
- Press's Numerical Recipes (p105), Section 3.2 http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf"
[points x & [column]]
(let [prepare (fn [[x fx]] [x x 0 fx])
merge (fn [[xl _ _ rl] [_ xr rc rr]]
(let [p (- rr rl)
q (-> (/ (- x xl)
(- x xr))
(* (- 1 (/ p (- rr rc))))
(- 1))]
[xl xr rl (+ rr (/ p q))]))
present (fn [row] (map (fn [[_ _ _ r]] r) row))
tableau (ip/tableau-fn prepare merge points)]
(present
(if column
(nth tableau column)
(ip/first-terms tableau)))))
Press, in Numerical Recipes section 3.2, describes a modification to the Bulirsch-Stoer that lets you track the differences from the left and left-up entries in the tableau, just like the modified Neville method in polynomial.cljc
. the algorithm is implemented below.
(defn bs-prepare
"Processes an initial point [x (f x)] into the required state:
[x_l, x_r, C, D]
The recursion starts with $C = D = f(x)$."
[[x fx]] [x x fx fx])
(defn bs-merge
"Implements the recursion rules described in Press's Numerical Recipes, section
3.2 http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf to generate x_l, x_r, C
and D for a tableau node, given the usual left and left-up tableau entries.
This merge function ALSO includes a 'zero denominator fix used by Bulirsch and
Stoer and Henon', in the words of Sussman from `rational.scm` in the scmutils
package.
If the denominator is 0, we pass along C from the up-left node and d from the
previous entry in the row. Otherwise, we use the algorithm to calculate.
TODO understand why this works, or where it helps!"
[x]
(fn [[xl _ _ dl] [_ xr cr _]]
(let [c-d (- cr dl)
d*ratio (-> (/ (- x xl)
(- x xr))
(* dl))
den (- d*ratio cr)]
(if (zero? den)
(do (log/info "zero denominator!")
[xl xr cr dl])
(let [cnum (* d*ratio c-d)
dnum (* cr c-d)]
[xl xr (/ cnum den) (/ dnum den)])))))
(defn modified-bulirsch-stoer
"Similar to `bulirsch-stoer` (the interface is identical) but slightly more efficient.
Internally this builds up its estimates by tracking the delta from the
previous estimate.
This non-obvious change lets us swap an addition in for a division,
making the algorithm slightly more efficient.
See the `bulirsch-stoer` docstring for usage information, and info about the
required structure of the arguments.
References:
- Press's Numerical Recipes (p105), Section 3.2 http://phys.uri.edu/nigh/NumRec/bookfpdf/f3-2.pdf"
[points x]
(ip/mn-present
(ip/first-terms
(ip/tableau-fn bs-prepare
(bs-merge x)
points))))
Just like in polynomial.cljc
, we can write rational interpolation in the style of a functional fold:
(defn modified-bulirsch-stoer-fold-fn
"Returns a function that accepts:
- `previous-row`: previous row of an interpolation tableau
- a new point of the form `[x (f x)]`
and returns the next row of the tableau using the algorithm described in
`modified-bulirsch-stoer`."
[x]
(ip/tableau-fold-fn
bs-prepare
(bs-merge x)))
(defn modified-bulirsch-stoer-fold
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of successive approximations of `x` using rational functions fitted
to the points in reverse order."
[x]
(ip/tableau-fold
(modified-bulirsch-stoer-fold-fn x)
ip/mn-present))
(defn modified-bulirsch-stoer-scan
"Returns a function that consumes an entire sequence `xs` of points, and returns
a sequence of SEQUENCES of successive rational function approximations of `x`;
one for each of the supplied points.
For a sequence a, b, c... you'll see:
[(modified-bulirsch-stoer [a] x)
(modified-bulirsch-stoer [b a] x)
(modified-bulirsch-stoer [c b a] x)
...]"
[x]
(ip/tableau-scan
(modified-bulirsch-stoer-fold-fn x)
ip/mn-present))
derivatives using three kinds of central difference formulas… accelerated using Richardson extrapolation, with a nice technique for guarding against underflow.
(ns quadrature.derivative
"Different numerical derivative implementations."
(:require [sicmutils.calculus.derivative :as d]
[quadrature.interpolate.richardson :as r]
[sicmutils.function :as f]
[sicmutils.generic :as g]
[sicmutils.infix :as if]
[sicmutils.util :as u]
[quadrature.util.stream :as us]
[sicmutils.value :as v]))
This module builds up to an implementation of numerical derivatives. The final function, D-numeric
, uses Richardson extrapolation to speed up convergence of successively tighter estimates of \(f^{\prime}(x)\) using a few different methods.
The inspiration for this style was Sussman's "Abstraction in Numerical Methods", starting on page 10: https://dspace.mit.edu/bitstream/handle/1721.1/6060/AIM-997.pdf?sequence=2
We'll proceed by deriving the methods symbolically, and then implement them numerically.
First, a function that will print nicely rendered infix versions of (simplified) symbolic expressions:
(defn show [e]
(let [eq (if/->TeX (g/simplify e))]
(println
(str "\\begin{equation}\n"
eq
"\n\\end{equation}"))))
And a function to play with:
(def ^:private func
(f/literal-function 'f))
The key to all of these methods involves the taylor series expansion of an arbitrary function \(f\) around a point \(x\); we know the taylor series will include a term for \(f^{\prime}(x)\), so the goal is to see if we can isolate it.
Here's the taylor series expansions of \(f(x + h)\):
(def ^:private fx+h
(->> (d/taylor-series func 'x 'h)
(take 5)
(reduce g/+)))
Use show
to print out its infix representation:
(show fx+h)
\begin{equation}
\frac{1}{24},{h}^{4},{D}^{4}f\left(x\right) + \frac{1}{6},{h}^{3},{D}^{3}f\left(x\right) + \frac{1}{2},{h}^{2},{D}^{2}f\left(x\right) + h,Df\left(x\right) + f\left(x\right)
\end{equation}
We can solve this for \(Df(x)\) by subtracting \(f(x)\) and dividing out \(h\):
(show (g// (g/- fx+h (func 'x)) 'h))
\begin{equation}
\frac{1}{24},{h}^{3},{D}^{4}f\left(x\right) + \frac{1}{6},{h}^{2},{D}^{3}f\left(x\right) + \frac{1}{2},h,{D}^{2}f\left(x\right) + Df\left(x\right)
\end{equation}
Voila! The remaining terms include \(D f(x)\) along with a series of progressively-smaller "error terms" (since \(h \to 0\)). The first of these terms is \({1 \over 2} h D^2 f(x)\). It will come to dominate the error as \(h \to 0\), so we say that the approximation we've just derived has error of \(O(h)\).
This particular formula, in the limit as \(h \to 0\), is called the "forward difference approximation" to \(Df(x)\). Here's the Clojure implementation:
(defn forward-difference
"Returns a single-variable function of a step size `h` that calculates the
forward-difference estimate of the the first derivative of `f` at point `x`:
f'(x) = [f(x + h) - f(x)] / h
Optionally accepts a third argument `fx == (f x)`, in case you've already
calculated it elsewhere and would like to save a function evaluation."
([f x] (forward-difference f x (f x)))
([f x fx]
(fn [h]
(/ (- (f (+ x h)) fx) h))))
We could also expand \(f(x - h)\):
(def ^:private fx-h
(->> (d/taylor-series func 'x (g/negate 'h))
(take 5)
(reduce g/+)))
(show fx-h)
\begin{equation}
\frac{1}{24},{h}^{4},{D}^{4}f\left(x\right) + \frac{-1}{6},{h}^{3},{D}^{3}f\left(x\right) + \frac{1}{2},{h}^{2},{D}^{2}f\left(x\right) - h,Df\left(x\right) + f\left(x\right)
\end{equation}
and solve for \(Df(x)\):
(show (g// (g/- (func 'x) fx-h) 'h))
\begin{equation}
\frac{-1}{24},{h}^{3},{D}^{4}f\left(x\right) + \frac{1}{6},{h}^{2},{D}^{3}f\left(x\right) + \frac{-1}{2},h,{D}^{2}f\left(x\right) + Df\left(x\right)
\end{equation}
To get a similar method, called the "backward difference" formula. Here's the implementation:
(defn backward-difference
"Returns a single-variable function of a step size `h` that calculates the
backward-difference estimate of the first derivative of `f` at point `x`:
f'(x) = [f(x) - f(x - h)] / h
Optionally accepts a third argument `fx == (f x)`, in case you've already
calculated it elsewhere and would like to save a function evaluation."
([f x] (backward-difference f x (f x)))
([f x fx]
(fn [h]
(/ (- fx (f (- x h))) h))))
Notice that the two expansions, of \(f(x + h)\) and \(f(x - h)\), share every term paired with an even power of \(h\). The terms associated with odd powers of \(h\) alternate in sign (because of the \(-h\) in the expansion of \(f(x - h)\)).
We can find yet another method for approximating \(Df(x)\) if we subtract these two series. We're trying to solve for \(Df(x)\), and \(Df(x)\) appears paired with \(h\), an odd-powered term… so subtracting \(f(x-h)\) should double that term, not erase it. Let's see:
(show (g/- fx+h fx-h))
\begin{equation}
\frac{1}{3},{h}^{3},{D}^{3}f\left(x\right) + 2,h,Df\left(x\right)
\end{equation}
Amazing! Now solve for \(Df(x)\):
(show (g// (g/- fx+h fx-h)
(g/* 2 'h)))
\begin{equation}
\frac{1}{6},{h}^{2},{D}^{3}f\left(x\right) + Df\left(x\right)
\end{equation}
We're left with \(Df(x) + O(h^2)\), a quadratic error term in \(h\). (Of course if we'd expanded to more than initial terms in the taylor series we'd see a long error series with only even powers.)
This formula is called the "central difference" approximation to the first derivative. Here's the implementation:
(defn central-difference
"Returns a single-variable function of a step size `h` that calculates the
central-difference estimate of the first derivative of `f` at point `x`:
f'(x) = [f(x + h) - f(x - h)] / 2h"
[f x]
(fn [h]
(/ (- (f (+ x h)) (f (- x h)))
(* 2 h))))
There's one more approximation we can extract from these two expansions. We noted earlier that the terms associated with odd powers of \(h\) alternate in sign. If we add the two series, these odd terms should all cancel out. Let's see:
(show (g/+ fx-h fx+h))
\begin{equation}
\frac{1}{12},{h}^{4},{D}^{4}f\left(x\right) + {h}^{2},{D}^{2}f\left(x\right) + 2,f\left(x\right)
\end{equation}
Interesting. The \(Df(x)\) term is gone. Remember that we have \(f(x)\) available; the first unknown term in the series is now \(D^2 f(x)\). Solve for that term:
(show (g// (g/- (g/+ fx-h fx+h) (g/* 2 (func 'x)))
(g/square 'h)))
\begin{equation}
\frac{1}{12},{h}^{2},{D}^{4}f\left(x\right) + {D}^{2}f\left(x\right)
\end{equation}
This is the "central difference" approximation to the second derivative of \(f\). Note that the error term here is quadratic in \(h\). Here it is in code:
(defn central-difference-d2
"Returns a single-variable function of a step size `h` that calculates the
central-difference estimate of the second derivative of `f` at point `x`:
f''(x) = [f(x + h) - 2f(x) + f(x - h)] / h^2
Optionally accepts a third argument `fx == (f x)`, in case you've already
calculated it elsewhere and would like to save a function evaluation."
([f x] (central-difference-d2 f x (f x)))
([f x fx]
(let [fx*2 (* 2 fx)]
(fn [h]
(/ (- (+ (f (+ x h))
(f (- x h)))
fx*2)
(* h h))))))
Let's attempt to use these estimates and see how accurate they are. (This section follows Sussman starting on page 10.)
The following function returns a new function that approximates \(Df(x)\) using the central difference method, with a fixed value of \(h = 0.00001\):
(defn- make-derivative-fn
[f]
(fn [x]
(let [h 1e-5]
((central-difference f x) h))))
The error here is not great, even for a simple function:
((make-derivative-fn g/square) 3)
6.000000000039306
Let's experiment instead with letting \(h \to 0\). This next function takes a function \(f\), a value of \(x\) and an initial \(h\), and generates a stream of central difference approximations to \(Df(x)\) using successively halved values of \(h\), ie, \((h, h/2, h/4, h/8, ....)\)
(defn- central-diff-stream [f x h]
(map (central-difference f x)
(us/zeno 2 h)))
Let's print 20 of the first 60 terms (taking every 3 so we see any pattern):
(->> (central-diff-stream g/sqrt 1 0.1)
(take-nth 3)
(us/pprint 20))
0.5006277505981893
0.5000097662926306
0.5000001525880649
0.5000000023844109
0.5000000000381988
0.5000000000109139
0.4999999998835847
0.5000000004656613
0.5000000074505806
0.49999989569187164
0.5000001192092896
0.4999971389770508
0.500030517578125
0.49957275390625
0.50048828125
0.48828125
0.625
0.0
0.0
0.0
At first, the series converges toward the proper value. But as \(h\) gets smaller, \(f(x + h)\) and \(f(x - h)\) get so close together that their difference is less than the minimum epsilon allowed by the system's floating point representation.
As Sussman states: "Hence we are in a race between truncation error, which starts out large and gets smaller, and roundoff error, which starts small and gets larger." ~Sussman, p12
We can actually analyze and quantify how many halvings we can apply to \(h\) before roundoff error swamps our calculation.
Why does roundoff error occur? From Sussman: "Any real number \(x\), represented in the machine, is rounded to a value \(x(1 + e)\), where \(e\) is effectively a random variable whose absolute value is on the order of the machine epsilon \(\epsilon\): that smallest positive number for which 1.0 and \(1.0 + \epsilon\) can be distinguished."
In the current library, v/machine-epsilon
holds this value.
Our goal, then, is to see if we can figure out when the error due to roundoff grows so large that it exceeds the tolerance we want to apply to our calculation.
For the central difference formula:
\[f^{\prime}(x) = {f(x + h) - f(x - h)} \over {2h}\]
without any roundoff error, the numerator should be equal to \(2h f'(x)\). In reality, for small values of \(h\), \(f(x + h)\) and \(f(x - h)\) both have machine representations in error by about \(f(x) \epsilon\). Their difference doesn't change the order, so we can say that their difference also has error of \(f(x) \epsilon\).
Dividing these two together, the relative error is:
\[\epsilon\left|\frac{f(x)}{2 h f^{\prime}(x)}\right|\]
The relative error doubles each time \(h\) is halved. This is technically just the relative error of the numerator of the central difference method, but we know the denominator \(2h\) to full precision, so we can ignore it here.
If we actually calculate this ratio, we'll find the INITIAL relative error due to roundoff for a given h. Also, because we want to make sure that we're working in integer multiples of machine epsilon, let's actually take the next-highest-integer of the ratio above. The following method takes the ratio above as an argument, and returns:
\[1 + floor(\lvert ratio \rvert)\]
(defn- roundoff-units
"Returns the number of 'roundoff units', ie, multiples of the machine epsilon,
that roundoff error contributes to the total relative error, given a relative
error percentage estimated for some initial step size $h$."
[rel-error-ratio]
(inc
(Math/floor
(Math/abs
(double rel-error-ratio)))))
That calculation, as the documentation states, returns the number of "roundoff units". Let's call it \(r\).
Each iteration doubles the relative error contributed by roundoff. Given some tolerance, how many roundoff error doublings (or, equivalently, halvings of \(h\)) can we tolerate before roundoff error swamps our calculation?
Here's the solution:
(defn- max-iterations
"Solution for `n`, in:
`initial-error` * 2^n <= `tolerance`"
[units tolerance]
(let [initial-error (* v/machine-epsilon units)]
(Math/floor
(/ (Math/log (/ tolerance initial-error))
(Math/log 2)))))
Let's combine these two ideas into a final function, terms-before-roundoff
, that calculates how items we can pull from a sequence like central-diff-stream
above before roundoff swamps us. (This is 1 + max iterations, because we need to include the first point.)
(defn- terms-before-roundoff
"Generates a default max number of terms, based on roundoff error estimates."
[ratio tolerance]
(inc
(max-iterations (roundoff-units ratio)
tolerance)))
How many terms are we allowed to examine for an estimate of the derivative of \(f(x) = \sqrt(x)\), with an initial \(h = 0.1\)?
(let [f g/sqrt
x 1
h 0.1
tolerance 1e-13
ratio (/ (f x)
(- (f (+ x h))
(f (- x h))))]
(terms-before-roundoff ratio tolerance))
6.0
6 terms, or 5 halvings, down to \(h = {0.1} \over {2^5} = 0.003125\). How many terms does the sequence take to converge?
(-> (central-diff-stream g/sqrt 1 0.1)
(us/seq-limit {:tolerance 1e-13}))
{:converged? true, :terms-checked 15, :result 0.5000000000109139}
15 is far beyond the level where roundoff error has rendered our results untrustworthy.
We need a way to converge more quickly. richardson.cljc
lays out a general method of "sequence acceleration" that we can use here, since we know the arithmetic progression of the terms in the error series for each of our methods above.
For the central difference method, our highest-order error term has an exponent of \(p = 2\), and successive terms are all even. r/richardson-sequence
takes p
and q
for an arithmetic sequence of error exponents $p, p + q, p + 2q…$
It also needs the initial size \(h\) of our sequence progression.
Given that information, we can transform our existing sequence of estimates into an accelerated sequence of estimates using Richardson extrapolation. Does it converge in fewer terms?
(let [h 0.1, p 2, q 2]
(-> (central-diff-stream g/sqrt 1 h)
(r/richardson-sequence h p q)
(us/seq-limit {:tolerance 1e-13})))
{:converged? true, :terms-checked 6, :result 0.5006325594766895}
Happily, it does, in only 5 terms instead of 15! This brings convergence in under our limit of 6 total terms.
If you're interested in more details of Richardson extrapolation, please see richardson.cljc
! For now we'll proceed.
We're ready to write our final numeric differentiation routine, D-numeric
. First, some supporting structure. We support four methods, so let's describe them using keywords in a set:
(def valid-methods
#{:central :central-d2 :forward :backward})
To apply one of the methods, we need to be able to:
Once again, richardson.cljc
for a discussion of \(p\) and \(q\).
This configs
function bundles all of this together. I don't know that this is the best abstraction, but I don't know yet of any other methods for numeric differentiation, so it'll do for now.
Note here that \(p = q = 2\) for both central difference methods, just like we determined above. the forward and backward difference methods both have all of the remaining terms from the taylor expansion in their error series, so they only get \(p = q = 1\).
(defn- configs [method f x fx]
(case method
:forward
{:p 1
:q 1
:function (forward-difference f x fx)
:ratio-fn (fn [h] (/ fx (- (f (+ x h)) fx)))}
:central
{:p 2
:q 2
:function (central-difference f x)
:ratio-fn (fn [h]
(/ fx (- (f (+ x h))
(f (- x h)))))}
:backward
{:p 1
:q 1
:function (backward-difference f x fx)
:ratio-fn (fn [h]
(/ fx (- fx (f (- x h)))))}
:central-d2
{:p 2
:q 2
:function (central-difference-d2 f x fx)
:ratio-fn (fn [h]
(- (+ (f (+ x h))
(f (- x h)))
(* 2 fx)))}
(u/illegal
(str "Invalid method: " method ". Please try one of " valid-methods))))
(defn- fill-defaults
"Fills in default values required by `D-numeric`. Any option not used by
`D-numeric` gets passed on to `us/seq-limit`."
[m]
(let [defaults {:tolerance (Math/sqrt v/machine-epsilon)
:method :central}
{:keys [method] :as opts} (merge defaults m)]
(assert (contains? valid-methods method)
(str method " is not a valid method. Please try one of: " valid-methods))
opts))
(defn D-numeric
"Takes a function `f: R => R` (function of a single real variable), and returns
a new function of `x` that approximates the derivative $Df(x)$ (or $D^2f(x)$
if you pass `:method :central-d2`).
Returns the estimated value of the derivative at `x`. If you pass `:info?
true`, the fn returns a dictionary of the results of `us/seq-limit`:
{:converged? <boolean>
:terms-checked <int>
:result <derivative estimate>}
Make sure to visit `sicmutils.calculus.derivative/D` if you want symbolic or
automatic differentiation.
## Roundoff Estimate
The returned function will attempt to estimate how many times it can halve the
step size used to estimate the derivative before roundoff error swamps the
calculation, and force the function to return (with `:converged? false`, if
you pass `:info?`)
## Optional Arguments
`D-numeric` takes optional args as its second param. Any of these can be
overridden by passing a second argument to the function returned by
`D-numeric`; helpful for setting defaults and then overriding them later.
The returned function passes through these and any other options to
`us/seq-limit`, where they control the sequence of richardson
extrapolation-accelerated estimates.
Options:
- `:method`: one of `:central`, `:central-d2`, `:forward` or `:backward`.
`:central-d2` forces a second derivative estimate; the other methods configure
a first derivative estimator.
- `:info?` if false (default), returns the estimated value of `x`. If true,
returns a dictionary with more information (see `D-numeric`'s docstring for
more info.)
- `:initial-h`: the initial `h` to use for derivative estimates before $h \to
0$. Defaults to 0.1 * abs(x).
- `:tolerance`: see `us/stream-limit` for a discussion of how this value
handles relative vs absolute tolerance. $\\sqrt(\\epsilon)$ by default, where
$\\epsilon$ = machine tolerance.
- `:maxterms`: the maximum number of terms to consider when hunting for a
derivative estimate. This defaults to an estimate generated internally,
designed to prevent roundoff error from swamping the result. If you want to
disable this feature, set `:maxterms` to something moderately large, like
`:maxterms 100`. But do so carefully! See the surrounding namespace for a
larger discussion."
([f] (D-numeric f {}))
([f opts]
(let [opts (fill-defaults opts)]
(fn df
([x] (df x {}))
([x overrides]
(let [{:keys [maxterms tolerance initial-h method info?] :as opts} (merge opts overrides)
{:keys [ratio-fn function p q]} (configs method f x (f x))
h (or initial-h (* 0.1 (g/abs x)))
n (or maxterms (terms-before-roundoff
(ratio-fn h)
tolerance))
estimates (map function (us/zeno 2 h))
result (-> (r/richardson-sequence estimates 2 p q)
(us/seq-limit (assoc opts :maxterms n)))]
(if info? result (:result result))))))))
More resources about numerical differentiation:
#+end_{src}
fit a parabola to every slice. OR, "accelerate" the trapezoid method with one step of Richardson extrapolation!
(ns quadrature.simpson
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.trapezoid :as qt]
[quadrature.interpolate.richardson :as ir]))
This numerical integration method is a closed Newton-Cotes formula; for each integral slice, Simpson's rule samples each endpoint and the midpoint and combines them into an area estimate for this slice using the following formula:
\[{{h} \over 3} (f_0 + 4f_1 + f_2)\]
Given a window of \([a, b]\) and a "step size" of \(h = {{b - a} \over 2}\). The point \(f_i\) is the point \(i\) steps into the window.
There are a few simpler ways to understand this:
trapezoid.cljc
), subject to a single refinement of "Richardson extrapolation".\[S = {{2M + T} \over 3}\]
The test namespace contains a symbolic proof that the Richardson-extrapolated Trapezoid method is equivalent to using the formula above to calculate Simpson's rule directly.
(defn simpson-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ using Simpson's rule.
Simpson's rule is equivalent to the trapezoid method subject to one refinement
of Richardson extrapolation. The trapezoid method fits a line to each
integration slice. Simpson's rule fits a quadratic to each slice.
Returns estimates with $n, 2n, 4n, ...$ slices, geometrically increasing by a
factor of 2 with each estimate.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use."
([f a b] (simpson-sequence f a b {:n 1}))
([f a b {:keys [n] :or {n 1}}]
{:pre [(number? n)]}
(-> (qt/trapezoid-sequence f a b n)
(ir/richardson-column 1 2 2 2))))
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
using Simpson's rule with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `simpson-sequence` for more information about Simpson's rule, caveats that
might apply when using this integration method and information on the optional
args in `opts` that customize this function's behavior."
:area-fn (comp first simpson-sequence)
:seq-fn simpson-sequence)
(ns quadrature.simpson38
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.trapezoid :as qt]
[quadrature.interpolate.richardson :as ir]
[quadrature.util.stream :as us]))
This numerical integration method is a closed Newton-Cotes formula; for each integral slice, Simpson's 3/8 rule samples each endpoint and TWO interior, equally spaced points, and combines them into an area estimate for this slice using the following formula:
\[{{3h} \over 8} (f_0 + 3f_1 + 3f_2 + f_3)\]
Given a window of \([a, b]\) and a "step size" of \(h = {{b - a} \over 3}\). The point \(f_i\) is the point \(i\) steps into the window.
There are a few simpler ways to understand this:
trapezoid.cljc
), subject to a single refinement of "Richardson extrapolation", with an threefold-increase of integration slices at each step, from \(n \to 3n\).The test namespace contains a symbolic proof that the Richardson-extrapolated Trapezoid method is equivalent to using the formula above to calculate Simpson's 3/8 rule directly.
(defn simpson38-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ using Simpson's 3/8 rule.
Simpson's 3/8 rule is equivalent to the trapezoid method subject to:
- one refinement of Richardson extrapolation, and
- a geometric increase of integration slices by a factor of 3 for each
sequence element. (the Trapezoid method increases by a factor of 2 by
default.)
The trapezoid method fits a line to each integration slice. Simpson's 3/8 rule
fits a cubic to each slice.
Returns estimates with $n, 3n, 9n, ...n3^i$ slices, geometrically increasing by a
factor of 3 with each estimate.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use.
NOTE: the Trapezoid method is able to reuse function evaluations as its
windows narrow /only/ when increasing the number of integration slices by 2.
Simpson's 3/8 rule increases the number of slices geometrically by a factor of
2 each time, so it will never hit the incremental path. You may want to
memoize your function before calling `simpson38-sequence`."
([f a b] (simpson38-sequence f a b {:n 1}))
([f a b {:keys [n] :or {n 1}}]
{:pre [(number? n)]}
(-> (qt/trapezoid-sequence f a b (us/powers 3 n))
(ir/richardson-column 1 3 2 2))))
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
using Simpson's 3/8 rule with $1, 3, 9 ... 3^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `simpson38-sequence` for more information about Simpson's 3/8 rule, caveats
that might apply when using this integration method and information on the
optional args in `opts` that customize this function's behavior."
:area-fn (comp first simpson38-sequence)
:seq-fn simpson38-sequence)
trapezoid method plus two steps of Richardson extrapolation. (Are you starting to see the pattern??)
(ns quadrature.boole
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.trapezoid :as qt]
[quadrature.interpolate.richardson :as ir]))
NOTE: Boole's Rule is commonly mis-spelled as "Bode's Rule"!
This numerical integration method is a closed Newton-Cotes formula; for each integral slice, Boole's rule samples:
and combines them into an area estimate for this slice using the following formula:
\[{{2h} \over 45} (7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4)\]
Given a window of \([a, b]\) and a "step size" of \(h = {{b - a} \over 4}\). The point \(f_i\) is the point \(i\) steps into the window.
There are a few simpler ways to understand this:
trapezoid.cljc
), subject to two refinements of "Richardson extrapolation".The test namespace contains a symbolic proof that the Richardson-extrapolated Trapezoid method is equivalent to using the formula above to calculate Boole's rule directly.
(defn boole-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ using Boole's rule.
Boole's rule is equivalent to the trapezoid method subject to two refinements
of Richardson extrapolation. The trapezoid method fits a line to each
integration slice. Boole's rule fits a quartic to each slice.
Returns estimates with $n, 2n, 4n, ...$ slices, geometrically increasing by a
factor of 2 with each estimate.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use."
([f a b] (boole-sequence f a b {:n 1}))
([f a b {:keys [n] :or {n 1}}]
{:pre [(number? n)]}
(-> (qt/trapezoid-sequence f a b n)
(ir/richardson-column 2 2 2 2))))
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
using Boole's rule with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `boole-sequence` for more information about Boole's rule, caveats that
might apply when using this integration method and information on the optional
args in `opts` that customize this function's behavior."
:area-fn (comp first boole-sequence)
:seq-fn boole-sequence)
(ns quadrature.romberg
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.midpoint :as qm]
[quadrature.trapezoid :as qt]
[quadrature.interpolate.richardson :as ir]))
Romberg's method is a technique for estimating a definite integral over a closed (or open) range \(a, b\):
\[\int_{a}^{b} f(x) dx\]
By applying Richardson extrapolation (see richardson.cljc
) to either the Trapezoid method or the Midpoint method.
The implementation of Richardson extrapolation in this library can be applied to any methods; many of the numerical quadrature methods (Simpson, Simpson's 3/8, Milne, Boole) involve a single step of Richardson extrapolation.
Romberg integration goes all the way. A nice way to think about this algorithm is this:
For a wonderful reference that builds up to the ideas of Richardson extrapolation and Romberg integration, see Sussman's "Abstraction in Numerical Methods".
References:
(defn open-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the open interval $(a, b)$ by applying Richardson extrapolation to
successive integral estimates from the Midpoint rule.
Returns estimates formed by combining $n, 3n, 9n, ...$ slices, geometrically
increasing by a factor of 3 with each estimate. This factor of 3 is because,
internally, the Midpoint method is able to recycle old function evaluations
through this factor of 3.
Romberg integration converges quite fast by cancelling one error term in the
taylor series expansion of $f$ with each examined term. If your function is
/not/ smooth this may cause you trouble, and you may want to investigate a
lower-order method.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use."
([f a b] (open-sequence f a b {}))
([f a b {:keys [n] :or {n 1} :as opts}]
{:pre [(number? n)]}
(-> (qm/midpoint-sequence f a b opts)
(ir/richardson-sequence 3 2 2))))
(defn closed-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the closed interval $[a, b]$ by applying Richardson extrapolation to
successive integral estimates from the Trapezoid rule.
Returns estimates formed by combining $n, 2n, 4n, ...$ slices, geometrically
increasing by a factor of 2 with each estimate.
Romberg integration converges quite fast by cancelling one error term in the
taylor series expansion of $f$ with each examined term. If your function is
/not/ smooth this may cause you trouble, and you may want to investigate a
lower-order method.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use."
([f a b] (closed-sequence f a b {}))
([f a b {:keys [n] :or {n 1} :as opts}]
{:pre [(number? n)]}
(-> (qt/trapezoid-sequence f a b opts)
(ir/richardson-sequence 2 2 2))))
(defn romberg-sequence
"Higher-level abstraction over `closed-sequence` and `open-sequence`. Identical
to those functions (see their docstrings), but internally chooses either
implementation based on the interval specified inside of `opts`.
Defaults to the same behavior as `open-sequence`."
([f a b] (romberg-sequence f a b {}))
([f a b opts]
(let [seq-fn (if (qc/closed?
(qc/interval opts))
closed-sequence
open-sequence)]
(seq-fn f a b opts))))
(qc/defintegrator open-integral
"Returns an estimate of the integral of `f` over the open interval $(a, b)$
generated by applying Richardson extrapolation to successive integral
estimates from the Midpoint rule.
Considers $1, 3, 9 ... 3^n$ windows into $(a, b)$ for each successive
estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `open-sequence` for more information about Romberg integration, caveats
that might apply when using this integration method and information on the
optional args in `opts` that customize this function's behavior."
:area-fn qm/single-midpoint
:seq-fn open-sequence)
(qc/defintegrator closed-integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
generated by applying Richardson extrapolation to successive integral
estimates from the Trapezoid rule.
Considers $1, 2, 4 ... 2^n$ windows into $[a, b]$ for each successive
estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `closed-sequence` for more information about Romberg integration, caveats
that might apply when using this integration method and information on the
optional args in `opts` that customize this function's behavior."
:area-fn qt/single-trapezoid
:seq-fn closed-sequence)
(ns quadrature.milne
(:require [quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.midpoint :as qm]
[quadrature.interpolate.richardson :as ir]
[quadrature.util.stream :as us]))
This numerical integration method is an open Newton-Cotes formula; for each integral slice, Milne's rule samples three interior points (not the endpoints!) and combines them into an area estimate for this slice using the following formula:
\[{{4h} \over 3} (2f_1 - f_2 + 2f_3)\]
Given a window of \((a, b)\) and a "step size" of \(h = {{b - a} \over 3}\). The point \(f_i\) is the point \(i\) steps into the window.
There is a simpler way to understand this! Milne's method is, in fact, just the midpoint method (see midpoint.cljc
), subject to a single refinement of "Richardson extrapolation".
The test namespace contains a symbolic proof that the Richardson-extrapolated Midpoint method is equivalent to using the formula above to calculate Milne's rule directly.
(defn milne-sequence
"Returns a (lazy) sequence of successively refined estimates of the integral of
`f` over the open interval $(a, b)$ using Milne's rule.
Milne's rule is equivalent to the midpoint method subject to one refinement of
Richardson extrapolation.
Returns estimates with $n, 2n, 4n, ...$ slices, geometrically increasing by a
factor of 2 with each estimate.
## Optional arguments:
If supplied, `:n` (default 1) specifies the initial number of slices to use.
NOTE: the Midpoint method is able to reuse function evaluations as its windows
narrow /only/ when increasing the number of integration slices by 3. Milne's
method increases the number of slices geometrically by a factor of 2 each
time, so it will never hit the incremental path. You may want to memoize your
function before calling `milne-sequence`."
([f a b] (milne-sequence f a b {:n 1}))
([f a b {:keys [n] :or {n 1} :as opts}]
{:pre [(number? n)]}
(-> (qm/midpoint-sequence f a b (assoc opts :n (us/powers 2 n)))
(ir/richardson-column 1 2 2 2))))
(qc/defintegrator integral
"Returns an estimate of the integral of `f` over the open interval $(a, b)$
using Milne's rule with $1, 2, 4 ... 2^n$ windows for each estimate.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `milne-sequence` for more information about Milne's rule, caveats that
might apply when using this integration method and information on the optional
args in `opts` that customize this function's behavior."
:area-fn (comp first milne-sequence)
:seq-fn milne-sequence)
(ns quadrature.bulirsch-stoer
(:require [quadrature.interpolate.polynomial :as poly]
[quadrature.interpolate.rational :as rat]
[quadrature.common :as qc
#?@(:cljs [:include-macros true])]
[quadrature.midpoint :as mid]
[quadrature.trapezoid :as trap]
[sicmutils.generic :as g]
[sicmutils.util :as u]
[quadrature.util.stream :as us]))
This quadrature method comes from the scmutils package that inspired this library.
The idea is similar to Romberg integration:
Romberg integration does this by fitting a polynomial to a geometric series of slices - $1, 2, 4…$, for example - using Richardson extrapolation.
The Bulirsch-Stoer algorithm is exactly this, but:
Here are the default step sizes:
(def bulirsch-stoer-steps
(interleave
(us/powers 2 2)
(us/powers 2 3)))
The more familiar algorithm named "Bulirsch-Stoer" applies the same ideas to the solution of ODEs, as described on Wikipedia. scmutils adapted this into the methods you see here.
NOTE - The Wikipedia page states that "Hairer, Nørsett & Wanner (1993, p. 228), in their discussion of the method, say that rational extrapolation in this case is nearly never an improvement over polynomial interpolation (Deuflhard 1983)."
We can do this too! Passing {:bs-extrapolator :polynomial}
enables polynomial extrapolation in the sequence and integration functions implemented below.
One more detail is important to understand. You could apply the ideas above to any function that approximates an integral, but this namespace focuses on accelerating the midpoint and trapezoid methods.
As discussed in midpoint.cljc
and trapezoid.cljc
, the error series for these methods has terms that fall off as even powers of the integration slice width:
\[1/h^2, 1/h^4, ...\]
\[1/(h^2) 1/(h^2)^2, ...\]
This means that the rational function approximation needs to fit the function to points of the form
\[(h^2, f(h))\]
to take advantage of the acceleration. This trick is baked into Richardson extrapolation through the ability to specify a geometric series. richardson_test.cljc
shows that Richardson extrapolation is indeed equivalent to a polynomial fit using \(h^2\)… the idea here is the same.
The following two functions generate a sequence of NON-squared \(h\) slice widths. bs-sequence-fn
below squares each entry.
(defn- slice-width [a b]
(let [width (- b a)]
(fn [n] (/ width n))))
(defn- h-sequence
"Defines the sequence of slice widths, given a sequence of `n` (number of
slices) in the interval $(a, b)$."
([a b] (h-sequence a b bulirsch-stoer-steps))
([a b n-seq]
(map (slice-width a b) n-seq)))
The next group of functions generates open-sequence
and closed-sequence
methods, analagous to all other quadrature methods in the library.
(defn- extrapolator-fn
"Allows the user to specify polynomial or rational function extrapolation via
the `:bs-extrapolator` option."
[opts]
(if (= :polynomial (:bs-extrapolator opts))
poly/modified-neville
rat/modified-bulirsch-stoer))
(defn- bs-sequence-fn
"Accepts some function (like `mid/midpoint-sequence`) that returns a sequence of
successively better estimates to the integral, and returns a new function with
interface `(f a b opts)` that accelerates the sequence with either
- polynomial extrapolation
- rational function extrapolation
By default, The `:n` in `opts` (passed on to `integrator-seq-fn`) is set to
the sequence of step sizes suggested by Bulirsch-Stoer,
`bulirsch-stoer-steps`."
[integrator-seq-fn]
(fn call
([f a b]
(call f a b {:n bulirsch-stoer-steps}))
([f a b opts]
{:pre [(not (number? (:n opts)))]}
(let [{:keys [n] :as opts} (-> {:n bulirsch-stoer-steps}
(merge opts))
extrapolate (extrapolator-fn opts)
square (fn [x] (* x x))
xs (map square (h-sequence a b n))
ys (integrator-seq-fn f a b opts)]
(-> (map vector xs ys)
(extrapolate 0))))))
(def ^{:doc "Returns a (lazy) sequence of successively refined estimates of the
integral of `f` over the closed interval $[a, b]$ by applying rational
polynomial extrapolation to successive integral estimates from the Midpoint
rule.
Returns estimates formed from the same estimates used by the Bulirsch-Stoer
ODE solver, stored in `bulirsch-stoer-steps`.
## Optional arguments:
`:n`: If supplied, `n` (sequence) overrides the sequence of steps to use.
`:bs-extrapolator`: Pass `:polynomial` to override the default rational
function extrapolation and enable polynomial extrapolation using the modified
Neville's algorithm implemented in `poly/modified-neville`."}
open-sequence
(bs-sequence-fn mid/midpoint-sequence))
(def ^{:doc "Returns a (lazy) sequence of successively refined estimates of the
integral of `f` over the closed interval $[a, b]$ by applying rational
polynomial extrapolation to successive integral estimates from the Trapezoid
rule.
Returns estimates formed from the same estimates used by the Bulirsch-Stoer
ODE solver, stored in `bulirsch-stoer-steps`.
## Optional arguments:
`:n`: If supplied, `:n` (sequence) overrides the sequence of steps to use.
`:bs-extrapolator`: Pass `:polynomial` to override the default rational
function extrapolation and enable polynomial extrapolation using the modified
Neville's algorithm implemented in `poly/modified-neville`."}
closed-sequence
(bs-sequence-fn trap/trapezoid-sequence))
Finally, two separate functions that use the sequence functions above to converge quadrature estimates.
(qc/defintegrator open-integral
"Returns an estimate of the integral of `f` over the open interval $(a, b)$
generated by applying rational polynomial extrapolation to successive integral
estimates from the Midpoint rule.
Considers successive numbers of windows into $(a, b)$ specified by
`bulirsch-stoer-steps`.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `open-sequence` for more information about Bulirsch-Stoer quadrature,
caveats that might apply when using this integration method and information on
the optional args in `opts` that customize this function's behavior."
:area-fn mid/single-midpoint
:seq-fn open-sequence)
(qc/defintegrator closed-integral
"Returns an estimate of the integral of `f` over the closed interval $[a, b]$
generated by applying rational polynomial extrapolation to successive integral
estimates from the Trapezoid rule.
Considers successive numbers of windows into $[a, b]$ specified by
`bulirsch-stoer-steps`.
Optionally accepts `opts`, a dict of optional arguments. All of these get
passed on to `us/seq-limit` to configure convergence checking.
See `closed-sequence` for more information about Bulirsch-Stoer quadrature,
caveats that might apply when using this integration method and information on
the optional args in `opts` that customize this function's behavior."
:area-fn trap/single-trapezoid
:seq-fn closed-sequence)
Implemented as functional wrappers that take an integrator and return a modified integrator.
(ns quadrature.substitute
"## U Substitution and Variable Changes
This namespace provides implementations of functions that accept an
`integrator` and perform a variable change to address some singularity, like
an infinite endpoint, in the definite integral.
The strategies currently implemented were each described by Press, et al. in
section 4.4 of ['Numerical
Recipes'](http://phys.uri.edu/nigh/NumRec/bookfpdf/f4-4.pdf)."
(:require [clojure.core.match :refer [match]]
[quadrature.common :as qc]))
This first function, infinitize
, transforms some integrator into a new integrator with the same interface that can handle an infinite endpoint.
This implementation can only handle one endpoint at a time, and, the way it's written, both endpoints have to have the same sign. For an easier interface to this transformation, see infinite/evaluate-infinite-integral
in infinite.cljc
.
(defn infinitize
"Performs a variable substitution targeted at turning a single infinite endpoint
of an improper integral evaluation an (open) endpoint at 0 by applying the
following substitution:
$$u(t) = {1 \\over t}$$ $$du = {-1 \\over t^2}$$
This works when the integrand `f` falls off at least as fast as $1 \\over t^2$
as it approaches the infinite limit.
The returned function requires that `a` and `b` have the same sign, ie:
$$ab > 0$$
Transform the bounds with $u(t)$, and cancel the negative sign by changing
their order:
$$\\int_{a}^{b} f(x) d x=\\int_{1 / b}^{1 / a} \\frac{1}{t^{2}} f\\left(\\frac{1}{t}\\right) dt$$
References:
- Mathworld, \"Improper Integral\": https://mathworld.wolfram.com/ImproperIntegral.html
- Press, Numerical Recipes, Section 4.4: http://phys.uri.edu/nigh/NumRec/bookfpdf/f4-4.pdf"
[integrate]
(fn call
([f a b] (call f a b {}))
([f a b opts]
{:pre [(not
(and (qc/infinite? a)
(qc/infinite? b)))]}
(let [f' (fn [t]
(/ (f (/ 1.0 t))
(* t t)))
a' (if (qc/infinite? b) 0.0 (/ 1.0 b))
b' (if (qc/infinite? a) 0.0 (/ 1.0 a))
opts (qc/update-interval opts qc/flip)]
(integrate f' a' b' opts)))))
"To deal with an integral that has an integrable power-law singularity at its lower limit, one also makes a change of variable." (Press, p138)
A "power-law singularity" means that the integrand diverges as \((x - a)^{-\gamma}\) near \(x=a\).
We implement the following identity (from Press) if the singularity occurs at the lower limit:
\[\int_{a}^{b} f(x) d x=\frac{1}{1-\gamma} \int_{0}^{(b-a)^{1-\gamma}} t^{\frac{\gamma}{1-\gamma}} f\left(t^{\frac{1}{1-\gamma}}+a\right) d t \quad(b>a)\]
And this similar identity if the singularity occurs at the upper limit:
;;\[\int_{a}^{b} f(x) d x=\frac{1}{1-\gamma} \int_{0}^{(b-a)^{1-\gamma}} t^{\frac{\gamma}{1-\gamma}} f\left(b-t^{\frac{1}{1-\gamma}}\right) d t \quad(b>a)\]
If you have singularities at both sides, divide the interval at some interior breakpoint, take separate integrals for both sides and add the values back together.
(defn- inverse-power-law
"Implements a change of variables to address a power law singularity at the
lower or upper integration endpoint.
An \"inverse power law singularity\" means that the integrand diverges as
$$(x - a)^{-\\gamma}$$
near $x=a$. Passing true for `lower?` to specify a singularity at the lower
endpoint, false to signal an upper-endpoint singularity.
References:
- Mathworld, \"Improper Integral\": https://mathworld.wolfram.com/ImproperIntegral.html
- Press, Numerical Recipes, Section 4.4: http://phys.uri.edu/nigh/NumRec/bookfpdf/f4-4.pdf
- Wikipedia, \"Finite-time Singularity\": https://en.wikipedia.org/wiki/Singularity_(mathematics)#Finite-time_singularity
"
[integrate gamma lower?]
{:pre [(<= 0 gamma 1)]}
(fn call
([f a b] (call f a b {}))
([f a b opts]
(let [inner-pow (/ 1 (- 1 gamma))
gamma-pow (* gamma inner-pow)
a' 0
b' (Math/pow (- b a) (- 1 gamma))
t->t' (if lower?
(fn [t] (+ a (Math/pow t inner-pow)))
(fn [t] (- b (Math/pow t inner-pow))))
f' (fn [t] (* (Math/pow t gamma-pow)
(f (t->t' t))))]
(-> (integrate f' a' b' opts)
(update-in [:result] (partial * inner-pow)))))))
(defn inverse-power-law-lower [integrate gamma]
(inverse-power-law integrate gamma true))
(defn inverse-power-law-upper [integrate gamma]
(inverse-power-law integrate gamma false))
The next two functions specialize the inverse-power-law-*
functions to the common situation of an inverse power law singularity.
(defn inverse-sqrt-lower
"Implements a change of variables to address an inverse square root singularity
at the lower integration endpoint. Use this when the integrand diverges as
$$1 \\over {\\sqrt{x - a}}$$
near the lower endpoint $a$."
[integrate]
(fn call
([f a b] (call f a b {}))
([f a b opts]
(let [f' (fn [t] (* t (f (+ a (* t t)))))]
(-> (integrate f' 0 (Math/sqrt (- b a)) opts)
(update-in [:result] (partial * 2)))))))
(defn inverse-sqrt-upper
"Implements a change of variables to address an inverse square root singularity
at the upper integration endpoint. Use this when the integrand diverges as
$$1 \\over {\\sqrt{x - b}}$$
near the upper endpoint $b$."
[integrate]
(fn call
([f a b] (call f a b {}))
([f a b opts]
(let [f' (fn [t] (* t (f (- b (* t t)))))]
(-> (integrate f' 0 (Math/sqrt (- b a)) opts)
(update-in [:result] (partial * 2)))))))
From Press, section 4.4: "Suppose the upper limit of integration is infinite, and the integrand falls off exponentially. Then we want a change of variable that maps
\[\exp{-x} dx\]
into +- \(dt\) (with the sign chosen to keep the upper limit of the new variable larger than the lower limit)."
The required identity is:
\[\int_{x=a}^{x=\infty} f(x) d x=\int_{t=0}^{t=e^{-a}} f(-\log t) \frac{d t}{t}\]
(defn exponential-upper
"Implements a change of variables to address an exponentially diverging upper
integration endpoint. Use this when the integrand diverges as $\\exp{x}$ near
the upper endpoint $b$."
[integrate]
(fn call
([f a b] (call f a b {}))
([f a b opts]
{:pre [(qc/infinite? b)]}
(let [f' (fn [t] (* (- (Math/log t))
(/ 1 t)))
opts (qc/update-interval opts qc/flip)]
(integrate f 0 (Math/exp (- a)) opts)))))
A template for a combinator that enables infinite endpoints on any integrator, using variable substitution on an appropriate, tunable range.
(ns quadrature.infinite
(:require [clojure.core.match :refer [match]]
[quadrature.common :as qc]
[quadrature.substitute :as qs]))
This namespace holds an implementation of an "improper" integral combinator (for infinite endpoints) usable with any quadrature method in the library.
The implementation was inspired by evaluate-improper-integral
in numerics/quadrature/quadrature.scm
file in the scmutils package.
To evaluate an improper integral with an infinite endpoint, the improper
combinator applies an internal change of variables.
\[u(t) = {1 \over t}\] \[du = {-1 \over t^2}\]
This has the effect of mapping the endpoints from \(a, b\) to \({1 \over b}, {1 \over a}\). Here's the identity we implement:
\[\int_{a}^{b} f(x) d x=\int_{1 / b}^{1 / a} \frac{1}{t^{2}} f\left(\frac{1}{t}\right) dt\]
This is implemented by substitute/infinitize
.
The variable change only works as written when both endpoints are of the same sign; so, internally, improper
only applies the variable change to the segment of \(a, b\) from ##-Inf => (- :infinite-breakpoint)
and :infinite-breakpoint -> ##Inf
, where :infinite-breakpoint
is an argument the user can specify in the returned integrator's options map.
Any segment of \(a, b\) not in those regions is evaluated normally.
NOTE: The ideas in this namespace could be implemented for other variable substitutions (see substitute.cljc
) that only need to apply to certain integration intervals. The code below automatically cuts the range \((a, b)\) to accomodate this for the particular variable change we've baked in, but there is a more general abstraction lurking.
(defn improper
"Accepts:
- An `integrator` (function of `f`, `a`, `b` and `opts`)
- `a` and `b`, the endpoints of an integration interval, and
- (optionally) `opts`, a dict of integrator-configuring options
And returns a new integrator that's able to handle infinite endpoints. (If you
don't specify `##-Inf` or `##Inf`, the returned integrator will fall through
to the original `integrator` implementation.)
All `opts` will be passed through to the supplied `integrator`.
## Optional arguments relevant to `improper`:
`:infinite-breakpoint`: If either `a` or `b` is equal to `##Inf` or `##-Inf`,
this function will internally perform a change of variables on the regions
from:
`(:infinite-breakpoint opts) => ##Inf`
or
`##-Inf => (- (:infinite-breakpoint opts))`
using $u(t) = {1 \\over t}$, as described in the `infinitize` method of
`substitute.cljc`. This has the effect of mapping the infinite endpoint to an
open interval endpoint of 0.
Where should you choose the breakpoint? According to Press in Numerical
Recipes, section 4.4: \"At a sufficiently large positive value so that the
function funk is at least beginning to approach its asymptotic decrease to
zero value at infinity.\"
References:
- Press, Numerical Recipes (p138), Section 4.4: http://phys.uri.edu/nigh/NumRec/bookfpdf/f4-4.pdf"
[integrator]
(fn rec
([f a b] (rec f a b {}))
([f a b opts]
(let [{:keys [infinite-breakpoint] :as opts} (-> {:infinite-breakpoint 1}
(merge opts))
call (fn [integrate l r interval]
(let [m (qc/with-interval opts interval)]
(let [result (integrate f l r m)]
(:result result))))
ab-interval (qc/interval opts)
integrate (partial call integrator)
inf-integrate (partial call (qs/infinitize integrator))
r-break (Math/abs infinite-breakpoint)
l-break (- r-break)]
(match [[a b]]
[(:or [##-Inf ##-Inf] [##Inf ##Inf])]
{:converged? true
:terms-checked 0
:result 0.0}
[(:or [_ ##-Inf] [##Inf _])]
(-> (rec f b a opts)
(update-in [:result] -))
;; Break the region up into three pieces: a central closed core
;; and two open endpoints where we create a change of variables,
;; letting the boundary go to infinity. We use an OPEN interval on
;; the infinite side.
[[##-Inf ##Inf]]
(let [-inf->l (inf-integrate a l-break qc/open-closed)
l->r (integrate l-break r-break qc/closed)
r->+inf (inf-integrate r-break b qc/closed-open)]
{:converged? true
:result (+ -inf->l l->r r->+inf)})
;; If `b` lies to the left of the negative breakpoint, don't cut.
;; Else, cut the integral into two pieces at the negative
;; breakpoint and variable-change the left piece.
[[##-Inf _]]
(if (<= b l-break)
(inf-integrate a b ab-interval)
(let [-inf->l (inf-integrate a l-break qc/open-closed)
l->b (integrate l-break b (qc/close-l ab-interval))]
{:converged? true
:result (+ -inf->l l->b)}))
;; If `a` lies to the right of the positive breakpoint, don't cut.
;; Else, cut the integral into two pieces at the positive breakpoint
;; and variable-change the right piece.
[[_ ##Inf]]
(if (>= a r-break)
(inf-integrate a b ab-interval)
(let [a->r (integrate a r-break (qc/close-r ab-interval))
r->+inf (inf-integrate r-break b qc/closed-open)]
{:converged? true
:result (+ a->r r->+inf)}))
;; This is a lot of machinery to use with NON-infinite endpoints;
;; but for completeness, if you reach this level the fn will attempt
;; to integrate the full range directly using the original
;; integrator.
:else (integrator f a b opts))))))
The current implementation does not pass convergence information back up the line! Ideally we would merge results by:
:converged?
entries with and
(ns quadrature.adaptive
(:require [quadrature.common :as qc]
[quadrature.util.aggregate :as ua]))
This namespace holds an implementation of "adaptive quadrature" usable with any quadrature method in the library.
The implementation was inspired by the numerics/quadrature/rational.scm
file in the scmutils package. In that library, adaptive quadrature was special-cased to the Bulirsch-Stoer algorithm, ported in bulirsch_stoer.cljc
.
Most of the integrators in quadrature
work by successively refining an integration interval \(a, b\) down into evenly-spaced integration slices. Some functions are very well behaved in some regions, and then oscillate wildly in others.
Adaptive quadrature methods partition \(a, b\) into intervals of different sizes, allowing the integrator to drill in more closely to areas that need attention.
The adaptive
implementation below works like this:
integrate
function on the full \(a, b\), capping the iterations at some configurable limit (~*adaptive-maxterms*~below)integrate
fails to converge, split \(a, b\) into two intervals and attempt to converge both sides using integrate
common/narrow-slice?
, at which point integrate
returns an estimate for a single slice.The *adaptive-maxterms*
variable is dynamic, which means you can adjust the behavior of adaptive
by supplying the :maxterms
option directly, or wrapping your call in (binding [*adaptive-maxterms* 8] ,,,)
.
(def ^:dynamic *adaptive-maxterms* 10)
Symmetric intervals like \(-1, 1\) often show up with integrands with singularities right at the center of the midpoint. For this reason, adaptive
is able to customize its splitting behavior using the *neighborhood-width*
dynamic variable.
By default, when partitioning an interval, adaptive
will choose an interval within 5% of the midpoint. Override this behavior with the :adaptive-neighborhood-width
key in the options dict, or by binding this dynamic variable.
(def ^:dynamic *neighborhood-width* 0.05)
(defn- split-point
"Returns a point within`fuzz-factor` of the midpoint of the interval $[a, b]$.
`fuzz-factor` defaults to 0 (ie, `split-point` returns the midpoint)."
([a b] (split-point a b 0))
([a b fuzz-factor]
{:pre [(>= fuzz-factor 0)
(< fuzz-factor 1)]}
(let [width (- b a)
offset (if (zero? fuzz-factor)
0.5
(+ 0.5 (* fuzz-factor (dec (rand 2.0)))))]
(+ a (* offset width)))))
The implementation below takes two integrate functions, not the one described above. This allows us to handle open and closed intervals, instead of introducing open endpoints at every subdivision. All internal intervals that don't touch an open endpoint are considered closed.
(defn- fill-defaults
"Populates the supplied `opts` dictionary with defaults required by `adaptive`.
Two of these have values controlled by dynamic variables in `adaptive.cljc`."
[opts]
(merge {:maxterms *adaptive-maxterms*
:adaptive-neighborhood-width *neighborhood-width*
:interval qc/open}
opts))
(defn adaptive
"Accepts one or two 'integrators', ie, functions of:
- `f`: some integrand
- `a` and `b`: the lower and upper endpoints of integration
- `opts`, a dictionary of configuration options
And returns a new integrator that adaptively subdivides the region $a, b$ into
intervals if integration fails to converge. If two integrators are supplied,
the first is applied to any interval that's not explicitly closed on both
ends, and the second integrator is applied to explicitly closed intervals.
This behavior depends on the interval set in the supplied `opts` dictionary.
All `opts` will be passed through to the supplied `integrate` functions.
## Optional arguments relevant to `adaptive`:
`:maxterms`: defaults to `*adaptive-maxterms*`. This is passed to the
underlying integrators, and determines how long each interval attempts to
converge before a subdivision.
`:adaptive-neighborhood-width`: When dividing an interval, the split point
will be within this factor of the midpoint. Set `:adaptive-neighborhood-width`
to 0 for deterministic splitting."
([integrator] (adaptive integrator integrator))
([open-integrator closed-integrator]
(fn rec
([f a b] (rec f a b {}))
([f a b opts]
(let [opts (fill-defaults opts)
integrate (fn [l r interval]
(if (qc/closed? interval)
(closed-integrator f l r opts)
(open-integrator f l r opts)))]
(loop [stack [[a b (:interval opts)]]
sum (ua/kahan-sum)
iteration 0]
(if (empty? stack)
{:converged? true
:iterations iteration
:result (first sum)}
(let [[l r interval] (peek stack)
remaining (pop stack)
{:keys [converged? result]} (integrate l r interval)]
(if converged?
(recur remaining
(ua/kahan-sum sum result)
(inc iteration))
(let [midpoint (split-point l r (:adaptive-neighborhood-width opts))]
(recur (conj remaining
[midpoint r (qc/close-l interval)]
[l midpoint (qc/close-r interval)])
sum
(inc iteration))))))))))))
Iteration Limit
adaptive
runs until it completes, with no facility available to bail out of computation. An iteration limit would be a great addition… but it won't be efficient without some way of prioritizing high-error subintervals for refinement, as discussed next.
Priority Queue on Error
Another nice upgrade would be a version of adaptive
that is able to return an actual sequence of successive refinements. to the estimate. The current implementation uses an internal stack to track the subdivided intervals it needs to evaluate. If the integrator was able to provide an error estimate, we could instead use a priority queue, prioritized by error.
adaptive-sequence
could then return a sequence where every element processes a single refinement. Even without this upgrade, the priority queue idea would allow the estimate to converge quickly and be more accurate if we bailed out at some max number of iterations.
This article holds a related implementation.
#+end_{src}
Conclusion coming.
(ns quadrature.common
"Implements utilities shared by all integrators, for example:
- code to wrap a sequence of progressively better estimates in a common `integrator` interface
- data structures implementing various integration intervals."
#?(:cljs (:refer-clojure :exclude [infinite?]
:rename {infinite? core-infinite?}))
(:require [quadrature.util.stream :as us]
[taoensso.timbre :as log]))
This dynamic variable holds the default "roundoff cutoff" used by all integrators to decide when to return an estimate based on a single slice, vs attempting to converge a sequence of progressively finer estimates. When this condition is satisfied:
\[|b - a| / |a| + |b| <= ~cutoff~\]
An integrator will estimate a single slice directly. Else, it will attempt to converge a sequence.
(def ^:dynamic *roundoff-cutoff* 1e-14)
NOTE - we don't have an interface yet to bind this dynamic variable. bind it manually to modify the cutoff for a specific call to some integrator:
(binding [*roundoff-cutoff* 1e-6]
(integrate f a b))
Implementations of the various intervals used by the adaptive integral interface. By default, integration endpoints are considered open.
(def open [::open ::open])
(def closed [::closed ::closed])
(def open-closed [::open ::closed])
(def closed-open [::closed ::open])
(def infinities #{##Inf ##-Inf})
(def infinite?
#?(:cljs core-infinite?
:clj (comp boolean infinities)))
(defn closed?
"Returns true if the argument represents an explicit `closed` interval, false
otherwise."
[x] (= x closed))
(def open? (complement closed?))
These functions modify an interval by opening or closing either of its endpoints.
(defn close-l [[_ r]] [::closed r])
(defn close-r [[l _]] [l ::closed])
(defn open-l [[_ r]] [::open r])
(defn open-r [[l _]] [l ::open])
(defn flip [[l r]] [r l])
(defn interval
"Extracts the interval (or `open` as a default) from the supplied integration
options dict."
[opts]
(get opts :interval open))
(defn with-interval
"Sets the specified interval to a key inside the suppled `opts` map of arbitrary
integration options."
[opts interval]
(assoc opts :interval interval))
(defn update-interval
"Accepts:
- a dictionary of arbitrary options
- one of the 4 interval modification functions
and returns a dict of options with `f` applied to the contained interval (or
`open` if no interval is set).
"
[opts f]
(let [k :interval]
(assoc opts k (f (interval opts)))))
The following two functions define a shared interface that integration namespaces can use to create an "integrator" from:
The first function is called in the case that the integration range \((a, b)\) (open or closed) is too fine for subdivision. The second function takes over in all other (most!) cases.
(defn- narrow-slice?
"Returns true if the range $[a, b]$ is strip narrow enough to pass the following
test:
|b - a| / |a| + |b| <= `cutoff`
False otherwise. This inequality measures how close the two floating point
values are, scaled by the sum of their magnitudes."
[a b cutoff]
(let [sum (+ (Math/abs a)
(Math/abs b))]
(or (<= sum cutoff)
(<= (Math/abs (- b a))
(* cutoff sum)))))
(defn make-integrator-fn
"Generates an `integrator` function from two functions with the following
signatures and descriptions:
- `(area-fn f a b)` estimates the integral of `f` over the interval `(a, b)`
with no subdivision, nothing clever at all.
- `(seq-fn f a b opts)` returns a sequence of successively refined estimates
of the integral of `f` over `(a, b)`. `opts` can contain kv pairs that
configure the behavior of the sequence function (a sequence of the number of
integration slices to use, for example.)
The returned function has the signature:
`(f a b opts)`
All `opts` are passed on to `seq-fn`, /and/ to `us/seq-limit` internally,
where the options configure the checks on sequence convergence."
[area-fn seq-fn]
(fn call
([f a b] (call f a b {}))
([f a b {:keys [roundoff-cutoff]
:or {roundoff-cutoff *roundoff-cutoff*}
:as opts}]
(if (narrow-slice? a b roundoff-cutoff)
(do (log/info "Integrating narrow slice: " a b)
{:converged? true
:terms-checked 1
:result (area-fn f a b)})
(-> (seq-fn f a b opts)
(us/seq-limit opts))))))
(defn- name-with-attributes
"Taken from `clojure.tools.macro/name-with-attributes`.
Handles optional docstrings and attribute maps for a name to be defined in a
list of macro arguments. If the first macro argument is a string, it is added
as a docstring to name and removed from the macro argument list. If afterwards
the first macro argument is a map, its entries are added to the name's
metadata map and the map is removed from the macro argument list. The return
value is a vector containing the name with its extended metadata map and the
list of unprocessed macro arguments."
([name body] (name-with-attributes name body {}))
([name body meta]
(let [[docstring body] (if (string? (first body))
[(first body) (next body)]
[nil body])
[attr body] (if (map? (first body))
[(first body) (next body)]
[{} body])
attr (merge meta attr)
attr (if docstring
(assoc attr :doc docstring)
attr)
attr (if (meta name)
(conj (meta name) attr)
attr)]
[(with-meta name attr) body])))
(defmacro defintegrator
"Helper macro for defining integrators."
[sym & body]
(let [meta {:arglists (list 'quote '([f a b] [f a b opts]))}
[sym body] (name-with-attributes sym body meta)
{:keys [area-fn seq-fn]} (apply hash-map body)]
(assert seq-fn (str "defintegrator " sym ": seq-fn cannot be nil"))
(assert area-fn (str "defintegrator " sym ": area-fn cannot be nil"))
`(def ~sym
(make-integrator-fn ~area-fn ~seq-fn))))
(ns quadrature.util.stream
"This namespace contains various standard sequences, as well as utilities for
working with strict and lazy sequences."
(:require [clojure.pprint :as pp]
[sicmutils.generic :as g]
[sicmutils.value :as v]))
(defn pprint
"Realizes and pretty-prints `n` elements from the supplied sequence `xs`."
[n xs]
(doseq [x (take n xs)]
(pp/pprint x)))
(defn powers
"Returns an infinite sequence of `x * n^i`, starting with i == 0. `x` defaults
to 1."
([n] (powers n 1))
([n x] (iterate #(* n %) x)))
(defn zeno
"Returns an infinite sequence of x / n^i, starting with i == 0. `x` defaults to
1."
([n] (zeno n 1))
([n x] (iterate #(/ % n) x)))
(defn scan
"Returns a function that accepts a sequence `xs`, and performs a scan by:
- Aggregating each element of `xs` into
- the initial value `init`
- transforming each intermediate result by `present` (defaults to `identity`).
The returned sequence contains every intermediate result, after passing it
through `present` (not `init`, though).
This is what distinguishes a scan from a 'fold'; a fold would return the final
result. A fold isn't appropriate for aggregating infinite sequences, while a
scan works great.
Arities:
- the three-arity version takes `init` value, `f` fold function and `present`.
- the two arity version drops `init`, and instead calls `f` with no arguments.
The return value is the initial aggregator.
- The 1-arity version only takes the `merge` fn and defaults `present` to
`identity`.
"
[f & {:keys [present init]
:or {present identity}}]
(let [init (or init (f))]
(fn [xs]
(->> (reductions f init xs)
(map present)
(rest)))))
This convergence tester comes from Gerald Sussman's "Abstraction in Numerical Methods".
We're planning on adding a number of these here and consolidating all of the available ideas about relative and maximum tolerance so we can share (and combine) them across different stream functions.
(defn- close-enuf?
"relative closeness, transitioning to absolute closeness when we get
significantly smaller than 1."
[tolerance]
(fn [h1 h2]
(<= (g/abs (- h1 h2))
(* 0.5 tolerance (+ 2 (g/abs h1) (g/abs h2))))))
I have a dream that a function like seq-limit
could service most of the numerical methods in this library. Function minimization, root finding, definite integrals and numerical derivatives can all be expressed as successive approximations, with convergence tests (or other stopping conditions) checked and applied between iterations.
As of October 2020 we use this exclusively for various numerical integration routines. But it has more promise than this!
(defn seq-limit
"Accepts a sequence, iterates through it and returns a dictionary of this form:
{:converged? <boolean>
:terms-checked <int>
:result <sequence element>}
`:converged?` is true if the sequence reached convergence by passing the tests
described below, false otherwise.
`:terms-checked` will be equal to the number of items examined in the
sequence.
`:result` holds the final item examined in the sequence.
## Optional keyword args:
`:convergence-fn` user-supplied function of two successive elements in `xs`
that stops iteration and signals convergence if it returns true.
`:fail-fn` user-supplied function of two successive elements in `xs` that
stops iteration and signals NO convergence (failure!) if it returns true.
`:minterms` `seq-limit` won't return until at least this many terms from the
sequence have been processed.
`:maxterms` `seq-limit` will return (with `:converged? false`) after
processing this many elements without passing any other checks.
`:tolerance` A combination of relative and absolute tolerance. defaults to
`sqrt(machine epsilon)`."
([xs] (seq-limit xs {}))
([xs {:keys [minterms
maxterms
tolerance
convergence-fn
fail-fn]
:or {minterms 2
tolerance (Math/sqrt v/machine-epsilon)
convergence-fn (close-enuf? tolerance)}}]
(if (empty? xs)
{:converged? false
:terms-checked 0
:result nil}
(let [stop? (if maxterms
(fn [i] (>= i maxterms))
(constantly false))]
(loop [[x1 & [x2 :as more]] xs
terms-checked 1]
(if (empty? more)
{:converged? false
:terms-checked terms-checked
:result x1}
(let [terms-checked (inc terms-checked)
converged? (convergence-fn x1 x2)]
(if (and (>= terms-checked minterms)
(or converged?
(stop? terms-checked)))
{:converged? converged?
:terms-checked terms-checked
:result x2}
(recur more terms-checked)))))))))
(ns quadrature.util.aggregate
"Utilities for aggregating sequences.")
I learned about "Kahan's summation trick" from rational.scm
in the scmutils
package, where it shows up in the sigma
function.
(defn kahan-sum
"Implements a fold that tracks the summation of a sequence of floating point
numbers, using Kahan's trick for maintaining stability in the face of
accumulating floating point errors."
([] [0.0 0.0])
([[sum c] x]
(let [y (- x c)
t (+ sum y)]
[t (- (- t sum) y)])))
(defn sum
"Sums either:
- a series `xs` of numbers, or
- the result of mapping function `f` to `(range low high)`
Using Kahan's summation trick behind the scenes to keep floating point errors
under control."
([xs]
(first
(reduce kahan-sum [0.0 0.0] xs)))
([f low high]
(sum (map f (range low high)))))
(defn scanning-sum
"Returns every intermediate summation from summing either:
- a series `xs` of numbers, or
- the result of mapping function `f` to `(range low high)`
Using Kahan's summation trick behind the scenes to keep floating point errors
under control."
([xs]
(->> (reductions kahan-sum [0.0 0.0] xs)
(map first)
(rest)))
([f low high]
(scanning-sum
(map f (range low high)))))
]]>SICMUtils is the engine behind the wonderful "Structure and Interpretation of Classical Mechanics", an advanced physics textbook by Gerald Sussman of SICP fame. I'm
]]>SICMUtils is the engine behind the wonderful "Structure and Interpretation of Classical Mechanics", an advanced physics textbook by Gerald Sussman of SICP fame. I'm trying to get the textbook running in the browser as a Clojurescript library, and as part of that effort I've had to re-implement quite a bit of numerical code in Clojure.
Can I admit that I've gone deeper than necessary? After I'd gotten to know the library, I noted that the power series implementation could use some work. Jack Rusher pointed me at a gorgeous power series implementation by Doug McIlroy (my Uncle's boss at Bell Labs!). See his website for 10-lines version of beautiful Haskell.
Doug's paper, "Power Series, Power Serious", goes into more detail on the implementation, which leans hard on Haskell's lazy evaluation. Clojure has lazy sequences too! I implemented the paper here, then went further and made power series executable, and extended them to play with the whole generic arithmetic system in this namespace.
This post is a rendered, literate programming version of impl.cljc
, the core power series implementation based on Doug's work. It's exactly the same code we use in the library.
Enjoy!
The following code builds up a power series implementation that backs two sicmutils
types:
Series
, which represents a generic infinite series of arbitrary values, andPowerSeries
, a series that represents a power series in a single variable; in other words, a series where the nth entry is interpreted as the coefficient of \(x^n\):\begin{equation}
\label{eq:4}
[a\ b\ c\ d\ ...] = a + bx + cx^2 + dx^3 + ...
\end{equation}
We'll proceed by building up implementations of the arithmetic operations +
, -
, *
, /
and a few others using bare Clojure lazy sequences.
The implementation follows Doug McIlroy's beautiful paper, "Power Series, Power Serious". Doug also has a 10-line version in Haskell on his website.
Let's go!
A 'series' is an infinite sequence of numbers, represented by Clojure's lazy sequence. First, a function ->series
that takes some existing sequence, finite or infinite, and coerces it to an infinite seq by concatenating it with an infinite sequence of zeros. (We use v/zero-like
so that everything plays nicely with generic arithmetic.)
(defn ->series
"Form the infinite sequence starting with the supplied values. The
remainder of the series will be filled with the zero-value
corresponding to the first of the given values."
[xs]
(lazy-cat xs (repeat (v/zero-like (first xs)))))
(take 10 (->series [1 2 3 4]))
(1 2 3 4 0 0 0 0 0 0)
The core observation we'll use in the following definitions (courtesy of McIlroy) is that a power series \(F\) in a variable \(x\):
\begin{equation}
F(x)=f_{0}+x f_{1}+x^{2} f_{2}+\cdots
\end{equation}
Decomposes into a head element \(f_0\) plus a tail series, multiplied by \(x\):
\begin{equation}
\label{eq:3}
F(x) = F_0(x) = f_0 + x F_1(x)
\end{equation}
We'll use this observation to derive the more complicated sequence operations below.
To negate a series, negate each element:
(defn negate [xs]
(map g/negate xs))
Example:
(take 7 (negate (->series [1 2 3 4])))
(-1 -2 -3 -4 0 0 0)
We can derive series addition by expanding the series \(F\) and \(G\) into head and tail and rearranging terms:
\begin{equation}
\label{eq:5}
F+G=\left(f+x F_{1}\right)+\left(g+xG_{1}\right)=(f+g)+x\left(F_{1}+G_{1}\right)
\end{equation}
This is particularly straightforward in Clojure, where map
already merges sequences elementwise:
(defn seq:+ [f g]
(map g/+ f g))
(take 5 (seq:+ (range) (range)))
(0 2 4 6 8)
A constant is a series with its first element populated, all zeros otherwise. To add a constant to another series we only need add it to the first element. Here are two versions, constant-on-left vs constant-on-right:
(defn c+seq [c f]
(lazy-seq
(cons (g/+ c (first f)) (rest f))))
(defn seq+c [f c]
(lazy-seq
(cons (g/+ (first f) c) (rest f))))
(let [series (->series [1 2 3 4])]
[(take 6 (seq+c series 10))
(take 6 (c+seq 10 series))])
[(11 2 3 4 0 0) (11 2 3 4 0 0)]
Subtraction comes for free from the two definitions above:
(defn seq:- [f g]
(seq:+ f (negate g)))
(take 5 (seq:- (range) (range)))
(0 0 0 0 0)
We should get equivalent results from mapping g/-
over both sequences, and in almost all cases we do… but until we understand and fix this bug that method would return different results.
Subtract a constant from a sequence by subtracting it from the first element:
(defn seq-c [f c]
(lazy-seq
(cons (g/- (first f) c) (rest f))))
(take 5 (seq-c (range) 10))
(-10 1 2 3 4)
To subtract a sequence from a constant, subtract the first element as before, but negate the tail of the sequence:
(defn c-seq [c f]
(lazy-seq
(cons (g/- c (first f)) (negate (rest f)))))
(take 5 (c-seq 10 (range)))
(10 -1 -2 -3 -4)
What does it mean to multiply two infinite sequences? As McIlroy notes, multiplication is where the lazy-sequence-based approach really comes into its own.
First, the simple cases of multiplication by a scalar on either side of a sequence:
(defn seq*c [f c] (map #(g/mul % c) f))
(defn c*seq [c f] (map #(g/mul c %) f))
To multiply sequences, first recall from above that we can decompose each sequence \(F\) and \(G\) into a head and tail.
Mutiply the expanded representations out and rearrange terms:
\begin{equation}
\label{eq:6}
F \times G=\left(f+x F_{1}\right) \times\left(g+x G_{1}\right)=f g+x\left(f G_{1}+F_{1} \times G\right)
\end{equation}
\(G\) appears on the left and the right, so use an inner function that closes over \(g\) to simplify matters, and rewrite the above definition in Clojure:
(defn seq:* [f g]
(letfn [(step [f]
(lazy-seq
(let [f*g (g/mul (first f) (first g))
f*G1 (c*seq (first f) (rest g))
F1*G (step (rest f))]
(cons f*g (seq:+ f*G1 F1*G)))))]
(step f)))
This works just fine on two infinite sequences:
(take 10 (seq:* (range) (->series [4 3 2 1])))
(0 4 11 20 30 40 50 60 70 80)
NOTE This is also called the "Cauchy Product" of the two sequences. The description on the Wikipedia page has complicated index tracking that simply doesn't come in to play with the stream-based approach. Amazing!
The quotient \(Q\) of \(F\) and \(G\) should satisfy:
\begin{equation}
\label{eq:7}
F = Q \times G
\end{equation}
From McIlroy, first expand out \(F\), \(Q\) and one instance of \(G\):
\begin{equation}
\begin{aligned}
f+x F_{1} &=\left(q+x Q_{1}\right) \times G \cr
&=q G+x Q_{1} \times G=q\left(g+x G_{1}\right)+x Q_{1} \times G \cr
&=q g+x\left(q G_{1}+Q_{1} \times G\right)
\end{aligned}
\end{equation}
Look at just the constant terms and note that \(q = \frac{f}{g}\).
Consider the terms multiplied by \(x\) and solve for \(Q_1\):
\begin{equation}
\label{eq:8}
Q_1 = \frac{(F_1 - qG_1)}{G}
\end{equation}
There are two special cases to consider:
Encoded in Clojure:
(defn div [f g]
(lazy-seq
(let [f0 (first f) fs (rest f)
g0 (first g) gs (rest g)]
(cond (and (v/nullity? f0) (v/nullity? g0))
(div fs gs)
(v/nullity? f0)
(cons f0 (div fs g))
(v/nullity? g0)
(u/arithmetic-ex "ERROR: denominator has a zero constant term")
:else (let [q (g/div f0 g0)]
(cons q (-> (seq:- fs (c*seq q gs))
(div g))))))))
A simple example shows success:
(let [series (->series [0 0 0 4 3 2 1])]
(take 5 (div series series)))
(1 0 0 0 0)
We could generate the reciprocal of \(F\) by dividing \((1, 0, 0, ...)\) by \(F\). Page 21 of an earlier paper by McIlroy gives us a more direct formula.
We want \(R\) such that \(FR = 1\). Expand \(F\):
\begin{equation}
\label{eq:9}
(f + xF_1)R = 1
\end{equation}
Solve for R:
\begin{equation}
\label{eq:10}
R = \frac{1}{f} (1 - x(F_1 R))
\end{equation}
A recursive definition is no problem in the stream abstraction:
(defn invert [f]
(lazy-seq
(let [finv (g/invert (first f))
F1*Finv (seq:* (rest f) (invert f))
tail (c*seq finv (negate F1*Finv))]
(cons finv tail))))
This definition of invert
matches the more straightforward division implementation:
(let [series (iterate inc 3)]
(= (take 5 (invert series))
(take 5 (div (->series [1]) series))))
true
An example:
(let [series (iterate inc 3)]
[(take 5 (seq:* series (invert series)))
(take 5 (div series series))])
[(1N 0N 0N 0N 0N) (1 0 0 0 0)]
Division of a constant by a series comes easily from our previous multiplication definitions and invert
:
(defn c-div-seq [c f]
(c*seq c (invert f)))
It's not obvious that this works:
(let [nats (iterate inc 1)]
(take 6 (c-div-seq 4 nats)))
(4 -8 4 0 0 0)
But we can recover the initial series:
(let [nats (iterate inc 1)
divided (c-div-seq 4 nats)
seq-over-4 (invert divided)
original (seq*c seq-over-4 4)]
(take 5 original))
(1N 2N 3N 4N 5N)
To divide a series by a constant, divide each element of the series:
(defn seq-div-c [f c]
(map #(g// % c) f))
Division by a constant undoes multiplication by a constant:
(let [nats (iterate inc 1)]
(take 5 (seq-div-c (seq*c nats 2) 2)))
(1 2 3 4 5)
To compose two series \(F(x)\) and \(G(x)\) means to create a new series \(F(G(x))\). Derive this by substuting \(G\) for \(x\) in the expansion of \(F\):
\begin{equation}
\begin{aligned}
F(G)&=f+G \times F_{1}(G) \cr
&=f+\left(g+x G_{1}\right) \times F_{1}(G) \cr
&=\left(f+g F_{1}(G)\right)+x G_{1} \times F_{1}(G)
\end{aligned}
\end{equation}
For the stream-based calculation to work, we need to be able to calculate the head element and attach it to an infinite tail; unless \(g=0\) above the head element depends on \(F_1\), an infinite sequence.
If \(g=0\) the calculation simplifies:
\begin{equation}
\label{eq:12}
F(G)=f + x G_{1} \times F_{1}(G)
\end{equation}
In Clojure, using an inner function that captures \(G\):
(defn compose [f g]
(letfn [(step [f]
(lazy-seq
;; NOTE I don't understand why we get a StackOverflow if I move
;; this assertion out of the ~letfn~.
(assert (zero? (first g)))
(let [[f0 & fs] f
gs (rest g)
tail (seq:* gs (step fs))]
(cons f0 tail))))]
(step f)))
Composing \(x^2 = (0, 0, 1, 0, 0, ...)\) should square all $x$s, and give us a sequence of only even powers:
(take 10 (compose (repeat 1)
(->series [0 0 1])))
(1 0 1 0 1 0 1 0 1 0)
The functional inverse of a power series \(F\) is a series \(R\) that satisfies \(F(R(x)) = x\).
Following McIlroy, we expand \(F\) (substituting \(R\) for \(x\)) and one occurrence of \(R\):
\begin{equation}
\label{eq:13}
F(R(x))=f+R \times F_{1}(R)=f+\left(r+x R_{1}\right) \times F_{1}(R)=x
\end{equation}
Just like in the composition derivation, in the general case the head term depends on an infinite sequence. Set \(r=0\) to address this:
\begin{equation}
\label{eq:14}
f+x R_{1} \times F_{1}(R)=x
\end{equation}
For this to work, the constant \(f\) must be 0 as well, hence
\begin{equation}
\label{eq:15}
R_1 = \frac{1}{F_1(R)}
\end{equation}
This works as an implementation because \(r=0\). \(R_1\) is allowed to reference \(R\) thanks to the stream-based approach:
(defn revert [f]
{:pre [(zero? (first f))]}
(letfn [(step [f]
(lazy-seq
(let [F1 (rest f)
R (step f)]
(cons 0 (invert
(compose F1 R))))))]
(step f)))
An example, inverting a series starting with 0:
(let [f (cons 0 (iterate inc 1))]
(take 5 (compose f (revert f))))
(0 1 0 0 0)
Derivatives of power series are simple and mechanical:
\begin{equation}
\label{eq:16}
D(a x^n) = aD(x^n) = a n x^{n-1}
\end{equation}
Implies that all entries shift left by 1, and each new entry gets multiplied by its former index (ie, its new index plus 1).
(defn deriv [f]
(map g/* (rest f) (iterate inc 1)))
(take 6 (deriv (repeat 1)))
(1 2 3 4 5 6)
Which of course we interpret as
\begin{equation}
\label{eq:17}
1 + 2x + 3x^2 + ...
\end{equation}
The definite integral \(\int_0^{x}F(t)dt\) is similar. To take the anti-derivative of each term, move it to the right by appending a constant term onto the sequence and divide each element by its new position:
(defn integral
([s] (integral s 0))
([s constant-term]
(cons constant-term
(map g/div s (iterate inc 1)))))
With a custom constant term:
(take 6 (integral (iterate inc 1) 5))
(5 1 1 1 1 1)
By default, the constant term is 0:
(take 6 (integral (iterate inc 1)))
(0 1 1 1 1 1)
Exponentiation of a power series by some integer is simply repeated multiplication. The implementation here is more efficient the iterating seq:*
, and handles negative exponent terms by inverting the original sequence.
(defn expt [s e]
(letfn [(expt [base pow]
(loop [n pow
y (->series [1])
z base]
(let [t (even? n)
n (quot n 2)]
(cond
t (recur n y (seq:* z z))
(zero? n) (seq:* z y)
:else (recur n (seq:* z y) (seq:* z z))))))]
(cond (pos? e) (expt s e)
(zero? e) (->series [1])
:else (invert (expt s (g/negate e))))))
We can use expt
to verify that \((1+x)^3\) expands to \(1 + 3x + 3x^2 + x^3\):
(take 5 (expt (->series [1 1]) 3))
(1 3 3 1 0)
The square root of a series \(F\) is a series \(Q\) such that \(Q^2 = F\). We can find this using our calculus methods from above:
\begin{equation}
\label{eq:18}
D(F) = 2Q D(Q)
\end{equation}
or
\begin{equation}
\label{eq:19}
D(Q) = \frac{D(F)}{2Q}
\end{equation}
When the head term of \(F\) is nonzero, ie, \(f \neq 0\), the head of \(Q = \sqrt{F}\) must be \(\sqrt{f}\) for the multiplication to work out.
Integrate both sides:
\begin{equation}
\label{eq:20}
Q = \sqrt{f} + \int_0^x \frac{D(F)}{2Q}
\end{equation}
One optimization appears if the first two terms of \(F\) vanish, ie, \(F=x^2F_2\). In this case \(Q = 0 + x \sqrt{F_2}\).
Here it is in Clojure:
(defn sqrt [[f1 & [f2 & fs] :as f]]
(if (and (v/nullity? f1)
(v/nullity? f2))
(cons f1 (sqrt fs))
(let [const (g/sqrt f1)
step (fn step [g]
(lazy-seq
(-> (div (deriv g)
(c*seq 2 (step g)))
(integral const))))]
(step f))))
And a test that we can recover the naturals:
(let [xs (iterate inc 1)]
(take 6 (seq:* (sqrt xs)
(sqrt xs))))
(1 2 3 4 5 6)
We can maintain precision of the first element is the square of a rational number:
(let [xs (iterate inc 9)]
(take 6 (seq:* (sqrt xs)
(sqrt xs))))
(9 10N 11N 12N 13N 14N)
We get a correct result if the sequence starts with \(0, 0\):
(let [xs (concat [0 0] (iterate inc 9))]
(take 6 (seq:* (sqrt xs)
(sqrt xs))))
(0 0 9 10N 11N 12N)
Power series computations can encode polynomial computations. Encoding \((1-2x^2)^3\) as a power series returns the correct result:
(take 10 (expt (->series [1 0 -2]) 3))
(1 0 -6 0 12 0 -8 0 0 0)
Encoding \(\frac{1}{(1-x)}\) returns the power series $1 + x + x^{2} + …$ which sums to that value in its region of convergence:
(take 10 (div (->series [1])
(->series [1 -1])))
(1 1 1 1 1 1 1 1 1 1)
\(\frac{1}{(1-x)^2}\) is the derivative of the above series:
(take 10 (div (->series [1])
(-> (->series [1 -1])
(expt 2))))
(1 2 3 4 5 6 7 8 9 10)
With the above primitives we can define a number of series with somewhat astonishing brevity.
\(e^x\) is its own derivative, so \(e^x = 1 + e^x\):
(def expx
(lazy-seq
(integral expx 1)))
This bare definition is enough to generate the power series for \(e^x\):
(take 10 expx)
(1 1 1/2 1/6 1/24 1/120 1/720 1/5040 1/40320 1/362880)
\(sin\) and \(cos\) afford recursive definitions. \(D(sin) = cos\) and \(D(cos) = -sin\), so (with appropriate constant terms added) on:
(declare cosx)
(def sinx (lazy-seq (integral cosx)))
(def cosx (lazy-seq (c-seq 1 (integral sinx))))
(take 10 sinx)
(0 1 0 -1/6 0 1/120 0 -1/5040 0 1/362880)
(take 10 cosx)
(1 0 -1/2 0 1/24 0 -1/720 0 1/40320 0)
tangent and secant come easily from these:
(def tanx (div sinx cosx))
(def secx (invert cosx))
Reversion lets us define arcsine from sine:
(def asinx (revert sinx))
(def atanx (integral (cycle [1 0 -1 0])))
These two are less elegant, perhaps:
(def acosx (c-seq (g/div 'pi 2) asinx))
(def acotx (c-seq (g/div 'pi 2) atanx))
The hyperbolic trig functions are defined in a similar way:
(declare sinhx)
(def coshx (lazy-seq (integral sinhx 1)))
(def sinhx (lazy-seq (integral coshx)))
(def tanhx (div sinhx coshx))
(def asinhx (revert sinhx))
(def atanhx (revert tanhx))
(def log1-x
(integral (repeat -1)))
;; https://en.wikipedia.org/wiki/Mercator_series
(def log1+x
(integral (cycle [1 -1])))
These are a few more examples from McIlroy's "Power Serious" paper, presented here without context.
(def catalan
(lazy-cat [1] (seq:* catalan catalan)))
(take 10 catalan)
(1 1 2 5 14 42 132 429 1430 4862)
ordered trees…
(declare tree' forest' list')
(def tree' (lazy-cat [0] forest'))
(def list' (lazy-cat [1] list'))
(def forest' (compose list' tree'))
(take 10 tree')
(0 1 1 2 5 14 42 132 429 1430)
The catalan numbers again!
(def fib (lazy-cat [0 1] (map + fib (rest fib))))
See here for the recurrence relation: https://en.wikipedia.org/wiki/Binomial_coefficient#Multiplicative_formula
(defn binomial* [n]
(letfn [(f [acc prev n k]
(if (zero? n)
acc
(let [next (/ (* prev n) k)
acc' (conj! acc next)]
(recur acc' next (dec n) (inc k)))))]
(persistent!
(f (transient [1]) 1 n 1))))
(defn binomial
"The coefficients of (1+x)^n"
[n]
(->series (binomial* n)))
]]>CLJSJS publishes many Javascript packages in a form that you can consume from a Clojure
]]>CLJSJS publishes many Javascript packages in a form that you can consume from a Clojure project. For projects like React, you'll find the latest versions of the JS libraries, packaged up and ready to go. For less active libraries like bignumber.js you might have to go bump a version and open up a pull request against CLJSJS's packages
repository... or maybe package and add the library from scratch, as I did recently with Complex.js.
I have to say that this was a complete pain. I'll eventually write a guide describing what I had to do to upgrade or add six JS libraries, since this was not easy and I know it's blocking other people.
For now, I want to describe the steps I had to take to publish a namespaces version of cljsjs/complex
to Clojars, so that I could start using it immediately without waiting for the maybe-weeks-long turnaround time of a new CLJSJS release.
Clojars.org is the place of record for all Clojure dependencies. It's the easiest way I know of to get a library published for consumption in other projects.
The goal here is to publish a CLJSJS library that you'll eventually depend on like this:
[cljsjs/complex "2.0.11-0"]
With a prefix, so that you can use it today, like this:
[org.clojars.sritchie09/complex "2.0.11-0"]
(The prefixing is important because you can't publish with the cljsjs
prefix unless you're authorized by the CLJSJS maintainers.)
You'll publish the new dependency to Clojars. Head to https://clojars.org and make an account.
Click on "Deploy Tokens" page at the top right:
Follow the instructions and make a token (the name doesn't matter). It will appear on the screen looking something like CLOJARS_1313123123
. Save it (and your Clojars username) by adding two entries like this to your ~/.bashrc
or ~/.bash_profile
:
export CLOJARS_USER=sritchie09
export CLOJARS_PASS=CLOJARS_1313123123
source ~/.bashrc
to pick up the new environment variables.The CLJSJS packages live in this repository. Get it onto your machine by running this command in some directory:
git clone git@github.com:cljsjs/packages.git && cd packages
Now find the dependency you want to release. If you're doing this, you're probably living on a git branch, like I was for the Complex.js pull request. Check out the branch:
git checkout sritchie/complex_dep2
Each JS library has its own folder with a build.boot
file in it. Open up, for example, complex/build.boot
and locate the entry that looks like this:
(task-options!
push {:ensure-clean false}
pom {:project 'cljsjs/complex
:version +version+
:description "A well tested JavaScript library to work with complex number arithmetic in JavaScript."
:url "https://github.com/infusion/Complex.js"
:license {"MIT" "http://opensource.org/licenses/MIT"}
:scm {:url "https://github.com/cljsjs/packages"}})
Change the value referenced by :project
from 'cljsjs/complex
to 'org.clojars.YOUR_CLOJARS_USERNAME/complex
. This was org.clojars.sritchie09/complex
, in my case.
We're so close.
The latest version of Boot won't let you deploy to Clojars (dependency problem!). You can force yourself back to a working version by creating a file called boot.properties
in the packages
directory with these contents:
BOOT_VERSION=2.8.2
Finally, run this command in the subfolder containing the dependency you want to release. complex
, in this case:
boot package push --ensure-release --repo clojars \
--repo-map "{:url \"https://clojars.org/repo/\" :username \"$CLOJARS_USER\" :password \"$CLOJARS_PASS\"}"
If you've set the environment variables correctly, you should see output ending with this:
Writing pom.xml and pom.properties...
Writing complex-2.0.11-0.jar...
Checksums match
Deploying complex-2.0.11-0.jar...
Sending complex-2.0.11-0.jar to https://clojars.org/repo/ (13k)
Sending complex-2.0.11-0.pom to https://clojars.org/repo/ (1k)
Sending maven-metadata.xml to https://clojars.org/repo/ (1k)
That's it! Your dependency will now be live at a URL like https://clojars.org/org.clojars.sritchie09/complex, and you'll be able to include it as a dependency using any of the forms that Clojars displays on that page. For example:
[org.clojars.sritchie09/complex "2.0.11-0"]
Hopefully this has saved you some trouble. Enjoy!
If you see output like this:
clojure.lang.ExceptionInfo: Failed to deploy artifacts: Could not transfer artifact org.clojars.sritchie09:complex:jar:2.0.11-0 from/to clojars (https://clojars.org/repo/): Failed to transfer file: https://clojars.org/repo/org/clojars/sritchie09/complex/2.0.11-0/complex-2.0.11-0.jar. Return code is: 401, ReasonPhrase: Unauthorized.
Make sure that you've properly set the environment variables $CLOJARS_USER
and $CLOJARS_PASS
.
(This is a writeup of Exercise 1.3 from Sussman and Wisdom's "Structure and Interpretation of Classical Mechanics". See the solutions repository for more.)
The problem explores some consequences for optics of the principle of least time. The
]]>(This is a writeup of Exercise 1.3 from Sussman and Wisdom's "Structure and Interpretation of Classical Mechanics". See the solutions repository for more.)
The problem explores some consequences for optics of the principle of least time. The exercise states:
Fermat observed that the laws of reflection and refraction could be accounted for by the following facts: Light travels in a straight line in any particular medium with a velocity that depends upon the medium. The path taken by a ray from a source to a destination through any sequence of media is a path of least total time, compared to neighboring paths. Show that these facts imply the laws of reflection and refraction.
The law of reflection is described in the footnote:
For reflection the angle of incidence is equal to the angle of reflection.
Here's the setup. The horizontal line is a mirror. The law states that \(\theta_1 = \theta_2\).
We have to show that if we consider all possible paths from a given starting point to a given endpoint, the path of minimum time will give us the law of reflection.
The actual path of minimum time is the straight line that avoids the mirror, of course. If we force the light to bounce off of the mirror, then we have to figure out where it will hit, where \(x_p\) is, to minimize the time between the start and end points.
There are two ways to solve this problem. We can use geometry and visual intuition, or we can use calculus.
First, recall this fact from the problem text:
Light travels in a straight line in any particular medium with a velocity that depends upon the medium.
There's no medium change, so if there were no mirror in its path, the light beam would continue in a straight line. Instead of figuring out what the beam will do when it hits the mirror, reflect the endpoint across the mirror and draw a straight line between the start and "end" points:
The angle that the beam makes with the plane of the mirror is the same on both sides of the mirror.
Now reflect the the "end" point and the segment of the beam that's crossed the mirror back up. By symmetry, \(\theta_1 = \theta_2\), and we've proved the law of reflection.
We can also solve this with calculus. Because the beam doesn't change media, its speed \(v\) stays constant, so minimizing the total distance \(d\) is equivalent to minimizing the time \(t = {d \over v}\).
Set \(x_1 = 0\) for convenience, and write the total distance the light travels as a function of \(x_p\):
\begin{equation}
d(x_p) = \sqrt{y_1^2 + x_p^2} + \sqrt{(x_2 - x_p)^2 + y_2^2}
\end{equation}
For practice, we can also define this function in Scheme.
(define ((total-distance x1 y1 x2 y2) xp)
(+ (sqrt (+ (square (+ x1 xp))
(square y1)))
(sqrt (+ (square (- x2 (+ x1 xp)))
(square y2)))))
Here's the function again, generated from code, with general \(t_1\):
(->tex-equation
((total-distance 'x_1 'y_1 'x_2 'y_2) 'x_p))
\begin{equation}
\sqrt{{{x}_{1}}^{2} + 2 {x}_{1} {x}_{p} + {{x}_{p}}^{2} + {{y}_{1}}^{2}} + \sqrt{{{x}_{1}}^{2} - 2 {x}_{1} {x}_{2} + 2 {x}_{1} {x}_{p} + {{x}_{2}}^{2} - 2 {x}_{2} {x}_{p} + {{x}_{p}}^{2} + {{y}_{2}}^{2}}
\end{equation}
To find the \(x_p\) that minimizes the total distance,
The derivative will look cleaner in code if we keep the components of the sum separate and prevent Scheme from "simplifying". Redefine the function to return a tuple:
(define ((total-distance* x1 y1 x2 y2) xp)
(up (sqrt (+ (square (+ x1 xp))
(square y1)))
(sqrt (+ (square (- x2 (+ x1 xp)))
(square y2)))))
Here are the sum components:
(->tex-equation
((total-distance* 0 'y_1 'x_2 'y_2) 'x_p))
\begin{equation}
\begin{pmatrix} \displaystyle{ \sqrt{{{x}_{p}}^{2} + {{y}_{1}}^{2}}} \cr \cr \displaystyle{ \sqrt{{{x}_{2}}^{2} - 2 {x}_{2} {x}_{p} + {{x}_{p}}^{2} + {{y}_{2}}^{2}}}\end{pmatrix}
\end{equation}
Taking a derivative is easy with scmutils
. Just wrap the function in D
:
(let* ((distance-fn (total-distance* 0 'y_1 'x_2 'y_2))
(derivative (D distance-fn)))
(->tex-equation
(derivative 'x_p)))
\begin{equation}
\begin{pmatrix} \displaystyle{ {{{x}_{p}}\over {\sqrt{{{x}_{p}}^{2} + {{y}_{1}}^{2}}}}} \cr \cr \displaystyle{ {{ - {x}_{2} + {x}_{p}}\over {\sqrt{{{x}_{2}}^{2} - 2 {x}_{2} {x}_{p} + {{x}_{p}}^{2} + {{y}_{2}}^{2}}}}}\end{pmatrix}
\end{equation}
The first component is the base of base \(x_p\) of the left triangle over the total length. This ratio is equal to \(\cos \theta_1\):
The bottom component is \(-\cos \theta_2\), or \({- (x_2 - x_p)}\) over the length of the right segment. Add these terms together, set them equal to 0 and rearrange:
\begin{equation}
\label{eq:reflect-laws}
\cos \theta_1 = \cos \theta_2 \implies \theta_1 = \theta_2
\end{equation}
This description in terms of the two incident angles isn't so obvious from the Scheme code. Still, you can use Scheme to check this result.
If the two angles are equal, then the left and right triangles are similar, and the ratio of each base to height is equal:
\begin{equation}
\label{eq:reflect-ratio}
{x_p \over y_1} = {{x_2 - x_p} \over y_2}
\end{equation}
Solve for \(x_p\) and rearrange:
\begin{equation}
\label{eq:reflect-ratio2}
x_p = {{y_1 x_2} \over {y_1 + y_2}}
\end{equation}
Plug this in to the derivative of the original total-distance
function, and we find that the derivative equals 0, as expected:
(let* ((distance-fn (total-distance 0 'y_1 'x_2 'y_2))
(derivative (D distance-fn)))
(->tex-equation
(derivative (/ (* 'y_1 'x_2) (+ 'y_1 'y_2)))))
\begin{equation}
0
\end{equation}
If a beam of light travels in a way that minimizes total distance (and therefore time in a constant medium), then it will reflect off of a mirror with the same angle at which it arrived. The law of reflection holds.
The law of refraction is also called Snell's law. Here's the description from the footnote:
Refraction is described by Snell's law: when light passes from one medium to another, the ratio of the sines of the angles made to the normal to the interface is the inverse of the ratio of the refractive indices of the media. The refractive index is the ratio of the speed of light in the vacuum to the speed of light in the medium.
First we'll tackle this with calculus.
The setup here is slightly different. We have a light beam traveling from one medium to another and changing speeds at a boundary located \(a\) to the right of the starting point. The goal is to figure out the point where the light will hit the boundary, if we assume that the light will take the path of least time.
The refractive index \(n_i = {c \over v_i}\), the speed of light \(c\) in a vacuum over the speed in the material. Rearranging, \(v_i = {c \over n_i}\).
Time is distance over speed, so the total time that the beam spends between the start and end points as a function of \(y_p\), the point of contact with the boundary, is:
\begin{equation}
\begin{split}
t(y_p) & = {c \sqrt{a^2 + y_p^2}\over v_1} + {c \sqrt{(x_2 - x_p)^2 + y_2^2} \over v_2} \
& = {n_1 \over c} \sqrt{a^2 + y_p^2} + {n_2 \over c} \sqrt{(x_2 - x_p)^2 + y_2^2}
\end{split}
\end{equation}
Take the derivative:
\begin{equation}
Dt(y_p) = {1 \over c} \left({n_1 y_p \over \sqrt{a^2 + y_p^2}} - {n_2 (x_2 - x_p) \over \sqrt{(x_2 - x_p)^2 + y_2^2}}\right)
\end{equation}
Set the derivative equal to 0 and split terms:
\begin{equation}
\label{eq:almost-snell}
{n_1 y_p \over \sqrt{a^2 + y_p^2}} = {n_2 (x_2 - x_p) \over \sqrt{(x_2 - x_p)^2 + y_2^2}}
\end{equation}
Similar to the law of reflection's result, each term (up to its \(n_i\) multiple) is equal to the height of the left or right triangle over the length of the beam's path on the left or right of the boundary.
Equation \eqref{eq:almost-snell} simplifies to:
\begin{equation}
n_1 \sin \theta_1 = n_2 \sin \theta_2
\end{equation}
Rearranging yields Snell's law:
\begin{equation}
{n_1 \over n_2} = {\sin \theta_2 \over \sin \theta_1}
\end{equation}
I won't recreate this here, but the Feynman Lectures on Physics, in Lecture 26, has a fantastic discussion about, and derivation of, the law of refraction using no calculus, just geometry. I highly recommend you check out that lecture. Feynman lays out a number of examples of how the principle of least time is not just a restatement of the optical rules we already knew.
You can use the idea to guess what shape of mirror you'd want to build to focus many light rays on a single point (a parabola), or how you might force all light rays coming out of a single point to meet up again at another point (build a converging lens).
This whole area of optics and least time has obsessed scientists for hundreds of years. Spend a few minutes poking around and see what you find.
]]>I've been reading the lovely Visual Complex Analysis by Tristan Needham, and the visual-style proofs he's been throwing down have been wonderful and refreshing. I'll write more about this book and its goals later, but I was inspired this AM to write up a proof of the half angle identities
]]>I've been reading the lovely Visual Complex Analysis by Tristan Needham, and the visual-style proofs he's been throwing down have been wonderful and refreshing. I'll write more about this book and its goals later, but I was inspired this AM to write up a proof of the half angle identities from trigonometry using some of the tools from the book.
Here's the half angle identity for cosine:
\begin{equation}
\label{eq:half-angle}
\cos {\theta \over 2} = \sqrt{{\cos \theta + 1} \over 2}
\end{equation}
This is an equation that lets you express the cosine for half of some angle \(\theta\) in terms of the cosine of the angle itself. As you can imagine, there are double-angle, triple angle, all sorts of identities that you can sweat out next time you find yourself in a 9th grade classroom.
It turns out that these derivations all become much more fun with Euler's Formula:
\begin{equation}
\label{eq:euler}
e^{i\theta} = \cos\theta + i \sin\theta
\end{equation}
Let's say you want to figure out the half-angle identity. You have some function of cosine of half of an angle, and you want to pull the \(1 \over 2\) out of the cosine.
(This came up for me a couple of days ago in a problem Dave and I ran into working on Hephaestus, our quantum mechanics simulator in Clojure. More on that later.)
If you fiddle with the \(e^{i \theta}\) representation of complex numbers, a very clear relationship appears:
\begin{equation}
\label{eq:expansion}
e^{i\theta} = e^{2 i{\theta \over 2}} = (e^{i{\theta \over 2}})^2
\end{equation}
Next, use Euler's equation \eqref{eq:euler} and observe that we can rewrite the right side of equation \eqref{eq:expansion} as \((e^{i {\theta \over 2}})^2 = (\cos{\theta \over 2} + i \sin{\theta \over 2})^2\). Expand this out:
\begin{equation}
\label{eq:newnum}
\cos^2 {\theta \over 2} - \sin^2 {\theta \over 2} + i (2 \sin {\theta \over 2} \cos{\theta \over 2})
\end{equation}
Remember, from equation \eqref{eq:expansion}, that this is a new complex number that equals \(e^{i\theta} = \cos\theta + i \sin\theta\). The real parts of both sides are equal:
\begin{equation}
\cos \theta = \cos^2 {\theta \over 2} - \sin^2 {\theta \over 2}
\label{eq:real}
\end{equation}
and the imaginary parts are equal:
\begin{equation}
\label{eq:imag}
\sin \theta = 2 \sin {\theta \over 2} \cos{\theta \over 2}
\end{equation}
Interesting.
Next, use the \(1 - \sin^2 \theta = \cos^2 \theta\) identity on the right side of \eqref{eq:real} and simplify:
\begin{equation}
\label{eq:real-simple}
\cos \theta = 2 \cos^2{\theta \over 2} - 1
\end{equation}
Rearrange and take the square root to get (gasp!) the half-angle identity:
\begin{equation}
\label{eq:cos-half}
\cos {\theta \over 2} = \sqrt{{\cos \theta + 1} \over 2}
\end{equation}
The familiar half angle identity is a nice consequence of equation \eqref{eq:real}.
We still have equation \eqref{eq:imag}. Could that lead us to the half-angle identity for sine?
Here's the imaginary component again:
\begin{equation}
\label{eq:imag2}
\sin \theta = 2 \sin {\theta \over 2} \cos{\theta \over 2}
\end{equation}
Substitute what we just derived for the cosine half-angle:
\begin{equation}
\label{eq:5}
\sin {\theta} = 2 \sin {\theta \over 2} \sqrt{{\cos \theta + 1} \over 2}
\end{equation}
Square each side and cancel:
\begin{equation}
\label{eq:6}
\sin^2 {\theta} = 2 \sin^2 {\theta \over 2} (\cos \theta + 1)
\end{equation}
Use our identity from before on the left side:
\begin{equation}
\label{eq:10}
\sin^2 \theta = 1 - \cos^2 \theta = (1 + \cos \theta)(1 - \cos \theta)
\end{equation}
Divide through, and take the square root:
\begin{equation}
\label{eq:sin-half}
\sin {\theta \over 2} = \sqrt{{1 + \cos \theta} \over 2}
\end{equation}
And, boom, there it is! the half-angle identity for sine.
If you look back at equation \eqref{eq:expansion} you'll see that this trick would work for any factor inside the exponent, just just the \(1 \over 2\). A similar trick can help you figure out any of sum and difference identities, using observations like this:
\begin{equation}
\label{eq:sumdiff}
e^{i(\theta + \phi)} = e^{i\theta}e^{i\phi} = (\cos \theta + i \sin \theta)(\cos \phi + i \sin \phi)
\end{equation}
The derivation above was much easier for me to understand and push through than the usual geometric derivations I've seen. And, eerily, in going after one of the half angle identities, the other one came along for the ride.
As Tristan Needham says, "every complex equation says two things at once." (p15, Visual Complex Analysis).
Check out the book, leave it by the bedside, and bend your mind a little by digging into the mysteries of complex numbers.
]]>I feel like I'm never going to publish this damned thing... so here it is, in bullet point form. Maybe I'll tidy it up some day.
Niwot's Challenge, a very fun race out in South Platte. Jason invited me to this damned thing last week...
Text from Jason: "Free Sat?
]]>I feel like I'm never going to publish this damned thing... so here it is, in bullet point form. Maybe I'll tidy it up some day.
Niwot's Challenge, a very fun race out in South Platte. Jason invited me to this damned thing last week...
Text from Jason: "Free Sat?"
All I could find was this one race report from Erik Sanders: https://gearjunkie.com/colorado-ultra-mountain-race-niwots-challenge
Prep for the Infinity Loop later this summer. Race details, what does it entail... I'm totally detached from how hard it might be. "It's only 45 miles", says Jenna.
On Wednesday (I think) we hung out to talk about what gear we'd need and how to prepare. We got as far as discussing gaiters and tights, then ran out of items. We both kind of know how to do these things, right? I'd never been in a proper adventure race where you use only map and compass, so I was relying on Jason for that. Jenna was adamant that I make Jason understand how ill prepared I was at navigating.
"I'm totally dialed. Just a note - I'm totally dialed at finding out where I am in an emergency... I've never had to navigate while MOVING, but how hard could it be?"
Okay, prep, Jenna drops me off at Jason's house on Friday, we head on up. Jason has gatorade, a satchel of little debbie pies, SPKs, amazing food... I've got a bunch of BS I bought at Lucky's, empanadas, bladder, etc, stuff that's worked in running races in the past. We head on up.
We hit some mexican food, had a beer each, then left. On the way we do a dramatic reading of the course guide. You'll have to enter to see it, but it's full of curious details that we were using to try and get some insight into "Sherpa" John's mind. There was an obsessive focus on his favorite pizza, but no details. Everything we read pointed to an unhinged mind. Not a good sign for our health out on the course.
We arrive at the El Dorado hotel, a dead old building with a big flagpole and a spraypainted skull and cross bones with a circle-slash "NO" sign over it, above the word "BATS!". No death to bats? What does this mean? Beautiful spot by the river, with beautiful weather.
We get there, build the table, get sorted and go off to join the crew. Miguel is cooking cheese-filled fried doughnuts.
We get to bed. At this point neither of us had read the description of the course carefully, let alone "studied". We read the course description, noting that it just did not sound that bad. If you read something dramatically, have you learned it?
Get up, super cold, get dressed. "Watch out in the ladies room... there's a black widow posted up. I'm gonna smash it," yelled a racer.
"Ceremony" where we're allocated to the starting groups, then allowed to trade. I trade into J's group. Final prep, it's light... the peace pipe's lit and we head out.
There are only two of us heading in our direction. What the heck? Bill and (woman's name). Bill's clearly skeptical of our tights, Capris in Jason's case, and our obvious lack of knowledge of the terrain. He's done the race before and disapproved of our fast speed. We left Bill behind pretty quickly, feeling great about how fast and fit we were.
"I don't want to speak too soon, buttttt I think we might be crushing this thing!"
Theme of the day - mapping things out on foot vs trusting the map. We tried to lean heavily on our stoke and athleticism, and put in a ton more effort than we probably needed to.
Next book went well... we were getting nervous and settled into checking ALL drainages for cairns, not relaxing. Head home, no big deal.
refueled, took in a massive amount of gatorade, downed an ensure, probably 500 calories. Total shock to the system, nausea rising.
So no headlamps :)
NO ONE had come back yet from Suffer, even with our two hour fuckup. This is with 6.5 hours elapsed. Bill came in two hours ahead of us then quit - it turned out he had never planned on going more than one lap. Maybe he was even smarter than we'd given him credit for...
In our minds this was a huge opportunity to make it back by the 13 hour cutoff. 6.5 more hours to do a shorter lap? Easy, especially if we nail the navigation!
Trudge out, actually crush the first book.
We met one guy coming down who was fully shell shocked - he'd been wandering around for three hours trying to find the descent that he was now on. He told us that he'd filled up his waterbottles six times, the heat was so bad.
STICK TO THE HEADINGS - 70, 90. Do not forget! More nailing it.
We get to a 5th class wall. Fuck, that is definitely not safe to ascend, and there's no way that's the course. What can we do? We think we know where we are... we decide to go down the drainage to the river. The idea is that we'll just traverse the shore. We get there, so rough, getting scraped to pieces and... sure enough, full deep water solo cliffs.
Okay, we'll swim. Oh shit, it's a half mile.
We scramble up and out and scramble-skirt the contour to the right inlet. Takes forever.
We get the next book, evaluate our time and we're totally hosed by now. It's going to get dark, I have no headlamps, remember.
We reverse. Things are going better. We get water.
I snap a pole. Boom. That's great! Now I have a free hand to hold the iPhone that I'm using as a light. Following J, crashing through walls of scrub. With the headlamp they look like impenetrable walls. A mile goes forever.
We make it to the first intermediate checkpoint we wanted, then the second... we might be nailing this!
7 people left on the chief's loop and none survived.
Some of the drop reasons:
Oh well. We'll have to go back next year and try this thing out, get it done for real.
(UPDATE - we did, and we finished! Read the 2019 Race Report here.)
Jason's Story: https://www.instagram.com/stories/highlights/17929181812103193/
]]>I ran this race in 2019 for the
]]>I ran this race in 2019 for the second time, finishing in 27 hours and 17 minutes with Matt Fackrell and Jason Antin, my long-time collaborator on such beauties as "The Rainier Infinity Loop!" and the quite absurd "Tahoe 200, 2018 edition". We ran a good portion of the race with Todd Salzer; Todd guided us through the first two loops, but unfortunately didn't finish the 2019 edition. I know he'll be back in 2020, this year.
In lieu of a race report, please accept this photo and video tour of our glory.
We did the usual things. Nothing to note here. No black widows at the start.
First book!
We collected the next two books without any notable events.
We nailed Suffer with almost no scrub oak... except for the big climb up to "Jason's Knob", as Sherpa John likes to mutter late at night, mouth full of the fried-chicken nightguard he pops in each evening before bed.
Here's Todd climbing the hill. He commented around here that Niwot's was a "young man's game"... that was before chowing caffeine gum and snatching the map from Jason's hands, late in the night.
Look at this beast, so casual, wearing climbing pants on a run.
Up a steep hill, dominating, sending the gnar. Why am I making this face?
We met up with Matt on the climb up to the Chief's loop. Here we are, trying to find the first book, which eluded us for over an hour. We almost left before doubling back and finally finding the thing.
We snagged that book, the next, then bombed down a hill with a good bit of snow to a great refill at a stream in the cold. It took forever; I had forgotten my BeFree bottle, and Jason's was completely clogged from months of silt-sifting.
One mile down the road, hit the bridge crossing the stream and start climbing up. If we'd had a green laser pointer we would have been able to see the powerlines above.
Here's the tower, with no book!
A huddle, trying to find out what the hell happened:
I don't want to remember the madness of Book 14, where we wandered for hours. Or the Wall, climbing carefully up a 2k+ foot extremely steep hill full of dinner plate rocks. Thank god no one kicked anything down.
Crawling through scrub oak. Jason, sobbing like a bitch. Or was that me? Or the ghost of Sherpa John? It'll be forever unclear.
Far later, the next morning, Matt and I sit looking at the map, trying to figure out the final long bomb down the hill.
Here we are before the descent down the powerlines. It was so steep, so fun.
After a long, hot slog in the sun, we were done! We were Chiefs! I was dubbed "Chief Many Coves", after the events of 2018.
That's it for now. A 600 word race report, 14,500 words shorter than my Tahoe 200.
]]>Check it out below, or click here for a direct image link to stare at a big
]]>Check it out below, or click here for a direct image link to stare at a big version in the browser.
]]>I had a hard time with this book; it showed up in the TED book club mailing months ago, and I went into my reading primed to love the message, which is concise and persuasive... but maybe so persuasive on its own that the book feels like 300 pages of
]]>I had a hard time with this book; it showed up in the TED book club mailing months ago, and I went into my reading primed to love the message, which is concise and persuasive... but maybe so persuasive on its own that the book feels like 300 pages of filler.
David Brooks's "The Second Mountain" is about the sense he's developed, later in his life, that our lives play out on a landscape populated by peaks whose summits, we're told, offer peace, happiness, alleviation from the anxiety of modern life. If you get to the top, earn the PhD, become CEO, you can stop.
But then you reach the peak, and you realize that the gnawing discomfort you have with your own life was there because, just maybe, your whole viewpoint of the world as a peak-covered landscape where gladiators fight for zero-sum... what, Strava CRs? is a broken model.
David Brooks has looked at his own life and the lives of other folks he's found who have achieved a deep sense of fulfillment and found that the pattern is the same. These people reached the top, or gave up on the attempt, and set their sights on a second mountain - some objective or life goal not about them, anymore, but characterized by a thick web of connections to other people, and to a life of service.
I almost think of this like a decision to distribute your sense of self out among as many people as you can. By confining your sense of who you are into one body, you end up feeling, predictably, like you're an island, alone in a big empty world. By dedicating yourself to some cause that ignores you completely and can only be fulfilled my making other people fulfilled - by climbing the second mountain - you can cut past a whole tangled mess of self-doubt.
I love that message. But the book, as a vehicle for this message, has a big problem.
The problem is that if you already believe in this idea, or haven't yet made the leap but get the shape of what I've sketched out above, the book doesn't add anything. I didn't find anything in "The Second Mountain" that I could use to persuade, say, myself as a college student, worrying about grades and internships, that I was playing a fool's game.
Brooks has filled the book with examples of people that have found their second mountain, and are happier as a result. But why? If you take it as self-evident that this is a better way to live, why do you need examples?
What I wanted was a kick in the ass that would help me get over the current anxiety I've been feeling about job security, money and finances — anxieties that I intellectually believe are stupid, given my bank account and family situation, but that I emotionally can't quite let go of. I would be financially fine for at least, say, 10 years if I started giving away half of my income. Why am I not doing that?
Maybe my reaction to this book is unfair; I found that the stories in the book were too vague to motivate me to make any change in my own life. Maybe it's a character flaw that I was looking for a manual.
The goal of reading, as Mortimer Adler states in How to Read a Book, is to achieve enlightenment of a sort. With a practical book, if you come to the end of the book and find that you agree with the premises, and that the author's conclusions make sense, that there are no logical errors... well, then you're obligated to do what the author suggests. A practical book is trying to persuade you of something.
I'm persuaded, but I wanted my hand held. I wanted Brooks to give me a guide for how to find the second mountain, and how to just turn back and forget about tagging the first summit. I didn't find it here, and I suspect you won't either... but maybe the fact that I'm so annoyed by this means the book has done its job.
Here's the link to "The Second Mountain", by David Brooks. If you want to hear him speaking about his previous work, especially "The Road to Character", check him out on Sam Harris's podcast.
]]>My knowledge of the history here is probably at the level of a Just-So Story; still, I've been finding it helpful to have some vague idea of why folks started developing some area of mathematics before digging into the details, so I'll pass on what I've got to you.
My understanding (backed up by Wikipedia) is that the idea of $i$ came up as a sort of naughty, wtf-is-going-on way of finding solutions to equations like this:
$$x^2 + 1 = 0$$
If you look at the graph of $y = x^2 + 1$, you can see that there is no point where some $x$ gives you 0 back out.
But what if you just went into overdrive and solved the equation anyway? $x^2=-1$, so just make up a new thing, call it $i$. What is $i$? Well, it's something that, when you square it, gives you back $-1$. Now that equation has two solutions, like any good quadratic function: $i$ and $-i$.
According to Wikipedia, Descartes was the first person to call this number imaginary:
[...] sometimes only imaginary, that is one can imagine as many as I said in each equation, but sometimes there exists no quantity that matches that which we imagine. (Wikipedia)
This name is a mathematical micro-aggression. It's another step in the centuries-long discomfort mathematicians have had with opening up the idea of what a "number" is, from natural numbers, to rational, to real, and now, with $i$, the first step toward the idea of a complex number of the form $a + bi$, where $a$ and $b$ are both real numbers.
$i = \sqrt{-1}$ as an accounting device didn't work for me at all. Why does this matter?
What did work well was Gauss's realization that you can think of a complex number — again, a number of the form $a + bi$ — as a point in a two dimensional plane (the "complex plane").
It turns out that you can visualize multiplication by $i$ as a 90 degree rotation in the complex plane.
You can see from this that if you have some number $a$, and you multiply it by $i$, $ai$ now lives on the y axis of the graph, 90 degrees from where you started.
If you multiply $ai$ by $i$, you get $ai^2 = -a$, which lives back on the x axis, another 90 degrees around, or 180 degrees from where you began. ($i^2 = -1$, a 180 degree rotation, or a reflection across the y axis. See?)
Keep going around to get $-ai$:
And a final multiplication by $i$ gets you back to $-ai \cdot i = -ai^2 = -(-a) = a$, a full 360 degrees around from where you started.
Here's Gauss, expressing his frustration at the "imaginary" terminology:
If this subject has hitherto been considered from the wrong viewpoint and thus enveloped in mystery and surrounded by darkness, it is largely an unsuitable terminology which should be blamed. Had $+1$, $-1$ and $\sqrt{-1}$, instead of being called positive, negative and imaginary (or worse still, impossible) unity, been given the names say , of direct, inverse and lateral unity, there would hardly have been any scope for such obscurity.” - Gauss, via Wikipedia
If you imagine $ai$ as a number along the y axis of a complex plane, and $a$ as some number along the x axis, the next natural step is to imagine how you'd represent arbitrary points on the plane.
If you call the axes "1" and "$i$" instead of $\hat{i}$ and $\hat{j}$, you'll see that you can write every point as $a + bi$.
You should also be able to see from the pictures above that you could represent each point, each complex number, as a vector, a line segment stretching from the origin of the complex plane to the point $(a, b)$.
Okay. We saw about that multiplication by $i$ is the same as rotating a number around the origin by 90 degrees.
I'm going to claim that not only is that fact true... but that multiplying any complex number $x$ by another complex number $y$ transforms the vector represented by $x$ by:
First, let's do it symbolically, then with a picture.
TODO work out the math:
\begin{equation}
(a + bi)(c + di) = ac + adi + bci + bd(i^2) = (ac - bd) + (ad + bc)i
\end{equation}
\begin{equation}
r_1(\cos \theta_1 + i \sin \theta_1)r_2(\cos \theta_2 + i \sin \theta_2)
\label{eq:angles}
\end{equation}
\begin{equation}
r_1 r_2 ((\cos \theta_1 \cos \theta_2 - \sin \theta_1 \sin \theta_2)+ i (\cos \theta_1 \sin \theta_2 + \sin \theta_1 \cos \theta_2))
\end{equation}
Next, go look up a table of trig identities and realize that the orderly arrangements here of $\sin$ and $\cos$ look very much like the angle difference identities:
\begin{equation}
\cos \theta_1 \cos \theta_2 - \sin \theta_1 \sin \theta_2 = \cos{\theta_1 + \theta_2}
\end{equation}
and:
\begin{equation}
\cos \theta_1 \sin \theta_2 + \sin \theta_1 \cos \theta_2 = \sin{\theta_1 + \theta_2}
\end{equation}
Using those, the original product collapses down to:
\begin{equation}
r_1 r_2 (cos(\theta_1 + \theta_2) + i \sin(\theta_1 + \theta_2))
\label{eq:confirmangles}
\end{equation}
Might be more obvious from this picture:
This picture shows visually all of the relationships we need to verify that you can think of complex number multiplication as rotation and scaling in the complex plane. That is, if you have the patience to go verify all the triangle relationships.
The two complex numbers that this picture represents are $c_1 = \cos \beta + i \sin \beta$ and $c_2 = \cos \alpha + i \sin \alpha$.
Imagine the red triangle rotated up by an additional angle of $\alpha$. We can work out the same math as in \eqref{eq:angles} above, with $\alpha$ and $\beta$ subbed in, and $r_1 = r_2 = 1$:
\begin{equation}
(\cos \alpha + i \sin \alpha)(\cos \beta + i \sin \beta)
\end{equation}
If you look at the white triangle on the left, you can verify that the final triangle has dimensions:
\begin{equation}
(cos(\alpha + \beta) + i \sin(\alpha + \beta))
\end{equation}
Just like we verified in \eqref{eq:confirmangles}.
The algebra above is a mess. Is there a better way to write these things?
Yes!
It turns out that you can use the amazing Euler's Formula:
\begin{equation}
e^{ix} = \cos x + i \sin x
\label{eq:euler}
\end{equation}
To write any complex number $a + bi$ as $r (\cos \theta + i \sin \theta) = r e^{i \theta}$.
This feels fairly spooky. It does make the math easier. When you multiply powers, you simply add the exponents, so the product of two complex numbers both becomes easy, and checks out with what we found above:
\begin{equation}
(a + bi)(c + di) = r_1 e^{i \theta_1} r_2 e^{i \theta_2} = r_1 r_2 e^{i (\theta_1 + \theta_2)}
\end{equation}
The new complex number has magnitude $r_1 r_2$ and an angle equal $\theta_1 + \theta_2$.
But why does this work? Where did $e$ come from, and how is it linked to angles, cosine and sine?
To answer that question we need our final piece of machinery: the Taylor series approximation.
Notes from memory, though originally I found this page helpful.
The goal here is to motivate why we can talk about complex numbers as $e^{i\theta}$.
blah, get after it, smooth functions, show the steps:
\begin{equation}
f(x) = a_0 + a_1 (x - c) + a_2 (x - c)^2 + ... + a_n (x - c)^n
\end{equation}
What can we say about the coefficients? Figuring out the coefficients is the key to figuring out the taylor series expansion of some function.
To get $a_0$, calculate $f(c)$, and all other terms go to 0.
\begin{equation}
f(c) = a_0
\end{equation}
How about $a_1$? If the function is "smooth", then every derivative exists. That's a clue. Take
\begin{equation}
f'(x) = a_1 + 2 a_2 (x - c) + ... + n a_n (x - c)^{n-1}
\end{equation}
Taking the derivative subtracts $1$ from each exponent, conveniently setting the constant term to zero and bringing the next coefficient within our grasp.
In general, the coefficient of the $(x-c)^n$ term will have to
\begin{equation}
a_n = \frac{f^n(c)}{n!}
\end{equation}
So the whole taylor series expansion is:
\begin{equation}
f(x) = f(c) + f'(c) (x - c) + \frac{f''(c)}{2} (x - c)^2 + ... + \frac{f^n(c)}{n!} (x - c)^n
\end{equation}
Or, more compactly:
\begin{equation}
f(x) = \sum_{j = 0}^{n} \frac{f^j(c)}{j!} (x - c)^j
\end{equation}
How does this help us?
Well... $e^x$ is its own derivative! $f(x) = f^n(x) = e^x$.
The taylor series expansion of $e^x$ around $c=0$ is:
\begin{equation}
e^x = 1 + x + \frac{1}{2} x^2 + ... + \frac{1}{n!} x^n = \sum_{j= 0}^{n} \frac{1}{j!} x^j
\end{equation}
Then, for $\cos x$ around 0:
\begin{equation}
\cos x = 1 + \frac{f''(c)}{2} (x - c)^2 + ... + \frac{f^n(c)}{n!} (x - c)^n = \sum_{j= 0}^{n} \frac{1}{j!} x^j
\end{equation}
then $\sin x$ around 0:
\begin{equation}
e^x = f(c) + f'(c) (x - c) + \frac{f''(c)}{2} (x - c)^2 + ... + \frac{f^n(c)}{n!} (x - c)^n = \sum_{j= 0}^{n} \frac{1}{j!} x^j
\end{equation}
Stare at these...
then do the steps for $e^x$, which is its own derivative.
That gets us back to the amazing equation \eqref{eq:euler}, repeated here:
\begin{equation}
e^{ix} = \cos x + i \sin x
\end{equation}
Stare at the terms, get it to make sense.
This means that multiplication by a number that looks like $e^{i \theta}$ is equivalent to a rotation of $\theta$ in the complex plane.
Where can you use this?
]]>In my ongoing quest to lay a more solid foundation for this new, strange life as a machine learning "researcher", I've been going through various foundational concepts and ideas and trying to build up rock solid intuitions that I can lean on for years. (Why the hell didn't I do this back in school??)
Entropy is my latest obsession - thermodynamic entropy, and information entropy, and the ways in which these two things are similar.
This is the first of a short series of posts building up the mathematical concepts and tools required for a solid understanding of what entropy is. The rough plan is to cover:
Once the series is complete, I'll come back and fill in some reasons why you might be interested in this stuff and generally polish things up. If you've found this post in its current state, read on for an unmotivated account of how many different ways there are to pluck items out of a bag.
This post will cover:
For each, I'll present the formula you'll typically see, and then show off some intuitive ways to think about the problem that should let you figure out the formulas on your own whenever you need to use them.
Let's say you have some set of items:
The permutations of the set are all of the possible ways that you can arrange, in order, the items in that set. There are 6 possible permutations of the set above:
Some examples of permutations you might care about are:
Is there some link between the number of items in a set of size $n$ and the number of possible permutations of those items?
Of course there is! The number of permutations is
\begin{equation}
permutations(n) = n!
\label{eq:permutations}
\end{equation}
where $n!$ is the factorial function:
\begin{equation}
n! = n \cdot n - 1 \cdot n -2 \cdot ... 1 = \prod_{i=1}^n n_i
\label{eq:factorial}
\end{equation}
This checks out for a few examples:
Why is this relationship generally true? Let's first look at a visual example of how you'd generate permutations from some set, then look at some code.
If we have a set of $n$ items, to generate a permutation we have to start by making a single choice. We have $n$ different choices we can make, each of which is a valid start to a permutation. Each of those choices will leave $n-1$ items left in the set.
Here's a drawing of each possible choice, linked to the remaining items.
for each of the remaining items we can play the same game. We have $n-1$ choices in each set; after each choice, we'll have $n-2$ remaining items.
Stay focused on those first two levels for a second. Each of the $n-1$ choices above is for the second item in a permutation; because we had $n$ original choices, each of which resulted in $n-1$ more possible decisions, we've learned that there are $n(n -1)$ possible ways to make our first two choices.
This game continues all the way down until each branch has a single choice left. By that time, we've made $n!$ total choices:
\begin{equation}
n! = n \cdot n - 1 \cdot n -2 \cdot ... 1
\label{eq:fac2}
\end{equation}
Factorials show up when you're counting up groups of things where each choice takes an item out of the mix.
You can also visualize permutations as the number of total possible paths through a tree, where each branch plucks some item off of the set. The diagram above is a tree:
This post by Shawn O'Mara has a wonderful set of diagrams and descriptions of permutations and other combinatoric goodies using trees. Here's a diagram similar to mine for the set $\{a, b, c, d\}$:
Each path through the tree represents a permutation.
If you could the number of branches at each level of the tree, you'll see that they slowly descend from 4 branches at the first level, to 3, then 2, then 1. Each level of branching represents the "many worlds" that a choice between the level's branches represents.
You can also count the number of items at each level. Each path has to end in a final "leaf", so counting leaves is the same as counting paths. The tree above has $4 \cdot 3 \cdot 2 \cdot 1 = 24 = 4!$ leaves, just like we expected.
How do you generate permutations in code? Here's one attempt in Scala:
/**
This function generates the set of all possible permutations of the items in
the input set.
*/
def permutations[A](items: Set[A]): Set[List[A]] =
if (items.isEmpty) Set(List.empty)
else {
items.flatMap { x =>
// For every item in the input set:
//
// - generate all possible permutations of the set WITHOUT that item
// present; we expect (n - 1)! total permutations.
// - loop through all of those permutations and stick the removed item
// onto the beginning to generate (n -1)! permutations, each with the
// removed item on the front.
// - the "flatMap" above calls this inner function once for each of the
// n items, then sews together all of the returned sets... this gives us
// a final set of size n * (n - 1)! = n!
permutations(items - x).map(x :: _)
}
}
Again, in English:
The function generates the actual permutations, but if you track the size of the sets that are passed down into recursive calls of permutations
, you'll notice that each recursive call removes one item from the set. If permutations
receives an empty input set, it returns a set with 1 item in it.
Think of a "loop counter" associated with each call to permutations
. The function gets called $n$ times on each loop; on each call, permutations
is called with $n - 1$ items, then the results are all fused together.
We know that each item is distinct because sets can't contain duplicates, and we add the removed item back on only after the recursive call.
The function above is very close to a proof that the total number of permutations for a set of size $n$ equals $n!$.
Here's another implementation in Scala that generates permutations in a different way:
def permutationsTwo[A](input: Set[A]): Set[List[A]] = {
// we use an inner function called "loop" so that we can hide the fact that
// we're converting the input set into a list. Sets don't have ordering, but
// we need to enforce one for this approach.
def loop(items: List[A]): Set[List[A]] =
items match {
// the base case returns a set with 1 item.
case Nil => Set(List.empty)
// this pattern match breaks the items into the first item in the list and
// the remaining items; xs has size n - 1, 1 fewer than the size of
// `items`.
case x :: xs =>
// loop is called recursively here with a "loop counter" of n - 1.
loop(xs).flatMap { permutation =>
// Each of those n - 1 entries looks something like (b,a,c). For
// each of these, the next block of code generates new lists by:
//
// - inserting the element that was NOT passed down into the loop -
// say, "d" - into the slot AFTER every one of the existing
// elements, and
// - adding one extra list with "d" at the beginning
// for a total of n new permutations for each of the (n - 1) passed
// back up through the loop.
//
// n * permutations(n - 1) = n! total permutations, since we bottom
// out at 1.
(0 to permutation.size).map { i =>
val (pre, post) = permutation.splitAt(i)
pre ++ List(x) ++ post
}
}
}
loop(input.toList)
}
Different approach to generating cardinalities; same logic for tracking how the sizes of the returned set increase with each new item in the input set.
What happens if you pass the empty set into the functions above? How many permutations can you take from a set with nothing in it?
You'd think it would be $0$, and that $0! = 0$. The problem is that $0$ is the multiplicative identity, and if the definition of factorial let you go all the way down to $0$, as in $3 \cdot 2 \cdot 1 \cdot 0$, the $0$ would destroy everything and set the factorial equal to $0$ ALWAYS.
It's obvious in the code examples above that to make everything work, you have to return a set that contains ONE permutation - the empty set! $0! = 1$ for any implementation I can think of in Scala.
The mathematical argument for why this has to be true feels silly, but there's no reason it can't hold.
The permutations of a set are all of the possible distinct orderings that contain the same number of items as the original set. Well, there is ONE list (an ordered data structure) that contains the same number of items... the empty list! There are no other lists with 0 items, so we can rule out every ordered thing... except for the empty list. If we can't rule out a permutation, we have to include it, so there it is: the permutations of $\{\} = \{()\}$, and $0! = 1$.
https://en.wikipedia.org/wiki/Permutation#k-permutations_of_n
What if you want to stop? Well...
What's the equation here? We just stop when we have k items remaining. How do we express that as a nice formula?
same as before.
Same as before, we just stop.
Very similar, but this
def kPermutations[A](items: Set[A], k: Int): Set[List[A]] = {
assert(k >= 0 && k <= items.size)
if (k == 0) Set(List.empty)
else {
items.flatMap { x =>
kPermutations(items - x, k - 1).map(x :: _)
}
}
}
then combinations, you're plucking sets. $nPr / r!$, divide out the permutations of $r$ items buried in.
def combinations[A](items: Set[A], k: Int): Set[Set[A]] =
kPermutations(items, k).map(_.toSet)
def factorial(n: Int): Int =
if (n == 0) 1 else n * factorial(n - 1)
def numCombinations(n: Int, k: Int): Int =
factorial(n) / (factorial(n - k) * factorial(k))
Why does the binomial coefficient equal the number of combinations here?? That is super weird. Go through and get some intuition here.
Side note... if you add up the number of combinations of $k$ from 0 to $n$, you get what's called the "power set", which has cardinality $2^k$.
WHY IS THAT?
Well... imagine a bit-mask, a series of 1 or 0 that you'll lay over the items in the set. How many possible combinations of 1s and 0s of length n can you make?
Well, we have two choices at first... then we keeping going, and every time we make a choice we multiply.
note about how we can add up all the k permutations to get the total powerset: https://en.wikipedia.org/wiki/Binomial_coefficient#Sums_of_the_binomial_coefficients
// This gets ALL combinations...
def powerset[A](items: Set[A]): Set[Set[A]] = {
@tailrec
def loop(remaining: List[A], ret: Set[Set[A]]): Set[Set[A]] =
remaining match {
case Nil => ret
case x :: xs => loop(xs, ret ++ ret.map(_ + x))
}
loop(items.toList, Set(Set.empty))
}
// nice short way that uses foldLeft to accumulate.
def powerset[A](items: Set[A]): Set[Set[A]] =
items.foldLeft(Set(Set.empty[A])) { case (acc, a) =>
acc ++ acc.map(_ + a)
}
You can sort of see here that the cardinality is the cardinality of... well, think it through.
how many ways can I put n items into k bins? THEN we have to talk about this fantastic derivation of the number of ways to put something into k bins: https://en.wikipedia.org/wiki/Multinomial_theorem#Interpretations
The book is a long look at an insight that blew Harding's mind — from the
]]>What a strange, fantastic little book! I recently read Douglas Harding's "On Having No Head" after hearing Sam Harris mention the book in one of "Waking Up" meditations, and then many times on subsequent podcasts.
The book is a long look at an insight that blew Harding's mind — from the first-person person, as a matter of subjective experience, there was no evidence around that he had a head. Put another way — if you try not to read into what you're seeing, and just describe exactly what's in front of you, not only is there no head on display... but the whole picture of who you are, of the person doing the looking, looks quite strange indeed.
Harding had been thinking about Life's Big Questions for years, but the Headless insight crystallized when he saw this "self portrait" by Ernst Mach:
He's right — there's no head! There's a field of view, limned with a massive nose and odd protruding arms and legs that stick out into the visual field, obscuring the rest of the world. If you take literally what you see, the scales are all out of whack. The nose is huge! The hand is about the same size as the foot, and the hands are different sizes.
Well. So what? The key to the whole thing is, as I'll discuss below, noticing the effortless remapping of these strange-seeming signals onto a coherent body-map that makes it "feel" like I am a thing of the same shape, size and dimension as other people I see walking around.
Okay. I had trouble with this book at first, and I predict that you'll find it as hokey on a first reading as I did. But there is something special here. I'm going to try and describe what it was that I found so fascinating about the Headless insight, and why you should care about, and spend some time recreating, this insight for yourself.
First, some background on what it means to "be present".
If you sit and meditate just calmly observe the thoughts that arise in your mind, noticing what pops up — you'll notice that there is a whole bunch of chaos bubbling around up there. Your consciousness is a soupy combination of direct sensory experience, abstractions, emotions, thoughts, idea bubbles floating in and fading away.
There are many layers of processing your brain applies to the little electrical pulses that your various sense organs are pinging out. I contend that "being present" is the process of
That theater is consciousness — that theater is "you", and "being present" is all about identifying with the theater itself, and not with the layer of autobiographical thought bubbles.
Given all that, what are these layers, and what does it feel like to peel each back — or, equivalently, to start ignoring each one? I'll switch terms here and present each of these phases as "levels" of enlightenment.
The first level involves realizing that when you get lost in thought, much of what you're reacting to isn't even happening right now. You're being assaulted with memories and getting lost in those, sure... but even the act of getting pissed off in traffic is an act of mild obsession with the past. The car cuts you off, and you freeze-frame the moment and start looping on it, even as the moment that annoyed you slips away.
Your face is itching? Great, that's a nice thing to notice. You want it to stop? Boom, lost in thought. You've taken your mind off of the itch, and you're now anticipating the next itch you'll feel. you're in your head again, ignoring external input, looping away.
To break through the first level, you've got to notice that these feelings, thoughts, anxieties are just suggestions that float up from somewhere, and that will float away if you make room for the next bubbles and don't obsess. Watch and enjoy.
Reaching the second level is all about realizing that your itches, your aches, your proprioception — that is, your awareness of your body in space — are all interpretations of sense data coming in to that theater of awareness.
Your forehead itches? How do you know? What does an itch feel like? What's really happening is that you notice some... warmth? heat? Tingling? And you:
Spatial sense is like the emotion of getting pissed at the car ahead of you. What really happened? The car moved in front of you — some tingling warmth happened "somewhere". Then what? The lightning-fast invisible interpretation, and suddenly the other driver is an asshole, and your forehead itches.
I started to write the following: "What's really happening? Photons are hitting the 2d plane of your eyes; molecules are locking in to your smell receptors, these are causing neurons to fire; all of these things result in neurons inside your head spiking and firing. From all of that, "you", whatever that means, are inferring facts about the objects and spatial relationships out in the world."
But those are all interpretations too, and statements about some other mental model! To break through level two, notice what you can notice before the whatever-it-is in your head whips the raw sensation into the abstraction you're used to, like my body or the person I know is in that car.
Just a guess at level three, since I haven't been able to get here myself. The third level is probably — for the visual field — the machinery of object detection itself, the translation of the two dimensional pixel map in front of your face into what feels like a virtual reality landscape populated by objects. (Isn't the sense of distance so odd? You feel how far away something is. How? Spatial sense is an emotion.)
Breaking through the third level would require ignoring object detection completely and treating as equal all of the sensations and thoughts popping into the oval-shaped visual theater. No more distance, no more trees and tables and chairs, no edges, no emotion... just tingles and pixels.
Now, the Headless insight. At Level Zero, when your object detection is in full sway and you're making predictions about the world, you definitely "have a head". "You" exist! Objects exist! Sure, you can't see your head... but you know you have one, since you know you're a human, and humans have heads. I can even make a prediction about what will happen when I bring my hands up just outside the big oval I'm seeing out of and pat whatever's there. A head!
"Having a head", and feeling like it's silly to say that you don't, reveals you to be so obsessed with the abstraction-world that you can't recognize how much preprocessing is going on; or even that there's any preprocessing at all. If you don't think there's any pre-processing, then of course you can't imagine a world where it's turned off, even for the sake of discussion.
If you can turn it off and break through level two, then you're forced to admit that, without your pre-processing to help you, there's no "head" of your own on evidence. At level two, you still are detecting objects and seeing spatial relationships... but you're not thinking about what the objects and relationships imply.
I look down. I see hands and arms coming up toward... well, out toward the edges of some big field of vision, the movie screen on which everything is playing. Is there a head there? (Whoops, I'm not present anymore!)
Kwatz! Unask the question.
You can see that OTHER people have heads! (I'm assuming that you haven't turned off your object detector completely, so you see people, not pixels.)
What about when I look in the mirror? Well, there's a head in the mirror. Is that... my head? Whoops, at Level Two, that question makes no sense. By calling the pixels in the mirror "your head", you're making inferences that you could test... but not at Level Two. Inferences occur in the little lab in your brain where you go to figure out what things mean. In the present, who cares what things mean? That's so Level Zero.
Inspect your field of vision, now. There's no head available to see, so why worry about it? If you can lock in viscerally the sense that you have no head, then you've managed to break free of your looping obsession with what's going to happen next, and with preferring your predictions and abstractions over the choice to just relax and notice what's on offer. And there is certainly no head on the menu.
That's all I've got. The Headless insight is a little checkpoint, totally obvious at the level of experience. The only reason you wouldn't find it obvious is that you're refusing to separate out these levels of awareness in your mind, probably because you haven't accepted that these different levels exist.
If you can get out of the lab and just notice, and you try to notice what direct experience is available in the spot where your body-map tells you your head should be... you won't find a head. Instead you'll find an oval-shaped thing that contains, in fact, the entire world on view. At Level Two, the world itself is right in the center of things, right where your arms terminate.
The fact that there's a scene being perceived at all is the big mystery. Stop worrying about the abstractions and enjoy the pixels.
You need the book itself, of course: On Having No Head, by Douglas Harding. Sam Harris has a wonderful interview with Richard Lang (Stitcher link), a student of Harding's, on his Making Sense Podcast.
Richard maintains the website http://www.headless.org, which is full of materials and exercises designed to help you see the obvious insights I discuss above. Richard's on Twitter as well!
If you're interested in meditation at any level, I highly recommend Sam Harris's Waking Up app.
]]>Here's what it looks like now:
I used three tubes of Lord 7545 A/E adhesive from Aerosport Products, one for each pair of side windows and a full tube for the windshield. The process
]]>The RV10 has windows! And a windshield! I've got everything attached and mostly cleaned up.
Here's what it looks like now:
I used three tubes of Lord 7545 A/E adhesive from Aerosport Products, one for each pair of side windows and a full tube for the windshield. The process was stressful; I've been mildly dreading this for a couple of years now. Compared to much of the metal work, this step is one-shot with permanent results.
I wrote up the process of preparing to glue these in my last post; the spring clips are absolutely the right way to go. I can tell that in the spots where the clips were touching the window, there are very mild ripples (is that what people call "crazing" on the forums?) but it's barely there, and only in the zone that I expect to fiberglass over with deck cloth anyway. I'm not sure how I would have held the windows on without that tip.
What did I learn?
Here's a shot of the two windows on the left side of the plane curing with my heat lamp. I disabled the safety switch on the bottom of the lamp with tape and tilted it up precariously so it could point at the right angle. Not recommended!
Now, the windshield. I used a boat strap to hold the sides down tight, and used a couple of weights on top to hold the top of the plexiglass flush. (I used a bunch of little aluminum cylinders, the handles for my epoxy brushes, to focus the pressure down on a few spots).
I did have one almost-disaster happen here. I waited a couple of hours until I believed that the glue had cured, then went out and took the weight off of the top and moved a heat lamp up above so I could get direct light on the piece and cure it well.
I came out an hour later to check on the piece and, holy shit, the top had sprung up, ripping free of the tacky glue. Nightmare at 10 at night, way past my bedtime.
I still had some glue left over, so I did what I could to force more glue into the gap between the plexiglass and the cabin top. Once I'd stuffed it full and gotten a bunch of mess on the plexiglass, I added the weights back on and got the heat lamp back up, pointing down at the whole stack. 12 hours later everything looks and feels very solid.
Once again, a shot of the final windshield and windows after I trimmed the excess glue off with a razor blade and pulled all plastic:
And here's inspectors Juno and Jenna, checking out my interior work:
In the immediate future, I'll:
One task at a time.
]]>I've gone through so many oscillations
]]>I'm out here working on the plane today - I haven't made it out much, lately, but I'm in the garage spending the time. This project will end up taking roughly five years, three years longer than I thought it might when I started.
I've gone through so many oscillations of excitement and frustration out here. Some days I feel like I'm flying through tasks, and I think about how close I am to the end. Some days it is just such a grind and I feel exhausted, sanding away yet another coat of epoxy, trying to get the curve of the cabin top just right, knowing that I'm probably aiming for a level of craftsmanship that I'll never notice once the plane is up in the air.
The sweet spot is when I'm able to remember how unusual this whole project is, and how I'll almost certainly never do it again. Each task I complete is the last time I'll ever do something like this in my life. Why not be curious, and pay close attention to what's happening?
I started my day in the shop with a short, handwritten list of straightforward tasks I could accomplish today. These were:
I started with the spring tabs. These are small pieces of aluminum that I'll "cleco" to the fuselage of the plane all around the perimeter of each window. The aluminum is thin enough to flex; each rectangle is like a small, springy clip that will put light pressure on the window in one spot. The alternative is to use duct tape, or boat straps, or weights, or any number of other ideas that feel less elegant than the one I've chosen and absorbed after reading as many other build logs as I could find.
I used my shears to cut a long strip of .050" aluminum off the edge of a large sheet of scrap, snipped off a piece about 1.5" long and used that as a template to cut successive pieces off of the strip. I took the pile to my grinder and knocked the sharp burrs off of the cut edges with the scotchbrite wheel, rotating and flipping each metal rectangle by hand. Next, to the drill press to put a #40 hole off to the side of each clip, then over to the bench vice to bend each rectangular tab somewhere in the middle. I didn't measure anything, as the exact dimensions here don't matter. I do a lot more eyeballing now, so many years into this project.
Here's one of the final clips:
And here they are in action, holding on the window:
As I was standing at the grinding wheell knocking the burrs of of these little tabs, I had the strangest feeling; I had the sense that I was in the middle of a performance, in the middle of some episode of a show I've been studying and watching for some time now. I was on a stage, performing my way through the spring clips.
I've spent hundreds of hours researching this build, and have read so many build logs... I've seen dozens of examples of each step of the project. Attaching the windows has been an impossibly-far-off-task for so long, and I've had a good deal of anxiety thinking about how far off it is.
But here I was, in technicolor, standing in a virtual reality version of those build logs. I hadn't struggled in any particular way to get here... I'd just continued to tick along, putting in the time. And now, with no thunderclap, here I was, making the simple little tabs I'd thought about for years. An hour later I was done and on to the next task. Everything is like this on the plane; maybe there's a bigger lesson, too. I'll think about it when I'm done.
If you're curious about the specifics more than Life Lessons at the Grinder, Justin Twilbeck's blog is the best example of one of the window installs I've been staring at. Bob Leffler has a picture of his janky-looking spring tabs at this Matronics post; spring clips don't have to look good to accomplish the task, and the compulsion the same standard of "perfection" on every piece regardless of where it lives, or whether or not it's an actual plane part vs a jig, is a disease.
If you're building anything big, you need to come to terms with that disease. If you don't have the disease, you shouldn't be building anything like an airplane.
Finally, a side shot of the current state of affairs:
Just a couple of months to go.
]]>I spent a while this summer reading math books and popular science, trying to figure out what a life in "research" might look like; in late Spring I began to sense that it was time to wake up my brain and start thinking again, and the best way forward seemed to be to find examples of lives-well-lived in science and study the cases.
I'm so glad I found "The Second Kind of Impossible: the Extraordinary Quest for a New Form of Matter" by Paul Steinhardt. The book is a trip report of Paul's decades-long relationship with a peculiar type of matter called a "quasicrystal". I'll give a short summary, then discuss what struck me as so wonderful about the book, and finish out by including my more detailed notes in case you want to dig deeper.
Paul is, I guess, best known for coming up with the inflationary theory of the universe in the early 80s; I would have imagined that that topic would be consuming. Shows what I know. Paul spent the summer of 1980 working on a computer simulation of what happens when you rapidly cool a liquid down to a frozen state.
A liquid is a random arrangement of atoms, all jostling around past each other. If you cool a liquid slowly, the atoms will begin to clump together into a solid; imagine water freezing into ice as an example. If you cool slowly enough the resulting solid will assume a form called a "periodic crystal". The water molecules in ice stack "together like oranges on display at a grocery store. The structure - technically called face-centered cubic - has the same symmetry as a cube, consistent with all the known rules of crystallography." (Steinhardt, Loc 358)
This arrangement is called a periodic crystal because the pattern, the cube, in this case, repeats itself as you scan through the solid. It turns out that there are only 230 possible periodic crystal structures in 3 dimensions. Any crystal you find in nature will have one of these shapes.
What would happen if you cooled water so fast that the molecules didn't have time to arrange themselves into a periodic crystal? Maybe some mix of randomness and order, Paul guessed. What he found, instead, was a "forbidden symmetry" - a crystal that somehow had the symmetry of a 12-sided die, an icosahedron, which is not one of the shapes allowed for a periodic crystal.
Of course, there is nothing wrong with having a single tile in the shape of a regular pentagon. You can make a single tile any shape you wish. But it is impossible to cover a floor with regular pentagons without leaving gaps. The same applies to an icosahedron. It is possible to make a single three-dimensional die in the shape of an icosahedron. But you cannot fill space with icosahedrons without leaving gaps and holes. (Steinhardt, Loc 408)
This discovery kicked off an incredible, decades-long quest to figure out what the hell is going on. Paul got a grad student, Dov Levine, interested in the problem of "quasicrystals", arrangements of matter that weren't periodic crystals... but still showed inexplicable signs of order.
The book is full of wonderful references, scientific celebrity appearances and little knowledge nuggets that could keep you busy on their own for months. For example, the duo gets their first clue about what's going on when Dov discovers Roger Penrose's "Penrose tilings", arrangements of tiles that never repeat, even on infinite planes. This is straight out of the puzzling world that my dad, Bill Ritchie, is a big part of; Martin Gardner, my dad's hero, wrote a book about these tilings called "Penrose Tiles to Trapdoor Ciphers".
Richard Feynman (Paul's mentor!) makes an early appearance and encourages Paul to keep poking at this problem seemingly unrelated to the inflationary theory that's making Paul famous:
My main scientific interests were pointing elsewhere at the time. But I took advantage of the summer research opportunities because I was intrigued by the fact that the atomic arrangement of something as rudimentary as amorphous matter was not yet known. In that, I was intentionally following one of the most important lessons learned from my mentor, Richard Feynman: It is wise to follow your heart and seek out good problems wherever they may lead, even if it is not in the direction you thought you were supposed to be headed. (Steinhardt, Loc 384)
Paul and Dov build on Penrose's work and generate an entire theory of 3d quasicrystals, or arrangements of matter that have symmetry, but never actually repeat. No one had ever seen anything like this in nature... until a massive coincidence occurs and, without knowing about Paul and Dov's work, someone finds one of these things while investigating different aluminum alloys in the lab.
The book tracks the story of the theory, the coincidences, and then Paul's later obsession with the next obvious idea - can these quasicrystals form in nature? Has anyone ever found one in the wild?
The answer turns out to be yes, and leads Paul deep into geology, and finally on an expedition that into the Russian interior on a quest to find quasicrystals that seem to have arrived from space.
Come on. How much more awesome can it get?
I've been writing many trip reports of late, and this whole book is a trip report of a legendary physical and intellectual adventure. It's a celebratory book, meant to get you excited about the physics that Paul is tackling, and the pleasure that can come from taking a passing curiosity seriously and trying to track down the answer.
It's easy to imagine passing over the initial curiosity of a strange symmetry popping up in a computer simulation. Paul keeps at it for years with no indication that the things he's simulating are even possible in the physical universe, with the collection of elements we have on earth. He's rewarded with the discovery of an entirely new kind of matter that actually does exist in nature.
Can you imagine making up, and then discovering, a whole new class of matter, like the metals?
There is a sense of suspense throughout! Paul and his collaborators spend their time learning the recreational math of Penrose tiles, tracking down geology samples, analyzing tiny little pieces of rock with no backup... and doors keep opening onto the next phase of the problem, often after months or years with no result.
The intuition that Paul ended up developing was outrageous. There are multiple moments in the book where Paul and the geologists he's working with are able to immediately identify some x-ray image of crystal structure as coming from some particular meteorite, some famous sample, just based on the rare geometric arrangement that the x-rays create on photo paper after scattering through the rock. Geology is not technically Paul's "field"... but maybe the grouping of knowledge into fields is arbitrary, and not helpful when you're following a problem around.
Yes, I think so if you have any interest in science; This is a great book about scientific discovery, and how a problem can fractally unfold and stay interesting through so many levels.
I don't know if this is what being a typical professor is like, really. Paul doesn't discuss his non-academic life, or family, or teaching; it's not really a memoir. Paul comes across as a humble, grateful dude, just happy to have the chance to get to peek behind the curtain.
I have so many notes that I can't bear to delete, so here's a brain dump on the book itself, and various topics that pop up.
You can also get stoked about quasicrystals! I don't know if you can learn a ton here, but if you want to step off and learn more about the various topics, I can offer some suggestions. If you read this book slowly and absorbed the examples I think the import of the various tests and mineral combinations he discusses later would be more interesting for sure.
I first encountered the idea of symmetry groups in Ian Stewart's Concepts of Modern Mathematics. Chapter 7 - "Symmetry: the Group concept" - covers the mathematics and builds up to an explanation of "wallpaper groups", which is the 2-dimensional version of the mathematics behind crystallography (called "space groups" in 3 dimensions).
Just as there are 230 types of possible periodic crystals, there are 17 possible periodic arrangements on the plane, or 17 possible types of wallpaper pattern.
The book covers periodic crystals and quasicrystals, which are "aperiodic"; there are in fact other kinds of aperiodic crystal, ordered arrangements of atoms with no repeating structure. One such aperiodic crystal is DNA, the molecule that encodes genetic information.
Before the actual discovery of DNA's double helix structure, the physicist Erwin Schrödinger, in 1944, had written a book called "What is Life?" (Wikipedia page) that predicted that whatever it was in living things that carried hereditary information must be some sort of highly-ordered molecule with no periodic structure. From Wikipedia:
Schrödinger explains that most physical laws on a large scale are due to chaos on a small scale. He calls this principle "order-from-disorder." As an example he mentions diffusion, which can be modeled as a highly ordered process, but which is caused by random movement of atoms or molecules. If the number of atoms is reduced, the behaviour of a system becomes more and more random. He states that life greatly depends on order and that a naïve physicist may assume that the master code of a living organism has to consist of a large number of atoms.
How naïve! Schrödinger reasoned that based on what was known about mutations and heredity, "the carrier of hereditary information [would have] to be both small in size and permanent in time"; some sort of molecule was the best known candidate.
I haven't read his book yet, but the guesses, and the philosophy he uses to come up with his theoretical model (the guiding light for Watson and Crick's search!) are fascinating.
The book "Life on the Edge: The Coming of Age of Quantum Biology" by McFadden and Al-Khalili has a chapter called "Quantum Genes" that gets into the details of how DNA encodes highly-ordered genetic data at molecular scales, where chaos usually rules and random motion wipes out any order. DNA has to actively resist that to maintain its aperiodic crystal structure.
For a more speculative discussion of DNA, check out "Cosmic Serpent: DNA and the Origins of Knowledge", by Jeremy Narby. This book might seem a bit New Age and hokey... but I recommend you read it like you'd read Paul Steinhardt's book. Narby is trying to pose a clear, well-formed question about how various cultures have extremely unlikely combinations of plant medicines; he's just not as far along in the quest as Paul, and the mist is thick around him.
Paul is a theoretical physicist from Princeton who's been engaged in a decades-long treasure hunt
What a tale! A decades-long trip report, not focused on some arbitrary athletic endeavour!
https://www.physics.princeton.edu/~steinh/
I'm including the notes I took while reading this book, for myself down the road and for you if you want a more detailed summary than the one above. Let me know if any of this is interesting in the comments, and I'll be happy to polish up any section and go into more detail.
The book starts with Paul's observation (that he learned from Richard Feynman, I believe?) that there are two kinds of impossible. Some impossible things are forbidden by the laws of physics as we know them. Other impossible things are just incredibly unlikely, and very surprising. Five-fold symmetry in matter across thousands of atoms is not strictly forbidden... just incredibly unlikely. The question of why it was appearing in Paul's computer simulations is a strange and interesting one, for someone with the time to grapple with impossible things.
Paul meets Dov Levine and they decide to work on a theoretical foundation for this new kind of matter; what could be causing five-fold symmetry? They find Penrose tilings. Here's a link again to the Martin Gardner book if you want to go deeper.
The first book encodes Paul's sense of a swelling crescendo of interest and excitement and effort.
We get an encounter with Feynman! Paul gets crushed by Feynman lectures back in college. In junior year he gets the courage to ask Feynman to revive "Physics X", but for juniors and seniors.
I also learned that “impossible,” when used by Feynman, did not necessarily mean “unachievable” or “ridiculous.” Sometimes it meant, “Wow! Here is something amazing that contradicts what we would normally expect to be true. This is worth understanding!”
Then on to the history of crystallography. René Just Haüy, the "father of modern crystallography", dropping a specimen of calcareous spar and discovering crystals shaped like a rhomboid...
Woah, totally repeated! Could this be some hint about the internal structure of the building block?
These are the three building blocks:
The principles of crystallography:
The first principle is that all pure substances, such as minerals, form crystals, as long as there is enough time for the atoms and molecules to move into an orderly arrangement.
The second states that all crystals are periodic arrangements of atoms, meaning their structure is entirely composed of one of Haüy’s primitive building blocks, a single cluster of atoms that repeats over and over in any direction with equal spacing.
The third principle is that every periodic atomic arrangement can be categorized according to its symmetries, and there is a finite number of possible symmetries.
The final thing note to "periodic tilings", or wallpaper patterns in 2d. (See my description above about Quasicrystals for links to resources that talk about wallpaper patterns and symmetry groups.) In 3d there are 230 options instead of the 17 in 2 dimensions, but the basic idea is the same.
Periodic tilings are frequently used in kitchens, patios, bathrooms, and entryways. And those patterns often include one of five basic shapes: rectangles, parallelograms, triangles, squares, or hexagons.
You can't tile with pentagons, though!
Next, a discussion of tools that allow you to examine crystal symmetries in actual physical specimens.
Steinhardt and his boy, Dov Levine, start investigating. They come up with quasicrystals!
Their inspiration was Penrose tilings: https://en.wikipedia.org/wiki/Penrose_tiling
They show that if such a material existed, it would have an x-ray diffraction pattern with pentagonal symmetry... goes into what this means, etc. But it's just a theory.
How did this idea get started?
But I took advantage of the summer research opportunities because I was intrigued by the fact that the atomic arrangement of something as rudimentary as amorphous matter was not yet known. In that, I was intentionally following one of the most important lessons learned from my mentor, Richard Feynman: It is wise to follow your heart and seek out good problems wherever they may lead, even if it is not in the direction you thought you were supposed to be headed.
They find this weird forbidden symmetry in a software model that's meant to investigate this idea of a "cubatic" phase, a sort-of-periodic crystal, sort-of ordered that would appear if you rapidly cool something to solid.
An icosahedron showed up! This has 5 fold forbidden symmetry if you look down at a point.
Robert Ammann, brilliant amateur... the story is getting more strange, definitely a mystery novel. They track down this guy based on based on a diffraction pattern pic posted by Martin Gardner:
They were taken directly from the work of an unknown amateur named Robert Ammann. That was the first time we had ever heard mention of the mysterious genius who communicated with very few people other than Martin Gardner, the Scientific American guru of recreational mathematicians. Mackay suggested we contact Gardner for help.
He's totally brilliant, gives them new tools to think about quasicrystal structure. Woah, then they discover the fibonacci sequence lurking! Total detective style.
And it turns out the only way to obtain a pattern of Ws and Ns that reproduces the Fibonacci numbers is to have the Ws repeat with a greater frequency than the Ns, as the Penrose tiling extends out in all directions, by a factor precisely equal to the golden ratio, an irrational number. And that is the secret of the Penrose tiling in a nutshell. A sequence composed of two elements that repeat at different frequencies, the ratio of which is an irrational number, is called quasiperiodic. A quasiperiodic sequence never repeats exactly.
OKAY, so they develop this whole theoretical model. Then, independently, Dan Schechtman, cataloguing aluminum manganese alloy properties, sees this in his lab:
Hee has no idea what's going on, sticks with it, finally publishes the results with no theoretical explanation.
Paul finds this result through another coincidence and realizes that this is experimental backup for his theoretical framework. He's a legend, right?
The experimental result starts to freak people out. Maybe there are other explanations? The x-ray pattern was a little fuzzy, after all.
They have to modify their theory with "growth" rules, since Penrose tilings at the time had to be created all at once; building them up in pieces wasn't possible. They just keep chipping away and do it, and even come up with 3d rules. It takes them 30 years to find a proof that their rules are correct. That is commitment, yikes.
Soon a scientist in Japan figures out how to make these things. Quasicrystals are real!
Okay, so they can make them in the lab. But what about in nature? Paul's main gig is physics, cosmology... he was one of the core inflationary universe inventors. Now he's deep into geology? I guess this is why it helps to get tenure.
It's now 1999, and Ken Deffeyes, a Princeton geologist, enters the story. I last encountered Ken in John McPhee's "Annals of the Former World". Ken asks Paul if anyone's ever found a natural quasicrystal.
They start combing museums and, 8 years later, find this tiny rock that, when they test it... seems to be a quasicrystal.
Luca had already sliced the mineral open to study its composition. He made six delicate sections, each the thickness of a human hair. In order to create the slices, though, Luca was forced to sacrifice the bulk of the sample. Ninety percent of what would eventually turn out to be an extremely precious mineral sample was destroyed in the process.
What a nightmare. The book gives a great overview of all of the nightmares that this sample destruction causes.
Another geologist, Hollister gets on board; he's so mean to everyone, so skeptical, but Paul, in classic trip report fashion, hints at this while playing it down hard, always tossing in nice notes about how welcome it was to have someone who wasn't a true believer.
The story is so good from here on out that I can't bear to summarize it, but here are a few more details of the detective story:
The sample that they almost destroyed was called "khatyrkite", and its origins are really uncertain. Luca (Paul's collaborator) and Paul spend hundreds of hours reading old geology notes, trying to track down the origins.
The Russian scientist who supposedly found it tries to elicit a huge bribe to give up the info about where he found it, at this tiny stream in the Kamchatka peninsula in Russia. As professors, Paul and his colleagues have no money, so they forget about it... but then, through a few other coincidences manage to track down the grad student who actually found the sample: Valery Kryachko. He's in his 50s now, but he's fired up and wants to take everyone back to the site to get more samples.
It just goes on and on, so much interesting geology, but the human story here is what's so great. I will say I have no idea what else is going on in Paul's life. He has a family, supposedly, and he's doing other work in physics, but he's comfortable to just keep this thing humming along.
This final section is too good to summarize really at all. The geologists convince Paul that to prove that natural quasicrystals exist, he needs to go to the Kamchatka peninsula, to this little bullshit stream, and get some more samples. I'm not sure Paul's ever been camping in his life; he tries to get out of actually going, but finally buckles and realizes that he's going to need to get after it. The Kamchatka peninsula turns out to be a hard-fucking-core first camping trip, no doubt about it.
Paul clearly had a lot of fun writing this piece, and was really proud. His son is a geologist and came along on the trip. And it just keeps getting better.
They go, they find the stream, dig everywhere, so much discovery, so much vodka. The team brings back huge amounts of material, and BOOM, they discover that the quasicrystal came from an asteroid, a previously undiscovered asteroid that slammed into the earth some untold number of years ago and carries insights into the origins of the solar system!
I'm excited and exhausted just writing about the journey. I can't imagine living it.
That's all I've got for now. Go get yourself a copy of "The Second Kind of Impossible" and revel in the detective tale that Paul's put together.
]]>