PAX PDF Output

Tags: tech, lispDate: 2025-05-15

Thanks to Paul A. Patience, PAX now has PDF support. See pax-manual-v0.4.1.pdf and dref-manual-v0.4.1.pdf. The PDF is very similar to the HTML, even down to the locative types (e.g [function]) being linked to the sources on GitHub, but cross-linking between PDFs doesn't work reliably on most viewers, so that's disabled. Also, for reading PDFs so heavy on internal linking to be enjoyable, one needs a viewer that supports going back within the PDF (not the case with Chrome at the moment). Here is a blurry screenshot to entice:

pax-pdf-doc

There is a bit of a Christmas tree effect due to syntax highlighting and the colouring of the links. Blue links are internal to the PDF, maroon links are external. I might want to change that to make it look more like the HTML, but I have not found a way on LaTeX to underline text without breaking automatic hyphenation.

Adaptive Hashing

Tags: tech, lispDate: 2025-05-02

At the 2024 ELS, I gave a talk on adaptive hashing, which focusses on making general purpose hash tables faster and more robust at the same time.

Theory vs Practice

Hash table theory most concerns itself with the asymptotic worst-case cost with a hash function chosen randomly from a family of hash functions. Although these results are very relevant in practice,

There are Perfect Hashing algorithms, that choose an optimal hash function for a given set of keys. The drawback is that they either require the set of keys to be fixed or they are too slow to be used as general purpose hash tables.

Still, the idea that we can do better by adapting the hash function to the actual keys is key. Can we do that online, that is, while the hash table is being used? Potential performance gains come from improving the constant factors mentioned above by

The first image above plots the regret (the expected number of comparisons of per lookup minus the minimum achievable) and the measured run-time of PUT operations vs the number of keys in the hash table with a particular key distribution. Green is Murmur (a robust hash function), Blue is SBCL's expedient EQ hash. The wiggling of the graphs is due to the resizing of the hash table as keys are added to it.

Note how SBCL's regret starts out much lower and becomes much higher than that of Murmur, but if anything, its advantage in run time (second image) grows.

Implementation

The general idea is sound, but turning it into real performance gains is hard due to the cost of choosing a hash function and switching to it. First, we have to make some assumption about the distribution of keys. In fact, default hash functions in language runtimes often make such assumptions to make the common cases faster, usually at the cost of weakened worst-case guarantees.

The rest of this post is about how SBCL's built-in hash tables, which had been reasonably fast, were modified. The core switching mechanism looks at

Adapting EQ hash tables

  1. Init to to constant hash function. This a fancy way of saying that we do linear search in a vector internally. This is an EQ hash table, so key comparison is as single assembly instruction.

  2. When the hash table is grown to more than 32 keys and it must be rehashed anyway, we switch to a hash function that does a single right shift with the number of bits to shift determined from the longest common run of low-bits in the keys.

  3. If too many collisions, we switch to the previous default SBCL EQ-hash function that has been tuned for a long time.

  4. If too many collisions, we switch to Murmur, a general purpose hash. We could also go all the way to cryptographic hashes.

In step 2, the hash function with the single shift fits the memory allocator's behaviour nicely: it is a perfect hash for keys forming arithmetic sequences, which is often approximately true for objects of the same type allocated in a loop.

adaptive-hash-eq-put

In this figure, the red line is the adaptive hash.

Adapting EQUAL hash tables

For composite keys, running the hash function is the main cost. Adaptive hashing does the following.

adaptive-hash-string-put


So, SBCL hash tables have been adaptive for almost a year now, gaining some speed in common cases, and robustness in others.

The full paper is here.

adaptive-hashing

PAX and DRef v0.4

Tags: tech, lispDate: 2025-04-23

Version 0.4 of PAX, the documentation system, and DRef, the definition reifier, was released. There were large refactorings, bug fixes, minor features, cosmetics, documentation and performance improvements too numerous to list. Here is a summary of the new features and notable changes.

TLDR of Argmin's Summary of Half of the Meehl Lectures

Tags: ai, pompousnessDate: 2024-05-22

Over at argmin.net, Ben Recht is reflecting on Meehl's lectures on the metatheory of science, which is about how science progresses. The original lectures are fascinating but also long as well as long-winded, and I found Ben's blog series a much better read (especially since the originals are video recordings). Still, at the time of writing, with 13 blog posts covering less than half of the lectures (5/12), no self-respecting 21st century scientist can risk the time investment (equivalent to publishing 0.25 papers in machine learning) or – even worse – getting slowed down by methodological considerations.

... read the rest of TLDR of Argmin's Summary of Half of the Meehl Lectures.

On Multifaceted Development and the Role of Documentation

Tags: tech, lisp, pompousnessDate: 2023-08-17

Catchy title, innit? I came up with it while trying to name the development style PAX enables. The original idea was something vaguely self-explanatory in a straight out of a marketing department kind of way, with tendrils right into your unconscious. Documentation-driven development sounded just the thing, but it's already taken. Luckily, I came to realize that neither documentation nor any other single thing should drive development. Less luckily for the philosophically disinclined, this epiphany unleashed my inner Richard P. Gabriel. I reckon if there is a point to what follows, it's abstract enough to make it hard to tell.

... read the rest of On Multifaceted Development and the Role of Documentation.

Try in Emacs

Tags: tech, lispDate: 2023-08-14

Try, my test anti-framework, has just got light Emacs integration.

... read the rest of Try in Emacs.

DRef and PAX v0.3

Tags: tech, lispDate: 2023-07-26

DEFSECTION needs to refer to definitions that do not create a first-class object (e.g. stuff like (*DOCUMENT-LINK-TO-HYPERSPEC* VARIABLE)), and since its original release in 2014, a substantial part of PAX dealt with locatives and references, which reify definitions. This release finally factors that code out into a library called DRef, allowing PAX to focus on documentation. Being very young, DRef lives under adult supervision, in a subdirectory of the PAX repository.

... read the rest of DRef and PAX v0.3.

PAX Live Documentation Browser

Tags: tech, lispDate: 2023-06-10

PAX got a live documentation browser to make documentation generation a more interactive experience. A great thing about Lisp development is changing a single function and quickly seeing how it behaves without the delay of a full recompile. Previously, editing a docstring required regenerating the full documentation to see how the changes turned out. The live documentation browser does away with this step, which tightens the edit/document loop.

PAX also got an apropos browser. It could always generate documentation for stuff not written with PAX in mind, so with the live browser already implemented, this was a just a small add-on.

The trouble with interactivity is, of course, that it's difficult to get the point across in text, so I made two short videos that demonstrate the basics.

Grid Typesetting

Tags: techDate: 2023-04-17

I put the sources of the Two-Tailed Averaging paper on github. Of course, the sources are also available on arxiv, but this may give better visibility to the LaTeX grid typesetting code in there. Also, note how much cleaner the paper looks with the XCharter font compared to Times New Roman. No wonder Matthew Butterick pushes Charter. By the way, see what he has to say about grids and baseline grids, in particular.

Normalize Fonts for Comparison

Tags: techDate: 2023-04-10

In short, comparing fonts at the same font size is almost never the right thing to do. Compare them at the same x-height or, better yet, at the same space usage.

... read the rest of Normalize Fonts for Comparison.

Practitioner's Guide to Two-Tailed Averaging

Tags: aiDate: 2022-12-06

This is a complement to the Two-Tailed Averaging paper, approached from the direction of what I think is a fairly common technique: averaging checkpoints.

We want to speed up training and improve generalization. One way to do that is by averaging weights from optimization, and that's a big win (e.g. 1, 2, 3). For example, while training a language model for the down-stream task of summarization, we can save checkpoints periodically and average the model weights from the last 10 or so checkpoints to produce the final solution. This is pretty much what Stochastic Weight Averaging (SWA) does.

... read the rest of Practitioner's Guide to Two-Tailed Averaging.

There is Try

Tags: lispDate: 2022-10-16

Do or do not. There is now Try. I forgot to announce Try, my Common Lisp test framework, on this blog.

... read the rest of There is Try.

PAX v0.1

Tags: tech, lispDate: 2022-02-16

PAX v0.1 is released. At this point, I consider it fairly complete. Here is the changelog for the last year or so.

New Features

... read the rest of PAX v0.1.

Journal, the Kitchen Sink

Tags: lispDate: 2020-09-04

Ever wished for machine-readable logs and TRACEs, maybe for writing tests or something more fancy? The Journal library takes a simple idea: user-defined execution traces and implements logging, tracing, a testing "framework" with mock support, and an Event Sourcing style database on top.

... read the rest of Journal, the Kitchen Sink.

Moving the Blog to PAX

Tags: lispDate: 2020-05-05

After more than five years of silence, I may be resurrecting my old blog. I already got as far as rewriting it using MGL-PAX, which is a curious choice because PAX is a documentation generator for Common Lisp. The blog "engine" is rather bare-bones but works admirably, especially considering that the implementation is only 72 lines of code, most of which deals with post categories and overview pages with shortened posts, something PAX hasn't seen the need for.

... see older posts