Gábor Melis' () blog - March 2009


Older posts

X86oid Pseudo Atomic

The relatively recent chit - chat about allocation and interrupts have had me looking at ways to speed up pseudo atomic in SBCL.

 (defmacro pseudo-atomic (&rest forms)
  (with-unique-names (label)
    `(let ((,label (gen-label)))
       (inst or (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
             (fixnumize 1) :fs)
       (inst xor (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
             (fixnumize 1) :fs)
       (inst jmp :z ,label)
       ;; if PAI was set, interrupts were disabled at the same
       ;; time using the process signal mask.
       (inst break pending-interrupt-trap)
       (emit-label ,label))))


Read more

Code alignment on x86

There has always been a lot of wiggling of SBCL boinkmarks results. It's easy to chalk this up to system load, but the same can be observed running the cl-bench benchmarks under more ideal circumstances. Part of the reason is the insufficient number of iterations of some tests: measurement accuracy is really bad when the run time is below 0.2s and it is abysmal when there is other activity on the system which is easy to tell even in retrospect by comparing the real and user time columns.

But that's not the end of the story, take for instance FPRINT/PRETTY: it takes more than two seconds but often experiences changes up to 7% caused by seemingly unrelated changes. People have fingered alignment as a likely cause.

Recently this issue has become more pressing as I've been trying to reduce the overhead of x86's pseudo atomic. Unfortunately, the effect is smallish which makes measurement difficult so I tried aligning loops on 16 byte boundaries. This being on x86, that meant aligning code similarly first (it's 8 byte aligned currently).

Read more