X86oid Pseudo Atomic

Tags: lisp, Date: 2009-03-29

The relatively recent chit-chat about allocation and interrupts has had me looking at ways to speed up pseudo-atomic in SBCL.

 (defmacro pseudo-atomic (&rest forms)
  (with-unique-names (label)
    `(let ((,label (gen-label)))
       (inst or (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
             (fixnumize 1) :fs)
       ,@forms
       (inst xor (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
             (fixnumize 1) :fs)
       (inst jmp :z ,label)
       ;; if PAI was set, interrupts were disabled at the same
       ;; time using the process signal mask.
       (inst break pending-interrupt-trap)
       (emit-label ,label))))

`EBP`

My first idea was that ORing is unnecessary. With the slew of interrupt fixes going into 1.0.26, every interrupt deferred by pseudo-atomic is handled as soon as we leave the pa section, so the interrupted flag is always clear when we enter one. Hence, a simple MOV would suffice. Or, if we wanted to be fancy, we could rely on the fact that within SBCL EBP is always even (that leaves the first bit of PSEUDO-ATOMIC-BITS for the interrupted flag) and non-zero:

(defmacro pseudo-atomic (&rest forms)
 (with-unique-names (label)
   `(let ((,label (gen-label)))
      (inst mov (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
            ebp-tn :fs)
      ,@forms
      (inst xor (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
            ebp-tn :fs)
      (inst jmp :z ,label)
      ;; if PAI was set, interrupts were disabled at the same time
      ;; using the process signal mask.
      (inst break pending-interrupt-trap)
      (emit-label ,label))))

This shaves a few bytes off the code and is an overall 0.5% win in cl-bench (see "pseudo-atomic.ebp" in the results).

`mprotect`

But if the page of PSEUDO-ATOMIC-BITS is made write-protected when an interrupt is deferred, then the write that ends pseudo-atomic faults, and the pending interrupt can be run from the SIGSEGV handler where we land coming out of pseudo-atomic:

(defmacro pseudo-atomic (&rest forms)
 `(progn
    (inst mov (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
          ebp-tn :fs)
    ,@forms
    (inst xor (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
          ebp-tn :fs)))

And four more bytes are saved (the conditional jump and the trap are gone), making the total overhead of pseudo-atomic 12 bytes (+2 bytes with threads). This version is labelled "pseudo-atomic.mprotect.ebp" and is not faster than the previous one. Somewhat surprisingly, this variant ("pseudo-atomic.mprotect.mov") is just as fast:

(defmacro pseudo-atomic (&rest forms)
 `(progn
    (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
          2 :fs)
    ,@forms
    (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
          0 :fs)))

Direction Flag

Another idea is to hijack the direction flag:

(defmacro pseudo-atomic (&rest forms)
 `(progn
    (inst std)
    ,@forms
    (inst cld)
    (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
          0 :fs)))

where the MOV instruction triggers a SIGSEGV if pseudo-atomic was interrupted. This is little more than a quick hack to gauge expected performance, because SBCL itself uses the direction flag in a number of places, to say nothing of alien land. However, there seems to be no reason to pursue this further, as its performance disappoints.

All in all, I had expected more gains. In particular, I'm disappointed by the performance of the mprotect trick. Still, 0.5% is okay for such a small change. Code is available here (from the pseudo-atomic branch of my git tree).