X86oid Pseudo Atomic
Tags: lisp
Date: 2009-03-29
The relatively recent chit-chat about allocation and interrupts has had me looking at ways to speed up pseudo-atomic in SBCL.
(defmacro pseudo-atomic (&rest forms)
  (with-unique-names (label)
    `(let ((,label (gen-label)))
       ;; Enter: set the pseudo-atomic bit in the thread's slot.
       (inst or (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
             (fixnumize 1) :fs)
       ,@forms
       ;; Leave: clear it again. A non-zero result means a signal
       ;; handler set the interrupted flag while we were inside.
       (inst xor (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
             (fixnumize 1) :fs)
       (inst jmp :z ,label)
       ;; if PAI was set, interrupts were disabled at the same
       ;; time using the process signal mask.
       (inst break pending-interrupt-trap)
       (emit-label ,label))))
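To make the protocol explicit, here is a rough C model of what this macro does at run time. It is a sketch, not SBCL's runtime code: `pa_bits`, `PA_FLAG`, `PA_INTERRUPTED`, `deliver_signal` and the other names are hypothetical. It assumes (as on 32-bit x86) that `(fixnumize 1)` is 4 and that bit 0 holds the interrupted flag, and it ignores the signal mask mentioned in the comment.

```c
#include <stdint.h>

/* Hypothetical layout of the thread's pseudo-atomic slot. */
#define PA_FLAG        4u  /* assumed (fixnumize 1): "inside pseudo-atomic" */
#define PA_INTERRUPTED 1u  /* bit 0: an interrupt was deferred */

static uint8_t pa_bits;    /* stands in for the PSEUDO-ATOMIC-BITS slot */
static int pending_runs;   /* how many times an interrupt handler ran */

/* What a signal handler does on arrival: defer if inside
   pseudo-atomic, otherwise run the interrupt immediately. */
static void deliver_signal(void) {
    if (pa_bits & PA_FLAG)
        pa_bits |= PA_INTERRUPTED;   /* defer and remember */
    else
        pending_runs++;              /* run it right away */
}

/* INST OR on entry. */
static void pa_enter(void) {
    pa_bits |= PA_FLAG;
}

/* INST XOR, JMP :Z and BREAK on exit. */
static void pa_leave(void) {
    pa_bits ^= PA_FLAG;
    if (pa_bits != 0) {      /* JMP :Z not taken: the flag was set */
        pa_bits = 0;         /* the trap handler clears the flag... */
        pending_runs++;      /* ...and runs the deferred interrupt */
    }
}
```

An interrupt inside the section only runs once `pa_leave` hits the trap; one outside runs at once.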
EBP
My first idea was that ORing is unnecessary since, with the slew of interrupt fixes going into 1.0.26, every interrupt deferred by pseudo-atomic is handled as soon as we leave the pseudo-atomic section. The slot is therefore always clear on entry, and a simple MOV would suffice. Or, if we wanted to be fancy, we could rely on the fact that within SBCL EBP is always even (which leaves the lowest bit of PSEUDO-ATOMIC-BITS free for the interrupted flag) and non-zero (so the slot is still non-zero while we are inside pseudo-atomic):
(defmacro pseudo-atomic (&rest forms)
  (with-unique-names (label)
    `(let ((,label (gen-label)))
       ;; EBP is even and non-zero: storing it marks us as pseudo-atomic
       ;; and leaves bit 0 free for the interrupted flag.
       (inst mov (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
             ebp-tn :fs)
       ,@forms
       ;; XORing EBP back out leaves only the interrupted flag, if any.
       (inst xor (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
             ebp-tn :fs)
       (inst jmp :z ,label)
       ;; if PAI was set, interrupts were disabled at the same time
       ;; using the process signal mask.
       (inst break pending-interrupt-trap)
       (emit-label ,label))))
This shaves a few bytes off the code and is an overall 0.5% win in cl-bench (see "pseudo-atomic.ebp" in the results).
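The EBP trick rests on a small bit-level invariant that can be checked in isolation. The following C sketch is hypothetical: the names, and the rule that a handler treats any bit other than bit 0 as "inside pseudo-atomic", are assumptions made for illustration. It shows that, for an even and non-zero EBP, the final XOR yields zero when no signal arrived and exactly the interrupted flag when one did.

```c
#include <stdint.h>

#define PA_INTERRUPTED 1u   /* bit 0: free because EBP is always even */

/* Hypothetical handler test: any bit besides bit 0 means "inside
   pseudo-atomic", which is why EBP must also be non-zero. */
static int in_pseudo_atomic(uint32_t slot) {
    return (slot & ~PA_INTERRUPTED) != 0;
}

/* Model one section of the EBP variant and return the slot value
   after the final XOR, i.e. what JMP :Z tests. */
static uint32_t pa_ebp_exit_value(uint32_t ebp, int signal_arrives) {
    uint32_t slot = ebp;                  /* mov [slot], ebp */
    if (signal_arrives && in_pseudo_atomic(slot))
        slot |= PA_INTERRUPTED;           /* handler defers and flags */
    return slot ^ ebp;                    /* xor [slot], ebp */
}

/* Check the invariant for every even, non-zero EBP in [lo, hi].
   A 64-bit counter avoids wrapping when hi is UINT32_MAX. */
static int ebp_trick_holds(uint32_t lo, uint32_t hi) {
    for (uint64_t e = lo; e <= hi; e++) {
        uint32_t ebp = (uint32_t)e;
        if (ebp == 0 || (ebp & 1u))
            continue;                     /* not a valid frame pointer */
        if (pa_ebp_exit_value(ebp, 0) != 0)
            return 0;
        if (pa_ebp_exit_value(ebp, 1) != PA_INTERRUPTED)
            return 0;
    }
    return 1;
}
```

With EBP equal to 0 the slot would read as "not pseudo-atomic" and a signal would not be deferred at all, hence the non-zero requirement.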
mprotect
But if the page of PSEUDO-ATOMIC-BITS is made write-protected when an interrupt is deferred, then the pending interrupt can be run from the SIGSEGV handler, where we land when the store at the end of pseudo-atomic hits the protected page. The jump and the trap are then no longer needed:
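(defmacro pseudo-atomic (&rest forms)
  `(progn
     (inst mov (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
           ebp-tn :fs)
     ,@forms
     ;; Faults if an interrupt was deferred and the page write-protected;
     ;; the SIGSEGV handler then runs the pending interrupt.
     (inst xor (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
           ebp-tn :fs)))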
Dropping the jump and the trap saves four more bytes, making the total overhead of pseudo-atomic 12 bytes (+2 bytes with threads): the MOV and the XOR take 6 bytes each, plus a one-byte FS segment prefix apiece when threads are enabled. This version is labelled "pseudo-atomic.mprotect.ebp" and is not faster than the previous one. Somewhat surprisingly, this variant ("pseudo-atomic.mprotect.mov") is just as fast:
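(defmacro pseudo-atomic (&rest forms)
  `(progn
     ;; 2 marks us as pseudo-atomic and leaves bit 0 clear.
     (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
           2 :fs)
     ,@forms
     ;; Like the XOR above, faults if the page was write-protected.
     (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
           0 :fs)))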
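The mprotect trick can be tried outside SBCL with plain POSIX calls. This C sketch is hypothetical throughout: an anonymous page stands in for the thread's PSEUDO-ATOMIC-BITS, `SIGUSR1` plays the interrupt, and `FAKE_EBP` plays the frame pointer. It follows the EBP/XOR variant; the MOV variant differs only in what the final store writes. Calling `mprotect` from a signal handler works on Linux in practice, but POSIX does not list it as async-signal-safe, and some systems may report the protection fault as SIGBUS rather than SIGSEGV, so both are handled.

```c
#define _DEFAULT_SOURCE 1
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define FAKE_EBP 0x1000u            /* hypothetical even, non-zero EBP */
#define PA_SLOT (*(volatile uint32_t *)pa_page)

static char *pa_page;               /* hypothetical page holding the slot */
static long page_size;
static volatile sig_atomic_t pending;       /* an interrupt was deferred */
static volatile sig_atomic_t pending_runs;  /* deferred interrupts run */
static volatile sig_atomic_t direct_runs;   /* interrupts run at once */

/* The "interrupt": inside pseudo-atomic, defer it by write-protecting
   the page instead of setting a flag. */
static void interrupt_handler(int sig) {
    (void)sig;
    if (PA_SLOT != 0) {
        pending = 1;
        mprotect(pa_page, (size_t)page_size, PROT_READ);
    } else {
        direct_runs++;
    }
}

/* The store leaving pseudo-atomic faulted: unprotect, "run" the
   pending interrupt, and return so the store is restarted. */
static void fault_handler(int sig, siginfo_t *info, void *ctx) {
    char *addr = info->si_addr;
    (void)sig;
    (void)ctx;
    if (pending && addr >= pa_page && addr < pa_page + page_size) {
        mprotect(pa_page, (size_t)page_size, PROT_READ | PROT_WRITE);
        pending = 0;
        pending_runs++;
        return;
    }
    abort();                        /* a genuine crash */
}

static int pa_setup(void) {
    struct sigaction sa;
    page_size = sysconf(_SC_PAGESIZE);
    pa_page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pa_page == MAP_FAILED)
        return -1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = fault_handler;
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    sa.sa_flags = 0;
    sa.sa_handler = interrupt_handler;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* One pseudo-atomic section: no conditional jump, no trap. */
static void pa_section(int interrupt_inside) {
    PA_SLOT = FAKE_EBP;             /* mov [slot], ebp */
    if (interrupt_inside)
        raise(SIGUSR1);             /* an interrupt arrives mid-section */
    PA_SLOT ^= FAKE_EBP;            /* xor [slot], ebp: faults if deferred */
}
```

After `pa_setup()`, `pa_section(1)` ends with the deferred interrupt having run exactly once and the slot back at zero, while `raise(SIGUSR1)` outside a section runs the interrupt directly.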
Direction Flag
Another idea is to hijack the direction flag:
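(defmacro pseudo-atomic (&rest forms)
  `(progn
     ;; A set direction flag means "inside pseudo-atomic".
     (inst std)
     ,@forms
     (inst cld)
     ;; Faults if an interrupt was deferred and the page write-protected.
     (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
           0 :fs)))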
where the MOV instruction triggers a SIGSEGV if pseudo-atomic was interrupted (the signal handler sees the direction flag set in the interrupted context, defers the interrupt and write-protects the page, as in the mprotect variant). This is little more than a quick hack to gauge expected performance, because SBCL itself uses the direction flag in a number of places, to say nothing of alien land. However, there seems to be no reason to pursue this further, as its performance disappoints.
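For completeness, a toy C model of the direction-flag variant. Everything here is hypothetical: a plain struct stands in for the interrupted context and the page protection, and the only hardware fact it relies on is that DF is bit 10 of EFLAGS. A real handler has to read the flags saved in the interrupted context; on Linux, for instance, the kernel clears DF before invoking a signal handler.

```c
#include <stdint.h>

#define EFLAGS_DF 0x400u   /* the direction flag is bit 10 of EFLAGS */

/* Hypothetical stand-in for the interrupted context, the page's
   protection, and a count of interrupt-handler runs. */
struct context {
    uint32_t eflags;
    int page_protected;
    int pending;
    int runs;
};

static void set_df(struct context *c)   { c->eflags |= EFLAGS_DF; }  /* STD */
static void clear_df(struct context *c) { c->eflags &= ~EFLAGS_DF; } /* CLD */

/* Signal handler: a set DF in the saved flags means we landed inside
   pseudo-atomic, so defer and write-protect the page. */
static void deliver_interrupt(struct context *c) {
    if (c->eflags & EFLAGS_DF) {
        c->pending = 1;
        c->page_protected = 1;
    } else {
        c->runs++;               /* run the handler right away */
    }
}

/* The final MOV: "faults" only if a deferral protected the page, in
   which case the SIGSEGV handler unprotects it and runs the interrupt. */
static void store_pa_bits(struct context *c) {
    if (c->page_protected) {
        c->page_protected = 0;
        c->pending = 0;
        c->runs++;
    }
}
```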
All in all, I had expected more gains. In particular, I'm disappointed by the performance of the mprotect trick. Still, 0.5% is okay for such a small change. Code is available here (from the pseudo-atomic branch of my git tree).