X86oid Pseudo Atomic
2009-03-29 -- The relatively recent chit-chat about allocation and interrupts has had me looking at ways to speed up pseudo atomic in SBCL.
(defmacro pseudo-atomic (&rest forms)
  (with-unique-names (label)
    `(let ((,label (gen-label)))
       (inst or (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
             (fixnumize 1) :fs)
       ,@forms
       (inst xor (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
             (fixnumize 1) :fs)
       (inst jmp :z ,label)
       ;; if PAI was set, interrupts were disabled at the same
       ;; time using the process signal mask.
       (inst break pending-interrupt-trap)
       (emit-label ,label))))
EBP
My first idea was that ORing is unnecessary, since with the slew of
interrupt fixes going into 1.0.26 every interrupt deferred by pseudo
atomic is handled as soon as we leave the pa section. Hence, a
simple MOV would suffice. Or, if we wanted to be fancy, we could rely
on the fact that within SBCL EBP is always even (which leaves the
first bit of PSEUDO-ATOMIC-BITS for the interrupted flag) and
non-zero, so storing it on entry and XORing it back on exit leaves
zero unless the interrupted flag was set in between:
(defmacro pseudo-atomic (&rest forms)
  (with-unique-names (label)
    `(let ((,label (gen-label)))
       (inst mov (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
             ebp-tn :fs)
       ,@forms
       (inst xor (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
             ebp-tn :fs)
       (inst jmp :z ,label)
       ;; if PAI was set, interrupts were disabled at the same time
       ;; using the process signal mask.
       (inst break pending-interrupt-trap)
       (emit-label ,label))))
This shaves a few bytes off the code and is an overall 0.5% win in cl-bench (see "pseudo-atomic.ebp" in the results).
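To make the flag arithmetic concrete, here is a minimal sketch in portable Common Lisp that models the flag word as an integer. It does not touch SBCL internals: the function names and the constant are hypothetical, and the frame pointer is just a number passed in by the caller.

```lisp
;; Portable model of the EBP-based flag word. All names are hypothetical.
(defconstant +interrupted-bit+ 1
  "Low bit of the modelled flag word; free because the frame pointer is even.")

(defun enter-pseudo-atomic (ebp)
  "Model of the MOV at entry: the flag word becomes the frame pointer value."
  (assert (and (plusp ebp) (evenp ebp)))
  ebp)

(defun defer-interrupt (bits)
  "Model of a signal arriving inside the section: set the interrupted flag."
  (logior bits +interrupted-bit+))

(defun leave-pseudo-atomic (bits ebp)
  "Model of the XOR at exit. Returns the new flag word and, as a second
value, true when the pending-interrupt trap would be taken."
  (let ((new (logxor bits ebp)))
    (values new (not (zerop new)))))
```

With a frame pointer of #x1000, leaving an undisturbed section gives a flag word of 0 and no trap, while leaving after defer-interrupt gives 1, so the JMP :Z falls through to the trap.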
mprotect
But if the page of PSEUDO-ATOMIC-BITS is made write protected
when an interrupt is deferred, then the pending interrupt can be run
from the sigsegv handler where we land coming out of pseudo atomic:
(defmacro pseudo-atomic (&rest forms)
  `(progn
     (inst mov (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
           ebp-tn :fs)
     ,@forms
     (inst xor (make-ea :dword :disp (* 4 thread-pseudo-atomic-bits-slot))
           ebp-tn :fs)))
And four more bytes are saved, making the total overhead of pseudo atomic 12 bytes (+2 bytes with threads). This version is labelled "pseudo-atomic.mprotect.ebp" and is not faster than the previous one. Somewhat surprisingly, this variant ("pseudo-atomic.mprotect.mov") is just as fast:
(defmacro pseudo-atomic (&rest forms)
  `(progn
     (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
           2 :fs)
     ,@forms
     (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
           0 :fs)))
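The control flow of the mprotect variants can be sketched in portable Common Lisp, with a struct standing in for the page and a condition standing in for SIGSEGV. This is only a model under stated assumptions: every name below is hypothetical, and the order of steps in the fault handler (unprotect, clear the flag, then run the interrupt) is my guess at a reasonable sequence, not a transcription of the runtime.

```lisp
;; Portable model of the mprotect variant; no real memory protection involved.
(define-condition write-fault (error) ()
  (:documentation "Stands in for the SIGSEGV raised by a protected write."))

(defstruct flag-page
  (bits 0)
  (write-protected nil)
  (pending nil))   ; deferred interrupt: a function of no arguments, or NIL

(defun store-bits (page value)
  "Model of a MOV to the flag word: faults when the page is write protected."
  (when (flag-page-write-protected page)
    (error 'write-fault))
  (setf (flag-page-bits page) value))

(defun protect-and-defer (page handler)
  "Model of a signal arriving inside pseudo atomic: remember the
interrupt and write protect the page."
  (setf (flag-page-pending page) handler
        (flag-page-write-protected page) t))

(defun call-with-pseudo-atomic (page thunk)
  "Run THUNK between the two stores. The exit path has no explicit test;
a deferred interrupt is noticed only because the exit store faults."
  (store-bits page 2)
  (funcall thunk)
  (handler-case (store-bits page 0)
    (write-fault ()
      ;; Model of the fault handler (assumed order): unprotect,
      ;; complete the store, then run the deferred interrupt.
      (let ((handler (flag-page-pending page)))
        (setf (flag-page-write-protected page) nil
              (flag-page-pending page) nil)
        (store-bits page 0)
        (funcall handler))))
  page)
```

The point of the model is that the fast path is two plain stores with no branch; all the cost of a deferred interrupt moves into the fault handler.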
Direction flag
Another idea is to hijack the direction flag:
(defmacro pseudo-atomic (&rest forms)
  `(progn
     (inst std)
     ,@forms
     (inst cld)
     (inst mov (make-ea :byte :disp (* 4 thread-pseudo-atomic-bits-slot))
           0 :fs)))
where the mov instruction sigsegvs if pseudo atomic was interrupted. This is little more than a quick hack to gauge expected performance, because SBCL itself uses the direction flag in a number of places, to say nothing of alien land. However, there seems to be no reason to pursue this further, as its performance disappoints.
All in all, I had expected more gains; in particular, I'm disappointed by the performance of the mprotect trick. Still, 0.5% is okay for such a small change. Code is available here (from the pseudo-atomic branch of my git tree).