I've been reading about multi-threaded programming, mostly out of curiosity about lock-free algorithms.
The first thing I implemented was actually something completely different, namely a short example of memory reordering ported over from a C++ program.
In Common Lisp this relies on SBCL because of the availability of memory barriers and the Linux-specific API to set the thread affinity to a single CPU core (sb-cpu-affinity). Of course, given the appropriate compatibility libraries this could easily be done on other implementations too.
(in-package #:cl-user)

(defun main (&key single-cpu-p barrierp)
  (let ((begin-semaphore-1 (sb-thread:make-semaphore))
        (begin-semaphore-2 (sb-thread:make-semaphore))
        (end-semaphore (sb-thread:make-semaphore))
        (iterations 0)
        (detected 0)
        (x 0)
        (y 0)
        (r1 0)
        (r2 0)
        done)
    (declare (fixnum x y r1 r2))
    (declare (boolean done))
    (declare (optimize (speed 3) (safety 0) (space 0) (debug 0) (compilation-speed 0)))
    (labels ((single-cpu ()
               ;; optionally pin the calling thread to CPU core 0
               (when single-cpu-p
                 (sb-cpu-affinity::with-cpu-affinity-mask (mask :save T)
                   (sb-cpu-affinity::clear-cpu-affinity-mask mask)
                   (setf (sb-cpu-affinity::cpu-affinity-p 0 mask) T))))
             (thread-1 ()
               (single-cpu)
               (loop
                 (when done
                   (return))
                 (sb-thread:wait-on-semaphore begin-semaphore-1)
                 (setf x 1)
                 (when barrierp
                   (sb-thread:barrier (:memory)))
                 (setf r1 y)
                 (sb-thread:signal-semaphore end-semaphore)))
             (thread-2 ()
               (single-cpu)
               (loop
                 (when done
                   (return))
                 (sb-thread:wait-on-semaphore begin-semaphore-2)
                 (setf y 1)
                 (when barrierp
                   (sb-thread:barrier (:memory)))
                 (setf r2 x)
                 (sb-thread:signal-semaphore end-semaphore))))
      (let ((thread-1 (sb-thread:make-thread #'thread-1))
            (thread-2 (sb-thread:make-thread #'thread-2)))
        (unwind-protect
             (loop
               (setf x 0 y 0)
               (sb-thread:signal-semaphore begin-semaphore-1)
               (sb-thread:signal-semaphore begin-semaphore-2)
               (sb-thread:wait-on-semaphore end-semaphore)
               (sb-thread:wait-on-semaphore end-semaphore)
               ;; both loads saw the initial values: the stores were reordered
               (when (and (eql r1 0) (eql r2 0))
                 (incf detected)
                 (format T "~D reorders detected after ~D iterations, every ~D on average~%"
                         detected iterations (floor iterations detected)))
               (incf iterations))
          (setf done T)
          (sb-thread:signal-semaphore begin-semaphore-1)
          (sb-thread:signal-semaphore begin-semaphore-2)
          (sb-thread:join-thread thread-1)
          (sb-thread:join-thread thread-2))))))
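For reference, here is how the experiment might be started from the REPL, assuming sb-cpu-affinity and the code above are loaded; the loop runs until interrupted:

;; no barriers, no pinning: reorders should be reported every so often
(main)

;; emit a memory barrier between the store and the load: no reorders expected
(main :barrierp T)

;; pin both worker threads to CPU 0: the reordering should also disappear
(main :single-cpu-p T)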
On the JVM, or rather in Kotlin, the code again looks remarkably similar. I haven't yet looked at thread affinity, but what's interesting here are two aspects related to the JVM memory model. Using volatile (@Volatile in Kotlin) is, in contrast to C, viable, as it generates the necessary memory barriers.
import java.util.concurrent.Semaphore
import kotlin.concurrent.thread

class HelloWorld {
    var beginSemaphore1 = Semaphore(0)
    var beginSemaphore2 = Semaphore(0)
    var endSemaphore = Semaphore(0)

    var detected = 0
    var iterations = 0

    // volatile generates the necessary memory barriers around these fields
    @Volatile var x = 0
    @Volatile var y = 0

    var r1 = 0
    var r2 = 0

    fun run() {
        thread {
            while (true) {
                beginSemaphore1.acquire()
                x = 1
                r1 = y
                endSemaphore.release()
            }
        }

        thread {
            while (true) {
                beginSemaphore2.acquire()
                y = 1
                r2 = x
                endSemaphore.release()
            }
        }

        while (true) {
            iterations += 1
            x = 0
            y = 0
            beginSemaphore1.release()
            beginSemaphore2.release()
            endSemaphore.acquire()
            endSemaphore.acquire()
            if (r1 == 0 && r2 == 0) {
                detected++
                println("$detected reorders detected after $iterations iterations")
            }
        }
    }
}

fun main(args: Array<String>) {
    HelloWorld().run()
}
Also, with a disassembler plugin for the JVM, the generated assembly code can be dumped during JIT compilation, which allows us to confirm whether those barrier instructions have actually been generated (a sketch of the invocation follows the listings); for the first loop:
0x00007f9ad4f503c9: mov (%rcx),%r11d ;*getfield this$0
; - HelloWorld$run$1::invoke@11 (line 21)
0x00007f9ad4f503cc: test %r11d,%r11d
0x00007f9ad4f503cf: je 0x00007f9ad4f507a9
0x00007f9ad4f503d5: movl $0x1,0x14(%r12,%r11,8)
0x00007f9ad4f503de: lock addl $0x0,(%rsp) ;*putfield x
; - HelloWorld::setX@2 (line 12)
; - HelloWorld$run$1::invoke@15 (line 21)
0x00007f9ad4f503e3: mov (%rcx),%r10d ;*getfield this$0
; - HelloWorld$run$1::invoke@19 (line 22)
0x00007f9ad4f503e6: test %r10d,%r10d
0x00007f9ad4f503e9: je 0x00007f9ad4f507cd ;*invokevirtual getY
; - HelloWorld$run$1::invoke@26 (line 22)
0x00007f9ad4f503ef: mov 0x18(%r12,%r10,8),%r11d ;*getfield y
; - HelloWorld::getY@1 (line 13)
; - HelloWorld$run$1::invoke@26 (line 22)
0x00007f9ad4f503f4: mov %r11d,0x1c(%r12,%r10,8) ;*putfield r1
; - HelloWorld::setR1@2 (line 14)
; - HelloWorld$run$1::invoke@29 (line 22)
And for the second loop:
0x00007f9ad4f47269: mov (%rcx),%r11d ;*getfield this$0
; - HelloWorld$run$2::invoke@11 (line 30)
0x00007f9ad4f4726c: test %r11d,%r11d
0x00007f9ad4f4726f: je 0x00007f9ad4f47629
0x00007f9ad4f47275: movl $0x1,0x18(%r12,%r11,8)
0x00007f9ad4f4727e: lock addl $0x0,(%rsp) ;*putfield y
; - HelloWorld::setY@2 (line 13)
; - HelloWorld$run$2::invoke@15 (line 30)
0x00007f9ad4f47283: mov (%rcx),%r10d ;*getfield this$0
; - HelloWorld$run$2::invoke@19 (line 31)
0x00007f9ad4f47286: test %r10d,%r10d
0x00007f9ad4f47289: je 0x00007f9ad4f4764d ;*invokevirtual getX
; - HelloWorld$run$2::invoke@26 (line 31)
0x00007f9ad4f4728f: mov 0x14(%r12,%r10,8),%r11d ;*getfield x
; - HelloWorld::getX@1 (line 12)
; - HelloWorld$run$2::invoke@26 (line 31)
0x00007f9ad4f47294: mov %r11d,0x20(%r12,%r10,8) ;*putfield r2
; - HelloWorld::setR2@2 (line 15)
; - HelloWorld$run$2::invoke@29 (line 31)
And the main loop:
0x00007fd383813838: lock addl $0x0,(%rsp) ;*putfield x
; - HelloWorld::run@58 (line 38)
0x00007fd38381383d: mov $0x0,%edx
0x00007fd383813842: mov %edx,0x18(%rsi)
0x00007fd383813845: lock addl $0x0,(%rsp) ;*putfield y
; - HelloWorld::run@63 (line 39)
0x00007fd38381384a: mov 0x24(%rsi),%edi
Note that, in contrast to the original and the Common Lisp version, there's one additional memory barrier here that's unnecessary: the main loop emits a lock addl after each of the two volatile resets, where one would already suffice.
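For reference, a dump like the above can be obtained with HotSpot's diagnostic flags; this sketch assumes the hsdis disassembler library is installed for the JVM in use, and the file and jar names are only illustrative:

# compile the Kotlin example into a runnable jar
kotlinc HelloWorld.kt -include-runtime -d helloworld.jar

# print JIT-compiled methods as assembly (requires the hsdis plugin)
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -jar helloworld.jar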
Just a quick note on the JVM ecosystem, since I've been wrestling with getting several different technologies to work together: it's a mess, really.
The specific setup in this case is a mostly Java-based project, sprinkled with some Kotlin code (which I only expect to grow in the future), using Maven as the build system. Added to that are some Lombok annotations (in lieu of using Kotlin in the first place).
Today's (and yesterday's) adventure was trying to get the Error Prone checker integrated into the existing build, which proved quite impossible, because it uses a modified compiler(!) that conflicts with Lombok's annotation processing.
There is a workaround in that Lombok can also be used to produce processed Java source files (delombok) instead of generating byte code; however, that process seems less capable than the IDEA / internal processing and would have required removing a lot of val instances whose types didn't get inferred properly, making it an arduous exercise.
Summing this up, the fact that these tools integrate at different levels of the "stack", while byte-code generation also makes tinkering with them relatively hard, complicates this endeavour greatly. In the end I resolved to drop the Error Prone integration in favour of the much easier to set up SonarQube platform. I also hope that annotation processing for Lombok will improve so that workarounds for "invisible" getters, for example, are no longer needed.