Recent Content

New Job, New ...
posted on 2018-04-15 18:46:10

Seems a good idea to write down some notes about this. I'm switching jobs starting May. I was hesitant about this, mostly because I wanted to figure out the right direction, but the opportunity presented itself, so why not. I'll see how that works out and not travelling for a while is absolutely necessary for me to get ... back on track, I guess.

I hope to get back into a bit of Open Source work as well, with a bit more time that might be possible. Notably that will be related to cl-cffi-gtk, which I hope to get into better shape with some additional collaborators, so there's a good chance the library will (finally) be switched over on Quicklisp too.

At the same time there's things on the bucket list I hope to start this year. It's already a bit late during the year, but better now than not at all. That includes lending a bow and signing up for training.

Lastly I'm not super happy about the lack of reading during the last months, I'm currently still on some Dune-related books and I can't say I enjoy them a lot (big surprise given that people have been slamming them I believe). Music-wise there's fortunately quite a lot of variety going on.

An early Christmas present^Wkeyboard
posted on 2017-12-17 12:25:40

Picture of the half-built keyboard

Picture of the keyboard with soldering iron

Almost at the end of this year I finally had enough components to build a custom keyboard. The case is reused from my existing Poker II, but I'm on the look-out for either a wooden or all-metal case instead of black plastic. Additionally I'm missing one 1.25u and one 1.5u key - with the maximum set of keys you end up needing a few more of those than on even a regular sized keyboard (but I do have an ISO Enter and Shift key too many).

Okay, so from the start. The goal was to have a fully reflashable keyboard. Although Vortex (the makers of the Poker keyboards) have provided some updated firmware, they don't want to share the firmware sources and the one GitHub project I found to reverse engineer it didn't go anywhere in the last few years unfortunately.

Looking through the number of various projects it seemed that a GeekHack GH60 or a similar model would suffice for me. The layout isn't much different from my previous keyboard and it would fit a lot of keys in a good form factor.

I actually went with the (Chinese?) "Satan" variant - comes set up for more LEDs, a feature which I'm not yet using (and might not in the long run at all actually). With that PCB, a black plate for mounting the switches, a stabiliser for the Space key, green Cherry switches and keys I now have something matching my taste quite well. The keys I'd actually like to replace with a nicer option later on, but they're alright for the moment.

All in all thumbs up, would build it again.

Build

Testing the PCB with a bit of wire is a good idea, so check every key ... possibly also the LEDs, but I didn't have any handy for it. As far as I understand modding the switches with SIP sockets can be done even after soldering in the switches, so I didn't do it yet.

Once that's confirmed the rest is also easy: Put in the first switches into the plate. Stabiliser goes onto the PCB, then the plate on top. Check the contacts are visible on the other side of the PCB and nothing was bent. Solder the switches to give a bit of stability. Then fill up the rest of the keyboard. I checked the layout with the keys in the most critical areas too because I wasn't sure it would all align well.

Check the solder joints, movement of the switches (my space key was a bit weird, sometimes getting stuck a bit), plug it in and confirm with debugging mode or some online keyboard tester.

Firmware

The default firmware is fine for testing, but of course the objective is to customise the hell out of it.

The best resource for me was this gist that is a bit cumbersome to set up with the different repositories, but works pretty well.

The wiring isn't completely in sync with the regular GH60, therefore a separate revision needs to be selected in the TMK sources. Also, it seems like at least one key is completely missing even with that and the default layouts are all ANSI! To use the remaining three keys the regular KEYMAP macro needs to be used. Note that also a few keys are on a different position in the macro, therefore I changed it slightly such that it more closely resembles the physical layout.

For debugging either cat from the /dev/hidraw devices (might take a moment to find the right one), or perhaps compile the HID Listen program. You really don't want to disable this until you've sorted out all keymap issues. By default the magic keys are left shift + right shift - h will print the short help, while x, the most interesting one for me, will print the keyboard matrix on every raw keypress.

Oneliners #1
posted on 2017-07-04 22:3011:+01:00

So, let's replicate this blog post in CL. In case it's gone, we want to fetch all pictures from Reddit's /r/pics:

curl -s -H "User-Agent: cli:bash:v0.0.0 (by /u/codesharer)" \
  https://www.reddit.com/r/pics/.json \
  | jq '.data.children[].data.url' \
  | xargs -P 0 -n 1 -I {} bash -c 'curl -s -O {}'

First, we need a HTTP client (drakma being canonical) and a JSON parser (cl-json will suffice here, you might want to use a different one depending on the parsing/serialisation needs).

(asdf:load-systems '#:drakma '#:cl-json)

Next, we define our workflow, fetch JSON content from a URL, get the list of URLs and download each one of them.

We need to download the JSON first, so let's start with that:

(defun scrape-sub (name)
  (drakma:http-request
   (format NIL "https://www.reddit.com/r/~A/.json" name)
   :user-agent "repl:abcl:v0.0.0 (by /u/nyir)"))

If we run, we'll see that the output is just a byte vector:

#(123 34 107 ...)

That's actually fine for the JSON library, but more readably we could either set a flag for Drakma, or convert it manually:

(babel:octets-to-string *)

The second step is parsing it from the JSON format, so extending it will be like the following:

(defun scrape-sub (name)
  (cl-json:decode-json-from-source
   (babel:octets-to-string
    (drakma:http-request
     (format NIL "https://www.reddit.com/r/~A/.json" name)
     :user-agent "repl:abcl:v0.0.0 (by /u/nyir)"))))

The output now looks a bit different:

((:KIND . "Listing") (:DATA (:MODHASH . "") ...))

But it's already getting more manageable. Next we want the URL bit of the data. Unfortunately I don't know of a good library that would allow us to specify something as a kind of XPath-style selector. So we'll go ahead and to it manually. The .data.children bit will be something like (cdr (assoc :children (cdr (assoc :data <json>)))), since cl-json returns an association list; children[] means we'll iterate over all children and collect the results; data.url again is the same kind of accessor like (cdr (assoc :url (cdr (assoc :data <json>)))):

(defun scrape-sub (name)
  (let ((json (cl-json:decode-json-from-source
               (babel:octets-to-string
                (drakma:http-request
                 (format NIL "https://www.reddit.com/r/~A/.json" name)
                 :user-agent "repl:abcl:v0.0.0 (by /u/nyir)")))))
    (mapcar (lambda (child)
              (cdr (assoc :url (cdr (assoc :data child)))))
            (cdr (assoc :children (cdr (assoc :data json)))))))

Now the output is just a list of strings:

("https://www.reddit.com/r/pics/comments/6ewxd6/may_2017_transparency_report/" "https://i.redd.it/luxhqoj95q5z.png" ...)

Here's one a addition I'll put in, filtering for image file types. That might still be unreliable of course, but it'll remove a whole bunch of potentially wrong links already. For filtering, MAPCAR isn't suitable, either we could do it in multiple stages, or we'll use something like MAPCAN, or an explicit iteration construct like LOOP/ITERATE. I'll go with MAPCAN here, meaning every element to collect needs to be wrapped in a list:

(defun scrape-sub (name)
  (let ((json (cl-json:decode-json-from-source
               (babel:octets-to-string
                (drakma:http-request
                 (format NIL "https://www.reddit.com/r/~A/.json" name)
                 :user-agent "repl:abcl:v0.0.0 (by /u/nyir)")))))
    (mapcan (lambda (child)
              (let ((url (cdr (assoc :url (cdr (assoc :data child))))))
                (and url
                     (member (pathname-type (pathname (puri:uri-path (puri:parse-uri url))))
                             '("jpg" "png")
                             :test #'string-equal)
                     (list url))))
            (cdr (assoc :children (cdr (assoc :data json)))))))

I'm happy with that and it now filters for two image types.

Last point, actually downloading all scraped results. For this, we just iterate and download them as before:

(defun scrape-sub (name)
  (let* ((agent "repl:abcl:v0.0.0 (by /u/nyir)")
         (json (cl-json:decode-json-from-source
                (babel:octets-to-string
                 (drakma:http-request
                  (format NIL "https://www.reddit.com/r/~A/.json" name)
                  :user-agent agent))))
         (downloads
           (mapcan (lambda (child)
                     (let ((url (cdr (assoc :url (cdr (assoc :data child))))))
                       (when url
                         (let ((pathname (pathname (puri:uri-path (puri:parse-uri url)))))
                           (when (member (pathname-type pathname)
                                         '("jpg" "png")
                                         :test #'string-equal)
                             `((,url ,pathname)))))))
                   (cdr (assoc :children (cdr (assoc :data json)))))))
    (mapc (lambda (download)
            (destructuring-bind (url pathname) download
              (with-open-file (stream (merge-pathnames *default-pathname-defaults* pathname)
                                      :direction :output
                                      :element-type '(unsigned-byte 8))
                (write-sequence
                 (drakma:http-request url :user-agent agent)
                 stream))))
          downloads)))

And this works.

Now. This is decidedly not a demonstration of how the final result should look like. In fact there a whole lof of things to improve and to consider when you'd put this into a reusable script.

From a maintainability perspective, we'd put each functional part into it's own component, be it a function or method, in order to make them easier to reason about and to test each bit individually.

From a performance part ... oh my, there's so much wrong with it, mostly slurping everything into memory multiple times, while drakma does support streams as results and HTTP Keep-Alive, both would improve things. The JSON parser could in theory also operate on tokens, but that's rarely worth the hassle (the CXML API can be used for that, by converting JSON "events" into a stream of SAX events basically). Lastly creating the output lists isn't necessary, this could all be done inline or with continuation passing style, but that's worse for maintaining a nice split between functions.

From a correctness part, all the URLS might have weird characters in them that don't work well with pathnames and/or the local filesystem. In fact PURI might not be the best choice here either. Also, even if the URLs are different, more than one of them might have the same filename, meaning there should either be some error handling in there, or the URLs should be hashed to be used as filename or some other scheme accomplishing the same thing. Lastly, the download files should be checked for emptiness, wrong content (HTML bits would indicate a failed download too), broken images, etc.

Another nice thing to add would be xattr support for indicating where the file was downloaded from.

Lisp performance considerations
posted on 2017-06-24 22:51:01
  • what are lisp performance problem patterns
  • especially ffi
  • how to debug and profile that
  • how to fix issues
  • how to architecture against it
Unified Communication
posted on 2017-06-18 16:04:25

Lately I've been thinking again about some way to unify (my) usage of different communication channels. Part of that is the additional distraction and lack of ease of use for some of the applications I'm forced to use.

This is partially a feature of my habits, i.e. I'm not a mobile phone user. At all. But "apps" like WhatsApp, WeChat, while thankfully having web clients, still force me to use a comparatively clunky interface.

Then we have Slack, the IRC clone that will eat every other IRC clone there is. Or maybe not. In my case it's the primary business-related communication form, after email.

Good that I'm not using Facebook, the moloch, but I imagine for a lot of people it's the primary way to use the web and internet. I haven't researched Facebook integration at all though, so there might be ways of integrating it more or less easily.

The previously mentioned channels were all active ones, but there's also a lot of passive consumption going on. (RSS and blogs, forums,) reddit, Hacker News are all channels that I frequently use. In case of reddit and Hacker News of course there's is the active element of posting, commenting, voting, but I rarely do that, so they fall under passive for me too.

So again, why unification? For all the above, getting notified (if new content is available) is a pain, comparatively. Both in the sense that for some of them (chat) the threshold is quite low, so reacting in near real-time is important, while for others it's absolutely the other way round, even though I'm still habitually checking them in case I'm bored (see how that goes?).

Unifying them would allow to aggregate, combine, filter in a general fashion instead of having (or not having in most cases) a distinct way to do that for each channel.

So again, would that solve the problem? I'm doubtful. On the one hand, there's clearly a need to remove friction. On the other hand, the cost of implementing it, the lack of distinctive features for each channel (visual mostly) would also undermine some of information. Possibly only at the start, it's hard to tell. I can however say that using RSS readers for me never worked out, precisely because the visual appearance is such a strong discriminator to (not) consume content. Though rtv, a console-based reddit client, worked rather well for some highly text-based content.

What other considerations are there? Well, the split between different contexts would be one thing. There's at least the work/life split for me, if not for many others.

Fidelity, as in, most text-based content can be viewed on the console, even if it might look better in a different renderer (browser). Showing pictures/clips is difficult on the console, but there are ugly hacks that can work around that problem if absolutely necessary (I'm personally not a fan).

Amount, blogs are rather infrequent, but have lots of text per each post, chat is very high frequent comparatively, but only has a few words per "post".

Context, again, there's also different groups in the personal context, that is, e.g. family, friends, different hobbies and interests, with each group having a somewhat overlapping set of sources.

So again, what can be solved? Technically, at least getting more sources into a single format is achievable. There are bridges from Slack to IRC, from RSS to IRC, etc. I'm choosing IRC here because it's a form of lowest common denominator, but similarly it could be mapped to email too. While IRC isn't good for long-form content, it can contain links which can then be viewed in other renderers, solving the notification issue. (Well, you still need to pay attention to the IRC client. In my case I'm always online on a VPS, so I need still to pass through notifications from the IRC client to the current local machine.)

What options would a unified architecture give us? E.g. having a single feed for chat, email, blog posts etc. for a group of people (channels). This can again be achieved manually, by tying in bots to post on behalf of a user, though in the architecture of IRC it wouldn't make sense to post some of these things publically - it's "your" view of the conversation, not the public view. That is, you'd want to view a feed with incoming emails, blog posts (Twitter, what have you) from a person inline.

Now, inertia. Given how XMPP basically failed and how each platform provider is aggressively trying to get people into their walled garden, what chance is there for a standard here?

Apart from that, can this idea be implemented purely client-side? AFAIK yes, there's still friction with the different technologies being integrated, but as a central communication hub this would still make sense.

Building on top I have some further (obvious) extensions in mind, the usual spam filters, deduplication, aggregation/search, also everything statistics basically, that can be applied on top.

Different interfaces would be available to have a view on the streams, or groups of streams. Traditionally this all hasn't worked out I feel, with the exception of very, very narrow things like email and text-based chat there's just a lot of variation going on.

How would this look like? For me, one console window to show everything, with desktop notifications on top. For others, the same in a browser perhaps, or (take a deep breath) a native application instead.

In any case, food for thought. I'm hoping to follow up on this with a more focused design at some point.

Hacking Java fields in ABCL
posted on 2017-06-06 23:41:05

Just as a quick note, JSS and the JAVA package too won't allow you to treat LispObjects objects as JavaObjects for the purposes of JFIELD and JSS:GET-JAVA-FIELD. But if you still want to access internal fields and implementation details (the usual warnings apply!), try the following:

(jss:get-java-field (jss:new 'JavaObject (car (threads:mapcar-threads #'identity))) "javaThread" T)

This was because I was looking for an answer to a question on #abcl, but even then, wrapping the "Lisp" object in a JavaObject manually helps achieve the requested goal of retrieving the Java Thread object.

The T at the end is necessary here because the field is actually package-private - but that's to be expected if you want to access internals without an (official) API.

Byte code verification and ABCL
posted on 2017-06-01 19:51:41

A feature of JVM byte code that I knew about, but didn't concern me too much when I previously hacked on ABCL was interesting to see in a particular bug report, where a function, when compiled, got rejected by the byte code validator:

(compile NIL (lambda (list) (nth (lambda ()) list)))
;; => Compiled function can't be loaded

This is true for any class loader, so COMPILE-FILE with LOAD would show the same behaviour.

The reason here is inlining: In order to give better performance when some information about the types is known, the call to NTH will actually be optimised here. One option to see this is to set a debug option to dump the generated byte code while the compilation runs:

(setf jvm::*compiler-debug* T)

(Part) of the output will look like this:

  0 GETSTATIC      #24 <Field LispObject 42c1_92be_e5b48c1894c6.LFUN1812724>  
  1 ALOAD          (1)  
  2 SWAP             
  3 INVOKEVIRTUAL  #30 <Method LispObject LispObject.NTH(int)> -1 
  4 ARETURN          

Using DISASSEMBLE is not an option precisely because the byte code can't be loaded at all (that's actually a nice idea as another function to dump not only the compiled FASL content, but also disassemble the contained byte code too).

With that option set (and perhaps setting a few BREAK statements in the compiler), it's somewhat easy to debug the compiled byte code and to see that the validator notices that the Java method that implements the NTH function requires an integer (fixnum) argument, but the LispObject (the lambda) doesn't match that signature. This is not a problem if this was a regular call. In fact, with inlining disabled the NTH function will still raise an error for an argument with the wrong type!

Finally the fix is to check for the derived type in the "P2" transformation function for NTH, COMPILE-NTH:

(define-inlined-function compile-nth (form target representation)
  ((check-arg-count form 2))
  (let* ((index-form (second form))
         (list-form (third form))

    ;;; new check here
         (index-type (derive-compiler-type index-form)))
    (unless (fixnum-type-p index-type)
      (compile-function-call form target representation)
      (return-from compile-nth))
    ;;; till here

    (with-operand-accumulation
        ((compile-operand index-form :int)
         (compile-operand list-form nil)
         (maybe-emit-clear-values index-form list-form))
      (emit 'swap)
      (emit-invokevirtual +lisp-object+ "NTH" '(:int) +lisp-object+))
    (fix-boxing representation nil) ; FIXME use derived result type
    (emit-move-from-stack target representation)))

Note the falling back to a "general" function call using COMPILE-FUNCTION-CALL here in case the type is not known in advance, or not a fixnum type (though that could also raise a warning here already).

Again, compiling the above function again looks a bit different in the general case:

  0 GETSTATIC      #29 <Field Symbol 4faf_99d9_62a0381d4d65.SYM1814566>  
  1 GETSTATIC      #33 <Field LispObject 4faf_99d9_62a0381d4d65.LFUN1814565>  
  2 ALOAD          (1)  
  3 INVOKEVIRTUAL  #39 <Method LispObject LispObject.execute(LispObject,LispObject)> -2 
  4 ARETURN          

If instead a fixnum constant is used, (nth 1 list), this simplifies a lot:

  0 ICONST_1         
  1 ALOAD          (1)  
  2 SWAP             
  3 INVOKEVIRTUAL  #24 <Method LispObject LispObject.NTH(int)> -1 
  4 ARETURN          

Compare that to adding a (declare (type fixnum ...)) declaration - not as good as a constant argument, but still directly calling the Java method:

  0 ALOAD          (1)  
  1 INVOKEVIRTUAL  #24 <Method int LispObject.intValue()> 0 
  2 ISTORE         (1)  
  3 ILOAD          (1)  
  4 ALOAD          (2)  
  5 SWAP             
  6 INVOKEVIRTUAL  #28 <Method LispObject LispObject.NTH(int)> -1 
  7 ARETURN          

Note here that the type error (e.g. for again supplying the (lambda ())) will be raised at the caller site already!

Lastly, a good idea would also be to generally add more hints, as e.g. SBCL does, to debug other issues ("failed to inline because ..."), but that's for another day.

Memory reordering in Common Lisp and Kotlin
posted on 2016-11-23 21:43:41

I've been reading about multi-threaded programming out of curiosity over lock-free algorithms.

The first thing I've implemented for that was actually something completely different, namely a short example of memory reordering ported over from a C++ program.

In Common Lisp this relies on SBCL because of the availability of memory barriers and the Linux-specific API to set the thread affinity to a single CPU core (sb-cpu-affinity). Of course, given the appropriate compability libraries this could easily be done on other implementations too.

(in-package #:cl-user)

(defun main (&key single-cpu-p barrierp)
  (let ((begin-semaphore-1 (sb-thread:make-semaphore))
        (begin-semaphore-2 (sb-thread:make-semaphore))
        (end-semaphore (sb-thread:make-semaphore))
        (iterations 0)
        (detected 0)
        (x 0)
        (y 0)
        (r1 0)
        (r2 0)
        done)
    (declare (fixnum x y r1 r2))
    (declare (boolean done))
    (declare (optimize (speed 3) (safety 0) (space 0) (debug 0) (compilation-speed 0)))
    (labels ((single-cpu ()
               (when single-cpu-p
                 sb-cpu-affinity::
                 (with-cpu-affinity-mask (mask :save T)
                   (clear-cpu-affinity-mask mask)
                   (setf (cpu-affinity-p 0 mask) T))))
             (thread-1 ()
               (single-cpu)
               (loop
                 (when done
                   (return))
                 (sb-thread:wait-on-semaphore begin-semaphore-1)
                 (setf x 1)
                 (when barrierp
                   (sb-thread:barrier (:memory)))
                 (setf r1 y)
                 (sb-thread:signal-semaphore end-semaphore)))
             (thread-2 ()
               (single-cpu)
               (loop
                 (when done
                   (return))
                 (sb-thread:wait-on-semaphore begin-semaphore-2)
                 (setf y 1)
                 (when barrierp
                   (sb-thread:barrier (:memory)))
                 (setf r2 x)
                 (sb-thread:signal-semaphore end-semaphore))))
      (let ((thread-1 (sb-thread:make-thread #'thread-1))
            (thread-2 (sb-thread:make-thread #'thread-2)))
        (unwind-protect
             (loop
               (setf x 0 y 0)
               (sb-thread:signal-semaphore begin-semaphore-1)
               (sb-thread:signal-semaphore begin-semaphore-2)
               (sb-thread:wait-on-semaphore end-semaphore)
               (sb-thread:wait-on-semaphore end-semaphore)
               (when (and (eql r1 0) (eql r2 0))
                 (incf detected)
                 (format T "~D reorders detected after ~D iterations, every ~D on average~%" detected iterations (floor iterations detected)))
               (incf iterations))
          (setf done T)
          (sb-thread:signal-semaphore begin-semaphore-1)
          (sb-thread:signal-semaphore begin-semaphore-2)
          (sb-thread:join-thread thread-1)
          (sb-thread:join-thread thread-2))))))

On the JVM, respectively Kotlin, the code looks again remarkably similar. I haven't yet looked at thread affinity, but what's interesting here are two aspects related to the JVM memory model. Using volatile is, in contrast to C, viable, as it generates the necessary memory barriers.

import java.util.concurrent.Semaphore
import kotlin.concurrent.thread

class HelloWorld {
    var beginSempahore1 = Semaphore(0)
    var beginSempahore2 = Semaphore(0)
    var endSemaphore = Semaphore(0)

    var detected = 0
    var iterations = 0

    @Volatile var x = 0
    @Volatile var y = 0
    var r1 = 0
    var r2 = 0

    fun run() {
        thread {
            while (true) {
                beginSempahore1.acquire()
                x = 1
                r1 = y
                endSemaphore.release()
            }
        }

        thread {
            while (true) {
                beginSempahore2.acquire()
                y = 1
                r2 = x
                endSemaphore.release()
            }
        }

        while (true) {
            iterations += 1
            x = 0
            y = 0
            beginSempahore1.release()
            beginSempahore2.release()
            endSemaphore.acquire()
            endSemaphore.acquire()
            if (r1 == 0 && r2 == 0) {
                detected++
                println("$detected reorders detected after $iterations iterations")
            }
        }
    }
}

fun main(args: Array<String>) {
    HelloWorld().run()
}

Also, when using a JVM plugin, the generated assembly code can be dumped during compilation, which allows us to confirm whether those instructions actually have been generated; for the first loop:

  0x00007f9ad4f503c9: mov    (%rcx),%r11d       ;*getfield this$0
                                                ; - HelloWorld$run$1::invoke@11 (line 21)

  0x00007f9ad4f503cc: test   %r11d,%r11d
  0x00007f9ad4f503cf: je     0x00007f9ad4f507a9
  0x00007f9ad4f503d5: movl   $0x1,0x14(%r12,%r11,8)
  0x00007f9ad4f503de: lock addl $0x0,(%rsp)     ;*putfield x
                                                ; - HelloWorld::setX@2 (line 12)
                                                ; - HelloWorld$run$1::invoke@15 (line 21)

  0x00007f9ad4f503e3: mov    (%rcx),%r10d       ;*getfield this$0
                                                ; - HelloWorld$run$1::invoke@19 (line 22)

  0x00007f9ad4f503e6: test   %r10d,%r10d
  0x00007f9ad4f503e9: je     0x00007f9ad4f507cd  ;*invokevirtual getY
                                                ; - HelloWorld$run$1::invoke@26 (line 22)

  0x00007f9ad4f503ef: mov    0x18(%r12,%r10,8),%r11d  ;*getfield y
                                                ; - HelloWorld::getY@1 (line 13)
                                                ; - HelloWorld$run$1::invoke@26 (line 22)

  0x00007f9ad4f503f4: mov    %r11d,0x1c(%r12,%r10,8)  ;*putfield r1
                                                ; - HelloWorld::setR1@2 (line 14)
                                                ; - HelloWorld$run$1::invoke@29 (line 22)

And for the second loop:

  0x00007f9ad4f47269: mov    (%rcx),%r11d       ;*getfield this$0
                                                ; - HelloWorld$run$2::invoke@11 (line 30)

  0x00007f9ad4f4726c: test   %r11d,%r11d
  0x00007f9ad4f4726f: je     0x00007f9ad4f47629
  0x00007f9ad4f47275: movl   $0x1,0x18(%r12,%r11,8)
  0x00007f9ad4f4727e: lock addl $0x0,(%rsp)     ;*putfield y
                                                ; - HelloWorld::setY@2 (line 13)
                                                ; - HelloWorld$run$2::invoke@15 (line 30)

  0x00007f9ad4f47283: mov    (%rcx),%r10d       ;*getfield this$0
                                                ; - HelloWorld$run$2::invoke@19 (line 31)

  0x00007f9ad4f47286: test   %r10d,%r10d
  0x00007f9ad4f47289: je     0x00007f9ad4f4764d  ;*invokevirtual getX
                                                ; - HelloWorld$run$2::invoke@26 (line 31)

  0x00007f9ad4f4728f: mov    0x14(%r12,%r10,8),%r11d  ;*getfield x
                                                ; - HelloWorld::getX@1 (line 12)
                                                ; - HelloWorld$run$2::invoke@26 (line 31)

  0x00007f9ad4f47294: mov    %r11d,0x20(%r12,%r10,8)  ;*putfield r2
                                                ; - HelloWorld::setR2@2 (line 15)
                                                ; - HelloWorld$run$2::invoke@29 (line 31)

And the main loop:

  0x00007fd383813838: lock addl $0x0,(%rsp)     ;*putfield x
                                                ; - HelloWorld::run@58 (line 38)

  0x00007fd38381383d: mov    $0x0,%edx
  0x00007fd383813842: mov    %edx,0x18(%rsi)
  0x00007fd383813845: lock addl $0x0,(%rsp)     ;*putfield y
                                                ; - HelloWorld::run@63 (line 39)

  0x00007fd38381384a: mov    0x24(%rsi),%edi

Note that in contrast to the original and the Common Lisp version there's one additional memory barrier here that's unnecessary.

EverythingDB
posted on 2016-11-23 21:40:04

How do you track everything you know? Computers and mobile phones in turn have become external memory, however data is still fragmented, stored in different formats and usable in different programs.

While there are partial solutions to this issue, none seem to be all-encompassing enough.

For example, PIMs come close for communication and contacts, but it's hard to extend and customise them. You should be able to associate custom tags and key-value pairs with "documents"/entities. This goes straight into ontologies since the question how to model many kinds of relations will be a problem to deal with.

Lastly tracking all data only makes sense if there are reasonable search facilities and if exchange with other databases can be done with security in mind, especially if even metadata is sensitive.

Asynchronous file I/O
posted on 2016-11-11 01:19:28

Occasionally the topic of asynchronous I/O on local files comes up, though while there are APIs for event-based processing they don't work on regular files, resulting in the use of worker threads to circumvent this restriction.

The inciting incident for me to look into this was the fact that one of my (external, spinning-disk) hard drives has a rather short timeout for spin-down, such that when pausing while browsing through a directory tree would often annoy me when the shell (or any other program) was completely blocked as the motor was being turned on again.

At least on Linux (since version 2.5) there is actually a kernel syscall interface for asynchronous file I/O. Unfortunately it is not being exposed via the libc at all, requiring custom wrappers in order to trigger the proper syscalls. Apart from scheduling asynchronous read and write requests it also supports exposing the corresponding events via an evenfd queue, which then allows us to use epoll and friends.

SBCL being the ultimate breadboard that's not particularly hard though. Using CFFI and IOLib for convenience it's straightforward to port the C examples while writing a minimal amount of C code. The code is of course not very high-level, but can be plugged straight into the IOLib event loop as there's now a file descriptor available to listen on.

Grovelling & wrapping

The groveller can be used quite nicely to prevent us from having to drop down to C completely. Of course we're also using ASDF, so everything's peachy.

Now what's required? CFFI provides additional components for ASDF, namely :CFFI-GROVEL-FILE and :CFFI-WRAPPER-FILE, which make the process seamless and don't require us to write and code related to compiling and loading the C wrapper:

;; -*- mode: lisp; syntax: common-lisp; coding: utf-8-unix; package: cl-user; -*-

(in-package #:cl-user)

(asdf:defsystem #:example
  :defsystem-depends-on (#:cffi-grovel)
  #+asdf-unicode :encoding #+asdf-unicode :utf-8
  :depends-on (#:iterate
               #:iolib
               #:cffi
               #:osicat)
  :serial T
  :components ((:module "src"
                :components
                ((:file "package")
                 (:file "wrapper")
                 (:cffi-grovel-file "linux-aio-grovel")
                 (:cffi-wrapper-file "linux-aio-wrapper")
                 (:file "linux-aio")))))

The package definition is probably not very interesting at this point:

(in-package #:cl-user)

(defpackage #:example
  (:use #:cl #:iterate #:iolib #:cffi))

I've added IOLib and CFFI, usually also ITERATE for convenience.

Next we grovel a couple of definitions related to the kernel API for the asynchronous requests and for eventfd. This is the linux-aio-grovel file mentioned above:

(in-package #:example)

(include "stdio.h" "unistd.h" "sys/syscall.h" "linux/aio_abi.h" "inttypes.h"
         "signal.h" "sys/eventfd.h")

(ctype aio-context-t "aio_context_t")

(cenum iocb-cmd-t
  ((:pread "IOCB_CMD_PREAD"))
  ((:pwrite "IOCB_CMD_PWRITE"))
  ((:fsync "IOCB_CMD_FSYNC"))
  ((:fdsync "IOCB_CMD_FDSYNC"))
  ((:noop "IOCB_CMD_NOOP"))
  ((:preadv "IOCB_CMD_PREADV"))
  ((:pwritev "IOCB_CMD_PWRITEV")))

(constantenum iocb-flags-t
  ((:resfd "IOCB_FLAG_RESFD")))

(cstruct iocb "struct iocb"
  (aio-data "aio_data" :type :uint64)
  ;; #-little-endian
  ;; (aio-reserved1 "aio_reserved1" :type :uint32)
  (aio-key "aio_key" :type :uint32)
  ;; #+little-endian
  ;; (aio-reserved1 "aio_reserved1" :type :uint32)
  (aio-lio-opcode "aio_lio_opcode" :type iocb-cmd-t)
  (aio-fildes "aio_fildes" :type :uint32)
  (aio-buf "aio_buf" :type :uint64)
  (aio-nbytes "aio_nbytes" :type :uint64)
  (aio-offset "aio_offset" :type :int64)
  ;; (aio-reserved2 "aio_reserved2" :type :uint64)
  (aio-flags "aio_flags" :type :uint32)
  (aio-resfd "aio_resfd" :type :uint32))

(cstruct io-event "struct io_event"
  (data "data" :type :uint64)
  (obj "obj" :type :uint64)
  (res "res" :type :int64)
  (res2 "res" :type :int64))

(cenum eventfd-flags-t
  ((:cloexec "EFD_CLOEXEC"))
  ((:nonblock "EFD_NONBLOCK"))
  ((:semaphore "EFD_SEMAPHORE")))

Note that this not a complete list and a couple of reserved members are commented out as they're primarily used to provide space for further expansion. Fortunately offsets for the rest of the struct aren't affected by leaving out parts in the Lisp-side definition.

The enums are easy enough, even though they both represent flags, so should be or-ed together, which might be necessary to do manually or by finding a way to let CFFI do the coercion from a list of flags perhaps.

In order to have nice syscall wrappers we'd normally use defsyscall from IOLib. Unfortunately we also want to use defwrapper from CFFI-GROVEL. This is an example of bad composability of macros, requiring copy and paste of source code. Of course with enough refactoring or an optional parameter this could be circumvented. This is the wrapper file from the ASDF definition.

;; groan

cffi-grovel::
(define-wrapper-syntax defwrapper/syscall* (name-and-options rettype args &rest c-lines)
  ;; output C code
  (multiple-value-bind (lisp-name foreign-name options)
      (cffi::parse-name-and-options name-and-options)
    (let ((foreign-name-wrap (strcat foreign-name "_cffi_wrap"))
          (fargs (mapcar (lambda (arg)
                           (list (c-type-name (second arg))
                                 (cffi::foreign-name (first arg) nil)))
                         args)))
      (format out "~A ~A" (c-type-name rettype)
              foreign-name-wrap)
      (format out "(~{~{~A ~A~}~^, ~})~%" fargs)
      (format out "{~%~{  ~A~%~}}~%~%" c-lines)
      ;; matching bindings
      (push `(iolib/syscalls:defsyscall (,foreign-name-wrap ,lisp-name ,@options)
                 ,(cffi-type rettype)
               ,@(mapcar (lambda (arg)
                           (list (symbol* (first arg))
                                 (cffi-type (second arg))))
                         args))
            *lisp-forms*))))

The only change from DEFWRAPPER is the use of IOLIB/SYSCALLS:DEFSYSCALL instead of DEFCFUN, which then performs additional checks with respect to the return value, raising a IOLIB/SYSCALLS:SYSCALL-ERROR that we can then catch rather than having to check the return value ourselves.

Lastly, the actual wrappers. Note that some inline C is used to define the function bodies. This is linux-aio-wrapper from the ASDF definition:

(define "_GNU_SOURCE")

(include "stdio.h" "unistd.h" "sys/syscall.h" "linux/aio_abi.h" "inttypes.h"
         "signal.h")

(defwrapper/syscall* "io_setup" :int
  ((nr :unsigned-int)
   (ctxp ("aio_context_t*" (:pointer aio-context-t))))
  "return syscall(__NR_io_setup, nr, ctxp);")

(defwrapper/syscall* "io_destroy" :int
  ((ctx aio-context-t))
  "return syscall(__NR_io_destroy, ctx);")

(defwrapper/syscall* "io_submit" :int
  ((ctx aio-context-t)
   (nr :long)
   (iocbpp ("struct iocb**" (:pointer (:pointer (:struct iocb))))))
  "return syscall(__NR_io_submit, ctx, nr, iocbpp);")

(defwrapper/syscall* "io_cancel" :int
  ((ctx aio-context-t)
   (iocbp ("struct iocb*" (:pointer (:struct iocb))))
   (result ("struct io_event*" (:pointer (:struct io-event)))))
  "return syscall(__NR_io_cancel, ctx, iocbp, result);")

(defwrapper/syscall* "io_getevents" :int
  ((ctx aio-context-t)
   (min-nr :long)
   (max-nr :long)
   (events ("struct io_event*" (:pointer (:struct io-event))))
   (timeout ("struct timespec*" (:pointer (:struct iolib/syscalls::timespec)))))
  "return syscall(__NR_io_getevents, ctx, min_nr, max_nr, events, timeout);")

Looping

Now that we have all definitions in place, let's translate a moderately complex example of reading from an existing file.

Also EVENTFD is defined here as the C function is already defined in the libc and doesn't have to generated.

(iolib/syscalls:defsyscall eventfd :int
  (initval :unsigned-int)
  (flags eventfd-flags-t))

(defun linux-aio-test (pathname &key (chunk-size 4096))
  (with-foreign-object (context 'aio-context-t)
    (iolib/syscalls:memset context 0 (foreign-type-size 'aio-context-t))
    (let ((eventfd (eventfd 0 :nonblock)))
      (unwind-protect
           (with-open-file (stream pathname :element-type '(unsigned-byte 8))
             (let* ((length (file-length stream))
                    (chunks (ceiling length chunk-size)))
               (with-foreign-object (buffer :uint8 length)
                 (with-event-base (event-base)
                   (io-setup 1 context) ; set up with number of possible operations
                   (with-foreign-object (iocbs '(:struct iocb) chunks)
                     (iolib/syscalls:memset iocbs 0 (* (foreign-type-size '(:struct iocb)) chunks))
                     ;; set up array of operations
                     (dotimes (i chunks)
                       (let ((iocb (mem-aptr iocbs '(:struct iocb) i)))
                         (setf (foreign-slot-value iocb '(:struct iocb) 'aio-lio-opcode) :pread)
                         (setf (foreign-slot-value iocb '(:struct iocb) 'aio-buf) (pointer-address (mem-aptr buffer :uint8 (* i chunk-size))))
                         (setf (foreign-slot-value iocb '(:struct iocb) 'aio-nbytes) (if (eql i (1- chunks)) (- length (* i chunk-size)) chunk-size))
                         (setf (foreign-slot-value iocb '(:struct iocb) 'aio-fildes) (sb-sys:fd-stream-fd stream))
                         (setf (foreign-slot-value iocb '(:struct iocb) 'aio-offset) (* i chunk-size))
                         (setf (foreign-slot-value iocb '(:struct iocb) 'aio-flags) (foreign-enum-value 'iocb-flags-t :resfd))
                         (setf (foreign-slot-value iocb '(:struct iocb) 'aio-resfd) eventfd)))
                     ;; set up array of pointers to operations
                     (with-foreign-object (iocbp '(:pointer (:struct iocb)) chunks)
                       (dotimes (i chunks)
                         (setf (mem-aref iocbp '(:pointer (:struct iocb)) i) (mem-aptr iocbs '(:struct iocb) i)))
                       ;; submit as many operations as possible
                       (let ((submitted (io-submit (mem-ref context 'aio-context-t) chunks iocbp)))
                         ;; keep track of how many operations completed total
                         (let ((total-events-read 0))
                           (flet ((get-events () ; named to be able to RETURN-FROM
                                    (with-foreign-object (events '(:struct io-event) 3)
                                      (loop
                                        (handler-case
                                            (with-foreign-object (available-buffer :uint64)
                                              (iolib/syscalls:read eventfd available-buffer 8)
                                              (let ((available (mem-ref available-buffer :uint64)))
                                                (dotimes (i available)
                                                  (let ((events-read (io-getevents (mem-ref context 'aio-context-t) 0 3 events (null-pointer))))
                                                    (when (eql events-read 0)
                                                      (return))
                                                    (incf total-events-read events-read))
                                                  (when (eql total-events-read chunks)
                                                    (return-from linux-aio-test)))))
                                          ;; in case reading would block
                                          (iolib/syscalls:syscall-error ()
                                            (when (< submitted chunks)
                                              (let ((more-submitted (io-submit (mem-ref context 'aio-context-t) chunks (mem-aptr iocbp '(:pointer (:struct iocb)) submitted))))
                                                (incf submitted more-submitted)))
                                            (return-from get-events)))))))
                             (set-io-handler
                              event-base eventfd :read
                              (lambda (fd event exception)
                                (declare (ignore fd event exception))
                                (get-events)))
                             (event-dispatch event-base))))))))))
        (io-destroy (mem-ref context 'aio-context-t))
        (iolib/syscalls:close eventfd)))))

Relatively straightforward. Complexity comes from accurately submitting chunks, reading the number of available events on demand and submitting a new batch of chunks as long as there are some remaining ones.

Insert FORMAT statement as you like. Tuning the values would need to be considered in order to keep memory consumption in check. Finally

Outlook

We still can't do many local file operations asynchronously. The whole reason to jump through these hoops is of course to integrate potentially blocking operations into an event loop, so some care still needs to be taken to do some work ahead of time or in separate threads as to not block the main part of the program from issuing the I/O requests.

Previous

This blog covers emacs, git, hardware, java, kotlin, lisp, postgresql, tachikoma, unix, work

View content from 2018-04, 2017-12, 2017-07, 2017-06, 2016-11, 2016-10, 2016-09, 2016-08, 2015-12, 2015-11, 2015-10, 2015-08, 2015-06, 2015-04, 2015-02, 2015-01, 2014-12, 2014-11, 2014-08


Unless otherwise credited all material Creative Commons License by Olof-Joachim Frahm