Recent Content
Occasionally the topic of asynchronous I/O on local files comes up. While there are APIs for event-based processing, they don't work on regular files, so worker threads are commonly used to work around this restriction.
The inciting incident for me to look into this was one of my (external, spinning-disk) hard drives, which has a rather short spin-down timeout: pausing while browsing through a directory tree would often annoy me because the shell (or any other program) was completely blocked while the motor spun up again.
At least on Linux (since version 2.5) there is actually a kernel syscall interface for asynchronous file I/O. Unfortunately it is not exposed via the libc at all, requiring custom wrappers in order to trigger the proper syscalls. Apart from scheduling asynchronous read and write requests it also supports exposing the corresponding events via an eventfd queue, which then allows us to use epoll and friends.
SBCL being the ultimate breadboard, that's not particularly hard though. Using CFFI and IOLib for convenience, it's straightforward to port the C examples while writing a minimal amount of C code. The code is of course not very high-level, but it can be plugged straight into the IOLib event loop, as there's now a file descriptor available to listen on.
Grovelling & wrapping
The groveller can be used quite nicely to prevent us from having to drop down to C completely. Of course we're also using ASDF, so everything's peachy.
Now what's required? CFFI provides additional components for ASDF, namely :CFFI-GROVEL-FILE and :CFFI-WRAPPER-FILE, which make the process seamless and don't require us to write any code related to compiling and loading the C wrapper:
;; -*- mode: lisp; syntax: common-lisp; coding: utf-8-unix; package: cl-user; -*-
(in-package #:cl-user)
(asdf:defsystem #:example
:defsystem-depends-on (#:cffi-grovel)
#+asdf-unicode :encoding #+asdf-unicode :utf-8
:depends-on (#:iterate
#:iolib
#:cffi
#:osicat)
:serial T
:components ((:module "src"
:components
((:file "package")
(:file "wrapper")
(:cffi-grovel-file "linux-aio-grovel")
(:cffi-wrapper-file "linux-aio-wrapper")
(:file "linux-aio")))))
The package definition is probably not very interesting at this point:
(in-package #:cl-user)
(defpackage #:example
(:use #:cl #:iterate #:iolib #:cffi))
I've added IOLib and CFFI, usually also ITERATE for convenience.
Next we grovel a couple of definitions related to the kernel API for the asynchronous requests and for eventfd. This is the linux-aio-grovel file mentioned above:
(in-package #:example)
(include "stdio.h" "unistd.h" "sys/syscall.h" "linux/aio_abi.h" "inttypes.h"
"signal.h" "sys/eventfd.h")
(ctype aio-context-t "aio_context_t")
(cenum iocb-cmd-t
((:pread "IOCB_CMD_PREAD"))
((:pwrite "IOCB_CMD_PWRITE"))
((:fsync "IOCB_CMD_FSYNC"))
((:fdsync "IOCB_CMD_FDSYNC"))
((:noop "IOCB_CMD_NOOP"))
((:preadv "IOCB_CMD_PREADV"))
((:pwritev "IOCB_CMD_PWRITEV")))
(constantenum iocb-flags-t
((:resfd "IOCB_FLAG_RESFD")))
(cstruct iocb "struct iocb"
(aio-data "aio_data" :type :uint64)
;; #-little-endian
;; (aio-reserved1 "aio_reserved1" :type :uint32)
(aio-key "aio_key" :type :uint32)
;; #+little-endian
;; (aio-reserved1 "aio_reserved1" :type :uint32)
(aio-lio-opcode "aio_lio_opcode" :type iocb-cmd-t)
(aio-fildes "aio_fildes" :type :uint32)
(aio-buf "aio_buf" :type :uint64)
(aio-nbytes "aio_nbytes" :type :uint64)
(aio-offset "aio_offset" :type :int64)
;; (aio-reserved2 "aio_reserved2" :type :uint64)
(aio-flags "aio_flags" :type :uint32)
(aio-resfd "aio_resfd" :type :uint32))
(cstruct io-event "struct io_event"
(data "data" :type :uint64)
(obj "obj" :type :uint64)
(res "res" :type :int64)
(res2 "res" :type :int64))
(cenum eventfd-flags-t
((:cloexec "EFD_CLOEXEC"))
((:nonblock "EFD_NONBLOCK"))
((:semaphore "EFD_SEMAPHORE")))
Note that this is not a complete list and a couple of reserved members are commented out, as they're primarily there to provide space for future expansion. Fortunately the offsets for the rest of the struct aren't affected by leaving parts out of the Lisp-side definition.
The enums are easy enough. Note though that both actually represent flags, so the values should be OR-ed together; this either has to be done manually, or by finding a way to let CFFI coerce a list of flags for us.
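Manually it could look like this, a minimal sketch on top of CFFI's FOREIGN-ENUM-VALUE (the helper itself is made up):
(defun enum-flags (type &rest flags)
  ;; look up each keyword's numeric value and OR them together
  (reduce #'logior flags
          :key (lambda (flag) (foreign-enum-value type flag))
          :initial-value 0))
;; e.g. (enum-flags 'eventfd-flags-t :cloexec :nonblock)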
In order to have nice syscall wrappers we'd normally use defsyscall from IOLib. Unfortunately we also want to use defwrapper from CFFI-GROVEL. This is an example of bad composability of macros, requiring copy and paste of source code. Of course with enough refactoring, or an optional parameter, this could be circumvented. This is the wrapper file from the ASDF definition.
;; groan
cffi-grovel::
(define-wrapper-syntax defwrapper/syscall* (name-and-options rettype args &rest c-lines)
;; output C code
(multiple-value-bind (lisp-name foreign-name options)
(cffi::parse-name-and-options name-and-options)
(let ((foreign-name-wrap (strcat foreign-name "_cffi_wrap"))
(fargs (mapcar (lambda (arg)
(list (c-type-name (second arg))
(cffi::foreign-name (first arg) nil)))
args)))
(format out "~A ~A" (c-type-name rettype)
foreign-name-wrap)
(format out "(~{~{~A ~A~}~^, ~})~%" fargs)
(format out "{~%~{ ~A~%~}}~%~%" c-lines)
;; matching bindings
(push `(iolib/syscalls:defsyscall (,foreign-name-wrap ,lisp-name ,@options)
,(cffi-type rettype)
,@(mapcar (lambda (arg)
(list (symbol* (first arg))
(cffi-type (second arg))))
args))
*lisp-forms*))))
The only change from DEFWRAPPER is the use of IOLIB/SYSCALLS:DEFSYSCALL instead of DEFCFUN, which performs additional checks with respect to the return value, raising an IOLIB/SYSCALLS:SYSCALL-ERROR that we can then catch, rather than having to check the return value ourselves.
Lastly, the actual wrappers. Note that some inline C is used to define the function bodies. This is the linux-aio-wrapper file from the ASDF definition:
(define "_GNU_SOURCE")
(include "stdio.h" "unistd.h" "sys/syscall.h" "linux/aio_abi.h" "inttypes.h"
"signal.h")
(defwrapper/syscall* "io_setup" :int
((nr :unsigned-int)
(ctxp ("aio_context_t*" (:pointer aio-context-t))))
"return syscall(__NR_io_setup, nr, ctxp);")
(defwrapper/syscall* "io_destroy" :int
((ctx aio-context-t))
"return syscall(__NR_io_destroy, ctx);")
(defwrapper/syscall* "io_submit" :int
((ctx aio-context-t)
(nr :long)
(iocbpp ("struct iocb**" (:pointer (:pointer (:struct iocb))))))
"return syscall(__NR_io_submit, ctx, nr, iocbpp);")
(defwrapper/syscall* "io_cancel" :int
((ctx aio-context-t)
(iocbp ("struct iocb*" (:pointer (:struct iocb))))
(result ("struct io_event*" (:pointer (:struct io-event)))))
"return syscall(__NR_io_cancel, ctx, iocbp, result);")
(defwrapper/syscall* "io_getevents" :int
((ctx aio-context-t)
(min-nr :long)
(max-nr :long)
(events ("struct io_event*" (:pointer (:struct io-event))))
(timeout ("struct timespec*" (:pointer (:struct iolib/syscalls::timespec)))))
"return syscall(__NR_io_getevents, ctx, min_nr, max_nr, events, timeout);")
Looping
Now that we have all definitions in place, let's translate a moderately complex example of reading from an existing file. EVENTFD is also defined here, as the C function is already present in the libc and doesn't have to be generated.
(iolib/syscalls:defsyscall eventfd :int
(initval :unsigned-int)
(flags eventfd-flags-t))
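A quick sketch of the eventfd semantics: each write adds to the 64-bit counter and a read returns the current value, resetting it to zero (without :semaphore). Assuming IOLib's write wrapper next to the read and close wrappers used below, the following should return 6:
(defun eventfd-demo ()
  (let ((fd (eventfd 0 :nonblock)))
    (unwind-protect
        (with-foreign-object (buffer :uint64)
          (setf (mem-ref buffer :uint64) 3)
          (iolib/syscalls:write fd buffer 8) ; counter is now 3
          (iolib/syscalls:write fd buffer 8) ; counter is now 6
          (iolib/syscalls:read fd buffer 8)  ; reads 6, resets the counter
          (mem-ref buffer :uint64))
      (iolib/syscalls:close fd))))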
(defun linux-aio-test (pathname &key (chunk-size 4096))
(with-foreign-object (context 'aio-context-t)
(iolib/syscalls:memset context 0 (foreign-type-size 'aio-context-t))
(let ((eventfd (eventfd 0 :nonblock)))
(unwind-protect
(with-open-file (stream pathname :element-type '(unsigned-byte 8))
(let* ((length (file-length stream))
(chunks (ceiling length chunk-size)))
(with-foreign-object (buffer :uint8 length)
(with-event-base (event-base)
(io-setup 1 context) ; set up with number of possible operations
(with-foreign-object (iocbs '(:struct iocb) chunks)
(iolib/syscalls:memset iocbs 0 (* (foreign-type-size '(:struct iocb)) chunks))
;; set up array of operations
(dotimes (i chunks)
(let ((iocb (mem-aptr iocbs '(:struct iocb) i)))
(setf (foreign-slot-value iocb '(:struct iocb) 'aio-lio-opcode) :pread)
(setf (foreign-slot-value iocb '(:struct iocb) 'aio-buf) (pointer-address (mem-aptr buffer :uint8 (* i chunk-size))))
(setf (foreign-slot-value iocb '(:struct iocb) 'aio-nbytes) (if (eql i (1- chunks)) (- length (* i chunk-size)) chunk-size))
(setf (foreign-slot-value iocb '(:struct iocb) 'aio-fildes) (sb-sys:fd-stream-fd stream))
(setf (foreign-slot-value iocb '(:struct iocb) 'aio-offset) (* i chunk-size))
(setf (foreign-slot-value iocb '(:struct iocb) 'aio-flags) (foreign-enum-value 'iocb-flags-t :resfd))
(setf (foreign-slot-value iocb '(:struct iocb) 'aio-resfd) eventfd)))
;; set up array of pointers to operations
(with-foreign-object (iocbp '(:pointer (:struct iocb)) chunks)
(dotimes (i chunks)
(setf (mem-aref iocbp '(:pointer (:struct iocb)) i) (mem-aptr iocbs '(:struct iocb) i)))
;; submit as many operations as possible
(let ((submitted (io-submit (mem-ref context 'aio-context-t) chunks iocbp)))
;; keep track of how many operations completed total
(let ((total-events-read 0))
(flet ((get-events () ; named to be able to RETURN-FROM
(with-foreign-object (events '(:struct io-event) 3)
(loop
(handler-case
(with-foreign-object (available-buffer :uint64)
(iolib/syscalls:read eventfd available-buffer 8)
(let ((available (mem-ref available-buffer :uint64)))
(dotimes (i available)
(let ((events-read (io-getevents (mem-ref context 'aio-context-t) 0 3 events (null-pointer))))
(when (eql events-read 0)
(return))
(incf total-events-read events-read))
(when (eql total-events-read chunks)
(return-from linux-aio-test)))))
;; in case reading would block
(iolib/syscalls:syscall-error ()
(when (< submitted chunks)
(let ((more-submitted (io-submit (mem-ref context 'aio-context-t) chunks (mem-aptr iocbp '(:pointer (:struct iocb)) submitted))))
(incf submitted more-submitted)))
(return-from get-events)))))))
(set-io-handler
event-base eventfd :read
(lambda (fd event exception)
(declare (ignore fd event exception))
(get-events)))
(event-dispatch event-base))))))))))
(io-destroy (mem-ref context 'aio-context-t))
(iolib/syscalls:close eventfd)))))
Relatively straightforward. Complexity comes from accurately submitting chunks, reading the number of available events on demand and submitting a new batch of chunks as long as there are some remaining ones.
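A hypothetical invocation (path and chunk size made up):
;; read the file in 64 KiB chunks; larger chunks mean fewer in-flight requests
(linux-aio-test #p"/mnt/data/some-big-file" :chunk-size (* 64 1024))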
Insert FORMAT statements as you like. Tuning the values would need to be considered in order to keep memory consumption in check.
Finally
Outlook
We still can't do many local file operations asynchronously. The whole reason to jump through these hoops is of course to integrate potentially blocking operations into an event loop, so some care still needs to be taken to do some work ahead of time, or in separate threads, so as not to block the main part of the program from issuing the I/O requests.
In order to integrate any (interactive) application the following three parts need to be available, otherwise the integrator will have trouble one way or another:
- objects, representing the domain model; these would be inspectable in order to get the contained information,
- behaviours, allowing the external user to invoke them,
- notifications, to react to state changes.
E.g. not only does an application need to expose its windows, it also needs to allow closing or opening a window and finally to send updates when a window is closed or opened. Notifications in particular need to arrive on a channel that allows polling several of them at the same time - if communication is done over a socket, select and friends would allow the other side to control and react to multiple applications at the same time.
So after adding a dispatch mechanism into the DISASSEMBLE function of ABCL I also want to show a neat way of disassembling arbitrary Java classes using one of the provided disassemblers.
First off, make sure that you have the ObjectWeb ASM library in your classpath (anything from version 3 upwards should work at least, perhaps even lower versions); note that in future releases you might be able to load it via Maven and one of the optional contribs as well, likely via (require '#:asm-all).
Next up, try (disassemble #'list) and confirm that some output is shown.
Now to show bytecode for arbitrary Java classes we've got to jump through some additional hoops - though perhaps at some point this could become a stable API as well:
(system::objectweb-disassemble (#"getResourceAsStream" (java:jclass "java.lang.Object") "/java/util/Random.class"))
I haven't tried this via the JAD disassembler, but it seems likely that a similar approach should work for it too.
Just a quick note on the JVM ecosystem since I've been wrestling with getting several different technologies to work together: It's a mess really.
The specific setup in this case is a mostly Java based project, sprinkled with some Kotlin code (which I only expect to grow in the future), using Maven as the build system. Added to that are some Lombok annotations (in lieu of using Kotlin in the first place).
Today's (and yesterday's) adventure was trying to get the Error Prone checker integrated with the existing system, which proved quite impossible, due to the fact that it's using a modified compiler(!), which conflicts with the use of Lombok annotation processing.
There are workarounds in the sense that Lombok can also be used to produce processed Java files (instead of byte code generation), however it seems like that process is less capable than the IDEA / internal processing and would have me remove a lot of val instances that didn't get their type inferred properly, making it an arduous process.
Summing this up, the fact that these tools integrate on different levels of the "stack", while also making tinkering with it relatively hard due to byte code generation, complicates this endeavour greatly. In the end I resolved to drop the Error Prone integration in favour of the much easier to setup SonarQube platform. I also hope that annotation processing for Lombok will improve such that we don't need workarounds in case of "invisible" getters anymore, for example.
Copies of copies
As it so happens the ABCL main repository is held in Subversion, but since I'm more productive with Git I usually use a conversion to Git via the git-svn program. I've based my own copy off of slryus' one, however I had to fetch and update my own local copy since neither of us had done any fetching in quite some time. I'm writing down some notes below to make sure I (or others) can repeat the process in the future.
First we need a copy of an existing Git conversion.
git clone https://github.com/Ferada/abcl.git abcl-git
# or in case you have SSH access
git clone git@github.com:Ferada/abcl.git abcl-git
The master branch should be a direct mirror of the SVN repository, i.e. only have commits with git-svn annotations, like git-svn-id: http://abcl.org/svn/trunk/abcl@14851 1c010e3e-....
Next we want to initialise the SVN remote and fetch the new commits.
git svn init --prefix=abcl/ http://abcl.org/svn/trunk/abcl
git svn fetch -r 14791:14851
Note that the first revision in the fetch command is the last one in the Git master branch and the other one is the current HEAD of the SVN repository.
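That last converted revision can be found by grepping the log for the annotation, for example:
# prints the newest commit message containing a git-svn-id line
git log -1 --grep=git-svn-id --format=%B master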
This process will take just a few moments; however, while all the new commits will be based off of the master branch in Git, the first commit will be a duplicate and have more changes than the existing commit in Git.
2015-08-31 20:55 mevenson │ │ o │ │ │ ansi-test: reference new git repository
2015-07-01 04:16 mevenson o─┴─│─┴─┴─┘ abcl-asdf: fix usage with local repository <<<<
2015-07-01 04:16 mevenson │ I─┘ abcl-asdf: fix usage with local repository <<<<
2015-06-30 18:42 mevenson o abcl-asdf: correct metadata
The newly created first commit (14791 here) has lots of changes.
.dir-locals.el
CHANGES
COPYING
MANUAL
README
abcl.asd
abcl.bat.in
...
While the second, pre-existing one, has way less (and is just showing the same changes that are in the SVN commit).
contrib/abcl-asdf/abcl-asdf.asd
contrib/abcl-asdf/abcl-asdf.lisp
contrib/abcl-asdf/maven-embedder.lisp
To correct that and continue with a linear history I'm using the interactive rebase command.
git co -b test
git reset --hard abcl/git-svn
git rebase -i <commit _before_ the duplicated one>
Copy the hash of the correct, smaller commit and replace the pick ... line (at the top) that contains the duplicated commit with the one from the cloned-from Git repository, then exit the editor. Once the rebase is done we just need to update the ref for git-svn.
git update-ref refs/remotes/abcl/git-svn test
Note the syntax. If done correctly it will be updated in e.g. tig; if not, there'll be a new entry showing up in git show-ref.
Lastly we want to rebase master to get the new commits.
git co master
git branch -D test
git rebase abcl/git-svn master
Which will look somewhat like this:
2016-06-13 08:06 mevenson o [master] {abcl/git-svn} {github/master} doc: note changes for abcl-1.3.4
2016-05-16 23:05 mevenson o Update to asdf-3.1.7
2016-05-16 21:43 mevenson o Update to jna-4.2.2
And push the updates.
git push origin master
Intro
The purpose of this article is to examine what using ABCL with existing libraries (arguably the main point of using ABCL at the moment) actually looks like in practice. Never mind integration with Spring or other more involved frameworks; this will only touch a single library and won't require us to write from-Java-callable classes.
In the process of refining this I'm hoping to also get ideas about the requirements for building a better DSL for the Java FFI, based on the intended "look" of the code (that is, coding by wishful thinking).
Setup
Ensuring the correct package is somewhat optional:
(in-package #:cl-user)
Generally using JSS is a bit nicer than the plain Java FFI. After the contribs are loaded, JSS can be required and used:
(require '#:abcl-contrib)
(require '#:jss)
(use-package '#:jss)
Next, we need access to the right libraries. After building LIRE from source and executing the mvn dist command we end up with a JAR file for LIRE and several dependencies in the lib folder. All of them need to be on the classpath:
(progn
(add-to-classpath "~/src/LIRE/dist/lire.jar")
(mapc #'add-to-classpath (directory "~/src/LIRE/dist/lib/*.jar")))
Prelude
Since we're going to read pictures in a couple of places, a helper to load one from a pathname is a good start:
(defun read-image (pathname)
(#"read" 'javax.imageio.ImageIO (new 'java.io.File (namestring pathname))))
To note here is the use of NEW from JSS with a symbol for the class name, the conversion of the pathname to a regular string (since the Java side doesn't expect a Lisp object), and the #"" reader syntax from JSS to invoke the method read in a somewhat simpler way than using the FFI calls directly.
JSS will automatically "import" Java names, so the same function can simply be the following instead (provided that the names aren't ambiguous):
(defun read-image (pathname)
(#"read" 'ImageIO (new 'File (namestring pathname))))
The names will be looked up again on every call though, so this option isn't the best performing one.
For comparison, the raw FFI would be a bit more verbose, but explicitly specifies all names:
(defun read-image (pathname)
(jstatic "read" "javax.imageio.ImageIO" (jnew "java.io.File" (namestring pathname))))
Though with a combination of JSS and cached lookup it could be nicer, even though the setup is more verbose:
(defvar +image-io+ (jclass "javax.imageio.ImageIO"))
(defvar +file+ (jclass "java.io.File"))
(defun read-image (pathname)
(#"read" +image-io+ (jnew +file+ (namestring pathname))))
At this point without other improvements (auto-coercion of pathnames, importing namespaces) it's about as factored as it will be (except moving every single call into its own Lisp wrapper function).
Building an index
To keep it simple building the index will be done from a list of pathnames in a single step while providing the path of the index as a separate parameter:
(defun build-index (index-name pathnames)
(let ((global-document-builder
(new 'GlobalDocumentBuilder (find-java-class 'CEDD)))
(index-writer (#"createIndexWriter"
'LuceneUtils
index-name
+true+
(get-java-field 'LuceneUtils$AnalyzerType "WhitespaceAnalyzer"))))
(unwind-protect
(dolist (pathname pathnames)
(let ((pathname (namestring pathname)))
(format T "Indexing ~A ..." pathname)
(let* ((image (read-image pathname))
(document (#"createDocument" global-document-builder image pathname)))
(#"addDocument" index-writer document))
(format T " done.~%")))
(#"closeWriter" 'LuceneUtils index-writer))))
Note: This code won't work on current ABCL as is, because the lookup is disabled for nested classes (those containing the dollar character). Because of this, the AnalyzerType class would have to be looked up as follows:
(jfield "net.semanticmetadata.lire.utils.LuceneUtils$AnalyzerType" "WhitespaceAnalyzer")
All in all nothing fancy, JSS takes care of a lot of typing as the names are all unique enough.
The process is simply creating the document builder and index writer, reading all the files one by one and adding them to the index. There's no error checking at the moment though.
To note here is that looking up the precise kind of a Java name is a bit of a hassle. Of course intuition goes a long way, but again, manually figuring out whether a name is a nested class or a static/enum field is annoying enough, since it involves either repeated calls to JAPROPOS, or reading more Java documentation.
Apart from that, this is mostly a direct transcription. Unfortunately, written this way there's no point in creating a WITH-OPEN-* macro to automatically close the writer; however, looking at the LuceneUtils source this could be accomplished by directly calling close on the writer object instead - a corresponding macro might then look like this:
(defmacro with-open ((name value) &body body)
`(let ((,name ,value))
(unwind-protect
(progn ,@body)
(#"close" ,name))))
It would also be nice to have auto conversion using keywords for enum values instead of needing to look up the value manually.
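As a sketch, such a coercion could map keywords onto constant names; both the helper and the CamelCase naming rule are guesses based on the constant seen above:
(defun enum-field (class-name keyword)
  ;; :whitespace-analyzer => "WhitespaceAnalyzer"; assumes CamelCase constants
  (jfield class-name
          (format nil "~{~:(~A~)~}"
                  (uiop:split-string (symbol-name keyword) :separator "-"))))
;; e.g. (enum-field "net.semanticmetadata.lire.utils.LuceneUtils$AnalyzerType"
;;                  :whitespace-analyzer)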
Querying an index
The other way round, looking up related pictures by passing in an example, is done using an image searcher:
(defun query-index (index-name pathname)
(let* ((image (read-image pathname))
(index-reader (#"open" 'DirectoryReader
(#"open" 'FSDirectory
(#"get" 'Paths index-name (jnew-array "java.lang.String" 0))))))
(unwind-protect
(let* ((image-searcher (new 'GenericFastImageSearcher 30 (find-java-class 'CEDD)))
(hits (#"search" image-searcher image index-reader)))
(dotimes (i (#"length" hits))
(let ((found-pathname (#"getValues" (#"document" index-reader (#"documentID" hits i))
(get-java-field 'builders.DocumentBuilder "FIELD_NAME_IDENTIFIER"))))
(format T "~F: ~A~%" (#"score" hits i) found-pathname))))
(#"closeReader" 'LuceneUtils index-reader))))
To note here is that the get call on java.nio.file.Paths took way more time to figure out than should've been necessary: essentially the method takes a variable number of arguments, but the FFI doesn't help in any way, so the array (of the correct type!) needs to be set up manually, especially if the number of variable arguments is zero. This is not obvious at first and also takes unnecessary writing.
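A small helper could hide that array setup; JNEW-ARRAY and JARRAY-SET are part of the plain FFI, the helper itself is hypothetical:
(defun jvarargs (element-type &rest values)
  ;; builds the trailing array that a varargs method expects, possibly empty
  (let ((array (jnew-array element-type (length values))))
    (loop for i from 0
          for value in values
          do (jarray-set array value i))
    array))
;; e.g. (#"get" 'Paths index-name (jvarargs "java.lang.String"))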
The rest of the code is straightforward again. At least a common wrapper for the length call would be nice, but since the result object doesn't actually implement a collection interface, the point about having better collection iteration is kind of moot here.
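In the meantime a small macro can at least capture the index-based loop; a sketch using the same accessors as above:
(defmacro do-hits ((index hits) &body body)
  ;; iterate INDEX over the entries of the HITS result object
  (let ((h (gensym "HITS")))
    `(let ((,h ,hits))
       (dotimes (,index (#"length" ,h))
         ,@body))))
;; e.g. (do-hits (i hits) (format T "~F~%" (#"score" hits i)))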
A better DSL
Considering how verbose the previous examples were, what would the "ideal" way look like?
There are different ways which are more, or less intertwined with Java semantics. On the one end, we could imagine something akin to "Java in Lisp":
(defun read-image (pathname)
(ImageIO/read (FileInputStream. pathname)))
Which is almost how it would look in Clojure. However, this complicates the semantics. While importing would be an extension to the package mechanism (or possibly just a file-wide setting), the Class/field syntax and Class. syntax are non-trivial reader extensions, not from the actual implementation point of view, but from the user point of view: they'd basically disallow a wide range of formerly legal Lisp names.
(defun read-image (pathname)
(#"read" 'ImageIO (new 'FileInputStream pathname)))
This way is the middle ground that we have now. The one addition here could be that name lookup is done at macro expansion / compilation time, so they are fixed one step before execution, whereas at the moment the JSS reader macro will allow for very late bound name lookup instead.
The similarity with CLOS would be the use of symbols for class names, but the distinction is still there, since there's not much in terms of integrating CLOS and Java OO yet (which might not be desirable anyway?).
Auto-coercion to Java data types also takes place in both cases. Generally this would be appropriate, except for places where we'd really want the Java side to receive a Lisp object. Having a special variable to disable conversion might be enough for these purposes.
If we were to forego the nice properties of JSS by requiring a function form, the following would be another option:
(defun read-image (pathname)
$(read 'ImageIO (new 'FileInputStream pathname)))
Where $(...) would be special syntax indicating a Java method call. Of course the exact syntax is not very relevant; more importantly, static properties could be used to generate a faster, early-bound call by examining the supplied arguments as a limited form of type inference.
Summary
After introducing the necessary steps to start using ABCL with "native" Java libraries, we transcribed two example programs from the library homepage.
Part of this process was to examine what the interaction between the Common Lisp and Java parts looks like, using the "raw" and the simplified JSS API. In all cases the FFI is clunkier than it needs to be. Especially the additional Java namespaces make things longer than necessary. The obvious way of "importing" classes by storing a reference in a Lisp variable is viable, but again isn't automated.
Based on the verbose nature of the Java calls, an idea of how a more concise FFI DSL could look was developed next and discussed. At a future point in time this idea could be developed fully and integrated (as a contrib) into ABCL.
Let's go back to the GNU Coreutils list of tools. ls for example. Usually the user will have set some alias for ls instead of the plain invocation, either to enable highlighting (--color), sorting (--sort), or to add more information than just the filenames (e.g. --format). There is even integration with Emacs (--dired).
The question then is: How much of the functionality of ls is actually devoted to secondary formatting instead of listing files? And shouldn't this functionality be moved into separate tools? Since output is intended for multiple kinds of recipients, additional data creeps in and complicates the tools a lot.
Alternatively, we could imagine using ls only to get unformatted and unsorted output, which would then be passed through to a sort command and a fmt command of sorts. Of course this all takes some more time, re-parsing of output etc., so it's understandable in the interest of performance not to do this in the traditional Unix shell.
However, let's assume a more sophisticated shell. Assuming ls is limited to listing files, then the user will alias ls to a pipeline instead, namely something akin to ls | sort | fmt. Then again, formatting is part of the user interface, not the functionality, so it should rather be part of the internal shell formatting, possibly exposed as separate filters as well.
The result of ls is a (possibly nested) directory listing. Regardless of post-processing, this "object" should still be available for further investigation. Which means that while sorting may be applied destructively, formatting may not, unless specifically requested, in which case the result would be a kind of "formatted" object (text, GUI widget) instead.
In other terms, the user should be able to refer to the last results immediately, instead of rerunning the whole pipeline. E.g. coming from Common Lisp, variables like * to *** will store the last three results for interactive use. In the shell then, ls would set * to the generated directory listing; since the listing is also most likely printed to the screen, the full listing will also be stored (in that object) to be used again if e.g. * is requested again. Rerunning the command, on the other hand, will possibly generate a different directory listing as files may have been changed, so there is an immediate difference between the two forms.
Examples
The pipeline ls | wc -l is (at least for me) often used to get the number of files in the (current) directory. Unfortunately there is no way to get this number directly, except by enumerating the entries in the directory (under Linux, that is).
After all the trouble I went through with trying to get an image viewer to work in CL (due to the used GTK+ 3 library being problematic, that is, unmaintained), maybe a different approach is more viable. It would be possible to use one existing program as a front end by calling it via IPC. So e.g. feh, my go-to program for that task, already has configurable keybindings; it should be a smaller problem to remote control it (even with adding some code).
However, as with all these *nix combinators, it feels like a mish-mash of tools intertwined and not-quite the optimal solution.
Consider what happens if you want to add new functionality, i.e. new widgets. In that case composability breaks down, since feh is relatively minimal and therefore doesn't have many options in terms of providing different menus, input widgets, etc. Therefore you'd have to find either a different viewer with more scripting capabilities (which is counter to the "one-tool" mantra), or switch to a more integrated approach to have this component as an internal part of your environment.
Obviously now would be the time for either components/CORBA, or a Lisp Machine to hack up other programs.
Or switch to Qt. It seems that the bindings for that framework are more stable than the GTK bindings and additionally they (Qt) just have more people working on the framework.
Since one of the problems with the GTK bindings is the relatively recent upgrade to GTK+ 3, there seems to be a point in using the previous version 2 instead, considering that even GIMP didn't update yet.
After working a lot more with PostgreSQL, I happened to stumble upon a few things, mostly related to query optimisation, that at least for me weren't quite readily apparent. Please note that this is all with regards to PostgreSQL 9.3, but unless otherwise noted it should still be the case for 9.4 as well.
Slow DELETEs
Obviously, indexes are key to good query performance. If you're already using EXPLAIN, with or without ANALYZE, chances are good you know how and why your queries perform like they do. However I encountered a problem with a DELETE query where the output from EXPLAIN was as expected, i.e. it was using the optimal plan, but the performance was still abysmal: a query to delete about 300 elements in bulk, like DELETE FROM table WHERE id IN (...);, was quite fast to remove the elements (as tested from a separately running psql), but still took about three minutes(!) to complete.
In this scenario the table in question was about a million rows long, had a primary index on the id column and was referenced from three other tables, which also had foreign key constraints set up; no other triggers were running on any of the tables involved.
The postgres server process in question wasn't doing anything interesting: it was basically taking up one core with semop calls as reported by strace, and no other I/O was observed.
At that point I finally turned to #postgresql. Since the information I could share wasn't very helpful, there were no immediate replies, but one hint finally helped me to fix this problem. It turns out that the (kinda obvious) solution was to check for missing indexes on foreign key constraints from other tables. With two out of three indexes missing, I ran CREATE INDEX CONCURRENTLY ...; and about five minutes later the DELETE was running in a few milliseconds.
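The fix thus amounts to one index per referencing column; as a sketch (table and column names made up):
-- one such index per foreign key column referencing the parent table
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);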
The part where I was really frustrated here is that none of the available statistics guided me in the search; EXPLAIN ANALYZE apparently doesn't include the runtime for foreign key constraints, and they don't show up in other places either. In hindsight this is something that I should've checked earlier (and from now on I will), but it's also a weakness of the analyze framework not to help the developer see the slowdowns involved in a situation like this.
Common Table Expressions
Refactoring queries to reuse results with WITH queries is absolutely worth it and improved the runtime of rather large queries by a large factor. This is something that can be seen from the query plan, so when you're using the same expressions twice, start looking into this and see if it helps both readability (don't repeat yourself) and performance.
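A contrived sketch of the pattern (made-up tables), with the shared expression computed once:
-- the aggregate is computed once in the CTE and referenced twice,
-- instead of repeating the subquery in both places
WITH totals AS (
  SELECT account_id, sum(amount) AS total
  FROM transactions
  GROUP BY account_id
)
SELECT account_id, total
FROM totals
WHERE total > (SELECT avg(total) FROM totals);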
JSON result construction
We need nested JSON output in a few cases. This means (in 9.3; there are some better functions available in 9.4) that a combination of row_to_json(row(json_agg(...))) was necessary to get proper nesting of aggregated sub-objects, as well as wrapping the end result in another object, because the output had to be formatted as a JSON object (with curly brackets) instead of a JSON array (with square brackets).
Technicalities aside, the JSON support is pretty good. Since the initial code was written I've discovered that, as we in many cases don't actually have multiple results (for json_agg), not using that method will again significantly improve performance.
That means instead of something like the following:
SELECT row_to_json(row(json_agg(...))) FROM ... JOIN ... GROUP BY id;
where the input to json_agg is a single result from the JOIN, we can write the following instead:
SELECT row_to_json(row(ARRAY[...])) FROM ... JOIN ...;
which, if you examine the output of EXPLAIN, means no sorting due to the GROUP BY clause. The convenience of json_agg here doesn't really justify the significant slowdown caused by the aggregation function.
Note that the array constructed via ARRAY[] is properly converted to JSON, so the end result is again proper JSON.