Content from 2015-11

Setting up ABCL and LIRE
posted on 2015-11-11 20:34:29

Intro

The purpose of this article is to examine how using ABCL with existing libraries (arguably the main point of using ABCL at the moment) actually looks like in practice. Never mind integration with Spring, or other more involved frameworks, this will only touch a single library and won't require us to write from-Java-callable classes.

In the process of refining this I'm hoping to also get ideas about the requirements for building a better DSL for the Java FFI, based on the intended "look" of the code (that is, coding by wishful thinking).

Setup

Ensuring the correct package is somewhat optional:

(in-package #:cl-user)

Generally using JSS is a bit nicer than the plain Java FFI. After the contribs are loaded, JSS can be required and used:

(require '#:abcl-contrib)
(require '#:jss)

(use-package '#:jss)

Next, we need access to the right libraries. After building LIRE from source and executing the mvn dist command we end up with a JAR file for LIRE and several dependencies in the lib folder. All of them need to be on the classpath:

(progn
  (add-to-classpath "~/src/LIRE/dist/lire.jar")
  (mapc #'add-to-classpath (directory "~/src/LIRE/dist/lib/*.jar")))

Prelude

Since we're going to read pictures in a couple of places, a helper to load one from a pathname is a good start:

(defun read-image (pathname)
  (#"read" 'javax.imageio.ImageIO (new 'java.io.File (namestring pathname))))

To note here is the use of NEW from JSS with a symbol for the class name, the conversion of the pathname to a regular string, since the Java side doesn't expect a Lisp object and the #"" reader syntax from JSS to invoke the method read in a bit of a simpler way than using the FFI calls directly.

JSS will automatically "import" Java names, so the same function can simply be the following instead (provided that the names aren't ambiguous):

(defun read-image (pathname)
  (#"read" 'ImageIO (new 'File (namestring pathname))))

The names will be looked up again on every call though, so this option isn't the best performing one.

For comparison, the raw FFI would be a bit more verbose, but explicitely specifies all names:

(defun read-image (pathname)
  (jstatic "read" "javax.imageio.ImageIO" (jnew "java.io.File" (namestring pathname))))

Though with a combination of JSS and cached lookup it could be nicer, even though the setup is more verbose:

(defvar +image-io+ (jclass "javax.imageio.ImageIO"))
(defvar +file+ (jclass "java.io.File"))

(defun read-image (pathname)
  (#"read" +image-io+ (jnew +file+ (namestring pathname))))

At this point without other improvements (auto-coercion of pathnames, importing namespaces) it's about as factored as it will be (except moving every single call into its own Lisp wrapper function).

Building an index

To keep it simple building the index will be done from a list of pathnames in a single step while providing the path of the index as a separate parameter:

(defun build-index (index-name pathnames)
  (let ((global-document-builder
          (new 'GlobalDocumentBuilder (find-java-class 'CEDD)))
        (index-writer (#"createIndexWriter"
                       'LuceneUtils
                       index-name
                       +true+
                       (get-java-field 'LuceneUtils$AnalyzerType "WhitespaceAnalyzer"))))
    (unwind-protect
         (dolist (pathname pathnames)
           (let ((pathname (namestring pathname)))
             (format T "Indexing ~A ..." pathname)
             (let* ((image (read-image pathname))
                    (document (#"createDocument" global-document-builder image pathname)))
               (#"addDocument" index-writer document))
             (format T " done.~%")))
      (#"closeWriter" 'LuceneUtils index-writer))))

Note: This code won't work on current ABCL as is, because the lookup is disabled for for nested classes (those containing the dollar character). Because of this, the AnalyzerType class would have to be looked up as follows:

(jfield "net.semanticmetadata.lire.utils.LuceneUtils$AnalyzerType" "WhitespaceAnalyzer")

All in all nothing fancy, JSS takes care of a lot of typing as the names are all unique enough.

The process is simply creating the document builder and index writer, reading all the files one by one and adding them to the index. There's no error checking at the moment though.

To note here is that looking up the precise kind of a Java name is a bit of a hassle. Of course intuition goes a long way, but again, manually figuring out whether a name is a nested class or static/enum field is annoying enough since it involves either repeated calls to JAPROPOS, or reading more Java documentation.

Apart from that, this is mostly a direct transcription. Unfortunately written this way there's no point in creating a WITH-OPEN-* macro to automatically close the writer, however, looking at the LuceneUtils source this could be accomplished by directly calling close on the writer object instead - a corresponding macro might this then:

(defmacro with-open ((name value) &body body)
  `(let ((,name ,value))
     (unwind-protect
          (progn ,@body)
       (#"close" ,name))))

It would also be nice to have auto conversion using keywords for enum values instead of needing to look up the value manually.

Querying an index

The other way round, looking up related pictures by passing in an example, is done using an image searcher:

(defun query-index (index-name pathname)
  (let* ((image (read-image pathname))
         (index-reader (#"open" 'DirectoryReader
                                (#"open" 'FSDirectory
                                         (#"get" 'Paths index-name (jnew-array "java.lang.String" 0))))))
    (unwind-protect
         (let* ((image-searcher (new 'GenericFastImageSearcher 30 (find-java-class 'CEDD)))
                (hits (#"search" image-searcher image index-reader)))
           (dotimes (i (#"length" hits))
             (let ((found-pathname (#"getValues" (#"document" index-reader (#"documentID" hits i))
                                                 (get-java-field 'builders.DocumentBuilder "FIELD_NAME_IDENTIFIER"))))
               (format T "~F: ~A~%" (#"score" hits i) found-pathname))))
      (#"closeReader" 'LuceneUtils index-reader))))

To note here is that the get call on java.nio.file.Paths took way more time to figure out than should've been necessary: Essentially the method is using a variable number of arguments, but the FFI doesn't help in any way, so the array (of the correct type!) needs to be set up manually, especially if the number of variable arguments is zero. This is not obvious at first and also takes unnecessary writing.

The rest of the code is straightforward again. At least a common wrapper for the length call would be nice, but since the result object doesn't actually implement a collection interface, the point about having better collection iteration is kind of moot here.

A better DSL

Considering how verbose the previous examples were, how would the "ideal" way look like?

There are different ways which are more, or less intertwined with Java semantics. On the one end, we could imagine something akin to "Java in Lisp":

(defun read-image (pathname)
  (ImageIO/read (FileInputStream. pathname)))

Which is almost how it would look like in Clojure. However, this is complicating semantics. While importing would be an extension to the package mechanism (or possibly just a file-wide setting), the Class/field syntax and Class. syntax are non-trivial reader extensions, not from the actual implementation point of view, but from the user point of view. They'd basically disallow a wide range of formerly legal Lisp names.

(defun read-image (pathname)
  (#"read" 'ImageIO (new 'FileInputStream pathname)))

This way is the middle ground that we have now. The one addition here could be that name lookup is done at macro expansion / compilation time, so they are fixed one step before execution, whereas at the moment the JSS reader macro will allow for very late bound name lookup instead.

The similarity with CLOS would be the use of symbols for class names, but the distinction is still there, since there's not much in terms of integrating CLOS and Java OO yet (which might not be desirable anyway?).

Auto-coercion to Java data types also takes place in both cases. Generally this would be appropriate, except for places where we'd really want the Java side to receive a Lisp object. Having a special variable to disable conversion might be enough for these purposes.

If we were to forego the nice properties of JSS by requiring a function form, the following would be another option:

(defun read-image (pathname)
  $(read 'ImageIO (new 'FileInputStream pathname)))

Where $(...) would be special syntax indicating a Java method call. Of course the exact syntax is not very relevant, more importantly static properties could be used to generate a faster, early bound call by examining the supplied arguments as a limited form of type inference.

Summary

After introducing the necessary steps to start using ABCL with "native" Java libraries, we transcribed two example programs from the library homepage.

Part of this process was to examine how the interaction between the Common Lisp and Java parts looks like, using the "raw" and the simplified JSS API. In all cases the FFI is clunkier than needs be. Especially the additional Java namespaces are making things longer than necessary. The obvious way of "importing" classes by storing a reference in a Lisp variable is viable, but again isn't automated.

Based on the verbose nature of the Java calls an idea about how a more concise FFI DSL could look like was developed next and discussed. At a future point in time this idea could now be developed fully and integrated (as a contrib) into ABCL.

This blog covers unix, tachikoma, postgresql, lisp, kotlin, java, git, emacs

View content from 2014-08, 2014-11, 2014-12, 2015-01, 2015-02, 2015-04, 2015-06, 2015-08, 2015-11, 2016-08, 2016-09, 2016-10, 2016-11, 2017-06, 2017-07


Unless otherwise credited all material Creative Commons License by Olof-Joachim Frahm