Recent Content

Setting up an ABCL mirror from SVN to Git
posted on 2016-08-03 12:03:02

Copies of copies

As it so happens, the ABCL main repository is held in Subversion, but since I'm more productive with Git I usually work on a conversion to Git done via the git-svn program. I based my own copy on slyrus', but I had to fetch and update my own local copy since neither of us had fetched in quite some time. I'm writing down some notes below to make sure I (or others) can repeat the process in the future.

First we need a copy of an existing Git conversion.

git clone https://github.com/Ferada/abcl.git abcl-git
# or in case you have SSH access
git clone git@github.com:Ferada/abcl.git abcl-git

The master branch should be a direct mirror of the SVN repository, i.e. only have commits with git-svn annotations, like git-svn-id: http://abcl.org/svn/trunk/abcl@14851 1c010e3e-....

Next we want to initialise the SVN remote and fetch the new commits.

git svn init --prefix=abcl/ http://abcl.org/svn/trunk/abcl
git svn fetch -r 14791:14851

Note that the first revision in the fetch command is the last one already present in the Git master branch and the second one is the current HEAD of the SVN repository.

This process takes just a few moments. However, while all the new commits will be based off of the master branch in Git, the first fetched commit will be a duplicate of an existing Git commit and will contain more changes than its counterpart.

2015-08-31 20:55 mevenson           │ │ o │ │ │ ansi-test: reference new git repository
2015-07-01 04:16 mevenson           o─┴─│─┴─┴─┘ abcl-asdf: fix usage with local repository   <<<<
2015-07-01 04:16 mevenson           │ I─┘ abcl-asdf: fix usage with local repository         <<<<
2015-06-30 18:42 mevenson           o abcl-asdf: correct metadata

The newly created first commit (14791 here) touches lots of files.

.dir-locals.el
CHANGES
COPYING
MANUAL
README
abcl.asd
abcl.bat.in
...

The second, pre-existing one touches far fewer files (and shows just the changes that are in the SVN commit).

contrib/abcl-asdf/abcl-asdf.asd
contrib/abcl-asdf/abcl-asdf.lisp
contrib/abcl-asdf/maven-embedder.lisp

To correct that and continue with a linear history I'm using the interactive rebase command.

git checkout -b test
git reset --hard abcl/git-svn
git rebase -i <commit _before_ the duplicated one>
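
At this point the interactive rebase opens its todo list in the editor; it will look roughly like this (the hashes here are hypothetical placeholders):

pick aaaaaaa abcl-asdf: fix usage with local repository
pick bbbbbbb ansi-test: reference new git repository
...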

Copy the hash of the correct, smaller commit from the cloned-from Git repository and use it to replace the pick ... line (at the top) that contains the duplicated commit, then exit the editor. Once the rebase is done we just need to update the ref for git-svn.

git update-ref refs/remotes/abcl/git-svn test

Note the syntax. If done correctly the ref will show as updated in e.g. tig; if not, a stray new entry will show up in git show-ref.

Lastly we want to rebase master to get the new commits.

git checkout master
git branch -D test
git rebase abcl/git-svn master

Which will look somewhat like this:

2016-06-13 08:06 mevenson           o [master] {abcl/git-svn} {github/master} doc: note changes for abcl-1.3.4
2016-05-16 23:05 mevenson           o Update to asdf-3.1.7
2016-05-16 21:43 mevenson           o Update to jna-4.2.2

And push the updates.

git push origin master

Setting up ABCL and LIRE
posted on 2015-11-11 20:34:29

Intro

The purpose of this article is to examine what using ABCL with existing libraries (arguably the main point of using ABCL at the moment) actually looks like in practice. Never mind integration with Spring or other more involved frameworks; this will only touch a single library and won't require us to write from-Java-callable classes.

In the process of refining this I'm hoping to also get ideas about the requirements for building a better DSL for the Java FFI, based on the intended "look" of the code (that is, coding by wishful thinking).

Setup

Ensuring the correct package is somewhat optional:

(in-package #:cl-user)

Generally using JSS is a bit nicer than the plain Java FFI. After the contribs are loaded, JSS can be required and used:

(require '#:abcl-contrib)
(require '#:jss)

(use-package '#:jss)

Next, we need access to the right libraries. After building LIRE from source and executing the mvn dist command we end up with a JAR file for LIRE and several dependencies in the lib folder. All of them need to be on the classpath:

(progn
  (add-to-classpath "~/src/LIRE/dist/lire.jar")
  (mapc #'add-to-classpath (directory "~/src/LIRE/dist/lib/*.jar")))

Prelude

Since we're going to read pictures in a couple of places, a helper to load one from a pathname is a good start:

(defun read-image (pathname)
  (#"read" 'javax.imageio.ImageIO (new 'java.io.File (namestring pathname))))

To note here are the use of NEW from JSS with a symbol for the class name, the conversion of the pathname to a regular string (since the Java side doesn't expect a Lisp object), and the #"" reader syntax from JSS, which invokes the method read a bit more simply than using the FFI calls directly.

JSS will automatically "import" Java names, so the same function can simply be the following instead (provided that the names aren't ambiguous):

(defun read-image (pathname)
  (#"read" 'ImageIO (new 'File (namestring pathname))))

The names will be looked up again on every call though, so this option isn't the best performing one.

For comparison, the raw FFI would be a bit more verbose, but explicitly specifies all names:

(defun read-image (pathname)
  (jstatic "read" "javax.imageio.ImageIO" (jnew "java.io.File" (namestring pathname))))

With a combination of JSS and cached lookups it could be nicer, even though the setup is more verbose:

(defvar +image-io+ (jclass "javax.imageio.ImageIO"))
(defvar +file+ (jclass "java.io.File"))

(defun read-image (pathname)
  (#"read" +image-io+ (jnew +file+ (namestring pathname))))

At this point, without other improvements (auto-coercion of pathnames, importing namespaces), it's about as factored as it will get (short of moving every single call into its own Lisp wrapper function).

Building an index

To keep it simple building the index will be done from a list of pathnames in a single step while providing the path of the index as a separate parameter:

(defun build-index (index-name pathnames)
  (let ((global-document-builder
          (new 'GlobalDocumentBuilder (find-java-class 'CEDD)))
        (index-writer (#"createIndexWriter"
                       'LuceneUtils
                       index-name
                       +true+
                       (get-java-field 'LuceneUtils$AnalyzerType "WhitespaceAnalyzer"))))
    (unwind-protect
         (dolist (pathname pathnames)
           (let ((pathname (namestring pathname)))
             (format T "Indexing ~A ..." pathname)
             (let* ((image (read-image pathname))
                    (document (#"createDocument" global-document-builder image pathname)))
               (#"addDocument" index-writer document))
             (format T " done.~%")))
      (#"closeWriter" 'LuceneUtils index-writer))))

Note: This code won't work on current ABCL as is, because the lookup is disabled for nested classes (those containing the dollar character). Because of this, the AnalyzerType class has to be looked up as follows:

(jfield "net.semanticmetadata.lire.utils.LuceneUtils$AnalyzerType" "WhitespaceAnalyzer")

All in all nothing fancy: JSS saves a lot of typing, as the names are all unique enough.

The process is simply creating the document builder and index writer, reading all the files one by one and adding them to the index. There's no error checking at the moment though.

To note here is that looking up the precise kind of a Java name is a bit of a hassle. Of course intuition goes a long way, but again, manually figuring out whether a name is a nested class or static/enum field is annoying enough since it involves either repeated calls to JAPROPOS, or reading more Java documentation.

Apart from that, this is mostly a direct transcription. Unfortunately, written this way there's no point in creating a WITH-OPEN-* macro to automatically close the writer; however, looking at the LuceneUtils source, the same thing could be accomplished by directly calling close on the writer object instead, so a corresponding macro might look like this:

(defmacro with-open ((name value) &body body)
  `(let ((,name ,value))
     (unwind-protect
          (progn ,@body)
       (#"close" ,name))))

It would also be nice to have auto conversion using keywords for enum values instead of needing to look up the value manually.

Querying an index

The other way round, looking up related pictures by passing in an example, is done using an image searcher:

(defun query-index (index-name pathname)
  (let* ((image (read-image pathname))
         (index-reader (#"open" 'DirectoryReader
                                (#"open" 'FSDirectory
                                         (#"get" 'Paths index-name (jnew-array "java.lang.String" 0))))))
    (unwind-protect
         (let* ((image-searcher (new 'GenericFastImageSearcher 30 (find-java-class 'CEDD)))
                (hits (#"search" image-searcher image index-reader)))
           (dotimes (i (#"length" hits))
             (let ((found-pathname (#"getValues" (#"document" index-reader (#"documentID" hits i))
                                                 (get-java-field 'builders.DocumentBuilder "FIELD_NAME_IDENTIFIER"))))
               (format T "~F: ~A~%" (#"score" hits i) found-pathname))))
      (#"closeReader" 'LuceneUtils index-reader))))

To note here is that the get call on java.nio.file.Paths took way more time to figure out than should've been necessary: essentially the method takes a variable number of arguments, but the FFI doesn't help in any way, so the array (of the correct type!) needs to be set up manually, even when the number of variable arguments is zero. This is not obvious at first and also takes unnecessary writing.
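
A small helper can hide that dance; JAVA-STRING-ARRAY below is not part of ABCL or JSS, just a hypothetical sketch:

(defun java-string-array (&rest strings)
  ;; hypothetical helper: build a java.lang.String[] and fill it element
  ;; by element; calling it with no arguments yields the zero-length
  ;; array needed when a vararg method is invoked without any variable
  ;; arguments
  (let ((array (jnew-array "java.lang.String" (length strings))))
    (loop for index from 0
          for string in strings
          do (jarray-set array string index))
    array))

With it, the lookup above becomes (#"get" 'Paths index-name (java-string-array)).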

The rest of the code is straightforward again. At least a common wrapper for the length call would be nice, but since the result object doesn't actually implement a collection interface, the point about having better collection iteration is kind of moot here.

A better DSL

Considering how verbose the previous examples were, what would the "ideal" way look like?

There are different ways, more or less intertwined with Java semantics. On one end, we could imagine something akin to "Java in Lisp":

(defun read-image (pathname)
  (ImageIO/read (FileInputStream. pathname)))

Which is almost how it would look in Clojure. However, this complicates semantics. While importing would be an extension to the package mechanism (or possibly just a file-wide setting), the Class/field syntax and Class. syntax are non-trivial reader extensions, not from the actual implementation point of view, but from the user point of view: they'd basically disallow a wide range of formerly legal Lisp names.

(defun read-image (pathname)
  (#"read" 'ImageIO (new 'FileInputStream pathname)))

This way is the middle ground that we have now. The one addition here could be that name lookup is done at macro-expansion / compilation time, so names are fixed one step before execution, whereas at the moment the JSS reader macro allows for very late-bound name lookup instead.
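
For illustration, a hypothetical macro along those lines could resolve the class reference once at load time and expand into a plain JSTATIC call (DEFINE-JAVA-STATIC is not an existing ABCL facility, just a sketch):

(defmacro define-java-static (name method-name class-name)
  ;; sketch only: look the class up once when the definition is loaded;
  ;; every call through NAME then only pays for the method dispatch in
  ;; JSTATIC
  `(let ((class (jclass ,class-name)))
     (defun ,name (&rest arguments)
       (apply #'jstatic ,method-name class arguments))))

;; e.g. (define-java-static image-io-read "read" "javax.imageio.ImageIO")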

The similarity with CLOS would be the use of symbols for class names, but the distinction is still there, since there's not much in terms of integrating CLOS and Java OO yet (which might not be desirable anyway?).

Auto-coercion to Java data types also takes place in both cases. Generally this would be appropriate, except for places where we'd really want the Java side to receive a Lisp object. Having a special variable to disable conversion might be enough for these purposes.

If we were to forego the nice properties of JSS by requiring a function form, the following would be another option:

(defun read-image (pathname)
  $(read 'ImageIO (new 'FileInputStream pathname)))

Where $(...) would be special syntax indicating a Java method call. Of course the exact syntax is not very relevant; more importantly, static properties could be used to generate a faster, early-bound call by examining the supplied arguments as a limited form of type inference.

Summary

After introducing the necessary steps to start using ABCL with "native" Java libraries, we transcribed two example programs from the library homepage.

Part of this process was to examine what the interaction between the Common Lisp and Java parts looks like, using both the "raw" and the simplified JSS API. In all cases the FFI is clunkier than it needs to be. The additional Java namespaces especially make things longer than necessary. The obvious way of "importing" classes by storing a reference in a Lisp variable is viable, but again isn't automated.

Based on the verbose nature of the Java calls, an idea of how a more concise FFI DSL could look was developed and discussed next. At a future point in time this idea could be developed fully and integrated (as a contrib) into ABCL.

Limits of Unix shell
posted on 2015-08-20 14:54:10

Let's go back to the GNU Coreutils list of tools; ls, for example. Usually the user will have set up some alias for ls instead of using the plain invocation, either to enable highlighting (--color), sorting (--sort), or to add more information than just the filenames (e.g. --format). There is even integration with Emacs (--dired).

The question then is: how much of the functionality of ls is actually devoted to secondary formatting rather than to listing files? And shouldn't this functionality be moved into separate tools? Since the output is intended for multiple kinds of recipients, additional data creeps in and complicates the tools a lot.

Alternatively, we could imagine using ls only to get unformatted and unsorted output, which would then be passed through a sort command and a fmt command of sorts. Of course this all takes some more time, re-parsing of output etc., so it's understandable that in the interest of performance the traditional Unix shell doesn't do this.

However, let's assume a more sophisticated shell. Assuming ls is limited to listing files, then the user will alias ls to a pipeline instead, namely something akin to ls | sort | fmt. Then again, formatting is part of the user interface, not the functionality, so it should rather be part of the internal shell formatting, possibly exposed as separate filters as well.

The result of ls is a (possibly nested) directory listing. Regardless of post-processing, this "object" should still be available for further investigation. Which means that while sorting may be applied destructively, formatting may not, unless specifically requested, in which case the result would be a kind of "formatted" object (text, GUI widget) instead.

In other terms, the user should be able to refer to the last results immediately, instead of rerunning the whole pipeline. E.g. coming from Common Lisp, variables like * to *** will store the last three results for interactive use. In the shell then, ls would set * to the generated directory listing; since the listing is also most likely printed to the screen, the full listing will also be stored (in that object) to be used again if e.g. * is requested again. Rerunning the command, on the other hand, will possibly generate a different directory listing as files may have been changed, so there is an immediate difference between the two forms.

Examples

The pipeline ls | wc -l is (at least for me) often used to get the number of files in the (current) directory. Unfortunately there is no way to get this number directly except by enumerating the entries in the directory (under Linux, that is).

Image viewers and other systems tools
posted on 2015-06-12 16:20:23

After all the trouble I went through trying to get an image viewer to work in CL (due to the GTK+ 3 library used being problematic, that is, unmaintained), maybe a different approach is more viable. It would be possible to use an existing program as a front end by calling it via IPC. E.g. feh, my go-to program for that task, already has configurable keybindings; it should be a relatively small problem to remote-control it (even if that means adding some code).

However, as with all these *nix combinators, it feels like a mish-mash of intertwined tools and not quite the optimal solution.

Consider what happens if you want to add new functionality, i.e. new widgets. In that case composability breaks down, since feh is relatively minimal and therefore doesn't have many options in terms of providing different menus, input widgets, etc. Therefore you'd have to either find a different viewer with more scripting capabilities (which is counter to the "one-tool" mantra), or switch to a more integrated approach to have this component as an internal part of your environment.

Obviously now would be the time for either components/CORBA, or a Lisp Machine to hack up other programs.

Or switch to Qt. It seems that the bindings for that framework are more stable than the GTK bindings and additionally they (Qt) just have more people working on the framework.

Since one of the problems with the GTK bindings is the relatively recent upgrade to GTK+ 3, there seems to be a point in using the previous version 2 instead, considering that even GIMP hasn't updated yet.

PostgreSQL insights I
posted on 2015-04-25 13:47:23+01:00

After working a lot more with PostgreSQL, I happened to stumble upon a few things, mostly related to query optimisation, that at least for me weren't quite readily apparent. Please note that this is all with regards to PostgreSQL 9.3, but unless otherwise noted it should still be the case for 9.4 as well.

Slow DELETEs

Obviously, indexes are key to good query performance. If you're already using EXPLAIN, with or without ANALYZE, chances are good you know how and why your queries perform the way they do. However, I encountered a problem with a DELETE query where the output from EXPLAIN was as expected, i.e. it was using the optimal plan, but the performance was still abysmal: a query to delete about 300 elements in bulk, like DELETE FROM table WHERE id IN (...);, was quite fast to remove the elements (as tested from a separately running psql), but the query still took about three minutes(!) to complete.

In this scenario the table in question was about a million rows long, had a primary index on the id column and was referenced from three other tables, which also had foreign key constraints set up; no other triggers were running on any of the tables involved.

The postgres server process in question wasn't doing anything interesting: it was basically taking up one core with semop calls (as reported by strace); no other I/O was observed though.

At that point I finally turned to #postgresql. Since the information I could share wasn't very helpful there were no immediate replies, but one hint finally helped me fix the problem. It turns out that the (kinda obvious) solution was to check for missing indexes on foreign key constraints from other tables. With two out of three indexes missing, I proceeded to CREATE INDEX CONCURRENTLY...; and about five minutes later the DELETE was running in a few milliseconds.

The really frustrating part here is that none of the available statistics guided me in this search; EXPLAIN ANALYZE apparently doesn't include the runtime for foreign key constraints, and they don't show up in other places either. In hindsight this is something that I should've checked earlier (and from now on I will), but it's also a weakness of the analyze framework that it doesn't help the developer see the slowdowns involved in a situation like this.

Common Table Expressions

Refactoring queries to reuse results with WITH queries is absolutely worth it and improved the runtime of rather large queries by a large factor. This is something that can be seen from the query plan, so when you're using the same expressions twice, start looking into this and see if it helps both for readability (don't repeat yourself) and performance.

JSON result construction

We need nested JSON output in a few cases. This means (in 9.3; there are some better functions available in 9.4) that a combination of row_to_json(row(json_agg(...))) was necessary to get proper nesting of aggregated sub-objects, as well as to wrap the end result in another object, because the output had to be formatted as a JSON object (with curly brackets) instead of a JSON array (with square brackets).

Technicalities aside, the JSON support is pretty good, and since the initial code was written I've discovered that, because in many cases we don't actually have multiple results (for json_agg), not using that function again significantly improves performance.

That means instead of something like the following:

SELECT row_to_json(row(json_agg(...))) FROM ... JOIN ... GROUP BY id;

, where the input to json_agg is a single result from the JOIN, we can write the following instead:

SELECT row_to_json(row(ARRAY[...])) FROM ... JOIN ...;

, which, if you examine the output of EXPLAIN, means no sorting caused by the GROUP BY clause. The convenience of json_agg here doesn't really justify the significant slowdown caused by the aggregation function.

Note that the array constructed via ARRAY[] is properly converted to JSON, so the end result is again proper JSON.

ELS 2015
posted on 2015-04-22 21:35:23+01:00

Yesterday the 8th European Lisp Symposium finished. In short it was a great experience (I was there for the first time, but hopefully not the last). The variety and quality of talks was great, and a good number of people attended both the actual talks as well as both(!) dinners, so there were lots of opportunities to exchange thoughts and quiz people, including on Lisp. Except for one talk I believe all talks happened, which is also a very good ratio.

For the talks I still have to go through the proceedings a bit for details, but obviously the talk about the Lisp/C++ interoperability with Clasp was (at least for me) long awaited and very well executed. Both the background information on the origins, as well as the technical description on the use of LLVM and the integration of multiple other projects (ECL, SICL, Cleavir) were very interesting and informative.

There were also quite a number of Racket talks, which was surprising to me, but given the source of these projects it makes sense, since the GUI support is pretty good. VIGRA, although it's a bit of an unfortunate name, looks pretty nice. The fact that bindings to a number of languages are available, and in the case of the Lisps make the interaction a lot easier, is good to see, so it might be a good alternative to OpenCV. It's also encouraging that students enjoy this approach and are, as it seems, productive with the library.

P2R, the Processing implementation in Racket, is similarly interesting, as there is a huge community using Processing; making programming of CAD applications easier via a known environment is obviously nice and should give users more opportunities in that area.

If I remember correctly the final Racket talk was about constraining application behaviour, which was, I guess, more of a sketch of how application modularity and user-understandable permissions could be both implemented and enforced. I still wonder about the applicability in e.g. a Lisp or regular *nix OS.

The more deeply technical talks regarding the garbage collector (be it in SBCL or Allegro CL) were both very interesting, in that normally I (and I imagine lots of people) don't get a chance to go down to that level, and therefore learning some details about those things is appreciated.

Same goes for the first talk by Robert Strandh, Processing List Elements in Reverse Order, which was really great to hear, in the sense that I usually appreciate the :from-end parameter of the sequence functions and still hadn't read the details of the interaction between the actual order of iteration and the final result of the function. Then again, the question persists whether any programs actually process really long lists in reverse in production. Somehow the thought that even this case is optimised would make me sleep easier, but then again the tradeoff of maintainable code vs. performance improvements remains (though I don't think that the presented code was very unreadable).

Escaping the Heap was nice too, and it'll be great to see an open-sourced library for shared memory and off-heap data structures, even if just for special cases.

Lots of content, so I doubt I'll get to the lightning talks. It'll be just this for now then. Hopefully I have time/opportunity to go to the next ELS or another Lisp conference; I can only recommend going.

Extensions, extensions, extensions
posted on 2015-04-17 10:15:23

For the longest time I've been using quite a number of Firefox extensions. The known problem with that is a steady slowdown, which is only amplified by my habit of soft bookmarks, i.e. having hundreds of open tabs with their corresponding state (which is the whole reason to do that).

However, seeing that a lot of state is captured in a very inconvenient form (that is, it's hard to modify a long list of tabs), I want to make both this, and incidentally also the sharing of state between sessions and even browsers, much easier.

The idea is to separate part of the browser state into a separate component, namely a database server for cookies (and other local storage), tabs, sessions and bookmarks. This way and by having a coarse control over loading of sessions the process of migrating state between sessions and browsers should be much easier.

Fortunately most of the browser extension APIs seem to be usable enough to make this work for at least Firefox and Chrome, so at the moment I'm prototyping the data exchange. Weirdly, for Chrome you have to jump through some conversion hoops (aka local native extensions, via a local process exchanging data over stdio), so it seems that the Firefox APIs, since they allow socket connections, are a bit friendlier to use. That said, the exchange format for Chrome, Pascal-string-encoded JSON, seems like a good idea with the exception of forcing local endianness, which is completely out of the question for a possibly network-enabled system (which is to say, I'm definitely going to force network byte order instead).
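
As a sketch of that framing (the function and stream handling here are hypothetical, and the big-endian length prefix reflects my preference for network byte order rather than Chrome's native-endian format):

(defun read-framed-message (stream)
  ;; read a four byte, big-endian (network byte order) length prefix,
  ;; then that many octets of JSON payload from a binary stream
  (let ((length 0))
    (dotimes (i 4)
      (setf length (logior (ash length 8) (read-byte stream))))
    (let ((payload (make-array length :element-type '(unsigned-byte 8))))
      (read-sequence payload stream)
      ;; decoding the octets (e.g. as UTF-8) and parsing the JSON is left
      ;; to whichever libraries the final implementation uses
      payload)))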

Reifying Hidden Code
posted on 2015-04-09 13:17:46

The title sounds a bit too majestic to be true. However, the thought just occurred to me that much of what I've been doing over the last months often involved taking "dev ops" code, e.g. configuration code that lives in deployment scripts, and putting it into a reusable format, i.e. an actual (Python) module.

That usually happens because what was once a dependency of an application (or service if you like) now needs to be accessible from (Python) code for testing purposes, or because the setup routine isn't actually just setup any more, but happens more regularly as part of the application cycle.

Of course doing this has some major downsides, as the way scripts are written, using a specific library to access remote hosts, without much error handling, is fundamentally different from how application code works, that is, usually with a more powerful database interface, without any shell scripting underlying the commands (which will instead be replaced by "native" file manipulation commands) and with proper data structures.

That leaves me with some really annoying busy work just to transform code from one way of writing it to another. Maybe the thing to take away here is that configuration code isn't special, and application code will sooner or later become library code as well -- aka build everything as reusable modules. This of course means that using certain configuration (dev ops) frameworks is prohibited, as they work from the outside in, e.g. by providing a wrapper application (for Fabric that would be fab) that invokes "tasks" written in a certain way.

Properly done, that would mean that configuration code would be a wrapper so thin that the actual code could still be reused from application code and different projects later on. The difference between configuration and business logic would then be more of a distinction between where code is used, not how and which framework it was written against.

X11 keybindings for easier terminal clipboard handling
posted on 2015-02-12 15:35:15+01:00

After years of annoyance with the X11 behaviour for clipboard and selection handling with regards to terminal applications, I managed to find a good compromise via some additional shortcuts in my window manager of choice, dwm, and terminal multiplexer, tmux.

Now, pressing C-o p (in tmux) pastes from the X11 primary selection, C-o P pastes from the clipboard, and Linux-z (meaning the key formerly known as the Windows key) exchanges clipboard and primary selection, so no more awkward pasting and selecting with the mouse in order to get the correct string into the correct location.

# to switch clipboard and primary selection
PRIMARY=`xsel -op`; xsel -ob | xsel -ip; echo "$PRIMARY" | xsel -ib

# to paste from primary selection in tmux
bind p run "tmux set-buffer \"$(xsel -op)\"; tmux paste-buffer"

Other useful keybindings in tmux would be copying into the clipboard etc., and there is a useful SO post explaining that.

As before this is public at the customisations branch on Github; I still have to upload the tmux part though.

A mocking library for Common Lisp
posted on 2015-01-06 14:52:47+01:00

After some time spent thinking about and rewriting the library with subtly different approaches, CL-MOCK now looks good to me as a version one.

I've removed all mentions of generic functions for now, first of all because I'm unsure if functionality to dynamically rebind methods is even necessary, and second because doing that is complicated by the details of that protocol, which means that specifying which method to override is a bit hairy and I really want a good syntax before I let that stuff loose. So it'll have to wait until I figure out a good way to do it. Since it should be easy to add to the existing frontend, it will very probably be done with some overloading of existing functions / macros (e.g. with a :method specifier or so).

I'm hoping to test all of this and possibly investigate the generic function issue on some other library. At the moment my single more complex example is a replacement for the DRAKMA HTTP-REQUEST call, which worked surprisingly well and might even make it into a new test suite. The benefit is obviously the improved reliability of not having to have a running network connection for testing libraries against a (HTTP) server.
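
A rough sketch of how that test might look, assuming CL-MOCK's WITH-MOCKS and ANSWER behave as shown here, and with a made-up function under test:

(cl-mock:with-mocks ()
  ;; assumed CL-MOCK interface: pretend every HTTP-REQUEST call returns
  ;; a canned response instead of talking to the network
  (cl-mock:answer drakma:http-request "<html>canned response</html>")
  ;; FETCH-AND-PARSE is a hypothetical function under test
  (fetch-and-parse "http://example.com/"))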



Unless otherwise credited all material Creative Commons License by Olof-Joachim Frahm