Content tagged unix
Let's go back to the GNU Coreutils list of tools.
ls for example. Usually
the user will have set some alias to
ls instead of the plain invocation,
either to enable highlighting (
--color), sorting (
--sort), or to add more
information than just the filenames (e.g.
--format). There is even
integration with Emacs.
The question then is: How much of the functionality of
ls is actually
devoted to secondary formatting instead of listing files? And shouldn't this
functionality be moved into separate tools? Since output is intended for
multiple kinds of recipients, additional data creeps in and complicates the tools.
Alternatively, we could imagine using
ls only to get unformatted and unsorted
output, which would then be passed through a
sort command and a
formatting command of sorts. Of course this all takes some more time, re-parsing of
output etc., so it's understandable in the interest of performance not to do
this in the traditional Unix shell.
However, let's assume a more sophisticated shell. Assuming
ls is limited to
listing files, then the user will alias
ls to a pipeline instead, namely
something akin to
ls | sort | fmt. Then again, formatting is part of the
user interface, not the functionality, so it should rather be part of the
internal shell formatting, possibly exposed as separate filters as well.
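To make the split between listing, sorting and formatting a bit more concrete, here is a minimal sketch in Python. The names Entry, list_entries, sort_entries and format_entries are made up for this illustration and don't correspond to any existing tool; the listing stage only produces plain records, and formatting is the last step that turns them into text.

```python
import os
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Entry:
    """Plain record for one directory entry (hypothetical shape)."""
    name: str
    size: int
    is_dir: bool


def list_entries(path: str) -> Iterator[Entry]:
    """Only list: no sorting, no formatting."""
    with os.scandir(path) as it:
        for e in it:
            yield Entry(e.name, e.stat(follow_symlinks=False).st_size, e.is_dir())


def sort_entries(entries: Iterable[Entry], key: str = "name") -> List[Entry]:
    """Separate sorting stage; works on any attribute of Entry."""
    return sorted(entries, key=lambda e: getattr(e, key))


def format_entries(entries: Iterable[Entry]) -> str:
    """Presentation only; the Entry objects themselves stay untouched."""
    return "\n".join(
        f"{e.size:>8}  {e.name}{'/' if e.is_dir else ''}" for e in entries
    )
```

Something like print(format_entries(sort_entries(list_entries("."), key="size"))) would then play the role of the aliased pipeline, while the sorted list is still around as data.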
The result of
ls is a (possibly nested) directory listing. Regardless of
post-processing, this "object" should still be available for further
investigation. Which means that while sorting may be applied destructively,
formatting may not, unless specifically requested, in which case the result
would be a kind of "formatted" object (text, GUI widget) instead.
In other terms, the user should be able to refer to the last results
immediately, instead of rerunning the whole pipeline. E.g. coming from Common
Lisp, variables like
*, ** and *** store the last three results for
interactive use. In the shell then,
ls would set
* to the generated
directory listing; since the listing is also most likely printed to the screen,
the full listing will also be stored (in that object) to be used again if e.g.
* is requested again. Rerunning the command, on the other hand, will
possibly generate a different directory listing as files may have been changed,
so there is an immediate difference between the two forms.
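A rough sketch of how a shell could keep the last results around, again in Python. ResultHistory and its methods are hypothetical names for this example; it only mimics the idea behind the Common Lisp variables and is not how any existing shell works.

```python
from collections import deque
from typing import Any, Callable


class ResultHistory:
    """Keeps the last three results, newest first (hypothetical helper)."""

    def __init__(self) -> None:
        # With maxlen=3, pushing a fourth result drops the oldest one.
        self._results = deque(maxlen=3)

    def run(self, command: Callable[[], Any]) -> Any:
        """Run a command and remember the object it returned."""
        result = command()
        self._results.appendleft(result)
        return result

    def star(self, n: int = 1) -> Any:
        """star(1) ~ *, star(2) ~ **, star(3) ~ ***; None if not set yet."""
        return self._results[n - 1] if 0 < n <= len(self._results) else None
```

Asking for star(1) returns the stored object as it was when the command ran, whereas calling run again with the same command may produce a different listing.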
ls | wc -l is (at least for me) often used to get the number of
files in the (current) directory. Unfortunately there is no way to get
this number directly except by enumerating the entries in a directory (under
Linux that is).
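For comparison, enumerating the entries without going through text could look like the following Python sketch. count_entries is a made-up helper; note that os.scandir leaves out "." and ".." but does include hidden files, so the number can differ from what a plain ls shows.

```python
import os


def count_entries(path: str = ".") -> int:
    """Count directory entries by enumerating them; no text parsing involved."""
    with os.scandir(path) as entries:
        return sum(1 for _ in entries)
```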
This blog post on LispCast is a pretty good start to get thinking about the intricacies of the interaction between Lisp (Machine) ideas and the current Unix environment. Of course that includes plan9 on the one side and Emacs on the other.
There is scsh, but it's not really what I'm looking for. Using Emacs as
login shell (with the
eshell package) comes closest to it regarding both
use of existing commands and integration of Lisp-based ones. However, while
pipes work as expected with
eshell, data is still passed around as
(formatted) text. There doesn't seem to be an easy way to pass around
in-memory objects, at least while staying in Emacs itself. That would of
course mean reimplementing some (larger?) parts of that system.
This all ties in to the idea that unstructured text isn't the best way to
represent data between processes. Even though Unix pipes are extremely useful,
the ecosystem of shell and C conventions means that the obvious way isn't
completely correct, meaning that there are edge cases to consider. The best example is
something as innocent as
ls | wc -l, which will break, depending on the shell
settings, with some (unlikely) characters in filenames, i.e. newlines.
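The miscount is easy to reproduce without touching the file system. The following Python sketch only models the situation: as_text is an assumption about what ends up in the pipe (one raw name per line, no quoting), and wc_l counts newline characters like wc -l does.

```python
from typing import Iterable


def as_text(names: Iterable[str]) -> str:
    """One raw name per line, roughly what ends up in the pipe (assumed)."""
    return "".join(name + "\n" for name in names)


def wc_l(text: str) -> int:
    """Count newline characters, which is what wc -l reports."""
    return text.count("\n")


names = ["a.txt", "odd\nname.txt"]  # the second name contains a newline

print(wc_l(as_text(names)))  # 3, one too many
print(len(names))            # 2, the actual number of entries
```

As long as the names are kept as a list of objects, the count is simply the length of the list and the odd filename does no harm.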
One of the problems is obviously that in order to pass around structured data, i.e. objects, all participants have to understand their format. Passing references won't work without OS support though.
Instead of having unstructured streams, use streams of (data) objects. The distinction here is plain old data (POD) instead of objects with an associated behaviour.
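One way to picture such a stream between separate processes is to agree on a simple serialisation for plain records. The sketch below uses one JSON document per line purely as an example format; FileRecord, encode and decode are hypothetical names, and nothing here is meant as the one right encoding.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FileRecord:
    """Plain data only: fields, no behaviour (hypothetical shape)."""
    name: str
    size: int


def encode(records: Iterable[FileRecord]) -> Iterator[str]:
    """One JSON document per line; json.dumps escapes embedded newlines."""
    for record in records:
        yield json.dumps(asdict(record))


def decode(lines: Iterable[str]) -> Iterator[FileRecord]:
    """Rebuild the records on the receiving side of the pipe."""
    for line in lines:
        yield FileRecord(**json.loads(line))
```

Both sides only need to know the record layout, and a filename containing a newline survives the round trip because it never appears as a raw line break in the stream.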
Let's take a look at standard Unix command line tools (I'm using GNU Coreutils here) in order to reproduce the behaviour and/or intent behind them:
Output of entire files
The first command here is
cat. Although GNU
cat includes additional
transformations, this command concatenates files. Similar to the description,
we can imagine a
CAT to perform a similar operation on streams of objects.
It doesn't make much sense to concatenate an HTML document and an MP3 file
(hence you won't do it in most cases anyway). However, since files are just byte streams,
cat can work on them.
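On streams of objects, a CAT could be as small as the following Python sketch. The function name cat and the use of iterables as "streams" are assumptions for this example.

```python
from itertools import chain
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def cat(*streams: Iterable[T]) -> Iterator[T]:
    """Yield every object of each stream in order, like cat does for bytes."""
    return chain.from_iterable(streams)
```

Just like with files, nothing stops you from concatenating streams of unrelated objects; whether the result makes sense is up to the consumer.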
Although you can call commands individually on files, some of them form an ad-hoc service interface already: the C compiler, along with the toolchain, forms one such interface, where you're required to use the same interface if you want to seamlessly replace one part of the toolchain.
Same goes for the Coreutils: As long as you honour the interface, programs can be replaced with different implementations.
Emacs has a special form
interactive to indicate whether a command can be
directly called via the command prompt. There is also special handling there
to accommodate the use of interactive arguments. This is something that can be
generalised to the OS. An example of where this already happens is the
.mailcap file and the
.desktop files from desktop suites.
Threading and job control
Unfortunately, getting proper job control to work will be a bit of a problem in any Lisp implementation, since the best way to implement concurrent jobs is using threads, which are not particularly suited for handling the multitude of signals and other problems associated with them. Even without job control, pipelines implemented in Lisp require shared in-memory communication channels, so something like (object-based) streams, mailboxes, or queues is necessary to move IO between different threads.
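The queue-based variant can be sketched in a few lines. This is Python rather than Lisp, and it ignores signals and job control entirely; producer, stage, run_pipeline and the _DONE sentinel are names invented for this example. Each stage runs in its own thread and objects move between stages through queues.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel marking the end of a stream


def producer(items: Iterable[Any], out: queue.Queue) -> None:
    """Feed the input objects into the first queue."""
    for item in items:
        out.put(item)
    out.put(_DONE)


def stage(fn: Callable[[Any], Any], inp: queue.Queue, out: queue.Queue) -> None:
    """Apply fn to every object from inp and pass the result on to out."""
    while True:
        item = inp.get()
        if item is _DONE:
            out.put(_DONE)
            return
        out.put(fn(item))


def run_pipeline(items: Iterable[Any], *fns: Callable[[Any], Any]) -> List[Any]:
    """Run each stage in its own thread and collect the final output."""
    q: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=producer, args=(items, q))]
    for fn in fns:
        nxt: queue.Queue = queue.Queue()
        threads.append(threading.Thread(target=stage, args=(fn, q, nxt)))
        q = nxt
    for t in threads:
        t.start()
    results = []
    while (item := q.get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

For instance, run_pipeline(["a", "bb"], str.upper, len) passes strings through the first stage and integers out of the second, without any of it being turned into text in between.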