Content tagged unix

posted on 2018-08-08 01:36:29

Continuing from an earlier post, what might the semantics be that we'd like to have for a more useful shell integrated with a Lisp environment?

Syntax

Of course we can always keep the regular Common Lisp reader; however it's not best suited for interactive shell use. In fact I'd say interactive use period requires heavy use of editing commands to juggle parens. Which is why some people use one of the simplifications of not requiring the outermost layer of parens on the REPL.

So, one direction would be to have better S-expression manipulation on the REPL, the other to have a syntax that's more incremental than S-expressions.

E.g. imagine the scenario that,

as a terminal user,
I'm navigating directories,
listing files,
then looking for how many files there were,
then grepping for a particular pattern,
then opening one of the matches.

In sh terms that's something the lines of

$ cd foo
$ ls
...
$ ls | wc -l
42
$ ls | grep bar
...
$ vim ...

In reality I'm somewhat sure no one's going as far as doing the last two steps in an iterative fashion, like via ls | grep x | grep y | xargs vim, as most people will have a mouse readily available to select the desired name. There are some terminal widgets which allow the user to select from e.g. one of the input lines in a command line dialog, but again it's not a widespread pattern and still requires manual intervention in such a case.

Also note that instead of reusing the computation the whole expression keeps being reused, which also makes this somewhat inefficient. The notion of the "current" thing being worked on ("this", "self") also isn't expressed directly here.

In the new shell I'd like to see part of explored. We already have the three special variables *, ** and *** (and / etc.!) in Common Lisp - R, IPython and other environments usually generalise this to a numbered history, which arguably we might want to add here as well - so it stands to reason that these special variables make sense for a shell as well.

$ cd foo
$ ls
...
$ * | wc
42
$ grep ** bar
...
$ vim *

(Disregarding the need for a different globbing character ...)

There's also the very special exit status special variable in shells that we need to replicate. This will likely be similar to Perl special variables that keep track of one particular thing instead of reusing the * triad of variables for this too.

Pipelines

The expression compiler should be able to convert from a serial to a concurrent form as necessary, that is, by converting to the required form at each pipeline step.

ls | wc -l | vim

Here, ls will be the built-in LIST-DIRECTORY, which is capable of outputting e.g. a stream of entries. wc -l might be compiled to simply LENGTH, which, since it operates on fixed sequences, will force ls to be fully evaluated (disregarding a potential optimisation of just counting the entries instead of producing all filenames). The pipe to vim will then force the wc output to text format, since vim is an unknown native binary.

These would be the default conversions. We might force a particular interpretation by adding explicit (type) conversions, or adding annotations to certain forms to explain that they accept a certain input/output format; for external binaries the annotations necessarily would be more extensive.

For actual pipelines it's extremely important that each step progresses as new data becomes available. Including proper backpressure this will ensure good performance and keep the buffer bloat to a minimum. Again this might be a tunable behaviour too.

I/O

The shell should be convenient. As such any output and error output should be captured as a standard behaviour, dropping data if the buffers would get too big. This will then allow us to refer to previous output without having to reevaluate any expression / without having to run a program again (which in both cases might be expensive).

Each datatype and object should be able to be rendered and parsed in more than a single format too. E.g. a directory listing itself might be an object, but will typically be rendered to text/JSON/XML, or to a graphical representation even.

Limits of Unix shell

posted on 2015-08-20 14:54:10

Let's go back to the GNU Coreutils list of tools. ls for example. Usually the user will have set some alias to ls instead of the plain invocation, either to enable highlighting (--color), sorting (--sort), or to add more information than just the filenames (e.g. --format). There is even integration with Emacs (--dired).

The question then is: How much of the functionality of ls is actually devoted to secondary formatting instead of listing files? And shouldn't this functionality be moved into separate tools? Since output is intended for multiple kinds of recipients, additional data creeps in and complicate tools a lot.

Alternatively, we could imagine using ls only to get unformatted and unsorted output. Which would then be passed through to a sort command and a fmt command of sorts. Of course this all takes some more time, re-parsing of output etc., so it's understandable in the interest of performance not to do this in the traditional Unix shell.

However, let's assume a more sophisticated shell. Assuming ls is limited to listing files, then the user will alias ls to a pipeline instead, namely something akin to ls | sort | fmt. Then again, formatting is part of the user interface, not the functionality, so it should rather be part of the internal shell formatting, possibly exposed as separate filters as well.

The result of ls is a (possibly nested) directory listing. Regardless of post-processing, this "object" should still be available for further investigation. Which means that while sorting may be applied destructively, formatting may not, unless specifically requested, in which case the result would be a kind of "formatted" object (text, GUI widget) instead.

In other terms, the user should be able to refer to the last results immediately, instead of rerunning the whole pipeline. E.g. coming from Common Lisp, variables like * to *** will store the last three results for interactive use. In the shell then, ls would set * to the generated directory listing; since the listing is also most likely printed to the screen, the full listing will also be stored (in that object) to be used again if e.g. * is requested again. Rerunning the command, on the other hand, will possibly generate a different directory listing as files may have been changed, so there is an immediate difference between the two forms.

Examples

The pipeline ls | wc -l is (at least for me) often used to get the number of files in the (current) directory. Unfortunately there is no direct way to get this number directly except to enumerate the entries in a directory (under Linux that is).

Lisp layered on Unix

posted on 2014-11-27 21:21:10

This blog post on LispCast is a pretty good start to get thinking about the intricacies of the interaction between Lisp (Machine) ideas and the current Unix environment. Of course that includes plan9 on the one side and Emacs on the other.

Lisp shell

There is scsh, but it's not really what I'm looking for. Using emacs as login shell (with the eshell package) comes closest to it regarding both with existing commands and integration of Lisp-based ones. However, while pipes work as expected with eshell, data is still passed around as (formatted) text. There doesn't seem to be an easy way to pass around in-memory objects, at least while staying in Emacs itself. That would of course mean to reimplement some (larger?) parts of that system.

This all ties in to the idea that unstructured text isn't the best idea to represent data between processes. Even though Unix pipes are extremely useful, the ecosystem of shell and C conventions means that the obvious way isn't completely correct, meaning that there are edge cases to consider. The best is something as innocent as ls | wc -l, which will break, depending on the shell settings, with some (unlikely) characters in filenames, i.e. newlines.

Common formats

One of the problems is obviously that in order to pass around structured data, i.e. objects, all participants have to understand their format. Passing references won't work without OS support though.

Instead of having unstructured streams, use streams of (data) objects. The distinction here is Plain Old Objects (PODs) instead of objects with an associated behaviour.

Let's take a look at standard Unix command line tools (I'm using GNU Coreutils here) in order to reproduce the behaviour and/or intent behind them:

Output of entire files

The first command here is cat. Although GNU cat includes additional transformations, this command concatenates files. Similar to the description, we can image a CAT to perform a similar operation on streams of objects.

It doesn't make much sense to concatenate a HTML document and an MP3 file (hence you won't do it in most cases anyway). However, since files are unstructured, cat can work on them.

Registering functionality

Although you can call commands individually on files, some of them form an ad-hoc service interface already: The C compiler, along with the toolchain forms one such interface, where you're required to use the same interface if you want to seamlessly replace one part of the toolchain.

Same goes for the Coreutils: As long as you honour the interface, programs can be replaced with different implementations.

Interactive commands

Emacs has a special form interactive to indicate whether a command can be directly called via the command prompt. There is also special handling there to accomodate the use of interactive arguments. This is something that can be generalised to the OS. An example of where this already happens is the .mailcap file and the .desktop files from desktop suites.

Threading and job control

Unfortunately getting proper job control to work will be a bit of a problem in any Lisp implementation, since the best way to implement the concurrent jobs is using threads, which are not particularly suited for handling the multitude of signals and other problems associated with them. Even without job control pipelines implemented in Lisp require shared in-memory communication channels, so something like (object-based) streams, mailboxes, or queues are necessary to move IO between different threads.

This blog covers work, unix, tachikoma, scala, sbt, redis, postgresql, no-ads, lisp, kotlin, jvm, java, hardware, go, git, emacs, docker

View content from 2014-08, 2014-11, 2014-12, 2015-01, 2015-02, 2015-04, 2015-06, 2015-08, 2015-11, 2016-08, 2016-09, 2016-10, 2016-11, 2017-06, 2017-07, 2017-12, 2018-04, 2018-07, 2018-08, 2018-12, 2020-04, 2021-03