Content from 2018-07
Now in fact there are still reasons we need key codes that are distinct from the eventual text representation, e.g. for cursor movement and other special keys such as the function keys.
Looking at the Xorg source code now, there's a relatively fixed notion of what a keyboard can actually do. I suspect that, conceptually, a somewhat backwards-compatible extension would be a new dedicated kind of device that is exposed separately from the rest of the device's functionality (similar to how a keyboard with an integrated touchpad shows up as two devices).
In particular, I'd like to keep the keyboard in "regular" mode as long as the host hasn't signaled that it wants to use the extended functionality, presumably via some part of the USB negotiation. Only at that point would the extension be activated and keyboard output be sent through it; the regular keyboard device would then be virtually unplugged.
I suspect that this is better than having two devices, one for key codes and one for text input, especially because we couldn't guarantee in which order the two devices would be read. This is less of a problem (or no problem at all) between a keyboard and a pointer device, of course.
Now given that X11 isn't the interface most applications are written against, how would the text actually arrive at an application? I'd imagine extending the whole event handling by one more event, the text event, which wouldn't correspond to any key (and thus can't be in a pressed or unpressed state). In terms of GTK+ and Qt this might be even easier than for a lower level application, since many applications only want to deal with text input and pre-defined keyboard shortcuts anyway.
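To make that a bit more concrete, here's a minimal sketch in C of what such an event stream could look like; every name here is made up for illustration and doesn't correspond to any actual X11 or toolkit API:

```c
#include <stdint.h>

/* Hypothetical event stream that carries both classic key events and
 * text events.  A text event has no key code and no press state. */
enum input_event_type {
    EV_KEY_PRESS,
    EV_KEY_RELEASE,
    EV_TEXT,
};

struct input_event {
    enum input_event_type type;
    union {
        uint32_t key_code;        /* for EV_KEY_PRESS / EV_KEY_RELEASE    */
        uint32_t codepoints[8];   /* UTF-32, zero-terminated, for EV_TEXT */
    } data;
};
```

A toolkit would route EV_TEXT straight into whatever widget has focus, while the key events would only be consulted for shortcuts and other non-text functionality.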
Speaking of which, what does "Ctrl-C" actually mean? Of course the mnemonic for "copy" is in there, but also the "control" modifier. How well does this play with text input? Not at all, and I believe modifiers logically work better on the key code level; for text input I imagine other modifiers like "emphasis", or, more specifically, "bold", would make more sense, possibly also "URL" or "footnote".
Overall there can of course be modifiers active while text input occurs; it's more a question of whether they can be assigned any meaning without falling back to the flawed "character equals key press" equivalence.
What does this gain us? Ideally every application (or more accurately: each toolkit) could now drop logic specifically to translate key codes to text, since all of it would already be handled by the keyboard itself. Keypresses would come in via the same interface and be used for specific, non-text functionality.
The keyboard protocol still uses the same approach it has since roughly the start of computing: the keyboard is a dumb device, reporting each mechanical button that's pressed, and the computer is the intelligent device, translating those presses into characters.
There have been some attempts that made it into various places, e.g. there's a field in the USB HID protocol to indicate the physical layout, be it US or some other one. Except no manufacturer sets it, so no driver uses it.
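For reference, that's the bCountryCode byte in the HID descriptor. A sketch of the descriptor layout as I read the HID 1.11 spec (the second descriptor-type field is renamed here so the struct compiles; in the spec both are called bDescriptorType):

```c
#include <stdint.h>

/* USB HID class descriptor; bCountryCode is the layout field in question.
 * A value of 0 means "not supported", i.e. not localized, which is what
 * practically every keyboard reports. */
struct usb_hid_descriptor {
    uint8_t  bLength;
    uint8_t  bDescriptorType;       /* 0x21 = HID                    */
    uint16_t bcdHID;                /* HID spec release, e.g. 0x0111 */
    uint8_t  bCountryCode;          /* physical layout, 0 = unset    */
    uint8_t  bNumDescriptors;
    uint8_t  bReportDescriptorType; /* 0x22 = Report                 */
    uint16_t wDescriptorLength;
} __attribute__((packed));
```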
But what if we had a keyboard, courtesy of QMK and similar firmwares, that is substantially more intelligent? If the protocol allowed for it, we'd be able to have such nice things as sending an "a", an "あ", or a "●" without any remapping on the host! In fact, if the keyboard could send Unicode sequences, we could do things that aren't possible with remapping at all, like sending characters from various scripts through a macro key without impacting any key press, since we'd have an immensely increased value space to work with.
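A hedged sketch of what such a report could look like; this isn't part of any existing HID usage table, it's just an illustration of a hypothetical report that carries code points instead of key codes:

```c
#include <stdint.h>

/* Hypothetical "text" report: instead of the classic boot-protocol
 * report (modifiers plus up to six key codes), the keyboard sends up
 * to four Unicode code points for the host to insert verbatim. */
struct hypothetical_text_report {
    uint8_t  report_id;      /* distinguishes this from ordinary key reports */
    uint8_t  count;          /* number of valid code points below            */
    uint32_t codepoints[4];  /* UTF-32; 0x3042 = あ, 0x25CF = ●              */
} __attribute__((packed));
```

A macro key could then emit arbitrary strings without occupying any of the single-byte key code space that the existing key reports are limited to.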
One part of the lack of cohesion on *nix systems is how command line completion is hacked on via e.g. Bash or ZSH modules. I can think of roughly three ways within the *nix paradigm to get a more cohesive pattern here:
1. Give full control to the command line program: invoke it with a special handler that parses the current command line and generates the appropriate completion output (see the sketch after this list).
2. Give partial control: have the program generate a script-like description of its completion, e.g. as a Bash- or ZSH-specific script.
3. Pretty similar to that, have the program generate a static description of its completions on request.
All of these use the program, the thing that's being invoked, as the central mechanism for doing completion. No extra files need to be put anywhere, and the description can always be in sync with what the program actually does later on. The downside might very well be that the program gets invoked multiple times.
Therefore perhaps also:

4. Invoke the program in a special mode and keep it alive while it handles command line completion. Once completion is finished and the user actually wants to invoke the program, it would simply use the already existing in-memory structures and continue.
Of course I can see the downsides here too, though in the interest of having a more interwoven system I'd argue that the benefit might be worth it overall, in particular for programs whose arguments are a little more like a DSL than just simple file arguments. Notably though, any shell features would not be available in variant 4; instead the program would have to call back or replicate the required functionality, making it somewhat less appealing?
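To make variants 1 and 3 a little more tangible, here's a minimal sketch in C, assuming a made-up convention where the shell re-invokes the program with a --complete flag plus the word being completed and reads candidates line by line from stdout (no shell actually speaks this protocol; the flag and output format are invented):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical completion entry point: the shell runs
 *   prog --complete <partial-word>
 * and offers every line we print as a completion candidate. */
static int handle_completion(const char *prefix)
{
    static const char *subcommands[] = { "build", "clean", "install", "run" };

    for (size_t i = 0; i < sizeof subcommands / sizeof *subcommands; i++) {
        if (strncmp(subcommands[i], prefix, strlen(prefix)) == 0)
            puts(subcommands[i]);
    }
    return 0;
}

int main(int argc, char **argv)
{
    if (argc >= 2 && strcmp(argv[1], "--complete") == 0)
        return handle_completion(argc >= 3 ? argv[2] : "");

    /* ... the program's normal behaviour ... */
    return 0;
}
```

The same binary is the single source of truth for its completions, which is exactly the cohesion argument above; the cost is the extra process invocation per completion request.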
Let me tell you about one particularly annoying issue I came across when using a GC-ed environment. No, not Python, though I'm fairly sure it would have the same issue. No, it's of course Common Lisp, in this case CCL and SBCL in particular, both of which have a stop-the-world GC (as far as I know there are no knobs to change this behaviour outside of what I'll describe below).
Now, do you also remember these things called HDDs? Turns out that one of my external hard drives likes to shut itself down after a while. You can hear that very well. However, since the drive is still mounted, accessing (uncached) parts of the filesystem will trigger a wake event. But getting the platters up to speed again takes a lot of time, so what happens in the meantime?
Exactly: uninterruptible sleep for the process in question. It's one of the few situations where that process state can occur, and without the specifics of the GC involved it would just block the one thread doing the I/O, while the rest, in particular the GUI thread, would keep moving.
In this particular instance though, the GUI would work for a moment and then freeze until the drive finally responded with the requested data.
Now why is that?
Turns out the GC asks each (runtime) thread to block for the GC run. This is done via signals. Except one thread won't respond to signals since it's ... sleeping. Uninterruptibly.
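For illustration, a rough sketch of how this kind of signal-based stop-the-world suspension typically works; this is a generic simplification, not the actual CCL or SBCL code, and SIG_SUSPEND, the thread list and the semaphore are all stand-ins:

```c
#include <pthread.h>
#include <signal.h>
#include <semaphore.h>

#define SIG_SUSPEND SIGUSR1            /* made-up choice of suspension signal */

struct gc_thread {
    pthread_t         tid;
    struct gc_thread *next;
};

static struct gc_thread *all_threads;  /* stand-in for the runtime's registry */
static sem_t             suspend_acks; /* posted by each thread once parked   */
static int               live_threads;

/* Ask every thread to park itself, then wait until all have acknowledged.
 * A thread stuck in uninterruptible sleep (state D) only acts on the
 * signal once its blocking syscall returns -- so this loop, and with it
 * the whole world, stalls until the disk has spun up again. */
static void stop_the_world(void)
{
    for (struct gc_thread *t = all_threads; t; t = t->next)
        pthread_kill(t->tid, SIG_SUSPEND);

    for (int i = 0; i < live_threads; i++)
        sem_wait(&suspend_acks);
}
```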
Great.
Suggested options include spawning a separate process and waiting for the I/O there, which I'd rather not do, since it'd mean doing that for every single library call that might touch e.g. files, and that's basically a lot of them. It just seems there's no good way to deal with this except by changing the runtime and dealing with the fact that the GC might not be able to reach all threads.
I looked a bit at the CCL runtime and it seems that, if we promise not to do anything bad with regard to the GC, we might be able to work around it by setting a new flag on a per-thread level that excludes that thread from the GC waiting loop. We'd only do this around potentially problematic FFI calls, but also around known internal system calls. When returning from the foreign land we'd also need to "catch up" with the GC, potentially blocking until the current run is done (if there is one). But that's solvable, with a little bit more overhead for FFI calls; I believe that's worth it.
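A sketch of that idea, again with invented names rather than actual CCL internals: the thread flips a per-thread flag before a potentially blocking foreign call, the stop-the-world loop skips threads with the flag set, and on return the thread waits for any in-progress collection before touching Lisp objects again:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for per-thread runtime state; every name here is made up. */
struct gc_thread_state {
    atomic_bool in_foreign_call;   /* the suspension loop skips us while set */
};

static _Thread_local struct gc_thread_state self;

/* Re-registers the thread with the GC, blocking until a collection that
 * started while we were away has finished.  Deliberately left abstract:
 * this is the part that needs careful synchronization in the runtime. */
static void catch_up_with_gc(void) { /* ... */ }

/* Wrap a potentially blocking FFI or system call.  While the flag is set
 * the thread promises not to allocate or touch GC-managed memory, so the
 * collector doesn't have to wait for us. */
#define WITH_FOREIGN_CALL(call)                       \
    do {                                              \
        atomic_store(&self.in_foreign_call, true);    \
        (call);                                       \
        catch_up_with_gc();                           \
        atomic_store(&self.in_foreign_call, false);   \
    } while (0)

/* Usage, e.g. around a read() that might hit the sleeping disk:
 *   WITH_FOREIGN_CALL(n = read(fd, buf, sizeof buf));                  */
```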
Since I'm not all that familiar with either the CCL or the SBCL code base, I can only suspect that even the generalisation of doing this for all FFI calls would be tenable, in which case the pesky annotations would cease to be necessary and the issue would be fixed without the developer having to adjust anything manually.
Lastly, I'm sure there are solutions for this in GHC or Erlang, except that I wouldn't really know the right terms to search for.
Btw. this is all super iffy to debug too, since e.g. gdb will just hang in case the target process is in this specific task state, being based on ptrace and seemingly inheriting the behaviour through there.