I wrote down a little scenario for myself for the next iteration of act:
- mark "http://www.google.de" with mouse (selection -> goes to the X11 cut buffer)
- listen on those changes and show current context and choices
- press+release "act primary" key -> runs primary action immediately
- (or) press "act" key once -> opens buffer for single character input to select choice
- buffer captures the keyboard; after a single character the action is taken, Esc aborts, and pressing "act" again should expand the buffer into a full REPL
- mouse -> selection of source? history?
- context -> focused window? -> lookup for program gives cwd, etc.
- since that's not foolproof an obvious addition to the protocol would be to store a reference to the source of the clipboard (or the program that will be queried for it)
- you should always be able to interrogate windows about their origin and capabilities, including their scripting interface
- pattern matching url -> rule based decisions
- primary / secondary actions -> rule of three basically, except we err on the side of caution and just bind two actions to two keys
- special handling for clipboard types that aren't text? allow for example to override a rule based on whether a picture would be available
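To make the "pattern matching -> rule based decisions" idea a bit more concrete, here is a minimal sketch of what such a rule table could look like. Everything in it is an assumption for illustration: the `Rule` type, the `choices_for` helper, the two patterns and the action names are made up here and are not part of act.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    pattern: str    # regular expression the whole selection has to match
    primary: str    # action bound to the "act primary" key (hypothetical name)
    secondary: str  # action bound to the second key (hypothetical name)


# Hypothetical rule table; the first matching rule wins.
RULES = [
    Rule("url", r"https?://\S+", "open in browser", "download"),
    Rule("path", r"(/|~/)\S*", "open in editor", "open containing directory"),
]


def choices_for(selection: str) -> Optional[Rule]:
    """Return the first rule matching the selected text, or None."""
    text = selection.strip()
    for rule in RULES:
        if re.fullmatch(rule.pattern, text):
            return rule
    return None


rule = choices_for("http://www.google.de")
if rule is not None:
    print(rule.name, "->", rule.primary, "/", rule.secondary)
```

Keeping exactly two actions per rule mirrors the two-key binding from the list; a rule that also looks at non-text clipboard types would need more than a regular expression as its condition.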
Now, there are several problems with this on a regular X11-based desktop already:
- How do you identify the program / PID belonging to a particular window?
- How do you get information about that program, e.g. its current working directory?
In addition, since I'm using `tmux` inside a terminal emulator there are some more problems:
- How do you know which `tmux` session is currently active?
- How do you know which program is currently "active" in a `tmux` session?
Basically it's a recursive lookup for the "current state" of what's being displayed to the user. Not only that, but for things like browsers, image editors, video editors, anything document based it's still the same problem at another level, namely, what the current "context" is, like the currently open website, picture, video, scene, what have you.
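The recursive nature of that lookup can be sketched roughly like this. It is only an illustration under stated assumptions: the `resolve` function, the layer dictionaries and the two stub resolvers are hypothetical, and the stubs return hard-coded values where a real implementation would have to query the terminal or `tmux`.

```python
from typing import Callable, Dict, List, Optional

Layer = Dict[str, str]
Resolver = Callable[[Layer], Optional[Layer]]


def resolve(layer: Layer, resolvers: Dict[str, Resolver], max_depth: int = 8) -> List[Layer]:
    """Follow nested layers until no resolver knows how to look further inside."""
    chain = [layer]
    while len(chain) < max_depth:  # guard against resolvers that loop
        resolver = resolvers.get(chain[-1]["program"])
        inner = resolver(chain[-1]) if resolver else None
        if inner is None:
            break
        chain.append(inner)
    return chain


# Stub resolvers with made-up data, standing in for real lookups.
def inside_terminal(layer: Layer) -> Optional[Layer]:
    return {"program": "tmux", "session": "main"}


def inside_tmux(layer: Layer) -> Optional[Layer]:
    return {"program": "vim", "cwd": "/home/me/project"}


RESOLVERS = {"xterm": inside_terminal, "tmux": inside_tmux}

chain = resolve({"program": "xterm", "pid": "100"}, RESOLVERS)
print(" -> ".join(step["program"] for step in chain))
```

The last element of the chain is the innermost context that could be determined; a browser or image editor would simply be another resolver that reports the open document instead of a nested program.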
Coming back to earlier thoughts about automation, there's no way for most of
these to be accurately determined at this time. D-Bus scripting, for example,
isn't common enough to "just use it" for scripting. There are also several
links missing in the scenario described above, and some can likely never be
fixed to a sufficient degree to not rely on heuristics.
Nevertheless, with said heuristics it's still possible to get to a state where a productivity gain can be achieved with only a moderate amount of additional logic to step through all the indirections between processes and presentation layers.
Now let me list a few answers to the questions raised above:
- The PID for a particular window can be derived from an X11 property, together with `xdotool` this gives us an easy pipeline to get this value: ``.
- Information about the running process can then be retrieved via the `proc` filesystem, e.g. `readlink /proc/$PID/cwd` for the current working directory. Of course this has limited value in a multi-threaded program or any environment that doesn't rely on the standard filesystem interface (but uses its own defaults).
- I do not have an answer for the currently active `tmux` session yet, presumably you should somehow be able to get from a PID to a socket and thus to the session?
- For `tmux`, the currently active program in a session is a bit more complex, ``, which we'll also have to filter for the active values.
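A rough sketch of how these lookups could be wired together in Python. The specific commands are assumptions and not necessarily the pipelines meant above: it uses `xdotool getactivewindow getwindowpid` (which reads the `_NET_WM_PID` property, so it only works for windows that set it) and `tmux list-panes -a -F` with a handful of format variables. The helper names and the `|` separator are made up for this example.

```python
import os
import subprocess

# Assumed format: one line per pane, path last because it may contain anything.
TMUX_FORMAT = (
    "#{session_attached}|#{window_active}|#{pane_active}|"
    "#{pane_pid}|#{pane_current_command}|#{pane_current_path}"
)


def active_window_pid() -> int:
    """PID of the focused X11 window, as reported by xdotool."""
    result = subprocess.run(
        ["xdotool", "getactivewindow", "getwindowpid"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def process_cwd(pid: int) -> str:
    """Current working directory of a process, same as readlink /proc/$PID/cwd."""
    return os.readlink(f"/proc/{pid}/cwd")


def parse_active_panes(listing: str) -> list:
    """Keep only panes that are active in the active window of an attached session."""
    panes = []
    for line in listing.splitlines():
        fields = line.split("|", 5)
        if len(fields) != 6:
            continue  # skip lines that do not match the assumed format
        attached, window_active, pane_active, pid, command, path = fields
        if attached != "0" and window_active == "1" and pane_active == "1":
            panes.append({"pid": int(pid), "command": command, "cwd": path})
    return panes


def active_tmux_panes() -> list:
    """Ask the default tmux server for all panes and filter for the active ones."""
    result = subprocess.run(
        ["tmux", "list-panes", "-a", "-F", TMUX_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_active_panes(result.stdout)
```

With more than one attached session this still returns several candidates, so it does not answer the question of which session belongs to the focused terminal window; that link remains a heuristic.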
For scripting interfaces, many programs have their own little implementation of this, but the most problematic part here is that you want to go from an X11 window / PID to the scripting interface, not through some workaround by querying for interfaces!
For programs like Emacs and any programmable environment we can likely script something together, but again it's not a very coherent whole by any means.