(proclaim λ) -

I wrote down a little scenario for myself for the next iteration of act:

mark "http://www.google.de" with mouse (selection -> goes to x11 buffer-cut)
listen on those changes and show current context and choices
press+release "act primary" key -> runs primary action immediately
(or) press "act" key once -> opens buffer for single character input to select choice
buffer captures keyboard, after single character action is taken, esc aborts, "act" again should expand the buffer to full repl
mouse -> selection of source? history?
context -> focused window? -> lookup for program gives cwd, etc.
since that's not foolproof an obvious addition to the protocol would be to store a reference to the source of the clipboard (or the program that will be queried for it)
you should always be able to interrogate windows about their origin and capabilities, including their scripting interface
pattern matching url -> rule based decisions
primary / secondary actions -> rule of three basically, except we err on the side of caution and just bind two actions two keys
special handling for clipboard types that aren't text? allow for example to override a rule based on whether a picture would be available

Now, there are several problems with this on a regular X11-based desktop already:

How do you identify the program / PID belonging to a particular window?
How do you get information about that program, e.g. its current working directory?

In addition, since I'm using tmux inside a terminal emulator there are some more problems:

How do you know which tmux session is currently active?
How do you know which program is currently "active" in a tmux session?

Basically it's a recursive lookup for the "current state" of what's being displayed to the user. Not only that, but for things like browsers, image editors, video editors, anything document based it's still the same problem at another level, namely, what the current "context" is, like the currently open website, picture, video, scene, what have you.

Coming back to earlier thoughts about automation, there's no way for most of these to be accurately determined at this time. Neither is e.g. DBUS scripting common enough to "just use it" for scripting, there are also several links missing in the scenario described above and some can likely never be fixed to a sufficient degree to not rely on heuristics.

Nevertheless, with said heuristics it's still possible to get to a state where a productivity plus can be achieved with only moderate amount of additional logic to step through all the indirections between processes and presentation layers.

Now let me list a few answers to the questions raised above:

The PID for a particular window can be derived from an X11 property, together with xdotool this gives us an easy pipeline to get this value: ``.
Information about the running process can then be retrieved via the proc filesystem, e.g. readlink /proc/$PID/cwd for the current working directory. Of course this has limited value in a multi-threaded program or any environment that doesn't rely on the standard filesystem interface (but uses its own defaults).
I do not have an answer for the currently active tmux session yet, presumably you should somehow be able to get from a PID to a socket and thus to the session?
For tmux, the currently active program in a session is a bit more complex, ``, which we'll also have to filter for the active values.

For scripting interfaces, many programs have their own little implementation of this, but most problematic here is that you want to go from a X11 window / PID to the scripting interface, not through some workaround by querying for interfaces!

For programs like Emacs and any programmable environment we can likely script something together, but again it's not a very coherent whole by any means.

Act! Heuristics for determining intent.