Recent Content
Stylus (a fork of the better-known Stylish) is a browser extension to apply custom CSS stylesheets to any website you might want to alter. This is a fantastic capability, which allows you to remove unnecessary clutter and otherwise give control back to you, the user.
Let's walk through this using the Twitter website as an example.
As of 2021 you'll normally see a "Trends" and a "Who to follow" block on the right side of the desktop version of the page, assuming enough screen space is available. You might want to hide these and thereby focus more on the actual content of the accounts you're following:
Now how do you achieve this goal? For one, you might download a pre-made stylesheet, install a browser extension, or use a custom script (say, via Greasemonkey).
However, Stylus makes it very low effort to filter out unwanted content yourself. (Later we will see other tools, like uBlock Origin, which are less powerful in some ways, but can accomplish a subset of what we're doing here with perhaps greater speed.)
Unfortunately, Twitter doesn't like you enough to ship unminified and unobfuscated CSS rules, so what you end up with in the inspector is something like this:
Not to worry though: the "Copy > CSS Selector" option gives us a matching selector that we can then paste into the User Styles editor:
While we're at it, let's do the same for the other two elements, but let's leave the search option alone, just in case:
.r-1uhd6vh:nth-child(3), .r-1uhd6vh:nth-child(4), .r-1niwhzg:nth-child(5) {
  display: none;
}
Once saved, the homepage greets us with plenty of empty, unobtrusive space. Much better.
Today I had to dig deeper into a problem authenticating against an HTTPS API: the client was sending Basic Authentication information when following a 3XX redirect, which made the second server (well, S3 really) return a 400 Bad Request, since it refuses to deal with more than one authentication method at the same time.
This is all well and good, but debugging what was actually being sent is a little bit more difficult if curl is not the tool of choice.
Instead I found the -Djavax.net.debug=all option for the JVM. This will make it dump a lot of information throughout a connection. Usually that's already enough to debug the issue, since a hexdump of the HTTP traffic is included. On the other hand it's also pretty verbose.
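For a quick test the flag can be passed directly on the command line; the JAR name here is just a placeholder:
java -Djavax.net.debug=all -jar app.jar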
Another option is the slightly more involved jSSLKeyLog, which requires the use of a JVM parameter to include the Java agent, e.g. for SBT like so:
env JAVA_OPTS="-javaagent:jSSLKeyLog.jar==/jsslkeylog.log" sbt
Two more notes here: compiling the tool is really easy; once cloned, mvn package results in a ready-to-use JAR file. Also, the log contains more information when two equal signs are used (handy for manual inspection).
This file can then be directly fed into Wireshark ("Edit", "Preferences", "Protocols", "TLS", "(Pre)-Master-Secret log filename") and will then allow the decoding of the captured network traffic (e.g. captured via tcpdump -i any -s 0 -w dump.pcap).
Docker is ubiquitous in many projects, and therefore it may be useful to dig into its inner workings in more detail. Arguably those aren't too complicated: a smallish program doing the essentials can be built in a few hours.
The CodeCrafters challenges focus on exactly this kind of idea, taking an existing tool and rebuilding it from scratch. Since they're currently in Early Access, I've only had the opportunity to try out the Docker and Redis challenges so far, but I thought maybe a few insights from them would be good to share.
Part of the challenge is to run the entrypoint of a container; using Go it's actually fairly easy to run external programs. Using the os/exec package is straightforward, and even redirecting I/O is easy enough by looking at the Cmd structure a bit closer and assigning values to the Stdin, Stdout and Stderr fields. The exit status can also easily be gotten from the error return value by checking for ExitError (only if the run was not successful, that is, non-zero):
if err = cmd.Run(); err != nil {
	if exitError, ok := err.(*exec.ExitError); ok {
		// e.g. inspect exitError.ExitCode() here (Go 1.12+)
		...
	}
}
Interestingly enough, the SysProcAttr field exposes some functionality that is a bit more difficult to use in, say, C. While using the syscall package directly is possible, it's mostly easier to assign a few values in that field instead, using the definition of the SysProcAttr structure itself.
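As a sketch of how these pieces fit together for the challenge (the chroot directory and the shell command are placeholders I made up; Chroot and Cloneflags are Linux-only fields):

package main

import (
	"os"
	"os/exec"
	"syscall"
)

// runInSandbox starts a command with stdio wired through and a couple of
// SysProcAttr fields set instead of manual syscall juggling.
func runInSandbox(command string, args ...string) error {
	cmd := exec.Command(command, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Chroot:     "/tmp/sandbox",       // assumption: the extracted image lives here
		Cloneflags: syscall.CLONE_NEWPID, // fresh PID namespace for the child
	}
	return cmd.Run()
}

func main() {
	if err := runInSandbox("/bin/sh", "-c", "echo hello from the sandbox"); err != nil {
		panic(err)
	}
}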
Later on there's also the need to parse some JSON - that's again easily done with the standard library, using encoding/json, in particular Unmarshal, either to a map[string]interface{} (in case we just want to grab a top-level entry in a JSON object), or to a pointer to a custom structure using structure tags like so:
type Foo struct {
	Bars []Bar `json:"bars"`
}

type Bar struct {
	Baz string `json:"baz"`
}

...

foo := Foo{}
if err := json.Unmarshal(body, &foo); err != nil {
	panic(err)
}

for _, bar := range foo.Bars {
	println(bar.Baz)
}
The Redis challenge is comparatively contained, mostly requiring just standard library tools; the most interesting thing I noticed is that there's now a concurrency-friendly map implementation called sync.Map, so no external synchronization primitive is needed.
The redis-cli tool also helped, though I had to find out for myself that it doesn't interpret the specification very strictly; in fact, just about everything returned in the server response will be printed, even when not valid according to the spec.
Overall the biggest challenge here might be to accurately parse the command input and deal with expiration. For the latter I simply chose a lazy approach: instead of clearing out the map on a timer, expired entries are only detected (and removed) when they are read. This will of course not be memory-friendly long-term, but for a very simple Redis server it's more than enough to pass all tests.
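A minimal sketch of that lazy approach, combining sync.Map with a per-entry deadline (names and layout are mine, not from the challenge):

package main

import (
	"sync"
	"time"
)

type entry struct {
	value    string
	deadline time.Time // zero value means "no expiry"
}

var store sync.Map

func set(key, value string, ttl time.Duration) {
	e := entry{value: value}
	if ttl > 0 {
		e.deadline = time.Now().Add(ttl)
	}
	store.Store(key, e)
}

// get deletes expired entries lazily on read instead of using a timer.
func get(key string) (string, bool) {
	v, ok := store.Load(key)
	if !ok {
		return "", false
	}
	e := v.(entry)
	if !e.deadline.IsZero() && time.Now().After(e.deadline) {
		store.Delete(key)
		return "", false
	}
	return e.value, true
}

func main() {
	set("greeting", "hi", time.Second)
	if v, ok := get("greeting"); ok {
		println(v)
	}
}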
After working with Scala for a while now, I thought it would be good to write down a couple of notes on my current testing setup, in particular with regards to which libraries I've settled on and which style of testing I've ended up using.
Tests end up in the same package as the code that's tested. A group of tests always lives in a class with the Tests suffix, e.g. FooTests; if the tests are about a particular class Foo, the same naming applies.
scalatest is used as the testing framework, with AnyWordSpec, which means we're using the should/in pattern. For mocking the only addition is MockitoSugar, to make things more Scala-ish.
What does it look like?
package com.example.foo
import org.mockito.MockitoSugar
import org.scalatest.wordspec.AnyWordSpec
class FooTests extends AnyWordSpec with MockitoSugar {
  "Foo" should {
    "do something" in {
      val bar = mock[Bar]
      val foo = new Foo(bar)
      foo.baz(42L)
      verify(bar).qux(42L)
    }
  }
}
Easy enough. There's also some more syntactic sugar for other Mockito features, meaning ArgumentMatchersSugar should also be imported when needed. Likewise, scalatest has a number of additional helpers for particular types like Option or Either, e.g. OptionValues and EitherValues.
class BarTests extends AnyWordSpec with Matchers with EitherValues with OptionValues {
  "Bar" should {
    "do something else" in {
      val bar = new Bar
      bar.qux(42L).left.value should be(empty)
      bar.quux().value shouldBe "a value"
    }
  }
}
This can be done to the extreme, but usually it seems easier to me to simply assign a highly nested value to a variable and continue with matchers on that variable instead.
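For example (hypothetical values, building on BarTests above):

// instead of chaining matchers deep into one expression:
val quux = bar.quux().value // OptionValues unwraps the Option once
quux should startWith("a")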
Since sbt is often used, the two test dependencies would look like this:
libraryDependencies ++= Seq(
  "org.scalatest" %% "scalatest" % "3.1.1" % Test,
  "org.mockito" %% "mockito-scala-scalatest" % "1.13.0" % Test,
)
Did you know Scala has macros? Coming from Common Lisp, they serve pretty much the same purpose: doing things that the (plethora of) other language features don't support, and shortcutting the resulting boilerplate code. Even the S-expressions can be had when macro debugging is turned on, though the pretty-printed Scala code is arguably much more useful here.
Why would you realistically use them, then? Turns out I had to deal with some auto-generated code for Protobuf messages. The generated classes for any message look something like this (in Java syntax, since that's what the generated code is):
public interface ExampleResponseOrBuilder
    extends com.google.protobuf.MessageOrBuilder;

public static final class ExampleResponse
    extends com.google.protobuf.GeneratedMessageV3
    implements ExampleResponseOrBuilder {
  public static Builder newBuilder();

  public static final class Builder
      extends com.google.protobuf.GeneratedMessageV3.Builder<Builder>
      implements ExampleResponseOrBuilder;
}
That is, we have one interface and two classes, one of which conveniently gives you a builder for new objects of the class. That's used like this (back to Scala here):
val builder: ExampleResponse.Builder = ExampleResponse.newBuilder()
builder.mergeFrom(stream)
val result: ExampleResponse = builder.build()
If you try to make a generic builder here, you'll quickly notice that this is rather hard, as the generic types don't really express the relationship between ExampleResponse and ExampleResponse.Builder well.
As an aside, you want to have a generic builder parametrised on the return type to be able to write something like this:
val result = build[ExampleResponse](stream)
That is, without ever having to pass the type through as a value. Better even if you just specify the result type and the type parameter for build is then derived automatically.
These builders look something like this then:
trait ProtobufBuilder[T <: Message] {
  def underlying(): Message.Builder

  def build(string: String)(implicit parser: JsonFormat.Parser): T = {
    val builder = underlying()
    parser.merge(string, builder)
    builder.build().asInstanceOf[T]
  }
}

class ExampleResponseBuilder() extends ProtobufBuilder[ExampleResponse] {
  override def underlying(): ExampleResponse.Builder =
    ExampleResponse.newBuilder()
}
This then allows us to use some implicit magic to pass these through to the decoder (sttp's framework in this case) to correctly decode the incoming data.
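As a rough sketch of that glue, here written against sttp 3's asStringAlways (the original code may well have used a different sttp version, so treat this as an illustration only):

import com.google.protobuf.Message
import com.google.protobuf.util.JsonFormat
import sttp.client3._

// Pick up the matching ProtobufBuilder implicitly and decode the JSON body.
def asProtobuf[T <: Message](implicit builder: ProtobufBuilder[T]): ResponseAs[T, Any] = {
  implicit val parser: JsonFormat.Parser = JsonFormat.parser()
  asStringAlways.map(body => builder.build(body))
}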
But we have to 1. write one class for each type, and 2. instantiate it. That's roughly five lines of code per type, depending on the formatting.
Macros to the rescue!
Inspired by the circe derivation API, I finally got all the pieces together to create such a macro:
def deriveProtobufBuilder[T <: Message]: ProtobufBuilder[T] = macro deriveProtobufBuilder_impl[T]

def deriveProtobufBuilder_impl[T <: Message: c.WeakTypeTag](
    c: blackbox.Context): c.Expr[ProtobufBuilder[T]] = {
  import c.universe._

  val messageType = weakTypeOf[T]
  val companionType = messageType.typeSymbol.companion

  c.Expr[ProtobufBuilder[T]](q"""
    new ProtobufBuilder[$messageType] {
      override def underlying(): $companionType.Builder = $companionType.newBuilder()
    }
  """)
}
Used then like this:
private implicit val exampleResponseBuilder: ProtobufBuilder[ExampleResponse] = deriveProtobufBuilder
That's one or two lines and the types are only mentioned once (the variable name can be changed). Unfortunately getting rid of the variable name doesn't seem to be possible.
Easy, wasn't it? Unfortunately all of this is hampered by the rather undocumented APIs; you really have to search for existing code or Stack Overflow questions to figure this out.
One thing that helped immensely was the -Ymacro-debug-lite option, which prints the expanded macro when used in sbt via compile.
I wrote down a little scenario for myself for the next iteration of act:
- mark "http://www.google.de" with mouse (selection -> goes to x11 buffer-cut)
- listen on those changes and show current context and choices
- press+release "act primary" key -> runs primary action immediately
- (or) press "act" key once -> opens buffer for single character input to select choice
- buffer captures keyboard, after single character action is taken, esc aborts, "act" again should expand the buffer to full repl
- mouse -> selection of source? history?
- context -> focused window? -> lookup for program gives cwd, etc.
- since that's not foolproof, an obvious addition to the protocol would be to store a reference to the source of the clipboard (or the program that will be queried for it)
- you should always be able to interrogate windows about their origin and capabilities, including their scripting interface
- pattern matching url -> rule based decisions
- primary / secondary actions -> rule of three basically, except we err on the side of caution and just bind two actions to two keys
- special handling for clipboard types that aren't text? allow for example to override a rule based on whether a picture would be available
Now, there are several problems with this on a regular X11-based desktop already:
- How do you identify the program / PID belonging to a particular window?
- How do you get information about that program, e.g. its current working directory?
In addition, since I'm using tmux inside a terminal emulator, there are some more problems:
- How do you know which tmux session is currently active?
- How do you know which program is currently "active" in a tmux session?
Basically it's a recursive lookup for the "current state" of what's being displayed to the user. Not only that: for things like browsers, image editors, video editors, anything document-based, the same problem appears at another level, namely what the current "context" is - the currently open website, picture, video, scene, what have you.
Coming back to earlier thoughts about automation, there's no way for most of these to be accurately determined at this time. Neither is e.g. DBUS scripting common enough to "just use it" for scripting; there are also several links missing in the scenario described above, and some can likely never be fixed to a degree sufficient to avoid relying on heuristics.
Nevertheless, with said heuristics it's still possible to get to a state where a productivity gain can be achieved with only a moderate amount of additional logic to step through all the indirections between processes and presentation layers.
Now let me list a few answers to the questions raised above (a sketch of the corresponding commands follows the list):
- The PID for a particular window can be derived from an X11 property; together with xdotool this gives us an easy pipeline to get this value: ``.
- Information about the running process can then be retrieved via the proc filesystem, e.g. readlink /proc/$PID/cwd for the current working directory. Of course this has limited value in a multi-threaded program or any environment that doesn't rely on the standard filesystem interface (but uses its own defaults).
- I do not have an answer for the currently active tmux session yet; presumably you should somehow be able to get from a PID to a socket and thus to the session?
- For tmux, the currently active program in a session is a bit more complex, ``, which we'll also have to filter for the active values.
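The elided commands would presumably look something like this (my best guesses; not recovered from the original notes):

# PID of the currently focused X11 window
xdotool getactivewindow getwindowpid

# working directory of that process
readlink /proc/$PID/cwd

# program per tmux pane; still needs filtering for the attached session / active pane
tmux list-panes -a -F '#{session_name} #{pane_active} #{pane_current_command}'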
For scripting interfaces, many programs have their own little implementation; the most problematic part here is that you want to go from an X11 window / PID to the scripting interface, not through some workaround of querying for interfaces!
For programs like Emacs and any programmable environment we can likely script something together, but again it's not a very coherent whole by any means.
Intro
Regardless of your position on DBus, sometimes you might need to interact with it. Common Lisp currently has at least two libraries ready for you; one of them is in Quicklisp: https://github.com/death/dbus/.
Setup
Load it and create a package that uses it, then change into that package.
;; (asdf:load-system '#:dbus)
;; or
;; (ql:quickload "dbus")
(defpackage #:example
(:use #:cl #:dbus))
(in-package #:example)
For reference, I'm going to use a (very old) polkit example in Python, reproduced here (it still works in current Python 3 without any changes except the print):
import dbus
bus = dbus.SystemBus()
proxy = bus.get_object('org.freedesktop.PolicyKit1', '/org/freedesktop/PolicyKit1/Authority')
authority = dbus.Interface(proxy, dbus_interface='org.freedesktop.PolicyKit1.Authority')
system_bus_name = bus.get_unique_name()
subject = ('system-bus-name', {'name' : system_bus_name})
action_id = 'org.freedesktop.policykit.exec'
details = {}
flags = 1 # AllowUserInteraction flag
cancellation_id = '' # No cancellation id
result = authority.CheckAuthorization(subject, action_id, details, flags, cancellation_id)
print(result)
So, how does this look in Common Lisp? Mostly the same, except that, at least at the moment, you have to specify the variant type explicitly! This was also the reason to document the example: it's quite hard to understand what's wrong if there's a mistake, including the socket connection just dying on you and other fun stuff.
(with-open-bus (bus (system-server-addresses))
  (with-introspected-object (authority bus
                             "/org/freedesktop/PolicyKit1/Authority"
                             "org.freedesktop.PolicyKit1")
    (let* ((subject `("system-bus-name" (("name" ((:string) ,(bus-name bus))))))
           (action-id "org.freedesktop.policykit.exec")
           (details ())
           (flags 1)
           (cancellation-id "")
           (result
             (authority "org.freedesktop.PolicyKit1.Authority" "CheckAuthorization"
                        subject action-id details flags cancellation-id)))
      (format T "~A~%" result))))
Note the encoding of the dictionary: the type of the whole argument is specified as (sa{sv}), a structure of a string and a dictionary of strings to variants - we're spelling out the variant type here, compared to what's done automatically by the Python library.
Continuing from an earlier post, what might the semantics be that we'd like to have for a more useful shell integrated with a Lisp environment?
Syntax
Of course we can always keep the regular Common Lisp reader; however, it's not best suited for interactive shell use. In fact I'd say interactive use in general requires heavy use of editing commands to juggle parens, which is why some people use the simplification of not requiring the outermost layer of parens on the REPL.
So, one direction would be to have better S-expression manipulation on the REPL, the other to have a syntax that's more incremental than S-expressions.
E.g. imagine this scenario:
- as a terminal user,
- I'm navigating directories,
- listing files,
- then looking for how many files there were,
- then grepping for a particular pattern,
- then opening one of the matches.
In sh terms that's something along the lines of
$ cd foo
$ ls
...
$ ls | wc -l
42
$ ls | grep bar
...
$ vim ...
In reality I'm fairly sure no one goes as far as doing the last two steps in an iterative fashion, like via ls | grep x | grep y | xargs vim, as most people will have a mouse readily available to select the desired name. There are some terminal widgets which allow the user to select e.g. one of the input lines in a command-line dialog, but again it's not a widespread pattern and still requires manual intervention in such a case.
Also note that instead of reusing the previous computation, the whole expression keeps being re-evaluated, which makes this somewhat inefficient. The notion of the "current" thing being worked on ("this", "self") also isn't expressed directly here.
This is part of what I'd like to see explored in the new shell. We already have the three special variables *, ** and *** (and /, etc.!) in Common Lisp - R, IPython and other environments usually generalise this to a numbered history, which arguably we might want to add here as well - so it stands to reason that these special variables make sense for a shell, too.
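For reference, this is how those variables behave on a standard CL REPL:

CL-USER> (+ 1 2)
3
CL-USER> (1+ *)      ; * holds the previous value, 3
4
CL-USER> (list ** *) ; ** is now 3, * is 4
(3 4)

Translated to the shell scenario from before: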
$ cd foo
$ ls
...
$ * | wc -l
42
$ grep ** bar
...
$ vim *
(Disregarding the need for a different globbing character ...)
There's also the very special exit-status variable in shells that we need to replicate. This will likely be similar to Perl's special variables that each keep track of one particular thing, instead of reusing the * triad of variables for this, too.
Pipelines
The expression compiler should be able to convert from a serial to a concurrent form as necessary, that is, by converting to the required form at each pipeline step.
ls | wc -l | vim
Here, ls will be the built-in LIST-DIRECTORY, which is capable of outputting e.g. a stream of entries. wc -l might be compiled to simply LENGTH, which, since it operates on fixed sequences, will force ls to be fully evaluated (disregarding a potential optimisation of just counting the entries instead of producing all filenames). The pipe to vim will then force the wc output to text format, since vim is an unknown native binary.
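As a rough sketch of that first conversion (LIST-DIRECTORY is hypothetical; the standard DIRECTORY function stands in for it here), ls | wc -l might compile down to something like:

;; LENGTH operates on a proper sequence, so the directory listing is
;; forced into a list before counting.
(length (directory "*.*"))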
These would be the default conversions. We might force a particular interpretation by adding explicit (type) conversions, or adding annotations to certain forms to explain that they accept a certain input/output format; for external binaries the annotations necessarily would be more extensive.
For actual pipelines it's extremely important that each step progresses as new data becomes available. Together with proper backpressure this will ensure good performance and keep buffer bloat to a minimum. Again, this might be a tunable behaviour, too.
I/O
The shell should be convenient. As such, any output and error output should be captured as standard behaviour, dropping data if the buffers would get too big. This then allows us to refer to previous output without having to reevaluate an expression or run a program again (which in both cases might be expensive).
Each datatype and object should be renderable and parsable in more than a single format, too. E.g. a directory listing itself might be an object, but will typically be rendered to text/JSON/XML, or even to a graphical representation.
Now, in fact there are still reasons we need key codes that are different from the eventual text representation, e.g. for cursor movement and other special keys like function keys.
Looking at the Xorg source code, there's a relatively fixed notion of what a keyboard can actually do. I suspect that conceptually a somewhat backwards-compatible extension would be a new dedicated kind of device that is exposed separately from the device's other functionality (similar to how a keyboard with an integrated touchpad presents itself).
In particular, I'd like to keep the keyboard in "regular" mode as long as the host hasn't signaled that it wants to use the extended functionality, presumably via some part of the USB negotiation. Only at that point would the extension be activated and the keyboard output be sent via it; the regular keyboard device would then be virtually unplugged.
I suspect that this is better than having two devices, one for key codes and one for text input, especially because we'd not be able to guarantee in which order the two devices would be read. This is less of a problem, or not one at all, between a keyboard and a pointer device, of course.
Now, given that X11 isn't the interface most applications are written against, how would the text actually arrive at an application? I'd imagine basically extending the whole event handling by one more event, the text event, which wouldn't correspond to any key (thus it can't be in a pressed or unpressed state). For GTK+ and Qt this might be even easier than for a lower-level application, since many applications will only want to deal with text input and pre-defined keyboard shortcuts anyway.
Speaking of which, what does "Ctrl-C" actually mean? Of course the mnemonic for "copy" is in there, but also the "control" modifier. How well does this play with text input? Not at all, and I believe modifiers work better logically on the key-code level; for text input I imagine other modifiers like "emphasis" (or, more specifically, "bold") would make more sense, possibly "URL" or "footnote".
Overall there can of course be modifiers active while text input occurs; it's more a question of whether they can be assigned any meaning without falling back to the flawed character-equals-keypress comparison.
What does this gain us? Ideally every application (or more accurately: each toolkit) could now drop logic specifically to translate key codes to text, since all of it would already be handled by the keyboard itself. Keypresses would come in via the same interface and be used for specific, non-text functionality.
The keyboard protocol is still using the same approach as roughly since the start of computing: The keyboard is a dumb device, reporting each mechanical button that's pressed. The computer is the intelligent device, translating those into characters.
There are some attempts that have made it into various places, e.g. there's a flag in the USB HID protocol to indicate the physical layout, be it US, or some other one. Except no manufacturer sets it, so no driver uses it.
But what if we had a keyboard, courtesy of QMK and similar firmwares, that is substantially more intelligent? If the protocol allowed for it, we'd be able to have such nice things as sending an "a", an "あ", or a "●" without any remapping to be done! In fact, if the keyboard could send Unicode sequences, we could do things that aren't possible by remapping, like sending characters from various scripts through a macro key without impacting any keypress, since we'd have an immensely increased value space to work with.