Embeddability? I am quite certain this is in fact an English word, but you probably won’t find it among the other -ilities, qualities software might want to address. Among them there are some boring and pretty self-explanatory ones like maintainability, even some dogmatic ones like correctness[1], but fortunately also funnier ones like extensibility.
And like so often, how a certain quality is best achieved depends on a plethora of things, but according to Wikipedia, one way to achieve extensibility is to use scripting languages, and everything finally comes together: they can be quite embeddable.
So in case you have some time to kill, join me on a lengthy journey through 20+ years of personal FOSS history. We are having a look at different approaches to embedding and also see why this is always a great idea, plus there are memes.
Unbeknownst to my past self, I had my first experience with this kind of extensibility in 2004, when I started my long journey with Xlib. During that time I started a project called deskbar with the lofty goal to print system information like CPU load, battery usage etc. directly onto the root window of the X session. There were plenty of alternatives like GKrellM readily available, but who in their right mind prefers pre-built stuff over rolling your own[2]?
The initial idea was just to include everything in one binary, but I quickly discovered that the ergonomics of re-compiling and shipping everything together are annoying, so I switched to a simple plugin system.
I would have loved to show some screenshots of deskbar in action here, but unfortunately, after messing with the infamous Autotools and trying to compile old C code with a modern compiler, this is as far as I got[3]:
$ ./configure && make
deskbar 0.1
-----------------
Build with ZLIB support.......: yes
Build with PNG support........: yes
Plugins:
Common Plugins................: Clock CPU Date
Battery Plugin................: no
XMMS Plugin...................: no (1)
BMP Plugin....................: no (2)
Debug Plugin..................: no
The binary will be installed in /usr/local/bin,
the lib in /usr/local/lib and the plugins
in /usr/local/lib/deskbar.
Try make now, good luck!
make all-recursive
make[1]: Entering directory '/home/unexist/build/deskbar-0.1'
# --- %< --- snip --- %< ---
/bin/bash ../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -I/usr/include -I/usr/include -MT htable.lo -MD -MP -MF .deps/htable.Tpo -c -o htable.lo htable.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -I/usr/include -I/usr/include -MT htable.lo -MD -MP -MF .deps/htable.Tpo -c htable.c -fPIC -DPIC -o .libs/htable.o
In file included from htable.c:2:
/usr/include/string.h:466:13: error: storage class specified for parameter 'explicit_bzero'
466 | extern void explicit_bzero (void *__s, size_t __n) __THROW __nonnull ((1)) (3)
| ^~~~~~~~~~~~~~
/usr/include/string.h:471:14: error: storage class specified for parameter 'strsep'
471 | extern char *strsep (char **__restrict __stringp,
| ^~~~~~
/usr/include/string.h:478:14: error: storage class specified for parameter 'strsignal'
478 | extern char *strsignal (int __sig) __THROW;
| ^~~~~~~~~
# --- %< --- snip --- %< ---
make[2]: *** [Makefile:457: htable.lo] Error 1
make[2]: Leaving directory '/home/unexist/build/deskbar-0.1/libdeskbar'
make[1]: *** [Makefile:479: all-recursive] Error 1
make[1]: Leaving directory '/home/unexist/build/deskbar-0.1'
make: *** [Makefile:374: all] Error 2
| 1 | X Multimedia System (XMMS) |
| 2 | I can only guess what it was supposed to do, since the plugin is just an empty stub that returns NULL |
| 3 | Yes, oh well… |
Nevertheless, this output clearly proves there has been a plugin system with conditional compilation, based solely on linking magic, so we have to move on.
Everything in C is a bit more complicated, so let us ignore the scary memory handling and just talk about the two interesting calls dlopen and dlsym:
DbPlugElement *element = NULL;
element = (DbPlugElement *) malloc (sizeof (DbPlugElement));
snprintf (buf, sizeof (buf), "%s/%s.so", PLUGIN_DIR, file);
element->handle = dlopen (buf, RTLD_LAZY); (1)
if ((err = dlerror ())) (2)
{
db_log_err ("Cannot load plugin `%s'\n", file);
db_log_debug ("dlopen (): %s\n", err);
free (element);
return;
}
/* Get entrypoint and call it */
entrypoint = dlsym (element->handle, "db_plug_init"); (3)
element->data = (*entrypoint) (); (4)
| 1 | Load the named shared object from path |
| 2 | dlerror, the rarely mentioned third call, reports what went wrong in the previous dl* call |
| 3 | Find the address of a named entrypoint |
| 4 | Execute the entrypoint for profit |
The entrypoint here is quite interesting, since the main application cannot know what is included
in the plugin or even what is exported.
Following the idea of convention-over-configuration, the defined contract here
expects a symbol named db_plug_init inside a plugin, which is called on load and must return
a pointer to an initialized struct of type DbPlug:
static DbPlug plugin =
{
"Battery", /* Plugin name */
battery_create, /* Plugin create function */
battery_update, /* Plugin update function */
battery_destroy, /* Plugin destroy function */
&data, /* Plugin data */
NULL, /* Plugin format */
3600 /* Plugin update interval */
};
DbPlug *
db_plug_init (void)
{
plug = &plugin;
return (&plugin); (1)
}
| 1 | Pass the local address back to the main application |
Once loaded, the plugin is called at the given interval and can exchange data with the main application.
void
battery_update (void)
{
int capacity = 0,
percent = 0;
char buf[100], state[20];
/* Get battery info */
if (!fd1)
{
snprintf (buf, sizeof (buf), "/proc/acpi/battery/BAT%d/state", bat_slot); (1)
fd1 = fopen (buf, "r");
memset (buf, 0, sizeof (buf));
}
else
fseek (fd1, 0, SEEK_SET);
/* --- %< --- snip --- %< --- */
}
| 1 | Here the battery plugin checks the battery values from the ACPI interface |
Allowing contributions this way is really easy and powerful, but like so often it comes with a catch. Segmentation faults, the bane of software engineering, don’t halt inside the plugin like they should, but wipe the board and kill the entire application.
I think Torvalds nailed it perfectly and I agree this should never happen:
Mauro, SHUT THE FUCK UP!
WE DO NOT BREAK USERSPACE!
I am kind of surprised how far I went in trying to keep problems in the plugins at bay. The original project included memory management[4] for plugins and also used the two calls I’d like to demonstrate next.
Handling segmentation faults properly is really difficult, and the common wisdom is to catch them and exit gracefully when possible. Still, there are cases when faults can be safely ignored, and a plugin interface is a prime example.
This can be done with the pair setjmp and longjmp, which for most practical purposes behave like a goto on steroids:
static int
save_call (DbPlugElement *element,
DbPlugFunc plugfunc,
const char *name)
{
if (plugfunc)
{
if (setjmp (env) == 0) (1)
plugfunc ();
else
{
db_log_mesg ("Ayyyee! Segmentation fault in plugin %s!\n", element->data->name); (2)
db_log_debug ("Call to %s () failed\n", name);
db_plug_unload (element);
return (1);
}
}
return (0);
}
| 1 | On the first pass, save stack and instruction pointer for later use; when we return here via longjmp, ditch the plugin |
| 2 | Well, different times back then.. |
When the application receives the bad signal SIGSEGV, it checks if there are stored stack
and instruction values and rewinds the stack accordingly:
static void
sig_handler (int sig)
{
switch (sig)
{
case SIGSEGV:
longjmp (env, 1); (1)
db_log_debug ("Something went wrong! Segmentation fault!\n");
db_sig_destroy ();
abort ();
break;
/* --- %< --- snip --- %< --- */
}
| 1 | Check the values and pass control if necessary; otherwise just bail out |
| Ease of use | Richness of API | Language agnostic | Error handling | Performance |
|---|---|---|---|---|
| Low; requires compilation and linking | The API is simple, but can be enriched by the host | No; requires plugins to be in C[5] | Arcane; requires stack unwinding | Runs natively, so pretty fast |
Three years later in 2007, I continued building upon my Xlib skills and started my long-lasting project subtle.
Over the years there have been many major breaking changes, from the initial design to the state it is currently in. Two of the changes relevant to this post were the integration of the scripting language Lua and its replacement with Ruby a few years later in this glorious issue #1.
I am not entirely sure where I picked Lua up, but I never played WoW, so probably from somewhere else, and I can only talk about the state and API from back then.
Adding a scripting language solves quite a few problems:
File loading and parsing can be offloaded to the language core
The language itself comes with a basic subset of things you can do with it
Bonus: Config handling can also be directly offloaded
My attempt of trying to compile the project and provide an actual screenshot this time ended quickly as well:
$ ./configure && make
# --- %< --- snip --- %< ---
subtle 0.7b
-----------------
Binary....................: /usr/local/bin
Sublets...................: /usr/local/share/subtle
Config....................: /usr/local/etc/subtle
Debugging messages........:
Try make now, good luck!
make all-recursive
make[1]: Entering directory '/home/unexist/build/subtle-0.7b'
Making all in src
# --- %< --- snip --- %< ---
if gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -I/usr/include/lua5.1 -g -O2 -MT subtle-event.o -MD -MP -MF ".deps/subtle-event.Tpo" -c -o subtle-event.o `test -f 'event.c' || echo './'`event.c; \
then mv -f ".deps/subtle-event.Tpo" ".deps/subtle-event.Po"; else rm -f ".deps/subtle-event.Tpo"; exit 1; fi
event.c: In function ‘subEventLoop’:
event.c:352:57: error: implicit declaration of function ‘subSubletSift’; did you mean ‘subSubletKill’? [-Wimplicit-function-declaration]
352 | subSubletSift(1);
| ^~~~~~~~~~~~~
| subSubletKill
make[2]: *** [Makefile:310: subtle-event.o] Error 1
make[2]: Leaving directory '/home/unexist/build/subtle-0.7b/src'
make[1]: *** [Makefile:233: all-recursive] Error 1
make[1]: Leaving directory '/home/unexist/build/subtle-0.7b'
make: *** [Makefile:171: all] Error 2
This is kind of embarrassing for an official release and I really have to question the quality in retrospect, but this won’t stop us now.
After a dive into the code there were some obvious problems and also blatant oversights, and if you are interested in the shameful truth, here is the silly patch:
And without further ado here is finally the screenshot of the scripting part in action, before we dive into how this is actually done under the hood:
Starting with the easy part, offloading the config handling was one of the first things I did and this made a config like this entirely possible:
-- Options config
font = {
face = "lucidatypewriter", -- Font face for the text
style = "medium", -- Font style (medium|bold|italic)
size = 12 -- Font size
}
-- Color config
colors = {
font = "#ffffff", -- Color of the font
border = "#ffffff", -- Color of the border/tiles
normal = "#CFDCE6", -- Color of the inactive windows
focus = "#6096BF", -- Color of the focussed window
shade = "#bac5ce", -- Color of shaded windows
background = "#596F80" -- Color of the root background
}
-- --- %< --- snip --- %< ---
Essentially, the C API of Lua is a stack machine, and interaction with it happens through pushing and popping values onto and from the stack.[6]
I’ve removed a bit of the fluff and checks upfront, so we can have a quick glance at the config loading and jump further into nitty-gritty details:
/* --- %< --- snip --- %< --- */
subLogDebug("Reading `%s'\n", buf);
if(luaL_loadfile(configstate, buf) || lua_pcall(configstate, 0, 0, 0)) (1)
{
subLogDebug("%s\n", (char *)lua_tostring(configstate, -1));
lua_close(configstate);
subLogError("Can't load config file `%s'.\n", buf);
}
/* --- %< --- snip --- %< --- */
/* Parse and load the font */
face = GetString(configstate, "font", "face", "fixed"); (2)
style = GetString(configstate, "font", "style", "medium");
size = GetNum(configstate, "font", "size", 12);
/* --- %< --- snip --- %< --- */
| 1 | Internal calls to load the config file and execute it in a safe way via pcall |
| 2 | Once everything is stored inside configstate we fetch required values |
#define GET_GLOBAL(configstate) do { \ (1)
lua_getglobal(configstate, table); \ (2)
if(lua_istable(configstate, -1)) \
{ \
lua_pushstring(configstate, field); \ (3)
lua_gettable(configstate, -2); \
} \
} while(0)
/* --- %< --- snip --- %< --- */
static char *
GetString(lua_State *configstate,
const char *table,
const char *field,
char *fallback)
{
GET_GLOBAL(configstate);
if(!lua_isstring(configstate, -1)) (4)
{
subLogDebug("Expected string, got `%s' for `%s'.\n", lua_typename(configstate, -1), field);
return(fallback);
}
return((char *)lua_tostring(configstate, -1)); (5)
}
| 1 | Blocks in C macros require this fancy hack; probably best to skip over it |
| 2 | We check and fetch a table[7] |
| 3 | Push the field name onto the stack; lua_gettable then replaces it with the corresponding value from the table at index -2 |
| 4 | Check that the value on top of the stack is actually a string |
| 5 | And convert it to our desired format |
Loading plugins at runtime is basically the same as loading the config upfront, so let us just move on to error handling, which is slightly more interesting. It is probably no surprise, but the API is quite rudimentary, and the handling of the stack and calls in case of an actual error is up to the person embedding the engine.
Before we can see how this is done, let us quickly check how our battery plugin evolved from the arcane C version to Lua glory. First of all, plugins have been rebranded to sublets[9], and (at least to me) it became a bit more readable:
-- Get remaining battery in percent
function battery:meter() (1)
local f = io.open("/proc/acpi/battery/BAT" .. battery.slot .. "/state", "r")
local info = f:read("*a")
f:close()
_, _, battery.remaining = string.find(info, "remaining capacity:%s*(%d+).*")
_, _, battery.rate = string.find(info, "present rate:%s*(%d+).*")
_, _, battery.state = string.find(info, "charging state:%s*(%a+).*")
return(math.floor(battery.remaining * 100 / battery.capacity))
end
| 1 | The : here is used as a kind of namespace separator and should be read as a global table called battery with the entry meter |
Once the sublet is loaded and initialized, we can call it analogously to our save_call from before:
void
subLuaCall(SubSublet *s)
{
if(s)
{
lua_settop(state, 0); (1)
lua_rawgeti(state, LUA_REGISTRYINDEX, s->ref);
if(lua_pcall(state, 0, 1, 0)) (2)
{
if(s->flags & SUB_SUBLET_FAIL_THIRD) (3)
{
subLogWarn("Unloaded sublet (#%d) after 3 failed attempts\n", s->ref);
subSubletDelete(s);
return;
}
else if(s->flags & SUB_SUBLET_FAIL_SECOND) s->flags |= SUB_SUBLET_FAIL_THIRD;
else if(s->flags & SUB_SUBLET_FAIL_FIRST) s->flags |= SUB_SUBLET_FAIL_SECOND;
subLogWarn("Failed attempt #%d to call sublet (#%d).\n",
(s->flags & SUB_SUBLET_FAIL_SECOND) ? 2 : 1, s->ref);
}
switch(lua_type(state, -1)) (4)
{
case LUA_TNIL: subLogWarn("Sublet (#%d) does not return any usable value\n", s->ref); break;
case LUA_TNUMBER: s->number = (int)lua_tonumber(state, -1); break;
case LUA_TSTRING:
if(s->string) free(s->string);
s->string = strdup((char *)lua_tostring(state, -1));
break;
default:
subLogDebug("Sublet (#%d) returned unknown type %s\n", s->ref, lua_typename(state, -1));
lua_pop(state, -1);
}
}
}
}
| 1 | A bit of stack setup upfront and retrieval of the stored sublet reference from the registry |
| 2 | Here we call lua_pcall, which abstracts and hides the nasty setjmp and longjmp handling from us |
| 3 | Looks like I discovered bitflags there and utilized it for error handling |
| 4 | Type handling for a more generic interface |
Fast-forwarding with subtle, I replaced Lua with Ruby after a while, which is an entirely different way of integration, but let us just stick to our recipe here and make one mistake after another.
This time we can keep it short and simple, since I am using it daily on several devices and can easily provide screenshots without messing with outdated and broken builds[10].
So when we finally start subtle everything comes together, and we see known pieces from the other projects before, which is more or less entirely the same.
Just feel free to skip the next few listings and join us later and for the ones remaining..
Just kidding, here is the promised triplet of loading info, config and the battery thingy:
$ subtle -d :2 -c subtle.rb -s sublets
subtle 0.12.6606 - Copyright (c) 2005-present Christoph Kappel
Released under the GNU General Public License
Compiled for X11R0 and Ruby 2.7.8
Display (:2) is 640x480
Running on 1 screen(s)
ruby: warning: already initialized constant TMP_RUBY_PREFIX
Reading file `subtle.rb'
Reading file `sublets/battery.rb'
Loaded sublet (battery)
Reading file `sublets/fuzzytime.rb'
Loaded sublet (fuzzytime)
The config looks a bit different, mainly because we are now using a custom DSL, but we are going to cover this part in detail shortly, promised.
# Style for all style elements
style :all do (1)
foreground "#757575"
background "#202020"
icon "#757575"
padding 0, 3
font "-*-*-*-*-*-*-14-*-*-*-*-*-*-*"
#font "xft:sans-8"
end
# Style for the all views
style :views do (2)
# Style for the active views
style :focus do
foreground "#fecf35"
end
# --- %< --- snip --- %< ---
end
| 1 | Ruby is famous for metaprogramming and we obviously make use of it here |
| 2 | Styles are a CSS-like way of configuring colors in subtle - batteries and inheritance included |
And lastly, a quick glimpse into the battery sublet, which naturally also makes use of the mentioned DSL:
on :run do |s|
begin (1)
now = IO.readlines(s.now).first.to_i
state = IO.readlines(s.status).first.chop
percent = (now * 100 / s.full).to_i
# --- %< --- snip --- %< ---
# Select icon for state
icon = case state (2)
when "Charging" then :ac
when "Discharging"
case percent
when 67..100 then :full
when 34..66 then :low
when 0..33 then :empty
end
when "Full" then :ac
else :unknown
end
s.data = "%s%s%s%d%%" % [
s.color_icon ? s.color : s.color_def, s.icons[icon],
s.color_text ? s.color : s.color_def, percent
]
rescue => err # Sanitize to prevent unloading
s.data = "subtle"
p err
end
end
| 1 | Ruby comes with exception handling and this eases the whole scripting part greatly |
| 2 | Aww, this kind of reminds of Rust <3 |
So when we talk about metaprogramming, what exactly is different here? If you have a closer look at the previous examples, we mostly defined data structures and methods there, which were later collected during load and/or actually called by the host application. In other words, our scripts defined an API according to the rules of the host application, which then ran it. With metaprogramming, we turn this around: we define methods and provide an API for our scripts to call.
The Ruby integration in subtle is quite vast, and there are many cool things I’d like to show, but time is precious, as is our attention span, and sobriety is in order. So we have to cut a few corners here and there and follow loads of indirection and abstraction, but I think we better stay with the styles excerpt from above.
Loading styles from the config consists of following basic building blocks:
void subRubyInit(void) {
VALUE config = Qnil, options = Qnil, sublet = Qnil;
/* --- %< --- snip --- %< --- */
config = rb_define_class_under(mod, "Config", rb_cObject); (1)
/* Class methods */
rb_define_method(config, "style", RubyConfigStyle, 1); (2)
/* --- %< --- snip --- %< --- */
}
| 1 | Define a holding class for our method definition |
| 2 | Define the actual method style and bind it to RubyConfigStyle |
void subRubyLoadConfig(void) {
VALUE klass = Qnil;
/* Load supplied config or default */
klass = rb_const_get(mod, rb_intern("Config")); (1)
config_instance = rb_funcall(klass, rb_intern("new"), 0, NULL);
rb_gc_register_address(&config_instance); (2)
if (Qfalse == RubyConfigLoadConfig(config_instance,
rb_str_new2(subtle->paths.config ? subtle->paths.config : PKG_CONFIG))) { (3)
subSubtleFinish();
exit(-1);
} else if (subtle->flags & SUB_SUBTLE_CHECK) {
printf("Syntax OK\n");
}
/* --- %< --- snip --- %< --- */
}
| 1 | Call back our config class and create a new instance |
| 2 | Take care that the internal garbage collector doesn’t get rid of it |
| 3 | Wrap it again and continue in the next snippet |
static VALUE RubyConfigLoadConfig(VALUE self, VALUE file) {
/* --- %< --- snip --- %< --- */
printf("Reading file `%s'\n", buf);
/* Carefully load and eval file */
rargs[0] = rb_str_new2(buf);
rargs[1] = self;
rb_protect(RubyWrapEvalFile, (VALUE) &rargs, &state); (1)
if (state) {
subSubtleLogWarn("Cannot load file `%s'\n", buf);
RubyBacktrace();
return Qfalse;
}
return Qtrue;
} /* }}} */
| 1 | Ruby uses its own version of setjmp and longjmp, so wrap everything up and pass it over |
/* RubyWrapEvalFile */
static VALUE RubyWrapEvalFile(VALUE data) {
VALUE *rargs = (VALUE *) data, rargs2[3] = {Qnil};
/* Wrap data */
rargs2[0] = rb_funcall(rb_cFile, rb_intern("read"), 1, rargs[0]); (1)
rargs2[1] = rargs[0];
rargs2[2] = rargs[1];
rb_obj_instance_eval(2, rargs2, rargs[1]); (2)
return Qnil;
} /* }}} */
| 1 | Then we use the internal symbol rb_cFile to call File#read on our arguments |
| 2 | And then a final eval - see we adhere to the motto! |
Actually we covered this already in the previous section, so nothing to be done here and we better hurry.
During the 2020s lots of weird things happened, and I was forced into my own sort of crisis, being stuck with and on macOS for some years. Needless to say the window management there totally annoyed me, and I started another highly ambitious project aptly named touchjs.
There, I tied the new[11] Touch Bar, basic window management via Accessibility API and a JavaScript integration based on duktape together.
Unfortunately we are back at build problems: Somehow, and totally inexplicably to me, I forgot to check in some essential headers of the project, which led to a full halt:
$ make
clang -c -mmacosx-version-min=10.12 -x objective-c src/touchjs.m -o src/touchjs.o
src/touchjs.m:17:10: fatal error: 'delegate.h' file not found
17 | #include "delegate.h"
| ^~~~~~~~~~~~
1 error generated.
make: *** [src/touchjs.o] Error 1
$ make
clang -c -mmacosx-version-min=10.12 -x objective-c src/touchbar.m -o src/touchbar.o
src/touchbar.m:57:23: error: use of undeclared identifier 'kQuit'
57 | [array addObject: kQuit];
| ^
src/touchbar.m:149:21: warning: class method '+presentSystemModalTouchBar:systemTrayItemIdentifier:' not found (return type defaults to 'id') [-Wobjc-method-access]
149 | [NSTouchBar presentSystemModalTouchBar: self.groupTouchBar
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
150 | systemTrayItemIdentifier: kGroupButton];
| ~~~~~~~~~~~~~~~~~~~~~~~~
src/touchbar.m:150:39: error: use of undeclared identifier 'kGroupButton'; did you mean 'kGroupIcon'?
150 | systemTrayItemIdentifier: kGroupButton];
| ^~~~~~~~~~~~
| kGroupIcon
# --- %< --- snip --- %< ---
Fixing something that isn’t there is quite difficult, and it took me some time and reading of reference manuals to understand what I actually had to restore. When I made the first progress there, I suddenly remembered that I do in fact have a backup of the MacBook Pro from back then.
Although I really had fun playing with it, the project never saw real usage. Luckily I already worked test-driven, so I can show off these test scripts written in JavaScript[12] along with some resulting shots of the Touch Bar:
/* WM */
var wm = new TjsWM(); (1)
tjs_print("wm: trusted=" + wm.isTrusted());
/* Events */
wm.observe("win_open", function (win) {
tjs_print("Open: name=" + win.getTitle() + ", id=" + win.getId() + ", frame=" + win.getFrame()); (2)
});
| 1 | Highly ambitious as I’ve promised |
| 2 | Well, just print some details of windows in the normal state |
And some more with actual UI elements:
var b = new TjsButton("Test")
.setBgColor(255, 0, 0)
.bind(function () {
tjs_print("Test");
});
/* Attach */
tjs_attach(b);
/* --- %< --- snip --- %< --- */
var b4 = new TjsButton("Exec")
.setBgColor(255, 0, 255)
.bind(function () {
var c1 = new TjsCommand("ls -l src/");
tjs_print(c1.exec().getOutput());
});
var s1 = new TjsSlider(0)
.bind(function (value) {
tjs_print(value + "%");
rgb[idx] = parseInt(255 * value / 100);
l1.setFgColor.apply(l1, rgb);
});
var sc1 = new TjsScrubber()
.attach(b1)
.attach(b2)
.attach(b3)
.attach(b4);
/* Attach */
tjs_attach(l1);
tjs_attach(sc1);
tjs_attach(s1);
We could go into detail here about how the loading process and error handling work in Obj-C, but I ultimately replaced Obj-C with Rust and later on also got rid of the MacBook. So, interested in how this can be done in Rust? Bet you are!
Around 2023 I started another pet project under the nice moniker rubtle. I can only guess what my plans for it were, but it might have been a glimpse into the future; more on that later, when we talk about the last project of this blog post. Whatever the plans were, I didn’t spend too much time on it and rubtle isn’t polished in any sense.
So why do I mention it at all, you might ask? Within rubtle I followed a different approach we haven’t covered so far. Instead of inventing my own API, I created a bridge[14] and allowed the scripts to interact directly with the underlying engine:
fn main() {
let args: Vec<String> = env::args().collect();
if 1 < args.len() {
let contents = fs::read_to_string(&args[1]); (1)
let rubtle = Rubtle::new();
init_global(&rubtle);
init_rubtle(&rubtle); (2)
match contents {
Ok(val) => rubtle.eval(&val),
Err(_) => eprintln!("File read failed"),
}
} else {
println!("Usage: {}: <JSFile>", args[0]);
}
}
| 1 | Just file loading, no surprises here yet |
| 2 | Now it is getting exciting - off to the next listing! |
fn init_rubtle(rubtle: &Rubtle) {
#[derive(Default)]
struct UserData {
value: i32,
};
let mut object = ObjectBuilder::<UserData>::new() (1)
.with_constructor(|inv| {
let mut udata = inv.udata.as_mut().unwrap();
udata.value = 1;
})
.with_method("inc", |inv| -> CallbackResult<Value> { (2)
let mut udata = inv.udata.as_mut().unwrap();
udata.value += 1;
Ok(Value::from(udata.value))
})
/* --- %< --- snip --- %< --- */
.build();
rubtle.set_global_object("Rubtle", &mut object); (3)
}
| 1 | Using the builder pattern was really a fight for me back then |
| 2 | Here we are assembling an object by adding some values and methods |
| 3 | And we register this as a global object |
Compiled, ready and armed we can feed this fancy test script into it:
var rubtle = new Rubtle();
rubtle.set(5);
assert(5, rubtle.get(), "Damn"); (1)
rubtle.inc();
assert(6, rubtle.get(), "Damn");
print(rubtle.get()) (2)
| 1 | Seriously no idea.. |
| 2 | Print the final value |
$ RUSTFLAGS=-Awarnings cargo run -- ./test.js
Compiling rubtle-duktape v0.1.0 (/home/unexist/projects/rubtle/rubtle-duktape) (1)
Compiling rubtle-lib v0.1.0 (/home/unexist/projects/rubtle/rubtle-lib) (2)
Compiling rubtle v0.1.0 (/home/unexist/projects/rubtle/rubtle)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.03s
Running `target/debug/rubtle ./test.js`
<JS> "6" (3)
Inside rubtle-lib is lots of scary stuff and I don’t want to scare away my dear readers, so the next excerpt is boiled down and absolutely safe to handle:
impl Rubtle {
/* --- %< --- snip --- %< --- */
///
/// Set value to context and assign a global reachable name
///
/// # Arguments
///
/// `name`- Name of the value
/// `rval` - The actual value
///
/// # Example
///
/// use rubtle_lib::{Rubtle, Value};
///
/// let rubtle = Rubtle::new();
/// let rval = Value::from(4);
///
/// rubtle.set_global_value("rubtle", &rval);
///
pub fn set_global_value(&self, name: &str, rval: &Value) {
unsafe {
let cstr = CString::new(to_cesu8(name));
match cstr {
Ok(cval) => {
self.push_value(rval); (1)
ffi::duk_require_stack(self.ctx, 1); (2)
ffi::duk_put_global_lstring(
self.ctx,
cval.as_ptr(),
cval.as_bytes().len() as u64,
);
}
Err(_) => unimplemented!(),
}
}
}
/* --- %< --- snip --- %< --- */
}
unsafe extern "C" fn fatal_handler(_udata: *mut c_void, msg: *const c_char) { (3)
let msg = from_cesu8(CStr::from_ptr(msg).to_bytes())
.map(|c| c.into_owned())
.unwrap_or_else(|_| "Failed to decode message".to_string());
eprintln!("Fatal error from duktape: {}", msg);
process::abort();
}
| 1 | Did I mention duktape is also a stack machine and exposes this type of API? |
| 2 | This is a similar handling of the stack like we’ve seen in Lua |
| 3 | And we essentially provide an error handler to trap fatal errors when they occur |
| Ease of use | Richness of API | Language agnostic | Error handling | Performance |
|---|---|---|---|---|
| Low to complex; depends on the chosen language | You provide the API; can be a full-fledged interface but also just a simple bridge | Absolutely; there usually are many bindings and they can also be created with FFI | Depends a bit on the language, but ranges from easy to complex | Another thing that depends on the embedder and the embeddee[15] |
I think WebAssembly is one of the more interesting topics from the web technology cosmos. It allows creating binaries from a plethora of languages and running them mostly at full speed directly inside stack-based[16] virtual machines. Originally meant for embedding in the web, it can also be utilized in other types of software to provide more flexibility when required, but also raw speed on execution.
There is lots of movement and things might change quite often, but frameworks provide stability here where required. Extism is such a framework and also the one used in my latest project subtle-rs, a re-write in Rust and the spiritual successor of subtle.
subtle-rs is under active development and therefore a piece of cake to demonstrate:
In contrast to the other projects, subtle-rs doesn’t use a scripting language for its config, but relies on a simple TOML file. Therefore it doesn’t make sense to go into detail here. If you are still curious, just check the repository: https://github.com/unexist/subtle-rs/blob/master/subtle.toml
Startup and loading the four existing plugins works like a charm:
$ cargo run -- -d :2 --config-file ./demo.toml
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.12s
Running `target/debug/subtle-rs -d ':2' --config-file ./demo.toml`
[2026-01-25T15:48:20Z INFO subtle_rs] Reading file `"./demo.toml"'
[2026-01-25T15:48:20Z INFO subtle_rs] subtle-rs 0.1.0 - Copyright (c) 2025-present Christoph Kappel <[email protected]>
[2026-01-25T15:48:20Z INFO subtle_rs] Released under the GNU GPLv3
[2026-01-25T15:48:20Z INFO subtle_rs] Compiled for X11
[2026-01-25T15:48:20Z INFO subtle_rs::display] Display (:2) is 640x480
[2026-01-25T15:48:20Z INFO subtle_rs::plugin] Loaded plugin (time) (1)
[2026-01-25T15:48:20Z INFO subtle_rs::plugin] Loaded plugin (fuzzytime) (2)
[2026-01-25T15:48:20Z INFO subtle_rs::plugin] Loaded plugin (mem) (3)
[2026-01-25T15:48:20Z INFO subtle_rs::plugin] Loaded plugin (battery) (4)
[2026-01-25T15:48:20Z INFO subtle_rs::screen] Running on 1 screen(s)
| 1 | Written in Zig - https://github.com/unexist/subtle-rs/tree/master/plugins/time |
| 2 | Written in Go - https://github.com/unexist/subtle-rs/tree/master/plugins/fuzzytime |
| 3 | Written in JavaScript - https://github.com/unexist/subtle-rs/tree/master/plugins/mem |
| 4 | Written in Rust - https://github.com/unexist/subtle-rs/tree/master/plugins/battery |
Under the hood, the integration works a bit differently from the embeddings before. The plugins run alone and isolated in their own virtual machine, and all capabilities besides the ones provided by the language and the wasm target must be exported by the embedding host. On the other side, the plugin can also define and export methods, which in turn can be called by the host.
Creating such exports and loading a plugin is quite easy with Extism:
/* --- %< --- snip --- %< --- */
host_fn!(get_battery(_user_data: (); battery_idx: String) -> String { (1)
let charge_full = std::fs::read_to_string(
format!("/sys/class/power_supply/BAT{}/charge_full", battery_idx))?; (2)
let charge_now = std::fs::read_to_string(
format!("/sys/class/power_supply/BAT{}/charge_now", battery_idx))?;
Ok(format!("{} {}", charge_full.trim(), charge_now.trim()))
});
/* --- %< --- snip --- %< --- */
pub(crate) fn build(&self) -> Result<Plugin> {
let url = self.url.clone().context("Url not set")?;
// Load wasm plugin
let wasm = Wasm::file(url.clone());
let manifest = Manifest::new([wasm]);
let plugin = extism::PluginBuilder::new(&manifest) (3)
.with_wasi(true)
/* --- %< --- snip --- %< --- */
.with_function("get_battery", [PTR], [PTR],
UserData::default(), Self::get_battery) (4)
.build()?;
debug!("{}", function_name!());
Ok(Plugin {
name: self.name.clone().context("Name not set")?,
url,
interval: self.interval.unwrap(),
plugin: Rc::new(RefCell::new(plugin)),
})
}
/* --- %< --- snip --- %< --- */
| 1 | The macro host_fn! allows us to define functions for our WebAssembly guest |
| 2 | Funny how the path of the acpi interface has changed over the years |
| 3 | Extism also provides an easy-to-use loader |
| 4 | Time to register our host function |
And just to complete the usual triplet again, here is what the battery plugin actually does:
#[host_fn("extism:host/user")]
extern "ExtismHost" {
fn get_battery(battery_idx: String) -> String; (1)
}
#[plugin_fn] (2)
pub unsafe fn run<'a>() -> FnResult<String> {
let values: String = unsafe { get_battery("0".into())? }; (3)
info!("battery {}", values);
let (charge_full, charge_now) = values.split(" ") (4)
.filter_map(|v| v.parse::<i32>().ok())
.collect_tuple()
.or(Some((1, 0)))
.unwrap();
Ok(format!("{}%", charge_now * 100 / charge_full))
}
| 1 | This imports the function from the host |
| 2 | Mark this function for export to the host |
| 3 | Sadly the unsafe here is required… |
| 4 | Pretty straightforward - parse and convert with a bit of error checking - one line |
Due to the isolation of the plugins, the error handling happens inside the virtual machine:
/* --- %< --- snip --- %< --- */
impl Plugin {
pub(crate) fn update(&self) -> Result<String> {
let res = self.plugin.borrow_mut().call("run", "")?; (1)
debug!("{}: res={}", function_name!(), res);
Ok(res)
}
/* --- %< --- snip --- %< --- */
}
| 1 | Just a quick call and result check of the plugin function |
| Ease of use | Richness of API | Language agnostic | Error handling |
|---|---|---|---|
| Depends on the language, but you can pick from the list of supported ones | All noteworthy API must be provided by the host, like time | Yes, the list of supported languages is quite nice | Extism offers easy integration and error checking |
Time for a conclusion after such a marathon through many ideas, languages and projects, so we can call it a day. We have seen different approaches to providing an API to essentially shape what a guest or plugin can do in your application. And we have also covered error checking and seen how it can range from being arcane and nasty to being handled entirely by your framework.
I think, applied with care, the integration of scripting languages can be a great way to lower the hurdle of providing new feature sets. It can also allow different audiences, not familiar with the host language or host domain, to enrich it. And additionally, approaches like WebAssembly allow combining the raw processing speed of compiled languages with the ease of use of scripting.
The list of examples is quite long, but please help yourself:
mem.c if you are curious
During my career, facing legacy code has always been an annoying task, and it took me quite some years to understand that oftentimes today’s code is tomorrow’s legacy. Still, legacy code can be a great opportunity to learn something new, especially when you are the original author of the piece.
This post jumps on the bandwagon of rewriting everything in Rust and elaborates a bit on my personal motivation and learnings from rewriting my pet window manager project subtle, which I started ~20 years[1] ago and still use on a daily basis.
Among the many things AI can do for us, migrating code from one language to another is usually a strong selling point, and even without AI there are excellent standalone tools, like C2Rust, to get the job done with just a flick of a finger.
So the why is an excellent question.
One of my main motivators isn’t just to get the job done, as I lamented a bit in my previous blog post, but to have a learning experience and take something from it besides another code base which easily ticks every point of the legacy code checklist.
The manual labor probably isn’t the most controversial aspect of it, but porting an X11 application in the epoch of Wayland might look like a waste of time.
Alas, the reasoning here is basically the same. Plus, I’ve spent many years with X11 learning its core concepts and I still like the system and its capabilities.
On a side note - I am not entirely certain there is a giant switch to get rid of X11 yet, despite how decisions of e.g. the GNOME project[2] might appear.
Porting a codebase, like the one of subtle with 14728 LoC (according to sloccount[3]), brought loads of challenges with it. Some of them were the usual ones like "where to start" and how can this be done in language X, but let us concentrate here on a handful of interesting points.
| The problems are inter-related, and it is sometimes a chicken-and-egg type of problem which one to address first, so please be prepared to jump around a bit if necessary. |
When I started subtle back then, I didn’t even know that this pattern is called God Object or that it is considered a prime example of an anti-pattern. To me it was something I had learned by reading other people’s code, and it looked like a good solution to a problem which is still relevant today.
The main problem is kind of easy to explain and mainly related to software design: Your program needs to keep track of data like state or socket descriptors and many related functions have to access and sometimes mutate them.
There are several ways to tackle it, like moving everything related together, but this can also mean there is basically just one big file, and C isn’t the strongest language at enforcing proper structure and coherence. It was way easier to have a global object which included every bit and was available throughout the program.
This might obviously lead to interesting side-effects in multi-threaded applications, but fortunately the design goal of subtle has always been to be single-threaded, so no other means of locking were required.
What I did not understand back then, and which is more of a concern here, is the implicit coupling of everything to this god object. This means changing the god object may require changes in other parts of the program and may also unknowingly break other parts of the application.
subtle-rs (as its predecessor) is event-driven and many parts revolve around a single connection to the X11 server. This connection must be available to most parts and moving everything into the holding object made proper separation of concerns more difficult.
Like every worthwhile decision this is a classical trade-off, and the original design was kept with the addition of carrying the dependency explicitly through the codebase.
void subClientSetSizeHints(SubClient *c, int *flags) {
...
}
pub(crate) fn set_size_hints(&mut self, subtle: &Subtle, mode_flags: &mut ClientFlags) -> Result<()> { (1)
...
}
| 1 | The signature includes a reference to Subtle. |
Resource acquisition is initialization (RAII) is another programming idiom, which is less of a concern in C-based languages, but can turn into a problem in strict languages like Rust. Simply put, this just means that whenever we initialize something like a holding structure, we also have to initialize all of its members, due to the general idea of predictable runtime behavior and zero-cost abstractions.
This easily turns into a problem, whenever the holding structure contains something, that requires some preparation before it can be initialized - like a socket connection:
typedef struct {
Connection *conn;
} Holder;
Holder *holder = calloc(1, sizeof(Holder)); (1)
holder->conn = MagicallyOpenConnection(); (2)
| 1 | Init the holding structure |
| 2 | Open the actual connection |
Since this is a more general problem in Rust, there exists a bunch of options with different ergonomics. One of the easiest ways is to wrap the connection in an Option, which can be initialized with its default value and set later, but as I’ve said, the ergonomics of mutating[4] something on the inside are bothersome.
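To illustrate the bothersome part, here is a minimal sketch of the Option-based variant - the Holder type and a plain String standing in for a real connection are my own hypothetical stand-ins, not subtle-rs code:

```rust
// Hypothetical holder: the Option starts as None and is filled in later.
struct Holder {
    conn: Option<String>, // a String stands in for a real connection type
}

fn main() {
    let mut holder = Holder { conn: None };

    // The late initialization itself is easy enough...
    holder.conn = Some(String::from("connected"));

    // ...but every single use site has to deal with the possible None:
    match holder.conn.as_ref() {
        Some(conn) => println!("using {}", conn),
        None => eprintln!("not connected yet"),
    }
}
```

Every access pays the unwrap-or-match tax, even long after we know the value is set.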
A better alternative is to let one of the many cells[5] handle this job. OnceCell, as the name implies, offers an easy way to initialize our socket once we are prepped.
typedef struct subtle_t {
...
Display *dpy; //< Subtle Xorg display
...
} SubSubtle;
extern SubSubtle *subtle; (1)
| 1 | God mode - on! |
void subDisplayInit(const char *display) { (1)
...
/* Connect to display and setup error handler */
if (!(subtle->dpy = XOpenDisplay(display))) {
...
}
| 1 | We usually pass the ENV var DISPLAY, but NULL is also an accepted value. |
int main(int argc, char *argv[]) {
...
/* Create subtle */
subtle = (SubSubtle *) (subSharedMemoryAlloc(1, sizeof(SubSubtle))); (1)
...
}
| 1 | This is just calloc with some error handling. |
pub(crate) struct Subtle {
...
pub(crate) conn: OnceCell<RustConnection>,
...
}
impl Default for Subtle { (1)
fn default() -> Self {
Subtle {
...
conn: OnceCell::new(), (2)
...
}
}
}
| 1 | Unfortunately deriving the Default trait doesn’t work for all members of Subtle. |
| 2 | This initializes our OnceCell with its default value. |
pub(crate) fn init(config: &Config, subtle: &mut Subtle) -> Result<()> {
let (conn, screen_num) = x11rb::connect(Some(&*config.display))?;
...
subtle.conn.set(conn).unwrap(); (1)
...
}
| 1 | Error handling here would require more explanation, so let us just forget about it and move on. |
fn main() -> Result<()> {
...
// Init subtle
let mut subtle = Subtle::from(&config); (1)
...
display::init(&config, &mut subtle)?;
...
}
| 1 | Config holds the configured values - a courtesy of clap - and we convert it with the help of our From trait implementation. |
Did you wonder why the (in)famous borrow checker isn’t number one on our list of problems? Well, simply because you can come pretty far without running into beloved errors like E0499 or E0502, and grouping the problems to keep a common thread is quite difficult.
Anyway, back to the topic at hand: Why can’t we just keep a mutable reference of our god object all the time and pass it around?
Interestingly this is again more about software design and Rust’s pragmatic way of handling mutability in contrast to other (functional) languages like Haskell. Please have a look at the next code block:
#[derive(Default)] (1)
struct Counter {
number: u32,
}
impl Counter {
fn increment(&mut self) { (2)
self.number += 1;
}
fn print(&mut self) { (3)
println!("number={}", self.number);
}
}
fn increment_counter(counter: &mut Counter) { (4)
counter.number += 1;
}
fn print_counter(counter: &mut Counter) { (5)
println!("counter={}", counter.number);
}
fn main() {
let mut counter = Counter::default();
counter.increment(); (6)
counter.print(); (7)
increment_counter(&mut counter); (8)
print_counter(&mut counter); (9)
}
| 1 | Derive is one of Rust’s real work horses. |
| 2 | Mut required due to write to binding. |
| 3 | Is mut required here? |
| 4 | Mut! |
| 5 | Mut? |
| 6 | Implied mut! |
| 7 | Implied mut? |
| 8 | Mut! |
| 9 | Why mut? |
If you don’t mind trailing all those terribly explicit mut keywords, the above code runs fine, and if you don’t try to re-borrow anything, the aliasing rules work in your favor.
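To make the re-borrowing point concrete, here is a small sketch of my own (not from subtle-rs): the commented-out call is the kind of aliasing the borrow checker rejects with E0499, while sequential mutable borrows are fine:

```rust
#[derive(Default)]
struct Counter {
    number: u32,
}

// Hypothetical helper taking two mutable references at once.
fn add_both(a: &mut Counter, b: &mut Counter) {
    a.number += b.number;
}

fn main() {
    let mut counter = Counter::default();
    counter.number = 1;

    // This re-borrow would trigger E0499 - two live mutable borrows:
    // add_both(&mut counter, &mut counter);

    // Sequential mutable borrows with disjoint lifetimes are fine:
    let r1 = &mut counter;
    r1.number += 1;
    let r2 = &mut counter; // r1 is no longer used, so this is allowed
    r2.number += 1;

    println!("number={}", counter.number);
}
```

Thanks to non-lexical lifetimes the second borrow is accepted as soon as the first one is done, which is why straightforward pass-it-around code rarely trips the checker.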
A different story is the coupling and the cognitive load: When everything gets a mutable reference, everything is coupled together and you can never be sure about the side-effects of calling a certain function.
The easiest and most naive solution to this kind of problem is to just omit mut wherever possible.
#[derive(Default)]
struct Counter {
number: u32,
}
impl Counter {
fn increment(&mut self) {
self.number += 1;
}
fn print(&self) { (1)
println!("number={}", self.number);
}
}
fn increment_counter(counter: &mut Counter) {
counter.number += 1;
}
fn print_counter(counter: &Counter) { (2)
println!("number={}", counter.number);
}
fn main() {
let mut counter = Counter::default();
counter.increment();
counter.print();
increment_counter(&mut counter);
print_counter(&counter); (3)
}
| 1 | This access is just read-only, so no need for mut, and also a promise of being side-effect free. |
| 2 | See ❶! |
| 3 | See ❶! |
Now it’s getting interesting, and we have to talk about the given promises of immutability and one more time about the ergonomics of our general design.
With the last problem we established the underlying promise that functions which don’t require a mutable reference will never change the object itself, and only changes made through a mutable reference are of any consequence to you.
But what happens when you need to change some internal state which is just required for internal bookkeeping and doesn’t change anything at all for the caller?
Have a look at following contrived[6] example:
use std::time::{SystemTime, UNIX_EPOCH};
#[derive(Default)]
struct Counter {
number: u32,
last_printed: u32,
}
impl Counter {
fn increment(&mut self) {
self.number += 1;
}
fn print(&mut self) { (1)
self.last_printed = SystemTime::now()
.duration_since(UNIX_EPOCH).unwrap().as_secs() as u32; (2)
println!("number={}", self.number);
}
}
fn main() {
let mut counter = Counter::default();
counter.increment();
counter.print();
}
| 1 | To allow our internal bookkeeping the signature must include mut now. |
| 2 | Error checking skipped for brevity - unwrap all the things! |
Here we had to change the method’s signature just to allow the pointless action of storing the last printing time, maybe for big data applications, who knows.
From the caller’s perspective it doesn’t make any sense to pass a mutable reference into the print function, and from the counter’s perspective[7] there wasn’t any actual change of the number.
This is a pretty common problem and Rust provides many different options like Cell and RefCell, Atomic and some more advanced options like the smart pointer Arc for more shenanigans.[8]
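Applied to the contrived counter from before, a Cell is enough to keep the bookkeeping out of the public signature - a minimal sketch of my own, not code from subtle-rs:

```rust
use std::cell::Cell;
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(Default)]
struct Counter {
    number: u32,
    last_printed: Cell<u32>, // interior mutability just for bookkeeping
}

impl Counter {
    fn increment(&mut self) {
        self.number += 1;
    }

    fn print(&self) { // back to &self - the promise to the caller is kept
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH).unwrap().as_secs() as u32;
        self.last_printed.set(now); // Cell::set works through a shared reference
        println!("number={}", self.number);
    }
}

fn main() {
    let mut counter = Counter::default();
    counter.increment();
    counter.print();
}
```

The caller sees an immutable print again, while the timestamp is quietly tucked away inside the Cell.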
In our case Cell works splendidly, since our type comes prepared with the Copy trait:
typedef struct subsubtle_t {
...
int visible_tags; //< Subtle visible tags
...
} SubSubtle;
void subScreenConfigure(void) {
...
/* Reset visible tags, views and available clients */
subtle->visible_tags = 0; (1)
...
/* Set visible tags and views to ease lookups */
subtle->visible_tags |= v->tags;
...
}
| 1 | No one can stop us from just accessing our god object directly. |
pub(crate) struct Subtle {
...
pub(crate) visible_tags: Cell<Tagging>,
...
}
impl Default for Subtle {
fn default() -> Self {
Subtle {
...
visible_tags: Cell::new(Tagging::empty()),
...
}
}
}
pub(crate) fn configure(subtle: &Subtle) -> Result<()> {
...
// Reset visible tags, views and available clients
let mut visible_tags = Tagging::empty(); (1)
...
// Set visible tags and views to ease lookups
visible_tags.insert(view.tags);
...
subtle.visible_tags.replace(visible_tags); (2)
...
}
| 1 | This is a pretty easy case: We introduce a local variable via let binding first. |
| 2 | And then once we are happy with the result we tell the cell to swap-out the content entirely. |
Like with mutability, Rust is similarly annoyingly verbose and explicit about how it handles data and copies of it. It seems that to keep all the guarantees and promises, some work has to be done upfront by every side.
In the next example we just continue with the counter from before, but the repetition of the struct definition and implementation has been removed, since it just distracts from the actual problem:
...
fn print_counter(counter: &Counter) {
counter.print();
}
fn main() {
let mut counter1 = Counter::default();
counter1.increment();
let counter2 = counter1; (1)
print_counter(&counter1);
print_counter(&counter2);
}
| 1 | D’oh! |
The above snippet fails to compile for apparent reasons, still the error message of the compiler is kind of a surprise in its detail and content:
error[E0382]: borrow of moved value: `counter1`
--> src/main.rs:27:19
|
21 | let mut counter1 = Counter::default();
| ------------ move occurs because `counter1` has type `Counter`, which does not implement the `Copy` trait
...
25 | let counter2 = counter1;
| -------- value moved here
26 |
27 | print_counter(&counter1);
| ^^^^^^^^^ value borrowed here after move
|
note: if `Counter` implemented `Clone`, you could clone the value
--> src/main.rs:2:1
|
2 | struct Counter {
| ^^^^^^^^^^^^^^ consider implementing `Clone` for this type
...
25 | let counter2 = counter1;
| -------- you could clone this value
For more information about this error, try `rustc --explain E0382`.
error: could not compile `example` (bin "example") due to 1 previous error
This is just an example of a really overwhelming and also quite helpful error message from our partner in crime - the Rust compiler. What it points out here is that we can just add the Copy marker trait and also derive the Clone trait to satisfy this move.
And like our friendly compiler told us, when we just do as suggested the code runs perfectly fine:
#[derive(Default, Clone, Copy)]
struct Counter {
number: u32,
}
This innocent assignment over there just introduced the concept of move semantics, which Rust uses internally in its affine type system:
An affine resource can be used at most once, while a linear one must be used exactly once.
The definition is quite heavy and somewhat unwieldy, but what it basically says is: every type that doesn’t come with the Copy marker trait is moved, and the ownership is transferred to the recipient. All other types are just copied along the way.
Accessing the object afterward is a violation of the ownership[9] model and hence causes such an error.
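If blanket Copy semantics feel too implicit for a bigger type, deriving only Clone keeps every copy an explicit call - a small sketch of my own continuing the counter example:

```rust
#[derive(Default, Clone)] // Clone only - moves stay the default behavior
struct Counter {
    number: u32,
}

fn main() {
    let mut counter1 = Counter::default();
    counter1.number += 1;

    let counter2 = counter1.clone(); // explicit copy instead of a silent move

    // counter1 is still usable, because we cloned instead of moving:
    println!("{} {}", counter1.number, counter2.number);
}
```

This is also the usual choice for types holding heap data like String or Vec, where a bitwise Copy isn’t possible anyway.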
Writing this blog post has been an interesting experience on its own and helped me to sharpen my understanding of how Rust internally works and also helped me to summarize what I actually learned about it over the course of this project.
Porting such a large codebase from my past into a modern language and also re-visiting many of the design choices made back then has been a great experience so far. And in regard to the legacy code aspect I mentioned initially - there are tests, but still even I don’t understand some of the odd names for variables and steps in the algorithms anymore. Maybe I should have read Clean Code some years earlier [cleancode].
I currently do not dare to use subtle-rs as my daily window manager yet, mainly because some required features are still missing like something simple to bring e.g. a clock into the panel, but I am eagerly looking at Extism for this matter.
Naturally I’ve read some books about Rust if you are looking for inspiration:
Most of the examples were taken from following repositories:
[idiomaticrust] Brenden Matthews, Idiomatic Rust: Code Like a Rustacean, Manning 2024
[coderustpro] Brenden Matthews, Code Like a Pro in Rust, Manning 2024
[asyncrust] Maxwell Flitton, Caroline Morton, Async Rust: Unleashing the Power of Fearless Concurrency, O’Reilly 2024
[effectiverust] David Drysdale, Effective Rust: 35 Specific Ways to Improve Your Rust Code, O’Reilly 2024
[cleancode] Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship, Prentice Hall 2008
It has been roughly two years since my last post regarding my experience with the state of AI (Coding with AI) and I think it is about time to talk about this again.
In contrast to my previous post, I don’t want to dwell on specific products and tools, but talk about some points that I think we should pay close attention to, and why this topic is generally such a mixed bag to me.
Enough chit-chat, let us begin.
When I look back at the last two years I can probably safely say the whole thing gained even more momentum, as we’ve left the early-adopter phase and AI got a lot of traction.
AI has entered the mainstream and almost everything gets AI support.
Content is specially prepared for AI systems apparently without following best practices.
There is an abundance of new tools and new companies sprout like pop-up stores.
There are experiments to replace general human labor and also specific ones like nurses in healthcare.
There are promises AI reduces our daily working time.
So in hindsight everything went according to plan from 2023:
I’ve spent a lot of time reading about the general progression of AI - besides dozens of blog posts and other articles for and against AI, also some books to get a broader perspective.
Among these books are the following:
I somehow begin to wonder: what kind of problems are we really trying to address with AI?
When I look at the business side, I’d say the overall themes are an increase of productivity[1], like reducing tiresome and/or manual labor, and probably the fear of missing out on competitive advantages. And on the personal side I mostly see quality-of-life improvements, like easy access to information with the help of natural interfaces like ChatGPT[2], and generative tooling to create memes and reels more easily.
This short list is non-exhaustive mind you, but is sufficient for the points I’d like to make next.
Increasing productivity and reducing work time with technology isn’t strictly speaking a new idea: blue-collar industrial workers already faced this in the mid-19th century during the industrial age, but for the first time white-collar knowledge workers are impacted, and they are probably not backed by any labour union.
There have been lots of riots and protests according to Wikipedia, so apparently the workforce wasn’t all happy with the outcome, but we aren’t there yet, so let us focus on the promise of improved work-life-balance.
Interestingly during that time a strange phenomenon could be observed:
But rather than allowing a massive reduction of working hours to free the world’s population to pursue their own projects, pleasures, visions, and ideas, we have seen the ballooning of not even so much of the ‘service’ sector as of the administrative sector, up to and including the creation of whole new industries like financial services or telemarketing, or the unprecedented expansion of sectors like corporate law, academic and health administration, human resources, and public relations. And these numbers do not even reflect on all those people whose job is to provide administrative, technical, or security support for these industries, or for that matter the whole host of ancillary industries (dog-washers, all-night pizza delivery) that only exist because everyone else is spending so much of their time working in all the other ones. These are what I propose to call ‘bullshit jobs[3]’.
This is just an excerpt of an article written for a magazine under the umbrella of things nobody else would print, as the author points out in his book Bullshit Jobs [bullshitjobsbook], but the term still hits a mark. There are a lot more examples and explanations in the book of supposedly bullshit jobs and also of jobs that just feel like one, but my key takeaway is the implied question: what can our highly skilled workforce do for a living, when their field of expertise has been replaced with automatons and we haven’t reached a moneyless utopia yet?
Hell is a collection of individuals who are spending the bulk of their time working on a task they don’t like and are not especially good at. Say they were hired because they were excellent cabinet-makers, and then discover they are expected to spend a great deal of their time frying fish.
Another issue I see with increasing productivity is the orientation towards throughput, or rather output in general, in information-related topics. Viewed from the business side, increasing quantity makes sense to me - this is what the business has been created for in the first place - but is output the only and ultimate goal, and can the learning of how to reach and achieve something be totally neglected?
If your media feeds are like mine, there is probably something about AI every two or three posts and depending on the type of media, like e.g. LinkedIn, the posts are full of promises and how the full potential of AI can be unlocked to utilize it for your business.
Oftentimes these posts appear to be written with the help of AI, and especially em-dashes enjoy increasing popularity. I think eating your own dog food is always advisable, so I see no fault there. On the other hand, I can rarely find empirical evidence or any other kind of proof for these theses, and I usually consider this a red flag - skepticism can help.
The current hype and pressure increase, and academia has also started looking into the phenomenon of a developing fear of missing out on AI. And there is an increasing number of posts and voices, besides the ones from the common AI bros, who foretell that if you don’t start to use AI today you are going to lose your edge.
| I won’t cite any of these posts, but if you are curious here is a starter: https://kagi.com/search?q=use+ai+or+lose |
Let us start with something positive: AI does a splendid job of lowering the bar to access information! Hallucinations vary between dangerous and hilarious, and some people are bold enough to state this is an original feature of LLM design, but with our previously established skepticism regarding media consumption this should be fine.
Delivering probability-based answers to questions is only part of the deal; another great application of these models is content generation, and both go perfectly hand in hand:
I personally think we should just stick to the bullet point list instead of applying a "prose-2-text" conversion twice, but I still wonder what happens to the quality of the information underneath. Writing this blog post, or writing in general, is a really time-consuming task. Drafting a new post and trying to fill the intended outline with content is a task which helps me personally to pinpoint what I really want to say[4], and I wouldn’t want to miss this journey.
I am a bit afraid the following is more than true:
After all, who are you writing for? Do you care if anybody reads it and how they respond to it? How can you expect anybody to relate to a piece of writing if it was generated by an AI model? If you can’t be bothered to write the entire article, you can’t really expect anybody else to be bothered to read it.
This is probably the most interesting point, and I think it is really difficult to imagine the world to come; visionaries like Sam Altman play a big role in it. Still, when money gets involved things sometimes turn sour, and I think one of the more recent posts from Altman condenses the problem down really well:
The implied comparison of mass-produced fast fashion with the overgeneralized idea of Software-as-a-Service is interesting by itself, although I think it is not a good one to promote your AI services. For me, two of the pain points of fast fashion are the environmental footprint and the exploitation of people in fabric factories, and according to the media the same is true for the AI industry. There are many reports on the energy requirements of AI, and the references to the Mechanical Turk are also increasing:
Amazon using this name [Amazon Mechanical Turk] or their product is surprisingly on the nose: their system also plays the function of hiding the massive amount of labor needed to make any modern AI infrastructure work. ImageNet, during its development in the late 2000s, was the largest single project hosted on the MTurk platform, according to Li. It took two and a half years and nearly 50,000 workers across 167 countries to create the dataset. In the end, the data contained over 14 million images, labeled across 22,000 categories.
I think the real point he wanted to make is that with the help of AI, cheap software can be mass-produced, instead of paying monthly fees to service providers or for individual solutions to problems. And this actually works well with software, since there is a negligible impact on the environment in contrast to physical products.
Currently, I am not exactly sure where we are on the hype cycle from the beginning of this post, but I hope the next few months and years will show a direction. We are going to see if history repeats itself in the protests of workers and if the dystopian outlook of the movie Idiocracy stays a work of fiction.
I think my personal usage of AI won’t skyrocket any time soon, since most of the time I am interested in discovering how and why something can be done, and rarely just in a fast solution. Given that I am interested in exactly that, I don’t plan on using it beyond this narrow scope: I might ask AI, but would still write it myself.
For any other stuff that can readily be automated, I totally agree with this:
[tamingsiliconvalleybook] Gary F. Marcus, Taming Silicon Valley: How We Can Ensure That AI Works for Us, The MIT Press 2024
[theaiconbook] Emily M. Bender, Alex Hanna, The AI Con: How to Fight Big Tech’s Hype and Create the Future We Want, Harper 2025
[searchesbook] Vauhini Vara, Searches: Selfhood in the Digital Age, Random House 2025
[stupidityparadoxbook] Mats Alvesson, André Spicer, The Stupidity Paradox: The Power and Pitfalls of Functional Stupidity at Work, Profile Books 2016
[bullshitjobsbook] David Graeber, Bullshit Jobs: A Theory, Simon & Schuster 2019
Handling containers is probably something a modern developer can’t and probably should not live without anymore. They provide flexibility, allow easy packaging and also sandboxing of stuff you might not want to have installed on your machine.
Like so often in tech, using something successfully doesn’t imply real understanding of how it works under the hood, but I lived quite happily with this black box, all the greasy details shrouded in mystery behind tooling like Podman. This changed when I started looking for an artifact store for our firmware binary artifacts. I quickly discovered there are many container registries available, but just a few stores for ordinary artifacts that don’t eat large parts of an engineering budget in enterprise license fees. Passing this question to my bubble led to a suggestion from a good friend to have a look at ORAS, which leverages OCI-compliant registries for exactly what I wanted to literally archive. We are already using Harbor, so moving other artifacts there as well aroused my interest.
So over the course of this article we are going to dive into the container world with a short primer of the duality of OCI, talk about basic usage and a few advanced points like SBOM and signing and conclude with my impression on the technology.
| This post includes several introductory chapters before a deep dive into a specific topic. If you are just here for the examples of how to use the tooling, quickly jump ahead - we will wait for you. |
Turns out the Open Container Initiative (OCI) isn’t a single spec by itself, but rather a governance body around several container formats and runtimes - namely:
Runtime Specification (runtime-spec)
Image Specification (image-spec)
Distribution Specification (distribution-spec)
The links lead to the related GitHub projects in case you want to build your own container engine, but I suggest we focus on image-spec, which lays out the structure in all gory details.
If you’ve dutifully studied the spec, the overall structure of an actual container will probably not surprise you. If not, believe me: they are less magical than commonly thought, can be fetched with the help of Podman and easily be dissected on the shell:
$ podman save ghcr.io/oras-project/oras:main -o oras.tar
Copying blob 08000c18d16d done |
...
Writing manifest to image destination
$ tar xvf oras.tar --one-top-level
08000c18d16dadf9553d747a58cf44023423a9ab010aab96cf263d2216b8b350.tar
...
manifest.json
repositories
$ tree oras
oras
├── 08000c18d16dadf9553d747a58cf44023423a9ab010aab96cf263d2216b8b350.tar
...
├── 29ec8736648c6f233d234d989b3daed3178a3ec488db0a41085d192d63321c72
├── json
├── layer.tar -> ../08000c18d16dadf9553d747a58cf44023423a9ab010aab96cf263d2216b8b350.tar
└── VERSION
...
├── manifest.json
└── repositories
6 directories, 23 files
| 1 | Blobs is the main directory with all addressable filesystem layers and their related metadata, defined in the appropriate JSON files config and manifest. The names of the layers are actually digests as well, but to make it easier to follow let us keep the fancy numbers. |
| 2 | Config contains meta information such as the author, runtime information like environment variables, entrypoints, volume mounts etc., as well as info about the specific hardware architecture and OS. |
| 3 | rootfs contains an ordered list of the digests that compose the actual image. |
| 4 | The manifest just links to the actual configuration by digest and to the layers. |
| 5 | And finally the index includes all available manifests and also image annotations. |
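To make the linkage between these files concrete, here is a small Python sketch of the content addressing at play (config and layer content are made up and shortened): every blob is stored under its own digest, and the manifest references config and layers purely by digest and size.

```python
import hashlib
import json

# A blob's address is simply the SHA-256 digest of its content,
# just like the entries in the blobs/ directory of an OCI image layout.
def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

config = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
layer = b"fake layer tarball"

# Hypothetical blob store: content addressed by its own digest.
blobs = {digest(config): config, digest(layer): layer}

# The manifest references config and layers purely by digest and size.
manifest = {
    "schemaVersion": 2,
    "config": {"mediaType": "application/vnd.oci.image.config.v1+json",
               "digest": digest(config), "size": len(config)},
    "layers": [{"mediaType": "application/vnd.oci.image.layer.v1.tar",
                "digest": digest(layer), "size": len(layer)}],
}

# Resolving the image means looking up each digest in the blob store
# and verifying the content actually hashes to its address.
for ref in [manifest["config"], *manifest["layers"]]:
    blob = blobs[ref["digest"]]
    assert digest(blob) == ref["digest"] and len(blob) == ref["size"]
print("all blobs verified")
```

This also explains why the filenames in the tarball above look like digests: the content *is* its own address.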
Mysteries solved, but there is still one essential piece missing - namely media types.
This probably surprises no one, but media types are also covered by a spec[2] - the media-spec.
There you can see the exhaustive list of the known types and an implementor’s todo list for compliance with the spec. Conversely, this also means that as long as we pick something different, we are free to fill layers with anything to our liking without accidentally triggering specific behaviour.
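That idea can be sketched in a few lines: an engine dispatches on a layer's media type, so an unknown type like application/octet-stream is treated as an opaque blob instead of being unpacked as a filesystem layer. The dispatch function and outcome strings below are purely illustrative, not real Podman internals.

```python
# Known OCI layer media types trigger specific handling (e.g. unpacking
# a tar or tar+gzip filesystem layer); anything else stays opaque.
KNOWN_LAYER_TYPES = {
    "application/vnd.oci.image.layer.v1.tar",
    "application/vnd.oci.image.layer.v1.tar+gzip",
}

def handle_layer(media_type: str) -> str:
    # A sketch of the dispatch a container engine might do.
    if media_type in KNOWN_LAYER_TYPES:
        return "unpack as filesystem layer"
    return "treat as opaque artifact blob"

print(handle_layer("application/vnd.oci.image.layer.v1.tar+gzip"))
print(handle_layer("application/octet-stream"))  # like our binary later on
```

This is exactly the loophole ORAS exploits to store arbitrary artifacts in a plain OCI registry.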
The next few examples require an OCI-compatible registry as well as access to the binaries of oras, cosign and a few more tools. Since installation is usually a hassle, all examples rely on Podman and the well-supported Zot Registry.
Setting up our registry is a piece of cake and shouldn’t raise any eyebrows yet. We pretty much set just the bare essentials - deliberately without any hardening like actual logins.
$ podman run --rm -it --name zot-registry -p 5000:5000 --network=host \
-v ./infrastructure/zot-registry/config.json:/etc/zot/config.json \ (1)
ghcr.io/project-zot/zot-linux-amd64:v2.1.2
| 1 | Apart from host settings we also want to enable the fancy web UI and the CVE scanner - have a glimpse at how this can be done on GitHub: https://github.com/unexist/showcase-oci-registries/blob/master/infrastructure/zot-registry/config.json |
Once started, and after Trivy's vulnerability database update is done, we are dutifully greeted with an empty list:
Time to push our first artifact!
Ultimately I want to push embedded software artifacts to the registry, but since this post is public and my own project heos-dial isn’t ready yet, we are pushing a binary of the Golang version of my faithful todo service:
$ podman run --rm -v .:/workspace -it --network=host \ (1)
ghcr.io/oras-project/oras:main \
push localhost:5000/todo-service:latest \
--artifact-type showcase/todo-service \ (2)
--plain-http \ (3)
todo-service/todo-service.bin:application/octet-stream
✓ Uploaded todo-service/todo-service.bin 26.1/26.1 MB 100.00% 32ms
└─ sha256:cc8ab19ee7e1f1f7d43b023317c560943dd2c15448ae77a83641e272bc7a5dbc
✓ Uploaded application/vnd.oci.empty.v1+json (4)
└─ sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
✓ Uploaded application/vnd.oci.image.manifest.v1+json
└─ sha256:fb1f02fff7f1406ae3aa2d9ebf3f931910b69e99c95e78e211037f11ec8f1eb6
Pushed [registry] localhost:5000/todo-service:latest
ArtifactType: showcase/todo-service
Digest: sha256:fb1f02fff7f1406ae3aa2d9ebf3f931910b69e99c95e78e211037f11ec8f1eb6
| 1 | The ORAS container allows us to call it this way and directly pass our arguments. |
| 2 | Here we set our custom artifact type so we can distinguish it later. |
| 3 | No need to make our lives miserable with SSL/TLS! |
| 4 | This isn’t a real container, so we must provide a dummy config: https://oras.land/docs/how_to_guides/manifest_config/ |
One-way-success, time to get it back:
Pulling images from container registries is one of the core tasks of Podman:
$ podman pull localhost:5000/todo-service:latest
Trying to pull localhost:5000/todo-service:latest...
Error: parsing image configuration: unsupported image-specific operation on artifact with type "showcase/todo-service" (1)
| 1 | Unsurprisingly Podman doesn’t understand our custom artifact type and hence refuses to do our bidding. |
|
If Podman cannot connect to your local registry and bails out with
|
Let us try again - this time with ORAS.
$ podman run --rm -v .:/workspace -it --network=host \
ghcr.io/oras-project/oras:main \
pull localhost:5000/todo-service:latest --plain-http
✓ Pulled todo-service/todo-service.bin 26.1/26.1 MB 100.00% 38ms
└─ sha256:cc8ab19ee7e1f1f7d43b023317c560943dd2c15448ae77a83641e272bc7a5dbc
✓ Pulled application/vnd.oci.image.manifest.v1+json 586/586 B 100.00% 66µs
└─ sha256:fb1f02fff7f1406ae3aa2d9ebf3f931910b69e99c95e78e211037f11ec8f1eb6
Pulled [registry] localhost:5000/todo-service:latest
Digest: sha256:fb1f02fff7f1406ae3aa2d9ebf3f931910b69e99c95e78e211037f11ec8f1eb6
$ tree todo-service
todo-service
└── todo-service.bin
1 directory, 1 file
There are several commands available to gather information about images on the registry.
$ podman run --rm -v .:/workspace -it --network=host \
ghcr.io/oras-project/oras:main \
manifest fetch --pretty --plain-http \
localhost:5000/todo-service:latest
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"artifactType": "showcase/todo-service",
"config": {
"mediaType": "application/vnd.oci.empty.v1+json", (1)
"digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
"size": 2,
"data": "e30="
},
"layers": [
{
"mediaType": "application/octet-stream",
"digest": "sha256:cc8ab19ee7e1f1f7d43b023317c560943dd2c15448ae77a83641e272bc7a5dbc",
"size": 27352532,
"annotations": { (2)
"org.opencontainers.image.title": "todo-service/todo-service.bin"
}
}
],
"annotations": {
"org.opencontainers.image.created": "2025-06-04T11:57:57Z"
}
}
| 1 | This is our empty dummy config - check the size and data fields. |
| 2 | Annotations are supported as well and can be added with oras push --annotation. |
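We can double-check these numbers ourselves: the dummy config really is just the two-byte JSON document {}, and its base64 encoding and SHA-256 digest match the data, size and digest fields of the manifest exactly.

```python
import base64
import hashlib

config = b"{}"  # the empty dummy config ORAS uploads for artifacts

print(len(config))                        # matches the size field: 2
print(base64.b64encode(config).decode())  # matches the data field: e30=
print("sha256:" + hashlib.sha256(config).hexdigest())
# matches the digest field:
# sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
```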
$ podman run --rm -v .:/workspace -it --network=host \
ghcr.io/oras-project/oras:main \
discover --format tree --plain-http \
localhost:5000/todo-service:latest
localhost:5000/todo-service@sha256:fb1f02fff7f1406ae3aa2d9ebf3f931910b69e99c95e78e211037f11ec8f1eb6
A software bill of materials (SBOM) is a kind of inventory list of an artifact, which details the included software components and assists in securing the software supply chain. This gets more and more attention, as it should, especially since the Log4j vulnerability back in 2021.
There are different formats for SBOM files, like SPDX or CycloneDX, and a broad range of tools is available that supports one or more of them as input and output.
I am kind of fond[3] of Anchore with their tools syft and grype and therefore the next examples are going to make use of both of them.
Since my todo service is written in Golang, syft can easily scan the source code and assemble our SBOM:
$ podman run --rm -v .:/workspace -it --network=host \
-v ./todo-service:/in \
docker.io/anchore/syft:latest \
scan dir:/in -o cyclonedx-json=/workspace/sbom.json (1)
✔ Indexed file system /in
✔ Cataloged contents 86121fea66864109267c361a1fec880ab49dc5f619205b1f364ecb7ba31eb066
├── ✔ Packages [70 packages]
├── ✔ Executables [1 executables]
├── ✔ File digests [1 files]
└── ✔ File metadata [1 locations]
[0000] WARN no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)
A newer version of syft is available for download: 1.26.1 (installed version is 1.26.0) (2)
$ cat sbom.json | jq '.components | length' (3)
71
| 1 | My pick is entirely based on the cool name though. |
| 2 | Interesting since I am using the latest tag. |
| 3 | Quite a lot of components! |
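The jq one-liner above can be reproduced without jq as well; the sketch below uses a tiny, made-up CycloneDX document, but a real sbom.json from syft exposes the identical top-level components array.

```python
import json

# A minimal, hypothetical CycloneDX document; the real sbom.json has the
# same structure, just with ~70 component entries.
sbom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "golang.org/x/crypto", "version": "v0.15.0"},
    {"type": "library", "name": "golang.org/x/net", "version": "v0.18.0"}
  ]
}
""")

# Equivalent of: jq '.components | length'
print(len(sbom["components"]))
```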
Like Trivy, grype can easily scan from inside a container and provide machine-readable statistics by default:
$ podman run --rm -v .:/workspace -it --network=host \
docker.io/anchore/grype:latest \
sbom:/workspace/sbom.json
✔ Vulnerability DB [updated]
✔ Scanned for vulnerabilities [9 vulnerability matches]
├── by severity: 1 critical, 2 high, 6 medium, 0 low, 0 negligible
└── by status: 9 fixed, 0 not-fixed, 0 ignored
NAME INSTALLED FIXED-IN TYPE VULNERABILITY SEVERITY EPSS% RISK
golang.org/x/crypto v0.15.0 0.17.0 go-module GHSA-45x7-px36-x8w8 Medium 98.45 36.5
golang.org/x/net v0.18.0 0.23.0 go-module GHSA-4v7x-pqxf-cx7m Medium 98.35 33.4
golang.org/x/crypto v0.15.0 0.31.0 go-module GHSA-v778-237x-gjrc Critical 96.91 32.6
google.golang.org/protobuf v1.31.0 1.33.0 go-module GHSA-8r3f-844c-mc37 Medium 46.14 0.1
github.com/jackc/pgx/v5 v5.4.3 5.5.4 go-module GHSA-mrww-27vc-gghv High 38.06 0.1
golang.org/x/crypto v0.15.0 0.35.0 go-module GHSA-hcg3-q754-cr77 High 15.90 < 0.1
golang.org/x/net v0.18.0 0.38.0 go-module GHSA-vvgc-356p-c3xw Medium 5.05 < 0.1
golang.org/x/net v0.18.0 0.36.0 go-module GHSA-qxp5-gwg8-xv66 Medium 1.24 < 0.1
github.com/jackc/pgx/v5 v5.4.3 5.5.2 go-module GHSA-fqpg-rq76-99pq Medium N/A N/A
If we are content with the scanning result[4], let us quickly add this to our image:
$ podman run --rm -v .:/workspace -it --network=host \
ghcr.io/oras-project/oras:main \
attach localhost:5000/todo-service:latest --plain-http \
--artifact-type showcase/sbom \ (1)
sbom.json:application/vnd.cyclonedx+json
✓ Uploaded sbom.json 50.1/50.1 KB 100.00% 2ms
└─ sha256:0690e255a326ee93c96bf1471586bb3bc720a1f660eb1c2ac64bbf95a1bd9693
✓ Exists application/vnd.oci.empty.v1+json 2/2 B 100.00% 0s
└─ sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
✓ Uploaded application/vnd.oci.image.manifest.v1+json 724/724 B 100.00% 3ms
└─ sha256:5c6bb144aaed7d3e4eb58ac6bcdbf2a68d0409d5328f81c9d413e9301e2517a9
Attached to [registry] localhost:5000/todo-service@sha256:fb1f02fff7f1406ae3aa2d9ebf3f931910b69e99c95e78e211037f11ec8f1eb6
Digest: sha256:5c6bb144aaed7d3e4eb58ac6bcdbf2a68d0409d5328f81c9d413e9301e2517a9
| 1 | This gave me a bit of a headache, because Zot supports SBOM scanning and also propagates the results on the web UI - see the sidepanel for more information. |
And if we run discover again we can see there is a new layer:
$ podman run --rm -v .:/workspace -it --network=host \
ghcr.io/oras-project/oras:main \
discover --format tree --plain-http \
localhost:5000/todo-service:latest
localhost:5000/todo-service@sha256:fb1f02fff7f1406ae3aa2d9ebf3f931910b69e99c95e78e211037f11ec8f1eb6
└── showcase/sbom
└── sha256:5c6bb144aaed7d3e4eb58ac6bcdbf2a68d0409d5328f81c9d413e9301e2517a9
└── [annotations]
└── org.opencontainers.image.created: "2025-06-04T12:40:38Z"
Speaking about security: just adding images without any means, apart from the checksum, of verifying that they are the real deal doesn’t make too much sense to me.
I think the why should be clear, let us talk about how.
Needless to say, topics like encryption, signatures etc. are usually pretty complicated, so I can gladly report there is lots of tooling to ease this for us dramatically. I did the homework for us in preparation for this post and checked our options. While doing that I found lots of references to notary and skopeo, but the full package and overall documentation of cosign just convinced me; it can basically sign anything in a registry.
In this last chapter we are going to sign our image and specific layers via in-toto attestations with the help of cosign.
Cosign comes with lots of useful commands to create and manage identities, signatures and whatnot, but most conveniently it just allows us to select from a list of supported identity providers in our browser at runtime:
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
sign --yes \
localhost:5000/todo-service:latest
Generating ephemeral keys...
Retrieving signed certificate...
Non-interactive mode detected, using device flow.
Enter the verification code xxxx in your browser at: https://oauth2.sigstore.dev/auth/device?user_code=xxxx (1)
Code will be valid for 300 seconds
Token received!
Successfully verified SCT...
...
By typing 'y', you attest that (1) you are not submitting the personal data of any other person; and (2) you understand and agree to the statement and the Agreement terms at the URLs listed above. (2)
tlog entry created with index: 230160511
Pushing signature to: localhost:5000/todo-service
| 1 | Quickly follow the link and pick one to your liking - we continue with GitHub here. |
| 2 | Glad we added --yes - interactivity in containers is usually a pain. |
And when we check the web UI we can see there is a bit of progress:
Relying on Zot is all well and good, but there are other ways to verify a signature.
It all boils down to another simple call of cosign:
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
verify \
--certificate-oidc-issuer=https://github.com/login/oauth \ (1)
--certificate-identity=[email protected] \
localhost:5000/todo-service:latest | jq ".[] | .critical" (2)
Verification for localhost:5000/todo-service:latest --
The following checks were performed on each of these signatures: (3)
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- The code-signing certificate was verified using trusted certificate authority certificates
{
"identity": {
"docker-reference": "localhost:5000/todo-service"
},
"image": {
"docker-manifest-digest": "sha256:fb1f02fff7f1406ae3aa2d9ebf3f931910b69e99c95e78e211037f11ec8f1eb6"
},
"type": "cosign container image signature"
}
| 1 | There are several options for verification available - we just rely on issuer and mail. |
| 2 | Apparently this critical is nothing of concern and a format specified by Red Hat. |
| 3 | This is a short summary of the checks that have been performed during the verification. |
Just as a negative test, this is how it looks when the verification actually fails:
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
verify \
--certificate-oidc-issuer=https://github.com/login/oauth \
--certificate-identity=[email protected] \
localhost:5000/todo-service:latest
Error: no matching signatures: none of the expected identities matched what was in the certificate, got subjects [[email protected]] with issuer https://github.com/login/oauth
main.go:69: error during command execution: no matching signatures: none of the expected identities matched what was in the certificate, got subjects [[email protected]] with issuer https://github.com/login/oauth
First step done - step two is to sign our SBOM as well.
If you have made it this far into this post, I probably shouldn’t bore you with yet another spec about in-toto or the framework around it, and will just provide the examples:
$ DIGEST=`podman run --rm -v .:/workspace -it --network=host \
ghcr.io/oras-project/oras:main \
discover --format json --plain-http \
localhost:5000/todo-service:latest | jq -r ".referrers[].reference"` (1)
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
attest --yes \ (2)
--type cyclonedx \ (3)
--predicate /workspace/sbom.json \
$DIGEST
Generating ephemeral keys...
Retrieving signed certificate...
Non-interactive mode detected, using device flow.
Enter the verification code xxxx in your browser at: https://oauth2.sigstore.dev/auth/device?user_code=xxxx
Code will be valid for 300 seconds
Token received!
Successfully verified SCT...
Using payload from: /workspace/sbom.json
...
By typing 'y', you attest that (1) you are not submitting the personal data of any other person; and (2) you understand and agree to the statement and the Agreement terms at the URLs listed above.
using ephemeral certificate:
-----BEGIN CERTIFICATE-----
LOREMIPSUMDOLORSITAMETCONSECTETURADIPISCINGELIT
MORBIIDSODALESESTVIVAMUSVOLUTPATSODALESTINCIDUNT
...
-----END CERTIFICATE-----
tlog entry created with index: 232176597
| 1 | We need the digest to identify our artifact for the next steps - so please keep it at hand. |
| 2 | Don’t forget to deal with the interactive prompt here. |
| 3 | Some information about type and name of what cosign is supposed to attest. |
| cosign still supports the older command attach sbom to attach artifacts, but it is deprecated and it is generally advised to use proper attestations. There is a heated debate about its status and maturity though. |
As mentioned before, this is complex, so let us have a closer look at what we actually get back.
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
download attestation \
$DIGEST | jq "del(.payload)" (1)
{
"payloadType": "application/vnd.in-toto+json", (2)
"signatures": [
{
"keyid": "",
"sig": "MEYCIQDE4/CeQstLjHLE+ZQ+BCH+aaw2wSWSr9i26d7iuazXrwIhAPtly5XBD6C14s/78vTjuHdLOjj2a9TeSgs0yD6YRrZd"
}
]
}
| 1 | We omit the payload data here - feel free to dump your own base64 blob |
| 2 | This is the actual type of the payload that has been transmitted. |
If you want to see the actual content of the payload here is a small exercise for you:
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
download attestation \
$DIGEST | jq -r .payload | base64 -d | jq .predicate
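For the curious, what that pipeline does can be sketched in a few lines of Python: the downloaded attestation is a DSSE envelope whose payload field holds a base64-encoded in-toto statement. The statement below is a made-up minimal example, not real cosign output.

```python
import base64
import json

# Hypothetical minimal in-toto statement; the real one carries the whole
# CycloneDX SBOM as its predicate.
statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "predicateType": "https://cyclonedx.org/bom",
    "predicate": {"bomFormat": "CycloneDX"},
}

# The DSSE envelope as returned by `cosign download attestation`.
envelope = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(json.dumps(statement).encode()).decode(),
    "signatures": [{"keyid": "", "sig": "..."}],
}

# Equivalent of: jq -r .payload | base64 -d | jq .predicate
decoded = json.loads(base64.b64decode(envelope["payload"]))
print(decoded["predicate"])
```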
And lastly in the same manner as before the attestation can also be verified by the means of cosign:
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
verify-attestation \
--type cyclonedx \
--certificate-oidc-issuer=https://github.com/login/oauth \
--certificate-identity=[email protected] \
$DIGEST | jq ".[] | .critical"
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
verify-attestation \
--type cyclonedx \ (1)
--certificate-oidc-issuer=https://github.com/login/oauth \
--certificate-identity=[email protected] \
$DIGEST > /dev/null (2)
Verification for localhost:5000/todo-service@sha256:5c6bb144aaed7d3e4eb58ac6bcdbf2a68d0409d5328f81c9d413e9301e2517a9 --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- The code-signing certificate was verified using trusted certificate authority certificates
Certificate subject: [email protected]
Certificate issuer URL: https://github.com/login/oauth
| 1 | Here we pass some expectations to the checks. |
| 2 | We don’t want to see the exact same content from the previous step again. |
Passing bogus information or trying to verify the wrong digest leads to an error:
$ podman run --rm -v .:/workspace --network=host \
ghcr.io/sigstore/cosign/cosign:v2.4.1 \
verify-attestation \
--type cyclonedx \
--certificate-oidc-issuer=https://github.com/login/oauth \
--certificate-identity=[email protected] \
$DIGEST > /dev/null
Error: no matching attestations: none of the expected identities matched what was in the certificate, got subjects [[email protected]] with issuer https://github.com/login/oauth
main.go:74: error during command execution: no matching attestations: none of the expected identities matched what was in the certificate, got subjects [[email protected]] with issuer https://github.com/login/oauth
Phew, that was quite a lengthy journey to reach this point - time for a small recap.
During the course of this post we have seen how OCI registries can be leveraged to store almost any kind of artifact. The layered structure and format allow adding additional metadata, and ancillary artifacts like Helm charts can be put to rest there as well.
Bills of materials allow quick scans of layers for known vulnerabilities, and combined with proper signing the security of the supply chain can be strengthened even further. Alas, this is no silver bullet either and takes lots of work to get right in automated workflows.
I personally think this is a great addition: it solves my initial hunt for artifact storage and also eases the handling of all the dependencies of different kinds of artifacts in a more secure way. The next stop for me is to compile all this into a shiny new Architecture Decision Record and discuss it with my team.
All examples can be found here hidden in the taskfiles:
The great enemy of communication is the illusion of it.
I think we can all agree that communication is hard, especially when you want to convey something that is perfectly clear to you. One simple explanation can be the curse of knowledge, but this (at least) doesn’t help me in my next struggle to find the right words without getting frustrated first.
This kind of struggle can be, mildly put, interesting in personal communication, but what happens in anything business-related, like the complex requirements of your next big product?
Over the course of this post I want to put emphasis on visual communication, which can help support any narrative and ultimately help you be understood.
|
Like many posts in this blog before, we again use my sample todo application - if you still
haven’t seen it yet you can find an OpenAPI specification here: https://blog.unexist.dev/redoc/#tag/Todo |
Even with business requirements it is possible to start simple and one of the simplest things a user probably wants to do with our application is following:
Simple enough and perfectly straightforward, but the same can be expressed (and supported, not replaced, mind you) with a simple use-case diagram:
I suppose if I asked you for your first thoughts on this example now, I’d probably get something in the range of "this just adds clutter and is completely overkill for such a simple matter".
So why do I still insist this adds benefits?
Targeting the right audience is key here as well, but adding too much technical jargon and information to a use-case defeats the benefit of grasping everything at a quick glance.
UML might offer many niceties, but please ask yourself does the extension of the previous use-case add anything of value?
Let’s move on to a more complex use-case.
No creative hat on today, so I am just going to re-use an idea from my previous [logging/tracing showcase][]:
This just adds a bit more complexity, but with focus on the business side the updated use-case can look like this:
So far this probably doesn’t bring any real benefit business-wise, so let us quickly add a way to actually see the created todo entries and leave our competitors awestruck:
There are many more ways to improve these use-cases and I don’t lack funny ideas, but the main goal here was to demonstrate the power of visual use-cases and the story that can unfold.
Instead of creating all of these use-cases in isolation, we can also carry on with the story idea and actually tell them.
At its heart Domain Storytelling is a workshop format, usually held by a domain expert and a supporting moderator, who share examples how they actually work inside the domain.
While the expert explains the domain, the moderator tries to record the story with a simple pictographic language. Each domain story covers one concrete example and can be directly used to verify if the story has been understood correctly or otherwise adjusted.
This approach allows all participants to learn the domain language (see ubiquitous language), get an understanding of the activities of the domain and also discover boundaries between the different parts (see bounded contexts).
The authors of the book Domain Storytelling [domstory] also provided Egon, a lightweight editor to support the workshop format.
One of my personal favorite features, among others, is the replay button, which blends in the different steps one by one like a good slide deck.
If we translate our last use-case to a simple domain story, one version could be like this:
Writing and evaluating requirements can be a progressive approach as we have seen with the evolution from a single no-brainer requirement to a more complex one. Going even further, the whole process can be done in a conversational and story-telling way and directly improve the understanding of all participants.
Using diagrams for communication isn’t something new, still I rarely see developers using them. I sometimes think this might be a problem of tooling, but with the rise of documentation-as-code this shouldn’t be an excuse anymore.
Domain storytelling is a different approach to the whole idea and even if you don’t follow this approach by detail, your projects can still benefit from the way Egon tells your stories.
If you are interested in this topic and want to read more about it, I highly suggest having a look at these two books:
Domain Storytelling [domstory]
I am getting more and more obsessed with centralized documentation, and this isn’t because I enjoy writing documentation (which, unfortunately, I really do), but more due to the sheer lack of it in my day job and all the related issues we are currently facing.
Pushing ideas like the one from my previous post (Bringing documentation together) certainly helps to make writing docs easier, but there are still some loose ends to follow - like API.
So this post is going to demonstrate how OpenAPI (or formerly Swagger) can be converted into shiny AsciiDoc and be brought into the mix.
There are many ways to document APIs (mind you, any documentation is better than none!), but sticking to established standards like OpenAPI and AsyncAPI (which isn’t too far off) really helps to keep the cognitive churn low while trying to understand what a document is trying to convey.
And from a developer’s perspective there are many low-hanging fruits:
Code-first or API-first - you decide
Many generators in both directions available - like ktor-openapi-tools used in the example
Tools like Swagger UI and Redoc
Comes pre-assembled with a testing tool
Again, there are dozens of options to select from. Since I rely on the Confluence publisher plugin, my initial pick was something with Maven integration as well, but unfortunately swagger2asciidoc has been unmaintained for quite some time. I actually tried to use it, but this was more of an educational endeavor in learning what happens to neglected packages.
The next best option, and probably what should have been my first pick anyway, is OpenAPI Generator with its exhaustive list of generators. It offers a plethora of different ways to convert specs, and thankfully AsciiDoc is among them.
If we omit all nitty-gritty details it boils down to this call:
$ openapi-generator-cli generate -g asciidoc \
--skip-validate-spec \ (1)
--input-spec=src/site/asciidoc/spec/openapi.json \
--output=src/site/asciidoc (2)
| 1 | Let us ignore version handling and maturity of my own spec for now |
| 2 | This is my preferred structure for Maven-based documentations |
| This can also be run from a container, see either openapi-generator-cli or have a look at my containerfile for even more dependencies. |
When everything works well a resulting document like this can be viewed:
One of the strong points of AsciiDoc is surely its extensibility, and this is also true for the generator pipeline we are using now.
By default, the generator offers a lot of different entry points to provide custom content for inclusion in the final document, without fancy hacks like e.g. including the generated document in your own one.
If you have a closer look at the actual generated document you can see lots of commented out includes like:
[abstract]
.Abstract
Simple todo service
// markup not found, no include::{specDir}intro.adoc[opts=optional]
An introduction sounds like a good idea, so we could use the space there to inform our readers about the automatic updates of the document:
$ cat asciidoc/src/site/asciidoc/spec/intro.adoc
[CAUTION]
This page is updated automatically, please do *not* edit manually.
After that we have to tell the generator to actually include our document.
When started, it looks for these templates[1] inside the specDir, something we haven’t set before, but are quite able to.
This only requires a minor change of our previous commandline:
$ openapi-generator-cli generate -g asciidoc \
--skip-validate-spec \ (1)
--input-spec=src/site/asciidoc/spec/openapi.json \
--output=src/site/asciidoc \
--additional-properties=specDir=spec/,useIntroduction=true (2)
| 1 | As before, we skip validation of my spec |
| 2 | Additional properties can be used to pass configuration directly down to the AsciiDoc renderer |
And hopefully, a run of the above rewards with an output like this:
There are many more templates that can be filled and I would gladly supply a list, but at the time of writing I just can offer to grep the document on your own:
$ \grep -m 5 "// markup not found" src/site/asciidoc/index.adoc
// markup not found, no include::{specDir}todo/POST/spec.adoc[opts=optional]
// markup not found, no include::{snippetDir}todo/POST/http-request.adoc[opts=optional] (1)
// markup not found, no include::{snippetDir}todo/POST/http-response.adoc[opts=optional]
// markup not found, no include::{specDir}todo/POST/implementation.adoc[opts=optional]
// markup not found, no include::{specDir}todo/\{id\}/DELETE/spec.adoc[opts=optional]
| 1 | Looks like we can also supply snippets to the example sections - neat! |
|
During my tests I stumbled upon a weird behavior: there are different checks for the index and the generation phase, which have different requirements for the actual path. This made it necessary for me to fix things with a symlink in my builds:
|
I think this is the third time I tease how everything can be pushed to Confluence, but since I don’t run any personal instance just feel teased again:
$ mvn -f pom.xml \
-DCONFLUENCE_URL="unexist.blog" \
-DCONFLUENCE_SPACE_KEY="UXT" \
-DCONFLUENCE_ANCESTOR_ID="123" \
-DCONFLUENCE_USER="unexist" \
-DCONFLUENCE_TOKEN="secret123" \
-P generate-docs-and-publish generate-resources
What have we done here? Strictly speaking this doesn’t bring many advantages, especially when the tooling for OpenAPI looks as polished as this:
The ultimate goal is to create a central place where these specifications can be stored, without too many hurdles for non-dev stakeholders. Developers do well when told the specs can be generated via a Makefile[2], but what about other roles, e.g. testers?
Back then we rolled a special infrastructure container, which basically included SwaggerUI along with the current versions of our specs, but infrastructure is additional work that has to be done and everything that leads to it must be maintained.
Whatever you do, providing easy access to documentation really helps to reach a common understanding and also might help to keep it up-to-date.
All examples can be found here:
The ultimate goal of my previous post about Dagger was to demonstrate combining it with Gitlab and Podman, but unfortunately I ran into so many different problems that I decided to break the post apart.
This is the second part of a small series and explains how to set up Gitlab with Podman-in-Podman, including the various pitfalls along the way.
|
If you are looking for the first part just follow this link over here: Building with Dagger. |
The first step in order to start Gitlab is to provide an SSL cert, but to make this a lot more interesting we rely on a self-signed one:
$ openssl req -newkey rsa:4096 -x509 -sha512 -days 365 -nodes \
-out gitlab.crt -keyout gitlab.key \
-addext "subjectAltName=DNS:gitlab" \ (1)
-subj "/C=DE/ST=DE/L=DE/O=unexist.dev/OU=showcase/CN=gitlab" (2)
| 1 | This line is essential, otherwise Gitlab won’t accept this cert |
| 2 | We are going to use gitlab for the hostname, so make sure to add it to your hosts file |
Next up is the actual config of Gitlab.
There is plenty that can be configured beforehand (especially in memory-constrained environments it is beneficial to disable services like Prometheus), but here we trust in convention over configuration and include only the bare minimum required to run Gitlab:
external_url 'https://gitlab:10443/'
registry_external_url 'https://gitlab:4567' (1)
registry_nginx['enable'] = true
registry_nginx['listen_port'] = 4567 (2)
nginx['ssl_certificate'] = "/etc/gitlab/ssl/gitlab.crt"
nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/gitlab.key"
nginx['listen_port'] = 10443 (3)
| 1 | Setting the ports here causes problems elsewhere, so better also set the ports in <2> and <3> |
| 2 | My initial idea was to use the registry as a cache, but more on that later |
| 3 | Nginx usually picks the port from external_url, which is not what we want to do |
Like Kubernetes, Podman allows us to group or rather encapsulate containers in pods and also to convert them afterward, so let us quickly create one:
$ podman pod create -n showcase --network bridge \
-p 10022:22 `# Gitlab ssh` \
-p 10443:10443 `# Gitlab web` \
-p 4567:4567 `# Gitlab registry`
e91d11fdeb168c5713c9f48a50ab736db59d88ae7e39b807371923dcf4f26199
This can be done with the make target pd-pod-create.
|
Once everything is in place we can fire up Gitlab:
$ podman run -dit --name gitlab --pod=gitlab \
--memory=4096m --cpus=4 \
-v ./gitlab.crt:/etc/gitlab/ssl/gitlab.crt \ (1)
-v ./gitlab.key:/etc/gitlab/ssl/gitlab.key \
-v ./gitlab.rb:/etc/gitlab/gitlab.rb \ (2)
-v ./gitlab-data:/var/opt/gitlab \
-e GITLAB_ROOT_PASSWORD=YourPassword \ (3)
docker.io/gitlab/gitlab-ce:latest
17349b87f81aa9eb7230f414923cf491c84a36a87d61057f8dc2f8f82c7ea60a
| 1 | We pass our new certs via volume mounts to Gitlab |
| 2 | Our previously modified minimal config |
| 3 | Let’s be creative |
This can also be done with the make target pd-gitlab.
|
Once the container is running, Gitlab can be reached at the following address: https://localhost:10443
Great success, but unfortunately Gitlab alone is only half the deal.
Setting up a runner which is able to spawn new containers inside Podman is a bit tricky and requires building a specially configured container first.
Luckily for us other people struggled with the same idea and did the heavy lifting for us:
$ podman build -t $(RUNNER_IMAGE_NAME) -f runner/Containerfile \ (1)
--build-arg=GITLAB_URL=$(GITLAB_URL) \ (2)
--build-arg=REGISTRY_URL=$(REGISTRY_URL) \
--build-arg=PODNAME=$(PODNAME)
| 1 | This relies on the pipglr project |
| 2 | This is an excerpt from the provided Makefile, so please consider the env variables properly set |
This can also be done with the make target pd-runner-podman-build.
|
The current registration process requires us to register a new runner inside Gitlab first, which can be done at: https://localhost:10443/admin/runners
Once submitted, the redirection is going to fail, since our host machine doesn’t know the hostname gitlab.
This can be bypassed by just replacing gitlab with localhost or with a quick edit of the hosts file:
$ grep 127 /etc/hosts
127.0.0.1 localhost
127.0.0.1 meanas
127.0.0.1 gitlab
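Adding that entry can also be scripted idempotently, which is handy if the setup lives in a Makefile anyway. A small sketch; it writes to a temporary copy here, so point it at /etc/hosts with sufficient rights to use it for real:

```shell
# Sketch: append the gitlab entry only if it is not present yet.
# $hostsfile is a stand-in for /etc/hosts in this example.
hostsfile=$(mktemp)
echo "127.0.0.1 localhost" > "$hostsfile"

# The grep guard makes the append safe to run repeatedly
grep -q "gitlab" "$hostsfile" || echo "127.0.0.1 gitlab" >> "$hostsfile"

grep gitlab "$hostsfile"
```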
Registration of the actual runner is a bit more involved, but remember the other people? pipglr, the actual hero of our story, comes prepared and brings some container labels to execute the registration commands.
I took the liberty to throw everything into a Makefile target, and we just call it directly this time:
$ TOKEN=glrt-t1_QnEnk-yx3sdgVT-DYt7i make pd-runner-podman
# This requires Podman >=4.1 (1)
#podman secret exists REGISTRATION_TOKEN && podman secret rm REGISTRATION_TOKEN || true
#podman secret exists config.toml && podman secret rm config.toml || true
Error: no secret with name or id "REGISTRATION_TOKEN": no such secret
Error: no secret with name or id "config.toml": no such secret
1a02dae2a667dbddbdc8bd7b0
Runtime platform arch=amd64 os=linux pid=1 revision=690ce25c version=17.8.3
Running in system-mode.
Created missing unique system ID system_id=s_d3cc561989f6
Verifying runner... is valid runner=t1_QnEnk-
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!
Configuration (with the authentication token) was saved in "/etc/gitlab-runner/config.toml"
# Fix SSL config to contact Gitlab registry
db86c90b8d202682014668223
pipglr-storage
pipglr-cache
8230fd623fc59d7621600304efcf1a11b5c9bf7cec5a8de5237b6d0143edb809 (2)
| 1 | I really need to update this, meanwhile even my Debian machine uses a decent version of Podman |
| 2 | Yay! |
The output looks promising, so let us verify our containers via Podman:
$ podman ps -a --format 'table {{.ID}} {{.Image}} {{.Status}} {{.Names}}'
CONTAINER ID IMAGE STATUS NAMES
bfac4e6acb26 localhost/podman-pause:5.3.2-1737979078 Up 42 minutes e91d11fdeb16-infra
cc6599fdf8db docker.io/gitlab/gitlab-ce:latest Up 42 minutes (healthy) gitlab
8230fd623fc5 localhost/custom-pip-runner:latest Up About a minute pipglr
And there it is, our new runner in the list of Gitlab:
From here everything should be pretty much self-explanatory and there are loads of good articles on how to actually use Gitlab itself, like:
Following the original idea of using Dagger, just another step of preparation is required. Dagger uses another container inside the runner, which adds a bit more complexity to the mix:
The containers are nicely stacked, but this requires a specially crafted one for Dagger in order for it to access files:
FROM docker.io/golang:alpine
MAINTAINER Christoph Kappel <[email protected]>
RUN apk add podman podman-docker curl fuse-overlayfs \
&& sed -i 's/#mount_program/mount_program/' /etc/containers/storage.conf \ (1)
&& curl -sL --retry 3 https://dl.dagger.io/dagger/install.sh | BIN_DIR=/usr/local/bin sh
| 1 | This took me quite a while to figure out |
With so many containers (1x gitlab + 1x runner + 1x builder) the limit of a free tier can be quickly reached, and it is strongly advised to add some kind of caching layer. Gitlab comes with its own registry, which can be used to cache all artifacts locally.
We already did the required configuration in our minimal config, so we just have to push the containers and configure the registry.
$ podman login -u root -p $(GITLAB_PASS) --tls-verify=false https://$(REGISTRY_URL) (1)
$ podman push --tls-verify=false \
$(BUILDER_IMAGE_NAME):latest $(REGISTRY_URL)/root/showcase-dagger-golang/$(BUILDER_IMAGE_NAME):latest
| 1 | Perfectly set-up environment for sure! |
And finally this can be done with the make target pd-gitlab-prepare-cache.
|
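Once the image has been pushed, a pipeline can reference it straight from the local registry. A hypothetical .gitlab-ci.yml excerpt; the job name and image path are placeholders for illustration, not taken from the showcase:

```yaml
# Hypothetical excerpt: use the locally cached builder image instead of
# pulling it from an external registry on every run
build:
  image: gitlab:4567/root/showcase-dagger-golang/custom-builder:latest
  script:
    - dagger version   # sanity check that the cached image ships dagger
```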
Gitlab is by itself a complex system, and adding Podman and Dagger to the mix doesn’t make it easier at all, but probably increases the complexity tenfold.
So what do we actually get?
During my experiments with the trio I quickly ran into many problems, and some of them were really challenging. Although I tried to address some of them in this blog post, to make it easier for fellow readers to get started, the whole thing is still complicated.
My original goal was to benefit from having pipeline knowledge everywhere, since the same pipelines run locally and in the actual CICD, and to be freed from the sales stuff of Docker, but if I consider the cost of this small advantage…
Ultimately I made the decision to postpone every move in this direction for now.
All examples can be found next to the examples from the first post:
Documentation is and always was my strong point, and if I look back upon the year, which is about to close, it also has been a huge part of this blog and my daily job. During the year one critical problem (besides how to create a motivating environment to write documentation) remained:
How can we manage documentation that is scattered among many repositories and documentation systems?
The first problem is easily solved and I also recommended giving Antora a spin for my go-to documentation system AsciiDoc here, but what about the latter?
If you look closely, you can probably find n+1 documentation systems for every language: Javadoc for Java and Rustdoc for Rust, just to name a few I use daily. Visiting all of them is totally beyond the scope of this post, so it focuses on a more general approach with Doxygen, which also better matches my main motivation to align documentation for application and embedded software engineering.
Doxygen was actually the first documentation generator I’ve ever used and even my oldest C project subtle contains configuration for it.
In a nutshell Doxygen collects special comment blocks from the actual source files, takes care of all the symbols and provides various output formats like HTML in the next example:
#include <stdio.h>

#include "lang.h" /* assumed header declaring get_lang() from lang.c */

/**
* @brief Main function (1)
*
* @details (2)
* @startuml
* main.c -> lang.c : get_lang()
* @enduml
*
* @param[in] argc Number of arguments (3)
* @param[in] argv Array with passed commandline arguments
* @retval 0 Default return value (4)
**/
int main(int argc, char *argv[]) {
printf("Hello, %s", get_lang("NL"));
return 0;
}
| 1 | The brief section, as the name implies, briefly describes the method or function |
| 2 | A details block includes more verbose information about the implementation in the source file and can even contain Plantuml diagrams |
| 3 | Parameters should surprise no one, besides maybe the direction information: in, out or both |
| 4 | And lastly return values can also be nicely laid out |
Normally Doxygen commands start with a \, but I personally prefer the Javadoc @ version via the config option JAVADOC_AUTOBRIEF.
|
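To make this reproducible, here is a minimal Doxyfile sketch; the values are assumptions for a setup like the showcase, not its actual configuration:

```
# Minimal Doxyfile sketch (assumed values, not the showcase's real config)
PROJECT_NAME      = "showcase"
INPUT             = src
RECURSIVE         = YES
JAVADOC_AUTOBRIEF = YES   # treat the first line of Javadoc-style comments as @brief
GENERATE_HTML     = YES
GENERATE_XML      = YES   # the XML output is what AsciiDoxy consumes later on
PLANTUML_JAR_PATH = /usr/share/plantuml/plantuml.jar
```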
Doxygen can then be run either locally or even better via container to create the first version of our output:
$ podman run --rm -v /home/unexist/projects/showcase-documentation-asciidoxy:/asciidoxy \
-it docker.io/unexist/asciidoxy-builder:0.3 \
sh -c "cd /asciidoxy && doxygen"
Doxygen version used: 1.11.0
Searching for include files...
Searching for example files...
Searching for images...
Searching for dot files...
...
Generate XML output for dir /asciidoxy/src/
Running plantuml with JAVA...
Generating PlantUML png Files in html
type lookup cache used 8/65536 hits=26 misses=8
symbol lookup cache used 16/65536 hits=50 misses=16
finished...
Once done the generated html pages look like this (in dark mode):
This works well, but unfortunately creates another documentation artifact somewhere and doesn’t move us any closer to an aggregated documentation - yet.
Besides the html output from above, Doxygen can also create xml files which include information about all the found symbols, their documentation and also their relationship to each other. Normally this would be quite messy to integrate into Asciidoc, but this is the gap AsciiDoxy closes as we are going to see next.
Originally created by TomTom, and hopefully still maintained since I’ve opened a bug on Github, it parses the xml files and ultimately provides a short list of AsciiDoc macros for convenient use inside our documents:
${language("cpp")} (1)
${insert("main", leveloffset=2)} (2)
${insert("main", template="customfunc")} (3)
| 1 | Set the language - the Mako templates vary a bit based on the language |
| 2 | Insert an actual symbol |
| 3 | Insert the same symbol again, but use a different template now |
| The initial setup is a bit tricky, especially with the different modules, but refer to the showcase and the official manual if you are stuck. |
The container from before is equipped with the whole chain, so let us quickly fire it up:
$ podman run --rm -v /home/unexist/projects/showcase-documentation-asciidoxy:/asciidoxy \
-it docker.io/unexist/asciidoxy-builder:0.3 \
sh -c "cd /asciidoxy && asciidoxy \
--require asciidoctor-diagram \
--spec-file packages.toml \
--base-dir text \
--destination-dir src/site/asciidoc \
--build-dir build \
--template-dir templates \
-b adoc \
text/index.adoc"
___ _ _ ____ 0.8.7
/ | __________(_|_) __ \____ _ ____ __
/ /| | / ___/ ___/ / / / / / __ \| |/_/ / / /
/ ___ |(__ ) /__/ / / /_/ / /_/ /> </ /_/ /
/_/ |_/____/\___/_/_/_____/\____/_/|_|\__, /
/____/
Collecting packages : 100%|██████████████████████████████████| 1/1 [00:00<00:00, 226.55pkg/s]
Loading API reference : 100%|██████████████████████████████████| 1/1 [00:00<00:00, 47.60pkg/s]
Resolving references : 100%|██████████████████████████████████| 2/2 [00:00<00:00, 1954.48ref/s]
Checking references : 100%|██████████████████████████████████| 1/1 [00:00<00:00, 28149.69ref/s]
Preparing work directory: 100%|██████████████████████████████████| 2/2 [00:00<00:00, 267.69pkg/s]
Processing asciidoc : 100%|██████████████████████████████████| 2/2 [00:00<00:00, 67.52file/s]
Copying images : 100%|██████████████████████████████████| 2/2 [00:00<00:00, 6647.07pkg/s]
Once this step is done AsciiDoxy has expanded all the macros and replaced them with the appropriate AsciiDoc directives, like the following for ${insert("main", leveloffset=2)}:
[#cpp-hello_8c_1a0ddf1224851353fc92bfbff6f499fa97,reftext='main']
=== main
[%autofit]
[source,cpp,subs="-specialchars,macros+"]
----
#include <src/hello.c>
int main(int argc,
char * argv)
----
main
Main function
[plantuml]
....
main.c -> lang.c : get_lang()
....
[cols='h,5a']
|===
| Parameters
|
`int argc`::
Number of arguments
`char * argv`::
Array with passed commandline arguments
| Returns
|
`int`::
|===
| The markup is a bit cryptic, but shouldn’t be too hard to understand with a bit of AsciiDoc knowledge. |
AsciiDoxy can perfectly generate AsciiDoc documents by itself and even supports multipage documents, but we require an intermediate step for the next part.
There is more than one way to convert the prepared document to its final form, but as initially stated the general idea is to bring everything together.
I am not that fond of Confluence, but the goal of collecting everything in one place ranks higher than my taste here. Since rendering just the document doesn’t work here, we are going to rely on the asciidoc-confluence-publisher-maven-plugin from before.
This adds some more dependencies and finally explains why the container is based on Maven.
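A sketch of the relevant plugin declaration; the plugin coordinates and version match the download log further down, while everything else is an assumption about the showcase's pom.xml:

```xml
<!-- Sketch: Asciidoctor conversion hooked into generate-resources (assumed setup) -->
<plugin>
  <groupId>org.asciidoctor</groupId>
  <artifactId>asciidoctor-maven-plugin</artifactId>
  <version>2.1.0</version>
  <executions>
    <execution>
      <id>convert-asciidoc</id>
      <phase>generate-resources</phase>
      <goals>
        <goal>process-asciidoc</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```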
The base call to create the document works in the same manner as before:
$ podman run --rm --dns 8.8.8.8 -v /home/unexist/projects/showcase-documentation-asciidoxy:/asciidoxy \
-it docker.io/unexist/asciidoxy-builder:0.3 \
sh -c "cd /asciidoxy && mvn -f pom.xml generate-resources"
[INFO] Scanning for projects...
[INFO]
[INFO] --------------< dev.unexist.showcase:showcase-documentation-asciidoxy >---------------
[INFO] Building showcase-documentation-asciidoxy 0.1
[INFO] from pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from central: https://repo.maven.apache.org/maven2/org/asciidoctor/asciidoctor-maven-plugin/2.1.0/asciidoctor-maven-plugin-2.1.0.pom
...
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] asciidoctor: WARN: index.adoc: line 60: id assigned to section already in use: cpp-hello_8c_1a0ddf1224851353fc92bfbff6f499fa97
[INFO] Converted /asciidoxy/src/site/asciidoc/index.adoc
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17.596 s
[INFO] Finished at: 2024-12-26T15:51:23Z
[INFO] ------------------------------------------------------------------------
And if we have a look at our final result:
Getting the actual document to Confluence is a nice exercise for my dear readers:
$ podman run --rm --dns 8.8.8.8 -v /home/unexist/projects/showcase-documentation-asciidoxy:/asciidoxy \
-it docker.io/unexist/asciidoxy-builder:$(VERSION) \
-e CONFLUENCE_URL="unexist.blog" \
-e CONFLUENCE_SPACE_KEY="UXT" \
-e CONFLUENCE_ANCESTOR_ID="123" \
-e CONFLUENCE_USER="unexist" \
-e CONFLUENCE_TOKEN="secret123" \
sh -c "cd $(MOUNTPATH) && mvn -f pom.xml -P generate-docs-and-publish generate-resources"
Give it a try, I’ll watch.
Adding Doxygen and AsciiDoxy to the mix allows us to enhance our documentation with rendered meta information directly from the code and supplements the existing features of directly including code by file or tag. Being able to customize the used templates and select per symbol what is included offers great flexibility and still keeps the beautiful look of AsciiDoc.
The additional overhead of the toolchain and the intermediate steps to call Doxygen, AsciiDoxy and AsciiDoc on every change is something to consider, but should be a no-brainer within a proper CICD pipeline.
All examples can be found here:
I can probably cite myself from this blog, but writing documentation (not necessarily good documentation mind you, but any at all) is really difficult and keeping it up-to-date nigh on impossible. To ease the pain, some clever people invented tools to write documentation-as-code, so docs can co-exist next to the source and have a better chance of being touched, whenever something is changed.
Based on my personal experience I can say the same is true for any kind of project decisions and good luck finding any hint about them - until I discovered records.
During the course of this post we are going to do a quick recap of ADR, mostly by pointing to links (DRY, you know?), introduce a new record type for technical debt (aptly named TDR), have a look at some examples with adapted tooling and talk a bit about the power of the idea that isn’t covered by the documents alone.
When I first heard about architecture decisions I was directly intrigued and blogged about them, so there is no need to reiterate that right now, but in hindsight I can say it really took a while for me to actually see their real benefit.
It never came to my mind, but why should we stop here?
Michael Stal pretty much got the gist of it, and his suggestion is to handle technical debt in the same vein as architecture decisions:
Documented as code
Well placed next to the actual code or any other kind of source code repository
With some mandatory fields and an open format as a guide rail.
In comparison with architecture decision records, the format of these new records (especially since it is Markdown) looks a bit different, but we are going to cover that later on.
| I included the descriptions of the fields in the actual document, just because I cannot explain the fields any better. |
Technical Debt Record
====================
Title:
------
A concise name for the technical debt.
Author:
-------
The individual who identified or is documenting the debt.
Version:
--------
The version of the project or component where the debt exists.
Date:
-----
The date when the debt was identified or recorded.
State:
------
The current workflow stage of the technical debt (e.g., Identified, Analyzed, Approved, In Progress, Resolved, Closed, Rejected).
Relations:
----------
Links to other related TDRs to establish connections between different debt items.
Summary:
--------
A brief overview explaining the nature and significance of the technical debt.
Context:
--------
Detailed background information, including why the debt was incurred (e.g., time constraints, outdated technologies).
Impact:
-------
Technical Impact:
- How the debt affects system performance, scalability, maintainability, etc.
Business Impact:
- The repercussions on business operations, customer satisfaction, risk levels, etc.
Symptoms:
---------
Observable signs indicating the presence of technical debt (e.g., frequent bugs, slow performance).
Severity:
---------
The criticality level of the debt (Critical, High, Medium, Low).
Potential Risks:
----------------
Possible adverse outcomes if the debt remains unaddressed (e.g., security vulnerabilities, increased costs).
Proposed Solution:
-------------------
Recommended actions or strategies to resolve the debt.
Cost of Delay:
---------------
Consequences of postponing the resolution of the debt.
Effort to Resolve:
-------------------
Estimated resources, time, and effort required to address the debt.
Dependencies:
-------------
Other tasks, components, or external factors that the resolution of the debt depends on.
Additional Notes:
-----------------
Any other relevant information or considerations related to the debt.
He also provides tooling along with the definition, which is quite nice for a starter.
The format is really close to the one of the ADR, so I did the obvious migration and adapted it to the format already used there.
The drawback of this is that the previous tools cannot handle the new format; in particular, the adr-tools cannot handle TDR yet.
During the course of the last few years I played with the original adr-tools, based on the work of their inventor Nat Pryce, and added some missing features, like the pending Asciidoc support, a simple database layer to speed up some of the generators and simple rss/atom feeds for easier aggregation.
This put me in a perfect position to adapt the tools even further and hack a new format into it under a new umbrella.
I am still playing with the idea to port the shellscripts to Rust - does anyone fancy record-tools-rs?
|
The following examples demonstrate how the record-tools can be used, starting with the basic steps up to deploying rendered versions to a Confluence instance, since it always pays off to include non-tech-savvy folks.
The record-tools include two examples, one of each kind, to kickstart the decision to actually use these formats; they keep the intention of the original along with some shameless self-advertisement:
= 1. Record architecture decisions
:1: https://unexist.blog/documentation/myself/2024/10/22/decision-records.html
|===
| Proposed Date: | 2024-10-24
| Decision Date: | 2024-10-24
| Proposer: | Christoph Kappel
| Deciders: | Christoph Kappel
| Status: | accepted
| Issues: | none
| References: | none
| Priority: | high
|===
NOTE: *Status types:* drafted | proposed | rejected | accepted | deprecated | superseded +
*Priority:* low | medium | high
== Context
We need to record the architectural decisions made on this project.
== Proposed Solution
Architecture Decision Records as {1}[summarised by Christoph] might help us as a format.
== Decision
We will use Architecture Decision Records.
== Consequences
None foreseeable.
== Further Information
== Comments
|
It isn’t strictly necessary to checkout the example, but if you want to play with the tooling:
|
Besides the name, the record-tools basically behave in the same manner as the original version of the tools, and for example a new TDR can be created like this:
$ ../src/record-tdr new Usage of log4j (1)
| 1 | This command creates a new record and opens it in your default $EDITOR |
If you consider the topic of this record, a lot probably comes to mind that you would like to add, but let us shorten this phase, accept the record as-is and save with :w.
Sometimes decisions have to be revised (or superseded), and that couldn’t be more true for technical matters, once more information has been gathered and/or experience with the actual decision has been gained.
$ ../src/record-tdr new -s 2 Usage of zerolog (1)
| 1 | Both are quite incompatible, but zerolog is always worth mentioning |
Under the hood, supersede just overwrites the status of the previous record with superseded and applies links in both directions. This can also be done manually with arbitrary links:
$ ./src/record-tdr link 3 Amends 1 "Amended by" (1)
| 1 | This command links record 3 to 1, along with the relationship of the link forwards and backwards |
There isn’t much direct visible effect besides the addition of the links to the Further Information field, but more on this in the next section:
== Further Information
Any other relevant information or considerations related to the debt.
Supersedes link:0002-usage-of-log4j.adoc[2. Usage of Log4j]
Amends link:0001-technical-debt-decision.adoc[1. Record technical debt decisions]
The tools include various generators that can be used to generate listings, graphs and even feeds.
The table of contents generator creates a nice overview of the known records and can additionally prepend an intro and append an outro, to allow further customization:
$ ../src/record-tdr generate toc -i Intro -o Outro
= TDR records
Intro
* link:0001-technical-debt-decision.adoc[1. Record technical debt decisions]
* link:0002-usage-of-log4j.adoc[2. Usage of log4j]
* link:0003-usage-of-zerolog.adoc[3. Usage of zerolog]
Outro
These two generators should be pretty self-explanatory:
$ ../src/record-tdr generate rss (1)
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title>List of all tdr records</title>
<description>List of all created tdr records</description>
<ttl>240</ttl>
<lastBuildDate>2024-10-24 12:05</lastBuildDate>
<generator>record-tools</generator>
<webmaster>[email protected]</webmaster>
<item><title>1. Record technical debt decisions</title><link>0001-technical-debt-decision.adoc</link><category>high</category><pubDate>2024-10-24</pubDate><description>Status: superseded</description></item> <item><title>2. Usage of log4j</title><link>0002-usage-of-log4j.adoc</link><category>low</category><pubDate>2024-10-22</pubDate><description>Status: superseded</description></item> <item><title>3. Usage of zerolog</title><link>0003-usage-of-zerolog.adoc</link><category>low</category><pubDate>2024-10-23</pubDate><description>Status: drafted</description></item>
</channel>
</rss>
| 1 | Use either rss or atom for the specific type |
Both generators create a graph based on dot - the sole difference is the plantuml version just neatly wraps the output between @startdot and @enddot:
$ ../src/record-tdr generate plantuml
... (1)
| 1 | We omit the output here, because it looks way better directly rendered with Plantuml below |
Plantuml doesn’t use the passed links, but when the graph is directly rendered as a vector graphic (svg) it also includes links:
$ ../src/record-tdr generate digraph | dot -Tsvg > graph.svg
And index accumulates all known records, groups them based on different properties like the severity and combines everything into a clickable page.
| This uses the tools quite heavily - or in other words is pretty slow. Therefore it relies on the database to speed things up, which needs to be populated first. |
$ ../src/record-tdr generate database
$ ../src/record-tdr generate index
...
== List of all TDR with high severity
[cols="3,1,1,1,1", options="header"]
|===
|Name|Proposed Date|Decision Date|Status|Severity
|<<technical-debt-records/0001-technical-debt-decision.adoc#, 1. Record technical debt decisions>>|2024-10-24|2024-10-24|superseded|high
|===
== List of all TDR with critical severity
[cols="3,1,1,1,1", options="header"]
|===
|Name|Proposed Date|Decision Date|Status|Severity
|===
== List of all TDR
[cols="3,1,1,1,1", options="header"]
|===
|Name|Proposed Date|Decision Date|Status|Severity
|<<technical-debt-records/0001-technical-debt-decision.adoc#, 1. Record technical debt decisions>>|2024-10-24|2024-10-24|superseded|high
|<<technical-debt-records/0002-usage-of-log4j.adoc#, 2. Usage of log4j>>|2024-10-24|?|superseded|low
|<<technical-debt-records/0003-usage-of-zerolog.adoc#, 3. Usage of zerolog>>|2024-10-24|?|drafted|low
|===
...
This page can be converted via Asciidoctor and its various backends:
$ ../src/record-adr generate database (1)
$ ../src/record-adr generate index > _adr_autogen.adoc (2)
$ asciidoctor -D architecture-decision-records src/site/asciidoc/architecture-decision-records/*.adoc (3)
$ asciidoctor -D . -I architecture-decision-records /site/asciidoc/architecture-decision-records.adoc (4)
$ asciidoctor -r asciidoctor-pdf -b pdf -D . src/site/asciidoc/architecture-decision-records.adoc (5)
| 1 | Generate the database for both types |
| 2 | Generate a neat index page for both types |
| 3 | Render the actual documents now |
| 4 | Render the combined index page |
| 5 | Optional step - just in case a PDF version is required |
Once rendered the pages should look like this:
Another way of generating the page is via Maven, which is quite handy since it is a prerequisite for the next step anyway. Fortunately the example contains all required configuration and all that needs to be done is this:
$ mvn -P generate-docs exec:exec generate-resources (1)
| 1 | The maven exec plugin handles the database generation and index page part |
There is a Makefile included in the example that provides convenience targets for the commands, like make generate and make publish, which will come in handy for the next step.
|
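The two targets could look roughly like this - a sketch wrapping the Maven calls shown in this section, not the showcase's actual Makefile:

```
# Hypothetical Makefile targets wrapping the Maven calls from this section
generate:
	mvn -P generate-docs exec:exec generate-resources

publish:
	mvn -P generate-docs-and-publish exec:exec generate-resources
```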
And finally we want to publish our documents to make them easily accessible for everyone. There are many different options to pick from, but one of the easiest is to use the Confluence Publisher and put our documents into a Confluence instance of our choice.
Spinning up a Confluence instance for this example is quite pointless without a license, so if you really want to see it in action there is some config required in the pom.xml file:
<!-- Confluence config -->
<!-- NOTE: Be careful with the ancestorID, everything will be overwritten -->
<confluence.url>${env.CONFLUENCE_URL}</confluence.url> (1)
<confluence.publishingStrategy>APPEND_TO_ANCESTOR</confluence.publishingStrategy>
<!-- Provide these values from env; don't commit them! -->
<confluence.spaceKey>${env.CONFLUENCE_SPACE}</confluence.spaceKey> (2)
<confluence.ancestorId>${env.CONFLUENCE_ANCESTOR}</confluence.ancestorId> (3)
<confluence.publisherUserName>${env.CONFLUENCE_USER}</confluence.publisherUserName>
<confluence.publisherPassword>${env.CONFLUENCE_TOKEN}</confluence.publisherPassword>
| 1 | The configuration can either be passed via environment variables or be hardcoded - this is up to you |
| 2 | This is normally the two letter abbreviation of the space, which can be found within the space settings |
| 3 | And finally we also need the ancestor id to append our records to. Problems to find it? Just open the page settings and have a look at the address bar of your browser. |
And once everything is set up correctly just fire up following:
$ CONFLUENCE_USER=USER_NAME CONFLUENCE_TOKEN=USER_TOKEN mvn -P generate-docs-and-publish exec:exec generate-resources
Aside from the documentation aspect and the way these records are gently guided toward a common document layout, we haven’t spoken of the real power of this yet.
Records foster active collaboration and work splendidly with all kinds of crowd thinking. They offer a space to experiment, maybe in the form of proof-of-concepts or a simple showcase for a particular technology, or to collect further opinions in Writer’s Workshops.
In this way teams are able to contribute to and suggest changes to the overall architecture in the case of ADR, and point to critical problems with TDR. This can be a culture change for the involved teams, since it allows more active participation in the process, especially if they are involved in the actual (democratic?) decision.
We are still experimenting with the actual documents and formats at work, but my personal feeling is this really moves us forward, allows the team more autonomy and offers additional ways of contributing.
Like always all my examples can be found here:
Finding a good reason to explore different options for monitoring, or better observability, is difficult. Either there hasn’t been the singular impact on production yet that made you lust for better monitoring, and/or it is difficult to understand the merit and the investment of time.
And even when you make the decision to dive into it, it is always a good idea not to start on production, but with a simple example. Simple examples on the other hand rarely show the real powers, so it usually ends in heavily contrived ones, like the one I used in my last post about Logging vs Tracing.
Still, nobody got fired for buying IBM ramping up monitoring, so let us - for the course of this post - put our EFK stack and friends aside and get started with something shiny new in Golang.
If you are like me and you haven’t heard the name SigNoz before, the first and foremost questions are probably what is SigNoz and why not one of these solutions insert random product here.
From a marketing perspective the key selling point for me honestly was the headline on the frontpage:
OpenTelemetry-Native Logs, Metrics and Traces in a single pane
Without knowing that beforehand, this was exactly what I needed, so well done, marketing:
Seems to be FOSS
Single solution to address the three pillars
Nice and complete package
That sounds almost too good to be true, but time to put on my wizard hat and check the claims. Before messing with Docker, I checked the documentation and discovered an architecture overview, and it looks like they hold up their part of the bargain:
| 1 | Apps can directly send data to SigNoz |
| 2 | Otel collectors can transmit data as well |
| 3 | Internally another custom collector provides the endpoints to receive data |
| 4 | I hadn't heard of ClickHouse before either, but columnar storage sounds about right |
| 5 | Some abstraction to query the actual data |
| 6 | Alert Manager keeps track of and handles all the various alerts - glad they haven't reinvented the wheel |
| 7 | And the shiny bit we’ve spoken of before |
Once SigNoz is running, which basically boils down to calling docker-compose, the first question is how to deliver your actual data to it.
OpenTelemetry is the de facto standard for that and offers many ways to gather, collect and transmit data via highly configurable pipelines. The only noteworthy thing here is to pay attention to the size of the generated logs - which may cause some headaches, as it did for me during my vacation.
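For context, such a Collector pipeline roughly wires receivers, processors and exporters together. A minimal sketch could look like this - the endpoint and the plain OTLP exporter are assumptions for illustration, not SigNoz's actual shipped configuration:

```yaml
# Sketch of an OpenTelemetry Collector pipeline (names illustrative):
receivers:
  otlp:
    protocols:
      grpc:
processors:
  batch:
exporters:
  otlp:
    endpoint: "signoz-otel-collector:4317"  # assumed endpoint
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```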
While playing with SigNoz I discovered it doesn't connect each of its containers separately to an OpenTelemetry Collector[1], but passes this task entirely to a container running logspout.
After a quick glance at the Github page marketing did its thing again:
Logspout is a log router for Docker containers that runs inside Docker. It attaches to all containers on a host, then routes their logs wherever you want. It also has an extensible module system.
Alright, this still sounds like a splendid idea and is exactly what we do in the example. In fact, there isn't much we have to configure at all:
Docker needs a minimal config to get us started:
logspout:
  container_name: todo-logspout
  image: "docker.io/gliderlabs/logspout:latest"
  pull_policy: if_not_present
  volumes: (1)
    - /etc/hostname:/etc/host_hostname:ro
    - /var/run/docker.sock:/var/run/docker.sock
  command: syslog+tcp://otelcol:2255 (2)
  depends_on:
    - otelcol
  restart: on-failure
| 1 | Logspout needs access to the Docker socket, plus a hostname mapping for convenience |
| 2 | This configures a connection to a receiver of our otelcol instance and comes up next |
And we have to define a receiver in otelcol:
receivers:
  tcplog/docker:
    listen_address: "0.0.0.0:2255"
    operators: (1)
      - type: regex_parser (2)
        regex: '^<([0-9]+)>[0-9]+ (?P<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?) (?P<container_id>\S+) (?P<container_name>\S+) [0-9]+ - -( (?P<body>.*))?'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      - type: move (3)
        from: attributes["body"]
        to: body
      - type: remove
        field: attributes.timestamp
      - type: filter (4)
        id: logs_filter
        expr: 'attributes.container_name matches "^todo-(postgres|otelcol|logspout)"'
      - type: json_parser
        parse_from: body
| 1 | Operators allow us to parse, modify and filter entries |
| 2 | This is the default format of the messages logspout forwards to otelcol |
| 3 | We basically move our content to the actual body of the entry |
| 4 | There might be lots of different containers running, so we limit the entries based on container names |
There are plenty of explanations and definitions out there, way better than I could ever provide, but just to bring the three pillars back to memory:
| Logging | Historical records of system events and errors |
|---|---|
| Tracing | Visualization of requests flowing through (distributed) systems |
| Metrics | Numerical data like e.g. performance, response time, memory consumption |
The first pillar is probably the easiest one, and there is lots of help and reasoning out there, including on this blog.
So the best we can do is throw in zerolog, add some handling in a Gin-gonic middleware and move on:
logEvent.Str("client_id", param.ClientIP). (1)
Str("correlation_id", correlationId). (2)
Str("method", param.Method).
Int("status_code", param.StatusCode).
Int("body_size", param.BodySize).
Str("path", param.Path).
Str("latency", param.Latency.String()).
Msg(param.ErrorMessage)
| 1 | The essential mapping magic happens here |
| 2 | A correlation id can help to aggregate log messages of the same origin |
SigNoz offers lots of different options to search your data, and if you have any experience with Kibana and the likes you will feel right at home:
There is also no reason to shy away if you require some kind of aggregation and diagrams with fancy bars:
The second pillar is a slightly different beast and requires special code to enhance and propagate a trace - this is generally called instrumentation.
OpenTelemetry provides the required toolkit to start a tracer and also add spans:
func (resource *TodoResource) createTodo(context *gin.Context) {
	tracer := otel.GetTracerProvider().Tracer("todo-resource") (1)
	ctx, span := tracer.Start(context.Request.Context(), "create-todo",
		trace.WithSpanKind(trace.SpanKindServer))
	defer span.End()

	var todo domain.Todo
	if nil == context.Bind(&todo) {
		var err error
		// Fetch id
		todo.UUID, err = resource.idService.GetId(ctx)
		if nil != err {
			context.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			span.SetStatus(codes.Error, "UUID failed") (2)
			span.RecordError(err) (3)
			return
		}
		// Create todo
		if err = resource.todoService.CreateTodo(ctx, &todo); nil != err {
			context.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}
	} else {
		context.JSON(http.StatusBadRequest, "Invalid request payload")
		return
	}

	span.SetStatus(codes.Ok, "Todo created")
	span.SetAttributes(attribute.Int("id", todo.ID), attribute.String("uuid", todo.UUID)) (4)
	context.JSON(http.StatusCreated, todo)
}
| 1 | This fetches a tracer from the global tracer provider |
| 2 | Spans as working unit of a trace can include a status |
| 3 | Error messages can also be thrown in |
| 4 | And they can also include different types of general span attributes |
The above code calls the id-service and demonstrates how traces can be continued and passed between service boundaries:
func (service *IdService) GetId(ctx context.Context) (string, error) {
	tracer := otel.GetTracerProvider().Tracer("todo-service")
	_, span := tracer.Start(ctx, "get-id")
	defer span.End()

	response, err := otelhttp.Get(ctx, fmt.Sprintf("http://%s/id",
		utils.GetEnvOrDefault("APP_ID_HOST_PORT", "localhost:8081"))) (1)
	if err != nil {
		return "", err
	}
	defer response.Body.Close()

	jsonBytes, _ := io.ReadAll(response.Body)
	var reply IdServiceReply
	err = json.Unmarshal(jsonBytes, &reply)
	if err != nil {
		return "", err
	}
	return reply.UUID, nil
}
| 1 | The otelhttp package makes it really easy to propagate traces |
When everything is set up correctly, propagated traces look like this:
The last pillar is one of the most interesting and probably the most troublesome, since there is no easy recipe for what could and what should be done.
Metrics can generally be of the following types:
| Counter | A simple monotonically increasing counter which can be reset |
|---|---|
| Gauge | A single value that can go arbitrarily up and down |
| Histogram | A time series of counter values and a sum |
| Summary | A histogram with a sum and quantiles over a sliding window |
This allows a broad range of measurements, like the count of requests or the average latency between them, and has to be figured out for each service, or rather each service landscape, individually.
Still, when there are metrics they can be displayed on dashboards like this:
Although not directly related to the three pillars, alerts are a nice mechanism to define thresholds and intervals and to receive notifications over various kinds of channels.
The documentation is as usual quite nice and there isn't much to add here, besides the fact that a paid subscription is required to connect SigNoz to Microsoft Teams. There is also a way to fall back to Power Automate, but unfortunately this requires another subscription.
A little hack is to use the connectors for Prometheus, but please consider supporting the good work of the folks at SigNoz:
SigNoz is a great alternative to established solutions like EFK or Grafana, in a single well-rounded package. It is easy to install and, as far as I can tell, easy to maintain, and definitely worth a try.
All examples can be found here: