This is part sixteen in a series on the 0.3 version of the language spec for the Merg-E Domain Specific Language for the InnuenDo Web 3.0 stack. I'll add more parts to the below list as the spec progresses:

part 1 : coding style, files, merging, scoping, name resolution and synchronisation
part 2 : reverse markdown for documentation
part 3 : Actors and pools.
part 4 : Semantic locks, blockers, continuation points and hazardous blockers
part 5 : Semantic lexing, DAGs, prune / ent and alias.
part 6 : DAGs and DataFrames as only data structures, and inline lambdas for pure compute.
part 7 : Freezing
part 8 : Attenuation, decomposition, and membranes
part 9 : Sensitive data in immutables and future vault support.
part 10 : Scalars and High Fidelity JSON
part 11 : Operators, expressions and precedence.
part 12 : Robust integers and integer bitwidth generic programming
part 13 : The Merg-E ownership model, capture rules, and the --trustmebro compiler flag.
part 14 : Actorcitos and structural iterators
part 15 : Explicit actorcitos, non-inline structural iterators, runtimes, and abstract scheduler pipeline.
part 16 : async functions and resources and the full use of InnuenDo VaultFS

In part 9 we looked briefly at the Merg-E vault interface for the storage and retrieval of things like API keys. In this post we look deeper into the use of the InnuenDo VaultFS, not just by user code, but by the runtime as well, and at what that means for crypto and web APIs. But first we will look at another abstraction pattern, one that is steered by the async modifier.

The async scheduling modifier for ambient async resources

In part 6 we already saw the inline pattern for lambdas that allowed us to write common form no-return Merg-E functions as a flat-scheduled abstraction that would let us write code as if Merg-E was a synchronous language with return value functions. In this section we are going to look at a similar pattern that is primarily meant for asynchronous resource access from the ambient DAG, but as before, we are exposing the runtime abstraction to the language to gain an extra language feature for free, like we did before with the actorcitos in part 14.

The first thing to realize is that async is a modifier. To be specific, it is a scheduling modifier. We will later see what this means, but let's first look at its usage.

function myFunction (user string password string)::{
  }{
    ...
    }!!{
      switch scope.exceptions[-1] {
        resource_error : {
          .. 
          };
        };
      };
blocker mf;
mf += myFunction("pibara", async ambient.async.vault "pibara-password");
await.all mf;

So what does the async modifier do here? Basically it makes the whole function invocation a delayed partial function application. The expression

  ambient.async.vault "pibara-password"

is an asynchronous operation that eventually yields a string that should be applied to the function argument list prior to the invoked function being actually scheduled. In this case, it extracts the value of user.vaultmisc.code.pibara-password from InnuenDo VaultFS, more on that later, and when this succeeds, asynchronously, only then will myFunction get scheduled. If it fails though, the error scope of the invoked function will get invoked instead with a resource_error as error.
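To make the scheduling order concrete, here is a minimal Python model of the pattern described above. It is not Merg-E and not the runtime's implementation; the names FAKE_VAULT, vault_lookup, schedule_with_async_arg and ResourceError are all hypothetical stand-ins, and the in-memory dictionary only pretends to be the vault. What it shows is the order of events: the async argument is resolved first, the function body only runs once a value is available, and a failed lookup runs the error scope instead.

```python
import asyncio


class ResourceError(Exception):
    """Hypothetical stand-in for Merg-E's resource_error."""


# Hypothetical in-memory stand-in for the vault; not the VaultFS API.
FAKE_VAULT = {"pibara-password": "s3cret"}


async def vault_lookup(key):
    """Pretend to fetch a value asynchronously; fail with ResourceError."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if key not in FAKE_VAULT:
        raise ResourceError(key)
    return FAKE_VAULT[key]


def my_function(user, password):
    """The function body: only ever runs with a resolved password."""
    return f"{user} logged in with a {len(password)}-character password"


def my_function_error_scope(err):
    """Models the error scope that runs when the resource fails."""
    return f"resource_error: {err}"


async def schedule_with_async_arg(func, on_error, user, pending_arg):
    """Delayed partial application: resolve the argument, then invoke."""
    try:
        value = await pending_arg
    except ResourceError as err:
        return on_error(err)  # the body is never scheduled
    return func(user, value)


def demo(key):
    return asyncio.run(
        schedule_with_async_arg(
            my_function, my_function_error_scope, "pibara", vault_lookup(key)
        )
    )
```

Calling demo("pibara-password") runs the body, while demo with a key that is not in the fake vault runs only the error scope.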

The async scheduling modifier for custom resources

We can turn any async function into a pseudo resource, as long as its last argument is a callable that takes a single argument. This lets us use the async modifier on our own code.

function fakeResource (foo string bar int return callable<string>)::{
  }{
    blocker bl1;
    ...
    await.all bl1;
    ...
    return(...);
    };
mf += myFunction("innuendo", async fakeResource("Foo", 42));

Again this denotes a scheduling modifier for the function invocation. Note that like for the lambdas, the return is missing from the invocation expression. This is handled by the pattern the modifier triggers. The actual fakeResource function gets invoked with a callable supplied by the runtime, that callable captures the pseudo return value, and uses it in the eventual invocation of myFunction.
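The callable hand-off can be modelled in a few lines of Python. This is a sketch under assumptions, not the runtime's mechanism: invoke_with_async_resource and runtime_callable are hypothetical names, the resource completes synchronously here for brevity, and the string it produces is made up. The point is only that the resource never returns anything itself; the runtime-supplied callable captures the pseudo return value and performs the delayed invocation.

```python
def my_function(user, password):
    """Stand-in for the eventual target of the delayed invocation."""
    return (user, password)


def fake_resource(foo, bar, ret):
    """A pseudo resource: its last argument is a single-argument callable."""
    ret(f"{foo}:{bar}")  # the 'return' is a call to the supplied callable


def invoke_with_async_resource(func, leading_arg, resource, *resource_args):
    """Model of the async modifier applied to a custom resource."""
    outcome = []

    def runtime_callable(value):
        # Captures the pseudo return value and only now invokes func.
        outcome.append(func(leading_arg, value))

    # The caller never passes the callable; the runtime appends it.
    resource(*resource_args, runtime_callable)
    return outcome
```

In this model, invoke_with_async_resource(my_function, "innuendo", fake_resource, "Foo", 42) plays the role of the invocation with the async modifier shown above.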

Types of sensitive data in InnuenDo VaultFS

At the time of writing InnuenDo VaultFS is still under construction. The design though is very much complete. The filesystem provides a bit of private key/value storage in the form of extended attributes in a user space filesystem. The privacy level is defined as a type of executable identity for binaries and scripts run by a trusted interpreter (Merg-E aims to eventually provide such a trusted interpreter). Basically the privacy identity is defined as:

Process user-id plus executable path for binaries not marked as a trusted interpreter
Process user-id plus script path for binaries or scripts marked as a trusted interpreter

A trusted interpreter can be an actual scripting language interpreter or a VM implementation.
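As an illustration of the two rules above, the identity can be thought of as a pair. The following Python sketch is only a model; PrivacyIdentity and privacy_identity are hypothetical names, and how VaultFS actually determines the paths is not shown here.

```python
from typing import NamedTuple, Optional


class PrivacyIdentity(NamedTuple):
    """Hypothetical model of a VaultFS privacy identity."""
    uid: int
    path: str


def privacy_identity(
    uid: int,
    executable_path: str,
    trusted_interpreter: bool = False,
    script_path: Optional[str] = None,
) -> PrivacyIdentity:
    """Pick the path that identifies the code, following the two rules."""
    if not trusted_interpreter:
        # Plain binary: the executable itself is the identity.
        return PrivacyIdentity(uid, executable_path)
    # Trusted interpreter: the script (or bytecode) it runs is the identity.
    if script_path is None or not script_path.startswith("/"):
        raise ValueError("a trusted interpreter needs an absolute script path")
    return PrivacyIdentity(uid, script_path)
```

Two scripts run by the same trusted interpreter under the same user-id therefore end up with different identities, and so with different private storage.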

Within the private space, key/value storage is split up along three axes:

Administered vs Self Custody
Code vs Runtime
Bindings vs Variables

Administered key/value pairs are always read-only from the code or runtime implementation; they are created, updated and deleted by a specially privileged tool, ivadm. Like everything else, ivadm is identified by process user-id plus executable path, and ivadm has self custody over an admin password in order to make administration a type of in-uid level sudo. The ivadm tool can add administered key/value pairs to every identity's vault, but it has write-only access, and only within the context of the same user-id. It can add, set and delete, but it can't read. In contrast, self custody key/value pairs can only be read and written by a process with that particular identity itself.

Orthogonal to the Administered vs Self Custody axis, we have the code versus runtime axis. When the runtime doesn't provide an ambient API for the particular sensitive data bit, the sensitive data is made available to the code directly. For the runtime that implements the 0.3 version of the language, this will basically be everything. But if instead there is an encapsulating API within the ambient API, for example for signing transactions with a signing key, then a runtime key/value pair is used for such a signing key. These key/value pairs are not available to the code directly, only through something like a signing API.

Finally we have bindings versus variables. Bindings are the constants of InnuenDo VaultFS. They can be set once and cannot be deleted or changed. Variables, though, can be changed and deleted.
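The difference between bindings and variables can be modelled with a small in-memory class. This Python sketch is not the VaultFS implementation; SelfCustodyModel is a hypothetical name, the method names merely mirror the bind, set and delkey operations used in this post, and the choice to keep bindings and variables in one shared key namespace is an assumption made for the sketch.

```python
class SelfCustodyModel:
    """Hypothetical in-memory model of bindings versus variables."""

    def __init__(self):
        self._values = {}
        self._bound = set()  # keys that were created as bindings

    def bind(self, key, value):
        """Set-once: a binding can never be changed afterwards."""
        if key in self._values:
            raise PermissionError(f"{key!r} already exists")
        self._values[key] = value
        self._bound.add(key)

    def set(self, key, value):
        """Create or update a variable; bindings are off limits."""
        if key in self._bound:
            raise PermissionError(f"{key!r} is a binding")
        self._values[key] = value

    def delkey(self, key):
        """Delete a variable; bindings cannot be deleted."""
        if key in self._bound:
            raise PermissionError(f"{key!r} is a binding")
        del self._values[key]

    def get(self, key):
        return self._values[key]
```

A key created with bind stays as it is for good, while a key created with set can be overwritten or removed later.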

Accessing administered sensitive data from code

Let's take a small step back to our first example from part 9:

sensitive string apiKey = ambient.vault "innuendo-api-key";

Note that in this code there is no async. The reason is that Merg-E assumes that administered key/value pairs don't need to be changeable live. When the runtime starts, all administered key/value pairs are loaded into the runtime so they are available in a convenient synchronous way. The above implies administered and code, and either a binding or a variable. Because it is read-only, that distinction only matters for the ivadm tool, not for the code.

Self custody in InnuenDo VaultFS from code

We already looked at reading existing key/value pairs from InnuenDo VaultFS while looking at async resources, but let's revisit:

function myFunction (user string password string)::{
  }{
    ...
    }!!{
      switch scope.exceptions[-1] {
        resource_error : {
          .. 
          };
        };
      };
blocker mf;
mf += myFunction("pibara", async ambient.async.vault "pibara-password");
await.all mf;

This reads the value for a given key and uses it as one function argument in an eventual function invocation.

Merg-E considers InnuenDo VaultFS a safe place for sparse capabilities or password capabilities used as keys. As such, from a least authority perspective, Merg-E does NOT provide any facilities to list or iterate self custody keys.

But it does provide an API for two things: creating and deleting sensitive key/value pairs.

function setIdentityKey (secret string)::{
  ambient.async.vault.bind as bind;
  }{
    blocker blk1;
    blk1 += bind "myid" secret;
    await.all blk1;
    };
blocker mv;
mv += setIdentityKey(async ambient.async.entropy.system.string 32);
await.all mv;

This example uses another async resource for system entropy, and it uses this to bind an identity in the self custody part of the vault.

Doing this as a variable rather than a binding is similar.

function setOrUpdateIdentityKey (secret string)::{
  ambient.async.vault.set as set;
  }{
    blocker blk1;
    blk1 += set "tmpid" secret;
    await.all blk1;
    };
blocker mv;
mv += setOrUpdateIdentityKey(async ambient.async.entropy.system.string 32);
await.all mv;

Finally deleting is a bit simpler:

blocker blk1;
blk1 += ambient.async.vault.delkey "tmpid";
await.all blk1;

Resource-shielded sensitive data in InnuenDo VaultFS

This part is provisional, as there are no shielding resources planned yet for the Merg-E ambient tree, nor supporting types in the lang tree. It is important though to communicate intent with a fictional example. Please note the API described here may never exist in this form, but it shows the general idea of being able to designate sensitive data without being able to read it.

callable<string, callable<string>> signer = ambient.signer.ecdsa "hive-posting-key@pibara";
blocker done;
done += signer(myTransaction, hivePoster);
await.all done;

The idea is that "hive-posting-key@pibara" designates a key/value pair in the runtime namespace that is inaccessible to the Merg-E code but not to the runtime. In this fictitious example, a signer callable is instantiated from ambient, loaded with the value from the designated key/value pair, a callable that can be called and that will use the sensitive data, in this case a signing key, but will not disclose it to the code.
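The designate-without-read idea can be sketched in Python with a closure. Everything here is an assumption made for illustration: _RUNTIME_SECRETS and make_signer are hypothetical names, the key bytes are obviously not a real key, and HMAC-SHA256 from the standard library stands in for ECDSA signing, which it is not. Note also that a Python closure can still be inspected, so this models the shape of the API rather than the actual isolation the runtime would have to provide.

```python
import hashlib
import hmac

# Hypothetical runtime-side storage; the calling code only knows the name.
_RUNTIME_SECRETS = {"hive-posting-key@pibara": b"not-a-real-key"}


def make_signer(designation):
    """Return a callable that uses the designated key without exposing it."""
    key = _RUNTIME_SECRETS[designation]

    def signer(message: bytes, deliver):
        # HMAC-SHA256 is a stand-in for the signing primitive (not ECDSA).
        deliver(hmac.new(key, message, hashlib.sha256).hexdigest())

    return signer


def sign_once(designation, message: bytes):
    """Usage helper: collect the signature through a callback."""
    received = []
    make_signer(designation)(message, received.append)
    return received[0]
```

The calling code passes in a name and a message and gets back a signature; the key material itself never appears in any value that is handed to it.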

Commandline arguments

The first version of the Merg-E runtime won't in reality be a trusted interpreter. We will treat it as such for language development reasons, but until the second iteration of the runtime, we will be faking it until we make it. One thing that we are going to implement though is command line exposure.

Let's recall the prerequisites for a trusted interpreter:

User code cannot write to argv. A writable argv (the command line arguments of your program) allows code to spoof which interpreted code it is. It can't spoof a native binary, because there are other ways for iprocfs to figure that out, but it can spoof everything else.
Interpreted code paths used by the interpreter must be absolute paths. An interpreter that allows relative paths for script or VM bytecode is not considered trusted.
Non-code files accessed through absolute paths must be of known data types and their signature start should not be a possible valid script or VM bytecode.

In this section we are looking at numbers one and two.

In Merg-E we choose to not make the argv memory directly accessible to the code. Instead we map argv into the scope dag under scope.arguments. Next to that we state that except for the interpreter path as first argument in argv, all other arguments come in pairs with a --option val syntax. The runtime will eat up specific pairs like --run <scriptpath>, but even these will be made available under scope.arguments.

Any argument starting with a "/" will be interpreted as a path, and the Merg-E interpreter will not start if the path designated does not exist. Internally the string constant will get a path annotation, forming the root of a string annotation system that we will write about in a later post. The core property is that you cannot derive paths without starting off in an absolute path.

By default all values are strings, but just as in HF-JSON and in vault values, using HF-JSON field notation, it is possible to define values of another type:

merg-e --run /path/to/myapp.mrg --mynum HF:int{16}42:f37b

This is currently not super convenient because of the CRC16, but tooling shall be added to work around this.

So what happens in scope.arguments? Well, for every pair, a scalar node is added:

scope.arguments.run : string /path/to/myapp.mrg annotation:path
scope.arguments.mynum : int16 42

Both are immutable.
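The mapping from argv to scope.arguments can be sketched in Python as follows. This is a model under assumptions, not the Merg-E runtime: ArgNode and parse_arguments are hypothetical names, and typed HF-JSON values such as the int{16} example (including their CRC16 check) are deliberately not handled, so every value stays a string here.

```python
import os
from types import MappingProxyType
from typing import NamedTuple, Optional


class ArgNode(NamedTuple):
    """Hypothetical model of one immutable scalar node."""
    type: str
    value: str
    annotation: Optional[str]


def parse_arguments(argv):
    """Turn [interpreter, --option, val, ...] into a read-only mapping."""
    rest = argv[1:]  # argv[0] is the interpreter path
    if len(rest) % 2 != 0:
        raise ValueError("arguments must come in --option val pairs")
    nodes = {}
    for option, value in zip(rest[0::2], rest[1::2]):
        if not option.startswith("--") or len(option) == 2:
            raise ValueError(f"not an option name: {option!r}")
        annotation = None
        if value.startswith("/"):
            # Values starting with "/" are paths and must exist at start-up.
            if not os.path.exists(value):
                raise FileNotFoundError(value)
            annotation = "path"
        nodes[option[2:]] = ArgNode("string", value, annotation)
    return MappingProxyType(nodes)  # read-only view, like the immutable nodes
```

For example, parse_arguments(["merg-e", "--run", "/", "--name", "demo"]) yields a path-annotated run node and a plain string name node, and refuses to start when a designated path does not exist.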

Coming up

This post was originally unplanned; it was the result of some gaps in the language spec that became apparent while implementing InnuenDo VaultFS.

After this post, my priority goes back to completing InnuenDo VaultFS and the semantic lexer for Merg-E, the parser, and the scheduler pipeline (Yggdrasyl and Níðhöggr) for the development runtime, so expect posts on my progress first before extensions to this series on the v0.3 language specs.

Version 0.3 of the Merg-E language specification : async functions and resources and the full use of InnuenDo VaultFS