Hive Blog

    Progress on InnuenDo VaultFS


    In this discussion in the cap-talk group on Google Groups, and in this post about sensitive data in the Merg-E language that I'm working on, the need for a vault solution for the Merg-E runtime became apparent.

    I have some experience with FUSE file-systems in Linux, having written the least authority filesystem MinorFS, and the computer forensic filesystems CarvFS and MattockFS in the past, and having worked on and abandoned MinorFS2.

    Given that the vault solution I need is just a side project for Merg-E, I'm not going to try to make it perfect or make it do more than the absolute minimum I need for the Merg-E runtime, which is a simple, safe (in a specific meaning of the word 'safe') place to store sensitive data such as API keys or Web 3.0 crypto keys for bots and/or Web 3.0 L2 node implementations.

    No fancy caps or mandatory reliance on AppArmor or SELinux, no built-in crypto layer, just access control patterns for protecting sensitive data from same-UID attacks from below. The absolute minimum to create just enough vault for the key/value pairs that the Merg-E DSL is going to need to be a least authority DSL.

    I started on this side project 10 days ago and have been working on it for two weekends and a few hours every night. It's not ready yet, and it will probably take at least one more weekend, maybe two, before a first full beta, but I've made enough progress to share a few words on it.

    A little about the Merg-E roadmap

    Right now there are multiple runtime implementations on my envisioned roadmap for the Merg-E DSL. The first one will basically be just for me. It's a Python-everything implementation: a minimal runtime in Python with a lexer and a parser for running Merg-E as an interpreted language. There is no REPL; it is a scripting language version of Merg-E meant to try out syntax and semantics, see if things really work, and tune until I feel comfortable that the language as outlined is going to work as envisioned. This is going to be a private little dev runtime that runs on top of Python, so we will have Merg-E code running on top of a Merg-E interpreter that in turn runs on top of a Python interpreter. It is important to remember this, as it is a setup that InnuenDo VaultFS is going to need to support, even when accepting that it won't meet least authority security requirements. The dev runtime isn't meant for any kind of production code. It is meant only for developing the Merg-E language specs and testing the viability of existing language design specs. This runtime will access CoinZdense through the planned Python language bindings for libcoinzdense.

    A second runtime will be the first usable one. A compiled Merg-E interpreter, still with Merg-E as a non-REPL script language. This is the script runtime. This runtime will embed libcoinzdense.

    A third runtime I'm envisioning is fully compiled Merg-E code linked against the core runtime code and libcoinzdense code. This is the embedded runtime.

    There are two other runtimes on the roadmap, both might never see the light of day because of my rather big stack of projects for the InnuenDo Stack, and they are not that relevant for the subject of this post, but for completeness, these two are about embedding SYCL compiled Merg-E lambdas into the embedded runtime, and about a full port to the BEAM virtual machine.

    On trusted interpreters.

    Before we can get into InnuenDo VaultFS, we need to discuss the subject of trusted interpreters, as these define quite a large chunk of the underpinnings of the InnuenDo VaultFS security model. Code that you run on your system is either a compiled native binary, or something running on top of an interpreter. An interpreter itself can also either be a compiled native binary or run on top of another interpreter. Interpreters can be scripting language interpreters or virtual machines running compiled non-native code, but the bottom line is that code is either native or non-native code running on an interpreter.

    So when, according to InnuenDo VaultFS, is an interpreter trusted?

    That definition is basically derived from how smart (or dumb) one part of InnuenDo VaultFS, iprocfs, is, and from what the interpreted code can do to spoof the data that iprocfs relies on. We will look into that a little deeper later on.

    So what does InnuenDo VaultFS currently consider requisites for a trusted interpreter?

    1. User code cannot write to argv. A writable argv (the command line arguments of a program) allows interpreted code to spoof which interpreted code it is. It can't spoof a native binary, because iprocfs has other ways to figure that out, but it can spoof everything else.
    2. Interpreted code paths used by the interpreter must be absolute paths. An interpreter that accepts a relative path to a script or VM bytecode is not considered trusted.
    3. Non-code files accessed through absolute paths must be of known data types, and the start of their signature should not be a possible valid script or VM bytecode.

    As stated, the project right now aims primarily at Merg-E, and in Merg-E the only two valid data file formats for rule number 3 are parquet and HF-JSON, which might make InnuenDo VaultFS a non-match for other interpreted least authority languages. If you happen to be working on one and you are able to adhere to rules one and two, please reach out if I should expand on rule number three, which in the current implementation accepts only parquet and JSON (in the broadest sense of JSON, not just HF-JSON).
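To make rules 2 and 3 a bit more concrete, here is a minimal sketch of the two checks. It is not code from the InnuenDo VaultFS repo; the function names are hypothetical, and the JSON check (first non-whitespace byte is `{` or `[`) is a simplifying assumption. The only hard fact used is that parquet files start with the magic bytes `PAR1`.

```python
import os

# PAR1 is the magic number at the start of every parquet file.
PARQUET_MAGIC = b"PAR1"
# Assumption for this sketch: JSON data starts with an object or an array.
JSON_STARTS = (b"{", b"[")


def is_acceptable_code_path(path: str) -> bool:
    """Rule 2: a script or bytecode path must be absolute."""
    return os.path.isabs(path)


def looks_like_known_data(head: bytes) -> bool:
    """Rule 3: the first bytes of a data file must match a known format."""
    if head.startswith(PARQUET_MAGIC):
        return True
    # Leading ASCII whitespace is skipped before looking at the first byte.
    return head.lstrip().startswith(JSON_STARTS)
```

A real check would have to be stricter than this, since a lone `{` or `[` only says that the file might be JSON.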

    The three parts of InnuenDo VaultFS

    InnuenDo VaultFS consists of three parts: two user-space file-systems, iprocfs and ivaultfs, plus one executable tool, ivaultadmin. Right now I've completed a very early implementation of the user-space file-systems. The iprocfs file-system is about the 'what' that is accessing the data, a 'what' identity so to speak: which executable needs its data to be private. The ivaultfs file-system is all about compartmenting the data per what id, and about the split between the owning 'what' and the admin 'what'. Finally, the ivaultadmin tool is about granting the user password-protected, write-only access to specific sections of the per what id compartments.

    Program granular process identity: iprocfs

    The iprocfs user space file-system is a small, limited scope user space file-system that runs as root on top of the Linux /proc file-system. It is a trusted subsystem that exposes info from /proc/$PID/exe and /proc/$PID/cmdline to one and only one user id: the user id of the ivaultfs user space file-system. It will do a little bit more than just look at /proc, so that its output is usable for creating a what id.

    In short, iprocfs takes a process id (pid) and translates it to a runnable id, which we consider to be half of a what id in InnuenDo VaultFS, the other half being the user id (uid).
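The pid to runnable id translation can be sketched roughly as follows. This is an illustration under assumptions, not the iprocfs implementation: the function names are hypothetical, the rule "first absolute-path argument of a trusted interpreter is the script" is a guess at the simplest possible policy, and hashing the path (rather than, say, the file contents) with BLAKE2b at a 32 byte digest size is assumed only because it yields a 64 character hex id.

```python
import hashlib
import os


def read_proc(pid: int) -> tuple[str, list[str]]:
    """Read the executable path and argv of a process from /proc (Linux only)."""
    exe = os.readlink(f"/proc/{pid}/exe")
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        # cmdline is NUL-separated; empty entries are dropped in this sketch.
        argv = [a.decode(errors="replace") for a in f.read().split(b"\0") if a]
    return exe, argv


def runnable_path(exe: str, argv: list[str], trusted: set[str]) -> str:
    """Pick the path that identifies the runnable (assumed policy)."""
    if exe not in trusted:
        return exe  # native binary: the executable itself is the runnable
    for arg in argv[1:]:
        if os.path.isabs(arg):
            return arg  # first absolute-path argument is taken as the script
    return exe  # interpreter started without an absolute script path


def runnable_id(path: str) -> str:
    """64 character hex id; hashing the path with BLAKE2b is an assumption."""
    return hashlib.blake2b(path.encode(), digest_size=32).hexdigest()
```

On a Linux box, `runnable_id(runnable_path(*read_proc(os.getpid()), trusted=set()))` gives an id for the current process treated as a native binary.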

    When iprocfs is started, it is given a mountpoint, the name of the ivaultfs user, and a list of trusted interpreters.

    Something like:

    iprocfs -d /mnt/iproc/ ivaultfs /usr/local/bin/python3.12 /usr/local/bin/merg-e-dev
    

    Or in the future

    iprocfs -d /mnt/iproc/ ivaultfs /usr/local/bin/merg-e-script
    

    Remember what we said about trusted interpreters, and that Python isn't actually one. Until merg-e-script is there, that is a reality we work with, knowing we aren't going to be running production code with merg-e-dev. Please keep this in mind when using early betas.

    If you are running compiled native binaries, you don't need to worry about trusted interpreters, and if you check out the code you could actually run iprocfs like this, using a fake trusted interpreter list of one:

    iprocfs -d /mnt/iproc/ ivaultfs /usr/bin/yes
    

    If you do, please make sure to manually create a user account to run ivaultfs under next.

    Per what-id little compartments: ivaultfs

    The second user space file-system is ivaultfs. It runs as an unprivileged user. Like iprocfs, there is runnable code in the Codeberg repo.

    You can start ivaultfs something like this:

    sudo su - ivaultfs
    ivaultfs -d /mnt/ivault xxx /var/spool/ivaultfs /mnt/iproc /usr/local/bin/ivaultadmin
    

    Only right now ivaultadmin doesn't exist yet, so just for demonstration purposes we will declare Python as our trusted admin tool next. But first let's examine the command line. After the mountpoint there is currently a stub user name field; we may drop this soon (we wanted ivaultfs to drop privileges, but so far that isn't playing nice with FUSE). After that we have a backing directory where ivaultfs stores the actual files and data. Then comes the iprocfs mountpoint that ivaultfs should use to find runnable ids, and finally we have the trusted admin binary.

    So now for our nasty Python demo. We don't have a trusted admin binary yet, so we use Python to demonstrate. Don't do this for real!!, it is just to demonstrate:

    ivaultfs -d /mnt/ivault rob /var/spool/ivaultfs /mnt/iproc /usr/bin/python3.10
    

    Let's start with a demonstration to grasp what ivaultfs is actually doing:

    #!/usr/bin/python3
    import os
    import secrets

    VAULT = "/mnt/ivault"
    OWN = os.path.join(VAULT, "own.keys")

    # Holding init.dat open is what makes own.keys and adm.keys appear.
    with open(os.path.join(VAULT, "init.dat")) as vaultfile:
        # Dump every extended attribute on both key files.
        for file in (os.path.join(VAULT, "adm.keys"), OWN):
            for key in os.listxattr(file):
                val = os.getxattr(file, key).decode()
                print(file, key, ":", val)
        # Read the write-once secret, or create it on the first run.
        if "user.vaultbind.mysecret" in os.listxattr(OWN):
            mysecret = os.getxattr(OWN, "user.vaultbind.mysecret").decode()
        else:
            # secrets (not random) is the right source for key material.
            mysecret = secrets.token_hex(32)
            os.setxattr(OWN, "user.vaultbind.mysecret", mysecret.encode())
        print("mysecret:", mysecret)
        # Admin view: a <64 hex chars>.keys file name is 69 characters long.
        for file in os.listdir(VAULT):
            print(len(file), file)
            if len(file) == 69:
                print("file: ", file)
                os.setxattr(os.path.join(VAULT, file), "user.vaultadm.demo", b"demo Demo DEMO dEMO")
        # Pre-set an attribute for a runnable id that may not have run yet.
        os.setxattr(os.path.join(VAULT, "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef.keys"), "user.vaultadm.foo", b"BAR")
    

    The first thing we notice is that the demo opens the file /mnt/ivault/init.dat. Look at this file as the bank employee that lets you into the bank where your personal vault is stored. While this file is not open, it is the only file your /mnt/ivault directory will show. Open the file and it seems to disappear until you close it again; in the meantime two other files are available, own.keys and adm.keys.
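If you want to see this gatekeeper behaviour for yourself, a small sketch like the one below records the directory listing before, while, and after init.dat is held open. The function name is hypothetical, and it only uses plain `os.listdir`, so it makes no assumptions about ivaultfs internals.

```python
import os


def observe_vault(mountpoint: str) -> dict[str, list[str]]:
    """List the directory before, while, and after init.dat is held open."""
    seen = {"before": sorted(os.listdir(mountpoint))}
    with open(os.path.join(mountpoint, "init.dat")):
        # On a mounted ivaultfs this listing is where the key files should show up.
        seen["during"] = sorted(os.listdir(mountpoint))
    seen["after"] = sorted(os.listdir(mountpoint))
    return seen
```

Run against /mnt/ivault, the "before" and "after" listings should contain only init.dat, and the "during" listing should contain own.keys and adm.keys, if the file-system behaves as described above. On an ordinary directory all three listings are simply identical.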

    We cannot do much with these files. You can't open them, but they do have so-called extended attributes. Look at these as key/value pairs. In both files the keys fall in one of multiple namespaces, all of which start with user.. More on that soon.

    If and only if the process accessing the /mnt/ivault/ directory is an instance of the trusted admin tool (which it is now, because we named Python as such for our demo), there will be other files visible in the directory too. These files have a name that consists of the 64 character hex representation of the blake2 id of the runnable, with the file extension ".keys". Each of these files is a write-only version of the adm.keys file of one of the other runnables the current UID has run up to the point of opening init.dat. If an executable hasn't been used in that way yet, the admin tool can still set extended attributes for it, for when it is used at some later time.

    The important thing to realize is that all of these files are for one specific UID, the one currently accessing the file-system. So the admin tool itself can only administer the view of all the runnables running under that one UID, not any other.

    So now let's look at the different files and the namespaces and what they imply.

    namespace          adm.keys     <longhash>.keys           own.keys
    user.ivaultfs.     read-only    prohibited / fs only      -
    user.vaultbind.    read-only    write once / no delete    read/write-once
    user.vaultadm.     read-only    write many / deletable    -
    user.vaultmisc.    -            -                         read/write

    So basically the file-system itself may write extended attributes in adm.keys with a user.ivaultfs. prefix, a prefix that nobody else, including the admin tool, is allowed to write to. The admin tool can write to adm.keys through some <longhash>.keys file, using either the prefix user.vaultbind. for one-time undeletable binds, or the prefix user.vaultadm. for rewritable and deletable key/value pairs. Finally, any process can write extended attributes to own.keys, using the user.vaultbind. prefix for one-time non-deletable binds or the user.vaultmisc. prefix for rewritable and deletable key/value pairs.
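The table can also be read as a small decision function. The sketch below is only a model of the table as written in this post, not code from ivaultfs; the names `RULES` and `allowed` and the operation labels are made up for the illustration.

```python
# Each cell of the table as the set of operations a process may perform.
# "write_once": set only while the key does not exist yet, never delete.
RULES = {
    "adm.keys": {
        "user.ivaultfs.": {"read"},
        "user.vaultbind.": {"read"},
        "user.vaultadm.": {"read"},
    },
    "<longhash>.keys": {  # admin tool only, write-only view
        "user.vaultbind.": {"write_once"},
        "user.vaultadm.": {"write", "delete"},
    },
    "own.keys": {
        "user.vaultbind.": {"read", "write_once"},
        "user.vaultmisc.": {"read", "write", "delete"},
    },
}


def allowed(file_kind: str, key: str, op: str, exists: bool = False) -> bool:
    """Return True if op ("read", "write" or "delete") is permitted by the table."""
    for prefix, ops in RULES.get(file_kind, {}).items():
        if key.startswith(prefix):
            if op == "write" and "write_once" in ops:
                return not exists  # a bind can be written exactly once
            return op in ops
    return False  # unknown file kind or namespace: deny
```

For example, `allowed("own.keys", "user.vaultbind.mysecret", "write")` is true on the first run of the demo script, and false with `exists=True` on every later run.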

    It's all very minimal, but just enough for what Merg-E needs from a FUSE based vault system.

    Please note that while ivaultfs does not use crypto, and crypto is not considered part of the Merg-E threat model, nothing is stopping you from running the backing directory on an encrypted file-system if your threat model is different, as long as that file-system supports extended attributes.

    Merg-E sub-namespaces

    For Merg-E and other trusted interpreters aiming for least authority, we define sub-namespaces. From an InnuenDo VaultFS perspective these don't have any meaning, but we will try to make ivaultadmin aware of the convention and use it unless explicitly prompted not to. So think of it as a partially enforced convention that is considered good practice for least authority design of trusted interpreters.

    Imagine your Merg-E code wants to sign a Web 3.0 transaction on some chain, for example on HIVE. There is a signing key stored in the vault that it needs to sign the transaction. But the code needed to sign the transaction is in the runtime and in libcoinzdense or some legacy ECDSA library, so there is no actual need for user code to be able to access the actual signing key data.

    This is where VaultFS sub namespacing comes in. We define two sub namespaces "as convention" under user.vaultbind, user.vaultadm and user.vaultmisc:

    • runtime
    • code

    For ivaultadmin we define 'runtime' as the least authority default, and we define a flag for using 'code' instead and/or disabling sub-namespaces altogether.

    For Merg-E only 'code' sub-namespace key/value pairs will be code-visible, and 'runtime' sub-namespace key/value pairs will be used by the runtime only. It is suggested that other trusted interpreters implement the same convention.
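As an illustration of the convention, here is a sketch of how a runtime could compose attribute names and decide which ones user code gets to see. The exact key layout (sub-namespace directly after the namespace prefix, separated by a dot) and the helper names are assumptions for this example, not something the post or ivaultfs fixes.

```python
NAMESPACES = ("user.vaultbind.", "user.vaultadm.", "user.vaultmisc.")
SUB_NAMESPACES = ("runtime", "code")


def make_key(namespace: str, sub: str, name: str) -> str:
    """Compose an attribute name such as user.vaultadm.runtime.<name>."""
    if namespace not in NAMESPACES or sub not in SUB_NAMESPACES:
        raise ValueError("unknown namespace or sub-namespace")
    return f"{namespace}{sub}.{name}"


def code_visible(keys: list[str]) -> list[str]:
    """Keep only the keys a runtime would expose to user code."""
    return [k for k in keys
            if any(k.startswith(ns + "code.") for ns in NAMESPACES)]
```

With this layout a signing key stored as `make_key("user.vaultadm.", "runtime", "signing-key")` never shows up in `code_visible(...)`, which is the whole point of the split.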

    Coming up

    This is it so far, everything I got up and running in 10 days. I need to put a finishing touch on ivaultfs, do some security checks, some coding style thingies, and some extra sanity checks. Then I need to look into the make install and systemd startup files for these two FUSE file-systems. And then I need to write the admin tool. I think that I'm about half way now, but that doesn't mean 10 more days, because there were two weekends in the last ten days and I have a day job. Hopefully you can all expect a first beta in two weeks.

    In the meantime, if you are interested, please play around with my code and give me feedback if you can. Anything is appreciated.

    • #innuendo
    • #vault
    • #fuse
    • #merg-e