Compressing Filespooler Jobs

Filespooler has a powerful concept called a decoder. A decoder is a command that any Filespooler command reading a queue uses to decode the files within that queue. The concept is generic and can support compression, encryption, cryptographic authentication, and so forth.

Here I will introduce it as a concept for supporting compression with gzip. This page also functions as a tutorial for encoders and decoders. If you aren’t already familiar with Filespooler, you should probably read the tutorial at Using Filespooler over Syncthing before proceeding.

Some useful properties

These are some useful Filespooler properties that will play out as we work through this discussion:

  1. fspl queue-write does not inspect the data stream in any way, and doesn’t care what’s in it.
  2. fspl prepare dumps its packet to stdout with the expectation that it is piped to some other command.
  3. Because of 1 and 2, you can insert something in the pipeline between prepare and queue-write.
  4. All commands that process a Filespooler queue accept a -d DECODECMD parameter that specifies a command to decode packets. This decode command typically undoes whatever the commands you inserted in the pipeline in item 3 did.
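
The symmetry between items 3 and 4 can be sketched without Filespooler at all. The following is a minimal illustration, assuming gzip as the encoder and zcat as the decoder; the packet text and the temporary file are stand-ins, not real Filespooler data:

```shell
#!/bin/sh
# Stand-in for a packet. fspl is not invoked here; only the
# encode/decode pair is exercised.
packet='example packet bytes'

# Encode step: what you would insert between prepare and queue-write.
encoded_file=$(mktemp)
printf '%s\n' "$packet" | gzip > "$encoded_file"

# Decode step: what you would pass as -d DECODECMD.
decoded=$(zcat < "$encoded_file")

rm -f "$encoded_file"
echo "$decoded"
```

Whatever pair of commands you pick, the decoder has to reproduce the original byte stream exactly, since Filespooler only ever sees the decoder's output.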

Try it out

We’re going to mimic some of the examples in the Syncthing tutorial, this time with compression.

First, we create a queue, just as we did there:

sender$ fspl queue-init -q ~/sync/gzqueue

Now, we’ll add a request:

sender$ echo Hi | fspl prepare -s ~/gzseq -i - | gzip | fspl queue-write -q ~/sync/gzqueue

This is the same command as before, just with the addition of gzip in the pipeline. The difference is that now the file in the jobs directory is compressed with gzip. Let’s take a look:

receiver$ fspl queue-ls -d zcat -q ~/sync/gzqueue
ID                   creation timestamp          filename
1                    2022-05-16T20:29:32-05:00   fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl

Ah ha, there it is. We can get info about it too:

receiver$ fspl queue-info -d zcat -q ~/sync/gzqueue -j 1
FSPL_SEQ=1
FSPL_CTIME_SECS=1652940172
FSPL_CTIME_NANOS=94106744
FSPL_CTIME_RFC3339_UTC=2022-05-17T01:29:32Z
FSPL_CTIME_RFC3339_LOCAL=2022-05-16T20:29:32-05:00
FSPL_JOB_FILENAME=fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
FSPL_JOB_QUEUEDIR=/home/jgoerzen/sync/gzqueue
FSPL_JOB_FULLPATH=/home/jgoerzen/sync/gzqueue/jobs/fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl

Let’s take a look at what’s happening under the hood when we run one of these commands:

receiver$ fspl --log-level trace queue-ls -d zcat -q ~/sync/gzqueue
TRACE fspl: Parsed options are Cli { globalopts: GlobalOpts { log_level: Level(Trace) }, command: QueueLs(QueueOptsWithDecoder { qopts: QueueOpts { queuedir: "/home/jgoerzen/sync/gzqueue" }, decoder: Some("zcat") }) }
DEBUG filespooler::jobqueue: Reading header from "/home/jgoerzen/sync/gzqueue/jobs/fspl-30b1a4f2-da30-4722-b22a-fd6e1d8aea36.fspl"
DEBUG with_decoder{decoder="zcat"}: filespooler::jobqueue: Preparing to invoke decoder: "/bin/bash" ["-c", "zcat"]
DEBUG with_decoder{decoder="zcat"}: filespooler::jobqueue: Decoder PID 4037302 started successfully
TRACE filespooler::jobqueue: Killing decoder
TRACE filespooler::jobqueue: Waiting for decoder to terminate
TRACE filespooler::jobqueue: Decoder termination status Ok(ExitStatus(ExitStatus(0)))
ID                   creation timestamp          filename
1                    2022-05-18T07:54:02-05:00   fspl-30b1a4f2-da30-4722-b22a-fd6e1d8aea36.fspl

Note that here, unlike the command given to fspl queue-process, the decoder is interpreted by the shell, so you can actually set up a decoder pipeline. Filespooler invoked zcat and piped the content of the packet to it. In this case, it only needed to read the header, so once it had read the header, it killed the decoder to prevent it from wasting cycles needlessly processing large payloads.
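
To illustrate what a decoder pipeline might look like, here is a rough sketch that mimics the invocation shown in the trace (bash -c with the decoder string, file contents on stdin). The base64 stage, the sample data, and the use of head to stand in for "read only the header" are all assumptions made for the example; Filespooler itself is not run:

```shell
#!/bin/bash
job=$(mktemp)

# Encode with a two-stage pipeline: gzip, then base64 (hypothetical choice).
printf 'HEADER:payload payload payload\n' | gzip | base64 > "$job"

# Hypothetical decoder string, as it might be passed to -d.
# It reverses the encoding stages in the opposite order.
decoder='base64 -d | zcat'

# Mimic the traced invocation: bash -c DECODER with the job file on stdin.
# head stops after 6 bytes, much as Filespooler stops after the header.
header=$(bash -c "$decoder" < "$job" | head -c 6)

rm -f "$job"
echo "$header"
```

Because the string is handed to the shell, quoting matters: wrap the whole pipeline in single quotes on the command line so that your own shell does not interpret the pipe.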

If you had multiple files in the queue, you’d see Filespooler invoke zcat for each one, in precisely this manner, since queue-ls needs to read the header from each.

If you forget to include the -d for a command line, it will be as if the file doesn’t exist to Filespooler. This does not cause an error exit; generally people don’t want the mere presence of invalid data to prevent the proper working of the queue. However, with debugging turned on, you can see what happens:

receiver$ fspl --log-level debug queue-ls -q ~/sync/gzqueue
DEBUG filespooler::jobqueue: Reading header from "/home/jgoerzen/sync/gzqueue/jobs/fspl-30b1a4f2-da30-4722-b22a-fd6e1d8aea36.fspl"
DEBUG filespooler::jobfile: Error reading FSPrefix: Input doesn't appear to be a filespooler file
ID                   creation timestamp          filename

Technically, what happens is that Filespooler attempts to read the first few bytes of the file and detects that it doesn’t contain a Filespooler header (of course; it has a gzip header!). So it skips processing the rest of the file.
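
You can see the gzip header for yourself. This sketch does not reproduce Filespooler's own check; it only prints the first two bytes of a gzip stream, which are always the gzip magic number 1f 8b. The temporary file stands in for a compressed job file:

```shell
#!/bin/sh
f=$(mktemp)
echo Hi | gzip > "$f"

# Dump the first two bytes as hex and strip od's whitespace.
magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')

rm -f "$f"
echo "$magic"   # 1f8b for any gzip stream
```

Running the same head and od commands against a file in the queue's jobs directory is a quick way to tell whether a job was written with or without compression.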

Every queue operation works exactly as it normally does; you just always have to supply the -d. fspl queue-process -d zcat -q queuedir will process a queue, and so forth.

Using the stdin commands

Commands such as fspl stdin-info read a packet from stdin. They don’t have a -d option because you can just as well pipe the decoded data to them. For instance:

$ cat queuefile | zcat | fspl stdin-info
