Guidelines for Writing To Filespooler Queues Without Using Filespooler

Filespooler provides the fspl queue-write command to easily add files to a queue. However, Filespooler is intentionally designed so that some other command can just as easily add files to the queue. For instance, the setup in Using Filespooler over Syncthing has Syncthing do the final write, the nncp-file method (but not the nncp-exec method) in Using Filespooler over NNCP has NNCP do it, and so forth.

This page documents the requirements that any tool writing to a Filespooler queue must meet. Note that fspl queue-write is designed to implement these requirements for local writes. When some other tool, such as Syncthing or NNCP, writes directly into the queue directory, you must take care to ensure that it meets them too.

The fundamental concern is avoiding race conditions while a file is being written. That is, we don’t want Filespooler to start processing a file that hasn’t yet been completely written.

The Requirements For Writing a File to the Queue

1. Once the file is completely written and closed, and ready for Filespooler to process, it MUST reside in the jobs directory and meet this pattern: fspl-*.fspl
2. Any files that reside in the jobs directory and are NOT yet completely written must NOT meet that pattern.
3. Every valid job file in the jobs directory must have a unique name that meets the pattern under item 1. Both random and non-random names are fine, so long as they are unique.
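To make the pattern rule concrete, here is a small shell function that tests a bare filename against fspl-*.fspl using ordinary shell glob matching. It is a sketch that assumes the pattern behaves like a standard shell glob; it is not taken from Filespooler's own code, and is_job_name is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: does a bare filename match the fspl-*.fspl job pattern?
# Assumes standard shell glob semantics for the pattern.

# is_job_name NAME: succeed if NAME matches fspl-*.fspl.
is_job_name() {
  case $1 in
    fspl-*.fspl) return 0 ;;   # looks like a ready job
    *) return 1 ;;             # anything else should be ignored
  esac
}
```

For example, is_job_name fspl-1234.fspl succeeds while is_job_name fspl-1234.fspl.tmp fails, which is why appending a suffix such as .tmp is enough to keep an incomplete file out of the pattern.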

Methods of Compliance

For any write, the text between fspl- and .fspl needs to be unique for each job. fspl queue-write generates a UUID, but anything unique will do. You can call fspl gen-filename to generate a unique filename for this purpose.
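If fspl is not installed on the machine doing the write, any source of unique names will do. The sketch below uses a hypothetical gen_job_name function that builds a name from 128 random bits read with od; it assumes a Unix-like system that provides /dev/urandom.

```shell
#!/bin/sh
# Sketch: generate a unique job filename without fspl.
# Assumes a Unix-like system with /dev/urandom.

# gen_job_name: print fspl-<32 random hex digits>.fspl
gen_job_name() {
  hex=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n') || return 1
  [ "${#hex}" -eq 32 ] || return 1   # sanity check: 16 bytes = 32 hex digits
  printf 'fspl-%s.fspl\n' "$hex"
}
```

With 128 random bits, two names colliding is practically impossible, which satisfies the uniqueness requirement in the same way a UUID does.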

Then, there is the question of how to make sure files that are not yet completely written are invisible to Filespooler (items 1 and 2 above). There are two common ways of doing this:

- Writing to a file with a temporary name in jobs
- Writing to a file outside jobs, then renaming or hard linking into jobs

Let’s discuss them both.

Writing to a file with a temporary name in the jobs directory

This is the approach used by fspl queue-write. You can easily do it with a shell script sort of like this:

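set -e                             # stop before mv if any step fails
cd /path/to/queue/jobs             # hypothetical path to your queue's jobs directory
FILENAME="$(fspl gen-filename)"    # unique name matching fspl-*.fspl
TMPNAME="$FILENAME.tmp"            # ends in .tmp, so it does not match fspl-*.fspl
cat > "$TMPNAME"                   # write the job from stdin
mv "$TMPNAME" "$FILENAME"          # atomic rename publishes the complete file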

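This is roughly what fspl queue-write does itself. It is safe because the data is completely written before the rename, and a rename within one filesystem is atomic: the file is visible under either its old name or its new name, never in some in-between state. Filespooler therefore never sees a partially written fspl-*.fspl file.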
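A slightly more defensive version of the script above, written as a reusable function, might look like the sketch below. write_job is a hypothetical helper; it generates names with od and /dev/urandom instead of fspl gen-filename so that it runs where fspl is not installed (substitute fspl gen-filename if you prefer). It also removes the temporary file when cat reports an error.

```shell
#!/bin/sh
# Sketch: add stdin to a jobs directory by writing under a temporary
# name and renaming. Assumes a POSIX shell and /dev/urandom.

# write_job JOBSDIR: store stdin as a new job file in JOBSDIR.
write_job() {
  jobsdir=$1
  # Stand-in for `fspl gen-filename`: 128 random bits as 32 hex digits.
  name="fspl-$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n').fspl"
  tmp="$jobsdir/$name.tmp"           # ends in .tmp, so it does not match fspl-*.fspl
  if ! cat > "$tmp"; then
    rm -f "$tmp"                     # remove the partial file on a write error
    return 1
  fi
  # Same directory, hence same filesystem: mv is a true, atomic rename here.
  mv "$tmp" "$jobsdir/$name"
}
```

Usage would be along the lines of piping an already-prepared Filespooler packet into write_job /path/to/queue/jobs, where the path is a placeholder for your own queue.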

Writing to a temporary file outside the jobs directory

You can create other subdirectories besides jobs in your queue. You can write a file there, then use ln to hard link it into jobs, and then delete the link at the temporary location.
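The hard-link variant can be sketched as follows. It assumes a staging subdirectory named tmp inside the queue directory (the name is hypothetical; any subdirectory other than jobs should do), and link_job is likewise a hypothetical helper. The hard link makes the complete file appear in jobs in one step, and the staging name is removed afterward.

```shell
#!/bin/sh
# Sketch: stage a job outside jobs/, then hard-link it in.
# Assumes QUEUEDIR/jobs exists; QUEUEDIR/tmp is a hypothetical staging dir.

# link_job QUEUEDIR: store stdin as a new job in QUEUEDIR/jobs.
link_job() {
  queuedir=$1
  mkdir -p "$queuedir/tmp" || return 1
  # Stand-in for `fspl gen-filename`: 128 random bits as 32 hex digits.
  name="fspl-$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n').fspl"
  staged="$queuedir/tmp/$name"       # outside jobs, so Filespooler ignores it
  if ! cat > "$staged"; then
    rm -f "$staged"
    return 1
  fi
  # ln fails instead of copying if tmp and jobs are on different
  # filesystems, so a partially written file never shows up under jobs.
  if ! ln "$staged" "$queuedir/jobs/$name"; then
    rm -f "$staged"
    return 1
  fi
  rm -f "$staged"                    # the job now lives only under jobs/
}
```

One reason to prefer ln over mv here is that ln refuses to cross a filesystem boundary and reports an error, whereas mv silently falls back to copying.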

The problem with this is that mv only performs a rename when the source and target are on the same filesystem. A hard link or rename can’t cross a filesystem boundary; in that case mv falls back to copying the data and then deleting the source, so the file appears in jobs under its final name while it is still being written. As a result, some of the approaches documented here, for instance using rclone mount as described at Using Filespooler over rclone and S3, rsync.net, etc., will not work with this method, because the destination is on a different filesystem than the source. Therefore, this method must be used with much more care.
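One way to check a particular setup before writing real jobs is to attempt a hard link from the staging directory into jobs. The hypothetical same_fs function below creates a probe file whose name does not match fspl-*.fspl, tries to link it, and cleans up; it succeeds only if the link works, which requires both directories to share a filesystem that supports hard links. It is a sketch, not a guarantee: a passing probe does not prove that every tool in your pipeline will link or rename rather than copy.

```shell
#!/bin/sh
# Sketch: can files be hard-linked from SRC_DIR into DEST_DIR?

# same_fs SRC_DIR DEST_DIR: succeed only if a hard link between them works.
same_fs() {
  probe="$1/.fspl-probe.$$"
  target="$2/.fspl-probe.$$.tmp"     # does not match fspl-*.fspl
  : > "$probe" || return 1
  if ln "$probe" "$target" 2>/dev/null; then
    rm -f "$probe" "$target"
    return 0                         # same filesystem, hard links supported
  fi
  rm -f "$probe"
  return 1   # link failed: different filesystem, no hard links, or other error
}
```

For example, same_fs /path/to/queue/tmp /path/to/queue/jobs (placeholder paths) returning failure is a sign that the hard-link method will not work there.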

Testing a setup

Before relying on any setup like this, it is good to test whether it behaves correctly. I advise generating a very large packet, then running ls -la queuedir/jobs on the destination to see how the file appears while it is being written. If you see an fspl-*.fspl file whose size keeps growing, it is NOT working right. If the file only appears with its full, final size, it is working right. A growing file with a temporary name is fine.
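The ls check can also be automated. The sketch below, using hypothetical fspl_sizes and check_growth functions, records the sizes of fspl-*.fspl files in a jobs directory, waits, records them again, and reports any file whose size changed in between. Start it while the large test packet is being written; a file that simply appears between the two snapshots is not reported.

```shell
#!/bin/sh
# Sketch: report fspl-*.fspl files that grow during an interval.
# A growing file with a job name means the writer is NOT safe.

# fspl_sizes DIR: print "SIZE PATH" for each fspl-*.fspl file in DIR.
fspl_sizes() {
  for f in "$1"/fspl-*.fspl; do
    [ -e "$f" ] || continue            # the glob matched nothing
    printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done
}

# check_growth JOBSDIR SECONDS: return 1 if any job file changed size.
check_growth() {
  before=$(fspl_sizes "$1")
  sleep "$2"
  after=$(fspl_sizes "$1")
  status=0
  while read -r size name; do
    [ -n "$name" ] || continue         # skip the blank line from an empty list
    # Look up the same file's size in the earlier snapshot.
    old=$(printf '%s\n' "$before" | while read -r s n; do
      if [ "$n" = "$name" ]; then echo "$s"; fi
    done)
    if [ -n "$old" ] && [ "$old" != "$size" ]; then
      echo "GROWING: $name ($old -> $size bytes)"
      status=1
    fi
  done <<EOF
$after
EOF
  return "$status"
}
```

For example, check_growth queuedir/jobs 5 prints a GROWING line and fails if any job-named file changes size within five seconds.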
