Parallel Processing of Filespooler Queues

Filespooler is designed around careful sequential processing of jobs. It doesn’t have native support for parallel processing; those tasks may be best left to the queue managers that specialize in them. However, there are some strategies you can consider to achieve something of this effect even in Filespooler.

Writing into multiple queues

Because Filespooler queues are so lightweight, you can easily create dozens (or thousands, whatever). You could simply have your creator system rotate through writing new jobs to each one in turn, and then kick off queue processors for each one.

This doesn’t have a great deal of elegance, but it could get the job done.
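As a rough sketch of the rotation idea, the helper below maps a job counter to one of several queue directories. The queue layout (~/queues/q0, q1, ...), the NUM_QUEUES value, and the function name queue_for_job are all assumptions made for illustration; none of them come from Filespooler itself. The fspl commands appear only in comments, so the sketch has no side effects, and the prepare/queue-write pipeline should be checked against the Filespooler Reference before use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: queues live at ~/queues/q0 .. q(NUM_QUEUES-1).
NUM_QUEUES=4
QUEUE_BASE=~/queues

# Print the queue directory for job number $1 (0-based), rotating
# through the queues in turn.
queue_for_job() {
    local jobnum=$1
    printf '%s/q%d\n' "$QUEUE_BASE" $(( jobnum % NUM_QUEUES ))
}

# Possible use on the creator system (left commented out; the sequence-file
# path is an assumption, with one sequence file kept per queue):
#   q=$(queue_for_job "$n")
#   echo payload | fspl prepare -s "$q.seq" -i - | fspl queue-write -q "$q"
```

With four queues, jobs 0 through 3 land in q0 through q3, and job 4 wraps around to q0 again. Each queue would then get its own fspl queue-process run.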

Dispersal from a single queue

Let’s say you have two queue-processing processes. Here’s what you can do:

You have a single incoming queue.
When a processor’s own queue is empty, it grabs a job from the incoming queue, adds it to its own queue, and processes it.

Let’s consider how that might work.

First, create some queues:

fspl queue-init -q ~/incoming
fspl queue-init -q ~/proc1
fspl queue-init -q ~/proc2

Now, we’ll have a processing script that we’ll use to move things out of the incoming queue. Call it incomingproc.sh and let’s assume it takes the path to a destination queue as the first parameter, $1:

#!/usr/bin/env bash
set -euo pipefail

ln "$FSPL_JOB_FULLPATH" "$1/jobs/$FSPL_JOB_FILENAME"
cat > /dev/null

Now, here’s a script that we might use on proc1:

#!/usr/bin/env bash
set -euo pipefail

QPATH=~/proc1

if ! fspl queue-ls -q "$QPATH" | grep -q fspl- ; then
   fspl queue-process -q ~/incoming ~/incomingproc.sh -- "$QPATH"
fi

fspl queue-process --order-by=Timestamp -q "$QPATH" command_goes_here

Let’s analyze how this works:

In proc1, we first check to see if the proc1 queue is empty. If it is, we try to get a job to add to it.
To do that, we process the incoming queue using the incomingproc.sh script.
incomingproc.sh uses the environment variables that queue-process sets (see the Filespooler Reference for details) so that processing a job in the incoming queue adds it to the proc1 queue. It simply hardlinks the job file into proc1, which is one of the safe methods of adding a job to a queue (see Guidelines for Writing To Filespooler Queues Without Using Filespooler). Then it reads and discards the payload (so that fspl queue-process doesn’t get errors writing it). Because the script exits with success, fspl queue-process will (by default) go ahead and delete the job from the incoming queue - but now it will live on in proc1.
Now we process the target queue like normal.
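The hardlink-then-delete behavior described in these steps can be tried out without Filespooler at all. The sketch below uses throwaway directories and a made-up job filename (fspl-example.fspl) as stand-ins for real queues. It only demonstrates the filesystem semantics that incomingproc.sh relies on, and is not a replacement for fspl queue-init or queue-process.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in directories; real queues are created with `fspl queue-init`.
work=$(mktemp -d)
mkdir -p "$work/incoming/jobs" "$work/proc1/jobs"

# Hypothetical job file; real job files are written by fspl.
job="fspl-example.fspl"
echo "payload" > "$work/incoming/jobs/$job"

# What incomingproc.sh does: give the same file a second name in proc1.
ln "$work/incoming/jobs/$job" "$work/proc1/jobs/$job"

# What queue-process does after a successful run: remove the incoming name.
rm "$work/incoming/jobs/$job"

# The data is still reachable through the proc1 name.
result=$(cat "$work/proc1/jobs/$job")
echo "$result"

rm -rf "$work"
```

Note that ln without -s creates a hard link, which only works when both queues are on the same filesystem.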

Notice the use of Timestamp ordering instead of sequence ordering. Since we are pulling jobs from the incoming queue into various processors, the sequence number in any given processor will not be contiguous. That implies a lack of strict ordering of queue processing – but then parallel processing carries that implication anyhow.
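Since proc2 would need the same script with a different QPATH, one option is to wrap the logic in a function that takes the queue path and the job command as arguments. This is only a sketch: the function name run_processor and the INCOMING and MOVER variables are assumptions for illustration, while the fspl invocations are the same ones used in the proc1 script above.

```shell
#!/usr/bin/env bash
set -euo pipefail

INCOMING=~/incoming          # shared incoming queue
MOVER=~/incomingproc.sh      # script that hardlinks jobs into a processor queue

# run_processor QUEUE_DIR COMMAND [ARGS...]
# Same logic as the proc1 script, with the queue and command as parameters.
run_processor() {
    local qpath=$1
    shift

    # If this processor's queue has no jobs, pull from the incoming queue.
    if ! fspl queue-ls -q "$qpath" | grep -q fspl- ; then
        fspl queue-process -q "$INCOMING" "$MOVER" -- "$qpath"
    fi

    # Timestamp ordering, because sequence numbers here are not contiguous.
    fspl queue-process --order-by=Timestamp -q "$qpath" "$@"
}
```

Each processor would then call, for example, run_processor ~/proc1 command_goes_here from its own cron job or service.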
