As a long-time Unix user I’ve always found pipelines to be very useful. They allow you to build up extremely complex functions by sticking together much less complex parts. On Linux we put together commands and pipe the output of one into the next. If you ask the question “How many mp3 files are there in this directory?” you can say “Ah, I know how to list all mp3 files, and I know how to count the number of lines in a list” and then the answer becomes
$ ls -l *.mp3 | wc --lines
Pipelines can become arbitrarily complex. For example, the pipeline
$ cat /var/log/maillog | grep "status=sent" | grep -v phoric | wc --lines
counts the number of lines in my maillog that contain the text “status=sent” and omit the text “phoric” (my hostname). There is usually someone smart enough to write a shorter pipeline than what you’ve written, but that’s not the point. The point is that pipelines allow us to construct complex functionality by chaining together small programs.
If I were an academic (and I am), I’d probably generalise. I’d say something like, “pipelines have a source, they have many filter and translate actions and finally an action”. This, I believe, is a reasonable generalisation. In a pipeline, we often take some data, strip out the bits we don’t need, translate it into another format, maybe filter some more stuff and then finally do something. Such processes can be (relatively) easily turned into a visual programming language. The web application “if this, then that” (IFTTT) is a simple visual programming language. Apple’s Automator is a more general visual programming language for such pipelines on OS X. And, going back to text-based pipelines, Windows PowerShell does away with much of the faffing about with text processing found in a Unix pipeline (by encapsulating the faffing in objects):
Get-ChildItem C:\Scripts | Where-Object {$_.Length -gt 200KB} | Sort-Object Length
The PowerShell example is stolen from technet.microsoft.com.
Let’s say I have an application area in mind. It’s a technical area with highly complex theory, such that we can expect the individual components of our pipeline to contain complex functionality. Let’s take the world of audio and video processing. The words are technical: we can expect to mux (i.e. multiplex) and demux audio. And we can also expect to transcode it. Decoding, say, a flac stream and re-encoding it as a vorbis stream requires complex algorithms. Furthermore, the language is generally well understood by geeks, like you. So, how do I do exactly that? Open a flac file, decode it, re-encode it as a vorbis stream and save it to a file. Fortunately, a pipeline-based toolset for audio (and video) already exists. We’re going to look at GStreamer. It’s effectively a domain-specific programming language that could be easily implemented as a visual programming language (it has been before, but the visual side wasn’t of use to anyone who could actually understand the domain). We want to look at how GStreamer makes constructing complex pipelines easy.
The following pipeline is the start of an answer to our problem (it intentionally doesn’t work):
$ gst-launch filesrc location=song.flac ! flacdec ! vorbisenc ! filesink location=song.ogg
In the above example we construct the pipeline, where ! is the pipe symbol, and launch it with gst-launch. Each of the pipeline elements, on its own, is simple to understand. A filesrc element reads a file at the specified location. A flacdec element decodes a flac stream. A vorbisenc element encodes something into vorbis and a filesink element stores a stream back into a file. The above pipeline produces the error
WARNING: erroneous pipeline: could not link flacdec0 to vorbisenc0
This is brilliant. A major difference between a GStreamer pipeline and a Unix pipeline is that I get static checking. It can tell me, from the structure of the pipeline, whether the type of data produced by each element can be consumed by the following element. This is brilliant because it saves time.
Imagine that my pipeline was much more complicated. Imagine that the pipeline encoded a long, high-quality, video (taking several hours) and then tried to do something specific with the audio stream. As a Unix pipeline it would look like
$ cat video.raw | encodevid | modifyaudio > video.webm
We would have to run this pipeline before we saw that passing data from encodevid to modifyaudio failed. So having some static checking a priori (I’m an academic, I’m allowed to use Latin) could save us a lot of time. Static checking, like in a programming language, also ensures that we are passing the right data into an element that handles that data.
Ok, so what’s a working pipeline for the above question?
$ gst-launch filesrc location=song.flac ! flacdec ! audioconvert ! audioresample ! vorbisenc ! oggmux ! filesink location=song.ogg
Note: there’s a much shorter pipeline, but this one makes most of the stages explicit. I was missing the audioconvert, the audioresample and the oggmux steps. Once again, this is a very complex domain. Each of the elements in this pipeline is simpler to understand than the entire pipeline.
Let’s do something magic. Take the original flac file, convert it to 8-bit audio and re-encode it as a vorbis stream in an ogg container. Why? Well this is the point of pipelines. Someone might want to do this sometime. The pipeline architecture allows you to combine elements in ways that other people may not have considered, or simply don’t need.
$ gst-launch filesrc location=song.flac ! flacdec ! audioconvert ! audioresample ! capsfilter caps=audio/x-raw-int,channels=2,width=8,depth=8 ! audioconvert ! vorbisenc ! oggmux ! filesink location=song.ogg
I think that GStreamer is a well designed, well implemented pipeline architecture for a complex domain. So how does it achieve what it does?
Each of the elements in GStreamer advertises its capabilities. Each element advertises the capabilities of what it accepts (e.g. raw audio with sample freq > 8kHz and < 44kHz) and the capabilities of what it produces (e.g. audio bit stream conforming to the vorbis spec). You can then do static checking on pipelines by ensuring the capabilities of what is produced by one element are compatible with those consumed by the following element (in the simplest case, a subset of them). It's very neat.
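To make the subset idea concrete, here is a minimal sketch in shell. It assumes a capability is nothing more than a format name plus a sample-rate range in Hz, which is far simpler than real GStreamer caps. The helper names range_within and can_link are made up for this post, and the format name and numbers are illustrative rather than taken from any real element.

```shell
# range_within LO HI MIN MAX: succeeds when the range [LO, HI] lies inside
# [MIN, MAX], i.e. the produced rates are a subset of the accepted rates.
range_within() {
    [ "$1" -ge "$3" ] && [ "$2" -le "$4" ]
}

# can_link PROD_FMT PROD_LO PROD_HI CONS_FMT CONS_LO CONS_HI: succeeds when
# the producer's format matches the consumer's and its rate range fits.
can_link() {
    [ "$1" = "$4" ] && range_within "$2" "$3" "$5" "$6"
}

# A producer emitting 16-22.05kHz audio fits a consumer accepting 8-44kHz.
can_link audio/x-raw 16000 22050 audio/x-raw 8000 44000 && echo "link ok"

# A producer that may emit up to 96kHz does not.
can_link audio/x-raw 16000 96000 audio/x-raw 8000 44000 || echo "could not link"
```

Nothing here touches any audio. The check only looks at what each side says about itself, which is the whole point.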
How might you retrofit this into an existing set of complex Unix applications? You could add a --caps flag (or -caps if you insist on BSD style) to each of your applications. If you have no source, wrap them in a shell script that takes --caps. Make the caps option output the input and output capabilities of your pipeline elements. Bonus hipster points if this is output in JSON. Then write a my_static_checker that takes your pipeline as a string, checks it statically à la GStreamer (I’m allowed to abuse French, I’m an academic) and then executes the pipeline.
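Here is a rough sketch of what that might look like, in plain shell. It makes a few assumptions of its own: the --caps output is two lines of text ("in:" and "out:", each a space-separated list of format names) rather than JSON, to keep the parsing trivial. The wrappers decode_flac, convert_audio and encode_vorbis are hypothetical, with cat standing in for the real tools, and the pipeline string is split naively on the pipe character.

```shell
# Hypothetical wrappers around tools we cannot modify. Each answers --caps
# with an "in:" and an "out:" line (an assumed format, not a standard).
# cat stands in for the real tool.
decode_flac() {
    if [ "$1" = "--caps" ]; then
        printf 'in: audio/x-flac\nout: audio/x-raw-int\n'
    else
        cat
    fi
}
convert_audio() {
    if [ "$1" = "--caps" ]; then
        printf 'in: audio/x-raw-int\nout: audio/x-raw-float\n'
    else
        cat
    fi
}
encode_vorbis() {
    if [ "$1" = "--caps" ]; then
        printf 'in: audio/x-raw-float\nout: audio/x-vorbis\n'
    else
        cat
    fi
}

# is_subset "a b" "a b c": succeeds when every word of $1 also appears in $2.
# The body runs in a subshell, so set -f and the variables do not leak.
is_subset() (
    set -f
    for item in $1; do
        case " $2 " in
            *" $item "*) ;;
            *) exit 1 ;;
        esac
    done
    exit 0
)

# check_pipeline "cmd1 | cmd2 | ...": asks every stage for its caps and fails
# at the first pair where the output formats are not a subset of the next
# stage's input formats. Pipes inside quotes are not handled.
check_pipeline() (
    prev_cmd=""
    prev_out=""
    old_ifs=$IFS
    IFS='|'
    set -f
    set -- $1                      # one positional parameter per stage
    set +f
    IFS=$old_ifs
    for stage in "$@"; do
        cmd=$(printf '%s\n' "$stage" | awk '{print $1}')
        caps=$("$cmd" --caps) || exit 2
        in_caps=$(printf '%s\n' "$caps" | sed -n 's/^in: *//p')
        out_caps=$(printf '%s\n' "$caps" | sed -n 's/^out: *//p')
        if [ -n "$prev_cmd" ] && ! is_subset "$prev_out" "$in_caps"; then
            echo "could not link $prev_cmd to $cmd" >&2
            exit 1
        fi
        prev_cmd=$cmd
        prev_out=$out_caps
    done
    exit 0
)

# my_static_checker "pipeline": run the pipeline only if the check passes.
my_static_checker() {
    check_pipeline "$1" && eval "$1"
}
```

With these stand-ins, my_static_checker "decode_flac | convert_audio | encode_vorbis" < song.flac > song.ogg passes the check and runs, while my_static_checker "decode_flac | encode_vorbis" complains that it could not link decode_flac to encode_vorbis and runs nothing. Using eval on a string is fine for a sketch, but you wouldn’t want to feed it untrusted input.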
Why? Now we’re in a position to write a very domain-specific visual programming language. This could be useful in a lot of cases. Particularly in the case of at least one person I’ll be emailing this to.
Next step…monads!