2nd February, 2012

Clutter-box2dmm bindings updated to 0.12.1

I submitted a patch to GNOME Bugzilla last night that updates the clutter-box2d C++ bindings so that they build against the current git master version of clutter-box2d. The patch is a bit of a mess. However, this morning I think it is a little less of a mess than I thought last night, mainly because I managed to do the following on the train into work:

Simply build one of the examples in the clutter-box2dmm folder.

I now have Fedora 15 & 16 packages for cluttermm, clutter-box2d and will have clutter-box2dmm later today. I’ll push these clutter-box2d* packages upstream to Fedora later, but I won’t hold my breath to see them approved. The approval process is a little unwieldy.

Posted at 10:22 am | Comments Off

21st November, 2011

fakemail is handy

For debugging mail problems, fakemail is really handy, especially when you have a bug that needs a couple of hundred emails generated in quick succession to trigger it, as I did when debugging some mail merge stuff in LibreOffice recently.

Posted at 2:00 pm | Comments Off

13th November, 2011

libexttextcat 3.2.0

Released libexttextcat 3.2.0 (Extended Text Categorization, used to guess the language that input text is written in). It can be found in this download dir. There are no code changes from 3.1.1, but it adds a large collection of extra language signatures, so that libexttextcat supports nearly the same set of languages as LibreOffice does, modulo languages that LibreOffice supports which don’t have a convenient UDHR translation to use as a basis for generating a language fingerprint.

Posted at 11:41 pm | Comments Off

1st November, 2011

Pipelines for visual programming languages

As a long-time Unix user I’ve always found pipelines to be very useful. They allow you to build up extremely complex functions by sticking together much less complex parts. On Linux we put together commands and pipe the output of each into the next command. If you ask the question “How many mp3 files are there in this directory?” you can say “Ah, I know how to list all mp3 files, and I know how to count the number of lines in a list”, and then the answer becomes:

$ ls -l *.mp3 | wc --lines

Pipelines can become arbitrarily complex. For example, the pipeline

$ cat /var/log/maillog | grep "status=sent" | grep -v phoric | wc --lines

counts the number of lines in my maillog that contain the text “status=sent” and omit the text “phoric” (my hostname). There is usually someone smart enough to write a shorter pipeline than what you’ve written, but that’s not the point. The point is that pipelines allow us to construct complex functionality by chaining together small programs.
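To make the “shorter pipeline” remark concrete, here is a minimal sketch. It runs against a small throwaway sample log rather than the real /var/log/maillog (the log lines and the second hostname are invented for the demo), and it relies only on grep’s standard -c (count) and -v (invert match) options.

```shell
# Build a tiny stand-in for /var/log/maillog (contents are invented for the demo).
log=$(mktemp)
cat > "$log" <<'EOF'
Nov  1 10:00:01 phoric postfix/smtp[1]: A1: to=<a@example.org>, status=sent
Nov  1 10:00:02 mail postfix/smtp[2]: B2: to=<b@example.org>, status=sent
Nov  1 10:00:03 mail postfix/smtp[3]: C3: to=<c@example.org>, status=deferred
Nov  1 10:00:04 mail postfix/smtp[4]: D4: to=<d@example.org>, status=sent
EOF

# The four-stage pipeline from above, pointed at the sample log.
long_count=$(cat "$log" | grep "status=sent" | grep -v phoric | wc -l)

# A shorter equivalent: grep reads the file itself, and -c counts the
# lines that survive the inverted match, so cat and wc are not needed.
short_count=$(grep "status=sent" "$log" | grep -vc phoric)

echo "long: $long_count, short: $short_count"
rm -f "$log"
```

Both versions give the same count; the shorter one just has fewer small programs in the chain.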

If I were an academic (and I am), I’d probably generalise. I’d say something like, “pipelines have a source, they have many filter and translate actions, and finally an action”. This, I believe, is a reasonable generalisation. In a pipeline, we often take some data, strip out the bits we don’t need, translate it into another format, maybe filter some more stuff and then finally do something. Such processes can be (relatively) easily turned into a visual programming language. The web application if this, then that is a simple visual programming language. Apple’s Automator is a more general visual programming language for such pipelines on OS X. And, going back to text-based pipelines, Windows PowerShell does away with much of the faffing about with text processing found in a Unix pipeline (by encapsulating the faffing in objects):

Get-ChildItem C:\Scripts | Where-Object {$_.Length -gt 200KB} | Sort-Object Length

The PowerShell example is stolen from technet.microsoft.com.

Let’s say I have an application area in mind. It’s a technical area with highly complex theory, such that we can expect the individual components of our pipeline to contain complex functionality. Let’s take the world of audio and video processing. The words are technical: we can expect to mux (i.e. multiplex) and demux audio, and we can also expect to transcode it. Decoding, say, a flac stream and re-encoding it as a vorbis stream requires complex algorithms. Furthermore, the language is generally well understood by geeks, like you. So, how do I do exactly that? Open a flac file, decode it, re-encode it as a vorbis stream and save it to a file. Fortunately, a pipeline-based toolset for audio (and video) already exists. We’re going to look at GStreamer. It’s effectively a domain-specific programming language that could easily be implemented as a visual programming language (it previously has been, but the visual side wasn’t of use to anyone who could actually understand the domain). We want to look at how GStreamer makes constructing complex pipelines easy.

The following pipeline is the start of an answer to our problem (it intentionally doesn’t work):

$ gst-launch filesrc location=song.flac ! flacdec ! vorbisenc ! filesink location=song.ogg

In the above example we construct the pipeline, where ! is the pipe symbol, and launch it with gst-launch. Each of the pipeline elements, on its own, is simple to understand. A filesrc element reads a file at the specified location. A flacdec element decodes a flac stream. A vorbisenc element encodes something into vorbis, and a filesink element stores a stream back into a file. The above pipeline produces the error

WARNING: erroneous pipeline: could not link flacdec0 to vorbisenc0

This is brilliant. A major difference between a GStreamer pipeline and a Unix pipeline is that I get static checking. It can tell me, from the structure of the pipeline alone, whether the type of data produced by each element can be consumed by the following element. This is brilliant because it saves time.

Imagine that my pipeline was much more complicated. Imagine that the pipeline encoded a long, high-quality, video (taking several hours) and then tried to do something specific with the audio stream. As a Unix pipeline it would look like

$ cat video.raw | encodevid | modifyaudio > video.webm

We would have to run this pipeline before we saw that passing data from encodevid to modifyaudio failed. So having some static checking a priori (I’m an academic, I’m allowed to use Latin) could save us a lot of time. Static checking, like in a programming language, also ensures that we are passing the right data into an element that handles that data.

Ok, so what’s a working pipeline for the above question?

$ gst-launch filesrc location=song.flac ! flacdec ! audioconvert ! audioresample ! vorbisenc ! oggmux ! filesink location=song.ogg

Note: there’s a much shorter pipeline, but this one makes most of the stages explicit. I was missing the audioconvert, the audioresample and the oggmux steps. Once again, this is a very complex domain. Each of the elements in this pipeline is simpler to understand than the entire pipeline.

Let’s do something magic. Take the original flac file, convert it to 8-bit audio and re-encode it as a vorbis stream in an ogg container. Why? Well this is the point of pipelines. Someone might want to do this sometime. The pipeline architecture allows you to combine elements in ways that other people may not have considered, or simply don’t need.

$ gst-launch filesrc location=song.flac ! flacdec ! audioconvert ! audioresample ! capsfilter caps=audio/x-raw-int,channels=2,width=8,depth=8 ! audioconvert ! vorbisenc ! oggmux ! filesink location=song.ogg

I think that GStreamer is a well-designed, well-implemented pipeline architecture for a complex domain. So how does it achieve what it does?

Each of the elements in GStreamer advertises its capabilities. Each element advertises the capabilities of what it accepts (e.g. raw audio with sample freq > 8kHz and < 44kHz) and the capabilities of what it produces (e.g. an audio bit stream conforming to the vorbis spec). You can then do static checking on pipelines by ensuring the capabilities of what is produced by one element are a subset of those consumed by the following element. It’s very neat.
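The subset idea can be sketched in a few lines of shell. This is only an illustration under simplifying assumptions: capabilities are modelled as space-separated lists of made-up type names, and caps_subset is a hypothetical helper, not anything GStreamer provides.

```shell
# caps_subset PRODUCED ACCEPTED: succeed if every space-separated item in
# PRODUCED also appears in ACCEPTED. Runs in a subshell, so "exit" only
# ends the function.
caps_subset() (
    for cap in $1; do
        case " $2 " in
            *" $cap "*) ;;      # this capability is accepted, keep going
            *) exit 1 ;;        # found one the consumer cannot handle
        esac
    done
    exit 0
)

# Invented capability lists for three imaginary elements.
decoder_out="audio/raw-16bit"
encoder_in="audio/raw-16bit audio/raw-32bit"
muxer_in="audio/vorbis"

caps_subset "$decoder_out" "$encoder_in" && echo "decoder -> encoder: ok"
caps_subset "$decoder_out" "$muxer_in"   || echo "decoder -> muxer: cannot link"
```

Nothing is decoded or encoded here; the check only looks at what each element says about itself, which is the whole point.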

How might you retrofit this into an existing set of complex Unix applications? You could add a --caps flag (or -caps if you insist on BSD style) to each of your applications. If you have no source, wrap them in a shell script that takes --caps. Make the caps option output the input and output capabilities of your pipeline elements. Bonus hipster points if this is output in JSON. Then write a my_static_checker that takes your pipeline as a string, checks it statically à la GStreamer (I’m allowed to abuse French, I’m an academic) and then executes the pipeline.
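Here is a rough sketch of what that retrofit might look like. Everything in it is an assumption made for illustration: extractaudio is a hypothetical third tool, the type names are invented, the tools are stubbed out with cat, and --caps prints a plain “ACCEPTED PRODUCED” line rather than JSON so that no parser is needed (no hipster points, sadly).

```shell
# Hypothetical tools: with --caps each prints "ACCEPTED PRODUCED" (ACCEPTED may be a
# comma-separated list); otherwise they would do the real work (stubbed with cat).
encodevid()    { if [ "$1" = --caps ]; then echo "video/raw video/webm"; else cat; fi; }
extractaudio() { if [ "$1" = --caps ]; then echo "video/webm,video/ogg audio/raw"; else cat; fi; }
modifyaudio()  { if [ "$1" = --caps ]; then echo "audio/raw audio/raw"; else cat; fi; }

# check_pipeline "a | b | c": exit 0 if each stage accepts what the previous
# stage produces, exit 1 (with a GStreamer-like message) if it does not.
check_pipeline() (
    prev_name='' prev_out=''
    # Split the pipeline string on "|" and read one stage name per line.
    printf '%s\n' "$1" | tr '|' '\n' | while read -r name rest; do
        [ -n "$name" ] || continue
        caps=$("$name" --caps </dev/null) || exit 2
        accepted=${caps% *}     # everything before the space
        produced=${caps#* }     # everything after the space
        if [ -n "$prev_out" ]; then
            case ",$accepted," in
                *",$prev_out,"*) ;;
                *) echo "erroneous pipeline: could not link $prev_name to $name" >&2
                   exit 1 ;;
            esac
        fi
        prev_name=$name prev_out=$produced
    done
)

# my_static_checker: check first, and only run the pipeline if the check passes.
my_static_checker() {
    check_pipeline "$1" && eval "$1"
}

# The broken pipeline is rejected before any data flows.
my_static_checker "encodevid | modifyaudio" </dev/null || echo "rejected before anything ran"
# The fixed pipeline passes the check and is then executed.
printf 'data\n' | my_static_checker "encodevid | extractaudio | modifyaudio"
```

Note that eval runs the pipeline string exactly as given, so a checker like this is only suitable for pipeline strings you wrote yourself.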

Why? Because now we’re in a position to write a very domain-specific visual programming language. This could be useful in a lot of cases, particularly in the case of at least one person I’ll be emailing this to.

Posted at 11:32 pm | Comments Off

21st October, 2011

A new Openstreetmap API framework for PHP.

So over the last while, I’ve been working on a PHP package imaginatively named Services_Openstreetmap, for interacting with the OpenStreetMap API. I initially needed it so I could search for certain POIs and tabulate the results; it’s now also capable of adding data to the OpenStreetMap database: nodes and other elements can be created, updated and so on. It will even access the details of the user that is being used to modify that data, which is one difference between it and the other single-purpose OSM frameworks.

So why the big fanfare now? Well, I’m happy enough with it now to let other people look at it and use it, and I’ve also submitted it to the PEAR PEPr process, and the grilling that entails, so it can be included in the PEAR repository.

And if this pushes more people to using Openstreetmap (or PEAR for that matter) all the better!

Posted at 5:38 pm | Comments Off

21st October, 2011

CTL/CJK format character previews

As Lior Kaplan demonstrated at LibreOffice 2011 Paris, our format character preview really sucks for CTL and CJK users. If no CTL/CJK text is selected then no CTL sample text is shown, and the CJK sample text is taken from the fontname itself. Many font names are just Latin text, so they give no indication of what the font looks like in the script/language that is actually being written.

e.g. the old dialog for CTL will only preview some Western text if no text is selected, with no attempt to show any sample CTL text, or even the CTL fontname. For CJK it will additionally show the fontname of the CJK font in the preview, which isn’t helpful if the CJK fontname contains no CJK glyphs.

Simply adding the CTL fontname wouldn’t help much, seeing as the fontname is David CLM. So, currently, reusing the preview text used in the font drop-down as a first stab at “doing the right thing” gives me…

Code for all this is mostly in svtools/source/misc/sampletext.cxx, where there is now a hugely over-engineered set of heuristics to guess the script a font is best tuned for, plus various functions to generate suitable text depending on whether all we have is the font, the font plus a language, or just the language, and on whether we want a short identifier to classify what script a font might be good at rendering or a longer sequence of sample text for a font preview.

Probably best to drop rendering the fontname in the Western case for the text preview and use some sample text there too, at least for the mixed Western+CTL+CJK case, as it’s confusing to have a font name rendered and some sample text in another font.

After the initial posting, there were some comments about the hideous rendering of the Hebrew text, which appears to be an artefact of using David CLM. Here’s what it looks like with David, i.e. it’s the rendering using that font that misplaces the Nikud, not me. Whether this is an interesting bug in our renderer, or maybe glyph fallback, or the font itself is probably worthy of investigation.

Posted at 11:59 am | Comments Off

19th October, 2011

PhagsPa and Tai Le, sample text?

Looking through my fonts that are clearly tuned for a single specific script, there remain two scripts that niggle me as I don’t have suitable sample text for them: PhagsPa and Tai Le. I’m looking for a short snippet of sample text in those scripts which is suitable to stick into the font drop-down preview. Ideally something fairly equivalent to “Alphabet”, “Script”, “PhagsPa/Tai Le” or “Tibetan/Tai Lü”.

Posted at 11:29 pm | Comments Off

28th September, 2011

libexttextcat: text guessing feature

LibreOffice inherited a text language guesser, based on textcat from wise-guys.nl and extended by Jocelyn Merand to basically handle UTF-8 text. This is the thing that makes the suggestions as to what language your text might really be in when you right-click on some misspelled text and choose Set Language.

We’ve now spun this off as a standalone libexttextcat, fixed up some conversion problems from the original selection of 8-bit encodings, and generated new language fingerprints in other cases. This should give better results for various languages, and allow us to enable checking for some languages which were disabled until now.

The current list of languages it attempts to detect can be seen here

Here’s a plausible process to add your favourite language to it, given git clone git://anongit.freedesktop.org/libreoffice/libexttextcat, bootstrapping from the insanely-translated UDHR and using Abkhaz as an example:


cd libexttextcat/langclass/ShortTexts/
wget http://unicode.org/udhr/d/udhr_abk.txt
#skip english header, name result using BCP-47
tail -n+7 udhr_abk.txt > ab.txt
cd ../LM
../../src/createfp < ../ShortTexts/ab.txt > ab.lm
echo ab.lm ab--utf8 >> ../fpdb.conf

Then update the check target in src/Makefile.am and run make check to confirm that ShortTexts/ab.txt is detected as ab.

I’ll remove the necessity of a configuration file in a later version, and convert the result to a BCP-47 tag. For the moment it remains a drop-in replacement for the original solution, which necessitates retaining the slightly odd language tag syntax.

Posted at 3:10 pm | Comments Off