20th May, 2012
8000 commits
ohloh reckons that this week the count of my commits in LibreOffice hit 8000, accumulated over approximately the last 12 years.
I thought I’d sample each per-thousand rollover to see what they were…
Commit 8000: A minor startup time improvement and code simplification
Commit 7000: fix dnd crash. Generic bug fixing of fdo#39950
Commit 6000: callcatcher: remove unused code. Removing a few hunks of code that get compiled into the product, but that nothing calls. Some of the callcatcher foo which we use to trim the fat off LibreOffice
Commit 5000: Generic bug-fixing from grovelling over abrt traces, rhbz#710004 band-aid for immediate crash in IsAlignPossible.
Commit 4000: Workaround a weird-ass warning. from a minor compiler bug gcc#47679.
Commit 3000: Fix BSD uno bridges. We merged the various uno bridges together for the various unix platforms that use gcc to reduce the burden of maintaining so many. So needed to add the little register return quirk of the BSD platforms.
Commit 2000: Silence the (then) new gcc 4.5 warnings in our code
Commit 1000: Documented FSPA anchor values should override escher attributes when different. Efforts to get object positioning right on .doc import
Commit 1: MSOffice Controls {Im|Ex}port. Apparently my first post-StarOffice commit. Getting those “OCX” controls imported from MSOffice file formats.
Does this mean I’m an awesomely productive coder versus everybody else? Nah, not really. For one thing, we started off with CVS, went through mercurial and ended up with git, and there’s generally a lot of difference in how many commits you generate with commit-unfriendly CVS vs git, which makes you commit gung ho.
And there’s differences in commit style from one person to another too of course. I tend to generate a lot of commits because I like to refactor and code in “see my train of thought so you (ok me when I have to revisit it) can see where I went wrong if I do” steps rather than dump in a single commit that affects a hundred interconnected things. But it’s all the same amount of code at the end of the day.
Another wrinkle is that various development rules ended up hiding the true ownership of a lot of older commits. e.g.
a) Pre day-0-release commits were all flattened of course. I only worked on StarOffice a short while before that event, so that’s a fairly small amount for me. But presumably a truly frightening number of commits for e.g. jp
b) For a while we worked by committing only to cvs branches which release engineering would merge into master, e.g. this commit is an example, which is why the Hamburg release engineers hold unbeatable commit rates
c) And later the burden of committing to OpenOffice.org for non-Sun staff became almost impossible to bear, e.g. provide install sets for Windows and Linux, get a QA volunteer to QA the install set for you and sign off on it. Which was all pretty hard to do given the speed of the one or two Windows buildbots available for the purpose and the limited number of QA people. Much easier to just dump the patch into bugzilla and see if someone inside the bunker could take care of all that for you, e.g. commits like this
Posted at 9:12 pm
7th May, 2012
Extending Clutter-Box2d
The following is a block diagram of ClutterBox2dmm. It’s at a high level of abstraction, so most of the detail is lost. What’s of interest is that Cluttermm is stacked on top of Clutter and that there exists a Clutter-Box2d.
Clutter is the graphics library I use for first year C++. I’ll explain why some day. The Clutter library is written in C. Cluttermm contains the C++ bindings for Clutter. Therefore we can use the awesome accelerated graphics of Clutter in C++. Clutter also has a related library called Clutter-Box2d. The Clutter-Box2d library contains C bindings for the upstream (and awesome) Box2D C++ library. Box2D is used in many places including, I believe, in Angry Birds. Box2D has also been cloned in many languages such as JavaScript. On top of Clutter-Box2d, I maintain Clutter-Box2dmm, which are C++ bindings to Clutter-Box2d (yes, C++ bindings to C bindings for a C++ library….but it does make sense).
I’ve just extended both Clutter-box2d and Clutter-box2dmm to support the IsFixedRotation() functionality in Box2d. This hack isn’t hard. It does require modifying the C library Clutter-Box2d and then modifying the C++ library Clutter-Box2dmm on top of this. The relevant commits are here and here. If you need this feature in Clutter-box2dmm you currently have to compile from source. Get it whilst it’s hot.
Posted at 10:18 pm
13th March, 2012
shiny langtag library
liblangtag looks very nice. I wonder if there’s anything in my abandonware localehelper that might be useful to stuff in there. Maybe some of the locale to langtag mapping stuff.
Posted at 2:15 pm
8th March, 2012
libreoffice help ported to clucene
From the things that make me happy department. Years ago our help documentation source was parsed with a bunch of java tools. At the time gcj was the only possibility for us in RHEL/Fedora and the build time for all localized langpacks that we included was about 26 hours in our build system.
Which was a bit depressing.
So I rewrote it in c++, taking super care to keep the same JavaHelp-derived format and so forth. Which brought build times down to about 10 hours.
Which made me happy.
At some stage though, it was then decided to index our help with lucene, which brought back java as a build-time and run-time dependency for building help and searching it at run-time.
Which made me sad again. openjdk was the default for us at this stage, so it wasn’t as much of a pain, but that’s why you have that perceptible lag when you first search for a term in help.
But now, for LibreOffice 3.6, Gert van Valkenhoef has ported our lucene code to clucene. helpcontent builds faster, and there’s no lag on searching for something in help.
Which made me happy.
Distros that want to use --with-system-clucene will need to build and install clucene’s contribs-lib.
Posted at 1:21 pm
6th March, 2012
cross-compiling LibreOffice for windows (mingw32) under Fedora
Dave Tardon’s new howto on cross-compiling LibreOffice under Fedora to target mingw32: http://dtardon.fedorapeople.org/mingw/
Posted at 12:55 pm
29th February, 2012
syncfonts is handy
When debugging font-related stuff it’s typical that the problem can only be triggered by a specific set of fonts. Here’s a rough-and-ready syncfonts script which, when given the output of fc-list -v, will try to install the fonts that are missing and remove the extraneous ones via yum, which works for the common case.
Posted at 3:49 pm
2nd February, 2012
Clutter-box2dmm bindings updated to 0.12.1
I submitted a patch to GNOME bugzilla last night that updates the clutter-box2d C++ bindings so that they build against the current git master version of clutter-box2d. The patch is a bit of a mess. However, this morning, I think the patch is a little less of a mess than I thought last night. Mainly because I managed to do the following on the train into work:
Simply build one of the examples in the clutter-box2dmm folder.
I now have Fedora 15 & 16 packages for cluttermm, clutter-box2d and will have clutter-box2dmm later today. I’ll push these clutter-box2d* packages upstream to Fedora later, but I won’t hold my breath to see them approved. The approval process is a little unwieldy.
Posted at 10:22 am
21st November, 2011
fakemail is handy
For debugging mail problems, e.g. when debugging some emailmerge stuff in LibreOffice recently, fakemail is really, really handy when you have a bug which requires generating a couple of hundred emails in quick succession to trigger.
Posted at 2:00 pm
13th November, 2011
libexttextcat 3.2.0
Released libexttextcat 3.2.0 (Extended Text Categorization, used to guess the language that input text is written in). It can be found in this download dir. No code changes from 3.1.1, but it adds a large collection of extra language signatures, giving libexttextcat nearly the same language support as LibreOffice, modulo languages that LibreOffice supports which don’t have a convenient UDHR translation to use as a basis to generate a language fingerprint.
Posted at 11:41 pm
1st November, 2011
Pipelines for visual programming languages
As a long-time Unix user I’ve always found pipelines to be very useful. They allow you to build up extremely complex functions by sticking together much less complex parts. On Linux we put together commands and pipe the output into the next command. If you ask the question “How many mp3 files are there in this directory?” you can say “Ah, I know how to list all mp3 files, and I know how to count the number of lines in a list”. Then the answer becomes
$ ls -l *.mp3 | wc --lines
Pipelines can become arbitrarily complex. For example, the pipeline
$ cat /var/log/maillog | grep "status=sent" | grep -v phoric | wc --lines
counts the number of lines in my maillog that contain the text “status=sent” and do not contain the text “phoric” (my hostname). There is usually someone smart enough to write a shorter pipeline than what you’ve written, but that’s not the point. The point is that pipelines allow us to construct complex functionality by chaining together small programs.
If I were an academic (and I am), I’d probably generalise. I’d say something like, “pipelines have a source, they have many filter and translate actions and finally an action”. This, I believe, is a reasonable generalisation. In a pipeline, we often take some data, strip out the bits we don’t need, translate it into another format, maybe filter some more stuff and then finally do something. Such processes can be (relatively) easily turned into a visual programming language. The web application if this, then that is a simple visual programming language. Apple’s Automator is a more general visual programming language for such pipelines on OS X. And, going back to text-based pipelines, Windows PowerShell does away with much of the faffing about with text processing found in a Unix pipeline (by encapsulating the faffing in objects):
Get-ChildItem C:\Scripts | Where-Object {$_.Length -gt 200KB} | Sort-Object Length
The PowerShell example is stolen from technet.microsoft.com.
Let’s say I have an application area in mind. It’s a technical area with highly complex theory, such that we can expect the individual components of our pipeline to contain complex functionality. Let’s take the world of audio and video processing. The words are technical: we can expect to mux (i.e. multiplex) and demux audio. And we can also expect to transcode it. Decoding, say, a flac stream and re-encoding it as a vorbis stream requires complex algorithms. Furthermore, the language is generally well understood by geeks, like you. So, how do I do exactly that? Open a flac file, decode it, re-encode it as a vorbis stream and save it to a file. Fortunately, a pipeline-based toolset for audio (and video) already exists. We’re going to look at GStreamer. It’s effectively a domain-specific programming language that could be easily implemented as a visual programming language (it previously has been, but the visual side wasn’t of use to anyone who could actually understand the domain). We want to look at how GStreamer makes constructing complex pipelines easy.
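The “source, filters and translations, then a final action” shape can be sketched with Python generators. This is only an illustration of the idea, not anyone’s real tool: the function names (source, keep, translate, count) and the sample log lines are all made up for the example, loosely mirroring the maillog pipeline above.

```python
from typing import Callable, Iterable, Iterator


def source(lines: Iterable[str]) -> Iterator[str]:
    """Source stage: emit items one at a time (stands in for `cat`)."""
    for line in lines:
        yield line


def keep(pred: Callable[[str], bool], items: Iterable[str]) -> Iterator[str]:
    """Filter stage: pass on only the items that satisfy pred (like `grep`)."""
    return (item for item in items if pred(item))


def translate(fn: Callable[[str], str], items: Iterable[str]) -> Iterator[str]:
    """Translate stage: convert each item into another form."""
    return (fn(item) for item in items)


def count(items: Iterable[str]) -> int:
    """Final action: consume the stream and count it (like `wc --lines`)."""
    return sum(1 for _ in items)


# Hypothetical sample data standing in for /var/log/maillog.
SAMPLE_LOG = [
    "to=<a@example.org> relay=mx.example.org status=sent",
    "to=<b@phoric> relay=local status=sent",
    "to=<c@example.org> relay=mx.example.org status=deferred",
]


def sent_elsewhere(lines: Iterable[str]) -> int:
    """Count lines containing status=sent that do not mention phoric."""
    stream = source(lines)
    stream = keep(lambda line: "status=sent" in line, stream)
    stream = keep(lambda line: "phoric" not in line, stream)
    stream = translate(str.lower, stream)  # a token translate step
    return count(stream)


if __name__ == "__main__":
    print(sent_elsewhere(SAMPLE_LOG))
```

Each stage only knows about the stream it is handed, which is what makes the stages easy to draw as boxes joined by arrows.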
The following pipeline is the start of an answer to our problem (it intentionally doesn’t work):
$ gst-launch filesrc location=song.flac ! flacdec ! vorbisenc ! filesink location=song.ogg
In the above example we construct the pipeline, where ! is the pipe symbol, and launch it with gst-launch. Each of the pipeline elements, on its own, is simple to understand. A filesrc element reads a file at the specified location. A flacdec element decodes a flac stream. A vorbisenc element encodes something into vorbis and a filesink element stores a stream back into a file. The above pipeline produces the error
WARNING: erroneous pipeline: could not link flacdec0 to vorbisenc0
This is brilliant. A major difference between a GStreamer pipeline and a Unix pipeline is that I get static checking. It can tell me, from the structure of the pipeline, whether the type of data produced by each element can be consumed by the following element. This is brilliant because it saves time.
Imagine that my pipeline was much more complicated. Imagine that the pipeline encoded a long, high-quality, video (taking several hours) and then tried to do something specific with the audio stream. As a Unix pipeline it would look like
$ cat video.raw | encodevid | modifyaudio > video.webm
We would have to run this pipeline before we saw that passing data from encodevid to modifyaudio failed. So having some static checking a-priori (I’m an academic, I’m allowed to use Latin) could save us a lot of time. Static checking, like in a programming language, also ensures that we are passing the right data into an element that handles that data.
Ok, so what’s a working pipeline for the above question?
$ gst-launch filesrc location=song.flac ! flacdec ! audioconvert ! audioresample ! vorbisenc ! oggmux ! filesink location=song.ogg
Note: there’s a much shorter pipeline, but this one makes most of the stages explicit. I was missing the audioconvert, the audioresample and the oggmux steps. Once again, this is a very complex domain. Each of the elements in this pipeline is simpler to understand than the entire pipeline.
Let’s do something magic. Take the original flac file, convert it to 8-bit audio and re-encode it as a vorbis stream in an ogg container. Why? Well this is the point of pipelines. Someone might want to do this sometime. The pipeline architecture allows you to combine elements in ways that other people may not have considered, or simply don’t need.
$ gst-launch filesrc location=song.flac ! flacdec ! audioconvert ! audioresample ! capsfilter caps=audio/x-raw-int,channels=2,width=8,depth=8 ! audioconvert ! vorbisenc ! oggmux ! filesink location=song.ogg
I think that GStreamer is a well designed, well implemented pipeline architecture for a complex domain. So how does it achieve what it does?
Each of the elements in GStreamer advertises its capabilities. Each element advertises the capabilities of what it accepts (eg: raw audio with sample freq > 8kHz and < 44kHz) and the capabilities of what it produces (eg: audio bit stream conforming to the vorbis spec). You can then do static checking on pipelines by ensuring the capabilities of what is produced by one element are a subset of those consumed by the following element. It’s very neat.
How might you retrofit this into an existing set of complex Unix applications? You could add a --caps flag (or -caps if you insist on BSD style) to each of your applications. If you have no source, wrap them in a shell script that takes --caps. Make the caps option output the input and output capabilities of your pipeline elements. Bonus hipster points if this is outputted in JSON. Then write a my_static_checker that takes your pipeline as a string, checks it statically a-la-GStreamer (I’m allowed to abuse French, I’m an academic) and then executes the pipeline.
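The subset rule is small enough to sketch in Python. To be clear about what is assumed here: this is not GStreamer’s API or its real negotiation logic, and the capability labels ("bytes", "raw-int", "raw-float" and so on) are invented, heavily simplified stand-ins for real caps. It only shows how a checker can reject the first flac-to-vorbis pipeline and accept the longer one without decoding a single sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A pipeline element plus the formats it advertises."""
    name: str
    accepts: frozenset   # formats the element can consume
    produces: frozenset  # formats the element may emit


def caps(*labels):
    """Build a capability set from plain string labels."""
    return frozenset(labels)


def check(pipeline):
    """Return a list of link errors; an empty list means everything links."""
    errors = []
    for up, down in zip(pipeline, pipeline[1:]):
        # The rule from the text: whatever the upstream element produces
        # must be a subset of what the downstream element accepts.
        if not up.produces <= down.accepts:
            errors.append(f"could not link {up.name} to {down.name}")
    return errors


# Invented, simplified capability labels (NOT real GStreamer caps).
ANY = caps("bytes", "raw-int", "raw-float", "vorbis", "ogg")
filesrc = Element("filesrc", caps(), caps("bytes"))
flacdec = Element("flacdec", caps("bytes"), caps("raw-int"))
audioconvert = Element("audioconvert", caps("raw-int", "raw-float"),
                       caps("raw-float"))
vorbisenc = Element("vorbisenc", caps("raw-float"), caps("vorbis"))
oggmux = Element("oggmux", caps("vorbis"), caps("ogg"))
filesink = Element("filesink", ANY, caps())

broken = [filesrc, flacdec, vorbisenc, filesink]
fixed = [filesrc, flacdec, audioconvert, vorbisenc, oggmux, filesink]

if __name__ == "__main__":
    print(check(broken))  # ['could not link flacdec to vorbisenc']
    print(check(fixed))   # []
```

The check only walks adjacent pairs, so its cost depends on the length of the pipeline and not on the size of the media flowing through it.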
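A rough sketch of what such a my_static_checker might look like in Python follows. Everything here is an assumption made for illustration: the --caps flag does not exist on any real tool, the JSON shape ({"accepts": [...], "produces": [...]}) is invented, and the pipeline splitting is naive (a | inside quotes would confuse it).

```python
import json
import shlex
import subprocess


def caps_from_tool(argv):
    """Ask a tool for its capabilities via the hypothetical --caps flag.

    Assumes the tool prints JSON such as
    {"accepts": ["video/raw"], "produces": ["video/webm"]}.
    """
    result = subprocess.run([argv[0], "--caps"], capture_output=True,
                            text=True, check=True)
    return json.loads(result.stdout)


def split_pipeline(pipeline):
    """Split 'a -x | b' into [['a', '-x'], ['b']] (naive about quoted '|')."""
    return [shlex.split(stage) for stage in pipeline.split("|")]


def static_check(pipeline, get_caps=caps_from_tool):
    """Return link errors for the pipeline without doing any real work."""
    stages = split_pipeline(pipeline)
    pairs = [(argv, get_caps(argv)) for argv in stages]
    errors = []
    for (up, up_caps), (down, down_caps) in zip(pairs, pairs[1:]):
        # Same subset rule as GStreamer-style checking described above.
        if not set(up_caps["produces"]) <= set(down_caps["accepts"]):
            errors.append(f"could not link {up[0]} to {down[0]}")
    return errors


def run_checked(pipeline, get_caps=caps_from_tool):
    """Hand the pipeline to the shell only if the static check passes."""
    errors = static_check(pipeline, get_caps)
    if errors:
        raise ValueError("; ".join(errors))
    return subprocess.run(pipeline, shell=True, check=True)
```

The get_caps parameter is there so the checker can be tried out with a plain lookup table before any application actually grows a --caps option. With a table saying encodevid produces "video/webm" and modifyaudio accepts only "audio/raw", static_check("cat video.raw | encodevid | modifyaudio", lookup) reports the bad link up front, rather than after hours of encoding.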
Why! Now we’re in a position to write a very domain-specific visual programming language. This could be useful in a lot of cases. Particularly in the case of at least one person I’ll be emailing this to.
Posted at 11:32 pm