Recursive types in QuickCheck

May 18th, 2011 by balor

I want to build arbitrary sized trees of Compound

data Compound = And Compound Compound
  | Or Compound Compound
  | Product Compound Compound
  | Not Compound
  | Leaf { diagram :: Unitary} deriving (Show, Eq)

However, I want to limit to a max depth, and I want to have each branch to be of an arbitrary depth.

mkBinary :: (Compound -> Compound -
> Compound) -> Int -> Gen Compound
mkBinary f depth =
  liftM2 f (mkCompound depth') (mkCompound depth')
  where
    depth' = depth -1

mkNot :: Int -> Gen Compound
mkNot depth =
  liftM Not (mkCompound (depth-1))

mkLeaf :: Gen Compound
mkLeaf = liftM Leaf arbitrary

mkCompound :: Int -> Gen Compound
mkCompound 0 = mkLeaf
mkCompound depth = oneof [mkLeaf, mkBinary And depth, mkBinary Or depth, mkBinary Product depth, mkNot depth]

instance Arbitrary Compound where
  arbitrary =
    do depth <- choose (0, 4)
       d <- mkCompound depth
       return d

QuickCheck can do it. It’s nice. Use it.

Creating Arbitrary Foo for QuickCheck

May 17th, 2011 by balor

I always write semi-automated tests, whatever language I’m using. In Haskell my preferred framework is QuickCheck, which goes beyond what an XUnit framework is capable of. I come to QuickCheck as a convert from the old religion of Object Oriented programming. However much I’ve adopted functional practices I find myself, in the darkest of times, falling back on idioms from a past life. Recently I’ve been testing code that looks like the following:

data Squad = Squad { players :: Set Char,
team :: Set Char }

So what I’ve got is some type which is composed of a set and some other set which is a subset of the original. I hope the Squad metaphor is reasonable. In many sports you’ll have a panel of players, called the squad, and then for any given game you’ll have a team which is a subset of the players in your squad. To simplify this problem, each player is identified by a single character rather than a name. Now, we don’t have a dependent type system here. Just plain old Haskell. So the constraint that “team is a subset of players” is not enforced by the type system.

Now I want to tell Haskell to create Arbitrary instances of Squad.

randSubset :: Ord a => Set a -> Gen (Set a)
randSubset xs = do {
mask <- vectorOf (Set.size xs) arbitrary;
return (Set.fromList [ e | (e,True) <- zip (Set.toList xs) mask ]) }

instance Arbitrary Squad where
arbitrary = do {
players <- listOf1 (arbitrary :: Gen Char);
team <- randSubset (Set.fromList players);
return (Squad (Set.fromList players) team))}

The smarts here is in randSubset, but listOf1 ensures that there’s at least one player in our squad. I was struggling when playing around with a random number generator, to try and pull out random elements from a Set. Such thinking is a false idol. With some guidance from the only person on the planet to be 90% Real Jedi I realised you can just use the common generator combinators as provided by QuickCheck.

So the fragment

do {
mask <- vectorOf (Set.size xs) arbitrary;
return (Set.fromList [ e | (e,True) <- zip (Set.toList xs) mask ])}

asks for an arbitrary list of boolean values where the list is the same length as the size of the input Set. Then where the ith element of the list is true, the ith element of the set is considered to be in the subset, or omitted otherwise. It’s a nice idiom, useful anywhere you’re playing with sets that aren’t quite arbitrary themselves, but are constrained to be a subset of another.

An issue with QuickCheck 2

May 7th, 2011 by balor

I’ve been interested in functional programming for a while. I’ve hacked up a few things. Mira and Conferenceasaurus mainly. I’m not claiming that the code is great, I’d regard myself still as a n00b. I also have an interest in software testing, to this end the QuickCheck is both interesting and awesome. However, I’ve got one significant issue with my usage of it.

The types I’m playing with should probably be implemented using a dependent type system. For example, I’ve got a lot of

data Foo = Foo {bar::Set a, quux::Set a}

where quux should be a subset of bar. I’m not using a dependent type system, just regular Haskell, so the dependence is not enforced. However, I do want to generate instances of Arbitrary Foo where quux is a random subset of bar. Randomness is where my issues lie.

I could generate a random subset of a set

import Data.Set
import Data.Maybe
import Test.QuickCheck

randSubset :: Set a -> Gen (Set a)
randSubset xs = do {
mask <- vectorOf (size xs) (arbitrary::Gen Bool);
return (fromList (catMaybes [if b then (Just e) else Nothing | e <- l, b <- mask]))
}
where
l = toList xs

However, you’ll notice that the type signature is randSubset :: Set a -> Gen (Set a) and to create quux as a random subset of bar in an Arbitrary Foo I’d need this to return a Set a. I could unGen it, however this would require picking a seed which produces the same set of pseudo-random numbers each time, or generating a random seed, in which case I’d have to return an IO (Set a).

So, question time. Is it possible to generate arbitrary instances of Foo such that the property “quux is a subset of bar” is enforced in all arbitrary instances?

update:
As arbitrary :: Gen a then arbitrary = do x <- arbitrary; y <- randSubset x; return (Foo x y) is the solution.

A tough Q1 for open-source

March 25th, 2011 by balor

You can feel the recession biting. It seems to have encouraged several companies to evaluate their open-source strategy. Actually, maybe that reasoning is a little too shallow. There are some deeper reasons we’ve seen Nokia, Google and RedHat change some of their practices. But let’s be clear from the start; some companies, like Nokia, have changed their whole strategy. Others, like RedHat, have made little corrections to their general practice. Nokia made their strategic decision based on failure and RedHat made their practice change due to success.

The Nokia thing. They’ve executed an entire U-turn on their open-source strategy. At a high-level their previous strategy seemed to be (as an external observer) to use their excellent QT toolkit to develop applications for their newly open-sourced Symbian platform for low to mid-range mobiles. Furthermore, they’d use the same toolkit to develop for their next-gen Meego platform. However, for reasons to do with both business and technology, they’ve decided that Windows Phone 7 is to be their next-gen platform of choice. Thankfully, the open-source nature of Meego means that we don’t loose the innovation that has been already contributed to the platform.

The reasons for Nokia’s change have been extensively covered in the computing press. My take is that the major consequence of this is an observation that all of our next-gen mobile platform providers are non-European (yes I’m discounting Symbian). So Android is from Google, WP7 from Microsoft, WebOS from HP and Blackberry from RIM. The first three are US companies, the last is Canadian. This, in my opinion, is a bad thing for Europe. Much of the Nokia R&D on Meego seemed to be outsourced to small to medium sized European companies. I know of at least three British companies (I have resided in the UK for a few years now) that did Meego development which was paid for by Nokia.

On Google’s Honeycomb decision. I’ve not decided yet if this is in response to

      The Chinese strawman argument, or
      Some deep-integration argument.

Maybe it’s both or neither. However, the Chinese strawman is the argument that if Honeycomb is open-source then many companies come along and produce poor-quality, but cheap, devices running the platform. Therefore, devaluing the perception of the platform. I believe that this argument is a poor one, hence I’ve called it a strawman. I don’t see how Android resellers can charge the 30% markup that one of their competitors can charge. Maybe this allows the small number of honeycomb licensees to bump up their margins slightly?

The deep integration argument is that honeycomb is bound to a specific platform like NVidia’s Tegra. Or, more interestingly, Google don’t think that you can do really high-quality UX design in an open-source fashion. There’s been a lot of talk about this lately in the open source world. About the open-source world being hostile to designers. One aspect of this has been the Canonical/Gnome argument. Where, it appears, that Canonical decided that the best way to design their next-gen desktop was to bring the design team in-house (it’s more grey than in-house -v- not in-house) whereas the Gnome team are (almost) entirely open and have even developed the fantabulous sparkle share tool for sharing designs between Gnome designers. What’s clear is that good UX design is hard. What’s unclear is whether being open-source makes good UX more difficult. I think that ThinkUp, Firefox and Gnome 3 make good design stories. For the record, I think Caonical’s Unity is interesting but doesn’t add anything to the point I’m trying to make.

I don’t know the ThinkUp people. But they seem to have a great attitude to encouraging contributions from people who might have traditionally found open-source hostile. From an external perspective, they seem to do great design in a small team and apply it to a hosted web application. Web designers are awesome. They’ve, arguably, made more design progress in the past ten years than desktop designers have made in the past thirty. Those of us who build fat (desktop) apps can then borrow their design ideas.

The Firefox devs are entirely different. It’s a massive project. They have different design constraints than ThinkUp, particularly as they have such a massive install base on so many different platforms. Furthermore, Mozilla has a lot of cash to spend on both user testing and a team of UX desigers. I like Firefox 4. I think you can see the fruits of incremental, good design.

The design community I’m most familiar with are the Gnome 3 designers. What they’ve done in the past six months is amazing. Furthermore, unlike the small integrated team at ThinkUp or the large on-site team at Mozilla, the Gnome design team is distributed (in terms of geography and companies) but well-integrated.


This picture speaks more than a paragraph of text. It shows a clean and consistent desktop. Yes there are a few unpolished edges, but the software hasn’t been released yet. What’s interesting is to observe the “traditional code geeks” talking to the UX designers. Changes to the UI or the UX are, more often than not, bounced off one of the designers. And, more interesting, is that the ideas that the UX designers communicate are being internalised by the developers. Ok, so very few of these developers will ever be kick-ass designers. However, by internalising the design ideas you can see a commitment to bringing fresh UX ideas on board and the value that the developers place on the design team. They’re highly valued and stunningly smart people.

So as usual I started off talking about how this Q1 has been tough for open-source. However my internal optimist has trumped my internal cynic. Nokia and Google may make decisions about their open-source policy, but they’re not the only places that are (or were) doing excellent open-source design and engineering. The loss of Nokia from our ecosystem will be keenly felt. But Intel are pressing forward with Meego and the design ideas that come from Meego are finding their way into Gnome 3 (and vice-versa).

Cubic spline interpolation

January 30th, 2011 by balor

The following may make more sense when the latex-mathjax plugin is enabled. You’ll see the equations then, not the LaTeX source.

I’ve got a set of $(x,y)$ points. What I want to do is draw a reasonably smooth curve through all of them, rather than a set of straight lines connecting them. To do so, I need some higher order function such as a cubic. In particular the cubic Bezier function is a well known graphics primitive. Therefore, given a set of points and the definition of a cubic Bezier, I need to calculate the control points for each curve.

A cubic Bezier function interpolates between $K_0$ and $K_1$ using control points $C_{0,1}$ and $C_{0,2}$ in the following manner
$$
B(t)=(1-t)^3K_0+3t(1-t)^2C_{0,1}+3t^2(1-t)C_{0,2}+t^3K_1.
$$
We may extend the idea to a spline. The knots of the spline are $\langle K_0, K_1,\ldots,K_n\rangle$. This leads to $n-1$ cubic functions $B_0, B_1, \ldots, B_{n-1}$. Where $B_i$ interpolates between $K_i$ and $K_{i+1}$ with control points $C_{i,1}$ and $C_{i,2}$.

We want $C^2$ continuity so that our curve is nice and smooth. Therefore, for each $i$ we require
\begin{eqnarray*}
B^\prime_i(1)&=&B^\prime_{i+1}(0)\textrm{ and}\\
B^{\prime\prime}_i(1)&=&B^{\prime\prime}_{i+1}(0).
\end{eqnarray*}
Furthermore,
$$
B_i^\prime(t)=3K_{i+1}t^2-3C_{i,2}t^2+6C_{i,2}\left(1-t\right)t-6C_{i,1}\left(1-t\right)t+3C_{i,1}\left(1-t\right)^2-3K_i\left(1-t\right)^2
$$
and
$$
B_i^{\prime\prime}(t)=6 K_i (1 – t) + 6 C_{i,1} t – 12 C_{i,1} (1 – t) – 12 C_{i,2} t + 6 C_{i,2} (1 – t) + 6 K_{i+1} t.
$$

Evaluating $B^\prime_i(1)$ gives
\begin{eqnarray*}
B_i^\prime(1)&=&3K_{i+1}1^2-3C_{i,2}1^2+6C_{i,2}\left(1-1\right)1-6C_{i,1}\left(1-1\right)1+3C_{i,1}\left(1-1\right)^2-3K_i\left(1-1\right)^2\\
&=&3K_{i+1}-3C_{i,2}+6C_{i,2}(0)1-6C_{i,1}(0)1+3C_{i,1}(0)^2-3K_i(0)^2\\
&=&3K_{i+1}-3C_{i,2}.
\end{eqnarray*}
and evaluating $B_{i+1}^\prime(0)$ gives
\begin{eqnarray*}
B_{i+1}^\prime(0)&=&3K_{i+2}0^2-3C_{i+1,2}0^2+6C_{i+1,2}\left(1-0\right)0\\
&=&-6C_{i+1,1}\left(1-0\right)0+3C_{i+1,1}\left(1-0\right)^2-3K_{i+1}\left(1-0\right)^2\\
&=&3C_{i+1,1}-3K_{i+1}
\end{eqnarray*}
Therefore, for each $i$
\begin{eqnarray*}
B_i^\prime(1)&=&B_{i+1}^\prime(0)\\
3K_{i+1}-3C_{i,2}&=&3C_{i+1,1}-3K_{i+1}\\
6K_{i+1}&=&3C_{i,2}+3C_{i+1,1}\\
K_{i+1}&=&0.5C_{i,2}+0.5C_{i+1,1}.
\end{eqnarray*}

Evaluating $B_i^{\prime\prime}(1)$ gives
\begin{eqnarray*}
B_i^{\prime\prime}(1)&=&6 K_i (1 – t) + 6 C_{i,1} t – 12 C_{i,1} (1 – t) – 12 C_{i,2} t + 6 C_{i,2} (1 – t) + 6 K_{i+1} t\\
&=&6K_i(0)+6C_{i,1}-12C_{i,1}(0)-12C_{i,2}+6C_{i,2}(0)+6K_{i+1}\\
&=&6C_{i,1}-12C_{i,2}+6K_{i+1}\\
\end{eqnarray*}
and evaluating $B_{i+1}^{\prime\prime}(0)$ gives
\begin{eqnarray*}
B_{i+1}^{\prime\prime}(1)&=&6 K_{i+1} (1 – t) + 6 C_{i+1,1} t – 12 C_{i+1,1} (1 – t) – 12 C_{i+1,2} t + 6 C_{i+1,2} (1 – t) + 6 K_{i+2} t\\
&=&6 K_{i+1} (1 – 0) + 6 C_{i+1,1} 0 – 12 C_{i+1,1} (1 – 0) – 12 C_{i+1,2} 0 + 6 C_{i+1,2} (1 – 0) + 6 K_{i+2} 0\\
&=&6 K_{i+1} – 12 C_{i+1,1} + 6 C_{i+1,2}.
\end{eqnarray*}
Therefore, for each $i$
\begin{eqnarray*}
B_i^{\prime\prime}(1)&=&B_{i+1}^{\prime\prime}(0)\\
6C_{i,1}-12C_{i,2}+6K_{i+1}&=&6 K_{i+1} – 12 C_{i+1,1} + 6 C_{i+1,2}\\
0&=&-6C_{i,1}+12C_{i,2}-12 C_{i+1,1} + 6 C_{i+1,2}\\
0&=&-C_{i,1}+2C_{i,2}-2 C_{i+1,1} + C_{i+1,2}\\
\end{eqnarray*}

We now have $(2\times n-1)-2$ equations and $(2\times n-1)$ unknown control points. We construct the linear system
$$
\left(
\begin{array}{ccccccc}
0&0.5&0.5&0&0&0&\ldots\\
-1&2&-2&1&0&0&\ldots\\
0&0&0&0.5&0.5&0&\ldots\\
0&0&-1&2&-2&1&\ldots\\
%0&0&0&0.5&0.5&0&\ldots\\
%0&0&1&-2&2&-1&\ldots\\
\vdots&&&&&&
\end{array}
\right)
\times
\left(
\begin{array}{c}
C_{0, 1}\\
C_{0, 2}\\
C_{1, 1}\\
C_{1, 2}\\
\ldots
\end{array}
\right)
=
\left(
\begin{array}{c}
K_1\\
0\\
K_2\\
0\\
\ldots
\end{array}
\right).
$$

We also have the natural boundary conditions that $B^{\prime\prime}_0(0)=0$ and $B_{n-1}^{\prime\prime}(1)=0$. The linear system then becomes
$$
\left(
\begin{array}{ccccccccc}
2&-1&0&0&0&0&\ldots&0&0\\
0&0.5&0.5&0&0&0&\ldots&0&0\\
-1&2&-2&1&0&0&\ldots&0&0\\
0&0&0&0.5&0.5&0&\ldots&0&0\\
0&0&-1&2&-2&1&\ldots&0&0\\
%0&0&0&0.5&0.5&0&\ldots&0&0\\
%0&0&1&-2&2&-1&\ldots&0&0\\
\vdots&&&&&&&&\\
0&0&0&0&0&0&\ldots&-1&2
\end{array}
\right)
\times
\left(
\begin{array}{c}
C_{0, 1}\\
C_{0, 2}\\
C_{1, 1}\\
C_{1, 2}\\
\ldots
\end{array}
\right)
=
\left(
\begin{array}{c}
0\\
K_1\\
0\\
K_2\\
0\\
\ldots\\
0
\end{array}
\right).
$$

Holidays at our place

December 26th, 2010 by balor

The monsters


had fun (blame Uncle Declan for the Santa costume). But it was tiring


I was pretending to sleep. He, most definitely, wasn’t pretending.

Introducing Caitlín

December 11th, 2010 by balor

No pictures yet, but she was born on the 11th December weighing in at 3.03kg. Mother and child doing well.

Update:

A picture

A video

Mother and child are now home and both are eating well.

On the importance of backups

December 2nd, 2010 by balor

I’m a bit paranoid about backing up. Particularly when it comes to the crown jewels i.e. my long-term research projects. Or, at least, I *thought* I was paranoid about backing up. It turns out I’m not. Consider the two following maxims of backups

  • If you need your backups you’re already in a compromising situation.
  • A backup is not a backup until you’ve seen a full restore.

So my backup strategy means taking the content from my laptop and pushing it onto my home NAS box. I normally use a nice GUI program such as Deja Dup which uses an implementation of the venerable rsync underneath. Then for the long-term research projects I use the git version control system (I also use darcs and have largely given up on bzr, due to SCM proliferation). I do a “git push” of all my projects from my laptop, to my home NAS and to a server in Ireland. This is a paranoid strategy….or is it.

This last week the disk in my home NAS box died. I’ve since replaced it, but have not had the time (or an external 3.5″ SATA caddy) to recover the data. The disk is foobar’ed, so I’m expecting only a partial recovery. That’s fine. I have all my data on my laptop. However, bang! Maybe it was the cold, or just plain entropy, but my laptop has developed a hardware fault. Thankfully, it’s not a disk fault. If it was I’d have lost my entire music collection, but not much else.

You see, when my NAS drive failed my paranoia kicked in. I asked my mate Neal to give me ssh access to his home NAS for my most important long-term research projects. I pushed up to his system, and the system in Ireland. But think about it…if I had a disk fault on my laptop, because I was already in a compromising situation, I could have lost a lot of important data. Already two independent random events were conspiring against me, why not a third? There was a window of a day between which I had no NAS storage and when my laptop failed. If I had replaced that disk a day later, I could have lost some data. Less important data, but data costs money.

All is well. I have all my data. Even my music collection (which isn’t too important with respect to my 8 years of research work). I’m also confident in my backups as every time I use git to back up a research project I can verify, using “git log”, that all my data *is* actually backed up. There have been many occasions on which friends or colleagues have told me that they assumed their backups were working. They have lost data due to not doing a systematic full restore. They’ve never verified their backup worked by *actually* trying to recover some data. Hopefully that’ll never happen to me, but you can’t be too complacent…or too paranoid.

Smoothing paths in Inkscape

September 19th, 2010 by balor

Inkscape is an extremely powerful tool. I’ve played with the lib2geom code which is the underlying representation of paths and shapes used in Inkscape. I think a small fraction of the power of this API is exposed through the graphical interface. The following is a quick guide to smoothing a freehand path in Inkscape. The correct term for this in the tool is “simplification”. This is inherited from the lib2geom API.

First, choose the draw freehand tool.

Then, secondly draw some kind of path.
A drawn curve

Finally, with the contour highlighted press “Ctrl+L” i.e. hold down the Ctrl key and the L key. Alternatively, choose “Path” from the menus and “Simplify”. You may need to simplify the path a lot to see a visual difference. Here’s the above path after I hit “Ctrl+L” about ten times
A smoothed path

And we’re done :) .

Diagrams 2010, my retrospective

August 14th, 2010 by balor

I flew, with colleagues, to Portland in Oregon for the Diagrams 2010 conference. We arrived on a Saturday and departed the following Thursday. This makes it the shortest period that I’ve ever taken a long-haul flight for. So today I’m feeling rough: approximately one part dead, four parts like I have some painful communicable disease and six parts hungover. I’m counting on a smoked rasher sandwich to right all the ills in the world.

As I previously posted, Portland is fantastic. We partook in many of the local cultural delights such as visiting Powell’s book store, sampling ale at the Deschutes micro-brewery and visiting the Bite of Oregon festival. I spent $10 at Powells and picked up a copy of Flatland and a copy of Wittgenstein’s Brown and Blue book. To my lifelong shame, I made a beeline for the maths section, but totally forgot to visit the computing section! The conference banquet was in the local art museum, which was excellent. They have a collection of American abstract art which Paolo Bottoni kindly explained to me. They also have some Rodin sculptures and a sculpture by Picasso, a Monet water-lilly, an early Van Gogh and several other impressionist works by artists such as Cezzane. All very beautiful.

The conference itself was an excellent event. I thoroughly enjoyed Randall Davis’ keynote. He demonstrated the breadth and depth of research that you might expect from an MIT lab. His assertion that we should be having a conversation with our computers has already influenced how I think some of our sketch recognition work should proceed.

There were lots of excellent papers. Two in particular will, I think, influence future directions of my own research. The first is by Koji Mineshima, Mitsuhiro Okada and Ryo Takemura titled “Two Types of Diagrammatic Inference System: Natural Deduction Style and Resolution Style”. In this paper they do lots of interesting things, including quantifying free rides by comparison of Euler diagrams and certain sentential forms used in reasoning. The second is by Mathias Frisch, Jens Heydekorn and Raimund Dachselt titled “Diagram Editing on Interactive Displays Using Multi-Touch and Pen Gestures”. They demonstrated some really well done software for drawing certain types of diagrams. I’m particularly interested in taking the approach described in Davis’ keynote, the design and technology described by Frisch et al. and the constraint based layout described by Tim Dwyer in “Hi-tree layout using quadratic programming ” to inform our sketch recognition stuff.

There were some other great papers. David Landy spoke about his “Toward a physics of equations” and demo’ed his very, very cool iPhone/iPad application that helped him capture his findings. The papers that won the best paper prizes were, unsurprisingly, interesting and well presented.

All in all, I had a great time in Portland. I enjoyed the beer and the food was better than food I’ve eaten anywhere I’ve visited in the US. And thankfully, the rasher sandwich seems to be doing its thing. I might even get some work done later :)