<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Caolan McNamara &#187; General</title>
	<atom:link href="http://blogs.linux.ie/caolan/category/General/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.linux.ie/caolan</link>
	<description>babblings!</description>
	<lastBuildDate>Mon, 21 Nov 2011 13:00:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>fakemail is handy</title>
		<link>http://blogs.linux.ie/caolan/2011/11/21/fakemail-is-handy/</link>
		<comments>http://blogs.linux.ie/caolan/2011/11/21/fakemail-is-handy/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 13:00:05 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=525</guid>
		<description><![CDATA[For debugging mail problems, e.g. when debugging some email merge stuff in LibreOffice recently, fakemail is really handy when you have a bug which requires generating a couple of hundred emails in quick succession to trigger.]]></description>
			<content:encoded><![CDATA[<p>For debugging mail problems, e.g. when debugging some email merge stuff in LibreOffice recently, <a href="http://www.lastcraft.com/fakemail.php">fakemail</a> is really handy when you have a bug which requires generating a couple of hundred emails in quick succession to trigger.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/11/21/fakemail-is-handy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>libexttextcat 3.2.0</title>
		<link>http://blogs.linux.ie/caolan/2011/11/13/libexttextcat-3-2-0/</link>
		<comments>http://blogs.linux.ie/caolan/2011/11/13/libexttextcat-3-2-0/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 22:41:59 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=521</guid>
		<description><![CDATA[Released libexttextcat 3.2.0 (Extended Text Categorization, used to guess the language that input text is written in). It can be found in this download dir. No code changes from 3.1.1, but it adds a large collection of extra language signatures, bringing libexttextcat close to the same language support as LibreOffice, modulo languages that LibreOffice [...]]]></description>
			<content:encoded><![CDATA[<p>Released libexttextcat 3.2.0 (Extended Text Categorization, used to guess the language that input text is written in). It can be found in this <a href="http://dev-www.libreoffice.org/src/libexttextcat/">download dir</a>. No code changes from 3.1.1, but it adds a large collection of extra language signatures, bringing libexttextcat close to the same language support as LibreOffice, modulo languages that LibreOffice supports which don&#8217;t have a convenient UDHR translation to use as a basis to generate a language fingerprint.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/11/13/libexttextcat-3-2-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CTL/CJK format character previews</title>
		<link>http://blogs.linux.ie/caolan/2011/10/21/ctlctl-format-character-previews/</link>
		<comments>http://blogs.linux.ie/caolan/2011/10/21/ctlctl-format-character-previews/#comments</comments>
		<pubDate>Fri, 21 Oct 2011 10:59:53 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=510</guid>
		<description><![CDATA[As Lior Kaplan demonstrated at LibreOffice 2011 Paris, our format character preview really sucks for CTL and CJK users. If no CTL/CJK text is selected then no CTL sample text is shown, and the CJK sample text is from the fontname itself. Many font names are just Latin text, so give no indication what they [...]]]></description>
			<content:encoded><![CDATA[<p>As Lior Kaplan demonstrated at LibreOffice 2011 Paris, our format character preview really sucks for CTL and CJK users. If no CTL/CJK text is selected then no CTL sample text is shown, and the CJK sample text is from the fontname itself. Many font names are just Latin text, so give no indication what they look like in the actual script/language that is being written in.</p>
<p>e.g. the old dialog for CTL will only preview some Western text if no text is selected, with no attempt to show any sample CTL text, or even the CTL fontname. For CJK it will additionally show the fontname of the CJK font in the preview, which isn&#8217;t helpful if the CJK fontname contains no CJK glyphs.</p>
<p><a href="http://blogs.linux.ie/caolan/files/2011/10/ctl-preview-before.png"><img src="http://blogs.linux.ie/caolan/files/2011/10/ctl-preview-before.png" alt="" width="731" height="541" class="aligncenter size-full wp-image-511" /></a></p>
<p>Simply adding the CTL fontname wouldn&#8217;t help much, seeing as the fontname is David CLM. So, currently, my first stab at &#8220;doing the right thing&#8221; by reusing the preview text used in the font drop-down gives me&#8230;</p>
<p><a href="http://blogs.linux.ie/caolan/files/2011/10/ctl-preview-after.png"><img src="http://blogs.linux.ie/caolan/files/2011/10/ctl-preview-after.png" alt="" width="731" height="541" class="aligncenter size-full wp-image-512" /></a></p>
<p>Code for all this is mostly in svtools/source/misc/sampletext.cxx, where there is now a hugely over-engineered set of heuristics to guess the script a font is best tuned for. It also has various functions to generate suitable text depending on whether all we have is the font, the font plus the language, or just the language, and on whether we want a short identifier to classify what script a font might be good at rendering or a longer sequence of sample text for a font preview.</p>
<p>Probably best to drop rendering the fontname in the Western case for the text preview and use some sample text there too, at least for the mixed Western+CTL+CJK case, as it&#8217;s confusing to have a font name rendered and some sample text in another font.</p>
<p>&#8212;</p>
<p>After the initial posting, there were some comments about the hideous rendering of the Hebrew text, which appears to be an artefact of using David CLM. Here&#8217;s what it looks like with David, i.e. it&#8217;s the rendering using that font that misplaces the Nikud, not me. Whether this is an interesting bug in our renderer, or maybe glyph fallback, or the font itself is probably worthy of investigation.<br />
<a href="http://blogs.linux.ie/caolan/files/2011/10/ctl-preview-after-david.png"><img src="http://blogs.linux.ie/caolan/files/2011/10/ctl-preview-after-david.png" alt="" width="731" height="541" class="aligncenter size-full wp-image-518" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/10/21/ctlctl-format-character-previews/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>PhagsPa and Tai Le, sample text ?</title>
		<link>http://blogs.linux.ie/caolan/2011/10/19/phagspa-and-tai-le-sample-text/</link>
		<comments>http://blogs.linux.ie/caolan/2011/10/19/phagspa-and-tai-le-sample-text/#comments</comments>
		<pubDate>Wed, 19 Oct 2011 22:29:50 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=508</guid>
		<description><![CDATA[Looking through my fonts that are clearly tuned for a single specific script, there remain two scripts that niggle me as I don&#8217;t have suitable sample text for them, i.e. PhagsPa and Tai Le. I&#8217;m looking for a short snippet of sample text in those scripts which is suitable to stick into the font drop [...]]]></description>
			<content:encoded><![CDATA[<p>Looking through my fonts that are clearly tuned for a single specific script, there remain two scripts that niggle me as I don&#8217;t have suitable sample text for them, i.e. PhagsPa and Tai Le. I&#8217;m looking for a short snippet of sample text in those scripts which is suitable to stick into the font drop-down preview. Ideally something fairly equivalent to &#8220;Alphabet&#8221;, &#8220;Script&#8221;, &#8220;PhagsPa/Tai Le&#8221; or &#8220;Tibetan/Tai Lü&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/10/19/phagspa-and-tai-le-sample-text/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>libexttextcat: text guessing feature</title>
		<link>http://blogs.linux.ie/caolan/2011/09/28/libexttextcat-text-guessing-feature/</link>
		<comments>http://blogs.linux.ie/caolan/2011/09/28/libexttextcat-text-guessing-feature/#comments</comments>
		<pubDate>Wed, 28 Sep 2011 14:10:11 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=493</guid>
		<description><![CDATA[LibreOffice inherited a text language guesser, based on textcat from wise-guys.nl and extended by Jocelyn Merand to basically handle UTF-8 text. This is the thing that makes the suggestions as to what language your text might really be in when you right click on some misspelled text and choose set language. We&#8217;ve now spun this [...]]]></description>
			<content:encoded><![CDATA[<p>LibreOffice inherited a text language guesser, based on textcat from wise-guys.nl and extended by Jocelyn Merand to basically handle UTF-8 text. This is the thing that makes the suggestions as to what language your text might really be in when you right click on some misspelled text and choose set language.</p>
<p>We&#8217;ve now spun this off as a standalone <a href="http://cgit.freedesktop.org/libreoffice/libexttextcat/">libexttextcat</a>, fixed up some conversion problems from the original selection of 8bit encodings, and generated new language fingerprints in other cases. This should give better results for various languages, and allow us to enable checking for some languages which were disabled until now.</p>
<p>The current list of languages it attempts to detect can <a href="http://cgit.freedesktop.org/libreoffice/libexttextcat/tree/langclass/LM">be seen here</a>.</p>
<p>Here&#8217;s a plausible process to add your favourite language to it, given <i>git clone git://anongit.freedesktop.org/libreoffice/libexttextcat</i> and bootstrapping from the <a href="http://www.ohchr.org/EN/UDHR/Pages/SearchByLang.aspx">insanely-translated UDHR</a> using Abkhaz as an example.</p>
<p><code><br />
cd libexttextcat/langclass/ShortTexts/<br />
wget http://unicode.org/udhr/d/udhr_abk.txt<br />
# skip the English header, name the result using <a href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt">BCP-47</a><br />
tail -n+7 udhr_abk.txt &gt; ab.txt<br />
cd ../LM<br />
../../src/createfp &lt; ../ShortTexts/ab.txt &gt; ab.lm<br />
echo ab.lm ab--utf8 &gt;&gt; ../fpdb.conf<br />
</code></p>
<p>Then update the check target in src/Makefile.am to confirm, using make check, that ShortTexts/ab.txt is detected as ab.</p>
<p>I&#8217;ll remove the necessity of a configuration file in a later version, and convert the result to a BCP-47 tag. For the moment it remains a drop in replacement for the original solution which necessitates retaining the slightly odd language tag syntax.</p>
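<p>For the curious, here is a rough Python sketch of the rank-ordered n-gram idea that textcat-style guessers are built on. It is not libexttextcat&#8217;s code and it does not read or write the .lm format; the function names, the 400-entry cut-off and the penalty for a missing n-gram are all assumptions picked for illustration.</p>

```python
from collections import Counter


def fingerprint(text, max_n=5, top=400):
    """Return the most frequent character n-grams of text, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending count, then alphabetical so the ordering is deterministic.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return ranked[:top]


def distance(doc_fp, lang_fp):
    """Sum of rank differences, with a fixed penalty for n-grams lang_fp lacks."""
    ranks = {gram: rank for rank, gram in enumerate(lang_fp)}
    penalty = len(lang_fp)  # assumed penalty, chosen for this sketch
    return sum(abs(rank - ranks[gram]) if gram in ranks else penalty
               for rank, gram in enumerate(doc_fp))


def guess(text, fingerprints):
    """Return the key in fingerprints whose fingerprint is closest to text."""
    doc_fp = fingerprint(text)
    return min(fingerprints, key=lambda lang: distance(doc_fp, fingerprints[lang]))
```

<p>Here fingerprints is a plain dict mapping a language tag to the result of fingerprint() on some sample text for that language, and guess() picks the tag with the smallest distance.</p>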
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/09/28/libexttextcat-text-guessing-feature/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>git, really nifty after all</title>
		<link>http://blogs.linux.ie/caolan/2011/09/20/git-really-nifty-after-all/</link>
		<comments>http://blogs.linux.ie/caolan/2011/09/20/git-really-nifty-after-all/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 23:57:25 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=489</guid>
		<description><![CDATA[Maybe there&#8217;s something to the cult-of-git after all. vcl/unx/source/fontmanager/fontcache.cxx had some code which painstakingly constructed a string, only to do nothing with it. Clearly at some time in the past it was used, so when did its use go away? This is a file which has been moved around over the years from place [...]]]></description>
			<content:encoded><![CDATA[<p>Maybe there&#8217;s something to the cult-of-git after all <img src='http://blogs.linux.ie/caolan/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />. vcl/unx/source/fontmanager/fontcache.cxx had some code which painstakingly constructed a string, only to do nothing with it. Clearly at some time in the past it was used, so when did its use go away? This is a file which has been moved around over the years from place to place, hmm, potentially tricky to scratch the itch of knowing when it happened? Not at all&#8230;</p>
<p><code>git log --follow --oneline -S'suspiciously missing variable' /path/to/file.cxx</code></p>
<p>and 2 seconds later I have a list of 5 commits, and there it is at the top of the list. Back in 2005, a rework of the font cache where the stat on a file was optimized out, while the constructed path to the file remained. No undetected nightmare merge bug then, just a missed micro-optimization opportunity.</p>
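<p>What -S (the &#8220;pickaxe&#8221;) does is report the commits in which the number of occurrences of the given string changes. A toy Python model of that rule follows; the in-memory history is made up and stands in for a real repository, and there is none of the rename tracking that --follow adds.</p>

```python
def pickaxe(history, needle):
    """Return ids of commits where the occurrence count of needle changes.

    history is a list of (commit_id, file_text) pairs, oldest first.
    """
    hits = []
    previous_count = 0
    for commit_id, text in history:
        count = text.count(needle)
        if count != previous_count:
            hits.append(commit_id)
        previous_count = count
    return hits
```

<p>Note that this returns the commits oldest first, whereas git log lists the newest first, which is why the commit that removed the last use shows up at the top of the real output.</p>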
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/09/20/git-really-nifty-after-all/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>all text rendered with cairo</title>
		<link>http://blogs.linux.ie/caolan/2011/08/19/all-text-rendered-with-cairo/</link>
		<comments>http://blogs.linux.ie/caolan/2011/08/19/all-text-rendered-with-cairo/#comments</comments>
		<pubDate>Fri, 19 Aug 2011 11:56:02 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=480</guid>
		<description><![CDATA[So, as of today all LibreOffice (3.5 onwards) text rendering under X goes through cairo. This was already the case in practice for horizontal text for quite a while, the additional change is that it&#8217;s true for vertical text as well now. Yes, I know it&#8217;s still rather sub-optimal. The current implementation is [...]]]></description>
			<content:encoded><![CDATA[<p>So, as of today all LibreOffice (3.5 onwards) text rendering under X goes through cairo. This was already the case in practice for horizontal text for quite a while, the additional change is that it&#8217;s true for vertical text as well now.</p>
<p>
<b>before</b><br />
<a href="http://blogs.linux.ie/caolan/files/2011/08/linux-classic.png"><img src="http://blogs.linux.ie/caolan/files/2011/08/linux-classic.png" alt="" width="320" height="256" class="aligncenter size-full wp-image-478" /></a><br />
<b>after</b><br />
<a href="http://blogs.linux.ie/caolan/files/2011/08/linux-cairo.png"><img src="http://blogs.linux.ie/caolan/files/2011/08/linux-cairo.png" alt="" width="320" height="256" class="aligncenter size-full wp-image-479" /></a></p>
<p>Yes, I know it&#8217;s still rather sub-optimal. The current implementation is basically intended to be bug-for-bug compatible for now, though I couldn&#8217;t resist improving the positioning of 0x30FC.</p>
<p>Test-case at <a href="http://cgit.freedesktop.org/libreoffice/core/tree/qadevOOo/testdocs/vertical-testcase.odt">http://cgit.freedesktop.org/libreoffice/core/tree/qadevOOo/testdocs/vertical-testcase.odt</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/08/19/all-text-rendered-with-cairo/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>sgv, StarDraw 2.0 examples with text ?</title>
		<link>http://blogs.linux.ie/caolan/2011/08/08/sgv-stardraw-2-0-examples-with-text/</link>
		<comments>http://blogs.linux.ie/caolan/2011/08/08/sgv-stardraw-2-0-examples-with-text/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 11:12:50 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=475</guid>
		<description><![CDATA[I wonder if anyone has any sgv documents left around, not svg, but sgv, the StarDraw 2.0 format. Looking for .sgv documents that contain text, and ideally text outside of the ASCII range. A few umlauts would probably suffice.]]></description>
			<content:encoded><![CDATA[<p>I wonder if anyone has any sgv documents left around, not svg, but sgv, the StarDraw 2.0 format. Looking for .sgv documents that contain text, and ideally text outside of the ASCII range. A few umlauts would probably suffice.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/08/08/sgv-stardraw-2-0-examples-with-text/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>unused code, libreoffice style</title>
		<link>http://blogs.linux.ie/caolan/2011/07/11/unused-code-libreoffice-style/</link>
		<comments>http://blogs.linux.ie/caolan/2011/07/11/unused-code-libreoffice-style/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 12:34:48 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=472</guid>
		<description><![CDATA[The return of callcatcher-derived lists of unused code in LibreOffice. I tweaked callcatcher to understand the additional gcc command line options used by the new gbuild module so it can be dropped in as a gcc replacement in that environment. There&#8217;s now a findunusedcode target in the top level Makefile and a cached [...]]]></description>
			<content:encoded><![CDATA[<p>The return of <a href="http://www.skynet.ie/~caolan/Packages/callcatcher.html">callcatcher</a>-derived lists of unused code in LibreOffice. I tweaked callcatcher to understand the additional gcc command line options used by the new gbuild module so it can be dropped in as a gcc replacement in that environment.</p>
<p>There&#8217;s now a findunusedcode target in the top level Makefile and a cached list of easy to remove methods in the tree as <a href="http://cgit.freedesktop.org/libreoffice/bootstrap/tree/unusedcode.easy">unusedcode.easy</a>. These are non-virtual C++ methods which are not called directly, nor have their address taken by any code in a stock debug level Linux build.</p>
<p>What distinguishes unusedcode.easy from not-easy is simply that the easy list is restricted to C++ name-mangled class-level symbols and so omits any non-mangled C-style symbols which might be dlsymed from some not easy to find entry point.</p>
<p>At a count of 5176 easy unused methods there&#8217;s enough there to be getting on with for the moment, and we can revisit the C-style symbols with a whitelist of known dlsym names once those are done.</p>
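<p>As a loose illustration of the easy versus not-easy split, here is a Python sketch that sorts symbol names by whether they look name-mangled. It assumes Itanium-ABI style mangling, where mangled names begin with _Z, and it does not try to tell class methods from other mangled symbols. This is not how callcatcher itself is implemented, and the function name and the sample symbols are made up.</p>

```python
def split_unused(symbols):
    """Split symbol names into (mangled, unmangled) lists, keeping their order."""
    easy, not_easy = [], []
    for symbol in symbols:
        # Assumption: a leading _Z marks an Itanium-ABI mangled C++ name.
        if symbol.startswith("_Z"):
            easy.append(symbol)
        else:
            # Plain C-style names might be looked up via dlsym somewhere.
            not_easy.append(symbol)
    return easy, not_easy
```

<p>For example, split_unused(["_ZN3Foo3barEv", "some_c_entry_point"]) puts the first name in the easy list and the second in the not-easy one.</p>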
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/07/11/unused-code-libreoffice-style/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>regression testing libreoffice filters</title>
		<link>http://blogs.linux.ie/caolan/2011/07/11/regression-testing-libreoffice-filters/</link>
		<comments>http://blogs.linux.ie/caolan/2011/07/11/regression-testing-libreoffice-filters/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 11:58:27 +0000</pubDate>
		<dc:creator>Caolán</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://blogs.linux.ie/caolan/?p=468</guid>
		<description><![CDATA[For regression testing LibreOffice filters I&#8217;ve now arranged things so that each import filter&#8217;s cppunit test comprises three data dirs, a pass dir, a fail dir and an indeterminate dir. Files in pass must parse without error, those in fail are expected to fail, but fail gracefully by returning an error or throwing an [...]]]></description>
			<content:encoded><![CDATA[<p>For regression testing LibreOffice filters I&#8217;ve now arranged things so that each import filter&#8217;s cppunit test comprises three data dirs, a pass dir, a fail dir and an indeterminate dir. Files in pass must parse without error, those in fail are expected to fail, but fail gracefully by returning an error or throwing an exception, i.e. a crash is a fail on a &#8220;fail&#8221; test, while &#8220;can&#8217;t parse&#8221; is the expected pass state.</p>
<p>The pass/fail dirs are typically pre-filled in the tree with a small sample of tricky documents which get tested at every build time to ensure they remain working.</p>
<p>indeterminate dirs on the other hand are expected to be empty in the tree, and the cppunit tests don&#8217;t care if their contents can be parsed or not, only that they don&#8217;t crash. This is really convenient for searching for crashes in a large document collection (horde), given that it&#8217;s an order of magnitude faster than using the full application to load and layout the results.</p>
<p>So I/we can just take a large document horde of e.g. docs and throw them in sw/qa/core/data/ww8/indeterminate and run make -sr in sw and sit back and wait to see if anything in there is a crasher at the parser level. For extra goodness export VALGRIND=memcheck to run the whole lot under valgrind.</p>
<p>FWIW, today anyway</p>
<ol>
<li>All 3721 attachments of (alleged) mime-type application/msword in openoffice.org&#8217;s bugzilla pass without crash when placed into ww8/indeterminate. To be re-run under valgrind later.</li>
<li>And all (ok, only 128) attachments of (alleged) mime-type application/msword in freedesktop.org&#8217;s bugzilla pass under VALGRIND=memcheck when placed into indeterminate.</li>
</ol>
<p>I&#8217;ve got doc, rtf, qpro, wmf, emf, hwp, lwp and sxw organized and pre-seeded with a sample handful of files so far. Plenty more filters than that of course, but .doc is my current focus as the richest vein of available had-bugs-reported documents.</p>
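<p>The pass/fail/indeterminate rules described above can be sketched in Python roughly as follows. The real tests are cppunit ones in C++; ParseError, run_case and check_filter are hypothetical names, and since a Python sketch can&#8217;t reproduce a genuine crash, any unexpected exception is used as a stand-in for one.</p>

```python
import os


class ParseError(Exception):
    """Hypothetical stand-in for a filter failing gracefully."""


def run_case(parse, path):
    """Return 'parsed', 'rejected' or 'crashed' for one file."""
    try:
        parse(path)
        return "parsed"
    except ParseError:
        return "rejected"
    except Exception:
        # Stand-in for a real crash, which this sketch can't reproduce.
        return "crashed"


# Acceptable outcomes per directory, following the description in the post.
ALLOWED = {
    "pass": {"parsed"},
    "fail": {"rejected"},
    "indeterminate": {"parsed", "rejected"},
}


def check_filter(parse, data_dir):
    """Return a list of (path, outcome) pairs that break the rules."""
    problems = []
    for kind, allowed in ALLOWED.items():
        folder = os.path.join(data_dir, kind)
        if not os.path.isdir(folder):
            continue  # e.g. no indeterminate dir present
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            outcome = run_case(parse, path)
            if outcome not in allowed:
                problems.append((path, outcome))
    return problems
```

<p>An empty result means every file behaved as its directory demands; a crash is reported whichever directory the file sits in.</p>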
]]></content:encoded>
			<wfw:commentRss>http://blogs.linux.ie/caolan/2011/07/11/regression-testing-libreoffice-filters/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

