DEV300_m78, lang-tags

DEV300_m78 callcatcher report. Down 10 overall. scripting and vbahelper are now free of unused methods.

Fiddling with BCP 47 style language tags. There's some code to make a sane mapping from the glibc locale strings to them, along with some mods to hunspell to make that the default dictionary naming scheme. Enchant, OOo and friends would need some tweaking as well to bubble the extra info up. The end game would be that edge-case stuff like sr-Latn[-RS] and ca[-ES]-valencia would work out of the box.
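To make the idea concrete, here is a minimal Python sketch of such a mapping. It is not the code referred to above: the modifier tables, the function names glibc_to_bcp47 and dictionary_candidates, and the simple truncation fallback (loosely in the spirit of RFC 4647 lookup) are all assumptions made for illustration.

```python
import re

# Hypothetical modifier tables for illustration only; a real mapping
# would need to cover every @modifier that glibc actually ships.
SCRIPT_MODIFIERS = {
    "latin": "Latn",
    "cyrillic": "Cyrl",
    "devanagari": "Deva",
}
VARIANT_MODIFIERS = {
    "valencia": "valencia",
}

# glibc locale names look like language[_TERRITORY][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def glibc_to_bcp47(locale_name):
    """Map a glibc locale string to a BCP 47 style tag (sketch)."""
    match = _LOCALE_RE.match(locale_name)
    if match is None:
        raise ValueError("unrecognised locale name: %r" % locale_name)

    territory = match.group("territory")
    modifier = (match.group("modifier") or "").lower()

    # BCP 47 order: language, script, region, variant.
    subtags = [match.group("lang").lower()]
    if modifier in SCRIPT_MODIFIERS:
        subtags.append(SCRIPT_MODIFIERS[modifier])
    if territory:
        subtags.append(territory.upper())
    if modifier in VARIANT_MODIFIERS:
        subtags.append(VARIANT_MODIFIERS[modifier])
    # The codeset and any unknown modifier (e.g. @euro) are dropped.
    return "-".join(subtags)


def dictionary_candidates(tag):
    """List tags to try, most specific first, by dropping trailing subtags."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


if __name__ == "__main__":
    for name in ("sr_RS@latin", "ca_ES@valencia", "en_US.UTF-8"):
        tag = glibc_to_bcp47(name)
        print(name, "->", tag, dictionary_candidates(tag))
```

With a fallback list like this, a spell checker running in an sr_RS@latin locale could look for sr-Latn-RS.dic first, then sr-Latn.dic, then sr.dic. Whether plain truncation is the right fallback in every case (for ca-ES-valencia it never tries ca-valencia) is left open here.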

fontconfig could do with a bit of love in that direction as well. I see some bogus tags, e.g. sd-in@devanagari, in some font .conf files, but fontconfig takes a language-territory tag, not @modifiers, so it's reading those as sd-in. The ideal situation would be to be able to describe that as sd-Deva[-IN].

glibc has two ber locales, ber_DZ and ber_MA. Unfortunately ber is a collective language code, which is a real nuisance. In the case of the other collective code, no (i.e. no_NO), gettext and others map it to nb_NO. ber_DZ is most likely equivalent to kab-DZ, but ber_MA does appear in practice to refer to a collection of three languages. Though for some inscrutable reason (copy and paste) the translations in the locale file are actually Azerbaijani/Azeri.

4 Responses to “DEV300_m78, lang-tags”

  1. Goran Rakić says:

    Is it now safe to rename the Serbian Latin hunspell dictionary to sr-Latn and have it magically be linked with the sr@latin glibc locale?

    For OpenOffice.org I believe I can just map the sr-Latn.dic and sr-Latn.aff files to the sh OOo locale code in dictionaries.xcu.

    With the patch from Issue #113496, OpenOffice.org will preselect its sh locale code as the default document language when started with sr@latin.

  2. Caolan says:

    No, it's not safe to do that yet for the general case. I got sidetracked along the way with making the ideal mapping from glibc @ modifiers and aspell variants to BCP 47 langtags. Yes, with my related sr@latin patch for OOo, OOo will default to sh for the default language, but to enable correct system hunspell, etc. integration with OOo, Firefox and everything else, a few more bits need to be in place to get all of them to magically understand that sr-Latn.dic|aff should be the default filename to look for when in a glibc sr@latin locale.

  3. Goran Rakić says:

    OK, but if I understand you correctly there is “no future” for the sh code for the Serbian Latin Hunspell dictionary? Fedora is doing OK, but some other distros are breaking sh into a different package, and having a single Serbian dictionary is really what I would like to have.

    So, do you think I should submit a patch to OOo to rename sh.dic and sh.aff to sr-Latn and fix dictionaries.xcu to map these new files to internal sh locale code?

    Then outside OpenOffice.org, at one point there will be system support to map sr@latin to sr-Latn dictionary, which will be just a bonus feature. Currently everything is broken for “sh”.

  4. Caolan says:

    Yeah, there’s no real future in “sh”, most other things except for OOo just go “huh!” when they see it, so you’re right, the future would be sr-Latn, so naming the files sr-Latn.aff|.dic is the “right thing to do” and in the interim have the dictionaries.xcu map it to “sh” just for OOo, and in the long term things will hopefully work, and in the short term you haven’t really lost anything.
