I happened to notice a big difference in size when compiling the sw (writer) module with different -finline-limit values. I was messing around with that because I’m still bitter about discovering that modernizing some of OOo’s macros to templates bloated the final size of OOo considerably, which made me abandon the effort. So I wondered what sort of difference tweaking -finline-limit might make to e.g. startup time.
The comparison here was simply launching a writer document containing a macro, triggered on document load, which printed the current timestamp to stdout and forced an exit. Hopefully that’s an indication of how long it takes to launch OOo and get to a state where useful work can be done. What I was really interested in was getting reproducible figures which could sanely indicate whether the change made a difference. That’s always tricky. Dropped from these results is the effect of cat-ing some huge files between runs, which just caused wild skew of 7 seconds to 22 seconds between runs, i.e. worthless data. Also dropped was the fillmem and flushdisk between runs, which also gave unreliable variation. What’s installed for these runs and size comparisons was just writer and calc. The compiler is our current rawhide gcc 4.1.2-3.
Compelling is the size difference: original size of
du of 152148 for /usr/lib/openoffice.org/program
vs, with -finline-limit=64,
du of 150392 for /usr/lib/openoffice.org/program
so the stripped binaries come out nearly two megs smaller. Surely that’s a good thing that has to show some run-time benefit, or not? I’ll even take performance neutral.
All runs were over 27 iterations, so that throwing out outliers would still leave a reasonably large set of samples:
1. a “warm” start of simply launching OOo iteratively gives 2.53 secs vs 2.36 secs startup, in favour of inline-limit=64
2. a “cold-sim” start of calling Michael’s cache killer in between gives 42.47 secs vs 39.71 secs, in favour of inline-limit=64
3. a “bizarro” start of launching OOo from a DVD and unmounting it in between gives 96.76 secs vs 95.33 secs, in favour of inline-limit=64 (yeah, a minute and a half, I don’t recommend it)
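To put those three pairs of figures on a common footing, here is a rough Python sketch of the arithmetic. The `percent_saved` helper turns each pair into a relative saving, reading the larger number in each pair as the default build and the smaller as the inline-limit=64 build. The `trimmed_mean` helper shows one possible way of throwing out outliers from a set of iterations. Both helper names and the trim count are made up for illustration; the exact outlier rule used for the runs above isn’t spelled out here.

```python
def trimmed_mean(samples, trim=2):
    """Mean after dropping the `trim` smallest and `trim` largest samples.

    Hypothetical outlier rule for illustration; not necessarily the one
    used for the measurements in this post.
    """
    if len(samples) <= 2 * trim:
        raise ValueError("not enough samples to trim")
    kept = sorted(samples)[trim:len(samples) - trim]
    return sum(kept) / len(kept)


def percent_saved(baseline, candidate):
    """Relative reduction of `candidate` against `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# Startup times in seconds as reported above, read as
# (default build, -finline-limit=64 build).
reported = {
    "warm": (2.53, 2.36),
    "cold-sim": (42.47, 39.71),
    "bizarro": (96.76, 95.33),
}

if __name__ == "__main__":
    for name, (default, limit64) in reported.items():
        saved = percent_saved(default, limit64)
        print(f"{name}: {saved:.1f}% less startup time")
```

Run as a script, this works out to roughly 6.7% for the warm start, 6.5% for cold-sim and 1.5% for the DVD case, so the gain is modest but points the same way in all three setups.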
And I picked =64 more or less at random: on fiddling with some simple code to get a feel for what the X in -finline-limit=X means, 60ish seemed around what I vaguely, intuitively expected the cutoff limit for inlining to be, before I’d really thought about it.
Full results, including interesting cachegrind summaries which show better cache miss rates, are in this calc document.