Writing Academic Papers using LaTeX and Subversion (Part 1)

Wednesday, May 23rd, 2007

Many academics in Computer Science, Mathematics and Engineering use LaTeX as a document typesetting tool for technical papers (and more besides). Indeed, many publication outlets in these areas require the use of LaTeX for document submission. LaTeX has many advantages for document preparation (and many disadvantages) which I won’t delve into here. For our purposes the key advantage of LaTeX is that the files are plain text. This allows us to use existing source control tools to manage LaTeX files.

I want to use source control for several reasons:

  1. to allow me to make small changes to a document over time,
  2. to allow our research group to collaborate on a single document without clobbering each other’s changes,
  3. to easily share the document between my work machine, my laptop and my home machine, and
  4. to occasionally use advanced features such as branching.

I’m a software developer so source control is second nature to me. I’ve not yet used source control on academic writing projects with non-software developers, but I intend to soon. So the rest of this post is a strange mix of the general and stuff specific to my situation.

First let me explain the use of source control. An administrator (me in this case) creates a repository for the project. You can think of the repository as a library where you put successive revisions of a document, or of multiple documents. Scaling up slightly, we can store a directory and all its subdirectories in a repository. The source control tool acts as the librarian between you and the repository. You ask the librarian to get you the latest version of the project.

Getting a bit specific, I suggest using the Subversion source control tool, whose command is ‘svn’. Windows users can use TortoiseSVN, a version of the Subversion librarian that integrates well with Windows Explorer. Linux users generally have a version which integrates with their tool of choice. Mac users are generally too busy drinking Mocha-chino-lattes to worry about this kind of thing. I further suggest you tell your administrator to make the Subversion repository (i.e. the library) available via SSH (or at least it will be in my case...).

Using the Subversion tool you check out a revision and commit changes. So assume three authors (Aidan, Andrew and Gem) are working on a document. All three check out the first revision (revision 1). Aidan makes some changes and commits them to the repository. At this stage Aidan holds revision 2 of the document but both Andrew and Gem still hold revision 1. Andrew and Gem should update their working copies to the latest revision.
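The three-author story can be sketched as a toy model in Python. This is not how Subversion is implemented, and the class names `Repository` and `WorkingCopy` are made up for illustration; they only mimic what the real `svn checkout`, `svn commit` and `svn update` commands do to revision numbers.

```python
class Repository:
    """Toy stand-in for the library: a numbered list of revisions."""

    def __init__(self, first_text):
        self.revisions = [first_text]  # index 0 holds revision 1

    @property
    def head(self):
        """Number of the latest revision."""
        return len(self.revisions)


class WorkingCopy:
    """One author's local copy, created by checking out the latest revision."""

    def __init__(self, repo):
        self.repo = repo
        self.revision = repo.head
        self.text = repo.revisions[-1]

    def commit(self):
        """Store the local text as a new revision (no safety checks here)."""
        self.repo.revisions.append(self.text)
        self.revision = self.repo.head

    def update(self):
        """Bring the local copy up to the latest revision."""
        self.revision = self.repo.head
        self.text = self.repo.revisions[-1]


repo = Repository("Draft abstract.")
aidan, andrew, gem = (WorkingCopy(repo) for _ in range(3))

aidan.text += " Added introduction."
aidan.commit()
print(aidan.revision, andrew.revision, gem.revision)  # 2 1 1

andrew.update()
gem.update()
print(aidan.revision, andrew.revision, gem.revision)  # 2 2 2
```

The point of the sketch is only that a commit moves the repository forward, while everyone else stays where they were until they update.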

So, you check out a revision from a repository. The repository normally has an address such as svn+ssh://csdev.it.brighton.ac.uk//home/vmg/VLC07 where VLC07 is the conference you’re writing the paper for. If you check out the latest revision, the Subversion tool creates the directory structure on your hard drive and copies the files from the repository into it. The working copy is the copy from the repository that you are currently working on. You work by making changes to your working copy. You then commit these changes to the repository. Generally, when you commit changes, you will be asked to add an entry to the ChangeLog. This is great. A ChangeLog allows all authors to keep track of what is happening in the development of the document. For example:

------------------------------------------------------------------------
r18 | kre | 2007-05-23 13:19:48 +0100 (Wed, 23 May 2007) | 2 lines

* reverted \url{} change

------------------------------------------------------------------------
r17 | kre | 2007-05-23 13:12:41 +0100 (Wed, 23 May 2007) | 3 lines

* Changed howpublished URLs in bibliography to use \url{}

* Added paragraph breaks in Conclusion

We can see that in revision 17 the user kre made the changes listed in the bullet points, and in revision 18 she reverted (removed) one of them (she should’ve put in a reason, but I typed it for her… so blame me). Because other authors work on the same document, they can now read the reasons for all the changes that were made.

What happens if, say, Aidan checks out, makes a change and commits, so that the repository is at revision 2, but Gem makes a change to revision 1 of the document and then tries to check it in? In that case we get a conflict. The normal way to avoid this is to do the modify-update-commit dance: make your change, update your working copy to the latest revision, and only then commit.
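Here is a minimal sketch of the dance as a toy model in Python. It is not Subversion's implementation. The names `Repository`, `WorkingCopy` and `OutOfDate` are hypothetical, and the paper is simplified to a dictionary of named sections, whereas the real tool works on files and lines of text. The real commands behind the steps are `svn update` and `svn commit`.

```python
class OutOfDate(Exception):
    """Raised when a commit is attempted from a stale working copy."""


class Repository:
    def __init__(self, sections):
        self.revisions = [dict(sections)]  # index 0 holds revision 1

    @property
    def head(self):
        return len(self.revisions)


class WorkingCopy:
    def __init__(self, repo):
        self.repo = repo
        self.base = repo.head  # revision this copy was last synced to
        self.sections = dict(repo.revisions[-1])

    def commit(self):
        if self.base != self.repo.head:
            raise OutOfDate(
                f"working copy is at r{self.base}, repository is at r{self.repo.head}"
            )
        self.repo.revisions.append(dict(self.sections))
        self.base = self.repo.head

    def update(self):
        """Merge newer repository changes; return sections changed on both sides."""
        old = self.repo.revisions[self.base - 1]
        new = self.repo.revisions[-1]
        conflicts = []
        for name, theirs in new.items():
            if theirs == old.get(name):
                continue  # nobody else touched this section
            mine = self.sections.get(name)
            if mine == old.get(name):
                self.sections[name] = theirs  # only they changed it: take theirs
            elif mine != theirs:
                conflicts.append(name)  # both changed it: needs a human
        self.base = self.repo.head
        return conflicts


repo = Repository({"intro": "Old intro.", "conclusion": "Old conclusion."})
aidan, gem = WorkingCopy(repo), WorkingCopy(repo)

aidan.sections["intro"] = "New intro."
aidan.commit()  # repository is now at revision 2

gem.sections["conclusion"] = "New conclusion."  # modify
try:
    gem.commit()  # refused: Gem still holds revision 1
except OutOfDate as error:
    print("commit refused:", error)

conflicts = gem.update()  # update: Aidan's intro is merged in
if not conflicts:
    gem.commit()  # commit: repository is now at revision 3
print(repo.head, conflicts)  # 3 []
```

Because Aidan and Gem touched different sections, the update merges cleanly. Had both rewritten the intro, `update()` would have reported it as a conflict for the authors to sort out by hand.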

Conflicts, merging and using advanced features such as branching will be covered in Part 2 (due before Duke Nukem Forever) of this post.

To address the points I raised above:

How does source control allow me to make small, iterative changes to a document? When I make a single change, e.g. rephrase a paragraph, add a theorem or spell-check the document, I can check it in with a reason. In this way I can build up a story of how my document has unfolded. The tool is there to provide me with the support to build up my document by heaping small change on top of small change.

How does source control allow me to sync up my home machine, work machine and laptop? Well, I simply update the machine I’m working on to the latest version of the source. If I’m leaving work I have been known to commit a change with the reason “Aidan is leaving work now and intends to continue rewriting section 2 when he gets home”. Though this is not best practice, it works for me ™.

As I’ve said, I’ll explain branching soon and, more importantly, why one might use branching. In a nutshell: say you were writing a paper with a colleague but you disagreed with some major component. Instead of clobbering their work and changing the paper on them, you can create a parallel branch which mirrors what they have except that it contains your changes. Then both of you can have a coffee and argue about it. In my case I can simply be wrong about it and agree that the original way was best (sorry kre :) ).

Helpful pointers:

  1. Write your LaTeX files one sentence per line.
  2. When drafting a document, print the revision number in the title: use $Revision:$ and propset the file (explained in Part 3).
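The first pointer matters because source control tools compare files line by line, so with one sentence per line a reworded sentence shows up as a single changed line. For an existing draft, a small Python helper can do a rough first pass. This is only a sketch: the function name `one_sentence_per_line` and the sample text are made up, and the sentence-boundary rule is deliberately naive.

```python
import re

# Naive sentence boundary: ., ! or ? followed by whitespace and then
# a capital letter or a LaTeX command. Abbreviations such as "e.g. The"
# will be split wrongly, so treat the output as a first pass.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z\\])")


def one_sentence_per_line(paragraph: str) -> str:
    """Reflow one paragraph of LaTeX source to one sentence per line."""
    text = " ".join(paragraph.split())  # collapse existing line breaks
    return "\n".join(SENTENCE_BREAK.split(text))


# Made-up sample paragraph, wrapped the way an editor might wrap it.
draft = (
    "We prove Theorem 1 by induction. The base case\n"
    "is immediate. \\cite{example} covers the inductive step."
)
print(one_sentence_per_line(draft))
```

Run it on one paragraph at a time and check the result by eye before committing, since it knows nothing about abbreviations, comments or verbatim environments.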