Friday, March 28, 2008

LaTeX and a Strategy for Overcoming the Learning Curve

In October 2007 I decided to start writing up my dissertation, and the platform I chose was LaTeX. Frankly, I chose LaTeX because the quality of the documents it produces puts other products to shame! My only other real choice was MS Word (or equivalent), which had treated me well, but initial testing showed it was clearly not up to the task (master documents suck!). Articles on popular news sites at the time reinforced the choice. Presentation in a thesis is everything: it represents the pinnacle of 3+ years of research and is the primary means by which the work produced during the PhD is assessed. I made the decision after checking out a number of theses and papers produced with LaTeX, after which my primary aim was to address the single pain point reported by others who had made the same switch: the learning curve. Below I document the strategy I adopted (links and opinions), which had two main thrusts: (1) reading the seminal LaTeX literature, and (2) learning by doing.

I started out by reading up on LaTeX and TeX on Wikipedia, which suggested to me that there was plenty of free documentation on the web to exploit (avoiding the need to hit the library), and that learning plain TeX was not necessary. The first two big problems I wanted to address were the choice of an IDE and the specifics of migrating my references. Googling revealed the popularity of the MiKTeX distribution, reinforced by the project's longevity and download statistics. Reading up on popular LaTeX IDEs led me to TeXnicCenter, which also had convincing project statistics. Searching on the two packages together confirmed that they work well in combination and are popular as a pair. All of my references were stored in ProCite, and I learned that I needed to migrate them to BibTeX format. I chose JabRef to maintain my references (in particular the WebStart version, which deploys updates automatically). Reference migration was a pain: I exported from ProCite, imported into EndNote, then exported from EndNote and imported into JabRef using this guide (RIS format). Some data cleaning in JabRef was required (I had ~1000 entries at that time), although overall the migration was pretty smooth.
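For readers who have not used BibTeX, here is a minimal sketch of where the migration ends up: each reference becomes an entry in a `.bib` file (the format JabRef maintains), and the document cites it by key. The file name `references.bib`, the key `smith2007`, the entry data and the `plain` style are illustrative placeholders, not taken from my actual library. The `filecontents` environment is used only to keep the example in one file.

```latex
% Self-contained sketch: filecontents writes references.bib on the first run.
% The entry below is hypothetical placeholder data.
\begin{filecontents}{references.bib}
@article{smith2007,
  author  = {Smith, John and Jones, Mary},
  title   = {An Example Article},
  journal = {Journal of Examples},
  year    = {2007},
  volume  = {12},
  pages   = {1--10}
}
\end{filecontents}
\documentclass{report}
\begin{document}
Prior work \cite{smith2007} addressed this problem.
\bibliographystyle{plain}   % a numeric style shipped with BibTeX
\bibliography{references}   % reads references.bib
\end{document}
```

A typical build runs latex, then bibtex, then latex twice more so that the citation numbers and the reference list resolve.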

After I was up and running, I educated myself on the basics of getting things done in LaTeX. I printed a copy of the LaTeX Cheat Sheet and memorised as much as I could. I downloaded the introductory guides from the LaTeX Project, specifically LaTeX: an introduction and The (Not So) Short Introduction to LaTeX2e, and read them end-to-end. I realised I was going to have two main problems: graphics and tables. I flicked through Using Imported Graphics in LaTeX2e to get up to speed on graphics, and Tables in LaTeX: packages and methods to get up to speed on tables. For tables, I chose to prepare my data in MS Excel, then used LaTable to convert the CSV into a LaTeX table. For graphics, I had already created many images in MS Word. I copied these from MS Word into MS Visio and exported them as WMF. I then imported them into TpX, exported them as EPS, and included them in my documents using the graphics package. For images that were in JPEG, I used Inkscape to convert them to EPS. Graphics were the weakest part of my tool chain; looking back on the monotony of that workflow, I should have invested more time in this area (heed my advice!). A final area I invested time in early was algorithm representation (pseudocode). I looked at a lot of examples and came across algorithm2e, which I adopted because I thought it looked the best, especially after some further tweaking.
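To make the three trouble spots concrete, here is a rough sketch of what each looks like in the final source: an EPS figure pulled in with `\includegraphics`, a plain `tabular` of the kind a CSV-to-LaTeX converter emits, and a short algorithm2e listing. The file name `diagram` (which must exist as `diagram.eps`), the table values and the algorithm itself are hypothetical placeholders. EPS input assumes the latex then dvips route rather than pdflatex.

```latex
\documentclass{report}
\usepackage{graphicx}                   % provides \includegraphics
\usepackage[ruled,vlined]{algorithm2e}  % algorithm environment and keywords
\begin{document}

\begin{figure}[htbp]
  \centering
  % Placeholder: expects diagram.eps next to this file
  \includegraphics[width=0.8\textwidth]{diagram}
  \caption{An imported EPS diagram.}
  \label{fig:diagram}
\end{figure}

\begin{table}[htbp]
  \centering
  \caption{A small results table (placeholder values).}
  \label{tab:results}
  \begin{tabular}{lrr}
    \hline
    Method & Runs & Mean \\
    \hline
    A & 30 & 0.52 \\
    B & 30 & 0.61 \\
    \hline
  \end{tabular}
\end{table}

\begin{algorithm}[htbp]
  \SetKwInOut{Input}{input}\SetKwInOut{Output}{output}
  \Input{a list $L$ of $n$ numbers}
  \Output{the largest element of $L$}
  $m \leftarrow L[1]$\;
  \For{$i \leftarrow 2$ \KwTo $n$}{
    \If{$L[i] > m$}{
      $m \leftarrow L[i]$\;
    }
  }
  \KwRet{$m$}\;
  \caption{Finding the maximum of a list}
  \label{alg:max}
\end{algorithm}

\end{document}
```

The `ruled` and `vlined` options are just one look; algorithm2e offers several others, which is where most of the "further tweaking" tends to happen.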

I wanted to use an off-the-shelf thesis template. Searching turned up a number of them, and I settled on combining three from around the web: here, here, and here. The most important principle was separation: the main thesis file contains no content of its own, and each chapter (including the appendices and the front and back matter) is maintained in a separate file. This gives a lot of control over the working set for compiling and drafting (using \includeonly). I also modified the template based on my school's requirements (Swinburne), and documented all of these specifications in the template to continually remind me. I found lots of tips and tricks around the web for both LaTeX (the Wikibook, tutorials) and TeXnicCenter. I used an Australian English dictionary file for spell checking, which I continually extended with domain-specific words. I had a lot of trouble staying within the word limit, which I tracked using the excellent web-based LaTeX word count script. Going forward, I plan on pushing my LaTeX out to the web as a template and thesis source to help future "other me's". As for IDEs, I have played with TeXlipse, which I found quite good, although not (yet) as mature as TeXnicCenter. I love the Eclipse platform and all the cool tools that come with it, so I would love to see this tool rise up and take over, just as Eclipse did with Java IDEs.
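A stripped-down sketch of the separation principle follows: the master file only loads packages and pulls in one file per chapter, and `\includeonly` (which must sit in the preamble) restricts compilation to the chapters currently being drafted. The directory and file names are hypothetical, and the school-specific formatting from the real templates is left out.

```latex
% thesis.tex: the master file holds no content of its own.
% All file and directory names below are hypothetical.
\documentclass[12pt,a4paper]{report}
\usepackage{graphicx}

% Compile only these chapters; comment the line out for a full build.
% Page numbers and cross-references into excluded chapters are taken
% from their .aux files, so do a full build first.
\includeonly{chapters/introduction,chapters/method}

\begin{document}
\include{frontmatter/title}
\tableofcontents
\include{chapters/introduction}
\include{chapters/background}
\include{chapters/method}
\include{chapters/conclusion}
\appendix
\include{appendices/appendix-a}
\bibliographystyle{plain}
\bibliography{references}
\end{document}
```

Each chapter file then starts directly with `\chapter{...}` and has no preamble. Because `\include` always starts a new page, it suits chapter-sized units, while smaller fragments are usually pulled in with `\input`.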

The general strategy I adopted was the same one I used back in my consulting days to get productive on a new technology as fast as possible. The procedure is roughly as follows:

  1. Preliminary Reading: This step is all about coming to terms with the capability and placement of the domain in the context of the problem to be solved. What is it, and how does it help me?
  2. Environment: All about getting a standard operating environment up and running as quickly as possible. Importantly, there needs to be some level of trust in the tools used.
  3. Detailed Introductory Reading: This step is about coming to terms with the principles of the technology: the details of how to use it, best practices for completing common tasks, and so on. This foundational education is critical for future troubleshooting.
  4. Project Structure: Designing a skeleton for the full scope of the project using the selected tools, with an initial application of the acquired best practices. This includes a well-structured project directory, naming conventions, and initial content. This structure houses all future work on the project until completion.
  5. Learn While Doing: Do the work, and acquire specific details as required.

I was working on my thesis (step 5) within a day, and had a stable tool chain and skeleton within about a work-week. The strategy is aggressive, top-down, and productivity-focused, and the general outline has served me very well in the past. I also usually document my progress (as in this post) so that I can track what works and what does not in terms of tools, resources, and even methodological tweaks. The process results in a functional understanding of the domain, in that you can get things done, although the top-down method of acquiring information means there is a distinct lack of theoretical understanding, which must be sought out after the fact. When I apply this process to programming technologies, prior experience fills this gap. With LaTeX, a programming background helps: you can think of the system as a declarative language full of sweet macros, with one tough interpreter, a perspective that helped a lot when debugging document errors.
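As a small illustration of the "declarative language full of macros" view, the sketch below defines macros once in the preamble and uses them in the body. The macro names `\tool` and `\term` are hypothetical examples of my own; `\newcommand` and `\renewcommand` are standard LaTeX.

```latex
\documentclass{report}
% Zero-argument macro: define a term once, reuse it everywhere.
\newcommand{\tool}{\textsf{JabRef}}
% One-argument macro (#1 is the argument): consistent styling of key terms.
\newcommand{\term}[1]{\emph{#1}}
% Changing an existing command needs \renewcommand; \newcommand would fail.
\renewcommand{\contentsname}{Table of Contents}
\begin{document}
\tableofcontents
\chapter{Introduction}
References were managed with \tool{} and each \term{working set}
was compiled separately.
\end{document}
```

The interpreter is unforgiving: a misspelled call such as `\temr{...}` stops the build with an "Undefined control sequence" error on that line, and reading such errors the way you would read a compiler message is exactly where a programming background pays off.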

Importantly, the outlined 'learning by doing' strategy works. It has given me a functional understanding of LaTeX and a coherent written-up dissertation (most of the research was complete prior to the switch) in about six months. For anyone out there on the fence, my advice is: if the document matters to you and you want it to look professional, make the switch and get on with it!

2 comments:

Jason said...

I came across this guide which will be helpful to anyone looking to get up and running fast: Writing a thesis with LaTeX.

Filip said...

Great tip when you don't like to "code" LaTeX: LyX, a what-you-see-is-what-you-mean editor which is great for basic markup and processing. Available for Linux and Windows.