Sunday, June 07, 2009

PDF bookmarks

This isn't a programming post, but if you are a programmer, you've probably had to read a PDF-based ebook or two. Everyone has had to open a PDF file at one time or another, right? You click on a link on a web page which turns out to be a PDF and, if you're lucky, Adobe's Acrobat Reader pops up and in a few short moments you are reading the material you were after. Far too often, however, you are stuck waiting for a multi-megabyte file to download, and when it finally finishes, Acrobat throws up a baffling dialog with an obscure list of things that it thinks are vital for you to download immediately. It really throws off your rhythm.

It's even worse if you don't have Acrobat installed. It's an ever-changing gauntlet to finally get to a page that actually has a link to download the software you need to read what you just clicked. And Acrobat just keeps growing! It's up to about 41 MB now and seems to want to dig its grimy little fingers into every part of your system. If it ever finishes installing, you will probably be told to restart your browser. Eventually you'll have your browser running again and you'll be staring at the Google home page saying to yourself "Where the hell was I and what was I trying to read?!" and "What the hell are all these new icons on my desktop?"

It's experiences like this that really make me hate PDF files. But if you want to read many of the ebooks that are out there, you don't have much choice. I've found myself reading more and more PDF-based books, so I can't avoid them anymore, but I've made a couple of startling discoveries recently that I felt like sharing.

First, Adobe Acrobat Reader isn't the only free reader available for Windows. What? Yes! It's true. There are probably more than these and I haven't rigorously tested them, but here they are in order from big and feature-loaded to light and fast:
Now the first two are commercial and allow you to get extra features in exchange for cash but the free versions are more than adequate. You may encounter some of the same kind of nagging you get from Acrobat but it should be much less. Sumatra is free and open source.

After spending an hour or more scouring the internet for these Acrobat alternatives, I realized that I should have started at this Wikipedia page of everything PDF related. Oh well.

Where was I? Oh yes, the second realization. None of these 'book' readers can do the one basic, fundamental thing that you can do with any crappy, worn out, dead-tree book on the planet: stick some reasonably flat object between a couple of pages and walk away, knowing that when you get back, you can resume reading right where you left off. PDF readers have no bookmark feature!

When I realized I didn't want to read my 243-page book in one sitting, I went searching through the menus in Acrobat looking for the bookmark command. When I finally did find something with the word 'bookmark' on it, it turned out to mean bookmarks that are hardwired into the PDF file, and they only exist if the authors decided to provide some. Hmm, a bunch of bookmarks that mark all the key sections of the book... like, oh I don't know... a table of contents! Ok, so Adobe thinks a table of contents should be called bookmarks. Well, in a world of internet browsers that's not confusing at all. Geez.

The other menu items in Acrobat seem to hint at the ability to edit or mark up a PDF file but that must be some of the functionality they want you to pay for. This is essentially what started me looking for an alternative to Acrobat. Sumatra looked nice and light but it is only a reader pure and simple, no help there. Foxit however looked much more promising.

I wasn't really keen on editing the PDF itself but that seemed like the only option. It turns out there are a lot of ways to edit a PDF that all sound slightly similar. I'm sure this is great for publishers but it's way too much for somebody just looking for a way to mark their position in a book.

One thing that looked promising was the ability to add what looked like little post-it notes. You could place them anywhere on a page and they could hold all kinds of text. Unfortunately, the only way to navigate to a note was to page through the PDF file one page at a time until you found the note. There was no way to navigate directly to a note! Unbelievable... back to Google.

After some time, I stumbled on a link to a book called PDF Hacks. There are a lot of tips and tricks there, most of which you need to buy the book to see, but down the list at #15 was Bookmark PDF Pages in Reader. It seems that Acrobat can run JavaScript, and this guy wrote some JavaScript that adds the ability to bookmark a page. Best of all, it doesn't modify the PDF file to do it. The bookmarks are stored with your application settings for Acrobat instead. The next time you open that PDF file, you can go to any of the bookmarks you created for it. Excellent!
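
To make the "stored outside the PDF" idea a bit more concrete, here's a rough sketch in Python. To be clear, this is not the script from PDF Hacks (that one is Acrobat JavaScript and I haven't reproduced it here). The settings file name `pdf_bookmarks.json`, the function names and the example path are all made up for illustration. It only shows the general idea: keep a little table of page numbers keyed by PDF file, in a separate file, and never touch the PDF.

```python
import json
import tempfile
from pathlib import Path


def load_bookmarks(store: Path) -> dict:
    """Return {pdf_path: {label: page_number}}, or {} if nothing is saved yet."""
    if store.exists():
        return json.loads(store.read_text(encoding="utf-8"))
    return {}


def add_bookmark(store: Path, pdf_path: str, label: str, page: int) -> None:
    """Record a bookmark in the settings file; the PDF itself is never opened."""
    data = load_bookmarks(store)
    data.setdefault(pdf_path, {})[label] = page
    store.write_text(json.dumps(data, indent=2), encoding="utf-8")


def bookmarks_for(store: Path, pdf_path: str) -> dict:
    """Look up the bookmarks saved for one PDF (empty dict if there are none)."""
    return load_bookmarks(store).get(pdf_path, {})


if __name__ == "__main__":
    # Hypothetical settings file in a throwaway temp directory.
    store = Path(tempfile.mkdtemp()) / "pdf_bookmarks.json"
    add_bookmark(store, "C:/books/example.pdf", "where I stopped", 87)
    print(bookmarks_for(store, "C:/books/example.pdf"))
```

Because the bookmarks live in their own file, the PDF stays byte-for-byte the same, which is the property that made the Acrobat hack appealing in the first place.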

This is what I've been looking for. Unfortunately I have to go back to Adobe Acrobat Reader to use it but I can live with that. If you can live with it too, here's how you do it.
  1. Go to this page and download the zip file that is linked there.
  2. Unzip that file. You should end up with a file named "bookmark_page.js".
  3. Copy that file to the "Javascripts" directory for your copy of Acrobat. You will typically find that directory at "C:\Program Files\Adobe\Reader 9.0\Reader\Javascripts", or something like that.
  4. If Acrobat is running, close it, then run it again and open your favorite PDF file. You should now have some bookmark commands under the "View" menu.
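
If you'd rather script step 3 than drag the file around by hand, something like the following Python sketch should work. The default folder is just the typical path mentioned above and may well be different on your machine, and the function name `install_script` is my own invention. Copying into "Program Files" may also need administrator rights.

```python
import shutil
from pathlib import Path

# Assumed default location taken from step 3; adjust for your Reader version.
DEFAULT_JS_DIR = Path(r"C:\Program Files\Adobe\Reader 9.0\Reader\Javascripts")


def install_script(script: Path, js_dir: Path = DEFAULT_JS_DIR) -> Path:
    """Copy the script into Reader's Javascripts folder and return the new path."""
    if not script.is_file():
        raise FileNotFoundError(f"script not found: {script}")
    if not js_dir.is_dir():
        raise FileNotFoundError(f"Javascripts folder not found: {js_dir}")
    # shutil.copy2 returns the path of the copied file inside js_dir.
    return Path(shutil.copy2(script, js_dir))


if __name__ == "__main__":
    # Assumes bookmark_page.js was unzipped into the current directory.
    try:
        print("Installed to", install_script(Path("bookmark_page.js")))
    except OSError as err:
        # Covers a missing file or folder as well as permission problems.
        print("Nothing copied:", err)
```

You still need to restart Acrobat afterwards, as in step 4.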

That should do it. Enjoy!

It would seem that the JavaScript hack above was written before Adobe made some changes to security settings that affect JavaScript. This may manifest as an error dialog stating there was an internal error. I don't know the Acrobat API or JavaScript well enough to troubleshoot the problem, but a workaround is to go to the 'Edit' menu in Acrobat and open 'Preferences'. Find the 'JavaScript' settings and uncheck 'Enable global object security policy'. I don't know exactly what implications that change brings, so use with caution.

