He who attains his ideal, precisely thereby surpasses it


PDF fontfilter

Posted by alextreme on Tue, 06/05/2008 - 14:26 :: Aperte | Linux | Personal

Been rather busy with a number of projects for clients these last few weeks, but now that quite a few of them have been launched I can sit back and focus on some interesting side-projects.

For two projects I've written a PDF report generator that combines dompdf and pdftk to turn generated HTML into very nice CSS-styled reports. Dompdf is quite nice, although it does take its time on more complex HTML-to-PDF conversions. Pdftk is needed because dompdf can't seem to handle more than a few pages at a time (it runs out of memory a lot), so the reports are rendered in parts and then concatenated.
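The render-in-parts-then-concatenate approach could look roughly like the sketch below. It is an illustration only: the helper names `chunk_pages`, `pdftk_concat_cmd` and `concat_with_pdftk` are hypothetical, the idea of splitting the report into per-page HTML fragments is an assumption, and the dompdf rendering step (PHP) is left out entirely. The pdftk invocation uses its standard concatenation form, `pdftk in1.pdf in2.pdf cat output out.pdf`.

```python
import subprocess
from pathlib import Path


def chunk_pages(pages: list[str], size: int) -> list[list[str]]:
    """Split per-page HTML fragments (hypothetical input) into batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be positive")
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def pdftk_concat_cmd(parts: list[Path], output: Path) -> list[str]:
    """Build a pdftk command line that concatenates `parts` into `output`."""
    # pdftk's concatenation form: pdftk in1.pdf in2.pdf cat output out.pdf
    return ["pdftk", *(str(p) for p in parts), "cat", "output", str(output)]


def concat_with_pdftk(parts: list[Path], output: Path) -> None:
    """Run pdftk (must be on PATH); raises CalledProcessError if it fails."""
    subprocess.run(pdftk_concat_cmd(parts, output), check=True)
```

Each batch from `chunk_pages` would be rendered to its own small PDF (keeping dompdf's memory use in check), and the resulting files passed to `concat_with_pdftk` in order.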

A rather annoying problem, however, is that pdftk doesn't remove duplicate embedded font objects when concatenating PDF files. That isn't an issue for a couple of pages, but these reports easily run to 50+ pages and, because every part embeds its own copies of a multitude of fonts, easily hit the 10MB mark.

Rather than hacking dompdf or pdftk, I wrote a small Python script, pdf-fontfilter (you can find it on my projects page), that removes the duplicate font objects from the PDF. With it, those 10MB PDF files shrink to about 700KB, without any further modifications and with the same visual result.
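The general idea of merging duplicate objects can be sketched as follows. This is not the pdf-fontfilter code; `dedupe_objects` is a hypothetical helper, and it deduplicates any byte-identical object rather than fonts specifically. It assumes uncompressed objects (no object streams), generation number 0 everywhere, and that `endobj` or `N 0 R` never appear inside stream data. It loops until nothing changes, because merging two identical font files makes the font descriptors pointing at them identical, which in turn makes the font dictionaries identical.

```python
import hashlib
import re

# Simplifying assumptions: uncompressed objects, generation 0 only,
# and no "endobj" / "N 0 R" byte sequences inside stream data.
OBJ_RE = re.compile(rb"(\d+) 0 obj(.*?)endobj", re.S)
REF_RE = re.compile(rb"(\d+) 0 R")


def _rewrite_refs(body: bytes, remap: dict[int, int]) -> bytes:
    """Point every indirect reference 'N 0 R' at the surviving object."""
    def repl(m: re.Match) -> bytes:
        num = int(m.group(1))
        return b"%d 0 R" % remap.get(num, num)
    return REF_RE.sub(repl, body)


def dedupe_objects(pdf: bytes) -> dict[int, bytes]:
    """Merge byte-identical objects until no duplicates are left."""
    objects = {int(m.group(1)): m.group(2) for m in OBJ_RE.finditer(pdf)}
    while True:
        first_seen: dict[bytes, int] = {}
        remap: dict[int, int] = {}
        for num in sorted(objects):
            key = hashlib.sha1(objects[num]).digest()
            if key in first_seen:
                remap[num] = first_seen[key]  # duplicate: reuse the earlier copy
            else:
                first_seen[key] = num
        if not remap:
            return objects
        # Drop the duplicates and redirect references to the kept copies.
        objects = {num: _rewrite_refs(body, remap)
                   for num, body in objects.items() if num not in remap}
```

The returned objects still have to be written back out with a rebuilt cross-reference table and a trailer whose `/Root` reference is remapped as well; that step is not shown here.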

Although the Adobe PDF Reference made things a lot easier, it is plain that PDF is a document format that stems from the early 1990s. PDF files are laid out to be as efficient as possible for a reader to parse, which makes life more interesting for anyone wanting to generate or modify them. The cross-reference section at the end of every PDF document shows this best: it records the byte offset of every object, so any change that shifts bytes around means rebuilding it.
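To make the cross-reference point concrete, here is a minimal sketch of serializing objects with a classic xref table. Each entry is a fixed 20 bytes (10-digit byte offset, 5-digit generation number, `n` or `f`, and a two-byte end of line), and `startxref` gives the offset of the table itself. The `write_pdf` helper is hypothetical, assumes contiguous object numbers 1..N with generation 0, and produces a structurally valid file rather than a complete, page-bearing document.

```python
def write_pdf(objects: dict[int, bytes], root: int) -> bytes:
    """Serialize objects numbered 1..N with a classic cross-reference table.

    Assumes contiguous numbering (so object 0 is the only free entry)
    and generation number 0 for every object.
    """
    if sorted(objects) != list(range(1, len(objects) + 1)):
        raise ValueError("objects must be numbered 1..N without gaps")
    out = bytearray(b"%PDF-1.4\n")
    offsets: dict[int, int] = {}
    for num in sorted(objects):
        offsets[num] = len(out)  # byte offset of "N 0 obj" from the start of the file
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"
    size = len(objects) + 1  # entry 0 heads the free list
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # every entry is exactly 20 bytes
    for num in range(1, size):
        out += b"%010d 00000 n \n" % offsets[num]
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (size, root)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Inserting or removing a single byte anywhere before an object invalidates every offset after it, which is why tools that modify PDFs either rewrite the whole table or append an incremental update.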

Another interesting tidbit: xpdf happily works with badly formatted PDF files that Adobe Reader chokes on. With all the bloat in Adobe Reader, you'd think it would at least make an effort to repair broken PDF documents...

IBM internship report online

Posted by alextreme on Sun, 20/04/2008 - 14:38 :: IBM | Studies

Last week I had the final discussion for my internship. My results weren't uncontroversial, to put it mildly, but I hope to have woken up some people with my 'fair and balanced' criticism.

You can find my thesis/report on my docs page, together with a link to the dutchgrid presentation (slides and video) I gave a few weeks ago. I've also placed my VGFS code online (see projects), but do check out the README for a big fat disclaimer.

If all goes well, this will have been my last post tagged with IBM and Studies. I've been mulling over a few new projects to kick off, but more on those later. It's time to get productive again.

Back from DIME

Posted by alextreme on Thu, 10/04/2008 - 18:07 :: Aperte | Linux | Morphix

Those two days in Strasbourg for the DIME conference were over before I knew it. The presentation on Wednesday morning went well. I'm always much too nervous before a presentation, but soon enough I had run out of the Alka-linux CDs I'd made to hand out. All in all the reception of our project was fine, but there's still enough to improve on.

Giulio and the other Italians had presentations of their own, but it was good to meet and get to know each other. Although I live and breathe by my email like you probably do, it is a poor means of communication compared to discussing things face to face: nothing beats the terror you see in the face of an Italian when you ask for a cappuccino after noon :)

In the Alka-linux groove...

Posted by alextreme on Sat, 05/04/2008 - 14:28 :: Aperte | Linux | Personal

Well, the presentation last Wednesday went pretty well. I was quite stressed and hadn't slept well the night before, but once on stage it all went automatically. My message came across and wasn't taken lightly, so at least I have the illusion that my internship wasn't for nothing :) I've also handed in the last (really this time!) version of my internship report / CS master's thesis. All in all everyone's satisfied, so I am too.

Currently I'm hard at work on the first version of Alka-linux for the DIME conference starting next Tuesday. I've been busy getting Morphix back into shape, but the problem with 'maintaining' a distribution is that it's a never-ending story: there's always something, somewhere, that won't work for someone. Perfection is not an option, so all that remains is to give it your best (or make it work as well as possible for you).

After DIME I have a couple of projects left to finish, but nothing I can't handle. Over the next half year I'll be focusing only on Aperte (and any open source projects that catch my interest); after that, and a nice quiet summer, it will be time to make a couple of choices.

I know one thing, though: I don't envy Gandalfar after meeting with him yesterday at The Next Web in Amsterdam. Between Zemanta's release last week and twittering his lunch, I don't think he gets a lot of sleep. If you're reading this, Jure: shut down your laptop and get some rest. If you don't, I know I can contact your parents via Peter...

Gridforum Presentation today

Posted by alextreme on Wed, 02/04/2008 - 07:16 :: IBM | Personal | Studies

Even though my internship at IBM is officially over, there's always the matter of distributing your findings. For my AI master's that was the IEEE CEC conference in Singapore last year, but as I've given my CS master's a more practical twist via IBM, it seems fitting that I'll be presenting my results at the business gridforum day today. Striking the balance between technical and high-level content is difficult, but I'll give it my best shot.

Tomorrow Jure is giving a demo in Amsterdam (probably The Next Web conference, which I won't be going to given the EUR 750,- entrance fee). Let's hope for some nice weather and a couple of beers so he can finally explain the Zemanta masterplan.

Life as an entrepreneur is bliss.

Posted by alextreme on Sun, 16/03/2008 - 13:23 :: Aperte | Morphix | Personal | Studies

I kid of course. There are many loose ends that still have to be tied up before I can concentrate full-time on Aperte, but the last few weeks have been nice and relaxed... something I haven't been able to say in quite a while.

Which is not to say that I'm getting bored. I've been able to pick up the pace for my clients, which has thankfully led to even more work on some new fronts. For Alka-linux, a Morphix-based distribution for economists, I've been able to develop a few small tools that improve Morphix in general. I'll be presenting the initial version of Alka on the 9th of April at the upcoming DIME conference. I'm especially satisfied with the design work, which I gladly left to Jure's brother.

Over the next couple of weeks I hope to finish my internship report and wrap up the requirements for my CS diploma. For the short term I've got enough to do, but I'm considering various forms of marketing to promote my little software business. My lack of focus is probably my biggest drawback, but I enjoy working on widely different projects. Jack-of-all-trades...