Thursday, October 16, 2008

Before I throw a tantrum at pdftoxml, let me admit that I did not use this product personally. I only provided some technical consultancy to a team that did.
Their requirement was simple and straightforward: convert PDF to XML. Imagine this: the requirement is nothing more than the product's name, pdftoxml, with two spaces inserted. And that is exactly what this product could not do properly.
We had a 25-page PDF (a report, so it contained tables) and used pdftoxml to convert it. The product failed on the first attempt, ostensibly because of the size of the file. So we split the file into multiple smaller files and tried again, and this time it worked.
But wait, the trouble with the product did not end there. Our original aim was to convert the entire PDF to XML. Since we now knew it could be done in smaller chunks, we went ahead and wrote a batch file with one pdftoxml call for each of the chunks.
As we half expected, pdftoxml started to hang intermittently after a few of the statements in the batch had executed. This is most probably because the program has a memory leak, or perhaps because it keeps holding a reference to the PDF, or whatever. After paying $299 for a product, you don't have the energy to figure out why it did not work.