Comparing PDFs

JonathanRyan

I have a bunch of documents, each about a dozen pages long, and I want to check that they are substantially the same. They are basically contracts, and I want to make sure that although names, dates and amounts vary, the terms are the same. I'd rather not have to read every word of every contract to do this; it seems like a computer could do that for me.

I have them as PDFs. What's a good way to compare them on a Mac?

BTW I've tried using Nitro Cloud to convert them to DOCX, but it just produces corrupted files...
 
I use Araxis Merge to compare files (and directories). I just tried it on two PDFs I know are different (they are graphics files), and the output showed as gobbledegook, but at least it showed they were different.
 
Thanks, Matt.

Unfortunately, I already know the PDFs are different (they should be mainly boilerplate with a few customisations). Unless Araxis lets me exclude areas, I'd need something that shows me the differences so I can check that only the bits I expect to vary are different.
 
Thanks.

PDF Content Comparer sits there for a while, then tells me the PDFs are different. When I dig further, it shows me pages it thinks are the same (which very clearly aren't) and pages it thinks are different (which may or may not be).

I think it's possible the PDFs were created in a stupid way that makes them just graphic images.

I'm trying to download a trial of Acrobat. But since Adobe let all my data get stolen, they now insist I change my password, and it's taking a while...
 
Sadly, a password reset, 2GB of downloads and several license agreements later, even Acrobat can't tell me where the differences are. It says page 2 in document 1 has been replaced by page 2 in document 2, but that's it (in fact there's a small text change halfway down). I guess they were scanned files and Acrobat can't figure out what the text says.

Thanks for the suggestions - a parallel read is likely to be my fastest solution. It's unlikely there are differences but it's one of those due diligence things. I'd feel really dumb if page 11 of copy 4 said they owned my immortal soul.
 
The only other thing I can think of is to export the text to Word files and do a word count on each one. At least that will flag possible discrepancies by telling you if some files have more words than others; presumably they should all have roughly the same number (apart from specifics such as names and addresses).
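A rough sketch of that word-count check, assuming each contract has already been exported to a plain-text file (the file names are hypothetical, and the 20-word tolerance is an arbitrary guess to allow for names and addresses). It counts whitespace-separated words in each file and flags any whose count is well away from the median:

```python
import sys
from pathlib import Path


def word_count(text: str) -> int:
    """Count whitespace-separated words, roughly as a word processor would."""
    return len(text.split())


def flag_outliers(counts: dict[str, int], tolerance: int = 20) -> list[str]:
    """Return the names whose word count differs from the median by more than `tolerance`.

    The tolerance is an assumed allowance for names, dates and addresses that
    legitimately vary between contracts.
    """
    if not counts:
        return []
    values = sorted(counts.values())
    median = values[len(values) // 2]  # upper median when there is an even number of files
    return [name for name, n in counts.items() if abs(n - median) > tolerance]


if __name__ == "__main__":
    # Usage (hypothetical file names): python wordcount.py contract1.txt contract2.txt ...
    counts = {p: word_count(Path(p).read_text(encoding="utf-8")) for p in sys.argv[1:]}
    for name, n in counts.items():
        print(f"{name}: {n} words")
    for name in flag_outliers(counts):
        print(f"Check {name}: its word count is well away from the others")
```

A matching count doesn't prove the wording is identical (swapping one word for another leaves the count unchanged), so treat this only as a way to decide which copies to read closely.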
 
If the contracts are important, the good old Mk 1 eyeball is probably the best tool...
 
If they are true PDFs rather than scans, you should be able to open them in MS Word and then do a document compare.
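If the text can be got out of the contracts as plain text (via Word as above, or a PDF-to-text tool; scanned images would need OCR first), a minimal sketch of a "document compare" using Python's standard difflib module might look like this. Masking runs of digits is an assumption meant to stop dates and amounts showing up as differences; names will still show, which is the part you'd want to eyeball anyway.

```python
import difflib
import re
import sys
from pathlib import Path

# Assumed masking rule: any run of digits (dates, amounts, figures) becomes "#",
# so "1,000.00" and "2,500.00" compare as equal.
DIGITS = re.compile(r"\d[\d,.]*")


def normalise(line: str) -> str:
    """Mask digit runs and collapse whitespace before comparing."""
    return " ".join(DIGITS.sub("#", line).split())


def contract_diff(a_text: str, b_text: str, a_name: str = "a", b_name: str = "b") -> list[str]:
    """Return unified-diff lines for two texts after normalisation (empty if they match)."""
    a_lines = [normalise(line) for line in a_text.splitlines()]
    b_lines = [normalise(line) for line in b_text.splitlines()]
    return list(difflib.unified_diff(a_lines, b_lines, fromfile=a_name, tofile=b_name, lineterm=""))


if __name__ == "__main__":
    # Usage (hypothetical file names): python contract_diff.py contract1.txt contract2.txt
    first, second = sys.argv[1], sys.argv[2]
    diff = contract_diff(Path(first).read_text(encoding="utf-8"),
                         Path(second).read_text(encoding="utf-8"), first, second)
    print("\n".join(diff) if diff else "No differences after masking numbers.")
```

Lines starting with "-" appear only in the first file and lines starting with "+" only in the second, so a change to a clause shows up as a -/+ pair.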
 

Trying to open a pdf file in Word just produces gobbledegook.
 
That depends on the PDF type and how you open it. If you open Word first and then drag your PDF into it, it will usually display the contents correctly; it won't if you 'open as' Word. It does for me with Word 2007. I also have Adobe Pro installed, though, so I'm not sure whether that has anything to do with it.
 

The pdf I tried was a Nikon user manual.

I opened Word 2010 first, then used File > Open to open the Nikon manual; that produced gobbledegook (second picture below). If I open Word 2010 and literally drag the Nikon manual PDF into it, I get this:

[Screenshot: the Nikon manual PDF dragged into Word 2010]

[Screenshot: the same PDF opened via File > Open in Word 2010, showing gobbledegook]
 