Perl scripts to compare the overlap between two simple datasets

LibraryThing recently released an XML file containing all the ISBNs in their system. They suggested that libraries might want to compare the ISBNs in their own systems against the ones LibraryThing has.

See the blog post "New feed: Compare your library with LibraryThing" for more details.

I decided to spend a little time seeing whether I could do a simple comparison. It seems sensible to share my code so that others who want to try this don't have to duplicate my effort. It might also be useful for other applications where you need to compare the overlap between two datasets. Currently the code assumes that each line of each input file is an ISBN (or whatever value you want to compare on). This means the list from LibraryThing has to be converted into that format first, so I have also included the script I used for that conversion.
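The scripts themselves are Perl. As a rough illustration of the same two steps, here is a minimal Python sketch (not the author's code): flattening the XML into one value per line, then computing the overlap of two line-based lists with sets. It assumes, hypothetically, that each ISBN in the LibraryThing file sits in its own `<isbn>` element; the real feed's element names may differ, so check the file before relying on it. The function names `xml_to_lines`, `read_values` and `compare`, and the file names in the usage comment, are illustrative only.

```python
import sys
import xml.etree.ElementTree as ET


def xml_to_lines(xml_text, tag="isbn"):
    """Return the text of every <tag> element, trimmed, skipping empty ones.

    ASSUMPTION: each ISBN is wrapped in its own element named `tag`
    (hypothetical default "isbn"); adjust to match the actual feed.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the nesting depth does not matter.
    return [el.text.strip() for el in root.iter(tag) if el.text and el.text.strip()]


def read_values(lines):
    """Build a set of values from an iterable of lines, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def compare(ours, theirs):
    """Return (only in ours, in both, only in theirs) as sorted lists."""
    a, b = read_values(ours), read_values(theirs)
    return sorted(a - b), sorted(a & b), sorted(b - a)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python overlap.py ours.txt theirs.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        only_ours, both, only_theirs = compare(f1, f2)
    print("Only in first file: ", len(only_ours))
    print("In both files:      ", len(both))
    print("Only in second file:", len(only_theirs))
```

Values are compared as exact strings after trimming whitespace, so an ISBN-10 and the ISBN-13 for the same book will not match unless both lists are converted to one form first. Using sets keeps the comparison roughly linear in the size of the lists, at the cost of holding both lists in memory.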

These scripts are the result of only an hour or so of work, so if you find any obvious bugs or have suggestions for other things they could do, please feel free to contact me (j.brunskill@waikato.ac.nz).

Perl Scripts:

See our blog (Library Cogs) for details of our comparison results.


This page was written by James Brunskill (www.jambe.co.nz) in February 2007.

Note: The code linked to from this page was developed while James was working for the University of Waikato Library. If for any reason the university would like this page and the associated scripts removed or moved to a different location, I reserve the right to do so.