Problem – you have a directory full of epub files with names like 1234.epub and 4567.epub, and what you’d really like is to have all those files organised into directories by author name, with the filenames themselves in the form “Author - Title”.
Solution – first, install Calibre, because it comes with a handy command-line tool called ebook-meta that spits out all the metadata from a file. Then, from the directory where the files are, run this:
#!/usr/bin/python
import os, shutil, subprocess

files = os.listdir('.')
for filename in files:
    if filename.endswith('.epub'):
        print "Processing " + filename
        # Use Calibre's ebook-meta to extract the metadata...
        p = subprocess.Popen(['ebook-meta', filename], stdout=subprocess.PIPE)
        output = p.communicate()[0]
        meta = {}
        for line in output.splitlines():
            index = line.find(' : ')
            if index != -1:
                i2 = line.find(' ')
                meta[line[:i2]] = line[index + 3:]
        if not meta.has_key('Author(s)') or not meta.has_key('Title'):
            print "...missing some data in " + filename + " - skipping"
            continue
        # We just want the author name, not the sortable stuff at the end...
        author = meta['Author(s)']
        index = author.find(' [')
        if index != -1:
            author = author[:index]
        title = meta['Title']
        if not os.path.exists(author):
            os.mkdir(author)
        destfile = os.path.join(author, author + ' - ' + title + '.epub')
        shutil.move(filename, destfile)
        print "...moved to " + destfile
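The script above is Python 2 (print statements, has_key). If you only have Python 3 around, the metadata-parsing step could look something like the sketch below. The function names parse_meta and primary_author are made up for illustration, the sample text is a hand-written stand-in for ebook-meta output rather than a captured run, and it uses str.partition instead of the find() calls in the script.

```python
def parse_meta(output):
    """Turn 'Key   : value' lines into a dict, skipping anything else."""
    meta = {}
    for line in output.splitlines():
        key, sep, value = line.partition(' : ')
        if sep:  # only keep lines that actually contain ' : '
            meta[key.strip()] = value.strip()
    return meta


def primary_author(author_field):
    """Drop the trailing ' [sortable name]' part, if there is one."""
    return author_field.split(' [', 1)[0]


# Hand-written sample, assumed to resemble what ebook-meta prints.
sample = (
    "Title               : Behemoth\n"
    "Author(s)           : Peter Watts [Watts, Peter]\n"
)

meta = parse_meta(sample)
print(meta['Title'])
print(primary_author(meta['Author(s)']))
```

In Python 3 the output of p.communicate() is bytes, so it would need decoding (for example with .decode('utf-8'), assuming that is the encoding in use) before being passed to a function like this.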
With the script above, 973.epub, for example, gets moved to Peter Watts/Peter Watts - Behemoth.epub. Files where ebook-meta doesn’t report both an author and a title are skipped and left where they are. This should also be able to deal with lots of other ebook formats, if you change the two instances of ‘epub’ in the script to something else.
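Rather than editing the extension in two places, one option is to check against a tuple of extensions and reuse whatever extension the file already has. This is only a sketch: the names EXTENSIONS, is_ebook and destination_for are hypothetical, the list of formats is an assumption (what ebook-meta can read depends on your Calibre install), and unlike the original script it ignores case when matching.

```python
import os

# Hypothetical list of formats to process; adjust to taste.
EXTENSIONS = ('.epub', '.mobi')


def is_ebook(filename):
    """True if the filename ends with one of the listed extensions."""
    return filename.lower().endswith(EXTENSIONS)


def destination_for(filename, author, title):
    """Build 'Author/Author - Title.ext', keeping the file's own extension."""
    ext = os.path.splitext(filename)[1]
    return os.path.join(author, author + ' - ' + title + ext)


print(destination_for('973.epub', 'Peter Watts', 'Behemoth'))
```

This only builds the destination path; creating the author directory and moving the file would work the same way as in the script.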
Disclaimer: If you find this useful, that’s great – if it accidentally deletes all the files on your system, don’t come crying to me.
