Plain Text Files from HathiTrust

Students have asked me how to download plain text (txt) versions of HathiTrust books for use with third-party qualitative analysis software, like HyperRESEARCH.

HathiTrust allows users to search inside the OCR’d full text of scanned books, whether they are in or out of copyright.

Books that are out of copyright (in the public domain or licensed under Creative Commons) can also be downloaded, thanks to BC’s partnership in the organization.

To download PDFs of “full view” (out of copyright) items, log in with your BC credentials.

It is possible to download plain text versions as well, since the book images have already been OCR’d, though the process involves a few extra steps:

  1. Go to the mobile version of the HathiTrust site, and find the item you’d like to use as plain text (must be out of copyright)
  2. Log in, if you haven’t already
  3. From the full view page, click the Get Book icon, and download the EPUB version
  4. Download and install Calibre
  5. Add your EPUB book to Calibre, and select Convert Books, which offers the option to convert to many formats including plain text
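
If you have many EPUBs to convert, the last step can also be scripted instead of clicking through Calibre for each book. The sketch below is one possible approach, not part of the workflow above: it assumes Calibre's command-line tool ebook-convert is installed and on your PATH, and the function names (build_convert_command, convert_folder) are made up for this illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(epub_path, out_dir=None):
    """Return the ebook-convert argument list for one EPUB file."""
    epub_path = Path(epub_path)
    # By default, write the .txt file next to the source EPUB.
    out_dir = Path(out_dir) if out_dir else epub_path.parent
    txt_path = out_dir / (epub_path.stem + ".txt")
    # ebook-convert chooses the output format from the file extension.
    return ["ebook-convert", str(epub_path), str(txt_path)]


def convert_folder(folder, out_dir=None):
    """Convert every .epub in `folder` to .txt and return the commands run."""
    # Assumption: Calibre is installed and ebook-convert is on the PATH.
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; install Calibre first")
    commands = []
    for epub in sorted(Path(folder).glob("*.epub")):
        cmd = build_convert_command(epub, out_dir)
        # check=True stops the batch if a conversion fails.
        subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands
```

For example, calling convert_folder("hathitrust_epubs"), where hathitrust_epubs is a hypothetical folder holding your downloaded EPUB files, would leave a .txt file beside each EPUB, ready to import into your analysis software.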