I believe that Jedediah Hotchkiss' Civil War sketchbook would make for interesting reading. While this work is published on the Library of Congress (LoC) website, it runs to 224 pages, and I wanted a more convenient way of reading it than clicking through an online image gallery.
Here's how I arrived at a single PDF file containing every page of Jed's personal sketchbook.
Step 1: I viewed the source of the LoC page and noted a rel="alternate" link tag.
Step 2: curling this URL returned a wealth of interesting information:
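Rather than eyeballing the page source, the same link can be pulled out on the command line. The following is a minimal sketch, not the method I used at the time: the alternate_links function name and the sample markup are made up for illustration, and it assumes each link tag sits on a single line with double-quoted attributes.

```shell
# Print the href of every <link ... rel="alternate" ...> tag read from stdin.
# Assumption: one tag per line, attributes in double quotes.
alternate_links() {
  grep -o '<link[^>]*rel="alternate"[^>]*>' \
    | sed -n 's/.*href="\([^"]*\)".*/\1/p'
}

# Hypothetical sample markup; the real LoC page source may look different.
sample_html='<head>
<link rel="stylesheet" href="/static/site.css">
<link rel="alternate" type="application/json" href="https://www.loc.gov/item/2005625258/?fo=json">
</head>'

printf '%s\n' "$sample_html" | alternate_links
```

Against a live page, the input would come from something like curl -s piped into alternate_links instead of the sample string.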
$ curl -s 'https://www.loc.gov/item/2005625258/?fo=json' | jq .
{
  "articles_and_essays": [
    {
      "site": [
        "lcweb"
      ],
      "contributor": [
        "potter, abbey"
      ],
      "original-format": [
        "web page"
      ],
      "partof": [
        ...
Step 3: rather than reading the details of this JSON format, I poked around until I found this critical block:
"resources": [
  {
    "files": 117,
    "captions": "http://cdn.loc.gov/service/gmd/gmd388m/g3880m/g3880m/gcwh0001/captions.txt",
    "image": "http://cdn.loc.gov/service/gmd/gmd388m/g3880m/g3880m/gcwh0001/ca000001.gif",
    "url": "http://www.loc.gov/resource/g3880m.gcwh0001/"
  }
],
Step 4: between curl and my browser, I was able to write the following code which pulls down all 117 images associated with this LoC entry:
#!/bin/bash

##
## Grab content from the library of congress
##
## For example:
##   locget 'https://www.loc.gov/resource/g3880m.gcwh0001/?c=200&fo=json&st=slideshow'
##

usage() {
  echo "$(basename "$0") {gallery-url}"
  exit 1
}

if [ -z "$1" ]; then
  usage
fi

resource_url="$1"

# Quote the URLs so that '&' and '?' are never reinterpreted by the shell.
captions_url=$(curl -s "$resource_url" | jq -r '.resources[0].captions')
image_url=$(curl -s "$resource_url" | jq -r '.resources[0].image')

# Turn the CDN directory into the colon-separated form tile.loc.gov expects.
path=$(dirname "$image_url" | sed -e 's|http://cdn.loc.gov/||' \
                                  -e 's|/|:|g')

# The third space-separated field of each caption row names an image file.
curl -s "$captions_url" | while read -r row ; do
  # $row is deliberately unquoted so runs of whitespace collapse before cut.
  file=$(echo $row | cut -f 3 -d ' ')
  if [ -n "$file" ] ; then
    curl -s "http://tile.loc.gov/image-services/iiif/$path:$file/full/pct:100/0/default.jpg" > "$file.jpg"
  fi
done
Note the call to tile.loc.gov to pick up the image files. By setting pct:100, I'm able to request full-size images. It's also possible to provide a value like pct:50 to pick up images that are half size.
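To make the URL rewriting easier to follow, here is a small sketch that isolates it in one function. The iiif_url name and its third argument are my own additions for illustration, and the ca000001 file identifier is an assumption about what the captions file contains; the path rewrite itself is the same as in the script.

```shell
# Build a tile.loc.gov image URL from a cdn.loc.gov image URL, a file
# identifier, and a scaling percentage (defaults to 100).
iiif_url() {
  local image_url="$1"
  local file="$2"
  local pct="${3:-100}"
  local path
  # Same rewrite as the script: drop the CDN prefix, turn '/' into ':'.
  path=$(dirname "$image_url" | sed -e 's|http://cdn.loc.gov/||' -e 's|/|:|g')
  echo "http://tile.loc.gov/image-services/iiif/$path:$file/full/pct:$pct/0/default.jpg"
}

image_url='http://cdn.loc.gov/service/gmd/gmd388m/g3880m/g3880m/gcwh0001/ca000001.gif'

# 'ca000001' is an assumed file identifier, used here only as an example.
iiif_url "$image_url" ca000001       # full size
iiif_url "$image_url" ca000001 50    # half size
```

Requesting the half-size URL first is a cheap way to check that the path is right before pulling down the full-size images.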
Step 5: with step 4 complete, I had a full set of images locally. However, each image contains both a left- and right-hand page. To split the pages into separate files, I used my good friend ImageMagick:
$ mkdir pages
$ cd pages
$ for f in ../*.jpg ; \
    do echo $f ; convert -crop 50%x100% +repage $f `basename $f` ; \
  done
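Because a single output name is given for two cropped halves, ImageMagick numbers the results. As I understand it, the files come out with -0 and -1 suffixes, left half first. The dry-run sketch below only prints the command and the file names I'd expect for each image, so it can be checked before anything is written; split_plan and the input names are hypothetical.

```shell
# Print, without running it, the convert command for each image and the
# two half-page files it is expected to produce.
split_plan() {
  for f in "$@" ; do
    base=$(basename "$f" .jpg)
    echo "convert -crop 50%x100% +repage $f $base.jpg"
    # Assumed naming: ImageMagick appends -0, -1 when one name gets two images.
    echo "  -> $base-0.jpg (left half), $base-1.jpg (right half)"
  done
}

# Hypothetical input names, standing in for the files fetched in step 4.
split_plan ../ca000001.jpg ../ca000002.jpg
```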
Step 6: Finally, I created a single (massive) PDF file by running the command:
$ convert *.jpg master.pdf
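The page order in the PDF follows the order in which the shell expands *.jpg, so it's worth a quick look before building a file this large. Here's a minimal sketch of such a check; page_order and the placeholder file names are assumptions made for the example.

```shell
# List the .jpg files in a directory in glob order, which is the order
# `convert *.jpg master.pdf` would receive them in.
page_order() {
  ( cd "$1" && for f in *.jpg ; do echo "$f" ; done )
}

# Demonstration with empty placeholder files; the names are hypothetical.
demo_dir=$(mktemp -d)
touch "$demo_dir/ca000002-0.jpg" "$demo_dir/ca000001-1.jpg" "$demo_dir/ca000001-0.jpg"
page_order "$demo_dir"
```

If the listing shows left and right halves in reading order, the PDF pages should come out in that same sequence.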
You can download the generated PDF here.
And here are a few screenshots of me scrolling through Jed's sketchbook on my Galaxy S9+:
The formatting isn't perfect, and the PDF file is massive. But still, I'm able to scroll through the pages with ease, and I can view detail by simply zooming in.
If I had a horse, I could peruse the content from the same perspective Jedediah created it. Though, even I admit that's probably excessive.