But Why?
I love the idea of downloading a large area's worth of USGS maps,
dropping them on a Micro SD card, and keeping them in my 'back
pocket' for unexpected use. Sure, Google's and Back Country
Navigator's offline map support is more elegant and optimized, but
there's just something reassuring about having an offline
catalog at your fingertips.
Finding and downloading maps in bulk is easy enough to do. For example, I can
ask my usgsassist
command line tool for all the maps that cover Virginia:
$ usgsassist -a topos -l "Virginia, USA" | wc -l
1697
The problem is that each map is about 50 megs. I confirmed this by
looking at the four maps that cover Richmond, VA:
$ wget $(usgsassist -a topos -l "Richmond, VA" | cut -d'|' -f3)
...
$ ls -lh
total 427440
-rw------- 1 ben staff 53M Sep 23 00:20 VA_Bon_Air_20220920_TM_geo.pdf
-rw------- 1 ben staff 56M Sep 17 00:17 VA_Chesterfield_20220908_TM_geo.pdf
-rw------- 1 ben staff 48M Sep 23 00:21 VA_Drewrys_Bluff_20220920_TM_geo.pdf
-rw------- 1 ben staff 51M Sep 23 00:22 VA_Richmond_20220920_TM_geo.pdf
Multiplying this out (1,697 maps x roughly 50 megs each), it will take
about 84 gigs of space to store these maps. With storage requirements
like these, I'll quickly exhaust what I can fit on a cheap SD card.
This raises the question: can we take any action to reduce this disk
space requirement? I think so.
But How?
Inside each USGS topo is an 'Images' layer that contains the
satellite imagery for the map. By default, this layer is off, so it
doesn't appear to be there:
But if we enable this layer and view the PDF, we can see it:
$ python3 ~/dt/i2x/code/src/master/pdftools/pdflayers \
-e "Images" \
-i VA_Drewrys_Bluff_20220920_TM_geo.pdf \
-o VA_Drewrys_Bluff_20220920_TM_geo.with_images.pdf
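For the curious: the pdflayers script above is my own tool and isn't
reproduced here, but the core of the layer toggle can be sketched with
PyMuPDF's optional content (layer) API. This is a simplified,
hypothetical stand-in for that script, assuming the satellite layer is
stored as an optional content group literally named 'Images':

# enable_layer.py - a rough sketch, not the actual pdflayers script
import fitz  # PyMuPDF

def enable_layer(in_path, out_path, layer_name="Images"):
    doc = fitz.open(in_path)

    # Map each optional content group (OCG) xref to its metadata,
    # then find the group(s) whose name matches the layer we want.
    ocgs = doc.get_ocgs()
    targets = [xref for xref, info in ocgs.items() if info["name"] == layer_name]
    if not targets:
        raise SystemExit(f"no layer named {layer_name!r} in {in_path}")

    # Turn those groups on in the default layer configuration (-1) and save.
    doc.set_layer(-1, on=targets)
    doc.save(out_path)

if __name__ == "__main__":
    enable_layer(
        "VA_Drewrys_Bluff_20220920_TM_geo.pdf",
        "VA_Drewrys_Bluff_20220920_TM_geo.with_images.pdf",
    )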
My hypothesis is that most of each map's 50 megs goes toward storing
this imagery. I rarely use this layer, so if I can remove it from the
PDF, the result should be a notable decrease in file size with no real
change in functionality.
But Really?
To test this hypothesis, I decided I'd extract the image from the
PDF. If it was as hefty as I thought, I'd continue with this
effort to remove it. If it wasn't that large, I'd stop
worrying about this and accept that each USGS map is going to
take about 50 megs of disk space.
My first attempt at image extraction was to use the poppler PDF
toolkit's pdfimages command. Alas, it gave me a heap of error messages
and didn't extract any images:
$ pdfimages VA_Bon_Air_20220920_TM_geo.pdf images
Syntax Error (11837): insufficient arguments for Marked Content
Syntax Error (11866): insufficient arguments for Marked Content
Syntax Error (11880): insufficient arguments for Marked Content
Syntax Error (11883): insufficient arguments for Marked Content
...
Next up, I found a useful snippet of code in this Stack Overflow
discussion. Once again, PyMuPDF looked like it was going to save the
day. I ended up adapting that Stack Overflow code into a custom Python
pdfimages script.
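That adapted script isn't reproduced here, but the heart of the Stack
Overflow approach looks roughly like the following PyMuPDF sketch. It's
a simplified stand-in for my script (which adds progress bars and its
own file naming), not the script itself:

# extract_images.py - a rough sketch of the PyMuPDF extraction approach
import os
import fitz  # PyMuPDF

def extract_images(pdf_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    doc = fitz.open(pdf_path)
    stem = os.path.splitext(os.path.basename(pdf_path))[0]

    for page_index, page in enumerate(doc):
        # get_images(full=True) lists every image xref referenced by the page.
        for img_index, img in enumerate(page.get_images(full=True)):
            xref = img[0]
            # extract_image() returns the raw image bytes plus its format.
            info = doc.extract_image(xref)
            out_name = f"{stem}_p{page_index}-{img_index}.{info['ext']}"
            with open(os.path.join(out_dir, out_name), "wb") as fh:
                fh.write(info["image"])

if __name__ == "__main__":
    extract_images("VA_Drewrys_Bluff_20220920_TM_geo.pdf", "extracted/")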
When I finally ran my script on one of the PDF map files, I was
surprised by the results:
$ python3 ~/dt/i2x/code/src/master/pdftools/pdfimages -i VA_Drewrys_Bluff_20220920_TM_geo.pdf -o extracted/
page_images: 100%|██████████| 174/174 [00:18<00:00,  9.51it/s]
pages: 100%|██████████| 1/1 [00:18<00:00, 18.31s/it]
$ ls -lh extracted/ | head
total 361456
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-100.png
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-101.png
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-102.png
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-103.png
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-104.png
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-105.png
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-106.png
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-107.png
-rw------- 1 ben staff 1.0M Feb 8 07:22 VA_Drewrys_Bluff_20220920_TM_geo_p0-108.png
$ ls extracted/ | wc -l
174
Rather than extracting one massive image, it extracted
174 small ones. While not what I was expecting, the
small files do add up to a significant payload:
$ du -sh extracted
176M extracted
Each of these image files is one thin slice of the satellite
photo. Here's an example:
I find all of this quite promising. There's over 170 megs worth of
image data that's been compressed into a 50 meg PDF. If I can remove
that image data, the file size should drop significantly.
Next up: I'll figure out a way to remove this image data while still
maintaining the integrity of the map files. I'm psyched to see just how
small these files can be!