The Problem
Earlier today I was struggling to debug an issue I was having with a web scraping app that I've been working on. The server was claiming I wasn't sending any POST parameters to it, when I was quite sure I was.
Normally, I'd debug this with Fiddler - which would allow me to see exactly what my application was sending over the wire. I can easily wire a Scheme program into Fiddler by just adding:
(parameterize ((current-proxy-servers '(("http" "localhost" 8888))))
  ...any code that deals with URLs...
  )
Unfortunately, I was dealing with an https connection, and for some reason, those requests weren't being routed through the proxy.
The Debugging Strategy
After more time than I'd like to admit, I finally worked up a solution for seeing what my application was sending out. I just set up my own SSL server and created a trivial page which simply outputs the POST parameters it receives.
You're welcome to test your https code against it; you can find the script here. It's embarrassingly simple, but turned out to be just what I needed.
This also prodded me to set up my server to do https, which I hadn't done in quite some time. It turned out to be easy. I just followed these instructions:
- You'll need a certificate for your server. You can either buy one, or create your own. If you make your own, you'll get a nasty warning message from the browser - but it's fast and free to do.
- Once you have the certificate, you can follow these instructions to install it.
Like I said, turned out to be painless.
The Solution
Once I had the proper debugging in place, I was able to see that I was, in fact, sending parameters, but the server was ignoring them. Why? I had neglected to set the Content-Type on the POST. D'oh. I added a Content-Type of application/x-www-form-urlencoded and everything worked perfectly. My first instinct was that this was a bug in the PLT Scheme URL library, but it makes sense to me that this value is left unspecified: why should the library assume that you're always sending application/x-www-form-urlencoded content?
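For reference, here is a minimal sketch of what the fix can look like, assuming the net/url and net/uri-codec libraries that ship with PLT Scheme. The helper names encode-form and post-form are hypothetical, made up for illustration; this isn't the exact code from my scraper.

```scheme
(require net/url net/uri-codec)

;; Encode an association list of (symbol . string) pairs as a form body.
;; The separator mode is pinned to 'amp so pairs are always joined with "&".
(define (encode-form params)
  (parameterize ((current-alist-separator-mode 'amp))
    (string->bytes/utf-8 (alist->form-urlencoded params))))

;; POST params to url-string, stating the Content-Type explicitly.
;; Returns an input port from which the response body can be read.
;; (post-form is a hypothetical helper, not part of net/url.)
(define (post-form url-string params)
  (post-pure-port (string->url url-string)
                  (encode-form params)
                  '("Content-Type: application/x-www-form-urlencoded")))
```

Calling something like (post-form "https://example.com/echo" '((user . "ben"))) against a parameter-echoing page (the URL here is a placeholder) should then show the parameters arriving on the server side. The key point is the third argument to post-pure-port: without that header line, the body is sent but the server has no way to know it is form data.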
Oh well, it was a good lesson to learn, even if it was a painful way to learn it.
Experience has shown me that 99 percent of the time I am doing something wrong. That said, as small as 1% is, I've dealt with that zone, too!
So true!
And here I thought I had some mysterious https/socket issue and instead, it was me missing a trivial HTTP header.
So typical...
Ben, you should add the "notify of replies by email" option to your blog. Most folks don't subscribe to the comment feed.
What libraries are you using to do your screen scraping with Scheme? How are they working out?
Do you use the FiddlerScript Editor much?
Grant -
My web scraping has been making use of HtmlPrag and SXML (specifically, sxpath).
I've found this article to be quite useful: http://turingcompletewasteoftime.blogspot.com/2007/11/i-have-ended-world-hunger.html
It basically captures all the techniques that I use.
Fiddler Script Editor - haven't used it yet. Should I? What's it for?
I thought Google recently added the "notify of replies by e-mail" functionality to all blogs? Or are you suggesting something else?
Thanks for the feedback!