Tonight, one of my clients discovered an odd issue with his Google Custom Search setup: it was indexing, and directing visitors to, the https versions of his pages. He doesn't really need https, so while the server is listening on port 443 (probably for Plesk, right?), sending users there isn't a good idea.
While Google Custom Search gives you a fair amount of control over which sites get indexed and which don't, I couldn't find an obvious way to tell Google to exclude the https version of the site.
In the end, I used two tips I found floating around the web:
- Append the search operator -inurl:https to every query. I did this by tweaking the HTML search form and adding an extra input.
- Set up a robots.txt that disallows indexing over https. This is the long-term solution, of course, but it's not exactly trivial: the web server is set up to serve both http and https out of the same document root, so by default the same robots.txt is served to everyone. A simple two-line rewrite rule fixes that.
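The first trick boils down to string handling: whatever the visitor types, the operator gets tacked onto the end before the query reaches Google. Here's a minimal server-side sketch of that idea in Python, not the form tweak I actually used. The endpoint URL and the `cx`/`q` parameter names are assumptions for illustration; check them against your own Custom Search setup.

```python
from urllib.parse import urlencode

# Assumed results endpoint for a Custom Search engine; substitute your own.
SEARCH_ENDPOINT = "https://www.google.com/cse"

# The operator appended to every visitor query.
EXCLUDE_HTTPS = "-inurl:https"


def add_exclusion(user_query: str, exclusion: str = EXCLUDE_HTTPS) -> str:
    """Return the visitor's query with the exclusion operator appended once."""
    query = user_query.strip()
    if exclusion in query.split():
        return query  # already present, so don't add it twice
    return f"{query} {exclusion}".strip()


def build_search_url(engine_id: str, user_query: str) -> str:
    """Build a results URL; 'cx' and 'q' are assumed parameter names."""
    params = {"cx": engine_id, "q": add_exclusion(user_query)}
    # urlencode turns spaces into '+' and escapes ':' as %3A
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("YOUR-ENGINE-ID", "plesk backups"))
```

The duplicate check just keeps a visitor who already typed the operator from ending up with it twice.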
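The robots.txt fix works because the server can look at how a request arrived and hand back a different file for https. With Apache that's usually a `RewriteCond` on `%{HTTPS}` followed by a `RewriteRule` pointing `robots.txt` at a second file. As a language-neutral illustration of the same idea, here's a tiny Python WSGI sketch (a swapped-in technique, not my actual rewrite rules) that picks the robots.txt body from the request scheme. The port and file contents are assumptions.

```python
from wsgiref.simple_server import make_server

# Served over plain http: an empty Disallow lets crawlers index everything.
ROBOTS_HTTP = "User-agent: *\nDisallow:\n"
# Served over https: ask all crawlers to stay out entirely.
ROBOTS_HTTPS = "User-agent: *\nDisallow: /\n"


def robots_for_scheme(scheme: str) -> str:
    """Pick robots.txt content based on the request scheme."""
    return ROBOTS_HTTPS if scheme == "https" else ROBOTS_HTTP


def app(environ, start_response):
    """Minimal WSGI app that only answers /robots.txt."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    scheme = environ.get("wsgi.url_scheme", "http")
    body = robots_for_scheme(scheme).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Local testing only; in production the TLS front end sets the scheme.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Same document root, two answers: that's all the rewrite rule has to accomplish.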
Problem solved, and I learned a few lessons: the inurl: operator rocks, rewrite rules are nearly always the solution, and Google Custom Search can magically append a search parameter for you if you ask nicely.