I’ve been doing various cleanup tasks this week related to the blog, with part of that being a review of the Google Webmaster Tools results. One of the things I noticed is that I had some crawl errors (URLs that generated a page not found error). Here is the first part of that information showing 15 errors:
Drilling into the detail showed the 15 problematic URLs. If you look you’ll see a pattern: something that looks like a ‘good’ URL followed by what looks like another web site (www.onetug.org at the end of the first one). I clicked through to the underlying post and looked at the HTML, and the problem turned out to be simple: I had omitted the ‘http://’ when I entered the link (at least for the first few). Without that prefix the link is treated as relative and resolved against the current page’s path, which produces the broken URL. I’m not sure how I managed to enter them wrong; Live Writer is pretty good about fixing things like that as you go.
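To see why the missing prefix matters, here is a small Python sketch of how a relative link gets resolved against the page it sits on. The page address below is a made-up example (not one of my real post URLs), and `resolve_link` is just a hypothetical helper name wrapping the standard library’s `urljoin`.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve an href the way a browser or crawler would, relative to the page."""
    return urljoin(page_url, href)


# Hypothetical post URL, used only for illustration.
page = "http://example.com/2012/05/some-post/"

# No scheme: treated as a relative path and appended to the current path.
broken = resolve_link(page, "www.onetug.org")

# With the scheme: treated as an absolute URL and left alone.
fixed = resolve_link(page, "http://www.onetug.org")

print(broken)  # http://example.com/2012/05/some-post/www.onetug.org
print(fixed)   # http://www.onetug.org
```

The first result has the same shape as the crawl errors: a ‘good’ URL with another site’s name stuck on the end.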
With 15 to fix it wouldn’t be horrible to do one edit at a time: click the link, get a 404, remove the ‘bad’ part of the URL, hit enter, go to edit mode, scan for a not-quite-right URL, change, save, go back to Webmaster Tools and mark as fixed. Or you could run an update against the table. Tempting, sort of, but less so since it’s MySQL and not SQL Server. That seemed like a good reason to go look for another option, and the first one I found was Search Regex, a plug-in for WordPress. I know the power of regex but am not a regex guru, so I had to spend a few minutes experimenting. The pattern I needed to find was ‘www’ with a leading double quote – see the screen shot below:
Note: I know you know, but run a backup before you do this kind of thing!
I ran Search quite a few times until I had the right results. Once it was matching correctly I added the replace pattern and hit Replace, which shows the before/after but doesn’t actually make the change. I had missed the trailing period, and the preview let me catch that on my first try. I fixed that, tested again, and it looked OK, so I did Replace & Save, and then a Search to confirm there were no more matches on the original search.
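For anyone who wants to try the same idea outside of WordPress, here is a rough Python sketch of the search, preview, replace, and re-check steps. It is not the Search Regex plug-in and not the exact pattern from my screen shot; the pattern, the replacement, the sample HTML, and the function names `preview` and `apply_fix` are all assumptions made up for illustration.

```python
import re

# Assumed pattern: a double quote followed directly by "www." (period escaped).
PATTERN = re.compile(r'"www\.')
# Assumed replacement: the same text with the missing scheme added.
REPLACEMENT = '"http://www.'


def preview(html: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for matching lines without changing anything."""
    return [
        (line, PATTERN.sub(REPLACEMENT, line))
        for line in html.splitlines()
        if PATTERN.search(line)
    ]


def apply_fix(html: str) -> str:
    """Return the HTML with every matching link prefixed with http://."""
    return PATTERN.sub(REPLACEMENT, html)


# Made-up post content: one broken link, one link that is already correct.
post = (
    '<a href="www.onetug.org">ONETUG</a>\n'
    '<a href="http://www.example.com">already fine</a>'
)

for before, after in preview(post):  # look before you save
    print(before, "->", after)

updated = apply_fix(post)
remaining = PATTERN.search(updated)  # re-run the original search; expect no match
```

Anchoring on the leading double quote is what keeps the links that already have ‘http://’ from being touched, since in those the ‘www’ follows a slash rather than a quote.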
It only took about 10 minutes overall. I’ll check back in a day or two to see if the errors go away (or if I created new ones!). Not a huge win, but worth doing. If you’re blogging it’s worth signing up for Webmaster Tools (it’s free) just to see what Google sees about your site.