So after leaving this place alone for a month (probably more), I log in to find over 6300 comments pending. I have WordPress set up to automatically hold any comment containing more than three hyperlinks in the moderation queue, since that tends to be a hallmark of spam. Of course, there are also the more discreet spam bots, which just leave some BS comment and point the Website field at discount V1agara or whatever else. Those are usually picked out by Akismet, which actually does a pretty good job. What I didn't realize, however, was that I had forgotten to re-enable Akismet after I upgraded. So it caught neither the trickier comments nor the 6300 blatant ones, which would otherwise have ended up in the Spam pile. And to boot, Akismet needs updating anyway.
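For anyone curious what a link-count rule like that boils down to, here's a rough Python sketch. It is not WordPress's own code (WordPress is written in PHP). The names `count_links`, `should_hold`, and `MAX_LINKS` are made up for illustration, and the sketch assumes links arrive as `<a href="...">` markup, so bare URLs pasted as plain text aren't counted.

```python
import re

# Matches the opening of an HTML anchor tag that carries an href attribute.
# Assumption: links arrive as <a href="..."> markup; bare URLs are not counted.
LINK_PATTERN = re.compile(r"<a\s[^>]*href", re.IGNORECASE)

# Per the post: comments with more than three links get held for moderation.
MAX_LINKS = 3


def count_links(comment_html: str) -> int:
    """Return the number of <a href=...> tags in the comment body."""
    return len(LINK_PATTERN.findall(comment_html))


def should_hold(comment_html: str, max_links: int = MAX_LINKS) -> bool:
    """True if the comment should go to the pending queue instead of posting."""
    return count_links(comment_html) > max_links


if __name__ == "__main__":
    spammy = "".join(f'<a href="http://example.com/{i}">deal {i}</a> ' for i in range(5))
    print(should_hold(spammy))        # True: 5 links
    print(should_hold("Nice post!"))  # False: 0 links
```

A rule like this only catches the blatant stuff; a bot that leaves one innocent-looking sentence and puts its link in the Website field sails right past it, which is where something like Akismet comes in.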
Well, I decided to try a different solution, so the other day I installed the reCaptcha plugin. This service not only keeps your site from being inadvertently associated with male-enhancement drugs, but also helps Google digitize books, using words that OCR programs tend to have trouble with. (Think bad scans.) Check out their site for more information; it's a neat concept.
I've only had it up for a day, so I'm not entirely sure it's working, but I don't see why it wouldn't be. I tried running tesseract (an open-source OCR engine) on one of the images, and here's what I got. First, the image:
Now, when I cropped out 'of' and ran that separately, it got it right. That's how reCaptcha works (check the link above): it gives you one known word and one unknown word, so I guess you could fool it just by getting the known one correct. It does check your answer for the unknown word against answers from other sites, though. (As a side note, I did put in the text given above along with 'of', and I was actually able to fool it. My guess, though, is that if you did this from the same IP a bunch of times, it would catch on.)
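To make the known-word/unknown-word idea concrete, here's a toy model in Python. This is only my sketch of the scheme as described above, not reCaptcha's actual API or algorithm: `ToyCaptcha`, `check`, `transcription`, and the three-vote `AGREEMENT_THRESHOLD` are all invented for illustration. Pass/fail depends only on the known (control) word; the answer for the unknown word is just recorded as a vote, and a reading is trusted only once enough independent answers agree.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical threshold: how many matching answers an unknown word needs
# before its transcription is trusted. Not reCaptcha's real value.
AGREEMENT_THRESHOLD = 3


@dataclass
class ToyCaptcha:
    # unknown-word image id -> tally of transcriptions submitted for it
    votes: dict = field(default_factory=lambda: defaultdict(Counter))

    def check(self, control_answer: str, control_truth: str,
              unknown_id: str, unknown_answer: str) -> bool:
        """Pass/fail is decided by the control word alone.

        The unknown word's answer is only recorded as a vote, and only when
        the control word was right (the user is probably human).
        """
        passed = control_answer.strip().lower() == control_truth.lower()
        if passed:
            self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return passed

    def transcription(self, unknown_id: str):
        """Return the agreed-upon reading, or None if there is no consensus yet."""
        tally = self.votes.get(unknown_id)
        if not tally:
            return None
        word, count = tally.most_common(1)[0]
        return word if count >= AGREEMENT_THRESHOLD else None


if __name__ == "__main__":
    captcha = ToyCaptcha()
    # A bad OCR guess for the unknown word still passes, because "of" is right...
    print(captcha.check("of", "of", "scan-1", "rnorning"))  # True
    # ...but a single vote isn't enough to become the accepted reading.
    print(captcha.transcription("scan-1"))                  # None
    for _ in range(3):
        captcha.check("of", "of", "scan-1", "morning")
    print(captcha.transcription("scan-1"))                  # morning
```

In this model, fooling the check with an OCR guess gets you past the form, but a wrong guess for the unknown word just becomes one outvoted ballot.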
So I'll see how this turns out; hopefully it's not too annoying.