Friday, 10 July 2009

Betcha didn't know that!



We've all seen Captchas, right? You will almost certainly have used one, even if you didn't know what it was called. A Captcha is a 'Completely Automated Public Turing test to tell Computers and Humans Apart': it's the box at the end of a sign-up form or comment form (like the one you get when posting a link on Facebook) that makes you type a word, displayed on screen as an image, into a text box. The primary function of these boxes is to stop spammers writing programs that make automatic submissions or sign-ups to websites.

After a couple of years of successful use, someone had the big idea of using the time spent around the world filling these forms out to better effect.

So they added a second word. This seems a pain, but I'll explain why and you'll see the genius. Around 200 million Captchas are solved every day, each taking around 10 seconds, which adds up to well over 150,000 hours of work a day. An ongoing challenge of the web is how to get the printed, pre-computer, pre-Internet word into a digital (and therefore searchable) format. Books are scanned, and optical character recognition (OCR) software outputs what it sees as text. No OCR software is 100% accurate, so there are always words it cannot digitise. Until now.
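As a quick sanity check on those numbers, here is the arithmetic in Python. Both inputs are the rough estimates quoted above, nothing more precise. Taken at face value they come to more than half a million hours a day, so the 150,000 figure looks like a conservative one; it may rest on a shorter solve time or a lower count.

```python
# Rough figures quoted in the post; both are estimates.
captchas_per_day = 200_000_000
seconds_per_captcha = 10

total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600  # 3600 seconds in an hour

print(f"{total_hours:,.0f} hours per day")  # 555,556 hours per day
```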

The images that OCR cannot decipher are collected and sent out via reCAPTCHA to the 100,000 or so sites around the web that use its free Captcha widget. Each one is paired with an image of a word that the OCR program could decipher. If a user types the known word correctly, reCAPTCHA assumes their answer for the unknown word is probably correct too. So you can in fact type the known word correctly and the unknown word incorrectly and still get through the sign-up form (although an automated spammer couldn't). The unknown word is then sent on to a number of other reCAPTCHA forms to see how other people read it, until reCAPTCHA is satisfied that the word OCR couldn't fathom has been, well, fathomed. It then sends the word back to HQ, where it goes into the original document, filling in the blank.
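To make the two-word trick concrete, here is a minimal Python sketch of the idea as described above. It is not reCAPTCHA's actual code: the class and function names, the case-insensitive matching and the agreement threshold of three are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical threshold: how many identical answers we want before
# treating an unknown word as solved. The real value is not given in the post.
AGREEMENT_THRESHOLD = 3


class UnknownWord:
    """A scanned word that OCR could not read, plus the guesses collected so far."""

    def __init__(self, image_id):
        self.image_id = image_id
        self.votes = Counter()

    def add_guess(self, guess):
        # Normalise so "Fathom" and "fathom " count as the same answer.
        self.votes[guess.strip().lower()] += 1

    def resolved_text(self):
        """Return the agreed text once enough people typed the same thing, else None."""
        if not self.votes:
            return None
        text, count = self.votes.most_common(1)[0]
        return text if count >= AGREEMENT_THRESHOLD else None


def check_submission(control_answer, typed_control, unknown, typed_unknown):
    """Pass or fail the user on the known word only; record the other word as a vote."""
    if typed_control.strip().lower() != control_answer.lower():
        return False  # got the checkable word wrong: likely a bot or a typo
    unknown.add_guess(typed_unknown)
    return True


# Four people see the same unknown word next to the known word "upon".
word = UnknownWord("scan-0001")
for guess in ["fathom", "Fathom", "fathorn", "fathom"]:
    check_submission("upon", "upon", word, guess)

print(word.resolved_text())  # fathom
```

Note that the pass/fail decision never looks at the unknown word, which is why a human can mistype it and still get through. A guess is only counted when the known word was typed correctly, so failed attempts never pollute the votes.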

We spend all day online and not much gets past us, but this one had me sat back in my chair for a while pondering just how some minds work. This might be the best use of crowdsourcing I've seen to date.

Learn more at http://recaptcha.net/learnmore.html
