Google's New Algorithm: Accidentally Able to Solve CAPTCHAs

CAPTCHA, an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart", is a type of challenge-response test used to determine whether a user is human, usually by presenting twisted, blurry or otherwise hard-to-read text that bots can't "read". Google, which also uses CAPTCHAs, has built an algorithm that can crack its own system.

Google was originally developing the algorithm to better read house and street numbers in images from its Street View cameras, which often come out blurry. Recognizing text in photographs is a difficult feat. Variations in lighting and problems with shadows, focus and motion add to the difficulty, not to mention the variety of fonts, colors, styles, orientations, character arrangements and more.

The algorithm was developed for Google's Street View so that the company can reliably extract address numbers from images and place them on the map with high accuracy.

So far, the algorithm has helped the company extract about 100 million street numbers worldwide.

Google's team also tried the new algorithm on the company's own CAPTCHA puzzles. To their surprise, it turned out to be good enough to crack them with 99.8 percent accuracy. The results appeared in a paper published this week.

Earlier this year, Google Street View engineers published a paper describing a neural network, modeled on animals' nervous systems. The project proved quite good at identifying house numbers. "We can, for example, transcribe all the views we have of street numbers in France in less than an hour using our Google infrastructure," wrote one of the engineers in the paper. It got the numbers right with 90 percent accuracy.

When fed small, black-and-white CAPTCHAs with none of the lighting or color variables, the neural network, which was created using some of the technology from the Street View and reCAPTCHA teams, works even better. The whole point of letting it loose on CAPTCHAs was that they are supposed to be hard for computers to solve. Instead, the team found that the algorithm is surprisingly good, good enough to match humans.

Although its deep convolutional neural network (a kind of neural network that's effective for image recognition) is very good at CAPTCHAs, the new algorithm isn't quite as accurate on real-world Street View imagery, correctly identifying the text just over 90 percent of the time. When analyzing house numbers specifically, however, its accuracy climbs to over 96 percent.
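For readers curious about what "convolutional" means here, the core operation can be sketched in a few lines of Python. This is only an illustration and not Google's code: the toy image, the filter values and the function names `convolve2d` and `relu` are assumptions made for the example. A real network learns its filters from data and stacks many such layers.

```python
from typing import List

Grid = List[List[float]]


def convolve2d(image: Grid, kernel: Grid) -> Grid:
    """Slide `kernel` over `image` (stride 1, no padding).

    Like most neural-network libraries, this computes a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output: Grid = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map: Grid) -> Grid:
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 4x4 "image": dark on the left, bright on the right (assumed data).
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]

feature_map = relu(convolve2d(image, kernel))
for row in feature_map:
    print(row)
```

The filter responds only in the column where the image goes from dark to bright, which is the kind of local pattern (an edge or a stroke of a digit) that such networks build on when reading characters.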

Earlier, Google had actually decided to make its reCAPTCHAs less distorted and easier for humans to decipher, because the company had significantly reduced its dependence on text distortion as the main differentiator between human and machine. Google says its reCAPTCHAs now look at a broader range of clues: entering the text is just one of them, and the puzzle acts as a form of engagement that elicits a broad range of other clues that characterize humans and bots.

"Thanks to this research, we know that relying on distorted text alone isn't enough," wrote Vinay Shet, reCAPTCHA’s Product Manager in a blog post. Shet explains that part of this is analyzing a user’s full interaction with the CAPTCHA puzzle - and not solely whether they can get the answer right.