Back in 2013, researchers developed software to crack CAPTCHAs on Google, Yahoo! and PayPal.
At that time, the algorithms in use could solve the human-or-not challenge with over 90% accuracy, a rate high enough to be considered a hack for CAPTCHAs.
This time, researchers at security firm F-Secure went a little beyond that.
The team have been surveying the threat landscape to find a way to properly crack CAPTCHAs. To ease the work, the researchers have even built a CAPTCHA cracking server, called the CAPTCHA22, in order to speed up the cracking process and create a safe place to store CAPTCHAs.
Then came a day when the researchers wanted to see whether they could crack the text-based CAPTCHA on Microsoft Outlook's web portal (Outlook Web App, also called the OWA portal) using AI, and make the CAPTCHA think that a human had solved it.
The team not only found that text-based CAPTCHAs like the ones used by OWA are super annoying, but also that they pose a potential security risk.
F-Secure in a blog post:
To create what they call the 'Cracken', the researchers followed the same principles of manually labeling CAPTCHAs.
Based on their previous experience, the researchers were pretty optimistic that they could create a CAPTCHA-solving AI.
Unfortunately for them, they were first greeted with bad news.
The resulting AI could solve the CAPTCHA with only 22% accuracy.
After taking the project back to the drawing board, the team decided to increase their labeling efforts and ramp up from the usual 200 labeled CAPTCHAs to about 1400. But this was where the researchers stumbled onto another problem.
The team found that they were getting repeated CAPTCHAs, 5% at a time.
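The article does not say how the repeats were detected or handled. As a minimal sketch, assuming repeated CAPTCHAs arrive as byte-identical image files, duplicates could be dropped from a labeled set by hashing each image. The `deduplicate` helper and the sample data below are hypothetical and only illustrate the idea.

```python
import hashlib


def deduplicate(samples):
    """Keep the first occurrence of each distinct image.

    `samples` is a list of (image_bytes, label) pairs. Two images count as
    the same only if their raw bytes match exactly (an assumption; visually
    identical images with different encodings would not be caught).
    """
    seen = set()
    unique = []
    for image_bytes, label in samples:
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((image_bytes, label))
    return unique


# Made-up stand-ins for real image files.
samples = [(b"img-a", "AB12"), (b"img-b", "XY34"), (b"img-a", "AB12")]
unique = deduplicate(samples)
print(len(samples) - len(unique), "duplicate(s) dropped")
```

Removing repeats before splitting the data also keeps the same image from landing in both the training and the evaluation set, which would otherwise inflate the measured accuracy.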
The team also found that OWA's CAPTCHA had different image-to-text ratios than any of those they had previously encountered.
While working on it, they decided to also make the characters clearer by adjusting the noise levels.
Even after all this work, the algorithm didn’t perform any better; its accuracy actually dropped to a little under 16%.
Upon further inspection, the researchers finally realized that they had wrongly labeled roughly 50 out of every 300 manually entered CAPTCHAs. Common mistakes included confusing “I” for “l” (lower-case L), as well as “Y” for “V.”
This was where the researchers noticed yet another pattern: Microsoft Outlook’s CAPTCHA mechanism never used “l” (lower-case L). With this in mind, the researchers could quickly tweak their data set by narrowing down the alphabet.
Suddenly, the accuracy of the algorithm jumped from nearly 16% to 47%, a huge improvement.
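The label cleanup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers' code: the full OWA character set is not given in the article, so `ALLOWED` is a guess (letters and digits minus lower-case L), and `clean_label` is a hypothetical helper.

```python
import string

# Assumed alphabet: letters and digits, minus lower-case "l", which the
# article says the OWA CAPTCHA never uses. The real character set may be
# narrower than this.
ALLOWED = set(string.ascii_letters + string.digits) - {"l"}

# If "l" never appears, a label containing it must be a typing mistake.
# Mapping it to upper-case "I" follows the confusion named in the article.
CORRECTIONS = {"l": "I"}


def clean_label(label):
    """Apply known corrections, then reject anything outside the alphabet."""
    fixed = "".join(CORRECTIONS.get(ch, ch) for ch in label)
    unexpected = sorted(set(fixed) - ALLOWED)
    if unexpected:
        raise ValueError(f"unexpected characters in label: {unexpected}")
    return fixed


print(clean_label("Hel1o"))  # the lower-case L is rewritten as upper-case I
```

Confusions between two characters that are both valid, such as “Y” and “V,” cannot be repaired this way and would still need a manual re-check of the labels.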
The next step was an even bigger challenge: figuring out how to submit the cracked CAPTCHA to Outlook’s web portal.
Since the CAPTCHA was designed to keylog each letter the user submits, the researchers had to mimic the keystrokes a human would make to fool the mechanism.
Making things even more difficult, after ten seconds on the OWA login page, the researchers were asked to fill in a CAPTCHA again, even when doing things manually. So again, it was back to the drawing board.
But this time, instead of having to "painstakingly reverse engineer the 160+ JS functions," the researchers opted to use Pyppeteer.
Pyppeteer is the unofficial port of Puppeteer to Python, which allows users to simulate a browser and inject commands via a very convoluted pipeline between Python, JS, and Chromium.
With the tool, the researchers wanted to automate the interactions with the browser window: go to the OWA page, extract the CAPTCHA for CAPTCHA22 to solve, and then simulate the keystrokes a user would perform to type in the CAPTCHA.
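Those three steps might look something like the sketch below when written with Pyppeteer. Everything specific in it is an assumption made for illustration: the URL, the CSS selectors, the `solve_captcha` stub standing in for CAPTCHA22 (whose interface the article does not describe), and the random pauses between keystrokes. It is not the researchers' actual pipeline.

```python
import asyncio
import random

# Hypothetical values; the real page address and element selectors are not
# given in the article.
OWA_URL = "https://owa.example.com/"
CAPTCHA_IMG = "#captcha-image"
CAPTCHA_INPUT = "#captcha-input"


def keystroke_delays(text, low=0.08, high=0.25):
    """Return one pause in seconds per character, loosely imitating a human."""
    return [random.uniform(low, high) for _ in text]


def solve_captcha(image_bytes):
    """Placeholder for the solver (CAPTCHA22 in the article)."""
    raise NotImplementedError("plug in a CAPTCHA solver here")


async def submit_captcha():
    # Imported here so the helpers above work without Pyppeteer installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(OWA_URL)

        # Step 1: grab the CAPTCHA image as PNG bytes.
        element = await page.querySelector(CAPTCHA_IMG)
        image_bytes = await element.screenshot()

        # Step 2: hand the image to the solver.
        answer = solve_captcha(image_bytes)

        # Step 3: type the answer one key at a time, so the page sees
        # individual keystrokes rather than a pasted value.
        for ch, pause in zip(answer, keystroke_delays(answer)):
            await page.type(CAPTCHA_INPUT, ch)
            await asyncio.sleep(pause)
    finally:
        await browser.close()


# To run against a real page: asyncio.run(submit_captcha())
```

With the default pauses, a short answer is typed in about a second, which stays within the ten-second window mentioned earlier, provided the solver itself responds quickly.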
And finally, the researchers managed to do just that, and developed the fully automated and final pipeline of what they call the CAPTCHA Cracken.
“There are some interesting new CAPTCHA samples on the market, but it is just a matter of time before these also buckle under the CAPTCHA Cracken. We are not saying that CAPTCHAs are useless, they should just not be seen as the silver bullet that stops automated attacks,” the researchers said.