The latest version of the bot detector reCaptcha is invisible to users and has spread to more than 650,000 websites.
It’s great for security—but not so great for your privacy.
We’ve all tried to log into a website or submit a form only to be stuck clicking boxes of traffic lights or storefronts or bridges in a desperate attempt to finally convince the computer that we’re not actually a bot.
For many years, this has been one of the predominant ways that reCaptcha, the Google-run internet bot detector, has determined whether a user is a bot or not. But last fall, Google launched a new version of the tool, with the goal of eliminating that annoying user experience entirely. Now, when you fill out a form on a website that’s using reCaptcha v3, you won’t see the “I’m not a robot” checkbox, nor will you have to prove you know what a cat looks like. Instead, you won’t see anything at all.
“It’s a better experience for users. Everyone has failed a Captcha,” says Cy Khormaee, the reCaptcha product lead at Google. Instead, Google analyzes the way users navigate through a website and assigns them a risk score based on how malicious their behavior is. Khormaee won’t share what signals Google uses to determine these scores because he says that would make it easier for scammers to imitate benign users, but he believes that this new version of reCaptcha makes it incredibly difficult for bots or Captcha farmers—humans who are paid tiny amounts to break Captchas online—to fool Google’s system.
“You have to understand what behavior on the site should be and mimic that well enough to fool us,” he says. “That’s a really hard problem versus the general problem of, ‘Pretend like I’m a human.’” Website administrators then get access to their visitors’ risk scores and can decide how to handle them: For instance, if a user with a high risk score attempts to log in, the website can set rules to ask them to enter additional verification information through two-factor authentication. As Khormaee put it, the “worst case is we have a little inconvenience for legitimate users, but if there is an adversary, we prevent your account from being stolen.”
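To make the idea of an administrator-set rule concrete, here is a minimal sketch of how a site might act on a visitor’s risk score at log-in. It is an illustration only: the 0-to-1 scale on which higher means riskier, the 0.7 cutoff, and the names `HIGH_RISK_THRESHOLD` and `login_action` are all assumptions made for this example, not Google’s API or any site’s actual policy.

```python
# Hypothetical cutoff chosen for illustration; a real site would tune its own.
HIGH_RISK_THRESHOLD = 0.7


def login_action(risk_score: float, threshold: float = HIGH_RISK_THRESHOLD) -> str:
    """Decide how to handle a log-in attempt given an assumed risk score.

    Assumes risk_score is between 0.0 (benign) and 1.0 (malicious),
    matching the article's framing of "risk" rather than any real API.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    # High-risk visitors get an extra verification step instead of a block,
    # so a misjudged legitimate user is only mildly inconvenienced.
    if risk_score >= threshold:
        return "require_two_factor"
    return "allow"


if __name__ == "__main__":
    for score in (0.1, 0.9):
        print(score, login_action(score))
```

Asking for a second factor rather than refusing the log-in outright mirrors the trade-off Khormaee describes: a wrongly flagged user loses a few seconds, while an attacker loses access to the account.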
According to tech statistics website Built With, more than 650,000 websites are already using reCaptcha v3; overall, at least 4.5 million websites use reCaptcha, including 25% of the top 10,000 sites. Google is also now testing an enterprise version of reCaptcha v3, in which Google creates a customized reCaptcha for enterprises that are looking for more granular data about users’ risk levels to protect their site algorithms from malicious users and bots.
But this new, risk-score-based system comes with a serious trade-off: users’ privacy.
According to two security researchers who’ve studied reCaptcha, one of the ways that Google determines whether you’re a malicious user or not is whether you already have a Google cookie installed on your browser. It’s the same cookie that allows you to open new tabs in your browser and not have to re-log into your Google account every time. But according to Mohamed Akrout, a computer science Ph.D. student at the University of Toronto who has studied reCaptcha, it appears that Google is also using its cookies to determine whether someone is a human in reCaptcha v3 tests. Akrout wrote in an April paper that reCaptcha v3 simulations run on a browser with a connected Google account received lower risk scores than those run on browsers without one. “If you have a Google account it’s more likely you are human,” he says. Google did not respond to questions about the role that Google cookies play in reCaptcha.
In tests of reCaptcha v3, technology consultant Marcos Perona and Akrout both found that their scores were always low risk when they visited a test website on a browser where they were already logged into a Google account. By contrast, if they went to the test website through Tor or a VPN, their scores were high risk.
To make this risk-score system work accurately, website administrators are supposed to embed reCaptcha v3 code on all of the pages of their website, not just on forms or log-in pages. Then, reCaptcha learns over time how their website’s users typically act, helping the machine learning algorithm underlying it to generate more accurate risk scores. Because reCaptcha v3 is likely to be on every page of a website, if you’re signed in to your Google account there’s a chance Google is getting data about every single webpage you go to that is embedded with reCaptcha v3, and there may be no visual indication on the site that it’s happening, beyond a small reCaptcha logo hidden in the corner.
Khormaee would not address the way that Google uses data for reCaptcha in any way and instead referred Fast Company to Google’s terms of service, which are linked beneath the reCaptcha logo on most sites. However, there was no reference to reCaptcha anywhere in the terms of service. After this story was published, Google reached out to say that reCaptcha’s API sends hardware and software information, including device and application data, back to Google for analysis, and that the service is only used to fight spam and abuse.
Perona thinks that Google encouraging site admins to put reCaptcha all over their sites, and then sharing the resulting risk scores with those admins, is great for security. He says it “gives site owners more control and visibility over what’s going on” with potential scammer and bot attacks, and the system will give admins more accurate scores than if reCaptcha were using data from only a single webpage to analyze user behavior. But there’s the trade-off. “It makes sense and makes it more user-friendly, but it also gives Google more data,” he says. Google would not clarify what it does with the data it captures about user behavior via reCaptcha, only that it is used for improving reCaptcha and general security purposes.
This kind of cookie-based data collection happens elsewhere on the internet. Giant companies use it as a way to assess where their users go as they surf the web, which can then be tied to providing better-targeted advertising. For instance, Google’s reCaptcha cookie follows the same logic as the Facebook “like” button when it’s embedded in other websites: It gives that site some social media functionality, but it also lets Facebook know that you’re there. Previously, Google has said that the data captured from reCaptcha is not used for ad targeting or analyzing user interests and preferences. After this story was published, Google said that the information collected through reCaptcha will not be used for personalized advertising by Google.
Perona views Google’s use of reCaptcha as an “online land grab” that strengthens Google’s hold over the internet. He thinks reCaptcha is similar in this way to other Google products like Accelerated Mobile Pages (AMP), a program to make news sites’ pages load faster on mobile devices that has caused some consternation among publishers over whether Google is taking web traffic away from news sites. The same goes for Google Chrome, which the Washington Post recently called “surveillance software” (I’m among those who have ditched Chrome for Firefox).
“It’s always a double-edged sword,” Perona says. “You gain something, but you’re also giving Google a little more control over everything online.” The gain is security and better user experience, but privacy may suffer.
Google did not address any potential privacy problems and insisted that reCaptcha v3 is a matter of corporate responsibility. It sees reCaptcha v3 as a way of ensuring a safe, frictionless online experience. “Google is so deeply integrated with the internet,” Khormaee says. “We want to do anything we can to protect it.”