Third-party scripts on hundreds of popular websites have been extracting personal information from visitors in “increasingly intrusive ways,” according to researchers at Princeton University.
Steven Englehardt, Gunes Acar and Arvind Narayanan, the authors of the research, have published a report documenting the surprisingly invasive nature of “session replay” scripts employed by hundreds of the world’s most-popular websites. The team’s findings are presented in the first installment of a series called “No Boundaries.”
The scripts in question record everything from keystrokes to mouse movements to scrolling behavior in relation to the contents of the pages you visit. That data is then sent to third-party servers, the team says.
In other words, major websites rely on what could be described as “keylogger-as-a-service” for analytics.
“Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder,” according to the trio.
For this study, the team analyzed seven of the top session replay services available today, including Yandex, FullStory, Hotjar and UserReplay. After a lot of digging around, the Princeton researchers found 482 websites (of the Alexa top 50,000) using these services.
To be clear, these data collection services, and others like them, serve a legitimate purpose – gathering analytics to improve a website, or the products and services advertised or sold through that website. However, based on their findings, Princeton researchers conclude that “the extent of data collected by these services far exceeds user expectations.”
For instance, text typed into forms is sometimes collected and sent to third-party servers before the user even submits the form. The data is sent even if the user abandons the session without submitting anything.
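The behavior described here can be modeled in a few lines. The sketch below is a simplified Python model, not code from any of the services named in the report; the `ReplayRecorder` class, its method names and the `outbox` list standing in for a third-party server are all hypothetical. It illustrates why data captured on every input event leaves the page whether or not the form is ever submitted.

```python
class ReplayRecorder:
    """Toy model of a recorder that ships form input on every change (hypothetical)."""

    def __init__(self):
        # Stands in for requests sent to a third-party server.
        self.outbox = []

    def on_input(self, field_name, current_value):
        # Called on each input event; nothing waits for a submit.
        self.outbox.append((field_name, current_value))


def simulate_typing(recorder, field_name, text):
    """Feed the text one character at a time, as a user typing would."""
    for i in range(1, len(text) + 1):
        recorder.on_input(field_name, text[:i])


recorder = ReplayRecorder()
simulate_typing(recorder, "email", "a@b.c")
# The user closes the tab here: no submit handler ever runs,
# yet the full value is already in the outbox.
```

In this model there is simply no submit step for the recorder to wait on, which is the point the researchers make about user expectations.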
Session replay scripts are meant to skip keystrokes typed into password fields. But say you type your password into the wrong field by mistake, as most of us have done at least once. Because that field isn’t marked as a password input, the keystrokes can end up in the recording along with everything else.
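To see why a password typed into the wrong field is a problem, consider a recorder whose only safeguard is the declared type of the input. The Python sketch below assumes exactly that rule; the `FieldInput` type, the `is_recorded` function and the sample values are hypothetical and do not reflect any particular vendor’s implementation.

```python
from dataclasses import dataclass


@dataclass
class FieldInput:
    field_type: str  # HTML input type, e.g. "text" or "password"
    field_name: str
    value: str


def is_recorded(field: FieldInput) -> bool:
    # Assumed rule: skip only inputs declared as type="password".
    return field.field_type.lower() != "password"


def recorded_values(fields):
    """Return the (name, value) pairs that would end up in the recording."""
    return [(f.field_name, f.value) for f in fields if is_recorded(f)]


session = [
    FieldInput("text", "username", "hunter2"),      # password typed here by mistake
    FieldInput("password", "password", "hunter2"),  # correctly skipped
]
leaked = recorded_values(session)
```

Under this rule, `leaked` holds the password captured from the username field, because the recorder looks at where the text was typed rather than what it is.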
Precise mouse movements are saved too.
“This data can’t reasonably be expected to be kept anonymous,” reads the report. “In fact, some companies allow publishers to explicitly link recordings to a user’s real identity.”
Depending on the purpose of your visit to the websites, the danger levels vary. Merely browsing through an online fashion store may not pose a tremendous risk to your security. But collection of page content from a clinic’s online portal, for example, may translate to leaked medical conditions, credit card details and other personal information.
“This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout and registration processes,” the team writes.
Any service managing or storing large volumes of customer data makes an attractive target for hackers. And according to the report, session replay services aren’t exactly bulletproof.
In addition to passwords being included in some session recordings, sensitive user inputs are redacted in “fundamentally insecure” ways. Recording services are, in effect, setting themselves up to fail at protecting user data.
The dashboards for some of the session replay services deliver playbacks within an unencrypted HTTP page. This is the case even when the recordings were made on a secured HTTPS website.
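The mismatch is easy to express as a check. The following Python sketch is only an illustration: the function name `playback_is_downgraded` and the example URLs are hypothetical, and it looks at URL schemes alone, not at how any real dashboard is served.

```python
from urllib.parse import urlsplit


def playback_is_downgraded(recorded_page_url: str, dashboard_url: str) -> bool:
    """True when an HTTPS page's recording is viewed over plain HTTP."""
    recorded_scheme = urlsplit(recorded_page_url).scheme.lower()
    dashboard_scheme = urlsplit(dashboard_url).scheme.lower()
    return recorded_scheme == "https" and dashboard_scheme == "http"


# Placeholder URLs, not real endpoints.
downgraded = playback_is_downgraded(
    "https://shop.example.com/checkout",
    "http://dashboard.example.net/replay/123",
)
```

When the check is true, content that was protected in transit to the visitor is exposed again when the publisher replays it, since anyone able to observe the dashboard’s traffic could read the recorded page.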
A list of websites that use third-party session replay scripts can be found here.
Motherboard has obtained statements from a few of the websites named in the research. While some have decided to stop using session replay scripts to reevaluate their protocols, others have been dismissive of the research, claiming that, despite making use of such services, they greatly value the security of their users.