8.1. User tracking

A third-party host (or any object capable of getting content distributed to multiple sites) could use a unique identifier stored in its client-side database to track a user across multiple sessions, building a profile of the user’s activities. In conjunction with a site that is aware of the user’s real id object (for example an e-commerce site that requires authenticated credentials), this could allow oppressive groups to target individuals with greater accuracy than in a world with purely anonymous Web usage.

There are a number of techniques that can be used to mitigate the risk of user tracking:

Blocking third-party storage

User agents may restrict access to the database objects to scripts originating at the domain of the top-level document of the browsing context, for instance denying access to the API for pages from other domains running in iframes.

Expiring stored data

User agents may automatically delete stored data after a period of time.

This can restrict the ability of a site to track a user, as the site would then only be able to track the user across multiple sessions when she authenticates with the site itself (e.g. by making a purchase or logging in to a service).

However, this also puts the user’s data at risk.

Treating persistent storage as cookies

User agents should present the database feature to the user in a way that associates them strongly with HTTP session cookies. [COOKIES]

This might encourage users to view such storage with healthy suspicion.

Site-specific safe-listing of access to databases

User agents may require the user to authorize access to databases before a site can use the feature.

Origin-tracking of stored data

User agents may record the origins of sites that contained content from third-party origins that caused data to be stored.

If this information is then used to present the view of data currently in persistent storage, it would allow the user to make informed decisions about which parts of the persistent storage to prune. Combined with a blocklist (“delete this data and prevent this domain from ever storing data again”), the user can restrict the use of persistent storage to sites that she trusts.

Shared blocklists

User agents may allow users to share their persistent storage domain blocklists.

This would allow communities to act together to protect their privacy.

While these suggestions prevent trivial use of this API for user tracking, they do not block it altogether. Within a single domain, a site can continue to track the user during a session, and can then pass all this information to the third party along with any identifying information (names, credit card numbers, addresses) obtained by the site. If a third party cooperates with multiple sites to obtain such information, a profile can still be created.

However, user tracking is to some extent possible even with no cooperation from the user agent whatsoever, for instance by using session identifiers in URLs, a technique already commonly used for innocuous purposes but easily repurposed for user tracking (even retroactively). This information can then be shared with other sites, using visitors’ IP addresses and other user-specific data (e.g. user-agent headers and configuration settings) to combine separate sessions into coherent user profiles.