Published: Fri, 13 Jan 2023 +0800
The User Agent header
The User Agent header has been used for decades to identify the browser connecting to the server, both for statistics purposes, and to work around bugs in specific versions of the browser.
Unfortunately, it has also been used to block browsers because they are not one of the “supported browsers”. This has been implemented both as outright blocks and in more insidious ways, such as not sending the same content as for other browsers, “because that browser does not support it, anyway”.
To work around such problems, browsers started to include parts of the identification strings of some of the other browsers, mostly the major ones. This may have started with Internet Explorer, if not earlier.
While at Opera, we tried to identify as just plain “Opera”, without any of the other pretensions, but we eventually had to start sending fake User Agents to many sites, due to them blocking us or sending us bad data.
One of the more notable cases was what I call the “Catch-22 cookie”, a faulty cookie check that usually works like this:
1. A new user loads the site.
2. Since the user does not have any cookies, check if the client supports cookies by sending a cookie.
3. Redirect to a new page that checks that the cookie was returned, and, if not, tell the user to enable cookies.
The problem in our case was that the code that sends the cookie in step 2 first checked the User Agent string against a list in a Microsoft IIS server Client Capabilities module. It then only sent a cookie if the list said that the User Agent supported cookies. And since the module did not have Opera listed as a client that supported cookies, it did not send any cookies.
So, when reaching step 3, Opera, of course, did not have a cookie to send, because it never received one, and the user got told to enable cookies. Catch-22!
The site in question was rather large and important, so we needed it fixed. However, in the end, we actually had to submit patches to Microsoft to fix their server product.
To work, such a list needs to be based on perfect knowledge about all existing clients, which is difficult and costly, if not impossible.
The proper way of conducting this test would have been to always send a dummy cookie and remove it after the test completed. And, the capabilities module’s list should have been a block list for known user agents that the server shouldn’t send cookies to.
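The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the actual server module: the client list, the cookie name `cookietest`, and all function names are hypothetical.

```python
# Hypothetical allow list, standing in for the capabilities module's table.
KNOWN_COOKIE_CLIENTS = {"Mozilla", "MSIE"}

# Hypothetical block list: clients known NOT to handle cookies.
NO_COOKIE_CLIENTS = set()


def product_name(user_agent: str) -> str:
    """Very rough product extraction, good enough for this sketch."""
    return user_agent.split("/")[0]


def step2_listed_only(user_agent: str) -> dict:
    """Broken variant: send the test cookie only to clients on the allow list."""
    if product_name(user_agent) in KNOWN_COOKIE_CLIENTS:
        return {"Set-Cookie": "cookietest=1"}
    return {}


def step2_always_send(user_agent: str) -> dict:
    """Suggested variant: send the test cookie unless the client is block-listed."""
    if product_name(user_agent) in NO_COOKIE_CLIENTS:
        return {}
    return {"Set-Cookie": "cookietest=1"}


def step3_check(request_cookies: dict) -> str:
    """Step 3: the redirect target checks whether the cookie came back."""
    if request_cookies.get("cookietest") == "1":
        return "ok"
    return "please enable cookies"


def visit(step2, user_agent: str) -> str:
    """Simulate a cookie-capable browser going through steps 2 and 3."""
    response_headers = step2(user_agent)
    cookies = {}
    if "Set-Cookie" in response_headers:
        name, value = response_headers["Set-Cookie"].split("=", 1)
        cookies[name] = value
    return step3_check(cookies)
```

With the allow-list variant, a cookie-capable browser that is missing from the list (`visit(step2_listed_only, "Opera/9.0")`) is told to enable cookies, while `visit(step2_always_send, "Opera/9.0")` passes the check. Removing the dummy cookie after the test is left out of the sketch.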
At Vivaldi, we’ve also encountered issues when identifying as “Vivaldi” and tried to use customized User Agents with sites that broke. Eventually, we gave up and started just identifying as Chrome to all websites, except those few that we knew would handle our Vivaldi ID correctly (e.g., our own site or our partners’).
But it is not just the name of the browser that can cause trouble; the version number can, too.
When Opera reached version 10, we actually encountered websites that believed the version was 1.0, not 10.0. The reason was that the scripts that process the header were hardcoded to assume there was just one digit in front of the version dot, so two-digit numbers broke the scripts. We thus had to stop the version at 9.9 and add a second Opera-specific User Agent name to identify the 10.x+ versions.
We encountered similar issues when we reached Vivaldi 1.10. Again, websites assumed there was just one digit in the minor version number, so 1.10 was considered 1.1. Instead, we had to use 1.91 as the version number in the User Agent string.
When we later reached 2.10 and were still encountering extensive browser sniffing at levels that made adding overrides unsustainable, we decided “<Bleep> it!” and stopped sending “Vivaldi” in the general User Agent header.
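The one-digit assumptions behind the Opera 10 and Vivaldi 1.10 problems can be illustrated with a short Python sketch. The `naive_parse` function is a hypothetical reconstruction of the kind of shortcut such scripts might take, not code from any actual website; `parse_version` shows one way to avoid the assumption.

```python
import re


def naive_parse(version: str) -> tuple:
    """Hypothetical sniffing shortcut: assume one digit on each side of the dot."""
    major = int(version[0])  # only the first character is read
    minor = int(re.search(r"\.(\d)", version).group(1))  # only one digit after the dot
    return (major, minor)


def parse_version(version: str) -> tuple:
    """Split on dots and compare the components as integers."""
    return tuple(int(part) for part in version.split("."))


# The shortcut confuses 10.0 with 1.0, and 1.10 with 1.1.
print(naive_parse("10.0"), parse_version("10.0"))
print(naive_parse("1.10"), parse_version("1.10"))

# Comparing integer tuples orders the versions correctly.
print(parse_version("1.10") > parse_version("1.9"))
```

Under the same shortcut, the workaround value 1.91 is read as 1.9, which is why it kept such scripts working.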
It is not just client version numbers that cause issues. A couple of months ago, I discovered that the Chromium code had frozen the macOS version in the User Agent string at 10.15.7, the last macOS 10.x release, because websites would refuse to accept requests from machines running macOS 11 (or later) that reported it in the OS part of the User Agent string. Oops!
This is actually a problem for our bug reports, since we record the OS version from the User Agent string when bugs are reported. So, unless the reporter adds extra information, we may not realize the report is for macOS 11, 12, or 13.
In a related development, a year ago, the Chromium team actually had to run a long series of tests while preparing for their release of Chromium 100, due to the possibility of them encountering the same kind of problems. I assume Mozilla did something similar before they reached version 100.
More recently, there has been work to retire the User Agent header, starting with a reduction in the version information in the User Agent header. Chromium 106+ is no longer sending detailed information about the version, just the major version followed by zeros, e.g. “106.0.0.0”. This is part of the transition to a new system of providing information about the browser: “Client Hints”.
Over the past few years, work has been going on to create a new system to provide more accurate information about the browser to the website, called Client Hints.
This new system is based around a set of HTTP headers prefixed with “Sec-CH-”, e.g. “Sec-CH-UA”, the new User Agent header. There are also headers for other information, such as the browser engine, device, OS, and the resolution of the document on the display.
Standards-wise, the system is based on the server indicating which of these headers it wants to receive. The browser then sends them if it supports them. However, Chromium, at the very least, is currently always sending three of these headers, including the User Agent (Sec-CH-UA), mobile, and platform information, and may send others.
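As a rough sketch of that negotiation: the server lists the hints it wants in an `Accept-CH` response header, and the browser adds the ones it supports to later requests. The Python below models the browser side only. The set of optional hints is an assumed subset chosen for illustration, and the parsing is a simple comma split rather than a full structured-header parser.

```python
# The three hints the text says Chromium currently always sends.
DEFAULT_HINTS = ["Sec-CH-UA", "Sec-CH-UA-Mobile", "Sec-CH-UA-Platform"]

# Assumed subset of optional hints this hypothetical browser supports.
SUPPORTED_OPTIONAL_HINTS = [
    "Sec-CH-UA-Platform-Version",
    "Sec-CH-UA-Full-Version-List",
]


def hints_to_send(accept_ch: str) -> list:
    """Return the hint headers to send, given the server's Accept-CH value."""
    requested = {token.strip().lower() for token in accept_ch.split(",") if token.strip()}
    result = list(DEFAULT_HINTS)
    for hint in SUPPORTED_OPTIONAL_HINTS:
        # Header names are case-insensitive; unknown or unsupported hints are ignored.
        if hint.lower() in requested:
            result.append(hint)
    return result
```

For example, `hints_to_send("")` returns only the three default headers, while `hints_to_send("Sec-CH-UA-Platform-Version, Sec-CH-Unknown")` adds `Sec-CH-UA-Platform-Version` and silently skips the hint the browser does not know.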
Chrome’s Sec-CH-UA header contains information about browser brands and their version (Chrome and Chromium), as well as a “brand” value called “Not A Brand”.
The brand values are regularly (based on the version numbers) shuffled around in the header, and the “Not A Brand” value is also regularly modified by inserting various non-letter characters like “;”, “:”, and “.”, a process called “GREASE”, which has the effect of varying the header, so websites cannot rely on a particular sequence of values, or the text in the values. It thus attempts to force them into writing parsers that are standards compliant and don’t take shortcuts.
Vivaldi is currently not including a brand in this header, only sending “Chromium” and the “Not A Brand” values.
The big question about Client Hints is whether they (in particular Sec-CH-UA) will work better for browsers than the User Agent string?
Will websites properly parse the headers (Really??) and only use them responsibly (🤣 🙃)? Or will they start abusing this information to block “unsupported” browsers, too (😭)?
Unfortunately, the early indications are that some won’t parse properly, and others will use them to block “unsupported” browsers.
We have, so far, encountered two cases, one of each, where sites would not work for Vivaldi due to how the websites processed the Sec-CH-UA header information.
The first case was a website that worked in one version of Vivaldi but failed to load in the next, due to a server-side problem. This turned out to be due to the value shuffling and GREASE modification of the “Not A Brand” value. Chromium varies the sequence of values differently in each Chromium version.
In the failing version, the sequence was “Not A Brand” followed by “Chromium”, and the “Not A Brand” value included a semicolon (“;”) character. A semicolon separates values in unquoted text, but is just a normal character inside a quoted value. The website’s header parser ignored the quotes, and the result was that, when the “Not A Brand” value came first, the parser (and the server script) crashed.
Further investigation revealed that browsers with a branded header (e.g., Chrome and Microsoft Edge) would never have the “Not A Brand” value at the start of the header. Unbranded ones would have it that way every even-numbered Chromium version, that is, the Extended Stable versions, like 106 and 108. And every third of those would have a semi-colon in the string in a way that would break the parsing.
We “solved” that problem by freezing the sequence, so the “Chromium” brand was always first in the header.
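The quoting problem in this first case can be sketched in Python. The header value below is illustrative only, and `naive_parse` is a hypothetical parser that ignores quotes, not the website's actual code. This simplified version fails wherever the semicolon-bearing value appears, whereas the site described above only failed when it came first. `quote_aware_parse` uses a regular expression that respects quoted strings; it is a simplification, not a complete structured-header parser.

```python
import re

# Illustrative header value: GREASE brand first, with ";" inside the quotes.
HEADER = '"Not;A=Brand";v="99", "Chromium";v="106"'


def naive_parse(header: str) -> dict:
    """Hypothetical parser that ignores quotes and splits on ';' blindly."""
    brands = {}
    for item in header.split(","):
        # Raises ValueError when a quoted brand itself contains ';'.
        brand, version = item.strip().split(";")
        brands[brand.strip('"')] = version.split("=")[1].strip('"')
    return brands


# A quoted brand, then ;v= and a quoted version; backslash escapes are allowed.
_ITEM = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v\s*=\s*"((?:[^"\\]|\\.)*)"')


def quote_aware_parse(header: str) -> dict:
    """Treat everything inside the quotes as part of the value."""
    return {brand: version for brand, version in _ITEM.findall(header)}


try:
    naive_parse(HEADER)
except ValueError as error:
    print("naive parser failed:", error)

# Looking brands up by name makes the result independent of their order.
print(quote_aware_parse(HEADER).get("Chromium"))
```

Looking up brands by name in the resulting dictionary, rather than by position, also makes the server indifferent to the shuffling of the values.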
In my opinion, part of the problem here is that Chromium does not vary the header enough, and not frequently enough. As mentioned, the branded browsers would never send the variation that caused our problem, and the fact that Chromium only varies the content for each Chromium release, and not more often, means that it can take a long time (years) to discover parser problems like the one we encountered.
Part of the reason why the header isn’t varied more frequently is that the header value can be used as part of the key by websites to return cached webpages requested with the exact same set of URL and certain headers. Frequently changing the header value would increase the load on the website cache system and servers.
I still think that should not be a blocker for varying the sequence more frequently.
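The caching concern can be made concrete with a small Python sketch. It assumes a cache that keys stored responses on the URL plus the request headers the site has chosen to vary on; the function names, URL, and header values are hypothetical.

```python
def cache_key(url: str, request_headers: dict, vary: list) -> tuple:
    """Build a cache key from the URL plus the request headers named in `vary`."""
    return (url,) + tuple(request_headers.get(name, "") for name in vary)


def count_cache_entries(requests: list, vary: list) -> int:
    """Count the distinct cache entries needed for a list of (url, headers) pairs."""
    return len({cache_key(url, headers, vary) for url, headers in requests})


URL = "https://example.com/"
ORDER_A = {"Sec-CH-UA": '"Chromium";v="106", "Not;A=Brand";v="99"'}
ORDER_B = {"Sec-CH-UA": '"Not;A=Brand";v="99", "Chromium";v="106"'}

# Same page, same browser version, but two orderings of the header values.
requests = [(URL, ORDER_A), (URL, ORDER_B), (URL, ORDER_A)]

print(count_cache_entries(requests, vary=["Sec-CH-UA"]))  # one entry per distinct header value
print(count_cache_entries(requests, vary=[]))             # header ignored by the cache
```

Every additional variation of the header value multiplies the number of entries such a cache has to hold for the same page, which is the load the paragraph above refers to.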
We have not yet worked around this issue, but, as far as we can tell, the only workaround is to use one of the “approved” brand names in our header. Using “Vivaldi” as the brand would not work.
If we encounter further issues of this kind, my guess is that the only way we can avoid the problem is to do as we did with the old User Agent string: pretend to be Google Chrome and make sure we do not send any Vivaldi-specific information.
That may not have been the desired goal for Client Hints, but if the last several decades of using User Agent string information have proven anything, it is that only the major vendors (OS or browser) are able to tell the truth. Everybody else will have to tell lies one way or the other - even Microsoft had to tell lies when they started distributing Internet Explorer.