Crate cookie_bug [−] [src]
tl;dr
cookie-rs percent-decodes a cookie's key and value, it later only re-encodes the value. No such encoding/decoding is defined for cookies. Applications relying on the parser might break, e.g. Servo doesn't use cookie-rs' formatting and might send cookies with invalid characters back.
A bug
It turns out cookie-rs handles Cookies in a rather unexpected way, which in turn breaks other systems, such as Servo, that rely on cookie-rs to do the parsing only, but no encoding/decoding of a cookie's contents.
Cookie Theory
The latest "definitive specification for cookies as used in the real world was published" (according to Wikipedia) is RFC6265, "HTTP State Management Mechanism".
A cookie as sent in an HTTP header is defined by
cookie-pair = cookie-name "=" cookie-value
cookie-name = token
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
token
is defined in another RFC as
token = 1*<any CHAR except CTLs or separators>
with CHAR
being every US-ASCII character and CTL being control characters
(including carriage return \r
and linefeed \n
)
A cookie-octet is represented as:
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
That is:
US-ASCII characters excluding control characters, whitespace, comma,
semicolon and backslash.
That leaves a whole lot of characters to be used,
especially the percent sign %
(aka %x25
).
The same cookie-pair
description is used
for the header sent back to a server.
The RFC also contains a description on how to properly parse HTTP headers to extract a cookie's key-value pair. It relaxes a bit from the above format, so even servers not conforming to it will still be usable.
At no point does the RFC talk about any encoding or interpretation of the values.
It has this hint though:
NOTE: Despite its name, the cookie-string is actually a sequence of octets, not a sequence of characters. To convert the cookie-string (or components thereof) into a sequence of characters (e.g., for presentation to the user), the user agent might wish to try using the UTF-8 character encoding
Note: any such decoding should be for presentation to the user. The string has to be used unmodified when used in a HTTP header.
Cookie-rs
Cookie-rs is a library for HTTP cookie parsing and cookie jar management for rust.
It's also used by Hyper, which in turn is used in Servo.
Usage is easy:
let cookie_str = "key=value"; let cookie = Cookie::parse(cookie_str).unwrap();
This gives you a cookie object with access to all relevant fields, especially the key and value.
But hidden in the code and no where documented, it percent-decodes everything.
let cookie_str = "key=value%23foobar"; let cookie = Cookie::parse(cookie_str).unwrap(); assert_eq!("value#foobar", cookie.value); // Unexpected.
It does so for the key as well. This will also re-encode the value using percent-encoding.
It also comes with a handy method to format a cookie again:
let cookie = Cookie::new("key".into(), "value#foobar".into()); assert_eq!("key=value%23foobar", format!("{}", cookie));
Though it does not do this percent-reencoding for the value.
let cookie = Cookie::new("key#foobar".into(), "value".into()); assert_eq!("key#foobar=value", format!("{}", cookie));
The decoding was introduced a long time ago in 724003de.
Servo
Servo relies on Hyper and cookie-rs to do the right thing with HTTP requests and parsing of the data.
Because it's a browser it also stores the cookies and sends them back to the server as necessary.
It also properly follows the exact steps to generate the cookie header as written in the RFC. And in the end it just concatenates the key and value.
(match acc.len() { 0 => acc, _ => acc + ";" }) + &c.cookie.name + "=" + &c.cookie.value
c.cookie
is already a Cookie
object, as parsed by cookie-rs
.
This means all the above mentioned percent-decoding was already applied.
But because it direclty uses the strings in the object,
no re-encoding is applied anymore.
Now consider a server sending a string containing percent symbols (%
),
maybe even something that actually looks like a
properly percent-encoded value (%0A = \n
).
Interpretation of this string is totally up to the server.
It expects to get the exact same string back from the client.
Instead it gets a percent-decoded, but not re-encoded value back.
Therefore Servo will fail to adhere to the RFC and breaks application expecting their cookies back.
The following cookies cause problems when Servo receives them and later sends them back.
Cookie::parse("key=value%eefoobar") // After percent-decoding it's not valid UTF-8 // and therefore not a valid String // Servo will just not save it. // → Cookie lost. Cookie::parse("key=value%0Afoobar") // After percent-decoding it contains // a newline. // Servo will insert it as is, // breaking the whole `Cookie` header // → Cookie is invalid on server-side
Solutions
The best thing would be to separate the different steps in cookie-rs. Provide one layer for parsing only, solely dealing with byte arrays. This layer would be used by Servo and would just work.
Another layer on top could help with an actual decoding or turning cookies into strings. But at this point it's up to the application to decide.
Of course the above would be a breaking change of cookie-rs (even though this behaviour is not documented at all). Right now there are only 6 reverse-dependencies on crates.io plus Servo. I didn't check usage in any of them to see if above solution would break more stuff.
To keep cookie-rs as is (but maybe atleast document its behavior), a new low-level library could be used (which in turn makes cookie-rs just a user of this library). No one depending on cookie-rs directly would be affected, but Servo could be fixed.
Note: Even re-encoding the cookie when inserted as a header by Servo would not help, because of cookies not decoding to proper UTF-8.
Functions
_01_simple |
It can parse cookies easily |
_02_value_percent |
But it decodes the value |
_03_value_single_percent |
And it fails if the value doesn't hold a valid percent-encoded string |
_04_key_percent |
The same is true for the key: it's decoded. |
_05_format |
cookie-rs comes with a handy method to display a cookie.
It does so by formatting as you expect.
(It even includes the additional data used in a |
_06_format_perc |
If it gets a percent-encoded string like |
_07_format_hash |
But if it has a value like |
_08_format_hash_key |
Funnily enough: percent-encoding on display is only applied to the value, the key is passed through unmodified. |
_09_invalid_utf8 |
Because it expects to get a proper string, it fails to build up a cookie |
_10_newline |
And now it also inserts a newline |