1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
//! ## tl;dr
//!
//! cookie-rs percent-decodes a cookie's key and value,
//! it later only re-encodes the value.
//! No such encoding/decoding is defined for cookies.
//! Applications relying on the parser might break,
//! e.g. Servo doesn't use cookie-rs' formatting
//! and might send cookies with invalid characters back.
//!
//! ---
//!
//! # A bug
//!
//! It turns out cookie-rs handles Cookies in a rather unexpected way,
//! which in turn breaks other systems, such as Servo, that rely on cookie-rs
//! to do the parsing only, but no encoding/decoding of a cookie's contents.
//!
//! # Cookie Theory
//!
//! The latest "definitive specification for cookies as used
//! in the real world was published" (according to Wikipedia)
//! is [RFC6265](https://tools.ietf.org/html/rfc6265),
//! "HTTP State Management Mechanism".
//!
//! A cookie as sent in an HTTP header is defined by
//!
//! ```text
//! cookie-pair       = cookie-name "=" cookie-value
//! cookie-name       = token
//! cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE  )
//! ```
//!
//! `token` is defined in another RFC as
//!
//! ```text
//! token          = 1*<any CHAR except CTLs or separators>
//! ```
//!
//! with `CHAR` being every US-ASCII character and CTL being control characters
//! (including carriage return `\r` and linefeed `\n`)
//!
//! A cookie-octet is represented as:
//!
//! ```text
//! cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
//! ```
//!
//! That is:
//! US-ASCII characters excluding control characters, whitespace, comma,
//! semicolon and backslash.
//! That leaves a whole lot of characters to be used,
//! especially the percent sign `%` (aka `%x25`).
//!
//! The same `cookie-pair` description is used
//! for the header sent back to a server.
//!
//! The RFC also contains a description on how to properly parse HTTP headers
//! to extract a cookie's key-value pair.
//! It relaxes a bit from the above format,
//! so even servers not conforming to it will still be usable.
//!
//! At no point does the RFC talk about any encoding or interpretation
//! of the values.
//!
//! It has this hint though:
//!
//! > NOTE: Despite its name, the cookie-string is actually a sequence of
//! > octets, not a sequence of characters.
//! > To convert the cookie-string
//! > (or components thereof) into a sequence of characters (e.g., for
//! > presentation to the user), the user agent might wish to try using the
//! > UTF-8 character encoding
//!
//! Note: any such decoding should be for presentation to the user.
//! The string has to be used unmodified when used in a HTTP header.
//!
//! # Cookie-rs
//!
//! [Cookie-rs](https://github.com/alexcrichton/cookie-rs) is a library for
//! HTTP cookie parsing and cookie jar management for rust.
//!
//! It's also used by [Hyper](https://github.com/hyperium/hyper),
//! which in turn is used in [Servo](https://github.com/servo/servo).
//!
//! Usage is easy:
//!
//!
//! ```rust
//! let cookie_str = "key=value";
//! let cookie = Cookie::parse(cookie_str).unwrap();
//! ```
//!
//! This gives you a cookie object with access to all relevant fields,
//! especially the key and value.
//!
//! But hidden in the code and no where documented,
//! it percent-decodes everything.
//!
//! ```rust
//! let cookie_str = "key=value%23foobar";
//! let cookie = Cookie::parse(cookie_str).unwrap();
//! assert_eq!("value#foobar", cookie.value); // Unexpected.
//! ```
//!
//! It does so for the key as well.
//! This will also re-encode the value using percent-encoding.
//!
//! It also comes with a handy method to format a cookie again:
//!
//! ```rust
//! let cookie = Cookie::new("key".into(), "value#foobar".into());
//! assert_eq!("key=value%23foobar", format!("{}", cookie));
//! ```
//!
//! Though it does not do this percent-reencoding for the value.
//!
//! ```rust
//! let cookie = Cookie::new("key#foobar".into(), "value".into());
//! assert_eq!("key#foobar=value", format!("{}", cookie));
//! ```
//!
//! The decoding was introduced a long time ago in
//! [724003de](https://github.com/alexcrichton/cookie-rs/commit/724003decad72cdbe3f998b0b8d181682e9582c5).
//!
//! # Servo
//!
//! Servo relies on Hyper and cookie-rs to do the right thing
//! with HTTP requests and parsing of the data.
//!
//! Because it's a browser it also stores the cookies and sends them back
//! to the server as necessary.
//!
//! It also [properly follows the exact steps](https://github.com/servo/servo/blob/2be0cb7827c6553b7dfa4d641bf3a1c72372ad3b/components/net/cookie_storage.rs#L87-L117)
//! to generate the cookie header as written in the RFC.
//! And in the end it just concatenates the key and value.
//!
//! ```rust
//! (match acc.len() {
//!     0 => acc,
//!     _ => acc + ";"
//! }) + &c.cookie.name + "=" + &c.cookie.value
//! ```
//!
//! `c.cookie` is already a `Cookie` object, as parsed by `cookie-rs`.
//! This means all the above mentioned percent-decoding was already applied.
//! But because it direclty uses the strings in the object,
//! no re-encoding is applied anymore.
//!
//! Now consider a server sending a string containing percent symbols (`%`),
//! maybe even something that actually looks like a
//! properly percent-encoded value (`%0A = \n`).
//! Interpretation of this string is totally up to the server.
//! It expects to get the exact same string back from the client.
//!
//! Instead it gets a percent-decoded, but not re-encoded value back.
//!
//! Therefore Servo will fail to adhere to the RFC and breaks application
//! expecting their cookies back.
//!
//! The following cookies cause problems when Servo receives them
//! and later sends them back.
//!
//! ```
//! Cookie::parse("key=value%eefoobar") // After percent-decoding it's not valid UTF-8
//!                                     // and therefore not a valid String
//!                                     // Servo will just not save it.
//!                                     // → Cookie lost.
//!
//! Cookie::parse("key=value%0Afoobar") // After percent-decoding it contains
//!                                     // a newline.
//!                                     // Servo will insert it as is,
//!                                     // breaking the whole `Cookie` header
//!                                     // → Cookie is invalid on server-side
//! ```
//!
//! # Solutions
//!
//! The best thing would be to separate the different steps in cookie-rs.
//! Provide one layer for parsing only, solely dealing with byte arrays.
//! This layer would be used by Servo and would just work.
//!
//! Another layer on top could help with an actual decoding
//! or turning cookies into strings.
//! But at this point it's up to the application to decide.
//!
//! Of course the above would be a breaking change of cookie-rs
//! (even though this behaviour is not documented at all).
//! Right now there are only 6 reverse-dependencies on crates.io plus Servo.
//! I didn't check usage in any of them to see if above solution would break
//! more stuff.
//!
//! To keep cookie-rs as is (but maybe atleast document its behavior),
//! a new low-level library could be used
//! (which in turn makes cookie-rs just a user of this library).
//! No one depending on cookie-rs directly would be affected,
//! but Servo could be fixed.
//!
//! Note: Even re-encoding the cookie when inserted as a header by Servo
//! would not help, because of cookies not decoding to proper UTF-8.
extern crate cookie;

use cookie::Cookie;

/// It can parse cookies easily
#[cfg_attr(test, test)]
pub fn _01_simple() {
    let cookie_str = "key=value";

    let cookie = Cookie::parse(cookie_str).unwrap();

    assert_eq!("key", cookie.name);
    assert_eq!("value", cookie.value);
}

/// But it decodes the value
#[cfg_attr(test, test)]
pub fn _02_value_percent() {
    let cookie_str = "key=value%2Ffoobar";

    let cookie = Cookie::parse(cookie_str).unwrap();

    assert_eq!("key", cookie.name);
    assert_eq!("value%2Ffoobar", cookie.value);
}

/// And it fails if the value doesn't hold a valid percent-encoded string
#[cfg_attr(test, test)]
pub fn _03_value_single_percent() {
    let cookie_str = "key=value%foobar";

    let cookie = Cookie::parse(cookie_str).unwrap();

    assert_eq!("key", cookie.name);
    assert_eq!("value%2Ffoobar", cookie.value);
}

/// The same is true for the key: it's decoded.
#[cfg_attr(test, test)]
pub fn _04_key_percent() {
    let cookie_str = "key%2Ffoobar=value";

    let cookie = Cookie::parse(cookie_str).unwrap();

    assert_eq!("key%2Ffoobar", cookie.name);
    assert_eq!("value", cookie.value);
}

/// cookie-rs comes with a handy method to display a cookie.
/// It does so by formatting as you expect.
/// (It even includes the additional data used in a `Set-Cookie` header)
#[cfg_attr(test, test)]
pub fn _05_format() {
    let cookie = Cookie::new("key".into(), "value".into());
    assert_eq!("key=value", format!("{}", cookie));
}

/// If it gets a percent-encoded string like `value%2Ffoobar`,
/// everything is fine, because none of the characters gets
/// percent-encoded again.
#[cfg_attr(test, test)]
pub fn _06_format_perc() {
    let cookie = Cookie::new("key".into(), "value%2Ffoobar".into());
    assert_eq!("key=value%2Ffoobar", format!("{}", cookie));
}

/// But if it has a value like `value#foobar`,
/// the `#` gets encoded.
#[cfg_attr(test, test)]
pub fn _07_format_hash() {
    let cookie = Cookie::new("key".into(), "value#foobar".into());
    assert_eq!("key=value#foobar", format!("{}", cookie));
}

/// Funnily enough: percent-encoding on display is only applied to the value,
/// the key is passed through unmodified.
#[cfg_attr(test, test)]
pub fn _08_format_hash_key() {
    let cookie = Cookie::new("key#foobar".into(), "value".into());
    assert_eq!("key#foobar=value", format!("{}", cookie));
}

/// Because it expects to get a proper string,
/// it fails to build up a cookie
#[cfg_attr(test, test)]
pub fn _09_invalid_utf8() {
    let cookie_str = "ke%y=a%eeb";
    let cookie = Cookie::parse(cookie_str).unwrap();

    assert_eq!("a%eeb", cookie.value);
}

/// And now it also inserts a newline
#[cfg_attr(test, test)]
pub fn _10_newline() {
    let cookie_str = "ke%y=a%0Ab";
    let cookie = Cookie::parse(cookie_str).unwrap();

    assert_eq!("a%0Ab", cookie.value);
}