[rust-dev] A plea for removing context-free ambiguity / context-required parsing

Graydon Hoare graydon at mozilla.com
Fri Aug 17 12:15:22 PDT 2012


On 12-08-17 10:25 AM, Nathan wrote:

> Yes, this is true, but it doesn't address the general problem:
>
> match myfoo {
>    [bar, 42] => /* ... */
> }
>
> I may have the rust syntax wrong, but the point is anywhere in a
> structured/compound pattern either a variable binding or an enum
> discriminator may appear, right?

Yes. Or a constant. I.e. 'some(red)' doesn't match 'some(blue)' if red 
and blue are named constant integers (or enum tags).
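As a minimal sketch of the constant case, here it is in present-day Rust syntax, which differs from the 2012 syntax quoted in this thread. The names `RED`, `BLUE` and `describe_colour` are made up for illustration. A named constant inside a compound pattern is compared by value, not bound:

```rust
// Hypothetical constants standing in for 'red' and 'blue' above.
const RED: u32 = 0xff0000;
const BLUE: u32 = 0x0000ff;

fn describe_colour(c: Option<u32>) -> &'static str {
    match c {
        // RED and BLUE resolve to the constants in scope, so each arm
        // is an equality test against that value, not a new binding.
        Some(RED) => "red",
        Some(BLUE) => "blue",
        Some(_) => "some other colour",
        None => "nothing",
    }
}
```

Calling `describe_colour(Some(BLUE))` takes the second arm; it never falls into the `Some(RED)` arm.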

I think it's going to be a bit of a losing battle to differentiate these 
in all cases; you wind up having to make one or more "very frequently 
written thing" either very ugly or likely to be misused.

Now, curiously, this is not as frequently _ambiguous_ as it seems. Most 
cases are automatically unambiguous:

   - foo::bar => ...  is unambiguous due to '::'
   - foo(_) => ... is unambiguous due to '(_)'
   - ref foo => ... is unambiguous due to 'ref'
   - copy foo => ... is unambiguous due to 'copy'
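For illustration, three of these forms still exist in present-day Rust ('copy' patterns do not, so they are left out). This is only a sketch; `Shape`, `shape_kind` and `name_len` are hypothetical names:

```rust
// Hypothetical enum used only to illustrate the pattern forms.
enum Shape {
    Dot,
    Circle(u32),
}

fn shape_kind(s: &Shape) -> &'static str {
    match *s {
        // A path containing '::' can only name an existing item.
        Shape::Dot => "dot",
        // Trailing '(_)' marks a constructor with arguments.
        Shape::Circle(_) => "circle",
    }
}

fn name_len(name: &Option<String>) -> usize {
    match *name {
        // 'ref' can only introduce a binding, here a borrow of the String.
        Some(ref n) => n.len(),
        None => 0,
    }
}
```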

The only case we're actually looking at is nullary-constructor or 
equality-with-a-constant. That is, someone writing:

   match ... {
     nonr => ...
   }

and matching 'some(x)' against it because they fat-fingered 'none' and 
wound up binding an identifier (or alternatively, they mentioned an 
ident they thought was a constructor-in-scope but it was not, so they 
introduced a binding).
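The failure mode can be reproduced in present-day Rust (capitalised `None`/`Some`, otherwise the same idea). This is a sketch with a made-up function name; the `allow` attribute only silences the lints that would otherwise flag the mistake:

```rust
// Intended to mean "is this None?", but the pattern is misspelled.
#[allow(unused_variables, non_snake_case, unreachable_patterns)]
fn looks_like_is_none(x: Option<i32>) -> bool {
    match x {
        // `Nonr` names nothing in scope, so it is a fresh binding that
        // matches every value, including Some(_).
        Nonr => true,
        // Never reached: the arm above already matched everything.
        _ => false,
    }
}
```

Both `looks_like_is_none(None)` and `looks_like_is_none(Some(3))` return true, which is exactly the silent misbehaviour described above.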

Fixing this is Hard though. We struggled a lot. You have this sequence 
of painful logic:

   - Hard requirement: 'let x = 10' declares a variable.
   - Hard requirement: 'const x : int = 10' declares a constant.
   - Hard requirement: 'x' in expression position refers to a variable
     and/or a constant, at least.

These are very basic "if we change them, the language passes the 
threshold of too-ugly-to-write". So they're not on the table. Now consider:

   - Misuse avoidance #1: users tend to forget that patterns and exprs
     are different, since they "look similar", so anything they write as
     an expr they're likely to write as a pattern. In particular, 'x' as
     a constant (and 'x' as a nullary enum ctor, if those are unadorned).

   - Misuse avoidance #2: users tend to write all enum ctors the same
     way, nullary and non, so if 'none' has a sigil, 'some' has to have
     one too, otherwise they'll wind up writing 'none'-without-the-sigil
     just out of symmetry with 'some'.

   - Ergonomics #1: ideally the distinction between a constant integer
     like "const red : int = 0xff0000;" and an enum ctor can be forgotten
     by users. People change between magic constants and enums with
     some frequency when writing code.

   - Ergonomics #2: ideally enum ctors don't all have sigils in expr
     position, since a great many are unambiguously ctor-calls anyway
     (eg. 'some(10)' is much nicer than '`some(10)' everywhere).

Misuse avoidance #1 makes it pretty much impossible to dodge the 
constant-vs-binding ambiguity, and the combination of misuse avoidance 
#1 and #2 means that the only dodge likely to work on the 
enum-vs-binding ambiguity is one where _all_ occurrences of _all_ ctors 
require sigils (eg. the OCaml `Variant thing, for nullary and N-ary 
(N>=1) ctors, in pattern and expr forms alike). This is possible. But 
it's about the only solution I can see that doesn't bring more problems 
than it solves. And even if we did that, it wouldn't solve the 
constant-vs-binding half of misuse avoidance #1, and would lose 
ergonomic arguments #1 and #2.

So, with all this in mind, we went with the SML rule: having the 
compiler restrict names introduced-by-a-pattern to not-collide with any 
in-scope nullary ctors and constants. It _seems_ to be working pretty 
well in practice.
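One reading of that rule, as a toy sketch and not the compiler's actual resolver: a bare identifier in a pattern is looked up among the in-scope nullary ctors and constants first, and only becomes a new binding if nothing by that name is found. The type `PatIdent`, the function `resolve_pattern_ident` and the flat set of names are all assumptions made for illustration:

```rust
use std::collections::HashSet;

// How a bare identifier in pattern position gets interpreted.
#[derive(Debug, PartialEq)]
enum PatIdent {
    // Names an in-scope nullary ctor or constant: the pattern tests
    // against it.
    RefersTo(String),
    // Nothing by that name in scope: the pattern introduces a binding.
    Binds(String),
}

// `ctors_and_consts` stands in for whatever the real scope lookup is.
fn resolve_pattern_ident(name: &str, ctors_and_consts: &HashSet<&str>) -> PatIdent {
    if ctors_and_consts.contains(name) {
        PatIdent::RefersTo(name.to_string())
    } else {
        PatIdent::Binds(name.to_string())
    }
}
```

With 'none' and 'red' in scope, 'none' resolves to the ctor while the typo 'nonr' still becomes a binding, so the rule removes the ambiguity without catching the fat-finger case on its own.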

-Graydon

