[rust-dev] parsing, ambiguity, and empty structs

Niko Matsakis niko at alum.mit.edu
Wed Feb 27 07:36:22 PST 2013


A recent and very welcome pull request [1] pointed out Yet Another 
Ambiguity around struct syntax.  If you have something like this:
     ... match x { ...
is that `match (x {})`, where `x` names a struct and `x {}` is a struct 
literal, or `match x {` where `x` is the variable being matched and what 
follows are the arms?

Before I go any further, I want to emphasize that I am not picking on 
the author of the pull request.  As I said, it's excellent work and the 
author made a logical decision on how to proceed with the ambiguity.  
However, since it is dealing with our grammar, it seems like we should 
decide how to resolve this with more discussion than a review on a pull 
request, so I thought I'd write up an e-mail describing the issue and 
gather some feedback.

Now, to some extent, you can resolve this if there are fields present 
because the code would look like:
     ... match x { field: ...
However, this breaks down if you have empty structs, which used to be 
disallowed but are currently permitted.  It also requires more lookahead, 
though only a bounded amount.
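To make the lookahead concrete, here is a minimal sketch of the check, written in present-day Rust.  The `Token` and `Guess` types and the `classify` function are hypothetical names invented for this illustration; this is not the actual parser code.  It peeks at the tokens following `x` and shows where the empty-struct case gets stuck:

```rust
// Hypothetical token type for illustration only; not the real lexer.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    LBrace,
    RBrace,
    Colon,
    Other,
}

// What the lookahead can conclude about the `{` that follows `x`.
#[derive(Debug, PartialEq)]
enum Guess {
    StructLiteral,
    MatchArms,
    Ambiguous,
}

/// `rest` holds the tokens that follow `x` in `match x ...`,
/// starting at the `{`.
fn classify(rest: &[Token]) -> Guess {
    match rest {
        // `{ field: ...` looks like a struct literal with fields.
        [Token::LBrace, Token::Ident(_), Token::Colon, ..] => Guess::StructLiteral,
        // `{ }` could be an empty struct literal or an empty list of arms.
        [Token::LBrace, Token::RBrace, ..] => Guess::Ambiguous,
        // Anything else is assumed to be the start of the match arms.
        _ => Guess::MatchArms,
    }
}
```

With fields present, three tokens of lookahead are enough to decide; for `{ }` no fixed amount of peeking helps, which is why one of the options below is needed.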

The pull request took the approach of parsing `match x {}` as an empty 
struct literal, so to write a match with no arms (an admittedly 
bizarre thing to write) one must write `match (x) {}`.  This is 
reasonable, but I personally find it somewhat surprising that 
`match x { }` would not parse as a match (...and then likely lead to an 
exhaustiveness checking failure).

However, this same ambiguity arises in a lot of places: if/else-if 
expressions, match expressions, `do` and `for` expressions, and perhaps 
a few others.  Currently I *think* we use lookahead for field names to 
resolve the ambiguity that arises with struct literals, but of course 
this doesn't work with empty structs.  I'd like it if we could resolve 
this in a uniform way.

I see various options:

1. Treat Foo {} as a struct literal, requiring parentheses to 
disambiguate in some cases (e.g., `if (x) {}`).  This is what the pull 
request does.

2. Declare that `Foo { ... }` literals must always have at least one 
field, and use newtype structs for the empty struct case.

3. Place a parser restriction on those contexts where `{` terminates the 
expression and say that struct literals cannot appear there unless they 
are in parentheses.

Some details follow.

### Treat `Foo {}` as a struct literal

I don't have anything more to say about this approach. =)

### Treat empty structs the way we treat enum variants?

Perhaps we should just not parse a declaration like:
     struct X {}
Instead, one would write something like:
     struct X;
or
     struct X();
Much as you write
     enum Foo { Y }
This would be a "new-type" struct so X would also serve as a value, just 
like the constant `Y` in the enum case.  This would mean that one never 
writes a struct literal `Foo {}` but instead just `Foo`.
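As a rough illustration of this option, the sketch below uses present-day Rust syntax, which is an assumption on my part and differs from the syntax in use when this was written (for example, the variant is spelled `Foo::Y` here rather than plain `Y`).  The helper functions `make_x` and `make_y` are hypothetical and exist only to show the names being used as values:

```rust
// Present-day Rust syntax, assumed for illustration.
#[derive(Debug, PartialEq)]
struct X; // field-less struct: the name `X` is also a value

#[derive(Debug, PartialEq)]
enum Foo {
    Y, // field-less variant, used as a value just like `X`
}

// Hypothetical helper: constructs the struct without any `X {}` literal.
fn make_x() -> X {
    X
}

// Hypothetical helper: the enum case that the struct form mirrors.
fn make_y() -> Foo {
    Foo::Y
}
```

Since no braces ever follow the struct name in an expression, `match X { ... }` can only be read as a match on the value `X`.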

### Restrict where struct literals can appear

We could also just have a subclass of expressions which can appear in 
`if`, `do`, etc.  This subclass would not permit struct literals.  That 
means that `if Foo { x: 10 }.is_true() {}` or something would have to be 
written `if (Foo { x: 10 }.is_true()) { ... }`.  This rule implies that 
very little lookahead is needed.  Such rules can be a pain for the 
pretty printer, however.  To some extent we already have a rule like 
this for `do` and `for`, since we will parse:
     ...for x.each |y...
as a method call with one argument and not `(x.each | y)`.  Since this 
rule would presumably not apply to `if` etc, there would actually be 
three classes of expressions, those that can appear in `if`, those that 
can appear in `do`/`for`, and full expressions.
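A small sketch of what code looks like under this restriction, again in present-day Rust syntax (an assumption).  The field `x`, the body of `is_true`, and the function `describe` are made up for the example:

```rust
struct Foo {
    x: i32,
}

impl Foo {
    // Hypothetical predicate, chosen only so the example has something to test.
    fn is_true(&self) -> bool {
        self.x != 0
    }
}

fn describe(n: i32) -> &'static str {
    // The parentheses stop `{ x: n }` from being read as the body of the `if`.
    if (Foo { x: n }.is_true()) {
        "true"
    } else {
        "false"
    }
}
```

Calling `describe(10)` takes the first branch and `describe(0)` the second.  The parser never has to look past the first `{` after `if` to know that the block has started.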

### My personal opinion

I started out preferring the final option, but I am now leaning towards 
option #2, which seems to simplify the grammar overall and still 
requires only fixed lookahead to disambiguate.


Niko

[1] https://github.com/mozilla/rust/pull/5137

