RFC: Empowered data - unification of code block and object literal (and class)
Herby Vojčík
herby at mailbox.sk
Fri Jan 6 13:58:39 PST 2012
This proposal tries not to add new entities to the language; it only takes
what is already there and reuses it mercilessly. The number of abstractions
is also lowered, since some of them can be implemented with existing ones,
with minimal changes. The result is a more compact, smaller, more uniform
and much more powerful language. ;-) Forward-compatible.
(The whole text is here as HTML:
http://blog.herby.sk/blosxom/Programming/ES-next/empowered-data.html
It is long, but imnsho rich. Please give it a chance if you feel like a few
minutes of reading. It is also included below, but without formatting, to
allow replies.
Thank you, Herby)
Motivation.
Blocks of code enclosed in curly braces were of two natures in ES3 and ES5:
there were code blocks, containing a sequence of instructions to perform,
and there was the object literal, which contained a recipe for building a
structured piece of data.
ES.next introduced some powerful additions to the object literal, introduced
new uses for it (the .{...} and <| {...} operations) and brought in a new
type of {...} block: the class block. The class block borrows many new
features of the object literal, but is itself something in-between.
The feeling that having more types of {...} source code constructs brings
more confusion led to thoughts about their nature and their similarities.
This proposal takes this train of thought to the extreme by proposing only
two types of "brave new curly block" constructs with a strict split of
roles: the imperative one for control flow and the declarative one for data
structures. It builds on the similarities between them and radically
empowers the declarative one in the process, without losing forward
compatibility (by this I mean that old-style constructs work in the new
proposal). Classes become one case of the declarative construct with
class-specific extensions, changing the existing class syntax very slightly
and not losing any semantics.
Curly blocks are similar. Reuse for much power with few features.
A simple code block:
{
x = 4;
receiver.f(x);
function g() {
do { nothing; } while (false);
}
x++;
const prop = 5;
if (x>5) {
process.exit();
}
x--
}
The basic structure of this block is: there are simple statements, which
are terminated by a semicolon (ignoring general semicolon insertion here).
The last simple statement in a code block does not need a semicolon, though
it can have one. These statements include assignment and function call.
Then there are structured statements, which do not need to be ended with a
semicolon (do-while is a nasty exception), since they end with a
sub-block. These are if/else, while loops, function declarations etc.
Not that this is a correct explanation of code block structure (for
example, I ignore the cases where if/else/while/... sub-statements are
simple statements, not sub-blocks; for now, let us assume there are always
sub-blocks). I also intentionally left out variable declarations, since
they are not needed for this topic and would make things a little more
complicated (look at that const line as an assignment ;-) ).
Now, for the simple object literal (with ES.next extensions):
{
x: 4,
g() {
do { nothing; } while (false);
}
y: { foo: "bar" },
get prop () { return 5; }
z: 0
}
The basic structure of this block is: there are simple productions, which
are terminated by a comma. The last simple production in a literal block
does not need a comma, though it can have one. These productions are
property initializations.
Then there are enhanced productions, which do not need to be ended with a
comma, since they end with a sub-block. These are get, set and method
declarations.
Even though the range of possible building elements of the object literal
is smaller than that of the code block, the similarities can be seen pretty
well. There is an undoubted similarity between x = 4; and x: 4, that is not
only syntactic but semantic, too. There is a strong syntactic similarity
between the declaration of function g in the code block and of method g in
the literal. Semantically they are also pretty similar, though not as much
as in the previous case.
The previous examples showed that there are pairs of constructs in the code
block and the object literal that are similar formally (simple-simple,
structured-enhanced), syntactically and functionally. These elements are,
more or less, about the same thing. The difference between them is given by
the context: assignment and function declaration perform actions (they are
imperative), field specification and method specification produce data
(they are declarative).
It can be said, with a lot of grains of salt, that a code block is an
"(ordered) collection of imperative elements, simple, semicolon delimited,
as well as structured, undelimited" and an object literal is an
"(unordered) collection of declarative elements, simple, comma delimited,
as well as structured, undelimited", but matching elements appear in both.
This strawman is about completing this similarity of elements, mainly by
drawing on useful code elements and bringing their counterparts to the data
domain.
1. if & Co. Conditional data structures.
The first idea to borrow from the code domain is the if statement; in this
case it is not a statement, but a data production. You have surely been in
the situation of writing an object literal and wanting a field or two to be
there only when a specific condition is met. The solution nowadays is
either to leave it out and add it afterwards with an if statement in the
code (which is not correct: a conditional data field was wanted, not a
conditional action that assigns to that field), or to put the field in with
the ?: or && operators, so that the field has a null value in the case
where it should not be there at all.
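To make the two workarounds concrete, here is a minimal sketch in plain JavaScript. The function names describeWithIf and describeWithTernary and the retired field are made up for this illustration.

```javascript
// Workaround 1: build the literal, then add the field with a code-level if.
function describeWithIf(name, age) {
  var person = { name: name, age: age };
  if (age > 60) {
    person.retired = true; // a conditional action, not a conditional field
  }
  return person;
}

// Workaround 2: keep the field in the literal and give it a null value
// when it should really be absent.
function describeWithTernary(name, age) {
  return { name: name, retired: age > 60 ? true : null, age: age };
}

var a = describeWithIf("Ann", 30);
var b = describeWithTernary("Ann", 30);
console.log("retired" in a); // false: the field is absent
console.log("retired" in b); // true: the field is present, holding null
```

The second variant is the one that leaks: code that enumerates or tests for the field sees it even though it was never meant to exist.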
Why not have something like this?
{
x: 4,
g() {
do { nothing; } while (false);
}
y: { foo: "bar" },
if (bar > cowboy) { jar: ["whiskey"], wall: ["bottle", "bottle"] }
get prop () { return 5; }
z: 0
}
The data-domain if, in accordance with its code counterpart, is a
structured element that does not need a comma at the end, since it ends
with a sub-block. But the data-if governs a data-block. The curly block
guarded by the data-if should be a normal data-block, which is included if
the condition is met and is not included when it is not.
Of course we can have if/else if/else combination, like in { name: name, if
(age > 60) { retired: true } else if (age < 18) { minor: true } else {
workplace: company } age: age }.
If the if/else could only govern (data) blocks, it would not be a true
counterpart of the code-if. To be true, its syntax should accept both
simple elements ended by a comma and blocks, so this should be possible,
too: { name: name, if (age > 60) retired: true, else if (age < 18)
minor: true, else workplace: company, age: age }. To avoid inconsistencies,
I would allow this syntax as well. Be as true to the code-if as possible.
On one line this may look inferior, but when indented, it can be
{
name: name,
if (age > 60) retired: true,
else if (age < 18) minor: true,
else workplace: company,
age: age
}
or
{
name: name,
if (age > 60)
retired: true,
else if (age < 18)
minor: true,
else
workplace: company,
age: age
}
which is not that bad.
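The intended result of the data-if can be approximated with object spread, which later editions of ECMAScript added. This is only an emulation of the semantics (exactly one alternative is merged in, at the position where the if stands), not the syntax proposed here. The function name describe and its parameters are hypothetical.

```javascript
function describe(name, age, company) {
  return {
    name: name,
    // Exactly one of the three sub-literals is merged in at this position.
    ...(age > 60 ? { retired: true }
      : age < 18 ? { minor: true }
      : { workplace: company }),
    age: age
  };
}

console.log(Object.keys(describe("Ann", 70, "ACME"))); // name, retired, age
console.log(Object.keys(describe("Bob", 12, "ACME"))); // name, minor, age
console.log(Object.keys(describe("Cyd", 40, "ACME"))); // name, workplace, age
```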
Another conditional that can readily be adopted into the data domain is
switch. Its fall-through, implicit-block, break-finished semantics is a bit
unwieldy for a one-liner, like { name: name, switch (role) { case "manager":
canSeeReports: true, case "admin": accessToServerRoom: true, break, case
"developer": accessToLibrary: true, default: needsTask: true } if (boss)
reportsTo: boss }, but again, formatting helps. Frankly, switch is not used
that often in code and it will not be used that much in data either, but
sometimes it is really helpful. For the sake of completeness, it should be
in the data as well.
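To see which fields each role would end up with under the fall-through semantics, the data-switch above can be written out as an ordinary code switch. This is a hypothetical hand translation (the function name roleFields is invented), not part of the proposal.

```javascript
// Hand translation of the data-switch example into a code switch that
// collects the same fields.
function roleFields(role) {
  var fields = {};
  switch (role) {
    case "manager":
      fields.canSeeReports = true;
      // no break: falls through to "admin"
    case "admin":
      fields.accessToServerRoom = true;
      break;
    case "developer":
      fields.accessToLibrary = true;
      // no break: falls through to default
    default:
      fields.needsTask = true;
  }
  return fields;
}

console.log(Object.keys(roleFields("manager")));   // canSeeReports, accessToServerRoom
console.log(Object.keys(roleFields("admin")));     // accessToServerRoom
console.log(Object.keys(roleFields("developer"))); // accessToLibrary, needsTask
console.log(Object.keys(roleFields("intern")));    // needsTask
```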
2. f(x, y). Data-production macros.
Code has the function call amongst its "simple" building blocks. It allows
you to define a little piece of code in one place and issue it later in
many other places, possibly parametrized. Why not have something like that
in data, too? What about these data productions?
{
name("Doe", "John"),
people.counter(),
position: "manager",
salary: 100000
}
{
name("White Daemon", "Jinx Perry"),
dogs.counter(),
race: "cavalier King Charles spaniel",
colors: [ white, brown ]
}
What are name(...) and repository.counter()? Function calls? Not exactly.
In code they would be calls to functions or methods that perform some
imperative sequence of actions. In data, such a production "invokes" a
named data production, which is just like a function or a method, but its
block is declarative. Otherwise, they are defined the same way as functions
or methods, with the exception of the @ character used as a name prefix:
function @name(surname, givenNames) {
fullname: (locale == "hu" || locale == "jp") ?
surname+" "+givenNames : givenNames+" "+surname,
catalogName: surname+", "+givenNames,
givenNames: givenNames,
surname: surname
}
class Repository {
...
@counter() { id: this.maxId++, creationDate: Date.now() }
...
}
I call @ functions and methods data-production macros. They are not in fact
true functions: the semantics of dogs.counter() is to include id:
dogs.maxId++, creationDate: Date.now() in the object literal. The semantics
is this for a reason, so that implementors can optimize it to any level
they see fit. It is "just" an inclusion of a parameterized, ready-made data
production.
On the other hand, the dynamics of true functions / methods and easy
interoperability with code must be present; macros must be as flexible as
code functions are. For this, I'd propose these rules:
- a macro is a first-class object that is accessible by its property name
  (sans @) for reading and writing (if not made const etc.)
- you can create a macro object inline by function@ (args) { macro body }
- typeof macro is "function"; it has no [[Construct]] and the behaviour of
  [[Call]] is deliberately undefined (to allow implementors freedom to use
  it as they see fit)
- issuing a non-macro object with typeof "function" from inside a data
  block as a macro results in throwing a TypeError
As for [[Call]] being implementation-specific, how do you reuse a macro
from inside code? Simply: obj.{ macro(...) }. This is the officially
recommended (and only supported) way of reusing a macro directly from code.
And yes, you can have recursion with macros. You are encouraged to.
One more note: macros can be even more powerful if they cleverly use the
[expr]: expr data production, which is part of the ES.next-enhanced object
literal. The word is "cleverly"; it can be colossally abused. You have been
warned.
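The "inclusion" semantics of macros can be emulated, roughly, with ordinary functions that return a bag of properties which is then copied into the target object. The sketch below makes several assumptions that are not in the proposal: the helper include, the name nameMacro, the explicit locale parameter (a free variable in the original), and the initial maxId of 1 (the Repository body is elided in the original).

```javascript
// Copy own properties (including getters/setters) from `bag` onto `target`;
// stands in for including a data production in a literal.
function include(target, bag) {
  Object.getOwnPropertyNames(bag).forEach(function (key) {
    Object.defineProperty(target, key,
      Object.getOwnPropertyDescriptor(bag, key));
  });
  return target;
}

// Emulates `function @name(surname, givenNames) { ... }`.
function nameMacro(surname, givenNames, locale) {
  return {
    fullname: (locale == "hu" || locale == "jp") ?
      surname + " " + givenNames : givenNames + " " + surname,
    catalogName: surname + ", " + givenNames,
    givenNames: givenNames,
    surname: surname
  };
}

// Emulates the `@counter()` method macro; `this` is the repository.
function Repository() { this.maxId = 1; }
Repository.prototype.counter = function () {
  return { id: this.maxId++, creationDate: Date.now() };
};

var people = new Repository();
var doe = {};
include(doe, nameMacro("Doe", "John", "en"));
include(doe, people.counter());
doe.position = "manager";
doe.salary = 100000;

console.log(doe.fullname, doe.catalogName, doe.id);
```

Unlike a real macro, this emulation builds an intermediate object on every use; the proposal leaves exactly that kind of cost for implementors to optimize away.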
No loops, no variables. "Functional" object production.
There are two paths for continuing the approach above. One is to adopt
everything possible from the code side, however imperative, on the data
side, so that we have variables and loops on the data side as well and can
issue something like this:
{
operation: "square",
min: 1,
max: 10,
for (var i = this.min; i <= this.max; i++) { [i]: i*i }
}
I argue that when you need this kind of imperativity (if is conditional but
descriptive; a macro, even if powerful through recursion, is less
imperative than a loop and a variable), you can just as well do it in plain
code. After all, code is better for imperative things:
var result = {
operation: "square",
min: 1,
max: 10
}
for (var i = result.min; i <= result.max; i++) result.{ [i]: i*i };
I used this mechanical translation rather than result[i] = i*i; for the
sake of generality: you can issue loops in code but still use all the power
of enhanced descriptive blocks through the .{ data-production... }
construct.
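For readers who want to run the loop, here is a rough equivalent in which a helper function stands in for the .{ ... } operation. The helper name extend is an assumption of this sketch, and the computed key [i] uses the form that later editions of the language adopted.

```javascript
// Stand-in for `target.{ ... }`: define the properties of `bag` on
// `target` and return `target`.
function extend(target, bag) {
  Object.getOwnPropertyNames(bag).forEach(function (key) {
    Object.defineProperty(target, key,
      Object.getOwnPropertyDescriptor(bag, key));
  });
  return target;
}

var result = {
  operation: "square",
  min: 1,
  max: 10
};
// The loop lives in code; each pass contributes one small data production.
for (var i = result.min; i <= result.max; i++) {
  extend(result, { [i]: i * i });
}

console.log(result[3], result[10]); // 9 100
```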
If "side-effect" imperative things like variables and, consequently, loops
were kept out of data-production blocks (and nothing else that is
imperative in nature were added later; and all things that would be added
were "side-effect-free" and non-imperative), we would end up with a thing
I'd call "functional data production". I think it is a desirable trait of
data production.
By "functional" I mean here the trait that is inherent to code in functional
languages: if issued with parameters, it produces a value from them, but
this value production has no side effects. The most prominent of these side
effects is setting the value of a variable. One may also call this
"stateless". Data production should be stateless; imperative code is the
one that should be stateful.
Being stateless (of course, data production is not stateless in the strict
sense: the values are computed by stateful code expressions, and [expr]:
expr can bring expressions into keys as well; but avoiding variables and
loops still makes data production less stateful) allows doing things that
are typical for functional code (mainly various behind-the-scenes
optimizations, but also some proofs of correctness) for data-production
blocks. Since data production is a descriptive thing, one almost naturally
expects it to be a sort of "producing a value" instead of "starting a
process of manufacturing a value". Though I cannot give a convincing case
for this, I believe it is a Good Thing (tm) to let data production be
stateless. In the long run it will bear fruit.
Parsing: ambiguities; syntax as opt-in philosophy.
This and lots of similar extensions are sooner or later called into
question by parsing problems. For example:
{
if (typeof window === "undefined") { server: true }
else { browser: true }
}
is interpreted as code, with two expression statements labeled "server" and
"browser", both statements being "true", if encountered in a code context.
When parsed in an expression context (after an assignment "=" or after the
"(" of a function call), it parses as a data production.
The condensed example of this phenomenon is:
{}.f()
If encountered in a code context, it is a syntax error, because {} is a
code block and "." is an unexpected token. If encountered in a data
context, it is the value of calling the f method of the {} object.
This untreatable ambiguity may seem to doom any proposal like this one. But
it does not. Even plain {} does not work, and we have got used to putting
parentheses around it whenever it appears at the beginning of an expression
statement (it is not so common to start an expression statement with an
object literal, but when it happens, a dot almost always follows and
produces an early syntax error). So this is an annoying but already known
phenomenon, and we have learned to live with it. The bottom line is that it
is orthogonal to this proposal.
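The statement-versus-expression ambiguity described here can be observed directly in existing JavaScript. The following sketch feeds the source strings to the parser through new Function and eval; the variable names are made up for the illustration.

```javascript
// Statement position: "{}" is an empty block, so ".f()" is a syntax error.
var statementError = null;
try {
  new Function("{}.f()");
} catch (e) {
  statementError = e;
}
console.log(statementError instanceof SyntaxError); // true

// Expression position (forced by parentheses): "{}" is an object literal.
var asExpression = new Function("return ({}).toString()")();
console.log(asExpression); // "[object Object]"

// The if/else example in statement position: "server:" and "browser:" are
// labels, and each labeled statement is just the expression `true`.
var completion = eval(
  '{ if (typeof window === "undefined") { server: true }' +
  ' else { browser: true } }'
);
console.log(completion); // true, whichever branch is taken
```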
One possible parsing problem is the combination of method declaration
(f(args) {body}) with macro invocation (f(args)). But hopefully there will
not be a problem, because the latter needs a comma delimiter unless it is
last in the block.
One paragraph on the "syntax as opt-in" mindset, which seems to be part of
ES.next. Conditionals and/or macro calls inside a data-production block are
to be treated as ES.next syntax and, consequently, have to be opted into.
The same is the case for function@ and @-prefixed function and method
names. The question of the scope of the opt-in is still debated, but
overall, this proposal seems to favor a program-wide opt-in. It needs
review by others to see the full consequences for "syntax as opt-in" if
this proposal is considered. It brings some (non-breaking) changes to basic
ECMAScript matter, that is, to the object literal. Also, if there were
parsing guesses based on a block containing if, switch or a function call,
they are invalidated.
3. Class is glorified declaration of prototype.
No offense meant. One of the motivations behind all this was the fact that
the class block was neither imperative nor declarative but (at least
syntactically) something of both, together with the wish to have only two
kinds of {...}: imperative (with all its consequences and common
functionality all over) and declarative (ditto). And as I see it (I hope I
am not alone), class is a way to describe the prototype (and the
constructor at the same time, but that is already nicely integrated). So
taking the example from the class proposal (comments shortened),
class Monster {
// The contextual keyword "constructor" ... defines the body
// of the class’s constructor function.
constructor(name, health) {
public name = name;
private health = health;
}
// An identifier followed by an argument list and body defines a method.
attack(target) {
log('The monster attacks ' + target);
}
// The contextual keyword "get" followed by an identifier and
// a curly body defines a getter in the same way that "get"
// defines one in an object literal.
get isAlive() {
return private(this).health > 0;
}
// Likewise, "set" can be used to define setters.
set health(value) {
if (value < 0) {
throw new Error('Health must be non-negative.')
}
private(this).health = value
}
// After a "public" modifier, an identifier ... declares a prototype
// property and initializes it
public numAttacks = 0;
// After a "public" modifier, the keyword "const" followed by an
// identifier and an initializer declares a constant prototype property.
public const attackMessage = 'The monster hits you!';
}
we can embrace "just describe the prototype object" and do this instead:
class Monster {
// A method defined with name "constructor" is processed specially:
// it _has_ [[Construct]] and is made a constructor of this class.
// If not explicitly defined, an empty one is provided.
constructor(name, health) {
public name = name;
private health = health;
}
// A method, as in every object literal.
attack(target) {
log('The monster attacks ' + target);
}
// A getter, as in every object literal.
get isAlive() {
return private(this).health > 0;
}
// A setter, as in every object literal.
set health(value) {
if (value < 0) {
throw new Error('Health must be non-negative.')
}
private(this).health = value
}
// A property definition, as in every object literal.
numAttacks: 0,
// A "const" property definition, as in every object literal.
// (syntax of const property production is not yet agreed upon,
// just use any one which is selected in the end)
attackMessage := 'The monster hits you!'
}
Apart from the different comments, which just show that the different
implementation provides semantically the same result, the class code is
nearly identical. Gone is the (superfluous) public keyword in the context
of the prototype; I'd say it could go from the constructor method as well
(this.name = name; works fine and does not create any exceptional
situations for constructor/non-constructor). If you look at it, the class
block really only did (declaratively) describe the prototype. So let us
make class Clazz [extends Superclazz] an operator on the generic
data-production block, which creates the class machinery from it and
returns the constructor function. It can be de facto desugared to something
like:
var _proto = (Superclazz || Object).prototype <| {
... the class body ...
};
if (!_proto.hasOwnProperty("constructor")) { _proto.{ constructor() {} } }
var _ctr = _proto.constructor;
__allowConstruct__(_ctr);
_ctr.prototype = _proto;
return _ctr;
except that the effect of __allowConstruct__ will be inherent, not applied
afterwards.
Pros are clearly visible: fewer kinds of abstraction, and no need to manage
making features in class and object literal work consistently (a class
declaration is an object literal, so everything works automatically).
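A runnable approximation of this desugaring can be written with Object.create in place of the <| operator. The helper name defineClass is hypothetical, and two simplifications are assumed: the constructor is written as an ordinary function expression (so it already has [[Construct]] and no __allowConstruct__ step is needed), and the private health of the example becomes a plain _health property. The sketch tests for an own constructor property, because one inherited from Object.prototype is always present.

```javascript
// Hypothetical helper standing in for
// `class Clazz [extends Superclazz] { ... the class body ... }`.
function defineClass(Superclazz, body) {
  // (Superclazz || Object).prototype <| { ... the class body ... }
  var proto = Object.create((Superclazz || Object).prototype);
  Object.getOwnPropertyNames(body).forEach(function (key) {
    Object.defineProperty(proto, key,
      Object.getOwnPropertyDescriptor(body, key));
  });
  // Provide an empty constructor if the body did not define its own.
  if (!Object.prototype.hasOwnProperty.call(proto, "constructor")) {
    proto.constructor = function () {};
  }
  var ctr = proto.constructor;
  ctr.prototype = proto;
  return ctr;
}

var Monster = defineClass(null, {
  constructor: function (name, health) {
    this.name = name;
    this._health = health;
  },
  attack: function (target) { return "The monster attacks " + target; },
  get isAlive() { return this._health > 0; },
  numAttacks: 0
});

var m = new Monster("Grue", 10);
console.log(m instanceof Monster, m.isAlive, m.numAttacks);
console.log(m.attack("you"));
```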
There are some open issues, definitely. The class proposal continues with
this:
class Monster {
// "static" places the property on the constructor.
static allMonsters = [];
// "public" declares on the prototype.
public numAttacks = 0;
// Although "public" is not required for prototype methods,
// "static" is required for constructor methods
static numMonsters() { return Monster.allMonsters.length; }
}
which can be straightforwardly rewritten as
class Monster {
// "static" places the property on the constructor.
static allMonsters: [],
// plain declares on the prototype.
numAttacks: 0,
// "static" is required for constructor methods
static numMonsters() { return Monster.allMonsters.length; }
}
and yes, the object literal needs to know the static keyword if used in the
context of the class operator. Yes, an exception, but a pretty clear one.
We can live with it. The question arises: "What about static in macros?",
which is not really easy to answer. One possibility may be to allow it (and
any use of static) and throw an error if it does not (directly or by
inclusion) happen inside the class operator.
Then there is private: one possibility is to treat it similarly to the
previous static solution: allow it to be written anywhere, but throw an
error if it is not (directly or by inclusion) inside the class operator. Or
it can be taken more broadly, to which a special section is dedicated
below.
To end this section more positively: if you define the class block to be a
data-production block, you make the language more cohesive, with features
reused instead of coordinated, which should be a plus. Adoption should also
be less fearful, because you do not need any "class magic"; you are simply
"declaring the structure of a prototype" (while constructor and static (and
also private, if it remains a class specialty) are taken care of by the
class operator for you).
Classes + macros = free trait-based composition.
Obvious sexy freebie. Put traits into macros (you can create middlemen with
other macros that import and glue some of them together), and then use them
in class production.
function @Pointish() {
get r() { return Math.sqrt(this.x*this.x + this.y*this.y); }
get phi() { return ... this.x }
set r(newR) { ... }
set phi(newPhi) { ... }
}
function @Circlish() {
get area() { return Math.PI*this.radius*this.radius; }
get diameter() { return 2*this.radius; }
get circumference() { return 2*Math.PI*this.radius; }
}
function @Translatable() {
translate(dx, dy) { this.x += dx; this.y += dy; }
}
function @Rotatable() {
rotate(angle) { this.phi += angle; }
grow(quotient) { this.r *= quotient; }
}
class BasicPoint {
constructor(x, y) {
this.x = x;
this.y = y;
}
Pointish()
}
class Vector extends BasicPoint {
constructor(x, y) { super(x, y); }
Rotatable(),
Translatable()
}
class Circle extends BasicPoint {
constructor(x, y, radius) {
super(x, y);
this.radius = radius;
}
Circlish(),
Translatable(),
grow(quotient) { this.radius *= quotient; }
}
//etc
And you can parametrize them, if you see the use (for example with names of
properties to use for x, y, radius, ... while having defaults).
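The composition itself can be tried out today by letting each trait be a function that returns an object literal and copying the results onto a prototype. This is an emulation under assumptions: the helper mixin is invented here, and Circle is written as a standalone constructor function (without the BasicPoint superclass) to keep the sketch short.

```javascript
// Copy own properties, including getters and setters, from each bag.
function mixin(target) {
  for (var i = 1; i < arguments.length; i++) {
    var bag = arguments[i];
    Object.getOwnPropertyNames(bag).forEach(function (key) {
      Object.defineProperty(target, key,
        Object.getOwnPropertyDescriptor(bag, key));
    });
  }
  return target;
}

// Traits as plain functions returning data; stand-ins for the @ macros.
function Circlish() {
  return {
    get area() { return Math.PI * this.radius * this.radius; },
    get diameter() { return 2 * this.radius; },
    get circumference() { return 2 * Math.PI * this.radius; }
  };
}
function Translatable() {
  return {
    translate: function (dx, dy) { this.x += dx; this.y += dy; }
  };
}

function Circle(x, y, radius) {
  this.x = x;
  this.y = y;
  this.radius = radius;
}
mixin(Circle.prototype, Circlish(), Translatable(), {
  grow: function (quotient) { this.radius *= quotient; }
});

var c = new Circle(0, 0, 2);
c.translate(1, 1);
c.grow(2);
console.log(c.x, c.y, c.radius, c.diameter); // 1 1 4 8
```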
4. private reuse
The class specificity of private raises issues, as could be seen above.
Also, private seems to be a nice feature which can be profitably used
anywhere, not just in classes.
The nature of private is that it is lexically bound. It only has reasonable
semantics within a "scope", and it is bound to that scope. If it were made
available to any function or object literal, what would the scope be? What
would private(this) mean?
For now, the scope is always "the enclosing class block", which makes
private class-specific. I think it should be possible to allow its broader
use.
It is true that other constructor functions (beyond those created by class)
do not need private at all; they have "lexical privates". But there is no
reason not to let them use the keyword if it is there, if only for a
uniform look of the code. This can help maintainability and refactoring.
Let us allow any constructor function to define itself as a "private scope"
by issuing private; at the beginning of the block. Like this:
function Foo (bar) {
private;
private bar = bar;
this.built = this.maintained = new Date;
this.upgradeBar = function (equipment) {
private(this).bar.upgrade(equipment);
this.maintained = new Date;
};
}
This function uses its own private scope. If it appeared without private;
at the top level, an error should be issued, because private is used
outside of any scope. On the other hand, if it was without private; but,
for example, inside the class operator, it would use the enclosing private
scope; in it, private(this) is legal, but private bar = bar; is illegal.
The module is probably an imperative block, too. It may be worth
considering whether it, too, could issue private;, thereby defining itself
as a private scope and at the same time defining the private storage block.
Then anything defined inside this module could use private(this) to get to
the module's private data.
Back to the data domain: any object literal (which can now define methods,
and could put functions in properties before) may want to use private data.
Of course you can again use lexical scope to get "lexical privates". But,
as with constructor functions, why not let the object, if it wishes to, use
private in the same manner? Drawing from the code domain, let any literal
use private, at the beginning of the block to make it a private scope and
to define the private store as well. So you can do
var counter = {
private,
get count() { return private(this).count; }
increment() { private(this).count++; }
}
but count is not initialized. Drawing again from the code form private foo =
"bar"; used in constructors (and when creating an inline object, the
literal that builds it is sort of a constructor), we get:
var counter = {
private,
private count: 0,
get count() { return private(this).count; }
increment() { private(this).count++; }
}
That's it. Broader usage of private raises lots of open questions, see them
below.
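The behaviour the counter literal is after can be emulated with a WeakMap keyed by the object. This is only a stand-in chosen for this sketch, not the mechanism of the private proposal; the names store and priv are invented, with priv(obj) playing the role of private(obj).

```javascript
var counter = (function () {
  var store = new WeakMap();               // plays the role of the private scope
  function priv(obj) { return store.get(obj); }

  var self = {
    get count() { return priv(this).count; },
    increment: function () { priv(this).count++; }
  };
  store.set(self, { count: 0 });           // corresponds to `private count: 0`
  return self;
}());

counter.increment();
counter.increment();
console.log(counter.count); // 2
```

The public count accessor and the private count value do not clash, because they live in different places: one on the object, the other in its store.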
Known problems, open questions.
What if I want to include a trait into a class or sub-data into an object,
but do not want to call a macro, which must be evaluated? For performance
reasons, there should be some kind of direct import there.
Do not worry and use parameterless macros. Premature optimization is the
root of all evil. Leave the evil to the compiler. If your macro does not
rely on side effects, its invocation can be considerably optimized by the
ECMAScript engine itself, down to one if from a PIC followed by inlining.
If you make the macro const, even that if can probably be eliminated.
I understand use of private(this) in module, but how does it initialize its
private data?
See "What if such nested class's instances need their private store as
well?" below.
Drawing parallels from constructor, wouldn't presence of private count: 0 be
enough to make the literal a private scope?
Sometimes private data is needed, but need not be initialized. It may be
possible to make private, optional if there are some private properties set
up, but that may lead to forgetting it in case there are none. I'd rather
see it required, but see also the next two items.
Does class block need private, at the beginning?
This is in fact, a very good question. To complicate things, private, at the
beginning of a class block would mean there is a private store for the
prototype, not for the instances. Instances' stores are governed inside the
constructor by its private;. I see little need for a prototype to have its
own private store. It can be said that if you include private,, the private
store for the prototype will be created and it is the lexical private scope
as for any other object literal. Also see "What if such nested class's
instances need their private store as well?" below, which may obsolete this
question altogether.
Ok, does constructor method inside class block need private; at the
beginning?
This is in fact, a very good question. ;-) First, let's say the
constructor's privateness should be promoted to the whole enclosing block
(well, constructor has some personal advantages). Secondly, it seems
constructor of class block has implicit private; since class is seen as
private scope, always. But it is intriguing idea that the privateness of
class is not automatic, for example to be able to create nested classes that
do need to access private stores of their owners. Food for thought for
private proposers and implementers (and users, of course).
But for the majority of cases, classes will be private;. By default, it
should be on.
There may be an opt-out, for example !private; or delete private;.
What if such nested class's instances need their private store as well?
Very good question. It seems that "being a scope" and "having a store" are
not the same thing. I do not see any solution other than creating the
private store at runtime whenever private(foo) is encountered, it wants to
write, and the store does not exist yet (there is no need to create it just
to return undefined from a read).
What if there is private, in class block, but also an implicit private; in
constructor?
See next.
What if private; for constructors was not implicit and there was private, in
class block, but also private; in constructor?
Well, since the constructor's one is promoted up, they conflict. I'd say
they should be merged in this case, so that they share the private name for
the store, but both instances and prototype have a private store.
Does the class itself have private store? That is, is static private
allowed?
Functions do not have private store, as yet. Only objects created from
literals with private,, modules with private; and instances of classes with
private; (explicit or implicit) in constructor have it. I'd say no (use
prototype's private store). But again, see "What if such nested class's
instances need their private store as well?" above. It suggests simple
solution that invalidates all fears of this kind (and allows static private
with no problems).
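The on-demand private store suggested in the answer to "What if such nested class's instances need their private store as well?" might behave like the following sketch. It is an assumption-laden emulation: makePrivateScope and its read, write and hasStore functions are invented names, and a WeakMap stands in for whatever the engine would really do.

```javascript
// One private scope; stores for individual objects are created lazily.
function makePrivateScope() {
  var stores = new WeakMap();
  return {
    // Reading never creates a store; a missing store reads as undefined.
    read: function (obj, key) {
      var s = stores.get(obj);
      return s ? s[key] : undefined;
    },
    // Writing creates the store on first use.
    write: function (obj, key, value) {
      var s = stores.get(obj);
      if (!s) {
        s = Object.create(null);
        stores.set(obj, s);
      }
      s[key] = value;
    },
    hasStore: function (obj) { return stores.has(obj); }
  };
}

var scope = makePrivateScope();
var instance = {};
console.log(scope.read(instance, "health")); // undefined
console.log(scope.hasStore(instance));       // false: the read created nothing
scope.write(instance, "health", 10);
console.log(scope.read(instance, "health")); // 10
```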
Bonus: arrays and generators.
This is just a bonus idea, which sprang from including code-like features
in data. The array literal has not been enhanced in any way yet. But that
is natural: in an array literal you rarely need to have some elements
optional and some not. So no ifs here. As for the counterpart of a macro,
let's postpone it for a while.
Arrays have a "listish", "linear" feel. So do loops. ;-) But loops are not
the right addition to data production: they use variables and are very
code-like. But there is another element which is "listish" and "linear": a
generator. So, if you have defined some generator with
function* fib (upTo) {
...
}
why not include it when building an array literal, like:
[ "fibonacci", 10, *fib(10) ]
Not as sexy as traits through macros, but occasionally usable. Especially
in the bare [ *gen(args) ] form, to have an intrinsic toArray. And
generators are essentially macros of the array world, with a grain of salt.
And I think the [ ..., *foo, ... ] syntax could work for any iterable
thing. Why only generators?
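The effect can be tried with the spread element that later editions of ECMAScript adopted, written ... rather than the * proposed here. The generator body below is a hypothetical filling of the elided one: fib(upTo) is assumed to yield the Fibonacci numbers that do not exceed upTo.

```javascript
// Assumed body; the original leaves it elided.
function* fib(upTo) {
  var a = 0, b = 1;
  while (a <= upTo) {
    yield a;
    var next = a + b;
    a = b;
    b = next;
  }
}

// Spread plays the role of the proposed *fib(10) element.
var list = ["fibonacci", 10, ...fib(10)];
console.log(list); // ["fibonacci", 10, 0, 1, 1, 2, 3, 5, 8]

// Spread accepts any iterable, not only generators.
var letters = [...new Set(["a", "b", "a"])];
console.log(letters); // ["a", "b"]
```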
Thanks for patience, Herby