Bytecode

K. Gadd kg at luminance.org
Thu May 15 20:14:49 PDT 2014


It's my understanding that the vast majority of the CLR's dynamic
language support lives at the runtime level, not the bytecode level.
The bytecode is strongly typed (with lots of instructions and
mechanisms for boxing, unboxing and type casts), and dynamic support
comes from the Dynamic Language Runtime (DLR), a library layer that
sits on top of the CLR. The DLR provides machinery for things like
late binding and inline caches.
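
To make "late binding plus an inline cache" concrete, here is a rough
hand-rolled sketch of a one-entry cache. It is not how the DLR is
implemented (as far as I know the DLR compiles its binding rules into
delegates rather than calling through reflection); the names
InlineCacheSketch, Invoke and Misses are made up for illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical one-entry (monomorphic) inline cache for calling a
// zero-argument method by name on an arbitrary receiver.
public sealed class InlineCacheSketch {
    private readonly string methodName;
    private Type cachedType;          // guard: receiver type seen last time
    private MethodInfo cachedMethod;  // target resolved for that type

    public int Misses { get; private set; }

    public InlineCacheSketch (string methodName) {
        this.methodName = methodName;
    }

    public object Invoke (object receiver) {
        Type type = receiver.GetType();
        if (type != cachedType) {
            // Slow path: late-bind by name, then remember the result.
            MethodInfo resolved = type.GetMethod(methodName, Type.EmptyTypes);
            if (resolved == null)
                throw new MissingMethodException(type.FullName, methodName);
            cachedMethod = resolved;
            cachedType = type;
            Misses++;
        }
        // Fast path: the guard matched, so reuse the cached target.
        return cachedMethod.Invoke(receiver, null);
    }
}

public static class InlineCacheDemo {
    public static void Main () {
        var site = new InlineCacheSketch("ToString");
        Console.WriteLine(site.Invoke(1));      // first call: binds for int
        Console.WriteLine(site.Invoke(2));      // same type: cache hit
        Console.WriteLine(site.Invoke("text")); // new type: rebinds
        Console.WriteLine(site.Misses);         // number of slow-path binds
    }
}
```

The point is only the shape: a cheap type guard in front of an
expensive name lookup, stored per call site.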

For this C# snippet:
```
using System;

public static class Program {
    public static void Main (string[] args) {
        dynamic one = (Func<int>)(
            () => 1
        );
        dynamic doubleInt = (Func<int, int>)(
            (int x) => x * 2
        );

        Console.WriteLine("{0} {1}", one(), doubleInt(1));
    }
}
```

The desugared C# looks like this (well, decompiled from IL; the
arg_XXX_X variables come from the decompiler and are not actually in
the IL):

```
public static void Main(string[] args)
{
    object one = () => 1;
    object doubleInt = (int x) => x * 2;
    if (Program.<Main>o__SiteContainer0.<>p__Site1 == null)
    {
        Program.<Main>o__SiteContainer0.<>p__Site1 =
            CallSite<Action<CallSite, Type, string, object, object>>.Create(
                Binder.InvokeMember(CSharpBinderFlags.ResultDiscarded, "WriteLine", null, typeof(Program),
                    new CSharpArgumentInfo[]
                    {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.UseCompileTimeType | CSharpArgumentInfoFlags.IsStaticType, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.UseCompileTimeType | CSharpArgumentInfoFlags.Constant, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
                    }));
    }
    Action<CallSite, Type, string, object, object> arg_15C_0 =
        Program.<Main>o__SiteContainer0.<>p__Site1.Target;
    CallSite arg_15C_1 = Program.<Main>o__SiteContainer0.<>p__Site1;
    Type arg_15C_2 = typeof(Console);
    string arg_15C_3 = "{0} {1}";
    if (Program.<Main>o__SiteContainer0.<>p__Site2 == null)
    {
        Program.<Main>o__SiteContainer0.<>p__Site2 =
            CallSite<Func<CallSite, object, object>>.Create(
                Binder.Invoke(CSharpBinderFlags.None, typeof(Program),
                    new CSharpArgumentInfo[]
                    {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
                    }));
    }
    object arg_15C_4 =
        Program.<Main>o__SiteContainer0.<>p__Site2.Target(
            Program.<Main>o__SiteContainer0.<>p__Site2, one);
    if (Program.<Main>o__SiteContainer0.<>p__Site3 == null)
    {
        Program.<Main>o__SiteContainer0.<>p__Site3 =
            CallSite<Func<CallSite, object, int, object>>.Create(
                Binder.Invoke(CSharpBinderFlags.None, typeof(Program),
                    new CSharpArgumentInfo[]
                    {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.UseCompileTimeType | CSharpArgumentInfoFlags.Constant, null)
                    }));
    }
    arg_15C_0(arg_15C_1, arg_15C_2, arg_15C_3, arg_15C_4,
        Program.<Main>o__SiteContainer0.<>p__Site3.Target(
            Program.<Main>o__SiteContainer0.<>p__Site3, doubleInt, 1));
}
```

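So you can see all the inline cache and binding machinery at work
there: each dynamic operation gets its own lazily created CallSite,
and the call itself goes through that site's Target delegate. As far
as I know there were zero bytecode changes to introduce this feature;
I certainly didn't have to implement any special bytecodes to support
'dynamic' in JSIL.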
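
Since it is all ordinary library calls, you can write the same pattern
by hand. A minimal sketch, assuming a reference to the Microsoft.CSharp
assembly; the class and method names (CallSiteSketch,
InvokeDynamically) are mine, and the field plays the role of the
compiler-generated <>p__Site fields:

```csharp
using System;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

public static class CallSiteSketch {
    // Created lazily on first use, like the compiler-generated sites.
    private static CallSite<Func<CallSite, object, int, object>> site;

    // Roughly what the compiler emits for `target(arg)` when `target`
    // is dynamic and `arg` is a statically typed int.
    public static object InvokeDynamically (object target, int arg) {
        if (site == null) {
            site = CallSite<Func<CallSite, object, int, object>>.Create(
                Binder.Invoke(
                    CSharpBinderFlags.None,
                    typeof(CallSiteSketch),
                    new[] {
                        // The callee: no static type information.
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
                        // The int argument: bind using its compile-time type.
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.UseCompileTimeType, null)
                    }
                )
            );
        }
        // Target runs the cached binding rules, rebinding on a miss.
        return site.Target(site, target, arg);
    }

    public static void Main () {
        object doubleInt = (Func<int, int>)(x => x * 2);
        Console.WriteLine(InvokeDynamically(doubleInt, 21));
    }
}
```

Nothing here needs compiler or bytecode support beyond generics and
delegates, which is the point.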

There are certainly some aspects of the CLR bytecode that make dynamic
languages easier or harder to build on top of it, though; I just don't
know what they are. I do know that a lot of the pain was reduced by
the addition of 'lightweight code generation' (LCG), which allows
generating and jitting a single method on the fly and attaching it to
a given context (like a type or module) so that it can access private
members. Dynamic languages on the CLR now use this heavily.
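
For a feel of what LCG looks like in practice, here is a small sketch
using System.Reflection.Emit.DynamicMethod. The Counter class, its
private field and the LcgSketch helper are invented for the example;
the part that matters is passing an owner type to the DynamicMethod
constructor, which is what lets the generated code read the private
field.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical type with a private field we want to reach.
public class Counter {
    private int count = 41;
}

public static class LcgSketch {
    public static Func<Counter, int> BuildPrivateReader () {
        FieldInfo field = typeof(Counter).GetField(
            "count", BindingFlags.Instance | BindingFlags.NonPublic
        );

        // Associating the method with typeof(Counter) gives the
        // generated IL the same member access as Counter's own code.
        var method = new DynamicMethod(
            "ReadCount", typeof(int), new[] { typeof(Counter) }, typeof(Counter)
        );

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);      // push the Counter argument
        il.Emit(OpCodes.Ldfld, field); // load its private field
        il.Emit(OpCodes.Ret);          // return the loaded value

        // The method body is jitted when the delegate is first invoked.
        return (Func<Counter, int>)method.CreateDelegate(typeof(Func<Counter, int>));
    }

    public static void Main () {
        Func<Counter, int> read = BuildPrivateReader();
        Console.WriteLine(read(new Counter()));
    }
}
```

Only that one method gets generated and jitted; no assembly or type
has to be emitted around it.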

On Wed, May 14, 2014 at 10:54 PM, C. Scott Ananian
<ecmascript at cscott.net> wrote:
> On Wed, May 14, 2014 at 9:12 PM, Axel Rauschmayer <axel at rauschma.de> wrote:
>
>> It'd be great if there was material on the limits of the JVM and the CLR.
>> AFAICT these are the only virtual machines that are trying to be universal
>> (run both static and dynamic languages well).
>
>
> Well, from experience, the JVM is/was handicapped by some incidental
> decisions in its original standard library[*] that have a large adverse
> impact on startup time.  This has restricted the 'usefulness' of the JVM
> from its inception.  There are projects to re-engineer the standard library
> around this, but they have been slow (and are not yet complete)[**].
> Similarly, the support for dynamic languages is fairly recent (JDK 7,
> JavaScript implementation using these features in JDK 8), so it's a bit
> early to know how that will play out in terms of adoption and practical use.
>
> So I'm not sure how much you're going to learn from the JVM, other than "no
> matter how good/bad your bytecode is, other factors may dominate".  That is:
> I would doubt most conclusions about bytecodes drawn from the example of the
> JVM, since I don't believe the bytecode design was a first order effect on
> its trajectory to date.
>
> That said, my favorite bytecode anecdote from the JVM is that the amount of
> space wasted in class files by renaming the language from 'oak' to 'java'
> was far greater than the amount of space saved by adding a 'jsr' instruction
> to bytecode (which was intended to allow finally clauses without code
> duplication).  However, the jsr instruction was a disaster: it was responsible
> for the first security exploits in the JVM's early days, greatly complicated
> code verification (inspiring a bunch of new academic research! which is
> never something you want in a production language design), and slowed down
> execution by disallowing efficient bytecode verification techniques.  It was
> ultimately deprecated in Java 6.
>
> So: if you want small bytecode files, sometimes it's better just to rename
> your language!
>   --scott (a recovering Java compiler engineer)
>
> [*] My fuzzy recollection of one such: The `java.lang.System` class included
> the stdout/stdin/stderr fields `System.out`, `System.in`, `System.err` which
> as bytestreams needed to deal with the charset of the I/O streams (since
> Strings were natively UTF-16) and so ended up pulling in a huge list of
> supported charsets and charset conversion classes, totaling many hundreds of
> kilobytes of bytecode, none of which could be statically prebuilt because
> selecting the proper charset depended on the user's environment variable
> settings at runtime.  The amount of ancillary code pulled in by the charset
> conversion machinery included `System.properties` (to read that environment
> variable), which was a `Map` subclass, so pulled in most of the Collections
> API, etc, etc.
>
> [**] See http://openjdk.java.net/projects/jigsaw/ and the blog entries
> linked there.
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>

