[rust-dev] linkage

Graydon Hoare graydon at mozilla.com
Thu Jul 29 18:57:04 PDT 2010


Hi,

I wanted to go over some of the options facing us regarding the linkage 
model in rust, and get feedback on people's different preferences. It 
turns out this kind of thing affects a lot of code, in somewhat subtle ways.

How Things Are Presently in rustboot:
(warning: kinda surprising)

   - All code is compiled PIC.
     - There are no relocs.
     - Actual-pointers are derived programmatically.

   - Object-file symbols are used only minimally.
     - Symbols from libc.so and librustrt.so use normal
       C ld.so-style dynamic linkage.
     - On some platforms we emit local mangled symbols for rust items
       and glue functions within a crate. These are just to help with
       debugging. They serve no functional role.
     - 'rust_crate' is the only extern symbol a crate exposes to ld.so.
     - 'rust_crate' points to a descriptor that points
       to the DWARF sections .debug_abbrev and .debug_info.

   - All further linkage is driven lazily by thunks inside a crate
     (a rough sketch of the pattern follows this list).
     - Calling a native C library thunks to librustrt.so, which calls
       dlopen() and dlsym() on the C library and caches the result.
     - Calling a 'use'd function in a rust library thunks to
       librustrt.so, which calls dlopen() on the rust library, grabs
       'rust_crate', crawls the DWARF to navigate the module structure,
       and caches the result. Lookup is scoped to the crate's module
       namespace, with the import prefix stripped off.

   - Crate dependencies are acyclic: a crate can't depend on itself,
     directly or through a cycle of other crates.

   - We present an "SPI" for embeddings to use, on the theory that an
     embedding might wish to load a rust crate in-process and spin up
     a thread domain to interact with the environment. We funnel all
     environment-interactions (logging, malloc/free, signals etc.)
     through this SPI. Or we should. That's the aim anyway. At the moment
     we don't always succeed.
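
To make the thunk pattern above a bit more concrete, the native-call
case amounts to a cached dlopen()/dlsym() lookup, roughly like the
following. This is a simplified C sketch of the pattern, not the actual
librustrt code; the names and flags are made up, and in the rust-crate
case the dlsym() step is replaced by the DWARF crawl described above.

    #include <dlfcn.h>
    #include <stddef.h>

    /* Hypothetical cache entry; the real runtime keys its cache on the
       (library, symbol) pair and keeps entries for the crate's lifetime. */
    struct cached_sym {
        const char *lib;    /* e.g. "libm.so" for a native call */
        const char *sym;    /* e.g. "sin" */
        void *handle;       /* result of dlopen(), NULL until first use */
        void *addr;         /* result of dlsym(), NULL until first use */
    };

    /* The slow path a thunk falls into the first time it fires; every
       later call through the same thunk returns the cached address. */
    static void *lazy_resolve(struct cached_sym *c)
    {
        if (c->addr)
            return c->addr;
        if (!c->handle)
            c->handle = dlopen(c->lib, RTLD_NOW | RTLD_LOCAL);
        if (c->handle)
            c->addr = dlsym(c->handle, c->sym);
        return c->addr;
    }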

This scheme sounds a bit ad-hoc, but it's based on a few specific goals 
and observations:

   - The ability to refcount a crate (and everything we pull out of it)
     such that it can be unloaded and a replacement reloaded at runtime.
     Hot-reloading, in other words. Also REPL-ing and such.

   - The ability to get type information -- including type abbreviations
     imported at compile time -- out of a crate's DWARF without separate
     'include' files or anything. We pull type info out of the same DWARF
     we drive the linkage itself off. Figured there was no point
     duplicating the information.

   - Crates are acyclic *anyways* because I didn't want to permit
     recursive type definitions crossing crate boundaries; module
     systems that support separate compilation of mutually-recursive
     types exist but they're pretty exotic and involve a lot of
     machinery.

Now, personally I like and am still interested in some aspects of this 
model, but I realize there are a lot of pressures working against it and 
it might be time to revisit. It has shortcomings and the goals might be 
achieved differently. Here are some issues:

   - This scheme means that the crate structure is the last word on the
     runtime linkage boundaries. If you realize you actually want two
     crates combined into a single loadable unit, you can't exactly
     statically link or combine LLVM .bc files or anything. This is
     solvable to some extent if you're combining a rust crate with
     another rust crate (just include one .rc file in the other, should
     work plus or minus some plumbing) but it won't get you far if you
     want to inline a bunch of C code into rust by mixing LLVM .bc files.

   - "Always having DWARF" is a nice side-effect of the existing scheme,
     but the Visual Studio debugger doesn't speak DWARF. You have to use
     gdb (or the forthcoming LLDB I guess) on win32. So not necessarily
     as big a win as one might like. Same goes for win32 profilers and
     such. At some point someone's going to want to be spitting out PDB.

   - The crate refcounting and symbol-cache is an additional cost.
     Probably not a huge one, but costs add up.

   - DWARF doesn't generally provide hashed access to symbols; while
     it *does* provide hierarchical name crawling, it's possible you'll
     wind up with a linear search in a substantially-wide namespace at
     some point during a symbol import. System linkers tend to hash
     symbols or even pre-assign ordinals, and they use IAT/IDT or PLT
     tables, which are smaller and probably faster than our thunks.

   - DWARF is a little complex to parse at runtime. Currently the runtime
     library has a partial DWARF reader and I'm less certain than I was
     that "any equivalent encoding of the runtime type signatures would
     be equally cumbersome". There might be simpler encodings.

   - Hot-loading probably means waiting for a domain to shut down and
     kill its type descriptor caches and such anyways, and may well not
     work properly if there are native resources involved. Plus you
     will have to be very particular about data-type and code-pointer
     identities between the loading crate and the loaded crate. It might
     be a bit of an imaginary feature, not worth fighting to preserve
     in current form.

   - It seems that LLVM is likely to consider DWARF "freely discardable"
     as it runs its optimizations. We might be able to mark a subset of
     the DWARF as non-discardable, or that may inhibit optimizations.
     We don't actually know how well the existing scheme will transplant.

   - The runtime library and the compiler have a bit of a "special
     relationship" in two ways: the use of C symbols for linkage --
     at least *something* special needs to happen for startup and for
     pulling in the all-other-symbols routine that the thunks target --
     and the fact that they know about one another's data structures (a
     bunch of structure offsets and ABI details need to be kept in sync
     between the two). Moving responsibilities between compiler, rust,
     and C++ runtime-library code tends to carry a heavy tax in terms of
     the amount of maintenance work involved.

So .. I've been talking to others about an alternative model. I'll 
sketch it out here; there are obviously many details involved but I 
thought I'd at least give a broad picture and see if anyone thinks it'd 
be better:

   - Let gas or someone else decide when PIC makes sense, and write
     our relocs for us when necessary.

   - Use system linkage much more. We don't have overloaded names so we
     don't *really* need mangled names for anything aside from glue; we
     can just module-qualify user-provided names using "." as expected.

   - Since symbols have no "global" cross-crate name in rust (the client
     names the import-name root) we'd need to ensure two-level naming
     (library -> symbol) works on all platforms. I *think* it does, but
     it might be a bit of a challenge in some contexts (Anyone know what
     to do on ELF, for example? GUID-prefix every name? This might sink
     the whole idea).

   - Give up on relying on DWARF. Use DWARF as much as we *can* on any
     platform that supports it. Emit PDB when and if we can on win32,
     let LLVM discard what it needs to for the sake of optimization. Just
     treat it as "debug info" as the name implies.

   - Encode type signatures of crates using a custom encoding. Either
     some kind of reflective system where the client calls into the
     crate to make requests, or a fixed data structure it crawls, or
     something. Make something minimal up to fit our needs (one possible
     shape is sketched just after this list).

   - Give up on hot-reload in-process. Use the process-domain boundary
     as the hot-reload boundary. Make runtime linkage effectively
     "one way" like it typically is in C (you can dlclose(), but it's
     unsafe, so .. generally don't).

   - Possible: give up the concept of resource accounting at anything
     less than a process-domain, use rlimits or such to enforce rather
     than trying to funnel everything through an SPI (which won't catch
     native resource consumption anyway).

   - Possible: make a rust native-module ABI for C code in .bc files,
     and teach the compiler to mix such LLVM bitcode into the crate it's
     compiling. Modify the compiler to emit code in a more abstract form
     consisting of lots of calls to runtime-library stuff that's known to
     be inlined from C++-land (structure accesses, upcalls, glue and
     such). Write more of the compiler support routines in C++, including
     stuff that "has to run on the rust stack".

   - Possible: permit compiling a crate to .bc so it can be "linked" to
     another crate (with cross-crate inlining). Like, support this at a
     compiler-driver level, as a different target flavour.
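
On the type-signature point a few items up: purely to illustrate the
"fixed data structure it crawls" option, something as small as the
following might do. This is an entirely hypothetical C sketch; none of
these structures exist in rustboot today, and the names are invented.

    #include <stddef.h>

    enum ty_kind { TY_NIL, TY_BOOL, TY_INT, TY_STR, TY_VEC, TY_FN, TY_REC };

    /* A type signature as a small tree of descriptors. */
    struct ty_desc {
        enum ty_kind kind;
        size_t n_elts;                      /* params, fields, etc. */
        const struct ty_desc *const *elts;
    };

    /* One exported item: a module-qualified name, its signature, and
       its address. */
    struct item_desc {
        const char *name;                   /* e.g. "map.insert" */
        const struct ty_desc *ty;
        void *addr;
    };

    /* The whole crate's exports, sorted by name so a loading crate can
       binary-search instead of crawling linearly. */
    struct crate_desc {
        size_t n_items;
        const struct item_desc *items;
    };

A crate would then export one well-known symbol pointing at its
crate_desc, much the way 'rust_crate' points at the DWARF descriptor
today.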

I put the latter two points as "possible" because (a) it's not clear to 
me that they'd work and (b) they'd definitely not work with the existing 
x86 backend, or *any other* backend. We'd be quite wedded to LLVM if we 
relied on those; it'd make (for example) compiling the standard library 
with msvc or icc impossible, as we'd need parts of its LLVM bitcode 
mixed into the compiler output. But we could perhaps adopt those last 
two changes piecemeal, independent of the first several parts, once the 
self-hosted compiler is far enough along that LLVM is always an assumed 
part of the puzzle.

Thoughts? Feelings? Such changes would involve a lot of shifting around 
with potentially not much visible or immediate gain, so they would soak 
up a lot of work; the implications would come later and be strangely 
distributed (some performance improvements, some maintenance and 
integration improvements, some improvements and also some degradations 
in flexibility and portability...).

I also don't exactly know whether ELF is going to provide anything 
two-level-naming-ish to handle the proposed scenario. Any ideas on that? 
Mach-O and PE both provide such a system; ELF doesn't seem to.
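
The closest approximation I know of on ELF works at the dlopen() level
rather than at static-link time: resolving against a specific handle
searches only that object (and its dependencies), and RTLD_LOCAL keeps
its symbols out of the flat global namespace. A minimal sketch, with
made-up library and symbol names:

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* RTLD_LOCAL: don't merge this library's symbols into the
           global scope, so two crates can export the same item name. */
        void *crate = dlopen("libfoo.so", RTLD_NOW | RTLD_LOCAL);
        if (!crate) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }

        /* Lookup is scoped to this handle: effectively a
           (library -> symbol) resolution. */
        void (*f)(void) = (void (*)(void)) dlsym(crate, "map.insert");
        if (f)
            f();

        dlclose(crate);
        return 0;
    }

That doesn't help a static linker choose between two libraries exporting
the same name at link time, though, which is where the GUID-prefix (or
some other mangling) question comes back in.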

-Graydon

