graydon at mozilla.com
Thu Jul 29 18:57:04 PDT 2010
I wanted to go over some of the options facing us regarding the linkage
model in rust, and get feedback on people's different preferences. It
turns out this kind of thing affects a lot of code, in somewhat subtle ways.
How Things Are Presently in rustboot:
(warning: kinda surprising)
- All code is compiled PIC.
- There are no relocs.
- Actual-pointers are derived programmatically.
- Object-file symbols are used only minimally.
- Symbols from libc.so and librustrt.so use normal
C ld.so-style dynamic linkage.
- On some platforms we emit local mangled symbols for rust items
and glue functions within a crate. These are just to help with
debugging. They serve no functional role.
- 'rust_crate' is the only extern symbol a crate exposes to ld.so.
- 'rust_crate' points to a descriptor that points
to the DWARF sections .debug_abbrev and .debug_info
- All further linkage is driven lazily by thunks inside a crate.
- Calling a native C library thunks to librustrt.so which calls
dlopen() and dlsym() on the C library, caches the result.
- Calling a 'use'd function in a rust library thunks to
librustrt.so which calls dlopen() on the rust library, grabs
'rust_crate', crawls the DWARF to navigate the module structure,
and caches the result. Lookup is scoped to the crate's module
namespace, with the import prefix stripped off.
- Crate-dependency is acyclic. A crate can't depend on itself.
- We present an "SPI" for embeddings to use, on the theory that an
embedding might wish to load a rust crate in-process and spin up
a thread domain to interact with the environment. We funnel all
environment-interactions (logging, malloc/free, signals etc.)
through this SPI. Or we should. That's the aim anyway. At the moment
we don't always succeed.
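To make the thunk-and-cache flow above a bit more concrete, here is a rough Python model of it. This is only a sketch: the `Runtime` class, `make_thunk`, the nested-dict "crate" and the `std.str.byte_len` item are all made up for illustration, and the dict walk stands in for dlopen() plus the DWARF crawl. It shows the import prefix being stripped once and the slow lookup happening only on the first resolution.

```python
class Runtime:
    """Hypothetical stand-in for the lookup-and-cache role of librustrt.so."""

    def __init__(self, crates):
        self.crates = crates      # file name -> module tree (models dlopen + 'rust_crate')
        self.cache = {}           # (crate file, item path) -> resolved item
        self.slow_lookups = 0     # how many times we had to crawl the tree

    def resolve(self, crate_file, item_path):
        key = (crate_file, item_path)
        if key not in self.cache:
            self.slow_lookups += 1
            node = self.crates[crate_file]
            # Walk the module structure one path component at a time.
            for part in item_path.split("."):
                node = node[part]
            self.cache[key] = node
        return self.cache[key]


def make_thunk(runtime, crate_file, import_prefix, qualified_name):
    """Build a lazy thunk; the import prefix is stripped before lookup."""
    prefix = import_prefix + "."
    if not qualified_name.startswith(prefix):
        raise ValueError("name is not under the import prefix")
    item_path = qualified_name[len(prefix):]

    def thunk(*args):
        # Nothing is resolved until the first call.
        return runtime.resolve(crate_file, item_path)(*args)

    return thunk


# Made-up crate exposing one item, std.str.byte_len, backed by Python's len.
rt = Runtime({"libstd.so": {"str": {"byte_len": len}}})
byte_len = make_thunk(rt, "libstd.so", "std", "std.str.byte_len")
first = byte_len("hello")
second = byte_len("hi")
print(first, second, rt.slow_lookups)
```

Two calls go through the thunk, but the crawl runs once; the second call is served from the cache.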
This scheme sounds a bit ad-hoc, but it's based on a few specific goals:
- The ability to refcount a crate (and everything we pull out of it)
such that it can be unloaded and a replacement reloaded at runtime.
Hot-reloading, in other words. Also REPL-ing and such.
- The ability to get type information -- including type abbreviations
imported at compile time -- out of a crate's DWARF without separate
'include' files or anything. We pull type info out of the same DWARF
we drive the linkage itself off. Figured there was no point
duplicating the information.
- Crates are acyclic *anyways* because I didn't want to permit
recursive type definitions crossing crate boundaries; module
systems that support separate compilation of mutually-recursive
types exist but they're pretty exotic and involve a lot of
Now, personally I like and am still interested in some aspects of this
model, but I realize there are a lot of pressures working against it and
it might be time to revisit. It has shortcomings and the goals might be
achieved differently. Here are some issues:
- This scheme means that the crate structure is the last word on the
runtime linkage boundaries. If you realize you actually want two
crates combined into a single loadable unit, you can't exactly
statically link or combine LLVM .bc files or anything. This is
solvable to some extent if you're combining a rust crate with
another rust crate (just include one .rc file in the other, should
work plus or minus some plumbing) but it won't get you far if you
want to inline a bunch of C code into rust by mixing LLVM .bc files.
- "Always having DWARF" is a nice side-effect of the existing scheme,
but the visual studio debugger doesn't speak DWARF. You have to use
gdb (or the forthcoming LLDB I guess) on win32. So not necessarily
as big a win as one might like. Same goes for win32 profilers and
such. At some point someone's going to want to be spitting out PDB.
- The crate refcounting and symbol-cache is an additional cost.
Probably not a huge one, but costs add up.
- DWARF doesn't generally provide hashed access to symbols; while
it *does* provide hierarchical name crawling, it's possible you'll
wind up with a linear search in a substantially-wide namespace at
some point during a symbol import. System linkers tend to hash or
even pre-assign ordinals. And use IAT/IDT or PLT tables, which
are smaller and probably faster than our thunks.
- DWARF is a little complex to parse at runtime. Currently the runtime
library has a partial DWARF reader and I'm less certain than I was
that "any equivalent encoding of the runtime type signatures would
be equally cumbersome". There might be simpler encodings.
- Hot-loading probably means waiting for a domain to shut down and
kill its type descriptor caches and such anyways, and may well not
work properly if there are native resources involved. Plus you
will have to be very particular about data-type and code-pointer
identities between the loading crate and the loaded crate. It might
be a bit of an imaginary feature, not worth fighting to preserve
in current form.
- It seems that LLVM is likely to consider DWARF "freely discardable"
as it runs its optimizations. We might be able to mark a subset of
the DWARF as non-discardable, or that may inhibit optimizations.
We don't actually know how well the existing scheme will transplant.
- The runtime library and the compiler have a bit of a "special
relationship" in two ways: the use of C symbols for linkage --
at least *something* special needs to happen for startup and for
pulling in the all-other-symbols routine that the thunks target --
and the fact that they know about one another's data structures (a
bunch of structure offsets and ABI details need to be kept in sync
between the two). Moving responsibilities between compiler, rust,
and C++ runtime-library code tends to carry a heavy tax in terms of
amount of maintenance work involved.
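The linear-search worry in the list above can be illustrated with a small Python model. It is a sketch under assumptions: the tree of (name, child) sibling lists only imitates the shape of a hierarchical name crawl and is not a DWARF reader, and the `util` module with 1000 items is invented. It counts name comparisons for a crawl and sets that against a single keyed lookup in a hash table.

```python
def crawl(tree, path):
    """Resolve a dotted path by scanning sibling lists, counting comparisons."""
    comparisons = 0
    node = tree
    for part in path.split("."):
        for name, child in node:
            comparisons += 1
            if name == part:
                node = child
                break
        else:
            # No sibling at this level matched the path component.
            raise KeyError(path)
    return node, comparisons


# One module, "util", holding 1000 items named f0 .. f999.
wide = [("f%d" % i, i) for i in range(1000)]
tree = [("util", wide)]

# The same names in a flat hash table keyed by qualified name.
hashed = {"util.f%d" % i: i for i in range(1000)}

value, comparisons = crawl(tree, "util.f999")
print(value, comparisons)        # the crawl scans every sibling in the worst case
print(hashed["util.f999"])       # the hash table does one keyed lookup
```

The worst-case crawl here costs one comparison to find `util` and a thousand more to find the last item, which is the kind of cost a hashed or ordinal-based import table avoids.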
So .. I've been talking to others about an alternative model. I'll
sketch it out here; there are obviously many details involved but I
thought I'd at least give a broad picture and see if anyone thinks it'd
- Let gas or someone else decide when PIC makes sense, and to write
our relocs for us when necessary.
- Use system linkage much more. We don't have overloaded names so we
don't *really* need mangled names for anything aside from glue; we
can just module-qualify user-provided names using "." as expected.
- Since symbols have no "global" cross-crate name in rust (the client
names the import-name root) we'd need to ensure two-level naming
(library -> symbol) works on all platforms. I *think* it does, but
it might be a bit of a challenge in some contexts (Anyone know what
to do on ELF, for example? GUID-prefix every name? This might sink
the whole idea).
- Give up on relying on DWARF. Use DWARF as much as we *can* on any
platform that supports it. Emit PDB when and if we can on win32,
let LLVM discard what it needs to for the sake of optimization. Just
treat it as "debug info" as the name implies.
- Encode type signatures of crates using a custom encoding. Either
some kind of reflective system where the client calls into the
crate to make requests, or a fixed data structure it crawls, or
something. Make something minimal up to fit our needs.
- Give up on hot-reload in-process. Use the process-domain boundary
as the hot-reload boundary. Make runtime linkage effectively
"one way" like it typically is in C (you can dlclose(), but it's
unsafe, so .. generally don't).
- Possible: give up the concept of resource accounting at anything
less than a process-domain, use rlimits or such to enforce rather
than trying to funnel everything through an SPI (which won't catch
native resource consumption anyway).
- Possible: make a rust native-module ABI for C code in .bc files,
and teach the compiler to mix such LLVM bitcode into the crate it's
compiling. Modify the compiler to emit code in a more abstract form
consisting of lots of calls to runtime-library stuff that's known to
be inlined from C++-land (structure accesses, upcalls, glue and
such). Write more of the compiler support routines in C++, including
stuff that "has to run on the rust stack".
- Possible: permit compiling a crate to .bc so it can be "linked" to
another crate (with cross-crate inlining). Like, support this at a
compiler-driver level, as a different target flavour.
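As one possible reading of the "fixed data structure it crawls" option in the list above, here is a minimal Python sketch of a custom signature table. Everything about the format is an assumption made up for the example: a little-endian u32 entry count, then for each entry a length-prefixed name and a length-prefixed signature string. The signature strings themselves are placeholders, not a proposed syntax.

```python
import struct


def encode_signatures(items):
    """Pack {qualified name: signature} into a flat byte blob."""
    out = [struct.pack("<I", len(items))]
    for name, sig in sorted(items.items()):
        for text in (name, sig):
            raw = text.encode("utf-8")
            out.append(struct.pack("<H", len(raw)))  # u16 length prefix
            out.append(raw)
    return b"".join(out)


def decode_signatures(blob):
    """Crawl the blob and rebuild the {qualified name: signature} table."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    items = {}
    for _entry in range(count):
        fields = []
        for _field in range(2):
            (length,) = struct.unpack_from("<H", blob, offset)
            offset += 2
            fields.append(blob[offset:offset + length].decode("utf-8"))
            offset += length
        items[fields[0]] = fields[1]
    return items


# Placeholder items; the names and signature text are invented.
table = {
    "str.byte_len": "fn(str) -> uint",
    "vec.len": "fn(vec[T]) -> uint",
}
blob = encode_signatures(table)
print(len(blob), decode_signatures(blob) == table)
```

A reader for something like this is a few lines of code, which is the appeal compared with a partial DWARF parser in the runtime; whether a real encoding could stay this simple once it has to describe full types is an open question.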
I put the latter two points as "possible" because (a) it's not clear to
me that they'd work and (b) they'd definitely not work with the existing
x86 backend, or *any other* backend. We'd be quite wedded to LLVM if we
relied on those; it'd make (for example) compiling the standard library
with msvc or icc impossible, as we'd need parts of its LLVM bitcode
mixed into the compiler output. But we could perhaps adopt those last
two changes piecemeal, independent of the first several parts, once the
self-hosted compiler is far enough along that LLVM is always an assumed
part of the puzzle.
Thoughts? Feelings? Such changes would involve a lot of shifting around
with potentially not-much visible or immediate gain, so would soak up a
lot of work; the implications would come later and be strangely
distributed (some performance improvements, some maintenance and
integration improvements, some improvements and also some degradations
in flexibility and portability..)
I also don't exactly know whether ELF is going to provide anything
two-level-naming-ish to handle the proposed scenario. Any ideas on that?
Mach-O and PE both provide such a system, ELF doesn't seem to.
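For anyone unsure what the two-level naming question amounts to, a small Python model may help. It is a sketch, not a description of any real linker: the library names, the `util.hash` symbol and the `crate_id` prefixes are invented. `flat_lookup` models a flat, first-definition-wins search across loaded libraries, `two_level_lookup` models imports recorded as (library, symbol) pairs, and `prefixed` models the "prefix every name with something unique" workaround.

```python
# Two made-up crates that both export a symbol called "util.hash".
libs = [
    ("libfoo", {"util.hash": "foo's hash"}),
    ("libbar", {"util.hash": "bar's hash"}),
]


def flat_lookup(symbol):
    """Flat namespace: search libraries in load order, first match wins."""
    for _lib, exports in libs:
        if symbol in exports:
            return exports[symbol]
    raise KeyError(symbol)


def two_level_lookup(library, symbol):
    """Two-level namespace: the import names the library as well as the symbol."""
    return dict(libs)[library][symbol]


def prefixed(crate_id, symbol):
    """Workaround for a flat namespace: make each exported name unique."""
    return crate_id + "." + symbol


print(flat_lookup("util.hash"))                 # a client of libbar gets libfoo's item
print(two_level_lookup("libbar", "util.hash"))  # the intended item
print(prefixed("3f2a", "util.hash"), prefixed("9c1d", "util.hash"))
```

With a flat search, a client that meant libbar's `util.hash` silently gets libfoo's; with two-level names or unique prefixes the two items stay distinct.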