[rust-dev] Memory layout of types
michaelwoerister at gmail.com
Mon Jun 24 02:14:33 PDT 2013
As you may know, I'm working on debug info support for rustc. Last week
I found a small bug regarding the size of structs as stored in the debug
info. Once found, it was not hard to fix, but it led me to think a bit
more about how to handle memory layout in the debug info code and how to
make this as stable as possible.
I will first give a brief overview of the problem and then ask some
questions towards the end. Maybe someone here can help me out with them.
At the moment the debuginfo module essentially tries to
reconstruct/mirror the behavior of other compiler parts that actually
define the memory layout. For example, debuginfo::create_struct()
emulates the behavior of LLVM's StructLayout class, and
debuginfo::create_boxed_type() adds a hardcoded set of fields to the
struct representing a box.
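
To make concrete what kind of logic is being mirrored, here is a minimal sketch of a padded/packed struct layout computation. It is a simplified illustration only, not the actual code of LLVM's StructLayout or of debuginfo::create_struct(); the names FieldInfo, align_to and layout_struct are hypothetical, and alignments are assumed to be powers of two.

```rust
/// Size and alignment of one field, in bytes (hypothetical helper type).
#[derive(Clone, Copy, Debug)]
struct FieldInfo {
    size: u64,
    align: u64, // assumed to be a power of two
}

/// Round `offset` up to the next multiple of `align`.
fn align_to(offset: u64, align: u64) -> u64 {
    (offset + align - 1) / align * align
}

/// Returns the offset of every field and the total size of the struct.
/// With `packed == true` every field is treated as having alignment 1,
/// so no padding bytes are inserted.
fn layout_struct(fields: &[FieldInfo], packed: bool) -> (Vec<u64>, u64) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut offset = 0u64;
    let mut struct_align = 1u64;
    for f in fields {
        let align = if packed { 1 } else { f.align };
        offset = align_to(offset, align);
        offsets.push(offset);
        offset += f.size;
        struct_align = struct_align.max(align);
    }
    // Tail padding: the total size is a multiple of the struct's alignment.
    (offsets, align_to(offset, struct_align))
}

fn main() {
    // Roughly a struct with a 1-byte, a 4-byte and a 2-byte field.
    let fields = [
        FieldInfo { size: 1, align: 1 },
        FieldInfo { size: 4, align: 4 },
        FieldInfo { size: 2, align: 2 },
    ];
    println!("{:?}", layout_struct(&fields, false)); // ([0, 4, 8], 12)
    println!("{:?}", layout_struct(&fields, true)); // ([0, 1, 5], 7)
}
```

Even this small version has to know about the 'packed' flag and tail padding, and it still ignores compiler-generated fields and target-specific alignment rules.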
Unfortunately this approach can lead to fragile code:
(1) The reconstructed algorithm may miss corner cases. E.g. the current
struct layout algorithm in debuginfo does not handle 'packed' structs
(i.e. structs without padding bytes) or structs with destructors
(because they have an additional bool field, added by the compiler).
(2) When something changes somewhere else in the compiler, the debug
info algorithm may go out of sync. E.g. the comment section at the
beginning of trans/adt.rs indicates that in the future enums may support
optimized layouts to conserve space. This will obviously break debug
info if it assumes that discriminants are always word-sized and at some
Duplicating the layouting logic is simply not 'DRY'.
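
As an illustration of why a fixed assumption about discriminants is risky, the following small sketch uses present-day Rust (not the 2013 compiler this mail is about) to show two cases where an enum does not carry a word-sized discriminant. The enum Small is a hypothetical example type.

```rust
use std::mem::size_of;

/// A field-less enum whose discriminant is explicitly one byte wide.
#[allow(dead_code)]
#[repr(u8)]
enum Small {
    A,
    B,
    C,
}

fn main() {
    // Option<Box<T>> uses the null pointer to represent None, so it needs
    // no separate discriminant and is exactly as large as Box<T>.
    println!("Box<i32>:         {} bytes", size_of::<Box<i32>>());
    println!("Option<Box<i32>>: {} bytes", size_of::<Option<Box<i32>>>());
    // With repr(u8) the discriminant occupies a single byte.
    println!("Small:            {} bytes", size_of::<Small>());
}
```

Debug info generated under the assumption "discriminant first, always word-sized" would describe both of these types incorrectly.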
Consequently, I think we should use an approach that doesn't try to
replicate the layout logic but reads the information from the definitive
sources. I think this can be broken down into two distinct problems:
(1) Given a struct/tuple/enum type and a list of its field types, what
are the offset and allocation size of each field?
(2) What *are* the fields of a given composite type?
I think the best strategy to solve problem (1) is to query the finished
LLVM struct type. This should really be the definitive source for a type's
memory layout and saves us from having to emulate and keep up-to-date a
complicated layouting algorithm (there is not only the 'packed'
parameter but also the much more involved DataLayout specification
that needs to be accounted for). This can probably also be implemented
rather easily by wrapping LLVM's DataLayout class.
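
The idea of asking the authority instead of recomputing can be sketched in present-day Rust. This is only an analogy: it queries the Rust compiler for the layout of two example structs rather than wrapping LLVM's DataLayout, and the types Padded and Packed and the two helper functions are hypothetical. The numbers in the comments assume a typical target where u32 is 4-byte aligned.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

/// C-compatible layout with padding.
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
    c: u16,
}

/// Same fields, but without any padding bytes.
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
    c: u16,
}

/// Field offsets of `Padded`, measured on a real value instead of computed.
fn padded_offsets() -> [usize; 3] {
    let v = Padded { a: 0, b: 0, c: 0 };
    let base = addr_of!(v) as usize;
    [
        addr_of!(v.a) as usize - base,
        addr_of!(v.b) as usize - base,
        addr_of!(v.c) as usize - base,
    ]
}

/// Field offsets of `Packed`. `addr_of!` is used because taking an
/// ordinary reference to a possibly unaligned field is not allowed.
fn packed_offsets() -> [usize; 3] {
    let v = Packed { a: 0, b: 0, c: 0 };
    let base = addr_of!(v) as usize;
    [
        addr_of!(v.a) as usize - base,
        addr_of!(v.b) as usize - base,
        addr_of!(v.c) as usize - base,
    ]
}

fn main() {
    // On x86-64: [0, 4, 8] and 12
    println!("{:?} size {}", padded_offsets(), size_of::<Padded>());
    // On x86-64: [0, 1, 5] and 7
    println!("{:?} size {}", packed_offsets(), size_of::<Packed>());
}
```

No layout rule is restated here, so a change in how the compiler lays out these types would be picked up automatically.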
Problem (2) might not be that easy to solve. Or maybe it's just a matter
of knowing the right function(s) to call. That is the main reason for
posting this email: maybe someone here knows the best way to solve this
cleanly.
There are a few places that look promising:
* For structs, there is ty::struct_fields()---which is what is used at
the moment---but it does not account for additional fields generated by
the compiler, such as the 'destroyedness' flag.
* There is the trans::adt module and the Repr enum, which look very
promising. They do not provide all the information I need (e.g. they
include generated fields, but do not explicitly specify where these are
located; that is only stated in comments (enum discriminant) or code
(destroyedness flag)). But otherwise this seems to be the place to go for
a definitive field list of composite types.
* Then there is middle::trans::type_::Type which seems to contain some
valuable information on the layout of boxes, trait stores, vecs, etc. I
wonder, however, where this information is taken from? Is it specified
somewhere? It would be great to have an exhaustive list of any
structures used internally by the compiler, like boxes, vecs, any kind
of fat pointer, etc. Can this be found somewhere?
Any help and comments are appreciated. Thanks for taking the time to
read this.