[rust-dev] Memory layout of types

Michael Woerister michaelwoerister at gmail.com
Mon Jun 24 02:14:33 PDT 2013

Hi everyone!
As you may know, I'm working on debug info support for rustc. Last week
I found a small bug regarding the size of structs as stored in the debug
info. Once found, it was not hard to fix, but it led me to think a bit
more about how to handle memory layout in the debug info code and how to
make it as stable as possible.

I will give a short overview of the problem first, followed by some
questions at the end. Maybe someone here can help me out with those.

At the moment the debuginfo module essentially tries to 
reconstruct/mirror the behavior of other compiler parts that actually 
define the memory layout. For example, debuginfo::create_struct() 
emulates the behavior of LLVM's StructLayout class [1], and 
debuginfo::create_boxed_type() adds a hardcoded set of fields to the 
struct representing a box.
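
To show what "emulating StructLayout" involves, here is a minimal sketch
of the natural-alignment rule for a non-packed struct: each field starts
at the next offset that satisfies its alignment, and the total size is
rounded up to the largest field alignment. The FieldType type and the
layout_struct function are hypothetical illustrations, not rustc or LLVM
code, and the sizes and alignments are assumed to be given. Real LLVM
layout takes them from the target's DataLayout string.

```rust
/// Size and alignment of a field type in bytes (hypothetical stand-in for
/// what a target's DataLayout would report for that type).
#[derive(Clone, Copy, Debug)]
struct FieldType {
    size: u64,
    align: u64,
}

/// Round `offset` up to the next multiple of `align`.
fn align_to(offset: u64, align: u64) -> u64 {
    (offset + align - 1) / align * align
}

/// Non-packed layout: returns (field offsets, total size, struct alignment).
fn layout_struct(fields: &[FieldType]) -> (Vec<u64>, u64, u64) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut offset: u64 = 0;
    let mut struct_align: u64 = 1;
    for f in fields {
        // Insert padding so that this field is properly aligned.
        offset = align_to(offset, f.align);
        offsets.push(offset);
        offset += f.size;
        struct_align = struct_align.max(f.align);
    }
    // Tail padding makes the size a multiple of the struct's alignment.
    (offsets, align_to(offset, struct_align), struct_align)
}

fn main() {
    // Fields { u8, u32, u16 } with natural alignment.
    let fields = [
        FieldType { size: 1, align: 1 },
        FieldType { size: 4, align: 4 },
        FieldType { size: 2, align: 2 },
    ];
    let (offsets, size, align) = layout_struct(&fields);
    println!("offsets = {:?}, size = {}, align = {}", offsets, size, align);
}
```

For this input the sketch yields offsets 0, 4 and 8, a size of 12 and an
alignment of 4. A 'packed' struct would skip both alignment steps, which
is exactly the kind of case a hand-written copy can miss.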

Unfortunately this approach can lead to fragile code:
(1) The reconstructed algorithm may miss corner cases. For example, the
current struct layout algorithm in debuginfo does not handle 'packed'
structs (i.e. structs without padding bytes) or structs with destructors
(which have an additional bool field added by the compiler).
(2) When something changes elsewhere in the compiler, the debug info
algorithm may go out of sync. For example, the comment section at the
beginning of trans/adt.rs indicates that enums may in the future support
optimized layouts to conserve space [2]. This would break debug info if
it assumed that discriminants are always word-sized and at a fixed
offset.
Duplicating the layout logic is simply not 'DRY'.
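
Both failure modes are easy to observe with today's Rust, which is used
here only as an illustration (this is not the 2013 compiler). The sketch
below assumes a current toolchain and uses std::mem::size_of and
align_of. The same two fields get a different size once the struct is
packed, and Option<&u8> needs no separate discriminant at all, because
the null pointer value encodes None. The struct names are hypothetical.

```rust
use std::mem::{align_of, size_of};

// Same fields, two layouts: the `packed` representation removes padding.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

fn main() {
    println!("Padded: size {}, align {}", size_of::<Padded>(), align_of::<Padded>());
    println!("Packed: size {}, align {}", size_of::<Packed>(), align_of::<Packed>());
    // An optimized enum layout: no word-sized discriminant at a fixed offset.
    println!("&u8: {}, Option<&u8>: {}", size_of::<&u8>(), size_of::<Option<&u8>>());
}
```

A debug info generator that hardcoded "padding before b" or
"discriminant at offset 0" would describe both of these types wrongly.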

Consequently, I think we should use an approach that doesn't try to
replicate the layout logic but reads the layout from the definitive
sources. I think this problem can be broken down into two distinct
subproblems:

(1) Given a struct/tuple/enum type and a list of its field types, what
is the offset and allocation size of each field?
(2) What *are* the fields of a given composite type?
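
To make the two subproblems concrete, here is one hypothetical shape for
their combined answer: problem (2) supplies the field list, including
compiler-generated entries, and problem (1) fills in offsets and sizes.
None of these types exist in rustc; FieldOrigin, FieldLayout,
CompositeLayout and padding() are illustrative names, and the example
numbers (including where the flag sits) are made up. Once such a list
exists, deriving things like padding ranges for the debug info is
trivial.

```rust
/// Where a field comes from (hypothetical categories, taken from the
/// examples in this mail: user fields, drop flags, enum discriminants).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldOrigin {
    User,
    DropFlag,
    Discriminant,
}

/// One field found by problem (2), annotated with the answer to problem (1).
#[derive(Debug, Clone)]
struct FieldLayout {
    name: String,
    origin: FieldOrigin,
    offset: u64, // byte offset from the start of the composite
    size: u64,   // allocation size in bytes
}

impl FieldLayout {
    fn new(name: &str, origin: FieldOrigin, offset: u64, size: u64) -> Self {
        FieldLayout { name: name.to_string(), origin, offset, size }
    }
}

/// Complete layout of a composite type; `fields` must be sorted by offset.
#[derive(Debug, Clone)]
struct CompositeLayout {
    size: u64,
    fields: Vec<FieldLayout>,
}

impl CompositeLayout {
    /// Byte ranges [start, end) that no field covers, i.e. padding.
    fn padding(&self) -> Vec<(u64, u64)> {
        let mut gaps = Vec::new();
        let mut cursor: u64 = 0;
        for f in &self.fields {
            if f.offset > cursor {
                gaps.push((cursor, f.offset));
            }
            cursor = cursor.max(f.offset + f.size);
        }
        if self.size > cursor {
            gaps.push((cursor, self.size));
        }
        gaps
    }
}

fn main() {
    // Illustrative numbers only: { a: u8, b: u32 } plus a one-byte flag.
    let layout = CompositeLayout {
        size: 12,
        fields: vec![
            FieldLayout::new("a", FieldOrigin::User, 0, 1),
            FieldLayout::new("b", FieldOrigin::User, 4, 4),
            FieldLayout::new("drop_flag", FieldOrigin::DropFlag, 8, 1),
        ],
    };
    for f in &layout.fields {
        println!("{} ({:?}): offset {}, size {}", f.name, f.origin, f.offset, f.size);
    }
    println!("padding: {:?}", layout.padding());
}
```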

I think the best strategy to solve problem (1) is to query the finished
LLVM struct type. This should really be the definitive source for a
type's memory layout, and it saves us from having to emulate, and keep
up to date, a complicated layout algorithm (there is not only the
'packed' parameter but also the much more involved DataLayout [3]
specification that needs to be accounted for). This can probably be
implemented rather easily by wrapping LLVM's DataLayout class [3].
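
The same principle, asking the component that decides the layout rather
than recomputing it, can be seen in today's Rust, where
std::mem::offset_of! (stable since Rust 1.77) and size_of report the
layout the compiler actually chose. This is only an analogy in modern
Rust, not the rustc-internal or LLVM DataLayout interface discussed here,
and the Example struct is hypothetical. Note that with the default
representation the compiler is free to reorder fields, so a hand-written
C-style emulation could report wrong offsets for it.

```rust
use std::mem::{align_of, offset_of, size_of};

// Default (Rust) representation: field order in memory is up to the compiler.
#[allow(dead_code)]
struct Example {
    a: u8,
    b: u32,
    c: u16,
}

/// (field name, offset, size) for each field, as reported by the compiler
/// rather than recomputed by hand.
fn example_layout() -> Vec<(&'static str, usize, usize)> {
    vec![
        ("a", offset_of!(Example, a), size_of::<u8>()),
        ("b", offset_of!(Example, b), size_of::<u32>()),
        ("c", offset_of!(Example, c), size_of::<u16>()),
    ]
}

fn main() {
    for (name, offset, size) in example_layout() {
        println!("{name}: offset {offset}, size {size}");
    }
    println!("total size {}, align {}", size_of::<Example>(), align_of::<Example>());
}
```

The exact offsets printed are not specified by the language, which is
the point: the consumer never needs to know the rule, only the result.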

Problem (2) might not be as easy to solve. Or maybe it's just a matter of
knowing the right function(s) to call. That is the main reason for
posting this email; maybe someone here knows the best way to solve this
cleanly.

There are a few places that look promising:

* For structs, there is ty::struct_fields(), which is what is used at
the moment, but it does not account for additional fields generated by
the compiler, such as the 'destroyedness' flag.

* There is the trans::adt module and the Repr enum, which look very
promising. They do not provide all the information I need: Repr includes
the generated fields but does not explicitly specify where they are
located; that is only stated in comments (enum discriminant) or implied
by the code (destroyedness flag). Otherwise, it seems to be the place to
go for a definitive field list of composite types.

* Then there is middle::trans::type_::Type, which seems to contain some
valuable information on the layout of boxes, trait stores, vecs, etc. I
wonder, however, where this information comes from. Is it specified
somewhere? It would be great to have an exhaustive list of all
structures used internally by the compiler, like boxes, vecs, any kind
of fat pointer, etc. Can this be found somewhere?
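
As an illustration of what such a list would record, the sketch below
prints the sizes, in machine words, of a few pointer-like types in
today's Rust. It does not describe the 2013 representations of boxes and
vecs, which differed, and the Shape trait is a hypothetical placeholder.
Box<T> for a sized T being one pointer and &[T] / &dyn Trait carrying a
second word of metadata (length or vtable pointer) match current
behaviour; Vec<T> being three words reflects the current standard
library rather than a language guarantee.

```rust
use std::mem::size_of;

// Hypothetical trait, only used to form trait-object types.
#[allow(dead_code)]
trait Shape {
    fn area(&self) -> f64;
}

fn main() {
    let word = size_of::<usize>();
    // Thin pointers: a single address.
    println!("&u8: {} words", size_of::<&u8>() / word);
    println!("Box<u64>: {} words", size_of::<Box<u64>>() / word);
    // Fat pointers: data pointer plus metadata (length or vtable pointer).
    println!("&[u8]: {} words", size_of::<&[u8]>() / word);
    println!("&str: {} words", size_of::<&str>() / word);
    println!("&dyn Shape: {} words", size_of::<&dyn Shape>() / word);
    println!("Box<dyn Shape>: {} words", size_of::<Box<dyn Shape>>() / word);
    // Vec<T>: buffer pointer, capacity and length in the current std.
    println!("Vec<u8>: {} words", size_of::<Vec<u8>>() / word);
}
```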

Any help and comments are appreciated. Thanks for taking the time to 
read this!


[1] http://llvm.org/docs/doxygen/html/DataLayout_8cpp_source.html#l00044
[3] http://llvm.org/docs/doxygen/html/classllvm_1_1DataLayout.html
