[rust-dev] Syntax of vectors, slices, etc

Matthieu Monrocq matthieu.monrocq at gmail.com
Tue Apr 24 11:30:55 PDT 2012


As this is going to be my first e-mail on this list, please do not hesitate
to correct me if I speak out of turn.
Also note that I am not a native English speaker; I promise to do my best,
and I will gladly welcome any correction.


First, I agree that operations on vectors and strings are mostly similar.

However, this holds only if strings are considered as lists of codepoints,
not lists of bytes. Lists of bytes are useful in encoding and decoding
operations, but they fall short for manipulating Arabic or Korean: having
users manipulate strings byte-wise instead of codepoint-wise is a recipe
for disaster outside of English and Latin-1 representable languages.
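
To make the byte/codepoint distinction concrete, here is a minimal sketch in
present-day Rust syntax (which postdates this message). The helper names
`byte_len` and `codepoint_len` are hypothetical, chosen only for
illustration; they wrap the standard `str::len` and `str::chars` methods.

```rust
/// Length of `s` in bytes of its UTF-8 encoding.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Length of `s` in codepoints (Unicode scalar values).
fn codepoint_len(s: &str) -> usize {
    s.chars().count()
}
```

For an ASCII string the two lengths agree, but for Korean or Arabic text
they differ, so any index computed in one unit is wrong in the other.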

I understand that this may seem contradictory to Rust's original direction
of UTF-8 encoded strings, but having worked with UTF-8 strings using C++
`std::string` I can assure you that apart from blindly passing them around,
one cannot do much. All modifying operations require the use of
Unicode-aware libraries... even `substr`.
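
As a rough illustration of the `substr` problem, the sketch below contrasts
a byte-wise substring with a codepoint-wise one. It is written in
present-day Rust (not the 2012 language), and `byte_substr` and
`codepoint_substr` are hypothetical names; only `str::get`, `str::chars`
and the iterator adapters come from the standard library.

```rust
/// Byte-wise substring. Returns `None` when the range is out of bounds or
/// when either offset falls in the middle of an encoded character.
fn byte_substr(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

/// Codepoint-wise substring: skip `start` codepoints, then keep up to `len`.
/// This walks the string from the beginning, so it is linear, not constant time.
fn codepoint_substr(s: &str, start: usize, len: usize) -> String {
    s.chars().skip(start).take(len).collect()
}
```

The byte-wise version has to be able to fail, because an arbitrary byte
offset may not be a character boundary; the codepoint-wise version cannot
split a character but pays for a scan of the string.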

Second, I do not think that statically known sizes are so important in the
type system. I am a huge fan, and abuser, of the C++ template system, but I
will be the first to admit it is really complex and generally poorly
understood even among usually savvy C++ users.

As I understand it, fixed-length vectors were conceived for C compatibility.
Statically allocated buffers have a lifetime that exceeds that of all other
objects in the system, so they can perfectly well be accessed through
slices. Other uses involving C compatibility should be based on dynamically
allocated memory, whose size will be unknown at compile time.
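
A minimal sketch of this argument, in present-day Rust rather than the 2012
language: a statically allocated buffer is handed out as a slice, and the
same function accepts dynamically allocated data of a size unknown at
compile time. The names `GREETING`, `static_view`, `static_prefix` and
`sum_bytes` are hypothetical.

```rust
/// A statically allocated buffer; it outlives every other object.
static GREETING: [u8; 5] = *b"hello";

/// Whole buffer viewed as a slice; the length travels with the slice,
/// not with the type.
fn static_view() -> &'static [u8] {
    &GREETING
}

/// First `n` bytes of the buffer (clamped to its length), also as a slice.
fn static_prefix(n: usize) -> &'static [u8] {
    &GREETING[..n.min(GREETING.len())]
}

/// Works on any byte slice, whether backed by static or heap memory.
fn sum_bytes(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}
```

Callers of `sum_bytes` never mention a length in a type: the static buffer
and a heap-allocated vector are passed in exactly the same way.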

In the linked blog article, the variable size of `rust_vec<T>` is raised as
an issue because it plays havoc with stack allocation.
However, is real stack allocation necessary here? It seems to me that what
is desirable is the semantic aspect of a scope-bound variable. Whether the
actual representation is instantiated on the stack or on the task heap is
an implementation detail, and the compiler could perfectly well be enhanced
such that all variably-sized types are actually instantiated on the heap,
but automatically collected at the end of the function scope. A "parallel"
stack dedicated to such allocations could even be used, as the
allocation/deallocation pattern is stack-like.
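
The "parallel" stack idea can be sketched as a small bump allocator that is
released in LIFO order at scope exit. This is only an illustration under
stated assumptions, written in present-day Rust: `SideStack`, `mark`,
`alloc`, `release` and `with_scratch` are hypothetical names, and nothing
here describes what the Rust compiler actually does.

```rust
/// Hypothetical side stack for variably-sized, scope-bound data.
struct SideStack {
    storage: Vec<u8>,
}

impl SideStack {
    fn new() -> Self {
        SideStack { storage: Vec::new() }
    }

    /// Current top of the side stack; saved on scope entry.
    fn mark(&self) -> usize {
        self.storage.len()
    }

    /// Reserve `len` zeroed bytes on top of the side stack.
    fn alloc(&mut self, len: usize) -> &mut [u8] {
        let start = self.storage.len();
        self.storage.resize(start + len, 0);
        &mut self.storage[start..]
    }

    /// Release everything allocated since `mark`; called on scope exit.
    fn release(&mut self, mark: usize) {
        self.storage.truncate(mark);
    }
}

/// A function whose scratch buffer has a size known only at run time.
fn with_scratch(stack: &mut SideStack, len: usize) -> usize {
    let mark = stack.mark(); // scope entry
    let buf = stack.alloc(len);
    for (i, b) in buf.iter_mut().enumerate() {
        *b = i as u8;
    }
    let total: usize = buf.iter().map(|&b| b as usize).sum();
    stack.release(mark); // scope exit: the buffer is reclaimed
    total
}
```

In this sketch the mark/release pair is written by hand; the suggestion in
the text is that a compiler could insert the equivalent automatically at the
end of the function scope.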

I hope my suggestions are reasonable. Do feel free to ignore them if they
are not!

-- Matthieu

On Tue, Apr 24, 2012 at 2:06 AM, Niko Matsakis <niko at alum.mit.edu> wrote:

> Some more thoughts on the matter:
> http://smallcultfollowing.com/babysteps/blog/2012/04/23/vectors-strings-and-slices/
> Niko
> On 4/23/12 4:40 PM, Niko Matsakis wrote:
>> One thing that is unclear to me is the utility of the str/N type.  I
>> can't think of a case where a *user* might want this type---it seems to me
>> to represent a string of exactly N bytes (not a buffer of at most N bytes).
>>  Graydon, did you have use cases in mind?
>> Niko
>> On 4/23/12 4:12 PM, Graydon Hoare wrote:
>>> On 12-04-23 03:21 PM, Rick Richardson wrote:
>>>> Should a str be subject to the same syntax? Because it will have
>>>> different semantics.
>>> I think the semantics are almost identical to vectors. Save the null
>>> issue.
>>>> A UTF-8 string has differently sized characters, so you can't treat
>>>> it as a vector, there are obvious and currently discussed
>>>> interoperability issues regarding the null terminator.
>>> You certainly can treat it as a (constrained) vector. It's just a byte
>>> vector, not a character vector. A character vector is [char]. Indexing
>>> into a str gives you a byte. You can iterate through it in terms of
>>> bytes or characters (or words, lines, paragraphs, etc.) or convert to
>>> characters or utf-16 code units or any other encoding of unicode.
>>>> It should definitely get a slice syntax, since that will likely be the
>>>> most common operation on a string.
>>>> I would also like to support a notion of static sizing, but with UTF-8
>>>> even that's not always possible.
>>> Yes it is. The static size is a byte count. The compiler knows that size
>>> statically and can complain if you get it wrong (or fill it in if you
>>> leave it as a wildcard, as I expect most will do.)
>>>> I reckon a string should be an object, and potentially be convertible
>>>> to/from a vector.  But trying to treat it like a vector will just lead
>>>> to surprising semantics for some.  But that's just my opinion.
>>> The set of use-cases to address simultaneously is large and covers much
>>> of the same ground as vectors:
>>>   - Sometimes people want to be able to send strings between tasks.
>>>   - Sometimes people want a shared, refcounted string.
>>>   - Sometimes people want strings of arbitrary length.
>>>   - Sometimes people want an interior string that's part of another
>>>     structure (with necessarily-fixed size), copied by value.
>>>   - String literals exist and ought to turn into something useful,
>>>     something in static memory when possible, dynamic otherwise.
>>>   - Passing strings and substrings should be cheap, cheaper than
>>>     refcount-adjustment even (when possible).
>>> As far as I know, our class system can't really satisfy these
>>> requirements. This is why they're a built-in type (just like vectors).
>>> To make the class system strong enough to do all those things would be
>>> much more work, and would be approaching more like the C++0x model,
>>> which I believe to be over-engineered in pursuit of the "make libraries
>>> able to do anything a built in type can do" goal.
>>> But reasonable people disagree on this.
>>> -Graydon
>>> _______________________________________________
>>> Rust-dev mailing list
>>> Rust-dev at mozilla.org
>>> https://mail.mozilla.org/listinfo/rust-dev