jack: (Default)
[personal profile] jack
The formatting is probably going to be screwed up here, because I'm going to use a lot of <. This is a mix of stuff I'm trying to get straight in my mind, so I hope it's somewhat informative, but please point out where I've been unclear, confused or incorrect.

I am going to talk about lifetimes specifically, and save "only one mutable reference at once" aspect of the borrow checker for the following post.

In C or C++, it's possible to take a pointer or reference to a variable, and use the pointer or reference after the value is no longer valid. If it happens within a single function, it's often possible for the compiler (or lint tool?) to warn you. Eg. returning a pointer or reference to a value in a temporary variable. If you have a pointer in a different part of the program, it's easy to miss. Ideally you write code so it doesn't happen, but it's good if it *definitely* can't happen.

Rust makes the equivalent of those compiler warnings a part of the language. Each value has an associated lifetime. That is typically the scope it was first declared in, but could be shorter (eg. if it's artificially dropped) or longer (if it's allocated on the heap). That is basically "how long it's ok to hold a pointer/reference to this value (or part of this value)"[1].

That's all much the same within one function, but rust applies the same guarantees across the whole program. In order to do so, if you have a reference of any sort, it needs to carry along a lifetime parameter. These are usually implicit to avoid boilerplate, which means you can dig yourself in surprisingly far before suddenly discovering you have NO IDEA how this works :)

A simple-ish example might be a function which take a string, and returns a substring. In C++, you would have to choose between returning a new string that copies that substring (with a small overhead), or returning a slice of some sort (a char*, or a special slice type) that references the original memory -- but becomes invalid if that memory goes out of scope and is deallocated. In rust, you can specify that the returned value has the same lifetime as the parameter supplied, and then the normal checks for the calling function make sure that the slice/reference isn't used after the original value is deallocated (or changed).

In fact, if there's only *one* parameter and the function returns a reference, the return is assumed to be a reference to the input parameter (or to part of it) and you don't need to specify the lifetimes, it all just happens. Except slightly more safely than in C++ where you would not usually write a function like that because it's not easy to see if it's used safely.

If there's two input parameters, you need to specify which the return value depends on. In principle you can specify the return value might depend on either, or on both, but I haven't tried anything like that.

That's about as far as I've got. There's more stuff I've thought about, but not certainly enough to talk about it.


The actual format is a special case of a template function. Lifetimes are named like identifiers are, but with a ' at the start. Conventionally 'a, 'b, 'c etc.

The function name is followed by <'a> or <'a, 'b, T> with as many lifetime and/or type parameters as needed. Each input reference can then be annotated with a type parameter after the &. You can use the same lifetime parameter for multiple input references and the function will just use the smallest lifetime (the lifetimes of the parameters supplied don't actually have to be the same).

Then the return value of the function specifies the appropriate lifetime parameter.

fn process_str<'a>(&'a in_str: String) -> &'a String

<b>Question 1</b>

That seems like a lot of confusing boilerplate. Since it seems like lifetimes ONLY come from template parameters, why do you need to specify them in the template parameters list? Why can't that just be omitted?

There's a stack overflow question, but the answer just says "better to be explicit", it doesn't really give any examples of what would be confusing without that.

<b>Question 1a</b>

For that matter, specifying the input parameters at all seems complicated. Since lifetimes can (?) only come from input parameters, why can't they be specified that way?

fn foo(a:String, b:String, c:String, d:String) -> & lifetime(a) String

Return a String with lifetime equal to the lifetime of a. And inside the lifetime construct, you could allow min, max etc to combine lifetimes of variables if necessary.

<b>Question 2</b>

If you dereference a reference to a value correctly before the value does out of scope, it's still an error if the reference is still in scope, even if you don't use it. That sort of makes sense (there's no point having it), but it also doesn't do any harm. Why isn't the end of lifetime considered the last time a variable is *used*, not where it goes out of scope?

There is an rfc to reconsider this question, but I don't think it was acted on. Presumably there's not much benefit and there is a chance of confusion.

<b>Question 3</b>

If the compiler knows where the reference is needed, why can't it keep the value alive that long? Like a reference counted or garbage collected value, but at compile time?

I guess that's just way too complicated or confusing.

<b>Comparison to other languages</b>

This fixes a big problem in languages that habitually have bare pointers (see C, and half of C++)

If you don't care about efficiency, of course, you can just use reference counted references or garbage collected references everywhere. This can occasionally be confusing (if some reference keeps a value alive, but the value isn't really meaningful any more). But basically works. (See the other half of C++, and most other languages.)

<b>Footnote 1</b>

Something I was confused by for a time, is that in rust you can only *copy* values explicitly with .clone() (like in C, you can only memcpy a struct if you know it's safe to do, or in C++, it's implicit, but you need to have a copy constructor). But unlike C, where writing a=b just doesn't work for most types, in rust, you can assign *any* type with a=b, but it functions as a move: it copies the value from b into a with a straight memcpy, including any contained pointers or whatever. But b is then invalid.

It checks at compile time that you can't use b again, so in practice the first time you notice this is "wait, it looked like I assigned ok, but then I got other weird errors".

But there are other benefits, like being able to return a struct value from a function without special arrangements to avoid a temporary copy.

But it confused me about lifetimes, because the contents of an object can often live on after the object is dropped. When in fact, the compiler often arranges that when it *would* memcpy, it actually reuses the same part of memory, so references might still be valid. But that's an implementation detail, so when you "move" an object, that's the end of its lifetime.

Date: 2017-01-28 05:13 pm (UTC)
ewx: (Default)
From: [personal profile] ewx

In fact, if there's only *one* parameter and the function returns a reference, the return is assumed to be a reference to the input parameter (or to part of it) and you don't need to specify the lifetimes, it all just happens.

Pedantry: I think you mean if there's only one input lifetime - there could be many parameters but only one them a reference.

(I'm working through the book, partly inspired by your own efforts though I've been meaning to learn this language for a while now.)

Date: 2017-02-10 05:26 pm (UTC)
ewx: (Default)
From: [personal profile] ewx
Well, I got distracted by Go, for which a practical short-term need has arisen. But I'll be coming back to Rust nevertheless.