Jan. 25th, 2017

jack: (Default)
My goal for January was to learn some rust, and if possible contribute to the rust compiler/library source on github.

Rust is a language aimed at lowish level code with the efficiency of C, but with the safety of a garbage collector and type checker. Someone (ciphergoth? fanf?) originally pointed it out to me, and obviously I'm really interested in that intersection, as my experience has mostly been in lowish level stuff, but also in avoiding all forms of boilerplate and overhead.

For a while, an informal motto was "speed, safety, convenience, pick three", which is presumably won't live up to, but shows how it's being aimed.

It's not ready to replace C or C++, it's still maturing, but has matured a fair bit. And is almost the only language anywhere where using it for things C is used for now is even conceivable.

I don't know if my interest will go anywhere, but I feel like I learned useful things from just trying. Understanding the trade-offs made in designing a language, and the types of code patterns it invites similarly to C++, and ones it recommends for and against, and thinking about what the code I write is doing in practice, seem to have made me understand programming a little bit better.

So far

I read some of the introductory books and articles. I installed the compiler and package manager (on an Ubuntu VM) and made sure I could write a "Hello world" program.

I got the source code for the compiler and libraries, tested I could build it, and looked at the open bugs. I was very pleased that there was an existing effort to tag some bugs as "easy" for new contributors. I didn't try to edit any of the actual compiler code yet, but I did submit a small change to the documentation.

And that there was a bot (rust high-five) welcoming new contributors and assigning reviewers so small patches actually get accepted or rejected, not just languish. And a bot doing continuous integreation (rust bors, with a non-rust-specific development known as homu), specifically testing patches *before* being pulled to master. So changes actually made it into nightly release almost immediately, and three months later into a regular release.

I was also pleased that the code of conduct read like it was written by someone in this century.


I've read something about some of the concepts in rust people find weird, and may try to write something about my understanding, to see how much I've grokked, and get feedback from other people who've played with rust.

I've mentioned in passing several small design choices that I enjoyed. Eg. the error handling, usually returning an Option type, which is either a success with a return value, or an error with an error type or string. Eg. putting type annotations on functions arguments, but relying on automatic variable types within function bodies. I won't review all of these, but in general, they felt good when I saw them. If I actually compare them to what I'm used to in other languages, I'll see if they still feel good.
jack: (Default)
The formatting is probably going to be screwed up here, because I'm going to use a lot of <. This is a mix of stuff I'm trying to get straight in my mind, so I hope it's somewhat informative, but please point out where I've been unclear, confused or incorrect.

I am going to talk about lifetimes specifically, and save "only one mutable reference at once" aspect of the borrow checker for the following post.

In C or C++, it's possible to take a pointer or reference to a variable, and use the pointer or reference after the value is no longer valid. If it happens within a single function, it's often possible for the compiler (or lint tool?) to warn you. Eg. returning a pointer or reference to a value in a temporary variable. If you have a pointer in a different part of the program, it's easy to miss. Ideally you write code so it doesn't happen, but it's good if it *definitely* can't happen.

Rust makes the equivalent of those compiler warnings a part of the language. Each value has an associated lifetime. That is typically the scope it was first declared in, but could be shorter (eg. if it's artificially dropped) or longer (if it's allocated on the heap). That is basically "how long it's ok to hold a pointer/reference to this value (or part of this value)"[1].

That's all much the same within one function, but rust applies the same guarantees across the whole program. In order to do so, if you have a reference of any sort, it needs to carry along a lifetime parameter. These are usually implicit to avoid boilerplate, which means you can dig yourself in surprisingly far before suddenly discovering you have NO IDEA how this works :)

A simple-ish example might be a function which take a string, and returns a substring. In C++, you would have to choose between returning a new string that copies that substring (with a small overhead), or returning a slice of some sort (a char*, or a special slice type) that references the original memory -- but becomes invalid if that memory goes out of scope and is deallocated. In rust, you can specify that the returned value has the same lifetime as the parameter supplied, and then the normal checks for the calling function make sure that the slice/reference isn't used after the original value is deallocated (or changed).

In fact, if there's only *one* parameter and the function returns a reference, the return is assumed to be a reference to the input parameter (or to part of it) and you don't need to specify the lifetimes, it all just happens. Except slightly more safely than in C++ where you would not usually write a function like that because it's not easy to see if it's used safely.

If there's two input parameters, you need to specify which the return value depends on. In principle you can specify the return value might depend on either, or on both, but I haven't tried anything like that.

That's about as far as I've got. There's more stuff I've thought about, but not certainly enough to talk about it.


The actual format is a special case of a template function. Lifetimes are named like identifiers are, but with a ' at the start. Conventionally 'a, 'b, 'c etc.

The function name is followed by <'a> or <'a, 'b, T> with as many lifetime and/or type parameters as needed. Each input reference can then be annotated with a type parameter after the &. You can use the same lifetime parameter for multiple input references and the function will just use the smallest lifetime (the lifetimes of the parameters supplied don't actually have to be the same).

Then the return value of the function specifies the appropriate lifetime parameter.

fn process_str<'a>(&'a in_str: String) -> &'a String

<b>Question 1</b>

That seems like a lot of confusing boilerplate. Since it seems like lifetimes ONLY come from template parameters, why do you need to specify them in the template parameters list? Why can't that just be omitted?

There's a stack overflow question, but the answer just says "better to be explicit", it doesn't really give any examples of what would be confusing without that.

<b>Question 1a</b>

For that matter, specifying the input parameters at all seems complicated. Since lifetimes can (?) only come from input parameters, why can't they be specified that way?

fn foo(a:String, b:String, c:String, d:String) -> & lifetime(a) String

Return a String with lifetime equal to the lifetime of a. And inside the lifetime construct, you could allow min, max etc to combine lifetimes of variables if necessary.

<b>Question 2</b>

If you dereference a reference to a value correctly before the value does out of scope, it's still an error if the reference is still in scope, even if you don't use it. That sort of makes sense (there's no point having it), but it also doesn't do any harm. Why isn't the end of lifetime considered the last time a variable is *used*, not where it goes out of scope?

There is an rfc to reconsider this question, but I don't think it was acted on. Presumably there's not much benefit and there is a chance of confusion.

<b>Question 3</b>

If the compiler knows where the reference is needed, why can't it keep the value alive that long? Like a reference counted or garbage collected value, but at compile time?

I guess that's just way too complicated or confusing.

<b>Comparison to other languages</b>

This fixes a big problem in languages that habitually have bare pointers (see C, and half of C++)

If you don't care about efficiency, of course, you can just use reference counted references or garbage collected references everywhere. This can occasionally be confusing (if some reference keeps a value alive, but the value isn't really meaningful any more). But basically works. (See the other half of C++, and most other languages.)

<b>Footnote 1</b>

Something I was confused by for a time, is that in rust you can only *copy* values explicitly with .clone() (like in C, you can only memcpy a struct if you know it's safe to do, or in C++, it's implicit, but you need to have a copy constructor). But unlike C, where writing a=b just doesn't work for most types, in rust, you can assign *any* type with a=b, but it functions as a move: it copies the value from b into a with a straight memcpy, including any contained pointers or whatever. But b is then invalid.

It checks at compile time that you can't use b again, so in practice the first time you notice this is "wait, it looked like I assigned ok, but then I got other weird errors".

But there are other benefits, like being able to return a struct value from a function without special arrangements to avoid a temporary copy.

But it confused me about lifetimes, because the contents of an object can often live on after the object is dropped. When in fact, the compiler often arranges that when it *would* memcpy, it actually reuses the same part of memory, so references might still be valid. But that's an implementation detail, so when you "move" an object, that's the end of its lifetime.
jack: (Default)
I haven't looked at lifetimes relating to structs yet.

Come to think of it, if my previous understanding was right, the lifetimes of return values can only ever be a combination of lifetimes of input parameters, so there's only so many possibilities, and the compiler knows which ones are possible (because if you dropped the input parameters, it would know which of the potential return values it would still be valid to read)... why can't it just deduce the output lifetimes? Is it more complicated than that in most (or some) cases?

ETA: One more thing I forgot. Lifetimes don't *do* anything. They're like traits, or types which could have been automatically deduced: the compiler checks that the lifetimes you specify don't leave any variables being used-after-free. But they don't *change* the lifetime of a variable, just tell any code that uses the variable what the lifetime *is*.