Jan. 28th, 2017

jack: (Default)
This is another post about politics. I'm sorry. Please let me know if there is a way for me to talk about these topics in a less distressing way.

Read more... )
jack: (Default)
Const values

Last time I talked about lifetimes. Now let me talk about the other part of references, ownership and borrow checking.

If you're dealing with const values, this is similar to other languages. By default, one place "owns" a value. Either declared on the stack, or on the heap (in a Box). Other places can be passed a const reference to that value. As described with lifetimes, rust checks at compile time that all of those references are finished with before the original goes out of scope. When the original goes out of scope, it's deallocated (from stack or heap).

Alternatively, it can be reference counted. In rust, you can use Rc<Type> instead of Box<Type> and it's similar, but instead of having a native reference to the value, you take a copy of the Rc, and the value is only freed from the heap when the last Rc pointing to it disappears.

One reason this is important is thread-safety. Rc isn't thread safe, and rust checks you don't transfer it to another thread for that reason. Arc changes reference count atomically so *is* thread safe, and can be sent to another thread. (It's a copy of the Arc that's sent, but one that refers to the same data.)

Const references can't usually be sent between threads unless the original had a lifetime of the whole program (static), because there's no universal way to be sure the thread is done with it, so it's always illegal for the original owner to go out of scope (?) But threads with finite lifetimes are hopefully coming in future (?)

Non-const values

A big deal in rust is making const (immutable) the default, and declaring non-const things (mut). I think that's a good way of thinking. But here it may get confusing.

You can have multiple references to an immutable value. But in order to be thread safe, you can only have one *mutable* reference. Including the original -- it's an error to access the original during the scope of a mutable reference. That's why it's called a "borrow" -- if you make a mutable reference to a value, you can only access the original again once the reference goes out of scope.

But a point that's less well agreed is how useful this is when you don't pass anything between threads.

One argument is that you might be able to have a pointer *to* a value that you then mutate, but if it's something like a vector, you can't have a pointer/reference to a value in it because that might have been invalidated. And even if you have an iterator which could in theory be safe (eg. the iterator contains an index, not just a pointer), you still need to check for the iterator being invalid when it's used, which reduces various optimisations.

Another argument, that I found more interesting, is that even if the value isn't invalidated in a memory-safety sense, if you change the value in two disparate parts of code (say, you loop through all X that are Y calling function Z, and function Z in turn calls function W which does something to some X, including the ones you're iterating through), it's easy for the logic you write to be incorrect, if you can't tell at a glance which values might be changed half way through your logic and which won't be.

I found that persuasive as a general principle. Though I'm not sure how practical it is to work with those constraints in practice, if they're generally helpful once you know how to work with them, or if they're an unnecessary impediment. Either way, I feel better for having thought about those issues.

Workaround, interior mutability

"Interior mutability" is feature of rust types (Cell and RefCell), which is a bit like "mutable" keyword in C++: it allows you to have a class instance which the compiler treats as constant, (eg. allowing optimisations like caching return values), but does something "under the hood" (eg. the class caches expensively calculated results, or logs requests made to it, or keeps a reference count).

There's a couple of differences. One is, as I understand it, you don't just write heedlessly to the mutable value, rather rust checks at run time that you only take one mutable reference to it at once. So if you screw up, it immediate panics, rather than working most of the time but with subtle bugs lurking.

But it's also the case that if you do want a shared class accessed by many parts of your program (a logging class say, is that a reasonable example?), rust encourages you to use interior mutability to replicate the default situation in C or C++, of having a class multiple different parts of your program have a pointer through which they can call (non-const) functions in it.

I have more thoughts on these different ways of using pointers maybe coming up.