Random Rust Thoughts
Jun. 20th, 2024 02:01 pm
I wanted to learn some rust and started using it for a small hobby project, a tile-based computer game.
My natural inclination is very much towards what C++ and rust are trying to do, of creating a language which has good abstractions, but which compiles to something that runs efficiently and detects broken code at compile time, so I was interested to see what was the same and what was different.
These are my random impressions based on a short dive:
SPECIFIC OBSERVATIONS: ERROR PROPAGATION
The error propagation operator is excellent; all languages should work something like this. In a function, "x = func()?;" means "If func() returns a value, continue, assigning the value to x. If func() returns an error, return the error from this function." IMO this is how failing should work, because a large minority of the time the calling function wants to choose whether func() failing is an error or not, without four lines of boilerplate. IMO this gives the most commonly useful parts of error returns and the most commonly useful parts of exceptions.
Note this depends on returning success/failure through a standard Result type, which works well, but probably wants the fehler crate as well to reduce boilerplate in non-error return values. (See https://diziet.dreamwidth.org/13657.html).
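A minimal sketch of what that looks like in practice, using only the standard Result type. The MapError type and the tile functions are hypothetical names made up for illustration, not from my project:

```rust
// Hypothetical error type for a tile-based game (an assumption for this sketch).
#[derive(Debug, PartialEq)]
enum MapError {
    BadNumber(String),
    OutOfBounds(i32),
}

// Parse one coordinate; a parse failure is converted into MapError::BadNumber.
fn parse_coord(text: &str) -> Result<i32, MapError> {
    text.trim()
        .parse::<i32>()
        .map_err(|_| MapError::BadNumber(text.to_string()))
}

// Each `?` either unwraps the value or returns the error to our caller.
fn tile_index(x: &str, y: &str, width: i32) -> Result<i32, MapError> {
    let x = parse_coord(x)?;
    let y = parse_coord(y)?;
    if x < 0 || x >= width {
        return Err(MapError::OutOfBounds(x));
    }
    Ok(y * width + x)
}

// A caller that decides failure is NOT an error: it falls back to tile 0.
fn tile_index_or_origin(x: &str, y: &str, width: i32) -> i32 {
    tile_index(x, y, width).unwrap_or(0)
}

fn main() {
    println!("{:?}", tile_index("3", "2", 10));
    println!("{:?}", tile_index("3", "two", 10));
    println!("{}", tile_index_or_origin("99", "0", 10));
}
```

The point is that tile_index propagates failures with one character per call, while tile_index_or_origin shows a caller choosing to treat the failure as a normal case.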
SPECIFIC OBSERVATIONS: COPY AND MOVE
Rust and C++ both pay a lot of attention to when you can and can't copy objects, to avoid unnecessary copying while making it straightforward to write common code constructions. But they end up in a different place.
Rust assumes that ALL structs can be moved in memory with memcpy (unless you jump through some particular hoops). Structs implement a trait "Copy" if they can be copied with memcpy when assigned by "=". There's a separate trait "Clone" for structs which implement a ".clone()" function which copies the contents in a custom way (e.g. by making a copy of any owned objects). It took a while for me to adjust but now this makes more sense to me than C++'s approach. You give up a few particular things like structs containing a pointer to a different part of themselves.
Notice that things like "a struct which contains a pointer to some structs on the heap" can't be copied because you'd have two structs with the same pointer that might both think they own what the pointer points to. But it *can* be moved, because the pointer is just as valid after the struct is moved.
Then you end up with categories being roughly "things which can be moved at will", "things which can be copied at will" and "things which can be copied but non-trivially". And it turns out those are usually the cases that matter. C++ tries to support "things which can be seamlessly copied with = but something happens behind the scenes" but you actually don't need that as often as I thought.
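Those three categories can be sketched like this. The struct names are invented for illustration; the derives are the standard ones:

```rust
// Plain data: can be copied at will, so it derives Copy (which requires Clone).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pos {
    x: i32,
    y: i32,
}

// Owns heap data, so it cannot be Copy; Clone gives an explicit deep copy.
#[derive(Debug, Clone, PartialEq)]
struct Level {
    name: String,
    tiles: Vec<u8>,
}

// Neither Copy nor Clone: it can only be moved.
#[derive(Debug)]
struct SaveHandle {
    bytes: Vec<u8>,
}

fn main() {
    let a = Pos { x: 1, y: 2 };
    let b = a; // plain copy; `a` stays usable
    println!("{:?} {:?}", a, b);

    let first = Level { name: "cave".to_string(), tiles: vec![0, 1, 1] };
    let mut second = first.clone(); // explicit, non-trivial copy
    second.tiles.push(9); // does not affect `first`
    println!("{:?} {:?}", first, second);

    let handle = SaveHandle { bytes: vec![1, 2, 3] };
    let moved = handle; // move: `handle` can no longer be used
    // println!("{:?}", handle); // would not compile: value was moved
    println!("{:?}", moved);
}
```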
One specific case where there's a real difference is when a variable is assigned or passed to a function where it's used up, and it matters whether the original variable is used again after that. In rust, if the type is Copy the value is simply copied and the original stays usable. Otherwise the value is moved (because ALL values can be moved), and the compiler checks that the original variable isn't used after the move. If it is needed again, you need to explicitly clone() it, and you get a compiler error if you didn't.
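A small sketch of the "passed to a function where it's used up" case, with two made-up functions, one taking a Copy value and one taking ownership of a Vec:

```rust
// Takes ownership: the Vec is used up by the call.
fn total(scores: Vec<u32>) -> u32 {
    scores.into_iter().sum()
}

// Takes a Copy value: the caller's variable is unaffected.
fn double(n: u32) -> u32 {
    n * 2
}

fn main() {
    let n = 21;
    let d = double(n); // u32 is Copy, so `n` is still usable afterwards
    println!("{} {}", n, d);

    let scores = vec![1, 2, 3];
    let first = total(scores.clone()); // needed again below, so clone explicitly
    let second = total(scores); // last use, so the value is moved
    // total(scores); // would not compile: use of moved value
    println!("{} {}", first, second);
}
```

Forgetting the clone() on the first call turns the second call into a compile error rather than a run-time surprise.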
The C++ equivalents are move semantics and std::move(), which are very well thought through, but don't handle some of those cases as well. If you want to move a C++ variable into a function, you need to use std::move() to show it isn't used again, but IIRC the compiler can't check and you get a crash if you get it wrong. And generally, because C++ tries to assume you *can* copy anything (either with a default copy or a custom copy), you can't easily see when an optimisation happens and when it doesn't. (I guess maybe you could do something similar to rust by defining move assignment and constructor but always using a clone function instead of defining a non-default assignment or copy-constructor. Would that work?)
SPECIFIC OBSERVATIONS: REFERENCING AND DEREFERENCING
There's a trick to remembering when you need * and & and when you don't, although I don't remember all the cases.
It's something like, when you're accessing a member, always use "." and the compiler dereferences as many times as necessary, whether you're accessing a member of a value, or of a pointer, or of a pointer to a pointer. The same for println and a couple of other situations.
Whereas in almost all other contexts, you explicitly need & to reference and * to dereference, even ones where you don't need them in C or C++.
This makes a fair amount of sense. You don't have the equivalent of C++ transparent references, meaning it's always visible when you're dealing with a reference. This can be intrusive but you get used to having * around (e.g. in standard collections which store references, even if they happen to contain integers in this instance and could have used values directly).
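A rough sketch of the rule as I understand it, with hypothetical names: "." sees through any number of references, while writing through a reference or comparing a borrowed integer needs an explicit *.

```rust
use std::collections::HashMap;

struct Tile {
    height: i32,
}

// Member access through any depth of reference just uses `.`.
fn height_via_double_ref(tile: &&Tile) -> i32 {
    tile.height
}

// Writing through a reference needs an explicit `*`.
fn raise(height: &mut i32) {
    *height += 1;
}

// Iterating a map's values yields `&i32`, so comparing against a plain
// i32 needs an explicit `*` as well.
fn count_above(heights: &HashMap<&str, i32>, limit: i32) -> usize {
    let mut count = 0;
    for h in heights.values() {
        if *h > limit {
            count += 1;
        }
    }
    count
}

fn main() {
    let mut tile = Tile { height: 5 };
    raise(&mut tile.height); // `&mut` is explicit at the call site too
    let r = &tile;
    println!("{} {}", r.height, height_via_double_ref(&r));

    let mut heights = HashMap::new();
    heights.insert("hill", 6);
    heights.insert("plain", 1);
    println!("{}", count_above(&heights, 3));
}
```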
SPECIFIC OBSERVATIONS: BORROW CHECKER
The borrow checker ensures that you don't have two different variables where changing one changes the other because they both refer to the same underlying value. This is supposed to avoid coding mistakes, and also makes other language features easier to implement.
Worrying about unnecessary borrow checker errors used to be the Thing about rust. Now it's progressed to the point I didn't need to worry much about it, except when I was doing things I knew might be challenging ownership-wise.
I haven't plumbed enough to judge whether the borrow checker is necessary. There've been several examples where I thought "that's strange, but on reflection, I can see it's potentially useful" which is promising. There are a few cases which are awkward not to be able to do, like not being able to call "x.func(x.a, b)" because it's using x.a and x at the same time. Or the difficulty in having pointers back and forth between two objects. But you can usually work round them; using smart pointers is usually reasonable.
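Here is a sketch of the "x and x.a at the same time" case and two workarounds. It assumes the method takes &mut self and the argument is a reference to a field; the Player type is made up for illustration:

```rust
struct Player {
    name: String,
    log: Vec<String>,
}

impl Player {
    // Needs mutable access to the whole struct.
    fn record(&mut self, who: &str, what: &str) {
        self.log.push(format!("{}: {}", who, what));
    }
}

fn main() {
    let mut p = Player { name: "ada".to_string(), log: Vec::new() };

    // p.record(&p.name, "moved north");
    // Rejected by the borrow checker: `p` is borrowed mutably for the call
    // while `p.name` is still borrowed immutably for the argument.

    // Workaround 1: copy the field out first.
    let name = p.name.clone();
    p.record(&name, "moved north");

    // Workaround 2: touch the two fields directly, since borrows of
    // different fields do not conflict.
    p.log.push(format!("{}: {}", p.name, "moved east"));

    println!("{:?}", p.log);
}
```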
And by coincidence I ran into a use-after-free bug in C++, when a parameter was passed by reference and the original expired before the parameter was moved into a permanent variable, which made me more appreciate rust's efforts here.
SPECIFIC OBSERVATIONS: CARGO
Using Cargo as a standard build and packaging tool works great.
Using rust without a standard library works surprisingly well. It was designed to be able to do low-level stuff.
Trying to use rust without cargo is fighting uphill. That's not a problem for me, as long as you know to be cautious of it.
Having a universal repository of everyone's rust packages on crates.io is a big adjustment from a C and C++ norm of "if you want someone else's library, you download the source or binary and put it in your build pipeline". But mostly a positive one.
With the caveat that you can search up a crate for whatever functionality, but there's no standard for "this is reasonably mature" other than "is widely used". Some are effectively standard, but not quite adopted into standard library yet, others are common but might still contain footguns.
And the significant caveat that, if you use a crate from crates.io which has dependencies, then (iirc even at build time, as well as run time) the build system will automatically download and run that code, and one day one of those will upgrade to a backdoored version or something. (See https://diziet.dreamwidth.org/1805.html) This is how lots of things work now, and it's increasingly hard to get away from, so I've not worried about it for my work, but if you're concerned with security it's a pain to work round.
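For the build-time part, the mechanism I have in mind is the build script: a crate can ship a build.rs, which Cargo compiles and runs on your machine before compiling the crate itself. This is a hypothetical, harmless sketch of one, not taken from any real crate:

```rust
// build.rs (hypothetical): Cargo compiles and runs this on the build
// machine before compiling the crate it belongs to.
use std::env;

// Lines printed to stdout with the `cargo:` prefix are instructions to Cargo.
fn build_messages() -> Vec<String> {
    vec!["cargo:rerun-if-changed=build.rs".to_string()]
}

fn main() {
    // This is ordinary Rust with ordinary privileges: it could read files,
    // inspect the environment or spawn processes. Here it only logs.
    let out_dir = env::var("OUT_DIR").unwrap_or_else(|_| "<not run by cargo>".to_string());
    eprintln!("build script running, OUT_DIR = {}", out_dir);

    for line in build_messages() {
        println!("{}", line);
    }
}
```

Every dependency, and every dependency of a dependency, gets the same opportunity, which is why the trust question applies at build time as well as run time.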
SPECIFIC OBSERVATIONS: CULTURE
It seems like rust culture is 85% fluff and welcoming and surprisingly wholesome and non-toxic and pretty sensible decisions about language evolution, but 15% full of mysterious dysfunction at the heart of the project.
GENERAL IMPRESSIONS: POSITIVES
For a hobby project it was fairly quick to get going. For such a small project compiling was snappy (I hear rust may have slow compilations for serious projects but don't know if it does or not.)
It is really nice to use a language which compiles properly like C++ does, but has a lot of clean modern syntax. Little syntax tweaks like not allowing "if (a) b;" without braces means you can allow "if a {b}" without brackets, which saves space in the most common case. Nice innovations like slice notations, range notation, and "structs which implement THIS trait can be used in THIS language feature". Features like generics with less wall-of-angle-brackets than C++ requires.
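A quick sketch of a few of those in one place: "if" without brackets, a slice, a range, and a struct gaining the "+" operator by implementing the standard Add trait. The Offset type is invented for the example:

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Offset {
    dx: i32,
    dy: i32,
}

// Implementing the Add trait is what lets `+` work on Offset.
impl Add for Offset {
    type Output = Offset;

    fn add(self, other: Offset) -> Offset {
        Offset { dx: self.dx + other.dx, dy: self.dy + other.dy }
    }
}

// Accepts any contiguous run of i32s: an array, a Vec, or part of one.
fn row_sum(row: &[i32]) -> i32 {
    row.iter().sum()
}

fn main() {
    let tiles = [1, 2, 3, 4, 5, 6];
    let middle = &tiles[1..4]; // slice notation: indices 1, 2 and 3

    if row_sum(middle) > 5 { // no brackets round the condition
        println!("middle = {:?}", middle);
    }

    for i in 0..3 { // range notation
        println!("tile {} is {}", i, tiles[i]);
    }

    let step = Offset { dx: 1, dy: 0 } + Offset { dx: 0, dy: 2 };
    println!("{:?}", step);
}
```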
Struct interfaces are streamlined compared to C++. There's less focus on inheritance and more focus on which classes fulfil which interfaces. Structs are defined by their member variables, with potentially multiple separate blocks for their functions.
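Roughly what that looks like, with made-up Monster and Chest types and a made-up Describe trait standing in for the interface:

```rust
// An interface: anything that can describe itself.
trait Describe {
    fn describe(&self) -> String;
}

// The struct itself is just its member variables.
struct Monster {
    name: String,
    hp: i32,
}

// One block of functions: construction.
impl Monster {
    fn new(name: &str, hp: i32) -> Monster {
        Monster { name: name.to_string(), hp }
    }
}

// A second, separate block: behaviour.
impl Monster {
    fn hit(&mut self, damage: i32) {
        self.hp = (self.hp - damage).max(0);
    }
}

// Fulfilling the interface is yet another block; no base class involved.
impl Describe for Monster {
    fn describe(&self) -> String {
        format!("{} ({} hp)", self.name, self.hp)
    }
}

struct Chest {
    gold: u32,
}

impl Describe for Chest {
    fn describe(&self) -> String {
        format!("chest with {} gold", self.gold)
    }
}

// Works with anything that fulfils Describe.
fn describe_all(things: &[&dyn Describe]) -> Vec<String> {
    things.iter().map(|t| t.describe()).collect()
}

fn main() {
    let mut rat = Monster::new("rat", 5);
    rat.hit(2);
    let chest = Chest { gold: 10 };
    let things: [&dyn Describe; 2] = [&rat, &chest];
    for line in describe_all(&things) {
        println!("{}", line);
    }
}
```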
Similarly an improved modern setup for organising and building files. No header files. A crate has a central config file (Cargo.toml), its root source file declares which other source files are used as modules, and they can import structs and functions from each other.
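A sketch of the module side of this. In a real crate each module would normally live in its own file and be declared from the root file; they are written inline here only so the example fits in one file, and the names are hypothetical:

```rust
// Stand-in for a file such as map.rs.
mod map {
    pub struct Tile {
        pub height: i32,
    }

    pub fn flat(width: usize) -> Vec<Tile> {
        (0..width).map(|_| Tile { height: 0 }).collect()
    }
}

// Stand-in for a file such as render.rs.
mod render {
    // Import from a sibling module; no header file needed.
    use super::map::Tile;

    pub fn draw(row: &[Tile]) -> String {
        row.iter()
            .map(|t| if t.height > 0 { '^' } else { '.' })
            .collect()
    }
}

use map::flat;
use render::draw;

fn main() {
    println!("{}", draw(&flat(4)));
}
```

Only items marked pub are visible outside their module, which does the job a header file would do in C or C++.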
The idea of having macros which can extend syntax in quite broad ways, but have to represent a logical unit of code, not just "any block of text" is good. Macros are nice to USE, though I hear complicated to write.
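As a small taste of the writing side, here is a sketch of a declarative macro. The path! macro and its "=>" syntax are invented for illustration; the pattern has to be built from proper expressions rather than arbitrary text:

```rust
// Hypothetical macro: turns `x => y` pairs into a Vec of (x, y) tuples.
macro_rules! path {
    ( $( $x:expr => $y:expr ),* $(,)? ) => {
        vec![ $( ($x, $y) ),* ]
    };
}

fn main() {
    // New-looking syntax at the call site, but each piece is a real expression.
    let route: Vec<(i32, i32)> = path![0 => 0, 1 => 0, 1 => 1 + 1];
    println!("{:?}", route);
}
```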
GENERAL IMPRESSIONS: CAUTIONS
Most of the things in this section moved into the "specifics" section.
In some ways C++ is more expressive. I'm waiting to see if the restrictions in rust are ones that concern me long term or not.
no subject
Date: 2024-06-20 01:02 pm (UTC)
no subject
Date: 2024-06-20 02:36 pm (UTC)
? operator being nice and short: of course, if someone is coming from a language that has exceptions, the ? operator is one more character than they're used to – it's the analogue of "just not bothering to catch an exception" so that it propagates silently out of the function!
(My favourite silly joke about Rust's error handling: "try not! Result<Do,DoNot>. There is no try.")
I was initially suspicious of the ? operator because I wasn't sure how I'd feel about having such a strong incentive to choose my function boundaries to correspond to the units of functionality that make sense to abandon half way through if something goes wrong. (You can do things differently if you need to, like making and immediately calling a closure so that the scope of ? is limited to that closure, but it would take a lot for you to think that was less ugly than the alternative :-) But in fact I haven't had trouble with that at all; it almost always seems entirely natural.
Perhaps less obvious: Rust errors and C++ exceptions have very different performance characteristics. The C++ exception system is designed on the assumption that exceptions are rare – thrown only in case of some unusual disaster. So the stack unwinding is horribly complicated and expensive in performance, but on the other hand, not throwing an exception has very low cost. Whereas in Rust, if even a successful Result is returned through a nest of 10 function calls all using ?, every level must manually check it in case it's an error, so there's more of a cost to not failing – but the cost of failing is much lower, because you just check a variable you have right there already, and don't have to hand off to a huge terrifying piece of stack unwinding code which consults huge compiler-generated tables. In C++ it would be a serious performance mistake to casually use exceptions for normal control flow (as you might in Python, say, where a StopIteration is thrown at the end of more or less any for-loop); in Rust it's probably fine.
On move vs copy constructors: a recurring theme is that Rust checks things at compile time that C++ can only check at run time. The reason std::move is less useful is because the C++ compiler isn't statically tracking which variables currently have live values in, and which don't. So in Rust, after you move out of an object, the compiler knows that variable doesn't contain a thing at all any more, so that it's a compile error to try to use it. But in C++, if you move out of an object, it's still legal to try to use it, it just won't do anything useful – and the compiler can't change that.
(Unless it will! Writing your own types in C++, you could always choose to write the move constructor/operator so that they leave the moved-from object in some predictable and useful state, like an empty list or a null pointer or zero or something, and then depend on that behaviour in client code. The STL types generally don't – they leave the object in a state where the only safe things to do are to assign something new into it or to destruct it. As far as I know the C++ standard doesn't even guarantee that moving out of unique_ptr leaves it containing nullptr, although this will often be what happens in practice because it's hard to imagine a C++ library implementing it any other way – so it's easy to accidentally rely on that not-guaranteed thing...)
Another example of compile- vs run-time checking is that C++'s analogue of Rust Result<T,E> is std::expected<T,E>. Both try to enforce that you don't accidentally 'squash' errors by just never remembering to check them (although in both cases there's a way to squash an error on purpose if you really meant to). The C++ std::expected object does it by having a check in its destructor so that if it had an error in it and you didn't extract or inspect the error (err, somehow) then you get a crash – but that means you still don't find out about the mistake until you have a test that exercises the failing code path. Whereas Rust will statically check at compile time (I think via #[must_use], unless I've misspelled that), so you find out without having to have exhaustive tests of your error handling. And that's a huge improvement, because error paths are notoriously hard to write tests for!
no subject
Date: 2024-06-20 08:16 pm (UTC)
Thank you for reading my assorted impressions!
"One character still more than exceptions"
Yes, that's true. It's ALMOST as concise as exceptions. Implicitly the trade off is that the extra explicitness is worth the extra character, which seems likely to me. But my perspective is starting from a background of "don't usually use exceptions, but want some of the benefits?" rather than the reverse. And I haven't actually USED either ? or exceptions very much so I don't know yet... :)
"Initially suspicious"
But you CAN always check the return value explicitly instead of using ?, just like any sort of return values, but ? avoids the visual overhead in a particularly common case.
I guess, if you're used to actually USING exceptions, and have meaningful try/catch blocks within the function, rather than only having "boilerplate to turn exception into binary return value" and "bubble everything up to the top of the library", then rust's equivalent doesn't do that. But I haven't been :)
I guess you could extend the concept further with a ?? operator that breaks out of a block rather than returning from the function, more directly like exceptions, but I don't know how much you'd gain.
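For comparison, the immediately-called closure trick mentioned earlier in the thread gets close to this already. A sketch, with a made-up function name: the ? inside the closure only exits the closure, so the enclosing function carries on and decides what to do with the failure.

```rust
use std::num::ParseIntError;

// Hypothetical helper: adds two numbers given as text, or gives 0 on failure.
fn sum_or_zero(a: &str, b: &str) -> i32 {
    // `?` here returns from the closure, not from sum_or_zero.
    let attempt = (|| -> Result<i32, ParseIntError> {
        let a = a.parse::<i32>()?;
        let b = b.parse::<i32>()?;
        Ok(a + b)
    })();

    // Back in the enclosing function, the failure is handled like a catch.
    match attempt {
        Ok(total) => total,
        Err(_) => 0,
    }
}

fn main() {
    println!("{}", sum_or_zero("2", "3"));
    println!("{}", sum_or_zero("2", "three"));
}
```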
I've also been thinking more in terms of what's clear for the programmer to read, not what's efficient at run time. In C code I usually see return values not checked quite thoroughly enough because the maintenance overhead of several lines of return for each fallible function looked too much... Somewhat similarly anywhere you need to turn an exception into a "if succeed/fail {". I think how efficient it is at run time is going to matter SOMETIMES (especially if you raise too many exceptions), but usually actually less of a deciding factor...
no subject
Date: 2024-06-21 08:25 am (UTC)
map_err can help with that too where a message along with the enum choice is helpful.)
no subject
Date: 2024-06-21 07:14 pm (UTC)
Yeah. I understand why the compiler can't GUARANTEE to track which variables can't be used, because it might depend on other control flow etc and would require doing all the things rust does with all the tradeoffs elsewhere in the language. But I think the compiler usually detects "use uninitialised value" in simple cases, I don't know why it can't detect straightforward uses of "use after move" the same way. Or maybe it does now and I just didn't realise.
Yeah. I just had a quick check, and apparently unique_ptr *does* specify it gets nulled, and the standard specifies that moved-from objects should be in *some* valid state (but might or might not be empty or anything). But I didn't know that and hope I don't need to know it, as you say, the general rule is not to assume anything. Which just feels so strange, like C++ tried to remove an awful lot of C footguns, and then introduced std::move which seems really easy to get wrong. I guess they really wanted moves and couldn't find any better alternative.
Yeah. It makes sense that rust does better with these sort of compile time checks because it made a big push to make lifetimes and things possible, and had the benefit of seeing C++'s experience. While C++ is encumbered by lots of backwards compatibility, and it's hard to retrofit guarantees like this. But it means it is a strength of rust.
Gosh, I hadn't realised C++ had a type which tried to do that. That is ingenious, although as you say, not as good as a compile time check. It does feel that this ought to usually be sufficient with a compile time not-guaranteed-perfect lint of some sort for "check function return value". I guess that's not as easy to check that you got the result and didn't just unwrap it to get the presumed value. But if you did just unwrap it, presumably you wanted to mask the error. I don't know.
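For reference, a sketch of how the rust side of this looks, with made-up function names. As far as I can tell the check is a lint (unused_must_use) that warns by default rather than a hard error, unless the crate chooses to deny it:

```rust
// Result is already marked #[must_use] in the standard library.
fn save_score(score: i32) -> Result<(), String> {
    if score < 0 {
        Err(format!("negative score: {}", score))
    } else {
        Ok(())
    }
}

// The same attribute can be put on your own functions.
#[must_use]
fn is_high_score(score: i32) -> bool {
    score > 100
}

fn main() {
    // save_score(-1);     // compiles, but warns that the Result is unused
    // is_high_score(150); // likewise warns that the return value is unused

    let _ = save_score(-1); // squashing the error on purpose silences the lint

    if let Err(message) = save_score(-1) {
        println!("could not save: {}", message);
    }
    println!("{}", is_high_score(150));
}
```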
no subject
Date: 2024-06-20 05:02 pm (UTC)
OTOH, how many people who download non-trivial source code actually bother to check it thoroughly? If you grabbed your copy before the compromise then you're OK, otherwise your local library is infected.
no subject
Date: 2024-06-20 07:58 pm (UTC)
Yes, I mostly agree about compiled binaries. It's useful to be ABLE to inspect the source, but for what I'm doing I usually don't have any reason to, or ability to.
But there's a spectrum of "automatic". Someone else compiling the binary is still quite close to the low end of the automatic scale if you manually download the binary, once. I think the thing about package managers is that they represent a trend where it's increasingly useful/inevitable to rely on quite a few libraries for fairly necessary things, which typically have further libraries as dependencies, and typically have some reason you NEED to update some of them and can't just leave them all as you originally downloaded them. And/or the package manager encourages and makes easier a workflow where they're frequently updated.
Also note the "code from package repository automatically downloaded and run as part of the BUILD process", which is another step of automatically trusting package repository code, even if you weren't realistically going to audit it before running it.