Jan. 29th, 2017

jack
This isn't solely about Rust, but it made me think about something I hadn't really been aware of. There are several common uses for pointers. The four uses themselves are nothing special, but I'm interested in thoughts on my speculation about #3 and #4.

1. Heap allocation.

If you allocate a value on the heap, not the stack, you need to refer to it by a pointer. And if you're using a language other than C, you want it automatically de-allocated once the stack-allocated pointer goes out of scope (either immediately, using a smart pointer in C++, or eventually, in a garbage-collected language).

If that's *all* you want to do, you can hide the issue from the programmer completely, as in languages that heap-allocate by default, where you're just supposed to know which copies produce independent duplicates of the data and which copies refer to the same shared data.

In Rust, this is a Box.
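As a rough sketch of what that looks like in Rust (the `Big` type and `make_boxed` function are made up for illustration, not from anything above): a Box owns its heap allocation, frees it when it goes out of scope, and makes you say explicitly whether you want a move or an independent duplicate.

```rust
// Hypothetical type for illustration only.
#[derive(Clone)]
struct Big {
    data: [u8; 1024],
}

// Box::new moves the value onto the heap and returns an owning pointer.
fn make_boxed() -> Box<Big> {
    Box::new(Big { data: [7; 1024] })
}

fn main() {
    let a = make_boxed();

    // An independent duplicate has to be requested explicitly with clone().
    let mut b = a.clone();
    b.data[0] = 9;
    println!("{} {}", a.data[0], b.data[0]); // prints "7 9"

    // Plain assignment moves ownership; `a` can no longer be used after this.
    let c = a;
    println!("{}", c.data[0]);
} // `b` and `c` go out of scope here and both heap allocations are freed.
```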

2. Pass-by-reference.

If you pass a value to a function and either (a) you want the function to edit that value or (b) the value is too large to copy efficiently, you want to pass it by reference. That can be done either with a keyword which specifies what happens under the hood, or explicitly by taking a pointer and passing that pointer.

In Rust, you pass a reference (roughly equivalent to a C++ reference or pointer), but there are various compile-time checks to make sure the memory accesses are safe.
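A minimal sketch of both cases, with function names invented for the example: a shared reference (`&`) for "too big to copy" and a mutable reference (`&mut`) for "please edit this for me".

```rust
// Case (b): read-only access without copying the data.
fn total(values: &[u64]) -> u64 {
    values.iter().sum()
}

// Case (a): the function edits the caller's value in place.
fn double_all(values: &mut Vec<u64>) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

fn main() {
    let mut data = vec![1, 2, 3];
    double_all(&mut data); // the caller has to write `&mut` explicitly
    println!("{}", total(&data)); // prints "12"
}
```

The compile-time checks mean that while `double_all` holds the `&mut`, nothing else can read or write `data`.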

3. I need access to this struct from various different parts of my program.

E.g. a logging class, or a network interface class. Each class which may need their functionality needs a way to refer to them. There are a few ways of doing this, which are good enough, although not completely satisfactory.

You can make them all static. But then there's no easy way to replace them in testing, and there are problems with lifetimes around the beginning and end of your program. You have to be careful to initialise them in the right order, or just assume you don't use them around the time they may be invalid (but that may throw up lots of errors from a linter or the Rust compiler).
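For concreteness, here is one way the static version might look in Rust. This is only a sketch: the names `LOG`, `log` and `do_work` are made up, and it assumes Rust 1.63 or later, where `Mutex::new` can be used in a static initialiser.

```rust
use std::sync::Mutex;

// A global log buffer. The Mutex is what keeps the compiler happy about
// shared mutable state; a bare `static mut` would need `unsafe` everywhere.
static LOG: Mutex<Vec<String>> = Mutex::new(Vec::new());

fn log(msg: &str) {
    LOG.lock().unwrap().push(msg.to_string());
}

// Nothing in this signature says that it touches the logger,
// which is exactly why it is hard to swap out in a test.
fn do_work() {
    log("working");
}

fn main() {
    do_work();
    println!("{:?}", LOG.lock().unwrap());
}
```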

You can pass them in as arguments to every function. But that's clunky, and involves a lot of repetition[1]. However, see the weird suggestion at the end.

Or you can just make sure each class has a pointer to the necessary classes (or maybe to a top-level class which itself has pointers to, or members of, the relevant classes), initialised at class construction. However, this has *some* of the problems of the above two possibilities: it's less easy to replace the functions for testing, and it's somewhat redundant. This one is what's a little weird in Rust. I think you have to use objects which are shared immutably (roughly, declared const), but which actually have run-time-checked mutable objects inside ("interior mutability"). Again, see the weird suggestion at the end.
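A sketch of how I understand the "pointer in every class" version in single-threaded Rust, using `Rc<RefCell<...>>` from the standard library. The `Logger`, `Network` and `Ui` types are hypothetical stand-ins for the classes mentioned above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Logger {
    lines: Vec<String>,
}

impl Logger {
    fn log(&mut self, msg: &str) {
        self.lines.push(msg.to_string());
    }
}

// Each class keeps its own shared handle to the one logger.
struct Network {
    logger: Rc<RefCell<Logger>>,
}

impl Network {
    fn send(&self, payload: &str) {
        // borrow_mut() is the run-time check: it panics if the logger
        // is already borrowed somewhere else at this moment.
        self.logger.borrow_mut().log(&format!("sent {}", payload));
    }
}

struct Ui {
    logger: Rc<RefCell<Logger>>,
}

impl Ui {
    fn click(&self) {
        self.logger.borrow_mut().log("click");
    }
}

fn run() -> Vec<String> {
    let logger = Rc::new(RefCell::new(Logger { lines: Vec::new() }));
    let net = Network { logger: Rc::clone(&logger) };
    let ui = Ui { logger: Rc::clone(&logger) };
    net.send("hello");
    ui.click();
    let lines = logger.borrow().lines.clone();
    lines
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

Note that `send` and `click` take `&self`, not `&mut self`, and still end up mutating the logger. That is the slightly weird part.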

4. I have some class which contains many sibling objects which need to know about each other.

This *might* be a data structure, if you're implementing a vector, or a doubly-linked list, or whatever for a standard library. Probably not, those are usually implemented with old-school unchecked pointers like in C, and you just make sure you do it right. But it would be *nice* if you could have efficiency *and* checking.

More commonly, there's something like "a parent class with several different subsystems which each need to call functions in some of the others". Or a computer game where the parent object is a map of tiles, where each tile contains a different thing ("wall" or "enemy" etc), and the types want to do different things depending on what's adjacent to them.

In this case, my philosophy has slowly become: as much as possible, have each class return...something, and have the parent class do any glue logic. That makes the coupling much less tight, ie. it's easier to change one type without it having baked-in knowledge of all parent types and sibling types. But it doesn't work if the connections are too complicated. And even if it works, it gives up some of the flexibility of having different functions for different types of child, because a lot of functionality has to be in the parent (where "do a different thing" may be "switch statement" not "child pointer derived from base/interface type and function dynamically dispatched"). Again, notice, this is functionally very very similar. The question is about what's easy to read and write without making mistakes.
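Here is a toy sketch of the "child returns something, parent does the glue" approach for the tile example. Everything in it (the `Tile`, `Request` and `Map` names, the one-dimensional row of tiles) is invented for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tile {
    Empty,
    Wall,
    Enemy,
    Player,
}

// What a tile hands back to the parent instead of acting on siblings itself.
#[derive(Debug, PartialEq)]
enum Request {
    Nothing,
    Attack,
}

impl Tile {
    // The tile only sees the neighbours it is given; it holds no pointer to the map.
    fn decide(self, neighbours: &[Tile]) -> Request {
        match self {
            Tile::Enemy if neighbours.contains(&Tile::Player) => Request::Attack,
            _ => Request::Nothing,
        }
    }
}

// The parent: a single row of tiles, to keep the example short.
struct Map {
    tiles: Vec<Tile>,
}

impl Map {
    fn neighbours(&self, i: usize) -> Vec<Tile> {
        let mut out = Vec::new();
        if i > 0 {
            out.push(self.tiles[i - 1]);
        }
        if i + 1 < self.tiles.len() {
            out.push(self.tiles[i + 1]);
        }
        out
    }

    // Glue logic lives here: gather neighbours, ask each tile, collect the answers.
    fn step(&self) -> Vec<Request> {
        (0..self.tiles.len())
            .map(|i| self.tiles[i].decide(&self.neighbours(i)))
            .collect()
    }
}

fn main() {
    let map = Map {
        tiles: vec![Tile::Wall, Tile::Enemy, Tile::Player, Tile::Empty],
    };
    println!("{:?}", map.step());
}
```

The `match` in `decide` is the "switch statement" flavour of dispatch mentioned above, rather than a trait object per tile type.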

Again, see weird suggestion below.

A weird suggestion for new syntax

This is the bit that popped into my head, I don't know if it makes sense.

We have a system for encapsulating children from parents. The child exposes an interface, and the parent uses the interface and doesn't worry about the implementation. But here we have children who *do* need to know about parents. One option is to throw away the encapsulation entirely, and put things in a global scope.

But how about something in between?

Say there is a special way of declaring type A to be a parent (or more likely, an interface/base type which exposes only the functions needed, and an actual class which derives from/implements that), and B1, B2, B3 etc to be child types, which can only be declared and instantiated from within A.

Suppose our interface, A, exposes a logging member function or class, and two members of types B1 and B2 (because those are expected to be needed by most of the children).

And then, you can only declare or instantiate those children B1, B2, B3 etc in A or a member function of A (that is, where there is a this or self value of a type compatible with A). And whenever you call a member function of one of those children, just as that child is passed along as a hidden function parameter named "this" or "self", there is a similar construct (syntax pending) to refer to members of A.

So, just as "b.foo(x,y)" is syntactic sugar for "foo(b,x,y)" where b becomes "this" or "self", make "a.b.foo(x,y)" syntactic sugar for "foo(a,b,x,y)" where b becomes "self" and a becomes "parent" or "a::" or whatever.
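To make the desugared form concrete, here is roughly what "foo(a,b,x,y)" looks like if you write it out by hand in today's Rust, with the parent passed as an ordinary explicit parameter. The `Parent` trait, `App`, `Child` and the field names are all hypothetical; the proposed syntax would hide the `parent` argument rather than remove it.

```rust
use std::cell::RefCell;

// The narrow interface the parent exposes to its children.
trait Parent {
    fn log(&self, msg: &str);
}

struct Child {
    name: String,
}

impl Child {
    // `self` is b, `parent` is a: this is foo(a, b, x, y) spelled out.
    fn foo(&self, parent: &dyn Parent, x: i32, y: i32) -> i32 {
        parent.log(&format!("{}: foo({}, {})", self.name, x, y));
        x + y
    }
}

struct App {
    child: Child,
    // Interior mutability, so logging works through a shared `&App`.
    lines: RefCell<Vec<String>>,
}

impl Parent for App {
    fn log(&self, msg: &str) {
        self.lines.borrow_mut().push(msg.to_string());
    }
}

impl App {
    fn run(&self) -> i32 {
        // With the imagined sugar this would just be `self.child.foo(2, 3)`.
        self.child.foo(self, 2, 3)
    }
}

fn main() {
    let app = App {
        child: Child { name: "b".to_string() },
        lines: RefCell::new(Vec::new()),
    };
    println!("{}", app.run());
    println!("{:?}", app.lines.borrow());
}
```

Because the child only depends on the `Parent` trait, a test could pass in a fake parent instead of a real `App`.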

Basically, ideally you'd ALWAYS have encapsulation. But sometimes, you actually do have a function you just want to be able to call from... most of your program. Without hassle. You know what you mean. But you can't easily specify it. So it sometimes ends up global. But it shouldn't be *completely* global. It should be accessible in any function called from a top-level "app" class or something, or any function of a member of that, or a member from that, if they opt in.

[1] Repetition

Everyone knows why repetition is bad, right? At best, you put the unavoidably-repeated bits in a clear format so you can see at a glance they're what you expect and have no hidden subtleties. But even beyond the usual arguments against it, and even if people are happy to copy-and-paste code, having to write out extra things in function signatures drives people to find any other solution, even crappy ones.