Further parent scope iteration
Feb. 15th, 2017 10:59 pm
Thanks to everyone who commented on the previous post, or posted earlier articles on a similar idea. I stole some of the terminology from one of Gerald-Duck's posts that Simon pointed me to, and have tried to iterate my ideas one step closer to something specific.
Further examples of use cases
There are several related cases here, many adapted from Simon's description of PuTTY code.
One is that several different parts of the program each have a class which "owns" an instance of a socket class. Many of the functions in the socket also need to refer to the owning class. There are two main ways to do that. One is for every call to a socket function to pass a pointer to the parent, but that clutters up the interface. The other is for the socket to store a pointer to the parent, initialised on construction, but there is no really appropriate smart pointer for that, because both classes have pointers to each other.
A socket must have an owner. And classes which deal with sockets will usually have exactly one that they own, but will also often have none (and later open one), or more than one.
You "know" the pointer to the parent will never be invalid as long as the socket is owned by the parent, because you never intend to pass the owning pointer out of that class. But there is no decent language construction for "this pointer is a member of this class, and I will never copy it, honest" which would then allow the child to have a provably-safe pointer to the parent. This is moot in C, where you don't have smart pointers anyway, but it would still be useful to identify the use case exactly so that a common code construction could be used and programmers could see the intended functionality at a glance. It would be useful to resolve in C++. And there are further problems in Rust, where using non-provably-safe pointers is deliberately discouraged, and there's a greater expectation that a class can be moved in memory (and so shouldn't contain pointers to parts of itself).
The same problem can be described in two different ways. One is "a common pattern is allocating an owned class as a member of another class, where the owned class has an equal or shorter lifetime than the owner, and holds a pointer back to the owner which is known to always be valid, with no pointer loops", or equivalently a special sort of two-way pointer, where one direction is an owning pointer and the other is a provably-valid non-owning pointer. The other is "classes often want to refer to the class that owns them, or the context they were called from, and there is no consistent/standard way of doing that."
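To make the two existing approaches concrete, here is a minimal sketch in today's C++. The names (`SocketPassed`, `SocketStored`, `Session`, `on_data`) are invented for illustration and are not PuTTY's real interface.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical owner interface, loosely following the Plug/Socket naming.
struct Plug {
    virtual ~Plug() = default;
    virtual void on_data(const std::string& data) = 0;
};

// Approach 1: every call passes the parent explicitly.
struct SocketPassed {
    void receive(Plug& parent, const std::string& data) { parent.on_data(data); }
};

// Approach 2: the socket stores a raw back-pointer set on construction.
class SocketStored {
public:
    explicit SocketStored(Plug* parent) : parent_(parent) { assert(parent_ != nullptr); }
    void receive(const std::string& data) { parent_->on_data(data); }
private:
    Plug* parent_;  // non-owning; only valid while the owner is alive and has not moved
};

// An owner that uses both styles side by side.
class Session : public Plug {
public:
    Session() : stored_(std::make_unique<SocketStored>(this)) {}
    // Copying or moving would leave stored_ pointing at the old object, so forbid both.
    Session(const Session&) = delete;
    Session& operator=(const Session&) = delete;

    void on_data(const std::string& data) override { log_ += data; }

    void poll() {
        passed_.receive(*this, "a");  // the parent has to be repeated at every call
        stored_->receive("b");        // tidier call, but relies on the raw back-pointer
    }
    const std::string& log() const { return log_; }

private:
    std::string log_;
    SocketPassed passed_;
    std::unique_ptr<SocketStored> stored_;
};
```

Neither version lets the compiler check the thing we actually "know": that the socket never outlives, or gets separated from, its owner.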
Proposal
Using C++ terminology, in addition to deriving from a class, a class can be declared "within" another class, often an abstract base class aka interface aka trait of the actual parent(s).
    class Plug
    {
    public:
        virtual void please_call_me_from_socket(int arg1) = 0;
    };

    // "within" is the proposed syntax, not valid C++ today
    class Socket : within Plug
    {
        // Please instantiate me only from classes inheriting from Plug
    public:
        void do_something();
    private:
        int foo;
    };
The non-static member functions of Socket, in addition to the hidden pointer parameter identifying the instance of Socket, which is accessed by "this->", have a second hidden parameter identifying the instance of Plug from which they were called, accessed by "parent->please_call_me_from_socket(foo)" (or parent<Plug>->please_call_me_from_socket(foo), or something similar, to disambiguate if there are multiple within declarations. Syntax pending).
Where does that pointer come from? If the function is called from a member function of a class which is itself within Plug, then it gets that class's parent value. That's not so useful for Plug, but is useful for classes which you want to be accessible almost everywhere in your program, such as a logging class.
In that case, you may want a different syntax, say a "within" block which says all classes in the block are within an "app" class, and then naturally all pass around a pointer to the top-level app class without any visual clutter. And it only matters when you want to use it, and when you admit the logger can't "just be global".
For Socket, we require that member functions of Socket are only called from member functions of Plug (which is what we expected in the first place, but hadn't previously had a way of guaranteeing). The "parent" pointer then comes from the "this" pointer of the calling function.
There is probably also some niche syntax for specifying the parent pointer explicitly, for when the calling code has a pointer to a class derived from Plug but isn't a member function, or wants to use a pointer other than "this". The section on pointers may cover this.
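The hidden parameter can be approximated by hand in current C++, which may help show what the compiler would be doing. This is only a sketch under my own assumptions: `ParentRef`, `as_parent()` and `Connection` are hypothetical names, and the "only callable from a Plug" rule is emulated with a token that only Plug-derived member functions can create.

```cpp
#include <cassert>

class Plug {
public:
    // Token standing in for the proposed hidden "parent" parameter.
    class ParentRef {
    public:
        Plug* operator->() const { return plug_; }
    private:
        friend class Plug;  // only Plug itself may construct a ParentRef
        explicit ParentRef(Plug* plug) : plug_(plug) { assert(plug_ != nullptr); }
        Plug* plug_;
    };

    virtual ~Plug() = default;
    virtual void please_call_me_from_socket(int arg1) = 0;

protected:
    // Protected, so only member functions of Plug-derived classes can mint the token.
    ParentRef as_parent() { return ParentRef(this); }
};

class Socket {
public:
    // "parent" is spelled out here; under the proposal it would be implicit.
    void do_something(Plug::ParentRef parent) {
        parent->please_call_me_from_socket(foo);
    }
private:
    int foo = 42;
};

class Connection : public Plug {
public:
    void run() { socket_.do_something(as_parent()); }
    void please_call_me_from_socket(int arg1) override { last_arg_ = arg1; }
    int last_arg() const { return last_arg_; }
private:
    Socket socket_;
    int last_arg_ = 0;
};
```

The emulation is weaker than the proposal: a `ParentRef` can be copied and handed on once it has been created, so the guarantee is by convention rather than proven, and every call site still has to write `as_parent()`.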
Pointers and Callbacks
If you do want to pass a pointer to the socket to another part of the program, but maintain its connection to the Plug-derived owning class, you need a pointer-equivalent that contains a pointer to both. This comes up especially in the common case where you want an OS callback into your socket class.
This is not a new requirement: even without this syntax you would need to pass around a pair of pointers or, more likely, store a pointer to the parent within the socket. This syntax would basically be codifying one or the other implementation.
That can be done, but it does add a cost if other code was written assuming a single pointer-sized identifier for the callback. Also see the "alternatives" sections.
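A rough sketch of such a pointer pair, and of squeezing it through a C-style callback that only carries one pointer-sized context value, might look like this. `BoundSocket`, `CountingPlug`, `Callback` and `on_readable` are hypothetical names, not a real OS or PuTTY API.

```cpp
#include <cassert>

struct Plug {
    virtual ~Plug() = default;
    virtual void received(int nbytes) = 0;
};

struct Socket {
    // The parent is passed explicitly, standing in for the hidden parameter.
    void got_incoming_data(Plug& parent, int nbytes) { parent.received(nbytes); }
};

// The "pointer-equivalent": identifies the socket and its owner together.
struct BoundSocket {
    Socket* socket;
    Plug* parent;
    void got_incoming_data(int nbytes) const {
        assert(socket != nullptr && parent != nullptr);
        socket->got_incoming_data(*parent, nbytes);
    }
};

// A typical callback shape with a single pointer-sized context.
using Callback = void (*)(void* context, int nbytes);

void on_readable(void* context, int nbytes) {
    static_cast<BoundSocket*>(context)->got_incoming_data(nbytes);
}

class CountingPlug : public Plug {
public:
    CountingPlug() : bound_{&socket_, this} {}
    // The pair stores addresses inside *this, so the owner must not be copied or moved.
    CountingPlug(const CountingPlug&) = delete;
    CountingPlug& operator=(const CountingPlug&) = delete;

    void received(int nbytes) override { total_ += nbytes; }
    void* callback_context() { return &bound_; }
    int total() const { return total_; }

private:
    Socket socket_;
    BoundSocket bound_;  // must live somewhere stable while the callback is registered
    int total_ = 0;
};
```

The cost shows up as the extra `bound_` object: because the callback can only carry one pointer, the pair has to be stored somewhere with a stable address, which brings back a saved parent pointer for as long as the callback is registered.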
Alternatives: Scope or Class
I based this partially off a similar idea that covered a lot of the same ground, but which required a variable with a specific name in the calling scope, instead of requiring the caller to be in a class with a specific interface. I have definitely written code which functioned like that: several nested functions, all of which pass the same parameter to each other, where just being able to say "you know which one I mean" would have been very useful. But I've also often then made a mistake where there was a before and an after version of the variable and I passed the wrong one.
I was worried by the way it seemed to have functionality which wasn't clearly visible (aka "magic" :)). I feel a lot safer with a specific, declared interface, where it's hard to accidentally give a different variable the name that gets shared, in place of the one that should be there.
Alternatives: Private member pointer only or any variable?
The main use case envisages the owning class having a private member pointer to a dynamically allocated instance of the owned class. I don't know if that should be a requirement or not, i.e. is there any benefit to preventing you passing the pointer out of the class?
I would argue no: there are many interesting use cases for passing a pointer to the owned class around, e.g. an element of a collection which is added to a different collection instead (or as well!), as long as it has *some* owner of this type.
However, that case would require the expanded pointer syntax in order to write natural constructions like:
    collection[0].remove_from_collection();
Because here we DO want to call functions of the element from another class, not a member function of the collection. The "get an element" operator would need to return a magic pair-pointer containing a pointer to the collection as well as the element.
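As a sketch of what that pair-pointer could look like if written by hand today (assuming elements are heap-allocated so their addresses stay stable; `ElementRef`, `Element` and `Collection` are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class Collection;

struct Element {
    int value;
    // The owning collection is passed in, standing in for the hidden parent parameter.
    void remove_from_collection(Collection& parent);
};

// The "magic pair-pointer": remembers which collection an element was fetched from.
class ElementRef {
public:
    ElementRef(Collection* parent, Element* element)
        : parent_(parent), element_(element) {}
    int value() const { return element_->value; }
    void remove_from_collection();
private:
    Collection* parent_;
    Element* element_;
};

class Collection {
public:
    void add(int value) { items_.push_back(std::make_unique<Element>(Element{value})); }

    // "Get an element" returns the pair, not a bare Element*.
    ElementRef operator[](std::size_t i) {
        assert(i < items_.size());
        return ElementRef(this, items_[i].get());
    }

    void remove(const Element* element) {
        items_.erase(std::remove_if(items_.begin(), items_.end(),
                                    [element](const std::unique_ptr<Element>& p) {
                                        return p.get() == element;
                                    }),
                     items_.end());
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<std::unique_ptr<Element>> items_;
};

inline void Element::remove_from_collection(Collection& parent) {
    parent.remove(this);  // destroys *this; the element must not be touched afterwards
}

inline void ElementRef::remove_from_collection() {
    element_->remove_from_collection(*parent_);
}
```

With this in place `collection[0].remove_from_collection();` compiles and does the natural thing, at the price of every element handle being two pointers wide.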
Alternatives: pass-parent-pointer-as-secret-parameter or save-parent-pointer-on-construction
I am not sure which is best, but I am leaning towards the secret parameter, which is what I used in the description above, for several reasons. I think it's more conceptually correct: saving a pointer in the owned class isn't that useful mechanically, it just makes the code a lot more readable in the absence of this syntax. It allows the element-of-collection use case. And it means the parent isn't required to be heap-allocated and never move, which would be hard to enforce; with no saved pointer to the parent, the parent can safely be stack-allocated and undergo move semantics in C++ or Rust (i.e. be memmove'd with all the internal state staying valid, which a pointer-to-parent or pointer-to-auncle/pointer-to-nibling prevents).
The one notable downside is needing to pass around double pointers to callbacks or to collection elements.
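The move-safety point can be shown with a small sketch. The types here (`StoredOwner`, `PassedOwner` and their children) are hypothetical, and the implicitly generated move constructor is used to stand in for a memmove of the owner.

```cpp
#include <cassert>
#include <utility>

// Variant 1: the child saves a pointer to its parent on construction.
struct StoredOwner;
struct StoredChild {
    StoredOwner* parent;
    int parent_id() const;
};
struct StoredOwner {
    explicit StoredOwner(int id) : id(id), child{this} {}
    int id;
    StoredChild child;  // the implicit copy/move copies child.parent verbatim
};
inline int StoredChild::parent_id() const {
    assert(parent != nullptr);
    return parent->id;
}

// Variant 2: the parent is supplied at each call, so nothing is saved.
struct PassedOwner;
struct PassedChild {
    int parent_id(const PassedOwner& parent) const;
};
struct PassedOwner {
    explicit PassedOwner(int id) : id(id) {}
    int id;
    PassedChild child;
    int child_sees() const { return child.parent_id(*this); }
};
inline int PassedChild::parent_id(const PassedOwner& parent) const { return parent.id; }
```

Usage, showing the stale pointer after a move:

```cpp
StoredOwner a(1);
StoredOwner b = std::move(a);
b.id = 2;
// b.child.parent still points at a, so b.child.parent_id() reports a's id, not b's.

PassedOwner c(1);
PassedOwner d = std::move(c);
d.id = 2;
// d.child_sees() uses d itself, so it reports d's id.
```

Here `a` is still alive, so the stale pointer is merely wrong; if `a` had gone out of scope it would be dangling.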
Next steps
Do people understand that even if they don't agree, or is it still too much in my head?
Does the use case seem valid, or is it not something people would actually benefit from?
Is there any way of hacking up an example with macros in C/C++ or Rust, to see it in action without adding syntax extensions to the compiler yet?
no subject
Date: 2017-02-16 11:27 am (UTC)
Hmmm. But precisely the reason why I needed back-and-forth pointers between Socket and Plug in the first place is that sometimes you do get into this system by calling a member function on the Socket.
In PuTTY, one of the side effects of creating a Socket is that it inserts itself into whatever machinery in the main event loop causes the socket to be monitored for incoming data. So the event loop (in whatever form it takes for a given app and platform, which comes to more forms than you might expect!) wants to loop over an undifferentiated list of Sockets doing things like inserting their fds into Unix select(2) calls, and when data comes in, it will call socket->got_incoming_data(some_buffer) (spelling/syntax adjusted for expository purposes) for the socket in question, and that in turn must be able to find the socket's owning Plug without having been invoked from the Plug.
So that's why, in my situation, the Socket and the Plug each has to hold a pointer to the other, with the Plug's pointer allowed to be NULL in cases where no socket is currently active, but with the invariants that if a Plug does have a non-null Socket pointer then that Socket must contain a reciprocal link back to the same Plug, and conversely, any Socket must always contain a valid pointer to a Plug which owns that same Socket.
no subject
Date: 2017-02-16 09:57 pm (UTC)
On the other hand, if that's a common use case, then maybe this sort of construction doesn't provide much benefit and we should just stick to storing pointers to the parent which are set on construction, and that are never free'd.
That's problematic for Rust because of the ownership issues, but not for C. In fact, does *any* of this matter for code written in C?