jack

Again, I appeal to my active audience to correct me :)

Last time we had some code which, given an array of square sizes, calculated the corresponding areas. It was preceded by some code to set up the sizes, and followed by some code to print out the result, but it looked something like:

for (int i=0;i<4;++i) {
    squares_area[i] = squares_size[i] * squares_size[i];
}

However, what if we had rectangles instead of squares? Would we have two separate arrays, one for widths and one for heights? We clearly need to store that much data. However, storing them in two separate arrays runs the risk of them getting out of sync (say, switching the order of two elements in one array, but forgetting to do it in the other). What we can do instead is invent a new data type, rect[1], which contains two numbers. It would look something like this:

struct rect {
  int width;
  int height;
};

int main() {
   rect rect_array[4] = { {4,5}, {6,2}, {3,9}, {5,5} };
   int cumulative_total_area = 0;
   for (int i=0;i<4;++i) {
       cumulative_total_area += rect_array[i].width * rect_array[i].height;
   }
   // Pretend there's some code here to print the result
}

I decided to calculate the total area this time, because it was slightly more illustrative. What we have is a rect data type which specifies how a width and height should be stored in memory (basically "one after the other"; it's not a very complicated one :)). And then we have four of them in memory, and we use "rect_array[2]" to mean "the third one" (the numbering starts at zero, so "rect_array[3]" is the fourth and last) and ".width" to mean "the width number of that one".

Notice that in C++, that's all there is in memory, just a list of widths and heights, probably in adjacent memory locations. In some other languages, the eventual program would store some extra data like the name "rect" and so on, which can be very useful, but potentially has a small performance overhead.
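To make "that's all there is in memory" a bit more concrete, here is a small sketch that asks the compiler to check the layout. The first and last checks follow from the language rules as I understand them; the middle two assume the compiler adds no padding between the two ints, which is true on the mainstream compilers I know of but is an assumption rather than a guarantee.

```cpp
#include <cstddef>  // offsetof

struct rect {
  int width;
  int height;
};

// width sits at the very start of a rect
static_assert(offsetof(rect, width) == 0,
              "width sits at the start of a rect");

// Assumption: height follows width directly, with no padding in between
static_assert(offsetof(rect, height) == sizeof(int),
              "assumed: height directly follows width");

// Assumption: a rect is exactly two ints, with no hidden name or type tag
static_assert(sizeof(rect) == 2 * sizeof(int),
              "assumed: no padding and no stored metadata");

// An array of four rects is just four rects back to back
static_assert(sizeof(rect[4]) == 4 * sizeof(rect),
              "an array is just its elements, one after the other");
```

If one of the assumed checks failed on some unusual platform, the program would simply refuse to compile, which is about the friendliest way to find out.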

(Even in a language like C++, if you want to debug, you can ALSO store that information, but that's only available to you snooping in to see if everything's working correctly; the language has no way to express it[1].)

In fact, at this point, it becomes sensible to recognise that "multiply width by height" is something that we may _often_ do to a rectangle, and it makes sense to make a function which does that. We could make a function which takes a single rect as an argument and returns the area, like:

int area(rect r) {
   return r.width * r.height;
}

But it turns out that there's a special syntax which represents this more compactly:

struct rect {
  int width;
  int height;
  int area() {
     return width * height;
  }
};

That is, area is a function which always applies to a specific rect (it could take other arguments as well) and returns an integer. Because it's declared _within_ rect, "width" and "height" are understood to refer to the specific rect it's talking about. It's called with a syntax like:

for (int i=0;i<4;++i) {
    cumulative_total_area += rect_array[i].area();
}
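Putting the pieces together, a minimal sketch of the whole thing might look like the following. The helper name total_area is my own invention for illustration, and I've added "const" (area doesn't change the rect) and "constexpr" (so the result can be checked at compile time); neither is needed for the idea itself.

```cpp
struct rect {
  int width;
  int height;
  // Same member function as above; const and constexpr are additions
  constexpr int area() const {
    return width * height;
  }
};

// Hypothetical helper: sums the areas of `count` rects
constexpr int total_area(const rect* rects, int count) {
  int total = 0;
  for (int i = 0; i < count; ++i) {
    total += rects[i].area();
  }
  return total;
}

// The same four rectangles as in the earlier example
constexpr rect rect_array[4] = { {4,5}, {6,2}, {3,9}, {5,5} };

// 20 + 12 + 27 + 25 = 84
constexpr int cumulative_total_area = total_area(rect_array, 4);
```

In a real program you would call total_area(rect_array, 4) from main and print the result, as in the earlier example.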

Now, this is convenient in several ways. For instance, if we have lots of functions that calculate an area, we don't have to carefully give them different names: area is the only one within rect, and that's enough to specify it, even if some OTHER struct has a function with the same name. The same applies to width and height.
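As a sketch of the naming point: the second struct below, right_triangle, is hypothetical (it isn't part of the running example), but it shows two structs each having their own "area" and their own "height" without any clash. The "const" and "constexpr" are again my additions.

```cpp
struct rect {
  int width;
  int height;
  constexpr int area() const { return width * height; }
};

// Hypothetical second shape, only here to show the names don't collide
struct right_triangle {
  int base;
  int height;  // a different "height" from rect's; no conflict
  constexpr int area() const { return base * height / 2; }
};

constexpr rect r = {4, 5};
constexpr right_triangle t = {4, 5};

// The type of the thing before the dot decides which area() is meant
constexpr int rect_area = r.area();      // 4 * 5
constexpr int triangle_area = t.area();  // 4 * 5 / 2
```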

It's also important that, so far, all of the member variables of the struct can be read and written from outside it. Structs that just hold data like this are available in C as well; member functions, as far as I know, are a C++ addition. With C++ and similar languages also came the idea that some variables would only be visible from functions inside the struct[2]. The idea is that the struct might represent a complicated calculation, but the other parts of the program only need to see the inputs and outputs.
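Here is a rough sketch of what "only visible from inside" can look like in C++, using the "private" keyword. The constructor is my addition: once width and height are hidden, outside code needs some other way to set them. Treat this as one possible arrangement, not the only one.

```cpp
struct rect {
  // Outside code supplies the inputs once, here
  constexpr rect(int w, int h) : width(w), height(h) {}

  // ...and sees only the output
  constexpr int area() const { return width * height; }

private:
  // Only functions declared inside rect can touch these
  int width;
  int height;
};

constexpr rect r(4, 5);
constexpr int r_area = r.area();  // fine: area() is public

// The following line would be rejected by the compiler if uncommented,
// because width is private:
// int w = r.width;
```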

This sounds obvious, but it is really important: the better your chance of understanding one part of the program without finding that a crucial bit of it is changed, when you're not expecting it, by a completely different part of the program, the more feasible it becomes to understand LARGE programs. Obviously, good programmers had been thinking like this for ages, and continue to do so, but the reason this syntax exists is to make it simple and clear to express the idea that "this bit is all internals; you only need to worry about it if it's broken. You just need to understand that when you say 'foo', I do 'blah'" :)

Also notice that, so far, this is purely syntax. If we compiled this program and ran it, it would be represented in memory in much the same way as if we hadn't used any structs at all. It's still incredibly valuable, because when you're writing the program, it lets you encapsulate different bits in different structs.
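One way to poke at the "purely syntax" claim is to compare a rect with a member function against one without. In the sketch below, plain_rect and area_of are names I made up for the comparison. I'd expect the size check to hold on any ordinary compiler, since a member function like this lives with the code rather than inside each object, but I'm stating that as an expectation and not quoting the standard.

```cpp
// Data only, with a separate free function
struct plain_rect {
  int width;
  int height;
};

constexpr int area_of(const plain_rect& r) {
  return r.width * r.height;
}

// The same data, with the function written inside the struct
struct rect {
  int width;
  int height;
  constexpr int area() const { return width * height; }
};

// Expectation: adding area() does not make each rect any bigger
static_assert(sizeof(rect) == sizeof(plain_rect),
              "a member function takes no space inside each object");

// Both spellings compute the same thing
static_assert(rect{3, 9}.area() == area_of(plain_rect{3, 9}),
              "member function and free function agree");
```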

Next time, we'll learn about when objects actually matter as more than vague bags for doing things, although what we've done so far is incredibly valuable itself.

[1] In fact, it might be nice to have a language where you can specify to store the metadata or not, like how C# lets you declare your integers checked, or not checked, for overflow (?) You can normally only make that meta-data, or avoid it, by jumping through a lot of hoops the language wasn't really meant for (?)

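[2] C++ also has classes as well as structs, and they do the same thing; the only difference is that the members of a class are private unless you say otherwise, while the members of a struct are public unless you say otherwise. It's normal -- but not necessary -- to use a class for something complicated, and a struct for just storing numbers and things but not _doing_ anything. The name for the concept of "things like that" is "object".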

A bluffer's guide to basic data structures: "structs"
