jack: (Default)
The idea in my last post got a bit buried, so I thought it through some more. I'm still mulling it over, and I'm not sure if it makes sense.

Imagine you have a computer game. What are your classes like? Often you have a top-level "program" class. And maybe a "current game" class and "current level" class. And a whole bunch of stuff for each object in the level (whether those are separate class types, or just structs with an enum for type, or whatever).

You often have some data or functionality which is specific to your program, but should be accessible in many parts of the program. Or specific to the current game, or current level. Eg. many different events may add a score. Everything might write to a log file. Commonly functions on objects want to look at other "nearby" objects.

Currently, you basically have a choice of two scopes of values which are available to a function. Global variables, and variables in the current class. What I'm suggesting is that two is the wrong number of scopes to have. If you have two, one or two more is likely to be useful.

Some data naturally lives at a level that is neither global nor in the current class, but in the "program" class. And it is visible to all classes "in" the program class.

What does "in" mean? Possibly "declared with a special syntax referencing the program class". But it might be better to treat it like a namespace or module, that says "everything in here can only be used by this class (or compatible classes like mocks)", and all member functions get *two* hidden parameters, one for the program, and one for the "this" pointer. Or four, if you have "program", "current game", and "level".
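Written out by hand, the "extra hidden parameters" idea might look roughly like this in rust. This is only a sketch under assumptions: `Program`, `Level`, `Enemy` and `on_killed` are made-up names, and the suggestion above is that the language would supply `program` and `level` implicitly rather than making you type them.

```rust
// Hypothetical types for illustration; none of these names come from a real library.
struct Program {
    log: Vec<String>, // stands in for a log file
}

struct Level {
    score: u32,
}

struct Enemy {
    points: u32,
}

impl Enemy {
    // `self` is the usual hidden parameter. `program` and `level` are the
    // extra scopes that the suggested syntax would pass along implicitly.
    fn on_killed(&self, program: &mut Program, level: &mut Level) {
        level.score += self.points;
        program.log.push(format!("enemy killed, score now {}", level.score));
    }
}

fn main() {
    let mut program = Program { log: Vec::new() };
    let mut level = Level { score: 0 };
    let enemy = Enemy { points: 10 };

    enemy.on_killed(&mut program, &mut level);

    println!("{}", program.log[0]);
}
```

Nothing here is global: a second `Program` or `Level` could exist at the same time (a mock in a test, say) and the same function would work against it.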

It's easy to imagine your program and say, "but there'll only be one current game at once". But once you SAY that, you can imagine cases where there would be more than one. And then any values associated with that need to not be just shoved in the program class or in global scope, but managed properly.

And you CAN provide them to the "children" classes by giving the child class a pointer to the correct parent. You definitely should decide what should be visible everywhere and what only needs to be visible in some places. But I'm suggesting it would be *clearer* not to hand out parent pointers, and instead to explicitly choose what should be shared.

ETA: Simon points me to a post of gerald-duck's I read ages ago but seem to have partially re-invented: http://gerald-duck.livejournal.com/710339.html
jack: (Default)
This isn't solely about rust, but it made me think about something I wasn't really aware of. There's several common uses for pointers. The four uses themselves are nothing particular, but I'm interested in thoughts about the speculation about #3 and #4.

1. Heap allocation.

If you allocate a value on the heap, not the stack, you need to refer to it by a pointer. And if you're using a language other than C, it's automatically de-allocated after the stack-allocated pointer goes out of scope (either immediately, using a smart pointer in C++, or eventually, in a garbage collected language).

If that's *all* you want to do, you can hide the issue from the programmer completely if you want to, as with languages that expect heap-allocation by default, and you're just supposed to know which copies produce independent duplicates of the same data and which copies refer to the same common data.

In rust, this is a Box.
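A minimal sketch of heap allocation with `Box` from the standard library. The function name `make_boxed` is made up for the example.

```rust
// Returning a Box moves only the pointer; the heap value stays where it is.
fn make_boxed(value: i32) -> Box<i32> {
    Box::new(value)
}

fn main() {
    // `boxed` itself lives on the stack, the 5 it points to lives on the heap.
    let boxed = make_boxed(5);

    // Dereference with `*` to read the heap value.
    let doubled = *boxed * 2;

    // There is no implicit copy of a Box: clone() makes an independent heap value.
    let mut copy = boxed.clone();
    *copy += 1;

    println!("{} {} {}", boxed, doubled, copy);
} // both boxes go out of scope here and their heap memory is freed
```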

2. Pass-by-reference.

If you pass a value to a function and either (a) want the function to edit that value or (b) the value is too large to copy the whole thing efficiently, you want to pass the value by reference. That could be done either with a keyword which specifies that's what happens under the hood, or explicitly by taking a pointer and passing that pointer.

In rust, you pass a ref (equivalent to a C++ reference or pointer), but there are various compile time checks to make sure the memory accesses are safe.
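The two cases, (a) and (b), might look like this. The function names are invented for the sketch.

```rust
// (a) A mutable reference lets the function edit the caller's value.
fn add_bonus(score: &mut u32, bonus: u32) {
    *score += bonus;
}

// (b) A shared reference avoids copying a large value just to read it.
fn total(values: &[u64]) -> u64 {
    values.iter().sum()
}

fn main() {
    let mut score = 10;
    add_bonus(&mut score, 5);
    println!("{}", score);

    let big = vec![1u64; 1_000_000];
    println!("{}", total(&big)); // only a pointer and a length are passed
}
```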

3. I need access to this struct from various different parts of my program.

Eg. a logging class. Eg. a network interface class. Each class which may need to access functionality of those classes needs a way to refer to them. There's a few ways of doing this, which are good enough, although not completely satisfactory.

You can make all those be static. But then there's no easy way to replace them in testing, and there's problems with lifetimes around the beginning and end of your program. You have to be careful to initialise them in the right order, or just assume you don't use them around the time they may be invalid (but that may throw up lots of errors from lint or rust compiler).

You can pass them in as arguments to every function. But that's clunky, and involves a lot of repetition[1]. However, see the weird suggestion at the end.

Or you can just make sure each class has a pointer to the necessary classes (or maybe to a top-level class which itself has pointers or members with the relevant classes), initialise it at class construction. However, this has *some* of the problems of the above two possibilities, it's less easy to replace the functions for testing, and it's somewhat redundant. This one is what's a little weird in rust, I think you have to use objects which are declared const, but actually have run-time-checked non-const objects inside ("interior mutability"). Again, see the weird suggestion at the end.
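A sketch of the "each class holds a pointer to the shared class" option in rust, using `Rc<RefCell<...>>` from the standard library as one common way of doing it (not the only one). `Logger` and `Network` are hypothetical types made up for the example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical logger; a real one would write to a file.
struct Logger {
    lines: Vec<String>,
}

impl Logger {
    fn log(&mut self, msg: &str) {
        self.lines.push(msg.to_string());
    }
}

// Each subsystem keeps its own handle to the one shared logger.
struct Network {
    logger: Rc<RefCell<Logger>>,
}

impl Network {
    // Takes `&self`, not `&mut self`, yet can still mutate the logger:
    // the RefCell moves the "only one mutable borrow" check to run time.
    fn send(&self, data: &str) {
        self.logger.borrow_mut().log(&format!("sent {}", data));
    }
}

fn main() {
    let logger = Rc::new(RefCell::new(Logger { lines: Vec::new() }));
    let net = Network { logger: Rc::clone(&logger) };

    net.send("hello");
    logger.borrow_mut().log("done");

    println!("{:?}", logger.borrow().lines);
}
```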

4. I have some class which contains many sibling objects which need to know about each other.

This *might* be a data structure, if you're implementing a vector, or a doubly-linked list, or whatever for a standard library. Probably not, those are usually implemented with old-school unchecked pointers like in C, and you just make sure you do it right. But it would be *nice* if you could have efficiency *and* checking.

More commonly, there's something like "a parent class with several different subsystems which each need to call functions in different ones of them". Or a computer game where the parent object is a map of tiles, where each tile contains a different thing ("wall" or "enemy" etc), and the types want to do different things depending what's adjacent to them.

In this case, my philosophy has slowly become, as much as possible, have each class return...something, and have the parent class do any glue logic. Which makes the coupling much less tight, ie. it's easier to change one type without it having baked in knowledge of all parent types and sibling types. But doesn't work if the connections are too complicated. And even if it works, gives up some of the flexibility of having different functions for different types of child type, because a lot of functionality has to be in the parent (where "do a different thing" may be "switch statement" not "child pointer derived from base/interface type and function dynamically dispatched"). Again, notice, this is functionally very very similar, the question is about what's easy to read and write without making mistakes.
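A small sketch of "the child returns something and the parent does the glue", assuming a one-dimensional row of tiles to keep it short. `Tile`, `Action` and `step` are names invented for the example.

```rust
// What a tile asks for; the tile never touches its neighbours itself.
#[derive(Debug, PartialEq)]
enum Action {
    Nothing,
    Attack { damage: u32 },
}

#[derive(Debug, PartialEq)]
enum Tile {
    Wall,
    Enemy { strength: u32 },
    Player { health: u32 },
}

impl Tile {
    // The child only reports what it wants to happen.
    fn act(&self) -> Action {
        match self {
            Tile::Enemy { strength } => Action::Attack { damage: *strength },
            _ => Action::Nothing,
        }
    }
}

// The parent (the row of tiles) does the glue logic.
fn step(row: &mut [Tile]) {
    // Collect every request first, so no tile is borrowed while we mutate.
    let actions: Vec<Action> = row.iter().map(Tile::act).collect();
    for (i, action) in actions.into_iter().enumerate() {
        if let Action::Attack { damage } = action {
            // Apply the attack to the tile on the right, if it is a player.
            if let Some(Tile::Player { health }) = row.get_mut(i + 1) {
                *health = health.saturating_sub(damage);
            }
        }
    }
}

fn main() {
    let mut row = vec![
        Tile::Wall,
        Tile::Enemy { strength: 3 },
        Tile::Player { health: 10 },
    ];
    step(&mut row);
    println!("{:?}", row);
}
```

The cost described above is visible here: the knowledge of what an attack does to a neighbour lives in `step`, in the parent, not in the tile types.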

Again, see weird suggestion below.

A weird suggestion for new syntax

This is the bit that popped into my head, I don't know if it makes sense.

We have a system for encapsulating children from parents. The child exposes an interface, and the parent uses the interface and doesn't worry about the implementation. But here we have children who *do* need to know about parents. One option is to throw away the encapsulation entirely, and put things in a global scope.

But how about something in between?

Say there is a special way of declaring type A to be a parent (or more likely, an interface/base type which exposes only the functions needed, and an actual class which derives from/implements that), and B1, B2, B3 etc to be children types, types which are declared and instantiated from A.

Suppose our interface, A, exposes a logging member function or class, and two members of types B1 and B2 (because those are expected to be needed by most of the children).

And then, you can only declare or instantiate those children B1, B2, B3 etc in A or a member function of A (that is, where there is a this or self value of type compatible with A). And whenever you call a member function of one of those children, just like that child is passed along in a secret function parameter specified with a "this" or "self" value, there is a similar construct (syntax pending) to refer to members of A.

So, just as "b.foo(x,y)" is syntactic sugar for "foo(b,x,y)" where b becomes "this" or "self", make "a.b.foo(x,y)" syntactic sugar for "foo(a,b,x,y)" where b becomes "self" and a becomes "parent" or "a::" or whatever.
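The first half of that already holds in rust: a method call is sugar for a function call with the receiver as the first argument. The second half is the hypothetical part, so the sketch below writes the "parent" parameter out by hand. `Parent`, `Child` and `foo_in_parent` are made-up names.

```rust
struct Child {
    offset: i32,
}

struct Parent {
    name: String,
    b: Child,
}

impl Child {
    fn foo(&self, x: i32, y: i32) -> i32 {
        self.offset + x + y
    }
}

// Hand-written stand-in for the suggested sugar: "a.b.foo(x, y)" would
// become a call like this, with `parent` supplied implicitly.
fn foo_in_parent(parent: &Parent, this: &Child, x: i32, y: i32) -> String {
    format!("{}: {}", parent.name, this.foo(x, y))
}

fn main() {
    let a = Parent {
        name: String::from("app"),
        b: Child { offset: 100 },
    };

    // Existing sugar: these two calls are the same thing.
    assert_eq!(a.b.foo(1, 2), Child::foo(&a.b, 1, 2));

    // Suggested sugar, spelled out explicitly.
    println!("{}", foo_in_parent(&a, &a.b, 1, 2));
}
```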

Basically, ideally you'd ALWAYS have encapsulation. But sometimes, you actually do have a function you just want to be able to call from... most of your program. Without hassle. You know what you mean. But you can't easily specify it. So it sometimes ends up global. But it shouldn't be *completely* global. It should be accessible in any function called from a top-level "app" class or something, or any function of a member of that, or a member from that, if they opt in.

[1] Repetition

Everyone knows why repetition is bad, right? At best, you put the unavoidably-repeated bits in a clear format so you can see at a glance they're what you expect and have no hidden subtleties. But even above the arguments against, even if people are happy to copy-and-paste code, writing out extra things in function signatures drives people to find any other solution, even crappy ones.
jack: (Default)
Const values

Last time I talked about lifetimes. Now let me talk about the other part of references, ownership and borrow checking.

If you're dealing with const values, this is similar to other languages. By default, one place "owns" a value. Either declared on the stack, or on the heap (in a Box). Other places can be passed a const reference to that value. As described with lifetimes, rust checks at compile time that all of those references are finished with before the original goes out of scope. When the original goes out of scope, it's deallocated (from stack or heap).

Alternatively, it can be reference counted. In rust, you can use Rc<Type> instead of Box<Type> and it's similar, but instead of having a native reference to the value, you take a copy of the Rc, and the value is only freed from the heap when the last Rc pointing to it disappears.
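A minimal sketch of that behaviour, using `Rc::strong_count` from the standard library to watch the count change:

```rust
use std::rc::Rc;

fn main() {
    let first = Rc::new(String::from("shared"));

    // Cloning the Rc copies the pointer and bumps the count;
    // the String itself is not copied.
    let second = Rc::clone(&first);
    println!("{}", Rc::strong_count(&first));

    drop(first);

    // The String is still alive because `second` still points to it.
    println!("{} {}", second, Rc::strong_count(&second));
} // the last Rc disappears here, and only now is the String freed
```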

One reason this is important is thread-safety. Rc isn't thread safe, and rust checks you don't transfer it to another thread for that reason. Arc changes reference count atomically so *is* thread safe, and can be sent to another thread. (It's a copy of the Arc that's sent, but one that refers to the same data.)
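A sketch of sending a copy of an `Arc` to another thread. The function name `sum_in_thread` is invented for the example; swapping `Arc` for `Rc` here should be rejected at compile time, because `Rc` cannot be sent between threads.

```rust
use std::sync::Arc;
use std::thread;

fn sum_in_thread(data: Arc<Vec<u32>>) -> u32 {
    // `move` transfers this copy of the Arc into the new thread.
    let handle = thread::spawn(move || data.iter().sum::<u32>());
    handle.join().unwrap()
}

fn main() {
    let data = Arc::new(vec![1, 2, 3]);
    let total = sum_in_thread(Arc::clone(&data));

    // The original Arc is still usable here and refers to the same Vec.
    println!("{} {}", total, data.len());
}
```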

Const references can't usually be sent between threads unless the original had a lifetime of the whole program (static), because there's no universal way to be sure the thread is done with it, so it's always illegal for the original owner to go out of scope (?) But threads with finite lifetimes are hopefully coming in future (?)
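Threads with finite lifetimes did later arrive in the standard library as `std::thread::scope` (stabilised in rust 1.63, as far as I know). A minimal sketch, with `sum_halves` as a made-up function name:

```rust
use std::thread;

fn sum_halves(values: &[u32]) -> u32 {
    let (left, right) = values.split_at(values.len() / 2);

    // thread::scope joins every spawned thread before it returns, so the
    // threads may borrow `values` even though it is not 'static.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u32>());
        let b = s.spawn(|| right.iter().sum::<u32>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let values = vec![1, 2, 3, 4];
    println!("{}", sum_halves(&values));
}
```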

Non-const values

A big deal in rust is making const (immutable) the default, and declaring non-const things (mut). I think that's a good way of thinking. But here it may get confusing.

You can have multiple references to an immutable value. But in order to be thread safe, you can only have one *mutable* reference. Including the original -- it's an error to access the original during the scope of a mutable reference. That's why it's called a "borrow" -- if you make a mutable reference to a value, you can only access the original again once the reference goes out of scope.
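A small sketch of what "borrow" means in practice. The commented-out line is the one the compiler would reject; `borrow_demo` is a made-up name.

```rust
fn borrow_demo() -> i32 {
    let mut score = 10;

    {
        let r = &mut score; // mutable borrow starts
        *r += 5;
        // Uncommenting the next line is a compile error: `score` is
        // mutably borrowed, and `r` is used again afterwards.
        // println!("{}", score);
        *r += 1;
    } // the borrow ends with the block

    // The original is accessible again.
    score
}

fn main() {
    println!("{}", borrow_demo());
}
```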

But a point that's less well agreed is how useful this is when you don't pass anything between threads.

One argument is that you might be able to have a pointer *to* a value that you then mutate, but if it's something like a vector, you can't have a pointer/reference to a value in it because that might have been invalidated. And even if you have an iterator which could in theory be safe (eg. the iterator contains an index, not just a pointer), you still need to check for the iterator being invalid when it's used, which reduces various optimisations.

Another argument, that I found more interesting, is that even if the value isn't invalidated in a memory-safety sense, if you change the value in two disparate parts of code (say, you loop through all X that are Y calling function Z, and function Z in turn calls function W which does something to some X, including the ones you're iterating through), it's easy for the logic you write to be incorrect, if you can't tell at a glance which values might be changed half way through your logic and which won't be.

I found that persuasive as a general principle. Though I'm not sure how practical it is to work with those constraints in practice, if they're generally helpful once you know how to work with them, or if they're an unnecessary impediment. Either way, I feel better for having thought about those issues.

Workaround, interior mutability

"Interior mutability" is a feature of rust types (Cell and RefCell), which is a bit like the "mutable" keyword in C++: it allows you to have a class instance which the compiler treats as constant, (eg. allowing optimisations like caching return values), but does something "under the hood" (eg. the class caches expensively calculated results, or logs requests made to it, or keeps a reference count).

There's a couple of differences. One is, as I understand it, you don't just write heedlessly to the mutable value, rather rust checks at run time that you only take one mutable reference to it at once. So if you screw up, it immediately panics, rather than working most of the time but with subtle bugs lurking.
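A sketch of that run-time check on `RefCell`. It uses `try_borrow_mut`, which reports the conflict as an `Err` so the example can run to completion; the plain `borrow_mut` would panic at the same point. `second_borrow_fails` is a made-up name.

```rust
use std::cell::RefCell;

fn second_borrow_fails() -> bool {
    let cache = RefCell::new(Vec::<u32>::new());

    let mut first = cache.borrow_mut(); // first mutable borrow, checked at run time
    first.push(1);

    // A second mutable borrow while `first` is alive is refused.
    let conflict = cache.try_borrow_mut().is_err();

    drop(first);

    // Once the first borrow is gone, borrowing works again.
    cache.borrow_mut().push(2);

    conflict
}

fn main() {
    println!("{}", second_borrow_fails());
}
```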

But it's also the case that if you do want a shared class accessed by many parts of your program (a logging class say, is that a reasonable example?), rust encourages you to use interior mutability to replicate the default situation in C or C++, of having a class multiple different parts of your program have a pointer through which they can call (non-const) functions in it.

I have more thoughts on these different ways of using pointers maybe coming up.
jack: (Default)
I haven't looked at lifetimes relating to structs yet.

Come to think of it, if my previous understanding was right, the lifetimes of return values can only ever be a combination of lifetimes of input parameters, so there's only so many possibilities, and the compiler knows which ones are possible (because if you dropped the input parameters, it would know which of the potential return values it would still be valid to read)... why can't it just deduce the output lifetimes? Is it more complicated than that in most (or some) cases?

ETA: One more thing I forgot. Lifetimes don't *do* anything. They're like traits, or types which could have been automatically deduced: the compiler checks that the lifetimes you specify don't leave any variables being used-after-free. But they don't *change* the lifetime of a variable, just tell any code that uses the variable what the lifetime *is*.
jack: (Default)
The formatting is probably going to be screwed up here, because I'm going to use a lot of <. This is a mix of stuff I'm trying to get straight in my mind, so I hope it's somewhat informative, but please point out where I've been unclear, confused or incorrect.

I am going to talk about lifetimes specifically, and save the "only one mutable reference at once" aspect of the borrow checker for the following post.

In C or C++, it's possible to take a pointer or reference to a variable, and use the pointer or reference after the value is no longer valid. If it happens within a single function, it's often possible for the compiler (or lint tool?) to warn you. Eg. returning a pointer or reference to a value in a temporary variable. If you have a pointer in a different part of the program, it's easy to miss. Ideally you write code so it doesn't happen, but it's good if it *definitely* can't happen.

Rust makes the equivalent of those compiler warnings a part of the language. Each value has an associated lifetime. That is typically the scope it was first declared in, but could be shorter (eg. if it's artificially dropped) or longer (if it's allocated on the heap). That is basically "how long it's ok to hold a pointer/reference to this value (or part of this value)"[1].

That's all much the same within one function, but rust applies the same guarantees across the whole program. In order to do so, if you have a reference of any sort, it needs to carry along a lifetime parameter. These are usually implicit to avoid boilerplate, which means you can dig yourself in surprisingly far before suddenly discovering you have NO IDEA how this works :)

A simple-ish example might be a function which takes a string, and returns a substring. In C++, you would have to choose between returning a new string that copies that substring (with a small overhead), or returning a slice of some sort (a char*, or a special slice type) that references the original memory -- but becomes invalid if that memory goes out of scope and is deallocated. In rust, you can specify that the returned value has the same lifetime as the parameter supplied, and then the normal checks for the calling function make sure that the slice/reference isn't used after the original value is deallocated (or changed).

In fact, if there's only *one* parameter and the function returns a reference, the return is assumed to be a reference to the input parameter (or to part of it) and you don't need to specify the lifetimes, it all just happens. Except slightly more safely than in C++ where you would not usually write a function like that because it's not easy to see if it's used safely.
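A sketch of the one-parameter case, with the implicit and explicit forms side by side. `first_word` is a made-up function for the example.

```rust
// One reference in, one reference out: the lifetime is filled in for you.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// The same signature with the lifetime written out by hand.
fn first_word_explicit<'a>(text: &'a str) -> &'a str {
    first_word(text)
}

fn main() {
    let sentence = String::from("hello world");
    // `word` points into `sentence`; nothing is copied.
    let word = first_word(&sentence);
    println!("{} {}", word, first_word_explicit(&sentence));
}
```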

If there's two input parameters, you need to specify which the return value depends on. In principle you can specify the return value might depend on either, or on both, but I haven't tried anything like that.
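A sketch of both variants with two input references. `find_in` and `longer` are made-up names.

```rust
// The result borrows only from `haystack`, so only it shares 'a with the return.
fn find_in<'a>(haystack: &'a str, needle: &str) -> &'a str {
    match haystack.find(needle) {
        Some(start) => &haystack[start..start + needle.len()],
        None => "",
    }
}

// The result may come from either input, so both share the same lifetime
// and the compiler uses the shorter of the two at each call.
fn longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let text = String::from("hello world");
    let found;
    {
        let needle = String::from("world");
        found = find_in(&text, &needle);
    } // `needle` is dropped here, but `found` only depends on `text`

    println!("{}", found);
    println!("{}", longer("ab", "abc"));
}
```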

That's about as far as I've got. There's more stuff I've thought about, but I'm not certain enough of it to talk about it.


The actual format is a special case of a template function. Lifetimes are named like identifiers are, but with a ' at the start. Conventionally 'a, 'b, 'c etc.

The function name is followed by <'a> or <'a, 'b, T> with as many lifetime and/or type parameters as needed. Each input reference can then be annotated with a lifetime parameter after the &. You can use the same lifetime parameter for multiple input references and the function will just use the shortest lifetime (the lifetimes of the parameters supplied don't actually have to be the same).

Then the return value of the function specifies the appropriate lifetime parameter.

fn process_str<'a>(in_str: &'a String) -> &'a String

<b>Question 1</b>

That seems like a lot of confusing boilerplate. Since it seems like lifetimes ONLY come from template parameters, why do you need to specify them in the template parameters list? Why can't that just be omitted?

There's a stack overflow question, but the answer just says "better to be explicit", it doesn't really give any examples of what would be confusing without that.

<b>Question 1a</b>

For that matter, specifying the input parameters at all seems complicated. Since lifetimes can (?) only come from input parameters, why can't they be specified that way?

fn foo(a:String, b:String, c:String, d:String) -> & lifetime(a) String

Return a String with lifetime equal to the lifetime of a. And inside the lifetime construct, you could allow min, max etc to combine lifetimes of variables if necessary.

<b>Question 2</b>

If you dereference a reference to a value correctly before the value goes out of scope, it's still an error if the reference is still in scope, even if you don't use it. That sort of makes sense (there's no point having it), but it also doesn't do any harm. Why isn't the end of lifetime considered the last time a variable is *used*, not where it goes out of scope?

There is an rfc to reconsider this question, but I don't think it was acted on. Presumably there's not much benefit and there is a chance of confusion.
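Current rust compilers do behave this way: since the 2018 edition a borrow is treated as ending at the reference's last use ("non-lexical lifetimes"), not at the end of its scope. A minimal sketch, with `last_use_demo` as a made-up name, which I'd expect to be accepted by a current compiler:

```rust
fn last_use_demo() -> i32 {
    let r;
    let copied;
    {
        let x = 5;
        r = &x;
        copied = *r; // last use of `r`
    } // `x` is dropped here while `r` is still in scope, but never used again

    copied
}

fn main() {
    println!("{}", last_use_demo());
}
```

Using `r` after the inner block (for example printing it) would bring the error back.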

<b>Question 3</b>

If the compiler knows where the reference is needed, why can't it keep the value alive that long? Like a reference counted or garbage collected value, but at compile time?

I guess that's just way too complicated or confusing.

<b>Comparison to other languages</b>

This fixes a big problem in languages that habitually have bare pointers (see C, and half of C++)

If you don't care about efficiency, of course, you can just use reference counted references or garbage collected references everywhere. This can occasionally be confusing (if some reference keeps a value alive, but the value isn't really meaningful any more). But basically works. (See the other half of C++, and most other languages.)

<b>Footnote 1</b>

Something I was confused by for a time, is that in rust you can only *copy* values explicitly with .clone() (like in C, you can only memcpy a struct if you know it's safe to do, or in C++, it's implicit, but you need to have a copy constructor). But unlike C, where writing a=b just doesn't work for most types, in rust, you can assign *any* type with a=b, but it functions as a move: it copies the value from b into a with a straight memcpy, including any contained pointers or whatever. But b is then invalid.

It checks at compile time that you can't use b again, so in practice the first time you notice this is "wait, it looked like I assigned ok, but then I got other weird errors".

But there are other benefits, like being able to return a struct value from a function without special arrangements to avoid a temporary copy.

But it confused me about lifetimes, because the contents of an object can often live on after the object is dropped. When in fact, the compiler often arranges that when it *would* memcpy, it actually reuses the same part of memory, so references might still be valid. But that's an implementation detail, so when you "move" an object, that's the end of its lifetime.
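A sketch of move versus explicit copy. One detail worth adding: simple types that implement `Copy` (integers, for example) are copied by plain assignment, so the move behaviour shows up with types like `String`.

```rust
fn main() {
    let a = String::from("hello");
    let b = a; // move: `a` can no longer be used
    // println!("{}", a); // compile error: `a` was moved

    let c = b.clone(); // explicit copy: `b` and `c` are independent
    println!("{} {}", b, c);

    let x = 5;
    let y = x; // i32 is `Copy`, so this really is a copy and `x` stays valid
    println!("{} {}", x, y);
}
```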
jack: (Default)
My goal for January was to learn some rust, and if possible contribute to the rust compiler/library source on github.

Rust is a language aimed at lowish level code with the efficiency of C, but with the safety of a garbage collector and type checker. Someone (ciphergoth? fanf?) originally pointed it out to me, and obviously I'm really interested in that intersection, as my experience has mostly been in lowish level stuff, but also in avoiding all forms of boilerplate and overhead.

For a while, an informal motto was "speed, safety, convenience, pick three", which it presumably won't live up to, but shows how it's being aimed.

It's not ready to replace C or C++, it's still maturing, but has matured a fair bit. And is almost the only language anywhere where using it for things C is used for now is even conceivable.

I don't know if my interest will go anywhere, but I feel like I learned useful things from just trying. Understanding the trade-offs made in designing a language, and the types of code patterns it invites similarly to C++, and ones it recommends for and against, and thinking about what the code I write is doing in practice, seem to have made me understand programming a little bit better.

So far

I read some of the introductory books and articles. I installed the compiler and package manager (on an Ubuntu VM) and made sure I could write a "Hello world" program.

I got the source code for the compiler and libraries, tested I could build it, and looked at the open bugs. I was very pleased that there was an existing effort to tag some bugs as "easy" for new contributors. I didn't try to edit any of the actual compiler code yet, but I did submit a small change to the documentation.

And I was pleased that there was a bot (rust high-five) welcoming new contributors and assigning reviewers so small patches actually get accepted or rejected, not just languish. And a bot doing continuous integration (rust bors, with a non-rust-specific development known as homu), specifically testing patches *before* being pulled to master. So changes actually made it into nightly release almost immediately, and three months later into a regular release.

I was also pleased that the code of conduct read like it was written by someone in this century.


I've read something about some of the concepts in rust people find weird, and may try to write something about my understanding, to see how much I've grokked, and get feedback from other people who've played with rust.

I've mentioned in passing several small design choices that I enjoyed. Eg. the error handling, usually returning a Result type, which is either a success with a return value, or an error with an error type or string. Eg. putting type annotations on function arguments, but relying on automatic variable types within function bodies. I won't review all of these, but in general, they felt good when I saw them. If I actually compare them to what I'm used to in other languages, I'll see if they still feel good.
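A sketch of both design choices together: a `Result` return for error handling, and types written on the function signature but inferred inside the body. `parse_age` and `first_char` are made-up functions.

```rust
// Either Ok with the parsed value, or Err with a message.
fn parse_age(input: &str) -> Result<u32, String> {
    // No type annotation needed here: `trimmed` is inferred as &str.
    let trimmed = input.trim();
    match trimmed.parse::<u32>() {
        Ok(age) => Ok(age),
        Err(e) => Err(format!("bad age {:?}: {}", trimmed, e)),
    }
}

// Option is the related type for "a value or nothing", with no error attached.
fn first_char(input: &str) -> Option<char> {
    input.chars().next()
}

fn main() {
    match parse_age(" 42 ") {
        Ok(age) => println!("age is {}", age),
        Err(msg) => println!("error: {}", msg),
    }
    println!("{:?} {:?}", parse_age("abc").is_err(), first_char("rust"));
}
```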
jack: (Default)
Why don't more languages compile to C?

It seems like, you can use your own memory manager etc if you want, and any specifics you want to optimise you can write a specific optimiser for, but even if it's a mess of pre-optimised assembly-in-C and higher-level-C, there's already a C compiler for most target systems, which produces reasonable machine code.
jack: (Default)
Beginning programming

I was telling Liv about the fizzbuzz test (original post: http://imranontech.com/2007/01/24/using-fizzbuzz-to-find-developers-who-grok-coding/ jeff atwood: http://blog.codinghorror.com/why-cant-programmers-program/ caveat from joel spolsky: http://www.joelonsoftware.com/items/2005/01/27.html)

The basic concept being that lots of people apply to a software engineering job who can't reliably write an extremely simple program (like, loop from 1 to 10). I've heard that a lot of places, enough that I'm sure something like that is true, although I'm not sure exactly what -- is it only for new graduates, who walked their way through a degree without grokking the basic principles? Or as Joel says, is it people who apply but never really hold a software job? Or is it that most people can hold a job by being "good enough" and muddle through and never need to understand the basics?

Opinions from friends who do interviews?

It always makes me feel introspective: am I judging myself too harshly? Or not harshly enough?

However, what came up in the conversation with Liv is something that I hadn't thought about, that many people I know got into hobby programming on 80s computers, when basically any programming exercise at all involved creating a whole program from scratch and running it -- when it was basically inconceivable to do anything else. But in this conversation, I blithely treated that as automatic, but realised that nowadays, it isn't. That many learning exercises do involve that, but that when I write software for real, it's quite rare I write a whole program from scratch, rather than plugging parts into an existing thing, or iteratively improving a program I already created. And that that's experience which is much less obvious than it used to be (hopefully within the experience of people applying for the sort of jobs being talked about, but not necessarily within the experience of someone who has gone quite far as a hobby).

Spinach Fritters

I saw a recipe and thought "ooh, that sounds nice and really easy" and tried it and it was and it was. And I'm embarrassed that's still news at this point in my life :( But I'm pleased that I did try that and it turned out well.



From an old post by Mark Dominus, "in the short run it kept the customer happy, and that is the most important thing; I say this entirely in earnest, without either sarcasm or bitterness."

Knowing his blogging style quite well, I know that he actually means that. As in, the trade off probably was the best thing to do, and it turned out well, and he's pleased and not bitter.

But boy howdy, it's nice to know that there are people who find it even harder than I do to convey "I'm not being sarcastic", even when they're literally saying "I'm not being sarcastic"! :)
jack: (Default)
1. Make an 'A' with 2000 unicode combining underlines
2. Paste into Microsoft Word
3. Restart Microsoft Word without document autorecovery

So, my other question is, if I have a programming language which specifies programs of the form "One single capital latin 'A', followed by some number N of unicode combining underlines" and processes them according to "Interpret N as a binary expression, and reinterpret it as an encoding of a perl/brainfuck program, then run it", does this mean that for the purposes of code golf writing-a-program-in-the-minimum-number-of-characters, every program will count as having only one character?
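A toy sketch of the A-Underline idea, under loud assumptions: the combining underline is taken to be U+0332 (COMBINING LOW LINE), "interpret N as binary" is taken to mean the big-endian bytes of N, and `encode` refuses anything over two bytes so the underline count stays below 65536. `encode` and `decode` are made-up names, and the inner program is only recovered as text, not run.

```rust
const UNDERLINE: char = '\u{0332}'; // COMBINING LOW LINE

// Turn a (very short) program into 'A' followed by N underlines,
// where N is the program's bytes read as one big-endian number.
fn encode(program: &str) -> Option<String> {
    let bytes = program.as_bytes();
    // Sketch-only limits: 1 or 2 bytes, and no leading NUL (it would be lost).
    if bytes.is_empty() || bytes.len() > 2 || bytes[0] == 0 {
        return None;
    }
    let n = bytes.iter().fold(0usize, |acc, b| (acc << 8) | usize::from(*b));
    let mut out = String::from("A");
    out.extend(std::iter::repeat(UNDERLINE).take(n));
    Some(out)
}

// Count the underlines after the leading 'A' and turn that count back
// into the text of the inner program.
fn decode(source: &str) -> Option<String> {
    let mut chars = source.chars();
    if chars.next()? != 'A' {
        return None;
    }
    let mut n: u64 = 0;
    for c in chars {
        if c != UNDERLINE {
            return None;
        }
        n += 1;
    }
    let bytes: Vec<u8> = n
        .to_be_bytes()
        .iter()
        .copied()
        .skip_while(|b| *b == 0)
        .collect();
    String::from_utf8(bytes).ok()
}

fn main() {
    // "+." is a two-character brainfuck program.
    let source = encode("+.").expect("short enough to encode");
    println!("{} code points", source.chars().count());
    println!("{:?}", decode(&source));
}
```

Even a two-character inner program needs over eleven thousand underlines, which gives some idea why writing it by hand is a bad plan.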

You could have a somewhat more efficient encoding by using different characters to encode more information.

I agree the idea of comparing fewest-number-of-character programs between languages is not a priori meaningful, but I think it often produces interesting results. (And the observation that there's always some imaginary language where the program you want is one or zero characters is correct, and good to make once, but does NOT invalidate the idea that it's interesting to compare different languages.)

This has the advantage that although the language is degenerate/esoteric, it will be the same language admitting different programs, just one-character programs, rather than only meaningful for one particular code golf challenge.

I highly advise not trying to write A-Underline code by hand, though :)

The only language I can imagine better at code golf would be the hypothetical JFGI language, which would accept programs of zero length and compile them to a program which accepts input on stdin, googles for it, and returns whatever follows on the top hit web page. That won't always work (it won't work on programs which aren't supposed to have an input), but it will work sometimes, and without all the tedious steps of _writing_ a program.
jack: (Default)

I remember a year or two ago hearing about elastic tab stops, and how at first it immediately seemed right, a fulfilment of what tabstops are supposed to be, and implemented or not, useful for resolving the tabs-v-othertabs-v-spaces holy war conceptually if not in fact. But then later, it seemed an uphill struggle to implement it in real life, if you expect it to interoperate with other code editors.

At the time, I assumed the implementation would need to store the code in an other-editor-unfriendly and diff-unfriendly fashion, either (i) using a single tab character to indicate when an elastic tab stop was due or (ii) using some meta-data (smuggled in comments, or whatever) to indicate where the tabstops should be.

However, now, I can't remember why I assumed that? Surely if the editor maintained the existing whitespace as spaces or tabs, but looked for pieces of code like:
func(alpha, 1);
func(beta,  2);
func(gamma, 3);

blah();        // comment
blah_blah();   // comment
moreblah();    // comment
and resized them dynamically whenever you edited the whitespace, wouldn't that be right nearly all of the time? And it works better if everyone has smart editors, but it works ok when only some do.

After all, when was the last time when you deleted/inserted a character exactly on a tab stop, or followed by multiple spaces, and DID want the rest of the line to move? Surely if it's followed by a tab or multiple spaces, then 99.999% of the time, the rest of the line is designed to line up with something else, not to be a very specific distance from the part of the line you're editing? So simply "not screwing that up" would fix an awful lot of stuff, without any knowledge of syntax or whatever at all?
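A rough sketch of that heuristic, not of any real editor's implementation: treat a tab or a run of 2+ spaces as a column break, and re-pad every line in the block so the columns line up again. It assumes the lines have no leading indentation and that a single space never separates columns. `cells` and `realign` are made-up names.

```rust
// Split a line into cells wherever there is a tab or a run of 2+ spaces.
fn cells(line: &str) -> Vec<String> {
    let normalised = line.replace('\t', "  ");
    normalised
        .split("  ")
        .map(|c| c.trim())
        .filter(|c| !c.is_empty())
        .map(String::from)
        .collect()
}

// Re-pad a block of adjacent lines so each column starts at the same x-coord.
fn realign(block: &[&str]) -> Vec<String> {
    let rows: Vec<Vec<String>> = block.iter().map(|l| cells(l)).collect();
    let columns = rows.iter().map(|r| r.len()).max().unwrap_or(0);

    // Width of each column = longest cell in it (counted in chars).
    let mut widths = vec![0usize; columns];
    for row in &rows {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.chars().count());
        }
    }

    rows.iter()
        .map(|row| {
            let mut line = String::new();
            for (i, cell) in row.iter().enumerate() {
                line.push_str(cell);
                if i + 1 < row.len() {
                    // Pad to the column width, plus two spaces as the separator.
                    let pad = widths[i] - cell.chars().count() + 2;
                    line.push_str(&" ".repeat(pad));
                }
            }
            line
        })
        .collect()
}

fn main() {
    // The middle line was renamed, so its comment has drifted out of line.
    let edited = [
        "blah();        // comment",
        "blah_blah_renamed();   // comment",
        "moreblah();    // comment",
    ];
    for line in realign(&edited) {
        println!("{}", line);
    }
}
```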

Admittedly, if you do have a line where you want a very specific number of spaces between two things, it wouldn't work, but when DO you ever want that? 0 and 1 spaces are very common, 2 very occasionally for indicating multiple levels of operator precedence, but 2+ spaces or a tab almost always mean "decouple the x-coord of this bit from the x-coord of the next bit?"

If many editors did that, it would be comparatively simple to EXPAND that functionality in lots of obvious ways, but it's generally a lot easier to get a new paradigm adopted by making it easier to adopt, than by making it MORE better.

What am I missing?
jack: (Default)
Obviously many people use git as easily as breathing. This post is obviously not addressed to them. This post is addressed to people who edit code, and every day or week copy it all into a directory called "backup-DD-MM-2010", or people who use Subversion or CVS.

I think many people may have heard of the idea, but be worried it might be too complicated for them to do easily. A long time ago I went from using nothing to using Subversion, and it was plainly a revelation: it took about half an hour to install, and was Just Better. Recently I started using Mercurial with TortoiseHg (a convenient graphical interface for Windows through add-ins to Windows Explorer), and already having a knowledge of what source control does, it took about half an hour to have the same directory on two different computers and a file server, and changes on the separate computers merged seamlessly.

If you're not already aware of the benefits of revision control/distributed revision control, here they are in simple terms:

  1. You can change some code and if it stops working, you can easily revert to the version that worked, without thinking "help, which bit did I change?"

  2. You can view a full history and see under what circumstances a particular line of code was originally written.

  3. You can commit successive changes on your own computer, even if they still need testing, and aren't complete, and only push them onto a central repository when a whole feature is finished. (You can do the same thing with a private branch in a non-distributed revision control system but many people don't like to)

  4. You don't always need access to a specific original computer/server so you can perform source control tasks even if you don't have network access

  5. Specifically you can clone the code base onto a laptop for working away, and then easily merge it back onto your desktop PC (whether or not you later push it to a central server) even if you've made other changes in the meantime, but while the computers are separated you can still commit code locally and have all the benefits of having atomic changes and history

  6. People may disagree, but I think it actually matches less-technical people's mental expectations better than a centralised system. People are used to the idea of "just copy the whole source tree here, and we can merge the changes later" which you CAN do with a distributed revision control system, except that you actually CAN merge the changes later rather than saying "oh fuck, they've diverged and it's all gone horribly wrong" if you don't use source control.


  1. Obviously, if several of you are working on a project from different parts of the world with no central authority, distributed revision control is even more necessary, but my point is that it's incredibly useful conceptually, even for a single developer

  2. Many people may find they need a possibly more technically feature rich system, such as Git, even if it may be less easy. If you're in that situation you already know what you want, you don't need this essay. This essay is just saying "you need a system AT LEAST as good as Mercurial+TortoiseHg, it is outside my competency to say who may need more"

  3. If you are working for an organisation with source control, it's probably easiest to use the existing one! :)

  4. I happen to be in the position of having a file server more conveniently available than a server that can support a whole client/server paradigm, so something easily based in a file system is incidentally useful to me of itself, regardless of its conceptual benefits, but other people will be the other way round.
jack: (Default)
Poll #1872 Variables with units
Open to: Registered Users, detailed results viewable to: All, participants: 11

Which do you prefer:

quantity width = 2.0*meters;
0 (0.0%)

int width_m = 2;
4 (36.4%)

int width=2; // meters
9 (81.8%)

int width=2;
3 (27.3%)
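
For anyone wondering what the first option needs behind it, here is a minimal C++ sketch. Only the names `quantity` and `meters` come from the poll; `unit`, `kilometers` and `in_meters` are invented here, and this is not modelled on any real units library.

```cpp
#include <cassert>

// A length, always stored internally in meters.
class quantity {
public:
    explicit quantity(double metres) : metres_(metres) {}
    double in_meters() const { return metres_; }
private:
    double metres_;
};

// A unit is just a scale factor relative to one meter.
struct unit { double to_meters; };
const unit meters{1.0};
const unit kilometers{1000.0};

// `2.0 * meters` builds a quantity. A bare `2.0` cannot be used where a
// quantity is expected, because the constructor is explicit.
inline quantity operator*(double value, unit u) {
    assert(u.to_meters > 0.0);  // a unit needs a positive scale
    return quantity(value * u.to_meters);
}

// Adding two lengths is allowed; adding a length to a bare number is not.
inline quantity operator+(quantity a, quantity b) {
    return quantity(a.in_meters() + b.in_meters());
}
```

With that in place `quantity width = 2.0*meters;` compiles, while `quantity width = 2.0;` and `width + 3.0` do not, which is the whole point of the first option.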

jack: (Default)
In many "introduction to C-style programming language" books, you see two variables being set to the same value with syntax like:

 x = y = 0

In fact, I almost never write that in real life. Normally I find:

* The variables are being declared and initialised, and you can't write "int a = int b = 0"
* It's possible the variables may be set to different values under some circumstances, in which case I find assigning them in separate statements clearer.
* It's being done to squeeze an extra statement into a conditional expression, like "while (p=getcharacter()) { dosomething(p) }", and it's actually clearer to move the assignment to a separate line

All the same, if I've ever declared any assignment operators on any user defined classes, I've always scrupulously declared them to return the value used*.

But for the first time in about five years, I actually *did* try to write "if (newval>obj.max) obj.val = obj.max = newval".

And it failed. Because obj was from the .NET framework, which has getters and setters, and setters apparently do NOT return a value. I'm not sure if they _should_. But maybe this piece of advice is dead now, if I never needed it before?

* Aside

In the C++ books I've seen, in fact they return a non-const reference from their assignment operators, allowing you to write:


I assume no-one ever WOULD want to write that, because either (a) the assignment has no side effects, and it's pointless or (b) "obj=1" does something interesting, in which case it should have its own line. Why is the reference non-const? Is it so that you can write "a = (obj=1)" or "f( obj=1 )" even if a.operator= or f take non-const parameters?
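
For concreteness, a small C++ sketch of the conventional form. `Counter`, `bump` and `chaining_demo` are hypothetical names, not from any book, and the expressions in the demo are only my guesses at the kind of thing a non-const reference permits.

```cpp
#include <cassert>

class Counter {
public:
    explicit Counter(int v = 0) : value_(v) {}

    // Conventional form: return a non-const reference to *this.
    Counter& operator=(int v) {
        value_ = v;
        return *this;
    }
    int value() const { return value_; }
private:
    int value_;
};

// Takes a non-const reference, so `bump(obj = 1)` only compiles because
// operator= returns Counter& rather than const Counter&.
inline void bump(Counter& c) { c = c.value() + 1; }

inline void chaining_demo() {
    Counter a, b;
    a = b = 5;        // b = 5 first, then a is copy-assigned from b
    assert(a.value() == 5 && b.value() == 5);

    (a = 1) = 2;      // legal with a non-const reference, if rarely sensible
    assert(a.value() == 2);

    bump(a = 7);      // assign, then pass the same object on by reference
    assert(a.value() == 8);
}
```

If `operator=` returned `const Counter&` instead, `a = b = 5` would still work, but `(a = 1) = 2` and `bump(a = 7)` would both fail to compile.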
jack: (books)
This was difficult to write so it would include some background for non-programmers, but still be of some interest to real programmers.

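There is a Turing-machine equivalent, described at http://en.wikipedia.org/wiki/FRACTRAN, based entirely on one simple operation repeated over and over: take a number, find the first fraction in a fixed list which multiplies it to give another whole number, and replace the number with that product, stopping when no fraction in the list works.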
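
For the programmers, the whole rule fits in a few lines. This is a minimal C++ sketch rather than anyone's golfed answer: `run_fractran` and `Fraction` are names I made up, fractions are assumed to be in lowest terms, the numbers are assumed to fit in 64 bits, and `max_steps` is a safety limit I added because a FRACTRAN program need not halt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Fraction = std::pair<std::uint64_t, std::uint64_t>;  // {numerator, denominator}

// Repeatedly replace n by n * f, where f is the first fraction in the list
// for which n * f is a whole number. Halt when no fraction qualifies.
std::uint64_t run_fractran(const std::vector<Fraction>& program,
                           std::uint64_t n,
                           std::size_t max_steps = 100000) {
    for (std::size_t step = 0; step < max_steps; ++step) {
        bool applied = false;
        for (const Fraction& f : program) {
            assert(f.second != 0);
            if (n % f.second == 0) {          // denominator divides n
                n = n / f.second * f.first;   // divide first to delay overflow
                applied = true;
                break;                        // start again from the first fraction
            }
        }
        if (!applied) break;                  // nothing applies: the program halts
    }
    return n;
}
```

The classic one-fraction program 3/2 adds two numbers: started on 2^3 * 3^2 = 72 it goes 72, 108, 162, 243 and stops at 243 = 3^5.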

On Stack Overflow, a challenge http://stackoverflow.com/questions/1749905/code-golf-fractran was recently posed for the shortest program which could run those "programs", with a special prize for anyone who wrote an interpreter in those fractions.

Writing a compiler in the same language is a sign that a programming language has "arrived". The first compiler ever had to be written in assembly language, because that's why someone was writing a compiler in the first place: so they didn't always have to code individual machine instructions by hand. But once someone had, then just like any other program, a C compiler is mostly written in C. Writing a fractran interpreter in fractran isn't especially useful, but is interesting.

I wanted to do so. I'd never used lex or yacc derivatives before, and felt it was something everyone else knew well and that I'd really like to learn. You specify a bit of C and a grammar: a list of tokens like "{" and "+" and any string, and a list of valid compound statements, like "EITHER anything inside { } OR a single statement" and so on. And they automagically stitch it all together into a C program which reads those symbols from the command line, and does whatever you told it to.

It was surprisingly easy. By far the hardest part was getting a mix of automagically generated code and C++ to compile. Otherwise the program mostly just worked.

I started with a compiler for a register machine related to the fractions, and invented a syntax for a simple example thereof. It has an infinite array of registers (though two is sufficient) and three to four instructions:

 Inc r4
 Dec r1
 Goto LabelName
 Dec r2 else goto labelnameifzero

And made a program which can turn:

 Inc r1; inc r1; inc r1; inc r1; Inc r2; Inc r2; Inc r2;

into:
 4, 3


 Inc r1; inc r1; inc r1; inc r1;

 Inc r2; Inc r2; Inc r2;

 dec r1 else goto Tmp1; 

 inc r5;

  dec r2 else goto Tmp2;
  Inc r3;
  Inc r4;
  Goto Loop2;

  dec r4 else goto Loop1;
  inc r2;
  goto Tmp2;

 dec r5 else goto End;
 inc r1;
 goto Tmp1;
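
The instruction set is small enough that an interpreter for it can be sketched in a few lines of C++. This is not the lex/yacc program described here, just an illustration with the parsing left out: `Instr` and `run_machine` are hypothetical names, jump targets are instruction indices rather than label names, and I am assuming that a plain `Dec` on an empty register is an error.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// One instruction. A target equal to the program size means "halt".
struct Instr {
    enum Op { Inc, Dec, Goto, DecElse } op;
    int reg;             // register number for Inc / Dec / DecElse
    std::size_t target;  // jump target for Goto, or the "else" target for DecElse
};

// Run until the program counter leaves the program (or max_steps is hit).
// Registers hold non-negative counts and start at zero unless given.
std::map<int, unsigned long> run_machine(const std::vector<Instr>& prog,
                                         std::map<int, unsigned long> regs,
                                         std::size_t max_steps = 100000) {
    std::size_t pc = 0;
    for (std::size_t step = 0; pc < prog.size() && step < max_steps; ++step) {
        const Instr& in = prog[pc];
        switch (in.op) {
        case Instr::Inc:
            ++regs[in.reg]; ++pc;
            break;
        case Instr::Dec:
            assert(regs[in.reg] > 0);  // assumption: Dec on zero is a bug
            --regs[in.reg]; ++pc;
            break;
        case Instr::Goto:
            pc = in.target;
            break;
        case Instr::DecElse:
            if (regs[in.reg] > 0) { --regs[in.reg]; ++pc; }
            else                  { pc = in.target; }
            break;
        }
    }
    return regs;
}
```

For example, the three instructions "dec r1 else goto End; inc r2; goto Start" become `{DecElse, 1, 3}, {Inc, 2, 0}, {Goto, 0, 0}`, and running them with r1 = 4 and r2 = 3 leaves r1 = 0 and r2 = 7.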




I was about to add a back end which can turn the instructions into Fractran instructions (it's easy to represent "line number in program", "increase working value", "decrease working value" and "decrease else" with the fractions once you're used to it), and to add enough useful concepts built into the compiler but built out of those instructions, like: an array syntax which automatically compiles to using powers of 2 for first element, multiplied by powers of 3 for second element, etc, etc; program control structures like "if" "while"; variable names, etc.
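
Since that back end never got written, the following is only a sketch of one encoding I believe works, not the design I had planned in any detail. It assumes each register is the exponent of a small prime, the current line is marked by one extra prime per line, and plain `Dec` is left out for brevity. `Instr`, `nth_prime` and `compile` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Instr {
    enum Op { Inc, Goto, DecElse } op;
    std::size_t reg;     // register index, 0-based
    std::size_t target;  // jump target (Goto) or "else" target (DecElse)
};
using Fraction = std::pair<std::uint64_t, std::uint64_t>;  // {numerator, denominator}

// k-th prime (0-based) by trial division; fine for the handful needed here.
std::uint64_t nth_prime(std::size_t k) {
    for (std::uint64_t c = 2;; ++c) {
        bool prime = true;
        for (std::uint64_t d = 2; d * d <= c; ++d)
            if (c % d == 0) { prime = false; break; }
        if (prime) {
            if (k == 0) return c;
            --k;
        }
    }
}

// Registers use the first num_regs primes; line i uses prime number
// num_regs + i. Line prog.size() has no fractions, so reaching it halts.
std::vector<Fraction> compile(const std::vector<Instr>& prog, std::size_t num_regs) {
    auto reg = [&](std::size_t r) -> std::uint64_t {
        assert(r < num_regs);
        return nth_prime(r);
    };
    auto line = [&](std::size_t i) -> std::uint64_t {
        assert(i <= prog.size());
        return nth_prime(num_regs + i);
    };
    std::vector<Fraction> out;
    for (std::size_t i = 0; i < prog.size(); ++i) {
        const Instr& in = prog[i];
        switch (in.op) {
        case Instr::Inc:      // multiply in the register's prime, move to next line
            out.push_back({reg(in.reg) * line(i + 1), line(i)});
            break;
        case Instr::Goto:     // only the line marker changes
            out.push_back({line(in.target), line(i)});
            break;
        case Instr::DecElse:  // tried first: applies only if the register is non-zero
            out.push_back({line(i + 1), reg(in.reg) * line(i)});
            out.push_back({line(in.target), line(i)});  // otherwise the else branch
            break;
        }
    }
    return out;
}
```

Compiling "dec r0 else goto End; inc r1; goto Start" with two registers gives 7/10, 13/5, 33/7, 5/11. Starting from 5 * 2^2 * 3^1 = 60 (line 0, r0 = 2, r1 = 1), those fractions produce 42, 198, 90, 63, 297, 135 and finally 351 = 13 * 3^3, that is, the halt marker with r0 = 0 and r1 = 3.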

Unfortunately, someone else had done something similar, and finished it first. So I decided to call the interpreter I'd written a success and come back to it when I had need of something similar, and not spend any more time trying to make it deal with fractions.