jack: (Default)
Stupid tech questions.

At work the source tree has a bunch of different components. A couple are "big" like the FPGA image, and the whole device image which includes that, plus a linux distro which includes some of the other components as well. The other components are mostly "small", small individual C/C++ programs or python packages.

The small components are mostly managed by conan package manager. This is ever so useful for assuring that each explicitly lists its dependencies and doesn't just depend on lots of header and source files from elsewhere in the source tree. But we are not using most of the functionality of a package manager -- everything in the source tree is assumed to be compatible, nothing depends on an earlier version of anything else or anything.

In effect the package management (a) is an easy way to install the latest thing on a different computer, as is done during the automated tests which interface with real hardware and (b) a convenient way of caching results between different builds. This doesn't matter that much for most of the components which use the package manager, but does matter for the FPGA image which takes 40 minutes to build, and is usually downloaded from jenkins rather than built fresh.

What I don't like is that there's no way to say "build everything necessary". You need to know all the individual components to build, or the sequence of jenkins jobs to trigger. And as there have been more added, I think it's got less convenient to build all the ones which could have been affected by a change, and to make sure you haven't missed any.

The other concern is that at the moment, "packages" built locally or from main are usually identified by a "release" number, but that is not always different between builds. Whereas other builds identify packages by a git hash, which we would like to use in more places, but which doesn't make it clear which is "the latest".

This is not at all how we want it to be! It was put together for valid reasons but not planned in advance.

What I want is... a way of building all the different components that need to be built. With some (but not necessarily complete) ability to figure out prerequisites, or "which things have changed". Which is effectively... a build system. It feels like it should be obvious what build system to use, but I'm not finding it obvious yet!

Like, that could be make. But one requirement is that it's easy to say "the linux/windows version of this python package depends on the linux/windows version of this C++ package" and "the fragle configuration of the device image depends on the blah configuration of the fpga" and Make doesn't easily support that sort of parameterised target. I think?

I could write a simple script to "build everything". I quite like that. But... surely writing a NEW build system isn't correct?

We could use one of the systems we have, like conan. But conan seems optimised to be a package manager installing prerequisites, not "rebuilding these directories". BitBake seems very well designed for this, but is designed for building embedded linux systems and is probably more heavyweight than we want. And at the moment each component is built from its own command line; we don't really want one conan build invoking other conan builds.

It might be simpler if we moved away from the packaging and used an old school "The outputs are HERE in the source tree. You use the ones in the source tree and build them if necessary." But I'm not sure.

There's a bunch of ways which would be fine but maybe not great. It feels like there must be some standard answer but I'm not sure what it is. Thoughts?
jack: (Default)
I've been learning rust by writing a simple tile-based puzzle game. To my immense pleasure, the simple game engine I was using, Macroquad, made it mostly easy to put the existing game on a website (as well as on linux, windows, and android, though I haven't tried all of those). So far there's no actual puzzles, but I really like the fish I drew (the crab is not mine).

Play online at:

https://cartesiandaemon.github.io/rusttilegame/tilegame.html

It runs on mobile browser, but it's a lot easier to control on a desktop browser. If you have a keyboard, control the crab with the arrow keys and try to get to the exit. You can also click/press the top part of the screen to move up, etc.

Rust (and an extra decade or so of experience) has been great for "if it compiles, it probably works"!

PS. See source on github: https://github.com/CartesianDaemon/rusttilegame
jack: (Default)
I did some experimenting with rust macros. The idea is to make a rich assert that displays the values of any variables and expressions used in the assert condition (with some useful default omitting redundant ones).

It looks like writing this in the source:


...
rsst!( two.pow(2+3+5)+1 == 1025 ); // Passes. Execution continues.
rsst!( two.pow(2+3+5)+1 == 1023 ); // Panics. See below.


And getting this output:


$ cargo run
...
Assertion failed: two.pow(2+3+5)+1 == 1023
two.pow(2 + 3 + 5) + 1: 1025
two.pow(2 + 3 + 5): 1024
two: 2
2 + 3 + 5: 10


There's more info including functioning examples at https://github.com/CartesianDaemon/assrt
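For flavour, here's a much-simplified macro_rules! sketch of the basic idea. This is not the actual assrt code (which does considerably more work to decompose the expression and print sub-values); it only echoes the stringified condition on failure, using the stringify! trick that such macros build on.

```rust
// Toy sketch only -- not the real rsst! macro. It just shows that a
// declarative macro can capture the condition's source text with stringify!.
macro_rules! toy_rsst {
    ($cond:expr) => {
        if !$cond {
            panic!("Assertion failed: {}", stringify!($cond));
        }
    };
}

fn main() {
    let two: u32 = 2;
    toy_rsst!(two.pow(2 + 3 + 5) + 1 == 1025); // passes, execution continues
    // toy_rsst!(two.pow(2 + 3 + 5) + 1 == 1023); // would panic, printing the condition text
    println!("ok");
}
```

Printing the values of the sub-expressions, as assrt does, needs considerably more macro machinery than this.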

It was interesting to learn what was possible. As people intimated, rust macros are very powerful and flexible, but very hard to write.

I know this will probably never go anywhere from here, but I wanted to see what it looked like. I assume the same functionality exists somewhere already, although I didn't find it immediately. I didn't try to publish the crate yet because it's not really functional, although I'll see if it works in my projects.

I am curious how much people think that's cool, how much people think that's pointless, and how much people think that's horrific. I always thought Catch.hpp, which includes a similar thing in C++ using macros with horrific class magic hidden inside, was really useful: it's presumably horrific to write the library, but perfectly seamless to use it.

I followed tradition and hesitated a lot over choosing any names. I wanted something that looked like "assert" but wasn't taken. Knowing people may object, I seriously considered "asst" or something else with "ass" centrally. For now it's "assrt" for the crate and "rsst" or "csst" for the macros.
jack: (Default)
I never got around to talking about what my current work does: http://undo.io There was some previous discussion on the topic on facebook: https://www.facebook.com/jack.vickeridge/posts/10103938681712440

What is a Time Travel Debugger

It records everything that happens in a program's execution, so you can step backwards as well as forwards, or rewind execution and then replay it again more carefully. Or you can "replay" it backward, e.g. going to the end of time, seeing your program crashed with a null pointer and then setting a watchpoint on that pointer and reverse-continuing until you find out where the pointer was set to that value.

There's two main modes of use: using it like a debugger sitting in front of a program, or using a companion recorder (which is actually an executable with much of the same code but packaged differently) to record your program in your overnight test suite, or running it to replicate a bug that happens in a very long-running process. Then once you've reproduced the bug once, you've almost finished: you can just load up the recording and step forward and back in a debugger until you figure out what went wrong.

That sounds impossible!

Yes, it does sound impossible, but it works.

It records literally everything the program does that interacts with the outside world in any way, e.g. any system call (including any file access, network access, gettime, even getpid, etc, etc), any instructions which write to shared memory, etc. That can get large for some programs (but customers do use it successfully!)

It saves a snapshot at several points during history (by forking the process there), so it can create the state of any point in history by forking another process from that snapshot, and playing it forward using the saved events instead of actually doing any of the things that interact with the outside world.

It does all this by rewriting the compiled program in memory, and maintaining a mapping between the rewritten memory and the original assembly. So you the user see the original source code and original assembly, with whatever level of debug info you originally compiled the program with. But behind the scenes, almost any non-trivial instruction is rewritten to do something else: either to save the result of the instruction in the event log, or to replay the value from the event log.

That means that you can attach it to any program, compiled any way, just like any debugger can. You don't need to compile it with some magic -- people keep expecting this, and it could have been written that way, but instead, you can just connect it to any program you could attach gdb to.

Caveats

Recording multiple threads is slow, and recording multiple processes doesn't exist yet. We're working on it, but right now it can help with some multithreading bugs but can't help with others.

Program execution is slower, between 2x and 10x. We are working to improve that. Replaying through execution can be faster than that (and you can usually go directly to the beginning, end, etc without any replaying).

This is all on linux only.

The interface and implementation is based on the gdb frontend/gdb server protocol. So by default it looks like debugging with gdb but with "reverse-next" as well as "next". And it works with any frontend which uses the gdb backend, e.g. VS Code, Emacs, although some of those are tested extensively and some aren't.

But no linux debugger has a very good UI, so currently it is mainly used by people who have to debug using something like gdb anyway, but want to be able to solve harder bugs quicker. We are trying hard to make it easy for languages like python and java where the translation has to understand an interpreter as well as the code. This works in the sense that it can be recorded and replayed, but getting a good user experience is a lot harder.

Worth and Price

I always describe it as, the difference between "not having a debugger" and "having a debugger". If you have a debugger, maybe actually 90% of problems you can solve with print statements. But the 10% that you can't fix with print statements could take months to solve without a debugger, or hours with a debugger. It's hard to describe why you need a debugger to someone who hasn't tried using one. But almost no-one would go back to not having one.

A time travel debugger makes trivial the small proportion of issues that still feel impossible even with a debugger. You say, "yes, it fails intermittently but we don't know if we'll ever track it down unless someone wants to study the failure for nine months", but that might be only hours with the right tool.

Unfortunately, this tool takes a large amount of programmer effort to create, and is only viable if it's sold commercially. If you view it as "the 5% of bugs we have that take 9 months to track down instead get solved in a few hours", and you compare the cost to the salary for an extra programmer or two, it's very reasonable. But most people, including me, hate paying for tools, so it's hard to sell.

It has a great retention rate -- any companies which have subscribed to a contract, have almost always kept it, and programmers who have used it regularly (including me) are very very eager to keep having it available.

Currently there are several introductory offers. There's an educational license which is cheaper or free. There might be an offer of free licences to the right open source project if you're interested. There's a 30 day free trial, and a personal license, in the hopes people will become converts and persuade their employer to adopt it. There is a standing offer that if you have an intractable, hard-to-reproduce bug that you'd like to see just go away, we can arrange some sort of trial to have someone come and help capture and diagnose that bug, and see if that leads to a longer term arrangement.

Ask questions in the comments. Feel free to download the trial -- if you've used gdb, it's fairly straightforward to try out, and it's magical to see "step back, step forward".

Or if it sounds like you might be someone who would actually benefit from acquiring a license, I can put you in touch with helpful people -- we used to focus on big clients because there was a lot of shakedown, but now it works more reliably out of the box, it's plausible for a wider spectrum of companies and people.
jack: (Default)
I tweeted this in one sentence, but I thought it deserved revisiting.

In many ways, this is obvious, but I didn't have it all laid down in my head together till now.

When you write a program, there's any number of ways it might change. You might have included a provisional "fail when input is invalid" but need to make that be handled more gracefully later. You might have hard-coded an integer size, and maybe need to expand that later. You might need to change it from a stand-alone program to a library that other programs can link against, or vice versa.

The point is that for any of these cases, there's a spectrum of choices. You can ignore the problem, you can write the program so that the expansion is easy to add on later, or you can add the expansion in right now.

Sometimes "do it right now" is right. Sometimes the 'right' way is just as easy to write as the 'wrong' way, and equally clear even to inexperienced programmers, so you should just do it always.

I've previously talked about the times when the middle path is right. If you're not sure, it's usually a good call.

And there's always a trade-off between quickness now and maintenance cost.

But today, I want to talk about ignoring the problem. If the cost of making it extensible is real, and the chance of needing it is low, ignoring the problem is likely correct. Think carefully before tying yourself to 16-bit integers, if they may one day overflow and lead to much pain. But accepting 32-bit integers is, a lot of the time, fine, and the cost of making each integer into a type which could hold larger integers is that a lot of code becomes less clear (and in many languages, slower).
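A quick Rust illustration of that trade-off (the values here are arbitrary): a 16-bit counter is one tick from wrapping, and checked arithmetic makes the overflow explicit rather than silent, while the same value fits comfortably in 32 bits.

```rust
fn main() {
    let ticks: u16 = u16::MAX;               // a 16-bit counter at its limit
    assert_eq!(ticks.checked_add(1), None);  // overflow is detected, not silent

    let ticks: u32 = u16::MAX as u32 + 1;    // the same count fits easily in 32 bits
    assert_eq!(ticks, 65_536);
}
```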

Mark Dominus pointed this out in both code and documentation for "small" libraries, especially ones provided as part of a language's standard set of libraries. The cost of any change, even just explaining that something isn't there, is that all users of the library take *slightly* longer to understand it. And if unchecked, eventually all the "you should clarify this" or "this change is really small, why not?" changes mount up and turn a small, lightweight library into a heavy, general purpose library. And then maybe the cycle starts again.

He also pointed out that when he was writing small command line utilities mostly for his own use, he often added extra command line options, because when he was originally writing it, the options were clearer in his mind. But he didn't *implement* them, because he'd probably never use them.

I was thinking of the habit of writing coding exercises. My instinct always used to be to look for the "right" solution. But actually, the right solution depends not on the current state of the program, but on its future evolution. If it's ACTUALLY not likely to change, leaving it clearly imperfect may be the right solution. If it will acquire many more users, spending developer effort on making the interface cleaner may become worthwhile. If much future code will be built to interface with it, get the interface right NOW, or it will become locked in.

It's not a matter of what TO do, but equally much what NOT to do.
jack: (Default)
The other thought that inspired the previous post is that I didn't have a good concept in my mind for a sort of assert I want sometimes, of something I don't expect to usually happen, or don't expect to happen in testing, even if it may happen in other scenarios.

Say "unrecognised parameter in config file" as exhibited in some bug in a computer game that turned off a lot of the ai for many of the enemies that people were talking about recently.

Or "needed to retry file access/memory allocation/etc". Or "input file size is more than 1Gb".

You don't necessarily want to treat those as failures, because they might happen in real life and if so, you want to deal with them. You don't want to stop the function at the point where they occur. But you sort of want to mark the behaviour somehow, because it will *usually* indicate a bug while developing.

I'm not sure how to think of these. Maybe "development only asserts"? Or "warnings which are not triggered, but counted and printed out at program termination during development and included with output of tests"?
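One way the "counted, not fatal" idea could look in Rust (every name here is made up for illustration): unexpected events aren't failures, they're tallied, so a development run or test suite can report them at the end.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch only: a global tally of "shouldn't usually happen" events.
static UNEXPECTED: AtomicUsize = AtomicUsize::new(0);

fn note_unexpected(what: &str) {
    // Not a failure: count it and carry on, so development runs can surface it.
    UNEXPECTED.fetch_add(1, Ordering::Relaxed);
    eprintln!("warning: unexpected: {what}");
}

fn load_config(entries: &[(&str, &str)]) {
    for (key, _value) in entries {
        match *key {
            "speed" | "tether" => {} // recognised keys in this toy example
            other => note_unexpected(&format!("unrecognised parameter '{other}'")),
        }
    }
}

fn main() {
    load_config(&[("speed", "3"), ("teather", "1")]); // typo survives, but is counted
    println!("unexpected events: {}", UNEXPECTED.load(Ordering::Relaxed));
}
```

A test harness could then assert the counter is zero, turning these into hard failures during development while leaving release behaviour graceful.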
jack: (Default)
Intro

I've read lots of explanations of when and how to use asserts, or other error handling. But I realised, I've never actually really sat down and thought through all the possibilities myself. Welcome to the ride!

For the purposes of this post, I will mainly consider "things that can happen in subsystems of a computer program other than successfully performing whatever operation you hoped to achieve".

I think it's important to think of "subsystems" in that way. A subsystem is about as general a term as I could manage, because sometimes this refers to a function, sometimes a class or other object, sometimes a library, sometimes a whole program, but I don't think the reasoning is tied to any particular one of those[1].

In particular, for any non-trivial program, it's important to recognise that subsystems are built out of subsystems, and the behaviour for a system may be different if it's "an encapsulated part of your code", "your whole program", "an external library" etc, which may become painfully apparent when it shifts from one to another.

It's also important to be realistic that in any non-trivial program, the number of ways things can go OTHER than the way you wanted massively dwarfs the 'correct' behaviour. If you want a one-time script that just has to work once, ignoring all those except to stop them happening in this particular situation is a good trade-off. But in anything else, you need to devote a lot -- maybe most -- of your time to dealing with them.

Intro: footnotes

[1] Encapsulation is good. Objects are good. But encapsulation is a broader concept, and objects are *one* form of encapsulation, and different forms are appropriate in different circumstances.

Case 1: Successful operation

Yay! Not much to say here. Typically you return a value. Sometimes you don't return anything at all.

Case 2a: Internal error

This code is screwed up somehow. Some internal state is inconsistent. You read off the end of an array. You checked for invalid values, and then proceeded with a calculation and -- oops! found an invalid value anyway. You had two branches, an if and else, but somehow found some third state that wasn't covered.

Case 2b: Invalid input

The input (function parameters, possibly something read from a file, etc) doesn't fit the requirements you wrote for what they should be.

Sometimes this is conflated with 2a, but for reasons described below, I'm considering this separately to start with.

Case 2c, 2d, 2e...: Something else happened that prevented a successful completion

You were supposed to write to a file, but couldn't. Out of memory. Someone called "pop" but the list is empty.

How to deal with them

Traditionally, 2a is a place for asserts, 2c onward are a place for exceptions, error return values, or sometimes a null return value, and 2b is a place for massive arguments between those two options.

At least conceptually, I'd say the best way to deal with everything is the way rust does.

Most return values are actually a union of either a successful return (usually with a value), or an error return (with some info, say an error type from a hierarchy to say what the error is).
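In Rust that union is literally a two-variant enum; this mirror of the standard Result type (renamed here to avoid clashing with the real one) shows the shape:

```rust
// A mirror of std's Result<T, E>: a value is either Ok(T) or Err(E).
enum Outcome<T, E> {
    Ok(T),
    Err(E),
}

fn main() {
    let good: Outcome<u32, String> = Outcome::Ok(42);
    let bad: Outcome<u32, String> = Outcome::Err("out of memory".to_string());

    // The caller must consciously decide what to do with each case.
    match good {
        Outcome::Ok(v) => println!("got {v}"),
        Outcome::Err(e) => println!("failed: {e}"),
    }
    let _ = bad;
}
```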

This provides the *calling* code the maximum options with minimum fuss.

If the calling code just wants a quick answer and doesn't care about handling failure, there's an "unwrap" function which takes the return value, turns a successful return into a plain value, and fails the whole program if there's an error. It's a bit like "not checking for null", except that you can search the whole program for "unwrap" and make sure there aren't any before doing anything important with it; whereas it's hard to search for "should have checked for null and didn't" when it's not immediately obvious which pointers may sometimes be null and which should never be.
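A small illustration of the greppability point (parse_port is a hypothetical function):

```rust
// unwrap: fine for quick scripts, and easy to grep for before shipping.
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    s.parse::<u16>()
}

fn main() {
    let port = parse_port("8080").unwrap(); // panics loudly if it ever fails
    assert_eq!(port, 8080);
    // parse_port("eighty").unwrap();       // would panic; `grep unwrap` finds it
}
```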

More often, you can use the "try" macro -- or the one character long syntactic sugar abbreviation of it, '?' suffix on an expression. That does the same thing, but instead of failing, it early-returns from the current function with the same error.

That's often what you want: if you get some sort of error three levels deep, probably all your other functions want to deal with that the same way whatever that is. Until it hits a boundary between subsystems, where you may want to deal with the error in a more specific way.
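Sketched in code (the file name and function are hypothetical): each '?' either yields the value or early-returns the same error, and the subsystem boundary is where a caller finally decides what the error means.

```rust
use std::fs;
use std::io;

// Each '?' either yields the value or early-returns the error unchanged.
fn first_line(path: &str) -> Result<String, io::Error> {
    let text = fs::read_to_string(path)?; // an io::Error propagates from here
    Ok(text.lines().next().unwrap_or("").to_string())
}

fn main() {
    // At a subsystem boundary, the caller decides what the error means.
    match first_line("/no/such/file") {
        Ok(line) => println!("first line: {line}"),
        Err(e) => println!("couldn't read it: {e}"),
    }
}
```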

Indeed, I think not doing that is driven mostly by syntax. If you use error values, then every function that can fail needs an inconveniently long chunk of boilerplate saying "if any of these errors, early-return with that error". And if you use exceptions, you have a whole bunch of hidden complexity, where looking at your function gives no indication how many errors may be abruptly propagated up through it, or from which lines.

I think in an ideal world, all the error cases would use this model. It would be easy to write code that returns an "invalid parameter" error with minimal boilerplate. And the calling code can either handle that error (if they care), or deliberately treat it as a failure (if it commits to validating the parameters in advance).

And even in the "internal error" case, you likely want a return value there. Usually the calling code should not handle them (since everything is probably screwed then anyway). But if, say, you're handling a buggy library, you can choose to handle them if you want, not be forced to deal with crashes.

But they *would* still be a special case. For instance, your debugger would be configured to stop on any internal errors (basically, equivalent of asserts). Even more so than for exceptions. And you may choose to disable checks for them if you're desperate for optimisation (also like asserts).

Optimisation

That also gives an answer to how you should handle unexpected errors in code at runtime. The fail fast people are correct that you usually shouldn't continue, since whatever you were doing is probably failing. But if you return an "internal error" value, higher layers of the program can handle that if you want.

Say, if your "open file" function fails under some conditions. Continuing to run the "open file" code after an error is bad -- it might be harmless, say, opening a file with an empty name, or it might be a security bug. But as long as you bail out of that function as soon as you detect the error, higher level code might choose to treat that as a unit, and retry, or report an error to the user, not just kill the program.

Conversely, if optimisation is key, you can choose the C-like approach of "crash immediately", or "disable all those checks since they should never happen anyway" and just let the consequences of that error bubble forward until they become irrelevant or something more serious goes wrong. Which isn't ideal, but is better than not having any of those checks in the first place and pretending they're not needed.

That leads to one of the points that led me into this post in the first place. That it's usually better to have your asserts in runtime non-debug code than to just not have the checks (unless you care about optimisation).
jack: (Default)
The premise

I have a tile-based computer adventure game. The current state is represented in memory as a 2d array of tiles representing the play area, each with one (or possibly more) objects on that tile. There is also an "undo" history, tracking recent actions, and the difference (objects moved, removed or added) so the previous or following state can be recreated.

In addition, each object remembers the most recent undo step which affected it, so you can click "undo" on an object and return it to the previous state (also undoing any following actions), not just undo one at a time until you rewind to the appropriate point.

I need to check the code, but as I remember, this is represented by each object having a pointer (well, the python equivalent) to one of the elements of the undo sequence. And when you redo or undo the pointers are updated to refer to the newly-correct object.

Now I'm unsure, but IIRC the undo steps refer to an object by coordinates in the play area, not a pointer to the object (in general, we assume the play area might store game objects as just ids or something, not as an object in memory).

What happens when we want to save the game

We need to be able to save the game -- indeed, a modern game (especially one where actions aren't irreversible) should just save as it goes along, not require a separate load/save action to do so.

This means my instinctive layout above doesn't work. You can't save "a pointer". The best option is probably to use an index into the undo list which the undo list understands.

That can also cut out other possible bugs. If you have a pointer, it could be anywhere in memory. If you have an index into the undo list, you can choose to have the list check that the dereference is valid (and retain the option to turn those checks off if performance matters).
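A sketch of that layout in Rust (all the names here are hypothetical, not from the actual game): the object stores a plain index, which is trivially serialisable, and the undo list owns the checked lookup.

```rust
// Hypothetical sketch: refer to an undo step by index, not pointer, so the
// reference survives save/load and the dereference can be bounds-checked.
struct UndoStep {
    description: String,
}

struct UndoList {
    steps: Vec<UndoStep>,
}

impl UndoList {
    // Checked dereference: a stale index is caught instead of silently followed.
    fn get(&self, idx: usize) -> Option<&UndoStep> {
        self.steps.get(idx)
    }
}

struct GameObject {
    last_undo_step: usize, // trivially serialisable, unlike a pointer
}

fn main() {
    let history = UndoList {
        steps: vec![UndoStep { description: "crab moved left".into() }],
    };
    let obj = GameObject { last_undo_step: 0 };
    assert!(history.get(obj.last_undo_step).is_some());
    assert!(history.get(99).is_none()); // invalid index detected, not undefined behaviour
}
```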

There's other possibilities but I think that's the best one. It is uncomfortably close to designing our own ORM -- we could alternatively have ALL objects represented by a unique id and index things by that instead (either via some global list of ids or only when we load from disk).

I run into this often when I'm game programming, the best way of 'referring' somehow to another game object -- by value or reference? by game coordinates or pointer to memory? But not in other types of programming. Does this experience ring a bell to anyone else?

But now I'm thinking...

This also reminds me of a problem I ran into when I started thinking about rust's memory model. If you have a class that creates a bunch of other classes, and those classes want to have pointers to each other, there's no easy way of doing it, iirc.

I think you need to rely on reference-counted pointers even though it feels like you shouldn't. That's not a problem in practice -- the "store an index" method above also has an indirection every time you need to access the object. But it feels like you shouldn't need to. And it's a similar sort of problem: "I want to refer to one of the objects which this big class is responsible for".
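A minimal sketch of the reference-counted workaround: one owner holds strong Rc handles, and the siblings hold Weak back-references to each other (so there's no cycle keeping everything alive forever). The upgrade on each access is the extra indirection mentioned above.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Sibling objects owned elsewhere, each holding a non-owning handle to the other.
struct Node {
    name: String,
    peer: RefCell<Weak<Node>>,
}

fn main() {
    let a = Rc::new(Node { name: "a".into(), peer: RefCell::new(Weak::new()) });
    let b = Rc::new(Node { name: "b".into(), peer: RefCell::new(Weak::new()) });

    // Wire the siblings to each other after creation.
    *a.peer.borrow_mut() = Rc::downgrade(&b);
    *b.peer.borrow_mut() = Rc::downgrade(&a);

    // Every access goes through an upgrade -- the indirection cost.
    let a_peer = a.peer.borrow().upgrade().expect("peer still alive");
    assert_eq!(a_peer.name, "b");
}
```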

But I'm not sure if there's a way of combining these thoughts.
jack: (Default)
I got all five sciences working at a fairly regular clip, and an extendable copy-and-paste array of assemblers I can use to extend the set-up, and I've pulled back a bit -- I'd still like to launch a rocket, and maybe see how much throughput I can work up to, but I feel like I'm over a hump where I can see how it will go and I'm less hungry for it every minute :)

But it still brings to the fore a lot of software engineering instincts I've internalised but not always had proper names for.

Five whys

My research is slow. Why? I'm not producing one colour of science pack. Why? Not enough (say) electric engine parts. Why? They're not inputted automatically; apparently I kicked the assembly off by supplying a big batch up front and it ran out. Why? Because they need lubricant, so I was manufacturing them near the chemical plant, not the rest of the assemblers. Why? Running a pipe all the way there wasn't convenient. Why didn't I carry them on a conveyor belt? Also inconvenient, but possible.

Automate it the second time

OK, that's about everything. That also touches another lesson. The *first* time I think this was reasonable -- I wanted to test all the manufacturing worked, I already had a great excess of electric engine parts in storage, just manually inputting them let me get everything else set up ok. But then at some point I topped it up again. That's the point where it would have been quicker to make a conveyor belt and fix it properly (I could have topped them up manually while I was building).

Hence my rule -- automate it the SECOND time you need to do it. If you do it the first time, you'll waste too much time automating things you may never need to do again, or that you don't have a good idea which bits are important. But once you've done it twice, you'll almost certainly need to do it again repeatedly, so investing in making it trivial to repeat is worth it.

Although factorio also teaches, there's a difference between "easy to do another 10 times" where a bit of manually joining up is a more sensible trade-off, and "easy to do another 100 times" when it needs to "just work" 90% of the time, and "easy to do another 1000 times" (e.g. any library inviting other people to use it) when in 99% of the time it needs to work without a lot of on-the-spot investment.

Make it extensible/stubs

When I manually put in some intermediate products, it's often worth putting in a short length of conveyor belt, so if I eventually automate it, I can just bring the assembled intermediates in on another belt and connect it to that one, I don't accidentally build too closely so there's no room to add the belt after.

In small scale, this is a stub, a function which you intend to fill in later. But in a larger sense, it's writing things in a way that you can add to them later -- not necessarily in an "official" way like re-using functions or inheriting from classes although that's good to do when you can, but just saying, if I do come and edit this code in future, how can I make it easy?

That means things like, even if you hardcode something, ideally put it in a variable so you can SEE what you've hard coded, and that if you ever change it, *some* of the work is done for you by just changing that one variable, even if you need to do things manually.
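In code terms (the sizes here are made up), that looks like naming the hard-coded value, so the hard-coding is visible and changing it later is one edit:

```rust
// Hard-coded, but visibly so: changing the grid size is one edit here,
// even if other manual fix-ups are still needed elsewhere.
const GRID_WIDTH: usize = 16;
const GRID_HEIGHT: usize = 12;

fn main() {
    let grid = vec![vec![0u8; GRID_WIDTH]; GRID_HEIGHT];
    assert_eq!(grid.len(), GRID_HEIGHT);
    assert_eq!(grid[0].len(), GRID_WIDTH);
}
```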

Larger scale issues

Factorio, with elements of sim-something and also of programming games like Human Resource Machine, more than most games I've seen, teaches larger-scale programming instincts like laying out code for a larger project.

Lots of little things like, recognising which bits should be roughed out, which bits need to be planned in detail now, which bits are likely to be complicated and need a nice big empty space to play in, etc.

Recognising what is basically fine forever and what will need to be updated to do the same thing 10x or 100x faster, and which bits will work at that speed and which won't, and when it's fine to tie yourself to the simple implementation and when you should make the effort to avoid stuff.

Self-balancing is usually a trap

In factorio, my instinct was often, "if I build everything in the right ratio, everything will Just Work" and I'll have enough intermediates for all my products, and I don't need any logic controlling the system turning one thing off when there's too much etc etc.

But that is usually wrong for all but the simplest systems. Simple systems do work like this, which I like: all factorio machines stop when their output is full, so you don't need to prevent them jamming or exploding or anything (unlike real life! :)). So as long as each output is consumed by the right thing, a machine produces as much as it can; when the production rate exceeds the rate the next stage uses things, they back up along a conveyor until the producing machine stops.

But for anything at all complex, it doesn't work like that. Intermediate products are shared between different products. Sometimes the raw material inputs for a machine run out, and then it falls behind machines which were supposed to be producing at the same rate.

Sometimes you make a mistake and connect two adjacent belts, and you get coal in your copper plate belt or vice versa. The system needs to cope with that in some way that doesn't end up with you having to manually flush large parts of the factory (picking up everything wrong on the belt so the machines continue). It *will* have to recover from errors, so making it do so easily is a requirement, even if it wasn't obvious at first.

You need to recognise where it's better to make a little subsystem where everything makes the right amount for the next stage, and where it's better to produce lots of intermediate in one place that's available to anything that needs it -- but not so much that everything else is starved of resource.

I had to learn this balance in real programming as well.

Coordination costs

When you build a LARGE factory, you find stuff you didn't notice in a small factory taking up a lot of your time. That's partly because there's just more of it. But it's also because there's more different moving parts that need to work together. Software projects are the same: you need to recognise the transitions from "all in my head" to "need a tracking system" to "needs a committee" etc etc...
jack: (Default)
I was hoping this was going to be a bit more technical, but there you go. The idea is to write inspired by whatever I'm doing at the moment.

This is not so much about debugging as about what to do when you're sure the problem is fairly simple but it still sucks you in for ages and you can't tell if you're getting closer to fixing it: how to avoid being dispirited. Which is a problem I used to have a lot. What has helped?

1. Talking to other people! Partly for a second pair of eyes, partly to just ground yourself and get someone else to share the situation and agree you're doing the right thing.

2. Take regular breaks.

3. Work through the problem step-by-step: "this value should be appearing here; it isn't". Check each intermediate step.

4. Decide whether (often) it's worth making the step-by-step tests ones that can be saved in the test suite against future failures, rather than ad hoc "step through in the debugger" checks.

5. If you can, maybe work on something else for a bit.

6. Be realistic about whether this is still the most important thing to do, or whether you should be doing something else.

7. Be realistic, do you know as much as anyone else about fixing this? If so, seek to keep your spirits up and bring in second pairs of eyes as necessary. If not, don't be shy, seek out advice on how to debug the problem.

8. Take heart. I've faced a LOT of problems that have seemed simple but intractable, and looking back with the benefit of hindsight a few were beyond me at the time, and many would have benefited from a fresh perspective, but none were actually black magic, there was always a sensible way to proceed, and I usually found one.
jack: (Default)
I need to talk about programming more. I have so many programmer friends, but we don't compare notes, experiences, etc. enough. What topics should I talk about? (Talking primarily to draw contrasting reactions :))
jack: (Default)
Pipes (and mining drills) you can't run through are so annoying! Everyone said "give yourself enough space for your oil/chemical processing" and I still didn't space it out enough. It needs to be spaced enough that you can use underground pipes for everything except corners. And so the pipes don't "snap to" neighbouring pipes of other types.

For a long time there was a dichotomy between "do everything with belts" and "do everything with robots". They rejigged it so robots are still awesome, but they do construction and deliver materials and equipment to you, so you can just build stuff without worrying about running out, and use them to build large areas -- but they no longer do feeding into assemblers until you're much further through the game, and don't do high throughput as well, so there's a reason to use belts to automate all the big stuff, not just forget about them once you get bots.

There's always "something to do next", tidying up stuff, or automating the next thing, or building a bigger better version, which makes it very very addictive.

The specifics aren't very realistic, but the general feel of "this is complicated" really is.

ETA: It used to be the case you needed to raid alien monster nests to find some particular resources for "alien science". They rebalanced this in one of the big updates, which overall works a lot better: there's more intermediate tiers of research which gives various different technologies, so it's more plausible to choose what to focus on, not just get stuck on the first two for 90% of the game, then get everything all at once. But I miss that there was something you used to need to go out and find (other than more resources).

Factorio

Apr. 10th, 2018 09:53 am
jack: (Default)
Oh gosh, I knew this was going to be addictive for me and it was. You're crashed on an alien planet and need to build enough smelting, industry, power etc to eventually launch a rocket. But most of the game is about the layout of stuff: about getting a million conveyor belts to carry the right components to the right machines without bumping into each other, and about laying out pipes between lakes, oil refineries, factories etc without getting into a giant knot.

I knew it would be but it's *really* *really* like software engineering. Like, you can *see* exactly what spaghetti code is: when you lay things out neatly and can see where everything is, it's easy to change things, but when you grow your conveyor belt layout organically, you can add *one* line fairly easily just fitting it in here and there, but if you do that often, everything ends up an unchangeable mess and you can't do anything with it without breaking half the existing stuff.

Somewhere in the middle you get useful tools like grabbers which can be programmed in unlimited ways, so you can turn things on only when you need the outputs, and even construction bots which can duplicate large sections of your factory.

Introspection and perfectionism

And you realise things. My instinct is always to build exactly the right amount of intermediates that the outputs need. But that doesn't actually work most of the time. Intermediates you always need for one specific product, and don't need LOTS of, it's often sensible to build into a "manufacture the intermediates and final product" block with the assemblers all next to each other. Intermediates you need all over the place, it's usually sensible to just manufacture lots of.

But then you need to decide: you don't want to process ALL your iron into cogs, or you'll run out of iron for everything else. But you do need a *lot* of cogs. Is it better to just make cogs, and rely on the fact that when you have "enough" cogs and the conveyor belt backs up, the cog machines stop taking iron? But then if you build too many further machines that need cogs, they'll never be satiated and everything that needs iron will start shutting down. Is it better to split the incoming iron 50/50, or some other proportion, between plain iron and the basic iron intermediates like cogs and steel? But then what if that ratio's not correct? Or is it better to adjust depending on which important final products are running short?

My instinct is that there should be a perfect "right" way. But because you need to add new final products and change proportions and cope with raw fuels temporarily running dry, that's basically impossible, it's better to segregate the various stages, and turn on the earlier stages as needed (I haven't actually done that yet but I can see the benefit).

Specific parallels

And likewise, duplicating a section of your factory once you have the right construction bots, and once you manufacture enough of all the buildings etc needed, is just a push of a button and a short wait. But it takes ages to make all the edges line up right so the new conveyors get the right input etc.

Or, a thought occurred to me driving to work, you know construction sites always seem to be "nothing, nothing, nothing, suddenly a building"? Well, software development often *is* like that, and I know exactly why even though I don't like believing it: because there's an awful lot of work in getting things working together, so the actual working on things is comparatively scattered. (And you can either frontload it, and then have years of technical debt and trying to bodge things together, or build everything right but it's ages before you get things working. And wisdom is knowing the right balance :) )

It also gives insight into firefighting vs planning. You need a certain amount of "fixing up" things, at some point ore runs low and you need to lay out dozens of smelters and stuff on a new ore field, there's no way of avoiding that. And often, *something* will back up, or run out, or get mixed between two belts or something, for completely deterministic reasons that you *could* have forseen, and you need to take care of it. But if you spend *all* your time doing that you'll never make any progress. And conversely, sometimes it's easier to take a shortcut, like loading an assembler with the appropriate inputs manually, if you expect it to be a while before you run out -- but even if you do, it's useful to think through where you would put the proper conveyor belt feeds, e.g. by having an inserter that grabs from a chest, but you can replace with a belt when you need. That, "do the simple thing, but leave the connectors in place for replacing it with a polished version when you need" is often a very useful approach.

Aesthetics

It's pretty atmospheric. The background sounds vary according to what's on the screen, so you can *hear* radars blipping and furnaces roaring. I played on peaceful because I didn't want the pressure of fighting off alien wildlife at the same time, but they're pretty beautiful in a monstrous way. And when things are humming along perfectly it's really impressive. It reminds me how much I love making complicated elegant mechanisms, and that I need to do more programming where I design a thing as well as learn a new thing.
jack: (Default)
I posted Emojilution Match to the android store! I don't really expect anything from it, but it was really exciting to go through the process and see it listed there. I had to fiddle the graphics to get the right size versions, but they look really quite professional[1]. Like I'm competent or something :)

The process was fairly easy: you need to sign up for a developer account (paying a small fee), and a merchant account if you want your app to be paid. And fill in a bunch of forms about app versions, inappropriate content, regions, etc, etc, which were a bit intimidating but not complicated. And find the .apk not in the build output directory but in the top-level app directory.

I know other people have done this ages ago but it was exciting to actually do it for the first time.

The process was *mostly* hassle free. I did hesitate over "let google manage your app signing key and make unspecified (?) improvements to your app", but I decided that was probably ok for a simple game, and I'd be more careful for an app that actually handled people's data of any sort. One of the sorts of content you have to declare whether your game includes is "sexualities"; I'm not quite sure what that means, but I have my suspicions :(
And it strongly pushes you to include your actual home address and really doesn't make clear that you're putting that on the internet for anyone to see[2], but apparently it's not absolutely required.

It's only on android for now -- I'm sorry, I wish it wasn't constrained to one platform, but porting is an effort (and I hear the apple store is a lot more bureaucratic and expensive, is that true?). If anyone can point me in the right direction I might consider it.

https://play.google.com/store/apps/details?id=org.cartesianheights.emojilution

[1] With a couple of exceptions I need to fix.
[2] Hello? Might that have, like, catastrophic downsides?! :(
jack: (Default)
https://www.dropbox.com/s/cdky2dnc8ck50rk/emojilution-2018-03-04.apk?dl=0 (Edit: See alternative link at bottom)

I made a new version of Emojilution Match! Prominently, I changed:

* Removed all the extraneous menus and options left over from the open sudoku, and updated the icon
* Changed the scoring so when you match a third-level evolution of an emoji, you unlock an additional starting emoji which is worth more, but the more you have, the more cluttered the screen gets
* Added a cheat mode so you can play at home if you prefer
* Fixed saving and restoring so your game is still there if you restart the app (or turn it sideways)

General notes

* On an android phone, you should be able to download the apk above and it should give you an option to install it. You may need to enable "install from untrusted sources" in the settings somewhere. But I know lots of phones treat apks weirdly, so if yours doesn't recognise the apk I would like to know, and will help if I can.
* You need GPS (unless you use cheat mode), and it doesn't yet warn you if you don't have GPS on.
* You do NOT need mobile internet (or any form of internet). That does mean, your game isn't saved to the cloud, so if you use a new phone, you'll need to start a new game :)

Edit 1: Currently *only* on android because that's what I was able to figure out first. It would be nice to be cross platform but it probably won't happen soon, sorry.

EDIT 2:

I believe the above file is correct, but dropbox apparently doesn't send the HTTP headers necessary to get android to recognise the file. I don't quite understand it, but downloading from github seems to work:

https://github.com/CartesianDaemon/emojilutionmatch/raw/master/releases/emojilution-match.apk

EDIT 3

And here's a version which doesn't crash at negative longitude:

https://github.com/CartesianDaemon/emojilutionmatch/blob/master/releases/emojilution-match-0.3.apk
jack: (Default)
OK, I revisited my emoji matching android game. Now I think I understand what's going on better, both the way the author of the game I based it on laid out different files, and how android UI works.

I usually split my code into "backend" and "frontend", where the frontend usually instantiates a backend class that works out all the consequences. And this was similar, except backend was called "game" and frontend was called "gui", which is reasonable, but I kept expecting things in a slightly different place than they actually were.

The frontend is mostly a list of lots of SomethingActivity classes, where in Android-development speak, an activity is one mode the app may present to the user, something which displays... something, and handles input, etc. So in this case the important one was PlayActivity, when the game is actually active, but there were a lot of others for importing, exporting, choosing a game from a list, etc, etc, which I didn't use.

The way multitasking is handled, there are functions called when your app resumes, pauses, starts up, is killed, etc. And you need to look at a big flow chart to make sure you get it right, but I mostly just copied the code already there. Basically, there's a "quick save" thing, where if your app is still running in theory but there's not enough memory for it, the OS calls a function with a variable in which it can stash its current state (including things like "are any menus open" etc), and then if it's switched to the front again, that is used in its initial start-up function and it's expected to reconstruct its current state from that info, as if it had never been swapped out.

But it *also* needs to do a real save when it's swapped out, because the phone might restart or kill the app completely before it's swapped in again; here it writes the current game layout to a database (but doesn't try to track things you wouldn't expect to be the same after you start it up again).

When you turn the phone sideways, these functions are also called: there's not a separate "reflow all the GUI elements" function, instead it temporarily saves the state, then restores it again in the new orientation.
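The flow above can be sketched in plain Java. To be clear, this is NOT the real Android API: "Bundle" is stood in for by a Map, the class is invented here, and the method names are only shaped like the real lifecycle callbacks.

```java
import java.util.HashMap;
import java.util.Map;

class PlayActivitySketch {
    int score;
    boolean menuOpen;
    Map<String, Object> database = new HashMap<>(); // stands in for the real save

    // Quick save: the OS hands us somewhere to stash transient state
    // (including things like "are any menus open") before swapping us out.
    void onSaveInstanceState(Map<String, Object> outState) {
        outState.put("score", score);
        outState.put("menuOpen", menuOpen);
    }

    // Start-up: if savedState is non-null we were swapped out, and are
    // expected to reconstruct our state as if it had never happened.
    void onCreate(Map<String, Object> savedState) {
        if (savedState != null) {
            score = (Integer) savedState.get("score");
            menuOpen = (Boolean) savedState.get("menuOpen");
        }
    }

    // Real save on pause: the process may be killed outright, so the
    // durable game state (but not transient UI state) goes to the database.
    void onPause() {
        database.put("score", score);
    }

    public static void main(String[] args) {
        PlayActivitySketch first = new PlayActivitySketch();
        first.score = 42;
        first.menuOpen = true;

        Map<String, Object> stash = new HashMap<>();
        first.onSaveInstanceState(stash); // OS swaps the activity out...

        PlayActivitySketch second = new PlayActivitySketch();
        second.onCreate(stash); // ...and recreates it later
        System.out.println("restored score=" + second.score
                + " menuOpen=" + second.menuOpen);
    }
}
```

A sideways rotation is exactly the round trip `main` simulates: save into the stash, destroy, recreate from the stash.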

And I added a cheat mode so I (or anyone who would like to), can test it out without walking around.
jack: (Default)
Bitwise copy

I'd read this but not really thought about it before. Rust prioritises data structures which can be moved or copied with memcpy. That eases various things. But to achieve it you need to keep a very tight rein on many things which are used all over the place in most languages.

Notably, you can only have a pointer to a struct from one place, unless you specifically arrange to "borrow" it, and a borrowed value can't be changed or moved (neither the original nor any of the borrows). Rust blogs describe this as similar to the discipline needed when dealing with data from multiple threads, except applied to avoiding mistakes like "I am in the middle of a computation using this value, then call another function which changes this value, forgetting that created an implicit dependency between those bits of code".

This shows up in lots of confusing ways, like function parameters need to be borrowed or copied, else they are moved by default, and once moved, the original is gone and can't be accessed.[1]
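A minimal example of what this looks like in practice (my own illustration, not from any particular tutorial): String owns heap data, so passing it by value moves it.

```rust
fn borrows(s: &String) -> usize {
    s.len() // only borrowed: the caller keeps ownership
}

fn takes_ownership(s: String) -> usize {
    s.len() // moved in: `s` is dropped when this function returns
}

fn main() {
    let a = String::from("hello");

    let n = borrows(&a); // borrowed, so `a` is still usable afterwards
    assert_eq!(n, 5);

    let m = takes_ownership(a); // moved: `a` is gone from here on
    assert_eq!(m, 5);

    // println!("{}", a); // compile error: "borrow of moved value: `a`"
}
```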

I'm not sure if this will turn out to be really useful or not. I see the logic, and agree it can prevent mistakes, but I don't know whether it's practical to structure code to avoid those problems, or whether in practice everyone ends up using one of the ways to work round the restriction, and then tracks any unfortunate implicit dependencies in their heads just like they used to.

The specific example I'm going to mention below is having owned structs which contain a pointer back to the owner, which doesn't usually work because someone else needs to have a pointer to the owner as well.

Interior mutability

This is only slowly making sense to me, I'm not sure how much sense the version I'm writing does.

A common pattern in some programs is "having a struct containing a bunch of settings which can be accessed and changed from multiple parts of the program".

This is particularly difficult in rust because having different pointers to an object usually forbids you from changing it.

The way round this is "interior mutability", that is, a value that can be changed even when it's part of a struct which is supposed to be immutable. It's a bit like "mutable" in C++, which is used for weird edge cases like caching calculated values in an apparently const class, or allowing const functions to lock mutexes. Except that you're apparently supposed to use it for any "normal" variables which you read and write from multiple places. IIRC you can use either "Cell", which works like a normal value variable in other languages, or "RefCell", which works like a pointer and has a lock checked at runtime which panics if you get it wrong.
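Here's a sketch of the "shared settings" pattern using Rc plus RefCell. The names (Settings, volume) are invented for illustration; the point is that mutation is allowed through either alias, with the borrow rules checked at runtime.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Settings {
    volume: u32,
}

fn main() {
    let settings = Rc::new(RefCell::new(Settings { volume: 5 }));

    // Two parts of the program hold handles to the same settings.
    let ui = Rc::clone(&settings);
    let audio = Rc::clone(&settings);

    // Mutate through one alias even though the struct is shared;
    // RefCell checks the borrow rules at runtime, not compile time.
    ui.borrow_mut().volume = 7;
    assert_eq!(audio.borrow().volume, 7);

    // Holding borrow_mut() while calling borrow() on the same cell
    // would panic at runtime rather than fail to compile.
}
```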

This brings us back to the topic I was thinking about before, originally inspired by these features of rust: that a common pattern is needing a pointer to an owning class from an owned class. But you might not need that if you had the feature I discussed before, where whenever you access the owned object through a reference to the owning object, it carries along a secret pointer to the owner, like a second "this" pointer.

That could work for the case of "access some central object from several different parts of the program". If the various parts are owned objects which can access the parent object only when they're entered by code from that object, then the parent object (including the settings object) is only accessed from one of the components at a time, which borrows exclusive access to the parent while its member function is called.

However, the feature I wasn't sure of but would need to be added is, if you have a pointer to that owned object from anywhere *else* (notably, a callback of some sort), it needs a special pointer that wraps up a pointer to it and a pointer to its parent together. That does sound really hairy, although if you have a 64 bit pointer there should be lots of room to implement the sordid details somehow. Assuming you never need to nest this too deep. Although of course, at that point, you've given up any pretence these could be moved around in memory anyway, so maybe there's no benefit to this flexibility?

Footnotes

[1] I think explanations of this explain it really badly, in that most people encounter these errors before understanding why "move by default" is a thing at all, so don't find that a satisfying answer.
jack: (Default)
Removing code is good! But everywhere I've worked has had a "pile of makefiles" build system, which has invariably had problems when you remove a file, because the .d files are still hanging around, and make chokes on a source file because it can't find the headers that file needed last time, even though they're no longer necessary to build it.

And it's a matter of culture whether it's "when you check out code, you often need to make clean or make undepend somewhere to get it to compile" or "when you check in code, you need to find a workaround to make it build cleanly even if you've removed files".

Do people with more recent build tools than "make" avoid this problem?

However, after thinking it through carefully, I eventually settled on one of the ways to make makefiles cope with this correctly.

The trick

You still do "-include $(OBJ_FILES:%.c=%.d)" or equivalent.

But when you produce a .d file with gcc (usually as a side effect of producing a .o file via -MMD), add an extra line at the end of the recipe: a perl script which edits the .d file in place and replaces each "filename.o: header1.h header2.h..." with "filename.o: $(wildcard header1.h header2.h...)".

That way, if any dependency has *changed* a rebuild is forced as normal. But only dependencies that actually exist become dependencies within the makefile. (Deleting a header file doesn't trigger a rebuild, but it doesn't with the old system either since the .o file already exists.)

I can share the exact script if anyone wants to see.
jack: (Default)
OK, so. We want to allocate a large block of memory that is contiguous as physical memory. That means allocating physical memory in the kernel (as with kmalloc), and then later providing it to userspace software -- presumably by mapping it into virtual memory with mmap on /dev/mem at the right physical address, although we may be doing something different for reasons which aren't relevant here.

We happen to have a kernel driver already for other experiments with our specific hardware, so we have somewhere convenient to put this kernel code as needed.

This is running on a hardware board dedicated to a single task, so we have a few advantages. We would prefer to allocate a large chunk on start-up; we have complete control over which programs we expect to use it, so we don't need to dynamically manage unknown drivers competing for this memory; we never intend to free it; and since the board will only be used for this, we don't need to make sure other programs run ok. And there's no restriction on addresses: DMA and other relevant peripherals can access the entire memory map, so unlike x86 we don't need to specifically reserve *low* memory.

There are several different related approaches, and I went through a few rabbit holes figuring out what worked.

Option 1: __memblock_alloc_base()

From research and helpful friends, I found some relevant instructions online. One was from "Linux Device Drivers, 3rd edition", the section entitled "Obtaining Large Buffers", about using alloc_bootmem_low to grab kernel pages during boot. I'm not sure, but I think this was correct at the time, and the kernel has since switched from bootmem to memblock as its start-up allocator.

From the code in the contiguous memory allocator (search the kernel source for "cma"), I learned that possibly I should be using memblock functions as well. I didn't understand the different options, but I used the same one as in the contiguous memory allocator code, __memblock_alloc_base and it seemed to work. I tried large powers of 2 and could allocate half of physical memory in one go. I haven't fully tested this, but it seemed to work.

There are several related functions, and I don't know for sure what is correct, except that what the cma code did worked.

This code is currently in a kernel driver init function. The driver must be compiled statically into the kernel, you can't load it as a module later. You could put the code in architecture specific boot-up code instead.

Option 2: cma=

fanf found a link to some kernel patches which tried to make a systematic way of doing this, based on some early inconsistently-maintained patch, which later turned into code which was taken up by the kernel. Google for "contiguous memory allocator". There's an article about it from the time and some comments on the kernel commit.

It's a driver which can be configured to grab a large swath of contiguous memory at startup, and then hand that out to any other driver which needs it.

You specify the memory with "cma=64M" or whatever size on the kernel command line. (Or possibly in the .config file via "make menuconfig"?) You need to do this because it allocates on start-up, and can't otherwise know whether it should reserve this memory or not.

It then returns this memory from normal calls to dma_alloc_coherent, which is designed to allocate memory which is physically contiguous, but doesn't normally handle such big blocks. I haven't tested this approach, because I didn't need any specific part of memory so I'd been looking at kmalloc rather than dma_alloc_coherent, but a colleague working on a related problem said it worked on their kernel.

It may also do clever things like exposing the memory to normal allocation, but migrating whatever else is there out of the way to free it up when needed, I'm not sure (?)

I was looking at the source code for this and borrowed the technique to allocate memory just for our driver. We may either go with that (since we don't need any further dynamic allocation, one chunk of memory is fine), or revert to using the cma later since it's already in the kernel.

I went down a blind alley because it looked like it wasn't enabled on my architecture. But I think that was because I screwed up "make menuconfig" not specifying the architecture, and actually it is. Look for instructions on cross-compiling it if you don't already have that incorporated in your build process.

Option 3: CONFIG_FORCE_MAX_ZONEORDER

This kernel parameter in .config apparently increases the amount of memory you can allocate with kmalloc (or dma_alloc_coherent?). We haven't explored this further because the other option seemed to work, and I had some difficulties with building .config, so I don't know quite how it works.

I found the name hard to remember at first. For the record, it means: ensure the largest block which can be allocated is at least this order of magnitude (as a power of two of pages). I believe the value is actually 1 higher than the largest allowed order; double check the documentation if you're not sure.
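As a worked example of that arithmetic (numbers assumed, check your architecture's defaults): with 4K pages and the common default order of 11, the largest single allocation is 2^(11-1) pages = 4MB. To allow 16MB allocations you would set, in .config:

```
# illustrative .config fragment: 2^(13-1) pages x 4K = 16MB max allocation
CONFIG_FORCE_MAX_ZONEORDER=13
```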

Further options

There are several further approaches that are not really appropriate here, but may be useful under related circumstances.

* On many architectures, DMA can do scatter-gather, specifically so it can read or write non-contiguous memory, so you may not need a contiguous buffer in the first place.

* Ensure the hardware can write to several non-contiguous addresses.

* Allocate several blocks of the largest size kmalloc can allocate, and check that they do in fact turn out to be contiguous, since kernel boot-up probably hasn't fragmented the majority of memory.

* Ditto, but just allocate one or several large blocks of virtual memory with malloc, and check that most of it turns out to be allocated from contiguous physical memory because that's what was available. This is a weird approach, but if you have to do it in userspace entirely, it's the only option you could take.
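That last check can at least be prototyped from userspace via /proc/self/pagemap, which maps each virtual page to a physical frame number. A hedged sketch (Linux-only; note that on modern kernels the frame numbers are zeroed for unprivileged processes, so you only see real values as root):

```python
import ctypes
import mmap
import struct

def physical_frames(addr, npages):
    """Look up the physical frame number (PFN) for each page of a buffer
    via /proc/self/pagemap. Each entry is 64 bits: bit 63 = page present,
    bits 0-54 = PFN (zeroed without CAP_SYS_ADMIN on modern kernels)."""
    frames = []
    with open("/proc/self/pagemap", "rb") as f:
        for i in range(npages):
            vpage = (addr + i * mmap.PAGESIZE) // mmap.PAGESIZE
            f.seek(vpage * 8)
            (entry,) = struct.unpack("<Q", f.read(8))
            present = (entry >> 63) & 1
            pfn = entry & ((1 << 55) - 1)
            frames.append(pfn if present and pfn else None)
    return frames

# Allocate a few pages and fault them in so they have physical backing.
n = 4
buf = mmap.mmap(-1, n * mmap.PAGESIZE)
buf[:] = b"\x01" * (n * mmap.PAGESIZE)
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

frames = physical_frames(addr, n)
contiguous = (None not in frames and
              all(b == a + 1 for a, b in zip(frames, frames[1:])))
print(frames, "contiguous" if contiguous else "not provably contiguous")
```

In the kernel-side approaches above you'd verify contiguity directly from the returned physical address instead; this is only useful for the "hope malloc gave us contiguous memory" fallback.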
jack: (Default)
The android game I wrote last month is available for download (see bottom of this post).

Gameplay

It's a variant on an augmented reality match three game. Physically walk around to change which square is highlighted with a light grey background. Click that square to place the next tile there. The next tile is shown at the bottom of the screen. Match three of the same type in a row, and they vanish forming a new type. Then try to match three of *those*. When you reach hearts, match three hearts of any colour and they vanish entirely (but give lots of score).

For instance, three fish next to each other in a line make an octopus, three octopuses make a whale, three whales make a blue heart, three hearts of any colour vanish entirely. And similarly for the three other starter animals.

Only vertical and horizontal. But if you make a line of four, or two crossing lines of three, they all vanish. They only give one new tile, but you get more points.

It would be trivial to play if you could just click on a square, but it's surprisingly addictive when you play it walking about.

Be careful not to walk into the middle of roads! It's surprisingly easy to make that mistake when you're concentrating on your location in the game.

The screen wraps round, so you can always keep walking in one direction rather than walk in the opposite direction. It's best to start by figuring out which compass direction corresponds to which direction on the grid :)

Tips: When you complete an octopus, think about where you're going to put the fish to make the next octopus next to the first one.

Details

If you open the .apk file on an android device, it should ask if you want to install it. You can only do so if you agree to install apps which come from me not the play store. I think that should work but I don't know for sure.

It is very early stages. It seems to work on one or two devices, but I haven't tested it more extensively than that. It will hopefully be ok, but I don't know for sure.

It still has some UI from the open source OpenSudoku game I based the code on. Don't pay any attention to the menus or help.

File:

https://www.dropbox.com/s/md5sjt25xe3eean/emojilution-debug.apk

(Let me know if the link doesn't work. You should *not* need a dropbox account to use it, but you may have to scroll to the bottom of the screen to continue to download without one.)

Feedback

I'd appreciate hearing from everyone who tried it: just whether it installed ok or not, and whether the game itself seemed to work.

Lots of things are known to be unfinished, so don't waste energy enumerating what's missing in menus etc. Do let me know anything that seems to prevent you playing the game. Do ask if it doesn't run or it's not obvious what to do. Comments on what's fun and what isn't are very much appreciated!

Thank you!
