jack: (Default)
OK, so. We want to allocate a large block of memory that is contiguous in physical memory. That means allocating physical memory in the kernel (as kmalloc does), and then later providing it to userspace software. Presumably that means mapping it into virtual memory for use in userspace, by mmap-ing the physical range from /dev/mem, although we may be doing something different for reasons which aren't relevant here.

We happen to have a kernel driver already for other experiments with our specific hardware, so we have somewhere convenient to put this kernel code as needed.

This is running on a hardware board dedicated to a single task, so we have a few advantages. We would prefer to allocate a large chunk on start-up; we have complete control over which programs we expect to use it; we don't need to dynamically manage unknown drivers competing for this memory; we never intend to free it; and the board will only be used for this, so we don't need to make sure other programs run OK. And there's no restriction on addresses: DMA and the other relevant peripherals can access the entire memory map, so unlike on x86 we don't need to specifically reserve *low* memory.

There are several different related approaches, and I went through a few rabbit holes figuring out what worked.

Option 1: __memblock_alloc_base()

From research and helpful friends, I found some relevant instructions online. One was from "Linux Device Drivers, 3rd edition", the section entitled "Obtaining Large Buffers", about using alloc_bootmem_low to grab kernel pages during boot. I think this was correct at the time, but the kernel has since replaced bootmem with memblock as its start-up allocator.

From the code in the contiguous memory allocator (search the kernel source for "cma"), I learned that possibly I should be using memblock functions as well. I didn't understand all the different options, but I used the same one as the contiguous memory allocator code, __memblock_alloc_base, and it seemed to work. I tried large powers of 2 and could allocate half of physical memory in one go. I haven't fully tested this, but it seemed to work.

There are several related functions, and I don't know for sure what is correct, except that what the cma code did worked.

This code is currently in a kernel driver init function. The driver must be compiled statically into the kernel; you can't load it as a module later (I believe because memblock is only available around boot). You could put the code in architecture-specific boot-up code instead.
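For concreteness, this is roughly what the allocation looks like (a hedged sketch: the function and variable names are mine, and the memblock API has changed between kernel versions -- __memblock_alloc_base was later removed in favour of memblock_phys_alloc()):

```c
#include <linux/memblock.h>

/* Sketch only: reserve 64MB of physically contiguous memory during
 * early boot. __memblock_alloc_base() returns 0 on failure (its
 * non-underscore sibling panics instead). MEMBLOCK_ALLOC_ANYWHERE
 * is fine for us because DMA can reach the whole memory map on
 * this board. */
static phys_addr_t big_buf_phys;

static int __init my_board_reserve_buf(void)
{
    big_buf_phys = __memblock_alloc_base(64 * 1024 * 1024,
                                         PAGE_SIZE,
                                         MEMBLOCK_ALLOC_ANYWHERE);
    if (!big_buf_phys)
        return -ENOMEM;
    pr_info("reserved 64MB at phys 0x%llx\n",
            (unsigned long long)big_buf_phys);
    return 0;
}
```

This is kernel code, so treat it as a starting point to check against your kernel's headers rather than something you can compile standalone.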

Option 2: cma=

fanf found a link to some kernel patches which tried to provide a systematic way of doing this, based on an early, inconsistently-maintained patch which later turned into code that was accepted into the kernel. Google for "contiguous memory allocator". There's an article about it from the time, and some comments on the kernel commit.

It's a driver which can be configured to grab a large swath of contiguous memory at startup, and then hand that out to any other driver which needs it.

You specify the size with "cma=64M" or whatever on the kernel command line. (Or possibly in the .config file via "make menuconfig"?) You need to do this because it allocates at start-up, and it can't otherwise know how much it should grab.

It then hands this memory out through normal calls to dma_alloc_coherent, which is designed to allocate memory that is physically contiguous, but can't normally allocate such big blocks. I hadn't tested this approach because I didn't need any specific part of memory, so I'd been looking at kmalloc rather than dma_alloc_coherent, but a colleague working on a related problem said it worked on their kernel.
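A hedged sketch of that route (dma_alloc_coherent is the real function; the platform device and names here are invented, and I haven't run this exact code):

```c
#include <linux/dma-mapping.h>
#include <linux/platform_device.h>

/* Sketch only: with "cma=64M" on the kernel command line, the
 * ordinary DMA API can hand back much larger contiguous buffers
 * than it usually could. */
static void *buf_virt;
static dma_addr_t buf_bus;

static int my_driver_probe(struct platform_device *pdev)
{
    buf_virt = dma_alloc_coherent(&pdev->dev, 16 * 1024 * 1024,
                                  &buf_bus, GFP_KERNEL);
    if (!buf_virt)
        return -ENOMEM;
    dev_info(&pdev->dev, "got 16MB at bus address %pad\n", &buf_bus);
    return 0;
}
```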

It may also do clever things involving letting normal allocations use the reserved memory in the meantime, then migrating those pages elsewhere to free it up when a contiguous allocation actually needs it, I'm not sure (?)

I was looking at the source code for this and borrowed the technique to allocate memory just for our driver. We may either go with that (since we don't need any further dynamic allocation, one chunk of memory is fine), or revert to using the cma later since it's already in the kernel.

I went down a blind alley because it looked like it wasn't enabled on my architecture. But I think that was because I screwed up "make menuconfig" by not specifying the architecture, and actually it is. Look up instructions on cross-compiling the kernel if that isn't already part of your build process.

Option 3: CONFIG_FORCE_MAX_ZONEORDER

This kernel parameter in .config apparently increases the maximum size you can allocate with kmalloc (or dma_alloc_coherent?). We haven't explored it further because the other options seemed to work, and I had some difficulties rebuilding .config, so I don't know quite how it works.

I found the name hard to remember at first. For the record, it sets the largest block order the page allocator will hand out (as a power of two of pages). I believe the value is actually 1 higher than the largest allowed order, i.e. the biggest allocation is 2^(value-1) pages; double check the documentation if you're not sure.
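For the record, the .config line looks like this (14 is just an example value; if I have the off-by-one right, with 4KB pages this allows blocks up to 2^13 pages, i.e. 32MB):

```
CONFIG_FORCE_MAX_ZONEORDER=14
```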

Further options

There are several further approaches that are not really appropriate here, but may be useful under related circumstances.

* On many architectures, the DMA controller can do scatter-gather, specifically so it can read or write non-contiguous memory, in which case you shouldn't need any of this in the first place.

* Ensure the hardware can write to several non-contiguous addresses.

* Allocate several blocks of the largest size kmalloc can give you, and check that they do in fact turn out to be contiguous, since kernel boot-up probably hasn't fragmented much memory yet.

* Ditto, but just allocate one or a few large blocks of virtual memory with malloc, and check whether most of it happens to be backed by contiguous physical memory, because that's what was available. This is a weird approach, but if you have to do it entirely in userspace, it's about the only option you have.
jack: (Default)
The android game I wrote last month is available for download (see bottom of this post).

Gameplay

It's a variant on an augmented-reality match-three game. Physically walk around to change which square is highlighted with a light grey background. Click that square to place the next tile there. The next tile is shown at the bottom of the screen. Match three of the same type in a row and they vanish, forming a new type. Then try to match three of *those*. When you reach hearts, match three hearts of any colour and they vanish entirely (but give lots of score).

For instance, three fish next to each other in a line make an octopus, three octopuses make a whale, three whales make a blue heart, three hearts of any colour vanish entirely. And similarly for the three other starter animals.

Only vertical and horizontal lines count. But if you make a line of four, or two crossing lines of three, they all vanish. They still only give one new tile, but you get more points.

It would be trivial to play if you could just click on a square, but it's surprisingly addictive when you play it walking about.

Be careful not to walk into the middle of roads! It's surprisingly easy to make that mistake when you're concentrating on your location in the game.

The screen wraps round, so you can always keep walking in one direction rather than walk in the opposite direction. It's best to start by figuring out which compass direction corresponds to which direction on the grid :)

Tips: When you complete an octopus, think about where you're going to put the fish to make the next octopus next to the first one.

Details

If you open the .apk file on an android device, it should ask if you want to install it. You can only do so if you agree to install apps from outside the play store. I think that should work but I don't know for sure.

It is very early stages. It seems to work on one or two devices, but I haven't tested it more extensively than that. It will hopefully be ok, but I don't know for sure.

It still has some UI from the open source OpenSudoku game I based the code on. Don't pay any attention to the menus or help.

File:

https://www.dropbox.com/s/md5sjt25xe3eean/emojilution-debug.apk

(Let me know if the link doesn't work. You should *not* need a dropbox account to use it, but you may have to scroll to the bottom of the screen to continue to download without one.)

Feedback

I would appreciate knowing everyone who tried it, just whether it installed ok or not, and if the game itself seemed to work.

Lots of things are known to be unfinished, so don't waste energy enumerating what's missing in menus etc. Do let me know anything that seems to prevent you playing the game. Do ask if it doesn't run or it's not obvious what to do. Comments on what's fun and what isn't are very much appreciated!

Thank you!
jack: (Default)
I previously talked about accessing the scope of an owning class from a class declared and instantiated within it. Latest post here: https://jack.dreamwidth.org/1017241.html

The possible approaches seem to be: the owned class has a pointer back to the owning class; or functions dealing with the owned class take an extra parameter pointing to the owning class. Either can be implemented manually, or automatically by the language, or somewhere in between.

Thinking some more about this, various things occurred to me:

Java

I hadn't realised, but I've since learned that Java inner classes (and maybe C# and other recent/managed languages) do this automatically: the owned class automatically holds a reference to the owning class, and it's checked where it's instantiated that it's only instantiated via an instance of the owning class.



I was naturally drawn to an "owning pointer is passed in alongside, or as part of, the this pointer" implementation, as it seemed more conceptually correct. However, the actual benefit of this is a lot smaller in most languages other than rust. I first started thinking about these options in a rust example, where having a pointer to the owning class needed some fancy dancing, because rust prefers to keep tight limits on how many references to an object exist at once (ideally only one).

This hopefully makes memory management safer, and means you can usually move objects around in memory with a raw memcpy, because they don't usually contain internal pointers to parts of themselves. But most other languages don't even try to do that; they just assume that a non-trivial object stays fixed in place in memory (or is moved only by a garbage collector that knows where all the pointers are).

Implementation

If you try to avoid having a permanent pointer back to the owning class, and you ever need a pointer to the owned class (this is common if you use it as a callback), you have to accept that your "pointer" is actually a pair (or more) of pointers: one to the owning class and one to the owned class. The owned part might be stored as an offset rather than a complete pointer. That's clunky, but wouldn't necessarily take up much space if the language supported it. You could do a similar thing for iterators, i.e. pointers to members of a collection, rather than having a bare pointer that only makes sense if you already know which collection it belongs to.

That seems a useful concept, but I'm not sure how useful it would be in practice.
jack: (Default)
I was reflecting further on my previous comments on meta-history in source control.

One use case I imagined was that you can rebase freely, and people who've pulled will have everything just work, assuming they always pull with rebase. But I may have been too pessimistic. A normal pull --rebase may usually cope with the sort of rebasing people are likely to have done upstream anyway.

The other question is, are you losing old history by rebasing older commits? Well, I don't suggest doing it for very old commits, but I guess, you're not losing any history for commits that were in releases.

Although that itself raises a reason to have a connection between the new branch and the old: you shouldn't be rebasing history prior to a release much (usually not at all; maybe to squash a commit to make git bisect work?). But if you do, you don't want two parallel branches with the same commit: you want to be able to see where the release was on the "good" history (assuming there's a commit which is file-identical to the original release commit), and fall back to the "original" history only if there's some problem.

And as I've said before, another thing I like is the idea that if you're rebasing, you don't have a command that says "do this whole magic thing in one step"; you have one that says "construct a new branch one commit at a time from this existing branch, stopping when there's a problem". There is no state needed to continue after resolving a problem: you just re-run the command on the partially-constructed new branch. And then you can choose to throw away the old branch to tidy up, but that's not an inherent part of the command.
jack: (Default)
I like the principle of duck typing.

Roast it if it looks sufficiently duck-like. Don't worry about whether it's officially a duck, just if it has the relevant features for roasting.

However, I don't understand the attachment to getting 3/4 of the way through the basting, stuffing and roasting project before suddenly discovering that you're trying to crisp a small piece of vaguely duck-shaped ornamental stonemasonry.

I agree with (often) only testing for the *relevant* features of duck-ness. But it seems like the best time to test for those relevant features is "as soon as possible", not "shut your eyes, and charge ahead until you fail". Is there a good reason for "fail fast, except for syntax errors, those we should wait to crash until we're actually trying to execute them"?

I've been working on my non-vegetarian metaphors, how did I do? :)
jack: (Default)
Thanks to everyone who commented on the previous post, or posted earlier articles on a similar idea. I stole some of the terminology from one of Gerald-Duck's posts Simon pointed me to. And have tried to iterate my ideas one step closer to something specific.

Further examples of use cases

There are several related cases here, many adapted from Simon's description of PuTTY code.

One is: in several different parts of the program there's a class which "owns" an instance of a socket class. Many of the functions in the socket also need to refer to the owning class. There are two main ways to do that. One: every call to a socket function passes a pointer to the parent, but that clutters up the interface. Or: the socket stores a pointer to the parent, initialised on construction. But then there is no really appropriate smart pointer, because the two classes have pointers to each other.

A socket must have an owner. And classes which deal with sockets will usually have exactly one that they own, but will also often have none (and later open one), or more than one.

And you "know" the pointer to the parent will never be invalid as long as the socket is owned by the parent, because you never intend to pass that pointer out of that class. But there is no decent language construct for "this pointer is a member of this class, and I will never copy it, honest" which would allow the child to have a provably-safe pointer to the parent. This is moot in C, where you don't have smart pointers anyway, but it would still be useful to identify the use case exactly, so a common code construction could be used and programmers could see the intended functionality at a glance. It would be useful to resolve in C++. And there are further problems in rust, where using non-provably-safe pointers is deliberately discouraged, and there's a greater expectation that an object can be moved in memory (and so shouldn't contain pointers to parts of itself).

The same problem can be described two different ways. One is: "a common pattern is allocating an owned class as a member of another class, where the owned class has an equal or shorter lifetime than the owner, and a pointer back to the owner which is known to always be valid, with no pointer loops" -- a special sort of two-way pointer, where one direction is an owning pointer and the other is a provably-valid non-owning pointer. The other is: "classes often want to refer to the class that owns them, or the context they were called from, and there is no consistent/standard way of doing that."

Proposal

Using C++ terminology, in addition to deriving from a class, a class can be declared "within" another class, often an abstract base class aka interface aka trait of the actual parent(s).

class Plug
{
public:
virtual void please_call_me_from_socket(int arg1)=0;
};

class Socket : within Plug
{
// Please instantiate me only from classes inheriting from Plug
public:
void do_something();
private:
int foo;
};


The non-static member functions of Socket, in addition to the hidden pointer parameter identifying the instance of Socket (accessed via "this->"), get a second hidden parameter identifying the instance of Plug from which they were called, accessed as "parent->please_call_me_from_socket(foo)" (or "parent<Plug>->please_call_me_from_socket(foo)" or something, to disambiguate if there are multiple within declarations. Syntax pending).

Where does that pointer come from? If the call is made from a member function of a class which is itself within Plug, then it gets that value. That's not so useful for Plug itself, but is useful for classes which you want to be accessible almost everywhere in your program, such as a logging class.

In that case, you may want a different syntax, say a "within" block which says all classes in the block are within an "app" class, and then naturally all pass around a pointer to the top-level app class without any visual clutter. And it only matters when you want to use it, and when you admit the logger can't "just be global".

For Socket, we require that member functions of Socket are only called from member functions of Plug (which is what we expected in the first place, but previously had no way of guaranteeing). And then the "parent" pointer comes from the "this" pointer of the calling function.

There is probably also some niche syntax for specifying the parent pointer explicitly, if the calling code has a pointer to a class derived from Plug, but isn't a member function, or wants to use a different pointer to this. The section on pointers may cover this.

Pointers, Callbacks, Alternatives, and Next Steps )
jack: (Default)
I am still mulling this over after reading some articles on it (thanks, fanf, Kaela).

Background

Imagine you have a fairly simple function.

RetType func1(arg1, arg2)
{
   return func3(func2(arg1),func2(arg2)).func4();
}


But those other functions may encounter errors. Eg. they involve opening files, which may not be there.

Assume the error return can't usually be passed to a follow-up function.[1] The obvious necessary step is then: for each function call, test the return value; if it's an error, return an error from this function; else continue with the calculation. But this usually takes several lines of code for each of these function calls, which obscures the desired control flow.

If you are willing to accept exceptions, you can just write the code above and allow any exceptions to propagate. But that represents a lot of hidden complexity, from not knowing what might be thrown. And often runtime overhead.

And in fact, this may obscure a common pattern: for some function (eg. "parse this"), you SOMETIMES want to treat the failure as an error, and sometimes to interrogate it. As in, choose in the calling code whether failure is an error-value or an exception.

Also remember, in C-like languages, many values unavoidably have a possible error case which can't be passed on to other functions: the null pointer. Ideally it would be clear which pointers might be null and which have already been established not to be.

In Rust

In Rust (if I understand correctly), these possibilities are often wrapped up in a try macro.

There is a conventional "Result" return type for most functions which may succeed or fail, which has one of two variants: either 'Ok' (usually, though not necessarily, wrapping a return value), or 'Err', wrapping a specific error (just a string, or an error object).

The try macro combines the "test return value, if it's an error, return that error from this function, else, evaluate to the successful value" into a brief expression:

try!(func2(arg))

Which seems like often what you want. Obviously if you want to handle the error in some way (say, you're interested in whether it succeeds, not just the successful result), you can interrogate the result value for ok or err.

And there's also a method for "assume success, unwrap the result value, panic if it's not there", just like you can dereference a pointer without checking for null if you want. But functions which can't fail shouldn't return "Result", so if you do that, it's clear you *might* panic. Which is exactly what you want for throw-away code. And it means you can search for calls to unwrap if you want to find all the points where you did that and fix them.

Rust recent innovation: ?

I mention try! for historical reasons, but just recently Rust has promoted it into a language feature, reducing the overhead further from four-to-six characters to one: '?' after a value means the same thing as the try macro.

fn func1(arg1: ArgType, arg2: ArgType) -> Result<RetType, ErrType>
{
   Ok(func3(func2(arg1)?, func2(arg2)?)?.func4()?) // Types invented, but close to real Rust syntax
}


Rust recent innovation: chaining

This is also really new and not standard yet, but I like the idea: error chaining. The method .chain_err(|| "new error") is applied to the result of a function call. If it was a success, that's fine. If not, this error is added on top of the previous error. It is typically then followed by the try macro or ?. (I think?)

That means that your function can return a more useful error, eg. "could not open log file" or "could not calculate proportion". Which carries along the additional information of WHY it couldn't, eg. "could not open file XXXX in read mode" or "div by zero".

And then a higher level function can decide which of those it cares about handling -- usually not the lowest level one.

In some ways like exceptions, but (hopefully, because Rust) with no runtime overhead.

Footnotes

[1] I often think of it as, an error-value is one that, under any future operation of any sort, stays the same error value, but that's usually not how it's actually implemented.
jack: (Default)
A little while ago, someone told me about a really simple algorithm brainteaser. Suppose you want to find both the minimum and maximum of an array. Instead of writing something like:
   for (int i=0;i<size;i+=2)
   {
      if (arr[i]<min) min = arr[i];
      if (arr[i+1]<min) min = arr[i+1];
      if (arr[i+1]>max) max = arr[i+1];
      if (arr[i]>max) max = arr[i];
   }

You can reduce the number of comparisons per two elements from 4 to 3 by doing something like:
      if (arr[i]<arr[i+1])
      {
         if (arr[i]<min) min = arr[i];
         if (arr[i+1]>max) max = arr[i+1];
      }
      else
      {
         if (arr[i+1]<min) min = arr[i+1];
         if (arr[i]>max) max = arr[i];
      }

I asked whether it makes a difference if the second version pipelines less efficiently, and I didn't really get an answer, but I got the impression that wasn't considered a sensible question to ask.

But when I actually tried it, with some simple instrumentation code (using "clock()" from "time.h"), the second took about twice as long. On a Windows PC, compiled with cl, using /O2.

When I looked at the disassembly, each comparison looked to be something like:
   if (arr[i]<min) min = arr[i];
0040118B  mov         ecx,dword ptr [i] 
0040118E  mov         edx,dword ptr [arr] 
00401191  mov         eax,dword ptr [min] 
00401194  mov         ecx,dword ptr [edx+ecx*4] 
00401197  cmp         ecx,dword ptr [eax] 
00401199  jge         min_max_2+59h (4011A9h) 
0040119B  mov         edx,dword ptr [min] 
0040119E  mov         eax,dword ptr [i] 
004011A1  mov         ecx,dword ptr [arr] 
004011A4  mov         eax,dword ptr [ecx+eax*4] 
004011A7  mov         dword ptr [edx],eax

Which didn't seem great, but the number of instructions did seem proportional to the number of lines expected to be executed.

What have I missed?
jack: (Default)
In C and C++, you should avoid using an uninitialised variable for several reasons, not least of which: it's undefined behaviour (?). But in practice, what are the relative likelihoods of the (I think?) permitted outcomes:

(a) it being treated as some unknown value
(b) the following code being deleted by the compiler
(c) something even weirder happening?
jack: (Default)
I talked about this several times before, but the idea was still settling down in my head and I don't think it made a lot of sense.

Imagine each commit had one parent that was the "branch" parent, or the "this is a logical atomic change" parent, or the "the tests still pass" parent, or the "space not time" parent. All the same guarantees you'd expect from code on the same branch hold (eg. compiles, tests pass). This represents history the same way a non-branching source control history does: a list of changes that sum to the current state of the software. Or, for that matter, the same way a heavily rebased clunk-free history does. It shows the code being built up.

And each commit may have none or one (or more) "content" parent or "rebase" parent or a "chronological" parent or a "meta" parent, that represents "the change made in this commit, is based on this other commit".

If you already merge from branch into trunk, you may find the parents are quite like this already.

Why might you want to do this? Well, to me, the good reason is that it does away with all the "oh no, I want to rebase but I already pushed". The pre-rebase history is just always there by default, though you could choose to purge those commits if you wanted. So anyone working off your remote, when they pull-rebase, can just automatically move anything built on top of your pre-rebase branch onto your post-rebase branch, just as if you'd committed a few extra commits without rebasing anything. And the new code isn't suddenly in limbo: anyone can check how the new branch tip relates to the pre-rebase branch tip.

It also may provide useful hints for merging, avoiding duplicated commits, when there's extra info about which commits are "the same" beyond their content. It doesn't solve the problem, but it may help in some of the cases your source control program can't automatically fix.

It also removes all the "oh fuck, I screwed up, reflog, reflog, flog harder!" if you accidentally screw up a branch you're working with. Instead of the previous tip of the branch floating in limbo hoping you'll retrieve it, the previous branch history, all of it, is retained until you explicitly delete it. You don't even need to be on the same computer: you can push the changes and ask someone else to sort out your mess, whereas you can't (I think?) push a reflog.
jack: (Default)
https://www.dropbox.com/s/re1eu39rpaphtx1/pride_and_prejudice_e2f.html?dl=0

You may have seen the hash I made of Pride and Prejudice last week. Here's the point.

The idea is: take a simple word-by-word translation and an English text, and apply it to one word in the first sentence (and anywhere that word appears thereafter), and another word in the second sentence (and anywhere that word appears thereafter), and so on. And each replacement has a hovertip telling you the English word (that should work in the link, at least in Chrome; let me know if not?).

The idea is that, like people learn bits of Lapine by reading Watership Down without ever consciously trying to remember it, or by watching TV with subtitles, you can always click to get the translation, but by being repeatedly reinforced you start to remember words as you go on, without having to make a big effort to do so.

Or at any rate, that was the idea. I've no idea if it would work!

I apologise for the quality of the translation; I used a dodgy wordlist to get something up and running, and it doesn't do an actual context-sensitive translation. You could also do the reverse: take a French text and translate all the words in your wordlist to English to start with, and successively fewer as you go on. That way the grammar would be correct.

Any suggestions?
jack: (Default)
State of progress: "Qu'il N'est-ce UNE vérité universally acknowledged, Qu'on UNE single Homme In possession DU UNE Bonne fortune, must s'agir In voulez DU UNE épouse."

No, it's not supposed to be a good translation, it's supposed to be an approximate word-by-word thing for reasons which will become apparent later. I'm pleased it worked AT ALL, as badly-capitalised franglish as it is :)
jack: (Default)
I got my robot-programming game to run on Android! I knew it shouldn't be *that* difficult, but it's really magical seeing something I wrote running on a platform I never expected to run on.

I haven't done any more to the game since last year, so it's not really *playable*, but it runs and you can interact with it.

I used kivy as the UI framework for a python game, because it advertised being able to compile to android, and because I wanted to learn more python rather than learn java. So I developed the program on the PC, using kivy for graphics and mouse events (which later become touch/drag events on a touchscreen).

And then, after several false starts: I downloaded a VM set up to do the "build to android" step with buildozer, updated buildozer to the latest version, copied my source to the VM (I had already generated a buildozer.spec file), and it all just worked -- I got an apk, I opened it from dropbox on android, and there was my game running.

Gotchas (I don't expect anyone to try this from my instructions, but in case you do, things I didn't find obvious): to share a folder with the VM, you need to add your user account to a "can see shared folders" usergroup; buildozer can fail to work on a shared folder, so copy the files to a local directory on the VM; you should be able to install the android dev kit etc. with or without buildozer, but I couldn't get it to work.
jack: (Default)
My latest thought is: when N is a multiple of 3, with N=2n or 2n+1 people, I expect to be able to get a close-to-optimal matching in n rounds of N/3 groups of 3.

But if we have exactly enough rounds that each person meets each other person, that relies on every group of 3 in every round having all 3 pairs not having met before. So maybe the right algorithm is, for each round: "randomly generate groups of three where all 3 pairs haven't met before, made of people not already placed this round". And as soon as you can't, back up a round or more and try again. That guarantees permuting when we need to, as soon as we need to. And I sort of hope that when things work in the middle, they'll "just work" for the last few rounds, but I don't know if they will.

However, I'm away until Sunday so I probably won't have a chance to try it. Anyone else interested enough to have a go?
jack: (Default)
Having failed to use the biggest hammer ("ask the internet if anyone can think of a general mathematical solution"), I tried the next-biggest, brute-force.

I wrote a program which divides people into random matchings, and then switches matches which are over- or under-represented. I'm not quite sure what sort of "random switching" is best; I hoped to just get lucky.

The first effort found a version for 9 people easily, but didn't find one any larger than that. The second was about the same. That's where I am now. I can describe the shuffling in detail if anyone is interested or thinks they might have any suggestions.
jack: (Default)
logbook

fanf linked to an article about this (I can't remember where now), but I wanted to talk about keeping a logbook. I think most people do something similar in some way, but I don't know if everyone thinks about it the same way. About three years ago, I noticed that I didn't always remember things I needed, and it was a problem, so I started with the simplest possible solution, which seems to have done fine, so I still have it.

I use a text file (backed up, and in a separate source control system just in case I ever want to fish out anything I've deleted, tho' that doesn't seem likely) as a diary or logbook.

Every code change goes in source control, but almost everything ELSE goes in the diary. Notes on where such-and-such server is, and who to ask for log-in details. Results from a series of experiments. Dead-ends investigated and abandoned. Steps to install something that I'm likely to need again in 18 months, but I don't know if anyone else will need.

Brief summaries of conversations with people, suggestions, etc. Informal deadlines, comments on other projects I don't officially need to know but it's likely useful to be aware of. A summary, entirely for myself, of what I've done today, at whatever level of detail seems useful (sometimes just, "closed off several dead ends, nothing else useful", often "did X, Y and Z. need to finish A and B. Decide if C." Occasionally a more detailed checklist pending being copied somewhere else).

And if anything seems like it WILL be important to other people, it's easy to copy it into a bug report or anywhere else.

what DOESN'T go in there

There are several things that do NOT go in the logbook. Anything that's already in source control commit comments doesn't need to go in (though sometimes the comments are a cleaned-up version of a longer work-in-progress note in the diary).

Personal thoughts, frustrations, introspection, etc, etc, I find it valuable to write down in a SEPARATE file. Thinking through them is sometimes useful, but I never need to refer back to them in the same way.

Immediate TODOs go into another file, although I'm considering changing that. Long-term suggestions go into another file, or into the bug database.

Brainstorming is typically on paper, but once I'm thinking of things which are fairly definitely relevant, they usually go on the TODO list or in the logbook.

There's an art to judging what's necessary. I used to write too much. If I've gone down a bunch of blind alleys fixing transient problems, I don't usually need to record all the details, just say "after a bunch of faff, realised the problem is X".

Format

You could do something different. Keep a physical logbook. Have a procedure for everything. Organise by topic instead of date. Have a better memory.

But what I do is simple. One long text file. I find everything important by just searching (occasionally I'm slightly redundant to make sure any likely keywords are included in what I'm writing). I have a line ">>>> [date]" in the correct format, a blank line, and then whatever. Usually a short description of what I did. Sometimes a list of things, which are copied to a todo-list or lightly edited throughout the day. Sometimes results of something. Sometimes a useful command line I might need to refer to later.
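The format is simple enough that appending an entry is nearly a one-liner. As a hypothetical sketch (the helper name and the exact date format are my invention, not from the file I actually keep):

```python
from datetime import date

def log_entry(path, text):
    """Append a '>>>> [date]' header, a blank line, and the entry
    to a plain-text logbook file."""
    with open(path, "a") as f:
        f.write("\n>>>> %s\n\n%s\n" % (date.today().isoformat(), text))
```

Finding anything again is then just a text search for the keyword, or for ">>>> " plus a date.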

I never need to skim, I rarely need to say "what things did I do that week", so as long as I can clearly separate days, I don't worry about taking up too much space. If I needed to, I might annotate each day with a split of which tasks I spent time on, but I generally work on projects measured in weeks or longer.

At the top are a few general notes, such as links to servers, internal webpages, etc.

If I worked on significantly different projects, I might keep one per project. Eg. for each home project I have something similar, but at work I just have one file for everything related to the company codebase.

Benefits

Anything I previously knew is generally easily searchable; I almost never have to say "wait, why isn't it working? I had this problem before, why can't I remember what I did?"

If I need to check what state things were in at some time in the past, it's fairly easy to check what I was working on at the time.

It's always obvious what I was in the middle of, I never have to ask "wait, what was I doing on Friday, why did I stop?"

I almost never have to think "I would have written that down. But where?"

You

What do you do?
jack: (Default)
Aha! My Ubuntu VM had an old version of git. After updating, my previous attempts at cvs-fast-export now Just Work, and work's entire CVS repository as of a month ago is now in git! Now I need to decide if that's useful by itself, if it's worth tidying up with reposurgeon and otherwise, and if I can persuade anyone else it would be useful to use... :)

But I feel chuffed it worked, even though ESR did all the work and all I did was run "cvs-fast-export | git fast-import" :)
jack: (Default)
I could have found an answer that fitted this question and yesterday's question both, but I decided they were interesting in different ways.

Technological innovations I think we're groping towards, which I'm impatient to have already:

A programming language with syntax as straightforward as Python's, but which works the way C++14 is trying to: compiling to blazing fast code by default, even for embedded systems, with static type checking MOST of the time, but letting you easily use dynamic typing where you actually want it.

Widespread 3D printing of replacement parts, etc. We're nearly there, but we're waiting for a slightly wider variety of materials, and a wider database of possible things. Where you can say "I want this £10 widget holder from the supermarket, but can I get one 30% longer if I pay extra? OK? Thank you!"

Private cars replaced by mega-fleets of robot taxis and universal good public transport throughout/between all population dense areas.

Everyone uses git, or another dvcs, and the interface is actually consistent and friendly for everybody.

Decent, standardised, change-tracking and formatting for non-plain-text documents that allows sensible merging. (OK, this seems to be two steps forward and three steps back, so maybe there's no point waiting for it, but I'd still like it! :))
jack: (Default)
Following on from the previous post, I think I have a clearer idea of what I was imagining, but I'm still not sure if it makes sense or if I have the implications right.

Premises

You own a repository, imposing the convention that commits should use "first parent" dependencies to represent "history on the same branch" and "second parent" dependencies to represent changes added to that branch.

That gives a notion of "the" history of a branch, distinct from "all commits which contributed to it". That contrasts with the traditional git practice of considering all branches equal, but I don't think it contradicts it. If two branches exchange commits, each can consider its own history primary, and the other as "additions being merged in", even if that means you have a commit "merged branch B into branch A" and another commit "merged branch A into branch B" which is the same except for having its parents the other way round, where traditional git would just have one commit.

This is basically putting back the notion that branches have an official history, which some revision control systems have, but git rejected. I hadn't realised the distinction until I heard people talk about it.

This also assumes it's possible to implement "git log --first-parent" and "git bisect --first-parent" and "git annotate --first-parent" etc to mean "which commit made the change on this branch" where they don't already exist. I realise this may not be plausible, but I don't think there's any technical reason why it's any harder than the normal git versions?
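As a toy model of what --first-parent means (this is not real git, just a dict mapping made-up commit ids to (first_parent, second_parent) pairs):

```python
# Model commits as {id: (first_parent, second_parent)}; None = absent.
# "git log --first-parent" is then just a walk down first parents.

def first_parent_log(commits, head):
    """The 'history of the branch': follow first parents only."""
    history, node = [], head
    while node is not None:
        history.append(node)
        node = commits[node][0]
    return history

# A main branch with a feature merged in:
# A <- B <- M, where M's second parent is feature commit F (parent A).
commits = {
    "A": (None, None),
    "B": ("A", None),
    "F": ("A", None),  # feature branch commit
    "M": ("B", "F"),   # merge: first parent = main, second = feature
}
print(first_parent_log(commits, "M"))  # ['M', 'B', 'A'] -- F hidden
```

The feature commit F contributed to M but doesn't appear in the branch's own history, which is exactly the distinction I mean.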

Controversial Corollary

If that is already your default, it means you may be able to introduce extra second-parent links where they would currently fill the log history with garbage, break git-bisect, etc.

Specifically, suppose that whenever you change history (primarily when you change the order/content/collation etc of your commits, but also when you rebase your changes onto a new upstream), the new commits are shown as having "first parent" dependencies as normal, but also have "second parent" dependencies to the previous HEAD (which in traditional git rebasing would become orphaned on no branch and eventually be garbage collected).
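Continuing the same toy model (hypothetical names, not real git): keeping the old HEAD as a second parent means it stays reachable from the new head, which is exactly the property that would protect it from garbage collection.

```python
def reachable(commits, head):
    """Every commit reachable through either parent -- the set a
    garbage collector would keep."""
    seen, stack = set(), [head]
    while stack:
        node = stack.pop()
        if node is None or node in seen:
            continue
        seen.add(node)
        stack.extend(commits[node])
    return seen

# Rebase B1 onto new upstream U, recording the old HEAD as a
# second parent of the rebased commit:
commits = {
    "base": (None, None),
    "U":    ("base", None),  # new upstream
    "B1":   ("base", None),  # original commit, the old HEAD
    "B1'":  ("U", "B1"),     # rebased commit, soft link back to B1
}
print(reachable(commits, "B1'"))  # includes 'B1' -- not garbage
```

With plain rebasing, B1 would only hang on via the reflog until garbage collection; with the soft link it's part of the graph you can push and pull.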

Are there any benefits?

This fits conversations I had with simont that rewriting history should itself be versioned, but I'm not sure if it actually provides practical benefits.

Would it be the case that if you were making a complicated rebase, you would automatically be assured your previous state was preserved, without having to manually create a temporary branch?

Would it be the case that if you were making a complicated rebase and made a mistake, you wouldn't have to rely on there being no garbage collection between making the mistake and recovering from it?

Would it be the case that if you were making a complicated rebase, it's possible to push and pull those processes into other repositories and share the process between multiple people, rather than assuming it all has to be completed in one repository?

Would it be the case that if you rewrite history on a branch, other branches/repos downstream can do "git pull --rebase" and it will just magically work? This is what I was looking for -- it feels like "tidying up history" should be a thing which is embraced everywhere, not a dirty secret that you have to keep private...

I'm not sure it works at all like that? But I feel like there's _something_ right about the idea?
jack: (Default)
This has been bouncing around my brain for a while. It's difficult for me to work out because I don't have enough experience of how projects are usually branched. I started off finding it very complicated, but now I'm thinking about just a few simple questions.

Do I have this right? Suppose in git, you follow a workflow of "develop features on branches (either feature branches or personal branches) and when they work, merge them into the main development branch". What do you expect the history of your main development branch to look like?

One extreme is "a list of all the historical commits, including false starts, reverted commits, etc, that went into developing each feature which has been merged into the main branch". Many CVS repositories would look like this by default. Is that right?

Another extreme would be "each feature branch is rebased to be a single commit saying 'developed feature X' or a small number of commits, each complete in themselves". This would make your main branch look a lot tidier, at the expense of losing all the history of the development of each feature.

It seems like what you ACTUALLY want is, the main branch is a list of "feature X" commits, but with the ability to easily view where those came from. And it seems like git actually has this with "git log --first-parent", with the assumption that the "first" parent of a merge is on the "same" branch, and the "second" parent is where the code was merged from. Which is often but not always true. Is that about right?

This is effectively an effort to put back something that was true by default in CVS, and is not necessarily true for git, but is often true in practice, that each branch has one particular history of "the history of that branch" (basically, the history of whatever the maintainers of that branch deemed good enough -- eg. complete features ready to release; or any code that compiles and passes regression tests, etc).

The same concept seems to apply at every level. If someone submits a large feature to a large open source project, they might often have to submit it as a patch or patch set, saying "this is the code I propose", shorn of history. Because that's what someone will approve or reject. But it might be useful to have the true history hanging around somewhere in case it shows how a mistake came into being or similar. Likewise, at the smallest level, if I make a local commit, and then amend it to fix a small typo, that's effectively saying "the logical history of this commit is just the replacement commit, but the complete history is the replacement commit, merged from an abandoned leaf branch containing the commit with the typo".

The same concept also seems to apply to rebasing a history into logical changes, or squashing it into one commit. The logical (first-parent) history should be the cleaned-up history. But that should have second-parent links that aren't shown on "the history of that branch" unless you want them, showing the pre-rebase history. Basically, I'm treating "first parent" as normal parent links (first/logical/clean/final/hard) and "second parent" links as soft links that should only be shown when asked for (second/original/soft). Although if you wanted to think of it that way, you'd have to find a way to cope with merges where first and second parent are both "first parent" in this system, etc.
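The amend case can be sketched with a toy commit model (hypothetical, not real git: commits are just ids mapped to (first_parent, second_parent) pairs). The first-parent walk gives the logical history, while the typo'd commit stays reachable through the soft link for anyone who asks:

```python
def first_parent_log(commits, head):
    """The clean, logical history: hard (first-parent) links only."""
    out, node = [], head
    while node is not None:
        out.append(node)
        node = commits[node][0]
    return out

# Amending commit C1 to fix a typo produces C2, with a soft
# (second-parent) link back to the abandoned original:
commits = {
    "A":  (None, None),
    "C1": ("A", None),  # original commit with the typo
    "C2": ("A", "C1"),  # amended commit; soft link back to C1
}
print(first_parent_log(commits, "C2"))  # ['C2', 'A'] -- C1 hidden
```

So "git log" on the branch shows only C2, but C1 is still in the graph rather than waiting to be garbage collected.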

I'm not proposing this yet, I'm asking, "is this what everyone else does and no-one told me?" or "is this nonsense because I've completely misunderstood?"

But if you were to do something like this, I guess what you'd do is (1) have a development process of always merging from feature branch into main branch and (2) work out anywhere "second parent" links should exist but aren't (eg. after rebase?) and how to insert them. And maybe (3), produce any extra command line tools/options necessary for displaying the history in the right way.

Does that make any sense?