jack

This has been bouncing around my brain for a while. It's difficult for me to work out because I don't have enough experience of how projects are usually branched. I started off finding it very complicated, but now I'm thinking about just a few simple questions.

Do I have this right? Suppose in git, you follow a workflow of "develop features on branches (either feature branches or personal branches) and when they work, merge them into the main development branch". What do you expect the history of your main development branch to look like?

One extreme is "a list of all the historical commits, including false starts, reverted commits, etc, that went into developing each feature which has been merged into the main branch". Many CVS repositories would look like this by default. Is that right?

Another extreme would be "each feature branch is rebased to be a single commit saying 'developed feature X' or a small number of commits, each complete in themselves". This would make your main branch look a lot tidier, at the expense of losing all the history of the development of each feature.

It seems like what you ACTUALLY want is, the main branch is a list of "feature X" commits, but with the ability to easily view where those came from. And it seems like git actually has this with "git log --first-parent", with the assumption that the "first" parent of a merge is on the "same" branch, and the "second" parent is where the code was merged from. Which is often but not always true. Is that about right?
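To make the first-parent idea concrete, here is a toy model of a commit graph in Python. It is only a sketch: the `Commit` class and both log functions are invented for illustration and are not part of git. The one thing it borrows from git is the convention that `parents[0]` of a merge is the branch that was merged into.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # parents[0] is the "first" parent


def first_parent_log(tip):
    """Follow only parents[0] from the tip, as `git log --first-parent` does."""
    messages = []
    commit = tip
    while commit is not None:
        messages.append(commit.message)
        commit = commit.parents[0] if commit.parents else None
    return messages


def full_log(tip):
    """Visit every reachable commit once (plain depth-first order, not git's)."""
    seen, messages, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if id(commit) in seen:
            continue
        seen.add(id(commit))
        messages.append(commit.message)
        stack.extend(commit.parents)
    return messages


# A feature branch with a false start, merged into main, then a bugfix on main.
root = Commit("Initial commit")
wip1 = Commit("X: false start", (root,))
wip2 = Commit("X: working version", (wip1,))
merge_x = Commit("Merge feature X", (root, wip2))  # first parent is main
fix = Commit("Bugfix 1", (merge_x,))

print(first_parent_log(fix))  # main's own history only
print(full_log(fix))          # includes the feature branch's commits
```

The first-parent walk never sees the two "X:" commits, but they are still reachable from the merge if you ask for them.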

This is effectively an effort to put back something that was true by default in CVS, and is not necessarily true for git, but is often true in practice, that each branch has one particular history of "the history of that branch" (basically, the history of whatever the maintainers of that branch deemed good enough -- eg. complete features ready to release; or any code that compiles and passes regression tests, etc).

The same concept seems to apply at every level. If someone submits a large feature to a large open source project, they might often have to submit it as a patch or patch set, saying "this is the code I propose", shorn of history. Because that's what someone will approve or reject. But it might be useful to have the true history hanging around somewhere in case it shows how a mistake came into being or similar. Likewise, at the smallest level, if I make a local commit, and then amend it to fix a small typo, that's effectively saying "the logical history of this commit is just the replacement commit, but the complete history is the replacement commit, merged from an abandoned leaf branch containing the commit with the typo".

The same concept also seems to apply to rebasing a history into logical changes, or squashing it into one commit. The logical (first-parent) history should be the cleaned-up history. But that should have second-parent links that aren't shown on "the history of that branch" unless you want them, showing the pre-rebase history. Basically, I'm treating "first parent" as normal parent links (first/logical/clean/final/hard) and "second parent" links as soft links that should only be shown when asked for (second/original/soft). Although if you wanted to think of it that way, you'd have to find a way to cope with merges where first and second parent are both "first parent" in this system, etc.
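As a sketch of what I mean by hard and soft links, here is the amend-a-typo case in a toy Python model. Everything here is hypothetical: git has no `original` link, and the field names are mine. It also sidesteps the awkward case of true merges, where both parents would need to be hard links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None    # hard link: the logical history
    original: Optional["Commit"] = None  # hypothetical soft link: pre-rewrite version


def logical_log(tip):
    """The cleaned-up history: follow hard links only."""
    messages = []
    commit = tip
    while commit is not None:
        messages.append(commit.message)
        commit = commit.parent
    return messages


def rewrite_history(commit):
    """Earlier versions of one commit, newest first: follow soft links only."""
    versions = []
    earlier = commit.original
    while earlier is not None:
        versions.append(earlier.message)
        earlier = earlier.original
    return versions


base = Commit("Initial commit")
with_typo = Commit("Add parser (with typo)", base)
# Amending keeps the same hard parent and remembers the abandoned commit softly.
amended = Commit("Add parser", base, original=with_typo)

print(logical_log(amended))      # what "the history of that branch" shows
print(rewrite_history(amended))  # shown only when asked for
```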

I'm not proposing this yet, I'm asking, "is this what everyone else does and no-one told me?" or "is this nonsense because I've completely misunderstood?"

But if you were to do something like this, I guess what you'd do is (1) have a development process of always merging from feature branch into main branch and (2) work out anywhere "second parent" links should exist but aren't (eg. after rebase?) and how to insert them. And maybe (3), produce any extra command line tools/options necessary for displaying the history in the right way.

Does that make any sense?

cjwatson

Most large-scale git users require merged feature branches to represent a logical sequence of changes to implement that feature rather than the evolution of the branch, because it makes bisection easier if you don't have broken commits along the way. Adding extra parent commits tends to confuse matters here, and I don't think people normally do it quite the way you suggest. (Although there is an interesting technique used by a Debian package management tool called git-dpm, which converts a rebasing branch into a fast-forwarding branch by repeatedly merging the rebased tip into your main working branch; that sacrifices straightforward bisectability in service of other goals, so it's possible you might want to look at that kind of approach, although I think it's really only suitable for branches that maintain evolving unmerged patch series.)

Personally, I think the standard approach is a workaround for an inadequacy in git. It would help if more things defaulted to following the primary line of development, and if there were such a thing as "git bisect --first-parent". I suspect that ship has sailed by now, though.

The structure you suggest is, incidentally, more or less the way things are often done in bzr, but the way git tends to flatten out the history by default has strongly encouraged different approaches there.

jack

"require merged feature branches to represent a logical sequence of changes to implement that feature rather than the evolution of the branch"

Right, that makes sense. I guess that says that people don't find a significant need for history finer-grained than that, as long as the logical commits are valid.

I guess if you wanted to preserve that history, you could have a habit of commit messages including the name of an obsolete branch containing the pre-rebase history. That would function similarly to a "soft-link", just slightly less conveniently. But in fact, it seems like people don't need that.

I think I'm mixing up "what might be a good idea as a policy for a whole repository" with "what someone can do with their own commits". Most of these ideas are only workable for a repository as a whole, although the same was true of "only commit things that compile and pass tests" -- that was revolutionary not so long ago, right?

But if you _were_ setting the policy, you could do like Simon did and say "rewrite history to clean up feature branches, when you do merge, arrange it so first-parent is the 'upstream' branch". Then if someone implements "git bisect --first-parent" it's useful to you even if not everyone else?
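Here is a rough sketch of what bisecting along the first-parent line would mean, again on a toy Python model rather than real git. The `Commit` class, the function names and the `is_good` callback are all invented for illustration. The point is that only commits on the main line get tested, so a half-finished commit on a feature branch can never confuse the search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # parents[0] is the "upstream" branch


def first_parent_chain(tip):
    """Commits on the main line, oldest first."""
    chain = []
    commit = tip
    while commit is not None:
        chain.append(commit)
        commit = commit.parents[0] if commit.parents else None
    chain.reverse()
    return chain


def bisect_first_parent(tip, is_good):
    """Binary search for the first bad main-line commit.

    Assumes the oldest commit is good and the tip is bad.
    """
    chain = first_parent_chain(tip)
    low, high = 0, len(chain) - 1  # chain[low] is good, chain[high] is bad
    while high - low > 1:
        middle = (low + high) // 2
        if is_good(chain[middle]):
            low = middle
        else:
            high = middle
    return chain[high]


root = Commit("Initial commit")
f1_wip = Commit("Feature 1: half-done, does not compile", (root,))
f1_done = Commit("Feature 1: finished", (f1_wip,))
merge_1 = Commit("Feature 1", (root, f1_done))
f2_done = Commit("Feature 2: finished", (merge_1,))
merge_2 = Commit("Feature 2", (merge_1, f2_done))
fix_1 = Commit("Bugfix 1", (merge_2,))
tip = Commit("Bugfix 2", (fix_1,))

tested = []


def is_good(commit):
    """Stand-in for running the test suite; here Feature 2 introduced the bug."""
    tested.append(commit.message)
    return commit.message in {"Initial commit", "Feature 1"}


culprit = bisect_first_parent(tip, is_good)
print(culprit.message)
print(tested)
```

Once the search lands on a merge, you could drill into its second parent as a separate step if the branch is worth bisecting in its own right.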

And now I'm just pondering. I'm not sure I have a clear enough idea of what a "normal" workflow would look like.

In the sort of large project you describe, the main branch would look something like

Feature 1, logical commit 1
Feature 1, logical commit 2
Feature 1, logical commit 3
Feature 1, logical commit 4
Feature 2, logical commit 1
Feature 2, logical commit 2
Bugfix 1
Bugfix 2
Bugfix 3

Where each sequence was merged from a branch?

Whereas in my hypothetical "first parent" repository, it might look like

Feature 1
Feature 2
Bugfixes

And there'd be a hypothetical "log first-level second-parent" view which showed the history as it appears in the "normal" repository. But you could hypothetically drill deeper and see the history of each of those logical commits, if there was any reason to (which there likely isn't).
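A small Python sketch of those two views, on a toy commit graph. The names are made up, and it assumes every feature branch forks from exactly the commit that becomes the merge's first parent (that is, each branch is rebased onto main and then merged without fast-forwarding).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # parents[0] is main, parents[1] is the merged branch


def first_parent_chain(tip, stop=None):
    """Commits from tip back along parents[0], newest first, excluding `stop`."""
    chain = []
    commit = tip
    while commit is not None and commit is not stop:
        chain.append(commit)
        commit = commit.parents[0] if commit.parents else None
    return chain


def top_level(tip):
    """The 'Feature 1 / Feature 2 / Bugfixes' view, oldest first."""
    return [c.message for c in reversed(first_parent_chain(tip))]


def one_level_down(tip):
    """Replace each merge on main with the logical commits it brought in."""
    messages = []
    for commit in reversed(first_parent_chain(tip)):
        if len(commit.parents) == 2:
            main_side, branch_tip = commit.parents
            branch = first_parent_chain(branch_tip, stop=main_side)
            messages.extend(c.message for c in reversed(branch))
        else:
            messages.append(commit.message)
    return messages


root = Commit("Initial commit")
f1a = Commit("Feature 1, logical commit 1", (root,))
f1b = Commit("Feature 1, logical commit 2", (f1a,))
merge_1 = Commit("Feature 1", (root, f1b))
f2a = Commit("Feature 2, logical commit 1", (merge_1,))
merge_2 = Commit("Feature 2", (merge_1, f2a))
b1 = Commit("Bugfix 1", (merge_2,))
b2 = Commit("Bugfix 2", (b1,))
merge_3 = Commit("Bugfixes", (merge_2, b2))

print(top_level(merge_3))
print(one_level_down(merge_3))
```

Drilling deeper would just mean applying the same expansion to any merges found inside a branch.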

But I'm not sure I have that right?

2d branching structure including original and revised history
