[Revctrl] Cherry picking for DAG based systems

William Uther willu.mailingLists at cse.unsw.edu.au
Fri Dec 29 06:30:34 EST 2006


On 29/12/2006, at 8:41 PM, Nathaniel J. Smith wrote:

> On Fri, Dec 29, 2006 at 02:47:59PM +1100, William Uther wrote:
>> In this sense, basic OT theory is a DAGgy system - it can't cherry-
>> pick arbitrarily but rather it just merges two histories.  There are
>> some inverse transformation functions (T-1) that do allow cherry-
>> picking like functionality, but they are much more limited than the
>> forward version.  In particular this TTF paper ( http://hal.inria.fr/
>> inria-00109039/en/ ) states: "The preconditions of the inverse of a
>> transformation function required that the first parameter operation
>> resulted from a previous transformation according to the second one.
>> Indeed, it is not allowed to call this inverse function in order to
>> swap two operations causally dependent because these operations were
>> not concurrent, and thus were not previously 'serialised' using
>> transformation functions."  I have some ideas to get around this, but
>> they're not well developed yet.  One tricky part here is that it is
>> not possible to tell just by looking at two TTF patches whether one
>> is dependent on another or not.
>
> Interesting.  It seems like most of the point of looking at OT
> systems, in the VCS context, would be the possibility of improving
> cherry-picking support...

OT theory is very like Darcs theory.  The solutions will be the same,  
and the properties of the system roughly the same.  I just prefer  
thinking in terms of OT theory rather than patch commutes.

>>>> You're viewing it like this:
>>>>
>>>>    a
>>>>   / \
>>>>  ab  ac
>>>>  |\ /|
>>>>  | X |
>>>>  |/ \|
>>>> abc acb
>>>
>>> See how symmetrical this case is? You could invert all instances of
>>> b and c and have exactly the same thing. That's a good indicator
>>> that picking one side over the other automatically is a bad thing.
>>
>> If that were how the system presented it to the user I agree it would
>> be a bad thing.  I wasn't planning to have it presented to the user
>> that way.
>
> ...How _else_ are you going to present it to the user?

The way I suggested.  The system automatically merges things into the  
canonical order.  A commit to swap the order is a separate commit  
with an add and a delete in it.  That commit will have its own commit  
message, and go out over the commit email list separately.  You  
choose to incorporate that patch or not the same way you'd choose to  
incorporate any other patch.

Here is the diagram again:

    a
   / \
  ab  ac
  |\ /|
  | X |
  |/ \|
abc abc  <--- here both people get the merge automatically resolved  
to the canonical order
  |
  |
acb <--- this is where one person changes to the non-canonical ordering

Imagine another example where someone makes a commit that deletes a  
file.  This is a bad thing™, but the system isn't going to  
automatically flag a conflict - that will clean merge into other  
systems.  You need to be reading commit messages and checking that  
people aren't making changes that screw up your system.

I think the way that the UI presents this to the user would have a  
HUGE effect on their expectations of how the system would behave.  My  
thoughts below on how to make this a conflict are because I want to  
separate mechanism from policy, not because I necessarily agree with  
your policy (not that I stridently disagree either...).

> Users work
> with files, those are what they understand.  Those nodes are the
> contents of the users' file at each point in time.  Will users have to
> be able to read your bespoke patch notation to understand basic
> operation?

No.  Users should be able to think about the system one of two ways:

i) as a set of patches, like darcs, or
ii) as a collection of revisions where they can merge revisions.

In either case there is a question of when conflicts are flagged.  I  
am hoping to separate mechanism from policy there (with a sensible  
default) so that different conflict schemes can be tested.

e.g. Why have conflicts that are in terms of lines?  Why not conflict  
when there are two changes close by in the same block (assuming  
you're editing a known file type)?  Should converting spaces to tabs  
conflict at all?

I'm also assuming that the underlying mechanism is hidden behind a  
UI.  Yes, I will have my own patch format and it wont look like  
anything anyone has seen before, but No, the user will never have to  
see and deal with patches in that format.

>>> In fact, everybody (and I mean *everybody*) who doesn't already
>>> understand merging theory so well that their judgement is clouded
>>> agrees that this should be a conflict.
>>
>> It shouldn't be too hard to make that a conflict, if that is what
>> people want.
>
> I look forward to hearing how :-).

The transformation function that passes a conflict resolution patch  
through another conflict resolution patch generates a conflict  
between the two conflict resolution patches.  :)  But I'm guessing  
that didn't make a whole lot of sense.  I'm still working on the  
details.  I'd rather convince myself it all works before I waste your  
time on it.  As I'm implementing I'm finding annoying edge cases.

Hrm.  Here is the idea (in rough outline):

Like OT theory, the patching is based on inserting bytes in a stream  
of bytes.  Patches can insert a string of bytes in a location, delete  
a string of bytes, create a new file, etc.  As well as the location  
where the patch will insert/delete the bytes, the patch also has a  
'conflict-range' around that location.  Any change in that range will  
generate a conflict.  This means that the underlying mechanism works  
with binary files, and is flexible enough to emulate current line- 
based conflicts.  For use with actual binary files I'd probably  
extend the conflict range to be the entire file - then it'll behave  
like current systems there too.  Note the separation of mechanism and  
policy :).

The transformation functions are like the tombstone transformation  
functions.  Having said that, you also need to add the ability to  
mark lines as conflicting.  When you transform one patch through  
another where either patch is within the other's conflict range, all  
lines affected by either patch are marked as conflicting.  The  
conflict label that includes the patch ID's of the two conflicting  
patches.  The conflict label is conceptually applied separately to  
each conflicting byte - it is not a normal conflict marker, although  
it may be presented to the user that way.  Apart from the conflict  
label, the text is just inserted in a canonical order, same as normal  
OT.  There is another patch type that removes the conflict markers  
for a particular conflict.  This would be generated for the user when  
they mark the conflict as resolved (ala 'svn resolved').

e.g.

Here I'll write patches as:

   ins(posn, byte, patchID, conflict-range) - insert <bytes> at  
<posn>.  <patchID> is universally unique - I'll use small integers  
but manually & magically ensure their uniqueness here.  <conflict- 
range> is defined post-insertion.

   resolve(range, patchA, patchB, patchID) - resolve the conflict  
between <patchA> and <patchB> over the lines in <range>.

   del(posn, conflict-range, patchID) - delete the character at posn

User 1                  User 2

ins(1, A, 1, 1-2)      ins(1, B, 2, 1-2)

      A                  B

              merge  <-- at this point each user has to transform the  
other ins patch through their own.  They each end up with a the same  
result.  Because the patches are in each other's conflict range

      _AB_              _AB_    <-- here the underline marks the AB  
chars as labelled as conflicting.

resolve(1-3, 1, 2, 3)  resolve(1-3, 1, 2, 4)
                        del(2, 1-3, 5)
                        ins(1, B, 6, 1-3)

      AB                BA

              merge          <-- here, each user has to pass the  
other's 'resolve' patch through their own.  When they do, it will  
either be identical, in which case you could have an implicit clean  
merge, or you could generate another 'conflict' (resolve the original  
conflict, but now mark the lines as conflicting with the patch ids of  
the two 'resolve' patches that conflict).

      _BAB_           _BAB_  <-- the end result is that both sides  
have a working copy with enough information for the user to resolve  
the situation.

The interesting thing about this algorithm is that it has BOTH a  
weave AND patch adjustment.  The patch adjustment makes some things  
easy, and the weave makes sure that the patch adjustment is correct.

There are a few things at the moment that worry me:

When I include file operations and various types of conflicts, I  
currently have 17 different types of patches, each of which needs to  
be transformable through any other.  In many cases the transforms are  
just the identity, but I think that is too many for a believable  
manual proof of correctness (especially given the long history of  
transformation operations with subtle bugs).

I have a plan for inverse operators, but I want to get ordinary DAG  
style ops going first.  (and I'm running out of time to work on it.   
grrrr.  Maybe I should stop writing emails...?)

>> One possible middle-of-the-road solution is to have two types of
>> patches.  One type is marked to support implicit convergence the
>> other will conflict as normal.  When you transform an 'implicit
>> convergence' patch through another patch, any identical text added in
>> the same location will just be implicitly converged.  This would
>> require some extra marking to make sure that the inverse is still OK,
>> but it should be possible.  That way the person generating the patch
>> could have the option of choosing if they want implicit convergence
>> or not.
>
> Again, I'm worried that users will have trouble understanding such
> details... for most users, version control (and especially merging) is
> already utter black magic.  I'd like to make systems simple enough to
> change that somewhat, of course.  But it seems to me that regardless,
> a practical system needs to be usable by people who have only a
> shallow understanding of what's going on, and in any case the effort
> needed to understand the system needs to be commensurate with the
> payoff you get... hopefully that'd be the case here, I dunno.

My goal is to separate mechanism from policy.  The system should be  
flexible enough to handle a number of ways of doing things.  There  
should be sensible defaults that people don't have to touch.  If you  
want to get fancy beyond that, you can.

The real issue here is one that any smart history-based system will  
have.  Users often just massage their working copy until it looks  
right.  They don't care about the actual history of the data, as long  
as it looks ok now.  That means that any history based system will be  
screwed.  This came up in the original arch discussions.  (e.g. If I  
make a patch that deletes a file and the re-creates it with the same  
name and text.)

I think the solution to that problem is not fancier merge algorithms,  
but ui design that guides users towards doing things the right way  
where possible, and helps them sort out the mess later if they need  
to.  But that is someone else's problem™.

Be well,

Will         :-}

--
Dr William Uther                           National ICT Australia
Phone: +61 2 8306 0424               Computer Science and Engineering
Email: william.uther at nicta.com.au      University of New South Wales
Email: willu at cse.unsw.edu.au                  Sydney, Australia

Web: http://www.cse.unsw.edu.au/~willu/   or  http://www.nicta.com.au/

NICTA email Disclaimer:
http://www.cse.unsw.edu.au/~willu/NICTAEmailDisclaimer.html
UNSW email Disclaimer:
http://www.eng.unsw.edu.au/emaildis.htm




More information about the Revctrl mailing list