branch representation


Nathaniel Smith
So, err, here's a stupid idea that popped into my head, reading the
recent discussion of 'disapprove'.  I want it to be clear that this is
not a "here's something I've been thinking about for 3 months and
it's going to solve 5 problems you hate and 5 you never even knew you
had" kind of message, this is a "hrm, this is a crazy idea and I can't
even tell whether it's crazy enough to be right or not" kind of
message...

So the idea is: what if we got rid of branch certs, and put a branch
field inside the revision object?  So each revision is uniquely,
irrevocably, in a single branch.  So each revision is not just a
snapshot, but a snapshot with a purpose attached.  And instead of
automatically putting a branch cert on at commit time, you put a
"yeah, this is good" cert on (since the rev already has a purpose
built in, your vague affirmation of goodness can be assumed to match
that).
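
A minimal sketch of the two models in Python, purely for illustration.
Every name here (Cert, Revision, the "approve" cert name) is made up for
the example and is not monotone's actual code or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    rev: str     # id of the revision the cert is attached to
    name: str    # e.g. "branch", "tag", or the hypothetical "approve"
    value: str
    signer: str


def branches_current(rev_id, certs):
    """Current model: branch membership is whatever the branch certs say."""
    return {c.value for c in certs if c.rev == rev_id and c.name == "branch"}


@dataclass(frozen=True)
class Revision:
    """Proposed model: the branch is a field of the revision itself."""
    parents: tuple
    manifest: str
    branch: str


def branches_proposed(rev):
    """Proposed model: exactly one branch, fixed when the revision is made."""
    return {rev.branch}


# Current model: nothing stops a second branch cert landing on the same rev.
certs = [
    Cert("r1", "branch", "A", "alice"),
    Cert("r1", "branch", "B", "bob"),
]
current = branches_current("r1", certs)

# Proposed model: the commit-time cert is only a generic approval.
rev = Revision(parents=("r0",), manifest="m1", branch="A")
approval = Cert("r1", "approve", "yes", "alice")
proposed = branches_proposed(rev)
```

Under these assumptions `current` holds both A and B, while `proposed`
can only ever hold the one branch the revision was created with.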

Yeah, this is a really weird idea.  It's weird enough that I have
trouble really imagining it to evaluate it.  (And enough that I
deserve to be hit repeatedly with a stick for bringing it up when
we're trying to _stabilize_ things...)

Anyway, this wouldn't help with things that people find
counter-intuitive like, "this branch is discontinuous", or "this
branch has multiple branch points", or "this branch has multiple
heads".  It would help with the confusion about revs being in multiple
branches, and generally reduce the weirdness we have all the time
where we have to assume all revs have branch certs on them, and try
and guess an appropriate branch name given a revision, and so on -- it
seems like the code and users both want to think of branch certs as at
least somewhat special.  I'm thinking of guess_branch, and update's
tricky handling of branches, and netsync filtering by branch...

It might make doing trust stuff significantly easier.  I _think_ a
design criterion for a trust system is that I want to be able to
specify rules for trusting certs that aren't branch certs, and I want
to do this per-branch.  This seems very tricky, if the rule for
deciding whether you trust a 'foo' cert begins "collect all branch
certs on the same rev, invoke the branch trust rules on them, for
each branch cert that turns out to be trusted, find that branch's
trust rules for 'foo' certs, and then somehow combine the results of
applying each branch's rules".  And there are probably horrible
convoluted attacks:
  Alice has tag and branch cert rights to branch A.
  Bob has tag and branch cert rights to branch B.
  Alice wants to check out the tag 'A-release' so she can send it to
    the CD manufacturers.
  Bob does good work over on his project, so Alice does trust him to
    tag revs on branch B, but he happens to hate project A.
  Bob takes a buggy rev R on branch A, and adds two certs to it:
    branch: B
    tag: A-release
Now rev R is, according to the trust rules, quite definitely in both A
and B.  The A-release tag on it is trusted with respect to branch B's
rules, but not trusted with respect to branch A's rules.  So... do
we trust the A-release tag, or what?  I guess the unavoidable
conclusion is that you can't determine cert trust based on just the
contents of the DB plus the current trust rules, but also need to know
something about the current context?  (checkout -b B -r t:A-release
should work, checkout -b A -r t:A-release should silently ignore it?)
This seems bad.  Maybe I'm missing something.
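
The scenario can be written out as a small Python sketch.  The rule
table, key names, and helper functions are all assumptions made for the
example (monotone has no such per-branch rule format); the point is only
that the same tag cert gets a different verdict per branch.

```python
from collections import namedtuple

Cert = namedtuple("Cert", "rev name value signer")

# Hypothetical per-branch trust rules: branch -> cert name -> trusted signers.
RULES = {
    "A": {"branch": {"alice"}, "tag": {"alice"}},
    "B": {"branch": {"bob"}, "tag": {"bob"}},
}

# Rev R started out on branch A; Bob then adds the last two certs.
CERTS = [
    Cert("R", "branch", "A", "alice"),
    Cert("R", "branch", "B", "bob"),
    Cert("R", "tag", "A-release", "bob"),
]


def trusted_branches(rev, certs, rules):
    """Branches whose branch cert on `rev` is signed by a trusted key."""
    return {
        c.value
        for c in certs
        if c.rev == rev
        and c.name == "branch"
        and c.signer in rules.get(c.value, {}).get("branch", set())
    }


def tag_trusted_in(branch, tag_cert, certs, rules):
    """Apply one branch's tag rules to a tag cert on a rev in that branch."""
    if branch not in trusted_branches(tag_cert.rev, certs, rules):
        return False
    return tag_cert.signer in rules[branch].get("tag", set())


tag = CERTS[2]
verdicts = {
    branch: tag_trusted_in(branch, tag, CERTS, RULES)
    for branch in sorted(trusted_branches("R", CERTS, RULES))
}
```

Here `verdicts` comes out as `{"A": False, "B": True}`: the database and
the rules alone give two answers, and only the branch the user asked for
picks between them.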

(Okay, maybe trust issues _have_ been stewing around in the back of my
head for more than 3 months.  But I don't really understand them, so
it's maybe premature to use them as arguments in a discussion...)

This idea does add a significant new piece to the model -- instead of
the nice clean DAG of snapshots _here_, and generic metadata mechanism
_there_ setup, we have a piece of magic metadata, that needs to be
handled differently, etc.

Umm.  Probably lots of other things to say, too, but I'll stop here
for now :-).

-- Nathaniel

--
"Of course, the entire effort is to put oneself
 Outside the ordinary range
 Of what are called statistics."
  -- Stephen Spender


_______________________________________________
Monotone-devel mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/monotone-devel

Re: branch representation

Timothy Brownawell
On Wed, 2005-10-26 at 17:05 -0700, Nathaniel Smith wrote:
> It might make doing trust stuff significantly easier.  I _think_ a
> design criterion for a trust system is that I want to be able to
> specify rules for trusting certs that aren't branch certs, and I want
> to do this per-branch.  This seems very tricky, if the rule for
> deciding whether you trust a 'foo' cert begins "collect all branch
> certs on the same rev, invoke the branch trust rules on them, for
> each branch cert that turns out to be trusted, find that branch's
> trust rules for 'foo' certs, and then somehow combine the results of
> applying each branch's rules".

If they don't all match, print a warning and exit?

>  And there are probably horrible
> convoluted attacks:
>   Alice has tag and branch cert rights to branch A.
>   Bob has tag and branch cert rights to branch B.
>   Alice wants to check out the tag 'A-release' so she can send it to
>     the CD manufacturers.
>   Bob does good work over on his project, so Alice does trust him to
>     tag revs on branch B, but he happens to hate project A.
>   Bob takes a buggy rev R on branch A, and adds two certs to it:
>     branch: B
>     tag: A-release
> Now rev R is, according to the trust rules, quite definitely in both A
> and B.  The A-release tag on it is trusted with respect to branch B's
> rules, but not trusted with respect to branch A's rules.  So... do
> we trust the A-release tag, or what?  I guess the unavoidable
> conclusion is that you can't determine cert trust based on just the
> contents of the DB plus the current trust rules, but also need to know
> something about the current context?  (checkout -b B -r t:A-release
> should work, checkout -b A -r t:A-release should silently ignore it?)
> This seems bad.  Maybe I'm missing something.

Perhaps this is a reason to keep different projects in different
databases? Bob could just put the A-release tag on something that's
*only* on his branch, and it'd work just as well.

> (Okay, maybe trust issues _have_ been stewing around in the back of my
> head for more than 3 months.  But I don't really understand them, so
> it's maybe premature to use them as arguments in a discussion...)
>
> This idea does add a significant new piece to the model -- instead of
> the nice clean DAG of snapshots _here_, and generic metadata mechanism
> _there_ setup, we have a piece of magic metadata, that needs to be
> handled differently, etc.
>
> Umm.  Probably lots of other things to say, too, but I'll stop here
> for now :-).





Re: branch representation

Daniel Carosone
In reply to this post by Nathaniel Smith
On Wed, Oct 26, 2005 at 05:05:14PM -0700, Nathaniel Smith wrote:
> So the idea is: what if we got rid of branch certs, and put a branch
> field inside the revision object?  So each revision is uniquely,
> irrevocably, in a single branch.  

So if an identical revision (in terms of resulting code) is to be on
more than one branch, it needs to be more than one revision, and I now
need some other way to prove that the revisions are
identical-apart-from-branch?
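
To make the concern concrete, here is a rough Python sketch.  It assumes
a revision id is a hash over the revision's text; the text layout and
the rev_id helper are invented for the example and are not monotone's
real revision format.

```python
import hashlib


def rev_id(manifest_id, parents, branch=None):
    """Hash a made-up revision text; `branch` is the proposed extra field."""
    lines = ["manifest " + manifest_id]
    lines += ["parent " + p for p in sorted(parents)]
    if branch is not None:
        lines.append("branch " + branch)
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


manifest = "m1"
parents = ["p1"]

# Current model: one id, and any number of branch certs can point at it.
plain = rev_id(manifest, parents)

# Proposed model: the same tree on two branches is two distinct revisions.
on_a = rev_id(manifest, parents, branch="A")
on_b = rev_id(manifest, parents, branch="B")
same_revision = on_a == on_b
```

In this sketch `same_revision` is False, so the only remaining evidence
that the two revisions carry the same code is that they name the same
manifest id.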

> Anyway, this wouldn't help with things that people find
> counter-intuitive like, "this branch is discontinuous", or "this
> branch has multiple branch points", or "this branch has multiple
> heads".

Good. Because I like these things. For me at least, it's not so much
that they were counter-intuitive at first introduction, more that they
were counter to previous training with other VCSs.  I found them very
intuitive, in part because they represented a simple consistent
underlying pattern and deeper insight.  They represented to me that
monotone was fundamentally sane.

Many of the issues that remain in the 'counter to previous training'
category are matters of UI maturity, 'best practice patterns'
development, etc.

> It would help with the confusion about revs being in multiple
> branches, and generally reduce the weirdness we have all the time
> where we have to assume all revs have branch certs on them, and try
> and guess an appropriate branch name given a revision, and so on -- it
> seems like the code and users both want to think of branch certs as at
> least somewhat special.  

Maybe. I'm not familiar with these assumptions and cases in the code,
I have to defer to you there.

> It might make doing trust stuff significantly easier.  I _think_ a
> design criterion for a trust system is that I want to be able to
> specify rules for trusting certs that aren't branch certs, and I want
> to do this per-branch.  
>
> This seems very tricky, if the rule for deciding whether you trust a
> 'foo' cert begins "collect all branch certs on the same rev, invoke
> the branch trust rules on them, for each branch cert that turns out
> to be trusted, find that branch's trust rules for 'foo' certs, and
> then somehow combine the results of applying each branch's rules".

I'm not sure it's that bad.  It probably helps to think about trust
having a direction. A lot of the rest of monotone is about directed
graphs, trust pretty much should be too.

The endpoint is revisions - ultimately, revisions are trusted if
there's a path to them through the trust graph.  Equally importantly,
all the other nodes in the trust graph are *not* revisions; they're
either direct or indirect certs.  Right now, we only have direct certs
(branch, etc) that point to revisions; indirect certs would be
permissions certs and identity certs and trust certs of some or other
form.

These forms might include path constraints in the trust graph, such as
for example the requirement that in order for the branch cert trust to
apply to a revision, there must also be a (trusted) testresult
(author/date/etc) cert for a certain name.

All identities are trusted now; adding identity trust certs with
constraints makes this graph deeper, but not necessarily any less
directed.  In fact, if we're looking for a design criterion around
trusts, I think one should be that trust should always remain directed
at least in evaluation.

For comparative illustration, think about gpg web-of-trust
evaluation. The certs themselves are a web; the trust evaluation
algorithm searches through this graph adding weighted trust scores to
build a directed tree from the trust roots.

Your example above sounds convoluted, because you change direction
several times when trying to describe the trust evaluation algorithm.
Say instead you do something morally equivalent to:

 * build a graph of all certs
 * (early optimisation) prune all paths that don't lead to the desired
   selection criteria (revision(s), branches, etc .. if relevant;
   perhaps you're searching for trusted revisions)
 * starting at the identity trust root(s?), start labelling certs with
   identity trust by signer
 * prune all certs that aren't trusted for identity (maybe we need a
   specific terminology for this, perhaps 'authenticated')
 * starting at the permission trust root(s?), start labelling certs
   according to trust-constraint matching rules
 * prune all certs that don't have their permissions constraints met
 * any certs remaining lead from both identity and trust roots,
   through to trusted revisions.
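
The steps above can be sketched in Python roughly as follows.  This is
one possible reading under stated assumptions, not a design: the cert
kinds "identity" and "permit", the key names, and both functions are
hypothetical, and the selection criterion is reduced to a single branch
name.

```python
from collections import namedtuple

# signer: key that issued the cert; kind: "identity", "permit" or "branch";
# target: a key (indirect certs) or a revision id (direct certs).
Cert = namedtuple("Cert", "signer kind target value")


def reachable(roots, edges):
    """All nodes reachable from `roots` by following `edges` forwards."""
    seen, todo = set(roots), list(roots)
    while todo:
        node = todo.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def trusted_revisions(certs, identity_roots, permission_roots, branch):
    # Early prune: drop direct certs that cannot match the selection.
    certs = [c for c in certs if c.kind != "branch" or c.value == branch]

    # Label keys with identity trust, walking outwards from the roots.
    id_edges = {}
    for c in certs:
        if c.kind == "identity":
            id_edges.setdefault(c.signer, set()).add(c.target)
    authenticated = reachable(identity_roots, id_edges)

    # Prune certs whose signer is not authenticated.
    certs = [c for c in certs if c.signer in authenticated]

    # Label keys with permission for this branch, again from the roots.
    perm_edges = {}
    for c in certs:
        if c.kind == "permit" and c.value == branch:
            perm_edges.setdefault(c.signer, set()).add(c.target)
    permitted = reachable(permission_roots, perm_edges)

    # What is left leads from both sets of roots through to revisions.
    return {
        c.target for c in certs if c.kind == "branch" and c.signer in permitted
    }


CERTS = [
    Cert("root", "identity", "alice", ""),
    Cert("root", "identity", "bob", ""),
    Cert("root", "permit", "alice", "A"),
    Cert("alice", "branch", "r1", "A"),    # authenticated and permitted
    Cert("mallory", "branch", "r2", "A"),  # signer never authenticated
    Cert("alice", "branch", "r3", "B"),    # outside the selection
    Cert("bob", "branch", "r4", "A"),      # authenticated, not permitted
]
result = trusted_revisions(CERTS, {"root"}, {"root"}, "A")
```

With these example certs only r1 survives every pruning step.  Each pass
walks in one direction, from the roots towards the revisions, which is
the property argued for above.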

Furthermore, this thought model might provide useful design clues for
(especially) permissions rules/certs.

Put it another way.. perhaps one of the reasons the users and the code
want to treat branches specially right now is that, right now, the
*only* trust/permission mechanisms we really have in practical use are
based around branches.  (testresult certs, as the next-in-line
candidate, reveal more about the lack of a stronger permissions system
than they do about a model for further development, I think)

> And there are probably horrible convoluted attacks:
> [..]

Yes, and there are many other less convoluted cases to consider.  I'm
worried about transitive trusts even for simpler scenarios.  Alice
trusts Bob, Bob trusts Charlie.  Charlie does a commit, Bob does a
merge without looking carefully, Alice winds up with trusts on
Charlie's code.  Perhaps this means that Alice's trust in Bob is
misplaced - we just need to consider exactly what we want from a
permissions and trust system, when we come to it.

> I guess the unavoidable conclusion is that you can't determine cert
> trust based on just the contents of the DB plus the current trust
> rules, but also need to know something about the current context?
> (checkout -b B -r t:A-release should work, checkout -b A -r
> t:A-release should silently ignore it?)  This seems bad.  Maybe I'm
> missing something.

I think you might well be right. I think the "-b B" is part of the
"selection criteria" I pruned (early) in my example above.  (in
reality, step 1 is the db, step 2 is the initial selection)

> (Okay, maybe trust issues _have_ been stewing around in the back of my
> head for more than 3 months.  But I don't really understand them, so
> it's maybe premature to use them as arguments in a discussion...)

I think probably we need to start the discussion there, and understand
the trust issues, objectives, requirements, model first. Then allow
that to guide further design or infrastructure change.  It's really
the next major stage of evolution for monotone as I see it.

> This idea does add a significant new piece to the model -- instead of
> the nice clean DAG of snapshots _here_, and generic metadata mechanism
> _there_ setup, we have a piece of magic metadata, that needs to be
> handled differently, etc.

Yeah. Without drastic inherent and evident benefit, I dislike it for
that reason alone, but it's a great thought-provoker.

--
Dan.

Re: branch representation

Wim Oudshoorn
In reply to this post by Nathaniel Smith
Nathaniel Smith <[hidden email]> writes:

> So the idea is: what if we got rid of branch certs, and put a branch
> field inside the revision object?  So each revision is uniquely,
> irrevocably, in a single branch.  So each revision is not just a
> snapshot, but a snapshot with a purpose attached.  And instead of
> automatically putting a branch cert on at commit time, you put a
> "yeah, this is good" cert on (since the rev already has a purpose
> built in, your vague affirmation of goodness can be assumed to match
> that).

I can't really follow what you mean by the "yeah, this is good" cert.
Is that supposed to be a new kind of cert?


> it seems like the code and users both want to think of branch certs
> as at least somewhat special.  I'm thinking of guess_branch, and
> update's tricky handling of branches, and netsync filtering by
> branch...

I think a better way is to deemphasize the importance of the
branch certificate.  I think the concept that it is just
a way to get to a revision is very powerful.  The reason
I think monotone should by default add a branch certificate
at commit time is convenience, nothing more.    
(Oh and it is a huge convenience.)

>
> This idea does add a significant new piece to the model -- instead of
> the nice clean DAG of snapshots _here_, and generic metadata mechanism
> _there_ setup, we have a piece of magic metadata, that needs to be
> handled differently, etc.

I like this clear model.  I don't think putting the branch inside
the revision is going to solve anything.  It will reduce the flexibility
you have with branches and I don't see any inherent advantage.

Also, I think that with the clean structure it has now, monotone is
actually easier to grasp than most other VC systems.  
The hard thing when explaining monotone to others is getting them to
realize that it IS simple; making it more complex to alleviate that is
the wrong direction IMO.

Wim Oudshoorn.




Re: branch representation

Zbynek Winkler
In reply to this post by Nathaniel Smith
Nathaniel Smith wrote:

>So the idea is: what if we got rid of branch certs, and put a branch
>field inside the revision object?  So each revision is uniquely,
>irrevocably, in a single branch.  So each revision is not just a
>snapshot, but a snapshot with a purpose attached.  And instead of
>automatically putting a branch cert on at commit time, you put a
>"yeah, this is good" cert on (since the rev already has a purpose
>built in, your vague affirmation of goodness can be assumed to match
>that).
>  
>
Hmm. I think we first need to decide what *exactly* branches are before
evaluating ideas about changing them.

The first thing that comes to mind is

    Are branches development lines, or just a way to group arbitrary,
    otherwise unrelated revisions?

I think that the term "branch" comes with some kind of expectation.
Almost all VCSs use the term, and the understanding is that it usually
represents a development line and not a bag of things. Do not get me
wrong, I think it is really useful to have a "bag of things", but please
let's not call them branches, because it is IMHO unnecessary confusion
for anyone who has ever used another VCS. Just look at these pictures
http://www.mozilla.org/roadmap/branching-2005-01-25.png
http://www.mozilla.org/roadmap/branching-2002-12-26.png
Everyone wants to talk about stuff like "branch point" and draw the
branches as lines of development. And it is just too hard when your VCS
does not know these terms or (worse) uses them to mean something
completely different.

>Yeah, this is a really weird idea.  It's weird enough that I have
>trouble really imagining it to evaluate it.  (And enough that I
>deserve to be hit repeatedly with a stick for bringing it up when
>we're trying to _stabilize_ things...)
>
>Anyway, this wouldn't help with things that people find
>counter-intuitive like, "this branch is discontinuous", or "this
>branch has multiple branch points", or "this branch has multiple
>heads".  It would help with the confusion about revs being in multiple
>branches, and generally reduce the weirdness we have all the time
>where we have to assume all revs have branch certs on them, and try
>and guess an appropriate branch name given a revision, and so on -- it
>seems like the code and users both want to think of branch certs as at
>least somewhat special.  I'm thinking of guess_branch, and update's
>tricky handling of branches, and netsync filtering by branch...
>  
>
Yes, at least I think that branches are somewhat special and that the
general idea of grouping revisions into a bag can be done by just an
arbitrary cert.

>It might make doing trust stuff significantly easier.
>
I agree. I think it would have a clearer model this way. I would really
like the fact that you need to create a new revision when moving some
changes between branches... because in fact, I see adding new code to a
branch as a new development and not as attachment of some arbitrary meta
data. It would support the "line of development" thing much better.

Zbynek

--
http://zw.matfyz.cz/     http://robotika.cz/
Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic


