[mnet-devel] new ideas for old MetaTracking

Zooko O'Whielacronx zooko at zooko.com
Sat Nov 22 12:18:52 GMT 2003


So a few days ago as I was testing v0.6.2.290-STABLE, I realized that it never 
stopped trying to use old peers even if they were long gone.  This would be a 
big problem -- every time a peer would come and go, your Mnet node would get a 
bit slower since it would try to use that dead peer every time you tried to do 
anything for the rest of your Mnet node's life.

I was trying to figure out how to fix this by having some heuristic about when 
to stop trying to reach a peer.  That obviously risks the opposite problem: 
that a high- quality, reliable peer goes off-line for a day, and when it comes 
back everyone ignores it because their "stop trying dead peers" heuristic has 
kicked in.

I tried to envision a probability distribution that would try absent peers 
often enough to rediscover re-connected ones but not often enough to waste 
your time talking to dead ones as the number of permanently-dead peers grows 
unboundedly.

I couldn't.

Then I wondered how Mojo Nation and Mnet v0.6 had worked as well as they had 
so far.  I realized that Mojo Nation had been using the MetaTrackers for 
liveness-detection all along.  I had previously been thinking of MetaTrackers 
as providing three services: Original Introduction (with the help of bootpages), 
contact-info-lookup, and "please introduce me to some new nodes" discovery.  
Now I realize that MetaTrackers also provide the essential service of "find 
out which nodes are present and which are absent right now".

While thinking about this I also realized that the idea of using consistent 
hashing in metatracking means we can have very scalable metatracking.

The current v0.6.2.290-STABLE metatracking scheme degrades quickly as the 
number of MetaTrackers grows.  That's why I've been advertising for three 
solid MetaTracker operators -- no more and no less.  If we had more than 
three, then node A would send "hello" to MT01, but node B would send 
"lookup contact info" to MT04, and node B would fail to discover node A's 
contact info.  The obvious fix to this of either having nodes talk to all of 
the MetaTrackers or else of having the MetaTrackers talk to each other would 
mean that the load on each MetaTracker increases proportionally to the total 
number of nodes in the network, making metatracking inherently unscalable.

But if node A looks at the XOR metric of his own Id compared with the Ids of 
the MetaTrackers, and then node B looks at the XOR metric of node A's Id 
compared with the Ids of the MetaTrackers, then they will both talk to the 
same MetaTracker, so node B can learn node A's current contact info while 
leaving all of the other MetaTrackers alone.  That means that with regard to 
"lookup contact info", metatracking becomes inherently scalable.  (Leaving 
aside for the moment the issue of how node A and node B get a list of current 
MetaTrackers!)

But what about the "list servers" message, which is used to get to know a 
random assortment of nodes which you have either never previously met, or 
which recently came on-line?

Well, I'm thinking that we can achieve very high, if not perfect, scalability 
by trading-off a different factor: the latency between node A connecting to 
the network and node B learning about node A's liveness!

The neat thing about this, is that increasing that latency sort of *helps* 
rather than hurts, since nodes that just recently joined for the first time, 
or nodes that just recently re-connected for the first time, are more likely 
to disappear again in the future.  We actively *prefer* to avoid meeting nodes 
that have not been connected for a long time.

Let me explain how the new scheme works and you'll see what I mean.

MetaTracking Scheme X:

 * Every 15 minutes, you say "hello" to the MetaTracker whose Id is closest to 
   yours.
 * Whenever you try to contact a peer and the message fails, you say "lookup 
   contact info" to the MetaTracker whose Id is closest to that peer's.
   + After you've done that, if the message fails *again*, you remove that 
     peer from your list of "active peers".  You will never try to talk to him 
     again until the next time you find his Id in a "list servers response" 
     message.
 * Every 15 minutes, you say "list servers" to a MetaTracker.  No matter how 
   many MetaTrackers there are, you say "list servers" to only one of them.  
   You have a list of known MetaTrackers, and you are iterating through that 
   list, querying the next MetaTracker for its known servers every 15 minutes.
   Whenever you say "list servers" to a MetaTracker, it dumps back the list of 
   *all* servers that have said "hello" to it within the last 15 minutes.

That's it!  That's MetaTracking Scheme X.

Now suppose that you have an Mnet with K nodes and M MetaTrackers, and 
everything is working.  Now supposed that K doubles and M doubles.  What 
changes?

Well, still ignoring for the moment the question of how the nodes learn about 
the MetaTrackers, the load of handling "hello" and "lookup contact info" on 
each MetaTracker *doesn't change at all*.  No matter what size the network is, 
each MetaTracker has to handle only K/M "hello"'s per unit of time and K/M 
"lookup contact info"'s per unit of time.

Likewise, the load of handling "list servers" messages doesn't change at 
all -- each MetaTracker receive's K/M "list servers" messages per unit of time.

The big thing that changes about discovering servers is that the average 
latency between a node A (re-)connecting to the network and node A being 
discovered by node B doubles.  As I've said, this is almost a feature rather 
than a bug, since we wish to discriminate against newcomers.  Keep in mind 
that once node B discovers node A, then node B will never need to *rediscover* 
node A until the following sequence occurs:  1.  node A disconnects from the 
network, and doesn't send a "hello" for 15 minutes.  2.  node B tries to send 
a message to node A, fails (because node A isn't connected), sends a "lookup 
contact info", and gets a response saying that no current contact info is 
available.  3.  node A later reconnects to the network.

Now the load does change on the client side.  The number of peers that each 
node has to keep track of doubles.  This is one place where Mnet diverges from 
current academic p2p research -- we don't care (yet) about the asymptotic 
cost of each node having to know about each other node, only about the 
concrete cost.  For example, if each node knows about 1,000,000 other nodes, 
that requires only a few megabytes of space and only a few seconds of 
computation.  (As long as we're good about how we store and how we 
compute...  ;-))

I would very much like to hear your feedback about this, but be warned that 
I don't want to worry about any micro-optimizations at this point.  For 
example, suppose that instead of sending the complete list of known servers in 
every "list servers response" message, we instead sent only servers that the 
querier doesn't already know, or something like that.  That might (or might 
not) reduce the load on the MetaTracker, but *only by a constant factor*.

I don't want to add complexity to the design, protocols, or implementation in 
order to increase the concrete scalability of an individual MetaTracker 
*unless* the current implementation isn't good enough for the current load.

A word about the provenance of these ideas:  I've titled the message "new 
ideas...", but it's likely that some or all of these ideas were already 
envisioned by Jim McCoy, Doug Barnes, and Greg Smith years ago.  Indeed, some 
of them were envisioned by *me* already, but they are newly understood by me 
in this context when other complicating or competing ideas are stripped out.  
For example, the "MetaTracking Scheme X" in this document eliminates the 
notion that you query a MetaTracker "when needed" -- when you've decided that 
you want to know more servers, in favor of the notion that you query every 
15 minutes rain-or-shine.  The presence of the "query when you feel like you 
want to know more nodes" concept obscured from me the necessity of the "query 
every so often" concept in the original Mojo Nation MetaTracking design.  The 
path I took to rediscovering the latter was to try to boil down the 
MetaTracking design to only one concept, choosing the former concept as the 
one to keep, then discovering that the latter concept was required and 
switching to the latter concept as the one to keep.

Regards,

Zooko



-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
Does SourceForge.net help you be more productive?  Does it
help you create better code?  SHARE THE LOVE, and help us help
YOU!  Click Here: http://sourceforge.net/donate/
_______________________________________________
mnet-devel mailing list
mnet-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mnet-devel




More information about the Mnet-devel mailing list