[mnet-devel] new ideas for old MetaTracking
Zooko O'Whielacronx
zooko at zooko.com
Sat Nov 22 12:18:52 GMT 2003
So a few days ago as I was testing v0.6.2.290-STABLE, I realized that it never
stopped trying to use old peers even if they were long gone. This would be a
big problem -- every time a peer would come and go, your Mnet node would get a
bit slower since it would try to use that dead peer every time you tried to do
anything for the rest of your Mnet node's life.
I was trying to figure out how to fix this by having some heuristic about when
to stop trying to reach a peer. That obviously risks the opposite problem:
that a high- quality, reliable peer goes off-line for a day, and when it comes
back everyone ignores it because their "stop trying dead peers" heuristic has
kicked in.
I tried to envision a probability distribution that would try absent peers
often enough to rediscover re-connected ones but not often enough to waste
your time talking to dead ones as the number of permanently-dead peers grows
unboundedly.
I couldn't.
Then I wondered how Mojo Nation and Mnet v0.6 had worked as well as they had
so far. I realized that Mojo Nation had been using the MetaTrackers for
liveness-detection all along. I had previously been thinking of MetaTrackers
as providing three services: Original Introduction (with the help of bootpages),
contact-info-lookup, and "please introduce me to some new nodes" discovery.
Now I realize that MetaTrackers also provide the essential service of "find
out which nodes are present and which are absent right now".
While thinking about this I also realized that the idea of using consistent
hashing in metatracking means we can have very scalable metatracking.
The current v0.6.2.290-STABLE metatracking scheme degrades quickly as the
number of MetaTrackers grows. That's why I've been advertising for three
solid MetaTracker operators -- no more and no less. If we had more than
three, then node A would send "hello" to MT01, but node B would send
"lookup contact info" to MT04, and node B would fail to discover node A's
contact info. The obvious fix to this of either having nodes talk to all of
the MetaTrackers or else of having the MetaTrackers talk to each other would
mean that the load on each MetaTracker increases proportionally to the total
number of nodes in the network, making metatracking inherently unscalable.
But if node A looks at the XOR metric of his own Id compared with the Ids of
the MetaTrackers, and then node B looks at the XOR metric of node A's Id
compared with the Ids of the MetaTrackers, then they will both talk to the
same MetaTracker, so node B can learn node A's current contact info while
leaving all of the other MetaTrackers alone. That means that with regard to
"lookup contact info", metatracking becomes inherently scalable. (Leaving
aside for the moment the issue of how node A and node B get a list of current
MetaTrackers!)
But what about the "list servers" message, which is used to get to know a
random assortment of nodes which you have either never previously met, or
which recently came on-line?
Well, I'm thinking that we can achieve very high, if not perfect, scalability
by trading-off a different factor: the latency between node A connecting to
the network and node B learning about node A's liveness!
The neat thing about this, is that increasing that latency sort of *helps*
rather than hurts, since nodes that just recently joined for the first time,
or nodes that just recently re-connected for the first time, are more likely
to disappear again in the future. We actively *prefer* to avoid meeting nodes
that have not been connected for a long time.
Let me explain how the new scheme works and you'll see what I mean.
MetaTracking Scheme X:
* Every 15 minutes, you say "hello" to the MetaTracker whose Id is closest to
yours.
* Whenever you try to contact a peer and the message fails, you say "lookup
contact info" to the MetaTracker whose Id is closest to that peer's.
+ After you've done that, if the message fails *again*, you remove that
peer from your list of "active peers". You will never try to talk to him
again until the next time you find his Id in a "list servers response"
message.
* Every 15 minutes, you say "list servers" to a MetaTracker. No matter how
many MetaTrackers there are, you say "list servers" to only one of them.
You have a list of known MetaTrackers, and you are iterating through that
list, querying the next MetaTracker for its known servers every 15 minutes.
Whenever you say "list servers" to a MetaTracker, it dumps back the list of
*all* servers that have said "hello" to it within the last 15 minutes.
That's it! That's MetaTracking Scheme X.
Now suppose that you have an Mnet with K nodes and M MetaTrackers, and
everything is working. Now supposed that K doubles and M doubles. What
changes?
Well, still ignoring for the moment the question of how the nodes learn about
the MetaTrackers, the load of handling "hello" and "lookup contact info" on
each MetaTracker *doesn't change at all*. No matter what size the network is,
each MetaTracker has to handle only K/M "hello"'s per unit of time and K/M
"lookup contact info"'s per unit of time.
Likewise, the load of handling "list servers" messages doesn't change at
all -- each MetaTracker receive's K/M "list servers" messages per unit of time.
The big thing that changes about discovering servers is that the average
latency between a node A (re-)connecting to the network and node A being
discovered by node B doubles. As I've said, this is almost a feature rather
than a bug, since we wish to discriminate against newcomers. Keep in mind
that once node B discovers node A, then node B will never need to *rediscover*
node A until the following sequence occurs: 1. node A disconnects from the
network, and doesn't send a "hello" for 15 minutes. 2. node B tries to send
a message to node A, fails (because node A isn't connected), sends a "lookup
contact info", and gets a response saying that no current contact info is
available. 3. node A later reconnects to the network.
Now the load does change on the client side. The number of peers that each
node has to keep track of doubles. This is one place where Mnet diverges from
current academic p2p research -- we don't care (yet) about the asymptotic
cost of each node having to know about each other node, only about the
concrete cost. For example, if each node knows about 1,000,000 other nodes,
that requires only a few megabytes of space and only a few seconds of
computation. (As long as we're good about how we store and how we
compute... ;-))
I would very much like to hear your feedback about this, but be warned that
I don't want to worry about any micro-optimizations at this point. For
example, suppose that instead of sending the complete list of known servers in
every "list servers response" message, we instead sent only servers that the
querier doesn't already know, or something like that. That might (or might
not) reduce the load on the MetaTracker, but *only by a constant factor*.
I don't want to add complexity to the design, protocols, or implementation in
order to increase the concrete scalability of an individual MetaTracker
*unless* the current implementation isn't good enough for the current load.
A word about the provenance of these ideas: I've titled the message "new
ideas...", but it's likely that some or all of these ideas were already
envisioned by Jim McCoy, Doug Barnes, and Greg Smith years ago. Indeed, some
of them were envisioned by *me* already, but they are newly understood by me
in this context when other complicating or competing ideas are stripped out.
For example, the "MetaTracking Scheme X" in this document eliminates the
notion that you query a MetaTracker "when needed" -- when you've decided that
you want to know more servers, in favor of the notion that you query every
15 minutes rain-or-shine. The presence of the "query when you feel like you
want to know more nodes" concept obscured from me the necessity of the "query
every so often" concept in the original Mojo Nation MetaTracking design. The
path I took to rediscovering the latter was to try to boil down the
MetaTracking design to only one concept, choosing the former concept as the
one to keep, then discovering that the latter concept was required and
switching to the latter concept as the one to keep.
Regards,
Zooko
-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
Does SourceForge.net help you be more productive? Does it
help you create better code? SHARE THE LOVE, and help us help
YOU! Click Here: http://sourceforge.net/donate/
_______________________________________________
mnet-devel mailing list
mnet-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mnet-devel
More information about the Mnet-devel
mailing list