[mnet-devel] idea for download strategy

Zooko zooko at zooko.com
Fri Mar 28 01:38:21 GMT 2003


This isn't fully-formed.  It draws from several sources: old download code (Jim, 
Bram Cohen, Greg Smith), BlockWrangler/GSR (me, Hauke, Myers), 
BlockWrangler/KISS (Luke), and general IRC chitchat (et al.).  It's similar to 
current BlockWrangler/GSR.

So you want to download a file.  You'll do the following thing over and over 
(*when* exactly to do it is part of the issue, but we'll worry about that 
later).

First, if some blocks have been located, download the best one.  (Which is the 
best?  We'll worry about that later.)

Second, if you don't have enough currently-outstanding "do you have blocks" 
requests for discovering block locations, then choose the block that you most 
want to locate (which is that?  We'll worry about it later.), and pick the best 
blockserver for that block (how?  We'll worry about that later.), and send a "do 
you have blocks" message to that blockserver.  It's a waste to send a "do you 
have blocks" message with only a single blockId in it, so fill it out with the 
best 32 blockIds for that blockserver.  (How do you choose the best blockIds for 
a blockserver?  We'll worry about that later.)
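Here is a rough Python sketch of one pass of that loop. It assumes a strategy object that answers every "we'll worry about that later" question. All names here (DownloadState, NaiveStrategy, step, MAX_OUTSTANDING_DYHB) are hypothetical and are not BlockWrangler's actual API. NaiveStrategy is only a placeholder that picks the lowest id everywhere.

```python
from dataclasses import dataclass, field

DYHB_BATCH_SIZE = 32      # "fill it out with the best 32 blockIds"
MAX_OUTSTANDING_DYHB = 4  # hypothetical: the post only says "enough"


@dataclass
class DownloadState:
    wanted: set                                  # blockIds still needed
    located: dict = field(default_factory=dict)  # blockId -> servers claiming to have it
    outstanding_dyhb: int = 0                    # "do you have blocks" requests in flight
    sent: list = field(default_factory=list)     # (message, server, payload) log


class NaiveStrategy:
    """Placeholder strategy: every "worry about it later" answer is lowest-id-first."""

    def __init__(self, servers):
        self.servers = sorted(servers)

    def best_block_to_download(self, blocks):
        return min(blocks)

    def best_server_for_download(self, block, holders):
        return min(holders)

    def block_most_wanted_to_locate(self, blocks):
        return min(blocks)

    def best_server_to_ask(self, block):
        return self.servers[0]

    def rank_block_ids_for_server(self, server, blocks):
        return sorted(blocks)


def step(state, strategy):
    # First: if some wanted blocks have been located, request the best one.
    located = [b for b in state.located if b in state.wanted]
    if located:
        block = strategy.best_block_to_download(located)
        server = strategy.best_server_for_download(block, state.located[block])
        state.sent.append(("request block", server, block))

    # Second: if too few dyhb requests are in flight, send another one.
    unlocated = [b for b in state.wanted if b not in state.located]
    if unlocated and state.outstanding_dyhb < MAX_OUTSTANDING_DYHB:
        target = strategy.block_most_wanted_to_locate(unlocated)
        server = strategy.best_server_to_ask(target)
        # Pad the query out to DYHB_BATCH_SIZE with the best other blockIds.
        ranked = strategy.rank_block_ids_for_server(server, unlocated)
        batch = [target] + [b for b in ranked if b != target][: DYHB_BATCH_SIZE - 1]
        state.sent.append(("do you have blocks", server, batch))
        state.outstanding_dyhb += 1
```

If you call step() repeatedly as written, it keeps re-requesting the same located block. Deciding *when* to act is deliberately left open here, just as it is in the post.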

Okay, I think that's it.  This is what BlockWrangler currently does, and all the 
"worry about it later" parts are supposed to be delegated to a 
BlockWranglingStrategy.  A particularly subtle one is "when do we act and when 
do we wait?".

BlockWrangler has lots of reality-check assertions to make sure that nothing 
silly happens, such as sending a "do you have blocks" request for a block that 
you have already downloaded.

(It even has an assertion like "You idiot!  You want block XYZ, and server ABC 
has said that he has it, so why are you sitting there and not sending 'request 
block' messages to anyone right now?".)
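Those checks could look something like the following sketch. The function name and the shapes of its arguments are hypothetical. The second check reads "not sending 'request block' messages to anyone" as "no request block is outstanding at all", which is one plausible interpretation. The sketch raises AssertionError explicitly instead of using assert, so the checks survive `python -O`.

```python
def check_invariants(wanted, downloaded, located, outstanding_dyhb, outstanding_requests):
    """Hypothetical BlockWrangler-style reality checks; raises AssertionError on nonsense.

    wanted:               set of blockIds still needed
    downloaded:           set of blockIds already in hand
    located:              dict blockId -> set of servers that said they have it
    outstanding_dyhb:     dict server -> set of blockIds in in-flight dyhb queries
    outstanding_requests: set of blockIds with an in-flight "request block"
    """
    # Never ask "do you have blocks" about a block we already downloaded.
    for server, block_ids in outstanding_dyhb.items():
        stale = block_ids & downloaded
        if stale:
            raise AssertionError(f"asked {server} about already-downloaded {sorted(stale)}")

    # "You idiot!": a wanted block is located, yet no "request block" is in flight at all.
    ready = sorted(b for b in wanted if located.get(b))
    if ready and not outstanding_requests:
        raise AssertionError(f"blocks {ready} are located; why aren't you requesting any?")
```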

So, we should probably use the same sort of design for the new download system.  
I haven't looked at the code, but I presume this is somewhat close to what Myers 
has already implemented.

Okay, all of this was just preface for the 'strategy' part.  Here is a proposed 
strategy.  Like I say, it isn't very well thought-out.

The first question is when to act and when not.  But I don't want to think about 
it right now.  Let's assume that this is pretty much independent of the other 
questions.

The next question is, which block is best to download among located blocks?  
This is probably the same as the question of which block is best to search for 
among unlocated blocks.  Possible answers include: (a) the block that is on the 
fastest blockserver, (b) the block that is earliest in the stream (for 
incremental download/incremental display), (c) the block that is least widely 
replicated (for robustness -- in case the last copy is about to go off-line).

Well, (c) is not really known to us until we've gotten lots of "dyhb" responses 
about that block.  We don't want to wait for lots of "dyhb" responses before 
we start downloading, so I don't see how to do (c) effectively.  I'll ignore 
it.

We don't *currently* do incremental anything, so I'll ignore (b).  There, that 
simplifies things.  So I search for blocks, and download blocks, from the 
fastest blockserver.  The definition of "fastest" is actually pretty complicated, 
though!  For downloading blocks it is simple: the fastest blockserver is the 
one that has the highest "download block rate", where "download block rate" is 
defined as the average success rate of "request block" requests divided by the 
average turnaround time of those requests.
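Here is a minimal sketch of that measure. Note that it divides instead of multiplying. If "rate" means successful blocks per unit of time, then it is the success rate divided by the average turnaround time; multiplying by turnaround time would favor slower servers. The function names are hypothetical.

```python
from statistics import mean


def download_block_rate(turnaround_secs, successes, attempts):
    """Blocks per second one blockserver delivers on "request block" (hypothetical helper).

    turnaround_secs: turnaround times, in seconds, of answered "request block" requests
    successes, attempts: how many "request block" requests delivered / were sent
    """
    if attempts == 0 or not turnaround_secs:
        return 0.0  # no evidence yet; a real strategy might prefer an optimistic prior
    success_rate = successes / attempts
    return success_rate / mean(turnaround_secs)


def fastest_server(stats):
    """stats: dict server -> (turnaround_secs, successes, attempts)."""
    return max(stats, key=lambda s: download_block_rate(*stats[s]))
```

For example, a server that always answers in 0.2 s beats one that answers in 2 s. It also beats one that is just as quick but delivers only one request in ten.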

For searching for blocks, it is more complicated.  The "fastest" server is the 
one with the highest "locate and download block rate", where "locate and 
download block rate" is defined as the "locate and download block success rate" 
divided by the "locate and download block turnaround time".  The turnaround 
time runs from sending a "do you have blocks" to receiving a "request block 
response".  That is: in order to download a block from a server, you have 
to send it a "do you have blocks", receive a "do you have blocks response", then 
send it a "request block", then receive a "request block response".  The "locate 
and download block turnaround time" is the time for that entire 4-message 
sequence to complete.
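One way to measure the "locate and download block turnaround time" is to timestamp each (server, blockId) pair when the dyhb goes out. The clock stops when the matching "request block response" arrives, so the measurement spans all four messages. This is a hypothetical sketch, and the clock is injectable so that it can be tested.

```python
import time


class LocateAndDownloadTimer:
    """Times dyhb -> dyhb response -> request block -> request block response."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.started = {}  # (server, blockId) -> when the first dyhb about it was sent
        self.samples = {}  # server -> completed turnaround times, in clock units

    def dyhb_sent(self, server, block_ids):
        now = self.clock()
        for b in block_ids:
            # Keep the earliest send time if we asked this server about b before.
            self.started.setdefault((server, b), now)

    def block_received(self, server, block_id):
        """Record a "request block response"; returns the elapsed time, or None if untracked."""
        start = self.started.pop((server, block_id), None)
        if start is None:
            return None
        elapsed = self.clock() - start
        self.samples.setdefault(server, []).append(elapsed)
        return elapsed
```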

The "locate and download block success rate" is the ratio of blocks that you got 
from that server to blocks that you wanted from that server.  If you decide that 
you don't want the block after all, then it doesn't count in the "locate and 
download block success rate".  (This can happen because you got the block from 
someone else or because you completed the file using alternate shares.  But if 
the user cancels a download then we *should* count it against everyone that we 
are currently trying to get blocks from, since the user might have cancelled out 
of impatience.)  If the server says it doesn't have the block, or says that it 
does but then never delivers it, then that counts against its success rate.
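The bookkeeping rules in this paragraph fit in a small per-server ledger. This sketch is hypothetical, and the class and method names are invented here. It counts a block against a server only when that server fails or the user cancels. It drops the block without counting when the block stops being wanted for other reasons.

```python
class SuccessLedger:
    """Hypothetical per-server tally for the "locate and download block success rate"."""

    def __init__(self):
        self.counted = {}  # server -> blocks wanted from it that count toward the rate
        self.got = {}      # server -> blocks it actually delivered
        self.pending = {}  # server -> blockIds we are still trying to get from it

    def want(self, server, block_id):
        self.pending.setdefault(server, set()).add(block_id)

    def delivered(self, server, block_id):
        self.pending.setdefault(server, set()).discard(block_id)
        self._tally(server, success=True)

    def failed(self, server, block_id):
        # Server said it lacks the block, or claimed it and never delivered.
        self.pending.setdefault(server, set()).discard(block_id)
        self._tally(server, success=False)

    def no_longer_wanted(self, block_id):
        # Got it elsewhere or finished via alternate shares: doesn't count at all.
        for blocks in self.pending.values():
            blocks.discard(block_id)

    def user_cancelled(self):
        # Possibly impatience: count every pending block against its server.
        for server, blocks in self.pending.items():
            for _ in blocks:
                self._tally(server, success=False)
            blocks.clear()

    def _tally(self, server, success):
        self.counted[server] = self.counted.get(server, 0) + 1
        if success:
            self.got[server] = self.got.get(server, 0) + 1

    def rate(self, server):
        n = self.counted.get(server, 0)
        return self.got.get(server, 0) / n if n else None
```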

Okay, this sounds fine so far, but now I'm not sure about this:

Suppose I keep sending "dyhb" queries to the same blockserver asking about the 
same blocks.  Maybe I just haven't received its reply yet!  So there should be 
a constraint (ideally enforced by the BlockWrangler-type enforcer) that you 
don't send a dyhb query to a server about a block that is already covered by an 
outstanding dyhb request to that server ("outstanding" meaning it hasn't passed 
its soft time-out).
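Here is a sketch of that constraint. SOFT_TIMEOUT_SECS is a hypothetical value, because the post doesn't give one. Before a dyhb is sent, the sketch drops any blockId that this server was already asked about within the soft time-out, and it records the rest.

```python
import time

SOFT_TIMEOUT_SECS = 30.0  # hypothetical; the post doesn't specify a value


class OutstandingDyhb:
    """Tracks which (server, blockId) pairs have a dyhb query still within its soft time-out."""

    def __init__(self, soft_timeout=SOFT_TIMEOUT_SECS, clock=time.monotonic):
        self.soft_timeout = soft_timeout
        self.clock = clock
        self.sent_at = {}  # (server, blockId) -> when we last asked that server about it

    def is_outstanding(self, server, block_id):
        t = self.sent_at.get((server, block_id))
        return t is not None and self.clock() - t < self.soft_timeout

    def filter_and_record(self, server, block_ids):
        """Drop blockIds already outstanding at this server; record and return the rest."""
        fresh = [b for b in block_ids if not self.is_outstanding(server, b)]
        now = self.clock()
        for b in fresh:
            self.sent_at[(server, b)] = now
        return fresh

    def answered(self, server, block_ids):
        """A dyhb response arrived, so these pairs are no longer outstanding."""
        for b in block_ids:
            self.sent_at.pop((server, b), None)
```

The constraint is per server. Asking a *different* server about the same block is still allowed.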

Okay, I'll stop for now.  Any holes in this so far?

Oh -- there's a huge issue that I forgot to mention above.  You want to use the 
XOR metric to identify which blockservers are most likely to have which blocks 
before you query them.  How do you combine that with motivation (a): to use the 
fastest blockserver?  My current idea is sort of sloppy: use the 
sum-of-squares-of-badness, i.e., the square of the XOR distance (between the 
blockId and the blockserver's id) plus the square of the inverse of the "locate 
and download block rate".  This is what MojoHandicapper has done since days of 
yore.
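Here is a sketch of that combined score, where lower is better. The XOR distance is taken between the blockId and the blockserver's id. Scaling it into [0, 1) by dividing by 2**ID_BITS is an assumption added here so that the two terms are roughly comparable, since raw distances between long ids would swamp the rate term. ID_BITS is itself a hypothetical parameter. This is not MojoHandicapper's actual code.

```python
ID_BITS = 160  # hypothetical id width; the post doesn't specify one


def badness(block_id, server_id, rate, id_bits=ID_BITS):
    """Sum-of-squares-of-badness for asking server_id about block_id; lower is better.

    rate: that server's "locate and download block rate" (blocks per second).
    """
    if rate <= 0:
        return float("inf")  # never-successful server: worst possible score
    xor_dist = (block_id ^ server_id) / 2 ** id_bits  # scaled to [0, 1): an assumption
    return xor_dist ** 2 + (1.0 / rate) ** 2


def best_server(block_id, servers, id_bits=ID_BITS):
    """servers: dict server_id -> locate-and-download rate."""
    return min(servers, key=lambda s: badness(block_id, s, servers[s], id_bits))
```

With equal rates the closer server in XOR space wins. A much faster server can still win despite being farther away, which is the trade-off the post describes.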

Regards,

Zooko

http://zooko.com/
         ^-- under re-construction: some new stuff, some broken links



_______________________________________________
mnet-devel mailing list
mnet-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mnet-devel