[mnet-devel] about handling transient network outages
Arno Waschk
hamamatsu at gmx.de
Sun Mar 7 23:42:05 GMT 2004
Okay, it looks like osh got his MT working, so we should add him to the
bootpages. How can I tell which bootpages contain useful data?
Where can bootpage owners get current info if they want to update?
Is there a way to avoid unchangeable, hardcoded bootpages in a package?
Couldn't we load a page from mnet-webcvs on sf.net?
Can we have timestamps in the bootpage info, so that a broker could load
all of them and decide which one is the latest?
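A minimal sketch of that timestamp idea, assuming each bootpage is parsed
into a dict carrying a timestamp (seconds since the epoch) next to its MT
contact info. The key names and the dict layout are hypothetical, not the
actual bootpage format:

```python
def pick_latest_bootpage(bootpages):
    """Return the most recently updated bootpage that lists at least one MT.

    `bootpages` is a list of dicts. The 'timestamp' and 'metatrackers' keys
    are hypothetical names for this sketch, not the real bootpage format.
    Returns None if no bootpage is usable.
    """
    usable = [
        bp for bp in bootpages
        if bp.get("metatrackers") and bp.get("timestamp") is not None
    ]
    if not usable:
        return None
    # The newest timestamp wins; stale pages are simply ignored.
    return max(usable, key=lambda bp: bp["timestamp"])
```

A broker could call this after fetching every bootpage it knows about, so
that one stale page cannot hide a fresher one.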
But first we need to enable brokers to load them. I am afraid we never
really reach the point of loading them (except at fresh broker startup),
since there is always at least one MT (even a stale one) reported by
peerman, no?
And we skip sending a hello in many cases where a hello would be sensible.
For instance: at startup, the first hello is scheduled quickly. The code
decides not to send it, since usually no commstrat has been calculated
for us yet. But it does not schedule a new one. It does, however, set a
timestamp, self._lasthellotime (IIRC), no matter whether the hello was
sent or replied to. Usually at startup another hello is scheduled 1
minute later, which won't happen either: that timestamp is checked, and
the code decides that we must not hello because we just did and have no
new info to tell (although we _should_ have), and no other hello is
scheduled instead. And so on and so on... That is the slow startup bug,
BTW, but I am sure this suppresses sensible hello messages in many other
cases. Remember that we always try a small selection of known MTs, no
matter how much we know about their responsiveness.
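To make the intended behaviour concrete, here is a toy model of the hello
logic, not the real code. The class name, the two constants and the
schedule callback are hypothetical; only self._lasthellotime is taken from
the description above (and that name is from memory). The two points are:
update the timestamp only when a hello is really sent, and always queue
another attempt when one is skipped.

```python
class HelloScheduler:
    """Toy model of the hello logic; every name here is hypothetical."""

    MIN_INTERVAL = 60.0  # assumed minimum spacing between hellos, in seconds
    RETRY_DELAY = 5.0    # assumed retry delay while we have no comm strat

    def __init__(self, schedule):
        self._schedule = schedule    # callable taking a delay in seconds
        self._lasthellotime = None   # time of the last hello actually sent
        self.comm_strat = None       # set once our own comm strat is known
        self.sent = []               # record of send times, for inspection

    def maybe_send_hello(self, now):
        if self.comm_strat is None:
            # Nothing useful to announce yet. Leave the timestamp alone
            # and make sure another attempt is queued.
            self._schedule(self.RETRY_DELAY)
            return False
        if (self._lasthellotime is not None
                and now - self._lasthellotime < self.MIN_INTERVAL):
            # Too soon after a real hello: retry once the interval is over.
            self._schedule(self.MIN_INTERVAL - (now - self._lasthellotime))
            return False
        self.sent.append(now)
        self._lasthellotime = now    # only updated on an actual send
        return True
```

With this shape a skipped hello can never block the next one, because the
rate limit only ever looks at hellos that really went out.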
Too tired to continue; there are quite a bunch more things like that... Arno
On 7 Mar 2004 17:25:09 -0500, Zooko O'Whielacronx <zooko at zooko.com> wrote:
>
> It would be nice if your node remembered the comm strats for peers, even
> while those same comm strats are failing due to a transient network
> error, and tried them again later. I don't think that it does, though. I
> think that it forgets all the comm strats when they fail, and tries to
> lookup new ones, which of course doesn't work during a network outage.
> So I *think* that currently you don't reconnect to the network after a
> (sufficiently long) outage until you manage to load a bootpage and then
> contact a MT with the contact info from the bootpage.
>
> Since currently we have only one working MT (mine) and currently of the
> 7 or so bootpages probably only 1 or so has my current IP address, this
> can take a long time.
>
> Easy suggestion, for v0.6.2: ping the bootpage operators, prune all
> hardcoded bootpage URLs that point to bootpages that don't contain at
> least one good MT contact info, set up several more MTs (like,
> half-a-dozen would be great), put the IPs (or better yet DNS names) of
> those MTs into all the remaining bootpages.
>
> This should be done anyway to increase the robustness and responsiveness
> of the network.
>
> Difficult suggestion, perhaps best for v0.6.3: dig into the Byzantine,
> infinitely sophisticated guts of the v0.6 comms code and figure out how
> to make it keep old comm strats for later use without allowing those old
> comm strats to keep fresh new comm strats out, and without tricking the
> higher-layer code into thinking dead nodes are still active. This
> *might* be really easy, I don't know. There is a concept of "choose
> between these two comm strategies", and it is encapsulated into a single
> method ("CommStrat.choose_best_strategy()"), so I *think* if you make
> that function always choose the newer/better replacement then you can
> safely keep the old comm strats around and retry them.
>
> Regards,
>
> Zooko
>
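The "keep old comm strats but always prefer the newer one" idea quoted
above could look roughly like this. It is only a sketch: the Strat and
PeerStrats classes, their fields and their method names are hypothetical,
and it does not reproduce the real CommStrat.choose_best_strategy()
signature, which is not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Strat:
    address: str        # how to reach the peer (hypothetical field)
    learned_at: float   # when we learned this strat, seconds since epoch


@dataclass
class PeerStrats:
    active: Optional[Strat] = None
    retired: List[Strat] = field(default_factory=list)

    def offer(self, new):
        """Always prefer the newer strat; keep the loser for later retry."""
        if self.active is None or new.learned_at >= self.active.learned_at:
            if self.active is not None:
                self.retired.append(self.active)
            self.active = new
        else:
            self.retired.append(new)

    def mark_failed(self):
        """A failed strat is set aside, not forgotten."""
        if self.active is not None:
            self.retired.append(self.active)
            self.active = None

    def retry_candidates(self):
        """Old strats to try again after an outage, newest first."""
        return sorted(self.retired, key=lambda s: s.learned_at, reverse=True)
```

Because a failed strat leaves `active` empty, higher-layer code would not
mistake the peer for a live one, while the old contact info stays
available for a retry once the network comes back.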
--
http://www.arnowaschk.de
_______________________________________________
mnet-devel mailing list
mnet-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mnet-devel