cosign-discuss at umich.edu
general discussion of cosign development and deployment
Re: Cosign Loop Breaking
Thanks, Wes.
I think the whole idea has some merit, and wouldn't be a bad thing to
have. But given that the root cause was a fairly fundamental bug with respect to
redundancy in the Java filter, and not the result of a service's
configuration error, I'm thinking it's not all that likely we'd see
looping behavior in the future. The chance is still there due to the
redirecting nature of COSIGN, but we don't have any evidence yet of
folks like me DoSing the system just by configuring it wrong on a given
service, and the core system itself is only going to get more reliable.
So, if I had my way, I'd still put that idea I once mentioned about
having different session timeouts for different services higher in the
development queue. 8)
Not that I don't appreciate you all wanting to put effort into
reliability and performance--I really, really do. All I'm saying is, to
the extent I have influence over it, I want to be protective of y'all's
time.
Wesley D Craig wrote:
On 08 Jun 2004, at 15:37, Cory Snavely wrote:
But maybe I've just offended the person with the weirdo application
8), and maybe the rationale for having protection from this in COSIGN
is clearer if we know the types of misconfigurations where this can
happen. Could you share more details?
cosignd replication is lazy and lossy. There is no requirement in the
protocol for all of the cosignd's to have exactly the same data. This
managed inconsistency is dealt with by having the cosign filters query
multiple cosignd's if the first cosignd responds with "I don't know."
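A minimal sketch of that failover rule, assuming each cosignd can be modeled as a lookup that either returns the session data it holds for a cookie or answers "I don't know" (here, `None`). The names `check_cookie` and `check_cookie_first_only` are hypothetical illustrations, not the real cosign filter API or wire protocol. The second function mimics the Java filter bug described next: it asks only one server, so a lazily replicated miss turns into a fresh login redirect.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model: a "server" is a callable that returns the session
# data it holds for a cookie, or None meaning "I don't know."
Lookup = Callable[[str], Optional[str]]


def check_cookie(servers: Sequence[Lookup], cookie: str) -> Optional[str]:
    """Ask each server in turn and stop at the first one that knows.

    Returns None only if every server says "I don't know", which is the
    point at which a filter would redirect the user to log in again.
    """
    for lookup in servers:
        answer = lookup(cookie)
        if answer is not None:
            return answer  # definitive answer; no need to ask the rest
    return None


def check_cookie_first_only(servers: Sequence[Lookup], cookie: str) -> Optional[str]:
    """Buggy variant: never fails over past the first server."""
    return servers[0](cookie) if servers else None


# Lazy, lossy replication: the first server has not received the session yet.
stale = {}.get
fresh = {"cookie-abc": "alice"}.get
```

With the stale server listed first, `check_cookie([stale, fresh], "cookie-abc")` finds `"alice"` on the second server, while `check_cookie_first_only` returns `None` and would send the user back to the login server even though another cosignd already knows the session.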
At UMich, we saw the looping problem in a big way due to a bug in one of
the cosign filter implementations. In particular, the java cosign
filter had an error where it would not fail over to other cosignd's when
the information it was looking for was not present on the one-and-only
cosignd it was currently connected to.
The effect of this bug was that users would visit the protected service,
be redirected to the cosign registration service, be redirected back to
the service, ad nauseam. Since the protected service was moderately
popular, thousands of users bounced back and forth between the service
and cosign, bringing both to their knees and causing a campus-wide cosign
outage.
The bug wasn't immediately obvious. As I recall, a misconfiguration in
a firewall precipitated the problem. One of the three campus cosign
servers is located behind a departmental firewall. When the firewall
was briefly misconfigured, the cosign-protected application that this
department runs was able to get to the cosign server, but no users were
able to get to the same cosign server's web interface. Also, the other
two cosign servers weren't able to send data to the third cosign server,
causing their databases to be quite inconsistent.
So, we fixed the bug in the java filter. However, if we had had the
loop-breaking code in place, 1) the problem would have been more
obvious, and 2) we would not have had a campus-wide cosign outage.
Campus-wide cosign outages are a real downer for me, so we spent some time
thinking about how we could make sure we never, ever have another one. This
is how we came up with the loop-breaking code.
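To make the idea concrete, here is a hypothetical sketch of loop breaking, not a description of cosign's actual code: count how often the same client is sent back for login within a short window, and once a threshold is exceeded, stop redirecting and show an error instead. The class name, the `max_redirects` and `window` parameters, and keeping the counter in server-side memory are all assumptions for illustration; a real filter might carry the count in a cookie instead.

```python
import time
from typing import Dict, Optional, Tuple


class LoopBreaker:
    """Hypothetical loop breaker that counts recent login redirects per client.

    If a client needs more than `max_redirects` redirects within `window`
    seconds, should_redirect() returns False so the caller can show an
    error page instead of bouncing the user back to the login server.
    """

    def __init__(self, max_redirects: int = 5, window: float = 30.0) -> None:
        self.max_redirects = max_redirects
        self.window = window
        # client id -> (start time of the current window, redirects counted in it)
        self._seen: Dict[str, Tuple[float, int]] = {}

    def should_redirect(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        start, count = self._seen.get(client_id, (now, 0))
        if now - start > self.window:
            # The previous window has expired; start counting afresh.
            start, count = now, 0
        count += 1
        self._seen[client_id] = (start, count)
        return count <= self.max_redirects
```

With `max_redirects=3` and `window=10.0`, a client redirected at times 0, 1, 2 and 3 gets `True` three times and then `False`. A request at time 20 starts a new window and is allowed again. The point of the design is that a looping client fails visibly and quickly rather than hammering both the service and the login server.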
:wes