cosign-discuss at umich.edu
general discussion of cosign development and deployment
Re: Cosign Loop Breaking
On 08 Jun 2004, at 15:37, Cory Snavely wrote:
> But maybe I've just offended the person with the weirdo application
> 8), and maybe the rationale for having protection from this in COSIGN
> is clearer if we know the types of misconfigurations where this can
> happen. Could you share more details?
cosignd replication is lazy and lossy. There is no requirement in the
protocol for all of the cosignd's to have exactly the same data. This
managed inconsistency is dealt with by having the cosign filters query
multiple cosignd's if the first cosignd responds with "I don't know."
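To make the failover concrete, here is a minimal sketch in Python. It is not the actual filter code: the names check_cookie and Lookup are made up for illustration, and each cosignd is modeled as a plain function that returns the session data or None for "I don't know."

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for one cosignd: given a cookie, return the
# session data, or None to mean "I don't know."
Lookup = Callable[[str], Optional[dict]]


def check_cookie(cookie: str, servers: Sequence[Lookup]) -> Optional[dict]:
    """Ask each cosignd in turn; stop at the first one that knows the cookie."""
    for lookup in servers:
        try:
            result = lookup(cookie)
        except OSError:
            # Assumption: an unreachable server is treated like
            # "I don't know," so we move on to the next one.
            continue
        if result is not None:
            return result
    # No server knew the cookie. Only at this point should the filter
    # send the user off to log in.
    return None
```

The point is that a single "I don't know" is not an answer. The filter only gives up after every cosignd on its list has been asked.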
At UMich, we saw the looping problem in a big way due to a bug in one
of the cosign filter implementations. In particular, the Java cosign
filter had an error where it would not fail over to other cosignd's
when the information it was looking for was not present on the
one-and-only cosignd it was currently connected to.
The effect of this bug was that users would visit the protected
service, be redirected to the cosign registration service, be
redirected back to the service, ad nauseam. Since the protected
service was moderately popular, thousands of users bounced back and
forth between the service and cosign, bringing both to their knees and
causing a campus-wide cosign outage.
The bug wasn't immediately obvious. As I recall, a mis-configuration
in a firewall precipitated the problem. One of the three campus cosign
servers is located behind a departmental firewall. When the firewall
was briefly mis-configured, the cosign-protected application that this
department runs was able to get to the cosign server, but no users were
able to get to the same cosign server's web interface. Also, the other
two cosign servers weren't able to send data to the third cosign
server, causing their databases to be quite inconsistent.
So, we fixed the bug in the Java filter. However, if we had had the
loop breaking code in place, 1) the problem would have been more
obvious, and 2) we would not have had a campus-wide cosign outage.
Campus-wide cosign outages are a real downer for me, so we spent some
time thinking about how we could never, ever have another one. This is how
we came up with the loop-breaking code.
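For illustration only, here is one way a loop check could be sketched in Python. This is not the cosign code and does not describe how cosign does it. The function name note_redirect, the threshold, the time window, and the idea of carrying a (start time, count) pair per browser are all assumptions.

```python
import time
from typing import Optional, Tuple

MAX_REDIRECTS = 10      # assumed threshold, not a cosign default
WINDOW_SECONDS = 30.0   # assumed window, not a cosign default

State = Tuple[float, int]  # (window start time, redirects seen in window)


def note_redirect(state: Optional[State],
                  now: Optional[float] = None) -> Tuple[State, bool]:
    """Record one redirect for a browser.

    Returns the updated state and True if this browser appears to be
    looping, i.e. it exceeded MAX_REDIRECTS within WINDOW_SECONDS.
    """
    if now is None:
        now = time.time()
    if state is None or now - state[0] > WINDOW_SECONDS:
        # First redirect, or the old window expired: start counting afresh.
        return (now, 1), False
    start, count = state
    count += 1
    return (start, count), count > MAX_REDIRECTS
```

When the second return value is True, the idea is to stop redirecting and show the user an error page instead, so a misbehaving filter fails loudly for that one service rather than hammering the central servers.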