CoSign: Collaborative Single Sign-On  
 

cosign-discuss at umich.edu
general discussion of cosign development and deployment
 


Re: Cosign Loop Breaking



Thanks, Wes.

I think the whole idea has some merit, and wouldn't be a bad thing to have. But, given that the root cause was a fairly fundamental bug wrt redundancy in the Java filter, and not the result of a service's configuration error, I'm thinking it's not all that likely we'd see looping behavior in the future. The chance is still there due to the redirecting nature of COSIGN, but we don't have any evidence yet of folks like me DOSing the system just by configuring it wrong on a given service, and the core system itself is only going to get more reliable.

So, if I had my way, I'd still put that idea I once mentioned about having different session timeouts for different services higher in the development queue. 8)

Not that I don't appreciate you all wanting to put effort into reliability and performance--I really, really do. All I'm saying is, to the extent I have influence over it, I want to be protective of y'all's time.

Wesley D Craig wrote:
On 08 Jun 2004, at 15:37, Cory Snavely wrote:

But maybe I've just offended the person with the weirdo application 8), and maybe the rationale for having protection from this in COSIGN is clearer if we know the types of misconfigurations where this can happen. Could you share more details?

cosignd replication is lazy and lossy. There is no requirement in the protocol for all of the cosignd's to have exactly the same data. This managed inconsistency is dealt with by having the cosign filters query multiple cosignd's if the first cosignd responds with "I don't know."
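As a rough illustration of the failover described above, here is a minimal Python sketch. It is not the actual filter code: the names `check_cookie` and `make_server` and the dict-based "server" are hypothetical stand-ins, and a `None` return models a cosignd answering "I don't know."

```python
from typing import Callable, Iterable, Optional

# Hypothetical type: a lookup takes a cookie and returns session data,
# or None when that server answers "I don't know."
Lookup = Callable[[str], Optional[dict]]


def check_cookie(cookie: str, servers: Iterable[Lookup]) -> Optional[dict]:
    """Ask each server in turn; stop at the first one that knows the cookie."""
    for lookup in servers:
        session = lookup(cookie)
        if session is not None:
            return session
    # No server knew the cookie; the caller would redirect the user to log in.
    return None


def make_server(known: dict) -> Lookup:
    """Build a toy in-memory server (an assumption, for illustration only)."""
    return lambda cookie: known.get(cookie)


# Replication is lossy: the first server lacks the cookie, the second has it.
first = make_server({})
second = make_server({"cookie-1": {"user": "wes"}})
session = check_cookie("cookie-1", [first, second])
```

A filter that only ever consults `first` here would never find the session, which is the kind of failure described in the next paragraph.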


At UMich, we saw the looping problem in a big way due to a bug in one of the cosign filter implementations. In particular, the java cosign filter had an error where it would not fail over to other cosignd's when the information it was looking for was not present on the one-and-only cosignd it was currently connected to.

The effect of this bug was that users would visit the protected service, be redirected to the cosign registration service, be redirected back to the service, ad nauseam. Since the protected service was moderately popular, thousands of users bounced back and forth between the service and cosign, bringing both to their knees and causing a campus-wide cosign outage.

The bug wasn't immediately obvious. As I recall, a mis-configuration in a firewall precipitated the problem. One of the three campus cosign servers is located behind a departmental firewall. When the firewall was briefly mis-configured, the cosign protected application that this department runs was able to get to the cosign server, but no users were able to get to the same cosign server's web interface. Also, the other two cosign servers weren't able to send data to the third cosign server, causing their databases to be quite inconsistent.

So, we fixed the bug in the java filter. However, if we had had the loop breaking code in place, 1) the problem would have been more obvious, and 2) we would not have had a campus-wide cosign outage. Campus-wide cosign outages are a real downer for me, so we spent some time thinking how we could never, ever have another one. This is how we came up with the loop-breaking code.
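The loop-breaking code itself isn't shown in this thread, so the following Python sketch is only one plausible shape for the idea, under stated assumptions: count recent redirects per client and stop redirecting once a threshold is hit within a time window. The `LoopBreaker` class, its parameters, and the default thresholds are all hypothetical.

```python
import time
from collections import defaultdict, deque


class LoopBreaker:
    """Hypothetical sketch: refuse further redirects for a client that has
    already been redirected too many times within a short window."""

    def __init__(self, max_redirects=10, window_seconds=30.0, clock=time.monotonic):
        self.max_redirects = max_redirects    # assumed threshold
        self.window_seconds = window_seconds  # assumed window length
        self.clock = clock                    # injectable for testing
        self._seen = defaultdict(deque)       # client key -> redirect timestamps

    def allow_redirect(self, key: str) -> bool:
        now = self.clock()
        times = self._seen[key]
        # Forget redirects that have aged out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.max_redirects:
            return False  # looks like a loop: show an error page instead
        times.append(now)
        return True
```

With something like this in front of the redirect, a looping client gets an error page after a handful of bounces instead of hammering both the service and cosign, and the error makes the underlying problem visible sooner.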

:wes


 
Copyright © 2002 - 2004 Regents of the University of Michigan :  Page last updated 15-December-2010