cosign-discuss at umich.edu
general discussion of cosign development and deployment
Re: Replication Problems
That's fine, thanks heaps for that, Kevin.
I am not surprised about the replication error, though. If the machine
is under high load and has something like 1000 processes, most of which
are cosign daemons servicing requests, then the replication process will
only get a small share of CPU time to actually process the events. At
least that is my understanding of things. That is why I am wondering
whether I am putting an unreasonable load on it.
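
For anyone following along, the shape of the test described in the quoted
message below (83 machines, 20 processes each) can be sketched roughly as
follows. This is only an illustration, not the actual test code, which is
not included in this thread. The names max_concurrency, attempt_login and
run_workers are hypothetical, and attempt_login is a placeholder that
would need to be replaced with a real login followed by a check.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

MACHINES = 83               # figure from the thread
PROCESSES_PER_MACHINE = 20  # figure from the thread


def max_concurrency(machines, per_machine):
    """Upper bound on simultaneous logins if every worker is busy at once."""
    return machines * per_machine


def attempt_login(worker_id):
    """Hypothetical placeholder. A real test would log in over HTTPS and
    then verify the resulting cookie; this stub always reports success."""
    return "ok"


def run_workers(n_workers, attempts_each, login=attempt_login):
    """Run n_workers concurrent workers, each making attempts_each attempts,
    and tally the outcome strings returned by login()."""
    def worker(worker_id):
        return Counter(login(worker_id) for _ in range(attempts_each))

    totals = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for tally in pool.map(worker, range(n_workers)):
            totals += tally
    return totals


if __name__ == "__main__":
    print(max_concurrency(MACHINES, PROCESSES_PER_MACHINE))
    print(run_workers(PROCESSES_PER_MACHINE, 5))
```

Tallying outcomes by name (for example "ok" versus a check error) makes
it easy to see what fraction of attempts fail as the worker count is
raised, which is the figure that matters when judging whether the load is
realistic.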
Brett
On Thu, 2005-02-03 at 10:13, kevin mcgowan wrote:
> Brett,
>
> I can get you Michigan's numbers for these things, but it will be
> tomorrow morning (EDT) at the soonest. My sense is that the load
> you're generating is massive, but the check errors surprise me
> nonetheless.
>
> I'll get you our numbers as soon as I can.
>
> Kevin
>
> On Feb 2, 2005, at 3:33 PM, Brett Lomas wrote:
>
> > Hello all.
> >
> > We have just done a large scale distributed test of our cosign
> > installation and have noticed a couple of problems and are just
> > wondering if anyone else has seen this.
> >
> > We have 2 servers, each with Apache and the cosign daemon and monster.
> > Each replicates to the other pretty much as normal. Both servers are
> > behind a Foundry, so access to ports 443 and 6663 is load balanced
> > effectively across the two boxes.
> >
> > When I ran a stress test (the code for which I will publish so others
> > can use it, as it is useful as a tester and user experience monitor for
> > things like Nagios) on 83 machines, each creating 20 processes, giving
> > a maximum concurrency of 1660 logins sustained over about 2 hours, we
> > saw a lot of cosign check errors.
> >
> > It appears this is because the cosign replication processes are
> > competing for CPU on a box which can, during this test, get up to about
> > 700 cosignd processes. So the replication events seem to just queue up
> > and eventually get processed, but not before the check may have failed.
> >
> > Has anyone seen this, or are we in fact stressing it far beyond any
> > load we can expect to get? We have approximately 4000 student
> > computers and about 11000 desktops in total.
> >
> > I can't help wondering if I am stressing it way beyond what we would
> > normally get. Can someone who is running this in production send me
> > details of the number of logins and checks you get per day and the
> > number of machines you have on campus?
> >
> > Thanks
> >
> > Brett
> >
>
> ... "I love being weird, or at least, I've made my
> peace with it." ...
>