Replication Problems

Hello all.

We have just done a large-scale distributed test of our cosign
installation and noticed a couple of problems. We are wondering
if anyone else has seen them.

We have 2 servers, each running Apache, the cosign daemon (cosignd) and
monster. Each replicates to the other pretty much as normal. Both servers
are behind a Foundry load balancer, so access to ports 443 and 6663 is
effectively load balanced across the two boxes.

When I ran a stress test on 83 machines, each creating 20 processes, thus
giving a maximum concurrency of 1660 logins sustained over about 2 hours,
we saw a lot of cosign check errors. (I will publish the code for the
test so others can use it, as it is useful as a tester and user
experience monitor for things like Nagios.)

It appears this is because the cosign replication processes are
competing for CPU on a box which can, during this test, get up to about
700 cosignd processes. So the replication events seem to just queue up
and eventually get processed, but not before the check may have failed.
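
To illustrate the queueing effect I think we are seeing, here is a toy
model in Python. It is only a sketch under assumptions of my own, not how
cosignd actually works: logins arrive at a fixed interval on one server,
the peer processes replication events one at a time in order, and the
check reaches the peer a fixed delay after the login. The function name
and all parameters are hypothetical.

```python
def count_failed_checks(n_logins, arrival_interval, service_time, check_delay):
    """Toy FIFO model of replication lag (an assumption, not cosign's design).

    Login i is created at i * arrival_interval. Its replication event is
    processed on the peer in arrival order, taking service_time each.
    The check hits the peer check_delay after the login was created and
    fails if the replication event has not finished by then.
    """
    failed = 0
    queue_free_at = 0.0  # time at which the peer finishes its current event
    for i in range(n_logins):
        created = i * arrival_interval
        start = max(created, queue_free_at)  # wait if the peer is still busy
        done = start + service_time
        queue_free_at = done
        if done > created + check_delay:
            failed += 1
    return failed


# Peer keeps up: events are processed faster than they arrive.
print(count_failed_checks(100, 1.0, 0.5, 1.0))  # 0

# Peer is CPU starved: each event takes longer than the arrival interval,
# so the backlog grows and later checks arrive before replication is done.
print(count_failed_checks(100, 1.0, 2.0, 5.0))  # 96
```

In this model, once the processing time per event exceeds the arrival
interval the backlog grows without bound, so nearly every later check
fails even though every event is eventually processed.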

Has anyone seen this, or are we in fact stressing far beyond the load
we can expect to get? We have approximately 4000 student
computers and about 11000 desktops in total.
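
For a rough sense of scale, the arithmetic behind those figures can be
written as a small Python sketch. The function name is hypothetical, and
comparing concurrent logins directly to machine counts assumes at most
one login in flight per machine, which may not hold in practice.

```python
def load_summary(machines, procs_per_machine, student_computers, total_desktops):
    """Return peak test concurrency and its ratio to each machine count."""
    concurrency = machines * procs_per_machine
    return (
        concurrency,
        concurrency / student_computers,
        concurrency / total_desktops,
    )


concurrency, vs_students, vs_all = load_summary(83, 20, 4000, 11000)
print(concurrency)  # 1660
print(f"{vs_students:.1%} of student computers")
print(f"{vs_all:.1%} of all desktops")
```

So the test is equivalent to about 41.5% of the student computers, or
about 15% of all desktops, logging in at the same moment and continuing
to do so for 2 hours.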

I can't help wondering if I am stressing it way beyond what we would
normally get. Can someone who is running this in production send me
details of the number of logins and checks you get per day and the number
of machines you have on campus?



