SSO Help

An Online Community for Identity & Access Management Professionals

Hi,

Is there any way to send an email and a page when the policy server services go down?

Recently we had network issues and the policy server was unable to connect to the LDAP server; because of that, the policy server stopped. Is there a way we can restart the policy server automatically?


Thank you,

Tags: automatically, policy, restart, server


Replies to This Discussion

Typically we go the route of using something like Unicenter or Operations Manager to monitor the environment and send alerts if there are issues. SiteMinder can generate SNMP traps to alert these tools when there are problems. You can also use these tools to restart services and take other corrective actions on the servers.

The issue you have identified above is a little unclear to me. Did the policy server die because it couldn't connect to the LDAP directory due to the network issues? You should not need to restart the policy server; it should reconnect to the directory once the directory becomes available again. If the process did die, the watchdog thread should automatically restart the policy server. At least it did that historically; maybe that has changed.
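The watchdog idea (restart the process whenever it dies unexpectedly) can be illustrated with a generic supervise-and-restart loop. This is a minimal sketch, not SiteMinder's actual watchdog: the command you would pass in is hypothetical, and in practice you would rely on the product's own watchdog or the operating system's service manager rather than a script like this.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=5.0):
    """Run cmd and restart it each time it exits with a non-zero status.

    Returns the number of restarts performed. A clean exit (status 0) is
    treated as an intentional shutdown and is not restarted.
    """
    restarts = 0
    while True:
        returncode = subprocess.call(cmd)  # blocks until the child process exits
        if returncode == 0:
            return restarts
        if restarts >= max_restarts:
            print(f"giving up after {restarts} restarts (last exit {returncode})",
                  file=sys.stderr)
            return restarts
        restarts += 1
        print(f"child exited with {returncode}; restart #{restarts}", file=sys.stderr)
        time.sleep(backoff_seconds)  # back off so a crash loop does not spin
```

The cap on restarts and the backoff delay are design choices: if the underlying cause (for example, an unreachable LDAP directory) persists, endlessly restarting only hides the problem, so the loop gives up and leaves the alerting to your monitoring tool.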

Todd

You're asking a few questions...

1) Send email when the policy server is down:
You can use URL monitoring against the policy server admin console (when the service is down, you can't log in), though you would then need to configure the monitor to send the page.
We run a Perl script that logs in as a web agent to each policy server. If it doesn't get a response, it sends an email through our mail relay server.
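The URL-monitoring-plus-email approach can be sketched with the Python standard library. This is an assumption-laden illustration, not the Perl script described above: it does not log in as a web agent (that needs the SiteMinder agent API), it only checks that a URL answers over HTTP. The server names, URLs, addresses, and relay host are all hypothetical, and the relay is assumed to accept plain unauthenticated SMTP on port 25. If your paging service accepts email, its address can simply be one of the recipients.

```python
import http.client
import smtplib
import time
import urllib.request
from email.message import EmailMessage


def url_is_up(url, timeout=10.0):
    """Return True if url answers with a non-error HTTP status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts are all OSError subclasses
        return False


def build_alert(server_name, sender, recipients):
    """Compose the alert email for one unresponsive policy server."""
    msg = EmailMessage()
    msg["Subject"] = f"ALERT: policy server {server_name} is not responding"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Health check against {server_name} failed at "
        f"{time.strftime('%Y-%m-%d %H:%M:%S')}."
    )
    return msg


def send_alert(msg, relay_host, relay_port=25):
    """Hand the message to the mail relay (plain SMTP, no authentication assumed)."""
    with smtplib.SMTP(relay_host, relay_port, timeout=30) as smtp:
        smtp.send_message(msg)


def check_and_alert(servers, sender, recipients, relay_host):
    """Check every {name: url} entry and email one alert per server that is down."""
    down = [name for name, url in servers.items() if not url_is_up(url)]
    for name in down:
        send_alert(build_alert(name, sender, recipients), relay_host)
    return down
```

Run from a scheduler, a call such as `check_and_alert({"ps01": "https://ps01.example.com/admin"}, "monitor@example.com", ["oncall@example.com"], "relay.example.com")` would email one alert per unresponsive server (all values there are placeholders).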

2) Automatic restart
If the service CRASHES, in Windows you can tell the service to restart itself after unexpected crashes (the service's recovery settings let you choose an action for the first, second, and subsequent failures). If the service HANGS, you're pretty much SOL. My recommendation is to run SMSTATS and monitor Current Depth; I would run it once every 5-10 minutes. We alarm if Current Depth goes past 100, but I've been thinking about lowering that to, say, 50. What you really want to watch for is Current Depth climbing and continuing to ramp up: that means people somewhere are having a bad login experience.
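The alarm logic described above (a fixed threshold plus watching for a ramp) could look like the sketch below. It assumes you have already extracted Current Depth from SMSTATS as an integer for each 5-10 minute sample; parsing the SMSTATS output is not shown. The threshold of 100 comes from the post, while the "ramp" rule (depth rising on several consecutive samples) and the `ramp_samples` parameter are a hypothetical interpretation chosen for illustration.

```python
from collections import deque


class DepthMonitor:
    """Track Current Depth samples and decide when to raise an alarm.

    threshold: absolute alarm level (100 in the post, possibly 50).
    ramp_samples: alarm if depth has risen on this many consecutive samples.
    """

    def __init__(self, threshold=100, ramp_samples=3):
        self.threshold = threshold
        self.ramp_samples = ramp_samples
        # ramp_samples increases need ramp_samples + 1 data points
        self.history = deque(maxlen=ramp_samples + 1)

    def add_sample(self, depth):
        """Record one sample; return an alarm reason string, or None if all is well."""
        self.history.append(depth)
        if depth > self.threshold:
            return f"Current Depth {depth} exceeds threshold {self.threshold}"
        if len(self.history) == self.history.maxlen:
            values = list(self.history)
            if all(later > earlier for earlier, later in zip(values, values[1:])):
                return f"Current Depth rising for {self.ramp_samples} samples: {values}"
        return None
```

The ramp check catches a queue that is steadily backing up while still well under the absolute threshold, which is the "people are having a bad login experience" case the post warns about.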

3) LDAP failure (you didn't ask this)
Work with your monitoring team to run a periodic LDAP bind test and alarm when it takes too long (or fails).
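A timed LDAP bind check can be sketched with only the standard library by hand-encoding a minimal LDAPv3 simple BindRequest (RFC 4511). This is an illustration under assumptions, not a production monitor: a maintained library such as ldap3 or python-ldap would be more robust, plain LDAP on port 389 sends the password in clear text (prefer an anonymous bind or a dedicated low-privilege monitoring account, or wrap the socket with `ssl` for LDAPS, not shown), and the host names and time limit are yours to choose.

```python
import socket
import time


def _ber_len(n):
    """Encode a BER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def _tlv(tag, value):
    return bytes([tag]) + _ber_len(len(value)) + value


def encode_simple_bind(message_id=1, bind_dn="", password=""):
    """Encode an LDAPv3 simple BindRequest; message_id must be 1..127 here."""
    bind_request = (
        _tlv(0x02, b"\x03")                       # version INTEGER 3
        + _tlv(0x04, bind_dn.encode("utf-8"))     # name (LDAPDN)
        + _tlv(0x80, password.encode("utf-8"))    # simple [0] credentials
    )
    return _tlv(0x30, _tlv(0x02, bytes([message_id])) + _tlv(0x60, bind_request))


def _read_tlv(buf, pos):
    """Return (tag, value, next_pos) for the TLV starting at buf[pos]."""
    tag, length = buf[pos], buf[pos + 1]
    pos += 2
    if length & 0x80:  # long-form length (e.g. Active Directory uses 0x84)
        num = length & 0x7F
        length = int.from_bytes(buf[pos:pos + num], "big")
        pos += num
    return tag, buf[pos:pos + length], pos + length


def parse_bind_result_code(buf):
    """Extract resultCode from a BindResponse message; 0 means success."""
    tag, message, _ = _read_tlv(buf, 0)
    if tag != 0x30:
        raise ValueError("not an LDAPMessage")
    _, _, pos = _read_tlv(message, 0)             # skip messageID
    tag, response, _ = _read_tlv(message, pos)
    if tag != 0x61:
        raise ValueError("not a BindResponse")
    tag, code, _ = _read_tlv(response, 0)         # resultCode ENUMERATED
    if tag != 0x0A:
        raise ValueError("missing resultCode")
    return int.from_bytes(code, "big")


def _recv_message(sock):
    """Read exactly one BER-encoded LDAPMessage from sock."""
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before a full response")
        buf += chunk
        if len(buf) >= 2:
            header = 2 + (buf[1] & 0x7F if buf[1] & 0x80 else 0)
            if len(buf) >= header:
                _, _, end = _read_tlv(buf, 0)
                if len(buf) >= end:
                    return buf[:end]


def timed_ldap_bind(host, port=389, bind_dn="", password="", timeout=5.0):
    """Bind once and return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_simple_bind(1, bind_dn, password))
            ok = parse_bind_result_code(_recv_message(sock)) == 0
    except (OSError, ValueError, IndexError):
        ok = False
    return ok, time.monotonic() - start
```

The alarm condition from the post then becomes `not ok or elapsed > limit`, where `limit` is whatever response time your team considers "too long".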


© 2012   Created by CoreBlox
