SystemCenterCentral: Automatically Reset Unhealthy Unit Monitors (when alert closed in error by a human)

In the past, to resolve such issues we had to implement scripts that were triggered daily by Task Scheduler on one of the management servers. Pete Zerger shows how to use Orchestrator to do this more effectively. Read about the solution here.

Forward alerts from SCOM 2007 R2 to SCOM 2012 via Orchestrator

If you are doing a parallel upgrade from SCOM 2007 R2 to SCOM 2012, you have two SCOM environments. In that situation it is difficult for support teams to work with two SCOM consoles. To ease the process, Kelverion created a runbook for Orchestrator 2012 that forwards alerts from SCOM 2007 R2 to SCOM 2012, so you can consolidate all alerts in one place. Best of all, Kelverion offers this runbook as a free download. Download it from here.

SLA Management in System Center Operations Manager

The idea behind this article is to show you how to create dynamic groups that represent different Service Level Agreements (for example GOLD, SILVER or BRONZE). Depending on its SLA level, each CI (server) is put in the corresponding group.

I should also mention that this solution is already available on the Internet, but it is spread across a couple of articles by different authors. I just want to gather all the information in one place and point out some tips that will be helpful if you decide to implement such a solution on your own.

ALL CREDITS GO TO KEVIN HOLMAN, TIM McFADDEN AND ALL GUYS WHO PROVIDED COMMENTS ON THEIR ARTICLES.

First you need to build your SLA model. The best way is to use Active Directory. Let’s say you have 3 different SLAs: GOLD, SILVER and BRONZE. You can then create 3 security groups, for example SLA-GOLD, SLA-SILVER and SLA-BRONZE, and put the AD computer objects of your servers in them. For example, if server SQL01.lab.com has SLA GOLD, the computer object of that server has to be added as a member of the group SLA-GOLD. This way you distribute all your servers among the groups according to their SLA. Servers that do not have an SLA are not added to any of the groups.

Once you have populated the 3 groups, you have to create one GPO. That GPO should apply a different registry value on each server depending on which security group the server is in. It is a good idea to apply the registry key under a path like HKLM\Software\CompanyName, with a DWORD value like SupportLevel and data like 1 for GOLD, 2 for SILVER, 3 for BRONZE and 0 for NO-SLA. So if a server doesn’t belong to any of the 3 groups, the DWORD value SupportLevel with data 0 is applied. This GPO should be linked to the OU where you store the computer objects for your servers.

Note: You can use your own DWORD or String values for distinguishing SLA.

This SLA model for Active Directory was developed by my colleague Yordan Dimov.
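The decision the GPO has to make for each server can be sketched in Python. This only illustrates the mapping, it is not how the GPO itself is configured. The group names and DWORD values follow the example above; WEB01 and TEST01 are hypothetical servers, and picking the strongest SLA when a server ends up in more than one group is my own assumption.

```python
# Sketch of the SLA model: map AD group membership to the SupportLevel DWORD
# that the GPO writes under HKLM\Software\CompanyName (example values only).
from typing import Iterable

SLA_GROUPS = {"SLA-GOLD": 1, "SLA-SILVER": 2, "SLA-BRONZE": 3}
NO_SLA = 0  # servers that are in none of the SLA groups


def support_level(member_of: Iterable[str]) -> int:
    """Return the SupportLevel value for a computer's AD group memberships."""
    levels = [SLA_GROUPS[group] for group in member_of if group in SLA_GROUPS]
    # Assumption: a server belongs to at most one SLA group; if it is in
    # several, take the strongest SLA (the lowest number).
    return min(levels) if levels else NO_SLA


servers = {
    "SQL01.lab.com": ["SLA-GOLD", "Domain Computers"],
    "WEB01.lab.com": ["SLA-BRONZE"],          # hypothetical
    "TEST01.lab.com": ["Domain Computers"],   # hypothetical, no SLA
}
for name, groups in servers.items():
    print(f"{name}: SupportLevel={support_level(groups)}")
```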

Now that the registry keys are applied to the servers, we need to bring that information into SCOM. Kevin Holman has a great article describing how to do that, titled “Creating custom dynamic computer groups based on registry keys on agents”.

The disadvantage of Kevin’s solution is that it populates the dynamic groups only with the SCOM computer objects; Kevin mentions this himself. Tim McFadden proposes a fix in “Dynamic Computer groups that send heartbeat alerts”: the article shows how to also populate the groups with the SCOM heartbeat objects of the servers. In the comments on that article you can also find a way to add the cluster names when the servers added to a particular group are cluster nodes. I suggest populating the groups with both the heartbeat objects and the cluster names so that you do not miss alerts when you forward them to SCSM or any other Configuration Management System.

Note: You need some basic knowledge of the structure of management packs and XML.
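To make the difference concrete, here is a small Python model of what ends up in each SLA group when it holds the computer, its heartbeat object and, for cluster nodes, the cluster name. The data model and server names are hypothetical; in SCOM the membership is defined by the group discovery in the management pack XML, as described in the articles above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

SLA_NAMES = {1: "GOLD", 2: "SILVER", 3: "BRONZE"}
Member = Tuple[str, str]  # (object type, display name) in this hypothetical model


@dataclass
class Agent:
    fqdn: str
    support_level: int             # discovered from the SupportLevel registry value
    cluster: Optional[str] = None  # cluster name if the server is a cluster node


def build_sla_groups(agents: List[Agent]) -> Dict[str, Set[Member]]:
    groups: Dict[str, Set[Member]] = {name: set() for name in SLA_NAMES.values()}
    for agent in agents:
        name = SLA_NAMES.get(agent.support_level)
        if name is None:
            continue  # SupportLevel 0: no SLA, the server stays out of every group
        members = groups[name]
        members.add(("Computer", agent.fqdn))   # all the registry-based group gives you
        members.add(("Heartbeat", agent.fqdn))  # so heartbeat alerts are not missed
        if agent.cluster:
            members.add(("Cluster", agent.cluster))  # alerts raised on the cluster name
    return groups


agents = [
    Agent("SQL01.lab.com", 1, cluster="SQLCLU01.lab.com"),
    Agent("SQL02.lab.com", 1, cluster="SQLCLU01.lab.com"),
    Agent("TEST01.lab.com", 0),
]
for name, members in build_sla_groups(agents).items():
    print(name, sorted(members))
```

Because the groups are sets, the cluster name shared by both nodes is added only once.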

After this you might think you are ready, but there are some other obstacles you may face. If you have Hyper-V servers hosting virtual machines and you have imported the Hyper-V management pack, you will probably stumble on one particular issue: if you add a Hyper-V server to an SLA group, all the virtual machines on that server are added to the SLA group in SCOM as well. Some of these virtual machines may not even have an SLA, yet their alerts will be forwarded to your ticketing system. I can confirm that this issue is present in SCOM 2007, but you may also face it in SCOM 2012. The Hyper-V Management Pack contains a discovery that creates a relationship between the Hyper-V server and its guest virtual machines, and that relationship does not behave properly and causes these odd side effects. Another issue caused by it: if you put a Hyper-V host in maintenance mode, all virtual machines on that host are put in maintenance mode too. But there is a cure for these issues as well: you have to disable that discovery, and Kevin describes how in his article “Why do my group memberships for Windows Computers have machines that don’t belong there?”.
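Why the virtual machines end up in the group can be illustrated with a simplified model: treat the relationship created by the Hyper-V discovery as an edge from host to guest, and follow the edges recursively, which is how group membership and maintenance mode spread in the behaviour described above. This is an illustration only; the object names are hypothetical and SCOM’s real relationship types are more involved.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def with_related(roots: Iterable[str], contains: Dict[str, List[str]]) -> Set[str]:
    """Return the roots plus every object reachable through 'contains' edges."""
    seen = set(roots)
    queue = deque(seen)
    while queue:
        obj = queue.popleft()
        for child in contains.get(obj, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical host with two guests, linked by the Hyper-V MP discovery.
relationships = {"HV01.lab.com": ["VM01.lab.com", "VM02.lab.com"]}

print(sorted(with_related({"HV01.lab.com"}, relationships)))
# With the discovery disabled and its relationships removed, only the host remains.
print(sorted(with_related({"HV01.lab.com"}, {})))
```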

If you follow the steps described by Kevin and still see this association for Hyper-V servers that are part of clusters, I suggest the following steps to resolve it completely:

1. Manually uninstall the SCOM agents on all nodes of the cluster.

2. Remove the cluster name from Agentless monitoring. If you cannot do it through the SCOM console, follow the article “Operations Manager (SCOM) 2007 – How to remove cluster objects from scom when computer objects in cluster cannot be deleted”.

3. Delete the nodes from the Agent Managed view in the Administration pane.

4. Run the Remove-DisabledMonitoringObject command in the SCOM PowerShell and wait 20-30 min.

5. Install the SCOM agents on all nodes.

6. Add the cluster name to Agentless monitoring.

Now that you have fixed these obstacles, you can configure SCOM to send alerts to SCSM only for servers that are in a SCOM SLA group. This is done through the following steps:

1. Open the SCOM console.

2. Open the Administration pane.

3. Open the Internal Connectors view.

4. Find the connector that sends alerts to SCSM and configure it to send only alerts from your dynamic SLA groups.
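Conceptually, the connector filter keeps an alert only when the object that raised it is a member of one of the selected groups. The Python sketch below models that check with hypothetical object and alert names. It also shows why the heartbeat objects and cluster names matter: an alert raised on an object that is not in a group is simply not forwarded.

```python
from typing import Dict, List, Set


def alerts_to_forward(alerts: List[dict], groups: Dict[str, Set[str]]) -> List[dict]:
    """Keep only alerts whose source object belongs to one of the SLA groups."""
    members = set().union(*groups.values())
    return [alert for alert in alerts if alert["source"] in members]


sla_groups = {
    "SLA GOLD": {"SQL01.lab.com", "SQL01.lab.com (heartbeat)", "SQLCLU01.lab.com"},
    "SLA BRONZE": {"WEB01.lab.com"},  # heartbeat object left out on purpose
}
alerts = [
    {"name": "Disk space low", "source": "SQL01.lab.com"},
    {"name": "Heartbeat failure", "source": "SQL01.lab.com (heartbeat)"},
    {"name": "Heartbeat failure", "source": "WEB01.lab.com (heartbeat)"},  # dropped
    {"name": "Disk space low", "source": "TEST01.lab.com"},               # no SLA, dropped
]
for alert in alerts_to_forward(alerts, sla_groups):
    print(alert["name"], "from", alert["source"])
```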

Additionally, with these SLA groups in SCOM you can create different overrides depending on the SLA level.

Once again a big thank you to all members of the SCOM community.

Routing Alerts from SCOM in SCSM by using Custom Field Criteria Type

Recently I faced the task of routing alerts from SCOM to different Support Groups in SCSM. It seemed like an easy task, because in most cases routing is based on the Management Pack Name criterion. For example, alerts from a management pack whose name contains “SQL” are assigned to the SQL Support Group, alerts from a management pack whose name contains “BizTalk” are assigned to the BizTalk Support Group, and so on. You get the idea: you can create such a routing rule for every management pack. Besides Management Pack Name, you can also route alerts based on the SCSM group membership of the computer, on Custom Fields and on Monitoring Classes.

image
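A set of Management Pack Name rules behaves roughly like a list of “name contains” checks with a fallback template. The sketch below is a simplified Python model, not the connector’s code: the template names are hypothetical, case-insensitive matching is my assumption, and the fallback is the default Operations Manager Incident Template. It simply takes the first match; with non-overlapping rules at most one rule can match anyway.

```python
from typing import List, Tuple

DEFAULT_TEMPLATE = "Operations Manager Incident Template"

# Hypothetical rules: (text the management pack name must contain, template)
MP_RULES: List[Tuple[str, str]] = [
    ("SQL", "SCOM Incidents SQL"),
    ("BizTalk", "SCOM Incidents BizTalk"),
]


def template_for(mp_name: str, rules: List[Tuple[str, str]] = MP_RULES) -> str:
    """Return the template of the first rule whose text occurs in the MP name."""
    for fragment, template in rules:
        if fragment.lower() in mp_name.lower():  # assumption: case-insensitive "contains"
            return template
    return DEFAULT_TEMPLATE  # no rule matched


print(template_for("Microsoft.SQLServer.2012.Monitoring"))
print(template_for("Contoso.Exchange.Monitoring"))  # hypothetical MP name
```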

When I started configuring routing based on Management Pack Name I didn’t have any issues; everything worked as it was supposed to. You just have to be careful not to create conflicts between rules by overlapping them. But when I tried to configure routing based on Custom Field, I ran into trouble. Below I describe how I stumbled on the issue and how I fixed it. I couldn’t find this issue described anywhere on the Internet, so I decided to share it with the community.
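One way to catch overlapping Management Pack Name rules before they cause conflicts is to check your list of management pack names against all rule criteria and report every name that matches more than one rule. The helper below is hypothetical, not part of SCSM; the management pack names and criteria are examples, and the case-insensitive “contains” comparison is an assumption.

```python
from typing import Dict, List


def find_overlaps(mp_names: List[str], criteria: List[str]) -> Dict[str, List[str]]:
    """Map each MP name that matches more than one "contains" criterion to those criteria."""
    overlaps: Dict[str, List[str]] = {}
    for name in mp_names:
        hits = [c for c in criteria if c.lower() in name.lower()]
        if len(hits) > 1:
            overlaps[name] = hits
    return overlaps


criteria = ["SQL", "BizTalk", "SQLServer.2012"]
mp_names = [
    "Microsoft.SQLServer.2012.Monitoring",
    "Microsoft.SQLServer.2008.Monitoring",
    "Contoso.BizTalk.Monitoring",
]
print(find_overlaps(mp_names, criteria))
```

Here only the SQL Server 2012 management pack is reported, because it matches both “SQL” and “SQLServer.2012”.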

Let’s say that we have two Support Groups, Backup and Storage, and both use one management pack in SCOM to monitor their devices. In SCSM we need to configure alerts coming from devices supported by the Backup Support Group to be assigned to the Backup team, and alerts coming from devices supported by the Storage Support Group to be assigned to the Storage team. Most of you will probably suggest putting these devices in groups in SCSM and routing alerts based on that or, even easier, routing them based on Monitoring Class. But neither option is available here: all these devices are monitored by SNMP, so they do not have a CI record in SCSM, and all alerts come from the server where the management pack is installed, in our case the RMS server.

One such management pack is the HP Storage Management Pack. It monitors various storage devices manufactured by HP, and everything is in a single MP file. Let’s say we want to monitor 3PAR Storage, SAN Switches, D2D Devices and Tape Libraries with this management pack. All of these devices are monitored by SNMP, and we want 3PAR and SAN switch alerts to go to the Storage Support Group and D2D device and Tape Library alerts to go to the Backup Support Group. When alerts for these devices are created in SCOM, the first 6 custom fields are filled with these values:

  • Custom Field 1 – Source of the Event
  • Custom Field 2 – Logging Computer name
  • Custom Field 3 – Device Id
  • Custom Field 4 – Device Name
  • Custom Field 5 – Source Computer Name – the computer that generated the event
  • Custom Field 6 – Source Computer Domain Name – the domain of the computer that generated the event

So the custom fields of an alert could look like this:

image

Or like this:

image

From the examples above it is clear that the best option is to route alerts based on Custom Field 1. Before creating the routing rule, I will show you the steps for creating the templates that these rules will use.

If we go to SCSM console –> Library –> Lists and open the properties of the Incident Tier Queue list, we can see that we have 3 Support Groups: Storage, Backup and Windows:

image

So we need to configure 2 templates in SCSM, one for the Storage and one for the Backup Support Group. We go to Templates, choose Create Template from the Action menu, and a new window appears:

image

We can name the template “SCOM Incidents Storage”, choose Incident as the class, and for the management pack select a custom management pack where we store such settings. When we click OK, an incident form opens. This is our template, and here we fill in the fields that will be set when an alert meets a routing rule’s criteria. In our case we populate Classification category, Source and Support group:

image

You can choose to populate different fields, but Support group is the field that is actually used for assignment. When we click OK, the template is saved. Another template has to be created the same way for Backup:

image
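A template can be thought of as a set of field values that is copied onto the new incident when a routing rule matches. The Python sketch below models that idea; it is not the SCSM object model. Only the template names and Support group names come from this post; the Classification and Source values and the incident title are hypothetical.

```python
from typing import Dict

# Hypothetical templates: only the fields filled in on the template form.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "SCOM Incidents Storage": {
        "Classification": "Hardware Problems",  # hypothetical value
        "Source": "Operations Manager",         # hypothetical value
        "SupportGroup": "Storage",
    },
    "SCOM Incidents Backup": {
        "Classification": "Hardware Problems",
        "Source": "Operations Manager",
        "SupportGroup": "Backup",
    },
}


def apply_template(incident: Dict[str, str], template_name: str) -> Dict[str, str]:
    """Return a copy of the incident with the template's fields set."""
    updated = dict(incident)
    updated.update(TEMPLATES[template_name])
    return updated


incident = {"Title": "3PAR array degraded", "SupportGroup": ""}
print(apply_template(incident, "SCOM Incidents Storage")["SupportGroup"])  # Storage
```

Fields that are not on the template, such as the title here, keep the values the alert brought with it.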

Now the templates are ready and we can configure the routing rules in the SCOM Alert Connector. I will not show how this connector is configured because it is a pretty simple operation and there are a lot of articles on the Internet about it.

When we open the SCOM Alert Connector properties, there is an Alert Routing Rules tab where routing rules are added:

image

You can also see that if an alert doesn’t meet any of the specified routing rules, the Operations Manager Incident Template is used for it. This is the default SCSM template. When we click the Add button, a new window appears. In this window I gave the routing rule a distinguishing name, chose the template to use, and set the criteria for the alerts:

image

With my first routing rule ready, I clicked OK on the rule and OK on the connector’s window. Before creating more rules, I decided to test whether my routing was right. You can create a test alert from your device, or you can take any alert with status New that has not been forwarded to your SCSM server and modify its custom fields to match those of your device. After you modify them, you can forward the alert to SCSM to see if it is routed correctly by selecting Forward to –> Alert Sync: SCOM alerts:

image

After the alert was forwarded, I opened the SCSM console and found the alert created as an incident:

image

As you can see from the screenshot, the Storage template I created wasn’t applied to this incident: the Support Group field was empty, which meant that the default Operations Manager Incident Template had been applied and the alert didn’t match my routing criteria. At this point I understood that I had to do some troubleshooting to solve this.

The first thing I wanted to check was whether the Custom Field properties arrived in SCSM from SCOM properly. This can be seen in the Extensions tab of the incident:

image

As we can see from the screenshot, all properties are the same as they appear in SCOM. I couldn’t find any reason why this solution was not working, so I started to modify the routing rule in different ways, like using Custom Field 3 for the rule instead of 1.

image

But this didn’t work either, so I switched back to Custom Field 1 and noticed that the value “3PAR” I had entered for that field was still there. I had assumed that selecting Custom Field 3 would automatically reset the value of Custom Field 1, but this was not the case. This led me to the idea that all used Custom Fields have to be defined in the routing rule for it to work, so I created the routing rule for Storage using all Custom Fields:

image

image

image

image

image

image

I also created the routing rule for Backup to see if the two rules would work in parallel:

image

image

image

image

image

image

After I created the two rules, they looked like this in the SCOM Alert Connector:

image

As you can see, the routing rules differ only in the definition of Custom Field 1.

After saving the SCOM Alert Connector configuration, I modified the custom fields of two alerts in SCOM and forwarded them to SCSM:

image

image

When the alerts had been forwarded successfully, I checked the SCSM console to see how both alerts looked:

image

image

As you can see, both alerts were routed correctly and assigned to the right Support Group.

To route alerts based on custom fields, all custom fields used by the alerts have to be configured in the routing rule.
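This conclusion can be written down as a small Python model. It is my reading of the connector’s observed behaviour, not documented logic: a rule is treated as matching only when it defines a criterion for every custom field the alert populates, and every criterion is met. The “contains” comparison and all field values except “3PAR” are assumptions for illustration; an empty criterion text matches any value under that assumption.

```python
from typing import Dict

Alert = Dict[str, str]
Rule = Dict[str, str]  # custom field name -> text the field must contain (assumed operator)


def rule_matches(rule: Rule, alert: Alert) -> bool:
    """Model of the observed behaviour of Custom Field routing rules."""
    populated = {field for field, value in alert.items() if value}
    if not populated.issubset(rule):
        return False  # a populated custom field is missing from the rule
    return all(text.lower() in alert.get(field, "").lower() for field, text in rule.items())


alert = {
    "CustomField1": "3PAR",            # Source of the Event
    "CustomField2": "RMS01.lab.com",   # Logging Computer name (hypothetical)
    "CustomField3": "1001",            # Device Id (hypothetical)
    "CustomField4": "3PAR-01",         # Device Name (hypothetical)
    "CustomField5": "3par01.lab.com",  # Source Computer Name (hypothetical)
    "CustomField6": "lab.com",         # Source Computer Domain Name (hypothetical)
}
only_cf1 = {"CustomField1": "3PAR"}
storage_rule = {
    "CustomField1": "3PAR",
    "CustomField2": "RMS01.lab.com",
    "CustomField3": "",  # empty text matches any Device Id (assumption)
    "CustomField4": "",
    "CustomField5": "lab.com",
    "CustomField6": "lab.com",
}
backup_rule = {**storage_rule, "CustomField1": "D2D"}  # differs only in Custom Field 1

print(rule_matches(only_cf1, alert))      # False: fields 2-6 are not defined in the rule
print(rule_matches(storage_rule, alert))  # True
print(rule_matches(backup_rule, alert))   # False: Custom Field 1 does not match
```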

The behavior of the connector for routing alerts using Custom Fields criteria is the same for SCOM 2007 R2 and 2012.

KB: OpsMgr: Monitoring Alerts may not auto-resolve

This knowledge base article explains the issue.