Development on Maintenance Mode Integration in SCVMM 2012 with SCOM 2012 Bug

Not so long ago I wrote an article, SCVMM Maintenance mode for host does not put the computer object in maintenance mode in SCOM reported as fixed in SC 2012 SP1, and later verified that the issue is not actually fixed in SP1 in the article SCVMM and SCOM Maintenance Mode Integration Not Fixed in SC 2012 SP1 UR1. After reporting this bug for the second time, yesterday I received an e-mail from Microsoft saying that there is development on the reported bug: the bug report has been passed to a feature owner as a Design Change Request (DCR). It would be fantastic if this feature could be fixed in a future Update Rollup. To me it does not look like a hard bug to fix, but from a developer’s perspective it can be quite different, so if we do not receive the fix in an Update Rollup, let’s hope it will be fixed in a future service pack or in vNext. The bug report is still active and you can still vote on it here.

SCVMM and SCOM Maintenance Mode Integration Not Fixed in SC 2012 SP1 UR1

Not so long ago I wrote an article titled SCVMM Maintenance mode for host does not put the computer object in maintenance mode in SCOM reported as fixed in SC 2012 SP1. I reported the bug described in that article, and the report was closed on the Microsoft Connect site as fixed before the official release of SC 2012 SP1. As I do not trust blindly and always verify information, I tested whether the bug is fixed after the release of SC 2012 SP1 UR1. In my home lab I experienced the same bug again. My test was also verified by Steve Beaumont, so the issue is not only on my side. In my view, in 99% of the cases when you put a Hyper-V host in maintenance mode in VMM you will want to restart that host without receiving SCOM alerts about it. I think it is highly unprofessional to close a bug report as fixed without actually fixing it. But being a stubborn guy, I have logged this bug again. If you want to join in, you can vote for fixing this bug here. I suggest you hurry with your vote before someone closes that report as fixed again :).

SCVMM Maintenance mode for host does not put the computer object in maintenance mode in SCOM reported as fixed in SC 2012 SP1

Last week I submitted a bug on a feature of SCVMM that I spotted a long time ago. For various reasons I somehow always forgot to submit it. The feature is that when you integrate SCOM 2012 and SCVMM 2012, putting a Hyper-V host in maintenance mode in VMM should automatically put that host in maintenance mode in SCOM. This is true, but only to some extent: only the objects in SCOM related to SCVMM are put in maintenance mode. This means that if you put the Hyper-V host in maintenance mode through SCVMM and restart that host, you will still receive alerts from SCOM about it, such as health service failure or failed to connect. The reason is that while the objects related to VMM are in maintenance mode, other objects such as the computer object, the health service and the agent watcher are still actively monitored. A few days after submitting the bug on the Microsoft Connect site I received an e-mail saying that the bug was closed as fixed:

[Image: the bug report marked as closed as fixed]

As this issue is still present in System Center Virtual Machine Manager 2012 SP1 beta, this leads me to believe that there are probably post-beta releases for TAP customers which are not publicly available. As a matter of fact, there are rumors on the Internet that System Center 2012 SP1 has passed RTM and could be available early in January. I really hope these rumors are true so we can have the final release of Service Pack 1 for System Center 2012 as soon as possible. When the RTM is available I will test whether this bug is really fixed.

The Case of Run As Accounts Not Deleted from SCSM 2010 SP1 Database After Being Deleted from SCSM Console

The case began when an advanced user accidentally entered his account in the SCSM 2010 SP1 console under Run As accounts. Later on he deleted his account from Run As accounts. At the time this happened SCSM 2010 SP1 was not yet in production, so it was not an issue. When the System Center Service Manager environment went into production, alerts like this one started to appear from SCOM 2007 R2 monitoring:

The Health Service could not log on the RunAs account <ACCOUNT NAME> for management group <MANAGEMENT GROUP NAME>. The error is Logon failure: unknown user name or bad password.(1326L). This will prevent the health service from monitoring or performing actions using this RunAs account.

The alert shows which account is causing the problem. So I thought I would open the SCSM console, go to the Administration pane, open Run As accounts and delete the account from there. To my surprise, no user account was present there, only service accounts that were working normally. The next step was to verify in the event logs on the SCSM server that this error was actually logged there:

Log Name: Operations Manager
Source: HealthService
Event ID: 7000
Task Category: Health Service
Level: Error
Description:

The Health Service could not log on the RunAs account <ACCOUNT NAME> for management group <MANAGEMENT GROUP NAME>. The error is Logon failure: unknown user name or bad password.(1326L). This will prevent the health service from monitoring or performing actions using this RunAs account.

When I looked at the logs I found that this event was logged almost every hour. I checked the account in question in Active Directory and it was locked out. This led me to think that the account was stored somewhere with an old password and was still being used by Service Manager. As the architecture of SCSM is similar to that of SCOM, I figured that Run As accounts are saved in the ServiceManager DB and that this user account might still be stuck in the database because it was somehow not deleted properly.

I did some digging on the Internet and found this article: Best Practices: Service Manager 2010 Management Pack for Operations Manager 2007 R2. In point 6 you can see the same issue, with a proposed workaround:

I Have Previously Deleted Run As accounts from the UI: If you have deleted Run As accounts from the UI, the symptom will be that you get an alert which tells you that a Run As account is invalid, and when you look at the credentials of the Run As account, you notice that it is not shown in the Run As account view in the Service Manager console.

You can either ignore the alert (if you close it, it will right back), or you can disable the monitor. We are currently looking into how we can help you get out of this state and will hopefully have a solution for SP1. I will make sure to update this post once we have a definitive plan.

Best Practice to Avoid this Issue: The best way to avoid this issue is to never delete Run As accounts from the UI. You can reuse existing Run As accounts by changing their name and/or credentials. If you would like to stop using a run as account, you can change its credentials to Local System and change the name to something easy to remember such as “Inactive.”

This way, you will not end up with stale Run As accounts which cause events to be placed in the Operations Manager event log.

As you see, the issue exists in SCSM 2010 and SCSM 2010 SP1 CU3. After seeing this workaround I contacted the user to verify that he had entered his account in SCSM and later deleted it. The user confirmed this was the case, so I decided to implement the workaround. I entered the user’s credentials in Run As accounts again and later changed the account to System. The issue continued, as I was now receiving errors from the health service that the user account could not log on locally on the SCSM server. I then decided to use my own account as a dummy account and replaced the user’s account with mine in the SCSM console. The result was that the health service continued to use the user’s account, and after I changed the password for my account I noticed that logon failure alerts were logged for my account as well. It was not a smart move to use my account as a dummy account :). It may be called a dummy move ;).

So we now had two user accounts in the SCSM database that were generating alerts. Clearly the workaround was not working in our case, and clearly this was a bug in SCSM 2010 SP1. I could have tried to delete the accounts from the database directly with a SQL query, but as SQL is not my strong side and this was a production service, I decided that Microsoft Support should be contacted to provide a resolution. So a case was logged with Microsoft. After several e-mails and providing information to a Premier Field Engineer (PFE), the issue was identified as a bug and the PFE contacted the SCSM support group. The support group confirmed it was a bug. They also said that no hotfix was planned for this issue but that they would provide us with a workaround.

The good part is that this issue is fixed in SCSM 2012: accounts deleted from the SCSM 2012 console are also deleted from the database. I have verified this in my home test lab. After several days we received the workaround in the form of a SQL query that deletes the unneeded accounts. While waiting for the solution I entered both user accounts with their current passwords in the SCSM console so that they would not be locked out by the health service using them.

Here is the SQL query that was provided (you should execute the queries against the ServiceManager DB):

    DECLARE @SecureStorageElementId uniqueidentifier;
    -- change "GUID" to the SecureStorageElementID
    -- of the invalid runas account
    SET @SecureStorageElementId = 'GUID';
    BEGIN TRANSACTION
    EXEC dbo.p_CredentialManagerStorageDelete @SecureStorageElementId;
    COMMIT TRANSACTION

You can find the GUIDs of the problematic accounts by listing all accounts together with their SecureStorageElementID:

    select * from CredentialManagerSecureStorage

In our case we found that every time we deleted an account and entered it again, a new account was created in the ServiceManager DB with a different SecureStorageElementID.
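Because repeated delete-and-re-enter cycles can leave several rows behind, it may help to look at exactly one row before deleting it. The read-only query below is a minimal sketch and not part of the workaround provided by Microsoft. It assumes the table exposes the ID in a column named SecureStorageElementId, matching the name used above, and 'GUID' is the same placeholder as in the delete script.

```sql
-- Sketch only: read-only check of the row a given GUID refers to.
-- Assumption: the ID column is named SecureStorageElementId.
DECLARE @SecureStorageElementId uniqueidentifier;
-- change "GUID" to the SecureStorageElementID you plan to delete
SET @SecureStorageElementId = 'GUID';

SELECT *
FROM CredentialManagerSecureStorage
WHERE SecureStorageElementId = @SecureStorageElementId;
```

If the row returned is the account you expect, the same GUID can be used in the delete script. Running the check again afterwards should return no rows if the entry was removed.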

We tested the query first in a test environment and then implemented the solution in our production environment. Everything went smoothly and both user accounts were deleted from the SCSM database. The errors were no longer logged in the SCSM event log or in SCOM. Before executing the queries against the ServiceManager DB, make sure you have first deleted the accounts in question in the SCSM console.

Before actually implementing this solution in your environment I strongly recommend these actions:

1. Test the query in a Test environment.

2. Back up your production database before executing the queries.

This solution is provided “AS IS” with no warranties from Microsoft or me. Neither Microsoft nor I am responsible if you mess up your SCSM database or SCSM environment by executing the procedure incorrectly.

Many thanks to Microsoft Support for providing us a workaround for fixing this issue. Another case solved.