Azure Monitor Alert Series Part 3


It is time for the third part of this blog series. This time we will cover two types of alerts to speed up the pace. As I have mentioned before, these alert types are very similar to Administrative alerts; the differences come mainly from the properties section. The alert types we will cover today are:

  • Security Activity Log Alert
  • Service Health Alert

Let’s start by listing some important information about the Security Activity Log Alert:

  • The record for the alert is generated by Azure Security Center (ASC) in the Activity Log. This means that when you have an alert in ASC you have to create another alert rule just to get notified properly. If ASC is integrated with Microsoft ATP, the alerts from that system will also appear in the Activity Log.
  • Alerts in Sentinel do not appear in the Activity Log, so you cannot get notified about them by creating this kind of alert. At least that is the case for Sentinel alerts you create manually; I managed to test this only with a custom Sentinel alert, not with the built-in ones. In my opinion this is a mess.
  • Alerts are generated per instance, so for every Activity Log record you get a new alert instance.
  • To make the alert a Security Activity Log Alert you need to scope it to the Security category.
  • You cannot assign severity to these alerts. The severity is always translated to Sev4 for the Security category, just like for Administrative. You can still get the severity of the actual alert from the properties, though.
  • Supports common alert schema
  • It is best to create these alerts per subscription

As I have mentioned before, all alerts based on Activity Log records can be created via the Portal or ARM templates. In the previous blog post I explained why I am not fond of creating things through the Portal, so let’s proceed directly to ARM:

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "actionGroupResourceId": {
            "type": "string"
        }
    },
    "variables": {
        "apiVersions": {
            "activityLogAlerts": "2017-04-01"
        }
    },
    "resources": [
        {
            "name": "Security Alert High Severity",
            "type": "Microsoft.Insights/activityLogAlerts",
            "apiVersion": "[variables( 'apiVersions' ).activityLogAlerts]",
            "location": "Global",
            "properties": {
                "enabled": true,
                "description": "Security log alert sample.",
                "scopes": [
                    "[subscription().id]"
                ],
                "condition": {
                    "allOf": [
                        {
                            "field": "category",
                            "equals": "Security"
                        },
                        {
                            "field": "severity",
                            "equals": "High"
                        }
                    ]
                },
                "actions": {
                    "actionGroups": [
                        {
                            "actionGroupId": "[parameters('actionGroupResourceId')]"
                        }
                    ]
                }
            }
        }
    ]
}

As you can see, the condition here is very simple: one clause for the Security category and one for the severity. We are not building a more complex condition because, first, it does not make much sense and, second, the properties of this record vary heavily on a per-alert basis. A few properties are constant and everything else differs depending on the alert. These differences are also not documented. You can also create an alert that is scoped only to the Security category and skip the severity condition. That way you need fewer alert rules, as a single rule covers every severity.
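To make the two variants concrete, here is a minimal Python sketch that builds the condition block of the template above with the severity clause made optional. The helper name build_security_condition is my own invention for illustration; only the category and severity fields and their values come from the template.

```python
import json


def build_security_condition(severity=None):
    """Return the 'condition' block for a Security activity log alert.

    severity: "Low", "Medium" or "High" to match a single ASC severity,
    or None to match every record in the Security category.
    """
    clauses = [{"field": "category", "equals": "Security"}]
    if severity is not None:
        clauses.append({"field": "severity", "equals": severity})
    return {"allOf": clauses}


if __name__ == "__main__":
    # Same condition as in the template above.
    print(json.dumps(build_security_condition("High"), indent=2))
    # Category only: one rule that covers Low, Medium and High.
    print(json.dumps(build_security_condition(), indent=2))
```

The output can be pasted into the properties.condition section of the template; everything else in the template stays the same.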

Let’s have a look at a sample Security log record:

Security Activity log record

A few things you should note in that record:

  • The resource ID is not the ID of the machine that is affected by the alert. It sounds very stupid to me that these records do not contain the resource ID of the affected resource. This is a critical part of alerting on Azure resources.
  • resourceType, severity, compromisedEntity, remediationSteps and attackedResourceType are the fields in properties that are constant across different ASC alerts, as far as I could test. Some of these are present only if the affected resource is an Azure resource.
  • compromisedEntity is usually the name of the resource.
  • severity is the severity of the ASC alert. It can be Low, Medium or High.
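If you process these alerts downstream, for example in a webhook receiver, the sketch below shows one defensive way to pull the real severity out of the record. It assumes the payload follows the common alert schema, with the Activity Log properties under data.alertContext.properties. The function name summarize_security_alert and the sample payload are made up for illustration. Every field is read with a fallback because, as noted above, not all properties are guaranteed to be present.

```python
def summarize_security_alert(payload):
    """Extract the fields that appear to be constant across ASC alerts.

    Missing fields come back as None instead of raising an error.
    """
    data = payload.get("data", {})
    essentials = data.get("essentials", {})
    props = data.get("alertContext", {}).get("properties", {})
    return {
        # Severity of the Azure Monitor alert (Sev4 for this category).
        "monitor_severity": essentials.get("severity"),
        # Severity of the underlying ASC alert: Low, Medium or High.
        "asc_severity": props.get("severity"),
        "compromised_entity": props.get("compromisedEntity"),
        "attacked_resource_type": props.get("attackedResourceType"),
        "remediation_steps": props.get("remediationSteps"),
    }


# Hypothetical payload, trimmed to the fields used in this sketch.
sample = {
    "data": {
        "essentials": {"severity": "Sev4"},
        "alertContext": {
            "properties": {
                "severity": "High",
                "compromisedEntity": "demo-vm",
                "attackedResourceType": "Virtual Machine",
            }
        },
    }
}

if __name__ == "__main__":
    print(summarize_security_alert(sample))
```

Note that remediationSteps is missing from the sample on purpose, so the corresponding entry in the result is None.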

In my humble opinion the records for these activity logs are a mess that is not even documented well. If you have to do some processing on them it is a very hard task, as you do not know what to expect. Considering as well that Sentinel alerts are a very different experience from Azure Monitor alerts (not aligned with them at all) and that their instances do not appear as records in the Activity Log, I would say the security solutions in Azure need some heavy improvement in terms of APIs, documentation and alerting.

Moving on to the Service Health alert, let’s list some important information about it:

  • Alerts are generated per instance, so for every Activity Log record you get a new alert instance.
  • To make the alert a Service Health Alert you need to scope it to the ServiceHealth category.
  • You cannot assign severity to these alerts. The severity is always translated to Sev4 for the ServiceHealth category, just like for Administrative.
  • Supports common alert schema
  • It is best to create these alerts per subscription
  • Service Health alerts are free

Service Health alerts have their own creation experience in the Service Health blade, but I will proceed directly to the ARM template example:

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "actionGroupResourceId": {
            "type": "string"
        }
    },
    "variables": {
        "apiVersions": {
            "activityLogAlerts": "2017-04-01"
        }
    },
    "resources": [
        {
            "name": "Service Health Alert",
            "type": "Microsoft.Insights/activityLogAlerts",
            "apiVersion": "[variables( 'apiVersions' ).activityLogAlerts]",
            "location": "Global",
            "properties": {
                "enabled": true,
                "description": "Service Health alert sample.",
                "scopes": [
                    "[subscription().id]"
                ],
                "condition": {
                    "allOf": [
                        {
                            "field": "category",
                            "equals": "ServiceHealth"
                        },
                        {
                            "field": "properties.stage",
                            "equals": "Active"
                        },
                        {
                            "anyOf": [
                                {
                                    "field": "properties.incidentType",
                                    "equals": "Incident"
                                },
                                {
                                    "field": "properties.incidentType",
                                    "equals": "Maintenance"
                                }
                            ]
                        }
                    ]
                },
                "actions": {
                    "actionGroups": [
                        {
                            "actionGroupId": "[parameters('actionGroupResourceId')]"
                        }
                    ]
                }
            }
        }
    ]
}

As you can see, we have a condition for the category and another one for the stage. The stage designates whether the service issue is Active or Resolved; RCA is also a possible value. In this case we only get alerted when the service health issue is active, so if you want to be alerted on all stages you should remove the stage condition. The condition also contains an anyOf block. According to the official documentation for Activity Log alerts, Microsoft does not allow or support this (although it is not very clear), so if you want to use this kind of logic, use it at your own risk. That said, if you create a Service Health alert from the Portal it will use such conditions, and as far as I have tested it works.

Basically we want to receive an alert if the incident type is Incident or Maintenance. Incident is when there is an actual service outage or degradation, and Maintenance is when Azure notifies you that, for example, a host where your VM is located will be restarted in the near future; the notification provides additional details on the time frame. The incident type can also have the value Informational or ActionRequired. Those are used to notify you about things like deprecation of a service, changes in a service, etc. If you want to be alerted only on Incident you can remove the anyOf and Maintenance conditions. Below you can see a sample of the additional fields that are available in properties.

Service Health Activity log record
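Going back to the condition in the template, the Python sketch below shows how the allOf / anyOf logic plays out against a few records. It is only a local illustration of the boolean logic, not how Azure evaluates rules internally. The function names, the exact-match comparison and the trimmed sample records are assumptions of mine.

```python
def get_field(record, path):
    """Resolve a dotted path such as 'properties.stage'; None if missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(condition, record):
    """Evaluate an allOf / anyOf / field-equals condition tree."""
    if "allOf" in condition:
        return all(matches(c, record) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(matches(c, record) for c in condition["anyOf"])
    # Leaf clause: exact string equality (an assumption of this sketch).
    return get_field(record, condition["field"]) == condition["equals"]


# The condition from the Service Health template above.
CONDITION = {
    "allOf": [
        {"field": "category", "equals": "ServiceHealth"},
        {"field": "properties.stage", "equals": "Active"},
        {
            "anyOf": [
                {"field": "properties.incidentType", "equals": "Incident"},
                {"field": "properties.incidentType", "equals": "Maintenance"},
            ]
        },
    ]
}


def make_record(stage, incident_type):
    """Build a hypothetical, heavily trimmed Service Health record."""
    return {
        "category": "ServiceHealth",
        "properties": {"stage": stage, "incidentType": incident_type},
    }


if __name__ == "__main__":
    for stage, kind in [("Active", "Incident"), ("Active", "Maintenance"),
                        ("Resolved", "Incident"), ("Active", "Informational")]:
        print(stage, kind, matches(CONDITION, make_record(stage, kind)))
```

Only the first two records match: the Resolved record fails the stage clause and the Informational record fails the anyOf block.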

My advice here: if you do not have specific requirements, create one alert per subscription that is not conditioned on a specific region or service. As you can see from the above screenshot, you have the possibility to use fields like service and region. If you want to get the full list of all possible services, go to the Service Health blade and try to create an alert from there. You will see the full selection list. Regions are just the display names of the different Azure locations. Note that in the example the region is Global, as some services are not tied to a specific location.

One thing that I wish all Activity Log based alerts had is the ability to create one alert that is applicable to all subscriptions. I wish I could define the alert in a way that I do not have to put in subscription IDs or anything else: it would automatically apply to all subscriptions, and when new subscriptions are created they would be covered by that alert without any modification.

I hope this third part was helpful and useful for you.

2 thoughts on “Azure Monitor Alert Series Part 3”

  1. Thank you for doing this series, Stanislav. I have a mental block from the time I’ve spent on Azure alerts so far, and it would help my path further if you could point me in the right direction, or confirm my issue. Working with SCOM and other monitoring systems for years, I’ve become accustomed with management packs, where you introduce a more or less complete way to monitor a system or application created by the product group who knows their system best, and then just tune the pack with hundreds or thousands of monitors to your environment. When now working with Azure alerts I feel we’re back to the stone age, figuring and configuring all the different monitoring points and thresholds we need. Am I missing something, or are we actually back to picking up one pebble after another instead of using an excavator?

    1. Hi Pal,
      Thanks for taking an interest in my blog posts. Maybe to some extent you are right. Certainly some knowledge could have been picked up from SCOM and translated to Azure Monitor. I also think that SCOM, and specifically MPs, had a lot of things that you do not need, so maybe it is good that we have to start over. That way we can think about what is important and signal only on it. This is of course looking at VMs specifically. For other services in Azure, like PaaS and SaaS types, SCOM really didn’t have anything to monitor those, so there is nothing to be brought over there. A third angle is that there are a lot of custom applications, and writing an MP for custom applications was not an easy thing. With Azure Monitor, at least for some parts, building alerting, visualization, etc. is a lot easier. And of course if we look at things on a higher level there will always be some things that are better in SCOM compared to Azure Monitor and vice versa. It is a topic that we can probably debate all day, but in general, as I have said, I agree that some knowledge could be brought to Azure Monitor. The good thing is that MPs are just XML files that you can read and probably easily translate to how it should be done in Azure Monitor. I have done that and it is not such a hard job.
