Monitoring Windows Services States with Log Analytics


Monitoring Windows service states is one of the most common requests I have seen on forums, in groups and in blog posts. My fellow MVP and OMS expert Stefan Roth wrote a similar blog post titled OMS – Monitor Windows Services / Processes, and I suggest checking it out as well. The approach I will show is to some extent already covered in the official article that demonstrates custom fields in Log Analytics. The difference is that we now have the new, rich Log Analytics search syntax, so we no longer need custom fields.

This approach also differs from Stefan’s: his covers the wider topic of monitoring processes by using performance counters. Here we will use Windows events. Stefan mentions that events are not reliable, but he was referring to a specific Event ID, and I agree that that one is not reliable. In the next steps I will use another Event ID that is 100% reliable. The advantages of using Windows events for monitoring Windows service states are:

  • Only Windows events are gathered, which results in less data uploaded compared to performance data

  • You do not have to add a performance counter for each process; you only need to add one event log to monitor all services

  • The services are shown with the name that appears in services.msc or in the output of the Get-Service cmdlet

  • We have the actual state of the service at the time it changed

Some of the disadvantages of this method are:

  • After a service is started or stopped, it takes at least 5 minutes until the data appears in Log Analytics

With that said, let’s see how we can achieve this task very easily by using the power of Log Analytics search.

The first thing we need to do is add the System event log as a data source:

[Screenshot: the System event log added as a data source]

If you prefer, you can add only the Information level.

The next step is simply to execute a query, but first let’s go over what it does. We filter on Event ID 7036 from the System log. This event records which service has stopped or started. That information is held in the EventData column, but it is unstructured. Using the parse operator of the Log Analytics query language, we can extract it into proper columns. This is done by executing the following query:

Event
| where EventLog == "System" and EventID == 7036 and Source == 'Service Control Manager'
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| sort by TimeGenerated desc
| project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated

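[Screenshot: query results with the Computer, Windows_Service_Name, Windows_Service_State and TimeGenerated columns]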

As you can see from the screenshot, we have the computer in question, the service name, the state (stopped, or running when it was started) and the time.
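To see what the parse step does in isolation, here is a minimal sketch that runs the same pattern against two made-up rows instead of the Event table. The `SampleEvents` name, the computer name and the shortened EventData strings are assumptions for illustration; real EventData carries more XML around the two `Data` elements, which the leading and trailing `*` skip over.

```kusto
// Hypothetical sample rows standing in for the Event table.
let SampleEvents = datatable(TimeGenerated: datetime, Computer: string, EventData: string)
[
    datetime(2021-09-14 10:05:33), "SRV01", '<EventData><Data Name="param1">Print Spooler</Data><Data Name="param2">stopped</Data></EventData>',
    datetime(2021-09-14 10:06:00), "SRV01", '<EventData><Data Name="param1">Print Spooler</Data><Data Name="param2">running</Data></EventData>'
];
SampleEvents
// Text between the first two delimiters becomes the service name,
// text between the second and third becomes the state.
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated
```

Note that the delimiters use single quotes on the outside so that the double quotes inside the XML attributes do not need escaping.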

Keep in mind that this is a point-in-time state. I suggest automating the response so that when a service is stopped, a runbook fires and starts it. Having the actual name of the service makes this pretty easy.

Remember also to filter on the services that you want to monitor, because a lot of services start and stop all the time, especially on Windows Server 2016.
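One way to apply such a filter is to keep the services of interest in a list and match against it case-insensitively. This is only a sketch: the `WatchedServices` name and the two service names in it are placeholders, and the query assumes the System event log is already being collected into the workspace.

```kusto
// Placeholder list of services to watch; replace with your own names.
let WatchedServices = dynamic(["Print Spooler", "Windows Update"]);
Event
| where EventLog == "System" and EventID == 7036 and Source == "Service Control Manager"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
// in~ compares case-insensitively against every entry in the list.
| where Windows_Service_Name in~ (WatchedServices)
| sort by TimeGenerated desc
| project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated
```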

53 thoughts on “Monitoring Windows Services States with Log Analytics”

  1. Great post! Another option is to use the Change Tracking solution, with a query like this one:

    ConfigurationChange
    | where ConfigChangeType == "WindowsServices"
    | sort by TimeGenerated desc
    | project Computer, SvcName, SvcState, TimeGenerated

    1. I cannot share the exact runbook, but you get the data from Log Analytics: the computer and the service name. You will need a hybrid worker in the environment where the remediation happens; the hybrid worker will execute the runbook. The runbook opens a PowerShell session to the server and uses Get-Service and Start-Service to start the service. Simple as that. There is no special magic in the runbook.

    1. Yes,
      It is very simple. You just add a filter on the newly created field Windows_Service_Name and put the service name in quotes. Keep in mind that “==” is case sensitive; if you want a case-insensitive match you can use “=~”. All Log Analytics string operators are available for filtering.

      Event
      | where EventLog == "System" and EventID == 7036 and Source == 'Service Control Manager'
      | parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
      | where Windows_Service_Name == ""
      | sort by TimeGenerated desc
      | project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated

      1. Thanks for the guide. However, only the WMI Performance Adapter service and the Windows Update service are shown. Should the original query not show all services?
        I also tried the specific service query and entered the service name here: | where Windows_Service_Name == "Custom Servicename". However, this shows no results although the service exists.

        1. Hi,
          With this solution we monitor only the state changes of services. You will not see all your services there; you will only see a service when it is stopped or started. If a service hasn’t been stopped or started, it will not appear. When a service is stopped or started, a specific event is logged in the System event log. We track that event, so we know at what time a service was started or stopped. The query visualizes those actions and can be used to create an alert that fires, for example, when a specific service is stopped.
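As a rough illustration of the alert use case, the sketch below returns one row per computer on which a given service last reported the stopped state within the lookback window. It is not the original author's alert query: the service name, the 15-minute window and the idea of splitting the alert by the Computer column are assumptions to adapt.

```kusto
Event
| where TimeGenerated > ago(15m)   // assumed lookback window
| where EventLog == "System" and EventID == 7036 and Source == "Service Control Manager"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| where Windows_Service_Name =~ "Print Spooler"   // placeholder service name
// Keep only the most recent state change per computer and service.
| summarize arg_max(TimeGenerated, Windows_Service_State) by Computer, Windows_Service_Name
| where Windows_Service_State == "stopped"
```

Because the result has one row per computer, an alert rule that splits on the Computer column should be able to raise a separate alert for each affected machine.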

  2. I am just thinking about a scenario where a service is stopped on multiple computers and I want to receive a separate alert for each computer. Is there any possibility of this?

    1. Hi
      There is probably a problem with copying the query. Unfortunately, so far in my blog posts I have posted code like any other text, and this creates issues with code. When you copy it there might be empty lines; remove the empty lines in the query. Also, the quotes (‘) will not be copied correctly, so please remove them and retype the quote characters from your keyboard. I have already started using a better way to post code in my blog posts.

    1. You can create a Log Analytics view to show the stopped services. Keep in mind that it will show when a service was stopped, but not whether the service is still stopped. You are monitoring state changes, as described in the article. Preferably you will create alerts and tie them to automation, so that people do not have to start the services manually.

      1. Let’s rephrase: do you think it’s possible to make a widget that shows the currently stopped services, and not those that have started up again? Maybe with the Change Tracking logs?
        I’ve been trying to figure out a way to do this, without any success.

  3. It is obvious that monitoring at an interval of once every 5 minutes, plus the time to send the metrics to Azure or the event log events to Log Analytics, is not a useful solution, since very often the SLA of business services requires a faster response.
    MS needs to remove the 5-minute restriction or offer a separate subscription plan that sends alerts more often.
    What do you think about it? Maybe there are additional tricks in OMS?

  4. Hey, great post. It worked after editing the copied code.
    I’ve got results for the different VMs (6), but both columns “Windows_Service_Name” and “Windows_Service_State” are empty for about 2164 records, which in my opinion can’t be right. Do you have any suggestions to resolve the problem?

    1. Hi
      I do not know what your data is or what your modified query looks like. Without those two it is hard to give you an answer. If those fields are not filled, you are either processing events that are not related to service status, or you have changed the query so that it no longer works correctly.

      1. Thank you for your answer. I just removed the empty lines and replaced the existing quotes with new quotes at the same location, because I got some syntax errors. So I didn’t change anything important. When I run the query, I just get the names of the VMs where these events were found and when they were found. If I comment out the “| project” line, I get the names of the services in the “ParameterXml” column.

          1. Event
            | where EventLog == "System" and EventID == 7036 and Source == "Service Control Manager"
            | parse kind=relaxed EventData with * "" Windows_Service_Name "" Windows_Service_State "" *
            | sort by TimeGenerated desc
            | project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated

            Unfortunately, I cannot paste pictures in this reply. But I get 497 records in four columns: “Computer”, “Windows_Service_Name”, “Windows_Service_State” and “TimeGenerated”. “Computer” and “TimeGenerated” have information, but the other two columns are empty.

            1. Sorry, it seems the comment form did not paste the content of the “” into my previous reply. It should be there, the same as in your query, inside the two quotes.

              1. The actual query is

                Event
                | where EventLog == 'System' and EventID == 7036 and Source == 'Service Control Manager'
                | parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
                | sort by TimeGenerated desc
                | project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated

                You have removed the most important part, the one that provides the logic for parsing.

  5. I have done this with a different approach: instead of the EventData field, I used the ‘RenderedDescription’ field. Here is the script.
    Event
    | where EventID == 7036 and Source == "Service Control Manager"
    | project RenderedDescription , Computer , EventID , TimeGenerated
    | where RenderedDescription == "The SQL Server (MSSQLSERVER) service entered the stopped state. "
    | project Computer , EventID , TimeGenerated , RenderedDescription
    | sort by TimeGenerated desc
    | where TimeGenerated >= ago(3h)

    1. Yes, that is possible as well, but your approach is not very dynamic. The approach I have demonstrated works for every service on Windows. With your approach, if the SQL Server has a named instance, the above will not work, as the named instance will have a different name than MSSQLSERVER.
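For comparison, the RenderedDescription text can also be parsed instead of matched literally, which would make that variant work for any service name. The sketch below uses two made-up rows; the `SampleEvents` name, the computer names and the named instance are hypothetical, and the sentence pattern is taken from the message quoted in the comment above. The wording of that message likely depends on the display language of the operating system, so the EventData approach is probably the safer one.

```kusto
// Hypothetical sample rows; in a workspace these would come from the Event table.
let SampleEvents = datatable(TimeGenerated: datetime, Computer: string, RenderedDescription: string)
[
    datetime(2021-09-14 10:05:33), "SQL01", "The SQL Server (MSSQLSERVER) service entered the stopped state.",
    datetime(2021-09-14 10:06:10), "SQL02", "The SQL Server (INSTANCE1) service entered the running state."
];
SampleEvents
// Pull the service name and the state out of the sentence.
| parse RenderedDescription with "The " Windows_Service_Name " service entered the " Windows_Service_State " state." *
| project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated
```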

  6. Hi, Thanks for the helpful post.
    I added the configuration for system events but I somehow cannot see the eventlog table in there. Any suggestions?

    1. Did you add it for the event levels Error, Warning and Information? Do you have Windows computers reporting to your Log Analytics workspace? How long have you waited after adding the event log?

      1. Apologies, silly mistake. I was looking for the eventlogs table, but I should have been looking for the events table 🙂
        I can see Windows services events in the table now.
        Also, is there a way to monitor non-Windows services? I cannot see the services I created.

        1. Not sure what you mean by non-windows services. The described method can be used for any service that you see in services.msc. Obviously it works only on Windows OS based machines and not on Linux OS based ones.

  7. Hi, thanks for the post!
    It helped a lot with the task I’m currently working on.

    But I have a question:
    – Is it possible to make it check whether there is a change in the state of the service,
    like when a service changes from “Stopped” to “Running”?
    Example:
    /*Only looking for services that have stopped*/
    Windows_Service_State == "Stopped"
    /*But I want to have the service removed from the list, if it starts again*/
    /*is that possible????*/

    1. Hi,
      The query below will visualize the latest state available in Log Analytics per computer, per service:

      ```
      Event
      | where EventLog == "System" and EventID == 7036 and Source == 'Service Control Manager'
      | parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
      | sort by TimeGenerated desc
      | project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated
      | summarize arg_max(TimeGenerated, *) by Computer, Windows_Service_Name
      ```

      1. Hi again

        When I try to use the query you have sent, all I’m getting is:

        Syntax Error:
        There are no columns to be calculated
        Support id: f7efdd13-0a9a-4f33-a832-7b244ec1a82e

        And I don’t really know what to do with it so far

        The only place in the query that Log Analytics doesn’t like is:
        “parse kind=relaxed”
        where “relaxed” is the only part that has a red underline

        (I don’t know if I have done something stupid or not 🙂 But I can’t get it to work)

        1. Some things do not copy well in reply windows.

          Event
          | where EventLog == 'System' and EventID == 7036 and Source == 'Service Control Manager'
          | parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
          | sort by TimeGenerated desc
          | project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated
          | summarize arg_max(TimeGenerated, *) by Computer, Windows_Service_Name

  8. Hello Stanislav,

    Thank you very much for this interesting post. In my case the VMs are shut down every night. As you know, shutdowns generate some stopped events…

    How can I exclude these events in my query? I thought of using Heartbeat and comparing TimeGenerated with the last heartbeat, but I couldn’t build a proper query.

    let LastHeartBeat = Heartbeat | where Computer contains "Virtual Machine" | summarize max(TimeGenerated) by Computer;
    Event
    | where Computer in (LastHeartBeat)
    | where EventLog == "System" and EventID == 7036 and Source == "Service Control Manager"
    | parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
    | where Windows_Service_Name contains "SERVICE_NAME" and Windows_Service_State == "SERVICE_STATE"
    | where TimeGenerated <= LastHeartBeat

  9. Hi, hope you are doing well.
    I’m using the following query to find out when “Microsoft Monitoring Agent” has stopped. I don’t get any results, though on the other hand, if I replace “stopped” with “running”, it returns all occurrences in the last hour.
    Kindly assist me with this so that I can make use of it.

    Event
    | where EventLog == "System" and EventID == 7036 and Source == "Service Control Manager"
    | parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
    | where Windows_Service_Name has "Microsoft Monitoring Agent" | where Windows_Service_State contains "Stopped"
    | where TimeGenerated > ago(1h)
    | sort by TimeGenerated desc
    | project Computer, Windows_Service_Name, Windows_Service_State, TimeGenerated

    1. Hi,
      I am not sure what the specific problem is in your case. I have not tried this with the Microsoft Monitoring Agent service. When that service is stopped, no logs are sent to Log Analytics, so maybe that is the cause. If you can, I would suggest using the Change Tracking solution, as it lets you set the collection interval for service state changes as low as 10 seconds, and the data gathered is a little better formatted.
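Since a stopped agent cannot upload its own stop event right away, a missing heartbeat is often an easier signal for this particular service. The following is only a sketch under assumptions: it relies on the Heartbeat table being populated by the agents, and the 10-minute threshold and one-day lookback are arbitrary values to tune.

```kusto
let Threshold = 10m;   // assumed tolerance before an agent counts as silent
Heartbeat
| where TimeGenerated > ago(1d)   // assumed lookback window
| summarize LastHeartbeat = max(TimeGenerated) by Computer
// Computers whose most recent heartbeat is older than the threshold.
| where LastHeartbeat < ago(Threshold)
| sort by LastHeartbeat asc
```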

      1. Thanks for your valuable input. As far as I understand, it takes almost 30 minutes to send telemetry to Log Analytics (it takes around 30 minutes to show up in a Kusto query).
        However, when I start the service (running state), it shows up within a couple of minutes.
        Any input will be highly appreciated.

        1. If you use the above method, the System log is uploaded to Log Analytics every 5 minutes. So there is usually a delay of around 5 minutes between the event happening on the machine and it being visible in Log Analytics search. If you use the Change Tracking solution, it is configured to upload data every 30 minutes by default, but you can set the interval as low as 10 seconds. Check the official documentation of that solution for additional details.

  10. Hi Stanislav,

    I’m starting with Azure Monitoring and found your article on monitoring services. I’ve changed the query to get the latest status like this:

    Event
    | where EventLog == "System" and EventID == 7036 and Source == "Service Control Manager"
    | parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
    | sort by TimeGenerated desc
    | summarize LastTime = arg_max(TimeGenerated,*) by Computer, Windows_Service_Name
    | sort by Computer, Windows_Service_Name asc
    | project Computer, Windows_Service_Name, Windows_Service_State, LastTime

    That seems to work well. However, I can see entries for both the service name and the displayname:

    IKE and AuthIP IPsec Keying Modules stopped 9/14/2021, 10:05:33.200 AM
    IKEEXT running 9/14/2021, 10:06:00.893 AM

    MSMQ running 9/14/2021, 10:06:04.583 AM
    Message Queuing stopped 9/14/2021, 10:05:33.150 AM

    Looking at the underlying event 7036 both variants are used:

    IKEEXT
    running

    but also

    IKE and AuthIP IPsec Keying Modules
    stopped

    So Windows is not consistent in reporting the names. Especially after rebooting a system the results using the DisplayName are incorrect. Once the service gets restarted it would display the correct information.

    The monitored system is running Server 2016

    Is there a way around this ?

    Thanks
    Thorsten

    1. Hi,
      If that is the case, this is probably a Windows Server issue and you need to turn to official support for it.
      Otherwise, instead of extracting the display name you can extract the service name and use that. I would guess that the name of the service with the display name Message Queuing is MSMQ. I believe the name is also available in the EventData column.
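If the events mix both names, one possible workaround is to normalise them through a small lookup table before taking the latest state. This is a sketch, not a tested fix: the `ServiceNameMap` and `ParsedEvents` names and the computer name are hypothetical, the two mapping rows and the four sample rows are taken from the examples in this thread, and in a workspace `ParsedEvents` would be replaced by the Event query with its parse step.

```kusto
// Hypothetical mapping from display name to service name; extend as needed.
let ServiceNameMap = datatable(DisplayName: string, ServiceName: string)
[
    "IKE and AuthIP IPsec Keying Modules", "IKEEXT",
    "Message Queuing", "MSMQ"
];
// Sample rows as they would look after the parse step.
let ParsedEvents = datatable(TimeGenerated: datetime, Computer: string, Reported_Name: string, Windows_Service_State: string)
[
    datetime(2021-09-14 10:05:33.200), "SRV2016", "IKE and AuthIP IPsec Keying Modules", "stopped",
    datetime(2021-09-14 10:06:00.893), "SRV2016", "IKEEXT", "running",
    datetime(2021-09-14 10:05:33.150), "SRV2016", "Message Queuing", "stopped",
    datetime(2021-09-14 10:06:04.583), "SRV2016", "MSMQ", "running"
];
ParsedEvents
| lookup kind=leftouter ServiceNameMap on $left.Reported_Name == $right.DisplayName
// Use the mapped service name when there is one, otherwise keep the reported name.
| extend Windows_Service_Name = iff(isempty(ServiceName), Reported_Name, ServiceName)
| summarize arg_max(TimeGenerated, Windows_Service_State) by Computer, Windows_Service_Name
```

With the sample rows, the two spellings of each service should collapse into a single row carrying the most recent state. The obvious drawback is that the mapping has to be maintained by hand for every service that shows this behaviour.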

      1. Thanks for your quick reply.
        Unfortunately, only one of the names is in the event details, either the long display name or the internal service name.

        We might have to raise that with Microsoft. But not sure they’ll create a fix for it.

        Do you know about another more reliable way to monitor service health ? For change tracking we would need to use Azure Arc to get our on-prem systems added which comes at additional cost

        Thorsten
