Upgrade your SCOM Notifications with PowerShell

At a client recently, for a proof-of-concept job, we implemented Operations Manager (SCOM) to replace an existing monitoring product they were using in their environment.

Out of the gate, they loved it! SCOM had out-of-the-box management functionality for most of the equipment in their environment, and after installing just a few quick management packs, they were able to monitor everything they wanted. It was great, it was easy, and everyone had that warm, fuzzy feeling of IT Project Satisfaction.

One of the major concerns we began to hear, though, was that the out-of-the-box alerts from SCOM weren’t very informative. For instance, an e-mail would tell you that an alert was triggered, when, and on which computer, but other than that, you were kind of on your own.

I quickly volunteered, eager to jump into the fray and employ two of my favorite tools to fix the issue: Orchestrator and PowerShell!

To start, here is the default notification:

Alert: ConfigMgr 2007 Component Health: SMS_PXE_SERVICE_POINT state

Source: sccmpr01

Path: sccmpr01.woodlawn.net

Last modified by: USA\OPsmgr

Last modified time: 2/11/2014 10:41:32 PM

Alert description: sccmpr01 – ConfigMgr 2007 Component Health: SMS_PXE_SERVICE_POINT state.

The availability state for SMS component ‘SMS_PXE_SERVICE_POINT’ in site WD1 changed from ‘Online’ to ‘Failed’. Its installation state is ‘Installed’. Its execution state is ‘Hung’. This component last provided a heartbeat at ‘02/11/2014 22:39:23’. The next heartbeat is expected in ‘30’ seconds from that time.

Alert view link: “http://scom.woodlawn.net/OperationsManager?DisplayMode=Pivot&AlertID=%7b1[…]-aa489%7d”

Notification subscription ID generating this message: {6E14B614-838C-77E1-0176-3A369BC231C2}

Yeah, pretty uninspiring. There is a web link, which is nice, but we can’t get to the meat of the issue. They asked for something which I thought was quite reasonable: “For a disk space alert, why can’t I see which disk and what threshold triggered the alert?” or “For CPU usage monitors, how come I can’t see a listing of which applications are pegging the CPU?”

So, here is what I did. Using Orchestrator, I created a runbook that listens for a new Alert or Monitor being created. In the next step of the runbook, a PowerShell script reaches out using the Operations Manager module and gathers information about the event using various methods and properties. This information is used to build an HTML e-mail, making liberal use of ConvertTo-Html with the -Fragment parameter and the -As Table and -As List options.
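As a rough illustration of the ConvertTo-Html step, here is a minimal sketch. The `$alert` object is a hand-built stand-in, not a real SCOM alert; in the runbook it would come from the Operations Manager module (for example via Get-SCOMAlert), and the property names and values below are assumptions chosen for the example.

```powershell
# Stand-in for an alert object (assumption: property names and values are
# illustrative; a real alert would be fetched with the OperationsManager module).
$alert = [pscustomobject]@{
    Name                        = 'Logical Disk Free Space is low'
    MonitoringObjectDisplayName = 'NA-SCOM-01'
    Severity                    = 'Warning'
}

# -As List renders a single object as property/value rows,
# which suits the details of one alert.
$alertHtml = $alert | ConvertTo-Html -Fragment -As List | Out-String

# -As Table renders one row per object, which suits collections
# such as a set of disks (sample data, not real measurements).
$disks = @(
    [pscustomobject]@{ Name = 'C:'; 'Size (GB)' = 99.9; 'Free Space (GB)' = 1.62 }
)
$diskHtml = $disks | ConvertTo-Html -Fragment -As Table | Out-String
```

Because -Fragment emits only the table markup rather than a full HTML page, several fragments can be dropped into one message body.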

We then run a snippet of code, chosen based on the alert title, to gather additional information. For instance, if the alert comes from a ‘disk space too low’ monitor that was exceeded, we may run a WMI query and gather information about the free hard drive space, based on the drive letter mentioned in the alert.
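The title-based lookup could be sketched roughly as follows. All three function names are hypothetical helpers made up for this example, the wildcard pattern is an assumption about how the alert title is worded, and the query uses Get-CimInstance against the standard Win32_LogicalDisk class; the original script may well have been structured differently.

```powershell
# Pull a drive letter such as 'C:' out of free-form alert text.
# Returns $null when no drive letter is present.
function Get-DriveLetterFromText {
    param([string]$Text)
    if ($Text -match '\b([A-Za-z]:)') { return $Matches[1].ToUpper() }
    return $null
}

# Turn a Win32_LogicalDisk-shaped object into a row for the report.
function ConvertTo-DiskReportRow {
    param($Disk)
    [pscustomobject]@{
        'System Name'     = $Disk.SystemName
        'Name'            = $Disk.DeviceID
        'Size (GB)'       = [math]::Round($Disk.Size / 1GB, 2)
        'Free Space (GB)' = [math]::Round($Disk.FreeSpace / 1GB, 2)
        'Percent Free'    = [math]::Round(100 * $Disk.FreeSpace / $Disk.Size, 2)
    }
}

# Choose what extra data to gather based on the alert title.
function Get-AlertExtraInfo {
    param([string]$AlertName, [string]$AlertDescription, [string]$ComputerName)
    switch -Wildcard ($AlertName) {
        '*Disk Free Space*' {
            $drive = Get-DriveLetterFromText $AlertDescription
            if ($drive) {
                # WMI query for the one disk named in the alert.
                Get-CimInstance -ClassName Win32_LogicalDisk -ComputerName $ComputerName -Filter "DeviceID='$drive'" |
                    ForEach-Object { ConvertTo-DiskReportRow $_ }
            }
        }
        default { }   # no extra information for other alert types
    }
}
```

Keeping the row conversion separate from the query means the formatting can be checked with a hand-built object, without touching a live server.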

The key thing to realize here is that this example just uses a bit of PowerShell to pull out some interesting information already present in Operations Manager and store it in variables, which are then string-expanded into an HTML message body. There are some typos in the text below, all of which stem from the knowledge base and article information present in OpsMgr.
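The string expansion itself can be as simple as an expandable here-string. This is a sketch with placeholder values; the variable names and the HTML layout are assumptions for illustration, not the exact template used in the runbook.

```powershell
# Placeholder values (assumptions); in the runbook these would be filled
# from the alert and from HTML fragments produced by ConvertTo-Html.
$computer    = 'NA-SCOM-01'
$alertName   = 'Logical Disk Free Space is low'
$detailsHtml = '<table><tr><td>Name:</td><td>C:</td></tr></table>'

$subject = "Alert - $computer - $alertName"

# A double-quoted here-string substitutes variables in place,
# so the fragments land exactly where they are referenced.
$body = @"
<html>
<body>
<h2>$subject</h2>
<h3>Information</h3>
$detailsHtml
</body>
</html>
"@
```

A single-quoted here-string (`@'` ... `'@`) would leave `$subject` and `$detailsHtml` as literal text, so the double-quoted form is the one that matters here.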

And here is our final result:

Alert – NA-SCOM-01 – Logical Disk Free Space is low

Information

This alert was triggered because the following monitor was exceeded:

Logical Disk Free Space – Monitor the percentage free space and number of free MBytes remaining on a logical disk. Only when both the low percentage free space threshold and low number of free MBytes threshold is the disk flagged as having low disk free space.

System Name | Drive Type | Volume Name | Name | Size (GB) | Free Space (GB) | Percent Free
NA-SCOM-01 | 3a | | C: | 99.90 | 1.62 | 1.67

Thresholds

The following threshold criteria were evaluated during this alert:

System Drive Warning MBytes Threshold: 500
System Drive Warning Percent Threshold: 10
System Drive Error Mbytes Threshold: 300
System Drive Error Percent Threshold: 5
Non System Drive Warning Mbytes Threshold: 2000
Non System Drive Warning Percent Threshold: 10
Non System Drive Error Mbytes Threshold: 1000
Non System Drive Error Percent Threshold: 5

Click here to view the Alert: “http://scom.ops.customer.net/OperationsManager?[..]”

Notification subscription ID generating this message: Tier II Support – 8 hour Response SLA

Knowledgebase

The following information has been provided to assist in addressing this matter:

Summary

The amount of free disk space on the logical disk volume has exceeded the threshold. System performance may be adversely affected and the ability to add or modify existing files on the logical disk volume may not be possible until additional free space is made available.

Configuration

The Logical Disk Free Space monitoring routine is a high configurable solution that enables Operators to set varying threshold values for system and non-system logical disk volumes. In addition separate threshold values can be set for Warning and Error states.

Since logical disk volumes may vary in size from a few gigabytes to many terabytes or more the Logical Disk Free Space monitoring routine requires that an Operator indicate both the Megabyte and Percentage based threshold values that must be passed before the Warning and Error thresholds reached. This means that in order for the threshold to be reached both the Megabyte and Percentage based threshold values for the System or Non-System Drive must be breached.

The default threshold values for the Logical Disk Free Space monitoring routine include:

System Drive Free Space Thresholds (Defaults)

Parameter | Default Value
System Drive Error Mbytes Threshold | 100
System Drive Error Percent Threshold | 5
System Drive Warning Mbytes Threshold | 200
System Drive Warning Percent Threshold | 10

Non-System Drive Free Space Thresholds (Defaults)

Parameter | Default Value
Non-System Drive Error Mbytes Threshold | 1000
Non-System Drive Error Percent Threshold | 5
Non-System Drive Warning Mbytes Threshold | 2000
Non-System Drive Warning Percent Threshold | 10

Please note that Overrides can be used to change any of the threshold values that are defined above. In addition these thresholds can be applied to all logical disk volume instances in the management group or if needed separate threshold values can be defined for specific logical disk volume instances.

Causes

When existing files grow in size and the new files are added, the free space is taken up on a logical disk. When the amount of free space on the logical disk falls below the threshold, the state for the logical disk will change.

Resolutions

To increase the amount of available disk space, do one or more of the following:

· Run Disk Cleanup to gain more free space on the disk.
· Back up and remove files, or delete unnecessary files from the disk.
· Move files to another disk or to offline storage.
· Purchase additional storage or switch to a larger disk.

To view recent disk space history you can use the following view:

Start Disk Capacity View


This approach uses a runbook to gather the information needed to create this report; however, the clever could do the same using a notification channel in SCOM.

Big thanks to Sean Duffey for his great blog post, “Building a Daily Systems report email with Powershell”, which got me started down this path.


One thought on “Upgrade your SCOM Notifications with PowerShell”

  1. Nigel McMaster September 25, 2015 / 9:21 am

    Hi Steve,

    I see that you have added the Notification Subscription Id to the bottom of the email report.
    I would greatly appreciate it if you could show me how you got that information from the alert notification.

    Many thanks.
