Monday, April 14, 2014

Zabbix : Create a production network interface trigger

Following my two previous posts on how to add the interface description to Zabbix graphs [1] and triggers [2], I will finish this series of Zabbix posts with the creation of a production interface trigger.

By default, Zabbix includes the "Operational status was changed ..." trigger, which in my opinion is a big joke:
  • The trigger disappears (status "OK") after the next ifOperStatus check (60 seconds by default).
  • The trigger is raised when a piece of equipment is plugged in. This is "good to know" information, but I can't raise a high-severity trigger each time something is plugged in!
  • I can't tell if the interface was up and went down OR if the interface was down and went up.
  • If I wanted a "Something was plugged in on GEX/X/X" trigger, I would create a dedicated trigger for that purpose.
  • The trigger doesn't include the interface's description (which is extremely irritating and makes me want to kill little kittens). Check my previous post [2] if you care about the kittens' survival.
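The first three complaints can be made concrete with a minimal Python model of the default trigger. Its expression is `{Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].diff(0)}=1`. This is a simplified sketch, not Zabbix code. It assumes that diff() returns 1 when the newest value differs from the previous one and that the trigger is re-evaluated on every new ifOperStatus value, polled once a minute here. The `diff_trigger_states` helper is hypothetical.

```python
UP, DOWN = 1, 2  # IF-MIB ifOperStatus values: up(1), down(2)


def diff_trigger_states(samples):
    """Trigger state after each sample for diff(0)=1 (True = PROBLEM)."""
    states = []
    previous = None
    for value in samples:
        # diff() compares the newest value with the previous one;
        # with a single value there is nothing to compare yet.
        states.append(previous is not None and value != previous)
        previous = value
    return states


# One poll per minute: cable plugged in before the 3rd poll, removed before the 5th.
samples = [DOWN, DOWN, UP, UP, DOWN, DOWN]
print(diff_trigger_states(samples))
# [False, False, True, False, True, False]
```

Both the plug-in (down to up) and the unplug (up to down) produce the same single-poll PROBLEM, which clears at the next check.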
This new trigger will have the following properties:
  • Raise ONLY if the interface was up (something was plugged in) and went down (equipment stopped, interface shut down or cable removed).
  • Disappear when the interface comes back up.
  • Have a "high" severity and include the interface's description.

Go to "Configuration -> Templates -> Template SNMP Interfaces -> Discovery -> Trigger prototypes" and click on "Create trigger prototype".

Use the following line as the trigger name:

 Production Interface status on {HOST.HOST}: {#SNMPVALUE}, {ITEM.VALUE2} : {ITEM.VALUE3}  

Use this as the trigger expression:

 {Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].avg(3600)}<2&{Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].last(0)}=2&{Template SNMP Interfaces:ifAlias[{#SNMPVALUE}].str(this_does_not_exist)}=0  

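This expression means: raise if the interface was up ("avg(3600)}<2") AND is now down ("last(0)}=2"). Every ifOperStatus value is at least 1 (up), so an average below 2 over the last hour means the interface was up at least once during that hour. The 3600 value therefore specifies how long the trigger stays raised. Once the interface has been down for 3600 s, "avg(3600)" equals 2 and the trigger disappears.
The ".str(this_does_not_exist)}=0" expression is used to show the interface's description and is explained in my previous post [2].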
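The behaviour of this expression over time can be checked with a small Python sketch. This is a simplified model, not how Zabbix evaluates triggers internally. The `production_trigger` helper, the 60-second polling history and the window boundaries used to approximate avg(3600) are all assumptions made for illustration.

```python
UP, DOWN = 1, 2  # IF-MIB ifOperStatus values: up(1), down(2)


def production_trigger(history, now, window=3600, alias=""):
    """Return True if the production trigger would be in PROBLEM state at `now`.

    history: (timestamp_in_seconds, ifOperStatus) tuples sorted by timestamp.
    """
    # Values received in the last `window` seconds (approximates avg(3600)).
    recent = [value for ts, value in history if now - window < ts <= now]
    if not recent:
        return False  # no data in the window: this model simply does not fire

    # avg(3600)<2: every ifOperStatus value is >= 1, so an average below 2
    # means the interface reported up(1) at least once during the window.
    was_up = sum(recent) / len(recent) < 2
    # last(0)=2: the most recent value is down(2).
    is_down = recent[-1] == DOWN
    # str(this_does_not_exist)=0: true for any realistic alias.
    alias_ok = "this_does_not_exist" not in alias
    return was_up and is_down and alias_ok


# Hypothetical history: polled every 60 s, up for two hours, then down.
history = [(t, UP) for t in range(0, 7201, 60)]
history += [(t, DOWN) for t in range(7260, 14401, 60)]

print(production_trigger(history, now=7260))   # True: up -> down just happened
print(production_trigger(history, now=10740))  # True: an up(1) sample is still in the window
print(production_trigger(history, now=10800))  # False: only down(2) left, avg == 2

# Interface goes down and comes back up one poll later: the trigger clears.
recovered = [(t, UP) for t in range(0, 7201, 60)] + [(7260, DOWN), (7320, UP)]
print(production_trigger(recovered, now=7320))  # False: last(0) is 1 again
```

The model shows the two ways the trigger clears: after roughly one hour of continuous "down" values, or as soon as the interface reports "up" again.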

Use this as the trigger description:
 Interface status went from up to down !!!  
 Interface : {#SNMPVALUE}, {ITEM.VALUE1} = {ITEM.VALUE3}  

Set the severity to "high" (or whatever suits your needs). You can override the severity for each of your interfaces/equipment.

Wait until the discovery rule is refreshed (default is 3600 s) or temporarily set its interval to 60 s. We can now disable an interface to check the results. Let's do this on bccsw02 ge-0/0/3:

The trigger is raised as expected, with the hostname, interface name and description. If you configured Zabbix actions, the alert message will look like:
"Production Interface status ev-bccsw02: ge-0/0/3, down (2) : EV-ORADB01 - BACK_PROD"

Let's re-enable the interface:

The trigger goes green as the interface comes back up, and you should receive a message saying:
"Production Interface status ev-bccsw02: ge-0/0/3, up (1) : EV-ORADB01 - BACK_PROD"
 

Be aware that you can also use SNMP traps for that purpose.

Hope that helps !

[1] http://sysnet-adventures.blogspot.fr/2014/02/zabbix-display-network-interface.html
[2] http://sysnet-adventures.blogspot.fr/2014/04/zabbix-display-network-interface.html

Zabbix : Display network interface description in triggers


In a previous post [1], I explained how to solve a very frustrating thing about Zabbix: "How to add the network interface description to your graph names."

In this post, I'll explain how to fix another very frustrating thing about Zabbix: "How to add the network interface description to your trigger names."

Zabbix has a default interface trigger which is raised when an interface status changes.
That would be a good thing if it didn't suffer from the same issue as the graphs: the interface description appears neither in the trigger name nor in the comment. This is very annoying, especially if you receive alerts during the night.

Below is an example of the default Zabbix trigger alert:



It seems that ge1's operational status changed. Good to know, but again, what the hell is "ge1"???
Message to the Zabbix team: do you really think I've learnt the port allocations of all my switches by heart???

The good news is that you can solve this stupidity with a "crafty" trick!

Trigger names and descriptions don't interpret items, so the "Zabbix Graph" trick [1] won't work...
To get your interface's description, you'll need to insert an "interface alias" item (ifAlias) into your trigger expression and reference it in the trigger name with the standard Zabbix macro "{ITEM.VALUEX}".

Go to "Configuration -> Templates -> Template SNMP Interfaces -> Discovery -> Trigger prototypes"

You should have a trigger named "Operational status was changed on {HOST.NAME} interface {#SNMPVALUE}" which matches the screenshot above.
To get the interface description, we first add a trigger expression that checks whether the interface alias (i.e. description) contains (str() function) a string that will NEVER match, for example "this_does_not_exist":

 {Template SNMP Interfaces:ifAlias[{#SNMPVALUE}].str(this_does_not_exist)}=0  

This line means that the network interface description does NOT contain "this_does_not_exist", which is always true. Finally, we add an AND operator (&) between the original expression and the string comparison, which gives us the final trigger expression:

 {Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].diff(0)}=1&{Template SNMP Interfaces:ifAlias[{#SNMPVALUE}].str(this_does_not_exist)}=0  

EDIT: According to a user's comment, newer versions of Zabbix require replacing the "&" sign with an "AND" string.

This line means there was an interface operational status change AND the interface's alias is NOT "this_does_not_exist".
This alias comparison is just a trick so that we can reference the interface's alias (i.e. description) with the standard "{ITEM.VALUEX}" macro.
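The trick only works because the comparison can never be false. Here is a simplified Python model, which is an assumption about behaviour and not Zabbix code. In it, str(pattern) returns 1 when the latest value contains the pattern and 0 otherwise. The `zabbix_str` helper and the "Uplink to core" alias are hypothetical.

```python
def zabbix_str(latest_value: str, pattern: str) -> int:
    """Simplified model of the Zabbix str() function: 1 if found, 0 otherwise."""
    return 1 if pattern in latest_value else 0


# Typical interface aliases, including an interface without any description.
aliases = ["EV-ORADB01 - BACK_PROD", "Uplink to core", ""]

# {...ifAlias[...].str(this_does_not_exist)}=0 is true for every one of them,
# so adding it with AND never changes when the trigger fires.
print([zabbix_str(alias, "this_does_not_exist") == 0 for alias in aliases])
# [True, True, True]
```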

Now change the trigger name to the following string:

  Operational status was changed on {HOST.NAME} interface {#SNMPVALUE} : {ITEM.VALUE2}  

As you can see, I added the {ITEM.VALUE2} macro, which returns the latest value of the second item in the trigger expression: you guessed it, the interface alias!
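The numbering follows the order in which item references appear in the expression. An item referenced twice counts twice, which is why a trigger using ifOperStatus in both avg() and last() needs {ITEM.VALUE3} for the alias. The Python sketch below models that substitution. The regular expressions, the `item_keys` and `expand_item_values` helpers and the sample values are hypothetical. The sketch also assumes that {HOST.NAME} and {#SNMPVALUE} have already been resolved.

```python
import re

# One trigger function such as {Host:key[params].func(args)}.
# Assumes LLD macros like {#SNMPVALUE} were already replaced by real values.
FUNCTION_RE = re.compile(r"\{([^{}:]+):([^{}]+)\.(\w+)\(([^()]*)\)\}")
ITEM_VALUE_RE = re.compile(r"\{ITEM\.VALUE([1-9])\}")


def item_keys(expression):
    """Item keys in order of appearance; a repeated item is counted again."""
    return [match.group(2) for match in FUNCTION_RE.finditer(expression)]


def expand_item_values(name, expression, latest_values):
    """Replace each {ITEM.VALUEn} with the latest value of the n-th item reference."""
    keys = item_keys(expression)

    def substitute(match):
        index = int(match.group(1)) - 1  # the macros are numbered from 1
        return latest_values[keys[index]]

    return ITEM_VALUE_RE.sub(substitute, name)


expression = (
    "{Template SNMP Interfaces:ifOperStatus[ge-0/0/3].diff(0)}=1"
    "&{Template SNMP Interfaces:ifAlias[ge-0/0/3].str(this_does_not_exist)}=0"
)
latest_values = {
    "ifOperStatus[ge-0/0/3]": "down (2)",
    "ifAlias[ge-0/0/3]": "EV-ORADB01 - BACK_PROD",
}
name = "Operational status was changed on ev-bccsw02 interface ge-0/0/3 : {ITEM.VALUE2}"
print(expand_item_values(name, expression, latest_values))
# Operational status was changed on ev-bccsw02 interface ge-0/0/3 : EV-ORADB01 - BACK_PROD
```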

Wait until the discovery rule is refreshed (default is 3600 s) or temporarily set its interval to 60 s, and enjoy the happiness of the result:


You can also use the {ITEM.VALUE2} macro in the trigger description, which is very handy if you want to include additional information for the on-call guy.

In the next post [2], I'll show how to create a real interface trigger. From my point of view, this default trigger is completely useless:
  • The trigger disappears after the next ifOperStatus check (60 seconds by default).
  • The trigger is raised when a piece of equipment is plugged in. This is "good to know" information, but I can't raise a high-severity trigger each time something is plugged in!
  • I can't tell if the interface was up and went down OR if the interface was down and went up.
  • If I wanted a "Something was plugged in on GEX/X/X" trigger, I would create a dedicated trigger for that purpose.

[1] http://sysnet-adventures.blogspot.fr/2014/02/zabbix-display-network-interface.html
[2] http://sysnet-adventures.blogspot.fr/2014/04/zabbix-create-production-network.html