Roxmon - A host monitoring system running in Roxen and Pike.: Roxmon internals

5. Roxmon internals

Because Roxmon is only a small project, all of its design information is kept in this document together with everything else. Readers who do not need the design details can skip this chapter without missing anything required to use Roxmon.

5.1 What you need to know to write a Macro

This section describes the parts of the design that you need to know in order to write a Macro for Roxmon.

Alarm syntax

Every alarm has the following fields:

category

A string.

This is the area or service that the alarm concerns. The exact names used in each installation are chosen by the Administrator when Configuring a Macro. The Macro designer provides a suggestion.

The category lets the Monitor connect an alarm to one of the services in its diagram and give that service a color depending on its current status. To make this work, the Administrator who draws the diagram and names the things on it has to take the category names used in the Macros into account. Categories could be things like authpriv, cron, kern, lpr, mail ...

severity

A string.

One of Emergency, Alert, Critical, Error, Warning, Notice, Info, Debug, and Unknown, listed here in descending order of importance. Any other string is treated as Unknown. The Unknown severity is a sign that a Macro has bugs or is not fully developed.

The severity is normally chosen by the Administrator when Configuring the Macro. The Macro designer provides a suggestion.

alarm

A string.

A short sentence describing what the alarm is about.

details

A string.

This is any amount of text explaining or describing the conditions surrounding the alarm.

time

A timestamp.

Allocated automatically by the Agent.

hostinfo

A mapping.

Allocated automatically by the Agent.

This can be used to identify the host that generated the alarm.

START_TIME

An integer.

Allocated automatically by the Agent. Used to uniquely identify an alarm.

SEQUENCE_NUM

An integer.

Allocated automatically by the Agent. Used to uniquely identify an alarm.

It is best to keep the size of an alarm below 16k so that communication performance does not deteriorate.
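The field list above can be illustrated with a small sketch. It is written in Python purely for illustration (Roxmon Macros are Pike scripts), and every function name in it is hypothetical. It assumes that "16k" means 16384 bytes and uses JSON only as a rough stand-in for the real encoding when estimating size.

```python
import json

# Severities from the text, most important first.
SEVERITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notice", "Info", "Debug", "Unknown"]

# Assumption: "16k" is read as 16384 bytes.
MAX_ALARM_SIZE = 16 * 1024


def normalize_severity(value):
    """Return value if it is a known severity, otherwise "Unknown"."""
    return value if value in SEVERITIES else "Unknown"


def severity_rank(value):
    """Lower rank means more important; 0 is Emergency."""
    return SEVERITIES.index(normalize_severity(value))


def make_alarm(category, severity, alarm, details):
    """Build the Macro-supplied part of an alarm (hypothetical helper).

    time, hostinfo, START_TIME and SEQUENCE_NUM are left out because
    the Agent allocates them automatically.
    """
    return {
        "category": category,
        "severity": normalize_severity(severity),
        "alarm": alarm,
        "details": details,
    }


def approximate_size(alarm):
    """Rough size estimate via JSON; Roxmon's own encoding will differ."""
    return len(json.dumps(alarm).encode("utf-8"))


example = make_alarm("mail", "Warning", "Mail queue is growing",
                     "The queue has grown for three consecutive checks.")
assert approximate_size(example) < MAX_ALARM_SIZE
```

A Macro that produces long details text could run a check like approximate_size before handing the alarm over, and truncate the details if the alarm would exceed the limit.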

5.2 Other design aspects of Roxmon

This is where all the rest of the design information of Roxmon is kept.

The ideal approach would be to present the design details systematically, following the way the system is built. Instead, I have chosen to first describe everything that is of use to the Macro writer and then, in this chapter, the rest. This makes the section a little less systematic, and you will probably have to go back and forth between this chapter and the previous one (What you need to know to write a Macro).

Agent installation

The agent is installed in the directory /var/opt/Roxmon.

The pike binary and all its auxiliary files are installed in /var/opt/Roxmon/pike with subdirectories bin, include, modules...

Configurations are in the directory /var/opt/Roxmon/config, and a list of as yet unhandled "alarms" is in /var/opt/Roxmon/alarms. The configuration is actually a collection of pike-scripts, macros and their configuration files that "run" within the agent.

/var/opt/Roxmon/
                pike/
                     bin
                     include
                     modules
                     ...
                config/
                alarms/
                run/

Monitor installation

The monitor is a Roxen module installed in the Roxen server.

Server connection

This chapter describes the communication between the Agent and the Monitor.

Functions initiated from the server

PING

The Agent responds immediately.

DOWNLOAD

Download request.

FETCH_ALARM_LIST, FETCH_ALARM

A query for alarms.

DELETE_ALARM

Reset an alarm.

UPDATE_ALARM

Update an alarm.

These functions are always answered immediately (or as fast as possible).

Functions initiated from the client

ALARM

Deliver an alarm.

ALARM_DELETED

Somebody else has deleted an alarm.

ALARM_MODIFIED

Somebody else has modified the alarm.

The functions from the client are all asynchronous; they are never acknowledged. The last two are sent only to the server connections that did not initiate the operation.
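The rule that ALARM_DELETED and ALARM_MODIFIED go only to the connections that did not start the operation can be sketched as follows. This is an illustrative Python sketch, not Roxmon's Pike code; FakeConnection, its outbox list and notify_others are hypothetical names standing in for real server connections.

```python
class FakeConnection:
    """Stand-in for one server connection (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.outbox = []  # messages queued for this connection


def notify_others(connections, initiator, message):
    """Queue message on every connection except the initiator.

    No acknowledgement is expected, matching the asynchronous
    behaviour described in the text.
    """
    for conn in connections:
        if conn is not initiator:
            conn.outbox.append(message)


a, b, c = FakeConnection("a"), FakeConnection("b"), FakeConnection("c")

# Connection "a" deleted an alarm, so only "b" and "c" are told.
deleted = {"op": "ALARM_DELETED", "alarm": {"alarm": "Disk full"}}
notify_others([a, b, c], a, deleted)
```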

Protocol syntax

All operations and responses are mappings serialized with encode_value. Each one is framed using a simple Hollerith code: <number of chars>H<data>, where the data is the encoded mapping. The function name is the value of the field named "op". Arguments are carried in fields with other names. For responses, the "op" field has the value "RESPONSE" and the original op is in the field named "response".
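The length-prefixed framing described above can be sketched like this. The sketch is in Python for illustration only and treats the encoded mapping as opaque bytes, since the output of Pike's encode_value is not reproduced here. It also assumes that the count is written in decimal and counts bytes; frame and unframe are hypothetical names.

```python
def frame(payload: bytes) -> bytes:
    """Prefix payload with its length and an 'H' separator."""
    return str(len(payload)).encode("ascii") + b"H" + payload


def unframe(buffer: bytes):
    """Split one framed message off the front of buffer.

    Returns (payload, rest). If the buffer does not yet hold a
    complete message, returns (None, buffer) so the caller can
    wait for more data.
    """
    head, sep, tail = buffer.partition(b"H")
    if not sep:
        return None, buffer          # length prefix not complete yet
    if not head.isdigit():
        raise ValueError("malformed length prefix: %r" % head)
    length = int(head)
    if len(tail) < length:
        return None, buffer          # payload not complete yet
    return tail[:length], tail[length:]
```

Because the length prefix consists only of digits, the first H in the buffer is always the separator, even when the payload itself contains the letter H.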

Details on all operations

Here is the complete list of operations with arguments:

Function: "op":"PING"
Response: "op":"RESPONSE" "response":"PING". Any other arguments are returned in the response.

Function: "op":"DOWNLOAD" "name":<name of the function downloaded> (a string used as the filename) "contents":<the complete pike-script> (a string stored in the file). If "contents" is the empty string, the file is removed and the function is deactivated.
Response: "op":"RESPONSE" "response":"DOWNLOAD"

Function: "op":"FETCH_ALARM_LIST"
Response: "op":"RESPONSE" "response":"FETCH_ALARM_LIST" "alarms":<an array(mapping) with all the alarm names>

Function: "op":"FETCH_ALARM" "alarm":<mapping with the alarm name>
Response: "op":"RESPONSE" "response":"FETCH_ALARM" "alarm":<mapping with the alarm information>. If there is no such alarm, the "alarm" field has the value 0.

Function: "op":"DELETE_ALARM" "alarm":<mapping with the alarm name>
Response: "op":"RESPONSE" "response":"DELETE_ALARM" "alarm":<mapping with the alarm name> (same)

Function: "op":"UPDATE_ALARM" "alarm":<mapping with the alarm information>
Response: "op":"RESPONSE" "response":"UPDATE_ALARM" "alarm":<mapping with the alarm information> (same)

Async: "op":"ALARM" "alarm":<mapping with the alarm information>

Async: "op":"ALARM_DELETED" "alarm":<mapping with the old alarm information>

Async: "op":"ALARM_MODIFIED" "alarm":<mapping with the new alarm information>
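The request and response shapes listed above can be sketched as a small handler. This is an illustrative Python sketch rather than the Agent's actual Pike code. It assumes, without confirmation from the text, that the "alarm name" mapping consists of the START_TIME and SEQUENCE_NUM fields. The names make_response, alarm_key and handle are hypothetical. DOWNLOAD is left out because it touches the file system.

```python
def make_response(request, fields=None):
    """Build a response mapping for request, with optional extra fields."""
    response = dict(fields or {})
    response["op"] = "RESPONSE"
    response["response"] = request["op"]
    return response


def alarm_key(name):
    """Assumption: an alarm is named by START_TIME and SEQUENCE_NUM."""
    return (name["START_TIME"], name["SEQUENCE_NUM"])


def handle(request, alarms):
    """Answer one server-initiated operation against an in-memory store."""
    op = request["op"]
    if op == "PING":
        # Other arguments are returned to the sender.
        extra = {k: v for k, v in request.items() if k != "op"}
        return make_response(request, extra)
    if op == "FETCH_ALARM_LIST":
        names = [{"START_TIME": s, "SEQUENCE_NUM": n}
                 for (s, n) in sorted(alarms)]
        return make_response(request, {"alarms": names})
    if op == "FETCH_ALARM":
        # The value 0 signals that there is no such alarm.
        found = alarms.get(alarm_key(request["alarm"]), 0)
        return make_response(request, {"alarm": found})
    if op == "DELETE_ALARM":
        alarms.pop(alarm_key(request["alarm"]), None)
        return make_response(request, {"alarm": request["alarm"]})
    if op == "UPDATE_ALARM":
        alarms[alarm_key(request["alarm"])] = request["alarm"]
        return make_response(request, {"alarm": request["alarm"]})
    raise ValueError("unsupported op: %r" % op)


store = {}
info = {"START_TIME": 100, "SEQUENCE_NUM": 1, "alarm": "Disk full"}
name = {"START_TIME": 100, "SEQUENCE_NUM": 1}
handle({"op": "UPDATE_ALARM", "alarm": info}, store)
```

After the UPDATE_ALARM call, a FETCH_ALARM request carrying name returns the stored information, and the same request after DELETE_ALARM returns 0 in the "alarm" field.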