#This file was created by LinuxDoc-SGML #(conversion : Frank Pavageau and Jose' Matos) \lyxformat 2.15 \textclass linuxdoc \language default \inputencoding default \fontscheme default \papersize Default \paperfontsize default \spacing single \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \defskip medskip \quotes_language default \quotes_times 2 \paperorientation portrait \papercolumns 1 \papersides 1 \paperpagestyle default \layout Title \added_space_top vfill \added_space_bottom vfill Roxmon - A host monitoring system running in Roxen and Pike. \layout Author Linus Tolke \family typewriter linus@lysator.liu.se \family default \layout Date $Rev$ $Date: 2000/02/08 22:10:36 $ \layout Abstract Roxmon is an aid for monitoring hosts and receiving notice when some condition is seen. It is intended to monitor and report error conditions, and the provided macros are designed to help you set up your own monitoring. \layout Standard \begin_inset LatexCommand \tableofcontents \end_inset \layout Section Introduction \layout Standard This document comprises all parts of the documentation of Roxmon. It starts off by describing what problem the project aims to solve and thus what void the Roxmon "product" intends to fill. Then it includes the information needed by the operator installing and configuring Roxmon and by the solution provider who wants to use Roxmon as a platform for his applications. Finally, the design of Roxmon is described for the people taking part in the development of Roxmon itself. \layout Subsection Glossary \layout Standard There are a lot of possibly confusing words used here. In order to establish some kind of order in the chaos, this Glossary defines the terminology used in this document and in the Roxmon code. As with all other areas regarding the Roxmon project, please don't hesitate to suggest improvements or corrections using the bug report mechanism. 
\layout Description Administrator \protected_separator Person or role responsible for the Hosts in the network. This is also the person that does the Installation and Configuration of Roxmon. \layout Description Agent \protected_separator The part of Roxmon running on every host. This is the part that does all the digging in logfiles and poking around different processes in order to determine if there is anything out of the ordinary to report. Each Host has an Agent, possibly Configured differently to reflect the services allocated on that Host. \layout Description Alarm \protected_separator A condition reported by an Agent. It could be that some problem has occurred that needs to be taken care of, or it could be that the problem has already been taken care of by some automatic feature, making manual intervention unnecessary. This is called a trap in the SNMP world. \layout Description Alarm \protected_separator levels \protected_separator Some of the Macros are normally Configured with a level at which they are supposed to report a problem. It could be when the disk is 90% full or when the process syslog is bigger than 1Meg. These configuration parameters are called Alarm levels. \layout Description Configuring, \protected_separator Configuration \protected_separator The process of allocating Macros to different Agent instances depending on what services are run on each Host, and of deciding the Alarm levels and testing interval on each Host. \layout Description Host \protected_separator A computer. Every Host can run only one instance of the Agent. \layout Description Installing, \protected_separator Installation \protected_separator The process of adding a Host to the monitored network by installing the Agent and adding the Host to the Monitor. \layout Description Macro \protected_separator I believe this is a controversial choice. \layout Standard This is the Pike "program" that implements the checking of a condition. It is run within the Agent. 
Normally this just tests a condition: if it is met, it sends an alarm and goes to sleep; if it is not met, it goes to sleep immediately. Then, after a configurable amount of time, it wakes up again and starts over. A test could, however, also be initiated by a program reporting things to the Macro or by some more elaborate scheme. \layout Description Monitor \protected_separator This is a part of Roxmon. It is the part that collects the alarms from all Agents in a big list, shows that list to the Administrator, and allows the Administrator to work with the list, acknowledging alarms and such. It is implemented in Roxen. \layout Description Pike \protected_separator Pike is the name of the programming language that Roxmon (Agent, Macros and Monitor) is written in. It is also the name of the interpreter running the Agent and Roxen. \layout Description Roxen \protected_separator \begin_inset LatexDel \htmlurl{ \end_inset http://www.roxen.com/ \begin_inset LatexDel }{ \end_inset Roxen \begin_inset LatexDel } \end_inset is a Web Server product from \begin_inset LatexDel \htmlurl{ \end_inset http://www.idonex.se \begin_inset LatexDel }{ \end_inset Idonex AB \begin_inset LatexDel } \end_inset . The Roxen Challenger version of Roxen that is used by Roxmon is available under the terms of GPL. \layout Description Site \protected_separator A collection of Hosts spread over a geographical area so small that the Administrator normally walks or takes the elevator to reach all Hosts. \layout Description User \protected_separator A person normally working on the Host. Ideally he will not notice Roxmon but only the improved availability of the network and services gained from introducing Roxmon. \layout Subsection The project and license \layout Standard Roxmon is an open-source project run by Linus Tolke in his spare time. Everybody is allowed to copy and use the result of the project without any warranty or support under the terms of GPL. 
\layout Subsection Contributions and configuration \layout Standard By design, Roxmon has a very thin line between the items delivered with Roxmon and what you add when you do the Configuration. This means that if you are Configuring Roxmon to handle some product, I hope you will find time to contribute your work to the Roxmon project so that others can gain from it. The Roxmon project is open for additions for monitoring all kinds of services and products. The license agreement of the product you are monitoring might pose limitations that you will have to follow, but in most cases that probably won't be a problem. \layout Subsection Big changes from the previous version \layout Standard For small changes, see the ChangeLog file. \layout Subsection Where to obtain new versions of Roxmon \layout Standard The \begin_inset LatexDel \url{ \end_inset http://roxmon.sourceforge.net/ \begin_inset LatexDel }{ \end_inset Roxmon project home page \begin_inset LatexDel } \end_inset is located at \begin_inset LatexDel \url{ \end_inset http://sourceforge.net/ \begin_inset LatexDel }{ \end_inset SourceForge \begin_inset LatexDel } \end_inset . \layout Subsection Roxmon plan \layout Standard This is how I currently expect the development of Roxmon to proceed across versions of the product: \layout Itemize 0.0.x Initial versions. \layout Itemize 0.1.0 First publicly available version. \layout Itemize 0.1.x Bugfixes, more prepared macros available. \layout Itemize 1.0.x Complete with simple development tools and help. \layout Itemize 2.x More exotic features like support for several masters, handling SNMP queries and SNMP traps... 
\layout Standard As partners/testers for the 0.0.x versions and alpha/beta-testers for the 0.1.0 version, I would like to see administrators for sites/organisations with: \layout Itemize 3 - 20 Unix machines, all with the same Unix version (a version supported by Pike) \layout Itemize All machines located geographically at one site \layout Itemize Roxen Challenger already running internally \layout Itemize Administrator with knowledge of Pike \layout Standard As alpha/beta-testers for the 1.0 versions, I would like to see administrators for sites/organisations with: \layout Itemize 5 - 100 Unix machines, with possibly different Unix versions \layout Itemize All machines located geographically at one site \layout Standard The 2.0 versions are targeted at organisations with: \layout Itemize 5 - 1000 Unix machines \layout Itemize Machines spread geographically \layout Itemize Requirements for several monitoring stations for redundancy reasons \layout Section Problem description \layout Standard Roxmon is intended to be a tool for the administrator to help him quickly locate problems on any of the hosts in the network. The kind of problems detected can be anything from a user turning the computer off to a full disk or the network being down. \layout Standard The important thing is to provide the administrator with information on how each machine is doing and to give him a quick overview of the status of his network. \layout Standard There are other, more mature, products on the market that solve the same problem, but Roxmon has the following advantages: \layout Itemize It is available under the terms of GPL. \layout Itemize It is written entirely in Pike, making newly developed macros immediately available on all hardware/OS combinations where Pike is available. 
\layout Section Vision \layout Description Commercial \protected_separator vision \protected_separator Hopefully this product will be useful for a wide range of system administrators working with operating systems and applications from the free world, promoting the spirit of open source and free software. They will in turn contribute to making this the number one tool for that purpose. \layout Standard This will also promote the language Pike and the web server Roxen as an application platform for a wider range of applications. \layout Standard The language chosen, Pike, is an interpreted language running on many platforms. This will make it possible to seamlessly use the add-on applications on different platforms, more or less without having to keep track of platform differences. This will simplify the exchange of applications and stimulate the project. \layout Description Technical \protected_separator vision \protected_separator The tool building the infrastructure of the system is lightweight. This means that most of the intelligence will reside in the "configured" parts of the system. This infrastructure will be augmented with security solutions for authorisation and good handling of firewall requirements, without the applications having to be modified. \layout Standard The applications are developed and structured in a way that makes them easy to find and makes it easy to configure the monitoring as you want it. \layout Section Installation, configuration and running \layout Section Roxmon internals \layout Standard Since Roxmon is only a small project, all this design information is kept in this document together with all other information. This could be confusing for the reader that doesn't need it, but hopefully he can skip it without getting confused. \layout Subsection What you need to know to write a Macro \begin_inset LatexCommand \label{to-write-a-macro} \end_inset \layout Standard This section describes the parts of the design that you need to know in order to write a Macro for Roxmon. 
\layout Subsubsection Alarm syntax \layout Standard Every alarm has the following fields: \layout Description category \protected_separator A string. \layout Standard This is the area or service that the alarm concerns. The exact names used in each installation are chosen by the Administrator when Configuring a Macro. The Macro designer provides a suggestion. \layout Standard It is intended for the Monitor to be able to connect an alarm to one of the services in its diagram and give that service a color depending on the current status. The Administrator drawing the diagram and allocating names for the things on that diagram will have to take these names, located in the Macros, into account to be able to do this. Examples are \family typewriter authpriv \family default , \family typewriter cron \family default , \family typewriter kern \family default , \family typewriter lpr \family default , \family typewriter mail \family default ... \layout Description severity \protected_separator A string. \layout Standard One of \family typewriter Emergency \family default , \family typewriter Alert \family default , \family typewriter Critical \family default , \family typewriter Error \family default , \family typewriter Warning \family default , \family typewriter Notice \family default , \family typewriter Info \family default , \family typewriter Debug \family default , and \family typewriter Unknown \family default , which are listed here in decreasing order of importance. Other strings will be treated as \family typewriter Unknown \family default . The \family typewriter Unknown \family default severity is a sign of a Macro that has bugs or is not fully developed. \layout Standard The severity is normally chosen by the Administrator when Configuring the Macro. The Macro designer provides a suggestion. \layout Description alarm \protected_separator A string. \layout Standard A short sentence describing what the alarm is about. 
\layout Description details \protected_separator A string. \layout Standard This is any amount of text explaining or describing the conditions around the alarm. \layout Description time \protected_separator A timestamp. \layout Standard Allocated automatically by the Agent. \layout Description hostinfo \protected_separator A mapping. \layout Standard Allocated automatically by the Agent. \layout Standard This can be used to identify the host that generated the alarm. \layout Description START_TIME \protected_separator An integer. \layout Standard Allocated automatically by the Agent. Used to uniquely identify an alarm. \layout Description SEQUENCE_NUM \protected_separator An integer. \layout Standard Allocated automatically by the Agent. Used to uniquely identify an alarm. \layout Standard It is best to keep the size of an alarm below 16k in order not to degrade communication performance. \layout Subsection Other design aspects of Roxmon \layout Standard This is where all the rest of the design information of Roxmon is kept. \layout Standard The ideal approach would be to sort the design details in a systematic order depending on how the system is built. However, I have chosen to first describe everything that is of any use to the Macro writer and then, in this chapter, the rest, which makes this section a little less systematic. You will probably have to go back and forth between this chapter and the previous one ( \begin_inset LatexCommand \ref{ \end_inset to-write-a-macro \begin_inset LatexDel }{ \end_inset What you need to know to write a Macro \begin_inset LatexDel } \end_inset ). \layout Subsubsection Agent installation \layout Standard The agent is installed in the directory /var/opt/Roxmon. \layout Standard The pike binary and all its auxiliary files are installed in /var/opt/Roxmon/pike with subdirectories bin, include, modules... 
\layout Standard Configurations are in the directory /var/opt/Roxmon/config and a list of as yet unhandled "alarms" is in /var/opt/Roxmon/alarms. The configuration is actually a collection of pike-scripts, macros and their configuration files that "run" within the agent. \layout Code /var/opt/Roxmon/ pike/ bin include modules ... config/ alarms/ run/ \layout Subsubsection Monitor installation \layout Standard The monitor is a Roxen module installed in the Roxen server. \layout Subsubsection Server connection \layout Standard This chapter describes the communication between the Agent and the Monitor. \layout Paragraph Functions initiated from the server \layout Description PING \protected_separator Answered immediately by the Agent. \layout Description DOWNLOAD \protected_separator Download request. \layout Description FETCH_ALARM_LIST, \protected_separator FETCH_ALARM \protected_separator A query for alarms. \layout Description DELETE_ALARM \protected_separator Reset an alarm. \layout Description UPDATE_ALARM \protected_separator Update an alarm. \layout Standard These functions are always answered immediately (or as fast as possible). \layout Paragraph Functions initiated from the client \layout Description ALARM \protected_separator Deliver an alarm. \layout Description ALARM_DELETED \protected_separator Somebody else has deleted an alarm. \layout Description ALARM_MODIFIED \protected_separator Somebody else has modified the alarm. \layout Standard The functions from the client are all asynchronous; they are never acknowledged. The last two are only sent to the server connections that did not initiate the operation. \layout Paragraph Protocol syntax \layout Standard All operations and responses delivered are mappings that have been encoded with encode_value. They are coded using a simple Hollerith code: H. The data is an encoded mapping. The function name is the value of the field with the name "op". Arguments are fields with other names. 
For responses the op field has the value "RESPONSE" and the original op is in the field with the name "response". \layout Paragraph Details on all operations \layout Standard Here is the complete list of operations with arguments: \layout Quote Function: "op":"PING" Response: "op":"RESPONSE" "response":"PING" other arguments are returned. \layout Quote Function: "op":"DOWNLOAD" "name": (a string used as the filename). "contents": (a string stored in the file). If the "contents" is the empty string, this means that we remove the file and deactivate the function. Response: "op":"RESPONSE" "response":"DOWNLOAD" \layout Quote Function: "op":"FETCH_ALARM_LIST" Response: "op":"RESPONSE" "response":"FETCH_ALARM_LIST" "alarms": \layout Quote Function: "op":"FETCH_ALARM" "alarm": Response: "op":"RESPONSE" "response":"FETCH_ALARM" "alarm": If there is no such alarm this has the value 0. \layout Quote Function: "op":"DELETE_ALARM" "alarm": Response: "op":"RESPONSE" "response":"DELETE_ALARM" "alarm": (same) \layout Quote Function: "op":"UPDATE_ALARM" "alarm": Response: "op":"RESPONSE" "response":"UPDATE_ALARM" "alarm": (same) \layout Quote Async: "op":"ALARM" "alarm": \layout Quote Async: "op":"ALARM_DELETED" "alarm": \layout Quote Async: "op":"ALARM_MODIFIED" "alarm": \layout Standard \the_end