A Monit Primer
From its website:
“Monit is a free open source utility for managing and monitoring processes, programs, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.”
I love Monit for the completeness of its feature set. But just as much, I love Monit for the conciseness and the readability of its configuration language. As we will see, with very little effort you can understand other configuration files, and borrow from them to write your own.
Configuration concepts
At the heart of a Monit configuration file are its service entries and their service tests:
- A service entry specifies a resource that Monit should monitor, such as a daemon process, file, directory, remote host, and so on.
- A service test is defined inside a service entry and defines an error condition for the resource. This condition should not be satisfied if the resource is performing normally. Examples include a process not accepting connections on a specified port, a file having an unexpected checksum, CPU usage exceeding a given threshold, and so on.
Monit typically runs continuously as a daemon, during which it periodically wakes up and evaluates each service test defined in its configuration file. If the error condition of a service test is satisfied, then Monit generates an error, also called an alert. Monit sends these alerts via email to one or more specified addresses specified in the configuration file. The time between Monit waking up to evaluate these tests is called a cycle and is also specified in the configuration file.
Additionally, you can toggle monitoring for each service entry. When monitoring is disabled for an entry, Monit will not evaluate its service tests and consequently will not generate alerts for it.
Upon startup, Monit looks for this configuration file at /etc/monitrc
or /etc/monit.conf
. To assert that Monit can find and parse this file, run monit -t
:
$ monit -t
Control file syntax OK
Now let’s look at the contents of such a file.
Preamble
Here’s a basic preamble for a Monit configuration file:
set daemon 120
set logfile /var/log/monit/monit.log
set eventqueue
basedir /var/monit
slots 5000
set httpd port 2812 address localhost
allow localhost
set mailserver smtp.gmail.com port 587
using tlsv1 with timeout 30 seconds
username "username" password "password"
set alert email@example.com with reminder on 15 cycles
The first line configures Monit to run continuously as a daemon with a cycle length of 120 seconds, or two minutes:
set daemon 120
The following starts the embedded web server of Monit:
set httpd port 2812 address localhost
allow localhost
Through its interface you can start, stop, and restart processes, toggle monitoring for service entries, and determine which ones have failing tests that are generating alerting. Moreover, starting the web server enables use of the Monit client, which uses the web server to communicate with the daemon process. The port 2812 address localhost
binds the web server to port 2812 on the loopback device, while allow localhost
restricts the client to localhost. Therefore, to access the web interface from another computer, you must first tunnel in. Consult the manual for more security options, like support for SSL and HTTP Basic Authentication.
If you create a dedicated Gmail account for Monit, the end configures Monit to send alerts via email through that account:
set mailserver smtp.gmail.com port 587
using tlsv1 with timeout 30 seconds
username "username" password "password"
set alert email@address.com with reminder on 15 cycles
Replace "username"
with the Gmail username. If you are using Google Apps for your domain, use the full email address. Likewise, replace "password"
with the password for that account. Then replace email@example.com
with the email address to which Monit should send the alerts. The part with reminder on 15 cycles
means if a service test is generating an alert on every cycle, then Monit will only send an email on the first cycle and every 15 cycles thereafter. Because each cycle is two minutes, Monit will only send an email once every 30 minutes.
These emails have a very simple format. For example, Monit could deliver the following message if it restarts Redis:
Does not exist Service redis
Date: Wed, 21 Aug 2013 14:24:08
Action: restart
Host: frontend08
Your faithful employee,
monit
Service entries and tests
Following the preamble should be one or more service entries, each containing its respective service tests.
Each service entry starts with check [type] [identifier]
, where:
type
specifies the type of resource that Monit should monitor, such asprocess
for a process,host
for the connection to a remote host, and so on.identifier
is a unique identifier for this service entry, used in the web interface and included in any alert messages delivered via email.
An entry restricts the service tests defined inside of it to those that are meaningful for its resource. Each service test has the form if [body] then [action]
, where:
body
specifies an error condition for the resource. Complicated ones can be split across multiple lines.action
specifies what Monit should do if the error condition is satisfied. There are many, but below we will only look atalert
andrestart
.
Now let’s look at some service entry types, and some of the service tests they can define.
Processes
A service entry starting with check process
contains service tests to ensure that a daemon process is running as expected. Monit always tests whether such a process is running by checking for the existence of its PID file, which simply contains the process identifier. (For more details about PID files, consult this answer on Stack Overflow and this answer on the Unix & Linux Stack Exchange.) The expected location of the PID file is specified in the service entry by with pidfile
. If a process is not running, then Monit executes the command specified by start program
.
For example, the following check process
entry monitors uWSGI:
check process uwsgi with pidfile /usr/local/var/run/uwsgi/uwsgi.pid
start program = "/etc/init.d/uwsgi start"
stop program = "/etc/init.d/uwsgi stop"
This service entry is very simple: If Monit cannot find the PID file /usr/local/var/run/uwsgi/uwsgi.pid
, then it assumes that uWSGI is not running and executes /etc/init.d/uwsgi start
.
The following check process
entry monitors Redis and introduces a service test:
check process redis with pidfile /usr/local/var/run/redis/redis.pid
start program = "/etc/init.d/redis start"
stop program = "/etc/init.d/redis stop"
if failed port 6379 with timeout 3 seconds then alert
If Monit cannot find the PID file /usr/local/var/run/redis/redis.pid
, then it assumes that Redis is not running and executes /etc/init.d/redis start
. If Redis is running, then Monit evaluates the service test on the last line, which attempts to connect to Redis on port 6379. If Monit cannot connect after 3 seconds, then it generates an alert.
The following check process
entry monitors CouchDB and contains a more complicated service test:
check process couchdb with pidfile /usr/local/var/run/couchdb/couchdb.pid
start program = "/etc/init.d/couchdb start"
stop program = "/etc/init.d/couchdb stop"
if failed port 5984
with protocol http request "/some_db"
with timeout 5 seconds
then alert
If Monit cannot find the PID file /usr/local/var/run/couchdb/couchdb.pid
, then it assumes that CouchDB is not running and executes /etc/init.d/couchdb start
. If CouchDB is running, then Monit evaluates the service test at the end, which attempts to request from CouchDB the URL /some_db
over HTTP on port 5984. If Monit does not receive a response after 5 seconds, then it generates an alert.
(To quickly switch to the topic of readability, and
, with
, has
, using
, and program
are examples of noise keywords. Monit ignores these keywords in a configuration file, increasing its resemblance to English and improving its readability. For example, protocol http timeout 5 seconds
can be written as protocol http and with timeout 5 seconds
.)
This last check process
entry monitors Nginx and contains two service tests:
check process nginx with pidfile /var/run/nginx.pid
start program = "/etc/init.d/nginx start"
stop program = "/etc/init.d/nginx stop"
if failed port 443 type tcpssl protocol http
request "/some/path" hostheader "domain.com"
with timeout 5 seconds
then alert
if failed port 443 type tcpssl protocol http
request "/some/path" hostheader "domain.com"
with timeout 10 seconds
3 times within 4 cycles
then restart
depends on uwsgi
If Monit cannot find the PID file /var/run/nginx.pid
, then it assumes that Nginx is not running and executes /etc/init.d/nginx start
.
But if Nginx is running, then Monit evaluates the two service tests that follow. In the first test, Monit attempts to request from Nginx the URL /some/path
over HTTPS on port 443. The Host
header of this request is domain.com
, which is required by HTTP/1.1 to work with virtual hosting. If Monit does not receive a response after 5 seconds, then it generates an alert. In the second test, Monit requests the same URL over HTTPS on the same port and using the same Host
header. But instead of Monit generating an alert if it does not receive a response after 10 seconds on any cycle, it waits until it fails to receive such a response for at least three of four consecutive cycles. When this happens, Monit generates an alert and restarts Nginx in an attempt to fix the problem. To restart Nginx, Monit executes stop program
and then start program
.
This Nginx deployment is configured to delegate requests to the uWSGI process monitored by the service entry uwsgi
. Nginx cannot run correctly if uWSGI is not running correctly. The last line, depends on uwsgi
, captures this dependency and affects the entry uwsgi
in the following ways:
- If
uwsgi
is stopped, thennginx
is stopped. - Likewise, if
uwsgi
is unmonitored, thennginx
is unmonitored. - If
uwsgi
is started, it first stopsnginx
if it is running. Then onceuwsgi
is running,nginx
is started again. - Likewise, if
uwsgi
is monitored, it first unmonitorsnginx
if it is monitored. Then onceuwsgi
is monitored,nginx
is monitored again.
The service entry nginx
, however, can be started, stopped, monitored, and unmonitored independently.
Connectivity
The following check host
entry monitors the connectivity of another host:
check host myhost with address myhostname
if failed icmp type echo
count 5 with timeout 5 seconds
2 times within 3 cycles
then alert
Upon evaluating its service test, Monit attempts to reach the address myhostname
via ping, or ICMP echo requests. The count 5 with timeout 5 seconds
specifies that 5 consecutive echo requests will be sent to myhostname
per cycle. If no response for any of these requests is received within 5 seconds of sending them, then Monit assumes that myhostname
is down for the cycle. If myhostname
is assumed down for at least two of three consecutive cycles, then Monit generates an alert.
Space usage
The following check filesystem
entry monitors disk usage:
check filesystem rootfs with path /dev/xvda1
if space usage > 85% for 3 cycles then alert
Upon evaluating its service test, Monit generates an alert if the filesystem at path /dev/xvda1
is more than 85% full for three consecutive cycles, or six minutes.
Memory or CPU usage
The following check system
entry monitors memory and CPU usage:
check system myserver
if memory > 85% 2 times within 3 cycles then alert
if cpu(user) > 75% for 2 cycles then alert
if cpu(system) > 65% for 2 cycles then alert
Upon evaluating its three service tests, Monit generates an alert when any of the following error conditions are met:
- If the memory usage of the system is greater than 85% for at least two of three consecutive cycles.
- If the CPU spends more than 75% of two consecutive cycles in user space.
- If the CPU spends more than 50% of two consecutive cycles in system or kernel space.
Monit can also monitor the memory or CPU usage of a daemon process given similar service tests in its check process
entry. When doing this, memory
monitors the memory usage of the process itself, and totalmemory
monitors the total memory usage of the process and its children. Likewise, cpu
monitors the CPU usage of the process itself, and totalcpu
monitors the total CPU usage of the process and its children. Both totalmemory
and totalcpu
are useful in the case of uWSGI, where each worker process is a fork of a master process:
check process uwsgi with pidfile /usr/local/var/run/uwsgi/uwsgi.pid
...
if totalmemory > 75% for 2 cycles then alert
if totalcpu > 50% for 2 cycles then alert
Knowing more
I hope I’ve shared with you practical excerpts from a Monit configuration file. What I’ve shown here is only a small fraction of what Monit can do; to know its full power, consult its manual. And for more configuration examples, consult this wiki page.
comments powered by Disqus