 .TH WATCHDOG 8 "January 2005"
 .UC 4
 .SH NAME
 watchdog \- a software watchdog daemon
 .SH SYNOPSIS
 .B watchdog
 .RB [ \-F | \-\-foreground ]
 .RB [ \-f | \-\-force ]
 .RB [ \-c " \fIfilename\fR|" \-\-config\-file " \fIfilename\fR]"
 .RB [ \-v | \-\-verbose ]
 .RB [ \-s | \-\-sync ]
 .RB [ \-b | \-\-softboot ]
 .RB [ \-q | \-\-no\-action ]
 .SH DESCRIPTION
 The Linux kernel can reset the system if serious problems are detected.
 This can be implemented via special watchdog hardware, or via a slightly
 less reliable software-only watchdog inside the kernel. Either way, there
 needs to be a daemon that tells the kernel the system is working fine. If the
 daemon stops doing that, the system is reset.
 .PP
 .B watchdog
 is such a daemon. It opens
 .IR /dev/watchdog ,
 and keeps writing to it often enough to keep the kernel from resetting,
 at least once per minute. Each write delays the reboot
 time another minute. After a minute of inactivity the watchdog hardware will
 cause the reset. In the case of the software watchdog the ability to
 reboot will depend on the state of the machine and its interrupts.
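 .PP
 The keep-alive cycle described above can be sketched in a few lines of Python.
 This is an illustration only, not the daemon's own code: the function name
 keep_alive, its parameters and the 10 second default interval are assumptions,
 and the rounds argument exists only so that the sketch terminates.
 .PP
 .nf
```python
import os
import time


def keep_alive(device="/dev/watchdog", interval=10.0, rounds=3):
    """Write to the device `rounds` times and return the number of writes."""
    fd = os.open(device, os.O_WRONLY)
    writes = 0
    try:
        for _ in range(rounds):
            os.write(fd, b"k")    # any write postpones the reset
            writes += 1
            time.sleep(interval)  # must stay well below the kernel timeout
    finally:
        os.close(fd)
    return writes
```
 .fi
 .PP
 Opening the real device arms the watchdog, so it is safer to try the sketch
 on a scratch file first, for example keep_alive("/tmp/fake-watchdog", 0, 3)
 after creating that file.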
 .PP
 The watchdog daemon can be stopped without causing a reboot if the device
 .I /dev/watchdog
 is closed correctly, unless your kernel is compiled with the
 .I CONFIG_WATCHDOG_NOWAYOUT
 option enabled.
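 .PP
 What counts as closing the device correctly depends on the kernel driver.
 Many Linux watchdog drivers follow the "magic close" convention, in which the
 character V written just before the close disarms the timer. The following is
 a minimal sketch, assuming a driver that follows this convention; the helper
 name close_cleanly is hypothetical.
 .PP
 .nf
```python
import os


def close_cleanly(fd):
    """Write the magic character and close an open watchdog descriptor."""
    os.write(fd, b"V")  # assumed to disarm the timer on drivers with magic close
    os.close(fd)
```
 .fi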
 .SH TESTS
 The watchdog daemon does several tests to check the system status:
 .IP \(bu 3
 Is the process table full?
 .IP \(bu 3
 Is there enough free memory?
 .IP \(bu 3
 Are some files accessible?
 .IP \(bu 3
 Have some files changed within a given interval?
 .IP \(bu 3
 Is the average work load too high?
 .IP \(bu 3
 Has a file table overflow occurred?
 .IP \(bu 3
 Is a process still running? The process is specified by a pid file.
 .IP \(bu 3
 Do some IP addresses answer to ping?
 .IP \(bu 3
 Do network interfaces receive traffic?
 .IP \(bu 3
 Is the temperature too high? (Temperature data not always available.)
 .IP \(bu 3
 Execute a user-defined command to do arbitrary tests.
 .IP \(bu 3
 Execute one or more test/repair commands found in /etc/watchdog.d.  These commands are called with the argument \fBtest\fP or \fBrepair\fP.
 .PP
 If any of these checks fails, watchdog will cause a shutdown. Should any of
 these tests, except the user-defined binary, last longer than one minute, the
 machine will be rebooted as well.
 .PP
 .SH OPTIONS
 Available command line options are the following:
 .TP
 .BR \-v ", " \-\-verbose
 Set verbose mode. Only implemented if compiled with the
 .I SYSLOG
 feature. This
 mode logs several pieces of information to
 .I LOG_DAEMON
 with priority
 .IR LOG_INFO .
 This is useful if you want to see exactly what happened before the watchdog rebooted
 the system. Currently it logs the temperature (if available), the load
 average, the change date of the files it checks and how often it went to sleep.
 .TP
 .BR \-s ", " \-\-sync
 Try to synchronize the filesystem every time the process is awake. Note that
 the system is rebooted if for any reason the synchronizing lasts longer
 than a minute.
 .TP
 .BR \-b ", " \-\-softboot
 Soft-boot the system if an error occurs during the main loop, e.g. if a
 given file is not accessible via the
 .BR stat (2)
 call. Note that
 this does not apply to the opening of
 .I /dev/watchdog
 and
 .IR /proc/loadavg ,
 which are opened before the main loop starts.
 .TP
 .BR \-F ", " \-\-foreground
 Run in foreground mode, useful for running under systemd (for example).
 .TP
 .BR \-f ", " \-\-force
 Force the usage of the interval given or the maximal load average given
 in the config file.
 .TP
 .BR \-c " \fIconfig-file\fR, " \-\-config\-file " \fIconfig-file"
 Use
 .I config-file
 as the configuration file instead of the default
 .IR /etc/watchdog.conf .
 .TP
 .BR \-q ", " \-\-no\-action
 Do not reboot or halt the machine. This is for testing purposes. All checks
 are executed and the results are logged as usual, but no action is taken.
 Also your hardware card or the kernel software watchdog driver is not
 enabled. Temperature checking is also disabled since this triggers
 the hardware watchdog on some cards.
 .SH FUNCTION
 After
 .B watchdog
 starts, it puts itself into the background and then tries all checks
 specified in its configuration file in turn. Between each two tests it will write to
 the kernel device to prevent a reset.
 After finishing all tests watchdog goes to sleep for some
 time. The kernel driver expects a write to the watchdog device every minute.
 Otherwise the system will be reset. By default
 .B watchdog
 will sleep for
 only 1 second so it triggers the device early enough.
 .PP
 Under high system load
 .B watchdog
 might be swapped out of memory and may fail
 to make it back in in time. Under these circumstances the Linux kernel will
 reset the machine. To make sure you won't get unnecessary reboots make
 sure you have the variable
 .I realtime
 set to
 .I yes
 in the configuration file
 .IR watchdog.conf .
 This adds real-time support to
 .BR watchdog :
 it will lock itself into memory and there should be no problem even under the
 highest of loads.
 .PP
 On a system running out of memory the kernel will try to free enough memory by killing processes. The
 .B watchdog
 daemon itself is exempted from this so-called out-of-memory killer.
 .PP
 You can also specify a maximal allowed load average. Once this load average
 is reached the system is rebooted. You may specify maximal load averages for
 1 minute, 5 minutes or 15 minutes. By default this test is disabled.
 Be careful not to set this parameter too low. To set a value less than
 the predefined minimal value of 2, you have to use the
 .B \-f
 option.
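 .PP
 The load average test can be pictured with the sketch below. It is an
 illustration under assumptions rather than the daemon's code: the function and
 parameter names are hypothetical, a limit of None stands for a disabled test,
 and "reached" is read as greater than or equal to the limit.
 .PP
 .nf
```python
def load_too_high(loadavg_text, max_load_1=None, max_load_5=None, max_load_15=None):
    """Return True if any configured limit is reached.

    loadavg_text is the content of /proc/loadavg, whose first three fields
    are the 1, 5 and 15 minute load averages.
    """
    fields = loadavg_text.split()
    if len(fields) < 3:
        raise ValueError("not enough data in loadavg text")
    loads = [float(field) for field in fields[:3]]
    limits = (max_load_1, max_load_5, max_load_15)
    return any(
        limit is not None and load >= limit
        for load, limit in zip(loads, limits)
    )
```
 .fi
 .PP
 A caller would read /proc/loadavg once per cycle and pass its content in,
 which keeps the comparison itself easy to test with sample strings.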
 .PP
 You can also specify a minimal amount of virtual memory you want to have
 available as free. As soon as more virtual memory is used action is taken by
 .BR watchdog .
 Note, however, that watchdog does not distinguish between
 different types of memory usage. It just checks for free virtual memory.
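 .PP
 As a rough sketch of such a check, the code below adds the MemFree and SwapFree
 fields of /proc/meminfo and compares the sum with a limit. Treating that sum as
 "free virtual memory", the kB unit of the limit and the function names are
 assumptions made for the example; the daemon may count differently.
 .PP
 .nf
```python
def free_virtual_kb(meminfo_text):
    """Return MemFree + SwapFree in kB from the content of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            values[name.strip()] = int(rest.split()[0])
    try:
        return values["MemFree"] + values["SwapFree"]
    except KeyError as exc:
        raise ValueError("meminfo lacks field " + str(exc))


def memory_too_low(meminfo_text, min_free_kb):
    """Return True if less than min_free_kb of virtual memory is free."""
    return free_virtual_kb(meminfo_text) < min_free_kb
```
 .fi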
 .PP
 If you have a watchdog card with a temperature sensor you can specify
 the maximal allowed temperature. Once this temperature is reached the
 system is halted. The default value is 120. There is no unit conversion so make
 sure you use the same unit as your hardware.
 .B watchdog
 will issue warnings
 once the temperature reaches 90%, 95% and 98% of this limit.
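 .PP
 The warning levels can be worked out as in the following sketch. The function
 name and the returned labels are hypothetical; only the default of 120 and the
 90%, 95% and 98% levels come from the text above.
 .PP
 .nf
```python
def temperature_status(current, maximum=120):
    """Classify a reading against the limit and its warning levels."""
    if current >= maximum:
        return "halt"
    for percent in (98, 95, 90):
        # integer arithmetic avoids rounding surprises at the boundaries
        if current * 100 >= maximum * percent:
            return "warn-%d" % percent
    return "ok"
```
 .fi
 .PP
 With the default limit of 120 the warnings start at 108 (90%), 114 (95%) and
 117.6 (98%), in whatever unit the hardware reports.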
 .PP
 When using file mode
 .B watchdog
 will try to
 .BR stat (2)
 the given files. Errors returned
 by stat will
 .B not
 cause a reboot. For a reboot the stat call has to last at least one minute.
 This may happen if the file is located on an NFS mounted filesystem. If your
 system relies on an NFS mounted filesystem you might try this option.
 However, in such a case the
 .I sync
 option may not work if the NFS server is
 not answering.
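 .PP
 A file check along these lines might look like the sketch below. The function
 name, the returned labels and the change_interval parameter are hypothetical;
 the sketch only reports what it found and leaves the reaction to the caller.
 .PP
 .nf
```python
import os
import time


def file_status(path, change_interval=None, now=None):
    """Stat a file and optionally test whether it changed recently enough."""
    try:
        info = os.stat(path)
    except OSError:
        return "stat-error"  # reported to the caller, not acted on here
    if change_interval is not None:
        if now is None:
            now = time.time()
        if now - info.st_mtime > change_interval:
            return "unchanged"
    return "ok"
```
 .fi
 .PP
 The sketch does not cover a stat call that hangs, for example on an
 unresponsive NFS server; in that case no keep-alive is written and the kernel
 timer expires on its own.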
 .PP
 .B watchdog
 can read the pid from a pid file and
 see whether the process still exists. If not, action is taken
 by
 .BR watchdog .
 So you can for instance restart the server from your
 .IR repair-binary .
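 .PP
 One common way to test whether a process from a pid file still exists is to
 send it signal 0, which performs the permission and existence checks without
 delivering a signal. The sketch below uses that technique; it is an assumption
 that the daemon works the same way, and the function name is hypothetical.
 .PP
 .nf
```python
import os


def process_alive(pidfile):
    """Return True if the pid stored in pidfile names an existing process."""
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True
```
 .fi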
 .PP
 .B watchdog
 will try periodically to fork itself to see whether the process
 table is full. This process will leave a zombie process until watchdog wakes
 up again and catches it; this is harmless, don't worry about it.
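 .PP
 The idea behind the fork test can be sketched as follows. Unlike the daemon,
 which collects the child on its next wake-up, this hypothetical helper reaps
 the child immediately so that no zombie is left behind.
 .PP
 .nf
```python
import os


def can_fork():
    """Return True if a child process could be created."""
    try:
        pid = os.fork()
    except OSError:
        return False     # typically a full process table or a resource limit
    if pid == 0:
        os._exit(0)      # child: exit at once without running cleanup handlers
    os.waitpid(pid, 0)   # parent: reap the child
    return True
```
 .fi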
 .PP
 In ping mode
 .B watchdog
 tries to ping the given IP addresses. These addresses do
 not have to belong to a single machine. It is possible to ping a broadcast
 address instead to see if at least one machine in a subnet is still alive.
 .PP
 .B Do not use this broadcast ping unless your MIS person a) knows about it and
 .B b) has given you explicit permission to use it!
 .PP
 .B watchdog
 will send out three ping packets and wait up to <interval> seconds
 for the reply, with <interval> being the time it sleeps between two
 consecutive writes to the watchdog device. Thus an unreachable network will not
 cause a hard reset but a soft reboot.
 .PP
 You can also test passively for an unreachable network by just monitoring
 a given interface for traffic. If no traffic arrives the network is
 considered unreachable causing a soft reboot or action from the
 repair binary.
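 .PP
 Passive monitoring can be pictured as comparing the receive counter of an
 interface between two readings of /proc/net/dev, where the first number after
 the interface name is the count of received bytes. The sketch below takes that
 approach; the function names are hypothetical and the daemon may obtain its
 counters differently.
 .PP
 .nf
```python
def rx_bytes(net_dev_text, interface):
    """Return the received byte count for interface from /proc/net/dev text."""
    for line in net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == interface:
            return int(rest.split()[0])
    raise KeyError(interface)


def traffic_seen(before_text, after_text, interface):
    """Return True if the receive counter changed between two snapshots."""
    return rx_bytes(after_text, interface) != rx_bytes(before_text, interface)
```
 .fi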
 .PP
 .B watchdog
 can run an external command for user-defined tests. A return code
 not equal to 0 means an error occurred and watchdog should react. If the external
 command is killed by an uncaught signal this is considered an error by watchdog
 too.
 The command may take longer than the time slice defined for the kernel device
 without a problem. However, error messages are
 written to the syslog facility. If you have enabled softboot on error
 the machine will be rebooted if the binary doesn't exit in half the time
 .B watchdog
 sleeps between two attempts to trigger the kernel device.
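 .PP
 How the outcome of an external command can be classified is shown in the
 sketch below. It is not the daemon's implementation: the function name, the
 returned labels and the timeout parameter are assumptions, and the sketch
 relies on Python reporting a child killed by a signal as a negative return
 code.
 .PP
 .nf
```python
import subprocess
import sys


def run_check(argv, timeout=None):
    """Run a check command and classify how it ended."""
    try:
        result = subprocess.run(argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"  # the child is killed before the exception is raised
    if result.returncode < 0:
        return "signal"   # terminated by signal number -returncode
    if result.returncode != 0:
        return "error"
    return "ok"
```
 .fi
 .PP
 For instance, run_check([sys.executable, "-c", "raise SystemExit(3)"])
 yields "error" because the child exits with status 3.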
 .PP
 If you specify a repair binary it will be started instead of shutting down
 the system. If this binary is not able to fix the problem
 .B watchdog
 will still cause a reboot afterwards.
 .PP
 If the machine is halted an email is sent to notify a human that
 the machine is going down. Starting with version 4.4
 .B watchdog
 will also notify the human in charge if the machine is rebooted.
 .SH "SOFT REBOOT"
 A soft reboot (i.e. controlled shutdown and reboot) is initiated for every
 error that is found. Since there might be no more processes available,
 watchdog does it all by itself. That means:
 .IP 1. 4
 Kill all processes with SIGTERM.
 .IP 2. 4
 After a short pause kill all remaining processes with SIGKILL.
 .IP 3. 4
 Record a shutdown entry in wtmp.
 .IP 4. 4
 Save the random seed from
 .IR /dev/urandom .
 If the device is nonexistent or
 there is no filename for saving, this step is skipped.
 .IP 5. 4
 Turn off accounting.
 .IP 6. 4
 Turn off quota and swap.
 .IP 7. 4
 Unmount all partitions except the root partition.
 .IP 8. 4
 Remount the root partition read-only.
 .IP 9. 4
 Shut down all network interfaces.
 .IP 10. 4
 Finally reboot.
 .SH "CHECK BINARY"
 If the return code of the check binary is not zero
 .B watchdog
 will assume an
 error and reboot the system. Be careful with this if you are using the
 real-time properties of watchdog since
 .B watchdog
 will wait for the return of
 this binary before proceeding. A positive exit code is interpreted as a
 system error code (see
 .I errno.h
 for details). Negative values are special to
 .BR watchdog :
 .TP
 \-1
 Reboot the system. This is not exactly an error message but a command to
 .BR watchdog .
 If the return code is \-1
 .B watchdog
 will not try to run a shutdown
 script instead.
 .TP
 \-2
 Reset the system. This is not exactly an error message but a command to
 .BR watchdog .
 If the return code is \-2
 .B watchdog
 will simply refuse to write to the
 kernel device again.
 .TP
 \-3
 Maximum load average exceeded.
 .TP
 \-4
 The temperature inside is too high.
 .TP
 \-5
 .I /proc/loadavg
 contains no (or not enough) data.
 .TP
 \-6
 The given file was not changed in the given interval.
 .TP
 \-7
 .I /proc/meminfo
 contains invalid data.
 .TP
 \-8
 Child process was killed by a signal.
 .TP
 \-9
 Child process did not return in time.
 .TP
 \-10
 Free for personal use.
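 .PP
 For scripts that log or react to these codes, the list above can be kept in a
 small lookup table. The sketch below is a convenience for illustration; the
 names RESERVED_CODES and describe are hypothetical, and positive values are
 translated with the system's own error messages.
 .PP
 .nf
```python
import os

RESERVED_CODES = {
    -1: "reboot the system",
    -2: "reset the system",
    -3: "maximum load average exceeded",
    -4: "temperature too high",
    -5: "/proc/loadavg contains no (or not enough) data",
    -6: "file was not changed in the given interval",
    -7: "/proc/meminfo contains invalid data",
    -8: "child process was killed by a signal",
    -9: "child process did not return in time",
    -10: "free for personal use",
}


def describe(code):
    """Return a human-readable meaning for a check binary return code."""
    if code == 0:
        return "success"
    if code > 0:
        return os.strerror(code)  # positive values are errno numbers
    return RESERVED_CODES.get(code, "unknown watchdog code")
```
 .fi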
 .SH "REPAIR BINARY"
 The repair binary is started with one parameter: the error number that
 caused
 .B watchdog
 to initiate the boot process. After trying to repair the
 system the binary should exit with 0 if the system was successfully repaired
 and thus there is no need to reboot anymore. A return value not equal to 0 tells
 .B watchdog
 to reboot. The return code of the repair binary should be the error
 number of the error causing
 .B watchdog
 to reboot. Be careful with this if you
 are using the real-time properties since
 .B watchdog
 will wait for
 the return of this binary before proceeding.
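 .PP
 The contract of a repair binary can be sketched as a function that maps the
 error number to an exit status. Everything named here is hypothetical: repair
 is an illustrative helper and handlers is a table of repair routines that
 return True on success.
 .PP
 .nf
```python
def repair(error_number, handlers):
    """Return the exit status a repair binary should use.

    0 means the problem was fixed; otherwise the original error number is
    handed back so that watchdog proceeds with the reboot.
    """
    handler = handlers.get(error_number)
    if handler is not None and handler():
        return 0
    return error_number
```
 .fi
 .PP
 A wrapper script would read the error number from its first argument and
 leave with sys.exit(repair(int(sys.argv[1]), handlers)). Because
 .B watchdog
 waits for the binary, each handler should be kept short.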
 .SH "TEST DIRECTORY"
 Executables placed in the test directory are discovered by watchdog on
 startup and are automatically executed.  They are bounded time-wise by
 the test-timeout directive in watchdog.conf.

 These executables are called with either "test" as the first argument
 (if a test is being performed) or "repair" as the first argument (if a
 repair for a previously-failed "test" operation is being performed).

 As with test binaries and repair binaries, the expected exit code for
 a successful test or repair operation is always zero.

 If an executable's test operation fails, the same executable is automatically
 called with the "repair" argument as well as the return code of the
 previously-failed test operation.

 For example, if the following execution returns 42:

     /etc/watchdog.d/my-test test

 The watchdog daemon will attempt to repair the problem by calling:

     /etc/watchdog.d/my-test repair 42

 This enables administrators and application developers to make intelligent
 test/repair commands.  If the "repair" operation is not required (or is
 not likely to succeed), it is important that the author of the command
 return a non-zero value so the machine will still reboot as expected.

 Note that the watchdog daemon may interpret and act upon any of the reserved
 return codes noted in the Check Binary section prior to calling a given
 command in "repair" mode.
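 .PP
 A command for the test directory has to handle both calling conventions. The
 sketch below shows one way to dispatch on the arguments; the dispatch helper
 and the test and repair callables passed to it are hypothetical, and an
 unrecognized invocation is answered with a non-zero status.
 .PP
 .nf
```python
def dispatch(argv, test, repair):
    """Pick the operation from sys.argv-style arguments and return its status."""
    if len(argv) >= 2 and argv[1] == "test":
        return test()
    if len(argv) >= 3 and argv[1] == "repair":
        return repair(int(argv[2]))  # argv[2] is the failed test's return code
    return 1
```
 .fi
 .PP
 A script would finish with sys.exit(dispatch(sys.argv, my_test, my_repair)),
 where my_repair returns a non-zero value whenever it cannot fix the problem so
 that the machine still reboots.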
 .SH BUGS
 None known so far.
 .SH AUTHORS
 The original code is an example written by Alan Cox
 <alan@lxorguk.ukuu.org.uk>, the author of the kernel driver. All
 additions were written by Michael Meskes <meskes@debian.org>. Johnie Ingram
 <johnie@netgod.net> had the idea of testing the load average. He also took
 over the Debian-specific work. Dave Cinege <dcinege@psychosis.com> brought
 up some hardware watchdog issues and helped with testing.
 .SH FILES
 .TP
 .I /dev/watchdog
 The watchdog device.
 .TP
 .I /var/run/watchdog.pid
 The pid file of the running
 .BR watchdog .
 .SH "SEE ALSO"
 .BR watchdog.conf (5)