<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.5"/>
<title>Nagios: Worker Processes</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
  $(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
 <tbody>
 <tr style="height: 56px;">
  <td style="padding-left: 0.5em;">
   <div id="projectname">Nagios
   &#160;<span id="projectnumber">4.4.3</span>
   </div>
   <div id="projectbrief">Dev docs for Nagios core and neb-module hackers</div>
  </td>
 </tr>
 </tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.5 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
  <div id="navrow1" class="tabs">
    <ul class="tablist">
      <li><a href="index.html"><span>Main&#160;Page</span></a></li>
      <li class="current"><a href="pages.html"><span>Related&#160;Pages</span></a></li>
      <li><a href="annotated.html"><span>Data&#160;Structures</span></a></li>
      <li><a href="files.html"><span>Files</span></a></li>
      <li>
        <div id="MSearchBox" class="MSearchBoxInactive">
        <span class="left">
          <img id="MSearchSelect" src="search/mag_sel.png"
               onmouseover="return searchBox.OnSearchSelectShow()"
               onmouseout="return searchBox.OnSearchSelectHide()"
               alt=""/>
          <input type="text" id="MSearchField" value="Search" accesskey="S"
               onfocus="searchBox.OnSearchFieldFocus(true)" 
               onblur="searchBox.OnSearchFieldFocus(false)" 
               onkeyup="searchBox.OnSearchFieldChange(event)"/>
          </span><span class="right">
            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
          </span>
        </div>
      </li>
    </ul>
  </div>
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
     onmouseover="return searchBox.OnSearchSelectShow()"
     onmouseout="return searchBox.OnSearchSelectHide()"
     onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark">&#160;</span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark">&#160;</span>Data Structures</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark">&#160;</span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark">&#160;</span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark">&#160;</span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark">&#160;</span>Typedefs</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark">&#160;</span>Macros</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark">&#160;</span>Pages</a></div>

<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0" 
        name="MSearchResults" id="MSearchResults">
</iframe>
</div>

</div><!-- top -->
<div class="header">
  <div class="headertitle">
<div class="title">Worker Processes </div>  </div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul><li class="level1"><a href="#philosophy">Philosophy</a></li>
<li class="level1"><a href="#protocol">Protocol</a><ul><li class="level2"><a href="#apis">APIs</a></li>
</ul>
</li>
<li class="level1"><a href="#registering">Registering a worker - The handshake</a><ul><li class="level2"><a href="#request">Requests</a></li>
<li class="level2"><a href="#responses">Responses</a></li>
</ul>
</li>
<li class="level1"><a href="#logging">Logging</a></li>
<li class="level1"><a href="#xchgexample">Protocol Exchange Example</a></li>
</ul>
</div>
<div class="textblock"><p>Everything related to worker processes.</p>
<h1><a class="anchor" id="philosophy"></a>
Philosophy</h1>
<p>The idea behind separate worker processes is to achieve protected parallelization: protected because a worker being naughty shouldn't affect the core process, and parallel because we use multiple workers, ideally between 1.5 and 3 per available CPU core.</p>
<p>Workers are free-standing processes, kept small, and with no knowledge about Nagios' object structure or logic. The reason for this is that small processes can achieve a lot more fork()s per second than large processes (800/sec for a 300MB process against 13900/sec for a 1MB process). While workers can (and do) grow a little bit in memory usage when they're running many checks in parallel, they will still be a lot smaller than the primary Nagios daemon, and the memory they occupy should be released once the checks they're running are done.</p>
<h1><a class="anchor" id="protocol"></a>
Protocol</h1>
<p>Nagios uses a text-based protocol to communicate with workers. It's fairly simple and very easy to debug. The breakdown goes as follows: </p>
<ul>
<li>A request consists of a sequence of key/value pairs. </li>
<li>A key is separated from its value with an equal sign ('='). </li>
<li>A key/value pair is separated from the next key/value pair with a nul byte ('\0'). </li>
<li>Each request is separated from the next with a message delimiter sequence made up of a byte with the value 1 followed by three nul bytes: "\1\0\0\0". </li>
<li>Keys cannot contain equal signs or nul bytes. </li>
<li>Values cannot contain nul bytes. </li>
<li>Neither keys nor values can contain the message delimiter. </li>
<li>A zero-length value is considered to be the empty string.</li>
</ul>
<dl class="section note"><dt>Note</dt><dd>Even though it's technically legal to put almost anything in the key field, you should stick to mnemonic names when extending the protocol and just use lower case letters and underscores. </dd>
<dd>
Keys are case sensitive. JOB_ID is <em>not</em> the same as job_id.</dd></dl>
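<p>The framing rules above can be sketched in C as follows. This is only an illustration under stated assumptions: wire_add_pair() and wire_end_message() are hypothetical helper names, not libnagios functions, and a real worker would normally build its messages with the kvvec API instead.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Message delimiter from the protocol description: "\1\0\0\0". */
static const char MSG_DELIM[4] = { 1, 0, 0, 0 };

/*
 * Append "key=value\0" to buf at offset *pos.
 * Returns 0 on success, -1 if the key contains '=' or the pair does not fit.
 * Because key and value are C strings, they cannot contain nul bytes and
 * therefore cannot contain the message delimiter either.
 * (Hypothetical helper for illustration, not part of libnagios.)
 */
static int wire_add_pair(char *buf, size_t size, size_t *pos,
                         const char *key, const char *value)
{
    size_t klen = strlen(key), vlen = strlen(value);
    size_t need = klen + 1 + vlen + 1;   /* key, '=', value, nul */

    assert(buf != NULL && pos != NULL && *pos <= size);
    if (strchr(key, '=') != NULL || need > size - *pos)
        return -1;
    memcpy(buf + *pos, key, klen);
    buf[*pos + klen] = '=';
    memcpy(buf + *pos + klen + 1, value, vlen + 1);  /* copies the nul too */
    *pos += need;
    return 0;
}

/* Terminate the current message with the four-byte delimiter. */
static int wire_end_message(char *buf, size_t size, size_t *pos)
{
    assert(buf != NULL && pos != NULL && *pos <= size);
    if (sizeof(MSG_DELIM) > size - *pos)
        return -1;
    memcpy(buf + *pos, MSG_DELIM, sizeof(MSG_DELIM));
    *pos += sizeof(MSG_DELIM);
    return 0;
}
```

<p>Calling wire_add_pair() once per key and then wire_end_message() leaves *pos bytes in buf that can be written to the socket as one message.</p>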
<h2><a class="anchor" id="apis"></a>
APIs</h2>
<p>The core worker processes communicate with Nagios using libnagios APIs exclusively. Since you're looking at a subpage of the documentation for that library right now, I'll just assume you've found it. Although using the libnagios APIs when writing a worker is completely optional, it's highly recommended.</p>
<p>The key APIs to use are: </p>
<ul>
<li>nsock - for connecting to and communicating through the qh socket </li>
<li>kvvec - for parsing requests and building responses </li>
<li>worker - for utils and stuff nifty to have if you're a worker </li>
<li>runcmd - for spawning and reaping commands </li>
<li>squeue - for maintaining a queue of the running jobs' timeouts </li>
<li>iocache - for bulk-reading requests and parsing them </li>
<li>iobroker - for multiplexing between running tasks and the master Nagios process.</li>
</ul>
<dl class="section note"><dt>Note</dt><dd>In particular, have a look at the "parse_command_kvvec()" and "finish_job()" functions in lib/worker.c. They will do a large part of the request/response handling for you.</dd></dl>
<h1><a class="anchor" id="registering"></a>
Registering a worker - The handshake</h1>
<p>Workers register with Nagios through the query handler, using a query sent to the wproc handler. Since the query handler reserves the nul byte as a magic delimiter for its messages, this one time we use the semicolon to separate key/value pairs instead, as is almost standard in the internal-only query handlers. Typically, the default worker process registers with a query such as this: </p>
<pre class="fragment">@wproc register name=Core Worker $pid;pid=$pid\0
</pre><p>Nagios will then respond with </p>
<pre class="fragment">OK\0
</pre><p> followed by a stream of commands.</p>
<p>Nagios currently understands the following (short) list of special keys: </p>
<ul>
<li>pid - The pid of the worker process. Sometimes used to check if a worker is online </li>
<li>name - Used to set the name of the worker </li>
<li>max_jobs - Used to tell Nagios how many concurrent jobs this worker can handle </li>
<li>plugin - basename() or absolute path of specific plugins that this worker wants to handle checks for.</li>
</ul>
<dl class="section note"><dt>Note</dt><dd>plugin can be given multiple times. It is valid for a single worker to say "plugin=check_snmp;plugin=check_iferrors", for example.</dd>
<dd>
Many workers can register for the same plugin(s). They will share the load in round-robin fashion.</dd></dl>
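<p>As a rough sketch of how a specialized worker might assemble such a registration query, the following formats the special keys listed above into one semicolon-separated string. build_register_query() is a hypothetical helper, not a libnagios function, and rejecting semicolons in the worker name is an assumption made here because the semicolon separates the pairs.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "@wproc register name=...;pid=...;max_jobs=...[;plugin=...]" into buf.
 * Returns the query length (excluding the terminating nul), or -1 if the
 * name contains ';' or the query does not fit.
 * The nul that snprintf() writes at buf[len] is the nul byte that ends the
 * query, so a caller would send len + 1 bytes.
 * (Hypothetical helper for illustration only.)
 */
static int build_register_query(char *buf, size_t size, const char *name,
                                long pid, unsigned int max_jobs,
                                const char *const *plugins, size_t nplugins)
{
    size_t i;
    int n, len;

    assert(buf != NULL && size > 0 && name != NULL);
    if (strchr(name, ';') != NULL)
        return -1;              /* assumption: ';' would split the value */

    len = snprintf(buf, size, "@wproc register name=%s;pid=%ld;max_jobs=%u",
                   name, pid, max_jobs);
    if (len < 0 || (size_t)len >= size)
        return -1;

    /* "plugin" may be given multiple times, once per plugin. */
    for (i = 0; i < nplugins; i++) {
        n = snprintf(buf + len, size - (size_t)len, ";plugin=%s", plugins[i]);
        if (n < 0 || (size_t)n >= size - (size_t)len)
            return -1;
        len += n;
    }
    return len;
}
```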
<p>Complete C code for registering a generic worker with Nagios follows: </p>
<div class="fragment"><div class="line"><span class="keyword">static</span> <span class="keywordtype">int</span> nagios_core_worker(<span class="keyword">const</span> <span class="keywordtype">char</span> *path)</div>
<div class="line">{</div>
<div class="line">    <span class="keywordtype">int</span> sd, ret;</div>
<div class="line">    <span class="keywordtype">char</span> response[128];</div>
<div class="line"></div>
<div class="line">    is_worker = 1;</div>
<div class="line"></div>
<div class="line">    set_loadctl_defaults();</div>
<div class="line"></div>
<div class="line">    sd = <a class="code" href="nsock_8h.html#a698ebbdfe5e3589dd9e2d466d87ceb5a">nsock_unix</a>(path, <a class="code" href="nsock_8h.html#aaa52ee91165aeea3d09b1d857b6b4426">NSOCK_TCP</a> | <a class="code" href="nsock_8h.html#a408ebfcac776e538f3b8ef3d7297d3c4">NSOCK_CONNECT</a>);</div>
<div class="line">    <span class="keywordflow">if</span> (sd &lt; 0) {</div>
<div class="line">        printf(<span class="stringliteral">&quot;Failed to connect to query socket &#39;%s&#39;: %s: %s\n&quot;</span>,</div>
<div class="line">               path, <a class="code" href="nsock_8h.html#adcaaf011dcd99b0d782cc2b89736f2cf">nsock_strerror</a>(sd), strerror(errno));</div>
<div class="line">        <span class="keywordflow">return</span> 1;</div>
<div class="line">    }</div>
<div class="line"></div>
<div class="line">    ret = <a class="code" href="nsock_8h.html#a0469978df30212122748624f95e9c472">nsock_printf_nul</a>(sd, <span class="stringliteral">&quot;@wproc register name=Core Worker %d;pid=%d&quot;</span>, getpid(), getpid());</div>
<div class="line">    <span class="keywordflow">if</span> (ret &lt; 0) {</div>
<div class="line">        printf(<span class="stringliteral">&quot;Failed to register as worker.\n&quot;</span>);</div>
<div class="line">        <span class="keywordflow">return</span> 1;</div>
<div class="line">    }</div>
<div class="line"></div>
<div class="line">    ret = read(sd, response, 3);</div>
<div class="line">    <span class="keywordflow">if</span> (ret != 3) {</div>
<div class="line">        printf(<span class="stringliteral">&quot;Failed to read response from wproc manager\n&quot;</span>);</div>
<div class="line">        <span class="keywordflow">return</span> 1;</div>
<div class="line">    }</div>
<div class="line">    <span class="keywordflow">if</span> (memcmp(response, <span class="stringliteral">&quot;OK&quot;</span>, 3)) {</div>
<div class="line">        read(sd, response + 3, <span class="keyword">sizeof</span>(response) - 4);</div>
<div class="line">        response[<span class="keyword">sizeof</span>(response) - 2] = 0;</div>
<div class="line">        printf(<span class="stringliteral">&quot;Failed to register with wproc manager: %s\n&quot;</span>, response);</div>
<div class="line">        <span class="keywordflow">return</span> 1;</div>
<div class="line">    }</div>
<div class="line"></div>
<div class="line">    enter_worker(sd, start_cmd);</div>
<div class="line">    <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
</div><!-- fragment --><p>The "enter_worker()" part actually refers to a libnagios function that lives in worker.c. The set_loadctl_defaults() call can be ignored. It's primarily intended to give sane defaults about how many jobs we can run, so we (in theory) can tell Nagios that we're swamped in case we run out of file descriptors or child processes.</p>
<h2><a class="anchor" id="request"></a>
Requests</h2>
<p>A complete request looks like this (with C-style format codes replaced with actual values, of course): </p>
<pre class="fragment">job_id=%d\0type=%d\0command=%s\0timeout=%u\0\1\0\0\0
</pre><p>Note that values can contain equal signs, but cannot contain nul bytes, and cannot contain the message delimiter sequence. By including nagios/lib/worker.h and using worker_ioc2msg() followed by worker_buf2kvvec_prealloc(), you will get a parsed key/value vector handed to you. Have a look in base/workers.c to see how it's done for the core workers.</p>
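<p>For workers that do not use libnagios, the parsing step can be approximated by hand. The sketch below splits one message body (with the trailing "\1\0\0\0" delimiter already removed) into key/value pairs in place. struct kv_pair and split_message() are hypothetical names chosen for this example and are not part of libnagios.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed pair; both pointers point into the caller's buffer. */
struct kv_pair {
    const char *key;
    const char *value;
};

/*
 * Split len bytes of "key=value\0" pairs into out[], modifying buf in place.
 * Only the first '=' of each pair is treated as the separator, so values
 * may themselves contain equal signs.
 * Returns the number of pairs, or -1 on a malformed pair or if more than
 * max_pairs pairs are present.
 * (Hypothetical helper for illustration only.)
 */
static int split_message(char *buf, size_t len,
                         struct kv_pair *out, int max_pairs)
{
    size_t pos = 0;
    int n = 0;

    assert(buf != NULL && out != NULL);
    while (pos < len) {
        char *end = memchr(buf + pos, '\0', len - pos);
        char *eq;

        if (end == NULL || n == max_pairs)
            return -1;                 /* unterminated pair, or out of room */
        eq = strchr(buf + pos, '=');   /* stops at the nul found above */
        if (eq == NULL)
            return -1;                 /* pair without a separator */
        *eq = '\0';                    /* terminate the key */
        out[n].key = buf + pos;
        out[n].value = eq + 1;         /* may be the empty string */
        n++;
        pos = (size_t)(end - buf) + 1;
    }
    return n;
}
```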
<h2><a class="anchor" id="responses"></a>
Responses</h2>
<p>Once the worker is done running a task, it hands over the result to the master Nagios process and forgets it ever ran the job. The workers take no further action, regardless of how the task went. The exception is if the job timed out, or if the worker failed to even start the job, in which case it should report the error to Nagios and only <em>then</em> forget it ever got the job.</p>
<p>The response is identical to the request in formatting but differs in the understood keys. The variables from the request that Nagios sent to the worker must precede the other result variables. In particular, the job_id must be the first variable Nagios sees for it to parse the result as a job result rather than as something else.</p>
<p>The variables required for the response to a successfully executed job on a registered worker process are as follows: </p>
<ul>
<li>job_id - The job id (as received from Nagios) </li>
<li>type - The job type (as Nagios sent it) </li>
<li>start - Timeval struct for start time in $sec.$usec format </li>
<li>stop - Timeval struct for stop time in $sec.$usec format </li>
<li>runtime - Floating point value of runtime, in seconds </li>
<li>outstd - Output caught on stdout </li>
<li>outerr - Output caught on stderr </li>
<li>exited_ok - Boolean flag to denote if the job exited ok. The exit code can still be non-zero </li>
<li>wait_status - Integer, as set by the wait() family of system calls</li>
</ul>
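<p>The start, stop and runtime values in the list above can be produced along these lines. This is a sketch, not the libnagios implementation: format_timeval() and runtime_seconds() are hypothetical helper names, and zero-padding the microseconds to six digits is an assumption about the $sec.$usec format.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/*
 * Format a timeval as "$sec.$usec", with the microseconds zero-padded to
 * six digits (assumed here). Returns the string length, or -1 if it does
 * not fit in buf.
 */
static int format_timeval(char *buf, size_t size, const struct timeval *tv)
{
    int n;

    assert(buf != NULL && tv != NULL);
    n = snprintf(buf, size, "%ld.%06ld", (long)tv->tv_sec, (long)tv->tv_usec);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Elapsed time between start and stop as a floating point value, in seconds. */
static double runtime_seconds(const struct timeval *start,
                              const struct timeval *stop)
{
    assert(start != NULL && stop != NULL);
    return (double)(stop->tv_sec - start->tv_sec)
         + (double)(stop->tv_usec - start->tv_usec) / 1000000.0;
}
```

<p>A worker would typically call gettimeofday() just before starting the job and again when reaping it, then pass both values to helpers like these.</p>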
<p>The following should only be present when the worker is unable to execute the check due to an error, or when it cannot provide all the variables required for a successfully executed job due to arbitrary system errors: </p>
<ul>
<li>error_msg - An error message generated by the worker process </li>
<li>error_code - The error code generated by the worker process</li>
</ul>
<p>error_code 62 (ETIME - Timer expired) is reserved and means that the job timed out. </p>
<dl class="section note"><dt>Note</dt><dd><em>never</em> invent error codes in the range 0-10000, since we'll want to reserve that for special cases.</dd></dl>
<p>The following are completely optional (for now): </p>
<ul>
<li>command - The command we executed </li>
<li>timeout - The timeout Nagios requested for this job</li>
</ul>
<h1><a class="anchor" id="logging"></a>
Logging</h1>
<p>Worker processes can send events to the main Nagios process that will end up in the nagios.log file. The format is the same as that in requests and responses, but a log-message consists of a single key/value pair, where the key is always 'log'. Consequently, a request from a worker to the main process to log something looks like this: </p>
<pre class="fragment">log=A random message that will get logged to nagios.log\0
</pre><p>It's worth noting that Nagios will prefix the message with the worker process name, so as to make grep'ing easy when debugging experimental workers.</p>
<h1><a class="anchor" id="xchgexample"></a>
Protocol Exchange Example</h1>
<p>Registering a worker and executing one job with the standard Nagios core worker looks like this. The exchange starts after the worker process has connected to the query handler socket but before it has sent anything. Note that the nul bytes separating key/value pairs have been replaced with newlines to enhance readability. Also note that this depicts only the required steps, which go as follows: </p>
<pre class="fragment">Step 1, Worker:
  @wproc register name=Worker Hoopla;max_jobs=100;pid=6196\0
Step 2, Nagios:
  OK\0
Step 3, Nagios:
  job_id=0
  type=2
  timeout=60
  command=/opt/plugins/check_ping -H localhost -w 40%,100.0 -c 60%,200.0
  \1\0\0\0
Step 4, Worker:
  job_id=0
  type=2
  timeout=60
  start=1355231532.000123
  stop=1355231532.994343
  runtime=0.994220
  exited_ok=1
  outstd=OK: RTA: 12.6ms; PL: 0%|rta=12.6ms;100.0;200.0;0;; pl=0%;40;60
  wait_status=0
  outerr=
  \1\0\0\0
</pre><p> Steps 3 and 4 in this chain repeat indefinitely. </p>
</div></div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.5
</small></address>
</body>
</html>
