Greetings,<div> One of the necessary duties when running a production environment with hundreds or thousands of tunnels is maintaining visibility into the state of the machines, tunnels, and so on. Naturally, this data often has to be aggregated, with interesting slices carved out of the aggregate for analysis. We find that instrumentation is vital to visibility and hence to a successful operation. Think of a pilot flying in inclement weather: without optical visibility, the pilot must rely on instrumentation. Along these lines, I would like to solicit anecdotal accounts of how others are tackling this, or, if you are not, why you find it unnecessary. Some vendor-based IPsec solutions give access to diagnostic counters. These are great for capturing in aggregate and visualizing, since one can quickly spot trends and stress points. After going through some of the strongSwan wiki, I was unable to find information on how best to tackle this, so I have some questions about how to improve our visibility.</div>
<div><br></div><div>Under Linux we have access to robust snmpd functionality that can bind scripts to SNMP, so that polling returns the result of the script. One could also write an agent that grabs diagnostic data and sends it directly to something like Graphite.</div>
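<div><br></div><div>To illustrate the agent idea, here is a minimal sketch rather than a finished tool. It assumes the kernel exposes IPsec counters in /proc/net/xfrm_stat (which requires CONFIG_XFRM_STATISTICS) and that a Graphite server accepts the plaintext protocol ("path value timestamp") on TCP port 2003. The metric prefix vpn.gw1.xfrm and the host graphite.example.com are hypothetical placeholders.</div>

```python
import socket
import time

# Assumption: the kernel was built with CONFIG_XFRM_STATISTICS.
XFRM_STAT_PATH = "/proc/net/xfrm_stat"


def parse_counters(text):
    """Parse whitespace-separated 'Name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly a name followed by a number.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def to_graphite_lines(counters, prefix, timestamp):
    """Format counters as Graphite plaintext lines: 'path value timestamp'."""
    return ["%s.%s %d %d" % (prefix, name, value, timestamp)
            for name, value in sorted(counters.items())]


def send_to_graphite(lines, host, port=2003):
    """Send the formatted lines to a Graphite plaintext listener over TCP."""
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


def main():
    # Hypothetical prefix and host; adjust to the local environment.
    with open(XFRM_STAT_PATH) as stat_file:
        counters = parse_counters(stat_file.read())
    lines = to_graphite_lines(counters, "vpn.gw1.xfrm", int(time.time()))
    send_to_graphite(lines, "graphite.example.com")
```

<div>Calling main() from cron or a small loop would give a per-host time series. Counters such as XfrmInStateSeqError (sequence number errors, which is where anti-replay drops show up) would then be graphable, but this only covers what the kernel tracks, not the IKE-level events.</div>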
<div>This gets me part of the way there. However, I am unable to access all of the diagnostic data of interest. I understand that parts of this can be done with strongSwan commands and some may have to be extracted from the kernel. I am hoping the experts can provide some suggestions. Below is a sample list of the types of things I am interested in. I'm not opposed to cobbling things together, but I suspect that many of these things are already referenced somewhere within the code, just not instrumented to keep a tally or expose those counters. Of course, it would be great not to have to cobble and instead have a nice RPC/XML API, SNMP, or other query system, but I also understand that this is not the core of the development effort or intention. This list is off the top of my head, but hopefully good enough to illustrate the general direction.</div>
<div><br></div><div><br></div><div><div>Dropped packets (both cumulative and individual counters for anti-replay drops and other drops).</div><div>Phase 1 packets sent/received</div><div>Phase 2 packets sent/received</div>
<div>Phase 1 proposals failure/success counter</div><div>Phase 2 proposals failure/success counter</div><div>Packets sent over tunnel (encrypted), ESPOutBytes, ESPInBytes, etc</div><div>Packets received over tunnel (decrypted)</div>
<div>ESP Deletes sent/received</div><div>CREATE_CHILD_SA requests sent/received</div><div>DPD sent</div><div>DPD success</div><div>DPD failure</div><div>New tunnels from DPD</div><div>Failure to establish tunnel</div><div>
Failure due to unmatched endpoint in config</div><div>Failure in authentication, counters for the various methods (number of invalid certs, etc)</div><div>Failures around CRLs</div><div>Failure due to unmatched routes</div>
<div>Per-peer relevant counters</div><div>Renegotiation success/failures</div></div><div><br></div><div>Thank you,</div><div>Robin.</div>