<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class=""><br class="">
</div>
Oh, of course. SIGSEGV is the offending signal - should’ve seen that.
<div class=""><br class="">
</div>
<div class="">I did not build StrongSwan, rather am running the latest public release for Centos 6 - SS v5.2.0. I did not do anything special to get symbolic debugging, rather just downloaded all the recommended debug packages. I did have an issue with RPM
GPG keys but got past it fairly easily. Given the core file, it seems to me you could easily replicate what I did to get symbolic GDB on the core and more quickly/easily be able to determine root cause. </div>
<div class=""><br class="">
</div>
<div class="">I have produced this issue three times. Twice on the initiator and once on the responder. If memory serves, it took a couple hours to produces the responder failure and eight hours to produce the initiator failure. The tunnel was up the duration
until the failure, with the initiator host issuing a periodic ping, once per minute, to a host within the private VPN behind the responder. All three are expected to be the same failure as they all had similar log entries:</div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[DMN] thread 13 received 11</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[LIB] dumping 2 stack frame addresses:</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[LIB] /lib64/libpthread.so.0 @ 0x7f5e633b2000 [0x7f5e633c1710]</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[LIB] -> sigaction.c:0</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[LIB] [0xd30fe0]</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[DMN] killing ourself, received critical signal</div>
</div>
<div class=""><br class="">
</div>
<div class="">I will try to more quickly produce the crash by setting ikelifetime. Is there a recommended (or minimum) value?</div>
<div class=""><br class="">
</div>
<div class="">Is there a work-around to this problem?</div>
<div class=""><br class="">
</div>
<div class="">Some frame #5 info:</div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
(gdb) frame 5</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
#5 0x00007f5e63cf48ac in certs_filter (data=0x7f5e280033e0, in=0x7f5e50c6caf8, out=0x7f5e50c6cba8)</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
at credentials/sets/mem_cred.c:93</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
93<span class="Apple-tab-span" style="white-space:pre"> </span>if (data->cert == CERT_ANY || data->cert == cert->get_type(cert))</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
(gdb) p data->cert</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
$1 = CERT_X509</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
(gdb) p *data</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
$2 = {lock = 0xd20070, cert = CERT_X509, key = KEY_ANY, id = 0x7f5e280032c0}</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
(gdb) p cert->get_type</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
$3 = (certificate_type_t (*)(certificate_t *)) 0xd30fe0</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
(gdb) p *cert</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
$4 = {get_type = 0xd30fe0, get_subject = 0x7f5e631a9ed8 <main_arena+88>, has_subject = 0, get_issuer = 0, </div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
has_issuer = 0x7f5e5d7cdb00 <has_issuer>, issued_by = 0x7f5e5d7ce0a0 <issued_by>, </div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
get_public_key = 0x7f5e5d7cdb10 <get_public_key>, get_validity = 0x7f5e5d7ce030 <get_validity>, </div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
get_encoding = 0x7f5e5d7cdcb0 <get_encoding>, equals = 0x7f5e5d7d3930 <equals>, get_ref = 0x7f5e5d7cdfa0 <get_ref>, </div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
destroy = 0x7f5e5d7ce780 <destroy>}</div>
<div style="margin: 0px; font-family: Courier; background-color: rgb(226, 225, 227);" class="">
(gdb) </div>
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<div class=""> <br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Mar 9, 2015, at 8:43 AM, Martin Willi <<a href="mailto:martin@strongswan.org" class="">martin@strongswan.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">Ken,<br class="">
<br class="">
<blockquote type="cite" class="">The initiator received signal 6 (SIGABRT) after eight hours of operation.<br class="">
</blockquote>
<br class="">
Actually, the offending signal is SIGSEGV (11). charon catches that,<br class="">
prints a backtrace, and then calls abort() to terminate itself.<br class="">
<br class="">
<blockquote type="cite" class="">I have a ~182MB core file from the initiator. How can I get it to you?<br class="">
</blockquote>
<br class="">
I don't think that helps much, as I can't analyze that without your<br class="">
build and environment.<br class="">
<br class="">
<blockquote type="cite" class="">#2 0x0000000000401393 in segv_handler (signal=11) at charon.c:199<br class="">
#5 0x00007f5e63cf48ac in certs_filter (data=0x7f5e280033e0, in=0x7f5e50c6caf8, out=0x7f5e50c6cba8)<br class="">
at credentials/sets/mem_cred.c:93<br class="">
#6 0x00007f5e63ce6a55 in enumerate_filter (this=0x7f5e28003000, o1=0x7f5e50c6cba8, o2=0x7f5e63ce6ce0, o3=0x40,<br class="">
o4=0x7f5e28000088, o5=0x1) at collections/enumerator.c:525<br class="">
#7 0x00007f5e63ce6953 in enumerate_nested (this=0x7f5e280033a0, v1=0x7f5e50c6cba8, v2=0x7f5e63ce6ce0, v3=0x40,<br class="">
v4=0x7f5e28000088, v5=0x1) at collections/enumerator.c:448<br class="">
#8 0x00007f5e63cf35c0 in get_cert (this=<value optimized out>, cert=<value optimized out>, key=<value optimized out>,<br class="">
id=<value optimized out>, trusted=<value optimized out>) at credentials/credential_manager.c:269<br class="">
#9 0x00007f5e63890535 in process_certreq (this=0x7f5e34001040, message=<value optimized out>)<br class="">
at sa/ikev2/tasks/ike_cert_pre.c:85<br class="">
#10 process_certreqs (this=0x7f5e34001040, message=<value optimized out>) at sa/ikev2/tasks/ike_cert_pre.c:142<br class="">
#11 0x00007f5e63890acb in process_i (this=0x7f5e34001040, message=0x7f5e44000ff0) at sa/ikev2/tasks/ike_cert_pre.c:524<br class="">
#12 0x00007f5e63886bce in process_response (this=0x7f5e34000b20, msg=0x7f5e44000ff0) at sa/ikev2/task_manager_v2.c:538<br class="">
</blockquote>
<br class="">
charon crashes while looking up the CA certificate that the peer<br class="">
indicates trust in by sending a CERTREQ payload. Never seen that, likely<br class="">
that one of the in-memory certificate instances is corrupt, and/or that<br class="">
something is wrong with the refcounting of such a certificate.<br class="">
<br class="">
You may further analyze the issue by inspecting the "data" and "cert"<br class="">
objects in frame #5 at credentials/sets/mem_cred.c:93.<br class="">
<br class="">
<blockquote type="cite" class="">01[IKE] reauthenticating IKE_SA cazena-pdc[3]<br class="">
[...]<br class="">
09[ENC] parsed IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) CERTREQ N(MULT_AUTH) ]<br class="">
[...]<br class="">
09[DMN] thread 9 received 11<br class="">
</blockquote>
<br class="">
Is this issue reproducible every time? With a constant tunnel uptime?<br class="">
Can you reduce the time-to-crash if you reduce the re-authentication<br class="">
interval configured with the ipsec.conf ikelifetime option?<br class="">
<br class="">
Regards<br class="">
Martin<br class="">
<br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</body>
</html>