[strongSwan] Charon reset

Ken Nelson ken at cazena.com
Mon Mar 9 17:02:52 CET 2015


Oh, of course.  SIGSEGV is the offending signal - should’ve seen that.

I did not build strongSwan; rather, I am running the latest public release for CentOS 6, strongSwan v5.2.0.  I did not do anything special to get symbolic debugging, just downloaded all the recommended debug packages.  I did have an issue with RPM GPG keys but got past it fairly easily.  Given the core file, it seems to me you could easily replicate what I did to get symbolic GDB output on the core and more quickly/easily determine the root cause.

I have produced this issue three times: twice on the initiator and once on the responder.  If memory serves, it took a couple of hours to produce the responder failure and eight hours to produce the initiator failure.  The tunnel was up for the entire duration until the failure, with the initiator host issuing a periodic ping, once per minute, to a host on the private network behind the responder.  All three crashes are expected to be the same failure, as they all had similar log entries:

Mar  6 23:56:33 ip-10-100-34-179 charon: 13[DMN] thread 13 received 11
Mar  6 23:56:33 ip-10-100-34-179 charon: 13[LIB]  dumping 2 stack frame addresses:
Mar  6 23:56:33 ip-10-100-34-179 charon: 13[LIB]   /lib64/libpthread.so.0 @ 0x7f5e633b2000 [0x7f5e633c1710]
Mar  6 23:56:33 ip-10-100-34-179 charon: 13[LIB]     -> sigaction.c:0
Mar  6 23:56:33 ip-10-100-34-179 charon: 13[LIB]     [0xd30fe0]
Mar  6 23:56:33 ip-10-100-34-179 charon: 13[DMN] killing ourself, received critical signal

I will try to more quickly produce the crash by setting ikelifetime.  Is there a recommended (or minimum) value?
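
Unless you suggest otherwise, my plan is roughly the following in ipsec.conf on both ends (the 30-minute value is just a guess on my part, not something I have seen recommended):

conn cazena-pdc
        # shorten re-authentication so the CERTREQ/certificate-lookup path runs often;
        # 30m is an arbitrary test value
        ikelifetime=30m
        reauth=yes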

Is there a work-around to this problem?

Some frame #5 info:

(gdb) frame 5
#5  0x00007f5e63cf48ac in certs_filter (data=0x7f5e280033e0, in=0x7f5e50c6caf8, out=0x7f5e50c6cba8)
    at credentials/sets/mem_cred.c:93
93 if (data->cert == CERT_ANY || data->cert == cert->get_type(cert))
(gdb) p data->cert
$1 = CERT_X509
(gdb) p *data
$2 = {lock = 0xd20070, cert = CERT_X509, key = KEY_ANY, id = 0x7f5e280032c0}
(gdb) p cert->get_type
$3 = (certificate_type_t (*)(certificate_t *)) 0xd30fe0
(gdb) p *cert
$4 = {get_type = 0xd30fe0, get_subject = 0x7f5e631a9ed8 <main_arena+88>, has_subject = 0, get_issuer = 0,
  has_issuer = 0x7f5e5d7cdb00 <has_issuer>, issued_by = 0x7f5e5d7ce0a0 <issued_by>,
  get_public_key = 0x7f5e5d7cdb10 <get_public_key>, get_validity = 0x7f5e5d7ce030 <get_validity>,
  get_encoding = 0x7f5e5d7cdcb0 <get_encoding>, equals = 0x7f5e5d7d3930 <equals>, get_ref = 0x7f5e5d7cdfa0 <get_ref>,
  destroy = 0x7f5e5d7ce780 <destroy>}
(gdb)
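
For what it's worth, those method pointers clearly no longer point into libstrongswan - get_type is 0xd30fe0 (the same address the daemon's own backtrace dumped) and get_subject points into glibc's main_arena - which fits your theory that the certificate instance is corrupt.  As I read it, the check at mem_cred.c:93 then faults roughly like this (simplified stand-in types of my own, not the real strongSwan declarations):

/* Simplified stand-ins, not the real strongSwan headers. */
typedef enum { CERT_ANY, CERT_X509 } certificate_type_t;

typedef struct certificate_t certificate_t;
struct certificate_t {
    /* first entry of the object's method table */
    certificate_type_t (*get_type)(certificate_t *this);
    /* ... remaining methods ... */
};

/* Essentially what certs_filter() evaluates at mem_cred.c:93: an
 * indirect call through the object's get_type pointer.  If the object
 * was freed and its memory reused by the allocator (hence the
 * main_arena pointer above), get_type holds arbitrary data and the
 * call jumps into the heap, raising signal 11. */
static int type_matches(certificate_type_t wanted, certificate_t *cert)
{
    return wanted == CERT_ANY || wanted == cert->get_type(cert);
}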



On Mar 9, 2015, at 8:43 AM, Martin Willi <martin at strongswan.org> wrote:

Ken,

> The initiator received signal 6 (SIGABRT) after eight hours of operation.

Actually, the offending signal is SIGSEGV (11). charon catches that,
prints a backtrace, and then calls abort() to terminate itself.
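
The pattern is roughly the following (a simplified sketch of the idea,
not the actual charon.c handler):

#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch only: catch the segmentation fault, log the faulting signal
 * and a backtrace, then abort().  The process therefore exits on
 * SIGABRT (6) even though the original fault was SIGSEGV (11). */
static void segv_handler(int signal)
{
    void *frames[20];
    int count = backtrace(frames, 20);

    fprintf(stderr, "thread received %d, dumping %d stack frame addresses:\n",
            signal, count);
    backtrace_symbols_fd(frames, count, fileno(stderr));
    abort();
}

int main(void)
{
    struct sigaction action = { .sa_handler = segv_handler };

    sigemptyset(&action.sa_mask);
    sigaction(SIGSEGV, &action, NULL);
    /* ... daemon main loop ... */
    return 0;
}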

> I have a ~182MB core file from the initiator. How can I get it to you?

I don't think that helps much, as I can't analyze that without your
build and environment.

> #2  0x0000000000401393 in segv_handler (signal=11) at charon.c:199
> #5  0x00007f5e63cf48ac in certs_filter (data=0x7f5e280033e0, in=0x7f5e50c6caf8, out=0x7f5e50c6cba8)
>    at credentials/sets/mem_cred.c:93
> #6  0x00007f5e63ce6a55 in enumerate_filter (this=0x7f5e28003000, o1=0x7f5e50c6cba8, o2=0x7f5e63ce6ce0, o3=0x40,
>    o4=0x7f5e28000088, o5=0x1) at collections/enumerator.c:525
> #7  0x00007f5e63ce6953 in enumerate_nested (this=0x7f5e280033a0, v1=0x7f5e50c6cba8, v2=0x7f5e63ce6ce0, v3=0x40,
>    v4=0x7f5e28000088, v5=0x1) at collections/enumerator.c:448
> #8  0x00007f5e63cf35c0 in get_cert (this=<value optimized out>, cert=<value optimized out>, key=<value optimized out>,
>    id=<value optimized out>, trusted=<value optimized out>) at credentials/credential_manager.c:269
> #9  0x00007f5e63890535 in process_certreq (this=0x7f5e34001040, message=<value optimized out>)
>    at sa/ikev2/tasks/ike_cert_pre.c:85
> #10 process_certreqs (this=0x7f5e34001040, message=<value optimized out>) at sa/ikev2/tasks/ike_cert_pre.c:142
> #11 0x00007f5e63890acb in process_i (this=0x7f5e34001040, message=0x7f5e44000ff0) at sa/ikev2/tasks/ike_cert_pre.c:524
> #12 0x00007f5e63886bce in process_response (this=0x7f5e34000b20, msg=0x7f5e44000ff0) at sa/ikev2/task_manager_v2.c:538

charon crashes while looking up the CA certificate that the peer
indicates trust in by sending a CERTREQ payload. I have never seen
that; most likely one of the in-memory certificate instances is
corrupt, and/or something is wrong with the refcounting of such a
certificate.
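
To illustrate the kind of refcounting issue I mean (a hypothetical
sketch with stand-in types, not libstrongswan code): every holder of
a certificate must take its own reference before storing the pointer,
otherwise the last destroy() frees the object while a credential set
still enumerates it:

#include <stdlib.h>

/* Hypothetical stand-in for a refcounted certificate object. */
typedef struct cert_t cert_t;
struct cert_t {
    int refcount;
};

static cert_t *cert_get_ref(cert_t *this)
{
    this->refcount++;
    return this;
}

static void cert_destroy(cert_t *this)
{
    if (--this->refcount == 0)
    {
        free(this);
    }
}

/* What a credential set keeps for later lookups. */
static cert_t *stored;

/* Correct: the set takes its own reference before storing the pointer. */
static void set_add(cert_t *cert)
{
    stored = cert_get_ref(cert);
}

/* Buggy: the pointer is stored without taking a reference.  Once the
 * original owner calls destroy(), 'stored' dangles; the next lookup
 * that calls a method through it (as certs_filter() does) reads freed
 * memory, which would explain the garbage method pointers in frame #5. */
static void set_add_buggy(cert_t *cert)
{
    stored = cert;
}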

You may further analyze the issue by inspecting the "data" and "cert"
objects in frame #5 at credentials/sets/mem_cred.c:93.

> 01[IKE] reauthenticating IKE_SA cazena-pdc[3]
> [...]
> 09[ENC] parsed IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) CERTREQ N(MULT_AUTH) ]
> [...]
> 09[DMN] thread 9 received 11

Is this issue reproducible every time? With a constant tunnel uptime?
Can you reduce the time-to-crash if you reduce the re-authentication
interval configured with the ipsec.conf ikelifetime option?

Regards
Martin

