[strongSwan] Charon reset
Ken Nelson
ken at cazena.com
Mon Mar 9 17:02:52 CET 2015
Oh, of course. SIGSEGV is the offending signal - should’ve seen that.
I did not build strongSwan myself; I am running the latest public release for CentOS 6, strongSwan 5.2.0. I did not do anything special to get symbolic debugging, just downloaded all the recommended debug packages. I did have an issue with RPM GPG keys but got past it fairly easily. Given the core file, it seems to me you could easily replicate what I did to get symbolic GDB on the core and determine the root cause more quickly and easily.
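For reference, the setup for symbolic debugging was roughly the following (a sketch from memory; the exact debuginfo package names and the path to the charon binary may differ on your system):

# pull in debuginfo-install and the strongSwan debug symbols
yum install yum-utils
debuginfo-install strongswan
# open the core against the charon binary (adjust both paths as needed)
gdb /usr/libexec/strongswan/charon /var/tmp/core.charon
(gdb) bt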
I have produced this issue three times: twice on the initiator and once on the responder. If memory serves, it took a couple of hours to produce the responder failure and eight hours to produce the initiator failure. The tunnel was up for the entire duration until the failure, with the initiator host issuing a periodic ping, once per minute, to a host within the private VPN behind the responder. All three appear to be the same failure, as they all had similar log entries:
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[DMN] thread 13 received 11
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[LIB] dumping 2 stack frame addresses:
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[LIB] /lib64/libpthread.so.0 @ 0x7f5e633b2000 [0x7f5e633c1710]
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[LIB] -> sigaction.c:0
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[LIB] [0xd30fe0]
Mar 6 23:56:33 ip-10-100-34-179 charon: 13[DMN] killing ourself, received critical signal
I will try to produce the crash more quickly by setting ikelifetime. Is there a recommended (or minimum) value?
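Something along these lines is what I have in mind for ipsec.conf (a sketch only; the values are illustrative and only the lifetime-related settings of the cazena-pdc conn are shown):

conn cazena-pdc
    # re-authenticate the IKE_SA frequently to try to trigger the crash sooner
    ikelifetime=10m
    # keep the rekey margin below the lifetime and remove the random fuzz
    margintime=3m
    rekeyfuzz=0%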
Is there a work-around to this problem?
Some frame #5 info:
(gdb) frame 5
#5 0x00007f5e63cf48ac in certs_filter (data=0x7f5e280033e0, in=0x7f5e50c6caf8, out=0x7f5e50c6cba8)
at credentials/sets/mem_cred.c:93
93 if (data->cert == CERT_ANY || data->cert == cert->get_type(cert))
(gdb) p data->cert
$1 = CERT_X509
(gdb) p *data
$2 = {lock = 0xd20070, cert = CERT_X509, key = KEY_ANY, id = 0x7f5e280032c0}
(gdb) p cert->get_type
$3 = (certificate_type_t (*)(certificate_t *)) 0xd30fe0
(gdb) p *cert
$4 = {get_type = 0xd30fe0, get_subject = 0x7f5e631a9ed8 <main_arena+88>, has_subject = 0, get_issuer = 0,
has_issuer = 0x7f5e5d7cdb00 <has_issuer>, issued_by = 0x7f5e5d7ce0a0 <issued_by>,
get_public_key = 0x7f5e5d7cdb10 <get_public_key>, get_validity = 0x7f5e5d7ce030 <get_validity>,
get_encoding = 0x7f5e5d7cdcb0 <get_encoding>, equals = 0x7f5e5d7d3930 <equals>, get_ref = 0x7f5e5d7cdfa0 <get_ref>,
destroy = 0x7f5e5d7ce780 <destroy>}
(gdb)
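If it helps, I can run further commands against the same core, e.g. checking whether the suspicious get_type address maps to any symbol and dumping the raw object (hypothetical commands, not output from the actual session):

(gdb) info symbol 0xd30fe0
(gdb) x/12gx cert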
On Mar 9, 2015, at 8:43 AM, Martin Willi <martin at strongswan.org> wrote:
Ken,
> The initiator received signal 6 (SIGABRT) after eight hours of operation.
Actually, the offending signal is SIGSEGV (11). charon catches that,
prints a backtrace, and then calls abort() to terminate itself.
> I have a ~182MB core file from the initiator. How can I get it to you?
I don't think that helps much, as I can't analyze that without your
build and environment.
#2 0x0000000000401393 in segv_handler (signal=11) at charon.c:199
#5 0x00007f5e63cf48ac in certs_filter (data=0x7f5e280033e0, in=0x7f5e50c6caf8, out=0x7f5e50c6cba8)
at credentials/sets/mem_cred.c:93
#6 0x00007f5e63ce6a55 in enumerate_filter (this=0x7f5e28003000, o1=0x7f5e50c6cba8, o2=0x7f5e63ce6ce0, o3=0x40,
o4=0x7f5e28000088, o5=0x1) at collections/enumerator.c:525
#7 0x00007f5e63ce6953 in enumerate_nested (this=0x7f5e280033a0, v1=0x7f5e50c6cba8, v2=0x7f5e63ce6ce0, v3=0x40,
v4=0x7f5e28000088, v5=0x1) at collections/enumerator.c:448
#8 0x00007f5e63cf35c0 in get_cert (this=<value optimized out>, cert=<value optimized out>, key=<value optimized out>,
id=<value optimized out>, trusted=<value optimized out>) at credentials/credential_manager.c:269
#9 0x00007f5e63890535 in process_certreq (this=0x7f5e34001040, message=<value optimized out>)
at sa/ikev2/tasks/ike_cert_pre.c:85
#10 process_certreqs (this=0x7f5e34001040, message=<value optimized out>) at sa/ikev2/tasks/ike_cert_pre.c:142
#11 0x00007f5e63890acb in process_i (this=0x7f5e34001040, message=0x7f5e44000ff0) at sa/ikev2/tasks/ike_cert_pre.c:524
#12 0x00007f5e63886bce in process_response (this=0x7f5e34000b20, msg=0x7f5e44000ff0) at sa/ikev2/task_manager_v2.c:538
charon crashes while looking up the CA certificate that the peer
indicates trust in by sending a CERTREQ payload. I have never seen that;
most likely one of the in-memory certificate instances is corrupt, and/or
something is wrong with the reference counting of such a certificate.
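As a rough illustration of the kind of reference-counting bug that would fit this picture (a hypothetical, simplified sketch, not strongSwan's actual certificate_t or mem_cred code): if one consumer releases a shared certificate without having taken its own reference, the set keeps a dangling pointer, and a later enumeration calls through whatever the allocator has since written into that memory.

#include <stdlib.h>

/* Hypothetical stand-in for a refcounted certificate object; strongSwan's
 * real certificate_t exposes get_ref()/destroy() methods instead. */
typedef struct cert cert_t;
struct cert {
    int refcount;
    int (*get_type)(cert_t *this);   /* dangling after the object is freed */
};

static int dummy_type(cert_t *this) { (void)this; return 1; }
static void cert_ref(cert_t *c) { c->refcount++; }
static void cert_put(cert_t *c) { if (--c->refcount == 0) free(c); }

static void correct_consumer(cert_t *shared)
{
    cert_ref(shared);   /* take our own reference before using the object */
    /* ... use shared ... */
    cert_put(shared);   /* the credential set's reference stays intact */
}

static void buggy_consumer(cert_t *shared)
{
    /* BUG: never called cert_ref(), so this drops the credential set's
     * only reference and the object is freed while still in the set. */
    cert_put(shared);
}

static int enumerate_set(cert_t *stored)
{
    /* The set still holds 'stored' and calls a method on it, but the memory
     * may have been reused, so get_type is now garbage -- the same SIGSEGV
     * pattern seen above in certs_filter(). */
    return stored->get_type(stored);
}

int main(void)
{
    cert_t *c = malloc(sizeof(*c));
    c->refcount = 1;                 /* the credential set's reference */
    c->get_type = dummy_type;
    correct_consumer(c);             /* balanced ref/unref: fine */
    buggy_consumer(c);               /* frees c out from under the set */
    return enumerate_set(c);         /* use after free: likely crashes */
}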
You may further analyze the issue by inspecting the "data" and "cert"
objects in frame #5 at credentials/sets/mem_cred.c:93.
01[IKE] reauthenticating IKE_SA cazena-pdc[3]
[...]
09[ENC] parsed IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) CERTREQ N(MULT_AUTH) ]
[...]
09[DMN] thread 9 received 11
Is this issue reproducible every time? With a constant tunnel uptime?
Can you reduce the time-to-crash if you reduce the re-authentication
interval configured with the ipsec.conf ikelifetime option?
Regards
Martin