<div dir="ltr">Hi,<div><br></div><div>I need some help with a peculiar problem that we are facing in a customer's IPsec setup. </div><div><br></div><div>First, I will describe the IPsec setup the customer is using in their network.</div><div><br></div><div><br><div class="gmail_chip gmail_drive_chip" style="width:396px;height:18px;max-height:18px;background-color:#f5f5f5;padding:5px;color:#222;font-family:arial;font-style:normal;font-weight:bold;font-size:13px;border:1px solid #ddd;line-height:1"><a href="https://drive.google.com/file/d/0B4ZfPbrUfHkZa2FRZkNjeVo0Y0k/edit?usp=drive_web" target="_blank" style="display:inline-block;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;text-decoration:none;padding:1px 0px;border:none;width:100%"><img style="vertical-align: bottom; border: none;" src="https://ssl.gstatic.com/docs/doclist/images/icon_10_generic_list.png"> <span dir="ltr" style="color:#15c;text-decoration:none;vertical-align:bottom">Cust.jpg</span></a></div><br></div><div>The customer's network is as shown in the attached diagram. My device (the cell sites in the diagram) has two routes to the Juniper gateway, via Cisco routers AR1 and AR2. The path via AR1 is the primary route, over which the IPsec tunnels are established. The sites have BFD enabled, so if the primary link goes down, BFD detects the failure and traffic is rerouted via router AR2.</div><div><br></div><div>The problem the customer is facing: when the primary link goes down, BFD detects the failure and rerouting happens correctly, but we see ESP packets being dropped. </div><div>Normally, after the reroute, packets should flow as if nothing had happened, because BFD has detected the link failure and a new route has been added. Instead, packets are being dropped due to an SPI mismatch. The device recovers only when a rekey happens; the rekey time is 2 hours. Initially we suspected that the path via router AR2 has an issue or that the gateway is misbehaving, but in that case the device should not be receiving ESP packets at all. 
</div><div><br></div><div><br></div><div><br></div><div>In our application code we see many tunnel deletion requests coming from strongSwan via netlink messages for SPIs that no longer exist. There is a very large number of them. It looks as if strongSwan or the Linux kernel has stored all the old SPIs that were used at some point in the past for packet encryption and decryption, and when the current tunnel goes down, it tries to delete all the past tunnels for all the old SPIs. Since those tunnels are not present, "not present" messages are printed for each SPI. <b style="background-color:rgb(255,0,0)">But my question is: why are tunnel deletion requests coming for SPIs that do not exist?</b> Normally, when a tunnel is deleted due to a rekey or some other problem, all tunnels and their corresponding SPIs should be cleared at once. </div><div><br></div><div><div>not present                                   468   468   2258133644 </div><div> 43081367 OCT_System           WARNING Wed Apr 22 2015 08:41:31 768ms syslogd.c(134)                 OCT_syslogd: ipda_cv: processXfrmSaMessage(): conn(<b>spi=<span style="background-color:rgb(255,0,0)">0xcfa6556b</span></b>,dstIp=0xa00a194) not present                                   468   468   2260373638 </div><div> 43081380 OCT_System           WARNING Wed Apr 22 2015 08:41:32 658ms syslogd.c(134)                 OCT_syslogd: ipda_cv: processXfrmSaMessage(): conn(<b>spi=<span style="background-color:rgb(255,0,0)">0xcae0db97</span></b>,dstIp=0xa00a194) not present                                   468   468   2261263695 </div><div> 43081395 OCT_System           WARNING Wed Apr 22 2015 08:41:34 487ms syslogd.c(134)                 OCT_syslogd: ipda_cv: processXfrmSaMessage(): conn(spi=0xc7469ad2,dstIp=0xa00a194) not present                                   468   468   2263093559 </div><div> 43081408 OCT_System           WARNING Wed Apr 22 2015 08:41:35 367ms syslogd.c(134)                 OCT_syslogd: ipda_cv: processXfrmSaMessage(): conn(spi=<span 
style="background-color:rgb(255,0,0)">0xc3d42a66</span>,dstIp=0xa00a194) not present                                   468   468   2263973563 </div></div><div><br></div><div><br></div><div>Also, we see a lot of strongSwan log messages like the ones below.</div><div>It looks like strongSwan is trying to establish a CHILD_SA but is not able to do so:</div><div><br></div><div><div> 43088969 OCT_syslogd          INFO    Wed Apr 22 2015 08:53:19 888ms syslogd.c(134)                 charon: 12[IKE] establishing CHILD_SA conn10{6}                                                                                  468   468   2968494109 </div><div> 43088970 OCT_syslogd          INFO    Wed Apr 22 2015 08:53:19 888ms syslogd.c(134)                 charon: 12[IKE] establishing CHILD_SA conn10{6} </div></div><div><br></div><div><span style="background-color:rgb(255,0,0)">I do not understand why so many of these messages are printed, or why strongSwan takes so long to establish the CHILD_SA.</span></div><div><span style="background-color:rgb(255,0,0)"><br></span></div><div>Eventually we see that the CHILD_SA is established.</div><div>Once the CHILD_SA is established, everything is fine. 
The site recovers and packets start flowing.</div><div><br></div><div><div>43090893 OCT_syslogd          INFO    Wed Apr 22 2015 08:55:54 838ms syslogd.c(134)                 charon: 11[IKE] CHILD_SA conn10{14} established with SPIs c6d3b37b_i 055e896c_o and TS 0.0.0.0/0 === 0.0.0.0/0                   468   468   3123443800 </div><div> 43090894 OCT_syslogd          INFO    Wed Apr 22 2015 08:55:54 838ms syslogd.c(134)                 charon: 11[IKE] CHILD_SA conn10{14} established with SPIs c6d3b37b_i 055e896c_o and TS 0.0.0.0/0 === 0.0.0.0/0                   468   468   3123443824 </div><div> </div></div><div>Any input on the above symptoms would be a great help.</div><div><br></div><div>Thanks &amp; Regards,</div><div>Bhashkar</div></div>