<div dir="ltr"><div>Hi,</div><div><br></div><div>I have been testing with a Juniper SRX talking to a strongswan gateway and have noticed several problems when I have short rekey settings on the juniper. I have tried to debug a few and put in some "fixes", but it just seems like I hit one problem after another. I am trying IKEv1, with strongswan 5.1.2. The rekey values I use are 4 minutes for phase2. The reauth times for IKE is around 10 minutes. I have described some of the problems below and what I have found from my debugging</div>
<div><br></div><div>1. Improper state changes and calling of updown script.</div><div>I noticed this problem in the method check_for_rekeyed_child in src/libcharon/sa/ikev1/tasks/quick_mode.c . When a new quick mode connection comes and a check is being made for existing connections, the check is stopped when the first connection is found. This is because of the following code in the function</div>
<div><span class="" style="white-space:pre"> </span>enumerator = this->ike_sa->create_child_sa_enumerator(this->ike_sa);</div><div><span class="" style="white-space:pre"> </span>while (this->reqid == 0 && enumerator->enumerate(enumerator, &child_sa))</div>
<div><span class="" style="white-space:pre"> </span>{</div><div>The check stops when reqid is not 0. Part of what this function does is changing the state of an existing connection to CHILD_REKEYING. However, if there are two existing connections, because the first rekeyed child has not expired, one connection is in the state CHILD_REKEYING and the other is state CHILD_INSTALLED, the above loop finds the connection in state CHILD_REKEYING, changes its state to CHILD_REKEYING (from CHILD_REKEYING) and stops. The child in state CHILD_INSTALLED is not touched. When the new child is established, there are now three child SAs - 2 in CHILD_INSTALLED state and one in CHILD_REKEYING state. </div>
<div>A side effect of this condition is that when the connection that was incorrectly in CHILD_INSTALLED state expires, the child updown script is called with down-client. This should not have happened, because technically, this child was being rekeyed.</div>
<div><br></div><div>The fix I made for the above was to remove the "this->reqid == 0" check in the while loop and let the enumerator go through all the existing connections.</div><div><br></div><div>2. Sometimes child updown script with "up-client" gets called for a rekeyed child after IKE reauth</div>
<div>I have noticed this a few times. In the method "install" in src/libcharon/sa/ikev1/tasks/quick_mode.c, where the old child SA is got from the IKE SA, and if the old SA is NULL, the child_updown is called. The following is the code snippet.</div>
<div><br></div><div><span class="" style="white-space:pre"> </span>if (this->rekey)</div><div><span class="" style="white-space:pre"> </span>{</div><div><span class="" style="white-space:pre"> </span>old = this->ike_sa->get_child_sa(this->ike_sa,</div>
<div><span class="" style="white-space:pre"> </span>this->proposal->get_protocol(this->proposal),</div><div><span class="" style="white-space:pre"> </span>this->rekey, TRUE);</div><div><span class="" style="white-space:pre"> </span>}</div>
<div><span class="" style="white-space:pre"> </span>if (old)</div><div><span class="" style="white-space:pre"> </span>{</div><div><span class="" style="white-space:pre"> </span>charon->bus->child_rekey(charon->bus, old, this->child_sa);</div>
<div><span class="" style="white-space:pre"> </span>}</div><div><span class="" style="white-space:pre"> </span>else</div><div><span class="" style="white-space:pre"> </span>{</div><div><span class="" style="white-space:pre"> </span>charon->bus->child_updown(charon->bus, this->child_sa, TRUE);</div>
<div><span class="" style="white-space:pre"> </span>}</div><div><br></div><div><div>I have noticed that sometimes old is NULL for a child that actually rekeyed. This seems to happen when after the IKE SA has been reauth. I think in other places we do not call up-client for child-sa that is rekeyed. I moved the check for old inside the check for this->rekey and this seemed to solve the problem.</div>
</div><div><br></div><div>3. Calling of updown script for child sa in REKEYING state when DPD detected.</div><div>It seems that in some places the updown script is not called when the child sa being deleted is in REKEYING state. However when DPD is detected and an IKE connection is being deleted, all the child sa also are deleted. But there is no check to see if the child is in CHILD_REKEYING state. So the updown script is called with "down-client" for all child sa, even if they are in REKEYING state, which is different from other places where the child updown script is not called for a rekeyed child_sa.</div>
<div>I saw this problem in the ike_updown method in src/libcharon/bus/bus.c</div><div><br></div><div>4. It seems that a child state is being set to CHILD_REKEYING even before the new child is completely established. This happens in method check_for_rekeyed_child in src/libcharon/sa/ikev1/tasks/quick_mode.c, which is called when the first handshake of a new quick mode connection happens. I have noticed a problem with juniper where, the new quick mode connection fails to complete because of a bad hash value in the message 3 of the handshake. This causes the infant connection to be deleted, but what this also means is that the existing connection has already made a state change to CHILD_REKEYING. A side effect of this is that when the existing connection expires, the updown script is never called because the child_sa is in state CHILD_REKEYING. One possible fix I tried is that when a child sa in state CHILD_REKEYING is being deleted, I check if there is another connection with the same name under the IKE_SA. If there are none, I call the updown script with down-client.</div>
<div><br></div><div>5. During IKE_SA reauth, when a new IKE SA is being created and is adopting child sa from previous IKE SA, only INSTALLED child sa are adopted. This is because a child_sa is added to an ike_sa only after it is installed. So if a new child sa is being created at around the same time as the new IKE SA is being created, there is a chance that this child sa is associated with the old IKE SA which is being torn down, causing this child sa to be deleted when the older IKE SA is deleted.</div>
<div><br></div><div>6. I have seen a crash when enumerating through an empty array. For example, in enumerating an empty policy_enumerator. The problem seemed to be a bad count in the underlying array_t. It seems that in the method array_create in src/libstrongswan/collections/array.c, the count and data variables are not initialized to 0. </div>
<div><br></div><div>7. One of the more important debug logs I added to help me debug was to log the state change of a child SA in method set_state in src/libcharon/sa/child_sa.c . It would be very helpful if this log is added in the code base.</div>
<div><br></div><div>Please let me know if my investigations and my possible fixes make any sense.</div><div><br></div><div>regards,</div><div>sach</div></div>