
RACK provides the power source for TCP BBR

At the end of the previous article, "Google's BBR TCP Congestion Control Algorithm's Four Variable Speed Engines", I mentioned that bbr, as a nominally powerful engine, needs a steady supply of energy, and that this energy is data packets. I also mentioned that TCP's fast retransmit mechanism retransmits a packet judged LOST only once. So when a retransmitted packet is lost again and the sliding window cannot advance, there are no packets left to send and the bbr engine stalls. At that point all you can do is wait for the TCP timeout! The cost of a timeout is high, of course: not just for congestion control but for the whole connection, it is nothing short of a disaster!

bbr needs a steady stream of packets to keep it running at high speed. bbr does not care whether these are new packets, packets marked LOST, retransmitted packets, or even deliberately malformed packets... as long as there are packets to send! bbr truly decouples congestion control from packet marking and sending.

With the requirement stated, let's talk about the solution.

The Linux kernel introduced the RACK mechanism before bbr. It is designed to quickly detect and retransmit packets that were lost again after being retransmitted. Handling these packets promptly matters a great deal: if they are not dealt with in time, the connection falls into the abyss of RTO. Of course, the RTO callback is in your own hands and you can make it less drastic, but for a connection that has already stalled, struggling on like that is just self-deception... RACK solves this problem. Before bbr, however, RACK could not deliver its full benefit. RACK can promptly detect which packets are lost, especially those lost again after retransmission, but by then the congestion window is already locked into falling toward ssthresh under PRR. Limited by the congestion window, the additional data that has just become ready to send cannot be sent out! The bbr engine can digest the energy RACK supplies right away, and together the two make a dynamic, high-end engine.

RACK (please read the draft first): judging from the name, it stands for Recent ACK, that is, the most recent ACK. This includes SACK as well, so the proper name should really be Recent (s)ACK. RACK does not record the time at which data was (s)ACKed. Instead, when an ACK arrives, it records the transmission time of the packet that the ACK acknowledges; "Recent" refers to that transmission time, that is, the most recently sent. The idea of RACK is to record the transmission time T.rack of the packet acknowledged by this Recent (s)ACK, define a time window twin, and mark as LOST every unacknowledged packet sent before T.rack-twin. Those packets are then handed to the sending logic for transmission. This matches common sense.

The RACK code is very simple. The core logic is one file with two functions, located in net/ipv4/tcp_recovery.c:

/* Record the latest transmission time among the packets (s)ACKed while
 * processing one ACK; this is rack.mstamp.
 * xmit_time - transmission time of the acknowledged packet currently being processed
 * sacked - state of the acknowledged packet currently being processed:
 *          was it selectively acknowledged? Was it ever retransmitted? And so on.
 */
void tcp_rack_advance(struct tcp_sock *tp, const struct skb_mstamp *xmit_time, u8 sacked);

/* Take rack.mstamp, the latest transmission time among acknowledged packets
 * as recorded by tcp_rack_advance, and compare it with the transmission time
 * skb.mstamp of each packet in the retransmission queue that has not been
 * selectively acknowledged, to decide whether to mark that packet LOST.
 * RACK has a built-in twin; every packet satisfying
 * rack.mstamp-skb.mstamp>twin is marked LOST.
 * If the packet had already been retransmitted, clear its retransmitted flag!
 */
int tcp_rack_mark_lost(struct sock *sk);

Those are the two RACK interfaces. TCP calls them while processing an ACK:

1). While processing the information carried by an ACK (the ACK number in the TCP header, or the SACK blocks in the options), call tcp_rack_advance;

2). When the ACK turns out not to be a plain in-order ACK and processing enters the abnormal (alert) path, call tcp_rack_mark_lost.
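To make the two steps concrete, here is a minimal user-space sketch of the idea. It is not the kernel code: the struct names, field names, and the functions rack_advance and rack_mark_lost are hypothetical stand-ins, times are plain microsecond counters, and the kernel's extra checks (for example around spurious retransmissions) are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one packet in the retransmission queue. */
struct pkt {
    uint64_t xmit_us;  /* time of the most recent (re)transmission */
    bool     acked;    /* cumulatively ACKed or SACKed */
    bool     lost;     /* marked LOST, eligible for retransmission */
};

/* Hypothetical per-connection RACK state. */
struct rack_state {
    uint64_t mstamp_us;  /* latest send time among (s)ACKed packets */
    uint64_t twin_us;    /* the time window called "twin" in the text */
};

/* Step 1: an (s)ACK arrived for a packet that was sent at xmit_us.
 * Keep only the latest such send time. */
static void rack_advance(struct rack_state *r, uint64_t xmit_us)
{
    if (xmit_us > r->mstamp_us)
        r->mstamp_us = xmit_us;
}

/* Step 2: mark every unacknowledged packet that was sent more than
 * twin_us before rack.mstamp as LOST. Returns how many were newly marked. */
static int rack_mark_lost(const struct rack_state *r, struct pkt *q, size_t n)
{
    int marked = 0;

    for (size_t i = 0; i < n; i++) {
        if (q[i].acked || q[i].lost)
            continue;
        if (r->mstamp_us > q[i].xmit_us &&
            r->mstamp_us - q[i].xmit_us > r->twin_us) {
            q[i].lost = true;
            marked++;
        }
    }
    return marked;
}
```

Note that the sketch never looks at a sequence number. The only inputs are send times and the acknowledged flag, which is the whole point of RACK.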

The simplicity of RACK lies in the fact that it no longer distinguishes between ordinary in-order ACKs and SACKs; it only compares timestamps. Regardless of sequence-number order, it decides whether a packet needs to be retransmitted purely from the information carried in the acknowledgment and from transmission times. Take the following sequence as an example:

1| 2| 3| 4| 5| 6| 7| 8|

Suppose an ACK acknowledges 4, so UNA is 5, and suppose the ACK carries no SACK information; it only acknowledges 4. Then rack.mstamp is the transmission time of 4. Now the question: 5, 6, 7, 8 come after 4, so how could they have been sent before 4? Obviously they were sent after 4! This is exactly where the simplicity of RACK shows! 5, 6, 7, 8 do come after 4 in sequence space, but nobody can guarantee that they were sent in sequence-number order. Recording the time order of transmission is more reasonable! The typical scenario is that 4, 5, 6, 7, and 8 are all retransmitted packets: 7 and 8 are retransmitted first, then 6, then 4, and finally 5, so the order in time is:

1| 2| 3| 7| 8| 6| 4| 5|

Now 4 is acknowledged. According to the time order above, I have reason to keep waiting for the acknowledgment of 5, because 5 was sent after 4. But 7, 8, and 6 were sent before 4: should I keep waiting for them? 7 was sent first, so compare the difference between skb7.mstamp and rack.mstamp. If it is greater than twin, it is no longer acceptable to keep waiting for 7 to be selectively acknowledged; if it is within twin, there is reason to keep waiting, since it may simply be reordering! The same strategy applies to 8 and 6.
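The example can be worked through with numbers. The send times and the twin value below are made up purely for illustration (the text gives none), and rack_classify is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

enum rack_verdict { RACK_WAIT, RACK_LOST };

/* Hypothetical helper: judge one unacknowledged packet against rack.mstamp.
 * A packet is LOST only if it was sent more than twin before rack.mstamp. */
static enum rack_verdict rack_classify(uint64_t sent, uint64_t rack_mstamp,
                                       uint64_t twin)
{
    if (sent < rack_mstamp && rack_mstamp - sent > twin)
        return RACK_LOST;
    return RACK_WAIT;
}

/* Assumed retransmission times in microseconds, indexed by packet number.
 * Time order is 7, 8, 6, 4, 5, as in the text; 1..3 are already
 * acknowledged and unused here. */
static const uint64_t example_sent_us[9] = {
    0, 0, 0, 0,
    4000,  /* packet 4 */
    5000,  /* packet 5 */
    3000,  /* packet 6 */
    1000,  /* packet 7 */
    2000,  /* packet 8 */
};
```

With 4 acknowledged, rack.mstamp is 4000. Assuming twin is 1500, rack_classify gives LOST for 7 (gap 3000) and 8 (gap 2000), and WAIT for 6 (gap 1000, still inside twin) and for 5 (sent after 4).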

Isn't this much simpler? Just compare, in time order, each packet's last transmission time with rack.mstamp, ignoring whether the packet was ever retransmitted. This removes the complexity of judging LOST by sequence number.

In the presence of reordering, however, you may worry that RACK will mistakenly mark many packets that are not lost (they are merely reordered, or their ACKs are) as LOST. This is exactly where the twin time window comes in: RACK's time order is not a strict time order but a quasi time order with a buffer. And if you still think twin is useless, you can simply turn RACK off!

Note that twin is generally chosen as 1/4 of the minimum RTT. This minimum RTT has nothing to do with SRTT: it is the minimum of the actually measured, discrete RTT samples kept by win_minmax (see "Google's BBR TCP Congestion Control Algorithm's Four Variable Speed Engines" for details on win_minmax), that is, the minimum RTT sampled within a time window that automatically slides forward. The main purpose here is not to smooth out noise (which is a bit of an ostrich strategy...) but to filter out jitter that is not caused by congestion. It is a way of actively identifying the noise, and it minimizes the effects of BufferBloat.
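A rough sketch of how twin could be derived from a windowed minimum RTT follows. It is deliberately cruder than win_minmax: it keeps a single minimum and, once that minimum is older than the window, simply restarts from the newest sample. All names here (struct min_rtt, min_rtt_update, rack_twin_us) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical windowed-minimum tracker for measured RTT samples. */
struct min_rtt {
    uint32_t rtt_us;    /* current minimum, 0 means "no sample yet" */
    uint64_t stamp_us;  /* when that minimum was sampled */
    uint64_t win_us;    /* length of the sliding time window */
};

/* Feed one measured RTT sample (not SRTT) taken at time now_us.
 * The stored minimum is replaced when the sample is at least as small,
 * or when the stored minimum has aged out of the window. */
static void min_rtt_update(struct min_rtt *m, uint64_t now_us,
                           uint32_t sample_us)
{
    bool expired = now_us - m->stamp_us > m->win_us;

    if (m->rtt_us == 0 || sample_us <= m->rtt_us || expired) {
        m->rtt_us = sample_us;
        m->stamp_us = now_us;
    }
}

/* twin is taken as 1/4 of the windowed minimum RTT, as described above. */
static uint32_t rack_twin_us(const struct min_rtt *m)
{
    return m->rtt_us / 4;
}
```

Because the filter tracks a minimum instead of an average, a burst of inflated RTT samples caused by queueing does not widen twin; only the passing of the window can raise it.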

With the RACK mechanism, bbr no longer has to worry about having no packets to send.

As long as bbr can collect bandwidth and RTT from (s)ACKs, it can run at full speed on that feedback, and what produces bandwidth and RTT feedback is packets that have been sent. Once again, whether a packet is new or a retransmission, as long as it is sent it produces feedback, whether that is an ACK, a SACK, or a DSACK... We have come full circle: even when packet loss, reordering, and similar events occur, RACK can still supply a stream of sendable packets (that is, packets marked LOST). The whole process runs smoothly, with no more waiting for the RTO to fire!

Before bbr, once packet loss or severe reordering occurred, TCP took control away from the congestion control algorithm. Not anymore! The old practice was wrong. So-called packet loss and reordering are guesses made by the TCP congestion control state machine's own logic, and to a large extent they are not true. No algorithm can determine precisely whether a packet has really been lost. It is easy to trick a bat into flying into a wall, and TCP, likewise, is blind! So bbr's approach is the right one:

The bbr algorithm itself: computes the sending rate and the window.

TCP congestion control state machine: prepares new data and marks packets LOST (legacy mode and RACK mode), that is, it supplies sendable packets to fill the pipe provided by the bbr algorithm.

TCP transmission logic: actually transmits any packet that can be transmitted, meaning new packets and all packets marked LOST.

These three cooperate and answer the questions "how much to transmit?", "what to transmit?", and "how to transmit?" respectively. All three interact solely through the current (s)ACK feedback and are otherwise completely independent of one another.
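The division of labor can be illustrated with a toy model. Everything in it is an assumption made for illustration: the budget parameter stands in for whatever amount the bbr algorithm allows in one round, the lost and fresh flags stand in for the state machine's output, and sending LOST packets ahead of new ones is just one possible ordering, not something the text specifies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical segment record. */
struct seg {
    bool lost;   /* "what": marked LOST by the state machine */
    bool fresh;  /* "what": new data prepared by the state machine */
    bool sent;   /* set once the transmission logic has sent it */
};

/* "How": send whatever is sendable, up to budget segments.
 * budget is the "how much", computed elsewhere by congestion control.
 * This function neither marks packets nor computes rates.
 * Returns the number of segments sent in this round. */
static size_t transmit_round(struct seg *q, size_t n, size_t budget)
{
    size_t sent = 0;

    /* Pass 0 sends LOST segments, pass 1 sends new data (assumed order). */
    for (int pass = 0; pass < 2 && sent < budget; pass++) {
        for (size_t i = 0; i < n && sent < budget; i++) {
            bool eligible = (pass == 0) ? q[i].lost : q[i].fresh;

            if (!eligible || q[i].sent)
                continue;
            q[i].sent = true;
            sent++;
        }
    }
    return sent;
}
```

The transmission function never asks why a segment is sendable or how the budget was computed, which is the kind of independence between the three parts described above.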

...

What comes next?

Next, I am going to rant a little. It all started after the Industrial Revolution, when people tried to put a steam engine on a carriage...