Friday, June 10, 2011

How to Optimize your Internet Connection using MTU and RWIN

The TCP Maximum Transmission Unit (MTU) is the maximum size of a single TCP packet that can pass through a TCP/IP network.


An easy way to figure out what your MTU should be is to use ping where you specify the payload size:

ping -s 1464 -c1 google.com

Note though that the total IP packet size will be 1464+28=1492 bytes since there is 28 bytes of header info. Thus if the packet gets fragmented for payload above 1464, then you should set your MTU=1492. Ping will let you know when it becomes fragmented with something like the following:

ping -s 1464 -c1 google.com

PING google.com (72.14.207.99) 1464(1492) bytes of data.
64 bytes from eh-in-f99.google.com (72.14.207.99): icmp_seq=1 ttl=237 (truncated)

--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 118.672/118.672/118.672/0.000 ms
john@TECH5321:~$ ping -s 1465 -c1 google.com
PING google.com (64.233.167.99) 1465(1493) bytes of data.
From adsl-75-18-118-221.dsl.sndg02.sbcglobal.net (75.18.118.221) icmp_seq=1 Frag needed and DF set (mtu = 1492)

--- google.com ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

In other words, to find your correct MTU, you would first start with a small packet size, and then gradually increase it until you see fragmentation; the cutoff point will be what to use for your MTU (using the formula payload + 28 = MTU). Note in the first case shown above where the payload size is 1464, the packet was transmitted fine, but in the second case where the payload size is 1465, ping complains "Frag needed"; to clarify, that means any packet with a payload of 1464 or less will be sent just fine, but a payload size of 1465 or above will end up being fragmented. Therefore, 1464 is the maximum payload, and that means the MTU is 1464+28=1492.

To set the MTU temporarily (will be lost after a reboot), you can do:

sudo ifconfig mtu 1492

Note that unfortunately some NICs do not allow you to change their MTU. You can use "ifconfig" by itself to see what the MTU is for your NIC and whether the MTU changes when you use the above command.
Or to make the change permanent, you can add it to /etc/network/interfaces:

gksudo gedit /etc/network/interfaces

And then add "mtu " in it for the particular interface. Here's an example of mine that uses my wireless interface wlan0:

iface wlan0 inet static
address 192.168.1.23
netmask 255.255.255.0
gateway 192.168.1.1
wireless-essid John's Home WLAN
mtu 1492

TCP Receive Window (RWIN)


In computer networking, RWIN (TCP Receive Window) is the maximum amount of data that a computer will accept before acknowledging the sender. In practical terms, that means when you download say a 20 MB file, the remote server does not just send you the 20 MB continuously after you request it. When your computer sends the request for the file, your computer tells the remote server what your RWIN value is; the remote server then starts streaming data at you until it reaches your RWIN value, and then the server waits until your computer acknowledges that you received that data OK. Once your computer sends the acknowledgement, then the server continues to send more data in chunks of your RWIN value, each time waiting for your acknowledgment before proceeding to send more.

Now the crux of the problem here is with what is called latency, or the amount of time that it takes to send and receive packets from the remote server. Note that latency will depend not only on how fast the connection is between you and the remote server, but it also includes all additional delays, such as the time that it takes for the server to process your request and respond. You can easily find out the latency between you and the remote server with the ping command. When you use ping, the time that ping reports is the round-trip time (RTT), or latency, between you and the remote server.

When I ping google.com, I typically get a latency of 100 msec. Now if there were no concept of RWIN, and thus my computer had to acknowledge every single packet sent between me and google, then transfer speed between me and them would be simply the (packet size)/RTT. Thus for a maximum sized packet (my MTU as we learned above), my transfer speed would be:

1492 bytes/.1 sec = 14,920 B/sec or 14.57 KiB/sec

That is pathetically slow considering that my connection is 3 Mb/sec, which is the same as 366 KiB/sec; so I would be using only about 4% of my available bandwidth. Therefore, we use the concept of RWIN so that a remote server can stream data to me without having to acknowledge every single packet and slow everything down to a crawl.

Note that the TCP receive window (RWIN) is independent of the MTU setting. RWIN is determined by the BDP (Bandwidth Delay Product) for your internet connection, and BDP can be calculated as:

BDP = max bandwidth of your internet connection (Bytes/second) * RTT (seconds)

Therefore RWIN does not depend on the TCP packet size, and TCP packet size is of course limited by the MTU (Maximum Transmission Unit).

Before we change RWIN, use the following command to get the kernel variables related to RWIN:

sysctl -a 2> /dev/null | grep -iE "_mem |_rmem|_wmem"

Note the space after the _mem is deliberate, don't remove it or add other spaces elsewhere between the quotes.

You should get the following three variables:

net.ipv4.tcp_rmem = 4096 87380 2584576
net.ipv4.tcp_wmem = 4096 16384 2584576
net.ipv4.tcp_mem = 258576 258576 258576

The variable numbers are in bytes, and they represent the minimum, default, and maximum values for each of those variables.

net.ipv4.tcp_rmem = Receive window memory vector
net.ipv4.tcp_wmem = Send window memory vector
net.ipv4.tcp_mem = TCP stack memory vector

Note that there is no exact equivalent variable in Linux that corresponds to RWIN, the closest is the net.ipv4.tcp_rmem variable. The variables above control the actual memory usage (not just the TCP window size) and include memory used by the socket data structures as well as memory wasted by short packets in large buffers. The maximum values have to be larger than the BDP (Bandwidth Delay Product) of the path by some suitable overhead.

To try and optimize RWIN, first use ping to send the maximum size packet your connection allows (MTU) to some distant server. Since my MTU is 1492, the ping command payload would be 1492-28=1464. Thus:

ping -s 1464 -c5 google.com

PING google.com (64.233.167.99) 1464(1492) bytes of data.
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=1 ttl=237 (truncated)
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=2 ttl=237 (truncated)
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=3 ttl=237 (truncated)
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=4 ttl=237 (truncated)
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=5 ttl=237 (truncated)

--- google.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 101.411/102.699/105.723/1.637 ms

Note though that you should run the above test several times at different times during the day, and also try pinging other destinations. You'll see RTT might vary quite a bit.

But for the above example, the RTT average is about 103 msec. Now since the maximum speed of my internet connection is 3 Mbits/sec, then the BDP is:
Code:

(3,000,000 bits/sec) * (.103 sec) * (1 byte/8 bits) = 38,625 bytes

Thus I should set the default value in net.ipv4.tcp_rmem to about 39,000. For my internet connection, I've seen RTT as bad as 500 msec, which would lead to a BDP of 187,000 bytes. Therefore, I could set the max value in net.ipv4.tcp_rmem to about 187,000. The values in net.ipv4.tcp_wmem should be the same as net.ipv4.tcp_rmem since both sending and receiving use the same internet connection. And since net.ipv4.tcp_mem is the maximum total memory buffer for TCP transactions, it is usually set to the the max value used in net.ipv4.tcp_rmem and net.ipv4.tcp_wmem.

And lastly, there are two more kernel TCP variables related to RWIN that you should set:

sysctl -a 2> /dev/null | grep -iE "rcvbuf|save"

which returns:

net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1

Note enabling net.ipv4.tcp_no_metrics_save (setting it to 1) means have Linux optimize the TCP receive window dynamically between the values in net.ipv4.tcp_rmem and net.ipv4.tcp_wmem. And enabling net.ipv4.tcp_moderate_rcvbuf removes an odd behavior in the 2.6 kernels, whereby the kernel stores the slow start threshold for a client between TCP sessions. This can cause undesired results, as a single period of congestion can affect many subsequent connections.

Before you change any of the above variables, try going to http://www.speedtest.net or a similar website and check the speed of your connection. Then temporarily change the variables by using the following command with your own computed values:

sudo sysctl -w net.ipv4.tcp_rmem="4096 39000 187000" net.ipv4.tcp_wmem="4096 39000 187000" net.ipv4.tcp_mem="187000 187000 187000" net.ipv4.tcp_no_metrics_save=1 net.ipv4.tcp_moderate_rcvbuf=1

Then retest your connection and see if your speed improved at all.

Once you tweak the values to your liking, you can make them permanent by adding them to /etc/sysctl.conf as follows:

net.ipv4.tcp_rmem=4096 39000 187000
net.ipv4.tcp_wmem=4096 39000 187000
net.ipv4.tcp_mem=187000 187000 187000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_moderate_rcvbuf=1

And then do the following command to make the changes permanent:

sudo sysctl -p


http://swik.net/Ubuntu/Only+Ubuntu/How+to+Optimize+your+Internet+Connection+using+MTU+and+RWIN/cbnda