Saturday, September 3, 2011

DSMARK queuing discipline


The DSMARK queuing discipline was specially designed by Werner Almesberger (the C sources indicate that Werner wrote this code), Jamal Hadi Salim and Alexey Kuznetsov to comply with the Differentiated Services specifications. It is in charge of packet marking in the Linux DS implementation.
Its behavior is really very simple compared with some other Linux Traffic Control queuing disciplines. People complain that DSMARK is hard to use because the available documentation is highly technical. In this section I will try to clarify, as far as my understanding permits, how to configure and use the DSMARK queuing discipline.
 
Let us start by saying that, unlike other queuing disciplines, DSMARK doesn't shape, control or police traffic. It doesn't prioritize, delay, reorder or drop packets. It just marks packets using the DS field. If you, dear reader, jumped directly to this section to learn, ASAP, how DSMARK works, I suggest you first have a brief read of section 1.3, where the DS field is explained.
 
To begin with our explanation we draw our traditional diagram, this time related to the DSMARK qdisc:
 
 
Okay, DSMARK looks like a classful queuing discipline. And in fact it is. Classes are numbered 1, 2, 3, 4, ..., n-1, where n is a parameter that defines the size of an internal table required to implement the queuing discipline's behavior. This parameter is called indices. If q is the queue number, the element q:0 is the main queue itself. Elements q:1 to q:n-1 are the classes of the queuing discipline (represented in the figure above as yellow rectangles). Each of these classes can be selected using a filter (its elements are represented as green rectangles) attached to the queuing discipline; the packets selected by the filter are placed in the respective DSMARK class.
 
  

What is the difference between DSMARK and a classful qdisc such as HTB, for example? Basically the class behavior. An HTB class just controls the upper rate of the packets flowing through it, while a DSMARK class just marks the packets flowing through it.
Where, what and when does the DSMARK class mark packets? Packets are marked on the DS field. They are marked with an integer value that we define for each class when we configure the queuing discipline. Packets are marked just before they leave the queuing discipline to be placed on the outgoing network driver interface.
To create a new DSMARK queuing discipline we use this command:
tc qdisc add dev eth0 handle <hd> root dsmark indices <id> [default_index <did>] [set_tc_index]
The values in angle brackets are the ones we have to supply. Let's see this in detail. This command sets our DSMARK as the root queue of the ethernet interface eth0. <hd> is the handle number of the queue. <id> is the size of the internal table; it defines the number of classes (id-1) contained in the queue. <did> is the default index; packets that don't match any of the existing classes are placed in the default class defined by this index. This parameter is optional. The final (also optional) parameter is set_tc_index; more about it is explained below.
Some configuration commands are:
tc qdisc add dev eth0 handle 1:0 root dsmark indices 32
This command creates a DSMARK queuing discipline as root on ethernet interface eth0. The qdisc is numbered 1:0 and contains a 32-element table (from element 0 to element 31). Elements 1 to 31 are usable as classes 1:1 to 1:31.
tc qdisc add dev eth0 handle 2:0 parent 1:1 dsmark indices 8 default_index 7
This command creates a DSMARK queuing discipline having class 1:1 (from another qdisc) as parent. The qdisc is numbered 2:0 and contains an 8-element table. The default index is number 7, which corresponds to the dsmark class 2:7.
Okay, fine. But how does DSMARK mark packets? To answer this question we need a schematic representation of the DSMARK internal table. The next picture helps to understand this better:
In this figure we enter from the left with a class number corresponding to an index value. In the example, the class number is 1:3 (index 3, assuming DSMARK is 1:0). The internal table has two columns called mask and value, containing hexadecimal integer values. The integer values selected are: mask = 0x3 and value = 0x68.
With these two values DSMARK goes to the packet, extracts the DS field integer value and applies the following operation:
 New_DS = (Old_DS & mask) | value
where & and | are the bitwise and and or operators. The newly calculated DS field value is placed back in the packet's DS field. In the figure, a packet with the DS field set to 0x0 (best-effort DSCP) is entering. Once DSMARK has taken care of it, the packet leaves the queue with its DS field set to 0x68, which corresponds to the DS Assured Forwarding class AF31. Observe that the packet enters as a common best-effort packet and leaves the queue with some money in its pocket, as a gentleman AF31 packet. Exactly what we wanted to do.
Observe that the 0x3 mask preserves the two rightmost bits of the DS field in case they (one or both of them) are set. This is the right approach when the packet may already be marked to indicate some ECN condition. The bitwise or with the 0x68 value then sets the six DSCP bits, so a packet arriving with the ECN bits clear leaves with its DS field equal to 0x68.
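To make the arithmetic concrete, here is a minimal Python sketch of the operation above. The function name remark is my own choice for illustration (it is not part of tc or of the kernel), and the final & 0xFF only reflects the assumption that the DS field is a single byte.

```python
def remark(old_ds: int, mask: int, value: int) -> int:
    """Apply New_DS = (Old_DS & mask) | value on a one-byte DS field."""
    return ((old_ds & mask) | value) & 0xFF

# Best-effort packet (DS field 0x00) going through class 1:3 of the figure.
print(hex(remark(0x00, 0x3, 0x68)))  # 0x68 (AF31)

# Same class, but the packet arrives with ECN bits 10 already set:
# the mask keeps them, so only the six DSCP bits are rewritten.
print(hex(remark(0x02, 0x3, 0x68)))  # 0x6a (AF31 plus ECN bits 10)

# An EF packet (0xb8) carrying ECN bits 01 is remarked to AF31 as well.
print(hex(remark(0xB9, 0x3, 0x68)))  # 0x69 (AF31 plus ECN bits 01)
```

The last two calls show why the mask matters: with mask 0x0 instead of 0x3 both packets would leave as plain 0x68 and the ECN information would be lost.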
What we have to learn now is how to fill our internal table, that is, how to establish the relationship between index (or classid) values and mask-value pairs. But first let us present table 2.9.1, which helps a lot when dealing with those complicated DSCPs and DS fields, all the more because we have to deal with them in both binary and hexadecimal representation.
 
 
Great!! In the table we have DS classes AF1 to AF4 and EF. DP is the 'drop precedence'. From left to right we have the DSCP value defined by the DS specification; the binary b-DSCP (add two zeroes to the left of the DSCP value); the hexadecimal x-DSCP value; the binary b-DS field value (add two zeroes to the right of the DSCP value); and the hexadecimal x-DS field value.
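The columns described above can be recomputed instead of memorized. The following Python sketch assumes the standard code points (an AFxy code point has three class bits, two drop precedence bits and a trailing zero; EF is 101110). The names af_dscp, EF_DSCP and table_row are hypothetical helpers made up for this illustration.

```python
def af_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP of AF<class><dp>: class bits, drop precedence bits, one zero bit."""
    return (af_class << 3) | (drop_precedence << 1)

EF_DSCP = 0b101110  # Expedited Forwarding code point

def table_row(name: str, dscp: int) -> tuple:
    """Return (name, DSCP, b-DSCP, x-DSCP, b-DS, x-DS) as strings."""
    ds_field = dscp << 2  # two zeroes added to the right of the DSCP
    return (
        name,
        format(dscp, "06b"),       # DSCP as defined by the specification
        format(dscp, "08b"),       # b-DSCP: two zeroes added to the left
        format(dscp, "#04x"),      # x-DSCP
        format(ds_field, "08b"),   # b-DS field
        format(ds_field, "#04x"),  # x-DS field
    )

rows = [
    table_row(f"AF{c}{dp}", af_dscp(c, dp))
    for c in range(1, 5)
    for dp in range(1, 4)
]
rows.append(table_row("EF", EF_DSCP))

for row in rows:
    print("  ".join(row))
```

The x-DS column is the one we need for DSMARK, because the value parameter is applied to the whole DS field and not to the 6-bit DSCP.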
 
With this table and the mask and value parameters we can do practically anything we want. For example:
 
  

What you want to do, followed by the mask and value to use:

We would like to set any packet entering the classid to DS class AF23.
    mask = 0x0     value = 0x58

We would like to set any packet entering the classid to DS class AF12, but preserving the ECN bits.
    mask = 0x3     value = 0x30

We would like to change the 'drop precedence' of any DS AF packet entering the classid from whatever it has to 'drop precedence' 2; we want to preserve the ECN bits. (0xe3 is 11100011; it preserves the first three class bits and the last two ECN bits, and sets the 'drop precedence' bits to zero. 0x10 is 00010000, which sets the 'drop precedence' bits to 2.)
    mask = 0xe3    value = 0x10

We would like to change the class of any DS AF packet entering the classid from whatever it has to class AF3. The 'drop precedence' and ECN bits must be preserved. (0x1f is 00011111; it sets the AF class bits to zero and preserves the 'drop precedence' and ECN bits. 0x60 is 01100000, which sets the AF class bits to 3.)
    mask = 0x1f    value = 0x60
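As a quick check of the four recipes, here is a small Python sketch that applies each mask-value pair to a sample DS field. The sample input bytes are my own choices, and remark is a hypothetical helper that only implements the (Old_DS & mask) | value operation.

```python
def remark(old_ds: int, mask: int, value: int) -> int:
    """Apply New_DS = (Old_DS & mask) | value."""
    return (old_ds & mask) | value

# (description, sample incoming DS field, mask, value)
recipes = [
    ("force AF23, ECN bits discarded", 0xB9, 0x00, 0x58),  # EF + ECN 01
    ("force AF12, ECN bits preserved", 0x03, 0x03, 0x30),  # best effort + ECN 11
    ("drop precedence -> 2, rest kept", 0x89, 0xE3, 0x10),  # AF41 + ECN 01
    ("AF class -> 3, rest kept", 0x3A, 0x1F, 0x60),         # AF13 + ECN 10
]

for description, old_ds, mask, value in recipes:
    new_ds = remark(old_ds, mask, value)
    print(f"{description}: {old_ds:#04x} -> {new_ds:#04x}")
```

In the third row the packet goes from AF41 to AF42 (0x90) and keeps its ECN bits, giving 0x91; in the fourth it goes from AF13 to AF33 (0x78) and keeps its ECN bits, giving 0x7a.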
Okay fellows!! We are almost experts working with these things. Let's see now how to create the DSMARK table. Suppose we want to create the following table:
Working out what this table does with the packets is left as an exercise. Configuring the queue is very easy. We will use an eight-element DSMARK table:
There's really nothing to explain. The first command creates the DSMARK queuing discipline. The next seven commands change each class (already created by the first command) to build our table.
Well, my friends, all this stuff is clear, but we have left out how to place the packets entering DSMARK into the different classes. We need a little help from a filter attached to DSMARK. We will use the u32 filter classifier, assuming (to simplify things, because we just want to show how to set the filter) that each DSMARK class from 1 to 7 corresponds to one network, ranging from 192.168.1/24 to 192.168.7/24 respectively. You can select more complicated filter elements for your own tests:
This filter, for example, places flows coming from network 192.168.1/24 on DSMARK class 1:1; packets from this network will leave DSMARK with their DS field value changed to 0xb8, which corresponds to DS class EF (Expedited Forwarding PHB).
DSMARK is going very nicely. We already know how to set up the discipline, how to create its classes and how to attach a filter for packet classification. But what about the set_tc_index parameter? To understand this we are going to steal the next figure from Differentiated Services on Linux [10]:
This figure represents a DSMARK queuing discipline. Recall from when we studied GRED that the designers of DS on Linux extended the packet buffer to add a new field called tc_index. The packet buffer is accessed using a pointer called skb, so the new field is accessed as skb->tc_index. This field is represented by the bottom dashed line in the figure.
The packet's buffer structure (struct sk_buff) contains the pointer iph that points to another structure (struct iphdr). This structure contains the field tos where the packet's TOS field is copied when the packet buffer is created. Then, to access the packet's TOS field skb->iph->tos is used. This field is represented by the top dashed line in the figure.
When you create a new DSMARK using the optional set_tc_index parameter, the queuing discipline copies the packet's TOS field value (the DS field, in fact) contained in skb->iph->tos onto skb->tc_index when the packet enters. This is represented in the figure by the vertical line going from top to bottom, right at the DSMARK queue entrance. Perhaps you are asking: but why?
 
The skb->tc_index field is a very important component of the Differentiated Services on Linux architecture. Recall, for example, that the four rightmost bits of this field are used to select the virtual RED queue where the packet is going to be placed when using the GRED queuing discipline.
 
But the usefulness of skb->tc_index doesn't end here. This field is also read by the special tcindex classifier to select a class identifier. This is shown in the figure by the next vertical line, this time going from bottom to top. The tcindex classifier reads the skb->tc_index value, may (or may not) perform some bitwise operations on it, and uses the final result to select a class identifier of the next inner queuing discipline. This is shown in the figure by the arrow going from the green tcindex filter element to the yellow classid of the inner queuing discipline.
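As a small preview, and only as a sketch: the tcindex classifier takes mask and shift parameters and, as far as I understand it, computes its lookup key as (tc_index & mask) >> shift. The function name tcindex_key below is hypothetical, and the mask 0xfc with shift 2 is just one common choice that strips the ECN bits and leaves the DSCP.

```python
def tcindex_key(tc_index: int, mask: int, shift: int) -> int:
    """Lookup key the classifier is assumed to derive from skb->tc_index."""
    return (tc_index & mask) >> shift

# With set_tc_index, skb->tc_index holds a copy of the DS field.
# A DS field of 0x68 (AF31) gives the 6-bit DSCP 0x1a as the key.
print(hex(tcindex_key(0x68, 0xFC, 2)))  # 0x1a

# The ECN bits do not influence the key: 0x6b is AF31 plus ECN bits 11.
print(hex(tcindex_key(0x6B, 0xFC, 2)))  # 0x1a
```

The key is then matched against the classifier's configured entries to find the classid; the details are left for the next section.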
 
When the tcindex classifier returns a class identifier value to the DSMARK queuing discipline as explained above, the discipline (not the classifier, as was very intelligently pointed out to me by the German student Steffen Moser in a very pleasant e-mail exchange) copies this value back onto the skb->tc_index field. This is shown in the figure by the next (third) vertical line, this time going back from the yellow classid block to the skb->tc_index field at the bottom.
 
The skb->tc_index value is also used to select a class identifier (an index) into the DSMARK internal table and get the mask and value values. The command tc class change dev eth0 classid 1:1 dsmark mask 0x3 value 0x28, for example, is in fact ordering: packets with their skb->tc_index field set to 1 (the minor value of the classid) must go to register 1 of the internal DSMARK table and get the mask and value values from there. Next, the packet's DS field value is read from the skb->iph->tos field, the bitwise and and or operations are applied, and the final value is placed back in the skb->iph->tos field, from where it is finally copied into the actual packet's DS field just before the packet leaves the queuing discipline to be passed to the outgoing driver interface. This entire process is shown in the schematic representation of the internal table to the right of the DSMARK figure above.
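The whole path described above can be imitated with a toy model in Python. This is a simplified sketch and not the kernel code: the class and method names are hypothetical, the classifier is reduced to a plain function returning a class minor number (or None when nothing matches), and two details are assumptions on my part: that unconfigured table entries leave the DS field untouched (mask 0xff, value 0x0), and that the index is wrapped to the table size.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    tos: int           # stands in for skb->iph->tos (the DS field)
    tc_index: int = 0  # stands in for skb->tc_index

class DsmarkModel:
    def __init__(self, indices, set_tc_index=False, default_index=None):
        self.indices = indices
        self.set_tc_index = set_tc_index
        self.default_index = default_index
        # Assumption: entries not yet configured do not alter the DS field.
        self.mask = [0xFF] * indices
        self.value = [0x00] * indices

    def change_class(self, index, mask, value):
        """Model of: tc class change ... classid q:<index> dsmark mask M value V."""
        self.mask[index] = mask
        self.value[index] = value

    def enqueue(self, pkt, classify):
        if self.set_tc_index:
            pkt.tc_index = pkt.tos          # copy the DS field on entrance
        minor = classify(pkt)               # the filter may read pkt.tc_index
        if minor is not None:
            pkt.tc_index = minor            # the discipline copies the result back
        elif self.default_index is not None:
            pkt.tc_index = self.default_index

    def dequeue(self, pkt):
        index = pkt.tc_index & (self.indices - 1)  # assumption: wrap to table size
        pkt.tos = (pkt.tos & self.mask[index]) | self.value[index]
        return pkt

qdisc = DsmarkModel(indices=8)
qdisc.change_class(1, mask=0x3, value=0x28)

pkt = Packet(tos=0x01)                 # best effort with ECN bits 01
qdisc.enqueue(pkt, lambda p: 1)        # the filter selects class 1:1
qdisc.dequeue(pkt)
print(pkt.tc_index, hex(pkt.tos))      # 1 0x29
```

The packet leaves as AF11 (0x28) with its ECN bits intact, which is what the mask 0x3 and value 0x28 of class 1:1 ask for.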
 
  

I think (surely you share this opinion, too) that understanding the behavior of the tcindex classifier is so important for the Differentiated Services on Linux implementation that I have reserved the next section to study this terrific and horrific monster. For now, it's over. See you later, alligator...

http://opalsoft.net/qos/DS-29.htm