Traffic Shaping/Control Tom Eastep Arne Bernin 2006-05-22 2001-2006 Thomas M. Eastep 2005 Arne Bernin & Thomas M. Eastep Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover, and with no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License. Traffic shaping is complex and the Shorewall community is not well equiped to answer traffic shaping questions. So if you are the type of person who needs "insert tab A into slot B" instructions for everything that you do, then please don't try to implement traffic shaping using Shorewall. You will just frustrate yourself and we won't be able to help you. Said another way, reading just Shorewall documentation is probably not going to give you enough background to use this material. Shorewall may make iptables easy but the Shorewall team simply can't be expected to spoon-feed Linux traffic control to you (please remember that the user's manual for a tractor doesn't teach you to grow corn either). At a minimum, you will need to refer to at least the following additional information: The LARTC HOWTO: http://www.lartc.org Some of the documents listed at http://www.netfilter.org/documentation/index.html#documentation-howto. The tutorial by Oskar Andreasson is particularly good. The output of man iptables
Introduction Starting with Version 2.5.5, Shorewall has builtin support for traffic shaping and control. Before this version, the support was quite limited. You were able to use your own tcstart script (and you still are), but besides the tcrules file it was not possible to define classes or queueing discplines inside the Shorewall config files. The support for traffic shaping and control still does not cover all options available (and especially all algorithms that can be used to queue traffic) in the Linux kernel but it should fit most needs. If you are using your own script for traffic control and you still want to use it in the future, you will find information on how to do this, later in this document. But for this to work, you will also need to enable traffic shaping in the kernel and Shorewall as covered by the next sections.
Linux traffic shaping and control This section gives a brief introduction of how controlling traffic with the linux kernel works. Although this might be enough for configuring it in the Shorewall configuration files, we strongly recommend that you take a deeper look into the Linux Advanced Routing and Shaping HOWTO. At the time of writing this, the current version is 1.0.0. Since kernel 2.2 Linux has extensive support for controlling traffic. You can define different algorithms that are used to queue the traffic before it leaves an interface. The standard one is called pfifo and is (as the name suggests) of the type First In First out. This means, that it does not shape anything, if you have a connection that eats up all your bandwidth, this qeueing algorithm will not stop it from doing so. For Shorewall traffic shaping we use two algorithms, one is called HTB (Hierarchical Token Bucket) and SFQ (Stochastic Fairness Queuing). SFQ is easy to explain: it just tries to track your connections (tcp or udp streams) and balances the traffic between them. This normally works well. HTB allows you to define a set of classes, and you can put the traffic you want into these classes. You can define minimum and maximum bandwitdh settings for those classes and order them hierachically (the less priorized classes only get bandwitdth if the more important have what they need). Shorewall builtin traffic shaping allows you to define these classes (and their bandwidth limits), and it uses SFQ inside these classes to make sure, that different data streams are handled equally. You can only shape outgoing traffic. The reason for this is simple, the packets were already received by your network card before you can decide what to do with them. So the only choice would be to drop them which normally makes no sense (since you received the packet already, it went through the possible bottleneck (the incoming connection). The next possible bottleneck might come if the packet leaves on another interface, so this will be the place where queuing might occur. So, defining queues for incoming packets is not very useful, you just want to have it forwarded to the outgoing interface as fast as possible. There is one exception, though. Limiting incoming traffic to a value a bit slower than your actual line speed will avoid queueing on the other end of that connection. This is mostly useful if you don't have access to traffic control on the other side and if this other side has a faster network connection than you do (the line speed between the systems is the bottleneck, e.g. a DSL or Cable Modem connection to your provider's router, the router itself is normally connected to a much faster backbone). So, if you drop packets that are coming in too fast, the underlying protocol might recognize this and slow down the connection. TCP has a builtin mechanism for this, UDP has not (but the protocol over UDP might recognize it , if there is any). The reason why queing is bad in these cases is, that you might have packets which need to be priorized over others, e.g. VoIP or ssh. For this type of connections it is important that packets arrive in a certain amount of time. For others like http downloads, it does not really matter if it takes a few seconds more. If you have a large queue on the other side and the router there does not care about QoS or the QoS bits are not set properly, your important packets will go into the same queue as your less timecritical download packets which will result in a large delay. You shape and control outgoing traffic by assigning the traffic to classes. Each class is associated with exactly one network interface and has a number of attributes: PRIORITY - Used to give preference to one class over another when selecting a packet to send. The priority is a numeric value with 1 being the highest priority, 2 being the next highest, and so on. RATE - The minimum bandwidth this class should get, when the traffic load rises. Classes with a higher priority (lower PRIORITY value) are served even if there are others that have a guaranteed bandwith but have a lower priority (higher PRIORITY value). CEIL - The maximum bandwidth the class is allowed to use when the link is idle. MARK - Netfilter has a facility for marking packets. Packet marks have a numeric value which is limited in Shorewall to the values 1-255. You assign packet marks to different types of traffic using entries in the /etc/shorewall/tcrules file. One class for each interface must be designated as the default class. This is the class to which unmarked traffic (packets to which you have not assigned a mark value in /etc/shorewall/tcrules) is assigned. Netfilter also supports a mark value on each connection. You can assign connection mark values in /etc/shorewall/tcrules, you can copy the current packet's mark to the connection mark (SAVE), or you can copy the connection mark value to the current packet's mark (RESTORE).
Linux Kernel Configuration You will need at least kernel 2.4.18 for this to work, please take a look at the following screenshot for what settings you need to enable. For builtin support, you need the HTB scheduler, the Ingress scheduler, the PRIO pseudoscheduler and SFQ queue. The other scheduler or queue algorithms are not needed. You also need the u32 and fw classifiers from near the bottom of the main Networking Options menu (not shown). This screen shot shows how I've configured QoS in my 2.6.13 Kernel: Kernel configuration changed in 2.6.16 -- here's my recommendation.
Enable TC support in Shorewall You need this support whether you use the builtin support or whether you provide your own tcstart script. To enable the builtin traffic shaping and control in Shorewall, you have to do the following: Set TC_ENABLED to "Internal" in /etc/shorewall/shorewall.conf. Setting TC_ENABLED=Yes causes Shorewall to look for an external tcstart file (See a later section for details). Setting CLEAR_TC parameter in /etc/shorewall/shorewall.conf to Yes will clear the traffic shaping configuration during Shorewall [re]start and Shorewall stop. This is normally what you want when using the builtin support (and also if you use your own tcstart script) The other steps that follow depend on whether you use your own script or the builtin solution. They will be explained in the following sections.
Using builtin traffic shaping/control For defining bandwidths (for either devices or classes) please use kbit or kbps(for Kilobytes per second) and make sure there is NO space between the number and the unit (it is 100kbit not 100 kbit). Using mbit, mbps or a raw number (which means bytes) could be used, but note that only integer numbers are supported (0.5 is not valid). To properly configure the settings for your devices you might need to find out the real up- and downstream rates you have. This is especially the case, if you are using a DSL connection or one of another type that do not have a guaranteed bandwidth. Don't trust the values your provider tells you for this; especially measuring the real download speed is important! There are several online tools that help you find out; search for "dsl speed test" on google (For Germany you can use arcor speed check). Be sure to choose a test located near you.
/etc/shorewall/tcdevices This file allows you to define the incoming and outgoing bandwidth for the devices you want traffic shaping to be enabled. That means, if you want to use traffic shaping for a device, you have to define it here. Columns in the file are as follows: INTERFACE - Name of interface. Each interface may be listed only once in this file. You may NOT specify the name of an alias (e.g., eth0:0) here; see FAQ #18. You man NOT specify wildcards here, e.g. if you have multiple ppp interfaces, you need to put them all in here! With Shorewall versions prior to 3.0.8 and 3.2.0 Beta 8, the device named in this column must exist at the time that Shorewall is started, restarted or refreshed. Beginning with Shorewall 3.0.8 and 3.2.0 Beta 8, Shorewall will determine if the device exists and will only configure the device if it exists. If it doesn't exist, the following warning is issued: WARNING: Device <device name> not found -- traffic-shaping configuration skipped IN-BANDWIDTH - The incoming Bandwidth of that interface. Please note that you are not able to do traffic shaping on incoming traffic, as the traffic is already received before you could do so. This Column allows you to define the maximum traffic allowed for this interface in total, if the rate is exceeded, the packets are dropped. You want this mainly if you have a DSL or Cable Connection to avoid queuing at your providers side. If you don't want any traffic to be dropped set this to a value faster than your interface maximum rate. To determine the optimum value for this setting, we recommend that you start by setting it significantly below your measured download bandwidth (20% or so). While downloading, measure the ping response time from the firewall to the upstream router as you gradually increase the setting.The optimal setting is at the point beyond which the ping time increases sharply as you increase the setting. OUT-BANDWIDTH - Specifiy the outgoing bandwidth of that interface. This is the maximum speed your connection can handle. It is also the speed you can refer as "full" if you define the tc classes. Outgoing traffic above this rate will be dropped. Suppose you are using PPP over Ethernet (DSL) and ppp0 is the interface for this. The device has an outgoing bandwidth of 500kbit and an incoming bandwidth of 6000kbit #INTERFACE IN-BANDWITH OUT-BANDWIDTH ppp0 6000kbit 500kbit
/etc/shorewall/tcclasses This file allows you to define the actual classes that are used to split the outgoing traffic. INTERFACE - Name of interface. Must match the name of an interface with an entry in /etc/shorewall/tcdevices. MARK - The mark value which is an integer in the range 1-255. You define these marks in the tcrules file, marking the traffic you want to go into the queueing classes defined in here. You can use the same marks for different Interfaces. RATE - The minimum bandwidth this class should get, when the traffic load rises. Please note that first the classes which equal or a lesser priority value are served even if there are others that have a guaranteed bandwith but a lower priority. CEIL - The maximum bandwidth this class is allowed to use when the link is idle. Useful if you have traffic which can get full speed when more important services (e.g. interactive like ssh) are not used. You can use the value "full" in here for setting the maximum bandwidth to the defined output bandwidth of that interface. PRIORITY - you have to define a priority for the class. packets in a class with a higher priority (=lesser value) are handled before less priorized onces. You can just define the mark value here also, if you are increasing the mark values with lesser priority. OPTIONS - A comma-separated list of options including the following: default - this is the default class for that interface where all traffic should go, that is not classified otherwise. defining default for exactly one class per interface is mandatory! tos-<tosname> - this lets you define a filter for the given <tosname> which lets you define a value of the Type Of Service bits in the ip package which causes the package to go in this class. Please note, that this filter overrides all mark settings, so if you define a tos filter for a class all traffic having that mark will go in it regardless of the mark on the package. You can use the following for this option: tos-minimize-delay (16) tos-maximize-throughput (8) tos-maximize-reliability (4) tos-minimize-cost (2) tos-normal-service (0) Each of this options is only valid for one class per interface. tcp-ack - if defined causes an tc filter to be created that puts all tcp ack packets on that interface that have an size of <=64 Bytes to go in this class. This is useful for speeding up downloads. Please note that the size of the ack packets is limited to 64 bytes as some applications (p2p for example) use to make every package an ack package which would cause them all into here. We want only packets WITHOUT payload to match, so the size limit. Bigger packets just take their normal way into the classes. This option is only valid for class per interface.
/etc/shorewall/tcrules The fwmark classifier provides a convenient way to classify packets for traffic shaping. The /etc/shorewall/tcrules file is used for specifying these marks in a tabular fashion. Normally, packet marking occurs in the PREROUTING chain before any address rewriting takes place. This makes it impossible to mark inbound packets based on their destination address when SNAT or Masquerading are being used. You can cause packet marking to occur in the FORWARD chain by using the MARK_IN_FORWARD_CHAIN option in shorewall.conf. Columns in the file are as follows: MARK or CLASSIFY - MARK specifies the mark value is to be assigned in case of a match. This is an integer in the range 1-255. This value may be optionally followed by : and either F or P to designate that the marking will occur in the FORWARD or PREROUTING chains respectively. If this additional specification is omitted, the chain used to mark packets will be determined by the setting of the MARK_IN_FORWARD_CHAIN option in shorewall.conf. To use CLASSIFY, your kernel and iptables must include CLASSIFY target support. In that case, this column contains a classification (classid) of the form <major>:<minor> where <major> and <minor> are integers. Corresponds to the 'class' specification in these traffic shaping modules: atm cbq dsmark pfifo_fast htb prio Classify always occurs in the POSTROUTING chain. When used with the builtin traffic shaper, the <major> class is the device number (the first entry in /etc/shorewall/tcdevices is device 1, the second is device 2 and so on) and the <minor> class is the MARK value of the class + 100. SOURCE - Source of the packet. A comma-separated list of interface names, IP addresses, MAC addresses and/or subnets for packets being routed through a common path. List elements may also consist of an interface name followed by ":" and an address (e.g., eth1:192.168.1.0/24). For example, all packets for connections masqueraded to eth0 from other interfaces can be matched in a single rule with several alternative SOURCE criteria. However, a connection whose packets gets to eth0 in a different way, e.g., direct from the firewall itself, needs a different rule. Accordingly, use $FW in its own separate rule for packets originating on the firewall. In such a rule, the MARK column may NOT specify either ":P" or ":F" because marking for firewall-originated packets always occurs in the OUTPUT chain. MAC addresses must be prefixed with "~" and use "-" as a separator. Example: ~00-A0-C9-15-39-78 DEST - Destination of the packet. Comma separated list of IP addresses and/or subnets. If your kernel and iptables include iprange match support, IP address ranges are also allowed. List elements may also consist of an interface name followed by ":" and an address (e.g., eth1:192.168.1.0/24). If the MARK column specificies a classification of the form <major>:<minor> then this column may also contain an interface name. PROTO - Protocol - Must be "tcp", "udp", "icmp", "ipp2p", "ipp2p:udp", "ipp2p:all" a number, or "all". "ipp2p" requires ipp2p match support in your kernel and iptables. PORT(S) - Destination Ports. A comma-separated list of Port names (from /etc/services), port numbers or port ranges; if the protocol is "icmp", this column is interpreted as the destination icmp-type(s). If the protocol is ipp2p, this column is interpreted as an ipp2p option without the leading "--" (example "bit" for bit-torrent). If no PORT is given, "ipp2p" is assumed. This column is ignored if PROTOCOL = all but must be entered if any of the following field is supplied. In that case, it is suggested that this field contain "-" CLIENT PORT(S) - (Optional) Port(s) used by the client. If omitted, any source port is acceptable. Specified as a comma-separate list of port names, port numbers or port ranges. USER/GROUP (Added in Shorewall version 1.4.10) - (Optional) This column may only be non-empty if the SOURCE is the firewall itself. When this column is non-empty, the rule applies only if the program generating the output is running under the effective user and/or group. It may contain : [!][<user name or number>]:[<group name or number>][+<program name>] The colon is optionnal when specifying only a user. Examples: joe #program must be run by joe :kids #program must be run by a member of the 'kids' group !:kids #program must not be run by a member of the 'kids' group +upnpd #program named upnpd (This feature was removed from Netfilter in kernel version 2.6.14). TEST - Defines a test on the existing packet or connection mark. The rule will match only if the test returns true. Tests have the format [!]<value>[/<mask>][:C] Where: ! Inverts the test (not equal) <value> Value of the packet or connection mark. <mask> A mask to be applied to the mark before testing :C Designates a connection mark. If omitted, the packet mark's value is tested. LENGTH (Optional, added in Shorewall version 3.2.0) Packet Length - This field, if present, allows you to match the length of a packet against a specific value or range of values. A range is specified in the form <min>:<max> where either <min> or <max> (but not both) may be omitted. If <min> is omitted, then 0 is assumed; if <max> is omitted, than any packet that is <min> or longer will match. You must have iptables length support for this to work. If you let it empy or place an "-" here, no length match will be done. Examples: 1024, 64:1500, :100 TOS (Optional, added in Shorewall version 3.2.0 Beta 6) Type of Service. Either a standard name, or a numeric value to match.
Minimize-Delay (16) Maximize-Throughput (8) Maximize-Reliability (4) Minimize-Cost (2) Normal-Service (0)
All packets arriving on eth1 should be marked with 1. All packets arriving on eth2 and eth3 should be marked with 2. All packets originating on the firewall itself should be marked with 3. #MARK SOURCE DESTINATION PROTOCOL PORT(S) 1 eth1 0.0.0.0/0 all 2 eth2 0.0.0.0/0 all 2 eth3 0.0.0.0/0 all 3 $FW 0.0.0.0/0 all All GRE (protocol 47) packets not originating on the firewall and destined for 155.186.235.151 should be marked with 12. #MARK SOURCE DESTINATION PROTOCOL PORT(S) 12 0.0.0.0/0 155.182.235.151 47 All SSH request packets originating in 192.168.1.0/24 and destined for 155.186.235.151 should be marked with 22. #MARK SOURCE DESTINATION PROTOCOL PORT(S) 22 192.168.1.0/24 155.182.235.151 tcp 22 All SSH packets packets going out of the first device in in /etc/shorewall/tcdevices should be assigned to the class with mark value 10. #MARK SOURCE DESTINATION PROTOCOL PORT(S) CLIENT # PORT(S) 1:110 0.0.0.0/0 0.0.0.0/0 tcp 22 1:110 0.0.0.0/0 0.0.0.0/0 tcp - 22 Mark all ICMP echo traffic with packet mark 1. Mark all peer to peer traffic with packet mark 4. This is a little more complex than otherwise expected. Since the ipp2p module is unable to determine all packets in a connection are P2P packets, we mark the entire connection as P2P if any of the packets are determined to match. We assume packet/connection mark 0 to means unclassified. #MARK SOURCE DESTINATION PROTOCOL PORT(S) CLIENT USER/ TEST # PORT(S) GROUP 1 0.0.0.0/0 0.0.0.0/0 icmp echo-request 1 0.0.0.0/0 0.0.0.0/0 icmp echo-reply RESTORE 0.0.0.0/0 0.0.0.0/0 all - - - 0 CONTINUE 0.0.0.0/0 0.0.0.0/0 all - - - !0 4 0.0.0.0/0 0.0.0.0/0 ipp2p:all SAVE 0.0.0.0/0 0.0.0.0/0 all - - - !0 The last four rules can be translated as:
"If a packet hasn't been classifed (packet mark is 0), copy the connection mark to the packet mark. If the packet mark is set, we're done. If the packet is P2P, set the packet mark to 4. If the packet mark has been set, save it to the connection mark."
ppp devices If you use ppp/pppoe/pppoa) to connect to your internet provider and you use traffic shaping you need to restart shorewall traffic shaping. The reason for this is, that if the ppp connection gets restarted (and it usally does this at least daily), all tc filters/qdiscs related to that interface are deleted. The easiest way to achieve this, is just to restart shorewall once the link is up. To achieve this add a small executable script to/etc/ppp/ip-up.d. #! /bin/sh /sbin/shorewall refresh
Real life examples
Configuration to replace Wondershaper You are able to fully replace the wondershaper script by using the buitin traffic control.You can find example configuration files at "http://www1.shorewall.net/pub/shorewall/Samples/tc4shorewall/. Please note that they are just examples and need to be adjusted to work for you. In this examples it is assumed that your interface for you internet connection is ppp0 (for DSL) , if you use another connection type, you have to change it. You also need to change the settings in the tcdevices.wondershaper file to reflect your line speed. The relevant lines of the config files follow here. Please note that this is just an 1:1 replacement doing exactly what wondershaper should do. You are free to change it...
tcdevices file #INTERFACE IN-BANDWITH OUT-BANDWIDTH ppp0 5000kbit 500kbit
tcclasses file #INTERFACE MARK RATE CEIL PRIORITY OPTIONS ppp0 1 full full 1 tcp-ack,tos-minimize-delay ppp0 2 9*full/10 9*full/10 2 default ppp0 3 8*full/10 8*full/10 2
tcrules file #MARK SOURCE DEST PROTO PORT(S) CLIENT USER # PORT(S) 1:F 0.0.0.0/0 0.0.0.0/0 icmp echo-request 1:F 0.0.0.0/0 0.0.0.0/0 icmp echo-reply # mark traffic which should have a lower priority with a 3: # mldonkey 3 0.0.0.0/0 0.0.0.0/0 udp - 4666 Wondershaper allows you to define a set of hosts and/or ports you want to classify as low priority. To achieve this , you have to add these hosts to tcrules and set the mark to 3 (true if you use the example configuration files).
Setting hosts to low priority lets assume the following settings from your old wondershaper script (don't assume these example values are really useful, they are only used for demonstrating ;-): # low priority OUTGOING traffic - you can leave this blank if you want # low priority source netmasks NOPRIOHOSTSRC="192.168.1.128/25 192.168.3.28" # low priority destination netmasks NOPRIOHOSTDST=60.0.0.0/24 # low priority source ports NOPRIOPORTSRC="6662 6663" # low priority destination ports NOPRIOPORTDST="6662 6663" This would result in the following additional settings to the tcrules file: 3 192.168.1.128/25 0.0.0.0/0 all 3 192.168.3.28 0.0.0.0/0 all 3 0.0.0.0/0 60.0.0.0/24 all 3 0.0.0.0/0 0.0.0.0/0 udp 6662,6663 3 0.0.0.0/0 0.0.0.0/0 udp - 6662,6663 3 0.0.0.0/0 0.0.0.0/0 tcp 6662,6663 3 0.0.0.0/0 0.0.0.0/0 tcp - 6662,6663
A simple setup This is a simple setup for people sharing an internet connection and using different computers for this. It just basically shapes between 2 hosts which have the ip addresses 192.168.2.23 and 192.168.2.42
tcdevices file #INTERFACE IN-BANDWITH OUT-BANDWIDTH ppp0 6000kbit 700kbit We have 6mbit down and 700kbit upstream.
tcclasses file #INTERFACE MARK RATE CEIL PRIORITY OPTIONS ppp0 1 10kbit 50kbit 1 tcp-ack ppp0 2 300kbit full 2 ppp0 3 300kbit full 2 ppp0 4 90kbit 200kbit 3 default We add a class for tcp ack packets with highest priority, so that downloads are fast. The following 2 classes share most of the bandwidth between the 2 hosts, if the connection is idle, they may use full speed. As the hosts should be treated equally they have the same priority. The last class is for the remaining traffic.
tcrules file #MARK SOURCE DEST PROTO PORT(S) CLIENT USER # PORT(S) 1:F 0.0.0.0/0 0.0.0.0/0 icmp echo-request 1:F 0.0.0.0/0 0.0.0.0/0 icmp echo-reply 2:F 192.168.2.23 0.0.0.0/0 all 3:F 192.168.2.42 0.0.0.0/0 all We mark icmp ping and replies so they will go into the fast interactive class and set a mark for each host.
Using your own tc script
Replacing builtin tcstart file If you prefer your own tcstart file, just install it in /etc/shorewall/tcstart. In your tcstart script, when you want to run the tc utility, use the run_tc function supplied by Shorewall if you want tc errors to stop the firewall. Set TC_ENABLED=Yes and CLEAR_TC=Yes Supply an /etc/shorewall/tcstart script to configure your traffic shaping rules. Optionally supply an /etc/shorewall/tcclear script to stop traffic shaping. That is usually unnecessary. If your tcstart script uses the fwmark classifier, you can mark packets using entries in /etc/shorewall/tcrules.
Traffic control outside Shorewall To start traffic shaping when you bring up your network interfaces, you will have to arrange for your traffic shaping configuration script to be run at that time. How you do that is distribution dependent and will not be covered here. You then should: Set TC_ENABLED=No and CLEAR_TC=No If your script uses the fwmark classifier, you can mark packets using entries in /etc/shorewall/tcrules.