Scalability and Performance

Tom Eastep

Copyright 2006 Thomas M. Eastep

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover, and with no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.
Introduction

The performance of the shorewall start and shorewall restart commands is a frequent topic of questions. This article attempts to explain the scalability issues involved and to offer some tips for reducing the time required to compile a Shorewall configuration and to execute the compiled script.
Host Groups

In this article, we will use the term host group to refer to a set of IP addresses accessed through a particular interface. In a Shorewall configuration, there is one host group for each entry in /etc/shorewall/interfaces that contains the name of a zone in the first column, and one host group for each entry in /etc/shorewall/hosts. As you can see, each host group is associated with a single zone.
Scaling by Host Groups

For each host group, it is possible to attempt connections to every other host group; and if the host group has the routeback option, then it is possible for connections to be attempted from the host group to itself. So if there are H host groups defined in a Shorewall configuration, then the number of unique pairs of (source host group, destination host group) is up to H * H, or H². In other words, the number of combinations grows as the square of the number of host groups, and increasing the number of groups from H to H + 1 adds (H + 1)² - H² = 2H + 1 additional combinations.
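The counting argument above can be checked with a short sketch. This is only an illustration in Python, not Shorewall code: the function name `count_group_pairs`, its `routeback` parameter, and the sample group names are all hypothetical. It assumes that a group is paired with itself only when it has the routeback option, as described above.

```python
from itertools import product


def count_group_pairs(groups, routeback=frozenset()):
    """Count (source, destination) host-group pairs.

    A pair whose source and destination are the same group is counted
    only when that group is in `routeback` (a hypothetical parameter
    modelling the routeback option).
    """
    return sum(
        1
        for src, dst in product(groups, repeat=2)
        if src != dst or src in routeback
    )


groups = ["loc", "dmz", "net"]  # illustrative names only
more_groups = groups + ["vpn"]

# With routeback on every group, the count reaches the H * H upper bound.
all_pairs = count_group_pairs(groups, routeback=frozenset(groups))

# Going from H = 3 to H + 1 = 4 groups adds 2H + 1 = 7 pairs.
bigger = count_group_pairs(more_groups, routeback=frozenset(more_groups))

print(all_pairs, bigger, bigger - all_pairs)  # prints: 9 16 7
```

Without any routeback groups the same three groups yield H * (H - 1) = 6 pairs, so the growth remains quadratic either way.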
Scaling by Zones

A similar scaling issue applies to Shorewall zones. If there are Z zones, then connections may be attempted from a given zone Zn to every zone, including Zn itself. Hence, the number of combinations is the square of the number of zones, or Z².
Scaling within the Shorewall Code

Shorewall is written entirely in Bourne Shell. While this allows Shorewall to run on a wide range of distributions (including embedded ones), the shell programming environment is not ideal for writing the compiler portion of Shorewall. As a consequence, the code must repeatedly perform sequential searches of lists. If a list has N elements and a sequential search is made for each of those elements, then the number of comparisons is 1 + 2 + 3 + ... + N = N * (N + 1) / 2. So again, we see order N² scaling.
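The comparison count can be confirmed with a small model of the search pattern. The sketch below is written in Python purely for illustration and does not reproduce Shorewall's shell code; the function name `comparisons_for_all_lookups` is hypothetical. It searches a list from the front for each of its own elements and tallies every comparison made.

```python
def comparisons_for_all_lookups(items):
    """Search `items` sequentially for each of its own elements and
    return the total number of comparisons performed."""
    total = 0
    for target in items:
        for candidate in items:
            total += 1  # one comparison per candidate examined
            if candidate == target:
                break  # found: stop this search
    return total


for n in (10, 100, 1000):
    count = comparisons_for_all_lookups(list(range(n)))
    # Matches the closed form 1 + 2 + ... + N = N * (N + 1) / 2.
    assert count == n * (n + 1) // 2
    print(n, count)
```

Each tenfold increase in N raises the comparison count roughly a hundredfold (55, 5050, 500500), which is the order N² behaviour described above.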
Improving Performance

Achieving good performance boils down to two things.

First, use a light-weight shell and fast hardware. Especially in the compiler, a light-weight shell such as ash or dash can provide a considerable improvement over bash.

Second, keep N small. With all of the order N² scaling that is implicit in the problem being solved, this is vital. So while it is tempting to create lots of zones through entries in /etc/shorewall/hosts, such configurations always perform badly. In these cases, it is much better to have more rules than more zones, because performance scales linearly with the number of rules whereas it scales quadratically with the number of zones.

Another tip worth noting has to do with the use of shell variables. Suppose that the following appears in /etc/shorewall/params:

    HOSTS=<ip1>,<ip2>,<ip3>,...<ipN>

and suppose that $HOSTS appears in the SOURCE column of M ACCEPT rules. That would generate a total of N * M iptables ACCEPT rules. On the other hand, consider the following:
/etc/shorewall/actions:

    AcceptHosts

/etc/shorewall/action.AcceptHosts:

    #TARGET   SOURCE   DEST   PROTO   DEST   SOURCE    ORIGINAL   RATE
    #                                 PORT   PORT(S)   DEST       LIMIT
    ACCEPT    $HOSTS
If the M ACCEPT rules are now replaced with M AcceptHosts rules, the total number of rules will be N + M: the N ACCEPT rules inside the AcceptHosts action plus the M rules that invoke it.
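To get a feel for the difference, the two rule counts can be compared for a few sample sizes. This is a minimal arithmetic sketch in Python; the function names and the sample values of N and M are hypothetical, and it assumes the action's N rules are generated only once regardless of how many rules invoke it.

```python
def rules_with_variable(n_hosts, m_rules):
    """$HOSTS used directly in the SOURCE column of M ACCEPT rules:
    each rule expands to one iptables rule per address."""
    return n_hosts * m_rules


def rules_with_action(n_hosts, m_rules):
    """M rules invoking an action that holds the N ACCEPT rules.
    Assumption: the action's N rules are generated a single time."""
    return n_hosts + m_rules


# Sample (N, M) values chosen only for illustration.
for n, m in [(5, 4), (50, 20), (200, 40)]:
    print(n, m, rules_with_variable(n, m), rules_with_action(n, m))
```

For N = 50 addresses and M = 20 rules, for example, this gives 1000 rules with the variable used directly and 70 with the action.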