<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"> <article> <!--$Id$--> <articleinfo> <title>Shorewall Internals</title> <authorgroup> <author> <firstname>Tom</firstname> <surname>Eastep</surname> </author> </authorgroup> <pubdate><?dbtimestamp format="Y/m/d"?></pubdate> <copyright> <year>2012</year> <holder>Thomas M. Eastep</holder> </copyright> <legalnotice> <para>Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover, and with no Back-Cover Texts. A copy of the license is included in the section entitled <quote><ulink url="GnuCopyright.htm">GNU Free Documentation License</ulink></quote>.</para> </legalnotice> </articleinfo> <section> <title>Introduction</title> <para>This document provides an overview of Shorewall internals. It is intended to ease the task of approaching the Shorewall code base by providing a roadmap of what you will find there.</para> <section> <title>History</title> <para>Shorewall was originally written entirely in Bourne Shell. The chief advantage of this approach was that virtually any platform supports the shell, including small embedded environments. The initial release was in early 2001. This version ran iptables, ip, etc. immediately after processing the corresponding configuration entry. If an error was encountered, the firewall was stopped. For this reason, the <filename>routestopped</filename> file had to be very simple and foolproof.</para> <para>In Shorewall 3.2.0 (July 2006), the implementation was changed to use the current compile-then-execute architecture. This was accompilished by modifying the existing code rather than writing a compiler/generator from scratch. The resulting code was fragile and hard to maintain. 3.2.0 also marked the introduction of Shorewall-lite.</para> <para>By 2007, the compiler had become unmaintainable and needed to be rewritten. I made the decision to write the compiler in Perl and released it as a separate Shorewall-perl packets in Shorewall 4.0.0 (July 2007). The shell-based compiler was packaged in a Shorewall-shell package. An option (SHOREWALL_COMPILER) in shorewall.conf specified which compiler to use. The Perl-based compiler was siginificantly faster, and the compiled script also ran much faster thanks to its use of iptables-restore.</para> <para>Shorewall6 was introduced in Shorewall 4.2.4 (December 2008).</para> <para>Support for the old Shell-based compiler was eliminated in Shorewall 4.4.0 (July 2009).</para> <para>Shorewall 4.5.0 (February 2012) marked the introduction of the current architecture and packaging.</para> </section> <section> <title>Architecture</title> <para>The components of the Shorewall product suite fall into five broad categories:</para> <orderedlist> <listitem> <para>Build/Install subsystem</para> </listitem> </orderedlist> <orderedlist> <listitem> <para>Command Line Interface (CLI)</para> </listitem> <listitem> <para>Run-time Libraries</para> </listitem> <listitem> <para>Compiler</para> </listitem> <listitem> <para>Configuration files (including actions and macros)</para> </listitem> </orderedlist> <section> <title>Build/Install Subsystem</title> <para>The Shorewall Build/Install subsystem packages the products for release and installs them on an end-user's or a packager's system. It is diagrammed in the following graphic.</para> <graphic fileref="images/BuildInstall.png"/> <para>The build environment components are not released and are discussed in the <ulink url="Build.html">Shorewall Build Article</ulink>.</para> <para>The end-user/packager environment consists of the <filename>configure</filename> and <filename>configure.pl</filename> programs in Shorewall-core and an <filename>install.sh</filename> program in each product.</para> </section> <section> <title>CLI</title> <para>The CLI is written entirely in Bourne Shell so as to allow it to run on small embedded systems within the -lite products. The CLI programs themselves are very small; then set global variables then call into the CLI libraries. Here's an example (/sbin/shorewall):</para> <programlisting>PRODUCT=shorewall # # This is modified by the installer when ${SHAREDIR} != /usr/share # . /usr/share/shorewall/shorewallrc g_program=$PRODUCT g_libexec="$LIBEXECDIR" g_sharedir="$SHAREDIR"/shorewall g_sbindir="$SBINDIR" g_perllib="$PERLLIBDIR" g_confdir="$CONFDIR"/shorewall g_readrc=1 . $g_sharedir/lib.cli shorewall_cli $@</programlisting> <para>As you can see, it sets the PRODUCT variable, loads the shorewallrc file, sets the global variables (all of which have names beginning with "g_", loads <filename>lib.cli</filename>, and calls shorewall_cli passing its own arguments.</para> <para>There are two CLI libraries: <filename>lib.cli</filename> in Shorewall Core and <filename>lib.cli-std </filename>in Shorewall. The <filename>lib.cli</filename> library is always loaded by the CLI programs; <filename>lib-cli-std</filename> is also loaded when the product is 'shorewall' or 'shorewall6'. <filename>lib.cli-std</filename> overloads some functions in <filename>lib.cli</filename> and also provides logic for the additional commands supported by the full products.</para> <para>The CLI libraries load two additional Shell libraries from Shorewall.core: <filename>lib.base</filename> and <filename>lib.common</filename> (actually, <filename>lib.base</filename> loads <filename>lib.common</filename>). These libraries are separete from <filename>lib.cli</filename> for both historical and practicle reasons. <filename>lib.base</filename> (aka functions) can be loaded by application programs, although this was more common in the early years of Shorewall. In addition to being loaded by the CLIs, <filename>lib.common</filename> is also copied into the generated script by the compilers.</para> </section> <section> <title>Run-time Libraries</title> <para>Thare are two libraries that are copied into the generated script by the compiler: <filename>lib.common</filename> from Shorewall-core and <filename>lib.core</filename> from Shorewall. The "outer block" of the generated script comes from the Shorewall file <filename>prog.footer</filename>.</para> </section> <section id="Compiler"> <title>Compiler</title> <para>With the exception of the <filename>getparams</filename> Shell program, the compiler is written in Perl. The compiler main program is compiler.pl from Shorewall.conf; it's run-line arguments are described in the <ulink url="Shorewall-perl.html%23compiler.pl">Shorewall Perl Article</ulink>. It is invoked by the <emphasis>compiler</emphasis> function in <filename>lib.cli-std</filename>.</para> <para>The compiler is modularized as follows:</para> <itemizedlist> <listitem> <para><filename>Accounting.pm</filename> (Shorewall::Accounting). Processes the <filename>accounting</filename> file.</para> </listitem> <listitem> <para><filename>Chains.pm</filename> (Shorewall::Chains). This is the module that provides an interface to iptables/Netfilter for the other modules. The optimizer is included in this module.</para> </listitem> <listitem> <para><filename>Config.pm</filename> (Shorewall::Config). This is a multi-purpose module that supplies several related services:</para> <itemizedlist> <listitem> <para>Error and Progress message production.</para> </listitem> <listitem> <para>Pre-processor. Supplies all configuration file handling including variable expansion, ?IF...?ELSE...?ENDIF processing, INCLUDE directives and embedded Shell and Perl.</para> </listitem> <listitem> <para>Output script file creation with functions to write into the script. The latter functions are no-ops when the <command>check</command> command is being executed.</para> </listitem> <listitem> <para>Capability Detection</para> </listitem> <listitem> <para>Miscellaneous utility functions.</para> </listitem> </itemizedlist> </listitem> <listitem> <para><filename>Compiler.pm</filename> (Shorewall::Compiler). The compiler() function in this module contains the top-leve of the compiler.</para> </listitem> <listitem> <para><filename>IPAddrs.pm</filename> (Shorewall::IPAddrs) - IP Address validation and manipulation (both IPv4 and IPv6). Also interfaces to NSS for protocol/service name resolution.</para> </listitem> <listitem> <para><filename>Misc.pm</filename> (Shorewall::Misc) - Provides services that don't fit well into the other modules.</para> </listitem> <listitem> <para><filename>Nat.pm</filename> (Shorewall::Nat) - Handles all nat table rules. Processes the <filename>masq</filename>, <filename>nat</filename> and <filename>netmap</filename> files.</para> </listitem> <listitem> <para><filename>Proc.pm</filename> (Shorewall::Proc) - Handles manipulation of <filename>/proc/sys/</filename>.</para> </listitem> <listitem> <para><filename>Providers.pm</filename> (Shorewall::Providers) - Handles policy routing; processes the <filename>providers</filename> file.</para> </listitem> <listitem> <para><filename>Proxyarp.pm</filename> (Shorewall::Proxyarp) - Processes the <filename>proxyarp</filename> file.</para> </listitem> <listitem> <para><filename>Raw.pm</filename> (Shorewall::Raw) - Handles the raw table; processes the <filename>conntrack</filename> (formerly <filename>notrack</filename>) file.</para> </listitem> <listitem> <para><filename>Rules.pm</filename> (Shorewall::Rules) - Contains the logic for process the <filename>policy</filename> and <filename>rules</filename> files, including <filename>macros</filename> and <filename>actions</filename>.</para> </listitem> <listitem> <para><filename>Tc.pm</filename> (Shorewall::Tc) - Handles traffic shaping.</para> </listitem> <listitem> <para><filename>Tunnels.pm</filename> (Shorewall::Tunnels) - Processes the <filename>tunnels</filename> file.</para> </listitem> <listitem> <para><filename>Zones.pm</filename> (Shorewall::Zones) - Processes the <filename>zones</filename>, <filename>interfaces</filename> and <filename>hosts</filename> files. Provides the interface to zones and interfaces to the other modules.</para> </listitem> </itemizedlist> <para>Because the params file can contain arbitrary shell code, it must be processed by a shell. The body of <filename>getparams</filename> is as follows:</para> <programlisting># Parameters: # # $1 = Path name of params file # $2 = $CONFIG_PATH # $3 = Address family (4 or 6) # if [ "$3" = 6 ]; then PRODUCT=shorewall6 else PRODUCT=shorewall fi # # This is modified by the installer when ${SHAREDIR} != /usr/share # . /usr/share/shorewall/shorewallrc g_program="$PRODUCT" g_libexec="$LIBEXECDIR" g_sharedir="$SHAREDIR"/shorewall g_sbindir="$SBINDIR" g_perllib="$PERLLIBDIR" g_confdir="$CONFDIR/$PRODUCT" g_readrc=1 . $g_sharedir/lib.cli CONFIG_PATH="$2" set -a . $1 >&2 # Avoid spurious output on STDOUT set +a export -p</programlisting> <para>The program establishes the environment of the Shorewall or Shoreall6 CLI program since that is the environment in which the <filename>params</filename> file has been traditionally processed. It then sets the -<option>a</option> option so that all newly-created variables will be exported and invokes the <filename><filename>params</filename></filename> file. Because the STDOUT file is a pipe back to the compiler, no spurious output must be sent to that file; so <filename>getparams</filename> redirect <filename>params</filename> output to STDOUT. After the script has executed, an <command>export -p</command> command is executed to send the contents of the environ array back to the compiler.</para> <para>Regrettably, the various shells (and even different versions of the same shell) produce quite different output from <command>export -p</command>. The Perl function Shorewall::Config::getparams() detects which species of shell was being used and stores the variable settings into the %params hash. Variables that are also in %ENV are only stored in %params if there value in the output from the <filename>getparams</filename> script is different from that in %ENV.</para> </section> <section> <title>Configuration Files</title> <para>The configuration files are all well-documented. About the only thing worth noting is that some macros and actions are duplicated in the Shorewall and Shorewall6 packages. Because the Shorewall6 default CONFIG_PATH looks in ${SHAREDIR}/shorewall6 before looking in ${SHARDIR_/shorewall, this allows Shorewall6 to implement IPv6-specific handling where required.</para> </section> </section> <section> <title>The Generated Script</title> <para>The generated script is completely self-contained so as to avoid version dependencies between the Shorewall version used to create the script and the version of Shorewall-common installed on the remote firewall.</para> <para>The operation of the generated script is illustrated in this diagram.</para> <graphic fileref="images/RunningScript.png"/> <para>The Netfilter ruleset is sometimes dependent on the environment when the script runs. Dynamic IP addresses and gateways, for example, must be detected when the script runs. As a consequence, it is the generated script and not the compiler that creates the input for iptables-restore. While that input could be passed to iptables-restore in a pipe, it is written to <filename>${VARDIR}/.iptables_restore-input</filename> so that it is available for post-mortem analysis in the event that iptables-restore fails. For the other utilities (ip, tc, ipset, etc), the script runs them passing their input on the run-line.</para> </section> </section> <section> <title>Compiler Internals</title> <para>Because the compiler is the most complex part of the Shorewall product suite, I've chosen to document it first. Before diving into the details of the individual modules, lets take a look at a few general things.</para> <section> <title>Modularization</title> <para>While the compiler is modularized and uses encapsulation, it is not object-oriented. This is due to the fact that much of the compiler was written by manually translating the earlier Shell code.</para> <para>Module data is not completely encapsulated. Heavily used tables, most notably the Chain Table (%chain_table) in Shorewall::Chains is exported for read access. Updates to module data is always encapsulated.</para> </section> <section> <title>Module Initialization</title> <para>While currently unused and untested, the Compiler modules are designed to be able to be loaded into a parent Perl program and the compiler executed repeatedly without unloading the modules. To accomodate that usage scenario, variable data is not initialized at declaration time or in an INIT block, but is rather initialized in an <firstterm>initialize</firstterm> function. Because off of these functions have the same name ("initialize"), they are not exported but are rather called using a fully-qualified name (e.g., "Shorewall::Config::initialize").</para> <para>Most of the the initialization functions accept arguements. Those most common argument is the address family (4 or 6), depending on whether an IPv4 or IPv6 firewall is being compiled. Each of the modules that are address-family dependent have their own $family private (my) variable.</para> </section> <section> <title>Module Dependence</title> <para>Here is the module dependency tree. To simplify the diagram, direct dependencies are not shown where there is also a transitive dependency.</para> <graphic fileref="images/ModuleDepencency.png"/> </section> <section> <title>Config Module</title> <para>As mentioned above, the Config module offers several related services. Each will be described in a separate sub-section.</para> <section> <title>Pre-processor</title> <para>Unlike preprocessors like ccp, the Shorewall pre-processor does it's work each time that the higher-level modules asks for the next line of input.</para> <para>The major exported functions in the pre-processor are:</para> <variablelist> <varlistentry> <term>open_file( $ )</term> <listitem> <para>The single argument names the file to be opened and is usually a simple filename such as <filename>shorewall.conf</filename>. <emphasis role="bold">open_file</emphasis> calls <emphasis role="bold">find_file</emphasis> who traverses the CONFIG_PATH looking for a file with the requested name. If the file is found and has non-zero size, it is opened, module-global variables are set as follows, and the fully-qualified name of the file is returned by the function.</para> <variablelist> <varlistentry> <term>$currentfile</term> <listitem> <para>Handle for the file open</para> </listitem> </varlistentry> <varlistentry> <term>$currentfilename (exported)</term> <listitem> <para>The fully-qualified name of the file.</para> </listitem> </varlistentry> <varlistentry> <term>$currentlinenumber</term> <listitem> <para>Set to zero.</para> </listitem> </varlistentry> </variablelist> <para>If the file is not found or if it has zero size, false ('') is returned.</para> </listitem> </varlistentry> <varlistentry> <term>push_open( $ )</term> <listitem> <para>Sometimes, the higher-level modules need to suspend processing of the current file and open another file. An obvious example is when the Rules module encounters a macro invocation and needs to process the corresponding macro file. The push_open function is called in these cases.</para> <para><emphasis role="bold">push_open</emphasis> pushes <emphasis role="bold">$currentfile</emphasis>, <emphasis role="bold">$currentfilename</emphasis>, <emphasis role="bold">$currentlinenumber</emphasis> and <emphasis role="bold">$ifstack</emphasis> onto <emphasis role="bold">@includestack</emphasis>, copies <emphasis role="bold">@includestack</emphasis> into a local array, pushes a reference to the local array onto <emphasis role="bold">@openstack</emphasis>, and empties <emphasis role="bold">@includestack</emphasis></para> <para>As its final step, <emphasis role="bold">push_open</emphasis> calls <emphasis role="bold">open_file</emphasis>.</para> </listitem> </varlistentry> <varlistentry> <term>pop_open()</term> <listitem> <para>The <emphasis role="bold">pop_open</emphasis> function must be called after the file opened by <emphasis role="bold">push_open</emphasis> is processed. This is true even in the case where <emphasis role="bold">push_open</emphasis> returned false.</para> <para><emphasis role="bold">pop_open</emphasis> pops <emphasis role="bold">@openstack</emphasis> and restores <emphasis role="bold">$currentfile</emphasis>, <emphasis role="bold">$currentfilename</emphasis>, <emphasis role="bold">$currentlinenumber</emphasis>, <emphasis role="bold">$ifstack</emphasis> and <emphasis role="bold">@includestack</emphasis>.</para> </listitem> </varlistentry> <varlistentry> <term>close_file()</term> <listitem> <para><emphasis role="bold">close_file</emphasis> is called to close the current file. Higher-level modules should only call <emphasis role="bold">close_file</emphasis> to close the current file prior to end-of-file.</para> </listitem> </varlistentry> <varlistentry> <term>first_entry( $ )</term> <listitem> <para>This function is called to specify what happens when the first non-commentary and no-blank line is read from the open file. The argument may be either a scalar or a function reference. If the argument is a scalar then it is treaded as a progress message that should be issued if the VERBOSITY setting is >= 1. If the argument is a function reference, the function (usually a closure) is called.</para> <para><emphasis role="bold">first_entry</emphasis> may called after a successful call to <emphasis role="bold">open_file</emphasis>. If it is not called, then the pre-processor takes no action when the first non-blank non-commentary line is found.</para> <para><emphasis role="bold">first_entry</emphasis> returns no significant value.</para> </listitem> </varlistentry> <varlistentry> <term>read_a_line( $ )</term> <listitem> <para>This function delivers the next logical input line to the caller. The single argument is defined by the following constants:</para> <programlisting>use constant { PLAIN_READ => 0, # No read_a_line options EMBEDDED_ENABLED => 1, # Look for embedded Shell and Perl EXPAND_VARIABLES => 2, # Expand Shell variables STRIP_COMMENTS => 4, # Remove comments SUPPRESS_WHITESPACE => 8, # Ignore blank lines CHECK_GUNK => 16, # Look for unprintable characters CONFIG_CONTINUATION => 32, # Suppress leading whitespace if # continued line ends in ',' or ':' DO_INCLUDE => 64, # Look for INCLUDE <filename> NORMAL_READ => -1 # All options };</programlisting> <para>The actual argument may be a bit-wise OR of any of these constants.</para> <para>The function does not return the logical line; that line is rather stored in the module-global variable <emphasis role="bold">$currentline</emphasis> (exported). The function simply returns true if a line was read or false if end-of-file was reached. <emphasis role="bold">read_a_line</emphasis> automatically calls <emphasis role="bold">close_file</emphasis> at EOF.</para> </listitem> </varlistentry> <varlistentry> <term>split_line1</term> <listitem> <para>Most of the callers of <emphasis role="bold">read_a_line</emphasis> want to treat each line as whitespace-separated columns. The <emphasis role="bold">split_line</emphasis> and <emphasis role="bold">split_line1</emphasis> functions return an array containing the contents of those columns.</para> <para>The arguments to <emphasis role="bold">split_line1</emphasis> are:</para> <itemizedlist> <listitem> <para>A <option>name</option> => <replaceable>column-number</replaceable> pair for each of the columns in the file. These are used to process lines that use the <ulink url="configuration_file_basics.htm#Pairs">alternate input methods</ulink> and also serve to define the number of columns in the file's records.</para> </listitem> <listitem> <para>A hash reference defining <option>keyword</option> => <replaceable>number-of-columns</replaceable> pairs. For example "{ COMMENT => 0, FORMAT 2 }" allows COMMENT lines of an unlimited number of space-separated tokens and it allows FORMAT lines with exactly two columns. The hash reference must be the last argument passed.</para> </listitem> </itemizedlist> <para>If there are fewer space-separated tokens on the line than specified in the arguments, then "-" is returned for the omitted trailing columns.</para> </listitem> </varlistentry> <varlistentry> <term>split_line</term> <listitem> <para><emphasis role="bold">split_line</emphasis> simply returns <emphasis role="bold">split_line1( @_, {} )</emphasis>.</para> </listitem> </varlistentry> </variablelist> </section> <section> <title>Error and Progress Message Production</title> <para>There are several exported functions dealing with error and warning messages:</para> <variablelist> <varlistentry> <term>fatal_error</term> <listitem> <para>The argument(s) to this function describe the error. The generated error message is:</para> <simplelist> <member>"ERROR: @_" followed by the name of the file and the line number where the error occurred.</member> </simplelist> <para>The mesage is written to the STARTUP_LOG, if any.</para> <para>The function does not return but rather passes the message to <emphasis role="bold">die</emphasis> or to <emphasis role="bold">confess</emphasis>, depending on whether the "-T" option was specified.</para> </listitem> </varlistentry> <varlistentry> <term>warning_message</term> <listitem> <para>The warning_message is very similar to fatal_error but avoids calling <emphasis role="bold">die</emphasis> or <emphasis role="bold">confess</emphasis>. It also prefixes the argument(s) with "WARNING: " rather than "ERROR: ".</para> <para>It message is written to Standard Out and to the STARTUP_LOG, if any.</para> </listitem> </varlistentry> <varlistentry> <term>progress_message, progress_message2, progress_message3 and progress_message_nocompress</term> <listitem> <para>These procedures conditionally write their argument(s) to Standard Out and to the STARTUP_LOG (if any), depending on the settings of VERBOSITY and and LOG_VERBOSITY respectively.</para> <itemizedlist> <listitem> <para><emphasis role="bold">progress_message</emphasis> only write messages when the verbosity is 2. This function also preserves leading whitespace while removing superflous embedded whitespace from the messages.</para> </listitem> <listitem> <para><emphasis role="bold">progress_message2</emphasis> writes messages with the verbosity is >= 1.</para> </listitem> <listitem> <para><emphasis role="bold">progress_message3</emphasis> writes messages when the verbosity is >= 0.</para> </listitem> <listitem> <para><emphasis role="bold">progress_message_nocompress</emphasis> is like <emphasis role="bold">progress_message</emphasis> except that it does not preserve leading whitespace nor does it eliminate superfluous embedded whitespacve from the messages.</para> </listitem> </itemizedlist> </listitem> </varlistentry> </variablelist> </section> <section> <title>Script File Handling</title> <para>The functions involved in script file creation are:</para> <variablelist> <varlistentry> <term>create_temp_script( $$ )</term> <listitem> <para>This function creates and opens a temporary file in the directory where the final script is to be placed; this function is not called when the <command>check</command> command is being processed. The first argument is the fully-qualified name of the output script; the second (boolean) argument determines if the compilation is for export. The function returns no meaningful value but sets module-global variables as follows:</para> <variablelist> <varlistentry> <term>$script</term> <listitem> <para>Handle of the open script file.</para> </listitem> </varlistentry> <varlistentry> <term>$dir</term> <listitem> <para>The directory in which the script was created.</para> </listitem> </varlistentry> <varlistentry> <term>$tempfile</term> <listitem> <para>The name of the temporary file.</para> </listitem> </varlistentry> <varlistentry> <term>$file</term> <listitem> <para>This fully-qualified name of the script file.</para> </listitem> </varlistentry> </variablelist> </listitem> </varlistentry> <varlistentry> <term>finalize_script( $ )</term> <listitem> <para>This function closes the temporary file and renames it to the </para> </listitem> </varlistentry> </variablelist> <para/> </section> </section> </section> </article>