kses 0.2.1 README [kses strips evil scripts!]
=================


* INTRODUCTION *

Welcome to kses - an HTML/XHTML filter written in PHP. It removes all unwanted
HTML elements and attributes, no matter how malformed HTML input you give it.
It also does several checks on attribute values. kses can be used to avoid
Cross-Site Scripting (XSS), Buffer Overflows and Denial of Service attacks,
among other things.

The program is released under the terms of the GNU General Public License. You
should look into what that means, before using kses in your programs. You can
find the full text of the license in the file COPYING.


* FEATURES *

Some of kses' current features are:

* It will only allow the HTML elements and attributes that it was explicitly
  told to allow.

* Element and attribute names are case-insensitive (a href vs A HREF).

* It will understand and process whitespace correctly.

* Attribute values can be surrounded with quotes, apostrophes or nothing.

* It will accept valueless attributes with just names and no values
  (selected).

* It will accept XHTML's closing " /" marks.

* Attribute values that are surrounded with nothing will get quotes to avoid
  producing non-W3C conforming HTML
  (<a href=http://sourceforge.net/projects/kses> works but isn't valid HTML).

* It handles lots of types of malformed HTML, by interpreting the existing
  code the best it can and then rebuilding new code from it. That's a better
  approach than trying to process existing code, as you're bound to forget
  about some weird special case somewhere. It handles problems like
  never-ending quotes and tags gracefully.

* It will remove additional "<" and ">" characters that people may try to
  sneak in somewhere.

* It supports checking attribute values for minimum/maximum length and
  minimum/maximum value, to protect against Buffer Overflows and Denial of
  Service attacks against WWW clients and various servers.
  You can stop <iframe src= width= height=> from having too high values for
  width and height, for instance.

* It has got a system for whitelisting URL protocols. You can say that
  attribute values may only start with http:, https:, ftp: and gopher:, but
  no other URL protocols (javascript:, java:, about:, telnet:..). The
  functions that do this work handle whitespace, upper/lower case, HTML
  entities ("jav&#97;script:") and repeated entries
  ("javascript:javascript:alert(57)"). It also normalizes HTML entities as a
  nice side effect.

* It removes Netscape 4's JavaScript entities ("&{alert(57)};").

* It handles NULL bytes and Opera's chr(173) whitespace characters.

* There is both a procedural version and an object-oriented version of kses.


* USE IT *

It's very easy to use kses in your own PHP web application! Basic usage looks
like this:

  <?php

  include 'kses.php';

  $allowed = array('b' => array(),
                   'i' => array(),
                   'a' => array('href' => 1, 'title' => 1),
                   'p' => array('align' => 1),
                   'br' => array());

  $val = $_POST['val'];
  if (get_magic_quotes_gpc())
    $val = stripslashes($val);
  # You must strip slashes from magic quotes, or kses will get confused.

  $val = kses($val, $allowed); # The filtering takes place here.

  # Do something with $val.

  ?>

This definition of $allowed means that only the elements B, I, A, P and BR are
allowed (along with their closing tags /B, /I, /A, /P and /BR). B, I and BR
may not have any attributes. A may only have the attributes HREF and TITLE,
while P may only have the attribute ALIGN. You can list the elements and
attributes in the array in any mixture of upper and lower case. kses will also
recognize HTML code that uses both lower and upper case.

It's important to select the right allowed attributes, so you won't open up
an XSS hole by mistake. Some important attributes that you mustn't allow
include but are not limited to: 1) style, and 2) all intrinsic events
attributes (onMouseOver and so on, on* really).
I'll write more about this in the documentation that will be distributed with
future versions of kses.

It's also important to note that kses' HTML input must be cleaned of all
slashes coming from magic quotes. If the rest of your code requires these
slashes to be present, you can always add them again after calling kses with a
simple addslashes() call.

You should take a look at the documentation in the docs/ directory and the
examples in the examples/ directory, to get more information on how to use
kses. The object-oriented version of kses is also worth checking out, and it's
included in the oop/ directory.


* UPGRADING FROM 0.1.0 OR 0.2.0 TO 0.2.1 *

kses 0.2.1 is backwards compatible with 0.1.0 and 0.2.0, so upgrading should
just be a matter of using a new version of kses.php instead of an old one!
When you're ready to start using 0.2.1's new features, you can read about them
in the files in the docs/ directory. The ChangeLog also summarizes the new
features in this release.


* NEW VERSIONS, MAILING LISTS AND BUG REPORTS *

If you want to download new versions, subscribe to the kses-general mailing
list or even take part in the development of kses, we refer you to its
homepage at http://sourceforge.net/projects/kses . New developers and beta
testers are more than welcome!

If you have any bug reports, suggestions for improvement or simply want to
tell us that you use kses for some project, feel free to post to the
kses-general mailing list. If you have found any security problems
(particularly XSS, naturally) in kses, please contact Ulf privately at
metaur at users dot sourceforge dot net so he can correct it before you or
someone else tells the public about it.

(No, it's not a security problem in kses if some program that uses it allows a
bad attribute, silly. If kses is told to accept the element body with the
attributes style and onLoad, it will accept them, even if that's a really bad
idea, securitywise.)

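Since kses accepts whatever its allowed array permits, it can help to check
that array yourself before handing it to kses. The following is only a
minimal sketch: find_risky_attributes() is a hypothetical helper that is not
part of kses, and it flags only the two groups of attributes this README
warns about (style and anything starting with "on"), so a clean result does
not prove that an array is safe.

```php
<?php

# Hypothetical helper, NOT part of kses. It lists the attributes in an
# allowed array that this README says you mustn't allow: style and all
# intrinsic event attributes (names starting with "on").
function find_risky_attributes($allowed)
{
  $risky = array();

  foreach ($allowed as $element => $attributes) {
    foreach (array_keys($attributes) as $attribute) {
      # Names are compared in lower case, since kses treats them as
      # case-insensitive.
      $name = strtolower($attribute);
      if ($name === 'style' || strncmp($name, 'on', 2) === 0)
        $risky[] = strtolower($element) . '.' . $name;
    }
  }

  return $risky;
}

# Example: the "really bad idea" array mentioned above.
$allowed = array('b' => array(),
                 'a' => array('href' => 1, 'title' => 1),
                 'body' => array('style' => 1, 'onLoad' => 1));

foreach (find_risky_attributes($allowed) as $entry)
  echo "Risky attribute allowed: $entry\n";
```

With the array above, the sketch reports body.style and body.onload. An
empty result only means that none of the attributes named in this README
were found.
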
* OTHER HTML FILTERS *

Here are the other stand-alone, open source HTML filters that we currently
know of:

* XSS filter for PHP4 - the filter from Squirrelmail
  PHP
  Konstantin Riabitsev
  http://www.mricon.com/html/phpfilter.html

* HTML::StripScripts and related CPAN modules
  Perl
  Nick Cleaton
  http://search.cpan.org/perldoc?HTML%3A%3AStripScripts

There are also a lot of HTML filters that were written specifically for some
program. Some of them are better than others.

Please write to the kses-general mailing list if you know of any other
stand-alone, open-source filters.


* DEDICATION *

kses 0.2.1 is dedicated to Mischa the cat.


* MISC *

The kses code is based on an HTML filter that Ulf wrote on his own back in
2002 for the open-source project Gnuheter
( http://savannah.nongnu.org/projects/gnuheter ). Gnuheter is a fork from
PHP-Nuke. The HTML filter has been improved a lot since then.

To stop people from having sleepless nights, we feel the urgent need to state
that kses doesn't have anything to do with the KDE project, despite having a
name that starts with a K.

In case someone was wondering, Ulf is available for kses-related consulting.

Finally, the name kses comes from the terms XSS and access. It's also a
recursive acronym (every open-source project should have one!) for "kses
strips evil scripts".


// Ulf and the kses gang, September 2003