Added HTMLPurifier (http://htmlpurifier.org/) Version 3.3.0

- can be used via html class like: 

        $clean_html = html::purify($html);

- using it now in eTemplate to remove malicious code from html:
  a) when displaying "formatted text"
  b) when "formatted text" get's input by the user
This commit is contained in:
Ralf Becker 2009-05-19 17:32:06 +00:00
parent 75850fd66b
commit 8f797be836
332 changed files with 25369 additions and 1 deletions

View File

@ -1157,6 +1157,8 @@ class etemplate extends boetemplate
{ {
$value = nl2br(html::htmlspecialchars($value)); $value = nl2br(html::htmlspecialchars($value));
} }
$value = html::purify($value);
if (!$readonly) if (!$readonly)
{ {
$mode = $mode ? $mode : 'simple'; $mode = $mode ? $mode : 'simple';
@ -1952,7 +1954,7 @@ class etemplate extends boetemplate
} }
break; break;
case 'htmlarea': case 'htmlarea':
self::set_array($content,$form_name,$value); self::set_array($content,$form_name,html::purify($value));
break; break;
case 'int': case 'int':
case 'float': case 'float':

View File

@ -1244,5 +1244,35 @@ class html
return $html; return $html;
} }
/**
* Runs HTMLPurifier over supplied html to remove malicious code
*
* @param string $html
* @param HTMLPurifier_Config $config=null
*/
static function purify($html,$config=null)
{
static $purifier;
if (is_null($purifier) || !is_null($config))
{
// add htmlpurifiers library to include_path
require_once(EGW_API_INC.'/htmlpurifier/library/HTMLPurifier.path.php');
// include most of the required files, for best performance with bytecode caches
require_once(EGW_API_INC.'/htmlpurifier/library/HTMLPurifier.includes.php');
// installs an autoloader for other files
require_once(EGW_API_INC.'/htmlpurifier/library/HTMLPurifier.autoload.php');
if (is_null($config))
{
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', self::$charset);
$config->set('Cache', 'SerializerPath', $GLOBALS['egw_info']['server']['temp_dir']);
}
$purifier = new HTMLPurifier($config);
}
return $purifier->purify( $html );
}
} }
html::_init_static(); html::_init_static();

View File

@ -0,0 +1,9 @@
CREDITS
Almost everything written by Edward Z. Yang (Ambush Commander). Lots of thanks
to the DevNetwork Community for their help (see docs/ref-devnetwork.html for
more details), Feyd especially (namely IPv6 and optimization). Thanks to RSnake
for letting me package his fantastic XSS cheatsheet for a smoketest.
vim: et sw=4 sts=4

373
phpgwapi/inc/htmlpurifier/INSTALL Executable file
View File

@ -0,0 +1,373 @@
Install
How to install HTML Purifier
HTML Purifier is designed to run out of the box, so actually using the
library is extremely easy. (Although... if you were looking for a
step-by-step installation GUI, you've downloaded the wrong software!)
While the impatient can get going immediately with some of the sample
code at the bottom of this library, it's well worth reading this entire
document--most of the other documentation assumes that you are familiar
with these contents.
---------------------------------------------------------------------------
1. Compatibility
HTML Purifier is PHP 5 only, and is actively tested from PHP 5.0.5 and
up. It has no core dependencies with other libraries. PHP
4 support was deprecated on December 31, 2007 with HTML Purifier 3.0.0.
These optional extensions can enhance the capabilities of HTML Purifier:
* iconv : Converts text to and from non-UTF-8 encodings
* bcmath : Used for unit conversion and imagecrash protection
* tidy : Used for pretty-printing HTML
---------------------------------------------------------------------------
2. Reconnaissance
A big plus of HTML Purifier is its inerrant support of standards, so
your web-pages should be standards-compliant. (They should also use
semantic markup, but that's another issue altogether, one HTML Purifier
cannot fix without reading your mind.)
HTML Purifier can process these doctypes:
* XHTML 1.0 Transitional (default)
* XHTML 1.0 Strict
* HTML 4.01 Transitional
* HTML 4.01 Strict
* XHTML 1.1
...and these character encodings:
* UTF-8 (default)
* Any encoding iconv supports (with crippled internationalization support)
These defaults reflect what my choices would be if I were authoring an
HTML document, however, what you choose depends on the nature of your
codebase. If you don't know what doctype you are using, you can determine
the doctype from this identifier at the top of your source code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...and the character encoding from this code:
<meta http-equiv="Content-type" content="text/html;charset=ENCODING">
If the character encoding declaration is missing, STOP NOW, and
read 'docs/enduser-utf8.html' (web accessible at
http://htmlpurifier.org/docs/enduser-utf8.html). In fact, even if it is
present, read this document anyway, as many websites specify their
document's character encoding incorrectly.
---------------------------------------------------------------------------
3. Including the library
The procedure is quite simple:
require_once '/path/to/library/HTMLPurifier.auto.php';
This will setup an autoloader, so the library's files are only included
when you use them.
Only the contents in the library/ folder are necessary, so you can remove
everything else when using HTML Purifier in a production environment.
If you installed HTML Purifier via PEAR, all you need to do is:
require_once 'HTMLPurifier.auto.php';
Please note that the usual PEAR practice of including just the classes you
want will not work with HTML Purifier's autoloading scheme.
Advanced users, read on; other users can skip to section 4.
Autoload compatibility
----------------------
HTML Purifier attempts to be as smart as possible when registering an
autoloader, but there are some cases where you will need to change
your own code to accomodate HTML Purifier. These are those cases:
PHP VERSION IS LESS THAN 5.1.2, AND YOU'VE DEFINED __autoload
Because spl_autoload_register() doesn't exist in early versions
of PHP 5, HTML Purifier has no way of adding itself to the autoload
stack. Modify your __autoload function to test
HTMLPurifier_Bootstrap::autoload($class)
For example, suppose your autoload function looks like this:
function __autoload($class) {
require str_replace('_', '/', $class) . '.php';
return true;
}
A modified version with HTML Purifier would look like this:
function __autoload($class) {
if (HTMLPurifier_Bootstrap::autoload($class)) return true;
require str_replace('_', '/', $class) . '.php';
return true;
}
Note that there *is* some custom behavior in our autoloader; the
original autoloader in our example would work for 99% of the time,
but would fail when including language files.
AN __autoload FUNCTION IS DECLARED AFTER OUR AUTOLOADER IS REGISTERED
spl_autoload_register() has the curious behavior of disabling
the existing __autoload() handler. Users need to explicitly
spl_autoload_register('__autoload'). Because we use SPL when it
is available, __autoload() will ALWAYS be disabled. If __autoload()
is declared before HTML Purifier is loaded, this is not a problem:
HTML Purifier will register the function for you. But if it is
declared afterwards, it will mysteriously not work. This
snippet of code (after your autoloader is defined) will fix it:
spl_autoload_register('__autoload')
Users should also be on guard if they use a version of PHP previous
to 5.1.2 without an autoloader--HTML Purifier will define __autoload()
for you, which can collide with an autoloader that was added by *you*
later.
For better performance
----------------------
Opcode caches, which greatly speed up PHP initialization for scripts
with large amounts of code (HTML Purifier included), don't like
autoloaders. We offer an include file that includes all of HTML Purifier's
files in one go in an opcode cache friendly manner:
// If /path/to/library isn't already in your include path, uncomment
// the below line:
// require '/path/to/library/HTMLPurifier.path.php';
require 'HTMLPurifier.includes.php';
Optional components still need to be included--you'll know if you try to
use a feature and you get a class doesn't exists error! The autoloader
can be used in conjunction with this approach to catch classes that are
missing. Simply add this afterwards:
require 'HTMLPurifier.autoload.php';
Standalone version
------------------
HTML Purifier has a standalone distribution; you can also generate
a standalone file from the full version by running the script
maintenance/generate-standalone.php . The standalone version has the
benefit of having most of its code in one file, so parsing is much
faster and the library is easier to manage.
If HTMLPurifier.standalone.php exists in the library directory, you
can use it like this:
require '/path/to/HTMLPurifier.standalone.php';
This is equivalent to including HTMLPurifier.includes.php, except that
the contents of standalone/ will be added to your path. To override this
behavior, specify a new HTMLPURIFIER_PREFIX where standalone files can
be found (usually, this will be one directory up, the "true" library
directory in full distributions). Don't forget to set your path too!
The autoloader can be added to the end to ensure the classes are
loaded when necessary; otherwise you can manually include them.
To use the autoloader, use this:
require 'HTMLPurifier.autoload.php';
For advanced users
------------------
HTMLPurifier.auto.php performs a number of operations that can be done
individually. These are:
HTMLPurifier.path.php
Puts /path/to/library in the include path. For high performance,
this should be done in php.ini.
HTMLPurifier.autoload.php
Registers our autoload handler HTMLPurifier_Bootstrap::autoload($class).
You can do these operations by yourself--in fact, you must modify your own
autoload handler if you are using a version of PHP earlier than PHP 5.1.2
(See "Autoload compatibility" above).
---------------------------------------------------------------------------
4. Configuration
HTML Purifier is designed to run out-of-the-box, but occasionally HTML
Purifier needs to be told what to do. If you answer no to any of these
questions, read on; otherwise, you can skip to the next section (or, if you're
into configuring things just for the heck of it, skip to 4.3).
* Am I using UTF-8?
* Am I using XHTML 1.0 Transitional?
If you answered no to any of these questions, instantiate a configuration
object and read on:
$config = HTMLPurifier_Config::createDefault();
4.1. Setting a different character encoding
You really shouldn't use any other encoding except UTF-8, especially if you
plan to support multilingual websites (read section three for more details).
However, switching to UTF-8 is not always immediately feasible, so we can
adapt.
HTML Purifier uses iconv to support other character encodings, as such,
any encoding that iconv supports <http://www.gnu.org/software/libiconv/>
HTML Purifier supports with this code:
$config->set('Core', 'Encoding', /* put your encoding here */);
An example usage for Latin-1 websites (the most common encoding for English
websites):
$config->set('Core', 'Encoding', 'ISO-8859-1');
Note that HTML Purifier's support for non-Unicode encodings is crippled by the
fact that any character not supported by that encoding will be silently
dropped, EVEN if it is ampersand escaped. If you want to work around
this, you are welcome to read docs/enduser-utf8.html for a fix,
but please be cognizant of the issues the "solution" creates (for this
reason, I do not include the solution in this document).
4.2. Setting a different doctype
For those of you using HTML 4.01 Transitional, you can disable
XHTML output like this:
$config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');
Other supported doctypes include:
* HTML 4.01 Strict
* HTML 4.01 Transitional
* XHTML 1.0 Strict
* XHTML 1.0 Transitional
* XHTML 1.1
4.3. Other settings
There are more configuration directives which can be read about
here: <http://htmlpurifier.org/live/configdoc/plain.html> They're a bit boring,
but they can help out for those of you who like to exert maximum control over
your code. Some of the more interesting ones are configurable at the
demo <http://htmlpurifier.org/demo.php> and are well worth looking into
for your own system.
For example, you can fine tune allowed elements and attributes, convert
relative URLs to absolute ones, and even autoparagraph input text! These
are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
%AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
translates to:
$config->set('Namespace', 'Directive', $value);
E.g.
$config->set('HTML', 'Allowed', 'p,b,a[href],i');
$config->set('URI', 'Base', 'http://www.example.com');
$config->set('URI', 'MakeAbsolute', true);
$config->set('AutoFormat', 'AutoParagraph', true);
---------------------------------------------------------------------------
5. Caching
HTML Purifier generates some cache files (generally one or two) to speed up
its execution. For maximum performance, make sure that
library/HTMLPurifier/DefinitionCache/Serializer is writeable by the webserver.
If you are in the library/ folder of HTML Purifier, you can set the
appropriate permissions using:
chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer
If the above command doesn't work, you may need to assign write permissions
to all. This may be necessary if your webserver runs as nobody, but is
not recommended since it means any other user can write files in the
directory. Use:
chmod -R 0777 HTMLPurifier/DefinitionCache/Serializer
You can also chmod files via your FTP client; this option
is usually accessible by right clicking the corresponding directory and
then selecting "chmod" or "file permissions".
Starting with 2.0.1, HTML Purifier will generate friendly error messages
that will tell you exactly what you have to chmod the directory to, if in doubt,
follow its advice.
If you are unable or unwilling to give write permissions to the cache
directory, you can either disable the cache (and suffer a performance
hit):
$config->set('Core', 'DefinitionCache', null);
Or move the cache directory somewhere else (no trailing slash):
$config->set('Cache', 'SerializerPath', '/home/user/absolute/path');
---------------------------------------------------------------------------
6. Using the code
The interface is mind-numbingly simple:
$purifier = new HTMLPurifier();
$clean_html = $purifier->purify( $dirty_html );
...or, if you're using the configuration object:
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify( $dirty_html );
That's it! For more examples, check out docs/examples/ (they aren't very
different though). Also, docs/enduser-slow.html gives advice on what to
do if HTML Purifier is slowing down your application.
---------------------------------------------------------------------------
7. Quick install
First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
writable by the webserver (see Section 5: Caching above for details).
If your website is in UTF-8 and XHTML Transitional, use this code:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$purifier = new HTMLPurifier();
$clean_html = $purifier->purify($dirty_html);
?>
If your website is in a different encoding or doctype, use this code:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
$config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>
vim: et sw=4 sts=4

504
phpgwapi/inc/htmlpurifier/LICENSE Executable file
View File

@ -0,0 +1,504 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999
Copyright (C) 1991, 1999 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
[This is the first released version of the Lesser GPL. It also counts
as the successor of the GNU Library Public License, version 2, hence
the version number 2.1.]
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
Licenses are intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.
This license, the Lesser General Public License, applies to some
specially designated software packages--typically libraries--of the
Free Software Foundation and other authors who decide to use it. You
can use it too, but we suggest you first think carefully about whether
this license or the ordinary General Public License is the better
strategy to use in any particular case, based on the explanations below.
When we speak of free software, we are referring to freedom of use,
not price. Our General Public Licenses are designed to make sure that
you have the freedom to distribute copies of free software (and charge
for this service if you wish); that you receive source code or can get
it if you want it; that you can change the software and use pieces of
it in new free programs; and that you are informed that you can do
these things.
To protect your rights, we need to make restrictions that forbid
distributors to deny you these rights or to ask you to surrender these
rights. These restrictions translate to certain responsibilities for
you if you distribute copies of the library or if you modify it.
For example, if you distribute copies of the library, whether gratis
or for a fee, you must give the recipients all the rights that we gave
you. You must make sure that they, too, receive or can get the source
code. If you link other code with the library, you must provide
complete object files to the recipients, so that they can relink them
with the library after making changes to the library and recompiling
it. And you must show them these terms so they know their rights.
We protect your rights with a two-step method: (1) we copyright the
library, and (2) we offer you this license, which gives you legal
permission to copy, distribute and/or modify the library.
To protect each distributor, we want to make it very clear that
there is no warranty for the free library. Also, if the library is
modified by someone else and passed on, the recipients should know
that what they have is not the original version, so that the original
author's reputation will not be affected by problems that might be
introduced by others.
Finally, software patents pose a constant threat to the existence of
any free program. We wish to make sure that a company cannot
effectively restrict the users of a free program by obtaining a
restrictive license from a patent holder. Therefore, we insist that
any patent license obtained for a version of the library must be
consistent with the full freedom of use specified in this license.
Most GNU software, including some libraries, is covered by the
ordinary GNU General Public License. This license, the GNU Lesser
General Public License, applies to certain designated libraries, and
is quite different from the ordinary General Public License. We use
this license for certain libraries in order to permit linking those
libraries into non-free programs.
When a program is linked with a library, whether statically or using
a shared library, the combination of the two is legally speaking a
combined work, a derivative of the original library. The ordinary
General Public License therefore permits such linking only if the
entire combination fits its criteria of freedom. The Lesser General
Public License permits more lax criteria for linking other code with
the library.
We call this license the "Lesser" General Public License because it
does Less to protect the user's freedom than the ordinary General
Public License. It also provides other free software developers Less
of an advantage over competing non-free programs. These disadvantages
are the reason we use the ordinary General Public License for many
libraries. However, the Lesser license provides advantages in certain
special circumstances.
For example, on rare occasions, there may be a special need to
encourage the widest possible use of a certain library, so that it becomes
a de-facto standard. To achieve this, non-free programs must be
allowed to use the library. A more frequent case is that a free
library does the same job as widely used non-free libraries. In this
case, there is little to gain by limiting the free library to free
software only, so we use the Lesser General Public License.
In other cases, permission to use a particular library in non-free
programs enables a greater number of people to use a large body of
free software. For example, permission to use the GNU C Library in
non-free programs enables many more people to use the whole GNU
operating system, as well as its variant, the GNU/Linux operating
system.
Although the Lesser General Public License is Less protective of the
users' freedom, it does ensure that the user of a program that is
linked with the Library has the freedom and the wherewithal to run
that program using a modified version of the Library.
The precise terms and conditions for copying, distribution and
modification follow. Pay close attention to the difference between a
"work based on the library" and a "work that uses the library". The
former contains code derived from the library, whereas the latter must
be combined with the library in order to run.
GNU LESSER GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any software library or other
program which contains a notice placed by the copyright holder or
other authorized party saying it may be distributed under the terms of
this Lesser General Public License (also called "this License").
Each licensee is addressed as "you".
A "library" means a collection of software functions and/or data
prepared so as to be conveniently linked with application programs
(which use some of those functions and data) to form executables.
The "Library", below, refers to any such software library or work
which has been distributed under these terms. A "work based on the
Library" means either the Library or any derivative work under
copyright law: that is to say, a work containing the Library or a
portion of it, either verbatim or with modifications and/or translated
straightforwardly into another language. (Hereinafter, translation is
included without limitation in the term "modification".)
"Source code" for a work means the preferred form of the work for
making modifications to it. For a library, complete source code means
all the source code for all modules it contains, plus any associated
interface definition files, plus the scripts used to control compilation
and installation of the library.
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running a program using the Library is not restricted, and output from
such a program is covered only if its contents constitute a work based
on the Library (independent of the use of the Library in a tool for
writing it). Whether that is true depends on what the Library does
and what the program that uses the Library does.
1. You may copy and distribute verbatim copies of the Library's
complete source code as you receive it, in any medium, provided that
you conspicuously and appropriately publish on each copy an
appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any
warranty; and distribute a copy of this License along with the
Library.
You may charge a fee for the physical act of transferring a copy,
and you may at your option offer warranty protection in exchange for a
fee.
2. You may modify your copy or copies of the Library or any portion
of it, thus forming a work based on the Library, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) The modified work must itself be a software library.
b) You must cause the files modified to carry prominent notices
stating that you changed the files and the date of any change.
c) You must cause the whole of the work to be licensed at no
charge to all third parties under the terms of this License.
d) If a facility in the modified Library refers to a function or a
table of data to be supplied by an application program that uses
the facility, other than as an argument passed when the facility
is invoked, then you must make a good faith effort to ensure that,
in the event an application does not supply such function or
table, the facility still operates, and performs whatever part of
its purpose remains meaningful.
(For example, a function in a library to compute square roots has
a purpose that is entirely well-defined independent of the
application. Therefore, Subsection 2d requires that any
application-supplied function or table used by this function must
be optional: if the application does not supply it, the square
root function must still compute square roots.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Library,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Library, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote
it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Library.
In addition, mere aggregation of another work not based on the Library
with the Library (or with a work based on the Library) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may opt to apply the terms of the ordinary GNU General Public
License instead of this License to a given copy of the Library. To do
this, you must alter all the notices that refer to this License, so
that they refer to the ordinary GNU General Public License, version 2,
instead of to this License. (If a newer version than version 2 of the
ordinary GNU General Public License has appeared, then you can specify
that version instead if you wish.) Do not make any other change in
these notices.
Once this change is made in a given copy, it is irreversible for
that copy, so the ordinary GNU General Public License applies to all
subsequent copies and derivative works made from that copy.
This option is useful when you wish to copy part of the code of
the Library into a program that is not a library.
4. You may copy and distribute the Library (or a portion or
derivative of it, under Section 2) in object code or executable form
under the terms of Sections 1 and 2 above provided that you accompany
it with the complete corresponding machine-readable source code, which
must be distributed under the terms of Sections 1 and 2 above on a
medium customarily used for software interchange.
If distribution of object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the
source code from the same place satisfies the requirement to
distribute the source code, even though third parties are not
compelled to copy the source along with the object code.
5. A program that contains no derivative of any portion of the
Library, but is designed to work with the Library by being compiled or
linked with it, is called a "work that uses the Library". Such a
work, in isolation, is not a derivative work of the Library, and
therefore falls outside the scope of this License.
However, linking a "work that uses the Library" with the Library
creates an executable that is a derivative of the Library (because it
contains portions of the Library), rather than a "work that uses the
library". The executable is therefore covered by this License.
Section 6 states terms for distribution of such executables.
When a "work that uses the Library" uses material from a header file
that is part of the Library, the object code for the work may be a
derivative work of the Library even though the source code is not.
Whether this is true is especially significant if the work can be
linked without the Library, or if the work is itself a library. The
threshold for this to be true is not precisely defined by law.
If such an object file uses only numerical parameters, data
structure layouts and accessors, and small macros and small inline
functions (ten lines or less in length), then the use of the object
file is unrestricted, regardless of whether it is legally a derivative
work. (Executables containing this object code plus portions of the
Library will still fall under Section 6.)
Otherwise, if the work is a derivative of the Library, you may
distribute the object code for the work under the terms of Section 6.
Any executables containing that work also fall under Section 6,
whether or not they are linked directly with the Library itself.
6. As an exception to the Sections above, you may also combine or
link a "work that uses the Library" with the Library to produce a
work containing portions of the Library, and distribute that work
under terms of your choice, provided that the terms permit
modification of the work for the customer's own use and reverse
engineering for debugging such modifications.
You must give prominent notice with each copy of the work that the
Library is used in it and that the Library and its use are covered by
this License. You must supply a copy of this License. If the work
during execution displays copyright notices, you must include the
copyright notice for the Library among them, as well as a reference
directing the user to the copy of this License. Also, you must do one
of these things:
a) Accompany the work with the complete corresponding
machine-readable source code for the Library including whatever
changes were used in the work (which must be distributed under
Sections 1 and 2 above); and, if the work is an executable linked
with the Library, with the complete machine-readable "work that
uses the Library", as object code and/or source code, so that the
user can modify the Library and then relink to produce a modified
executable containing the modified Library. (It is understood
that the user who changes the contents of definitions files in the
Library will not necessarily be able to recompile the application
to use the modified definitions.)
b) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (1) uses at run time a
copy of the library already present on the user's computer system,
rather than copying library functions into the executable, and (2)
will operate properly with a modified version of the library, if
the user installs one, as long as the modified version is
interface-compatible with the version that the work was made with.
c) Accompany the work with a written offer, valid for at
least three years, to give the same user the materials
specified in Subsection 6a, above, for a charge no more
than the cost of performing this distribution.
d) If distribution of the work is made by offering access to copy
from a designated place, offer equivalent access to copy the above
specified materials from the same place.
e) Verify that the user has already received a copy of these
materials or that you have already sent this user a copy.
For an executable, the required form of the "work that uses the
Library" must include any data and utility programs needed for
reproducing the executable from it. However, as a special exception,
the materials to be distributed need not include anything that is
normally distributed (in either source or binary form) with the major
components (compiler, kernel, and so on) of the operating system on
which the executable runs, unless that component itself accompanies
the executable.
It may happen that this requirement contradicts the license
restrictions of other proprietary libraries that do not normally
accompany the operating system. Such a contradiction means you cannot
use both them and the Library together in an executable that you
distribute.
7. You may place library facilities that are a work based on the
Library side-by-side in a single library together with other library
facilities not covered by this License, and distribute such a combined
library, provided that the separate distribution of the work based on
the Library and of the other library facilities is otherwise
permitted, and provided that you do these two things:
a) Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library
facilities. This must be distributed under the terms of the
Sections above.
b) Give prominent notice with the combined library of the fact
that part of it is a work based on the Library, and explaining
where to find the accompanying uncombined form of the same work.
8. You may not copy, modify, sublicense, link with, or distribute
the Library except as expressly provided under this License. Any
attempt otherwise to copy, modify, sublicense, link with, or
distribute the Library is void, and will automatically terminate your
rights under this License. However, parties who have received copies,
or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
9. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Library or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Library (or any work based on the
Library), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Library or works based on it.
10. Each time you redistribute the Library (or any work based on the
Library), the recipient automatically receives a license from the
original licensor to copy, distribute, link with or modify the Library
subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties with
this License.
11. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Library at all. For example, if a patent
license would not permit royalty-free redistribution of the Library by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Library.
If any portion of this section is held invalid or unenforceable under any
particular circumstance, the balance of the section is intended to apply,
and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
12. If the distribution and/or use of the Library is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Library under this License may add
an explicit geographical distribution limitation excluding those countries,
so that distribution is permitted only in or among countries not thus
excluded. In such case, this License incorporates the limitation as if
written in the body of this License.
13. The Free Software Foundation may publish revised and/or new
versions of the Lesser General Public License from time to time.
Such new versions will be similar in spirit to the present version,
but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Library
specifies a version number of this License which applies to it and
"any later version", you have the option of following the terms and
conditions either of that version or of any later version published by
the Free Software Foundation. If the Library does not specify a
license version number, you may choose any version ever published by
the Free Software Foundation.
14. If you wish to incorporate parts of the Library into other free
programs whose distribution conditions are incompatible with these,
write to the author to ask for permission. For software which is
copyrighted by the Free Software Foundation, write to the Free
Software Foundation; we sometimes make exceptions for this. Our
decision will be guided by the two goals of preserving the free status
of all derivatives of our free software and of promoting the sharing
and reuse of software generally.
NO WARRANTY
15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Libraries
If you develop a new library, and you want it to be of the greatest
possible use to the public, we recommend making it free software that
everyone can redistribute and change. You can do so by permitting
redistribution under these terms (or, alternatively, under the terms of the
ordinary General Public License).
To apply these terms, attach the following notices to the library. It is
safest to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least the
"copyright" line and a pointer to where the full notice is found.
<one line to give the library's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Also add information on how to contact you by electronic and paper mail.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the library, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
library `Frob' (a library for tweaking knobs) written by James Random Hacker.
<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice
That's all there is to it!
vim: et sw=4 sts=4

842
phpgwapi/inc/htmlpurifier/NEWS Executable file
View File

@ -0,0 +1,842 @@
NEWS ( CHANGELOG and HISTORY ) HTMLPurifier
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
= KEY ====================
# Breaks back-compat
! Feature
- Bugfix
+ Sub-comment
. Internal change
==========================
3.3.0, released 2009-02-16
! Implement CSS property 'overflow' when %CSS.AllowTricky is true.
! Implement generic property list classess
- Fix bug with testEncodingSupportsASCII() algorithm when iconv() implementation
does not do the "right thing" with characters not supported in the output
set.
- Spellcheck UTF-8: The Secret To Character Encoding
- Fix improper removal of the contents of elements with only whitespace. Thanks
Eric Wald for reporting.
- Fix broken test suite in versions of PHP without spl_autoload_register()
- Fix degenerate case with YouTube filter involving double hyphens.
Thanks Pierre Attar for reporting.
- Fix YouTube rendering problem on certain versions of Firefox.
- Fix CSSDefinition Printer problems with decorators
- Add text parameter to unit tests, forces text output
. Add verbose mode to command line test runner, use (--verbose)
. Turn on unit tests for UnitConverter
. Fix missing version number in configuration %Attr.DefaultImageAlt (added 3.2.0)
. Fix newline errors that caused spurious failures when CRLF HTML Purifier was
tested on Linux.
. Removed trailing whitespace from all text files, see
remote-trailing-whitespace.php maintenance script.
. Convert configuration to use property list backend.
3.2.0, released 2008-10-31
# Using %Core.CollectErrors forces line number/column tracking on, whereas
previously you could theoretically turn it off.
# HTMLPurifier_Injector->notifyEnd() is formally deprecated. Please
use handleEnd() instead.
! %Output.AttrSort for when you need your attributes in alphabetical order to
deal with a bug in FCKEditor. Requested by frank farmer.
! Enable HTML comments when %HTML.Trusted is on. Requested by Waldo Jaquith.
! Proper support for name attribute. It is now allowed and equivalent to the id
attribute in a and img tags, and is only converted to id when %HTML.TidyLevel
is heavy (for all doctypes).
! %AutoFormat.RemoveEmpty to remove some empty tags from documents. Please don't
use on hand-written HTML.
! Add error-cases for unsupported elements in MakeWellFormed. This enables
the strategy to be used, standalone, on untrusted input.
! %Core.AggressivelyFixLt is on by default. This causes more sensible
processing of left angled brackets in smileys and other whatnot.
! Test scripts now have a 'type' parameter, which lets you say 'htmlpurifier',
'phpt', 'vtest', etc. in order to only execute those tests. This supercedes
the --only-phpt parameter, although for backwards-compatibility the flag
will still work.
! AutoParagraph auto-formatter will now preserve double-newlines upon output.
Users who are not performing inbound filtering, this may seem a little
useless, but as a bonus, the test suite and handling of edge cases is also
improved.
! Experimental implementation of forms for %HTML.Trusted
! Track column numbers when maintain line numbers is on
! Proprietary 'background' attribute on table-related elements converted into
corresponding CSS. Thanks Fusemail for sponsoring this feature!
! Add forward(), forwardUntilEndToken(), backward() and current() to Injector
supertype.
! HTMLPurifier_Injector->handleEnd() permits modification to end tokens. The
time of operation varies slightly from notifyEnd() as *all* end tokens are
processed by the injector before they are subject to the well-formedness rules.
! %Attr.DefaultImageAlt allows overriding default behavior of setting alt to
basename of image when not present.
! %AutoFormat.DisplayLinkURI neuters <a> tags into plain text URLs.
- Fix two bugs in %URI.MakeAbsolute; one involving empty paths in base URLs,
the other involving an undefined $is_folder error.
- Throw error when %Core.Encoding is set to a spurious value. Previously,
this errored silently and returned false.
- Redirected stderr to stdout for flush error output.
- %URI.DisableExternal will now use the host in %URI.Base if %URI.Host is not
available.
- Do not re-munge URL if the output URL has the same host as the input URL.
Requested by Chris.
- Fix error in documentation regarding %Filter.ExtractStyleBlocks
- Prevent <![CDATA[<body></body>]]> from triggering %Core.ConvertDocumentToFragment
- Fix bug with inline elements in blockquotes conflicting with strict doctype
- Detect if HTML support is disabled for DOM by checking for loadHTML() method.
- Fix bug where dots and double-dots in absolute URLs without hostname were
not collapsed by URIFilter_MakeAbsolute.
- Fix bug with anonymous modules operating on SafeEmbed or SafeObject elements
by reordering their addition.
- Will now throw exception on many error conditions during lexer creation; also
throw an exception when MaintainLineNumbers is true, but a non-tracksLineNumbers
is being used.
- Detect if domxml extension is loaded, and use DirectLEx accordingly.
- Improve handling of big numbers with floating point arithmetic in UnitConverter.
Reported by David Morton.
. Strategy_MakeWellFormed now operates in-place, saving memory and allowing
for more interesting filter-backtracking
. New HTMLPurifier_Injector->rewind() functionality, allows injectors to rewind
index to reprocess tokens.
. StringHashParser now allows for multiline sections with "empty" content;
previously the section would remain undefined.
. Added --quick option to multitest.php, which tests only the most recent
release for each series.
. Added --distro option to multitest.php, which accepts either 'normal' or
'standalone'. This supercedes --exclude-normal and --exclude-standalone
3.1.1, released 2008-06-19
# %URI.Munge now, by default, does not munge resources (for example, <img src="">)
In order to enable this again, please set %URI.MungeResources to true.
! More robust imagecrash protection with height/width CSS with %CSS.MaxImgLength,
and height/width HTML with %HTML.MaxImgLength.
! %URI.MungeSecretKey for secure URI munging. Thanks Chris
for sponsoring this feature. Check out the corresponding documentation
for details. (Att Nightly testers: The API for this feature changed before
the general release. Namely, rename your directives %URI.SecureMungeSecretKey =>
%URI.MungeSecretKey and and %URI.SecureMunge => %URI.Munge)
! Implemented post URI filtering. Set member variable $post to true to set
a URIFilter as such.
! Allow modules to define injectors via $info_injector. Injectors are
automatically disabled if injector's needed elements are not found.
! Support for "safe" objects added, use %HTML.SafeObject and %HTML.SafeEmbed.
Thanks Chris for sponsoring. If you've been using ad hoc code from the
forums, PLEASE use this instead.
! Added substitutions for %e, %n, %a and %p in %URI.Munge (in order,
embedded, tag name, attribute name, CSS property name). See %URI.Munge
for more details. Requested by Jochem Blok.
- Disable percent height/width attributes for img.
- AttrValidator operations are now atomic; updates to attributes are not
manifest in token until end of operations. This prevents naughty internal
code from directly modifying CurrentToken when they're not supposed to.
This semantics change was requested by frank farmer.
- Percent encoding checks enabled for URI query and fragment
- Fix stray backslashes in font-family; CSS Unicode character escapes are
now properly resolved (although *only* in font-family). Thanks Takeshi Terada
for reporting.
- Improve parseCDATA algorithm to take into account newline normalization
- Account for browser confusion between Yen character and backslash in
Shift_JIS encoding. This fix generalizes to any other encoding which is not
a strict superset of printable ASCII. Thanks Takeshi Terada for reporting.
- Fix missing configuration parameter in Generator calls. Thanks vs for the
partial patch.
- Improved adherence to Unicode by checking for non-character codepoints.
Thanks Geoffrey Sneddon for reporting. This may result in degraded
performance for extremely large inputs.
- Allow CSS property-value pair ''text-decoration: none''. Thanks Jochem Blok
for reporting.
. Added HTMLPurifier_UnitConverter and HTMLPurifier_Length for convenient
handling of CSS-style lengths. HTMLPurifier_AttrDef_CSS_Length now uses
this class.
. API of HTMLPurifier_AttrDef_CSS_Length changed from __construct($disable_negative)
to __construct($min, $max). __construct(true) is equivalent to
__construct('0').
. Added HTMLPurifier_AttrDef_Switch class
. Rename HTMLPurifier_HTMLModule_Tidy->construct() to setup() and bubble method
up inheritance hierarchy to HTMLPurifier_HTMLModule. All HTMLModules
get this called with the configuration object. All modules now
use this rather than __construct(), although legacy code using constructors
will still work--the new format, however, lets modules access the
configuration object for HTML namespace dependant tweaks.
. AttrDef_HTML_Pixels now takes a single construction parameter, pixels.
. ConfigSchema data-structure heavily optimized; on average it uses a third
the memory it did previously. The interface has changed accordingly,
consult changes to HTMLPurifier_Config for details.
. Variable parsing types now are magic integers instead of strings
. Added benchmark for ConfigSchema
. HTMLPurifier_Generator requires $config and $context parameters. If you
don't know what they should be, use HTMLPurifier_Config::createDefault()
and new HTMLPurifier_Context().
. Printers now properly distinguish between output configuration, and
target configuration. This is not applicable to scripts using
the Printers for HTML Purifier related tasks.
. HTML/CSS Printers must be primed with prepareGenerator($gen_config), otherwise
fatal errors will ensue.
. URIFilter->prepare can return false in order to abort loading of the filter
. Factory for AttrDef_URI implemented, URI#embedded to indicate URI that embeds
an external resource.
. %URI.Munge functionality factored out into a post-filter class.
. Added CurrentCSSProperty context variable during CSS validation
3.1.0, released 2008-05-18
# Unnecessary references to objects (vestiges of PHP4) removed from method
signatures. The following methods do not need references when assigning from
them and will result in E_STRICT errors if you try:
+ HTMLPurifier_Config->get*Definition() [* = HTML, CSS]
+ HTMLPurifier_ConfigSchema::instance()
+ HTMLPurifier_DefinitionCacheFactory::instance()
+ HTMLPurifier_DefinitionCacheFactory->create()
+ HTMLPurifier_DoctypeRegistry->register()
+ HTMLPurifier_DoctypeRegistry->get()
+ HTMLPurifier_HTMLModule->addElement()
+ HTMLPurifier_HTMLModule->addBlankElement()
+ HTMLPurifier_LanguageFactory::instance()
# Printer_ConfigForm's get*() functions were static-ified
# %HTML.ForbiddenAttributes requires attribute declarations to be in the
form of tag@attr, NOT tag.attr (which will throw an error and won't do
anything). This is for forwards compatibility with XML; you'd do best
to migrate an %HTML.AllowedAttributes directives to this syntax too.
! Allow index to be false for config from form creation
! Added HTMLPurifier::VERSION constant
! Commas, not dashes, used for serializer IDs. This change is forwards-compatible
and allows for version numbers like "3.1.0-dev".
! %HTML.Allowed deals gracefully with whitespace anywhere, anytime!
! HTML Purifier's URI handling is a lot more robust, with much stricter
validation checks and better percent encoding handling. Thanks Gareth Heyes
for indicating security vulnerabilities from lax percent encoding.
! Bootstrap autoloader deals more robustly with classes that don't exist,
preventing class_exists($class, true) from barfing.
- InterchangeBuilder now alphabetizes its lists
- Validation error in configdoc output fixed
- Iconv and other encoding errors muted even with custom error handlers that
do not honor error_reporting
- Add protection against imagecrash attack with CSS height/width
- HTMLPurifier::instance() created for consistency, is equivalent to getInstance()
- Fixed and revamped broken ConfigForm smoketest
- Bug with bool/null fields in Printer_ConfigForm fixed
- Bug with global forbidden attributes fixed
- Improved error messages for allowed and forbidden HTML elements and attributes
- Missing (or null) in configdoc documentation restored
- If DOM throws and exception during parsing with PH5P (occurs in newer versions
of DOM), HTML Purifier punts to DirectLex
- Fatal error with unserialization of ScriptRequired
- Created directories are now chmod'ed properly
- Fixed bug with fallback languages in LanguageFactory
- Standalone testing setup properly with autoload
. Out-of-date documentation revised
. UTF-8 encoding check optimization as suggested by Diego
. HTMLPurifier_Error removed in favor of exceptions
. More copy() function removed; should use clone instead
. More extensive unit tests for HTMLDefinition
. assertPurification moved to central harness
. HTMLPurifier_Generator accepts $config and $context parameters during
instantiation, not runtime
. Double-quotes outside of attribute values are now unescaped
3.1.0rc1, released 2008-04-22
# Autoload support added. Internal require_once's removed in favor of an
explicit require list or autoloading. To use HTML Purifier,
you must now either use HTMLPurifier.auto.php
or HTMLPurifier.includes.php; setting the include path and including
HTMLPurifier.php is insufficient--in such cases include HTMLPurifier.autoload.php
as well to register our autoload handler (or modify your autoload function
to check HTMLPurifier_Bootstrap::getPath($class)). You can also use
HTMLPurifier.safe-includes.php for a less performance friendly but more
user-friendly library load.
# HTMLPurifier_ConfigSchema static functions are officially deprecated. Schema
information is stored in the ConfigSchema directory, and the
maintenance/generate-schema-cache.php generates the schema.ser file, which
is now instantiated. Support for userland schema changes coming soon!
# HTMLPurifier_Config will now throw E_USER_NOTICE when you use a directive
alias; to get rid of these errors just modify your configuration to use
the new directive name.
# HTMLPurifier->addFilter is deprecated; built-in filters can now be
enabled using %Filter.$filter_name or by setting your own filters using
%Filter.Custom
# Directive-level safety properties superceded in favor of module-level
safety. Internal method HTMLModule->addElement() has changed, although
the externally visible HTMLDefinition->addElement has *not* changed.
! Extra utility classes for testing and non-library operations can
be found in extras/. Specifically, these are FSTools and ConfigDoc.
You may find a use for these in your own project, but right now they
are highly experimental and volatile.
! Integration with PHPT allows for automated smoketests
! Limited support for proprietary HTML elements, namely <marquee>, sponsored
by Chris. You can enable them with %HTML.Proprietary if your client
demands them.
! Support for !important CSS cascade modifier. By default, this will be stripped
from CSS, but you can enable it using %CSS.AllowImportant
! Support for display and visibility CSS properties added, set %CSS.AllowTricky
to true to use them.
! HTML Purifier now has its own Exception hierarchy under HTMLPurifier_Exception.
Developer error (not enduser error) can cause these to be triggered.
! Experimental kses() wrapper introduced with HTMLPurifier.kses.php
! Finally %CSS.AllowedProperties for tweaking allowed CSS properties without
mucking around with HTMLPurifier_CSSDefinition
! ConfigDoc output has been enhanced with version and deprecation info.
! %HTML.ForbiddenAttributes and %HTML.ForbiddenElements implemented.
- Autoclose now operates iteratively, i.e. <span><span><div> now has
both span tags closed.
- Various HTMLPurifier_Config convenience functions now accept another parameter
$schema which defines what HTMLPurifier_ConfigSchema to use besides the
global default.
- Fix bug with trusted script handling in libxml versions later than 2.6.28.
- Fix bug in ExtractStyleBlocks with comments in style tags
- Fix bug in comment parsing for DirectLex
- Flush output now displayed when in command line mode for unit tester
- Fix bug with rgb(0, 1, 2) color syntax with spaces inside shorthand syntax
- HTMLPurifier_HTMLDefinition->addAttribute can now be called multiple times
on the same element without emitting errors.
- Fixed fatal error in PH5P lexer with invalid tag names
. Plugins now get their own changelogs according to project conventions.
. Convert tokens to use instanceof, reducing memory footprint and
improving comparison speed.
. Dry runs now supported in SimpleTest; testing facilities improved
. Bootstrap class added for handling autoloading functionality
. Implemented recursive glob at FSTools->globr
. ConfigSchema now has instance methods for all corresponding define*
static methods.
. A couple of new historical maintenance scripts were added.
. HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php split into two files
. tests/index.php can now be run from any directory.
. HTMLPurifier_Token subclasses split into seperate files
. HTMLPURIFIER_PREFIX now is defined in Bootstrap.php, NOT HTMLPurifier.php
. HTMLPURIFIER_PREFIX can now be defined outside of HTML Purifier
. New --php=php flag added, allows PHP executable to be specified (command
line only!)
. htmlpurifier_add_test() preferred method to translate test files in to
classes, because it handles PHPT files too.
. Debugger class is deprecated and will be removed soon.
. Command line argument parsing for testing scripts revamped, now --opt value
format is supported.
. Smoketests now cleanup after magic quotes
. Generator now can output comments (however, comments are still stripped
from HTML Purifier output)
. HTMLPurifier_ConfigSchema->validate() deprecated in favor of
HTMLPurifier_VarParser->parse()
. Integers auto-cast into float type by VarParser.
. HTMLPURIFIER_STRICT removed; no validation is performed on runtime, only
during cache generation
. Reordered script calls in maintenance/flush.php
. Command line scripts now honor exit codes
. When --flush fails in unit testers, abort tests and print message
. Improved documentation in docs/dev-flush.html about the maintenance scripts
. copy() methods removed in favor of clone keyword
3.0.0, released 2008-01-06
# HTML Purifier is PHP 5 only! The 2.1.x branch will be maintained
until PHP 4 is completely deprecated, but no new features will be added
to it.
+ Visibility declarations added
+ Constructor methods renamed to __construct()
+ PHP4 reference cruft removed (in progress)
! CSS properties are now case-insensitive
! DefinitionCacheFactory now can register new implementations
! New HTMLPurifier_Filter_ExtractStyleBlocks for extracting <style> from
documents and cleaning their contents up. Requires the CSSTidy library
<http://csstidy.sourceforge.net/>. You can access the blocks with the
'StyleBlocks' Context variable ($purifier->context->get('StyleBlocks')).
The output CSS can also be "scoped" for a specific element, use:
%Filter.ExtractStyleBlocksScope
! Experimental support for some proprietary CSS attributes allowed:
opacity (and all of the browser-specific equivalents) and scrollbar colors.
Enable by setting %CSS.Proprietary to true.
- Colors missing # but in hex form will be corrected
- CSS Number algorithm improved
- Unit testing and multi-testing now on steroids: command lines,
XML output, and other goodies now added.
. Unit tests for Injector improved
. New classes:
+ HTMLPurifier_AttrDef_CSS_AlphaValue
+ HTMLPurifier_AttrDef_CSS_Filter
. Multitest now has a file docblock
2.1.3, released 2007-11-05
! tests/multitest.php allows you to test multiple versions by running
tests/index.php through multiple interpreters using `phpv` shell
script (you must provide this script!)
- Fixed poor include ordering for Email URI AttrDefs, causes fatal errors
on some systems.
- Injector algorithm further refined: off-by-one error regarding skip
counts for dormant injectors fixed
- Corrective blockquote definition now enabled for HTML 4.01 Strict
- Fatal error when <img> tag (or any other element with required attributes)
has 'id' attribute fixed, thanks NykO18 for reporting
- Fix warning emitted when a non-supported URI scheme is passed to the
MakeAbsolute URIFilter, thanks NykO18 (again)
- Further refine AutoParagraph injector. Behavior inside of elements
allowing paragraph tags clarified: only inline content delimeted by
double newlines (not block elements) are paragraphed.
- Buggy treatment of end tags of elements that have required attributes
fixed (does not manifest on default tag-set)
- Spurious internal content reorganization error suppressed
- HTMLDefinition->addElement now returns a reference to the created
element object, as implied by the documentation
- Phorum mod's HTML Purifier help message expanded (unreleased elsewhere)
- Fix a theoretical class of infinite loops from DirectLex reported
by Nate Abele
- Work around unnecessary DOMElement type-cast in PH5P that caused errors
in PHP 5.1
- Work around PHP 4 SimpleTest lack-of-error complaining for one-time-only
HTMLDefinition errors, this may indicate problems with error-collecting
facilities in PHP 5
- Make ErrorCollectorEMock work in both PHP 4 and PHP 5
- Make PH5P work with PHP 5.0 by removing unnecessary array parameter typedef
. %Core.AcceptFullDocuments renamed to %Core.ConvertDocumentToFragment
to better communicate its purpose
. Error unit tests can now specify the expectation of no errors. Future
iterations of the harness will be extremely strict about what errors
are allowed
. Extend Injector hooks to allow for more powerful injector routines
. HTMLDefinition->addBlankElement created, as according to the HTMLModule
method
. Doxygen configuration file updated, with minor improvements
. Test runner now checks for similarly named files in conf/ directory too.
. Minor cosmetic change to flush-definition-cache.php: trailing newline is
outputted
. Maintenance script for generating PH5P patch added, original PH5P source
file also added under version control
. Full unit test runner script title made more descriptive with PHP version
. Updated INSTALL file to state that 4.3.7 is the earliest version we
are actively testing
2.1.2, released 2007-09-03
! Implemented Object module for trusted users
! Implemented experimental HTML5 parsing mode using PH5P. To use, add
this to your code:
require_once 'HTMLPurifier/Lexer/PH5P.php';
$config->set('Core', 'LexerImpl', 'PH5P');
Note that this Lexer introduces some classes not in the HTMLPurifier
namespace. Also, this is PHP5 only.
! CSS property border-spacing implemented
- Fix non-visible parsing error in DirectLex with empty tags that have
slashes inside attribute values.
- Fix typo in CSS definition: border-collapse:seperate; was incorrectly
accepted as valid CSS. Usually non-visible, because this styling is the
default for tables in most browsers. Thanks Brett Zamir for pointing
this out.
- Fix validation errors in configuration form
- Hammer out a bunch of edge-case bugs in the standalone distribution
- Inclusion reflection removed from URISchemeRegistry; you must manually
include any new schema files you wish to use
- Numerous typo fixes in documentation thanks to Brett Zamir
. Unit test refactoring for one logical test per test function
. Config and context parameters in ComplexHarness deprecated: instead, edit
the $config and $context member variables
. HTML wrapper in DOMLex now takes DTD identifiers into account; doesn't
really make a difference, but is good for completeness sake
. merge-library.php script refactored for greater code reusability and
PHP4 compatibility
2.1.1, released 2007-08-04
- Fix show-stopper bug in %URI.MakeAbsolute functionality
- Fix PHP4 syntax error in standalone version
. Add prefix directory to include path for standalone, this prevents
other installations from clobbering the standalone's URI schemes
. Single test methods can be invoked by prefixing with __only
2.1.0, released 2007-08-02
# flush-htmldefinition-cache.php superseded in favor of a generic
flush-definition-cache.php script, you can clear a specific cache
by passing its name as a parameter to the script
! Phorum mod implemented for HTML Purifier
! With %Core.AggressivelyFixLt, <3 and similar emoticons no longer
trigger HTML removal in PHP5 (DOMLex). This directive is not necessary
for PHP4 (DirectLex).
! Standalone file now available, which greatly reduces the amount of
includes (although there are still a few files that reside in the
standalone folder)
! Relative URIs can now be transformed into their absolute equivalents
using %URI.Base and %URI.MakeAbsolute
! Ruby implemented for XHTML 1.1
! You can now define custom URI filtering behavior, see enduser-uri-filter.html
for more details
! UTF-8 font names now supported in CSS
- AutoFormatters emit friendly error messages if tags or attributes they
need are not allowed
- ConfigForm's compactification of directive names is now configurable
- AutoParagraph autoformatter algorithm refined after field-testing
- XHTML 1.1 now applies XHTML 1.0 Strict cleanup routines, namely
blockquote wrapping
- Contents of <style> tags removed by default when tags are removed
. HTMLPurifier_Config->getSerial() implemented, this is extremely useful
for output cache invalidation
. ConfigForm printer now can retrieve CSS and JS files as strings, in
case HTML Purifier's directory is not publically accessible
. Introduce new text/itext configuration directive values: these represent
longer strings that would be more appropriately edited with a textarea
. Allow newlines to act as separators for lists, hashes, lookups and
%HTML.Allowed
. ConfigForm generates textareas instead of text inputs for lists, hashes,
lookups, text and itext fields
. Hidden element content removal genericized: %Core.HiddenElements can
be used to customize this behavior, by default <script> and <style> are
hidden
. Added HTMLPURIFIER_PREFIX constant, should be used instead of dirname(__FILE__)
. Custom ChildDef added to default include list
. URIScheme reflection improved: will not attempt to include file if class
already exists. May clobber autoload, so I need to keep an eye on it
. ConfigSchema heavily optimized, will only collect information and validate
definitions when HTMLPURIFIER_SCHEMA_STRICT is true.
. AttrDef_URI unit tests and implementation refactored
. benchmarks/ directory now protected from public view with .htaccess file;
run the tests via command line
. URI scheme is munged off if there is no authority and the scheme is the
default one
. All unit tests inherit from HTMLPurifier_Harness, not UnitTestCase
. Interface for URIScheme changed
. Generic URI object to hold components of URI added, most systems involved
in URI validation have been migrated to use it
. Custom filtering for URIs factored out to URIDefinition interface for
maximum extensibility
2.0.1, released 2007-06-27
! Tag auto-closing now based on a ChildDef heuristic rather than a
manually set auto_close array; some behavior may change
! Experimental AutoFormat functionality added: auto-paragraph and
linkify your HTML input by setting %AutoFormat.AutoParagraph and
%AutoFormat.Linkify to true
! Newlines normalized internally, and then converted back to the
value of PHP_EOL. If this is not desired, set your newline format
using %Output.Newline.
! Beta error collection, messages are implemented for the most generic
cases involving Lexing or Strategies
- Clean up special case code for <script> tags
- Reorder includes for DefinitionCache decorators, fixes a possible
missing class error
- Fixed bug where manually modified definitions were not saved via cache
(mostly harmless, except for the fact that it would be a little slower)
- Configuration objects with different serials do not clobber each
others when revision numbers are unequal
- Improve Serializer DefinitionCache directory permissions checks
- DefinitionCache no longer throws errors when it encounters old
serial files that do not conform to the current style
- Stray xmlns attributes removed from configuration documentation
- configForm.php smoketest no longer has XSS vulnerability due to
unescaped print_r output
- Printer adheres to configuration's directives on output format
- Fix improperly named form field in ConfigForm printer
. Rewire some test-cases to swallow errors rather than expect them
. HTMLDefinition printer updated with some of the new attributes
. DefinitionCache keys reordered to reflect precedence: version number,
hash, then revision number
. %Core.DefinitionCache renamed to %Cache.DefinitionImpl
. Interlinking in configuration documentation added using
Injector_PurifierLinkify
. Directives now keep track of aliases to themselves
. Error collector now requires a severity to be passed, use PHP's internal
error constants for this
. HTMLPurifier_Config::getAllowedDirectivesForForm implemented, allows
much easier selective embedding of configuration values
. Doctype objects now accept public and system DTD identifiers
. %HTML.Doctype is now constrained by specific values, to specify a custom
doctype use new %HTML.CustomDoctype
. ConfigForm truncates long directives to keep the form small, and does
not re-output namespaces
2.0.0, released 2007-06-20
# Completely refactored HTMLModuleManager, decentralizing safety
information
# Transform modules changed to Tidy modules, which offer more flexibility
and better modularization
# Configuration object now finalizes itself when a read operation is
performed on it, ensuring that its internal state stays consistent.
To revert this behavior, you can set the $autoFinalize member variable
off, but it's not recommended.
# New compact syntax for AttrDef objects that can be used to instantiate
new objects via make()
# Definitions (esp. HTMLDefinition) are now cached for a significant
performance boost. You can disable caching by setting %Core.DefinitionCache
to null. You CANNOT edit raw definitions without setting the corresponding
DefinitionID directive (%HTML.DefinitionID for HTMLDefinition).
# Contents between <script> tags are now completely removed if <script>
is not allowed
# Prototype-declarations for Lexer removed in favor of configuration
determination of Lexer implementations.
! HTML Purifier now works in PHP 4.3.2.
! Configuration form-editing API makes tweaking HTMLPurifier_Config a
breeze!
! Configuration directives that accept hashes now allow new string
format: key1:value1,key2:value2
! ConfigDoc now factored into OOP design
! All deprecated elements now natively supported
! Implement TinyMCE styled whitelist specification format in
%HTML.Allowed
! Config object gives more friendly error messages when things go wrong
! Advanced API implemented: easy functions for creating elements (addElement)
and attributes (addAttribute) on HTMLDefinition
! Add native support for required attributes
- Deprecated and removed EnableRedundantUTF8Cleaning. It didn't even work!
- DOMLex will not emit errors when a custom error handler that does not
honor error_reporting is used
- StrictBlockquote child definition refrains from wrapping whitespace
in tags now.
- Bug resulting from tag transforms to non-allowed elements fixed
- ChildDef_Custom's regex generation has been improved, removing several
false positives
. Unit test for ElementDef created, ElementDef behavior modified to
be more flexible
. Added convenience functions for HTMLModule constructors
. AttrTypes now has accessor functions that should be used instead
of directly manipulating info
. TagTransform_Center deprecated in favor of generic TagTransform_Simple
. Add extra protection in AttrDef_URI against phantom Schemes
. Doctype object added to HTMLDefinition which describes certain aspects
of the operational document type
. Lexer is now pre-emptively included, with a conditional include for the
PHP5 only version.
. HTMLDefinition and CSSDefinition have a common parent class: Definition.
. DirectLex can now track line-numbers
. Preliminary error collector is in place, although no code actually reports
errors yet
. Factor out most of ValidateAttributes to new AttrValidator class
1.6.1, released 2007-05-05
! Support for more deprecated attributes via transformations:
+ hspace and vspace in img
+ size and noshade in hr
+ nowrap in td
+ clear in br
+ align in caption, table, img and hr
+ type in ul, ol and li
! DirectLex now preserves text in which a < bracket is followed by
a non-alphanumeric character. This means that certain emoticons
are now preserved.
! %Core.RemoveInvalidImg is now operational, when set to false invalid
images will hang around with an empty src
! target attribute in a tag supported, use %Attr.AllowedFrameTargets
to enable
! CSS property white-space now allows nowrap (supported in all modern
browsers) but not others (which have spotty browser implementations)
! XHTML 1.1 mode now sort-of works without any fatal errors, and
lang is now moved over to xml:lang.
! Attribute transformation smoketest available at smoketests/attrTransform.php
! Transformation of font's size attribute now handles super-large numbers
- Possibly fatal bug with __autoload() fixed in module manager
- Invert HTMLModuleManager->addModule() processing order to check
prefixes first and then the literal module
- Empty strings get converted to empty arrays instead of arrays with
an empty string in them.
- Merging in attribute lists now works.
. Demo script removed: it has been added to the website's repository
. Basic.php script modified to work out of the box
. Refactor AttrTransform classes to reduce duplication
. AttrTransform_TextAlign axed in favor of a more general
AttrTransform_EnumToCSS, refer to HTMLModule/TransformToStrict.php to
see how the new equivalent is implemented
. Unit tests now use exclusively assertIdentical
1.6.0, released 2007-04-01
! Support for most common deprecated attributes via transformations:
+ bgcolor in td, th, tr and table
+ border in img
+ name in a and img
+ width in td, th and hr
+ height in td, th
! Support for CSS attribute 'height' added
! Support for rel and rev attributes in a tags added, use %Attr.AllowedRel
and %Attr.AllowedRev to activate
- You can define ID blacklists using regular expressions via
%Attr.IDBlacklistRegexp
- Error messages are emitted when you attempt to "allow" elements or
attributes that HTML Purifier does not support
- Fix segfault in unit test. The problem is not very reproduceable and
I don't know what causes it, but a six line patch fixed it.
1.5.0, released 2007-03-23
! Added a rudimentary I18N and L10N system modeled off MediaWiki. It
doesn't actually do anything yet, but keep your eyes peeled.
! docs/enduser-utf8.html explains how to use UTF-8 and HTML Purifier
! Newly structured HTMLDefinition modeled off of XHTML 1.1 modules.
I am loathe to release beta quality APIs, but this is exactly that;
don't use the internal interfaces if you're not willing to do migration
later on.
- Allow 'x' subtag in language codes
- Fixed buggy chameleon-support for ins and del
. Added support for IDREF attributes (i.e. for)
. Renamed HTMLPurifier_AttrDef_Class to HTMLPurifier_AttrDef_Nmtokens
. Removed context variable ParentType, replaced with IsInline, which
is false when you're not inline and an integer of the parent that
caused you to become inline when you are (so possibly zero)
. Removed ElementDef->type in favor of ElementDef->descendants_are_inline
and HTMLDefinition->content_sets
. StrictBlockquote now reports what elements its supposed to allow,
rather than what it does allow
. Removed HTMLDefinition->info_flow_elements in favor of
HTMLDefinition->content_sets['Flow']
. Removed redundant "exclusionary" definitions from DTD roster
. StrictBlockquote now requires a construction parameter as if it
were an Required ChildDef, this is the "real" set of allowed elements
. AttrDef partitioned into HTML, CSS and URI segments
. Modify Youtube filter regexp to be multiline
. Require both PHP5 and DOM extension in order to use DOMLex, fixes
some edge cases where a DOMDocument class exists in a PHP4 environment
due to DOM XML extension.
1.4.1, released 2007-01-21
! docs/enduser-youtube.html updated according to new functionality
- YouTube IDs can have underscores and dashes
1.4.0, released 2007-01-21
! Implemented list-style-image, URIs now allowed in list-style
! Implemented background-image, background-repeat, background-attachment
and background-position CSS properties. Shorthand property background
supports all of these properties.
! Configuration documentation looks nicer
! Added %Core.EscapeNonASCIICharacters to workaround loss of Unicode
characters while %Core.Encoding is set to a non-UTF-8 encoding.
! Support for configuration directive aliases added
! Config object can now be instantiated from ini files
! YouTube preservation code added to the core, with two lines of code
you can add it as a filter to your code. See smoketests/preserveYouTube.php
for sample code.
! Moved SLOW to docs/enduser-slow.html and added code examples
- Replaced version check with functionality check for DOM (thanks Stephen
Khoo)
. Added smoketest 'all.php', which loads all other smoketests via frames
. Implemented AttrDef_CSSURI for url(http://google.com) style declarations
. Added convenient single test selector form on test runner
1.3.2, released 2006-12-25
! HTMLPurifier object now accepts configuration arrays, no need to manually
instantiate a configuration object
! Context object now accessible to outside
! Added enduser-youtube.html, explains how to embed YouTube videos. See
also corresponding smoketest preserveYouTube.php.
! Added purifyArray(), which takes a list of HTML and purifies it all
! Added static member variable $version to HTML Purifier with PHP-compatible
version number string.
- Fixed fatal error thrown by upper-cased language attributes
- printDefinition.php: added labels, added better clarification
. HTMLPurifier_Config::create() added, takes mixed variable and converts into
a HTMLPurifier_Config object.
1.3.1, released 2006-12-06
! Added HTMLPurifier.func.php stub for a convenient function to call the library
- Fixed bug in RemoveInvalidImg code that caused all images to be dropped
(thanks to .mario for reporting this)
. Standardized all attribute handling variables to attr, made it plural
1.3.0, released 2006-11-26
# Invalid images are now removed, rather than replaced with a dud
<img src="" alt="Invalid image" />. Previous behavior can be restored
with new directive %Core.RemoveInvalidImg set to false.
! (X)HTML Strict now supported
+ Transparently handles inline elements in block context (blockquote)
! Added GET method to demo for easier validation, added 50kb max input size
! New directive %HTML.BlockWrapper, for block-ifying inline elements
! New directive %HTML.Parent, allows you to only allow inline content
! New directives %HTML.AllowedElements and %HTML.AllowedAttributes to let
users narrow the set of allowed tags
! <li value="4"> and <ul start="2"> now allowed in loose mode
! New directives %URI.DisableExternalResources and %URI.DisableResources
! New directive %Attr.DisableURI, which eliminates all hyperlinking
! New directive %URI.Munge, munges URI so you can use some sort of redirector
service to avoid PageRank leaks or warn users that they are exiting your site.
! Added spiffy new smoketest printDefinition.php, which lets you twiddle with
the configuration settings and see how the internal rules are affected.
! New directive %URI.HostBlacklist for blocking links to bad hosts.
xssAttacks.php smoketest updated accordingly.
- Added missing type to ChildDef_Chameleon
- Remove Tidy option from demo if there is not Tidy available
. ChildDef_Required guards against empty tags
. Lookup table HTMLDefinition->info_flow_elements added
. Added peace-of-mind variable initialization to Strategy_FixNesting
. Added HTMLPurifier->info_parent_def, parent child processing made special
. Added internal documents briefly summarizing future progression of HTML
. HTMLPurifier_Config->getBatch($namespace) added
. More lenient casting to bool from string in HTMLPurifier_ConfigSchema
. Refactored ChildDef classes into their own files
1.2.0, released 2006-11-19
# ID attributes now disabled by default. New directives:
+ %HTML.EnableAttrID - restores old behavior by allowing IDs
+ %Attr.IDPrefix - %Attr.IDBlacklist alternative that munges all user IDs
so that they don't collide with your IDs
+ %Attr.IDPrefixLocal - Same as above, but for when there are multiple
instances of user content on the page
+ Profuse documentation on how to use these available in docs/enduser-id.txt
! Added MODx plugin <http://modxcms.com/forums/index.php/topic,6604.0.html>
! Added percent encoding normalization
! XSS attacks smoketest given facelift
! Configuration documentation now has table of contents
! Added %URI.DisableExternal, which prevents links to external websites. You
can also use %URI.Host to permit absolute linking to subdomains
! Non-accessible resources (ex. mailto) blocked from embedded URIs (img src)
- Type variable in HTMLDefinition was not being set properly, fixed
- Documentation updated
+ TODO added request Phalanger
+ TODO added request Native compression
+ TODO added request Remove redundant tags
+ TODO added possible plaintext formatter for HTML Purifier documentation
+ Updated ConfigDoc TODO
+ Improved inline comments in AttrDef/Class.php, AttrDef/CSS.php
and AttrDef/Host.php
+ Revamped documentation into HTML, along with misc updates
- HTMLPurifier_Context doesn't throw a variable reference error if you attempt
to retrieve a non-existent variable
. Switched to purify()-wide Context object registry
. Refactored unit tests to minimize duplication
. XSS attack sheet updated
. configdoc.xml now has xml:space attached to default value nodes
. Allow configuration directives to permit null values
. Cleaned up test-cases to remove unnecessary swallowErrors()
1.1.2, released 2006-09-30
! Add HTMLPurifier.auto.php stub file that configures include_path
- Documentation updated
+ INSTALL document rewritten
+ TODO added semi-lossy conversion
+ API Doxygen docs' file exclusions updated
+ Added notes on HTML versus XML attribute whitespace handling
+ Noted that HTMLPurifier_ChildDef_Custom isn't being used
+ Noted that config object's definitions are cached versions
- Fixed lack of attribute parsing in HTMLPurifier_Lexer_PEARSax3
- ftp:// URIs now have their typecodes checked
- Hooked up HTMLPurifier_ChildDef_Custom's unit tests (they weren't being run)
. Line endings standardized throughout project (svn:eol-style standardized)
. Refactored parseData() to general Lexer class
. Tester named "HTML Purifier" not "HTMLPurifier"
1.1.1, released 2006-09-24
! Configuration option to optionally Tidy up output for indentation to make up
for dropped whitespace by DOMLex (pretty-printing for the entire application
should be done by a page-wide Tidy)
- Various documentation updates
- Fixed parse error in configuration documentation script
- Fixed fatal error in benchmark scripts, slightly augmented
- As far as possible, whitespace is preserved in-between table children
- Sample test-settings.php file included
1.1.0, released 2006-09-16
! Directive documentation generation using XSLT
! XHTML can now be turned off, output becomes <br>
- Made URI validator more forgiving: will ignore leading and trailing
quotes, apostrophes and less than or greater than signs.
- Enforce alphanumeric namespace and directive names for configuration.
- Table child definition made more flexible, will fix up poorly ordered elements
. Renamed ConfigDef to ConfigSchema
1.0.1, released 2006-09-04
- Fixed slight bug in DOMLex attribute parsing
- Fixed rejection of case-insensitive configuration values when there is a
set of allowed values. This manifested in %Core.Encoding.
- Fixed rejection of inline style declarations that had lots of extra
space in them. This manifested in TinyMCE.
1.0.0, released 2006-09-01
! Shorthand CSS properties implemented: font, border, background, list-style
! Basic color keywords translated into hexadecimal values
! Table CSS properties implemented
! Support for charsets other than UTF-8 (defined by iconv)
! Malformed UTF-8 and non-SGML character detection and cleaning implemented
- Fixed broken numeric entity conversion
- API documentation completed
. (HTML|CSS)Definition de-singleton-ized
1.0.0beta, released 2006-08-16
! First public release, most functionality implemented. Notable omissions are:
+ Shorthand CSS properties
+ Table CSS properties
+ Deprecated attribute transformations
vim: et sw=4 sts=4

View File

@ -0,0 +1,11 @@
<?php
/**
* This is a stub include that automatically configures the include path.
*/
set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
require_once 'HTMLPurifier/Bootstrap.php';
require_once 'HTMLPurifier.autoload.php';
// vim: et sw=4 sts=4

View File

@ -0,0 +1,21 @@
<?php
/**
* @file
* Convenience file that registers autoload handler for HTML Purifier.
*/
if (function_exists('spl_autoload_register') && function_exists('spl_autoload_unregister')) {
// We need unregister for our pre-registering functionality
HTMLPurifier_Bootstrap::registerAutoload();
if (function_exists('__autoload')) {
// Be polite and ensure that userland autoload gets retained
spl_autoload_register('__autoload');
}
} elseif (!function_exists('__autoload')) {
function __autoload($class) {
return HTMLPurifier_Bootstrap::autoload($class);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,23 @@
<?php
/**
* @file
* Defines a function wrapper for HTML Purifier for quick use.
* @note ''HTMLPurifier()'' is NOT the same as ''new HTMLPurifier()''
*/
/**
* Purify HTML.
* @param $html String HTML to purify
* @param $config Configuration to use, can be any value accepted by
* HTMLPurifier_Config::create()
*/
function HTMLPurifier($html, $config = null) {
static $purifier = false;
if (!$purifier) {
$purifier = new HTMLPurifier();
}
return $purifier->purify($html, $config);
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,206 @@
<?php
/**
* @file
* This file was auto-generated by generate-includes.php and includes all of
* the core files required by HTML Purifier. Use this if performance is a
* primary concern and you are using an opcode cache. PLEASE DO NOT EDIT THIS
* FILE, changes will be overwritten the next time the script is run.
*
* @version 3.3.0
*
* @warning
* You must *not* include any other HTML Purifier files before this file,
* because 'require' not 'require_once' is used.
*
* @warning
* This file requires that the include path contains the HTML Purifier
* library directory; this is not auto-set.
*/
require 'HTMLPurifier.php';
require 'HTMLPurifier/AttrCollections.php';
require 'HTMLPurifier/AttrDef.php';
require 'HTMLPurifier/AttrTransform.php';
require 'HTMLPurifier/AttrTypes.php';
require 'HTMLPurifier/AttrValidator.php';
require 'HTMLPurifier/Bootstrap.php';
require 'HTMLPurifier/Definition.php';
require 'HTMLPurifier/CSSDefinition.php';
require 'HTMLPurifier/ChildDef.php';
require 'HTMLPurifier/Config.php';
require 'HTMLPurifier/ConfigSchema.php';
require 'HTMLPurifier/ContentSets.php';
require 'HTMLPurifier/Context.php';
require 'HTMLPurifier/DefinitionCache.php';
require 'HTMLPurifier/DefinitionCacheFactory.php';
require 'HTMLPurifier/Doctype.php';
require 'HTMLPurifier/DoctypeRegistry.php';
require 'HTMLPurifier/ElementDef.php';
require 'HTMLPurifier/Encoder.php';
require 'HTMLPurifier/EntityLookup.php';
require 'HTMLPurifier/EntityParser.php';
require 'HTMLPurifier/ErrorCollector.php';
require 'HTMLPurifier/ErrorStruct.php';
require 'HTMLPurifier/Exception.php';
require 'HTMLPurifier/Filter.php';
require 'HTMLPurifier/Generator.php';
require 'HTMLPurifier/HTMLDefinition.php';
require 'HTMLPurifier/HTMLModule.php';
require 'HTMLPurifier/HTMLModuleManager.php';
require 'HTMLPurifier/IDAccumulator.php';
require 'HTMLPurifier/Injector.php';
require 'HTMLPurifier/Language.php';
require 'HTMLPurifier/LanguageFactory.php';
require 'HTMLPurifier/Length.php';
require 'HTMLPurifier/Lexer.php';
require 'HTMLPurifier/PercentEncoder.php';
require 'HTMLPurifier/PropertyList.php';
require 'HTMLPurifier/PropertyListIterator.php';
require 'HTMLPurifier/Strategy.php';
require 'HTMLPurifier/StringHash.php';
require 'HTMLPurifier/StringHashParser.php';
require 'HTMLPurifier/TagTransform.php';
require 'HTMLPurifier/Token.php';
require 'HTMLPurifier/TokenFactory.php';
require 'HTMLPurifier/URI.php';
require 'HTMLPurifier/URIDefinition.php';
require 'HTMLPurifier/URIFilter.php';
require 'HTMLPurifier/URIParser.php';
require 'HTMLPurifier/URIScheme.php';
require 'HTMLPurifier/URISchemeRegistry.php';
require 'HTMLPurifier/UnitConverter.php';
require 'HTMLPurifier/VarParser.php';
require 'HTMLPurifier/VarParserException.php';
require 'HTMLPurifier/AttrDef/CSS.php';
require 'HTMLPurifier/AttrDef/Enum.php';
require 'HTMLPurifier/AttrDef/Integer.php';
require 'HTMLPurifier/AttrDef/Lang.php';
require 'HTMLPurifier/AttrDef/Switch.php';
require 'HTMLPurifier/AttrDef/Text.php';
require 'HTMLPurifier/AttrDef/URI.php';
require 'HTMLPurifier/AttrDef/CSS/Number.php';
require 'HTMLPurifier/AttrDef/CSS/AlphaValue.php';
require 'HTMLPurifier/AttrDef/CSS/Background.php';
require 'HTMLPurifier/AttrDef/CSS/BackgroundPosition.php';
require 'HTMLPurifier/AttrDef/CSS/Border.php';
require 'HTMLPurifier/AttrDef/CSS/Color.php';
require 'HTMLPurifier/AttrDef/CSS/Composite.php';
require 'HTMLPurifier/AttrDef/CSS/DenyElementDecorator.php';
require 'HTMLPurifier/AttrDef/CSS/Filter.php';
require 'HTMLPurifier/AttrDef/CSS/Font.php';
require 'HTMLPurifier/AttrDef/CSS/FontFamily.php';
require 'HTMLPurifier/AttrDef/CSS/ImportantDecorator.php';
require 'HTMLPurifier/AttrDef/CSS/Length.php';
require 'HTMLPurifier/AttrDef/CSS/ListStyle.php';
require 'HTMLPurifier/AttrDef/CSS/Multiple.php';
require 'HTMLPurifier/AttrDef/CSS/Percentage.php';
require 'HTMLPurifier/AttrDef/CSS/TextDecoration.php';
require 'HTMLPurifier/AttrDef/CSS/URI.php';
require 'HTMLPurifier/AttrDef/HTML/Bool.php';
require 'HTMLPurifier/AttrDef/HTML/Color.php';
require 'HTMLPurifier/AttrDef/HTML/FrameTarget.php';
require 'HTMLPurifier/AttrDef/HTML/ID.php';
require 'HTMLPurifier/AttrDef/HTML/Pixels.php';
require 'HTMLPurifier/AttrDef/HTML/Length.php';
require 'HTMLPurifier/AttrDef/HTML/LinkTypes.php';
require 'HTMLPurifier/AttrDef/HTML/MultiLength.php';
require 'HTMLPurifier/AttrDef/HTML/Nmtokens.php';
require 'HTMLPurifier/AttrDef/URI/Email.php';
require 'HTMLPurifier/AttrDef/URI/Host.php';
require 'HTMLPurifier/AttrDef/URI/IPv4.php';
require 'HTMLPurifier/AttrDef/URI/IPv6.php';
require 'HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php';
require 'HTMLPurifier/AttrTransform/Background.php';
require 'HTMLPurifier/AttrTransform/BdoDir.php';
require 'HTMLPurifier/AttrTransform/BgColor.php';
require 'HTMLPurifier/AttrTransform/BoolToCSS.php';
require 'HTMLPurifier/AttrTransform/Border.php';
require 'HTMLPurifier/AttrTransform/EnumToCSS.php';
require 'HTMLPurifier/AttrTransform/ImgRequired.php';
require 'HTMLPurifier/AttrTransform/ImgSpace.php';
require 'HTMLPurifier/AttrTransform/Input.php';
require 'HTMLPurifier/AttrTransform/Lang.php';
require 'HTMLPurifier/AttrTransform/Length.php';
require 'HTMLPurifier/AttrTransform/Name.php';
require 'HTMLPurifier/AttrTransform/SafeEmbed.php';
require 'HTMLPurifier/AttrTransform/SafeObject.php';
require 'HTMLPurifier/AttrTransform/SafeParam.php';
require 'HTMLPurifier/AttrTransform/ScriptRequired.php';
require 'HTMLPurifier/AttrTransform/Textarea.php';
require 'HTMLPurifier/ChildDef/Chameleon.php';
require 'HTMLPurifier/ChildDef/Custom.php';
require 'HTMLPurifier/ChildDef/Empty.php';
require 'HTMLPurifier/ChildDef/Required.php';
require 'HTMLPurifier/ChildDef/Optional.php';
require 'HTMLPurifier/ChildDef/StrictBlockquote.php';
require 'HTMLPurifier/ChildDef/Table.php';
require 'HTMLPurifier/DefinitionCache/Decorator.php';
require 'HTMLPurifier/DefinitionCache/Null.php';
require 'HTMLPurifier/DefinitionCache/Serializer.php';
require 'HTMLPurifier/DefinitionCache/Decorator/Cleanup.php';
require 'HTMLPurifier/DefinitionCache/Decorator/Memory.php';
require 'HTMLPurifier/HTMLModule/Bdo.php';
require 'HTMLPurifier/HTMLModule/CommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Edit.php';
require 'HTMLPurifier/HTMLModule/Forms.php';
require 'HTMLPurifier/HTMLModule/Hypertext.php';
require 'HTMLPurifier/HTMLModule/Image.php';
require 'HTMLPurifier/HTMLModule/Legacy.php';
require 'HTMLPurifier/HTMLModule/List.php';
require 'HTMLPurifier/HTMLModule/Name.php';
require 'HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Object.php';
require 'HTMLPurifier/HTMLModule/Presentation.php';
require 'HTMLPurifier/HTMLModule/Proprietary.php';
require 'HTMLPurifier/HTMLModule/Ruby.php';
require 'HTMLPurifier/HTMLModule/SafeEmbed.php';
require 'HTMLPurifier/HTMLModule/SafeObject.php';
require 'HTMLPurifier/HTMLModule/Scripting.php';
require 'HTMLPurifier/HTMLModule/StyleAttribute.php';
require 'HTMLPurifier/HTMLModule/Tables.php';
require 'HTMLPurifier/HTMLModule/Target.php';
require 'HTMLPurifier/HTMLModule/Text.php';
require 'HTMLPurifier/HTMLModule/Tidy.php';
require 'HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Tidy/Name.php';
require 'HTMLPurifier/HTMLModule/Tidy/Proprietary.php';
require 'HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php';
require 'HTMLPurifier/HTMLModule/Tidy/Strict.php';
require 'HTMLPurifier/HTMLModule/Tidy/Transitional.php';
require 'HTMLPurifier/HTMLModule/Tidy/XHTML.php';
require 'HTMLPurifier/Injector/AutoParagraph.php';
require 'HTMLPurifier/Injector/DisplayLinkURI.php';
require 'HTMLPurifier/Injector/Linkify.php';
require 'HTMLPurifier/Injector/PurifierLinkify.php';
require 'HTMLPurifier/Injector/RemoveEmpty.php';
require 'HTMLPurifier/Injector/SafeObject.php';
require 'HTMLPurifier/Lexer/DOMLex.php';
require 'HTMLPurifier/Lexer/DirectLex.php';
require 'HTMLPurifier/Strategy/Composite.php';
require 'HTMLPurifier/Strategy/Core.php';
require 'HTMLPurifier/Strategy/FixNesting.php';
require 'HTMLPurifier/Strategy/MakeWellFormed.php';
require 'HTMLPurifier/Strategy/RemoveForeignElements.php';
require 'HTMLPurifier/Strategy/ValidateAttributes.php';
require 'HTMLPurifier/TagTransform/Font.php';
require 'HTMLPurifier/TagTransform/Simple.php';
require 'HTMLPurifier/Token/Comment.php';
require 'HTMLPurifier/Token/Tag.php';
require 'HTMLPurifier/Token/Empty.php';
require 'HTMLPurifier/Token/End.php';
require 'HTMLPurifier/Token/Start.php';
require 'HTMLPurifier/Token/Text.php';
require 'HTMLPurifier/URIFilter/DisableExternal.php';
require 'HTMLPurifier/URIFilter/DisableExternalResources.php';
require 'HTMLPurifier/URIFilter/HostBlacklist.php';
require 'HTMLPurifier/URIFilter/MakeAbsolute.php';
require 'HTMLPurifier/URIFilter/Munge.php';
require 'HTMLPurifier/URIScheme/ftp.php';
require 'HTMLPurifier/URIScheme/http.php';
require 'HTMLPurifier/URIScheme/https.php';
require 'HTMLPurifier/URIScheme/mailto.php';
require 'HTMLPurifier/URIScheme/news.php';
require 'HTMLPurifier/URIScheme/nntp.php';
require 'HTMLPurifier/VarParser/Flexible.php';
require 'HTMLPurifier/VarParser/Native.php';

View File

@ -0,0 +1,30 @@
<?php
/**
* @file
* Emulation layer for code that used kses(), substituting in HTML Purifier.
*/
require_once dirname(__FILE__) . '/HTMLPurifier.auto.php';
function kses($string, $allowed_html, $allowed_protocols = null) {
$config = HTMLPurifier_Config::createDefault();
$allowed_elements = array();
$allowed_attributes = array();
foreach ($allowed_html as $element => $attributes) {
$allowed_elements[$element] = true;
foreach ($attributes as $attribute => $x) {
$allowed_attributes["$element.$attribute"] = true;
}
}
$config->set('HTML', 'AllowedElements', $allowed_elements);
$config->set('HTML', 'AllowedAttributes', $allowed_attributes);
$allowed_schemes = array();
if ($allowed_protocols !== null) {
$config->set('URI', 'AllowedSchemes', $allowed_protocols);
}
$purifier = new HTMLPurifier($config);
return $purifier->purify($string);
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,11 @@
<?php
/**
* @file
* Convenience stub file that adds HTML Purifier's library file to the path
* without any other side-effects.
*/
set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
// vim: et sw=4 sts=4

View File

@ -0,0 +1,236 @@
<?php
/*! @mainpage
*
* HTML Purifier is an HTML filter that will take an arbitrary snippet of
* HTML and rigorously test, validate and filter it into a version that
* is safe for output onto webpages. It achieves this by:
*
* -# Lexing (parsing into tokens) the document,
* -# Executing various strategies on the tokens:
* -# Removing all elements not in the whitelist,
* -# Making the tokens well-formed,
* -# Fixing the nesting of the nodes, and
* -# Validating attributes of the nodes; and
* -# Generating HTML from the purified tokens.
*
* However, most users will only need to interface with the HTMLPurifier
* and HTMLPurifier_Config.
*/
/*
HTML Purifier 3.3.0 - Standards Compliant HTML Filtering
Copyright (C) 2006-2008 Edward Z. Yang
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
/**
* Facade that coordinates HTML Purifier's subsystems in order to purify HTML.
*
* @note There are several points in which configuration can be specified
* for HTML Purifier. The precedence of these (from lowest to
* highest) is as follows:
* -# Instance: new HTMLPurifier($config)
* -# Invocation: purify($html, $config)
* These configurations are entirely independent of each other and
* are *not* merged (this behavior may change in the future).
*
* @todo We need an easier way to inject strategies using the configuration
* object.
*/
class HTMLPurifier
{
/** Version of HTML Purifier */
public $version = '3.3.0';
/** Constant with version of HTML Purifier */
const VERSION = '3.3.0';
/** Global configuration object */
public $config;
/** Array of extra HTMLPurifier_Filter objects to run on HTML, for backwards compatibility */
private $filters = array();
/** Single instance of HTML Purifier */
private static $instance;
protected $strategy, $generator;
/**
* Resultant HTMLPurifier_Context of last run purification. Is an array
* of contexts if the last called method was purifyArray().
*/
public $context;
/**
* Initializes the purifier.
* @param $config Optional HTMLPurifier_Config object for all instances of
* the purifier, if omitted, a default configuration is
* supplied (which can be overridden on a per-use basis).
* The parameter can also be any type that
* HTMLPurifier_Config::create() supports.
*/
public function __construct($config = null) {
$this->config = HTMLPurifier_Config::create($config);
$this->strategy = new HTMLPurifier_Strategy_Core();
}
/**
* Adds a filter to process the output. First come first serve
* @param $filter HTMLPurifier_Filter object
*/
public function addFilter($filter) {
trigger_error('HTMLPurifier->addFilter() is deprecated, use configuration directives in the Filter namespace or Filter.Custom', E_USER_WARNING);
$this->filters[] = $filter;
}
/**
* Filters an HTML snippet/document to be XSS-free and standards-compliant.
*
* @param $html String of HTML to purify
* @param $config HTMLPurifier_Config object for this operation, if omitted,
* defaults to the config object specified during this
* object's construction. The parameter can also be any type
* that HTMLPurifier_Config::create() supports.
* @return Purified HTML
*/
public function purify($html, $config = null) {
// :TODO: make the config merge in, instead of replace
$config = $config ? HTMLPurifier_Config::create($config) : $this->config;
// implementation is partially environment dependant, partially
// configuration dependant
$lexer = HTMLPurifier_Lexer::create($config);
$context = new HTMLPurifier_Context();
// setup HTML generator
$this->generator = new HTMLPurifier_Generator($config, $context);
$context->register('Generator', $this->generator);
// set up global context variables
if ($config->get('Core', 'CollectErrors')) {
// may get moved out if other facilities use it
$language_factory = HTMLPurifier_LanguageFactory::instance();
$language = $language_factory->create($config, $context);
$context->register('Locale', $language);
$error_collector = new HTMLPurifier_ErrorCollector($context);
$context->register('ErrorCollector', $error_collector);
}
// setup id_accumulator context, necessary due to the fact that
// AttrValidator can be called from many places
$id_accumulator = HTMLPurifier_IDAccumulator::build($config, $context);
$context->register('IDAccumulator', $id_accumulator);
$html = HTMLPurifier_Encoder::convertToUTF8($html, $config, $context);
// setup filters
$filter_flags = $config->getBatch('Filter');
$custom_filters = $filter_flags['Custom'];
unset($filter_flags['Custom']);
$filters = array();
foreach ($filter_flags as $filter => $flag) {
if (!$flag) continue;
$class = "HTMLPurifier_Filter_$filter";
$filters[] = new $class;
}
foreach ($custom_filters as $filter) {
// maybe "HTMLPurifier_Filter_$filter", but be consistent with AutoFormat
$filters[] = $filter;
}
$filters = array_merge($filters, $this->filters);
// maybe prepare(), but later
for ($i = 0, $filter_size = count($filters); $i < $filter_size; $i++) {
$html = $filters[$i]->preFilter($html, $config, $context);
}
// purified HTML
$html =
$this->generator->generateFromTokens(
// list of tokens
$this->strategy->execute(
// list of un-purified tokens
$lexer->tokenizeHTML(
// un-purified HTML
$html, $config, $context
),
$config, $context
)
);
for ($i = $filter_size - 1; $i >= 0; $i--) {
$html = $filters[$i]->postFilter($html, $config, $context);
}
$html = HTMLPurifier_Encoder::convertFromUTF8($html, $config, $context);
$this->context =& $context;
return $html;
}
/**
* Filters an array of HTML snippets
* @param $config Optional HTMLPurifier_Config object for this operation.
* See HTMLPurifier::purify() for more details.
* @return Array of purified HTML
*/
public function purifyArray($array_of_html, $config = null) {
$context_array = array();
foreach ($array_of_html as $key => $html) {
$array_of_html[$key] = $this->purify($html, $config);
$context_array[$key] = $this->context;
}
$this->context = $context_array;
return $array_of_html;
}
/**
* Singleton for enforcing just one HTML Purifier in your system
* @param $prototype Optional prototype HTMLPurifier instance to
* overload singleton with, or HTMLPurifier_Config
* instance to configure the generated version with.
*/
public static function instance($prototype = null) {
if (!self::$instance || $prototype) {
if ($prototype instanceof HTMLPurifier) {
self::$instance = $prototype;
} elseif ($prototype) {
self::$instance = new HTMLPurifier($prototype);
} else {
self::$instance = new HTMLPurifier();
}
}
return self::$instance;
}
/**
* @note Backwards compatibility, see instance()
*/
public static function getInstance($prototype = null) {
return HTMLPurifier::instance($prototype);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,200 @@
<?php
/**
* @file
* This file was auto-generated by generate-includes.php and includes all of
* the core files required by HTML Purifier. This is a convenience stub that
* includes all files using dirname(__FILE__) and require_once. PLEASE DO NOT
* EDIT THIS FILE, changes will be overwritten the next time the script is run.
*
* Changes to include_path are not necessary.
*/
$__dir = dirname(__FILE__);
require_once $__dir . '/HTMLPurifier.php';
require_once $__dir . '/HTMLPurifier/AttrCollections.php';
require_once $__dir . '/HTMLPurifier/AttrDef.php';
require_once $__dir . '/HTMLPurifier/AttrTransform.php';
require_once $__dir . '/HTMLPurifier/AttrTypes.php';
require_once $__dir . '/HTMLPurifier/AttrValidator.php';
require_once $__dir . '/HTMLPurifier/Bootstrap.php';
require_once $__dir . '/HTMLPurifier/Definition.php';
require_once $__dir . '/HTMLPurifier/CSSDefinition.php';
require_once $__dir . '/HTMLPurifier/ChildDef.php';
require_once $__dir . '/HTMLPurifier/Config.php';
require_once $__dir . '/HTMLPurifier/ConfigSchema.php';
require_once $__dir . '/HTMLPurifier/ContentSets.php';
require_once $__dir . '/HTMLPurifier/Context.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache.php';
require_once $__dir . '/HTMLPurifier/DefinitionCacheFactory.php';
require_once $__dir . '/HTMLPurifier/Doctype.php';
require_once $__dir . '/HTMLPurifier/DoctypeRegistry.php';
require_once $__dir . '/HTMLPurifier/ElementDef.php';
require_once $__dir . '/HTMLPurifier/Encoder.php';
require_once $__dir . '/HTMLPurifier/EntityLookup.php';
require_once $__dir . '/HTMLPurifier/EntityParser.php';
require_once $__dir . '/HTMLPurifier/ErrorCollector.php';
require_once $__dir . '/HTMLPurifier/ErrorStruct.php';
require_once $__dir . '/HTMLPurifier/Exception.php';
require_once $__dir . '/HTMLPurifier/Filter.php';
require_once $__dir . '/HTMLPurifier/Generator.php';
require_once $__dir . '/HTMLPurifier/HTMLDefinition.php';
require_once $__dir . '/HTMLPurifier/HTMLModule.php';
require_once $__dir . '/HTMLPurifier/HTMLModuleManager.php';
require_once $__dir . '/HTMLPurifier/IDAccumulator.php';
require_once $__dir . '/HTMLPurifier/Injector.php';
require_once $__dir . '/HTMLPurifier/Language.php';
require_once $__dir . '/HTMLPurifier/LanguageFactory.php';
require_once $__dir . '/HTMLPurifier/Length.php';
require_once $__dir . '/HTMLPurifier/Lexer.php';
require_once $__dir . '/HTMLPurifier/PercentEncoder.php';
require_once $__dir . '/HTMLPurifier/PropertyList.php';
require_once $__dir . '/HTMLPurifier/PropertyListIterator.php';
require_once $__dir . '/HTMLPurifier/Strategy.php';
require_once $__dir . '/HTMLPurifier/StringHash.php';
require_once $__dir . '/HTMLPurifier/StringHashParser.php';
require_once $__dir . '/HTMLPurifier/TagTransform.php';
require_once $__dir . '/HTMLPurifier/Token.php';
require_once $__dir . '/HTMLPurifier/TokenFactory.php';
require_once $__dir . '/HTMLPurifier/URI.php';
require_once $__dir . '/HTMLPurifier/URIDefinition.php';
require_once $__dir . '/HTMLPurifier/URIFilter.php';
require_once $__dir . '/HTMLPurifier/URIParser.php';
require_once $__dir . '/HTMLPurifier/URIScheme.php';
require_once $__dir . '/HTMLPurifier/URISchemeRegistry.php';
require_once $__dir . '/HTMLPurifier/UnitConverter.php';
require_once $__dir . '/HTMLPurifier/VarParser.php';
require_once $__dir . '/HTMLPurifier/VarParserException.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Enum.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Integer.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Lang.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Switch.php';
require_once $__dir . '/HTMLPurifier/AttrDef/Text.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Number.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/AlphaValue.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Background.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/BackgroundPosition.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Border.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Color.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Composite.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/DenyElementDecorator.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Filter.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Font.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/FontFamily.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/ImportantDecorator.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Length.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/ListStyle.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Multiple.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/Percentage.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/TextDecoration.php';
require_once $__dir . '/HTMLPurifier/AttrDef/CSS/URI.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Bool.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Color.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/FrameTarget.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/ID.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Pixels.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Length.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/LinkTypes.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/MultiLength.php';
require_once $__dir . '/HTMLPurifier/AttrDef/HTML/Nmtokens.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/Email.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/Host.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/IPv4.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/IPv6.php';
require_once $__dir . '/HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Background.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BdoDir.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BgColor.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/BoolToCSS.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Border.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/EnumToCSS.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ImgRequired.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ImgSpace.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Input.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Lang.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Length.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Name.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/SafeEmbed.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/SafeObject.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/SafeParam.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/ScriptRequired.php';
require_once $__dir . '/HTMLPurifier/AttrTransform/Textarea.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Chameleon.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Custom.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Empty.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Required.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Optional.php';
require_once $__dir . '/HTMLPurifier/ChildDef/StrictBlockquote.php';
require_once $__dir . '/HTMLPurifier/ChildDef/Table.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Decorator.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Null.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Serializer.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Decorator/Cleanup.php';
require_once $__dir . '/HTMLPurifier/DefinitionCache/Decorator/Memory.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Bdo.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/CommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Edit.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Forms.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Hypertext.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Image.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Legacy.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/List.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Name.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Object.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Presentation.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Proprietary.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Ruby.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/SafeEmbed.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/SafeObject.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Scripting.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/StyleAttribute.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tables.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Target.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Text.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Name.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Proprietary.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Strict.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/Transitional.php';
require_once $__dir . '/HTMLPurifier/HTMLModule/Tidy/XHTML.php';
require_once $__dir . '/HTMLPurifier/Injector/AutoParagraph.php';
require_once $__dir . '/HTMLPurifier/Injector/DisplayLinkURI.php';
require_once $__dir . '/HTMLPurifier/Injector/Linkify.php';
require_once $__dir . '/HTMLPurifier/Injector/PurifierLinkify.php';
require_once $__dir . '/HTMLPurifier/Injector/RemoveEmpty.php';
require_once $__dir . '/HTMLPurifier/Injector/SafeObject.php';
require_once $__dir . '/HTMLPurifier/Lexer/DOMLex.php';
require_once $__dir . '/HTMLPurifier/Lexer/DirectLex.php';
require_once $__dir . '/HTMLPurifier/Strategy/Composite.php';
require_once $__dir . '/HTMLPurifier/Strategy/Core.php';
require_once $__dir . '/HTMLPurifier/Strategy/FixNesting.php';
require_once $__dir . '/HTMLPurifier/Strategy/MakeWellFormed.php';
require_once $__dir . '/HTMLPurifier/Strategy/RemoveForeignElements.php';
require_once $__dir . '/HTMLPurifier/Strategy/ValidateAttributes.php';
require_once $__dir . '/HTMLPurifier/TagTransform/Font.php';
require_once $__dir . '/HTMLPurifier/TagTransform/Simple.php';
require_once $__dir . '/HTMLPurifier/Token/Comment.php';
require_once $__dir . '/HTMLPurifier/Token/Tag.php';
require_once $__dir . '/HTMLPurifier/Token/Empty.php';
require_once $__dir . '/HTMLPurifier/Token/End.php';
require_once $__dir . '/HTMLPurifier/Token/Start.php';
require_once $__dir . '/HTMLPurifier/Token/Text.php';
require_once $__dir . '/HTMLPurifier/URIFilter/DisableExternal.php';
require_once $__dir . '/HTMLPurifier/URIFilter/DisableExternalResources.php';
require_once $__dir . '/HTMLPurifier/URIFilter/HostBlacklist.php';
require_once $__dir . '/HTMLPurifier/URIFilter/MakeAbsolute.php';
require_once $__dir . '/HTMLPurifier/URIFilter/Munge.php';
require_once $__dir . '/HTMLPurifier/URIScheme/ftp.php';
require_once $__dir . '/HTMLPurifier/URIScheme/http.php';
require_once $__dir . '/HTMLPurifier/URIScheme/https.php';
require_once $__dir . '/HTMLPurifier/URIScheme/mailto.php';
require_once $__dir . '/HTMLPurifier/URIScheme/news.php';
require_once $__dir . '/HTMLPurifier/URIScheme/nntp.php';
require_once $__dir . '/HTMLPurifier/VarParser/Flexible.php';
require_once $__dir . '/HTMLPurifier/VarParser/Native.php';

View File

@ -0,0 +1,128 @@
<?php
/**
* Defines common attribute collections that modules reference
*/
class HTMLPurifier_AttrCollections
{
/**
* Associative array of attribute collections, indexed by name
*/
public $info = array();
/**
* Performs all expansions on internal data for use by other inclusions
* It also collects all attribute collection extensions from
* modules
* @param $attr_types HTMLPurifier_AttrTypes instance
* @param $modules Hash array of HTMLPurifier_HTMLModule members
*/
public function __construct($attr_types, $modules) {
// load extensions from the modules
foreach ($modules as $module) {
foreach ($module->attr_collections as $coll_i => $coll) {
if (!isset($this->info[$coll_i])) {
$this->info[$coll_i] = array();
}
foreach ($coll as $attr_i => $attr) {
if ($attr_i === 0 && isset($this->info[$coll_i][$attr_i])) {
// merge in includes
$this->info[$coll_i][$attr_i] = array_merge(
$this->info[$coll_i][$attr_i], $attr);
continue;
}
$this->info[$coll_i][$attr_i] = $attr;
}
}
}
// perform internal expansions and inclusions
foreach ($this->info as $name => $attr) {
// merge attribute collections that include others
$this->performInclusions($this->info[$name]);
// replace string identifiers with actual attribute objects
$this->expandIdentifiers($this->info[$name], $attr_types);
}
}
/**
* Takes a reference to an attribute associative array and performs
* all inclusions specified by the zero index.
* @param &$attr Reference to attribute array
*/
public function performInclusions(&$attr) {
if (!isset($attr[0])) return;
$merge = $attr[0];
$seen = array(); // recursion guard
// loop through all the inclusions
for ($i = 0; isset($merge[$i]); $i++) {
if (isset($seen[$merge[$i]])) continue;
$seen[$merge[$i]] = true;
// foreach attribute of the inclusion, copy it over
if (!isset($this->info[$merge[$i]])) continue;
foreach ($this->info[$merge[$i]] as $key => $value) {
if (isset($attr[$key])) continue; // also catches more inclusions
$attr[$key] = $value;
}
if (isset($this->info[$merge[$i]][0])) {
// recursion
$merge = array_merge($merge, $this->info[$merge[$i]][0]);
}
}
unset($attr[0]);
}
/**
* Expands all string identifiers in an attribute array by replacing
* them with the appropriate values inside HTMLPurifier_AttrTypes
* @param &$attr Reference to attribute array
* @param $attr_types HTMLPurifier_AttrTypes instance
*/
public function expandIdentifiers(&$attr, $attr_types) {
// because foreach will process new elements we add, make sure we
// skip duplicates
$processed = array();
foreach ($attr as $def_i => $def) {
// skip inclusions
if ($def_i === 0) continue;
if (isset($processed[$def_i])) continue;
// determine whether or not attribute is required
if ($required = (strpos($def_i, '*') !== false)) {
// rename the definition
unset($attr[$def_i]);
$def_i = trim($def_i, '*');
$attr[$def_i] = $def;
}
$processed[$def_i] = true;
// if we've already got a literal object, move on
if (is_object($def)) {
// preserve previous required
$attr[$def_i]->required = ($required || $attr[$def_i]->required);
continue;
}
if ($def === false) {
unset($attr[$def_i]);
continue;
}
if ($t = $attr_types->get($def)) {
$attr[$def_i] = $t;
$attr[$def_i]->required = $required;
} else {
unset($attr[$def_i]);
}
}
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,87 @@
<?php
/**
* Base class for all validating attribute definitions.
*
* This family of classes forms the core for not only HTML attribute validation,
* but also any sort of string that needs to be validated or cleaned (which
* means CSS properties and composite definitions are defined here too).
* Besides defining (through code) what precisely makes the string valid,
* subclasses are also responsible for cleaning the code if possible.
*/
abstract class HTMLPurifier_AttrDef
{
/**
* Tells us whether or not an HTML attribute is minimized. Has no
* meaning in other contexts.
*/
public $minimized = false;
/**
* Tells us whether or not an HTML attribute is required. Has no
* meaning in other contexts
*/
public $required = false;
/**
* Validates and cleans passed string according to a definition.
*
* @param $string String to be validated and cleaned.
* @param $config Mandatory HTMLPurifier_Config object.
* @param $context Mandatory HTMLPurifier_AttrContext object.
*/
abstract public function validate($string, $config, $context);
/**
* Convenience method that parses a string as if it were CDATA.
*
* This method process a string in the manner specified at
* <http://www.w3.org/TR/html4/types.html#h-6.2> by removing
* leading and trailing whitespace, ignoring line feeds, and replacing
* carriage returns and tabs with spaces. While most useful for HTML
* attributes specified as CDATA, it can also be applied to most CSS
* values.
*
* @note This method is not entirely standards compliant, as trim() removes
* more types of whitespace than specified in the spec. In practice,
* this is rarely a problem, as those extra characters usually have
* already been removed by HTMLPurifier_Encoder.
*
* @warning This processing is inconsistent with XML's whitespace handling
* as specified by section 3.3.3 and referenced XHTML 1.0 section
* 4.7. However, note that we are NOT necessarily
* parsing XML, thus, this behavior may still be correct. We
* assume that newlines have been normalized.
*/
public function parseCDATA($string) {
$string = trim($string);
$string = str_replace(array("\n", "\t", "\r"), ' ', $string);
return $string;
}
/**
* Factory method for creating this class from a string.
* @param $string String construction info
* @return Created AttrDef object corresponding to $string
*/
public function make($string) {
// default implementation, return a flyweight of this object.
// If $string has an effect on the returned object (i.e. you
// need to overload this method), it is best
// to clone or instantiate new copies. (Instantiation is safer.)
return $this;
}
/**
* Removes spaces from rgb(0, 0, 0) so that shorthand CSS properties work
* properly. THIS IS A HACK!
*/
protected function mungeRgb($string) {
return preg_replace('/rgb\((\d+)\s*,\s*(\d+)\s*,\s*(\d+)\)/', 'rgb(\1,\2,\3)', $string);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,87 @@
<?php
/**
* Validates the HTML attribute style, otherwise known as CSS.
* @note We don't implement the whole CSS specification, so it might be
* difficult to reuse this component in the context of validating
* actual stylesheet declarations.
* @note If we were really serious about validating the CSS, we would
* tokenize the styles and then parse the tokens. Obviously, we
* are not doing that. Doing that could seriously harm performance,
* but would make these components a lot more viable for a CSS
* filtering solution.
*/
class HTMLPurifier_AttrDef_CSS extends HTMLPurifier_AttrDef
{
public function validate($css, $config, $context) {
$css = $this->parseCDATA($css);
$definition = $config->getCSSDefinition();
// we're going to break the spec and explode by semicolons.
// This is because semicolon rarely appears in escaped form
// Doing this is generally flaky but fast
// IT MIGHT APPEAR IN URIs, see HTMLPurifier_AttrDef_CSSURI
// for details
$declarations = explode(';', $css);
$propvalues = array();
/**
* Name of the current CSS property being validated.
*/
$property = false;
$context->register('CurrentCSSProperty', $property);
foreach ($declarations as $declaration) {
if (!$declaration) continue;
if (!strpos($declaration, ':')) continue;
list($property, $value) = explode(':', $declaration, 2);
$property = trim($property);
$value = trim($value);
$ok = false;
do {
if (isset($definition->info[$property])) {
$ok = true;
break;
}
if (ctype_lower($property)) break;
$property = strtolower($property);
if (isset($definition->info[$property])) {
$ok = true;
break;
}
} while(0);
if (!$ok) continue;
// inefficient call, since the validator will do this again
if (strtolower(trim($value)) !== 'inherit') {
// inherit works for everything (but only on the base property)
$result = $definition->info[$property]->validate(
$value, $config, $context );
} else {
$result = 'inherit';
}
if ($result === false) continue;
$propvalues[$property] = $result;
}
$context->destroy('CurrentCSSProperty');
// procedure does not write the new CSS simultaneously, so it's
// slightly inefficient, but it's the only way of getting rid of
// duplicates. Perhaps config to optimize it, but not now.
$new_declarations = '';
foreach ($propvalues as $prop => $value) {
$new_declarations .= "$prop:$value;";
}
return $new_declarations ? $new_declarations : false;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,21 @@
<?php
class HTMLPurifier_AttrDef_CSS_AlphaValue extends HTMLPurifier_AttrDef_CSS_Number
{
public function __construct() {
parent::__construct(false); // opacity is non-negative, but we will clamp it
}
public function validate($number, $config, $context) {
$result = parent::validate($number, $config, $context);
if ($result === false) return $result;
$float = (float) $result;
if ($float < 0.0) $result = '0';
if ($float > 1.0) $result = '1';
return $result;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,87 @@
<?php
/**
* Validates shorthand CSS property background.
* @warning Does not support url tokens that have internal spaces.
*/
class HTMLPurifier_AttrDef_CSS_Background extends HTMLPurifier_AttrDef
{
/**
* Local copy of component validators.
* @note See HTMLPurifier_AttrDef_Font::$info for a similar impl.
*/
protected $info;
public function __construct($config) {
$def = $config->getCSSDefinition();
$this->info['background-color'] = $def->info['background-color'];
$this->info['background-image'] = $def->info['background-image'];
$this->info['background-repeat'] = $def->info['background-repeat'];
$this->info['background-attachment'] = $def->info['background-attachment'];
$this->info['background-position'] = $def->info['background-position'];
}
public function validate($string, $config, $context) {
// regular pre-processing
$string = $this->parseCDATA($string);
if ($string === '') return false;
// munge rgb() decl if necessary
$string = $this->mungeRgb($string);
// assumes URI doesn't have spaces in it
$bits = explode(' ', strtolower($string)); // bits to process
$caught = array();
$caught['color'] = false;
$caught['image'] = false;
$caught['repeat'] = false;
$caught['attachment'] = false;
$caught['position'] = false;
$i = 0; // number of catches
$none = false;
foreach ($bits as $bit) {
if ($bit === '') continue;
foreach ($caught as $key => $status) {
if ($key != 'position') {
if ($status !== false) continue;
$r = $this->info['background-' . $key]->validate($bit, $config, $context);
} else {
$r = $bit;
}
if ($r === false) continue;
if ($key == 'position') {
if ($caught[$key] === false) $caught[$key] = '';
$caught[$key] .= $r . ' ';
} else {
$caught[$key] = $r;
}
$i++;
break;
}
}
if (!$i) return false;
if ($caught['position'] !== false) {
$caught['position'] = $this->info['background-position']->
validate($caught['position'], $config, $context);
}
$ret = array();
foreach ($caught as $value) {
if ($value === false) continue;
$ret[] = $value;
}
if (empty($ret)) return false;
return implode(' ', $ret);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,126 @@
<?php
/* W3C says:
[ // adjective and number must be in correct order, even if
// you could switch them without introducing ambiguity.
// some browsers support that syntax
[
<percentage> | <length> | left | center | right
]
[
<percentage> | <length> | top | center | bottom
]?
] |
[ // this signifies that the vertical and horizontal adjectives
// can be arbitrarily ordered, however, there can only be two,
// one of each, or none at all
[
left | center | right
] ||
[
top | center | bottom
]
]
top, left = 0%
center, (none) = 50%
bottom, right = 100%
*/
/* QuirksMode says:
keyword + length/percentage must be ordered correctly, as per W3C
Internet Explorer and Opera, however, support arbitrary ordering. We
should fix it up.
Minor issue though, not strictly necessary.
*/
// control freaks may appreciate the ability to convert these to
// percentages or something, but it's not necessary
/**
* Validates the value of background-position.
*/
class HTMLPurifier_AttrDef_CSS_BackgroundPosition extends HTMLPurifier_AttrDef
{
protected $length;
protected $percentage;
public function __construct() {
$this->length = new HTMLPurifier_AttrDef_CSS_Length();
$this->percentage = new HTMLPurifier_AttrDef_CSS_Percentage();
}
public function validate($string, $config, $context) {
$string = $this->parseCDATA($string);
$bits = explode(' ', $string);
$keywords = array();
$keywords['h'] = false; // left, right
$keywords['v'] = false; // top, bottom
$keywords['c'] = false; // center
$measures = array();
$i = 0;
$lookup = array(
'top' => 'v',
'bottom' => 'v',
'left' => 'h',
'right' => 'h',
'center' => 'c'
);
foreach ($bits as $bit) {
if ($bit === '') continue;
// test for keyword
$lbit = ctype_lower($bit) ? $bit : strtolower($bit);
if (isset($lookup[$lbit])) {
$status = $lookup[$lbit];
$keywords[$status] = $lbit;
$i++;
}
// test for length
$r = $this->length->validate($bit, $config, $context);
if ($r !== false) {
$measures[] = $r;
$i++;
}
// test for percentage
$r = $this->percentage->validate($bit, $config, $context);
if ($r !== false) {
$measures[] = $r;
$i++;
}
}
if (!$i) return false; // no valid values were caught
$ret = array();
// first keyword
if ($keywords['h']) $ret[] = $keywords['h'];
elseif (count($measures)) $ret[] = array_shift($measures);
elseif ($keywords['c']) {
$ret[] = $keywords['c'];
$keywords['c'] = false; // prevent re-use: center = center center
}
if ($keywords['v']) $ret[] = $keywords['v'];
elseif (count($measures)) $ret[] = array_shift($measures);
elseif ($keywords['c']) $ret[] = $keywords['c'];
if (empty($ret)) return false;
return implode(' ', $ret);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,43 @@
<?php
/**
* Validates the border property as defined by CSS.
*/
class HTMLPurifier_AttrDef_CSS_Border extends HTMLPurifier_AttrDef
{
/**
* Local copy of properties this property is shorthand for.
*/
protected $info = array();
public function __construct($config) {
$def = $config->getCSSDefinition();
$this->info['border-width'] = $def->info['border-width'];
$this->info['border-style'] = $def->info['border-style'];
$this->info['border-top-color'] = $def->info['border-top-color'];
}
public function validate($string, $config, $context) {
$string = $this->parseCDATA($string);
$string = $this->mungeRgb($string);
$bits = explode(' ', $string);
$done = array(); // segments we've finished
$ret = ''; // return value
foreach ($bits as $bit) {
foreach ($this->info as $propname => $validator) {
if (isset($done[$propname])) continue;
$r = $validator->validate($bit, $config, $context);
if ($r !== false) {
$ret .= $r . ' ';
$done[$propname] = true;
break;
}
}
}
return rtrim($ret);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,78 @@
<?php
/**
* Validates Color as defined by CSS.
*/
class HTMLPurifier_AttrDef_CSS_Color extends HTMLPurifier_AttrDef
{
public function validate($color, $config, $context) {
static $colors = null;
if ($colors === null) $colors = $config->get('Core', 'ColorKeywords');
$color = trim($color);
if ($color === '') return false;
$lower = strtolower($color);
if (isset($colors[$lower])) return $colors[$lower];
if (strpos($color, 'rgb(') !== false) {
// rgb literal handling
$length = strlen($color);
if (strpos($color, ')') !== $length - 1) return false;
$triad = substr($color, 4, $length - 4 - 1);
$parts = explode(',', $triad);
if (count($parts) !== 3) return false;
$type = false; // to ensure that they're all the same type
$new_parts = array();
foreach ($parts as $part) {
$part = trim($part);
if ($part === '') return false;
$length = strlen($part);
if ($part[$length - 1] === '%') {
// handle percents
if (!$type) {
$type = 'percentage';
} elseif ($type !== 'percentage') {
return false;
}
$num = (float) substr($part, 0, $length - 1);
if ($num < 0) $num = 0;
if ($num > 100) $num = 100;
$new_parts[] = "$num%";
} else {
// handle integers
if (!$type) {
$type = 'integer';
} elseif ($type !== 'integer') {
return false;
}
$num = (int) $part;
if ($num < 0) $num = 0;
if ($num > 255) $num = 255;
$new_parts[] = (string) $num;
}
}
$new_triad = implode(',', $new_parts);
$color = "rgb($new_triad)";
} else {
// hexadecimal handling
if ($color[0] === '#') {
$hex = substr($color, 1);
} else {
$hex = $color;
$color = '#' . $color;
}
$length = strlen($hex);
if ($length !== 3 && $length !== 6) return false;
if (!ctype_xdigit($hex)) return false;
}
return $color;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,38 @@
<?php
/**
* Allows multiple validators to attempt to validate attribute.
*
* Composite is just what it sounds like: a composite of many validators.
* This means that multiple HTMLPurifier_AttrDef objects will have a whack
* at the string. If one of them passes, that's what is returned. This is
* especially useful for CSS values, which often are a choice between
* an enumerated set of predefined values or a flexible data type.
*/
class HTMLPurifier_AttrDef_CSS_Composite extends HTMLPurifier_AttrDef
{
/**
* List of HTMLPurifier_AttrDef objects that may process strings
* @todo Make protected
*/
public $defs;
/**
* @param $defs List of HTMLPurifier_AttrDef objects
*/
public function __construct($defs) {
$this->defs = $defs;
}
public function validate($string, $config, $context) {
foreach ($this->defs as $i => $def) {
$result = $this->defs[$i]->validate($string, $config, $context);
if ($result !== false) return $result;
}
return false;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,28 @@
<?php
/**
* Decorator which enables CSS properties to be disabled for specific elements.
*/
class HTMLPurifier_AttrDef_CSS_DenyElementDecorator extends HTMLPurifier_AttrDef
{
public $def, $element;
/**
* @param $def Definition to wrap
* @param $element Element to deny
*/
public function __construct($def, $element) {
$this->def = $def;
$this->element = $element;
}
/**
* Checks if CurrentToken is set and equal to $this->element
*/
public function validate($string, $config, $context) {
$token = $context->get('CurrentToken', true);
if ($token && $token->name == $this->element) return false;
return $this->def->validate($string, $config, $context);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,54 @@
<?php
/**
* Microsoft's proprietary filter: CSS property
* @note Currently supports the alpha filter. In the future, this will
* probably need an extensible framework
*/
class HTMLPurifier_AttrDef_CSS_Filter extends HTMLPurifier_AttrDef
{
protected $intValidator;
public function __construct() {
$this->intValidator = new HTMLPurifier_AttrDef_Integer();
}
public function validate($value, $config, $context) {
$value = $this->parseCDATA($value);
if ($value === 'none') return $value;
// if we looped this we could support multiple filters
$function_length = strcspn($value, '(');
$function = trim(substr($value, 0, $function_length));
if ($function !== 'alpha' &&
$function !== 'Alpha' &&
$function !== 'progid:DXImageTransform.Microsoft.Alpha'
) return false;
$cursor = $function_length + 1;
$parameters_length = strcspn($value, ')', $cursor);
$parameters = substr($value, $cursor, $parameters_length);
$params = explode(',', $parameters);
$ret_params = array();
$lookup = array();
foreach ($params as $param) {
list($key, $value) = explode('=', $param);
$key = trim($key);
$value = trim($value);
if (isset($lookup[$key])) continue;
if ($key !== 'opacity') continue;
$value = $this->intValidator->validate($value, $config, $context);
if ($value === false) continue;
$int = (int) $value;
if ($int > 100) $value = '100';
if ($int < 0) $value = '0';
$ret_params[] = "$key=$value";
$lookup[$key] = true;
}
$ret_parameters = implode(',', $ret_params);
$ret_function = "$function($ret_parameters)";
return $ret_function;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,149 @@
<?php
/**
* Validates shorthand CSS property font.
*/
class HTMLPurifier_AttrDef_CSS_Font extends HTMLPurifier_AttrDef
{
/**
* Local copy of component validators.
*
* @note If we moved specific CSS property definitions to their own
* classes instead of having them be assembled at run time by
* CSSDefinition, this wouldn't be necessary. We'd instantiate
* our own copies.
*/
protected $info = array();
public function __construct($config) {
$def = $config->getCSSDefinition();
$this->info['font-style'] = $def->info['font-style'];
$this->info['font-variant'] = $def->info['font-variant'];
$this->info['font-weight'] = $def->info['font-weight'];
$this->info['font-size'] = $def->info['font-size'];
$this->info['line-height'] = $def->info['line-height'];
$this->info['font-family'] = $def->info['font-family'];
}
public function validate($string, $config, $context) {
static $system_fonts = array(
'caption' => true,
'icon' => true,
'menu' => true,
'message-box' => true,
'small-caption' => true,
'status-bar' => true
);
// regular pre-processing
$string = $this->parseCDATA($string);
if ($string === '') return false;
// check if it's one of the keywords
$lowercase_string = strtolower($string);
if (isset($system_fonts[$lowercase_string])) {
return $lowercase_string;
}
$bits = explode(' ', $string); // bits to process
$stage = 0; // this indicates what we're looking for
$caught = array(); // which stage 0 properties have we caught?
$stage_1 = array('font-style', 'font-variant', 'font-weight');
$final = ''; // output
for ($i = 0, $size = count($bits); $i < $size; $i++) {
if ($bits[$i] === '') continue;
switch ($stage) {
// attempting to catch font-style, font-variant or font-weight
case 0:
foreach ($stage_1 as $validator_name) {
if (isset($caught[$validator_name])) continue;
$r = $this->info[$validator_name]->validate(
$bits[$i], $config, $context);
if ($r !== false) {
$final .= $r . ' ';
$caught[$validator_name] = true;
break;
}
}
// all three caught, continue on
if (count($caught) >= 3) $stage = 1;
if ($r !== false) break;
// attempting to catch font-size and perhaps line-height
case 1:
$found_slash = false;
if (strpos($bits[$i], '/') !== false) {
list($font_size, $line_height) =
explode('/', $bits[$i]);
if ($line_height === '') {
// ooh, there's a space after the slash!
$line_height = false;
$found_slash = true;
}
} else {
$font_size = $bits[$i];
$line_height = false;
}
$r = $this->info['font-size']->validate(
$font_size, $config, $context);
if ($r !== false) {
$final .= $r;
// attempt to catch line-height
if ($line_height === false) {
// we need to scroll forward
for ($j = $i + 1; $j < $size; $j++) {
if ($bits[$j] === '') continue;
if ($bits[$j] === '/') {
if ($found_slash) {
return false;
} else {
$found_slash = true;
continue;
}
}
$line_height = $bits[$j];
break;
}
} else {
// slash already found
$found_slash = true;
$j = $i;
}
if ($found_slash) {
$i = $j;
$r = $this->info['line-height']->validate(
$line_height, $config, $context);
if ($r !== false) {
$final .= '/' . $r;
}
}
$final .= ' ';
$stage = 2;
break;
}
return false;
// attempting to catch font-family
case 2:
$font_family =
implode(' ', array_slice($bits, $i, $size - $i));
$r = $this->info['font-family']->validate(
$font_family, $config, $context);
if ($r !== false) {
$final .= $r . ' ';
// processing completed successfully
return rtrim($final);
}
return false;
}
}
return false;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,90 @@
<?php
/**
* Validates a font family list according to CSS spec
* @todo whitelisting allowed fonts would be nice
*/
class HTMLPurifier_AttrDef_CSS_FontFamily extends HTMLPurifier_AttrDef
{
public function validate($string, $config, $context) {
static $generic_names = array(
'serif' => true,
'sans-serif' => true,
'monospace' => true,
'fantasy' => true,
'cursive' => true
);
// assume that no font names contain commas in them
$fonts = explode(',', $string);
$final = '';
foreach($fonts as $font) {
$font = trim($font);
if ($font === '') continue;
// match a generic name
if (isset($generic_names[$font])) {
$final .= $font . ', ';
continue;
}
// match a quoted name
if ($font[0] === '"' || $font[0] === "'") {
$length = strlen($font);
if ($length <= 2) continue;
$quote = $font[0];
if ($font[$length - 1] !== $quote) continue;
$font = substr($font, 1, $length - 2);
$new_font = '';
for ($i = 0, $c = strlen($font); $i < $c; $i++) {
if ($font[$i] === '\\') {
$i++;
if ($i >= $c) {
$new_font .= '\\';
break;
}
if (ctype_xdigit($font[$i])) {
$code = $font[$i];
for ($a = 1, $i++; $i < $c && $a < 6; $i++, $a++) {
if (!ctype_xdigit($font[$i])) break;
$code .= $font[$i];
}
// We have to be extremely careful when adding
// new characters, to make sure we're not breaking
// the encoding.
$char = HTMLPurifier_Encoder::unichr(hexdec($code));
if (HTMLPurifier_Encoder::cleanUTF8($char) === '') continue;
$new_font .= $char;
if ($i < $c && trim($font[$i]) !== '') $i--;
continue;
}
if ($font[$i] === "\n") continue;
}
$new_font .= $font[$i];
}
$font = $new_font;
}
// $font is a pure representation of the font name
if (ctype_alnum($font) && $font !== '') {
// very simple font, allow it in unharmed
$final .= $font . ', ';
continue;
}
// complicated font, requires quoting
// armor single quotes and new lines
$font = str_replace("\\", "\\\\", $font);
$font = str_replace("'", "\\'", $font);
$final .= "'$font', ";
}
$final = rtrim($final, ', ');
if ($final === '') return false;
return $final;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,40 @@
<?php
/**
* Decorator which enables !important to be used in CSS values.
*/
class HTMLPurifier_AttrDef_CSS_ImportantDecorator extends HTMLPurifier_AttrDef
{
public $def, $allow;
/**
* @param $def Definition to wrap
* @param $allow Whether or not to allow !important
*/
public function __construct($def, $allow = false) {
$this->def = $def;
$this->allow = $allow;
}
/**
* Intercepts and removes !important if necessary
*/
public function validate($string, $config, $context) {
// test for ! and important tokens
$string = trim($string);
$is_important = false;
// :TODO: optimization: test directly for !important and ! important
if (strlen($string) >= 9 && substr($string, -9) === 'important') {
$temp = rtrim(substr($string, 0, -9));
// use a temp, because we might want to restore important
if (strlen($temp) >= 1 && substr($temp, -1) === '!') {
$string = rtrim(substr($temp, 0, -1));
$is_important = true;
}
}
$string = $this->def->validate($string, $config, $context);
if ($this->allow && $is_important) $string .= ' !important';
return $string;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,47 @@
<?php
/**
* Represents a Length as defined by CSS.
*/
class HTMLPurifier_AttrDef_CSS_Length extends HTMLPurifier_AttrDef
{
protected $min, $max;
/**
* @param HTMLPurifier_Length $max Minimum length, or null for no bound. String is also acceptable.
* @param HTMLPurifier_Length $max Maximum length, or null for no bound. String is also acceptable.
*/
public function __construct($min = null, $max = null) {
$this->min = $min !== null ? HTMLPurifier_Length::make($min) : null;
$this->max = $max !== null ? HTMLPurifier_Length::make($max) : null;
}
public function validate($string, $config, $context) {
$string = $this->parseCDATA($string);
// Optimizations
if ($string === '') return false;
if ($string === '0') return '0';
if (strlen($string) === 1) return false;
$length = HTMLPurifier_Length::make($string);
if (!$length->isValid()) return false;
if ($this->min) {
$c = $length->compareTo($this->min);
if ($c === false) return false;
if ($c < 0) return false;
}
if ($this->max) {
$c = $length->compareTo($this->max);
if ($c === false) return false;
if ($c > 0) return false;
}
return $length->toString();
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,78 @@
<?php
/**
* Validates shorthand CSS property list-style.
* @warning Does not support url tokens that have internal spaces.
*/
class HTMLPurifier_AttrDef_CSS_ListStyle extends HTMLPurifier_AttrDef
{
/**
* Local copy of component validators.
* @note See HTMLPurifier_AttrDef_CSS_Font::$info for a similar impl.
*/
protected $info;
public function __construct($config) {
$def = $config->getCSSDefinition();
$this->info['list-style-type'] = $def->info['list-style-type'];
$this->info['list-style-position'] = $def->info['list-style-position'];
$this->info['list-style-image'] = $def->info['list-style-image'];
}
public function validate($string, $config, $context) {
// regular pre-processing
$string = $this->parseCDATA($string);
if ($string === '') return false;
// assumes URI doesn't have spaces in it
$bits = explode(' ', strtolower($string)); // bits to process
$caught = array();
$caught['type'] = false;
$caught['position'] = false;
$caught['image'] = false;
$i = 0; // number of catches
$none = false;
foreach ($bits as $bit) {
if ($i >= 3) return; // optimization bit
if ($bit === '') continue;
foreach ($caught as $key => $status) {
if ($status !== false) continue;
$r = $this->info['list-style-' . $key]->validate($bit, $config, $context);
if ($r === false) continue;
if ($r === 'none') {
if ($none) continue;
else $none = true;
if ($key == 'image') continue;
}
$caught[$key] = $r;
$i++;
break;
}
}
if (!$i) return false;
$ret = array();
// construct type
if ($caught['type']) $ret[] = $caught['type'];
// construct image
if ($caught['image']) $ret[] = $caught['image'];
// construct position
if ($caught['position']) $ret[] = $caught['position'];
if (empty($ret)) return false;
return implode(' ', $ret);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,58 @@
<?php
/**
* Framework class for strings that involve multiple values.
*
* Certain CSS properties such as border-width and margin allow multiple
* lengths to be specified. This class can take a vanilla border-width
* definition and multiply it, usually into a max of four.
*
* @note Even though the CSS specification isn't clear about it, inherit
* can only be used alone: it will never manifest as part of a multi
* shorthand declaration. Thus, this class does not allow inherit.
*/
class HTMLPurifier_AttrDef_CSS_Multiple extends HTMLPurifier_AttrDef
{
/**
* Instance of component definition to defer validation to.
* @todo Make protected
*/
public $single;
/**
* Max number of values allowed.
* @todo Make protected
*/
public $max;
/**
* @param $single HTMLPurifier_AttrDef to multiply
* @param $max Max number of values allowed (usually four)
*/
public function __construct($single, $max = 4) {
$this->single = $single;
$this->max = $max;
}
public function validate($string, $config, $context) {
$string = $this->parseCDATA($string);
if ($string === '') return false;
$parts = explode(' ', $string); // parseCDATA replaced \r, \t and \n
$length = count($parts);
$final = '';
for ($i = 0, $num = 0; $i < $length && $num < $this->max; $i++) {
if (ctype_space($parts[$i])) continue;
$result = $this->single->validate($parts[$i], $config, $context);
if ($result !== false) {
$final .= $result . ' ';
$num++;
}
}
if ($final === '') return false;
return rtrim($final);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,69 @@
<?php
/**
* Validates a number as defined by the CSS spec.
*/
class HTMLPurifier_AttrDef_CSS_Number extends HTMLPurifier_AttrDef
{
/**
* Bool indicating whether or not only positive values allowed.
*/
protected $non_negative = false;
/**
* @param $non_negative Bool indicating whether negatives are forbidden
*/
public function __construct($non_negative = false) {
$this->non_negative = $non_negative;
}
/**
* @warning Some contexts do not pass $config, $context. These
* variables should not be used without checking HTMLPurifier_Length
*/
public function validate($number, $config, $context) {
$number = $this->parseCDATA($number);
if ($number === '') return false;
if ($number === '0') return '0';
$sign = '';
switch ($number[0]) {
case '-':
if ($this->non_negative) return false;
$sign = '-';
case '+':
$number = substr($number, 1);
}
if (ctype_digit($number)) {
$number = ltrim($number, '0');
return $number ? $sign . $number : '0';
}
// Period is the only non-numeric character allowed
if (strpos($number, '.') === false) return false;
list($left, $right) = explode('.', $number, 2);
if ($left === '' && $right === '') return false;
if ($left !== '' && !ctype_digit($left)) return false;
$left = ltrim($left, '0');
$right = rtrim($right, '0');
if ($right === '') {
return $left ? $sign . $left : '0';
} elseif (!ctype_digit($right)) {
return false;
}
return $sign . $left . '.' . $right;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,40 @@
<?php
/**
* Validates a Percentage as defined by the CSS spec.
*/
class HTMLPurifier_AttrDef_CSS_Percentage extends HTMLPurifier_AttrDef
{
/**
* Instance of HTMLPurifier_AttrDef_CSS_Number to defer number validation
*/
protected $number_def;
/**
* @param Bool indicating whether to forbid negative values
*/
public function __construct($non_negative = false) {
$this->number_def = new HTMLPurifier_AttrDef_CSS_Number($non_negative);
}
public function validate($string, $config, $context) {
$string = $this->parseCDATA($string);
if ($string === '') return false;
$length = strlen($string);
if ($length === 1) return false;
if ($string[$length - 1] !== '%') return false;
$number = substr($string, 0, $length - 1);
$number = $this->number_def->validate($number, $config, $context);
if ($number === false) return false;
return "$number%";
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,38 @@
<?php
/**
* Validates the value for the CSS property text-decoration
* @note This class could be generalized into a version that acts sort of
* like Enum except you can compound the allowed values.
*/
class HTMLPurifier_AttrDef_CSS_TextDecoration extends HTMLPurifier_AttrDef
{
public function validate($string, $config, $context) {
static $allowed_values = array(
'line-through' => true,
'overline' => true,
'underline' => true,
);
$string = strtolower($this->parseCDATA($string));
if ($string === 'none') return $string;
$parts = explode(' ', $string);
$final = '';
foreach ($parts as $part) {
if (isset($allowed_values[$part])) {
$final .= $part . ' ';
}
}
$final = rtrim($final);
if ($final === '') return false;
return $final;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,56 @@
<?php
/**
* Validates a URI in CSS syntax, which uses url('http://example.com')
* @note While theoretically speaking a URI in a CSS document could
* be non-embedded, as of CSS2 there is no such usage so we're
* generalizing it. This may need to be changed in the future.
* @warning Since HTMLPurifier_AttrDef_CSS blindly uses semicolons as
* the separator, you cannot put a literal semicolon in
* in the URI. Try percent encoding it, in that case.
*/
class HTMLPurifier_AttrDef_CSS_URI extends HTMLPurifier_AttrDef_URI
{
public function __construct() {
parent::__construct(true); // always embedded
}
public function validate($uri_string, $config, $context) {
// parse the URI out of the string and then pass it onto
// the parent object
$uri_string = $this->parseCDATA($uri_string);
if (strpos($uri_string, 'url(') !== 0) return false;
$uri_string = substr($uri_string, 4);
$new_length = strlen($uri_string) - 1;
if ($uri_string[$new_length] != ')') return false;
$uri = trim(substr($uri_string, 0, $new_length));
if (!empty($uri) && ($uri[0] == "'" || $uri[0] == '"')) {
$quote = $uri[0];
$new_length = strlen($uri) - 1;
if ($uri[$new_length] !== $quote) return false;
$uri = substr($uri, 1, $new_length - 1);
}
$keys = array( '(', ')', ',', ' ', '"', "'");
$values = array('\\(', '\\)', '\\,', '\\ ', '\\"', "\\'");
$uri = str_replace($values, $keys, $uri);
$result = parent::validate($uri, $config, $context);
if ($result === false) return false;
// escape necessary characters according to CSS spec
// except for the comma, none of these should appear in the
// URI at all
$result = str_replace($keys, $values, $result);
return "url($result)";
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,65 @@
<?php
// Enum = Enumerated
/**
* Validates a keyword against a list of valid values.
* @warning The case-insensitive compare of this function uses PHP's
* built-in strtolower and ctype_lower functions, which may
* cause problems with international comparisons
*/
class HTMLPurifier_AttrDef_Enum extends HTMLPurifier_AttrDef
{
/**
* Lookup table of valid values.
* @todo Make protected
*/
public $valid_values = array();
/**
* Bool indicating whether or not enumeration is case sensitive.
* @note In general this is always case insensitive.
*/
protected $case_sensitive = false; // values according to W3C spec
/**
* @param $valid_values List of valid values
* @param $case_sensitive Bool indicating whether or not case sensitive
*/
public function __construct(
$valid_values = array(), $case_sensitive = false
) {
$this->valid_values = array_flip($valid_values);
$this->case_sensitive = $case_sensitive;
}
public function validate($string, $config, $context) {
$string = trim($string);
if (!$this->case_sensitive) {
// we may want to do full case-insensitive libraries
$string = ctype_lower($string) ? $string : strtolower($string);
}
$result = isset($this->valid_values[$string]);
return $result ? $string : false;
}
/**
* @param $string In form of comma-delimited list of case-insensitive
* valid values. Example: "foo,bar,baz". Prepend "s:" to make
* case sensitive
*/
public function make($string) {
if (strlen($string) > 2 && $string[0] == 's' && $string[1] == ':') {
$string = substr($string, 2);
$sensitive = true;
} else {
$sensitive = false;
}
$values = explode(',', $string);
return new HTMLPurifier_AttrDef_Enum($values, $sensitive);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,28 @@
<?php
/**
* Validates a boolean attribute
*/
class HTMLPurifier_AttrDef_HTML_Bool extends HTMLPurifier_AttrDef
{
protected $name;
public $minimized = true;
public function __construct($name = false) {$this->name = $name;}
public function validate($string, $config, $context) {
if (empty($string)) return false;
return $this->name;
}
/**
* @param $string Name of attribute
*/
public function make($string) {
return new HTMLPurifier_AttrDef_HTML_Bool($string);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,32 @@
<?php
/**
* Validates a color according to the HTML spec.
*/
class HTMLPurifier_AttrDef_HTML_Color extends HTMLPurifier_AttrDef
{
public function validate($string, $config, $context) {
static $colors = null;
if ($colors === null) $colors = $config->get('Core', 'ColorKeywords');
$string = trim($string);
if (empty($string)) return false;
if (isset($colors[$string])) return $colors[$string];
if ($string[0] === '#') $hex = substr($string, 1);
else $hex = $string;
$length = strlen($hex);
if ($length !== 3 && $length !== 6) return false;
if (!ctype_xdigit($hex)) return false;
if ($length === 3) $hex = $hex[0].$hex[0].$hex[1].$hex[1].$hex[2].$hex[2];
return "#$hex";
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,21 @@
<?php
/**
* Special-case enum attribute definition that lazy loads allowed frame targets
*/
class HTMLPurifier_AttrDef_HTML_FrameTarget extends HTMLPurifier_AttrDef_Enum
{
public $valid_values = false; // uninitialized value
protected $case_sensitive = false;
public function __construct() {}
public function validate($string, $config, $context) {
if ($this->valid_values === false) $this->valid_values = $config->get('Attr', 'AllowedFrameTargets');
return parent::validate($string, $config, $context);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,70 @@
<?php
/**
* Validates the HTML attribute ID.
* @warning Even though this is the id processor, it
* will ignore the directive Attr:IDBlacklist, since it will only
* go according to the ID accumulator. Since the accumulator is
* automatically generated, it will have already absorbed the
* blacklist. If you're hacking around, make sure you use load()!
*/
class HTMLPurifier_AttrDef_HTML_ID extends HTMLPurifier_AttrDef
{
// ref functionality disabled, since we also have to verify
// whether or not the ID it refers to exists
public function validate($id, $config, $context) {
if (!$config->get('Attr', 'EnableID')) return false;
$id = trim($id); // trim it first
if ($id === '') return false;
$prefix = $config->get('Attr', 'IDPrefix');
if ($prefix !== '') {
$prefix .= $config->get('Attr', 'IDPrefixLocal');
// prevent re-appending the prefix
if (strpos($id, $prefix) !== 0) $id = $prefix . $id;
} elseif ($config->get('Attr', 'IDPrefixLocal') !== '') {
trigger_error('%Attr.IDPrefixLocal cannot be used unless '.
'%Attr.IDPrefix is set', E_USER_WARNING);
}
//if (!$this->ref) {
$id_accumulator =& $context->get('IDAccumulator');
if (isset($id_accumulator->ids[$id])) return false;
//}
// we purposely avoid using regex, hopefully this is faster
if (ctype_alpha($id)) {
$result = true;
} else {
if (!ctype_alpha(@$id[0])) return false;
$trim = trim( // primitive style of regexps, I suppose
$id,
'A..Za..z0..9:-._'
);
$result = ($trim === '');
}
$regexp = $config->get('Attr', 'IDBlacklistRegexp');
if ($regexp && preg_match($regexp, $id)) {
return false;
}
if (/*!$this->ref && */$result) $id_accumulator->add($id);
// if no change was made to the ID, return the result
// else, return the new id if stripping whitespace made it
// valid, or return false.
return $result ? $id : false;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,41 @@
<?php
/**
* Validates the HTML type length (not to be confused with CSS's length).
*
* This accepts integer pixels or percentages as lengths for certain
* HTML attributes.
*/
class HTMLPurifier_AttrDef_HTML_Length extends HTMLPurifier_AttrDef_HTML_Pixels
{
public function validate($string, $config, $context) {
$string = trim($string);
if ($string === '') return false;
$parent_result = parent::validate($string, $config, $context);
if ($parent_result !== false) return $parent_result;
$length = strlen($string);
$last_char = $string[$length - 1];
if ($last_char !== '%') return false;
$points = substr($string, 0, $length - 1);
if (!is_numeric($points)) return false;
$points = (int) $points;
if ($points < 0) return '0%';
if ($points > 100) return '100%';
return ((string) $points) . '%';
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,53 @@
<?php
/**
* Validates a rel/rev link attribute against a directive of allowed values
* @note We cannot use Enum because link types allow multiple
* values.
* @note Assumes link types are ASCII text
*/
class HTMLPurifier_AttrDef_HTML_LinkTypes extends HTMLPurifier_AttrDef
{
/** Name config attribute to pull. */
protected $name;
public function __construct($name) {
$configLookup = array(
'rel' => 'AllowedRel',
'rev' => 'AllowedRev'
);
if (!isset($configLookup[$name])) {
trigger_error('Unrecognized attribute name for link '.
'relationship.', E_USER_ERROR);
return;
}
$this->name = $configLookup[$name];
}
public function validate($string, $config, $context) {
$allowed = $config->get('Attr', $this->name);
if (empty($allowed)) return false;
$string = $this->parseCDATA($string);
$parts = explode(' ', $string);
// lookup to prevent duplicates
$ret_lookup = array();
foreach ($parts as $part) {
$part = strtolower(trim($part));
if (!isset($allowed[$part])) continue;
$ret_lookup[$part] = true;
}
if (empty($ret_lookup)) return false;
$string = implode(' ', array_keys($ret_lookup));
return $string;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,41 @@
<?php
/**
* Validates a MultiLength as defined by the HTML spec.
*
* A multilength is either a integer (pixel count), a percentage, or
* a relative number.
*/
class HTMLPurifier_AttrDef_HTML_MultiLength extends HTMLPurifier_AttrDef_HTML_Length
{
public function validate($string, $config, $context) {
$string = trim($string);
if ($string === '') return false;
$parent_result = parent::validate($string, $config, $context);
if ($parent_result !== false) return $parent_result;
$length = strlen($string);
$last_char = $string[$length - 1];
if ($last_char !== '*') return false;
$int = substr($string, 0, $length - 1);
if ($int == '') return '*';
if (!is_numeric($int)) return false;
$int = (int) $int;
if ($int < 0) return false;
if ($int == 0) return '0';
if ($int == 1) return '*';
return ((string) $int) . '*';
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,48 @@
<?php
/**
* Validates contents based on NMTOKENS attribute type.
* @note The only current use for this is the class attribute in HTML
* @note Could have some functionality factored out into Nmtoken class
* @warning We cannot assume this class will be used only for 'class'
* attributes. Not sure how to hook in magic behavior, then.
*/
class HTMLPurifier_AttrDef_HTML_Nmtokens extends HTMLPurifier_AttrDef
{
public function validate($string, $config, $context) {
$string = trim($string);
// early abort: '' and '0' (strings that convert to false) are invalid
if (!$string) return false;
// OPTIMIZABLE!
// do the preg_match, capture all subpatterns for reformulation
// we don't support U+00A1 and up codepoints or
// escaping because I don't know how to do that with regexps
// and plus it would complicate optimization efforts (you never
// see that anyway).
$matches = array();
$pattern = '/(?:(?<=\s)|\A)'. // look behind for space or string start
'((?:--|-?[A-Za-z_])[A-Za-z_\-0-9]*)'.
'(?:(?=\s)|\z)/'; // look ahead for space or string end
preg_match_all($pattern, $string, $matches);
if (empty($matches[1])) return false;
// reconstruct string
$new_string = '';
foreach ($matches[1] as $token) {
$new_string .= $token . ' ';
}
$new_string = rtrim($new_string);
return $new_string;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,48 @@
<?php
/**
* Validates an integer representation of pixels according to the HTML spec.
*/
class HTMLPurifier_AttrDef_HTML_Pixels extends HTMLPurifier_AttrDef
{
protected $max;
public function __construct($max = null) {
$this->max = $max;
}
public function validate($string, $config, $context) {
$string = trim($string);
if ($string === '0') return $string;
if ($string === '') return false;
$length = strlen($string);
if (substr($string, $length - 2) == 'px') {
$string = substr($string, 0, $length - 2);
}
if (!is_numeric($string)) return false;
$int = (int) $string;
if ($int < 0) return '0';
// upper-bound value, extremely high values can
// crash operating systems, see <http://ha.ckers.org/imagecrash.html>
// WARNING, above link WILL crash you if you're using Windows
if ($this->max !== null && $int > $this->max) return (string) $this->max;
return (string) $int;
}
public function make($string) {
if ($string === '') $max = null;
else $max = (int) $string;
$class = get_class($this);
return new $class($max);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,73 @@
<?php
/**
* Validates an integer.
* @note While this class was modeled off the CSS definition, no currently
* allowed CSS uses this type. The properties that do are: widows,
* orphans, z-index, counter-increment, counter-reset. Some of the
* HTML attributes, however, find use for a non-negative version of this.
*/
class HTMLPurifier_AttrDef_Integer extends HTMLPurifier_AttrDef
{
/**
* Bool indicating whether or not negative values are allowed
*/
protected $negative = true;
/**
* Bool indicating whether or not zero is allowed
*/
protected $zero = true;
/**
* Bool indicating whether or not positive values are allowed
*/
protected $positive = true;
/**
* @param $negative Bool indicating whether or not negative values are allowed
* @param $zero Bool indicating whether or not zero is allowed
* @param $positive Bool indicating whether or not positive values are allowed
*/
public function __construct(
$negative = true, $zero = true, $positive = true
) {
$this->negative = $negative;
$this->zero = $zero;
$this->positive = $positive;
}
public function validate($integer, $config, $context) {
$integer = $this->parseCDATA($integer);
if ($integer === '') return false;
// we could possibly simply typecast it to integer, but there are
// certain fringe cases that must not return an integer.
// clip leading sign
if ( $this->negative && $integer[0] === '-' ) {
$digits = substr($integer, 1);
if ($digits === '0') $integer = '0'; // rm minus sign for zero
} elseif( $this->positive && $integer[0] === '+' ) {
$digits = $integer = substr($integer, 1); // rm unnecessary plus
} else {
$digits = $integer;
}
// test if it's numeric
if (!ctype_digit($digits)) return false;
// perform scope tests
if (!$this->zero && $integer == 0) return false;
if (!$this->positive && $integer > 0) return false;
if (!$this->negative && $integer < 0) return false;
return $integer;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,73 @@
<?php
/**
* Validates the HTML attribute lang, effectively a language code.
* @note Built according to RFC 3066, which obsoleted RFC 1766
*/
class HTMLPurifier_AttrDef_Lang extends HTMLPurifier_AttrDef
{
public function validate($string, $config, $context) {
$string = trim($string);
if (!$string) return false;
$subtags = explode('-', $string);
$num_subtags = count($subtags);
if ($num_subtags == 0) return false; // sanity check
// process primary subtag : $subtags[0]
$length = strlen($subtags[0]);
switch ($length) {
case 0:
return false;
case 1:
if (! ($subtags[0] == 'x' || $subtags[0] == 'i') ) {
return false;
}
break;
case 2:
case 3:
if (! ctype_alpha($subtags[0]) ) {
return false;
} elseif (! ctype_lower($subtags[0]) ) {
$subtags[0] = strtolower($subtags[0]);
}
break;
default:
return false;
}
$new_string = $subtags[0];
if ($num_subtags == 1) return $new_string;
// process second subtag : $subtags[1]
$length = strlen($subtags[1]);
if ($length == 0 || ($length == 1 && $subtags[1] != 'x') || $length > 8 || !ctype_alnum($subtags[1])) {
return $new_string;
}
if (!ctype_lower($subtags[1])) $subtags[1] = strtolower($subtags[1]);
$new_string .= '-' . $subtags[1];
if ($num_subtags == 2) return $new_string;
// process all other subtags, index 2 and up
for ($i = 2; $i < $num_subtags; $i++) {
$length = strlen($subtags[$i]);
if ($length == 0 || $length > 8 || !ctype_alnum($subtags[$i])) {
return $new_string;
}
if (!ctype_lower($subtags[$i])) {
$subtags[$i] = strtolower($subtags[$i]);
}
$new_string .= '-' . $subtags[$i];
}
return $new_string;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,34 @@
<?php
/**
* Decorator that, depending on a token, switches between two definitions.
*/
class HTMLPurifier_AttrDef_Switch
{
protected $tag;
protected $withTag, $withoutTag;
/**
* @param string $tag Tag name to switch upon
* @param HTMLPurifier_AttrDef $with_tag Call if token matches tag
* @param HTMLPurifier_AttrDef $without_tag Call if token doesn't match, or there is no token
*/
public function __construct($tag, $with_tag, $without_tag) {
$this->tag = $tag;
$this->withTag = $with_tag;
$this->withoutTag = $without_tag;
}
public function validate($string, $config, $context) {
$token = $context->get('CurrentToken', true);
if (!$token || $token->name !== $this->tag) {
return $this->withoutTag->validate($string, $config, $context);
} else {
return $this->withTag->validate($string, $config, $context);
}
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,15 @@
<?php
/**
* Validates arbitrary text according to the HTML spec.
*/
class HTMLPurifier_AttrDef_Text extends HTMLPurifier_AttrDef
{
public function validate($string, $config, $context) {
return $this->parseCDATA($string);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,77 @@
<?php
/**
* Validates a URI as defined by RFC 3986.
* @note Scheme-specific mechanics deferred to HTMLPurifier_URIScheme
*/
class HTMLPurifier_AttrDef_URI extends HTMLPurifier_AttrDef
{
protected $parser;
protected $embedsResource;
/**
* @param $embeds_resource_resource Does the URI here result in an extra HTTP request?
*/
public function __construct($embeds_resource = false) {
$this->parser = new HTMLPurifier_URIParser();
$this->embedsResource = (bool) $embeds_resource;
}
public function make($string) {
$embeds = (bool) $string;
return new HTMLPurifier_AttrDef_URI($embeds);
}
public function validate($uri, $config, $context) {
if ($config->get('URI', 'Disable')) return false;
$uri = $this->parseCDATA($uri);
// parse the URI
$uri = $this->parser->parse($uri);
if ($uri === false) return false;
// add embedded flag to context for validators
$context->register('EmbeddedURI', $this->embedsResource);
$ok = false;
do {
// generic validation
$result = $uri->validate($config, $context);
if (!$result) break;
// chained filtering
$uri_def = $config->getDefinition('URI');
$result = $uri_def->filter($uri, $config, $context);
if (!$result) break;
// scheme-specific validation
$scheme_obj = $uri->getSchemeObj($config, $context);
if (!$scheme_obj) break;
if ($this->embedsResource && !$scheme_obj->browsable) break;
$result = $scheme_obj->validate($uri, $config, $context);
if (!$result) break;
// Post chained filtering
$result = $uri_def->postFilter($uri, $config, $context);
if (!$result) break;
// survived gauntlet
$ok = true;
} while (false);
$context->destroy('EmbeddedURI');
if (!$ok) return false;
// back to string
return $uri->toString();
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,17 @@
<?php
abstract class HTMLPurifier_AttrDef_URI_Email extends HTMLPurifier_AttrDef
{
/**
* Unpacks a mailbox into its display-name and address
*/
function unpack($string) {
// needs to be implemented
}
}
// sub-implementations
// vim: et sw=4 sts=4

View File

@ -0,0 +1,21 @@
<?php
/**
* Primitive email validation class based on the regexp found at
* http://www.regular-expressions.info/email.html
*/
class HTMLPurifier_AttrDef_URI_Email_SimpleCheck extends HTMLPurifier_AttrDef_URI_Email
{
public function validate($string, $config, $context) {
// no support for named mailboxes i.e. "Bob <bob@example.com>"
// that needs more percent encoding to be done
if ($string == '') return false;
$string = trim($string);
$result = preg_match('/^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i', $string);
return $result ? $string : false;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,62 @@
<?php
/**
* Validates a host according to the IPv4, IPv6 and DNS (future) specifications.
*/
class HTMLPurifier_AttrDef_URI_Host extends HTMLPurifier_AttrDef
{
/**
* Instance of HTMLPurifier_AttrDef_URI_IPv4 sub-validator
*/
protected $ipv4;
/**
* Instance of HTMLPurifier_AttrDef_URI_IPv6 sub-validator
*/
protected $ipv6;
public function __construct() {
$this->ipv4 = new HTMLPurifier_AttrDef_URI_IPv4();
$this->ipv6 = new HTMLPurifier_AttrDef_URI_IPv6();
}
public function validate($string, $config, $context) {
$length = strlen($string);
if ($string === '') return '';
if ($length > 1 && $string[0] === '[' && $string[$length-1] === ']') {
//IPv6
$ip = substr($string, 1, $length - 2);
$valid = $this->ipv6->validate($ip, $config, $context);
if ($valid === false) return false;
return '['. $valid . ']';
}
// need to do checks on unusual encodings too
$ipv4 = $this->ipv4->validate($string, $config, $context);
if ($ipv4 !== false) return $ipv4;
// A regular domain name.
// This breaks I18N domain names, but we don't have proper IRI support,
// so force users to insert Punycode. If there's complaining we'll
// try to fix things into an international friendly form.
// The productions describing this are:
$a = '[a-z]'; // alpha
$an = '[a-z0-9]'; // alphanum
$and = '[a-z0-9-]'; // alphanum | "-"
// domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
$domainlabel = "$an($and*$an)?";
// toplabel = alpha | alpha *( alphanum | "-" ) alphanum
$toplabel = "$a($and*$an)?";
// hostname = *( domainlabel "." ) toplabel [ "." ]
$match = preg_match("/^($domainlabel\.)*$toplabel\.?$/i", $string);
if (!$match) return false;
return $string;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,39 @@
<?php
/**
* Validates an IPv4 address
* @author Feyd @ forums.devnetwork.net (public domain)
*/
class HTMLPurifier_AttrDef_URI_IPv4 extends HTMLPurifier_AttrDef
{
/**
* IPv4 regex, protected so that IPv6 can reuse it
*/
protected $ip4;
public function validate($aIP, $config, $context) {
if (!$this->ip4) $this->_loadRegex();
if (preg_match('#^' . $this->ip4 . '$#s', $aIP))
{
return $aIP;
}
return false;
}
/**
* Lazy load function to prevent regex from being stuffed in
* cache.
*/
protected function _loadRegex() {
$oct = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'; // 0-255
$this->ip4 = "(?:{$oct}\\.{$oct}\\.{$oct}\\.{$oct})";
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,99 @@
<?php
/**
* Validates an IPv6 address.
* @author Feyd @ forums.devnetwork.net (public domain)
* @note This function requires brackets to have been removed from address
* in URI.
*/
class HTMLPurifier_AttrDef_URI_IPv6 extends HTMLPurifier_AttrDef_URI_IPv4
{
public function validate($aIP, $config, $context) {
if (!$this->ip4) $this->_loadRegex();
$original = $aIP;
$hex = '[0-9a-fA-F]';
$blk = '(?:' . $hex . '{1,4})';
$pre = '(?:/(?:12[0-8]|1[0-1][0-9]|[1-9][0-9]|[0-9]))'; // /0 - /128
// prefix check
if (strpos($aIP, '/') !== false)
{
if (preg_match('#' . $pre . '$#s', $aIP, $find))
{
$aIP = substr($aIP, 0, 0-strlen($find[0]));
unset($find);
}
else
{
return false;
}
}
// IPv4-compatiblity check
if (preg_match('#(?<=:'.')' . $this->ip4 . '$#s', $aIP, $find))
{
$aIP = substr($aIP, 0, 0-strlen($find[0]));
$ip = explode('.', $find[0]);
$ip = array_map('dechex', $ip);
$aIP .= $ip[0] . $ip[1] . ':' . $ip[2] . $ip[3];
unset($find, $ip);
}
// compression check
$aIP = explode('::', $aIP);
$c = count($aIP);
if ($c > 2)
{
return false;
}
elseif ($c == 2)
{
list($first, $second) = $aIP;
$first = explode(':', $first);
$second = explode(':', $second);
if (count($first) + count($second) > 8)
{
return false;
}
while(count($first) < 8)
{
array_push($first, '0');
}
array_splice($first, 8 - count($second), 8, $second);
$aIP = $first;
unset($first,$second);
}
else
{
$aIP = explode(':', $aIP[0]);
}
$c = count($aIP);
if ($c != 8)
{
return false;
}
// All the pieces should be 16-bit hex strings. Are they?
foreach ($aIP as $piece)
{
if (!preg_match('#^[0-9a-fA-F]{4}$#s', sprintf('%04s', $piece)))
{
return false;
}
}
return $original;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,56 @@
<?php
/**
* Processes an entire attribute array for corrections needing multiple values.
*
* Occasionally, a certain attribute will need to be removed and popped onto
* another value. Instead of creating a complex return syntax for
* HTMLPurifier_AttrDef, we just pass the whole attribute array to a
* specialized object and have that do the special work. That is the
* family of HTMLPurifier_AttrTransform.
*
* An attribute transformation can be assigned to run before or after
* HTMLPurifier_AttrDef validation. See HTMLPurifier_HTMLDefinition for
* more details.
*/
abstract class HTMLPurifier_AttrTransform
{
/**
* Abstract: makes changes to the attributes dependent on multiple values.
*
* @param $attr Assoc array of attributes, usually from
* HTMLPurifier_Token_Tag::$attr
* @param $config Mandatory HTMLPurifier_Config object.
* @param $context Mandatory HTMLPurifier_Context object
* @returns Processed attribute array.
*/
abstract public function transform($attr, $config, $context);
/**
* Prepends CSS properties to the style attribute, creating the
* attribute if it doesn't exist.
* @param $attr Attribute array to process (passed by reference)
* @param $css CSS to prepend
*/
public function prependCSS(&$attr, $css) {
$attr['style'] = isset($attr['style']) ? $attr['style'] : '';
$attr['style'] = $css . $attr['style'];
}
/**
* Retrieves and removes an attribute
* @param $attr Attribute array to process (passed by reference)
* @param $key Key of attribute to confiscate
*/
public function confiscateAttr(&$attr, $key) {
if (!isset($attr[$key])) return null;
$value = $attr[$key];
unset($attr[$key]);
return $value;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,23 @@
<?php
/**
* Pre-transform that changes proprietary background attribute to CSS.
*/
class HTMLPurifier_AttrTransform_Background extends HTMLPurifier_AttrTransform {
public function transform($attr, $config, $context) {
if (!isset($attr['background'])) return $attr;
$background = $this->confiscateAttr($attr, 'background');
// some validation should happen here
$this->prependCSS($attr, "background-image:url($background);");
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,19 @@
<?php
// this MUST be placed in post, as it assumes that any value in dir is valid
/**
* Post-trasnform that ensures that bdo tags have the dir attribute set.
*/
class HTMLPurifier_AttrTransform_BdoDir extends HTMLPurifier_AttrTransform
{
public function transform($attr, $config, $context) {
if (isset($attr['dir'])) return $attr;
$attr['dir'] = $config->get('Attr', 'DefaultTextDir');
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,23 @@
<?php
/**
* Pre-transform that changes deprecated bgcolor attribute to CSS.
*/
class HTMLPurifier_AttrTransform_BgColor extends HTMLPurifier_AttrTransform {
public function transform($attr, $config, $context) {
if (!isset($attr['bgcolor'])) return $attr;
$bgcolor = $this->confiscateAttr($attr, 'bgcolor');
// some validation should happen here
$this->prependCSS($attr, "background-color:$bgcolor;");
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,36 @@
<?php
/**
* Pre-transform that changes converts a boolean attribute to fixed CSS
*/
class HTMLPurifier_AttrTransform_BoolToCSS extends HTMLPurifier_AttrTransform {
/**
* Name of boolean attribute that is trigger
*/
protected $attr;
/**
* CSS declarations to add to style, needs trailing semicolon
*/
protected $css;
/**
* @param $attr string attribute name to convert from
* @param $css string CSS declarations to add to style (needs semicolon)
*/
public function __construct($attr, $css) {
$this->attr = $attr;
$this->css = $css;
}
public function transform($attr, $config, $context) {
if (!isset($attr[$this->attr])) return $attr;
unset($attr[$this->attr]);
$this->prependCSS($attr, $this->css);
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,18 @@
<?php
/**
* Pre-transform that changes deprecated border attribute to CSS.
*/
class HTMLPurifier_AttrTransform_Border extends HTMLPurifier_AttrTransform {
public function transform($attr, $config, $context) {
if (!isset($attr['border'])) return $attr;
$border_width = $this->confiscateAttr($attr, 'border');
// some validation should happen here
$this->prependCSS($attr, "border:{$border_width}px solid;");
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,58 @@
<?php
/**
* Generic pre-transform that converts an attribute with a fixed number of
* values (enumerated) to CSS.
*/
class HTMLPurifier_AttrTransform_EnumToCSS extends HTMLPurifier_AttrTransform {
/**
* Name of attribute to transform from
*/
protected $attr;
/**
* Lookup array of attribute values to CSS
*/
protected $enumToCSS = array();
/**
* Case sensitivity of the matching
* @warning Currently can only be guaranteed to work with ASCII
* values.
*/
protected $caseSensitive = false;
/**
* @param $attr String attribute name to transform from
* @param $enumToCSS Lookup array of attribute values to CSS
* @param $case_sensitive Boolean case sensitivity indicator, default false
*/
public function __construct($attr, $enum_to_css, $case_sensitive = false) {
$this->attr = $attr;
$this->enumToCSS = $enum_to_css;
$this->caseSensitive = (bool) $case_sensitive;
}
public function transform($attr, $config, $context) {
if (!isset($attr[$this->attr])) return $attr;
$value = trim($attr[$this->attr]);
unset($attr[$this->attr]);
if (!$this->caseSensitive) $value = strtolower($value);
if (!isset($this->enumToCSS[$value])) {
return $attr;
}
$this->prependCSS($attr, $this->enumToCSS[$value]);
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,42 @@
<?php
// must be called POST validation
/**
* Transform that supplies default values for the src and alt attributes
* in img tags, as well as prevents the img tag from being removed
* because of a missing alt tag. This needs to be registered as both
* a pre and post attribute transform.
*/
class HTMLPurifier_AttrTransform_ImgRequired extends HTMLPurifier_AttrTransform
{
public function transform($attr, $config, $context) {
$src = true;
if (!isset($attr['src'])) {
if ($config->get('Core', 'RemoveInvalidImg')) return $attr;
$attr['src'] = $config->get('Attr', 'DefaultInvalidImage');
$src = false;
}
if (!isset($attr['alt'])) {
if ($src) {
$alt = $config->get('Attr', 'DefaultImageAlt');
if ($alt === null) {
$attr['alt'] = basename($attr['src']);
} else {
$attr['alt'] = $alt;
}
} else {
$attr['alt'] = $config->get('Attr', 'DefaultInvalidImageAlt');
}
}
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,44 @@
<?php
/**
* Pre-transform that changes deprecated hspace and vspace attributes to CSS
*/
class HTMLPurifier_AttrTransform_ImgSpace extends HTMLPurifier_AttrTransform {
protected $attr;
protected $css = array(
'hspace' => array('left', 'right'),
'vspace' => array('top', 'bottom')
);
public function __construct($attr) {
$this->attr = $attr;
if (!isset($this->css[$attr])) {
trigger_error(htmlspecialchars($attr) . ' is not valid space attribute');
}
}
public function transform($attr, $config, $context) {
if (!isset($attr[$this->attr])) return $attr;
$width = $this->confiscateAttr($attr, $this->attr);
// some validation could happen here
if (!isset($this->css[$this->attr])) return $attr;
$style = '';
foreach ($this->css[$this->attr] as $suffix) {
$property = "margin-$suffix";
$style .= "$property:{$width}px;";
}
$this->prependCSS($attr, $style);
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,40 @@
<?php
/**
* Performs miscellaneous cross attribute validation and filtering for
* input elements. This is meant to be a post-transform.
*/
class HTMLPurifier_AttrTransform_Input extends HTMLPurifier_AttrTransform {
protected $pixels;
public function __construct() {
$this->pixels = new HTMLPurifier_AttrDef_HTML_Pixels();
}
public function transform($attr, $config, $context) {
if (!isset($attr['type'])) $t = 'text';
else $t = strtolower($attr['type']);
if (isset($attr['checked']) && $t !== 'radio' && $t !== 'checkbox') {
unset($attr['checked']);
}
if (isset($attr['maxlength']) && $t !== 'text' && $t !== 'password') {
unset($attr['maxlength']);
}
if (isset($attr['size']) && $t !== 'text' && $t !== 'password') {
$result = $this->pixels->validate($attr['size'], $config, $context);
if ($result === false) unset($attr['size']);
else $attr['size'] = $result;
}
if (isset($attr['src']) && $t !== 'image') {
unset($attr['src']);
}
if (!isset($attr['value']) && ($t === 'radio' || $t === 'checkbox')) {
$attr['value'] = '';
}
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,28 @@
<?php
/**
* Post-transform that copies lang's value to xml:lang (and vice-versa)
* @note Theoretically speaking, this could be a pre-transform, but putting
* post is more efficient.
*/
class HTMLPurifier_AttrTransform_Lang extends HTMLPurifier_AttrTransform
{
public function transform($attr, $config, $context) {
$lang = isset($attr['lang']) ? $attr['lang'] : false;
$xml_lang = isset($attr['xml:lang']) ? $attr['xml:lang'] : false;
if ($lang !== false && $xml_lang === false) {
$attr['xml:lang'] = $lang;
} elseif ($xml_lang !== false) {
$attr['lang'] = $xml_lang;
}
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,27 @@
<?php
/**
* Class for handling width/height length attribute transformations to CSS
*/
class HTMLPurifier_AttrTransform_Length extends HTMLPurifier_AttrTransform
{
protected $name;
protected $cssName;
public function __construct($name, $css_name = null) {
$this->name = $name;
$this->cssName = $css_name ? $css_name : $name;
}
public function transform($attr, $config, $context) {
if (!isset($attr[$this->name])) return $attr;
$length = $this->confiscateAttr($attr, $this->name);
if(ctype_digit($length)) $length .= 'px';
$this->prependCSS($attr, $this->cssName . ":$length;");
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,19 @@
<?php
/**
* Pre-transform that changes deprecated name attribute to ID if necessary
*/
class HTMLPurifier_AttrTransform_Name extends HTMLPurifier_AttrTransform
{
public function transform($attr, $config, $context) {
if (!isset($attr['name'])) return $attr;
$id = $this->confiscateAttr($attr, 'name');
if ( isset($attr['id'])) return $attr;
$attr['id'] = $id;
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,15 @@
<?php
class HTMLPurifier_AttrTransform_SafeEmbed extends HTMLPurifier_AttrTransform
{
public $name = "SafeEmbed";
public function transform($attr, $config, $context) {
$attr['allowscriptaccess'] = 'never';
$attr['allownetworking'] = 'internal';
$attr['type'] = 'application/x-shockwave-flash';
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,16 @@
<?php
/**
* Writes default type for all objects. Currently only supports flash.
*/
class HTMLPurifier_AttrTransform_SafeObject extends HTMLPurifier_AttrTransform
{
public $name = "SafeObject";
function transform($attr, $config, $context) {
if (!isset($attr['type'])) $attr['type'] = 'application/x-shockwave-flash';
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,50 @@
<?php
/**
* Validates name/value pairs in param tags to be used in safe objects. This
* will only allow name values it recognizes, and pre-fill certain attributes
* with required values.
*
* @note
* This class only supports Flash. In the future, Quicktime support
* may be added.
*
* @warning
* This class expects an injector to add the necessary parameters tags.
*/
class HTMLPurifier_AttrTransform_SafeParam extends HTMLPurifier_AttrTransform
{
public $name = "SafeParam";
private $uri;
public function __construct() {
$this->uri = new HTMLPurifier_AttrDef_URI(true); // embedded
}
public function transform($attr, $config, $context) {
// If we add support for other objects, we'll need to alter the
// transforms.
switch ($attr['name']) {
// application/x-shockwave-flash
// Keep this synchronized with Injector/SafeObject.php
case 'allowScriptAccess':
$attr['value'] = 'never';
break;
case 'allowNetworking':
$attr['value'] = 'internal';
break;
case 'wmode':
$attr['value'] = 'window';
break;
case 'movie':
$attr['value'] = $this->uri->validate($attr['value'], $config, $context);
break;
// add other cases to support other param name/value pairs
default:
$attr['name'] = $attr['value'] = null;
}
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,16 @@
<?php
/**
* Implements required attribute stipulation for <script>
*/
class HTMLPurifier_AttrTransform_ScriptRequired extends HTMLPurifier_AttrTransform
{
public function transform($attr, $config, $context) {
if (!isset($attr['type'])) {
$attr['type'] = 'text/javascript';
}
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,18 @@
<?php
/**
* Sets height/width defaults for <textarea>
*/
class HTMLPurifier_AttrTransform_Textarea extends HTMLPurifier_AttrTransform
{
public function transform($attr, $config, $context) {
// Calculated from Firefox
if (!isset($attr['cols'])) $attr['cols'] = '22';
if (!isset($attr['rows'])) $attr['rows'] = '3';
return $attr;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,74 @@
<?php
/**
* Provides lookup array of attribute types to HTMLPurifier_AttrDef objects
*/
class HTMLPurifier_AttrTypes
{
/**
* Lookup array of attribute string identifiers to concrete implementations
*/
protected $info = array();
/**
* Constructs the info array, supplying default implementations for attribute
* types.
*/
public function __construct() {
// pseudo-types, must be instantiated via shorthand
$this->info['Enum'] = new HTMLPurifier_AttrDef_Enum();
$this->info['Bool'] = new HTMLPurifier_AttrDef_HTML_Bool();
$this->info['CDATA'] = new HTMLPurifier_AttrDef_Text();
$this->info['ID'] = new HTMLPurifier_AttrDef_HTML_ID();
$this->info['Length'] = new HTMLPurifier_AttrDef_HTML_Length();
$this->info['MultiLength'] = new HTMLPurifier_AttrDef_HTML_MultiLength();
$this->info['NMTOKENS'] = new HTMLPurifier_AttrDef_HTML_Nmtokens();
$this->info['Pixels'] = new HTMLPurifier_AttrDef_HTML_Pixels();
$this->info['Text'] = new HTMLPurifier_AttrDef_Text();
$this->info['URI'] = new HTMLPurifier_AttrDef_URI();
$this->info['LanguageCode'] = new HTMLPurifier_AttrDef_Lang();
$this->info['Color'] = new HTMLPurifier_AttrDef_HTML_Color();
// unimplemented aliases
$this->info['ContentType'] = new HTMLPurifier_AttrDef_Text();
$this->info['ContentTypes'] = new HTMLPurifier_AttrDef_Text();
$this->info['Charsets'] = new HTMLPurifier_AttrDef_Text();
$this->info['Character'] = new HTMLPurifier_AttrDef_Text();
// number is really a positive integer (one or more digits)
// FIXME: ^^ not always, see start and value of list items
$this->info['Number'] = new HTMLPurifier_AttrDef_Integer(false, false, true);
}
/**
* Retrieves a type
* @param $type String type name
* @return Object AttrDef for type
*/
public function get($type) {
// determine if there is any extra info tacked on
if (strpos($type, '#') !== false) list($type, $string) = explode('#', $type, 2);
else $string = '';
if (!isset($this->info[$type])) {
trigger_error('Cannot retrieve undefined attribute type ' . $type, E_USER_ERROR);
return;
}
return $this->info[$type]->make($string);
}
/**
* Sets a new implementation for a type
* @param $type String type name
* @param $impl Object AttrDef for type
*/
public function set($type, $impl) {
$this->info[$type] = $impl;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,162 @@
<?php
/**
* Validates the attributes of a token. Doesn't manage required attributes
* very well. The only reason we factored this out was because RemoveForeignElements
* also needed it besides ValidateAttributes.
*/
class HTMLPurifier_AttrValidator
{
/**
* Validates the attributes of a token, returning a modified token
* that has valid tokens
* @param $token Reference to token to validate. We require a reference
* because the operation this class performs on the token are
* not atomic, so the context CurrentToken to be updated
* throughout
* @param $config Instance of HTMLPurifier_Config
* @param $context Instance of HTMLPurifier_Context
*/
public function validateToken(&$token, &$config, $context) {
$definition = $config->getHTMLDefinition();
$e =& $context->get('ErrorCollector', true);
// initialize IDAccumulator if necessary
$ok =& $context->get('IDAccumulator', true);
if (!$ok) {
$id_accumulator = HTMLPurifier_IDAccumulator::build($config, $context);
$context->register('IDAccumulator', $id_accumulator);
}
// initialize CurrentToken if necessary
$current_token =& $context->get('CurrentToken', true);
if (!$current_token) $context->register('CurrentToken', $token);
if (
!$token instanceof HTMLPurifier_Token_Start &&
!$token instanceof HTMLPurifier_Token_Empty
) return $token;
// create alias to global definition array, see also $defs
// DEFINITION CALL
$d_defs = $definition->info_global_attr;
// don't update token until the very end, to ensure an atomic update
$attr = $token->attr;
// do global transformations (pre)
// nothing currently utilizes this
foreach ($definition->info_attr_transform_pre as $transform) {
$attr = $transform->transform($o = $attr, $config, $context);
if ($e) {
if ($attr != $o) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr);
}
}
// do local transformations only applicable to this element (pre)
// ex. <p align="right"> to <p style="text-align:right;">
foreach ($definition->info[$token->name]->attr_transform_pre as $transform) {
$attr = $transform->transform($o = $attr, $config, $context);
if ($e) {
if ($attr != $o) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr);
}
}
// create alias to this element's attribute definition array, see
// also $d_defs (global attribute definition array)
// DEFINITION CALL
$defs = $definition->info[$token->name]->attr;
$attr_key = false;
$context->register('CurrentAttr', $attr_key);
// iterate through all the attribute keypairs
// Watch out for name collisions: $key has previously been used
foreach ($attr as $attr_key => $value) {
// call the definition
if ( isset($defs[$attr_key]) ) {
// there is a local definition defined
if ($defs[$attr_key] === false) {
// We've explicitly been told not to allow this element.
// This is usually when there's a global definition
// that must be overridden.
// Theoretically speaking, we could have a
// AttrDef_DenyAll, but this is faster!
$result = false;
} else {
// validate according to the element's definition
$result = $defs[$attr_key]->validate(
$value, $config, $context
);
}
} elseif ( isset($d_defs[$attr_key]) ) {
// there is a global definition defined, validate according
// to the global definition
$result = $d_defs[$attr_key]->validate(
$value, $config, $context
);
} else {
// system never heard of the attribute? DELETE!
$result = false;
}
// put the results into effect
if ($result === false || $result === null) {
// this is a generic error message that should replaced
// with more specific ones when possible
if ($e) $e->send(E_ERROR, 'AttrValidator: Attribute removed');
// remove the attribute
unset($attr[$attr_key]);
} elseif (is_string($result)) {
// generally, if a substitution is happening, there
// was some sort of implicit correction going on. We'll
// delegate it to the attribute classes to say exactly what.
// simple substitution
$attr[$attr_key] = $result;
} else {
// nothing happens
}
// we'd also want slightly more complicated substitution
// involving an array as the return value,
// although we're not sure how colliding attributes would
// resolve (certain ones would be completely overriden,
// others would prepend themselves).
}
$context->destroy('CurrentAttr');
// post transforms
// global (error reporting untested)
foreach ($definition->info_attr_transform_post as $transform) {
$attr = $transform->transform($o = $attr, $config, $context);
if ($e) {
if ($attr != $o) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr);
}
}
// local (error reporting untested)
foreach ($definition->info[$token->name]->attr_transform_post as $transform) {
$attr = $transform->transform($o = $attr, $config, $context);
if ($e) {
if ($attr != $o) $e->send(E_NOTICE, 'AttrValidator: Attributes transformed', $o, $attr);
}
}
$token->attr = $attr;
// destroy CurrentToken if we made it ourselves
if (!$current_token) $context->destroy('CurrentToken');
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,98 @@
<?php
// constants are slow, so we use as few as possible
if (!defined('HTMLPURIFIER_PREFIX')) {
define('HTMLPURIFIER_PREFIX', realpath(dirname(__FILE__) . '/..'));
}
// accomodations for versions earlier than 5.0.2
// borrowed from PHP_Compat, LGPL licensed, by Aidan Lister <aidan@php.net>
if (!defined('PHP_EOL')) {
switch (strtoupper(substr(PHP_OS, 0, 3))) {
case 'WIN':
define('PHP_EOL', "\r\n");
break;
case 'DAR':
define('PHP_EOL', "\r");
break;
default:
define('PHP_EOL', "\n");
}
}
/**
* Bootstrap class that contains meta-functionality for HTML Purifier such as
* the autoload function.
*
* @note
* This class may be used without any other files from HTML Purifier.
*/
class HTMLPurifier_Bootstrap
{
/**
* Autoload function for HTML Purifier
* @param $class Class to load
*/
public static function autoload($class) {
$file = HTMLPurifier_Bootstrap::getPath($class);
if (!$file) return false;
require HTMLPURIFIER_PREFIX . '/' . $file;
return true;
}
/**
* Returns the path for a specific class.
*/
public static function getPath($class) {
if (strncmp('HTMLPurifier', $class, 12) !== 0) return false;
// Custom implementations
if (strncmp('HTMLPurifier_Language_', $class, 22) === 0) {
$code = str_replace('_', '-', substr($class, 22));
$file = 'HTMLPurifier/Language/classes/' . $code . '.php';
} else {
$file = str_replace('_', '/', $class) . '.php';
}
if (!file_exists(HTMLPURIFIER_PREFIX . '/' . $file)) return false;
return $file;
}
/**
* "Pre-registers" our autoloader on the SPL stack.
*/
public static function registerAutoload() {
$autoload = array('HTMLPurifier_Bootstrap', 'autoload');
if ( ($funcs = spl_autoload_functions()) === false ) {
spl_autoload_register($autoload);
} elseif (function_exists('spl_autoload_unregister')) {
$compat = version_compare(PHP_VERSION, '5.1.2', '<=') &&
version_compare(PHP_VERSION, '5.1.0', '>=');
foreach ($funcs as $func) {
if (is_array($func)) {
// :TRICKY: There are some compatibility issues and some
// places where we need to error out
$reflector = new ReflectionMethod($func[0], $func[1]);
if (!$reflector->isStatic()) {
throw new Exception('
HTML Purifier autoloader registrar is not compatible
with non-static object methods due to PHP Bug #44144;
Please do not use HTMLPurifier.autoload.php (or any
file that includes this file); instead, place the code:
spl_autoload_register(array(\'HTMLPurifier_Bootstrap\', \'autoload\'))
after your own autoloaders.
');
}
// Suprisingly, spl_autoload_register supports the
// Class::staticMethod callback format, although call_user_func doesn't
if ($compat) $func = implode('::', $func);
}
spl_autoload_unregister($func);
}
spl_autoload_register($autoload);
foreach ($funcs as $func) spl_autoload_register($func);
}
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,292 @@
<?php
/**
* Defines allowed CSS attributes and what their values are.
* @see HTMLPurifier_HTMLDefinition
*/
class HTMLPurifier_CSSDefinition extends HTMLPurifier_Definition
{
public $type = 'CSS';
/**
* Assoc array of attribute name to definition object.
*/
public $info = array();
/**
* Constructs the info array. The meat of this class.
*/
protected function doSetup($config) {
$this->info['text-align'] = new HTMLPurifier_AttrDef_Enum(
array('left', 'right', 'center', 'justify'), false);
$border_style =
$this->info['border-bottom-style'] =
$this->info['border-right-style'] =
$this->info['border-left-style'] =
$this->info['border-top-style'] = new HTMLPurifier_AttrDef_Enum(
array('none', 'hidden', 'dotted', 'dashed', 'solid', 'double',
'groove', 'ridge', 'inset', 'outset'), false);
$this->info['border-style'] = new HTMLPurifier_AttrDef_CSS_Multiple($border_style);
$this->info['clear'] = new HTMLPurifier_AttrDef_Enum(
array('none', 'left', 'right', 'both'), false);
$this->info['float'] = new HTMLPurifier_AttrDef_Enum(
array('none', 'left', 'right'), false);
$this->info['font-style'] = new HTMLPurifier_AttrDef_Enum(
array('normal', 'italic', 'oblique'), false);
$this->info['font-variant'] = new HTMLPurifier_AttrDef_Enum(
array('normal', 'small-caps'), false);
$uri_or_none = new HTMLPurifier_AttrDef_CSS_Composite(
array(
new HTMLPurifier_AttrDef_Enum(array('none')),
new HTMLPurifier_AttrDef_CSS_URI()
)
);
$this->info['list-style-position'] = new HTMLPurifier_AttrDef_Enum(
array('inside', 'outside'), false);
$this->info['list-style-type'] = new HTMLPurifier_AttrDef_Enum(
array('disc', 'circle', 'square', 'decimal', 'lower-roman',
'upper-roman', 'lower-alpha', 'upper-alpha', 'none'), false);
$this->info['list-style-image'] = $uri_or_none;
$this->info['list-style'] = new HTMLPurifier_AttrDef_CSS_ListStyle($config);
$this->info['text-transform'] = new HTMLPurifier_AttrDef_Enum(
array('capitalize', 'uppercase', 'lowercase', 'none'), false);
$this->info['color'] = new HTMLPurifier_AttrDef_CSS_Color();
$this->info['background-image'] = $uri_or_none;
$this->info['background-repeat'] = new HTMLPurifier_AttrDef_Enum(
array('repeat', 'repeat-x', 'repeat-y', 'no-repeat')
);
$this->info['background-attachment'] = new HTMLPurifier_AttrDef_Enum(
array('scroll', 'fixed')
);
$this->info['background-position'] = new HTMLPurifier_AttrDef_CSS_BackgroundPosition();
$border_color =
$this->info['border-top-color'] =
$this->info['border-bottom-color'] =
$this->info['border-left-color'] =
$this->info['border-right-color'] =
$this->info['background-color'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_Enum(array('transparent')),
new HTMLPurifier_AttrDef_CSS_Color()
));
$this->info['background'] = new HTMLPurifier_AttrDef_CSS_Background($config);
$this->info['border-color'] = new HTMLPurifier_AttrDef_CSS_Multiple($border_color);
$border_width =
$this->info['border-top-width'] =
$this->info['border-bottom-width'] =
$this->info['border-left-width'] =
$this->info['border-right-width'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_Enum(array('thin', 'medium', 'thick')),
new HTMLPurifier_AttrDef_CSS_Length('0') //disallow negative
));
$this->info['border-width'] = new HTMLPurifier_AttrDef_CSS_Multiple($border_width);
$this->info['letter-spacing'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_Enum(array('normal')),
new HTMLPurifier_AttrDef_CSS_Length()
));
$this->info['word-spacing'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_Enum(array('normal')),
new HTMLPurifier_AttrDef_CSS_Length()
));
$this->info['font-size'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_Enum(array('xx-small', 'x-small',
'small', 'medium', 'large', 'x-large', 'xx-large',
'larger', 'smaller')),
new HTMLPurifier_AttrDef_CSS_Percentage(),
new HTMLPurifier_AttrDef_CSS_Length()
));
$this->info['line-height'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_Enum(array('normal')),
new HTMLPurifier_AttrDef_CSS_Number(true), // no negatives
new HTMLPurifier_AttrDef_CSS_Length('0'),
new HTMLPurifier_AttrDef_CSS_Percentage(true)
));
$margin =
$this->info['margin-top'] =
$this->info['margin-bottom'] =
$this->info['margin-left'] =
$this->info['margin-right'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_CSS_Length(),
new HTMLPurifier_AttrDef_CSS_Percentage(),
new HTMLPurifier_AttrDef_Enum(array('auto'))
));
$this->info['margin'] = new HTMLPurifier_AttrDef_CSS_Multiple($margin);
// non-negative
$padding =
$this->info['padding-top'] =
$this->info['padding-bottom'] =
$this->info['padding-left'] =
$this->info['padding-right'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_CSS_Length('0'),
new HTMLPurifier_AttrDef_CSS_Percentage(true)
));
$this->info['padding'] = new HTMLPurifier_AttrDef_CSS_Multiple($padding);
$this->info['text-indent'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_CSS_Length(),
new HTMLPurifier_AttrDef_CSS_Percentage()
));
$trusted_wh = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_CSS_Length('0'),
new HTMLPurifier_AttrDef_CSS_Percentage(true),
new HTMLPurifier_AttrDef_Enum(array('auto'))
));
$max = $config->get('CSS', 'MaxImgLength');
$this->info['width'] =
$this->info['height'] =
$max === null ?
$trusted_wh :
new HTMLPurifier_AttrDef_Switch('img',
// For img tags:
new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_CSS_Length('0', $max),
new HTMLPurifier_AttrDef_Enum(array('auto'))
)),
// For everyone else:
$trusted_wh
);
$this->info['text-decoration'] = new HTMLPurifier_AttrDef_CSS_TextDecoration();
$this->info['font-family'] = new HTMLPurifier_AttrDef_CSS_FontFamily();
// this could use specialized code
$this->info['font-weight'] = new HTMLPurifier_AttrDef_Enum(
array('normal', 'bold', 'bolder', 'lighter', '100', '200', '300',
'400', '500', '600', '700', '800', '900'), false);
// MUST be called after other font properties, as it references
// a CSSDefinition object
$this->info['font'] = new HTMLPurifier_AttrDef_CSS_Font($config);
// same here
$this->info['border'] =
$this->info['border-bottom'] =
$this->info['border-top'] =
$this->info['border-left'] =
$this->info['border-right'] = new HTMLPurifier_AttrDef_CSS_Border($config);
$this->info['border-collapse'] = new HTMLPurifier_AttrDef_Enum(array(
'collapse', 'separate'));
$this->info['caption-side'] = new HTMLPurifier_AttrDef_Enum(array(
'top', 'bottom'));
$this->info['table-layout'] = new HTMLPurifier_AttrDef_Enum(array(
'auto', 'fixed'));
$this->info['vertical-align'] = new HTMLPurifier_AttrDef_CSS_Composite(array(
new HTMLPurifier_AttrDef_Enum(array('baseline', 'sub', 'super',
'top', 'text-top', 'middle', 'bottom', 'text-bottom')),
new HTMLPurifier_AttrDef_CSS_Length(),
new HTMLPurifier_AttrDef_CSS_Percentage()
));
$this->info['border-spacing'] = new HTMLPurifier_AttrDef_CSS_Multiple(new HTMLPurifier_AttrDef_CSS_Length(), 2);
// partial support
$this->info['white-space'] = new HTMLPurifier_AttrDef_Enum(array('nowrap'));
if ($config->get('CSS', 'Proprietary')) {
$this->doSetupProprietary($config);
}
if ($config->get('CSS', 'AllowTricky')) {
$this->doSetupTricky($config);
}
$allow_important = $config->get('CSS', 'AllowImportant');
// wrap all attr-defs with decorator that handles !important
foreach ($this->info as $k => $v) {
$this->info[$k] = new HTMLPurifier_AttrDef_CSS_ImportantDecorator($v, $allow_important);
}
$this->setupConfigStuff($config);
}
protected function doSetupProprietary($config) {
// Internet Explorer only scrollbar colors
$this->info['scrollbar-arrow-color'] = new HTMLPurifier_AttrDef_CSS_Color();
$this->info['scrollbar-base-color'] = new HTMLPurifier_AttrDef_CSS_Color();
$this->info['scrollbar-darkshadow-color'] = new HTMLPurifier_AttrDef_CSS_Color();
$this->info['scrollbar-face-color'] = new HTMLPurifier_AttrDef_CSS_Color();
$this->info['scrollbar-highlight-color'] = new HTMLPurifier_AttrDef_CSS_Color();
$this->info['scrollbar-shadow-color'] = new HTMLPurifier_AttrDef_CSS_Color();
// technically not proprietary, but CSS3, and no one supports it
$this->info['opacity'] = new HTMLPurifier_AttrDef_CSS_AlphaValue();
$this->info['-moz-opacity'] = new HTMLPurifier_AttrDef_CSS_AlphaValue();
$this->info['-khtml-opacity'] = new HTMLPurifier_AttrDef_CSS_AlphaValue();
// only opacity, for now
$this->info['filter'] = new HTMLPurifier_AttrDef_CSS_Filter();
}
protected function doSetupTricky($config) {
$this->info['display'] = new HTMLPurifier_AttrDef_Enum(array(
'inline', 'block', 'list-item', 'run-in', 'compact',
'marker', 'table', 'inline-table', 'table-row-group',
'table-header-group', 'table-footer-group', 'table-row',
'table-column-group', 'table-column', 'table-cell', 'table-caption', 'none'
));
$this->info['visibility'] = new HTMLPurifier_AttrDef_Enum(array(
'visible', 'hidden', 'collapse'
));
$this->info['overflow'] = new HTMLPurifier_AttrDef_Enum(array('visible', 'hidden', 'auto', 'scroll'));
}
/**
* Performs extra config-based processing. Based off of
* HTMLPurifier_HTMLDefinition.
* @todo Refactor duplicate elements into common class (probably using
* composition, not inheritance).
*/
protected function setupConfigStuff($config) {
// setup allowed elements
$support = "(for information on implementing this, see the ".
"support forums) ";
$allowed_attributes = $config->get('CSS', 'AllowedProperties');
if ($allowed_attributes !== null) {
foreach ($this->info as $name => $d) {
if(!isset($allowed_attributes[$name])) unset($this->info[$name]);
unset($allowed_attributes[$name]);
}
// emit errors
foreach ($allowed_attributes as $name => $d) {
// :TODO: Is this htmlspecialchars() call really necessary?
$name = htmlspecialchars($name);
trigger_error("Style attribute '$name' is not supported $support", E_USER_WARNING);
}
}
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,48 @@
<?php
/**
* Defines allowed child nodes and validates tokens against it.
*/
abstract class HTMLPurifier_ChildDef
{
/**
* Type of child definition, usually right-most part of class name lowercase.
* Used occasionally in terms of context.
*/
public $type;
/**
* Bool that indicates whether or not an empty array of children is okay
*
* This is necessary for redundant checking when changes affecting
* a child node may cause a parent node to now be disallowed.
*/
public $allow_empty;
/**
* Lookup array of all elements that this definition could possibly allow
*/
public $elements = array();
/**
* Get lookup of tag names that should not close this element automatically.
* All other elements will do so.
*/
public function getAllowedElements($config) {
return $this->elements;
}
/**
* Validates nodes according to definition and returns modification.
*
* @param $tokens_of_children Array of HTMLPurifier_Token
* @param $config HTMLPurifier_Config object
* @param $context HTMLPurifier_Context object
* @return bool true to leave nodes as is
* @return bool false to remove parent node
* @return array of replacement child tokens
*/
abstract public function validateChildren($tokens_of_children, $config, $context);
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,48 @@
<?php
/**
* Definition that uses different definitions depending on context.
*
* The del and ins tags are notable because they allow different types of
* elements depending on whether or not they're in a block or inline context.
* Chameleon allows this behavior to happen by using two different
* definitions depending on context. While this somewhat generalized,
* it is specifically intended for those two tags.
*/
class HTMLPurifier_ChildDef_Chameleon extends HTMLPurifier_ChildDef
{
/**
* Instance of the definition object to use when inline. Usually stricter.
*/
public $inline;
/**
* Instance of the definition object to use when block.
*/
public $block;
public $type = 'chameleon';
/**
* @param $inline List of elements to allow when inline.
* @param $block List of elements to allow when block.
*/
public function __construct($inline, $block) {
$this->inline = new HTMLPurifier_ChildDef_Optional($inline);
$this->block = new HTMLPurifier_ChildDef_Optional($block);
$this->elements = $this->block->elements;
}
public function validateChildren($tokens_of_children, $config, $context) {
if ($context->get('IsInline') === false) {
return $this->block->validateChildren(
$tokens_of_children, $config, $context);
} else {
return $this->inline->validateChildren(
$tokens_of_children, $config, $context);
}
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,90 @@
<?php
/**
* Custom validation class, accepts DTD child definitions
*
* @warning Currently this class is an all or nothing proposition, that is,
* it will only give a bool return value.
*/
class HTMLPurifier_ChildDef_Custom extends HTMLPurifier_ChildDef
{
public $type = 'custom';
public $allow_empty = false;
/**
* Allowed child pattern as defined by the DTD
*/
public $dtd_regex;
/**
* PCRE regex derived from $dtd_regex
* @private
*/
private $_pcre_regex;
/**
* @param $dtd_regex Allowed child pattern from the DTD
*/
public function __construct($dtd_regex) {
$this->dtd_regex = $dtd_regex;
$this->_compileRegex();
}
/**
* Compiles the PCRE regex from a DTD regex ($dtd_regex to $_pcre_regex)
*/
protected function _compileRegex() {
$raw = str_replace(' ', '', $this->dtd_regex);
if ($raw{0} != '(') {
$raw = "($raw)";
}
$el = '[#a-zA-Z0-9_.-]+';
$reg = $raw;
// COMPLICATED! AND MIGHT BE BUGGY! I HAVE NO CLUE WHAT I'M
// DOING! Seriously: if there's problems, please report them.
// collect all elements into the $elements array
preg_match_all("/$el/", $reg, $matches);
foreach ($matches[0] as $match) {
$this->elements[$match] = true;
}
// setup all elements as parentheticals with leading commas
$reg = preg_replace("/$el/", '(,\\0)', $reg);
// remove commas when they were not solicited
$reg = preg_replace("/([^,(|]\(+),/", '\\1', $reg);
// remove all non-paranthetical commas: they are handled by first regex
$reg = preg_replace("/,\(/", '(', $reg);
$this->_pcre_regex = $reg;
}
public function validateChildren($tokens_of_children, $config, $context) {
$list_of_children = '';
$nesting = 0; // depth into the nest
foreach ($tokens_of_children as $token) {
if (!empty($token->is_whitespace)) continue;
$is_child = ($nesting == 0); // direct
if ($token instanceof HTMLPurifier_Token_Start) {
$nesting++;
} elseif ($token instanceof HTMLPurifier_Token_End) {
$nesting--;
}
if ($is_child) {
$list_of_children .= $token->name . ',';
}
}
// add leading comma to deal with stray comma declarations
$list_of_children = ',' . rtrim($list_of_children, ',');
$okay =
preg_match(
'/^,?'.$this->_pcre_regex.'$/',
$list_of_children
);
return (bool) $okay;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,20 @@
<?php
/**
* Definition that disallows all elements.
* @warning validateChildren() in this class is actually never called, because
* empty elements are corrected in HTMLPurifier_Strategy_MakeWellFormed
* before child definitions are parsed in earnest by
* HTMLPurifier_Strategy_FixNesting.
*/
class HTMLPurifier_ChildDef_Empty extends HTMLPurifier_ChildDef
{
public $allow_empty = true;
public $type = 'empty';
public function __construct() {}
public function validateChildren($tokens_of_children, $config, $context) {
return array();
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,26 @@
<?php
/**
* Definition that allows a set of elements, and allows no children.
* @note This is a hack to reuse code from HTMLPurifier_ChildDef_Required,
* really, one shouldn't inherit from the other. Only altered behavior
* is to overload a returned false with an array. Thus, it will never
* return false.
*/
class HTMLPurifier_ChildDef_Optional extends HTMLPurifier_ChildDef_Required
{
public $allow_empty = true;
public $type = 'optional';
public function validateChildren($tokens_of_children, $config, $context) {
$result = parent::validateChildren($tokens_of_children, $config, $context);
// we assume that $tokens_of_children is not modified
if ($result === false) {
if (empty($tokens_of_children)) return true;
elseif ($this->whitespace) return $tokens_of_children;
else return array();
}
return $result;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,117 @@
<?php
/**
* Definition that allows a set of elements, but disallows empty children.
*/
class HTMLPurifier_ChildDef_Required extends HTMLPurifier_ChildDef
{
/**
* Lookup table of allowed elements.
* @public
*/
public $elements = array();
/**
* Whether or not the last passed node was all whitespace.
*/
protected $whitespace = false;
/**
* @param $elements List of allowed element names (lowercase).
*/
public function __construct($elements) {
if (is_string($elements)) {
$elements = str_replace(' ', '', $elements);
$elements = explode('|', $elements);
}
$keys = array_keys($elements);
if ($keys == array_keys($keys)) {
$elements = array_flip($elements);
foreach ($elements as $i => $x) {
$elements[$i] = true;
if (empty($i)) unset($elements[$i]); // remove blank
}
}
$this->elements = $elements;
}
public $allow_empty = false;
public $type = 'required';
public function validateChildren($tokens_of_children, $config, $context) {
// Flag for subclasses
$this->whitespace = false;
// if there are no tokens, delete parent node
if (empty($tokens_of_children)) return false;
// the new set of children
$result = array();
// current depth into the nest
$nesting = 0;
// whether or not we're deleting a node
$is_deleting = false;
// whether or not parsed character data is allowed
// this controls whether or not we silently drop a tag
// or generate escaped HTML from it
$pcdata_allowed = isset($this->elements['#PCDATA']);
// a little sanity check to make sure it's not ALL whitespace
$all_whitespace = true;
// some configuration
$escape_invalid_children = $config->get('Core', 'EscapeInvalidChildren');
// generator
$gen = new HTMLPurifier_Generator($config, $context);
foreach ($tokens_of_children as $token) {
if (!empty($token->is_whitespace)) {
$result[] = $token;
continue;
}
$all_whitespace = false; // phew, we're not talking about whitespace
$is_child = ($nesting == 0);
if ($token instanceof HTMLPurifier_Token_Start) {
$nesting++;
} elseif ($token instanceof HTMLPurifier_Token_End) {
$nesting--;
}
if ($is_child) {
$is_deleting = false;
if (!isset($this->elements[$token->name])) {
$is_deleting = true;
if ($pcdata_allowed && $token instanceof HTMLPurifier_Token_Text) {
$result[] = $token;
} elseif ($pcdata_allowed && $escape_invalid_children) {
$result[] = new HTMLPurifier_Token_Text(
$gen->generateFromToken($token)
);
}
continue;
}
}
if (!$is_deleting || ($pcdata_allowed && $token instanceof HTMLPurifier_Token_Text)) {
$result[] = $token;
} elseif ($pcdata_allowed && $escape_invalid_children) {
$result[] =
new HTMLPurifier_Token_Text(
$gen->generateFromToken($token)
);
} else {
// drop silently
}
}
if (empty($result)) return false;
if ($all_whitespace) {
$this->whitespace = true;
return false;
}
if ($tokens_of_children == $result) return true;
return $result;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,88 @@
<?php
/**
* Takes the contents of blockquote when in strict and reformats for validation.
*/
class HTMLPurifier_ChildDef_StrictBlockquote extends HTMLPurifier_ChildDef_Required
{
protected $real_elements;
protected $fake_elements;
public $allow_empty = true;
public $type = 'strictblockquote';
protected $init = false;
/**
* @note We don't want MakeWellFormed to auto-close inline elements since
* they might be allowed.
*/
public function getAllowedElements($config) {
$this->init($config);
return $this->fake_elements;
}
public function validateChildren($tokens_of_children, $config, $context) {
$this->init($config);
// trick the parent class into thinking it allows more
$this->elements = $this->fake_elements;
$result = parent::validateChildren($tokens_of_children, $config, $context);
$this->elements = $this->real_elements;
if ($result === false) return array();
if ($result === true) $result = $tokens_of_children;
$def = $config->getHTMLDefinition();
$block_wrap_start = new HTMLPurifier_Token_Start($def->info_block_wrapper);
$block_wrap_end = new HTMLPurifier_Token_End( $def->info_block_wrapper);
$is_inline = false;
$depth = 0;
$ret = array();
// assuming that there are no comment tokens
foreach ($result as $i => $token) {
$token = $result[$i];
// ifs are nested for readability
if (!$is_inline) {
if (!$depth) {
if (
($token instanceof HTMLPurifier_Token_Text && !$token->is_whitespace) ||
(!$token instanceof HTMLPurifier_Token_Text && !isset($this->elements[$token->name]))
) {
$is_inline = true;
$ret[] = $block_wrap_start;
}
}
} else {
if (!$depth) {
// starting tokens have been inline text / empty
if ($token instanceof HTMLPurifier_Token_Start || $token instanceof HTMLPurifier_Token_Empty) {
if (isset($this->elements[$token->name])) {
// ended
$ret[] = $block_wrap_end;
$is_inline = false;
}
}
}
}
$ret[] = $token;
if ($token instanceof HTMLPurifier_Token_Start) $depth++;
if ($token instanceof HTMLPurifier_Token_End) $depth--;
}
if ($is_inline) $ret[] = $block_wrap_end;
return $ret;
}
private function init($config) {
if (!$this->init) {
$def = $config->getHTMLDefinition();
// allow all inline elements
$this->real_elements = $this->elements;
$this->fake_elements = $def->info_content_sets['Flow'];
$this->fake_elements['#PCDATA'] = true;
$this->init = true;
}
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,142 @@
<?php
/**
* Definition for tables
*/
class HTMLPurifier_ChildDef_Table extends HTMLPurifier_ChildDef
{
public $allow_empty = false;
public $type = 'table';
public $elements = array('tr' => true, 'tbody' => true, 'thead' => true,
'tfoot' => true, 'caption' => true, 'colgroup' => true, 'col' => true);
public function __construct() {}
public function validateChildren($tokens_of_children, $config, $context) {
if (empty($tokens_of_children)) return false;
// this ensures that the loop gets run one last time before closing
// up. It's a little bit of a hack, but it works! Just make sure you
// get rid of the token later.
$tokens_of_children[] = false;
// only one of these elements is allowed in a table
$caption = false;
$thead = false;
$tfoot = false;
// as many of these as you want
$cols = array();
$content = array();
$nesting = 0; // current depth so we can determine nodes
$is_collecting = false; // are we globbing together tokens to package
// into one of the collectors?
$collection = array(); // collected nodes
$tag_index = 0; // the first node might be whitespace,
// so this tells us where the start tag is
foreach ($tokens_of_children as $token) {
$is_child = ($nesting == 0);
if ($token === false) {
// terminating sequence started
} elseif ($token instanceof HTMLPurifier_Token_Start) {
$nesting++;
} elseif ($token instanceof HTMLPurifier_Token_End) {
$nesting--;
}
// handle node collection
if ($is_collecting) {
if ($is_child) {
// okay, let's stash the tokens away
// first token tells us the type of the collection
switch ($collection[$tag_index]->name) {
case 'tr':
case 'tbody':
$content[] = $collection;
break;
case 'caption':
if ($caption !== false) break;
$caption = $collection;
break;
case 'thead':
case 'tfoot':
// access the appropriate variable, $thead or $tfoot
$var = $collection[$tag_index]->name;
if ($$var === false) {
$$var = $collection;
} else {
// transmutate the first and less entries into
// tbody tags, and then put into content
$collection[$tag_index]->name = 'tbody';
$collection[count($collection)-1]->name = 'tbody';
$content[] = $collection;
}
break;
case 'colgroup':
$cols[] = $collection;
break;
}
$collection = array();
$is_collecting = false;
$tag_index = 0;
} else {
// add the node to the collection
$collection[] = $token;
}
}
// terminate
if ($token === false) break;
if ($is_child) {
// determine what we're dealing with
if ($token->name == 'col') {
// the only empty tag in the possie, we can handle it
// immediately
$cols[] = array_merge($collection, array($token));
$collection = array();
$tag_index = 0;
continue;
}
switch($token->name) {
case 'caption':
case 'colgroup':
case 'thead':
case 'tfoot':
case 'tbody':
case 'tr':
$is_collecting = true;
$collection[] = $token;
continue;
default:
if (!empty($token->is_whitespace)) {
$collection[] = $token;
$tag_index++;
}
continue;
}
}
}
if (empty($content)) return false;
$ret = array();
if ($caption !== false) $ret = array_merge($ret, $caption);
if ($cols !== false) foreach ($cols as $token_array) $ret = array_merge($ret, $token_array);
if ($thead !== false) $ret = array_merge($ret, $thead);
if ($tfoot !== false) $ret = array_merge($ret, $tfoot);
foreach ($content as $token_array) $ret = array_merge($ret, $token_array);
if (!empty($collection) && $is_collecting == false){
// grab the trailing space
$ret = array_merge($ret, $collection);
}
array_pop($tokens_of_children); // remove phantom token
return ($ret === $tokens_of_children) ? true : $ret;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,497 @@
<?php
/**
* Configuration object that triggers customizable behavior.
*
* @warning This class is strongly defined: that means that the class
* will fail if an undefined directive is retrieved or set.
*
* @note Many classes that could (although many times don't) use the
* configuration object make it a mandatory parameter. This is
* because a configuration object should always be forwarded,
* otherwise, you run the risk of missing a parameter and then
* being stumped when a configuration directive doesn't work.
*
* @todo Reconsider some of the public member variables
*/
class HTMLPurifier_Config
{
/**
* HTML Purifier's version
*/
public $version = '3.3.0';
/**
* Bool indicator whether or not to automatically finalize
* the object if a read operation is done
*/
public $autoFinalize = true;
// protected member variables
/**
* Namespace indexed array of serials for specific namespaces (see
* getSerial() for more info).
*/
protected $serials = array();
/**
* Serial for entire configuration object
*/
protected $serial;
/**
* Parser for variables
*/
protected $parser;
/**
* Reference HTMLPurifier_ConfigSchema for value checking
* @note This is public for introspective purposes. Please don't
* abuse!
*/
public $def;
/**
* Indexed array of definitions
*/
protected $definitions;
/**
* Bool indicator whether or not config is finalized
*/
protected $finalized = false;
/**
* Property list containing configuration directives.
*/
protected $plist;
/**
* @param $definition HTMLPurifier_ConfigSchema that defines what directives
* are allowed.
*/
public function __construct($definition) {
$this->plist = new HTMLPurifier_PropertyList($definition->defaultPlist);
$this->def = $definition; // keep a copy around for checking
$this->parser = new HTMLPurifier_VarParser_Flexible();
}
/**
* Convenience constructor that creates a config object based on a mixed var
* @param mixed $config Variable that defines the state of the config
* object. Can be: a HTMLPurifier_Config() object,
* an array of directives based on loadArray(),
* or a string filename of an ini file.
* @param HTMLPurifier_ConfigSchema Schema object
* @return Configured HTMLPurifier_Config object
*/
public static function create($config, $schema = null) {
if ($config instanceof HTMLPurifier_Config) {
// pass-through
return $config;
}
if (!$schema) {
$ret = HTMLPurifier_Config::createDefault();
} else {
$ret = new HTMLPurifier_Config($schema);
}
if (is_string($config)) $ret->loadIni($config);
elseif (is_array($config)) $ret->loadArray($config);
return $ret;
}
/**
* Convenience constructor that creates a default configuration object.
* @return Default HTMLPurifier_Config object.
*/
public static function createDefault() {
$definition = HTMLPurifier_ConfigSchema::instance();
$config = new HTMLPurifier_Config($definition);
return $config;
}
/**
* Retreives a value from the configuration.
* @param $namespace String namespace
* @param $key String key
*/
public function get($namespace, $key) {
if (!$this->finalized) $this->autoFinalize ? $this->finalize() : $this->plist->squash(true);
if (!isset($this->def->info[$namespace][$key])) {
// can't add % due to SimpleTest bug
trigger_error('Cannot retrieve value of undefined directive ' . htmlspecialchars("$namespace.$key"),
E_USER_WARNING);
return;
}
if (isset($this->def->info[$namespace][$key]->isAlias)) {
$d = $this->def->info[$namespace][$key];
trigger_error('Cannot get value from aliased directive, use real name ' . $d->namespace . '.' . $d->name,
E_USER_ERROR);
return;
}
return $this->plist->get("$namespace.$key");
}
/**
* Retreives an array of directives to values from a given namespace
* @param $namespace String namespace
*/
public function getBatch($namespace) {
if (!$this->finalized) $this->autoFinalize ? $this->finalize() : $this->plist->squash(true);
if (!isset($this->def->info[$namespace])) {
trigger_error('Cannot retrieve undefined namespace ' . htmlspecialchars($namespace),
E_USER_WARNING);
return;
}
$full = $this->getAll();
return $full[$namespace];
}
/**
* Returns a md5 signature of a segment of the configuration object
* that uniquely identifies that particular configuration
* @note Revision is handled specially and is removed from the batch
* before processing!
* @param $namespace Namespace to get serial for
*/
public function getBatchSerial($namespace) {
if (empty($this->serials[$namespace])) {
$batch = $this->getBatch($namespace);
unset($batch['DefinitionRev']);
$this->serials[$namespace] = md5(serialize($batch));
}
return $this->serials[$namespace];
}
/**
* Returns a md5 signature for the entire configuration object
* that uniquely identifies that particular configuration
*/
public function getSerial() {
if (empty($this->serial)) {
$this->serial = md5(serialize($this->getAll()));
}
return $this->serial;
}
/**
* Retrieves all directives, organized by namespace
*/
public function getAll() {
if (!$this->finalized) $this->autoFinalize ? $this->finalize() : $this->plist->squash(true);
$ret = array();
foreach ($this->plist->squash() as $name => $value) {
list($ns, $key) = explode('.', $name, 2);
$ret[$ns][$key] = $value;
}
return $ret;
}
/**
* Sets a value to configuration.
* @param $namespace String namespace
* @param $key String key
* @param $value Mixed value
*/
public function set($namespace, $key, $value, $from_alias = false) {
if ($this->isFinalized('Cannot set directive after finalization')) return;
if (!isset($this->def->info[$namespace][$key])) {
trigger_error('Cannot set undefined directive ' . htmlspecialchars("$namespace.$key") . ' to value',
E_USER_WARNING);
return;
}
$def = $this->def->info[$namespace][$key];
if (isset($def->isAlias)) {
if ($from_alias) {
trigger_error('Double-aliases not allowed, please fix '.
'ConfigSchema bug with' . "$namespace.$key", E_USER_ERROR);
return;
}
$this->set($new_ns = $def->namespace,
$new_dir = $def->name,
$value, true);
trigger_error("$namespace.$key is an alias, preferred directive name is $new_ns.$new_dir", E_USER_NOTICE);
return;
}
// Raw type might be negative when using the fully optimized form
// of stdclass, which indicates allow_null == true
$rtype = is_int($def) ? $def : $def->type;
if ($rtype < 0) {
$type = -$rtype;
$allow_null = true;
} else {
$type = $rtype;
$allow_null = isset($def->allow_null);
}
try {
$value = $this->parser->parse($value, $type, $allow_null);
} catch (HTMLPurifier_VarParserException $e) {
trigger_error('Value for ' . "$namespace.$key" . ' is of invalid type, should be ' . HTMLPurifier_VarParser::getTypeName($type), E_USER_WARNING);
return;
}
if (is_string($value) && is_object($def)) {
// resolve value alias if defined
if (isset($def->aliases[$value])) {
$value = $def->aliases[$value];
}
// check to see if the value is allowed
if (isset($def->allowed) && !isset($def->allowed[$value])) {
trigger_error('Value not supported, valid values are: ' .
$this->_listify($def->allowed), E_USER_WARNING);
return;
}
}
$this->plist->set("$namespace.$key", $value);
// reset definitions if the directives they depend on changed
// this is a very costly process, so it's discouraged
// with finalization
if ($namespace == 'HTML' || $namespace == 'CSS') {
$this->definitions[$namespace] = null;
}
$this->serials[$namespace] = false;
}
/**
* Convenience function for error reporting
*/
private function _listify($lookup) {
$list = array();
foreach ($lookup as $name => $b) $list[] = $name;
return implode(', ', $list);
}
/**
* Retrieves object reference to the HTML definition.
* @param $raw Return a copy that has not been setup yet. Must be
* called before it's been setup, otherwise won't work.
*/
public function getHTMLDefinition($raw = false) {
return $this->getDefinition('HTML', $raw);
}
/**
* Retrieves object reference to the CSS definition
* @param $raw Return a copy that has not been setup yet. Must be
* called before it's been setup, otherwise won't work.
*/
public function getCSSDefinition($raw = false) {
return $this->getDefinition('CSS', $raw);
}
/**
* Retrieves a definition
* @param $type Type of definition: HTML, CSS, etc
* @param $raw Whether or not definition should be returned raw
*/
public function getDefinition($type, $raw = false) {
if (!$this->finalized) $this->autoFinalize ? $this->finalize() : $this->plist->squash(true);
$factory = HTMLPurifier_DefinitionCacheFactory::instance();
$cache = $factory->create($type, $this);
if (!$raw) {
// see if we can quickly supply a definition
if (!empty($this->definitions[$type])) {
if (!$this->definitions[$type]->setup) {
$this->definitions[$type]->setup($this);
$cache->set($this->definitions[$type], $this);
}
return $this->definitions[$type];
}
// memory check missed, try cache
$this->definitions[$type] = $cache->get($this);
if ($this->definitions[$type]) {
// definition in cache, return it
return $this->definitions[$type];
}
} elseif (
!empty($this->definitions[$type]) &&
!$this->definitions[$type]->setup
) {
// raw requested, raw in memory, quick return
return $this->definitions[$type];
}
// quick checks failed, let's create the object
if ($type == 'HTML') {
$this->definitions[$type] = new HTMLPurifier_HTMLDefinition();
} elseif ($type == 'CSS') {
$this->definitions[$type] = new HTMLPurifier_CSSDefinition();
} elseif ($type == 'URI') {
$this->definitions[$type] = new HTMLPurifier_URIDefinition();
} else {
throw new HTMLPurifier_Exception("Definition of $type type not supported");
}
// quick abort if raw
if ($raw) {
if (is_null($this->get($type, 'DefinitionID'))) {
// fatally error out if definition ID not set
throw new HTMLPurifier_Exception("Cannot retrieve raw version without specifying %$type.DefinitionID");
}
return $this->definitions[$type];
}
// set it up
$this->definitions[$type]->setup($this);
// save in cache
$cache->set($this->definitions[$type], $this);
return $this->definitions[$type];
}
/**
* Loads configuration values from an array with the following structure:
* Namespace.Directive => Value
* @param $config_array Configuration associative array
*/
public function loadArray($config_array) {
if ($this->isFinalized('Cannot load directives after finalization')) return;
foreach ($config_array as $key => $value) {
$key = str_replace('_', '.', $key);
if (strpos($key, '.') !== false) {
// condensed form
list($namespace, $directive) = explode('.', $key);
$this->set($namespace, $directive, $value);
} else {
$namespace = $key;
$namespace_values = $value;
foreach ($namespace_values as $directive => $value) {
$this->set($namespace, $directive, $value);
}
}
}
}
/**
* Returns a list of array(namespace, directive) for all directives
* that are allowed in a web-form context as per an allowed
* namespaces/directives list.
* @param $allowed List of allowed namespaces/directives
*/
public static function getAllowedDirectivesForForm($allowed, $schema = null) {
if (!$schema) {
$schema = HTMLPurifier_ConfigSchema::instance();
}
if ($allowed !== true) {
if (is_string($allowed)) $allowed = array($allowed);
$allowed_ns = array();
$allowed_directives = array();
$blacklisted_directives = array();
foreach ($allowed as $ns_or_directive) {
if (strpos($ns_or_directive, '.') !== false) {
// directive
if ($ns_or_directive[0] == '-') {
$blacklisted_directives[substr($ns_or_directive, 1)] = true;
} else {
$allowed_directives[$ns_or_directive] = true;
}
} else {
// namespace
$allowed_ns[$ns_or_directive] = true;
}
}
}
$ret = array();
foreach ($schema->info as $ns => $keypairs) {
foreach ($keypairs as $directive => $def) {
if ($allowed !== true) {
if (isset($blacklisted_directives["$ns.$directive"])) continue;
if (!isset($allowed_directives["$ns.$directive"]) && !isset($allowed_ns[$ns])) continue;
}
if (isset($def->isAlias)) continue;
if ($directive == 'DefinitionID' || $directive == 'DefinitionRev') continue;
$ret[] = array($ns, $directive);
}
}
return $ret;
}
/**
* Loads configuration values from $_GET/$_POST that were posted
* via ConfigForm
* @param $array $_GET or $_POST array to import
* @param $index Index/name that the config variables are in
* @param $allowed List of allowed namespaces/directives
* @param $mq_fix Boolean whether or not to enable magic quotes fix
* @param $schema Instance of HTMLPurifier_ConfigSchema to use, if not global copy
*/
public static function loadArrayFromForm($array, $index = false, $allowed = true, $mq_fix = true, $schema = null) {
$ret = HTMLPurifier_Config::prepareArrayFromForm($array, $index, $allowed, $mq_fix, $schema);
$config = HTMLPurifier_Config::create($ret, $schema);
return $config;
}
/**
* Merges in configuration values from $_GET/$_POST to object. NOT STATIC.
* @note Same parameters as loadArrayFromForm
*/
public function mergeArrayFromForm($array, $index = false, $allowed = true, $mq_fix = true) {
$ret = HTMLPurifier_Config::prepareArrayFromForm($array, $index, $allowed, $mq_fix, $this->def);
$this->loadArray($ret);
}
/**
* Prepares an array from a form into something usable for the more
* strict parts of HTMLPurifier_Config
*/
public static function prepareArrayFromForm($array, $index = false, $allowed = true, $mq_fix = true, $schema = null) {
if ($index !== false) $array = (isset($array[$index]) && is_array($array[$index])) ? $array[$index] : array();
$mq = $mq_fix && function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc();
$allowed = HTMLPurifier_Config::getAllowedDirectivesForForm($allowed, $schema);
$ret = array();
foreach ($allowed as $key) {
list($ns, $directive) = $key;
$skey = "$ns.$directive";
if (!empty($array["Null_$skey"])) {
$ret[$ns][$directive] = null;
continue;
}
if (!isset($array[$skey])) continue;
$value = $mq ? stripslashes($array[$skey]) : $array[$skey];
$ret[$ns][$directive] = $value;
}
return $ret;
}
/**
* Loads configuration values from an ini file
* @param $filename Name of ini file
*/
public function loadIni($filename) {
if ($this->isFinalized('Cannot load directives after finalization')) return;
$array = parse_ini_file($filename, true);
$this->loadArray($array);
}
/**
* Checks whether or not the configuration object is finalized.
* @param $error String error message, or false for no error
*/
public function isFinalized($error = false) {
if ($this->finalized && $error) {
trigger_error($error, E_USER_ERROR);
}
return $this->finalized;
}
/**
* Finalizes configuration only if auto finalize is on and not
* already finalized
*/
public function autoFinalize() {
if (!$this->finalized && $this->autoFinalize) $this->finalize();
}
/**
* Finalizes a configuration object, prohibiting further change
*/
public function finalize() {
$this->finalized = true;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,231 @@
<?php
/**
* Configuration definition, defines directives and their defaults.
*/
class HTMLPurifier_ConfigSchema {
/**
* Defaults of the directives and namespaces.
* @note This shares the exact same structure as HTMLPurifier_Config::$conf
*/
public $defaults = array();
/**
* The default property list. Do not edit this property list.
*/
public $defaultPlist;
/**
* Definition of the directives. The structure of this is:
*
* array(
* 'Namespace' => array(
* 'Directive' => new stdclass(),
* )
* )
*
* The stdclass may have the following properties:
*
* - If isAlias isn't set:
* - type: Integer type of directive, see HTMLPurifier_VarParser for definitions
* - allow_null: If set, this directive allows null values
* - aliases: If set, an associative array of value aliases to real values
* - allowed: If set, a lookup array of allowed (string) values
* - If isAlias is set:
* - namespace: Namespace this directive aliases to
* - name: Directive name this directive aliases to
*
* In certain degenerate cases, stdclass will actually be an integer. In
* that case, the value is equivalent to an stdclass with the type
* property set to the integer. If the integer is negative, type is
* equal to the absolute value of integer, and allow_null is true.
*
* This class is friendly with HTMLPurifier_Config. If you need introspection
* about the schema, you're better of using the ConfigSchema_Interchange,
* which uses more memory but has much richer information.
*/
public $info = array();
/**
* Application-wide singleton
*/
static protected $singleton;
public function __construct() {
$this->defaultPlist = new HTMLPurifier_PropertyList();
}
/**
* Unserializes the default ConfigSchema.
*/
public static function makeFromSerial() {
return unserialize(file_get_contents(HTMLPURIFIER_PREFIX . '/HTMLPurifier/ConfigSchema/schema.ser'));
}
/**
* Retrieves an instance of the application-wide configuration definition.
*/
public static function instance($prototype = null) {
if ($prototype !== null) {
HTMLPurifier_ConfigSchema::$singleton = $prototype;
} elseif (HTMLPurifier_ConfigSchema::$singleton === null || $prototype === true) {
HTMLPurifier_ConfigSchema::$singleton = HTMLPurifier_ConfigSchema::makeFromSerial();
}
return HTMLPurifier_ConfigSchema::$singleton;
}
/**
* Defines a directive for configuration
* @warning Will fail of directive's namespace is defined.
* @warning This method's signature is slightly different from the legacy
* define() static method! Beware!
* @param $namespace Namespace the directive is in
* @param $name Key of directive
* @param $default Default value of directive
* @param $type Allowed type of the directive. See
* HTMLPurifier_DirectiveDef::$type for allowed values
* @param $allow_null Whether or not to allow null values
*/
public function add($namespace, $name, $default, $type, $allow_null) {
$obj = new stdclass();
$obj->type = is_int($type) ? $type : HTMLPurifier_VarParser::$types[$type];
if ($allow_null) $obj->allow_null = true;
$this->info[$namespace][$name] = $obj;
$this->defaults[$namespace][$name] = $default;
$this->defaultPlist->set("$namespace.$name", $default);
}
/**
* Defines a namespace for directives to be put into.
* @warning This is slightly different from the corresponding static
* method.
* @param $namespace Namespace's name
*/
public function addNamespace($namespace) {
$this->info[$namespace] = array();
$this->defaults[$namespace] = array();
}
/**
* Defines a directive value alias.
*
* Directive value aliases are convenient for developers because it lets
* them set a directive to several values and get the same result.
* @param $namespace Directive's namespace
* @param $name Name of Directive
* @param $aliases Hash of aliased values to the real alias
*/
public function addValueAliases($namespace, $name, $aliases) {
if (!isset($this->info[$namespace][$name]->aliases)) {
$this->info[$namespace][$name]->aliases = array();
}
foreach ($aliases as $alias => $real) {
$this->info[$namespace][$name]->aliases[$alias] = $real;
}
}
/**
* Defines a set of allowed values for a directive.
* @warning This is slightly different from the corresponding static
* method definition.
* @param $namespace Namespace of directive
* @param $name Name of directive
* @param $allowed Lookup array of allowed values
*/
public function addAllowedValues($namespace, $name, $allowed) {
$this->info[$namespace][$name]->allowed = $allowed;
}
/**
* Defines a directive alias for backwards compatibility
* @param $namespace
* @param $name Directive that will be aliased
* @param $new_namespace
* @param $new_name Directive that the alias will be to
*/
public function addAlias($namespace, $name, $new_namespace, $new_name) {
$obj = new stdclass;
$obj->namespace = $new_namespace;
$obj->name = $new_name;
$obj->isAlias = true;
$this->info[$namespace][$name] = $obj;
}
/**
* Replaces any stdclass that only has the type property with type integer.
*/
public function postProcess() {
foreach ($this->info as $namespace => $info) {
foreach ($info as $directive => $v) {
if (count((array) $v) == 1) {
$this->info[$namespace][$directive] = $v->type;
} elseif (count((array) $v) == 2 && isset($v->allow_null)) {
$this->info[$namespace][$directive] = -$v->type;
}
}
}
}
// DEPRECATED METHODS
/** @see HTMLPurifier_ConfigSchema->set() */
public static function define($namespace, $name, $default, $type, $description) {
HTMLPurifier_ConfigSchema::deprecated(__METHOD__);
$type_values = explode('/', $type, 2);
$type = $type_values[0];
$modifier = isset($type_values[1]) ? $type_values[1] : false;
$allow_null = ($modifier === 'null');
$def = HTMLPurifier_ConfigSchema::instance();
$def->add($namespace, $name, $default, $type, $allow_null);
}
/** @see HTMLPurifier_ConfigSchema->addNamespace() */
public static function defineNamespace($namespace, $description) {
HTMLPurifier_ConfigSchema::deprecated(__METHOD__);
$def = HTMLPurifier_ConfigSchema::instance();
$def->addNamespace($namespace);
}
/** @see HTMLPurifier_ConfigSchema->addValueAliases() */
public static function defineValueAliases($namespace, $name, $aliases) {
HTMLPurifier_ConfigSchema::deprecated(__METHOD__);
$def = HTMLPurifier_ConfigSchema::instance();
$def->addValueAliases($namespace, $name, $aliases);
}
/** @see HTMLPurifier_ConfigSchema->addAllowedValues() */
public static function defineAllowedValues($namespace, $name, $allowed_values) {
HTMLPurifier_ConfigSchema::deprecated(__METHOD__);
$allowed = array();
foreach ($allowed_values as $value) {
$allowed[$value] = true;
}
$def = HTMLPurifier_ConfigSchema::instance();
$def->addAllowedValues($namespace, $name, $allowed);
}
/** @see HTMLPurifier_ConfigSchema->addAlias() */
public static function defineAlias($namespace, $name, $new_namespace, $new_name) {
HTMLPurifier_ConfigSchema::deprecated(__METHOD__);
$def = HTMLPurifier_ConfigSchema::instance();
$def->addAlias($namespace, $name, $new_namespace, $new_name);
}
/** @deprecated, use HTMLPurifier_VarParser->parse() */
public function validate($a, $b, $c = false) {
trigger_error("HTMLPurifier_ConfigSchema->validate deprecated, use HTMLPurifier_VarParser->parse instead", E_USER_NOTICE);
$parser = new HTMLPurifier_VarParser();
return $parser->parse($a, $b, $c);
}
/**
* Throws an E_USER_NOTICE stating that a method is deprecated.
*/
private static function deprecated($method) {
trigger_error("Static HTMLPurifier_ConfigSchema::$method deprecated, use add*() method instead", E_USER_NOTICE);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,52 @@
<?php
/**
* Converts HTMLPurifier_ConfigSchema_Interchange to our runtime
* representation used to perform checks on user configuration.
*/
class HTMLPurifier_ConfigSchema_Builder_ConfigSchema
{
public function build($interchange) {
$schema = new HTMLPurifier_ConfigSchema();
foreach ($interchange->namespaces as $n) {
$schema->addNamespace($n->namespace);
}
foreach ($interchange->directives as $d) {
$schema->add(
$d->id->namespace,
$d->id->directive,
$d->default,
$d->type,
$d->typeAllowsNull
);
if ($d->allowed !== null) {
$schema->addAllowedValues(
$d->id->namespace,
$d->id->directive,
$d->allowed
);
}
foreach ($d->aliases as $alias) {
$schema->addAlias(
$alias->namespace,
$alias->directive,
$d->id->namespace,
$d->id->directive
);
}
if ($d->valueAliases !== null) {
$schema->addValueAliases(
$d->id->namespace,
$d->id->directive,
$d->valueAliases
);
}
}
$schema->postProcess();
return $schema;
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,108 @@
<?php
/**
* Converts HTMLPurifier_ConfigSchema_Interchange to an XML format,
* which can be further processed to generate documentation.
*/
class HTMLPurifier_ConfigSchema_Builder_Xml extends XMLWriter
{
protected $interchange;
protected function writeHTMLDiv($html) {
$this->startElement('div');
$purifier = HTMLPurifier::getInstance();
$html = $purifier->purify($html);
$this->writeAttribute('xmlns', 'http://www.w3.org/1999/xhtml');
$this->writeRaw($html);
$this->endElement(); // div
}
protected function export($var) {
if ($var === array()) return 'array()';
return var_export($var, true);
}
public function build($interchange) {
// global access, only use as last resort
$this->interchange = $interchange;
$this->setIndent(true);
$this->startDocument('1.0', 'UTF-8');
$this->startElement('configdoc');
$this->writeElement('title', $interchange->name);
foreach ($interchange->namespaces as $namespace) {
$this->buildNamespace($namespace);
}
$this->endElement(); // configdoc
$this->flush();
}
public function buildNamespace($namespace) {
$this->startElement('namespace');
$this->writeAttribute('id', $namespace->namespace);
$this->writeElement('name', $namespace->namespace);
$this->startElement('description');
$this->writeHTMLDiv($namespace->description);
$this->endElement(); // description
foreach ($this->interchange->directives as $directive) {
if ($directive->id->namespace !== $namespace->namespace) continue;
$this->buildDirective($directive);
}
$this->endElement(); // namespace
}
public function buildDirective($directive) {
$this->startElement('directive');
$this->writeAttribute('id', $directive->id->toString());
$this->writeElement('name', $directive->id->directive);
$this->startElement('aliases');
foreach ($directive->aliases as $alias) $this->writeElement('alias', $alias->toString());
$this->endElement(); // aliases
$this->startElement('constraints');
if ($directive->version) $this->writeElement('version', $directive->version);
$this->startElement('type');
if ($directive->typeAllowsNull) $this->writeAttribute('allow-null', 'yes');
$this->text($directive->type);
$this->endElement(); // type
if ($directive->allowed) {
$this->startElement('allowed');
foreach ($directive->allowed as $value => $x) $this->writeElement('value', $value);
$this->endElement(); // allowed
}
$this->writeElement('default', $this->export($directive->default));
$this->writeAttribute('xml:space', 'preserve');
if ($directive->external) {
$this->startElement('external');
foreach ($directive->external as $project) $this->writeElement('project', $project);
$this->endElement();
}
$this->endElement(); // constraints
if ($directive->deprecatedVersion) {
$this->startElement('deprecated');
$this->writeElement('version', $directive->deprecatedVersion);
$this->writeElement('use', $directive->deprecatedUse->toString());
$this->endElement(); // deprecated
}
$this->startElement('description');
$this->writeHTMLDiv($directive->description);
$this->endElement(); // description
$this->endElement(); // directive
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,11 @@
<?php
/**
* Exceptions related to configuration schema
*/
class HTMLPurifier_ConfigSchema_Exception extends HTMLPurifier_Exception
{
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,57 @@
<?php
/**
* Generic schema interchange format that can be converted to a runtime
* representation (HTMLPurifier_ConfigSchema) or HTML documentation. Members
* are completely validated.
*/
class HTMLPurifier_ConfigSchema_Interchange
{
/**
* Name of the application this schema is describing.
*/
public $name;
/**
* Array of Namespace ID => array(namespace info)
*/
public $namespaces = array();
/**
* Array of Directive ID => array(directive info)
*/
public $directives = array();
/**
* Adds a namespace array to $namespaces
*/
public function addNamespace($namespace) {
if (isset($this->namespaces[$i = $namespace->namespace])) {
throw new HTMLPurifier_ConfigSchema_Exception("Cannot redefine namespace '$i'");
}
$this->namespaces[$i] = $namespace;
}
/**
* Adds a directive array to $directives
*/
public function addDirective($directive) {
if (isset($this->directives[$i = $directive->id->toString()])) {
throw new HTMLPurifier_ConfigSchema_Exception("Cannot redefine directive '$i'");
}
$this->directives[$i] = $directive;
}
/**
* Convenience function to perform standard validation. Throws exception
* on failed validation.
*/
public function validate() {
$validator = new HTMLPurifier_ConfigSchema_Validator();
return $validator->validate($this);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,77 @@
<?php
/**
* Interchange component class describing configuration directives.
*/
class HTMLPurifier_ConfigSchema_Interchange_Directive
{
/**
* ID of directive, instance of HTMLPurifier_ConfigSchema_Interchange_Id.
*/
public $id;
/**
* String type, e.g. 'integer' or 'istring'.
*/
public $type;
/**
* Default value, e.g. 3 or 'DefaultVal'.
*/
public $default;
/**
* HTML description.
*/
public $description;
/**
* Boolean whether or not null is allowed as a value.
*/
public $typeAllowsNull = false;
/**
* Lookup table of allowed scalar values, e.g. array('allowed' => true).
* Null if all values are allowed.
*/
public $allowed;
/**
* List of aliases for the directive,
* e.g. array(new HTMLPurifier_ConfigSchema_Interchange_Id('Ns', 'Dir'))).
*/
public $aliases = array();
/**
* Hash of value aliases, e.g. array('alt' => 'real'). Null if value
* aliasing is disabled (necessary for non-scalar types).
*/
public $valueAliases;
/**
* Version of HTML Purifier the directive was introduced, e.g. '1.3.1'.
* Null if the directive has always existed.
*/
public $version;
/**
* ID of directive that supercedes this old directive, is an instance
* of HTMLPurifier_ConfigSchema_Interchange_Id. Null if not deprecated.
*/
public $deprecatedUse;
/**
* Version of HTML Purifier this directive was deprecated. Null if not
* deprecated.
*/
public $deprecatedVersion;
/**
* List of external projects this directive depends on, e.g. array('CSSTidy').
*/
public $external = array();
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,31 @@
<?php
/**
* Represents a directive ID in the interchange format.
*/
class HTMLPurifier_ConfigSchema_Interchange_Id
{
public $namespace, $directive;
public function __construct($namespace, $directive) {
$this->namespace = $namespace;
$this->directive = $directive;
}
/**
* @warning This is NOT magic, to ensure that people don't abuse SPL and
* cause problems for PHP 5.0 support.
*/
public function toString() {
return $this->namespace . '.' . $this->directive;
}
public static function make($id) {
list($namespace, $directive) = explode('.', $id);
return new HTMLPurifier_ConfigSchema_Interchange_Id($namespace, $directive);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,21 @@
<?php
/**
* Interchange component class describing namespaces.
*/
class HTMLPurifier_ConfigSchema_Interchange_Namespace
{
/**
* Name of namespace defined.
*/
public $namespace;
/**
* HTML description.
*/
public $description;
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,176 @@
<?php
class HTMLPurifier_ConfigSchema_InterchangeBuilder
{
/**
* Used for processing DEFAULT, nothing else.
*/
protected $varParser;
public function __construct($varParser = null) {
$this->varParser = $varParser ? $varParser : new HTMLPurifier_VarParser_Native();
}
public static function buildFromDirectory($dir = null) {
$parser = new HTMLPurifier_StringHashParser();
$builder = new HTMLPurifier_ConfigSchema_InterchangeBuilder();
$interchange = new HTMLPurifier_ConfigSchema_Interchange();
if (!$dir) $dir = HTMLPURIFIER_PREFIX . '/HTMLPurifier/ConfigSchema/schema/';
$info = parse_ini_file($dir . 'info.ini');
$interchange->name = $info['name'];
$files = array();
$dh = opendir($dir);
while (false !== ($file = readdir($dh))) {
if (!$file || $file[0] == '.' || strrchr($file, '.') !== '.txt') {
continue;
}
$files[] = $file;
}
closedir($dh);
sort($files);
foreach ($files as $file) {
$builder->build(
$interchange,
new HTMLPurifier_StringHash( $parser->parseFile($dir . $file) )
);
}
return $interchange;
}
/**
* Builds an interchange object based on a hash.
* @param $interchange HTMLPurifier_ConfigSchema_Interchange object to build
* @param $hash HTMLPurifier_ConfigSchema_StringHash source data
*/
public function build($interchange, $hash) {
if (!$hash instanceof HTMLPurifier_StringHash) {
$hash = new HTMLPurifier_StringHash($hash);
}
if (!isset($hash['ID'])) {
throw new HTMLPurifier_ConfigSchema_Exception('Hash does not have any ID');
}
if (strpos($hash['ID'], '.') === false) {
$this->buildNamespace($interchange, $hash);
} else {
$this->buildDirective($interchange, $hash);
}
$this->_findUnused($hash);
}
public function buildNamespace($interchange, $hash) {
$namespace = new HTMLPurifier_ConfigSchema_Interchange_Namespace();
$namespace->namespace = $hash->offsetGet('ID');
if (isset($hash['DESCRIPTION'])) {
$namespace->description = $hash->offsetGet('DESCRIPTION');
}
$interchange->addNamespace($namespace);
}
public function buildDirective($interchange, $hash) {
$directive = new HTMLPurifier_ConfigSchema_Interchange_Directive();
// These are required elements:
$directive->id = $this->id($hash->offsetGet('ID'));
$id = $directive->id->toString(); // convenience
if (isset($hash['TYPE'])) {
$type = explode('/', $hash->offsetGet('TYPE'));
if (isset($type[1])) $directive->typeAllowsNull = true;
$directive->type = $type[0];
} else {
throw new HTMLPurifier_ConfigSchema_Exception("TYPE in directive hash '$id' not defined");
}
if (isset($hash['DEFAULT'])) {
try {
$directive->default = $this->varParser->parse($hash->offsetGet('DEFAULT'), $directive->type, $directive->typeAllowsNull);
} catch (HTMLPurifier_VarParserException $e) {
throw new HTMLPurifier_ConfigSchema_Exception($e->getMessage() . " in DEFAULT in directive hash '$id'");
}
}
if (isset($hash['DESCRIPTION'])) {
$directive->description = $hash->offsetGet('DESCRIPTION');
}
if (isset($hash['ALLOWED'])) {
$directive->allowed = $this->lookup($this->evalArray($hash->offsetGet('ALLOWED')));
}
if (isset($hash['VALUE-ALIASES'])) {
$directive->valueAliases = $this->evalArray($hash->offsetGet('VALUE-ALIASES'));
}
if (isset($hash['ALIASES'])) {
$raw_aliases = trim($hash->offsetGet('ALIASES'));
$aliases = preg_split('/\s*,\s*/', $raw_aliases);
foreach ($aliases as $alias) {
$directive->aliases[] = $this->id($alias);
}
}
if (isset($hash['VERSION'])) {
$directive->version = $hash->offsetGet('VERSION');
}
if (isset($hash['DEPRECATED-USE'])) {
$directive->deprecatedUse = $this->id($hash->offsetGet('DEPRECATED-USE'));
}
if (isset($hash['DEPRECATED-VERSION'])) {
$directive->deprecatedVersion = $hash->offsetGet('DEPRECATED-VERSION');
}
if (isset($hash['EXTERNAL'])) {
$directive->external = preg_split('/\s*,\s*/', trim($hash->offsetGet('EXTERNAL')));
}
$interchange->addDirective($directive);
}
/**
* Evaluates an array PHP code string without array() wrapper
*/
protected function evalArray($contents) {
return eval('return array('. $contents .');');
}
/**
* Converts an array list into a lookup array.
*/
protected function lookup($array) {
$ret = array();
foreach ($array as $val) $ret[$val] = true;
return $ret;
}
/**
* Convenience function that creates an HTMLPurifier_ConfigSchema_Interchange_Id
* object based on a string Id.
*/
protected function id($id) {
return HTMLPurifier_ConfigSchema_Interchange_Id::make($id);
}
/**
* Triggers errors for any unused keys passed in the hash; such keys
* may indicate typos, missing values, etc.
* @param $hash Instance of ConfigSchema_StringHash to check.
*/
protected function _findUnused($hash) {
$accessed = $hash->getAccessed();
foreach ($hash as $k => $v) {
if (!isset($accessed[$k])) {
trigger_error("String hash key '$k' not used by builder", E_USER_NOTICE);
}
}
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,225 @@
<?php
/**
* Performs validations on HTMLPurifier_ConfigSchema_Interchange
*
* @note If you see '// handled by InterchangeBuilder', that means a
* design decision in that class would prevent this validation from
* ever being necessary. We have them anyway, however, for
* redundancy.
*/
class HTMLPurifier_ConfigSchema_Validator
{
/**
* Easy to access global objects.
*/
protected $interchange, $aliases;
/**
* Context-stack to provide easy to read error messages.
*/
protected $context = array();
/**
* HTMLPurifier_VarParser to test default's type.
*/
protected $parser;
public function __construct() {
$this->parser = new HTMLPurifier_VarParser();
}
/**
* Validates a fully-formed interchange object. Throws an
* HTMLPurifier_ConfigSchema_Exception if there's a problem.
*/
public function validate($interchange) {
$this->interchange = $interchange;
$this->aliases = array();
// PHP is a bit lax with integer <=> string conversions in
// arrays, so we don't use the identical !== comparison
foreach ($interchange->namespaces as $i => $namespace) {
if ($i != $namespace->namespace) $this->error(false, "Integrity violation: key '$i' does not match internal id '{$namespace->namespace}'");
$this->validateNamespace($namespace);
}
foreach ($interchange->directives as $i => $directive) {
$id = $directive->id->toString();
if ($i != $id) $this->error(false, "Integrity violation: key '$i' does not match internal id '$id'");
$this->validateDirective($directive);
}
return true;
}
/**
* Validates a HTMLPurifier_ConfigSchema_Interchange_Namespace object.
*/
public function validateNamespace($n) {
$this->context[] = "namespace '{$n->namespace}'";
$this->with($n, 'namespace')
->assertNotEmpty()
->assertAlnum(); // implicit assertIsString handled by InterchangeBuilder
$this->with($n, 'description')
->assertNotEmpty()
->assertIsString(); // handled by InterchangeBuilder
array_pop($this->context);
}
/**
* Validates a HTMLPurifier_ConfigSchema_Interchange_Id object.
*/
public function validateId($id) {
$id_string = $id->toString();
$this->context[] = "id '$id_string'";
if (!$id instanceof HTMLPurifier_ConfigSchema_Interchange_Id) {
// handled by InterchangeBuilder
$this->error(false, 'is not an instance of HTMLPurifier_ConfigSchema_Interchange_Id');
}
if (!isset($this->interchange->namespaces[$id->namespace])) {
$this->error('namespace', 'does not exist'); // assumes that the namespace was validated already
}
$this->with($id, 'directive')
->assertNotEmpty()
->assertAlnum(); // implicit assertIsString handled by InterchangeBuilder
array_pop($this->context);
}
/**
* Validates a HTMLPurifier_ConfigSchema_Interchange_Directive object.
*/
public function validateDirective($d) {
$id = $d->id->toString();
$this->context[] = "directive '$id'";
$this->validateId($d->id);
$this->with($d, 'description')
->assertNotEmpty();
// BEGIN - handled by InterchangeBuilder
$this->with($d, 'type')
->assertNotEmpty();
$this->with($d, 'typeAllowsNull')
->assertIsBool();
try {
// This also tests validity of $d->type
$this->parser->parse($d->default, $d->type, $d->typeAllowsNull);
} catch (HTMLPurifier_VarParserException $e) {
$this->error('default', 'had error: ' . $e->getMessage());
}
// END - handled by InterchangeBuilder
if (!is_null($d->allowed) || !empty($d->valueAliases)) {
// allowed and valueAliases require that we be dealing with
// strings, so check for that early.
$d_int = HTMLPurifier_VarParser::$types[$d->type];
if (!isset(HTMLPurifier_VarParser::$stringTypes[$d_int])) {
$this->error('type', 'must be a string type when used with allowed or value aliases');
}
}
$this->validateDirectiveAllowed($d);
$this->validateDirectiveValueAliases($d);
$this->validateDirectiveAliases($d);
array_pop($this->context);
}
/**
* Extra validation if $allowed member variable of
* HTMLPurifier_ConfigSchema_Interchange_Directive is defined.
*/
public function validateDirectiveAllowed($d) {
if (is_null($d->allowed)) return;
$this->with($d, 'allowed')
->assertNotEmpty()
->assertIsLookup(); // handled by InterchangeBuilder
if (is_string($d->default) && !isset($d->allowed[$d->default])) {
$this->error('default', 'must be an allowed value');
}
$this->context[] = 'allowed';
foreach ($d->allowed as $val => $x) {
if (!is_string($val)) $this->error("value $val", 'must be a string');
}
array_pop($this->context);
}
/**
* Extra validation if $valueAliases member variable of
* HTMLPurifier_ConfigSchema_Interchange_Directive is defined.
*/
public function validateDirectiveValueAliases($d) {
if (is_null($d->valueAliases)) return;
$this->with($d, 'valueAliases')
->assertIsArray(); // handled by InterchangeBuilder
$this->context[] = 'valueAliases';
foreach ($d->valueAliases as $alias => $real) {
if (!is_string($alias)) $this->error("alias $alias", 'must be a string');
if (!is_string($real)) $this->error("alias target $real from alias '$alias'", 'must be a string');
if ($alias === $real) {
$this->error("alias '$alias'", "must not be an alias to itself");
}
}
if (!is_null($d->allowed)) {
foreach ($d->valueAliases as $alias => $real) {
if (isset($d->allowed[$alias])) {
$this->error("alias '$alias'", 'must not be an allowed value');
} elseif (!isset($d->allowed[$real])) {
$this->error("alias '$alias'", 'must be an alias to an allowed value');
}
}
}
array_pop($this->context);
}
/**
* Extra validation if $aliases member variable of
* HTMLPurifier_ConfigSchema_Interchange_Directive is defined.
*/
public function validateDirectiveAliases($d) {
$this->with($d, 'aliases')
->assertIsArray(); // handled by InterchangeBuilder
$this->context[] = 'aliases';
foreach ($d->aliases as $alias) {
$this->validateId($alias);
$s = $alias->toString();
if (isset($this->interchange->directives[$s])) {
$this->error("alias '$s'", 'collides with another directive');
}
if (isset($this->aliases[$s])) {
$other_directive = $this->aliases[$s];
$this->error("alias '$s'", "collides with alias for directive '$other_directive'");
}
$this->aliases[$s] = $d->id->toString();
}
array_pop($this->context);
}
// protected helper functions
/**
* Convenience function for generating HTMLPurifier_ConfigSchema_ValidatorAtom
* for validating simple member variables of objects.
*/
protected function with($obj, $member) {
return new HTMLPurifier_ConfigSchema_ValidatorAtom($this->getFormattedContext(), $obj, $member);
}
/**
* Emits an error, providing helpful context.
*/
protected function error($target, $msg) {
if ($target !== false) $prefix = ucfirst($target) . ' in ' . $this->getFormattedContext();
else $prefix = ucfirst($this->getFormattedContext());
throw new HTMLPurifier_ConfigSchema_Exception(trim($prefix . ' ' . $msg));
}
/**
* Returns a formatted context string.
*/
protected function getFormattedContext() {
return implode(' in ', array_reverse($this->context));
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,66 @@
<?php
/**
* Fluent interface for validating the contents of member variables.
* This should be immutable. See HTMLPurifier_ConfigSchema_Validator for
* use-cases. We name this an 'atom' because it's ONLY for validations that
* are independent and usually scalar.
*/
class HTMLPurifier_ConfigSchema_ValidatorAtom
{
protected $context, $obj, $member, $contents;
public function __construct($context, $obj, $member) {
$this->context = $context;
$this->obj = $obj;
$this->member = $member;
$this->contents =& $obj->$member;
}
public function assertIsString() {
if (!is_string($this->contents)) $this->error('must be a string');
return $this;
}
public function assertIsBool() {
if (!is_bool($this->contents)) $this->error('must be a boolean');
return $this;
}
public function assertIsArray() {
if (!is_array($this->contents)) $this->error('must be an array');
return $this;
}
public function assertNotNull() {
if ($this->contents === null) $this->error('must not be null');
return $this;
}
public function assertAlnum() {
$this->assertIsString();
if (!ctype_alnum($this->contents)) $this->error('must be alphanumeric');
return $this;
}
public function assertNotEmpty() {
if (empty($this->contents)) $this->error('must not be empty');
return $this;
}
public function assertIsLookup() {
$this->assertIsArray();
foreach ($this->contents as $v) {
if ($v !== true) $this->error('must be a lookup array');
}
return $this;
}
protected function error($msg) {
throw new HTMLPurifier_ConfigSchema_Exception(ucfirst($this->member) . ' in ' . $this->context . ' ' . $msg);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,12 @@
Attr.AllowedFrameTargets
TYPE: lookup
DEFAULT: array()
--DESCRIPTION--
Lookup table of all allowed link frame targets. Some commonly used link
targets include _blank, _self, _parent and _top. Values should be
lowercase, as validation will be done in a case-sensitive manner despite
W3C's recommendation. XHTML 1.0 Strict does not permit the target attribute
so this directive will have no effect in that doctype. XHTML 1.1 does not
enable the Target module by default, you will have to manually enable it
(see the module documentation for more details.)
--# vim: et sw=4 sts=4

View File

@ -0,0 +1,9 @@
Attr.AllowedRel
TYPE: lookup
VERSION: 1.6.0
DEFAULT: array()
--DESCRIPTION--
List of allowed forward document relationships in the rel attribute. Common
values may be nofollow or print. By default, this is empty, meaning that no
document relationships are allowed.
--# vim: et sw=4 sts=4

Some files were not shown because too many files have changed in this diff Show More