/*
htmLawed_TESTCASE.txt, 14 August 2012
htmLawed 1.1.14, 8 August 2012
Copyright Santosh Patnaik
Dual licensed with LGPL 3 and GPL 2+
A PHP Labware internal utility - http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed
*/

This file has UTF-8-encoded text with both correct and incorrect/malformed HTML/XHTML code snippets to test htmLawed (test cases/samples). The entire text may also be used as a unit.

************************************************
when viewing this file in a web browser, set the
character encoding to Unicode/UTF-8
************************************************

--------------------- start --------------------

Try different $config and $spec values. Some text, even when filtered in, will not be displayed in a rendered web-page.
Attributes
Xml:lang:, ,
Standard, predefined value, or empty attribute: , ,
Required: , image
Quote & space variation: a, a, a
Invalid: a
Duplicated: a
Deprecated: a,

Casing:
Custom: image
Admin-restricted?:
Attribute values
Duplicate ID value:, ,
(try 'my_' for prefix)
Double-quotes in value:, ,
(try filter for CSS expression)
CSS expression:

Other: ,
(try 'maxlen', 'maxval', etc., for 'input' in '$spec')
Blockquotes
abc

abc
def

abc
def

abc
def
ghi

abc
def
ghi
QQQ
x

x
QQQ

x
QQQ
x

x
QQQ

x



(try with blockquote parent)
CDATA sections
Special characters inside: ]]>, 3.5, & 4 > 4 ]]>
Normal: , CDATA follows:
Malformed: , < ![CDATA check ]]>, , < ![CDATA check ] ]>
Invalid: >CDATA in tag content,
text not allowed
Complex-1: deprecated elements
The PHP script used for this webpage is htmLawedTest.php, from PHP Labware.
Complex-2: deprecated attributes
aa

image

Section

Para

  1. First item
  1. First item

Complex-3: embed, object, area


navigate the site: 1 | 3 | 4

value
Complex-4: nested and other tables
Cell
Cell
Cell
Cell Cell Cell
Cell
Cell Cell Cell

PCDATA wrong: Well
Hello

Missing tr:
Well

Complex-5: pseudo, disallowed or non-HTML tags
(Try different 'keep_bad' values) <*> Pseudotags <*> Non-HTML tag xml

Disallowed tag p

Elements
Unbalanced: check
Non-XHTML:

Malformed: < a href="">, , , , < /a>, < a href="">, a, a,
Invalid: a
Empty: a, a, atext
Content invalid: 12
Content invalid?:

(try setting 'form' as parent)
Casing:
Check for tidy:



hi
Entities
Special: & 3 < 2 & 5>4 and j >i >a & ia
Padding: B B f f  
Malformed: & #x27;, &x27;, ' &TILDE;, &tilde
Invalid: , �, , �, ￿, &bad;
Discouraged characters: , „, ﷠, 􏿾
Context: '>', <?
Casing: ', ', &TILDE;, ˜
(also check named-to-numeric and hexdec-to-decimal, and vice versa, conversions)
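
The conversions named in the note above can be sketched independently of htmLawed, to produce expected values to compare against when checking the filtered output. This is a minimal Python sketch, not htmLawed's own code: the function names and the choice to leave &amp;, &lt;, &gt; and &quot; untouched are assumptions made for illustration.

```python
import re
from html.entities import name2codepoint

# Assumption for this sketch: markup-significant entities are left as they are.
KEEP_NAMED = {"amp", "lt", "gt", "quot"}

def named_to_numeric(text):
    """Rewrite known named entities as decimal numeric entities."""
    def repl(match):
        name = match.group(1)
        # Unknown names (&bad;), wrong casing (&TILDE;) and the special
        # entities are returned unchanged.
        if name in KEEP_NAMED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

def hex_to_decimal(text):
    """Rewrite hexadecimal numeric entities (&#x27;) as decimal (&#39;)."""
    return re.sub(r"&#[xX]([0-9A-Fa-f]+);",
                  lambda m: "&#%d;" % int(m.group(1), 16), text)

def decimal_to_hex(text):
    """Rewrite decimal numeric entities (&#39;) as hexadecimal (&#x27;)."""
    return re.sub(r"&#([0-9]+);",
                  lambda m: "&#x%x;" % int(m.group(1)), text)
```

Only well-formed entities (ampersand, name or number, semicolon) are matched, so the malformed samples listed above pass through these helpers unchanged.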
Format
Valid but ill-formatted: text text text text
p r e
text text

text none text text none t e x t
text none t e x t text none t e x t
p r e  
				pre
		
Cell
Cell
Cell
CellCellCell
Cell
CellCellCell
(try to compact or beautify)
Forms
(note nesting of 'form', missing required attributes, etc.)
pl
h


B:C:

(try each of these lines separately)
what
what (try with container as div and as form)
c a b
HTML comments (also CDATA)
Script inside:
Special characters inside: , , , c
Normal: , , comment:,
text not allowed

Malformed: , < ![CDATA check ]]>, < ![CDATA check ] ]>
Invalid:
>comment in tag content,
Ins-Del
(depending on context, these elements can be of either block or inline type)

block


d


d

d

d
Lists
Invalid character data:
  • (item
  • )

Definition list:
a
bad
first one
b
second

Definition list, close-tags omitted:
a
bad
first one
b
second

Definition lists, nested:
T1
D1
T2
D2
t1
d1
t2
d2
T3
D3
T4
D4
t1
d1

Definition lists, nested, close-tags omitted:
T1
D1
T2
D2
t1
d1
t2
d2
T3
D3
T4
D4
t1
d1

Nested:
  • l1
  • l2
    1. lo1
    2. lo2
  • l3
  • l4
    1. lo3
    2. lo4
      1. lo5

Nested, directly:
  • l1
    1. l2
  • l3

Nested, close-tags omitted:
  • l1
  • l2
    1. lo1
    2. lo2
  • l3
  • l4
    1. lo3
    2. lo4
      1. lo5

Complex:
Microdata
I am X but people call me Y. Find me at
Microsoft Word
Proprietary tag:

 


XML declaration:
XML-invalid character code-point (may not replicate):

“Where is he?” asked both Mary – the one so lovely – and Jane.

Non-English text-1
Inscrieţi-vă acum la a Zecea Conferinţă Internaţională
გთხოვთ ახლავე გაიაროთ რეგისტრაცია
večjezično računalništvo
อ.อ่าง
Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
(this file should have UTF-8 encoding; some characters may not be displayed because of missing fonts, etc.)
Non-English text-2: entities
用统一码
გთხოვთ
Inscreva-se agora para a Décima Conferência Internacional Sobre O Unicode, realizada entre os dias 10 e 12 de março de 1997 em Mainz na Alemanha.
Ruby
(need compatible browser)
さい とう のぶ W3C Associate Chairman
WWW (World Wide Web)
A (aaa)
Tables
Omitted closing tags:
h1c1h1c2
r1c1r1c2
r2c1r2c2

Nested, omitted closing tags:
h1c1h1c2
r1c1r1c2
h1c1h1c2
r1c1r1c2
r2c1r2c2
r2c1r2c2

Tag transformation
Font element intended as 'inline' element:

hi


Font element intended as 'block' element:
hi

Font element intended as 'block' element:
hi
QQQ

URLs
Relative and absolute: , , , , , ,
(try base URL value of 'http://a.com/b/')
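
To see what relative-to-absolute conversion against that base URL would be expected to produce, the standard RFC 3986 resolution can be sketched with Python's urllib.parse. This is only a reference for comparison, not htmLawed's own logic. The sample URLs are hypothetical, since the original links are not visible in this text, and the helper name to_absolute is an assumption.

```python
from urllib.parse import urljoin

BASE = "http://a.com/b/"  # base URL suggested in the note above

def to_absolute(url, base=BASE):
    """Resolve a possibly relative URL against the base URL."""
    return urljoin(base, url)

# Hypothetical sample URLs: document-relative, parent-relative,
# root-relative and already absolute.
SAMPLES = ["c.html", "../d", "/e", "http://x.org/f"]

for sample in SAMPLES:
    print(sample, "->", to_absolute(sample))
```

An already absolute URL comes back unchanged, so only the relative forms are affected by the base value.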
CSS URLs:
,
,
,
,

Double URLs: b
Anti-spam: (try regex for 'http://a.com', etc.) , , , , , , ,
XSS
'';!--"=&{()}










test
Bad IE7: x
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: xxx
Bad IE7: x
Bad IE7: x
Bad IE7: x
Bad IE7: x
Bad IE7: exp/*x
Bad IE7: hi
Bad IE7: hi
Bad IE7: test
Bad IE7: hi
Bad IE7: hi
Other
3 < 4
3 > 4
> 3
<._.> hi!
<<< ALERT >>>
some stuff



if(13age){say 'teen'}
age >51 and a smoking history of >51 pack-years was
age > 51 and a smoking history of >51 pack-years was
age <51 and a smoking history of <51 pack-years was
age < 51 and a smoking history of < 51 pack-years was
age >51 and a smoking history of >51 pack-years
age > 51 and a smoking history of >51 pack-years
age <51 and a smoking history of <51 pack-years
age < 51 and a smoking history of < 51 pack-years