Test PDFs
This folder contains PDFs for testing purposes and the parse results of those PDFs. Generally, there are 3 types of PDF test setups:
- Self-generated PDFs
- PDFs which entered the public domain or have an otherwise permissive license like Creative Commons SA
- PDFs where the license is unclear
For (1) and (2) we track the end result and all transformation steps. For (3) we only track the results of some transformation stages (those which don't leak too much of the content).
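The tracking rule above can be sketched as a small selection function. This is only an illustration under stated assumptions: the category names, the `StageResult` shape, the `leaksContent` flag, and the stage names are all hypothetical and are not taken from the project's code.

```typescript
// Hypothetical sketch: none of these names come from the project itself.
type PdfCategory = "self-generated" | "permissive-license" | "unclear-license";

interface StageResult {
  stage: string;         // illustrative name of a transformation stage
  leaksContent: boolean; // true if the stage output reproduces much of the PDF's text
}

// Decide which stage results may be tracked for a PDF of the given category.
function trackedStages(category: PdfCategory, stages: StageResult[]): string[] {
  if (category === "unclear-license") {
    // Keep only the stages whose output does not leak too much of the content.
    return stages.filter((s) => !s.leaksContent).map((s) => s.stage);
  }
  // Self-generated and permissively licensed PDFs: track every stage,
  // including the end result.
  return stages.map((s) => s.stage);
}

// Example input with made-up stage names.
const exampleStages: StageResult[] = [
  { stage: "page-statistics", leaksContent: false },
  { stage: "line-grouping", leaksContent: true },
  { stage: "final-markdown", leaksContent: true },
];

console.log(trackedStages("unclear-license", exampleStages));
console.log(trackedStages("self-generated", exampleStages));
```

Marking each stage with an explicit flag keeps the decision about what counts as "leaking too much" next to the stage definition rather than scattering it across the test setup.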
Self-generated PDFs
Included Public PDFs
(PDFs which entered the public domain or have an otherwise permissive license like Creative Commons SA)
File | Source | Author / Editor | License Information |
---|---|---|---|
Achieving-The-Paris-Climate-Agreement | https://link.springer.com/ | Sven Teske | Open Access, CC 4.0 |
Adventures-Of-Sherlock-Holmes | https://pdfreebooks.org/ | Arthur Conan Doyle | Public Domain |
Alice-In-Wonderland | https://pdfreebooks.org/ | Lewis Carroll | Public Domain |
CC_License_Agreement_of_siMPle | https://simple-plastics.eu/ | Aalborg University, Denmark and Alfred Wegener Institute | Creative Commons BY 4.0 |
CC-NC_Leitfaden | https://irights.info | Paul Klimpel | Creative Commons NC 4.0 |
Closed-Syllables | ? | Susan Jones | Creative Commons BY 4.0 |
Flash-Masques-Temperature | https://www.techtera.org/ | ? | Creative Commons BY 4.0 |
Grammar-Matters | ? | Debbie Kuhlmann | Creative Commons BY 4.0 |
Life-Of-God-In-Soul-Of-Man | https://archive.org/ | Henry Scougal | Public Domain |
Made-with-cc | https://creativecommons.org/ | Paul Stacey & Sarah Hinchliff Pearson | Public Domain |
Safe-Communication | https://www.england.nhs.uk/ | Nicola Davey & Ali Cole | Creative Commons BY-SA 4.0 |
St-Mary-Witney-Social-Audit | https://catrionarobertson.com/ | Catriona Robertson | Creative Commons BY 4.0 |
The-Art-of-Public-Speaking | http://www.gutenberg.org/ebooks/16317 | Dale Carnagey, J. Berg Esenwein | Project Gutenberg License |
The-Impact-of-Open-Access-Latin-American-Scholarship | https://about.jstor.org/ | John Kiplinger, Valerie Yaw | Creative Commons NC 4.0 |
The-Man-Without-A-Body | ? | Edward Page Mitchell | Public Domain |
The-War-of-the-Worlds | http://www.planetpdf.com/ | H. G. Wells | Public Domain |
Tragedy-Of-The-Commons | https://science.sciencemag.org | Garrett Hardin | Public Domain |
Watered-Soul-Blog-Book | https://wateredsoul.com/ | Wanda | Creative Commons BY 4.0 |
WoodUp | https://bupress.unibz.it/ | Freie Universität Bozen-Bolzano / Giustino Tonon | Creative Commons BY 4.0 |
PDFs not stored but partially tested
- https://homepages.cwi.nl/~lex/files/dict.pdf
  - Page numbers with current chapter
- https://github.com/mozilla/pdf.js/raw/master/web/compressed.tracemonkey-pldi-09.pdf
  - No page numbers
Known transformation problems
Tracks known problems with parsing and transforming certain PDFs.
See also KNOWN_ISSUES