pdf-to-markdown/examples
Achieving-The-Paris-Climate-Agreement Move types to front 2021-07-18 14:25:55 -06:00
Adventures-Of-Sherlock-Holmes Move types to front 2021-07-18 14:25:55 -06:00
Alice-In-Wonderland Move types to front 2021-07-18 14:25:55 -06:00
CC_License_Agreement_of_siMPle Don't print globals twice 2021-07-18 14:13:38 -06:00
CC-NC_Leitfaden Move types to front 2021-07-18 14:25:55 -06:00
Closed-Syllables Move types to front 2021-07-18 14:25:55 -06:00
compressed.tracemonkey-pldi-09 Don't print globals twice 2021-07-18 14:13:38 -06:00
dict Move types to front 2021-07-18 14:25:55 -06:00
ExamplePdf Don't print globals twice 2021-07-18 14:13:38 -06:00
Flash-Masques-Temperature Don't print globals twice 2021-07-18 14:13:38 -06:00
Grammar-Matters Don't print globals twice 2021-07-18 14:13:38 -06:00
Life-Of-God-In-Soul-Of-Man Move types to front 2021-07-18 14:25:55 -06:00
Made-with-cc Don't print globals twice 2021-07-18 14:13:38 -06:00
Safe-Communication Move types to front 2021-07-18 14:25:55 -06:00
St-Mary-Witney-Social-Audit Move types to front 2021-07-18 14:25:55 -06:00
The-Art-of-Public-Speaking Don't print globals twice 2021-07-18 14:13:38 -06:00
The-Impact-of-Open-Access-Latin-American-Scholarship Don't print globals twice 2021-07-18 14:13:38 -06:00
The-Man-Without-A-Body Don't print globals twice 2021-07-18 14:13:38 -06:00
The-War-of-the-Worlds Don't print globals twice 2021-07-18 14:13:38 -06:00
Tragedy-Of-The-Commons Don't print globals twice 2021-07-18 14:13:38 -06:00
Watered-Soul-Blog-Book Move types to front 2021-07-18 14:25:55 -06:00
WoodUp Move types to front 2021-07-18 14:25:55 -06:00
Achieving-The-Paris-Climate-Agreement.pdf Fix typos 2021-04-18 11:38:34 +02:00
Adventures-Of-Sherlock-Holmes.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
Alice-In-Wonderland.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
CC_License_Agreement_of_siMPle.pdf Add 6 more test PDFs 2021-04-18 11:34:11 +02:00
CC-NC_Leitfaden.pdf Add 6 more test PDFs 2021-04-18 11:34:11 +02:00
Closed-Syllables.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
ExamplePdf.pages Add example files 2017-03-29 08:17:14 +02:00
ExamplePdf.pdf Add example files 2017-03-29 08:17:14 +02:00
Flash-Masques-Temperature.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
Grammar-Matters.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
KNOWN_ISSUES.md Add 6 more test PDFs 2021-04-18 11:34:11 +02:00
Life-Of-God-In-Soul-Of-Man.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
Made-with-cc.pdf Add 6 more test PDFs 2021-04-18 11:34:11 +02:00
README.md Fix typos 2021-04-18 11:38:34 +02:00
Safe-Communication.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
St-Mary-Witney-Social-Audit.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
The-Art-of-Public-Speaking.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
The-Impact-of-Open-Access-Latin-American-Scholarship.pdf Add 6 more test PDFs 2021-04-18 11:34:11 +02:00
The-Man-Without-A-Body.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
The-War-of-the-Worlds.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
Tragedy-Of-The-Commons.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
Watered-Soul-Blog-Book.pdf Add 6 more test PDFs 2021-04-18 11:34:11 +02:00
WoodUp.pdf #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00

Test PDFs

This folder contains PDFs used for testing, along with their parse results. There are three types of PDF test setups:

  1. Self generated PDFs
  2. PDFs which have entered the public domain or have an otherwise permissive license such as Creative Commons SA
  3. PDFs where the license is unclear

For (1) and (2) we track the end result and all transformation steps. For (3) we track only the results of some transformation stages (those that don't leak too much of the content).
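The tracking rule above can be sketched as a small function. This is only an illustration, not the project's actual code: the category names, the `StageInfo` shape, the `leaksContent` flag, and the stage names in the usage note are all assumptions made for the example.

```typescript
// Hypothetical categories mirroring the three test setups listed above.
type PdfCategory = "self-generated" | "public" | "unclear-license";

// Hypothetical description of one transformation stage.
interface StageInfo {
  name: string;
  // Assumption: each stage is flagged by whether its output exposes
  // much of the PDF's text.
  leaksContent: boolean;
}

// Returns the names of the outputs that would be tracked for a PDF.
function trackedOutputs(category: PdfCategory, stages: StageInfo[]): string[] {
  if (category === "unclear-license") {
    // (3): only stages that reveal little of the content, and no end result.
    return stages.filter((s) => !s.leaksContent).map((s) => s.name);
  }
  // (1) and (2): every transformation step plus the end result.
  return [...stages.map((s) => s.name), "end-result"];
}
```

With two made-up stages, `stage-a` (not leaking) and `stage-b` (leaking), a PDF with an unclear license would track only `stage-a`, while a public PDF would track `stage-a`, `stage-b`, and `end-result`.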

Self-generated PDFs

Included Public PDFs

(PDFs which have entered the public domain or have an otherwise permissive license such as Creative Commons SA)

| File | Source | Author / Editor | License Information |
| --- | --- | --- | --- |
| Achieving-The-Paris-Climate-Agreement | https://link.springer.com/ | Sven Teske | Open Access, CC 4.0 |
| Adventures-Of-Sherlock-Holmes | https://pdfreebooks.org/ | Arthur Doyle | Public Domain |
| Alice-In-Wonderland | https://pdfreebooks.org/ | Lewis Carroll | Public Domain |
| CC_License_Agreement_of_siMPle | https://simple-plastics.eu/ | Aalborg University, Denmark and Alfred Wegener Institute | Creative Commons BY 4.0 |
| CC-NC_Leitfaden | https://irights.info | Paul Klimpel | Creative Commons NC 4.0 |
| Closed-Syllables | ? | Susan Jones | Creative Commons BY 4.0 |
| Flash-Masques-Temperature | https://www.techtera.org/ | ? | Creative Commons BY 4.0 |
| Grammar-Matters | ? | Debbie Kuhlmann | Creative Commons BY 4.0 |
| Life-Of-God-In-Soul-Of-Man | https://archive.org/ | Henry Scougal | Public Domain |
| Made-with-cc | https://creativecommons.org/ | Paul Stacey & Sarah Hinchliff Pearson | Public Domain |
| Safe-Communication | https://www.england.nhs.uk/ | Nicola Davey & Ali Cole | Creative Commons BY-SA 4.0 |
| St-Mary-Witney-Social-Audit | https://catrionarobertson.com/ | Catriona Robertson | Creative Commons BY 4.0 |
| The-Art-of-Public-Speaking | http://www.gutenberg.org/ebooks/16317 | Dale Carnagey, J. Berg Esenwein | Project Gutenberg License |
| The-Impact-of-Open-Access-Latin-American-Scholarship | https://about.jstor.org/ | John Kiplinger, Valerie Yaw | Creative Commons NC 4.0 |
| The-Man-Without-A-Body | ? | Edward Page Mitchell | Public Domain |
| The-War-of-the-Worlds | http://www.planetpdf.com/ | H.G Wells | Public Domain |
| Tragedy-Of-The-Commons | https://science.sciencemag.org | Garrett Hardin | Public Domain |
| Watered-Soul-Blog-Book | https://wateredsoul.com/ | Wanda | Creative Commons BY 4.0 |
| WoodUp | https://bupress.unibz.it/ | Freie Universität Bozen-Bolzano / Giustino Tonon | Creative Commons BY 4.0 |

PDFs not stored but partially tested

Known transformation problems

This section tracks known problems with parsing and transforming certain PDFs.

  • Remove Repetitive Elements
      • often numbers are cryptic text
      • high variance in Y
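To illustrate why the two points above make repetitive elements hard to detect, here is a minimal sketch of a naive check. It is not the project's implementation: the `EdgeLine` shape, the `isRepetitive` function, and the `yTolerance` and `minShare` thresholds are all assumptions chosen for the example.

```typescript
// Hypothetical shape: one candidate header/footer line per page.
interface EdgeLine {
  page: number;
  y: number; // vertical position of the line on its page
  text: string;
}

// Replace digit runs so that "Page 3" and "Page 4" compare as equal.
function normalize(text: string): string {
  return text.replace(/\d+/g, "#").trim();
}

// A set of lines counts as repetitive when the normalized text of the first
// line recurs on at least `minShare` of the pages and the Y positions of the
// matching lines stay within `yTolerance` of each other.
function isRepetitive(
  lines: EdgeLine[],
  pageCount: number,
  yTolerance = 2,
  minShare = 0.6
): boolean {
  const first = lines[0];
  if (first === undefined || pageCount <= 0) return false;
  const key = normalize(first.text);
  const matching = lines.filter((l) => normalize(l.text) === key);
  const ys = matching.map((l) => l.y);
  const spread = Math.max(...ys) - Math.min(...ys);
  return matching.length / pageCount >= minShare && spread <= yTolerance;
}
```

In this sketch, page numbers that are extracted as arbitrary glyphs rather than digits no longer normalize to the same text, and footers whose Y position drifts from page to page exceed the tolerance. Either case makes the check miss a real header or footer.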

See also KNOWN_ISSUES