# Known Issues
## Missing or wrong characters
The text extracted by pdfjs is sometimes badly garbled, e.g. in [Life-Of-God-In-Soul-Of-Man](examples/Life-Of-God-In-Soul-Of-Man.pdf).
Interestingly, the same file renders correctly with pdfjs (online), so this may just be a setup problem.
## Uncovered TOC variants
- out of order items [Safe-Communication](examples/Safe-Communication.pdf)
- items on the wrong lines, and numbers that are not recognized as numbers [Life-Of-God-In-Soul-Of-Man](examples/Life-Of-God-In-Soul-Of-Man.pdf)
- CC-NC_Leitfaden.pdf: unverified TOC entries (and/und/&, etc.)
- Closed-Syllables.pdf: unverified TOC entries
- Safe-Communication.pdf: one TOC entry is one page off (8 => 9)
## Not yet reviewed test PDFs
- Achieving-The-Paris-Climate-Agreement.pdf
  - wrong page mapping?
  - page numbers are not removed
  - no TOC
- Made-with-cc.pdf
  - no TOC
- Watered-Soul-Blog-Book.pdf
  - TOC: character minimum cuts out the year
  - TOC: stops too early