pdf-to-markdown/KNOWN_ISSUES.md
2021-04-27 08:29:00 +02:00

Known Issues

Missing or wrong characters

The text that comes out of pdfjs sometimes looks very erroneous, e.g. in Life-Of-God-In-Soul-Of-Man. Interestingly, the same PDF looks good when rendered with pdfjs (online), so this may just be a setup problem.

Uncovered TOC variants

  • Out-of-order items: Safe-Communication
  • Items on the wrong lines, and numbers that are not actual numbers: Life-Of-God-In-Soul-Of-Man
  • CC-NC_Leitfaden.pdf: unverified TOC entries (and/und/&... etc.)
  • Closed-Syllables.pdf: unverified TOC entries
  • Safe-Communication.pdf: one TOC element is one page off (8 => 9)
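
One possible way to cover the unverified and off-by-one entries is to normalize titles before comparing them and to look at neighbouring pages as well. The following is only a rough sketch under assumptions, not the project's actual TOC detection: `normalizeTitle` and `findTocTarget` are hypothetical names, `pages` is assumed to be an array of plain page texts (index 0 is page 1), and the `and`/`und`/`&` mapping is just one guess at what makes entries fail verification.

```javascript
// Hypothetical helpers, not part of pdf-to-markdown.

// Normalize a heading so that "and", "und" and "&" compare equal
// and punctuation or whitespace differences are ignored.
function normalizeTitle(text) {
  return text
    .toLowerCase()
    .replace(/&/g, " and ")
    .replace(/\bund\b/g, "and")
    .replace(/[^\p{L}\p{N}]+/gu, " ")
    .trim();
}

// Look for the TOC entry's title on the stated page first, then on
// neighbouring pages up to `tolerance` pages away.
// Returns the page number where the title was found, or null.
function findTocTarget(entry, pages, tolerance = 1) {
  const wanted = normalizeTitle(entry.title);
  if (wanted === "") {
    return null; // nothing meaningful to search for
  }
  for (let distance = 0; distance <= tolerance; distance++) {
    const offsets = distance === 0 ? [0] : [distance, -distance];
    for (const offset of offsets) {
      const pageNumber = entry.page + offset;
      const pageText = pages[pageNumber - 1];
      if (pageText !== undefined && normalizeTitle(pageText).includes(wanted)) {
        return pageNumber;
      }
    }
  }
  return null;
}

// Usage: a TOC entry pointing at page 8 whose heading is really on page 9.
const pages = new Array(9).fill("");
pages[8] = "Trust & Safety\nSome body text";
console.log(findTocTarget({ title: "Trust und Safety", page: 8 }, pages)); // 9
```

A tolerance of one page matches the 8 => 9 case above; a larger tolerance would find more entries but also increases the risk of matching an unrelated occurrence of the title.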

Not yet reviewed test PDFs

  • Achieving-The-Paris-Climate-Agreement.pdf
    • wrong page mapping?
    • no page numbers removed
    • no TOC
  • Made-with-cc.pdf
    • no TOC
  • Watered-Soul-Blog-Book.pdf
    • TOC: character minimum cuts out the year
    • TOC: stops too early