Johannes Zillmann 5daa8aa45a Detect Footnotes
- not yet converted in MD
- detection should be same as old version
2024-04-09 08:25:27 -06:00


Known Issues

Missing or wrong characters

The text that comes out of pdfjs sometimes looks very erroneous, e.g. for Life-Of-God-In-Soul-Of-Man. Interestingly, rendering the same PDF with pdfjs (online) looks fine, so maybe this is just a setup problem?

Uncovered TOC variants

Footnotes

  • multiline footnotes (compressed.tracemonkey-pldi-09.pdf)

Test PDFs not yet reviewed

Achieving-The-Paris-Climate-Agreement.pdf

  • wrong page mapping?
  • page numbers are not removed
  • no TOC detected
  • Roman numerals are wrong
  • subheadings under the TOC headings should be detected as well (clearly not handled in the code yet)
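Front matter in PDFs like this one is often numbered with Roman numerals (i, ii, iii, iv, ...). A minimal sketch of parsing such page labels, assuming they arrive as plain strings: it applies standard subtractive notation and rejects non-canonical forms by converting back. `parseRoman` and `toRoman` are hypothetical helpers for illustration, not functions from the project.

```typescript
// Values of the individual Roman numeral symbols (lowercase).
const ROMAN_VALUES: Record<string, number> = {
  i: 1, v: 5, x: 10, l: 50, c: 100, d: 500, m: 1000,
};

// Converts a positive integer to its canonical lowercase Roman numeral.
function toRoman(n: number): string {
  const table: Array<[number, string]> = [
    [1000, "m"], [900, "cm"], [500, "d"], [400, "cd"],
    [100, "c"], [90, "xc"], [50, "l"], [40, "xl"],
    [10, "x"], [9, "ix"], [5, "v"], [4, "iv"], [1, "i"],
  ];
  let rest = n;
  let out = "";
  for (const [value, symbol] of table) {
    while (rest >= value) {
      out += symbol;
      rest -= value;
    }
  }
  return out;
}

// Parses a Roman numeral (case-insensitive), e.g. "iv" -> 4, "XIV" -> 14.
// Returns undefined for non-Roman strings and non-canonical forms like "iiii".
function parseRoman(input: string): number | undefined {
  const s = input.trim().toLowerCase();
  if (!/^[ivxlcdm]+$/.test(s)) return undefined;
  let total = 0;
  for (let i = 0; i < s.length; i++) {
    const value = ROMAN_VALUES[s.charAt(i)];
    const next = i + 1 < s.length ? ROMAN_VALUES[s.charAt(i + 1)] : 0;
    // A smaller symbol before a larger one is subtracted (iv = 5 - 1).
    total += value < next ? -value : value;
  }
  // Round-trip check: only accept the canonical spelling of the number.
  return toRoman(total) === s ? total : undefined;
}

console.log(parseRoman("xiv"));  // 14
console.log(parseRoman("IX"));   // 9
console.log(parseRoman("iiii")); // undefined
console.log(parseRoman("12"));   // undefined
```

Note that ordinary words such as "mix" are also valid numerals (1009), so a page-number detector would likely still need positional context, such as a short line at the top or bottom of the page.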

Sherlock

  • words are not joined together

Made-with-cc.pdf

  • no TOC detected

Watered-Soul-Blog-Book.pdf

  • TOC: the character minimum cuts out the year
  • TOC: stops too early

Life of God in Soul of man

  • Headline confusion: after a headline, the first words of the next sentence are set in a large font, and that line is detected as a headline although it shouldn't be (the detection looks at all heights in the line)
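One possible way to avoid this, sketched under assumptions: treat a line as a headline only if nearly all of its characters are set above the body-text height, instead of as soon as any fragment in the line is large. Counting characters rather than fragments keeps a few large lead-in words from outweighing the rest of the sentence. The `TextItem` shape, the `isHeadlineLine` function and the 0.9 threshold are hypothetical, not taken from the project's code.

```typescript
// Hypothetical shape of a text fragment within one line (not the project's actual type).
interface TextItem {
  text: string;
  height: number; // font height of the fragment
}

// Returns true only if (almost) all characters in the line are taller than body text.
// A line where just the first few words are large is treated as regular text.
function isHeadlineLine(
  items: TextItem[],
  bodyHeight: number,
  minLargeShare = 0.9, // hypothetical threshold: share of characters that must be large
): boolean {
  let total = 0;
  let large = 0;
  for (const item of items) {
    const chars = item.text.trim().length;
    total += chars;
    if (item.height > bodyHeight) {
      large += chars;
    }
  }
  return total > 0 && large / total >= minLargeShare;
}

// A sentence whose first words are large should not become a headline.
const mixedLine: TextItem[] = [
  { text: "In the", height: 18 },
  { text: "beginning of this chapter we discuss", height: 11 },
];
const realHeadline: TextItem[] = [{ text: "Chapter One", height: 18 }];

console.log(isHeadlineLine(mixedLine, 11));    // false
console.log(isHeadlineLine(realHeadline, 11)); // true
```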