mirror of
https://github.com/jzillmann/pdf-to-markdown.git
synced 2024-11-22 15:53:34 +01:00
Update known issues
parent
5ab8730a4b
commit
3c31c12768
@ -12,15 +12,33 @@ The interesting thing is that rendering with pdfjs (online) looks good. So maybe
- CC-NC_Leitfaden.pdf: un-verified toc entries (and/und/&... etc...)
- Closed-Syllables.pdf: unverified toc entries
- Safe-Communication.pdf: One toc element is one page off (8=>9)
- no page numbers [The-Art-of-Public-Speaking](examples/The-Art-of-Public-Speaking.pdf).
- multiline headlines: [WoodUp](examples/WoodUp.pdf)
- Detecting list of figures (and creating headlines) [Achieving-The-Paris-Climate-Agreement](Achieving-The-Paris-Climate-Agreement.pdf)
## Not yet reviewed test PDFs
- Achieving-The-Paris-Climate-Agreement.pdf
- wrong page mapping?
- no page numbers removed
- no toc
- Made-with-cc.pdf
- no toc
- Watered-Soul-Blog-Book.pdf
- TOC: character minimum cuts out year
- TOC: stops too early
# Achieving-The-Paris-Climate-Agreement.pdf
- wrong page mapping?
- no page numbers removed
- no toc
- Roman numerals are wrong
- subheading under the toc headings should be detected as well (clearly not in the code)
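For the Roman numeral item above, here is a minimal sketch of how front-matter page labels such as `xiv` could be converted to integers. It is written in Python for brevity and is not taken from the project's code: `roman_to_int` and `int_to_roman` are hypothetical helper names, and the round-trip check (rejecting non-canonical forms such as `iiii`) is an assumption about how strict the parser should be.

```python
# Hypothetical helpers; not part of pdf-to-markdown.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
ROMAN_PAIRS = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def int_to_roman(number):
    """Build the canonical lowercase Roman numeral for a positive integer."""
    parts = []
    for value, symbol in ROMAN_PAIRS:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


def roman_to_int(text):
    """Return the integer for a Roman numeral, or None if text is not one."""
    numeral = text.strip().lower()
    if not numeral or any(ch not in ROMAN_VALUES for ch in numeral):
        return None
    total = 0
    for index, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller value in front of a larger one is subtracted (iv, ix, xl).
        if index + 1 < len(numeral) and value < ROMAN_VALUES[numeral[index + 1]]:
            total -= value
        else:
            total += value
    # Round-trip check: only accept the canonical spelling of the number.
    return total if int_to_roman(total) == numeral else None
```

Note that real words made only of numeral letters (for example `mix`) still parse as numbers, so a TOC parser would probably also want to look at where the token sits in the line.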
# Sherlock
- words not together
# Made-with-cc.pdf
- no toc
# Watered-Soul-Blog-Book.pdf
- TOC: character minimum cuts out year
- TOC: stops too early
# Life of God in Soul of man
- Headline confusion: after a headline, the first words of the following sentence are set large and get detected as a headline, which they shouldn't be (check all heights in the line)
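One possible reading of the note above is that a line should only count as a headline when every text item in it is tall, not just the first few words. A minimal sketch of that check, in Python for brevity: `is_headline_line`, `body_height`, and `tolerance` are hypothetical names, and the project's actual headline detection may work differently.

```python
# Hypothetical check; not part of pdf-to-markdown.
def is_headline_line(item_heights, body_height, tolerance=0.5):
    """Treat a line as a headline only if ALL of its text items are
    noticeably taller than the body text.

    item_heights: font heights of the text items on one line
    body_height:  the most common (body) font height of the document
    tolerance:    slack for rounding differences between heights
    """
    if not item_heights:
        return False
    return all(height > body_height + tolerance for height in item_heights)
```

With a body height of 10, a line with heights `[18, 18, 18]` qualifies, while a line that starts big and continues in body size, such as `[18, 18, 10, 10]`, does not.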
@ -1,14 +0,0 @@
# Known Issues
## Missing or wrong characters
The text that comes out of pdfjs sometimes looks very erroneous, e.g. [Life-Of-God-In-Soul-Of-Man](examples/Life-Of-God-In-Soul-Of-Man.pdf).
The interesting thing is that rendering with pdfjs (online) looks good (but copying the text shows the same distortion). So maybe this is just a setup problem?
## Uncovered TOC variants
- out of order items [Safe-Communication](examples/Safe-Communication.pdf)
- items in wrong lines + numbers are not numbers [Life-Of-God-In-Soul-Of-Man](examples/Life-Of-God-In-Soul-Of-Man.pdf)
- no page numbers [The-Art-of-Public-Speaking](examples/The-Art-of-Public-Speaking.pdf).
- multiline headlines: [WoodUp](examples/WoodUp.pdf)
- Detecting list of figures (and creating headlines) [Achieving-The-Paris-Climate-Agreement](Achieving-The-Paris-Climate-Agreement.pdf)