Commit Graph

3 Commits

Author SHA1 Message Date
Johannes Zillmann
74c941f88a Add newline after code block
- Also includes some test changes from the previous commit (not captured before because of an out-of-memory error)
2024-05-26 08:29:15 -06:00
Johannes Zillmann
7abafc61e7 Improve word boundary detection
- Sometimes a single word is delivered as multiple items, e.g. "T his is a sen tence"
- Use the x-axis distance between items to avoid putting whitespace in the middle of a word
- Also tweak the line detection a bit (for Alice)
2024-05-20 00:22:24 -06:00
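
The word boundary idea in 7abafc61e7 can be illustrated with a minimal sketch. This is not the repository's code: the `Item` fields, the `join_items` helper, and the `gap_threshold` value are all hypothetical, chosen only to show how a horizontal gap check might decide whether a space belongs between two text items.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical text item: a fragment plus its horizontal placement."""
    text: str
    x: float      # left edge of the fragment
    width: float  # rendered width of the fragment


def join_items(items, gap_threshold=1.0):
    """Join the items of one line, adding a space only across wide gaps.

    gap_threshold is an assumed value; a real converter would likely
    derive it from the font size or the average character width.
    """
    parts = []
    prev_end = None
    for item in sorted(items, key=lambda i: i.x):
        # A gap larger than the threshold is treated as a word boundary.
        if prev_end is not None and item.x - prev_end > gap_threshold:
            parts.append(" ")
        parts.append(item.text)
        prev_end = item.x + item.width
    return "".join(parts)


# The example from the commit message: one word split into several items.
line = [
    Item("T", 0.0, 6.0),
    Item("his", 6.2, 14.0),
    Item("is", 24.0, 8.0),
    Item("a", 36.0, 5.0),
    Item("sen", 45.0, 15.0),
    Item("tence", 60.3, 24.0),
]

naive = " ".join(item.text for item in line)
merged = join_items(line)
```

With these made-up coordinates, joining every item with a space reproduces "T his is a sen tence", while the gap check merges the fragments that sit almost flush against each other.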
Johannes Zillmann
78db114632 Add Markdown comparison tests
- Convert the `example PDFs` with the old `pdf-to-markdown` and write them to text files
- Compare the text files with the conversion of the current code
- Next:
  - Improve the current code to match good conversions of the old code
  - Adapt the text files in case the current conversion is better than the old
- The current tests are failing
2024-04-21 09:15:46 -06:00
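
The comparison approach described in 78db114632 could look roughly like the sketch below. It is an illustration only, under the assumption that the stored text files and the current conversion are both available as strings; the `markdown_diff` helper and its labels are hypothetical and do not come from the repository.

```python
import difflib


def markdown_diff(expected, actual):
    """Return unified-diff lines between a stored conversion and a new one.

    An empty list means the current conversion matches the stored text.
    """
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected (old converter)",
            tofile="actual (current code)",
            lineterm="",
        )
    )


# Hypothetical stand-ins for a stored text file and a fresh conversion.
stored = "# Title\nText"
current = "# Title\n\nText"

same = markdown_diff(stored, stored)
changed = markdown_diff(stored, current)
```

A test built on this would fail whenever the diff is non-empty. The diff output then helps decide whether to fix the current code or to update the stored text file because the new conversion is better.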