12 Commits

Author SHA1 Message Date
Johannes Zillmann
78db114632 Add Markdown comparison tests
- Convert the `example PDFs` with the old `pdf-to-markdown` and write them to text files
- Compare the text files with the conversion of the current code
- Next:
  - Improve the current code to match good conversions of the old code
  - Adapt the text files in case the current conversion is better than the old
- Current tests are breaking
2024-04-21 09:15:46 -06:00
Johannes Zillmann
b5f3075bdf Clean up types
- merge `ItemType`/`BlockType` to `TextType`
- fix bug with duplicate and flattened types
2024-04-02 11:18:55 -06:00
Johannes Zillmann
55ae236928 Improve header detection
- fix tests
- still run header detection based on heights even if TOC headlines have been identified
2024-03-28 11:39:34 -06:00
Johannes Zillmann
0dc47329ef Move core module to root
2024-03-26 10:52:54 -06:00
Johannes Zillmann
c4679238cd Improve list detection
* Add ‘ ‘ on compact lines when line starts with list character
* Add – as list character
* rename functions.jsx to stringFunctions.jsx
2017-03-28 09:00:21 +02:00
Johannes Zillmann
09facb09b4 WIP Introduce word/wordType/lineItem
* Way to do the markdown transformation of inline formats (bold, italic, link, footnote, etc.) at the end and not in the middle
* Introduce StashingStream as a helper
2017-03-27 07:34:58 +02:00
Johannes Zillmann
e144d6a6d5 [WIP] stabilized formatting
2017-03-24 13:31:56 +01:00
Johannes Zillmann
dbd9d8bf5f [WIP] find not found TOC-Headers by size
2017-03-15 08:42:46 +01:00
Johannes Zillmann
5caf8154db [WIP] Simplify code/quote detection
2017-03-14 10:30:21 +01:00
Johannes Zillmann
c6f592d3fc [WIP] Simplify list detection
2017-03-11 13:42:09 +01:00
Johannes Zillmann
6f69566e98 [WIP] TOC headline parsing
2017-03-07 18:43:43 +01:00
Johannes Zillmann
1b326a9f36 Headline to upper case transformation
* Add testing capability (mocha, chai)
* Add MarkdownElement to text item
2017-02-05 21:22:42 +01:00