Commit Graph

104 Commits

Author SHA1 Message Date
Johannes Zillmann
78f44a0ad9 Base markdown converter on blocks 2024-04-02 19:57:53 -06:00
Johannes Zillmann
182dd34c46 Detect lists & blocks 2024-04-02 16:23:19 -06:00
Johannes Zillmann
b5f3075bdf Clean up types
- merge `ItemType`/`BlockType` to `TextType`
- fix bug with duplicate and flattened types
2024-04-02 11:18:55 -06:00
Johannes Zillmann
5ab8730a4b Improve markdown rendering 2024-03-28 11:56:16 -06:00
Johannes Zillmann
55ae236928 Improve header detection
- fix tests
- still run header detection based on heights even if TOC headlines have been identified
2024-03-28 11:39:34 -06:00
Johannes Zillmann
0dc47329ef Move core module to root 2024-03-26 10:52:54 -06:00
Johannes Zillmann
2869b5e5de Upgrade pdfjs from 1.7.246 to 2.0.489
* Required webpack 2+ => updated to 3
* Add webpack-dev-server to be able to test worker.js stuff
2018-04-25 23:52:44 +02:00
Johannes Zillmann
b1bcb5388d Set title empty if it's null 2018-04-25 23:47:03 +02:00
Johannes Zillmann
ecde2ea0f5 Prevent headline detection code from detecting headline > 6 2017-05-02 19:26:11 +02:00
Johannes Zillmann
908f0b4be1 Add favicon 2017-05-02 19:19:57 +02:00
Johannes Zillmann
d7ad8088a8 Menu cleanup 2017-05-02 19:19:45 +02:00
Johannes Zillmann
46a965785a Cleanups & readme
* Move line-item transformations to own package
* Have WordFormat names instead of whole enum in globals
* Rename PdfUploadView to UploadView
* Correct license
2017-03-30 07:40:41 +02:00
Johannes Zillmann
5ef6c362b0 Affix for debug panel 2017-03-29 08:15:39 +02:00
Johannes Zillmann
96d4f72889 Fix loading when PDF has non-resolvable fonts
* Sometimes pdf.js gave fontId like Helvetica & Times instead of g_d0_f1, etc…, but those never got resolved through the callback. Now we simply ignore those.
* Also fixed that no fonts could be parsed
2017-03-29 08:11:14 +02:00
Johannes Zillmann
c4c23ac6ee Add Footer & some messaging 2017-03-29 08:08:55 +02:00
Johannes Zillmann
a0c5bb29d6 rename element type to block type 2017-03-28 09:11:00 +02:00
Johannes Zillmann
c4679238cd Improve list detection
* Add ‘ ‘ on compact lines when line starts with list character
* Add – as list character
* rename functions.jsx to stringFunctions.jsx
2017-03-28 09:00:21 +02:00
Johannes Zillmann
106e2bfa8e separate type and format for a word 2017-03-28 08:15:27 +02:00
Johannes Zillmann
9dbc57b4fe [DONE] format words properly 2017-03-28 06:11:42 +02:00
Johannes Zillmann
09facb09b4 WIP Introduce word/wordType/lineItem
* Way to do the markdown transformation of inline formats (bold, italic, link, footnote, etc.) at the end and not in the middle
* Introduce StashingStream as a helper
2017-03-27 07:34:58 +02:00
Johannes Zillmann
fde670e83f [DONE] fix formatting - all functionality restored*
* the formats in code/quote blocks are still disturbing…
2017-03-24 21:06:35 +01:00
Johannes Zillmann
e144d6a6d5 [WIP] stabilized formatting 2017-03-24 13:31:56 +01:00
Johannes Zillmann
10cc7cf0ab [WIP] first draft complete formats transformation 2017-03-24 12:30:35 +01:00
Johannes Zillmann
81518a857b [WIP] don’t make paragraph bolds to headline 2017-03-24 08:06:54 +01:00
Johannes Zillmann
e19294f35f [WIP] remove old stuff 2017-03-24 08:05:59 +01:00
Johannes Zillmann
bd7d9bc0e9 [WIP] Switch order of Debug & Result view 2017-03-24 07:08:54 +01:00
Johannes Zillmann
d927b45087 [WIP] use fontMap to map fonts to formats 2017-03-22 20:08:34 +01:00
Johannes Zillmann
b5bb56b647 [WIP] parse metadata & display title 2017-03-22 07:19:21 +01:00
Johannes Zillmann
94c2561717 [WIP] store font-map in appState 2017-03-21 23:12:45 +01:00
Johannes Zillmann
a35ecd28b6 [WIP] add headers for all Uppercase lines 2017-03-20 07:10:43 +01:00
Johannes Zillmann
07e7fbb505 [WIP] Add remove whitespace and detect links again 2017-03-18 08:56:08 +01:00
Johannes Zillmann
4600dc6ee7 [WIP] headline detection for non TOC pdfs 2017-03-16 07:40:57 +01:00
Johannes Zillmann
77576ebd7e [WIP] Headlines for title pages 2017-03-16 07:08:46 +01:00
Johannes Zillmann
1eda51c0b4 [WIP] detect more headlines with already detected heights 2017-03-16 06:52:45 +01:00
Johannes Zillmann
a9b851ceb6 [WIP] robustify TOC headline finding 2017-03-16 06:01:07 +01:00
Johannes Zillmann
dbd9d8bf5f [WIP] find not found TOC-Headers by size 2017-03-15 08:42:46 +01:00
Johannes Zillmann
93f15a38b5 [WIP] move different typed transformations to different folders 2017-03-15 06:09:18 +01:00
Johannes Zillmann
739d20d83b [WIP] Simplify major headline detections 2017-03-15 05:27:47 +01:00
Johannes Zillmann
5caf8154db [WIP] Simplify code/quote detection 2017-03-14 10:30:21 +01:00
Johannes Zillmann
c6f592d3fc [WIP] Simplify list detection 2017-03-11 13:42:09 +01:00
Johannes Zillmann
f8fecc4c1d [WIP] remove MarkdownElement in favor of ElementType enum 2017-03-10 12:39:42 +01:00
Johannes Zillmann
15c5946073 [WIP] remove explicit Footnotes transformation 2017-03-10 12:12:20 +01:00
Johannes Zillmann
68e3fd7a9f [WIP] change gather blocks transformation to new system 2017-03-10 12:10:58 +01:00
Johannes Zillmann
bd4c207ae3 [WIP] detect TOC on text items, not on blocks 2017-03-10 09:52:29 +01:00
Johannes Zillmann
e2481bdd2a [WIP] Compact Lines
* Almost every transformer first combines the lines, so we can make it an explicit one time transformation in the beginning
2017-03-10 08:49:40 +01:00
Johannes Zillmann
e2ddf0312b [WIP] move unused stuff in separate folder 2017-03-10 06:30:18 +01:00
Johannes Zillmann
111124fbf3 [WIP] Cleanup page / item handling 2017-03-07 21:59:15 +01:00
Johannes Zillmann
6f69566e98 [WIP] TOC headline parsing 2017-03-07 18:43:43 +01:00
Johannes Zillmann
c9352d8396 [WIP] improve TOC parsing 2017-03-07 18:43:31 +01:00
Johannes Zillmann
1fcd08f6d5 [WIP] small fixes 2017-02-27 21:19:29 +01:00