175 Commits

Author SHA1 Message Date
Johannes Zillmann
4600dc6ee7 [WIP] headline detection for non TOC pdfs 2017-03-16 07:40:57 +01:00
Johannes Zillmann
77576ebd7e [WIP] Headlines for title pages 2017-03-16 07:08:46 +01:00
Johannes Zillmann
1eda51c0b4 [WIP] detect more headlines with already detected heights 2017-03-16 06:52:45 +01:00
Johannes Zillmann
a9b851ceb6 [WIP] robustify TOC headline finding 2017-03-16 06:01:07 +01:00
Johannes Zillmann
dbd9d8bf5f [WIP] find not found TOC-Headers by size 2017-03-15 08:42:46 +01:00
Johannes Zillmann
93f15a38b5 [WIP] move different typed transformations to different folders 2017-03-15 06:09:18 +01:00
Johannes Zillmann
739d20d83b [WIP] Simplify major headline detections 2017-03-15 05:27:47 +01:00
Johannes Zillmann
5caf8154db [WIP] Simplify code/quote detection 2017-03-14 10:30:21 +01:00
Johannes Zillmann
c6f592d3fc [WIP] Simplify list detection 2017-03-11 13:42:09 +01:00
Johannes Zillmann
f8fecc4c1d [WIP] remove MarkdownElement in favor of ElementType enum 2017-03-10 12:39:42 +01:00
Johannes Zillmann
15c5946073 [WIP] remove explicit Footnotes transformation 2017-03-10 12:12:20 +01:00
Johannes Zillmann
68e3fd7a9f [WIP] change gather blocks transformation to new system 2017-03-10 12:10:58 +01:00
Johannes Zillmann
bd4c207ae3 [WIP] detect TOC on text items, not on blocks 2017-03-10 09:52:29 +01:00
Johannes Zillmann
e2481bdd2a [WIP] Compact Lines
* Almost every transformer first combines the lines, so we can make it an explicit one time transformation in the beginning
2017-03-10 08:49:40 +01:00
Johannes Zillmann
e2ddf0312b [WIP] move unused stuff in separate folder 2017-03-10 06:30:18 +01:00
Johannes Zillmann
111124fbf3 [WIP] Cleanup page / item handling 2017-03-07 21:59:15 +01:00
Johannes Zillmann
6f69566e98 [WIP] TOC headline parsing 2017-03-07 18:43:43 +01:00
Johannes Zillmann
c9352d8396 [WIP] improve TOC parsing 2017-03-07 18:43:31 +01:00
Johannes Zillmann
1fcd08f6d5 [WIP] small fixes 2017-02-27 21:19:29 +01:00
Johannes Zillmann
5827379d1b WIP footer detection 2017-02-22 23:18:49 +01:00
Johannes Zillmann
b7db48af4b WIP globalize display of globals and summary/messages 2017-02-21 08:05:00 +01:00
Johannes Zillmann
62fd0155ed WIP Proper footnote link detection 2017-02-20 21:58:37 +01:00
Johannes Zillmann
a3b6a26437 WIP add detect Lists function 2017-02-19 14:23:35 +01:00
Johannes Zillmann
edfa76b033 WIP fix bugs 2017-02-19 11:05:41 +01:00
Johannes Zillmann
2783d724e5 WIP initial TOC detection 2017-02-19 10:20:14 +01:00
Johannes Zillmann
bed3fd357b WIP merge successive code blocks 2017-02-18 12:33:21 +01:00
Johannes Zillmann
e7ff939351 WIP markdown formatting for code/quote 2017-02-18 11:46:13 +01:00
Johannes Zillmann
f93d1e4aa1 WIP initial quote/code detector with new TextItemCombiner 2017-02-18 10:50:54 +01:00
Johannes Zillmann
d78d9be8a3 WIP rename splitIntoBlocks to DetectPdfBlocks 2017-02-17 20:19:57 +01:00
Johannes Zillmann
767462bc9b WIP Introduce PdfBlockView
* Add vertical to horizontal transformation
* Improve header/footer removal
2017-02-17 20:17:04 +01:00
Johannes Zillmann
a92e384249 Calculate most used distance
* round coordinates on construction
2017-02-17 09:01:12 +01:00
Johannes Zillmann
b7393fc806 Detect bold and emphasis 2017-02-17 08:16:27 +01:00
Johannes Zillmann
6441580889 Add global statistics 2017-02-15 07:33:07 +01:00
Johannes Zillmann
a76dac6428 Summary for detect footnotes 2017-02-15 07:11:26 +01:00
Johannes Zillmann
55506576f5 Pimp up transformation pipeline with ParseResult object 2017-02-15 07:03:44 +01:00
Johannes Zillmann
c08105ecaf Show pdf item text in pre only for the whitespace transformation 2017-02-14 22:03:58 +01:00
Johannes Zillmann
41bc2f6c34 Move pageView construction into Transformer 2017-02-14 21:47:54 +01:00
Johannes Zillmann
92a4337387 Add font info to pdf page view 2017-02-14 20:28:01 +01:00
Johannes Zillmann
ab5705cd27 combine on Y with variation of 1 (instead of being strict) 2017-02-14 20:24:01 +01:00
Johannes Zillmann
a1222544bb Remove unused class 2017-02-14 20:23:22 +01:00
Johannes Zillmann
3a1241896b Replace text with block system 2017-02-12 19:37:21 +01:00
Johannes Zillmann
1ca9fa4362 Outsource annotation definitions 2017-02-11 15:42:30 +01:00
Johannes Zillmann
996e5fae62 Detect Links & Remove Whitespaces 2017-02-11 15:23:01 +01:00
Johannes Zillmann
fc0aafebdd Render pdf items as pre elements to see duplicate whitespaces 2017-02-11 15:13:45 +01:00
Johannes Zillmann
f0491af073 nice fonts 2017-02-11 15:02:13 +01:00
Johannes Zillmann
b31ad64fb7 Introduce Result View 2017-02-06 19:13:43 +01:00
Johannes Zillmann
b7634423cc Add Markdown view 2017-02-06 17:13:41 +01:00
Johannes Zillmann
0a6242b944 update dependencies 2017-02-05 23:21:36 +01:00
Johannes Zillmann
1b326a9f36 Headline to upper case transformation
* Add testing capability (mocha, chai)
* Add MarkdownElement to text item
2017-02-05 21:22:42 +01:00
Johannes Zillmann
0245ea16f1 Add not perfect headline detection 2017-02-05 09:58:25 +01:00