23 Commits

Author SHA1 Message Date
Johannes Zillmann
7abafc61e7 Improve word boundary detection
- sometimes a word is provided as multiple items, e.g. "T his is a sen tence"
- use x-axis distance to avoid putting whitespace in the middle of a word
- also tweak the line detection a bit (for Alice)
2024-05-20 00:22:24 -06:00
Johannes Zillmann
c531dba632 Detect code blocks
- with some inconveniences, e.g.
  - only code blocks (same as in previous version)
  - split across pages
2024-04-15 22:24:44 -06:00
Johannes Zillmann
b529dfa0a2 Detect Links
- Still needs a proper place since this is on `word` basis
2024-04-15 08:20:18 -06:00
Johannes Zillmann
3fa91a5d1e FontStyle detection
- what is missing is combining subsequent equal elements
2024-04-15 07:55:55 -06:00
Johannes Zillmann
fab5d4649c List Levels
- no tests for this... need to revise the test infrastructure and the transformation which is modifying the item contents directly
2024-04-05 12:06:21 -06:00
Johannes Zillmann
182dd34c46 Detect lists & blocks 2024-04-02 16:23:19 -06:00
Johannes Zillmann
55ae236928 Improve header detection
- fix tests
- still run header detection based on heights even if TOC headlines have been identified
2024-03-28 11:39:34 -06:00
Johannes Zillmann
7f5f4d7071 Add DetectHeaders transformation
- This is mainly code from 2 years ago (was in the stash)
- The tests were green but failing now because of recent changes
- Plan is to first move all files to the root to then be able to debug the tests better
2024-03-26 10:23:15 -06:00
Johannes Zillmann
616909481a Don't print globals twice 2021-07-18 14:13:38 -06:00
Johannes Zillmann
46234417ad Fine tune line detection
* Before, lines were assembled that are really separate lines
2021-07-18 13:07:06 -06:00
Johannes Zillmann
e261583c65 Improve TOC headline detection 2021-04-27 08:29:00 +02:00
Johannes Zillmann
94a7405671 Lookup and verify toc links 2021-04-25 14:41:50 +02:00
Johannes Zillmann
28c2b1a6a6 Have types instead of type 2021-04-18 16:23:52 +02:00
Johannes Zillmann
5b611cd506 Rename TocDetection to DetectToc 2021-04-18 15:31:45 +02:00
Johannes Zillmann
ce6c9fe977 Initial TOC detection 2021-04-12 08:09:30 +02:00
Johannes Zillmann
a427806f68 Move width & height after x & y 2021-04-11 18:28:53 +02:00
Johannes Zillmann
d8fb3e0b24 Rename CalculateCoordinate to Unwrap... because that's what it really is 2021-03-31 10:08:05 +02:00
Johannes Zillmann
898af7bbc8 Fix previous commit and re-use page mapping 2021-03-29 07:24:20 +02:00
Johannes Zillmann
388e8cc6b1 Find page mapping during statistics calculation 2021-03-28 23:45:26 +02:00
Johannes Zillmann
89d4bbd2f9 Cover globals in tests 2021-03-28 10:58:24 +02:00
Johannes Zillmann
f42358d63b Remove empty items 2021-03-16 05:50:57 +01:00
Johannes Zillmann
9bd5043f2e Very basic removal of repetitive elements 2021-03-14 12:15:37 +01:00
Johannes Zillmann
60596e7416 #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
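
The word-boundary heuristic described in commit 7abafc61e7 (use the x-axis distance between items to decide whether a space belongs between them) could look roughly like the sketch below. This is not the repository's code: `TextItem`, `join_line_items`, and the `gap_ratio` threshold are hypothetical names and values chosen for illustration, and the sketch assumes each item carries a left edge `x` and a rendered `width`.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical text fragment, roughly as a PDF parser might emit it."""
    text: str
    x: float      # left edge of the fragment on the x-axis
    width: float  # rendered width of the fragment


def join_line_items(items, gap_ratio=0.25):
    """Join the fragments of one line, adding a space only across wide gaps.

    gap_ratio is an assumed tuning value: a gap counts as a word boundary
    when it exceeds this fraction of the local average character width.
    """
    ordered = sorted(items, key=lambda item: item.x)
    parts = []
    prev = None
    for item in ordered:
        if prev is not None:
            # Horizontal distance between the end of the previous fragment
            # and the start of the current one.
            gap = item.x - (prev.x + prev.width)
            # Estimate the character width from both neighbouring fragments.
            chars = len(prev.text) + len(item.text)
            avg_char_width = (prev.width + item.width) / max(chars, 1)
            if gap > gap_ratio * avg_char_width:
                parts.append(" ")
        parts.append(item.text)
        prev = item
    return "".join(parts)


# Fragments that a naive join would render as "T his is a sen tence".
line = [
    TextItem("T", 0.0, 6.0),
    TextItem("his", 6.2, 15.0),
    TextItem("is", 25.0, 8.0),
    TextItem("a", 37.0, 5.0),
    TextItem("sen", 46.0, 15.0),
    TextItem("tence", 61.3, 25.0),
]
print(join_line_items(line))
```

A threshold relative to the character width, rather than a fixed distance, keeps the check independent of font size; the actual commit may use a different criterion.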