Johannes Zillmann
|
02c2fd04fe
|
DetectToc removes TOC items and marks headlines
|
2021-07-19 10:15:59 -06:00 |
|
Johannes Zillmann
|
d223e8a790
|
Move types to front
|
2021-07-18 14:25:55 -06:00 |
|
Johannes Zillmann
|
616909481a
|
Don't print globals twice
|
2021-07-18 14:13:38 -06:00 |
|
Johannes Zillmann
|
46234417ad
|
Fine tune line detection
* Previously, lines were assembled that are really separate lines
|
2021-07-18 13:07:06 -06:00 |
|
Johannes Zillmann
|
e261583c65
|
Improve TOC headline detection
|
2021-04-27 08:29:00 +02:00 |
|
Johannes Zillmann
|
94a7405671
|
Look up and verify TOC links
|
2021-04-25 14:41:50 +02:00 |
|
Johannes Zillmann
|
19a76d6163
|
Publish TOC as global (rudimentary)
|
2021-04-25 08:15:10 +02:00 |
|
Johannes Zillmann
|
28c2b1a6a6
|
Have types instead of type
|
2021-04-18 16:23:52 +02:00 |
|
Johannes Zillmann
|
5b611cd506
|
Rename TocDetection to DetectToc
|
2021-04-18 15:31:45 +02:00 |
|
Johannes Zillmann
|
243736ea0a
|
Fix typos
|
2021-04-18 11:38:34 +02:00 |
|
Johannes Zillmann
|
baa5b4fadc
|
Add 6 more test PDFs
|
2021-04-18 11:34:11 +02:00 |
|
Johannes Zillmann
|
a1ea24cc3a
|
Improved TOC detection
- Restrict pages before numbered line
|
2021-04-18 10:05:34 +02:00 |
|
Johannes Zillmann
|
ce6c9fe977
|
Initial TOC detection
|
2021-04-12 08:09:30 +02:00 |
|
Johannes Zillmann
|
a427806f68
|
Move width & height after x & y
|
2021-04-11 18:28:53 +02:00 |
|
Johannes Zillmann
|
642509a454
|
Refine repetitive character removal
|
2021-04-02 22:33:12 +02:00 |
|
Johannes Zillmann
|
6283ab7a96
|
Track evaluation score (optionally)
Makes it easier to see how a value got classified
|
2021-04-01 18:16:42 +02:00 |
|
Johannes Zillmann
|
d8fb3e0b24
|
Rename CalculateCoordinate to Unwrap... because that's what it really is
|
2021-03-31 10:08:05 +02:00 |
|
Johannes Zillmann
|
71ef84153c
|
Show page labels + default mapping to 1
|
2021-03-29 08:47:04 +02:00 |
|
Johannes Zillmann
|
898af7bbc8
|
Fix previous commit and re-use page mapping
|
2021-03-29 07:24:20 +02:00 |
|
Johannes Zillmann
|
388e8cc6b1
|
Find page mapping during statistics calculation
|
2021-03-28 23:45:26 +02:00 |
|
Johannes Zillmann
|
89d4bbd2f9
|
Cover globals in tests
|
2021-03-28 10:58:24 +02:00 |
|
Johannes Zillmann
|
d7d3502a25
|
Fix processing PDFs with no page numbers
|
2021-03-28 10:21:26 +02:00 |
|
Johannes Zillmann
|
21106d7e5e
|
Lower min score since accuracy has increased
|
2021-03-26 09:02:31 +01:00 |
|
Johannes Zillmann
|
0b096faa0c
|
More accurate page number detection
|
2021-03-26 08:42:31 +01:00 |
|
Johannes Zillmann
|
4340acb758
|
Simplify code
|
2021-03-24 23:08:36 +01:00 |
|
Johannes Zillmann
|
4d1821f584
|
Qualify lines for removal based on multiple scores
|
2021-03-23 08:08:13 +01:00 |
|
Johannes Zillmann
|
c98145a63c
|
Test for remote PDFs
|
2021-03-22 09:03:26 +01:00 |
|
Johannes Zillmann
|
68c4d9a4a3
|
Consolidate repetitive element eviction
* Solely rely on neighbour similarity
* Cut out `y` in the middle
|
2021-03-16 07:02:31 +01:00 |
|
Johannes Zillmann
|
f42358d63b
|
Remove empty items
|
2021-03-16 05:50:57 +01:00 |
|
Johannes Zillmann
|
5af033c0f1
|
Round and limit y
|
2021-03-15 20:37:41 +01:00 |
|
Johannes Zillmann
|
a90e6207dc
|
Add similarity checks to repetitive element removal
|
2021-03-15 09:16:50 +01:00 |
|
Johannes Zillmann
|
9bd5043f2e
|
Very basic removal of repetitive elements
|
2021-03-14 12:15:37 +01:00 |
|
Johannes Zillmann
|
8e024ee544
|
Fix layout
|
2021-03-13 22:57:49 +01:00 |
|
Johannes Zillmann
|
60596e7416
|
#24 Add first external PDFs for testing
|
2021-03-13 22:53:54 +01:00 |
|
Johannes Zillmann
|
db86552965
|
Fix tests
|
2021-03-13 22:50:02 +01:00 |
|
Johannes Zillmann
|
713a82b41d
|
Stabilize font display in tests
* If multiple PDFs are tested one after another, their font IDs change (e.g. `g_d0_f1` becomes `g_d1_f1`)
|
2021-03-13 19:38:47 +01:00 |
|
Johannes Zillmann
|
417cc2ab94
|
Add Test infrastructure for example PDFs
|
2021-03-13 08:46:22 +01:00 |
|
Johannes Zillmann
|
ef0bd7ebbe
|
Add example files
|
2017-03-29 08:17:14 +02:00 |
|