Commit Graph

19 Commits

Author SHA1 Message Date
Johannes Zillmann
388e8cc6b1 Find page mapping during statistics calculation 2021-03-28 23:45:26 +02:00
Johannes Zillmann
89d4bbd2f9 Cover globals in tests 2021-03-28 10:58:24 +02:00
Johannes Zillmann
d7d3502a25 Fix processing pdfs with no page numbers 2021-03-28 10:21:26 +02:00
Johannes Zillmann
21106d7e5e Lower min score since accuracy has increased 2021-03-26 09:02:31 +01:00
Johannes Zillmann
0b096faa0c More accurate page number detection 2021-03-26 08:42:31 +01:00
Johannes Zillmann
4340acb758 Simplify code 2021-03-24 23:08:36 +01:00
Johannes Zillmann
4d1821f584 Qualify lines for removal based on multiple scores 2021-03-23 08:08:13 +01:00
Johannes Zillmann
c98145a63c Test for remote PDFs 2021-03-22 09:03:26 +01:00
Johannes Zillmann
68c4d9a4a3 Consolidate repetitive element eviction 2021-03-16 07:02:31 +01:00
* Solely rely on neighbour similarity
* Cut out `y` in the middle
Johannes Zillmann
f42358d63b Remove empty items 2021-03-16 05:50:57 +01:00
Johannes Zillmann
5af033c0f1 Round and limit y 2021-03-15 20:37:41 +01:00
Johannes Zillmann
a90e6207dc Add similarity checks to repetitive element removal 2021-03-15 09:16:50 +01:00
Johannes Zillmann
9bd5043f2e Very basic removal of repetitive elements 2021-03-14 12:15:37 +01:00
Johannes Zillmann
8e024ee544 Fix layout 2021-03-13 22:57:49 +01:00
Johannes Zillmann
60596e7416 #24 Add first external PDFs for testing 2021-03-13 22:53:54 +01:00
Johannes Zillmann
db86552965 Fix tests 2021-03-13 22:50:02 +01:00
Johannes Zillmann
713a82b41d Stabilize font display in tests 2021-03-13 19:38:47 +01:00
* If multiple PDFs are tested one after another, their font ids change (e.g. `g_d0_f1` becomes `g_d1_f1`)
Johannes Zillmann
417cc2ab94 Add test infrastructure for example PDFs 2021-03-13 08:46:22 +01:00
Johannes Zillmann
ef0bd7ebbe Add example files 2017-03-29 08:17:14 +02:00