Commit Graph

50 Commits

SHA1 Message Date
02c2fd04fe DetectToc removes TOC items and marks headlines 2021-07-19 10:15:59 -06:00
d223e8a790 Move types to front 2021-07-18 14:25:55 -06:00
616909481a Don't print globals twice 2021-07-18 14:13:38 -06:00
e261583c65 Improve TOC headline detection 2021-04-27 08:29:00 +02:00
94a7405671 Lookup and verify toc links 2021-04-25 14:41:50 +02:00
25f23ee0e4 Fix tests 2021-04-25 08:28:48 +02:00
19a76d6163 Publish TOC as global (rudimentary) 2021-04-25 08:15:10 +02:00
5b611cd506 Rename TocDetection to DetectToc 2021-04-18 15:31:45 +02:00
a1ea24cc3a Improved TOC detection
- Restrict pages before numbered line
2021-04-18 10:05:34 +02:00
ce6c9fe977 Initial TOC detection 2021-04-12 08:09:30 +02:00
ddac96299d Fix unused locals 2021-04-10 09:18:46 +02:00
6283ab7a96 Track evaluation score (optionally)
Makes it easier to see how a value got classified
2021-04-01 18:16:42 +02:00
487b304c15 Move EvaluationIndex to debug package
(Not 100% correct, but somewhat more pleasing)
2021-03-31 10:00:21 +02:00
ce6d9f8984 Package refactoring: Move globals to root 2021-03-29 08:57:05 +02:00
71ef84153c Show page labels + default mapping to 1 2021-03-29 08:47:04 +02:00
898af7bbc8 Fix previous commit and re-use page mapping 2021-03-29 07:24:20 +02:00
388e8cc6b1 Find page mapping during statistics calculation 2021-03-28 23:45:26 +02:00
89d4bbd2f9 Cover globals in tests 2021-03-28 10:58:24 +02:00
d7d3502a25 Fix processing pdfs with no page numbers 2021-03-28 10:21:26 +02:00
202da9b005 Globals propagation infrastructure 2021-03-27 09:35:18 +01:00
0b096faa0c More accurate page number detection 2021-03-26 08:42:31 +01:00
4c77274d16 Fix tests 2021-03-23 08:46:14 +01:00
c98145a63c Test for remote PDFs 2021-03-22 09:03:26 +01:00
f5a180113d No unused locals 2021-03-21 08:39:42 +01:00
5af033c0f1 Round and limit y 2021-03-15 20:37:41 +01:00
a90e6207dc Add similarity checks to repetitive element removal 2021-03-15 09:16:50 +01:00
9bd5043f2e Very basic removal of repetitive elements 2021-03-14 12:15:37 +01:00
77b7d837eb Improve change detection to handle removal case properly 2021-03-14 11:59:46 +01:00
d5523fb1d4 Split result files
* Due to the 100 MB limit of GitHub
2021-03-13 22:46:10 +01:00
713a82b41d Stabilize font display in tests
* If multiple PDFs are tested one after another, their font IDs change (e.g. `g_d0_f1` becomes `g_d1_f1`)
2021-03-13 19:38:47 +01:00
417cc2ab94 Add Test infrastructure for example PDFs 2021-03-13 08:46:22 +01:00
45355a9315 PageControls 2021-03-09 08:44:06 +01:00
c60bd3f737 Un-Grouping switch 2021-03-01 23:42:02 +01:00
e7574513c5 Change detection on group and item level 2021-02-28 02:07:45 +01:00
229cb53eb0 Make LineItemMerger standalone and re-usable 2021-02-27 18:45:14 +01:00
cd8cdf4df6 Highlight changes 2021-02-27 09:51:04 +01:00
915827be0c Sort line items on X axis 2021-02-26 21:42:26 +01:00
08509953dc Fix line compaction for multi-columnar PDFs 2021-02-26 19:28:44 +01:00
6e5e5c9d53 Improve line compaction 2021-02-26 18:04:50 +01:00
0910f7b148 Grouping of line items 2021-02-21 13:23:31 +01:00
d8bc6d100b Cleanup & simple line detection 2021-02-21 08:23:51 +01:00
1b530c6c29 Fetch fontObjects 2021-02-20 13:10:13 +01:00
6c72d61590 Annotated schema for debug 2021-02-14 11:43:26 +01:00
698562ab27 Implement CalculateCoordinates + simplify schema transformation 2021-02-13 11:09:34 +01:00
4401f1fb5c Rudimentary explicit debug support 2021-02-05 18:28:04 +01:00
8783e3cf9e Flexible Debug table foundation 2021-01-28 23:06:37 +01:00
ee7d686ba6 Progress Infrastructure 2021-01-12 22:54:22 +01:00
42f54e6b38 Load Example PDF 2021-01-06 22:40:09 +01:00
a3695a4a56 Initial pdfJs integration 2020-12-20 19:18:38 +01:00
f988bd565e Core project setup
* `ts`, `jest`, `prettier` and `tslint`
* Used resources
  * https://itnext.io/step-by-step-building-and-publishing-an-npm-typescript-package-44fe7164964c
  * https://til.hashrocket.com/posts/lmnsdtce3y-import-absolute-paths-in-typescript-jest-tests
2020-12-19 14:06:40 +01:00