Johannes Zillmann
7abafc61e7
Improve word boundary detection
...
- sometimes a word is split across multiple items, e.g. "T his is a sen tence"
- use the x-axis distance between items to avoid putting whitespace in the middle of a word
- also tweak the line detection a bit (for Alice)
2024-05-20 00:22:24 -06:00
Johannes Zillmann
c952409c0e
Fix duplicate headline bug
2024-05-17 12:22:34 -06:00
Johannes Zillmann
491e8c549a
Correct test expectations
...
- Adapt the converter minimally
2024-04-22 07:39:30 -06:00
Johannes Zillmann
f0f9a3b896
Fix double lines
2024-04-21 09:26:12 -06:00
Johannes Zillmann
78db114632
Add Markdown comparison tests
...
- Convert the `example PDFs` with the old `pdf-to-markdown` and write them to text files
- Compare the text files with the conversion of the current code
- Next:
  - Improve the current code to match good conversions of the old code
  - Adapt the text files in case the current conversion is better than the old
- Current tests are breaking
2024-04-21 09:15:46 -06:00
Johannes Zillmann
c531dba632
Detect code blocks
...
- with some inconveniences, e.g.:
  - only code blocks (same as in previous version)
  - split across pages
2024-04-15 22:24:44 -06:00
Johannes Zillmann
b529dfa0a2
Detect Links
...
- Still needs a proper place since this is on `word` basis
2024-04-15 08:20:18 -06:00
Johannes Zillmann
3fa91a5d1e
FontStyle detection
...
- what is missing is combining subsequent equal elements
2024-04-15 07:55:55 -06:00
Johannes Zillmann
5db815076b
Render footnotes
2024-04-09 08:25:59 -06:00
Johannes Zillmann
5daa8aa45a
Detect Footnotes
...
- not yet converted to MD
- detection should be the same as in the old version
2024-04-09 08:25:27 -06:00
Johannes Zillmann
fab5d4649c
List Levels
...
- no tests for this... need to revise the test infrastructure and the transformation which is modifying the item contents directly
2024-04-05 12:06:21 -06:00
Johannes Zillmann
78f44a0ad9
Base markdown converter on blocks
2024-04-02 19:57:53 -06:00
Johannes Zillmann
16e5a62951
Fix jest setup deprecation
2024-04-02 18:19:20 -06:00
Johannes Zillmann
182dd34c46
Detect lists & blocks
2024-04-02 16:23:19 -06:00
Johannes Zillmann
b5f3075bdf
Clean up types
...
- merge `ItemType`/`BlockType` to `TextType`
- fix bug with duplicate and flattened types
2024-04-02 11:18:55 -06:00
Johannes Zillmann
3c31c12768
Update known issues
2024-03-28 12:03:49 -06:00
Johannes Zillmann
5ab8730a4b
Improve markdown rendering
2024-03-28 11:56:16 -06:00
Johannes Zillmann
55ae236928
Improve header detection
...
- fix tests
- still run header detection based on heights even if TOC headlines have been identified
2024-03-28 11:39:34 -06:00
Johannes Zillmann
0dc47329ef
Move core module to root
2024-03-26 10:52:54 -06:00
Johannes Zillmann
7f5f4d7071
Add DetectHeaders transformation
...
- This is mainly code from 2 years ago (was in the stash)
- The tests were green but failing now because of recent changes
- Plan is to first move all files to the root to then be able to debug the tests better
2024-03-26 10:23:15 -06:00
Johannes Zillmann
125d4f3079
Simplify PDF pipeline a bit
2024-03-25 16:36:58 -06:00
Johannes Zillmann
075639979e
Add sparse support for final convert
2024-03-25 16:36:24 -06:00
Johannes Zillmann
5bf4988da2
Move from TsLint to EsLint
...
- Fix some errors but still not green (good enough for now)
2024-03-20 09:31:09 -06:00
Johannes Zillmann
c696806a0e
Update typescript
2024-03-19 18:08:18 -06:00
Johannes Zillmann
e56d70c599
Remove UI package
2024-03-19 16:50:38 -06:00
Johannes Zillmann
02c2fd04fe
DetectToc removes TOC items and marks headlines
2021-07-19 10:15:59 -06:00
Johannes Zillmann
d223e8a790
Move types to front
2021-07-18 14:25:55 -06:00
Johannes Zillmann
616909481a
Don't print globals twice
2021-07-18 14:13:38 -06:00
Johannes Zillmann
46234417ad
Fine tune line detection
...
* Before, lines were assembled that are really separate lines
2021-07-18 13:07:06 -06:00
Johannes Zillmann
e261583c65
Improve TOC headline detection
2021-04-27 08:29:00 +02:00
Johannes Zillmann
94a7405671
Lookup and verify toc links
2021-04-25 14:41:50 +02:00
Johannes Zillmann
25f23ee0e4
Fix tests
2021-04-25 08:28:48 +02:00
Johannes Zillmann
19a76d6163
Publish TOC as global (rudimentary)
2021-04-25 08:15:10 +02:00
Johannes Zillmann
f7bf4d95b3
Page selection popup: from grid to flexbox
2021-04-24 00:24:07 +02:00
Johannes Zillmann
28c2b1a6a6
Have types instead of type
2021-04-18 16:23:52 +02:00
Johannes Zillmann
5b611cd506
Rename TocDetection to DetectToc
2021-04-18 15:31:45 +02:00
Johannes Zillmann
6ad8d2daa9
UI cosmetics
2021-04-18 12:19:23 +02:00
Johannes Zillmann
5365667314
Reviewing new PDFs
2021-04-18 11:56:42 +02:00
Johannes Zillmann
243736ea0a
Fix typos
2021-04-18 11:38:34 +02:00
Johannes Zillmann
baa5b4fadc
Add 6 more test PDFs
2021-04-18 11:34:11 +02:00
Johannes Zillmann
a1ea24cc3a
Improved TOC detection
...
- Restrict pages before numbered line
2021-04-18 10:05:34 +02:00
Johannes Zillmann
ce6c9fe977
Initial TOC detection
2021-04-12 08:09:30 +02:00
Johannes Zillmann
bf81416925
Fix: take last stage from local storage
2021-04-11 18:38:36 +02:00
Johannes Zillmann
9fcb431a64
Make item#uuid required (just set blank for tests)
2021-04-11 18:30:32 +02:00
Johannes Zillmann
a427806f68
Move width & height after x & y
2021-04-11 18:28:53 +02:00
Johannes Zillmann
932a79a3e9
Add known issues
2021-04-11 09:08:45 +02:00
Johannes Zillmann
de85337f47
remove unused imports
2021-04-10 20:25:41 +02:00
Johannes Zillmann
8e9642c9ed
Support bootstrapping URL, debug and stage from URL params
2021-04-10 20:25:23 +02:00
Johannes Zillmann
c8cfbebb92
Disable no used locals (for now)
2021-04-10 09:18:46 +02:00
Johannes Zillmann
ddac96299d
Fix not used locals
2021-04-10 09:18:46 +02:00