A PDF to Markdown converter
Johannes Zillmann 7abafc61e7 Improve word boundary detection
- sometimes a word is provided with multiple items. E.g: "T his is a sen tence"
- use x-axis distance to not put whitespaces in the middle of a word
- also tweak the line detection a bit (for Alice)
2024-05-20 00:22:24 -06:00
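The word-boundary idea in the commit message above can be illustrated with a small sketch. This is not the project's implementation. The `Fragment` shape, the `joinFragments` function and the fixed `minSpaceGap` threshold are assumptions made for illustration. The fragments of one line are first ordered by x position. A space is inserted only when the gap between the end of one fragment and the start of the next is wider than the threshold. Pieces such as "T" and "his" that touch horizontally are therefore glued back into one word.

```typescript
// Hypothetical shape of a positioned text fragment, loosely modelled on
// what a PDF parser emits: the string plus its horizontal extent.
interface Fragment {
  text: string;
  x: number; // left edge of the fragment
  width: number; // horizontal advance of the fragment
}

/**
 * Joins the fragments of one line into a string, inserting a space only
 * when the horizontal gap between two fragments exceeds `minSpaceGap`.
 * Fragments closer than that are treated as pieces of the same word.
 */
function joinFragments(fragments: Fragment[], minSpaceGap: number): string {
  // Work left to right regardless of the order the parser delivered them in.
  const sorted = [...fragments].sort((a, b) => a.x - b.x);
  let result = '';
  let prevEnd: number | undefined;
  for (const frag of sorted) {
    if (prevEnd !== undefined) {
      const gap = frag.x - prevEnd;
      // A gap wider than the threshold counts as a word boundary, unless
      // whitespace is already present on either side.
      if (gap > minSpaceGap && !result.endsWith(' ') && !frag.text.startsWith(' ')) {
        result += ' ';
      }
    }
    result += frag.text;
    prevEnd = frag.x + frag.width;
  }
  return result;
}

// "T his is a sen tence" split into items, with small gaps inside words
// and wider gaps between words.
const fragments: Fragment[] = [
  { text: 'T', x: 0, width: 6 },
  { text: 'his', x: 6.5, width: 14 },
  { text: 'is', x: 24, width: 7 },
  { text: 'a', x: 34.5, width: 5 },
  { text: 'sen', x: 43, width: 15 },
  { text: 'tence', x: 58.3, width: 22 },
];
console.log(joinFragments(fragments, 2)); // "This is a sentence"
```

A fixed threshold keeps the sketch simple. A threshold derived from the font size is probably more robust across documents, but that is a design guess and not something stated in the source.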

PDF-To-Markdown Converter

JavaScript library that converts PDF files into Markdown text. An online version is available at http://pdf2md.morethan.io.

Major Changes

  • Apr 2017 - 0.1: Initial Release

Use

//TBD

Contribute

Use the issue tracker and/or open pull requests!

Useful Build Commands

  • npm install: Download all necessary npm packages
  • npm test: Run the tests
  • npm test -- --verbose=false './test/Files\.test\.ts' -t "Alice-In-Wonderland.pdf": Run a specific test
  • npm run test-write: Run the tests and write any changed results back to the example result files
  • npm run lint: Lint the JavaScript files
  • npm run format: Run the Prettier formatter
  • npm run build: Compile the TypeScript files to the lib folder
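`npm run test-write` is described as persisting changed results for the example files. A common way to build such a mode is a compare-or-write helper like the one sketched here. The function name `checkExampleResult` and its `writeMode` flag are hypothetical and are not taken from this project's test code.

```typescript
import * as fs from 'fs';

/**
 * Hypothetical compare-or-write helper for example-file results.
 * In write mode, the freshly converted Markdown replaces the stored
 * expectation. Otherwise, the stored file is compared with the new output.
 * Returns true when the stored file matches (or was just written).
 */
function checkExampleResult(expectedPath: string, actual: string, writeMode: boolean): boolean {
  if (writeMode) {
    // Persist the new output so it becomes the expectation for later runs.
    fs.writeFileSync(expectedPath, actual, 'utf8');
    return true;
  }
  if (!fs.existsSync(expectedPath)) {
    // No stored expectation yet: report a mismatch until it is written.
    return false;
  }
  return fs.readFileSync(expectedPath, 'utf8') === actual;
}
```

In a Jest test, `writeMode` could be driven by an environment variable set by the write script. How this project actually wires that up is not shown in the README.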

Release

//TBD

Test a release locally and use it in other projects

  • Run npm link in the core project
  • Run npm link pdf-to-markdown-core in the target project

Credits

pdf.js - Mozilla's PDF parsing and rendering platform, used here as the raw parser
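pdf.js returns a page's text as positioned items through `page.getTextContent()`. Each item carries fields such as `str`, `transform` and `width`, where `transform[4]` and `transform[5]` hold the x and y origin. The sketch below shows one simple way to turn such raw items into lines by clustering on y. It illustrates the idea under stated assumptions and is not this converter's actual line detection. `RawTextItem`, `groupIntoLines` and `yTolerance` are hypothetical names, and the interface mirrors only a small subset of the pdf.js item fields so the example runs without pdf.js installed.

```typescript
// Minimal subset of the fields pdf.js reports for each text item;
// transform[4] is the x origin and transform[5] the y origin (baseline).
interface RawTextItem {
  str: string;
  transform: number[];
  width: number;
}

/**
 * Groups raw items into lines by their baseline y coordinate. Items whose
 * y differs from an existing line's y by at most `yTolerance` join that line.
 * PDF y coordinates grow upwards, so lines are returned top to bottom
 * by sorting on descending y.
 */
function groupIntoLines(items: RawTextItem[], yTolerance: number): RawTextItem[][] {
  const lines: { y: number; items: RawTextItem[] }[] = [];
  for (const item of items) {
    const y = item.transform[5];
    const line = lines.find((l) => Math.abs(l.y - y) <= yTolerance);
    if (line) {
      line.items.push(item);
    } else {
      lines.push({ y, items: [item] });
    }
  }
  lines.sort((a, b) => b.y - a.y);
  // Within a line, order items from left to right by their x origin.
  return lines.map((l) => [...l.items].sort((a, b) => a.transform[4] - b.transform[4]));
}
```

A fixed tolerance is the simplest choice. Superscripts or mixed font sizes on one line would probably need something smarter, such as a tolerance scaled by the item height.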