Add Markdown comparison tests

- Convert the `example PDFs` with the old `pdf-to-markdown` and write the results to text files
- Compare the text files with the output of the current code
- Next:
  - Improve the current code to match the good conversions of the old code
  - Update the text files where the current conversion is better than the old one
- Current tests are failing
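The comparison step can be sketched as a line-by-line check. This assumes the stored `.md` file holds the old converter's output and that the current code's output is available as a string. The names `compareMarkdown`, `formatReport` and `LineMismatch`, and the choice to ignore line endings and trailing whitespace, are illustrative assumptions, not the project's actual test code.

```typescript
// Hypothetical helpers: not part of the pdf-to-markdown code base.

// One line where the stored Markdown (old converter) and the current output differ.
// `null` marks a line that exists on only one side.
interface LineMismatch {
  line: number; // 1-based line number
  expected: string | null; // line from the stored .md file
  actual: string | null; // line produced by the current code
}

// Assumption: line endings, trailing whitespace and trailing blank lines
// are cosmetic and should not fail a comparison.
function normalizeLines(markdown: string): string[] {
  return markdown
    .replace(/\r\n/g, "\n")
    .replace(/\n+$/, "")
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""));
}

// Compares two Markdown documents line by line and collects every mismatch.
function compareMarkdown(expected: string, actual: string): LineMismatch[] {
  const expectedLines = normalizeLines(expected);
  const actualLines = normalizeLines(actual);
  const mismatches: LineMismatch[] = [];
  const lineCount = Math.max(expectedLines.length, actualLines.length);
  for (let i = 0; i < lineCount; i++) {
    const e = i < expectedLines.length ? expectedLines[i] : null;
    const a = i < actualLines.length ? actualLines[i] : null;
    if (e !== a) {
      mismatches.push({ line: i + 1, expected: e, actual: a });
    }
  }
  return mismatches;
}

// Renders a short, diff-like report for one example file.
function formatReport(fileName: string, mismatches: LineMismatch[]): string {
  if (mismatches.length === 0) {
    return `${fileName}: OK`;
  }
  const details = mismatches.map((m) => {
    const expected = m.expected === null ? "<missing>" : m.expected;
    const actual = m.actual === null ? "<missing>" : m.actual;
    return `  line ${m.line}\n    - ${expected}\n    + ${actual}`;
  });
  return `${fileName}: ${mismatches.length} mismatching line(s)\n${details.join("\n")}`;
}
```

A test would read an example's stored `.md` file, convert the matching PDF with the current code, and fail when `compareMarkdown` returns mismatches, printing `formatReport` for context. Where the new output is judged better, the stored file is overwritten with it. A plain line-by-line comparison is deliberately simple: one inserted line shifts every following line, so a real setup might prefer a proper diff algorithm.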
Johannes Zillmann 2024-04-21 09:15:46 -06:00
parent c531dba632
commit 78db114632
24 changed files with 205836 additions and 12 deletions

File diffs suppressed because they are too large (3 files).

examples/CC-NC_Leitfaden.md +1572: file diff suppressed because it is too large.

@@ -0,0 +1,340 @@
```
Developed by Aalborg University, Denmark and Alfred Wegener Institute, Germany
```
# Creative Commons Attribution-ShareAlike 4.0 International Public License Agreement of siMPle
# Software for the automated detection of microplastic
```
By exercising the Licensed Rights (defined below), You accept and agree to be bound by
the terms and conditions of this Creative Commons Attribution-ShareAlike 4.0
International Public License ("Public License"). To the extent this Public License may be
interpreted as a contract, You are granted the Licensed Rights in consideration of Your
acceptance of these terms and conditions, and the Licensor grants You such rights in
consideration of benefits the Licensor receives from making the Licensed Material
available under these terms and conditions.
```
```
Section 1 Definitions.
```
a. Adapted Material means material subject to Copyright and Similar Rights that is derived
from or based upon the Licensed Material and in which the Licensed Material is translated,
altered, arranged, transformed, or otherwise modified in a manner requiring permission
under the Copyright and Similar Rights held by the Licensor. For purposes of this Public
License, where the Licensed Material is a musical work, performance, or sound recording,
Adapted Material is always produced where the Licensed Material is synched in timed
relation with a moving image.
b. Adapter's License means the license You apply to Your Copyright and Similar Rights in
Your contributions to Adapted Material in accordance with the terms and conditions of
this Public License.
c. BY-SA Compatible License means a license listed
at creativecommons.org/compatiblelicenses, approved by Creative Commons as
essentially the equivalent of this Public License.
d. Copyright and Similar Rights means copyright and/or similar rights closely related to
copyright including, without limitation, performance, broadcast, sound recording, and Sui
Generis Database Rights, without regard to how the rights are labeled or categorized. For
purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright
and Similar Rights.
e. Effective Technological Measures means those measures that, in the absence of proper
authority, may not be circumvented under laws fulfilling obligations under Article 11 of
the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international
agreements.
f. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or
limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
g. License Elements means the license attributes listed in the name of a Creative Commons
Public License. The License Elements of this Public License are Attribution and ShareAlike.
h. Licensed Material means the artistic or literary work, database, or other material to which
the Licensor applied this Public License.
i. Licensed Rights means the rights granted to You subject to the terms and conditions of
this Public License, which are limited to all Copyright and Similar Rights that apply to Your
use of the Licensed Material and that the Licensor has authority to license.
j. Licensor means the individual(s) or entity(ies) granting rights under this Public License.
k. Share means to provide material to the public by any means or process that requires
permission under the Licensed Rights, such as reproduction, public display, public
performance, distribution, dissemination, communication, or importation, and to make
material available to the public including in ways that members of the public may access
the material from a place and at a time individually chosen by them.
l. Sui Generis Database Rights means rights other than copyright resulting from Directive
96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, as amended and/or succeeded, as well as other essentially
equivalent rights anywhere in the world.
m. You means the individual or entity exercising the Licensed Rights under this Public
License. Your has a corresponding meaning.
```
Section 2 Scope.
```
a. License grant.
1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You
a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to
exercise the Licensed Rights in the Licensed Material to:
A. reproduce and Share the Licensed Material, in whole or in part; and
B. produce, reproduce, and Share Adapted Material.
2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations
apply to Your use, this Public License does not apply, and You do not need to comply with
its terms and conditions.
3. Term. The term of this Public License is specified in Section 6(a).
4. Media and formats; technical modifications allowed. The Licensor authorizes You to
exercise the Licensed Rights in all media and formats whether now known or hereafter
created, and to make technical modifications necessary to do so. The Licensor waives
and/or agrees not to assert any right or authority to forbid You from making technical
modifications necessary to exercise the Licensed Rights, including technical modifications
necessary to circumvent Effective Technological Measures. For purposes of this Public
License, simply making modifications authorized by this Section 2(a)(4) never produces
Adapted Material.
5. Downstream recipients.
A. Offer from the Licensor Licensed Material. Every recipient of the Licensed Material
automatically receives an offer from the Licensor to exercise the Licensed Rights under the
terms and conditions of this Public License.
B. Additional offer from the Licensor Adapted Material. Every recipient of Adapted Material
from You automatically receives an offer from the Licensor to exercise the Licensed Rights
in the Adapted Material under the conditions of the Adapter's License You apply.
C. No downstream restrictions. You may not offer or impose any additional or different terms
or conditions on, or apply any Effective Technological Measures to, the Licensed Material
if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed
Material.
6. No endorsement. Nothing in this Public License constitutes or may be construed as
permission to assert or imply that You are, or that Your use of the Licensed Material is,
connected with, or sponsored, endorsed, or granted official status by, the Licensor or
others designated to receive attribution as provided in Section 3(a)(1)(A)(i).
b. Other rights.
1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor
are publicity, privacy, and/or other similar personality rights; however, to the extent
possible, the Licensor waives and/or agrees not to assert any such rights held by the
Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but
not otherwise.
2. Patent and trademark rights are not licensed under this Public License.
3. To the extent possible, the Licensor waives any right to collect royalties from You for the
exercise of the Licensed Rights, whether directly or through a collecting society under any
voluntary or waivable statutory or compulsory licensing scheme. In all other cases the
Licensor expressly reserves any right to collect such royalties.
```
Section 3 License Conditions.
```
```
Your exercise of the Licensed Rights is expressly made subject to the following conditions.
```
a. Attribution.
1. If You Share the Licensed Material (including in modified form), You must:
```
A. retain the following if it is supplied by the Licensor with the Licensed Material:
i. identification of the creator(s) of the Licensed Material and any others designated to
receive attribution, in any reasonable manner requested by the Licensor (including by
pseudonym if designated);
ii. a copyright notice;
```
iii. a notice that refers to this Public License;
iv. a notice that refers to the disclaimer of warranties;
```
v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable
```
```
Citing siMPle in academic papers:
```
- Primpke, S., A. Dias, P., Gerdts, G., Anal. Methods 11, 2138-2147. (2019)
- Liu, F., Olesen, K.B., Borregaard, A.R., Vollertsen, J., Sci. Total Environ. 671. (2019)
- Raman database: Cabernard, L.; Roscher, L.; Lorenz, C.; Gerdts, G.; Primpke, S., Environmental Science
& Technology 52 (22), 13279-13288 (2018)
```
B. indicate if You modified the Licensed Material and retain an indication of any previous
modifications; and
C. indicate the Licensed Material is licensed under this Public License, and include the text
of, or the URI or hyperlink to, this Public License.
```
2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the
medium, means, and context in which You Share the Licensed Material. For example, it
may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource
that includes the required information.
3. If requested by the Licensor, You must remove any of the information required by
Section 3(a)(1)(A) to the extent reasonably practicable.
b. ShareAlike.
```
In addition to the conditions in Section 3(a), if You Share Adapted Material You produce,
the following conditions also apply.
```
1. The Adapter's License You apply must be a Creative Commons license with the same
License Elements, this version or later, or a BY-SA Compatible License.
2. You must include the text of, or the URI or hyperlink to, the Adapter's License You apply.
You may satisfy this condition in any reasonable manner based on the medium, means,
and context in which You Share Adapted Material.
3. You may not offer or impose any additional or different terms or conditions on, or apply
any Effective Technological Measures to, Adapted Material that restrict exercise of the
rights granted under the Adapter's License You apply.
```
Section 4 Sui Generis Database Rights.
```
```
Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of
the Licensed Material:
```
a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse,
reproduce, and Share all or a substantial portion of the contents of the database;
b. if You include all or a substantial portion of the database contents in a database in which
You have Sui Generis Database Rights, then the database in which You have Sui Generis
Database Rights (but not its individual contents) is Adapted Material, including for
purposes of Section 3(b); and
c. You must comply with the conditions in Section 3(a) if You Share all or a substantial
portion of the contents of the database.
For the avoidance of doubt, this Section 4 supplements and does not replace Your
obligations under this Public License where the Licensed Rights include other Copyright
and Similar Rights.
```
Section 5 Disclaimer of Warranties and Limitation of Liability.
```
a. Unless otherwise separately undertaken by the Licensor, to the extent possible, the
Licensor offers the Licensed Material as-is and as-available, and makes no representations
or warranties of any kind concerning the Licensed Material, whether express, implied,
statutory, or other. This includes, without limitation, warranties of title, merchantability,
fitness for a particular purpose, non-infringement, absence of latent or other defects,
accuracy, or the presence or absence of errors, whether or not known or discoverable.
Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not
apply to You.
b. To the extent possible, in no event will the Licensor be liable to You on any legal theory
(including, without limitation, negligence) or otherwise for any direct, special, indirect,
incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or
damages arising out of this Public License or use of the Licensed Material, even if the
Licensor has been advised of the possibility of such losses, costs, expenses, or damages.
Where a limitation of liability is not allowed in full or in part, this limitation may not apply
to You.
c. The disclaimer of warranties and limitation of liability provided above shall be interpreted
in a manner that, to the extent possible, most closely approximates an absolute disclaimer
and waiver of all liability.
```
Section 6 Term and Termination.
```
a. This Public License applies for the term of the Copyright and Similar Rights licensed here.
However, if You fail to comply with this Public License, then Your rights under this Public
License terminate automatically.
b. Where Your right to use the Licensed Material has terminated under Section 6(a), it
reinstates:
1. automatically as of the date the violation is cured, provided it is cured within 30 days of
Your discovery of the violation; or
2. upon express reinstatement by the Licensor.
```
For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may
have to seek remedies for Your violations of this Public License.
```
c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under
separate terms or conditions or stop distributing the Licensed Material at any time;
however, doing so will not terminate this Public License.
d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.
```
Section 7 Other Terms and Conditions.
```
a. The Licensor shall not be bound by any additional or different terms or conditions
communicated by You unless expressly agreed.
b. Any arrangements, understandings, or agreements regarding the Licensed Material not
stated herein are separate from and independent of the terms and conditions of this
Public License.
```
Section 8 Interpretation.
```
a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to,
reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could
lawfully be made without permission under this Public License.
b. To the extent possible, if any provision of this Public License is deemed unenforceable, it
shall be automatically reformed to the minimum extent necessary to make it enforceable.
If the provision cannot be reformed, it shall be severed from this Public License without
affecting the enforceability of the remaining terms and conditions.
c. No term or condition of this Public License will be waived and no failure to comply
consented to unless expressly agreed to by the Licensor.
d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or
waiver of, any privileges and immunities that apply to the Licensor or You, including from
the legal processes of any jurisdiction or authority.
```
Creative Commons is not a party to its public licenses. Notwithstanding, Creative
Commons may elect to apply one of its public licenses to material it publishes and in those
instances will be considered the “Licensor.” The text of the Creative Commons public
licenses is dedicated to the public domain under the CC0 Public Domain Dedication.
```
```
Except for the limited purpose of indicating that material is shared under a Creative
Commons public license or as otherwise permitted by the Creative Commons policies
published at creativecommons.org/policies, Creative Commons does not authorize the
use of the trademark “Creative Commons” or any other trademark or logo of Creative
Commons without its prior written consent including, without limitation, in connection
with any unauthorized modifications to any of its public licenses or any other
arrangements, understandings, or agreements concerning use of licensed material. For
the avoidance of doubt, this paragraph does not form part of the public licenses.
```
```
Creative Commons may be contacted at creativecommons.org.
```

examples/Closed-Syllables.md +3133: file diff suppressed because it is too large.

examples/ExamplePdf.md +219

@@ -0,0 +1,219 @@
# Mega Überschrift
## 2te Überschrift
```
Dies ist eine Test-PDF^1.
Fürs Testen des Markdown Parsers.
```
(^1) In Deutsch.
## Paragraphen
Das ist ein Paragraph. Ein einfacher Paragraph mit Schrift in Normalgröße^2. Damit wir _sehen_ wie
sich Zeilenumbrüche verhalten, schreiben wir einfach ein bisschen mehr. So, dass sieht ja jetzt
schon gut aus!
Ohne Zwischenzeile, neu angesetzt.
Mit Zwischenzeile, neu angesetzt.
Und nachfolgend ein etwas längerer Tex:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi laoreet diam nibh, sit amet bibendum
metus tristique vel. Sed neque nulla, lacinia sit amet ex ut, ultrices dictum turpis. Praesent fringilla,
lacus nec lobortis placerat, lorem ipsum convallis nisl, sit amet imperdiet erat arcu id arcu. Aenean
accumsan risus in purus facilisis interdum. Aliquam tincidunt condimentum est, scelerisque
venenatis orci. Fusce neque nibh, dapibus et volutpat sit amet, consectetur ac quam. Sed pharetra
faucibus arcu, at interdum dui ornare ut. Aliquam sodales, magna et euismod congue, ipsum diam
tempus sapien, vel aliquet tortor dolor ut purus. Aenean aliquet ut erat vitae dictum. Fusce eget
ultrices magna. Sed egestas mi nec rutrum iaculis. Phasellus condimentum^3, urna sit amet sodales
accumsan, lacus risus cursus ipsum, et rhoncus ligula mi et nibh. In consequat a risus a
accumsan. Pellentesque nec lacus sodales eros laoreet pretium non ac erat.
Und jetzt ein kleiner Text im block-format. Das erzeugt schöne doppelte Leerzeichen zwischen
Wörtern. Wenn Markdown zu HTML gerendert wird, fällt das zwar nicht mehr auf. Aber in der puren
Text-Version ist es schon stark sichtbar!
Und jetzt^4 einfach nochmal Text^5 um die Fussnoten in zweistellige Bereiche^6 vorranzutreiben!
(^2) Was immer auch normal ist...
(^3) Nicht zu verwechseln mit condimenta. Meine Lateinkenntnisse sind zwar schon so alt das ich
überhaupt keine Ahnung hab, aber zumindest hab ich jetzt eine mehrzeilige Fussnote!
(^4) Hier & Jetzt!
(^5) Nicht viel mehr als ein Satz.
(^6) Weil dann wird's komplizierter!
## Schriftschnitt
Etwas _kursiv_ ist auch nicht schlecht. **Fett** ist auch interessant. Und was ist mit
**_FETTUNDKURSIV_**?
Interessant wird's wenn _mehrere Wörter hintereinanderweg formatiert_ sind. Und _dann noch über
Zeilenbrüche hinweg_.
Fies könnte es werden mit _abwechselnden_ **Formaten**. Und das ganze dann noch _über_ **mehrere**
_Zeilen_ hinweg.
Und weil es so schön ist, fangen wir jetzt in dieser Zeile mit einem Schriftschnitt, nämlich _kursiv an.
Ziehen es über die gesamte zweite Zeile durch. Ist nicht ganz leicht, aber schaffen wir! Und lassen
es dann Mitte_ der 3ten **Zeile** ausklingen.
Und nun _kursiv_ Und **Fett** Zusammen _Ge_ **Mixt**. Ohne Leerzeichen...
_Eine_ Zeile, die mit kursiv anfing und endet mit **fett.**
Beende die Zeile mit **fett.**
_Kursiv_ ist dann die nachfolgende!
Eine Liste mit unterschiedlich formatierten Wörtern
- Etwas _Kursiv_
- Etwas **Fett**
- Etwas Unterstrichen^7
- Etwas Durchgestrichen
- Und noch ein Link: [http://pdf2md.morethan.io](http://pdf2md.morethan.io)
Ne Zeile die _kursiv endet,
und in ner_ (fast) _komplett lasziven, eh, kursiven Zeile endet._
**Etwas eher unwahrscheinliches. Zeile komplette fett.**
_Zeile komplett kursiv._
**Und wieder fett.**
_Und_ **gemixt**.
_Ein kompletter Absatz in kursiver Schriftform. Was will ich damit erreichen? Ich will es sehen,
einfach nur sehen! Gibt sicher noch andere sehenswerte Sachen im Leben, aber JETZT,
interessiert mich ein kursiver Text Block! ;)_
_Und ein folgender Absatz, auch kursiv!_
_Und ein kursiver Setzt der einen eingeschlossen Link, nämlich [http://pdf2md.morethan.io](http://pdf2md.morethan.io), hat._
(^7) Fussnote in einer Liste
## Listen
Nun eine Liste mit dashs:
- Eintrag 1
- Eintrag 2, aber mit so langem Text, das er umbricht. Wirklich, wirklich lang. Breche du Zeile. Na
los. Na endlich. Vielleicht sollt ich das auf 3 Zeilen erweitern? Na ja, schaden kann es ja nicht.
Also los!
- Eintrag 3
Und Untergruppen:
- Eintrag 1
  - Sub Eintrag 1.1, aber mit so langem Text, das er umbricht. Wirklich, wirklich lang. Breche du
    Zeile. Na los. Na endlich. Vielleicht sollt ich das auf 3 Zeilen erweitern? Na ja, schaden kann
    es ja nicht. Also los!
  - Sub Eintrag 1.2
- Eintrag 2
  - Sub Eintrag 2.1
Und eine mit bullets:
- Eintrage 1
- Eintrage 2
Gemixt:
- Eintrage 1
- Eintrage 2
Nummerierte Liste:
1. Eins
2. Zwei, aber mit so langem Text, das er umbricht. Wirklich, wirklich lang. Breche du Zeile. Na los.
Na endlich. Vielleicht sollt ich das auf 3 Zeilen erweitern? Na ja, schaden kann es ja nicht. Also
los!
3. Drei
4. Vier. Und auch hier wieder ein etwas längerer Text, so dass der Eintrag über mehrere Zeilen
geht!
Zentrierte Liste:
- Eintrag 1
- Eintrag 2, aber mit so langem Text, das er umbricht. Wirklich, wirklich lang. Breche du Zeile.
Na los. Na endlich. Vielleicht sollt ich das auf 3 Zeilen erweitern? Na ja, schaden kann es
ja nicht. Also los!
- Eintrag 3
Zwei aufeinander folgende Listen:
- Erste 1
- Erste 2
- Zwote 1
- Zwote 2
Liste mit drei Levels:
- Erster Level 1
  - Zwoter Level 1.1, aber mit so langem Text, das er umbricht. Wirklich, wirklich lang. Breche du
    Zeile. Na los. Na endlich. Vielleicht sollt ich das auf 3 Zeilen erweitern? Na ja, schaden kann
    es ja nicht. Also los!
    - 3ter Level 1.1.1
    - 3ter Level 1.1.2, aber mit so langem Text, das er umbricht. Wirklich, wirklich lang. Breche
      du Zeile. Na los. Na endlich. Vielleicht sollt ich das auf 3 Zeilen erweitern? Na ja, schaden
      kann es ja nicht. Also los!
  - Zwoter Level 1.2
  - Zwoter Level 1.3
    - 3ter Level 1.3.1
- Erster Level 2
Und nun nummeriert mit un-nummerierten Sub-Leveln:
1. Eintrag 1
   - Eintrag 1.1
   - Eintrag 1.2
2. Eintrag 2
Und jetzt eine Liste, die übergangslos aus einem zwei-zeiligen Paragraphen folgt. Mal sehen ob
der Parser das sauber trennen kann:
- Eintrag 1
- Eintrag 2
Und danach kommt auch gleicht was.
## Quotes & Spezielle Einschübe^8
Das hier ist wieder ein normaler Absatz. Das interessante ist der nachfolgende Teil, der
eingeschoben ist, gewöhnlicher Weise sowas wie ein Zitat, oder Code, oder sonst was:
```
Wenn ein chaotischer Schreibtisch eine chaotische Denkweise widerspiegelt, welche Denkweise
spiegelt dann ein leerer Schreibtisch wider? - Albert Einstein
```
So, das war ja schonmal ein guter Anfang. Hier noch ein Einzeiler:
```
Phantasie ist wichtiger als Wissen, denn Wissen^9 ist begrenzt. - Albert Einstein^10
```
Und nun mehrere Quotes hintereinander:
```
Die größte Macht hat das richtige Wort zur richtigen Zeit. - Mark Twain
```
```
Der Kuss ist ein liebenswerter Trick der Natur, ein Gespräch zu unterbrechen, wenn Worte
überflüssig werden. - Ingrid Bergman
```
```
Das Schicksal wird schon seine Gründe haben. - Voltaire
```
### Heading 2
abc
### Heading 2 II
(^8) Eine Überschrifts-Fussnote... so was gibts auch!
(^9) Wisse, dass ist eine Fussnote in einem Zitat!
(^10) Der Albert Einstein (Fussnote im Zitat, am Ende der Zeile)

@@ -0,0 +1,202 @@
# La vraie température!
## qui permet de laver en toute sécurité les masques barrières en tissu
```
avec le soutien de
```
## Flash-Masque
### Etude n° 2
```
Le 26 mars dernier, le guide des spécifications
AFNOR S76-001 recommande
de laver ses masques à 60°C pendant au
moins 30 mn. Comment faire pour ceux et
celles qui n’ont pas de lave-linge? L’étude
n°1, publiée le 29 mars, teste différents
récipients de la vie courante permettant de
maintenir de l’eau chaude à 60°C pendant
au moins 1/2 heure. Voici l’étude
```
```
CC BY-NC-ND
```
```
Florence Bost • 06 82 69 89 82 • florence@sablechaud.eu • http://www.sablechaud.eu
```
### Contexte
La période de confinement décrétée par le gouvernement français depuis le mardi 17/03/20 et prolongée
jusqu’au 4/05/20 est dans le but de stopper la pandémie du virus COVID-19. Une situation
inédite qui laisse le grand public sans réponse face à des gestes de tous les jours pour faire face à la
pandémie dans des conditions exceptionnelles de confinement.
Le guide des spécifications AFNOR S76-001 sorti en express a pour but d’aider les industriels, et le
public à confectionner des masques barrières en tissu le plus adéquat possible. Dans cette norme, il
est préconisé de laver pendant 30 mn à 60°C les masques en tissu.
But de l’expérience :
Dans la réalité de tous les jours, plusieurs cas de figures se présentent : les personnes n’ont pas de
lave-linge (étudiants, célibataires...), les personnes ont un lave-linge mais ne peuvent ou ne veulent
pas lancer une machine que pour les masques, les personnes n’ont pas de thermomètre à disposition.
Or l’immédiateté du lavage est important dans le process de non-propagation et destruction du
virus. Le lavage à la main est donc une alternative intéressante.
Le but de l’expérience dans un premier temps, est de savoir si l’on peut réunir les conditions essentielles
d’une eau à 60°C pendant 30 mm dans un contexte de confinement.
Expérience : mesures de l’évolution de la température en fonction du temps dans des récipients
ordinaires à usage familial.
Outils :
- Un thermomètre de cuisine équipé d’une sonde (HABOR)
- Une bouilloire électrique à la température non réglable (RUSSEL-HOBS)
- 3 types de récipients sélectionnés :
  - une casserole inox,
  - un saladier en porcelaine
  - un récipient plastique (Tupperware)
- Un masque en tissu par récipient
Protocole de mesure :
### ETUDE de décroissance de température d’eau chaude en vue du lavage de masques barrières en tissu dans un cadre familial confiné
- Faire bouillir 1,5 l d’eau dans une bouilloire électrique jusqu’à son arrêt automatique
- Verser l’eau dans le récipient
- Prise de mesure de référence
- Relève des mesures toutes les 5 minutes
```
L’expérience a été réalisée par 2 fois et dans 2 cas de figures :
```
- sans couvercle
- avec couvercle (dépose)
Notes :
T° ambiante de la pièce au moment de l’expérience était de 23°C, aucune fenêtre ouverte.
T° de l’eau chaude sortant du robinet de la cuisine : 49°C
Taux d’humidité ambiant 42%.
Fluide : eau municipale
Les mesures de température sont faites en °C.
Date et lieu : le 29-03-20 - Paris
```
Résultats :
```
```
La température moyenne constatée en référence est de 92 °C pour les récipients inox et plastique
et de 87°C pour la porcelaine. La différence s’explique par le fait que le récipient en porcelaine a une
épaisseur plus épaisse que les autres et absorbe dans un premier temps, plus de chaleur.
```
```
Les résultats les plus convaincants sont dans les prises de mesure «récipient + couvercle».
Tous les matériaux sont performants, avec une légère supériorité pour le plastique.
```
```
En moyenne, il faut 40 mn pour que la T° arrive à la limite de 60°C. La Température minimale de
départ pour tenir 30 mn est d’environ 75°C quelque soit le récipient.
```
```
Conclusion
```
- La garantie de conserver la température à 60°C pendant 30 mn comme préconisé dans le guide
de spécifications AFNOR S76-001 pour le lavage manuel des masques barrière en tissu, peut se
faire dans n’importe quel récipient à condition de bien le couvrir.
- Si vous avez une bouilloire électrique modulable, vous pouvez régler la température maximale à
70°C.
### ETUDE (suite)
```
NOTE :
Cette étude a été réalisée dans la cadre particulier du confinement. L’expérience a été réalisée avec rigueur et reste informative.
L’auteur ne peut être rendue responsable de l’interprétation outre mesure des résultats par un tiers.
```
## Qui sommes-nous?
## Les liens d’accès
Cette série d’études est à l’initiative de Florence Bost - et de son réseau - soutenue par le pôle de
compétitivité Techtera. Certaines d’entre elles ont participé à l’élaboration du projet MasKaDom piloté
par l’IMdR. Chaque étude complète est disponible sur les sites cités ci-dessous, ils ne peuvent être
utilisés à des fins commerciales sans l’autorisation expresse et écrite des auteurs.
```
Smart textiles designer, Florence Bost est consultante depuis 2003 sous le nom de
Sable Chaud. Elle réalise des cahiers d’idées, des études prospectives, des prestations
de création e-textiles et démonstrateurs ainsi que des formations professionnelles e-textiles.
Auteur de l’ouvrage «Textiles, innovations et matières actives» édité chez Eyrolles,
elle est aussi experte française AFNOR pour la commission de normalisation européenne
smart textiles (CEN/TC 248-WG31). Les études sur les masques ont été accompagnées
par les expertises complémentaires de :
Mattias Ganem
Ingénieur textile, IFM, actuellement chef de projet R&D et Développement durable.
Jean-Baptiste Chot-Plassot
Ingénieur généraliste, IFM, actuellement ingénieur projet innovation - Mode & Textile
```
```
Certaines illustrations ont été réalisées par la styliste et illustratrice Virginie Boy.
```
```
Techtera est le pôle de compétitivité dédié à la filière textile française soutenu par l’Etat,
La Direction Générale de l’Armement et les collectivités territoriales. Il anime un réseau de
200 membres (entreprises, laboratoires de recherches, centres techniques, universités et
grandes écoles), avec pour objectif principal de stimuler la compétitivité par l’innovation
collaborative. Depuis 2005, Techtera a ainsi accompagné plus de 215 projets R&D financés
pour un budget total de 556,5 millions d’Euros, à destination des marchés d’application
de la santé, des sports et loisirs, du transport, du bâtiment, de la protection et de la
sécurité, de l’habillement et de la décoration.
```
```
Techtera - Actualités
```
```
Sable Chaud - COVID-19
```

examples/Grammar-Matters.md +5329: file diff suppressed because it is too large.

File diff suppressed because it is too large.

examples/Made-with-cc.md +12159: file diff suppressed because it is too large.

File diffs suppressed because they are too large (3 files).

@@ -0,0 +1,552 @@
```
Andrew W. Mellon Foundation
Grant 1711-05155
December 19, 2019
```
```
John Kiplinger
Valerie Yaw
```
# The Impact of Open Access
# Latin American Scholarship:
## Digitizing the Backlist of El Colegio de México’s Press
## WHITE PAPER
In 2018, JSTOR received a grant from the Andrew W. Mellon Foundation to support the
digitization of out-of-print titles from the Dirección de Publicaciones de El Colegio de
México, A.C., as well as the dissemination of those titles on an openly accessible basis.
Throughout the year-and-a-half-long project, we worked in close collaboration with El
Colegio de México Press. This white paper is intended to
document the significance of this work, the process we used to select titles, and what we
have learned so far about the usage of these titles on the JSTOR platform. We hope this
will benefit other initiatives interested in increasing access to out-of-print
materials.
Copyright 2019 ITHAKA. This work is licensed under a Creative Commons Attribution-
NonCommercial 4.0 International License. To view a copy of the license, please see
[http://creative-commons.org/licenses/by-nc/4.0/](http://creative-commons.org/licenses/by-nc/4.0/).
ITHAKA is interested in disseminating this paper as widely as possible. Please contact us
with any questions about using the report at support@jstor.org.
_This project was made possible by The Andrew W. Mellon Foundation. Any views or
recommendations expressed in this paper do not necessarily represent those of The
Andrew W. Mellon Foundation._
_The Dirección de Publicaciones de El Colegio de México, A.C. was established in 1938.
It offers a catalog of more than 2,400 titles and nine academic journals across the
humanities and social sciences._
_JSTOR, a service of the not-for-profit organization ITHAKA, collaborates with the
academic community to help libraries connect students and faculty to vital content
while lowering costs and increasing shelf space; provides independent researchers with
free and low-cost access to scholarship; and helps publishers reach new audiences and
preserve their content for future generations._
JSTOR gratefully acknowledges the contributions and cooperation of the following:
1. Gabriela Said Reyes, Director, Dirección de Publicaciones de El Colegio de
México, A.C.
2. Ninel Salcedo Romero, former Director of Marketing, Dirección de
Publicaciones de El Colegio de México, A.C.
3. Brian Connaughton, Área de Historia Regional y Comparada, Departamento de
Filosofia, Universidad Autónoma Metropolitana
4. Robert Darnton, Carl H. Pforzheimer University Professor and University
Librarian, Emeritus, Harvard University
5. Gilbert Joseph, Farnam Professor of History & International Studies, Yale
University; Past President, Latin American Studies Association
6. Herbert S. Klein, Gouverneur Morris Professor Emeritus of History, Columbia
University; former Director of the Center for Latin American Studies and Professor
of History at Stanford University; Research Scholar & Latin American Curator,
Hoover Institution, Stanford University
7. Jocelyn Olcott, Associate Professor, History and Gender, Sexuality & Feminist
Studies, Duke University
8. William B. Taylor, Muriel McKevitt Sonne Professor of Latin American History,
Emeritus, University of California, Berkeley
9. Pardha Karamsetty, President, Content & Media Solutions, Apex CoVantage; CEO,
Apex CoVantage India
10. Prabhanjan Mattam, Project Manager, Apex CoVantage
**Summary**
In 2018, JSTOR received a grant from the Andrew W. Mellon Foundation to support a
collaboration with the Dirección de Publicaciones de El Colegio de México, A.C., the
press of El Colegio de México, a graduate research institution in Mexico City^1. This grant
enabled JSTOR to digitize nearly 700 books from the press's backlist in the humanities
and humanistic social sciences, and make these books freely and openly available on the
JSTOR online platform.
The goal of this project was to digitize and make openly accessible scholarship from the
backlist of El Colegio de Mexico's Press that would be of significant value to students and
researchers in a range of humanities disciplines.
The work on this project proceeded in three phases, including a preparation and
selection process, in which JSTOR worked with experts in the field to determine which
books would be digitized; a digitization and ingest phase resulting in the books being
hosted openly on JSTOR; and an analysis phase, in which JSTOR sought to develop a
better understanding of the impact that foreign-language materials can have when
hosted on a global platform.
This project brought together Colmex's rich scholarly backlist with JSTOR's experience
managing retrospective digitization projects and helping to increase the impact of
academic content by making that content easy to find and use online. Colmex and
JSTOR have collaborated over the past several years to make Colmex's frontlist books
available to readers around the world through JSTOR.org. In this project, we sought to
build on that collaboration by making a selection of books from the Press's backlist
available in digital form for the first time. In this white paper, we document our process
for selection and digitization of books and provide a high-level analysis of usage of the
content on the JSTOR platform.
**Introduction: History, Context, and Significance of the Collection**
The press of El Colegio de México has published a body of important scholarship over
the course of the last eight decades.
(^1) Throughout this paper, we generally refer to Dirección de Publicaciones de El Colegio de México, A.C. simply as El
Colegio de México or by its common name “Colmex.”
The press was established in 1938 in Mexico City. It attracted a group of pathbreaking
scholars in the humanities and social sciences, and Colmex's press, one of the earliest
scholarly publishers in Latin America, provided an outlet for their work, which
foregrounded some of the ongoing lines of inquiry in Mexican and Latin American
studies, including scholarship on migration to and from Mexico, the interplay between
church and state in Latin America, and women's rights.
The university's press published its first title in 1938 and continued to publish significant
work throughout its history. The press's list spans disciplines in the humanities and
qualitative social sciences, with special emphases on history, sociology, literary criticism,
and political science. For the most part, the books focus on Mexican and Latin American
contexts.
In addition to a robust books program, the press of El Colegio de Mexico publishes seven
journals, including _Historia Mexicana,_ arguably the leading journal of Mexican
historical studies. Over time, the press has also been an important outlet for making
foreign-language writing available in Mexico: as one example, its journal _Diálogos_ was
the first to publish Milan Kundera's work in Spanish for a Mexican audience.
Since 2013, Colmex's press has published some of its new books in digital form and
distributed them through digital scholarly platforms, including JSTOR. Like many
established scholarly presses, Colmex licenses access to its frontlist titles to university
libraries to help sustain its ongoing publishing program. However, much of Colmex's
backlist was out of print, and the press had never digitized it due to limited funding. In
today's increasingly digital landscape, the lack of electronic copies of this important body
of scholarship created, in essence, a barrier to accessing those titles.
This project sought to overcome this barrier and make these books discoverable and
accessible for free by a worldwide audience. As noted in the Summary, El Colegio de
México and JSTOR have collaborated over the past several years to make Colmex's
frontlist books available to readers around the world through JSTOR. In this project, we
built on that collaboration, bringing together Colmex's rich scholarly backlist with
JSTOR's experience managing retrospective digitization projects and helping to increase
the usage of academic content by making that content easy to find and use online.
JSTOR has seen high usage and impact for both archival journals and for backlist
monographs; in fact, two-thirds of ebook usage on JSTOR is for titles published at least
three years earlier.
**Our Approach: Selection and Digitization**
JSTOR digitized nearly 700 titles, or almost 50% of the press's backlist. Significantly,
none of Colmex's backlist titles were previously available digitally. For every book made
available through this project, each page was scanned and OCR processed, and
accompanying book and chapter-level metadata was captured to make the books fully
searchable, discoverable, and usable for scholars and teachers.
Selection
We asked a group of scholar-advisors to help us assess the broader significance of
Colmex's list in Mexican and Latin American Studies by drawing our attention to books
that are noteworthy and that should be highlighted in outreach about the project to
scholars, librarians, students, and general readers.
Our scholar-advisors assisted with the selection process mainly in two ways. First, they
gave us high-level guidance to inform our strategic sense of the collection's value. One
advisor wrote to us that the press's list “[provides] studies of the economic, social,
demographic, and political history of Mexico unparalleled by any other publisher.”
Several of the scholars also noted the broad discipline coverage of Colmex's list; while we
expected that the bulk of the books would be of greatest interest to historians, another
advisor wrote to us that “[s]ociologists, economists, demographers, linguists and
students of literature, geographers, and historians will all benefit by achieving the digital
availability of these works.” It is worth noting, as some of our advisors did, that the Press
also has a strong list in Asian studies, and the set of titles that we digitized through this
project includes books from that area. While the inclusion of these titles may initially
seem like an odd fit for a project that focuses for the most part on Mexican and Latin
American studies titles, the press's list in Asian studies reflects a critical aspect of the
Mexican academy's global engagement. Colmex's Center of Asian and African Studies is,
as one advisor noted, “the only functioning center on Asian studies in Latin America,”
and Colmex's press, picking up on this strength, has become “the major publisher of
studies of Asian history in Spanish.” To the extent that this digitization project is meant
in part to reflect the strengths and disciplinary breadth of Colmex's backlist, it seemed
important to include these titles in the project.
Second, while acknowledging the overall value of Colmex's backlist, our advisors also
directed us to particular titles that have become classics in their field. For example, some
of these titles include Silvio Zavala's multi-volume _El servicio personal de los indios en
la Nueva España,_ a study of labor and slavery in the 16th to 18th centuries; books and
edited volumes by Andrés Lira on Spanish exiles in Latin America after the Spanish Civil
War; and _Los bienes de la Iglesia en México,_ a study of the conflict between church and
state in the 1800s.
Of particular note among the books we digitized is the _Historia general de Mexico,_ a
multi-volume work completed in the 1970s and edited by the Colmex historian Daniel
Cosío Villegas. This work covers the range of Mexico's history from the dawn of human
habitation. As one reviewer in a scholarly journal noted, Cosío Villegas had a
longstanding interest in reaching non-academic audiences, and so the scholars who
penned essays for the _Historia general_ were asked to write so that a general audience
could read the work. Thus, one project advisor wrote, the volumes are well suited to
“students at the high school and university level as well as to adult readers who give
them the time and attention they deserve.” Despite the essays being shaped for a non-
academic audience, one of our advisors noted that the _Historia general_ remains “the
standard general history [of Mexico] used by all scholars.”
With this guidance in mind, the list of books we digitized resulted from a winnowing
process, the stages of which are outlined below^1:

1. At the start of this project, Colmex had the necessary permissions to digitize and
make freely available in digital form a significant number of titles in their backlist,
in many cases because the author was a faculty member at Colmex. Given the
sizable expense involved in clearing digital rights, we determined that there was
significant value in focusing our efforts on books that did not require painstaking
rights research. Of the 1,411 titles in the backlist, Colmex's press has distribution
rights for 741.
2. This list was then refined to exclude a small number of books that were not
scholarly in nature (e.g., technical guides from the 1990s). We retained in the list,
however, a small number of literary or primary source titles that would be useful
for research and teaching.
3. The list was further refined to exclude titles that did not fit well with the
humanities and humanistic social sciences profile^2. For example, books that
focused on environmental policy were considered out of scope for this project.
4. Finally, based on cost estimates, we initially aimed to reach a final list of
approximately 600 titles. Given cost constraints, we made difficult choices,
including moving approximately 40 social science-leaning titles (many in political
science) to a B-list. It is important to note that, while these titles were not
included in the starting list for digitization, lower-than-anticipated costs allowed
us to include these titles in our final output. We acknowledge that this initial
selection process was not perfect, but we are pleased with the final outcome since
these books hold value for humanities researchers (especially historians).

(^1) It is important to note that the winnowing process was undertaken by the project team with guidance from a set of
scholar-advisors for the project, given that it was not feasible to ask these advisors, who are also full-time faculty
members, to engage in a title-by-title selection process for a list of this size.
(^2) This project was funded through the Mellon Foundation's Humanities Open Book Program, which emphasizes out-of-print
humanities books.
At the end of our initial selection process, we had an A-list of 611 titles. While the vast
majority of the books on this list were in history, literature, or other humanities fields,
there were also a number of titles that were exceptions. Some titles on the list leaned
more toward the social sciences, including a number of books on public policy. We felt
that it would be appropriate to include them because they would be of interest to
scholars of Mexican and Latin American history. In addition, a handful of titles on the
list (fewer than ten) are literary or primary texts (for example, a Spanish-language
translation of Giambattista Vico's _Scienza nuova_).
Production
JSTOR's production unit converts over 9 million pages of scholarly journal and book
content per year, of which 2 million involve scanning from print sources. We have
longstanding relationships with several digitization vendors, and we believed that our
experience managing large-scale digitization projects would position us well to
accomplish the digitization of Colmex's backlist books quickly, cost-efficiently, and to a
high quality.
For books, JSTOR normally receives and processes PDFs from publishers. These PDFs
go through automated workflows at JSTOR's end as well as processing by a third-party
vendor. This project was different because the source document for each book was a
print version^1, and one of the required outputs was an ePub for each book. JSTOR
selected one of our current conversion vendors, Apex CoVantage, to handle all vendor
processing for the books in the project. This included scanning of the print copy, return
shipment of the print copy to Colmex, creation of the PDF from the page images, OCR
for creation of searchable full text, metadata capture to JSTOR's standard specification for
books, and then creation of an ePub. JSTOR negotiated a per-page price of $0.83 that
covered all these tasks. The project covers 684 books.
Colmex sent nine shipments of print books to Apex CoVantage's production facility in
Hyderabad, India. The initial batch was shipped in mid-April 2018, and the final batch was
shipped in early May 2019. Each shipment contained an average of 76 print books. Apex
conducted non-destructive scanning, with each page scanned as a 600 dpi bitonal TIFF
and grayscale/color content scanned at 300 dpi as RGB TIFF images. We instituted a
discrepancy process wherein Apex reported damaged, missing, or other problematic
pages to JSTOR. JSTOR assessed these reports and, as needed, worked with Colmex,
Harvard University Library, and University of Michigan Library collections to locate and
scan replacement pages from other extant copies. The resulting page scans were then
used in place of the damaged or otherwise unusable pages, or to fill gaps where there
were missing pages, so that the PDF would represent a complete and intact version of the
print original.
(^1) Although JSTOR has scanned a relatively small number of print books outside this project, the bulk of our print scanning
continues to be for journals. However, the same imaging specifications are used regardless of whether they are journal or
book pages.
Apex submitted the completed PDFs to JSTOR and shipped the print copies back to
Colmex. JSTOR's systems then ingested the PDF as well as spreadsheet-based supply
chain metadata (SCM) provided separately by Colmex. The PDF and SCM were matched
by the system and then were automatically sent to Apex for standard processing, which
consists of OCR as well as book- and chapter-level metadata capture.
As Apex completed the standard processing for each book, they then put the books
through an ePub creation process that, while very familiar to Apex, was new to JSTOR.
The ePubs were created to the EPUB standard version 3.0.1 or higher. Additionally, the
processing agreed upon between Apex and JSTOR ensured functionality such as links
from footnote anchors in the text block to the footnotes themselves. However, features
such as tables were captured as images rather than as HTML. During both the standard
processing and ePub creation, Apex occasionally raised metadata capture queries that
were reviewed and resolved by JSTOR's metadata librarian team of Karen Aufdemberge,
Emily Betwee, and Rachel Ross, thus ensuring a higher and more consistent quality for
the metadata.
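Because the ePubs had to conform to EPUB 3.0.1, one basic structural test a delivery pipeline can run is a check on the zip container itself. The EPUB Open Container Format (OCF) requires the first zip entry to be an uncompressed file named `mimetype` containing exactly `application/epub+zip`. It also requires a `META-INF/container.xml` file. The Python sketch below checks only those container rules. The function name `check_epub_container` is hypothetical: this is not JSTOR's or Apex's actual tooling, and it does not replace a full validator such as EPUBCheck.

```python
import zipfile

# Exact contents the OCF spec requires in the first zip entry, "mimetype".
EPUB_MIMETYPE = b"application/epub+zip"


def check_epub_container(source):
    """Return a list of container-level problems found in an EPUB.

    `source` may be a file path or a binary file-like object. An empty list
    means the basic OCF checks passed. Content documents are not inspected.
    """
    problems = []
    with zipfile.ZipFile(source) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first zip entry must be 'mimetype'")
        else:
            first = entries[0]
            # The mimetype entry must be stored without compression.
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' must be stored uncompressed")
            if zf.read(first) != EPUB_MIMETYPE:
                problems.append("'mimetype' must contain 'application/epub+zip'")
        # container.xml tells reading systems where the package document is.
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("missing META-INF/container.xml")
    return problems
```

For a well-formed container, `check_epub_container("book.epub")` (a hypothetical file name) returns an empty list. Otherwise it returns one message for each rule that is violated.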
Apex grouped the ePub, the PDF, and the book- and chapter-level metadata XML files
into a zip file for delivery to JSTOR. JSTOR systems then ingested the zip file and ran
quality control scripts across the files to ensure that they adhered to our specifications.
For the initial batch of books, we also conducted a limited amount of manual quality
control reviews of the metadata and of the ePub. To accommodate the ingest of these zip
files, however, our content management systems staff had to update the JSTOR software
to recognize and accept the different directory structure and files that were present (i.e.,
the directory containing the ePub as well as the ePub file itself) but had not been present
in previous book deliverables from our vendors.
Furthermore, downstream systems for our content delivery platform had to be updated
to recognize and appropriately route and make available the ePub file. JSTOR opted to
treat the ePub in a manner similar to that of supplementary materials. The ePub is
available as a downloadable file via a clickable “Download EPUB” button at the top of the
page for each book in the project. Otherwise, the book is treated in a similar manner to
any other Open Access title on JSTOR.
Of the 684 books in the project, the first titles became available on the JSTOR site on
September 11, 2018. The most recent releases were on July 19, 2019. There are currently
four books for which processing cannot be completed because the books do not have
ISBN assignments, and the JSTOR systems require an ISBN.
While this project had typical logistical challenges, the challenges that were new to
JSTOR were:
1. The need to send the books to one particular vendor instead of dividing them
equally between our two vendors, which was addressed earlier in this paper; and
2. The lack of electronic version ISBN (EISBN) assignments for any of the books.
ISBN best practice indicates that an electronic version of a book should have an ISBN
that is distinct from its print version counterpart. In fact, different electronic versions
(e.g., PDF vs. EPUB) can have their own ISBN assignments. However, JSTOR opted to
use a single ISBN assignment to cover both electronic versions of each book. Going into
the project, Colmex did not have EISBN assignments for the books, and 102 of the books
did not have a print version ISBN (PISBN) either. One problem was that, for Mexican-published
works, ISBNs are assigned by a third-party agency, and the turnaround
times for the assignments, particularly for large batches of requests, are unpredictable.
Therefore, so as not to jeopardize the overall timeframe for the project, JSTOR opted to use
the PISBN for any book that had a PISBN assignment. This would allow us to ingest the
supply chain metadata into JSTOR systems and to keep individual book processing
moving beyond the print-scanning stage.
Meanwhile, Colmex would apply for EISBN assignments, prioritizing the books that had no
ISBN assignment at all. Once those books received an EISBN, we could move them back into
production post-scanning. For books that had a PISBN assignment, we plan to do a
mass swap of the PISBN for the EISBN once we have all those assignments. At the time
this paper was written, 128 books were still awaiting an EISBN assignment, including
four books that have no ISBN assignment at all and that therefore cannot proceed
beyond the scanning stage.
We are currently planning a project to swap the PISBN for the EISBN for those books
where we have the EISBN assignments. We will finish the processing and/or ISBN
swaps for the remaining books when the EISBN assignments are available. If we were to
do a similar digitization project for backlist books, we would certainly investigate the
EISBN situation at the earliest possible stage and work with project partners to secure
EISBN assignments as soon as possible.
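The planned mass swap amounts to a lookup from each book's PISBN to its newly assigned EISBN. Books that do not yet have an assignment are kept in a pending list. The Python sketch below illustrates that bookkeeping under stated assumptions. The record layout and the function names `isbn13_is_valid` and `swap_isbns` are hypothetical. So is the choice to verify the standard ISBN-13 check digit (alternating weights of 1 and 3, with a weighted total divisible by 10) before swapping. None of this describes JSTOR's actual systems.

```python
def isbn13_is_valid(isbn):
    """Return True if `isbn` (hyphens allowed) has a correct ISBN-13 check digit."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not all("0" <= c <= "9" for c in digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits; the weighted
    # sum of a valid ISBN-13 is a multiple of 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def swap_isbns(records, eisbn_by_pisbn):
    """Swap each record's PISBN for its assigned EISBN (hypothetical layout).

    `records` is a list of dicts whose "isbn" key currently holds the PISBN;
    `eisbn_by_pisbn` maps a PISBN to its newly assigned EISBN.
    Returns (updated_records, pending_pisbns); input records are not modified.
    """
    updated, pending = [], []
    for record in records:
        eisbn = eisbn_by_pisbn.get(record["isbn"])
        if eisbn is not None and isbn13_is_valid(eisbn):
            updated.append({**record, "isbn": eisbn})
        else:
            # No usable EISBN yet: keep the PISBN and report the book as pending.
            updated.append(dict(record))
            pending.append(record["isbn"])
    return updated, pending
```

Returning the pending PISBNs separately reflects the paper's plan to finish the remaining swaps as further EISBN assignments arrive.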
**Usage: What We've Learned So Far**
This project represented not only an opportunity to digitize and make available books
from the publication run of Colmex's list, but also to measure the usage of these books
over time and, ultimately, to understand better the impact that foreign-language
materials can have when hosted on a globally accessed platform.
Our objective in measuring usage was to understand how frequently the Colmex books
are read online as evidenced by generally accepted metrics such as views and downloads
of the chapter files. Additionally, we wanted to understand how this usage compares with
the usage of approximately 4,500 openly accessible English-language books hosted on
JSTOR.
JSTOR facilitates the discovery of ebook content in a variety of ways. We offer free
MARC records to libraries through OCLC, and distribute metadata and full text to
discovery services and search engines for indexing. Another important factor in driving
usage is co-locating ebook chapters with journal articles on JSTOR's integrated platform,
enabling users to cross-search all types of content at once. For many scholars, JSTOR is a
starting point for research—in fact, our traffic referral data shows that more than 40% of
visits to ebook pages are by users who were already searching and using JSTOR. Faculty
and students are incorporating ebooks into their established research workflows on the
platform. In addition, we promoted the availability of the Colmex titles via a short
animated video in English and Spanish, email campaigns to librarians and faculty in
Latin American studies, announcements shared via JSTOR and Colmex's web and social
media channels, and promotions to members of the Latin American Studies Association,
including advertisements and a presentation at the association's annual conference.
The Colmex titles digitized through this project have been heavily used on JSTOR. The
680 titles made available on JSTOR between September 2018 and July 2019 have been
used a total of 502,134 times through October 28, 2019. Every single title has been used.
The most-used titles are listed below.
**Top ten most-used titles**

| Title | Copyright year | Usage through 10/28/2019 |
| --- | --- | --- |
| Historia económica general de México: de la colonia a nuestros días | 2010 | 13,251 |
| Historia general de México: volumen I | 1994 | 9,323 |
| Los intelectuales y el poder en México | 1991 | 5,156 |
| De amicitia et doctrina: homenaje a Martha Elena Venier | 2007 | 4,785 |
| La lingüística en México, 1980-1996 | 1998 | 4,564 |
| Diccionario del español usual en México | 1996 | 4,503 |
| Introducción a la historia de la vida cotidiana | 2006 | 4,300 |
| Historia de la lectura en México | 1997 | 3,888 |
| Cuestiones de teoría sociológica | 2005 | 3,659 |
| Historia general de México: volumen II | 1994 | 3,595 |

The data show that there is a broad audience for this scholarship. The titles have been
used in 173 countries and territories. While high levels of usage were recorded in
Spanish-speaking countries, as we expected, usage also occurred in 161 countries and
territories where Spanish is not a national or official language. The map below shows the
countries in which we have recorded usage for the Colmex titles, and the table lists the
ten countries with the highest usage.
**Top ten countries that recorded the most usage**

| Country | Usage through 10/28/2019 |
| --- | --- |
| Mexico | 151,… |
| United States | 54,… |
| Colombia | 29,… |
| Spain | 17,… |
| Argentina | 13,… |
| Peru | 11,… |
| Chile | 9,… |
| Ecuador | 9,143 |
| Costa Rica | 4,770 |
| United Kingdom | 3,580 |

Because JSTOR works with thousands of institutions around the world, we can measure
the usage of these titles at institutions that participate in our services. We recorded usage
of the Colmex titles at 4,285 institutions. This included not only colleges and universities,
but also community colleges, secondary schools, government and not-for-profit
organizations, and public libraries.
JSTOR's ebook program had not previously hosted EPUB files; for this project, we added
the capability for users to download the full book as an EPUB file from the table of
contents page, as well as the standard option to view or download chapter-level PDFs.
There were 19,234 downloads of EPUB files for the Colmex titles through the end of
October 2019—just 3.8% of the total usage of the titles in that timeframe.
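The EPUB share can be checked by dividing EPUB downloads by the total usage reported above. The two counts cover slightly different windows (October 28 versus the end of October 2019), so the result is an approximation:

$$
\frac{19{,}234\ \text{EPUB downloads}}{502{,}134\ \text{total uses}} \approx 0.0383 \approx 3.8\%
$$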
This project also gave us the opportunity to compare the usage of Spanish and English-
language titles available on JSTOR. On average, the Colmex ebooks are used 57% as
much as the Open Access titles in English on the platform. While there are other
variables that may affect the level of usage (such as discipline or copyright year), this
figure shows an impressive amount of usage of Spanish-language titles on a primarily
English-language scholarly content site.
We've also received positive feedback from librarians and scholars regarding access
to this content. For example, responses to the news on Twitter included praise for the
initiative (“Excellent news for @elcolmex and for the academic community in Mexico and the
world”) and recommendations of specific titles (“One of the gems released in open
access [PDF / EPUB] by Colmex through Jstor is _Los intelectuales y el poder en
México_ (1991), a substantial edited volume that contains very good contributions,
some of them essential references.”)
**Conclusion**
As a result of this project, 680 significant works of scholarship (with four more coming
when EISBN assignments are available) that were previously out of print are now
available to anyone who wishes to use them. They are easy to discover and access within
researchers' existing digital workflows. The value of these titles is apparent in the strong
usage we've seen over the relatively short period they've been available: more than half a
million views and downloads across 173 countries. Scholars and students in Latin
America and around the world are enriching their research with this content, and we
have ensured that it will be available to future generations.
In addition, the Mexican government launched a project earlier this year, the Estrategia
Nacional de Lectura, to promote reading and guarantee that books are accessible to the
entire population. The 684 digitized titles will be openly available to the Mexican people
and promoted as part of this project.
This project also built a foundation for continued work on the Open Access
dissemination of Latin American scholarship. JSTOR is currently participating in a pilot
led by the Latin American Research Resources Project (LARRP), a consortium of
research libraries that is funding the Open Access distribution of 200 titles published in
2018-2019 by the Latin American Council of Social Sciences (CLACSO). This initiative,
developed and supported by libraries, will test a framework for the sustainable, long-
term stewardship of Open Access scholarly monographs.
We are grateful that the Humanities Open Book program grant funded by The Andrew
W. Mellon Foundation provided the opportunity for JSTOR to partner with El Colegio de
México to make its important scholarship available for researchers around the world to
discover and use. We look forward to continuing to build on what we've achieved
together.


@@ -0,0 +1,400 @@
```
{from} THE {New York} SUN, SUNDAY, MARCH 25, 1877.
```
## THE MAN WITHOUT A BODY
```
{by Edward Page Mitchell}
```
On a shelf in the old Arsenal museum, in the
Central Park, in the midst of stuffed
hummingbirds, ermines, silver foxes, and
bright-colored parakeets, there is a ghastly row
of human heads. I pass by the mummied
Peruvian, the Maori chief, and the Flathead
Indian to speak of a Caucasian head which has
had a fascinating interest to me ever since it was
added to the grim collection a little more than a
year ago.
I was struck with the Head when I first saw it.
The pensive intelligence of the features won
me. The face is remarkable, although the nose
is gone, and the nasal fossæ are somewhat the
worse for wear. The eyes are likewise wanting,
but the empty orbs have an expression of their
own. The parchmenty skin is so shriveled that
the teeth show to their roots in the jaws. The
mouth has been much affected by the ravages
of decay, but what mouth there is displays
character. It seems to say: "Barring certain
deficiencies in my anatomy, you behold a man
of parts!" The features of the Head are of the
Teutonic cast, and the skull is the skull of a
philosopher. What particularly attracted my
attention, however, was the vague resemblance
which this dilapidated countenance bore to
some face which had at some time been familiar
to me — some face which lingered in my
memory, but which I could not place.
After all, I was not greatly surprised, when I
had known the Head for nearly a year, to see it
acknowledge our acquaintance and express its
appreciation of friendly interest on my part by
deliberately winking at me as I stood before its
glass case.
This was on a Trustees' day, and I was the
only visitor in the hall. The faithful attendant
had gone to enjoy a can of beer with his friend,
the superintendent of the monkeys.
The Head winked a second time, and even
more cordially than before. I gazed upon its
efforts with the critical delight of an anatomist.
I saw the masseter muscle flex beneath the
leathery skin. I saw the play of the buccinators,
and the beautiful lateral movement of the
internal pterygoid. I knew the Head was trying
to speak to me. I noted the convulsive
twitchings of the risorius and the zygomatic
```
major, and knew that it was endeavoring to
smile.
"Here," I thought, "is either a case of vitality
long after decapitation, or, an instance of reflex
action where there is no diastaltic or excitor-
motory system. In either case the phenomenon
is unprecedented, and should be carefully
observed. Besides, the Head is evidently well
disposed toward me." I found a key on my
bunch which opened the glass door.
"Thanks," said the Head. "A breath of fresh
air is quite a treat."
"How do you feel?" I asked politely. "How
does it seem without a body?"
The Head shook itself sadly and sighed. "I
would give," it said, speaking through its
ruined nose, and for obvious reasons using
chest tones sparingly, "I would give both ears
for a single leg. My ambition is principally
ambulatory, and yet I cannot walk. I cannot
even hop or waddle. I would fain travel, roam,
promenade, circulate in the busy paths of men,
but I am chained to this accursed shelf. I am no
better off than these barbarian heads — I, a man
of science! I am compelled to sit here on my
neck and see sandpipers and storks all around
me, with legs and to spare. Look at that infernal
little Oedieneninus Longpipes over there. Look
at that miserable Gray-headed Porphyrio. They
have no brains, no ambition, no yearnings. Yet
they have legs, legs, legs in profusion." He cast
an envious glance across the alcove at the
tantalizing limbs of the birds in question, and
added gloomily, "There isn't even enough of
me to make a hero for one of Wilkie Collins's
novels."
I did not exactly know how to console him in
so delicate a manner, but ventured to hint that
perhaps his condition had its compensations in
immunity from corns and the gout.
"And as to arms," he went on, "there's
another misfortune for you! I am unable to
brush away the flies that get in here — Lord
knows how — in the summertime. I cannot
reach over and cuff that confounded Chinook
mummy that sits there grinning at me like a
jack-in-the-box. I cannot scratch my head or
even blow my nose [his nose!] decently when I
get cold in this thundering draught. As to eating
```
and drinking, I don't care. My soul is wrapped
up in Science. Science is my bride, my divinity.
I worship her footsteps in the past, and hail the
prophecy of her future progress. I —"
I had heard these sentiments before. In a flash
I had accounted for the familiar look which had
haunted me ever since I first saw the Head.
"Pardon me," I said, "you are the celebrated
Prof. Dummkopf?"
"That is, or was, my name," he replied, with
dignity.
"And you formerly lived in Boston, where you
carried on scientific experiments of startling
originality. It was you who first discovered how
to photograph smell, how to bottle music, how
to freeze the aurora borealis. It was you who first
applied spectrum analysis to Mind."
"These were some of my minor
achievements," said the Head, sadly nodding
itself — "small when compared with my final
invention, the grand discovery which was at the
same time my greatest triumph and my ruin. I
lost my Body in an experiment."
"How was that?" I asked. "I had not heard."
"No," said the Head. "Living alone and
friendless, my disappearance was hardly
noticed. I will tell you —"
There was a sound upon the stairway.
"Hush!" cried the Head. "Here comes
somebody. We must not be discovered. You
must dissemble."
I hastily closed the door of the glass case,
locked it just in time to evade the vigilance of
the returning keeper, and dissembled by
pretending to examine, with great interest, Anas
Acuta, or Pin-tailed Duck.
On the next Trustees' day I revisited the
Museum and gave the keeper of the Head a
dollar on the pretense of purchasing
information in regard to the curiosities in his
charge. He made the circuit of the hall with me,
talking volubly all the while.
"That there," he said, as we stood before the
Head, "is a relict of morality presented to the
Museum fifteen months ago. The head of a
notorious murderer gilteened at Paris in the last
century, sir."
I fancied that I saw a slight twitching about
the corners of Prof. Dummkopf's mouth and an
almost imperceptible depression of what was
once his left eyelid, but he kept his face
remarkably well under the circumstances. I
dismissed my guide with many thanks for his
intelligent services, and, as I had anticipated, he
departed forthwith to invest his easily earned
dollar in beer, leaving me to pursue my
conversation with the Head.
"Think of putting a wooden-headed idiot like
that," said the Professor, after I had opened his
glass prison, "in charge of a portion, however
small, of a man of science — of the inventor of
the Telepomp! Paris! Murderer! Last century,
indeed!" and the Head shook with laughter
until I feared that it would tumble off the shelf.
"You spoke of your invention, the
Telepomp," I suggested.
"Ah, yes," said the Head, simultaneously
recovering its gravity and its center of gravity;
"I promised to tell you how I happen to be a
Man without a Body. You see that some three
or four years ago I discovered the principle of
the transmission of sound by electricity. My
Telephone, as I called it, would have been an
invention of great practical utility if I had been
spared to introduce it to the public. But, alas —"
"Excuse the interruption," I said, "but I must
inform you that somebody else has recently
accomplished the same thing. The Telephone
is a realized fact."
"Have they gone any further?" he eagerly
asked. "Have they discovered the great secret
of the transmission of atoms? In other words,
have they accomplished the Telepomp?"
"I have heard nothing of the kind," I hastened
to assure him, "but what do you mean?"
"Listen," he said. "In the course of my
experiments with the Telephone I became
convinced that the same principle was capable
of indefinite expansion. Matter is made up of
molecules, and molecules, in their turn, are
made up of atoms. The atom, you know, is the
unit of being. The molecules differ according to
the number and the arrangement of their
constituent atoms. Chemical changes are
effected by the dissolution of the atoms in the
molecules and their rearrangements into
molecules of another kind. This dissolution
may be accomplished by chemical affinity or by
a sufficiently strong electric current. Do you
follow me?"
"Perfectly."
"Well, then, following out this line of thought,
I conceived a great idea. There was no reason
why matter could not be telegraphed, or, to be
etymologically accurate, 'telepomped.' It was
only necessary to effect at one end of the line the
disintegration of the molecules into atoms, and
to convey the vibrations of the chemical
dissolution by electricity to the other pole,
where a corresponding reconstruction could be
effected from other atoms. As all atoms are
alike, their arrangement into molecules of the
same order, and the arrangement of those
molecules into an organization similar to the
original organization, would be practically a
reproduction of the original. It would be a
materialization — not in the sense of the
Spiritualists' cant, but in all the truth and logic
of stern science. Do you still follow me?"
"It is a little misty," I said, "but I think I get
the point. You would telegraph the Idea of the
matter, to use the word Idea in Plato's sense."
"Precisely. A candle flame is the same candle
flame although the burning gas is continually
changing. A wave on the surface of water is the
same wave, although the water composing it is
shifting as it moves. A man is the same man
although there is not an atom in his body which
was there five years before. It is the Form, the
Shape, the Idea, that is essential. The vibrations
that give individuality to matter may be
transmitted to a distance by wire just as readily
as the vibrations that give individuality to
sound. So I constructed an instrument by which
I could pull down matter, so to speak, at the
anode and build it up again on the same plan at
the cathode. This was my Telepomp."
"But in practice — how did the Telepomp
work?"
"To perfection! In my rooms on Joy street, in
Boston, I had about five miles of wire. I had no
difficulty in sending simple compounds, such
as quartz, starch, and water, from one room to
another over this five-mile coil. I shall never
forget the joy with which I disintegrated a three-
cent postage stamp in one room and found it
immediately reproduced at the receiving
instrument in another. This success with
inorganic matter emboldened me to attempt the
same thing with a living organism. I caught a
cat — a black and yellow cat — and I submitted
him to a terrible current from my two-hundred-
cup battery. The cat disappeared in a twinkling.
I hastened to the next room and, to my immense
satisfaction, found Thomas there, alive and
purring, although somewhat astonished. It
worked like a charm."
"This is certainly very remarkable."
"Isn't it? After my experiment with the cat, a
gigantic idea took possession of me. If I could
send a feline being, why not send a human
being? If I could transmit a cat five miles by
wire in a flash of electricity, why not transmit a
man to London by Atlantic cable and with equal
despatch? I resolved to strengthen my already
powerful battery and try the experiment. Like a
thorough votary of science, I resolved to try the
experiment on myself.
"I do not like to dwell upon this chapter of my
experience," continued the Head, winking at a
tear which had trickled down on to his cheek
and which I silently wiped away for him with my
own pocket handkerchief. "Suffice it that
I trebled the cups in my battery, stretched my
wire over housetops to my lodgings in Phillips
street, made everything ready, and with a
solemn calmness born of my confidence in the
theory, placed myself in the receiving
instrument of the Telepomp at my Joy street
office. I was as sure that when I made the
connection with the battery I would find myself
in my rooms in Phillips street as I was sure of
my existence. Then I touched the key that let on
the electricity. Alas!"
For some moments my friend was unable to
speak. At last, with an effort, he resumed his
narrative.
"I began to disintegrate at my feet and slowly
disappeared under my own eyes. My legs
melted away, and then my trunk and arms. That
something was wrong, I knew from the
exceeding slowness of my dissolution, but I was
helpless. Then my head went and I lost all
consciousness. According to my theory, my
head, having been the last to disappear, should
have been the first to materialize at the other
end of the wire. The theory was confirmed in
fact. I recovered consciousness. I opened my
eyes in my Phillips street apartments. My chin
was materializing, and with great satisfaction I
saw my neck slowly taking shape. Suddenly,
and about at the third cervical vertebra, the
process stopped. In a flash I knew the reason. I
had forgotten to replenish the cups of my
battery with fresh sulphuric acid, and there was
not electricity enough to materialize the rest of
me. I was a Head, but my body was, Lord
knows where!"
I did not attempt to offer consolation. Words
would have been mockery in the presence of
Prof. Dummkopf's grief.
"What matters it about the rest?" he sadly
continued. "The house in Phillips Street was
full of medical students. I suppose that some of
them found my Head, and knowing nothing of
me or of the Telepomp, appropriated it for
purposes of anatomical study. I suppose that
they attempted to preserve it by means of some
arsenical preparation. How badly the work was
done is shown by my defective nose. I suppose
that I drifted from medical student to medical
student, and from anatomical cabinet to
anatomical cabinet until some would-be
humorist presented me to this collection as a
French murderer of the last century. For some
months I knew nothing, and when I recovered
consciousness I found myself here.
"Such," added the Head, with a dry, harsh
laugh, "is the irony of Fate!"
"Is there nothing I can do for you?" I asked,
after a pause.
"Thank you," the Head replied; "I am
tolerably cheerful and resigned. I have lost
pretty much all interest in experimental
Science. I sit here day after day and watch the
objects of zoological, ichthyological,
ethnological, and conchological interest with
which this admirable museum abounds. I don't
know of anything you can do for me.
"Stay," he added, as his gaze fell once more
upon the exasperating legs of the Oedicnemus
Longpipes opposite him. "If there is anything I
do feel the need of, it is out-door exercise.
Couldn't you manage in some way to take me
out for a walk?"
I confess that I was somewhat staggered by
this request, but promised to do what I could.
After some deliberation, I formed a plan, which
was carried out in the following manner:
I returned to the Museum that afternoon just
before the closing hour, and hid myself behind
the mammoth sea cow, or Manatus
Americanus. The attendant, after a cursory
glance through the hall, locked up the building
and departed. Then I came boldly forth and
removed my friend from his shelf. With a piece
of stout twine, I lashed his one or two vertebrae
to the headless vertebrae of a skeleton Moa.
This gigantic and extinct bird of New Zealand
is heavy legged, full breasted, tall as a man, and
has huge, sprawling feet. My friend, thus
provided with legs and arms, manifested
extraordinary glee. He walked about, stamped
his big feet, swung his wings, and occasionally
broke forth into an hilarious shuffle. I was
obliged to remind him that he must support the
dignity of the venerable bird whose skeleton he
had borrowed. I despoiled the African lion of his
glass eyes, and inserted them in the empty
orbits of the Head. I gave Prof. Dummkopf a
Fiji war lance for a walking stick, covered him
with a Sioux blanket, and then we issued forth
from the old Arsenal into the fresh night air and
the moonlight, and wandered arm in arm along
the shores of the quiet lake and through the
mazy paths of the Ramble.
## {THE END}

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

10420
examples/WoodUp.md Normal file

File diff suppressed because it is too large

File diff suppressed because it is too large

13903
examples/dict.md Normal file

File diff suppressed because it is too large

@@ -23,10 +23,7 @@ export default class DetectCodeQuoteBlocks extends ItemTransformer {
 const codeBlockItems = new Set<string>();
 let foundCodeItems = 0;
-groupByPage(inputItems).forEach((pageItems, pageIdx) => {
-if (pageIdx === 5) {
-console.log(pageItems[0].data['str']);
-}
+groupByPage(inputItems).forEach((pageItems) => {
 const minX = toMinX(pageItems);
 groupByBlock(pageItems).forEach((blockItems) => {
 if (!blockItems[0].data['types'] && looksLikeCodeBlock(minX, blockItems, mostUsedHeight)) {
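The test-file changes that follow rename `matchFilePath` to `transformedFilePath` and add a `markdownFilePath` helper. As a reading aid, here is a standalone TypeScript sketch of how the two helpers map a PDF file name to its fixture paths. It is not code from the commit. `folder` is passed in as a parameter, because the test file reads a constant whose value this diff does not show. `withoutPdfExtension` is a hypothetical helper. `slice` and `charAt` stand in for the original `substr` and index access and give the same result for any name ending in `.pdf`. `examples`, `WoodUp.pdf`, `dict.pdf` and `Detect Headers` are illustrative inputs only.

```typescript
// Hypothetical standalone version of the fixture-path helpers in the test diff.
// Assumption: `folder` is a parameter here; the test file uses a module-level
// constant instead.

// Drops the last four characters, i.e. a trailing ".pdf".
function withoutPdfExtension(pdfFileName: string): string {
  return pdfFileName.slice(0, pdfFileName.length - 4);
}

// Expected Markdown output: <folder>/<name>.md
function markdownFilePath(folder: string, pdfFileName: string): string {
  return `${folder}/${withoutPdfExtension(pdfFileName)}.md`;
}

// Per-stage JSON snapshot: <folder>/<name>/<stageName>[.<chunkIndex>].json
function transformedFilePath(
  folder: string,
  pdfFileName: string,
  transformerName: string,
  chunkCount = 1,
  chunkIndex = 0,
): string {
  // Lower-case the first character and drop all whitespace,
  // e.g. "Detect Headers" -> "detectHeaders".
  const resultFileName = transformerName.charAt(0).toLowerCase() + transformerName.slice(1).replace(/\s/g, '');
  // The index suffix is only added when the result is split into several chunks.
  const fileIndex = chunkCount > 1 ? `.${chunkIndex}` : '';
  return `${folder}/${withoutPdfExtension(pdfFileName)}/${resultFileName}${fileIndex}.json`;
}

console.log(markdownFilePath('examples', 'WoodUp.pdf')); // examples/WoodUp.md
console.log(transformedFilePath('examples', 'dict.pdf', 'Detect Headers')); // examples/dict/detectHeaders.json
console.log(transformedFilePath('examples', 'dict.pdf', 'Detect Headers', 3, 1)); // examples/dict/detectHeaders.1.json
```

The Markdown snapshot sits next to the per-PDF folder instead of inside it. A stage result gets a `.<index>` suffix only when it is split into more than one chunk, so an unchunked stage keeps the plain `<stageName>.json` name.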

@@ -20,6 +20,7 @@ import DetectToc, { TOC_GLOBAL } from 'src/transformer/DetectToc';
 import DetectHeaders from 'src/transformer/DetectHeaders';
 import TOC from 'src/TOC';
 import { getText } from 'src/support/items';
+import MarkdownConverter from 'src/convert/MarkdownConverter';
 pdfjs.GlobalWorkerOptions.workerSrc = `pdfjs-dist/es5/build/pdf.worker.min.js`;
@@ -52,6 +53,13 @@ describe.each(files)('Test %p', (file) => {
 debug = await pipeline.parse(data, () => {}).then((pc) => pc.debug());
 });
+test('Compare Markdown', async () => {
+const lastStage = debug.stageResult(debug.stageNames.length - 1);
+const items = lastStage.itemsCleanedAndUnpacked();
+const text = new MarkdownConverter().convert(items);
+expect(text).toMatchFile(markdownFilePath(file));
+});
 test.each(transformers.map((t) => t.name).filter((name) => name !== 'Does nothing'))(
 'stage %p',
 (transformerName) => {
@@ -81,7 +89,9 @@ describe.each(files)('Test %p', (file) => {
 try {
 chunkedLines.forEach((lines, idx) => {
 const transformerResultAsString = lines.join('\n') || '{}';
-expect(transformerResultAsString).toMatchFile(matchFilePath(file, transformerName, chunkedLines.length, idx));
+expect(transformerResultAsString).toMatchFile(
+transformedFilePath(file, transformerName, chunkedLines.length, idx),
+);
 });
 } finally {
 stageResult.globals.keys().forEach((globalKey) => {
@@ -92,20 +102,31 @@ describe.each(files)('Test %p', (file) => {
 );
 });
-function matchFilePath(pdfFileName: string, transformerName: string, chunkCount = 1, chunkIndex = 0): string {
+function transformedFilePath(pdfFileName: string, transformerName: string, chunkCount = 1, chunkIndex = 0): string {
 const pdfFileNameWithoutExtension = pdfFileName.substr(0, pdfFileName.length - 4);
 const resultFileName = `${transformerName[0].toLowerCase() + transformerName.slice(1).replace(/\s/g, '')}`;
 const fileIndex = chunkCount > 1 ? `.${chunkIndex}` : '';
 return `${folder}/${pdfFileNameWithoutExtension}/${resultFileName}${fileIndex}.json`;
 }
-describe('Selective transforms on URL PDFs', () => {
-const transformerNames = [new RemoveRepetitiveItems().name, new DetectToc().name, new DetectHeaders().name];
-test.each(urls)('URL %p', async (url) => {
-const { fileName, data } = download(url);
+function markdownFilePath(pdfFileName: string): string {
+const pdfFileNameWithoutExtension = pdfFileName.substr(0, pdfFileName.length - 4);
+return `${folder}/${pdfFileNameWithoutExtension}.md`;
+}
+const transformerNames = [new RemoveRepetitiveItems().name, new DetectToc().name, new DetectHeaders().name];
+describe.each(urls)('Test URL %p', (url) => {
+const { fileName, data } = download(url);
+test(`markdown from ${url}`, async () => {
+const transformResult = await pipeline.parse(data, () => {}).then((pc) => pc.transform());
+const text = transformResult.convert(new MarkdownConverter());
+expect(text).toMatchFile(markdownFilePath(fileName));
+});
+test(`stages from ${url}`, async () => {
 const debug = await pipeline.parse(data, () => {}).then((pc) => pc.debug());
 const printedGlobals = new Set<string>();
 transformerNames.forEach((transformerName) => {
 const stageResult = debug.stageResult(debug.stageNames.indexOf(transformerName));
 const pages = stageResult.selectPages(true, true);
@@ -124,7 +145,7 @@ describe('Selective transforms on URL PDFs', () => {
 );
 const transformerResultAsString = lines.join('\n') || '{}';
-expect(transformerResultAsString).toMatchFile(matchFilePath(fileName, transformerName));
+expect(transformerResultAsString).toMatchFile(transformedFilePath(fileName, transformerName));
 stageResult.globals.keys().forEach((globalKey) => {
 printedGlobals.add(globalKey);