Classes of Informal Elements
Most TEI elements may also be informally classified as belonging to one of the following groupings:
divisions: high-level, possibly self-nesting, major divisions of texts. These elements populate such classes as model.divLike or model.div1Like, and typically form the largest component units of a text.
chunks: elements such as paragraphs and other paragraph-level elements, which can appear directly within texts or within divisions of them, but not (usually) within other chunks. These elements populate the class model.divPart, either directly or by means of other classes such as model.pLike (paragraph-like elements), model.entryLike, etc.
Alchemy and Computer: a computational analysis of the Jabirian corpus SECTION I
phrase-level elements: elements such as highlighted phrases, book titles, or editorial corrections, which can occur only within chunks, not between them (and thus cannot appear directly within a division).
The TEI also identifies two further groupings derived from these three:
inter-level elements: elements such as lists, notes, quotations, etc., which can appear either between chunks (as children of a <div>) or within them; these elements populate the class model.inter. Note that this class is not a superset of the model.phrase and model.divPart classes but rather a distinct grouping of elements which are both chunk-like and phrase-like. However, the classes model.phrase, model.pLike, and model.inter are all disjoint.
components: elements which can appear directly within texts or text divisions; this is a combination of the chunk-level and inter-level elements defined above. These elements populate the class model.common, which is defined as a superset of the classes model.divPart, model.inter, and (when the dictionary module is included in a schema) model.entryLike.
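The relationships among these groupings can be sketched with Python sets (Python 3.9 or later). This is a hypothetical and drastically reduced model: each class holds only a few of the example elements named above, model.divPart is approximated by model.pLike alone, and the helper names (model_common, pairwise_disjoint, allowed_directly_in_div) are illustrative, not part of any TEI tool.

```python
# Hypothetical, heavily reduced model of the informal TEI groupings.
# Each class holds only a few example elements; real TEI classes have many more members.
MODEL_PHRASE: set[str] = {"hi", "title", "corr"}  # phrase-level: highlights, titles, corrections
MODEL_PLIKE: set[str] = {"p"}                      # paragraph-like chunks
MODEL_INTER: set[str] = {"list", "quote"}          # inter-level: between or within chunks
MODEL_ENTRYLIKE: set[str] = {"entry"}              # dictionary entries (dictionary module)

# model.divPart is populated directly or via classes such as model.pLike;
# in this sketch it is approximated by model.pLike alone.
MODEL_DIVPART: set[str] = set(MODEL_PLIKE)


def model_common(dictionary_module: bool = False) -> set[str]:
    """model.common: model.divPart plus model.inter, plus model.entryLike with the dictionary module."""
    common = MODEL_DIVPART | MODEL_INTER
    if dictionary_module:
        common |= MODEL_ENTRYLIKE
    return common


def pairwise_disjoint(*classes: set[str]) -> bool:
    """True if no element belongs to more than one of the given classes."""
    seen: set[str] = set()
    for cls in classes:
        if seen & cls:
            return False
        seen |= cls
    return True


def allowed_directly_in_div(element: str, dictionary_module: bool = False) -> bool:
    """Components (model.common) may appear directly in a division; phrase-level elements may not."""
    return element in model_common(dictionary_module)


print(sorted(model_common()))                                       # ['list', 'p', 'quote']
print(pairwise_disjoint(MODEL_PHRASE, MODEL_PLIKE, MODEL_INTER))    # True
print(allowed_directly_in_div("p"), allowed_directly_in_div("hi"))  # True False
```

Modelling classes as sets makes the two structural claims of the text directly checkable: model.common is a union, and model.phrase, model.pLike and model.inter share no members.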
Regular expressions
The term “regular expression” (also regex, henceforth RE) comes from mathematics and theoretical computer science, and it refers to a specific property of formal languages: regularity. REs describe exactly the regular languages, that is, the languages recognized by finite automata. Their syntax was conceived in the 1950s by Kleene[243] as a tool of automata theory for describing formal languages. The original REs were applied mostly in theoretical settings, but they were implemented first by Thompson[244], and then by others such as Sipser[245], so that they could be used in programming languages such as Python. Used with skill, REs can simplify many programming and text-processing tasks, and they make feasible operations that would be extremely laborious without them. An RE is a formally structured string that describes a search pattern; the pattern is then applied to longer text units in order to find matches. REs are typically used for searching, matching, replacing, and splitting text.
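A minimal sketch of the four uses just listed, written with Python's standard `re` module. The sample string (titles taken from this thesis), the section-number pattern and the footnote-stripping rule are illustrative assumptions, not the processing actually applied to the corpus. The string is NFC-normalised first, because `\w` matches a precomposed letter such as ā but not a separate combining macron.

```python
import re
import unicodedata

# Illustrative sample, normalised so that ā, ī and ḫ are single code points.
text = unicodedata.normalize("NFC", "Kitāb al-Aḫjār; Tadbīr al-iksīr; Muḫtar rasā’il")

# Searching: the first word carrying the article "al-", anywhere in the string.
first = re.search(r"\bal-(\w+)", text)
print(first.group(0), first.group(1))   # al-Aḫjār Aḫjār

# Searching for every occurrence.
print(re.findall(r"\bal-\w+", text))     # ['al-Aḫjār', 'al-iksīr']

# Matching: does the whole string look like a section number such as 2.2.3?
section = re.compile(r"\d+(?:\.\d+)*")
print(bool(section.fullmatch("2.2.3")), bool(section.fullmatch("2.2.")))  # True False

# Replacing: drop a footnote number fused to the preceding word (hypothetical cleanup rule).
print(re.sub(r"(?<=[^\W\d])\d+\b", "", "conceived by Kleene243 as a tool"))
# conceived by Kleene as a tool

# Splitting: break the list of titles at semicolons, discarding surrounding spaces.
print(re.split(r"\s*;\s*", text))        # ['Kitāb al-Aḫjār', 'Tadbīr al-iksīr', 'Muḫtar rasā’il']
```

`re.search` scans the whole string, whereas `fullmatch` succeeds only if the pattern covers the entire input, which is why "2.2." is rejected.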
Tadbīr al-iksīr al-aʿẓam[272]
Lory’s editorial work resulted in a unified version of several manuscripts. The variant readings are listed in the apparatus of notes and, as stated before, this work decided not to take the notes into consideration, trusting the editor’s choices.
Nevertheless, the position of each note has been marked with a tag, as follows, so that the notes can be incorporated into the digitalization in the future.
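The tag itself is not reproduced in this excerpt. As a hypothetical illustration only, assuming each note position is marked with an empty TEI `<note/>` carrying an `n` attribute, Python's standard `xml.etree.ElementTree` can list the markers still awaiting content. The fragment and the function name `empty_note_markers` are invented for this sketch.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI fragment: an empty <note/> marks where an apparatus note stands.
FRAGMENT = f"""<p xmlns="{TEI_NS}">First sentence of the edited text<note n="1"/>
and a second one<note n="2"/>.</p>"""


def empty_note_markers(xml_text: str) -> list[str]:
    """Return the n attribute of every empty <note> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        note.get("n", "")
        for note in root.iter(f"{{{TEI_NS}}}note")  # ElementTree uses {namespace}tag names
        if len(note) == 0 and not (note.text or "").strip()
    ]


print(empty_note_markers(FRAGMENT))  # ['1', '2']
```

Notes that already contain text or child elements are skipped, so the output shrinks as the apparatus is filled in.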
Table of contents
Acknowledgements
Transliteration table
INTRODUCTION
1.1 State-of-the-art
1.2 Why digitalize?
1.3 Structure of this thesis
SECTION I Historical remarks and methodology
CHAPTER I
1.1 Sciences in the Medieval Arab World
1.2 Alchemy, χημεία and kīmiyā
1.2.1 Etymology
1.2.2 Greek origins?
1.3 Features of the Egyptian, Greek and Byzantine sources
CHAPTER II
2.1 Jābir Ibn Ḥayyān
2.1.1 Religious and historical issues: Kraus’ position
2.1.2 The debate about Jabir’s existence
2.1.3 Conclusions?
2.2 The Jabirian corpus and its peculiarities
2.2.1 Multi-layering (Polysemy)
2.2.2 Transcriptions, transliterations and translations
2.2.3 Synonyms
2.2.4 Quotes
2.2.5 Hypertextuality
2.3 Edited manuscripts
CHAPTER III
3.1 Arabic alchemy and digital humanities
3.2 Choices
3.3 Tools
3.3.1 XML language
3.3.2 Elements and characteristics of an XML document
3.3.3 XML Editor
3.4 TEI and the corpus
3.4.1 Attribute Classes
3.4.2 Model Classes
3.4.3 Classes of Informal Elements
3.5 Python
3.5.1 Regular expressions
SECTION II Digitalization: a compromise process
CHAPTER I
1.1 The TEI compliant digitalization
1.1.1 Corpus: a specific methodology
1.1.2 How to encode books: choices and divisions
1.2 Tagging the texts and their peculiarities
1.2.1 Alchemical jargon
1.2.2 Synonyms
1.2.3 Loans, calques
1.2.4 Polysemy
1.2.5 Quotes
1.2.6 Hypertext
CHAPTER II
2.1 Book Analysis
2.1.1 Kitāb al-Aḫjār
2.1.2 Tadbīr al-iksīr al-aʿẓam
2.1.3 Muḫtar rasā’il
CHAPTER III
3.1 Lemmatization: choices, difficulties, resolutions
3.1.1 First attempts at lemmatization: Stanford parser and Buckwalter analyzer
3.1.2 AlKhalil lemmatizer
3.2 Outputs
CONCLUSIONS
BIBLIOGRAPHY