
Corpus CusyA
Author: Marie de Sade, Dr. Olaf Hoffmann, Authors ‚A, B, C, D, E, F, H‘



differs from the others, but this is not particularly surprising. In the statistics, verses were treated like paragraphs. In this work, each verse consists of four lines that are written without a final punctuation mark, so formally one verse counts as one sentence. In poetry, stanza lines are typically shorter than prose sentences in paragraphs, but this count does not reflect that.

Sentence Length in Glyphs: Comparison of Distributions

The following graphic shows a comparison of the length of sentences in glyphs in the works of the language CusyA. Common features, differences and characteristics of the different distributions are clearly recognisable.

The works have relatively broad distributions, but their shapes are quite similar. As expected, the poetic work B clearly differs from the others, and the collection Z even more so. The poetic work B stands out for its high probability at the value 11. Given how sentences are counted, this corresponds to a stanza line of 11 syllables. This in turn matches the typical length, markers included, of the words that make up a sentence with a simple subject, predicate and object.

Sentences per Paragraph: Comparison of Distributions

The following graphic shows a comparison of the distributions for sentences per paragraph for the CusyA language works in question. The similarities, differences and characteristics of the different distributions are also clearly recognisable here.

The deviation of the poetic work B can be explained primarily by the fact that its verses are counted as paragraphs and its stanza lines are not terminated with a punctuation mark, so each verse is interpreted as one sentence. Poetry is in any case very different from prose; among the prose structures, paragraphs come closest to verses.
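The counting rule described above can be sketched as follows. This is only an illustration under assumptions: the name `sentences_in_block` and the set of closing marks are hypothetical, and the procedure actually used for the statistics is not given in the text.

```python
# Hypothetical closing marks; the real statistics may use a different set.
CLOSING_MARKS = (".", "?", "!")


def sentences_in_block(block: str) -> int:
    """Count sentences in one paragraph or verse by their closing marks."""
    if not block.strip():
        return 0
    closed = sum(block.count(mark) for mark in CLOSING_MARKS)
    # A block without any closing mark is still read as one sentence,
    # which is what happens to a verse of unpunctuated stanza lines.
    return max(closed, 1)


prose = "aa bb cc. dd ee ff. gg hh ii!"
verse = "aa bb cc\ndd ee ff\ngg hh ii\njj kk ll"
print(sentences_in_block(prose), sentences_in_block(verse))
```

Under this rule a four-line verse without closing marks yields the same count as a one-sentence paragraph, which is why work B ends up with a very narrow distribution.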

Of course, the number work Z also deviates strongly due to its special structure.

The distributions of the other works are quite even, but shifted against each other, which supports the hypothesis that they stem from different authors or at least have different text structures. The uniformity of the distributions, however, points to a coherent harmony within the individual works: the trains of thought appear well-measured and well-considered, and the outer structure suggests an inner uniformity.

Words per Paragraph: Comparison of Distributions

The following graph shows a comparison of the word distributions per paragraph for the works in CusyA.
Due to the different structure, the distribution of the poetic work B again differs significantly from that of the prose works. The same is of course true for the number work Z.

Relatively broad distributions are to be expected; for the prose works they are similar in shape. The incidences nevertheless differ clearly between the texts, which again supports the hypothesis of different authors or text structures.

Comparison of Word Incidences

The following graph shows a comparison of the distributions of word incidences in the considered works of the language CusyA. Which words appear and how often matters less here; what is statistically interesting is whether the distributions show conspicuous similarities or differences, possibly also deviating from the behaviour of known languages.

The distributions for known languages often roughly follow Zipf's law. According to Zipf's law, the incidence of a word should decrease in inverse proportion to its position in the ranking. Each language typically shows its own deviation, especially among the most common words, where a weaker decrease is regularly found. There are also differences among rarely used words.
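As a rough illustration of this comparison, the following sketch ranks words by incidence and sets each count against the value an ideal Zipf distribution would predict. The function names, the exponent of exactly 1 and the sample data are assumptions made for this example; they are not taken from the analysis itself.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(words)
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_expected(top_count, rank):
    """Count predicted for a rank by an ideal Zipf law with exponent 1."""
    return top_count / rank


# Made-up sample; a real corpus would supply the word list.
sample = "a a a a a a b b b c c d".split()
table = rank_frequency(sample)
for rank, word, count in table:
    print(rank, word, count, zipf_expected(table[0][2], rank))
```

Plotting observed against expected counts on logarithmic axes makes deviations at the head and the tail of the ranking visible.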

The behaviour of the works is relatively similar. Of course, the number work Z differs clearly from the others. Once again, the poetic work B also differs significantly from the others.
It remains open whether this is merely due to the special structure of poetic works in CusyA or whether this poetic work B perhaps originates from a different time than the other works or perhaps was even written in a different dialect.

All in all, the distribution matches what one would expect from a language, namely a certain resemblance to Zipf's law, including the deviations mentioned for real languages, as they also occur in German or English.

CusyA – Simple Text Production and Grammar

Texts in the written language CusyA (short for cubic-symmetrical) have not been completely decoded. The basic grammar, the meaning of some glyphs and some simple rules of text production are known. The overview given here is meant to help with the analysis of texts.

CusyA is at the same time the prototype of a rich, regular language. Unlike many other languages, CusyA avoids many syntactic redundancies and ambiguities.
A further characteristic of CusyA is the consistent use of markers or prefixes to identify the regular grammatical structures.
The vocabulary of CusyA consists primarily of markers and word cores. The meaning within the text structure results from their combination: the markers preceding the word core as prefixes determine whether the word thus formed is a subject, predicate, object, adjective or adverb. Sentence structures are separated from each other by various punctuation marks, which also refine the intention.
In addition to the grammar of the markers and word cores, there is also a separate syntax for mathematical expressions and numbers.

The written language CusyA has for the most part its own character set and its own simple grammar.
To describe the formal production of simple text structures in CusyA and their grammar, a variant of the short (E)BNF notation is used in the following.

In addition, parentheses are used to make grouping and precedence explicit. $ is used as a placeholder to avoid having to describe repeating structures more than once: A (=$) means that in the following definitions the $ is to be replaced by the previously specified A.
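The placeholder mechanism amounts to a simple textual substitution, which can be sketched as follows. The function `expand_template`, the scheme of appending the rule name to the template name, and the stand-in token `AND` for the connector glyph are assumptions made for this example.

```python
def expand_template(rule_name, template_rules):
    """Replace every '$' in template names and bodies with rule_name."""
    return {
        name.replace("$", rule_name): body.replace("$", rule_name)
        for name, body in template_rules.items()
    }


# Shortened templates; AND stands in for the actual connector glyph.
templates = {
    "Brackets$": "Enumeration$ | '(' Enumeration$ ')'",
    "Enumeration$": "And$ | Or$",
    "And$": "$ (AND Space+ $)+",
}

expanded = expand_template("Subjects", templates)
for name, body in expanded.items():
    print(f"{name}: {body}")
```

Calling the function once for each rule marked with (=$) yields one concrete copy of the templates per rule, so the shared structures are written down only once.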

With semantic text markup, for example in XHTML, this production is supplemented by smaller structures, and markup is also needed to build structures larger than the paragraphs, verses and lines described here. The following short notation therefore covers only the text production itself, not the additional semantic markup. Markup elements typically contain a whole text structure, but they can also contain smaller structures such as emphasised passages, references or quotations, and thus appear inside the structures described here. Tags of markup languages such as XHTML usually enclose the text structures described here. Alternatively, they often occur where spaces are allowed. In special situations, however, they can also appear in other places.

The grammar outlined below also does not describe the complete structure of whole works; it begins roughly at the level of paragraphs, verses and headings as TextStructure. Superordinate structures and connections are to be marked up semantically elsewhere. In addition to paragraph, verse and line, there may be other structures at this level of abstraction which are not formally considered here, but which are not excluded either. The grammar given here is therefore only a reference point for possible structures for now. A real language can hardly be described completely by a formal algorithm anyway; such an algorithm is inevitably only a helpful reference point for text analysis.
What is described is thus the digital representation of such structures. At the upper levels, and partly also at the level of the characters whose meaning has already been identified, this is inevitably a mapping onto what can be represented in, for example, an XHTML document.

TextStructure: Paragraph | Verse | Line

Line: '󴨜' Space+ (Words | SentenceBlockA | Sentence ) Space+ '󴨝' Space*
Semantically, the line is a block element. In a markup language it is therefore enclosed in a block element, at least in the case of simple, traditional text, and set off by a line break.
Typical lines appear in verses of poems, but also as headings of works and chapters.

Verse: '󴨚' Space+ Line+ Space+'󴨛' Space*
Semantically, the verse is a block element. In a markup language it is therefore enclosed in a block element, at least in the case of simple, traditional text, and separated from the preceding and following text by a line break and additional vertical spacing.

Paragraph: '󴨘' Space+ SentenceBlockA+ '󴨙' Space*
Semantically, the paragraph is a block element. In a markup language it is therefore enclosed in a block element, at least in the case of simple, traditional text, and separated from the preceding and following text by a line break and additional vertical spacing. In XHTML, the paragraph is represented by the p element.

Spaces include the normal space, but also the tab and the common characters for line breaks.
Space (Unicode): #x20 | #x9 | #xD | #xA
This exact encoding of spaces is of course only part of this digital representation of the language, not of CusyA itself.
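For text analysis, the Space production can be written as an explicit character class. The sketch below assumes Python's `re` module; the names `SPACE_RUN` and `split_on_spaces` are hypothetical helpers. An explicit class is used instead of `str.split()` because the latter also treats other whitespace characters, such as the no-break space, as separators.

```python
import re

# Space+ : one or more of #x20, #x9, #xD, #xA, and nothing else.
SPACE_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")


def split_on_spaces(text):
    """Split text at runs of the four Space characters, dropping empties."""
    return [token for token in SPACE_RUN.split(text) if token]


print(split_on_spaces("aa \t\r\nbb  cc"))
print(split_on_spaces("aa\u00a0bb"))  # no-break space is not a Space here
```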

Words: (QuestionMarker Space+)? (Word | Word (Space+ Word)+) Space*

Word: WordE | WordE (Connector WordE)+

WordE: Adjective | Subject | Object | Predicate | NameE | Term | Number | Gate Number | Symbol

SentenceBlockA: (SentenceBlock Space+)+ | Quote Space+ | DirectSpeech Space+ | Quote3 Space+ | Quote4 Space+

Quote: '󴨞' (SentenceBlock | SentenceBlock (Space+ SentenceBlock)+ ) '󴨟'

DirectSpeech: '󴨠' (SentenceBlock | SentenceBlock (Space+ SentenceBlock)+ ) '󴨑'

Quote3: '󴨒' (SentenceBlock | SentenceBlock (Space+ SentenceBlock)+ ) '󴨣'

Quote4: '󴨀' (SentenceBlock | SentenceBlock (Space+ SentenceBlock)+ ) 'σ΄¨₯'

SentenceBlock: Statement | Question | SubjectQuestion

SubjectQuestion: '¿' SubjectMarker Space+ Predicates Space+ Objects '?'

Question: '¿' QuestionMarker Space+ Sentence '?'

Statement: NormalStatement | AttenuatedStatement | ImperativeExclamation | Term Space+

NormalStatement: ('󴨑'| '…') Sentence ('.' | '…')

AttenuatedStatement: '󴨒' Sentence '󴨓'

ImperativeExclamation: '¡' Sentence '!'

Sentence: MainClause | MainClause ((',' | ';' | Space+ '…' | Space+ '–' | Space+ '/' | Space+ '') Space+ MainClause)+

MainClause: Subjects Space+ Predicates Space+ Objects

Subjects (=$): Subject | Brackets$+

Predicates (=$): Predicate | Brackets$+

Objects (=$): Object | Brackets$+

Adjectives (=$): Adjective | Brackets$+

Brackets$: Enumeration$ | '(' Enumeration$ ')' | '[' Enumeration$ ']' | '{' Enumeration$ '}' | '󴨦' Enumeration$ '󴨧'
More than one bracketed expression should be used when different types of Enumeration$ are combined, so that the enumeration is unambiguous. Brackets can also be useful for more complicated enumerations, such as enumerations of subjects, predicates or objects that themselves contain enumerations of adjectives or adverbs.

Enumeration$: And$ | Or$ | EitherOr$ | NeitherNor$

And$: $ ('󴨔' Space+ $)+

Or$: $ ('󴨕' Space+ $)+

EitherOr$: $ ('󴨖' Space+ $)+

NeitherNor$: $ ('󴨗' Space+ $)+
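A minimal recogniser for the four enumeration rules might look like the sketch below. The ASCII connectors `&`, `|`, `^` and `!` are hypothetical stand-ins for the four CusyA connector glyphs, items are treated as opaque tokens without inner spaces, and the name `parse_enumeration` is an assumption. Following the rules above, each connector sits directly after an item and is followed by Space+.

```python
import re

SPACE_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")

# Hypothetical ASCII stand-ins for the four connector glyphs.
CONNECTORS = {"&": "And", "|": "Or", "^": "EitherOr", "!": "NeitherNor"}


def parse_enumeration(text):
    """Parse `$ (connector Space+ $)+` with one connector type throughout."""
    tokens = [token for token in SPACE_RUN.split(text) if token]
    if len(tokens) < 2:
        raise ValueError("an enumeration needs at least two items")
    *leading, last = tokens
    connectors = {token[-1] for token in leading}
    if len(connectors) != 1:
        # Different enumeration types must be separated by brackets.
        raise ValueError("mixed or missing connectors")
    connector = connectors.pop()
    if connector not in CONNECTORS:
        raise ValueError(f"unknown connector: {connector!r}")
    items = [token[:-1] for token in leading] + [last]
    if not all(items):
        raise ValueError("empty item in enumeration")
    return CONNECTORS[connector], items


print(parse_enumeration("xx& yy& zz"))
```

Rejecting mixed connectors mirrors the requirement that combined enumeration types be wrapped in brackets; handling the brackets themselves would need a recursive parser, which this sketch leaves out.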

Subject: (Adjectives Space+)? SubjectMarker PluralMarker? GenusMarker? (Core | Name) (Space+ Extension)?

Predicate: (Adjectives Space+)? Predicatemarker PassivMarker? Core (Space+ Extension)?

Object: (Adjectives Space+)? ObjectMarker PluralMarker? GenusMarker? (Core | Name) (Space+ Extension)?

Adverbs have the same syntax as Adjectives:
Adjective: '󴉅' | AdjectiveMarker Core

Name: NameMarker Glyphe+

Name outside a sentence construction:
NameE: PluralMarker? GenusMarker? NameMarker Glyphe+

Extension: ExtensionStartMarker Space+ Objects Space+ ExtensionsEndMarker

The adjective markers stand, in the order given, for: Positive, Comparative, Superlative, Maximative, Negative / Negation, Negated Comparative, Negated Superlative, Negated Maximative.
AdjectiveMarker: '󴉁' | '󴉂' | '󴉃' | '󴉄' | '󴉅' | '󴉆'| '󴉇' | 'σ΄‰ˆ'

Genus markers are only used if gender is to be emphasised; otherwise no genus is marked. The markers stand for: male, female, neuter, hermaphroditic, neither nor, changing or altered, indefinite (someone), irrelevant or neutral.
GenusMarker: | '󴉉' | 'σ΄‰Š' | '󴉋' | 'σ΄‰Œ' | '󴉍'
