Edit R codes and configuration files + first run of CLDF.
Showing 22 changed files with 1,426 additions and 111 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
{ | ||
"title": "CLDF dataset derived from von Rosenberg's \"De Mentawei-Eilanden en Hunne Bewoners\" from 1853 for comparative numeral data", | ||
"access_right": "open", | ||
"keywords": [ | ||
"cldf:Wordlist", | ||
"linguistics" | ||
], | ||
"creators": [], | ||
"contributors": [], | ||
"communities": [ | ||
{ | ||
"identifier": "lexibank" | ||
} | ||
], | ||
"upload_type": "dataset", | ||
"description": "<p>Cite the source of the dataset as:</p>\n\n<blockquote>\n<p>von Rosenberg, Carl Benjamin Hermann. 1853. De Mentawei-Eilanden en Hunne Bewoners. Tijdschrift voor Indische Taal-, Land- en Volkenkunde 1. 403\u2013440.</p>\n</blockquote>", | ||
"license": { | ||
"id": "CC-BY-NC-SA-4.0" | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
## Specification of form manipulation | ||
|
||
|
||
Specification of the value-to-form processing in Lexibank datasets: | ||
|
||
The value-to-form processing is divided into two steps, implemented as methods: | ||
- `FormSpec.split`: Splits a string into individual form chunks. | ||
- `FormSpec.clean`: Normalizes a form chunk. | ||
|
||
These methods use the attributes of a `FormSpec` instance to configure their behaviour. | ||
|
||
- `brackets`: `{'(': ')'}` | ||
Pairs of strings that should be recognized as brackets, specified as `dict` mapping opening string to closing string | ||
- `separators`: `,` | ||
Iterable of single-character tokens that should be recognized as word separators | ||
- `missing_data`: `('?', '-')` | ||
Iterable of strings that are used to mark missing data | ||
- `strip_inside_brackets`: `True` | ||
Flag signaling whether to strip content in brackets (**and** strip leading and trailing whitespace) | ||
- `replacements`: `[]` | ||
List of pairs (`source`, `target`) used to replace occurrences of `source` in forms with `target` (before stripping content in brackets) | ||
- `first_form_only`: `False` | ||
Flag signaling whether at most one form should be returned from `split` - effectively ignoring any spelling variants, etc. | ||
- `normalize_whitespace`: `True` | ||
Flag signaling whether to normalize whitespace - stripping leading and trailing whitespace and collapsing multi-character whitespace to single spaces | ||
- `normalize_unicode`: `NFD` | ||
Unicode normalization form to apply to the input of `split` (`None`, `'NFD'` or `'NFC'`) |
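The two-step processing described above can be sketched in plain Python. This is a simplified illustration, not pylexibank's actual implementation: the function bodies are assumptions, only the default values are taken from the attribute list, and the splitter does not protect separators that occur inside brackets.

```python
import re
import unicodedata

# Defaults copied from the attribute list above. The function bodies are an
# illustrative approximation, NOT the real FormSpec implementation.
BRACKETS = {"(": ")"}
SEPARATORS = ","
MISSING_DATA = ("?", "-")


def strip_brackets(text, brackets=BRACKETS):
    """Remove non-nested bracketed content together with the brackets."""
    for opening, closing in brackets.items():
        pattern = re.escape(opening) + r".*?" + re.escape(closing)
        text = re.sub(pattern, "", text)
    return text


def clean(chunk, replacements=(), strip_inside_brackets=True,
          normalize_whitespace=True):
    """Normalize a single form chunk."""
    for source, target in replacements:  # applied before bracket stripping
        chunk = chunk.replace(source, target)
    if strip_inside_brackets:
        chunk = strip_brackets(chunk).strip()
    if normalize_whitespace:
        chunk = " ".join(chunk.split())  # trim and collapse whitespace runs
    return chunk


def split(value, separators=SEPARATORS, missing_data=MISSING_DATA,
          first_form_only=False, normalize_unicode="NFD"):
    """Split a raw value into cleaned form chunks."""
    if normalize_unicode:
        value = unicodedata.normalize(normalize_unicode, value)
    if value.strip() in missing_data:
        return []
    # Simplification: separators inside brackets are not protected here.
    chunks = re.split("[" + re.escape(separators) + "]", value)
    forms = [f for f in (clean(c) for c in chunks) if f and f not in missing_data]
    return forms[:1] if first_form_only else forms
```

With these defaults, a value such as `"a (b), c"` yields two forms, `"a"` and `"c"`, while a value consisting only of `"?"` yields no forms at all. The strings are placeholders, not data from this dataset.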
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# CLDF dataset derived from von Rosenberg's "De Mentawei-Eilanden en Hunne Bewoners" from 1853 for comparative numeral data | ||
|
||
## How to cite | ||
|
||
If you use these data, please cite: | ||
- the original source | ||
> von Rosenberg, Carl Benjamin Hermann. 1853. De Mentawei-Eilanden en Hunne Bewoners. Tijdschrift voor Indische Taal-, Land- en Volkenkunde 1. 403–440. | ||
- the derived dataset using the DOI of the [particular released version](../../releases/) you were using | ||
|
||
## Description | ||
|
||
|
||
This dataset is licensed under a CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/). | ||
|
||
## Statistics | ||
|
||
|
||
![Glottolog: 100%](https://img.shields.io/badge/Glottolog-100%25-brightgreen.svg "Glottolog: 100%") | ||
![Concepticon: 100%](https://img.shields.io/badge/Concepticon-100%25-brightgreen.svg "Concepticon: 100%") | ||
![Source: 100%](https://img.shields.io/badge/Source-100%25-brightgreen.svg "Source: 100%") | ||
|
||
- **Varieties:** 8 (linked to 8 different Glottocodes) | ||
- **Concepts:** 10 (linked to 10 different Concepticon concept sets) | ||
- **Lexemes:** 80 | ||
- **Sources:** 1 | ||
- **Synonymy:** 1.00 | ||
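The figures in this list are mutually consistent if synonymy is read as the average number of lexemes per variety-concept pair. That reading is an assumption, since the README does not define the measure, and the check below further assumes that every variety attests every concept.

```python
# Counts taken from the statistics list above.
varieties = 8
concepts = 10
lexemes = 80

# Assumption: all variety-concept combinations are attested, so the number
# of pairs is simply the product of the two counts.
pairs = varieties * concepts

# Assumed definition: mean number of lexemes per variety-concept pair.
synonymy = lexemes / pairs
print(f"{synonymy:.2f}")  # prints 1.00
```

A value of 1.00 under this reading means that each variety has exactly one form per numeral concept, with no recorded variants.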
|
||
## CLDF Datasets | ||
|
||
The following CLDF datasets are available in [cldf](cldf): | ||
|
||
- CLDF [Wordlist](https://github.com/cldf/cldf/tree/master/modules/Wordlist) at [cldf/cldf-metadata.json](cldf/cldf-metadata.json) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
|
||
# Detailed transcription record | ||
|
||
## Segments | ||
|
||
| Segment | Occurrence | BIPA | CLTS SoundClass | | ||
|-----------|--------------|--------|-------------------| | ||
|
||
(0 rows) | ||
|
||
|
||
|
||
## Unsegmentable lexemes (up to 100 only) | ||
|
||
| ID | LANGUAGE | CONCEPT | FORM | | ||
|------|------------|-----------|--------| | ||
|
||
(0 rows) | ||
|
||
|
||
|
||
## Words with invalid segments (up to 100 only) | ||
|
||
| ID | LANGUAGE | CONCEPT | FORM | SEGMENTS | | ||
|------|------------|-----------|--------|------------| | ||
|
||
(0 rows) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"by_language": {}, | ||
"stats": { | ||
"bad_words": [], | ||
"bad_words_count": 0, | ||
"bipa_errors": [], | ||
"general_errors": 0, | ||
"invalid_words": [], | ||
"invalid_words_count": 0, | ||
"inventory_size": 0, | ||
"replacements": {}, | ||
"sclass_errors": [], | ||
"segments": {} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,89 @@ | ||
# CLDF directory | ||
<a name="ds-cldfmetadatajson"> </a> | ||
|
||
# Wordlist CLDF dataset derived from von Rosenberg's "De Mentawei-Eilanden en Hunne Bewoners" from 1853 for comparative numeral data | ||
|
||
**CLDF Metadata**: [cldf-metadata.json](./cldf-metadata.json) | ||
|
||
**Sources**: [sources.bib](./sources.bib) | ||
|
||
property | value | ||
--- | --- | ||
[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | von Rosenberg, Carl Benjamin Hermann. 1853. De Mentawei-Eilanden en Hunne Bewoners. Tijdschrift voor Indische Taal-, Land- en Volkenkunde 1. 403–440. | ||
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF Wordlist](http://cldf.clld.org/v1.0/terms.rdf#Wordlist) | ||
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | git@github.com:complexico/vrosenberg1853-numeral | ||
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="git@github.com:complexico/vrosenberg1853-numeral/tree/82cb8c8">git@github.com:complexico/vrosenberg1853-numeral 82cb8c8</a></li><li><a href="glottolog-glottolog-d9da5e2">Glottolog glottolog-glottolog-d9da5e2</a></li><li><a href="https://github.com/concepticon/concepticon-data/tree/7c0b6ae3">Concepticon v3.1.0-19-g7c0b6ae3</a></li><li><a href="cldf-clts-clts-6dc73af">CLTS cldf-clts-clts-6dc73af</a></li></ol> | ||
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>lingpy-rcParams</strong>: <a href="./lingpy-rcParams.json">lingpy-rcParams.json</a></li><li><strong>python</strong>: 3.9.6</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol> | ||
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | vrosenberg1853 | ||
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution | ||
|
||
|
||
## <a name="table-formscsv"></a>Table [forms.csv](./forms.csv) | ||
|
||
property | value | ||
--- | --- | ||
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF FormTable](http://cldf.clld.org/v1.0/terms.rdf#FormTable) | ||
[dc:extent](http://purl.org/dc/terms/extent) | 80 | ||
|
||
|
||
### Columns | ||
|
||
Name/Property | Datatype | Description | ||
--- | --- | --- | ||
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key | ||
[Local_ID](http://purl.org/dc/terms/identifier) | `string` | | ||
[Language_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [languages.csv::ID](#table-languagescsv) | ||
[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | References [parameters.csv::ID](#table-parameterscsv) | ||
[Value](http://cldf.clld.org/v1.0/terms.rdf#value) | `string` | | ||
[Form](http://cldf.clld.org/v1.0/terms.rdf#form) | `string` | | ||
[Segments](http://cldf.clld.org/v1.0/terms.rdf#segments) | list of `string` (separated by ` `) | | ||
[Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` | | ||
[Source](http://cldf.clld.org/v1.0/terms.rdf#source) | list of `string` (separated by `;`) | References [sources.bib::BibTeX-key](./sources.bib) | ||
`Cognacy` | `string` | | ||
`Loan` | `boolean` | | ||
`Graphemes` | `string` | | ||
`Profile` | `string` | | ||
`CommonTranscription` | `string` | | ||
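The column descriptions above can be read as a small parsing recipe: `Segments` is a space-separated list, `Source` is a semicolon-separated list, and `Loan` is a boolean. The sketch below applies that recipe with the standard `csv` module. The sample row is an invented placeholder, not data from `forms.csv`, and the `true`/`false` spelling of booleans is an assumption.

```python
import csv
import io

# Placeholder content: the IDs, forms and source keys below are invented
# for illustration and do NOT come from the dataset.
SAMPLE = (
    "ID,Local_ID,Language_ID,Parameter_ID,Value,Form,Segments,Comment,"
    "Source,Cognacy,Loan,Graphemes,Profile,CommonTranscription\n"
    "lang1-one-1,,lang1,one,aba,aba,a b a,,src1;src2,,false,,,\n"
)


def parse_form_row(row):
    """Convert the list-valued and boolean columns of one forms.csv row."""
    parsed = dict(row)
    # Segments: list of strings separated by a single space.
    parsed["Segments"] = row["Segments"].split(" ") if row["Segments"] else []
    # Source: list of BibTeX keys separated by ";".
    parsed["Source"] = row["Source"].split(";") if row["Source"] else []
    # Loan: assumed to be serialized as "true"/"false"; empty becomes None.
    parsed["Loan"] = {"true": True, "false": False}.get(row["Loan"].lower())
    return parsed


forms = [parse_form_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(forms[0]["Segments"], forms[0]["Source"], forms[0]["Loan"])
```

For real use, the file would be opened from `cldf/forms.csv` instead of the in-memory string. A CLDF-aware reader that takes the separators from `cldf-metadata.json` is likely the more robust choice.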
|
||
## <a name="table-languagescsv"></a>Table [languages.csv](./languages.csv) | ||
|
||
property | value | ||
--- | --- | ||
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF LanguageTable](http://cldf.clld.org/v1.0/terms.rdf#LanguageTable) | ||
[dc:extent](http://purl.org/dc/terms/extent) | 8 | ||
|
||
|
||
### Columns | ||
|
||
Name/Property | Datatype | Description | ||
--- | --- | --- | ||
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key | ||
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` | | ||
[Glottocode](http://cldf.clld.org/v1.0/terms.rdf#glottocode) | `string` | | ||
`Glottolog_Name` | `string` | | ||
[ISO639P3code](http://cldf.clld.org/v1.0/terms.rdf#iso639P3code) | `string` | | ||
[Macroarea](http://cldf.clld.org/v1.0/terms.rdf#macroarea) | `string` | | ||
[Latitude](http://cldf.clld.org/v1.0/terms.rdf#latitude) | `decimal` | | ||
[Longitude](http://cldf.clld.org/v1.0/terms.rdf#longitude) | `decimal` | | ||
`Family` | `string` | | ||
`Sources` | `string` | | ||
`Doculect_Dutch` | `string` | | ||
|
||
## <a name="table-parameterscsv"></a>Table [parameters.csv](./parameters.csv) | ||
|
||
property | value | ||
--- | --- | ||
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF ParameterTable](http://cldf.clld.org/v1.0/terms.rdf#ParameterTable) | ||
[dc:extent](http://purl.org/dc/terms/extent) | 10 | ||
|
||
|
||
### Columns | ||
|
||
Name/Property | Datatype | Description | ||
--- | --- | --- | ||
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key | ||
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` | | ||
[Concepticon_ID](http://cldf.clld.org/v1.0/terms.rdf#concepticonReference) | `string` | | ||
`Concepticon_Gloss` | `string` | | ||
`Number` | `string` | | ||
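The three tables are linked through `Language_ID` and `Parameter_ID`, which reference the `ID` columns of `languages.csv` and `parameters.csv`. A minimal sketch of a referential check over already-loaded rows follows. The function name and the sample rows are hypothetical, chosen only to illustrate the relationship.

```python
def dangling_references(forms, languages, parameters):
    """Return IDs of forms whose language or parameter reference is unknown."""
    language_ids = {row["ID"] for row in languages}
    parameter_ids = {row["ID"] for row in parameters}
    return [
        form["ID"]
        for form in forms
        if form["Language_ID"] not in language_ids
        or form["Parameter_ID"] not in parameter_ids
    ]


# Placeholder rows, invented for illustration only.
languages = [{"ID": "lang1"}, {"ID": "lang2"}]
parameters = [{"ID": "one"}, {"ID": "two"}]
forms = [
    {"ID": "f1", "Language_ID": "lang1", "Parameter_ID": "one"},
    {"ID": "f2", "Language_ID": "lang3", "Parameter_ID": "two"},  # unknown language
]

print(dangling_references(forms, languages, parameters))  # prints ['f2']
```

An empty result means every form points at an existing variety and concept, which is what the 100% Glottolog and Concepticon coverage reported for this dataset would lead one to expect.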
|
||
This directory contains the dataset formatted as [CLDF dataset](https://cldf.clld.org). |