Rescrapes dialect data after #548 #551

Merged: 15 commits, Jul 17, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -14,7 +14,8 @@ Unreleased

#### Changed

- Fixes dialect xpath selector. (\#545)
- Rescrapes dialect data after \#548. (\#551)
- Fixes dialect XPath selector. (\#548)
- Fixes table alignment. (\#539)
- Repeats big scrape after \#523. (\#536)
- Fixes excessive line wrapping. (\#529)
14 changes: 7 additions & 7 deletions data/phones/README.md
@@ -17,8 +17,8 @@ See the [HOWTO](HOWTO.md) for the steps to generate phone lists.
| [phone](phones/hbs_broad.phones) | hbs | Serbo-Croatian | Serbo-Croatian | Broad | 65 |
| [phone](phones/hin_broad.phones) | hin | Hindi | Hindi | Broad | 64 |
| [phone](phones/hun_narrow.phones) | hun | Hungarian | Hungarian | Narrow | 86 |
| [phone](phones/hye_e_narrow.phones) | hye | Armenian | Armenian (Eastern Armenian, standard) | Narrow | 72 |
| [phone](phones/hye_w_narrow.phones) | hye | Armenian | Armenian (Western Armenian, standard) | Narrow | 72 |
| [phone](phones/hye_e_narrow.phones) | hye | Armenian | Armenian (Eastern Armenian) | Narrow | 74 |
| [phone](phones/hye_w_narrow.phones) | hye | Armenian | Armenian (Western Armenian) | Narrow | 75 |
| [phone](phones/isl_broad.phones) | isl | Icelandic | Icelandic | Broad | 71 |
| [phone](phones/ita_broad.phones) | ita | Italian | Italian | Broad | 32 |
| [phone](phones/jpn_narrow.phones) | jpn | Japanese | Japanese | Narrow | 64 |
@@ -31,13 +31,13 @@ See the [HOWTO](HOWTO.md) for the steps to generate phone lists.
| [phone](phones/mya_broad.phones) | mya | Burmese | Burmese | Broad | 70 |
| [phone](phones/nld_broad.phones) | nld | Dutch | Dutch | Broad | 50 |
| [phone](phones/nob_broad.phones) | nob | Norwegian Bokmål | Norwegian Bokmål | Broad | 72 |
| [phone](phones/por_bz_broad.phones) | por | Portuguese | Portuguese (Brazil) | Broad | 55 |
| [phone](phones/por_po_broad.phones) | por | Portuguese | Portuguese (Portugal) | Broad | 48 |
| [phone](phones/por_bz_broad.phones) | por | Portuguese | Portuguese (Brazil) | Broad | 45 |
| [phone](phones/por_po_broad.phones) | por | Portuguese | Portuguese (Portugal) | Broad | 44 |
| [phone](phones/ron_narrow.phones) | ron | Romanian | Romanian | Narrow | 51 |
| [phone](phones/slv_broad.phones) | slv | Slovenian | Slovene | Broad | 48 |
| [phone](phones/spa_ca_broad.phones) | spa | Spanish | Spanish (Castilian) | Broad | 29 |
| [phone](phones/spa_la_broad.phones) | spa | Spanish | Spanish (Latin America) | Broad | 27 |
| [phone](phones/spa_la_broad.phones) | spa | Spanish | Spanish (Latin America) | Broad | 28 |
| [phone](phones/tur_narrow.phones) | tur | Turkish | Turkish | Narrow | 51 |
| [phone](phones/vie_hanoi_narrow.phones) | vie | Vietnamese | Vietnamese (Hà Nội) | Narrow | 54 |
| [phone](phones/vie_hcmc_narrow.phones) | vie | Vietnamese | Vietnamese (Hồ Chí Minh City) | Narrow | 50 |
| [phone](phones/vie_hue_narrow.phones) | vie | Vietnamese | Vietnamese (Huế) | Narrow | 53 |
| [phone](phones/vie_hue_narrow.phones) | vie | Vietnamese | Vietnamese (Huế) | Narrow | 54 |
| [phone](phones/vie_saigon_narrow.phones) | vie | Vietnamese | Vietnamese (Saigon) | Narrow | 54 |
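
The `.phones` files changed in this pull request share a simple layout: one phone per line, `#` comment lines, and optional trailing `#` comments after a phone. As a minimal sketch (not the project's own tooling; `parse_phones` is a hypothetical helper, and whether the final column of the table above is exactly this entry count is an assumption, since the header row is not shown here), such a file could be read and counted like this:

```python
from typing import List, Tuple


def parse_phones(text: str) -> List[Tuple[str, str]]:
    """Return (phone, comment) pairs from .phones-style text.

    Hypothetical helper: assumes everything after the first '#'
    on a line is a comment and the rest is a single phone.
    """
    entries = []
    for raw in text.splitlines():
        phone, _, comment = raw.partition("#")
        phone = phone.strip()
        if phone:  # skip blank lines and comment-only lines
            entries.append((phone, comment.strip()))
    return entries


# A small sample in the same layout as the files in this diff.
sample = """\
# Consonants.
t\u0361\u0283 # Allophone of /t/ before high front vowels.
\u0261
#
\u0292
"""

entries = parse_phones(sample)
print(len(entries))  # number of phone entries in the sample
```

Keeping the trailing comment alongside each phone makes it possible to inspect the allophone notes without re-reading the file.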
50 changes: 19 additions & 31 deletions data/phones/phones/hye_e_narrow.phones
@@ -1,32 +1,21 @@
# Based on https://en.wikipedia.org/wiki/Armenian_language#Phonology
# And based on the pronunciation script from Wiktionary: https://en.wiktionary.org/wiki/Module:hy-pronunciation
#
# Vowels.
# Vowels; these have been recently reworked upstream to use acute accents.
ɑ
ɛ
ɑ́
e
é
ə
ə́
o
ó
i
ɔ
í
u
#
# Older entries might contain [o] or [e]. These have been fixed. You shouldn't find these
# phones anymore.
#
# Long vowels: A sequence of identical heterosyllabic vowels is phonetically pronounced as
# one long vowel. Armenian does not have phonemic vowel length. But these are very rare.
# The only attested case on Wiktionary is the following.
ɛː
# The other types of long vowels are not found on Wiktionary as of Jan 2021. This is
# largely because long vowels are mostly in loanwords and borrowings, which Wiktionary
# underreports. At some point in the future, the Armenian users might add some entries
# which contain these sequences.
ɑː
əː
ɔː
#
#
ú
ʏ # Allophone that is automatically transcribed from word-medial յու /ju/.
ʏ́
# Consonants.
m
n
@@ -52,21 +41,21 @@ s
z
ʃ
ʒ
χ
χ
ʁ
h
l
j
r
ɾ
#
# Past errors: some instances of [χ] were incorrectly transcribed as [x].
# Past errors: some affricates were missing a tie-bar.
# * <ց> [tʃʰ]
# Past errors: some instances of [χ] were incorrectly transcribed as [x].
# Past errors: some affricates were missing a tie-bar.
# * <ց> [tʃʰ]
# * <չ> [tsʰ]
# These were fixed.
#
# Long consonants: A sequence of identical consonants is phonetically pronounced as
#
# Long consonants: A sequence of identical consonants is phonetically pronounced as
# a long consonant. Armenian does not have phonemic consonant length or geminates.
@@ -100,7 +89,6 @@ d͡ʒː
ʒː
# This is just an accidental gap. The above long consonants can exist in Armenian.
# At some point in the future, the Armenian users might add some entries which contain
# This is just an accidental gap. The above long consonants can exist in Armenian.
# At some point in the future, the Armenian users might add some entries which contain
# these sequences.
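
The long-consonant comments in this file say that a sequence of identical consonants is pronounced as one long consonant. A minimal sketch of that idea follows, assuming the input is already segmented into phones; `collapse_identical` is a hypothetical name, the example sequence is made up for illustration, and the function does not distinguish consonants from vowels:

```python
from itertools import groupby
from typing import List

LENGTH_MARK = "\u02D0"  # IPA length mark, as in the long-consonant entries


def collapse_identical(phones: List[str]) -> List[str]:
    """Merge each run of identical adjacent phones into one long phone."""
    out = []
    for phone, run in groupby(phones):
        count = len(list(run))
        out.append(phone + LENGTH_MARK if count > 1 else phone)
    return out


# Made-up segmented sequence with a doubled [n].
print(collapse_identical(["\u0251", "n", "n", "\u0251"]))
```

A run of any length above one is collapsed to a single long phone here; whether longer runs ever occur in the scraped data is not something this diff shows.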

46 changes: 21 additions & 25 deletions data/phones/phones/hye_w_narrow.phones
@@ -1,26 +1,21 @@
# Based on https://en.wikipedia.org/wiki/Armenian_language#Phonology
# And based on the pronunciation script from Wiktionary: https://en.wiktionary.org/wiki/Module:hy-pronunciation
#
# Vowels.
# Vowels; these have been recently reworked upstream to use acute accents.
ɑ
ɛ
ɑ́
e
é
ə
ə́
o
ó
i
ɔ
í
u
ʏ # allophone that is automatically transcribed from word-medial յու /ju/
#
# Older entries might contain [o] or [e]. These have been fixed. You shouldn't find these phones anymore.
#
# Long vowels: A sequence of identical heterosyllabic vowels is phonetically pronounced as one long vowel. Armenian does not have phonemic vowel length. But these are very rare. The only attested case on Wiktionary is the following.
ɛː
# The other types of long vowels are not found on Wiktionary as of Jan 2021. This is largely because long vowels are mostly in loanwords and borrowings, which Wiktionary underreports. At some point in the future, the Armenian users might add some entries which contain these sequences.
ɑː
əː
ɔː
#
ú
ʏ # Allophone that is automatically transcribed from word-medial յու /ju/.
ʏ́
# Consonants.
m
n
@@ -41,13 +36,14 @@ s
z
ʃ
ʒ
χ
χ
ʁ
h
l
j
ɾ
# Western Armenian doesn't have phonemic voiceless unaspirated stops. But the hy-pron script generates unaspirated consonants after the /s/ segment.
r
# Western Armenian doesn't have phonemic voiceless unaspirated stops. But the hy-pron script generates unaspirated consonants after the /s/ segment.
p
t
k
@@ -56,7 +52,7 @@ t͡ʃ
#
# Past errors: some instances of [χ] were incorrectly transcribed as [x].
# Past errors: some affricates were missing a tie-bar.
# * <ց> [tʃʰ]
# * <ց> [tʃʰ]
# * <չ> [tsʰ]
# These were fixed.
#
@@ -79,8 +75,8 @@ zː
ʁː
ɾː
#
# As an accidental gap, as of Jan 2021, there are no Wiktionary entries that have
r
# As an accidental gap, as of Jan 2021, there are no Wiktionary entries that have
# the following long consonants.
t͡ʃʰː
d͡zː
@@ -96,16 +92,16 @@ kː
t͡sː
t͡ʃː
# These are all just an accidental gap. The above long consonants can in theory
# exist in Armenian. At some point in the future, the Armenian users might add some
# exist in Armenian. At some point in the future, the Armenian users might add some
# entries which contain these sequences.
#
# Western Armenian's phoneme inventory differs from Eastern Armenian in the following way:
# 1) Eastern Armenian has a trill grapheme ռ and a flap grapheme ր. In Western Armenian,
# both graphemes are pronounced as a flap. Western Armenian does not have a trill.
# 2) The graphemes for voiceless unaspirated stops+affricates are pronounced as voiceless
# unaspirated in Eastern Armenian, but as voiced in Western Armenian: կապ [kap] in
# 2) The graphemes for voiceless unaspirated stops+affricates are pronounced as voiceless
# unaspirated in Eastern Armenian, but as voiced in Western Armenian: կապ [kap] in
# Eastern, [gab] in Western.
# 3) The graphemes for voiced stops+affricates are pronounced as voiced in Eastern
# 3) The graphemes for voiced stops+affricates are pronounced as voiced in Eastern
# Armenian, but as voiceless aspirated in Western Armenian: գահ [gah] in Eastern, [kʰah] in Western.
# * The graphemes for voiceless aspirated stops+affricates are pronounced as voiceless aspirated in both dialects: քուն [kʰun] in both.
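
The Eastern/Western correspondences listed in the comments above can be sketched as a lookup table over segmented phones. This is only an illustration under stated assumptions: the comments describe graphemes, whereas this sketch maps Eastern pronunciations directly; `EASTERN_TO_WESTERN` and `eastern_to_western` are hypothetical names; and the unaspirated stops that the hy-pron script produces after /s/ are not modeled.

```python
from typing import List

TIE = "\u0361"  # combining tie bar used in affricates

# Hypothetical mapping built from the numbered comments above.
EASTERN_TO_WESTERN = {
    # 1) Trill merges with the flap.
    "r": "\u027e",
    # 2) Voiceless unaspirated stops and affricates become voiced.
    "p": "b", "t": "d", "k": "g",
    "t" + TIE + "s": "d" + TIE + "z",
    "t" + TIE + "\u0283": "d" + TIE + "\u0292",
    # 3) Voiced stops and affricates become voiceless aspirated.
    "b": "p\u02b0", "d": "t\u02b0", "g": "k\u02b0",
    "d" + TIE + "z": "t" + TIE + "s\u02b0",
    "d" + TIE + "\u0292": "t" + TIE + "\u0283\u02b0",
}


def eastern_to_western(phones: List[str]) -> List[str]:
    """Map each phone independently; unlisted phones pass through."""
    return [EASTERN_TO_WESTERN.get(p, p) for p in phones]


# The examples given in the comments: [kap] and [gah].
print(eastern_to_western(["k", "a", "p"]))
print(eastern_to_western(["g", "a", "h"]))
```

Each phone is looked up once rather than rewritten in sequence, so the swap of voiceless and voiced series does not chain (an Eastern [k] becomes [g] and is not then turned into [kʰ]). Voiceless aspirated phones are absent from the table and therefore stay unchanged, matching the last bullet.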

83 changes: 38 additions & 45 deletions data/phones/phones/por_bz_broad.phones
@@ -1,62 +1,55 @@
# Based on
# Based on:
#
# https://en.wikipedia.org/wiki/Portuguese_phonology
#####
#
# Ordinary vowels and glides.
a
i
u
s
e
o
u
ɐ
ɔ
ɛ
ø
j
w
# Nasalized vowels and glides.
# Note the upstream template occasionally generates "double-nasalization",
# i.e., vowels followed by two combining tildes. This is ignored since
# it's not clear what this means.
ɐ̃
ĩ
õ
ũ
ɔ̃
ɛ̃
ã
# Consonants.
ɾ
t
s
k
ɐ # Allophone of /a/ in unstressed syllables.
o
ʁ
p
t
d
m
l
n
j
ɐ̃
w
b
ɛ
v
ɡ
ʁ
t͡ʃ
p
z
ʃ
b
f
t͡ʃ # Allophone of /t/ before high front vowels.
ɡ
ɡʷ # This gu in European, apparently.
d͡ʒ
v
ɻ
ʒ
ɔ
ĩ
d͡ʒ # Allophone of /d/ before high front vowels.
w̃ # Part of nasal diphthongs.
õ
ɲ
ʎ
j̃ # Allophone of /ɲ/. Part of nasal diphthongs.
ũ
ɹ # Allophone of /ʁ/ in coda position.
ɻ # Allophone of /ʁ/ in coda position.
r # Allophone of /ɾ/ in onset clusters. Allophone of /ʁ/ in coda position.
χ # Allophone of /ʁ/.
h # Allophone of /ʁ/.
ã
ɪ # Allophone of /e/ in unstressed syllables.
ʊ # Allophone of /o/ in unstressed syllables.
ɦ # Allophone of /ʁ/.
x # Allophone of /ʁ/.
ɫ # Allophone of /l/ in coda position. [w] is a more common realization of /l/ in this position, however.
# NASAL DIPHTHONGS
# The following were not transcribed with tie bars.
ɐ̃͡j̃
ẽ͡j̃
õ͡j̃
ũ͡j̃
ɐ̃͡w̃
õ͡w̃
