ə and ɜ have the exact same features #352

Open
DanielSWolf opened this issue Mar 17, 2022 · 1 comment

@DanielSWolf

In component-feature-table.csv, the segments ə (mid central vowel) and ɜ (open-mid central unrounded vowel) have the exact same features:

ə,0259,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0
ɜ,025C,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0

I assume that's not on purpose, given that the FAQ states that "if two phonemes differ in their graphemic representation, then they should necessarily differ in their featural representation as well".
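
For anyone who wants to confirm this without opening the file, here is a minimal base R sketch that compares only the two rows quoted above. It assumes the first two fields are the segment and its code point and that everything after them is a feature value; the variable names are just for illustration.

```r
# The two rows quoted above, copied verbatim from the issue.
rows <- c(
  "ə,0259,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0",
  "ɜ,025C,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0"
)

# Split each row on commas (fixed = TRUE so "+" and "-" are not treated as regex).
fields <- strsplit(rows, ",", fixed = TRUE)

# Assumption: fields 1 and 2 are the segment and code point; the rest are features.
features <- lapply(fields, function(x) x[-(1:2)])

# Number of feature values per row, and whether the two vectors match exactly.
n_features <- lengths(features)
same <- identical(features[[1]], features[[2]])

print(n_features)
print(same)
```

If `same` is `TRUE`, the two segments are indistinguishable by features alone, at least in these two rows.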

@bambooforest bambooforest self-assigned this Mar 21, 2022
@bambooforest
Contributor

@DanielSWolf -- indeed. This is a problem that we are aware of (hence the "should"). The problem is also pervasive.

library(tidyverse)
# Load PHOIBLE, keep Phoneme plus the feature columns, and drop duplicate rows.
df <- read_csv(url('https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv'))
df <- df %>% select(7:48, -Allophones, -Source, -Marginal, -SegmentClass) %>% distinct()
df <- df %>% remove_rownames %>% column_to_rownames(var="Phoneme")
# Tones have no feature specifications, so exclude them.
df <- df %>% filter(tone != "+")
df <- rownames_to_column(df, "Phoneme")
# Group phonemes by their full feature vector and collect the ones that share it.
out <- df %>% group_by(tone, stress, syllabic, short, long, consonantal, sonorant, continuant, delayedRelease, approximant, tap, trill, nasal, lateral, labial, round, labiodental, coronal, anterior, distributed, strident, dorsal, high, low, front, back, tense, retractedTongueRoot, advancedTongueRoot, periodicGlottalSource, epilaryngealSource, spreadGlottis, constrictedGlottis, fortis, raisedLarynxEjective, loweredLarynxImplosive, click) %>%
  summarize(phonemes = paste0(Phoneme, collapse = ', '), count = n()) %>% ungroup()
# Show only feature vectors shared by more than one phoneme, largest groups first.
out %>% select(phonemes, count) %>% filter(count > 1) %>% arrange(desc(count))

   phonemes                       count
   <chr>                          <int>
 1 t, t͉, t̠, t̺, t̟, d̥, t̪̺, d̺̥, t̺͉          9
 2 t̻s̻, t̪s̪, ts̪, t̪s, t̪̻s̪̻, t̟ʃ̟, ts̻, t̻s̪̻     8
 3 t̠ʃ, t̠ʃ͉, t̠͉ʃ, d̥ʒ̥, t̻ʃ̻, d̥ʒ̊, ʈ̻ʂ̻         7
 4 ts, t͉s, t̺s̺, t̟s̟, d̥z̥, d̺̥z̺̥, ts̺         7
 5 d̻z̻, d̪z̪, dz̪, d̪ʒ, d̟ʒ̟, d̪z, dz̻         7
 6 ʃ, ʃ͉, ʒ̊, s̠, s̺̠, s̻̠, ʂ̻                7

There are several reasons for this, including but probably not limited to:

  • no features for tones (that's why I filter them out above)
  • some phoneme specifications in different documents collapse the feature vectors across phonemes, e.g., ʃ vs ʒ̊
  • some clicks are difficult to specify with the current feature set
  • plain mistakes that we need to revisit
  • the feature set itself requires some updates

@drammock anything else?

We should make this clearer in the FAQ and on the FEATURES page.
