OVERVIEW
- This document explains the purpose of each of the column headings in vocabulary.csv
- For details on the structure of the entries, see the README file
- each entry is a lexical expression
- entries can be more than one word, for example 'think about'
- tag based on categorial grammar
- contains information about what the expression combines with
- for example, transitive verbs are '(S\NP)/NP' and intransitive verbs are 'S\NP'
- tag based on part of speech, sometimes with additional info
- for example, transitive verbs are 'TV' and intransitive verbs are 'IV'
- 'IV' labels can contain additional information about whether the subject needs to be agentive and/or plural
- note that this column is not complete for all expressions. It is only used when the information in 'category' is insufficient
- indicates whether the expression is a main verb
- value is 1 if the expression is a main verb, and blank otherwise
- this category does not mark auxiliary verbs (e.g., 'can', 'might') or copulas
- indicates whether the expression is a noun (N or NP)
- value is 1 if the expression is a noun, and blank otherwise
- this category includes proper nouns (e.g., 'The Great Lakes')
- this category includes expletive pronouns 'it' and 'there'
- this category DOES NOT include any other pronouns (e.g., 'him', non-expletive 'it')
- indicates whether the expression is a non-verbal predicate
- used for predicative adjectives (e.g., 'unemployed', 'hidden')
- used for prepositional phrases (e.g., 'at the bottom of', 'in one piece')
- used for other predicative phrases (e.g., 'similar to')
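
The categorial-grammar tags described earlier (for example '(S\NP)/NP' and 'S\NP') can be taken apart mechanically to see what an expression combines with. The sketch below is illustrative only and is not part of the dataset tooling. It assumes the result-first slash notation that the two examples above are consistent with (the result is written to the left of the outermost slash and the argument to the right), and the helper names `strip_outer_parens` and `split_category` are hypothetical.

```python
def strip_outer_parens(tag):
    """Remove enclosing parentheses, but only if they wrap the whole tag."""
    while tag.startswith("(") and tag.endswith(")"):
        depth = 0
        for i, ch in enumerate(tag):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            if depth == 0 and i < len(tag) - 1:
                # The first parenthesis closes before the end, so the
                # outer pair does not enclose the whole tag.
                return tag
        tag = tag[1:-1]
    return tag


def split_category(tag):
    """Split a tag at its outermost slash.

    Returns (result, slash, argument), or (tag, None, None) when the
    tag is atomic (contains no top-level slash).
    """
    tag = strip_outer_parens(tag.strip())
    depth = 0
    # Scan right to left: the last slash outside any parentheses is
    # treated as the main one (assumed left-associative notation).
    for i in range(len(tag) - 1, -1, -1):
        ch = tag[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            result = strip_outer_parens(tag[:i])
            argument = strip_outer_parens(tag[i + 1:])
            return result, ch, argument
    return tag, None, None


# Raw strings keep the backslash in the tag literal.
transitive = split_category(r"(S\NP)/NP")   # result S\NP, slash /, argument NP
intransitive = split_category(r"S\NP")      # result S, slash \, argument NP
```

Under this reading, a transitive verb takes an NP to yield the intransitive-verb category 'S\NP', which in turn takes an NP to yield 'S'.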
##frequent
- indicates whether the expression is frequent in English
- frequency is based on annotators' judgments
- value is 1 if frequent, 0 if infrequent
- only nouns, determiners, and adverbs are consistently marked for frequency
- some verbs may be marked for frequency
- many irregular plural nouns are marked as infrequent (e.g., 'radii')
- indicates whether an expression is singular or not
- value is 1 if singular, 0 if plural, blank if the expression is not a noun
- only nouns are marked for singular or plural
- mass nouns are marked as sg because they take singular agreement with the verbs they are subjects of
- expressions like 'Galileo' and 'turtle' are marked singular
- indicates whether an expression is plural or not
- value is 1 if plural, 0 if singular, blank if the expression is not a noun
- only nouns are marked for singular or plural
- expressions like 'the Clintons' and 'turtles' are marked plural
- indicates whether an expression is a mass noun or not
- value is 1 if it's a mass noun, 0 if it's a count noun, and blank if the expression is not a noun
- only nouns are marked for whether they're mass
- expressions like 'science' and 'ice cream' are marked as mass nouns
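
The 1 / 0 / blank conventions for the singular, plural, and mass columns imply a few consistency rules: the three columns are blank together for non-nouns, the singular and plural values are complementary, and a mass noun is also marked singular. The sketch below checks one row (as produced by `csv.DictReader`) against those rules. It is a minimal illustration, not a tool shipped with the data: the column names 'sg', 'pl', and 'mass' are assumptions and should be replaced with the actual headings in vocabulary.csv, and the function name `check_number_flags` is hypothetical.

```python
def check_number_flags(row, sg="sg", pl="pl", mass="mass"):
    """Return a list of problems found in one row's number columns.

    `row` is a dict mapping column names to string values. The default
    column names are assumptions, not confirmed headings.
    """
    problems = []
    values = {name: (row.get(name) or "").strip() for name in (sg, pl, mass)}

    # Each column should hold "1", "0", or nothing.
    for name, value in values.items():
        if value not in ("", "0", "1"):
            problems.append(f"{name}: unexpected value {value!r}")
    if problems:
        return problems

    blanks = [value == "" for value in values.values()]
    if all(blanks):
        return problems  # not a noun: nothing more to check
    if any(blanks):
        problems.append("number columns should be all blank or all filled")
        return problems

    # Singular and plural are defined as opposites of each other.
    if values[sg] == values[pl]:
        problems.append("singular and plural flags should be complementary")
    # Mass nouns are marked singular because they take singular agreement.
    if values[mass] == "1" and values[sg] != "1":
        problems.append("mass nouns should be marked singular")
    return problems


# Rows built from the examples in the text ('turtle', 'turtles', 'science').
turtle = {"sg": "1", "pl": "0", "mass": "0"}
turtles = {"sg": "0", "pl": "1", "mass": "0"}
science = {"sg": "1", "pl": "0", "mass": "1"}
for row in (turtle, turtles, science):
    print(check_number_flags(row))
```

To run the check over the whole file, each row from `csv.DictReader(open("vocabulary.csv", newline=""))` can be passed to the function once the column names have been adjusted.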