CGELBank Statistics

Analyzing 43 files:

Overview

  • Trees: 294
  • Nodes: 14277
  • Lexical Nodes: 5168 (17.6/tree)
  • Lexical Insertions (nodes where surface string is empty due to typo): 5
  • Gaps: 216
  • Punctuation Tokens: 539
  • Avg Tree Depth: 12.6
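
The per-tree figure in the overview is a plain ratio of the counts listed above. A minimal sketch that recomputes it follows; the variable names are illustrative and are not taken from the CGELBank scripts, and the nodes-per-tree value is derived here rather than reported.

```python
# Counts copied from the Overview list.
trees = 294
nodes = 14277
lexical_nodes = 5168

# Average number of lexical nodes per tree, rounded to one decimal as in the report.
lexical_per_tree = round(lexical_nodes / trees, 1)

# Average number of nodes of any kind per tree (derived here, not in the report).
nodes_per_tree = round(nodes / trees, 1)

print(f"lexical nodes per tree: {lexical_per_tree}")
print(f"nodes per tree: {nodes_per_tree}")
```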

POS categories

POS count
N 1340
P 688
V 661
D 586
N_pro 496
V_aux 426
Adj 338
Adv 255
Coordinator 189
Sdr 181
Int 8
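
As a consistency check, the POS counts above add up to the number of lexical nodes given in the Overview (5168). The sketch below recomputes that total and each category's share; the dictionary is hand-copied from the table and the names are illustrative.

```python
# POS counts copied from the table above.
pos_counts = {
    "N": 1340, "P": 688, "V": 661, "D": 586, "N_pro": 496, "V_aux": 426,
    "Adj": 338, "Adv": 255, "Coordinator": 189, "Sdr": 181, "Int": 8,
}

total_lexical = sum(pos_counts.values())

# Share of each POS among all lexical nodes, as a percentage.
pos_share = {pos: round(100 * n / total_lexical, 1) for pos, n in pos_counts.items()}

print(total_lexical)
for pos, share in pos_share.items():
    print(f"{pos}\t{share}%")
```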

Lemmas occurring >=5 times, by categories the lemma appears in

  • {D}: a all another any enough no some something the this two
  • {V, V_aux}: be do have
  • {P}: about after at because between by from here in now of on out over than then through up when with
  • {D, Sdr}: that
  • {N}: $ Bush company course day food home horse money people place point time year
  • {N_pro}: he it my she they we who you
  • {V_aux}: can could may should will would
  • {V}: come find get give go know leave make need require say see take tell think use want
  • {Adv}: also even how just not really why
  • {N, V}: call charge file help issue look start talk try work
  • {Adj, D, N_pro}: what
  • {N_pro, P}: there
  • {P, Sdr}: for if to
  • {N, N_pro}: I
  • {Adj, Adv}: only very
  • {Adj}: different first good great new
  • {Coordinator}: / and but or
  • {Adj, Adv, Int}: well
  • {Adv, Coordinator, Int, P}: so
  • {Adv, D}: more most
  • {D, N, N_pro}: one
  • {Adj, V}: own
  • {Adv, N, V}: right
  • {Adj, P, V}: like
  • {Adv, P}: as
  • {D, N_pro}: which
  • {N, P}: back
  • {Adj, N, V}: deal

All lexemes of closed-class categories

  • D: 1, 10, 11, 11,000, 11780, 12, 120, 14000, 15, 1584, 16, 2, 2.3, 20, 200, 200000, 20000000, 2017, 21, 22, 24, 28, 3.7, 30, 300, 4, 45, 5, 500, 53, 600, 90, a, a few, a little, all, an, another, any, anybody, anyone, anything, anywhere, billion, both, each, enough, every, everybody, everyone, everything, fourteen, hundred, least, many, many a, million, more, most, much, neither, no, no one, none, once, one, several, some, someone, something, somewhere, that, the, these, this, those, three, two, what, which
  • N_pro: I, he, him, his, it, its, me, my, one, our, she, their, them, there, they, tomorrow, us, we, what, which, who, whom, whose, yesterday, you, you all
  • V_aux: are, be, can, cannot, could, do, have, may, might, must, should, will, would
  • P: @, Into, Like, a.m., about, above, according, after, against, along, alongside, around, as, aside, at, away, back, because, before, behind, below, between, by, considering, coupled, despite, down, due, during, else, except, for, forward, from, here, if, in, in order, including, inside, into, irrespective, like, near, next, now, of, off, on, onboard, once, onto, opposite, out, outside, over, past, per, plus, regarding, sideways, since, so, so long as, than, then, there, through, throughout, to, toward, towards, under, unless, until, up, upon, upstairs, when, where, while, with, within, without
  • Sdr: for, if, that, to, whether
  • Coordinator: &, -, /, and, but, etc, etc., nor, or, plus, so, v

Nonterminal categories

category count
Nom 2109
NP 1725
VP 1497
Clause 1146
PP 702
DP 584
AdjP 383
AdvP 261
GAP 216
Clause_rel 200
Coordination 184
N@flat 58
PP_strand 10
NP+PP 9
IntP 8
NP+Clause 6
D@flat 4
NP+AdvP 3
AdjP+PP 3
NP+AdjP 1
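
The nonterminal counts above, taken together with the lexical node count from the Overview, account for every node in the treebank. A short sketch of that check, with the values hand-copied from this report:

```python
# Nonterminal category counts copied from the table above.
nonterminal_counts = {
    "Nom": 2109, "NP": 1725, "VP": 1497, "Clause": 1146, "PP": 702,
    "DP": 584, "AdjP": 383, "AdvP": 261, "GAP": 216, "Clause_rel": 200,
    "Coordination": 184, "N@flat": 58, "PP_strand": 10, "NP+PP": 9,
    "IntP": 8, "NP+Clause": 6, "D@flat": 4, "NP+AdvP": 3, "AdjP+PP": 3,
    "NP+AdjP": 1,
}

lexical_nodes = 5168   # "Lexical Nodes" in the Overview
reported_nodes = 14277  # "Nodes" in the Overview
reported_gaps = 216     # "Gaps" in the Overview

nonterminal_total = sum(nonterminal_counts.values())

print(nonterminal_total)
print(nonterminal_total + lexical_nodes == reported_nodes)
print(nonterminal_counts["GAP"] == reported_gaps)
```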

Functions

function count
Head 8515
Mod 1170
Comp 817
Obj 770
Det 579
Subj 555
Coordinate 382
Marker 369
(root) 294
PredComp 187
Supplement 165
Flat 129
Prenucleus 103
Det-Head 89
Postnucleus 25
Head-Prenucleus 18
Comp_ind 16
Particle 15
DisplacedSubj 15
Obj_dir 13
Obj_ind 13
ExtraposedSubj 12
Mod-Head 8
Compounding 5
Vocative 4
Obj+Mod 4
Marker-Head 3
Obj+PredComp/Comp 1
ExtraposedObj 1
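
The function counts sum to the total node count (14277), which is consistent with each node carrying exactly one function label, and the `(root)` count matches the number of trees (294). The sketch below verifies both; the dictionary is hand-copied from the table above.

```python
# Function counts copied from the table above.
function_counts = {
    "Head": 8515, "Mod": 1170, "Comp": 817, "Obj": 770, "Det": 579,
    "Subj": 555, "Coordinate": 382, "Marker": 369, "(root)": 294,
    "PredComp": 187, "Supplement": 165, "Flat": 129, "Prenucleus": 103,
    "Det-Head": 89, "Postnucleus": 25, "Head-Prenucleus": 18,
    "Comp_ind": 16, "Particle": 15, "DisplacedSubj": 15, "Obj_dir": 13,
    "Obj_ind": 13, "ExtraposedSubj": 12, "Mod-Head": 8, "Compounding": 5,
    "Vocative": 4, "Obj+Mod": 4, "Marker-Head": 3, "Obj+PredComp/Comp": 1,
    "ExtraposedObj": 1,
}

reported_nodes = 14277  # "Nodes" in the Overview
reported_trees = 294    # "Trees" in the Overview

function_total = sum(function_counts.values())

# Share of nodes that are heads of their parent, as a percentage.
head_share = round(100 * function_counts["Head"] / function_total, 1)

print(function_total == reported_nodes)
print(function_counts["(root)"] == reported_trees)
print(f"Head share: {head_share}%")
```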

High Valencies (ternary+, omitting Supplements and Coordinations)

valency count
(VP :Head V :Obj NP :Comp PP) 24
(VP :Head V :Obj NP :Comp Clause) 20
(VP :Head V :Obj_ind NP :Obj_dir NP) 10
(VP :Head V :Comp PP :Comp PP) 7
(VP :Head V :Particle PP :Obj NP) 4
(VP :Head V :Particle PP :Comp PP) 3
(VP :Head V :Obj NP :PredComp AdjP) 3
(VP :Head V :Obj NP :Particle PP) 2
(VP :Head V :Comp PP :Comp Clause) 2
(VP :Head V :Obj NP :Comp Coordination) 2
(VP :Head V :Obj NP :Comp PP_strand) 2
(VP :Head V :Obj GAP :Comp PP) 2
(VP :Head V :Obj_dir GAP :Obj_ind NP) 1
(VP :Head V :Obj NP :PredComp Coordination) 1
(VP :Head V :Particle PP :Obj GAP) 1
(VP :Head V :Obj GAP :Comp Clause) 1
(VP :Head V_aux :DisplacedSubj NP :PredComp AdjP) 1
(VP :Head V :Obj_ind NP :Obj_dir NP :Comp Clause) 1
(VP :Head V :Comp PP :PredComp PP) 1
(VP :Head V :Obj NP :Comp PP :Comp Clause) 1
(VP :Head V :Obj Coordination :Comp Clause) 1
(VP :Head V :Particle PP :Obj NP :Comp PP) 1
(VP :Head V :Particle PP :Obj Coordination) 1
(VP :Head V :Obj NP :Particle PP :Comp PP) 1
(VP :Head V :Obj NP :Comp GAP) 1
(VP :Head V :Comp GAP :Comp PP) 1
(VP :Head V :Comp PP :Comp PP_strand) 1
(VP :Head V :Comp GAP :PredComp AdjP) 1
(VP :Head V_aux :DisplacedSubj NP :PredComp PP) 1
(VP :Head V :Obj_ind GAP :Obj_dir NP) 1
(VP :Head V :Obj NP :PredComp GAP) 1
(VP :Head V_aux :DisplacedSubj NP :Comp PP) 1
(VP :Head V :Obj NP :PredComp Clause) 1
(VP :Head V :Particle PP :Comp Clause) 1
(PP :Head P :Obj NP :Comp PP) 1
(VP :Head V :Comp PP_strand :Comp PP_strand) 1
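
Each valency pattern above is a bracketed string: a parent category followed by `:Function Category` pairs. A minimal sketch of how such a string could be split into its parts follows; `parse_valency` is a hypothetical helper written for this illustration and is not part of the CGELBank code.

```python
from typing import List, Tuple


def parse_valency(pattern: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split '(VP :Head V :Obj NP)' into ('VP', [('Head', 'V'), ('Obj', 'NP')])."""
    tokens = pattern.strip().strip("()").split()
    if not tokens:
        raise ValueError("empty valency pattern")
    parent, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired function/category in: {pattern!r}")
    children = []
    # Tokens alternate between ':Function' and the child's category.
    for func, cat in zip(rest[0::2], rest[1::2]):
        if not func.startswith(":"):
            raise ValueError(f"expected ':Function', got {func!r}")
        children.append((func[1:], cat))
    return parent, children


parent, children = parse_valency("(VP :Head V :Obj_ind NP :Obj_dir NP)")
print(parent)
print(children)
# "Ternary+" in the heading corresponds to three or more children, head included.
print(len(children) >= 3)
```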

Nonlexical Categories by Function (excluding nonce categories)

Nom NP VP Clause PP DP AdjP AdvP GAP Clause_rel Coordination PP_strand IntP
Head 1855 72 1409 227 23 1 41 4 32 85 54
Mod 206 28 21 36 277 26 209 221 35 91 20
Comp 2 480 289 4 1 18 12 10
Obj 683 57 30
Det 109 4 466
Subj 494 4 1 52 4
Coordinate 46 99 59 110 4 2 37 6 3 4
(root) 1 22 1 227 2 41
PredComp 65 9 13 72 20 8
Supplement 44 2 24 47 17 18 4 8
Det-Head 2 87
Prenucleus 39 1 2 14 10 10
Postnucleus 1 10 5 5 3 1
Head-Prenucleus 14 1 1 2
Comp_ind 9 7
DisplacedSubj 14 1
Particle 15
Obj_dir 12 1
Obj_ind 12 1
ExtraposedSubj 12
Mod-Head 1 1 6
Compounding 3 1 1
Obj+Mod 4
Vocative 4
Marker 2
ExtraposedObj 1
Obj+PredComp/Comp 1

Parent Categories by Function (excluding nonce categories and root)

VP Nom NP Clause PP DP AdjP Clause_rel Coordination AdvP N PP_strand D IntP
Head 1496 1991 1725 1146 701 584 382 200 3 259 10 8
Mod 295 656 31 55 23 11 67 2 21
Comp 555 108 1 1 120 1 28
Obj 286 464 1 9
Det 578 1
Subj 448 104
Coordinate 382
Marker 135 20 42 109 3 1 16 29 3 4
PredComp 176 9
Supplement 11 7 61 72 3 2 3 6
Flat 121 8
Prenucleus 66 37
Det-Head 89
Postnucleus 6 2 1 15 1
Head-Prenucleus 18
Comp_ind 7 1 1 6 1
Particle 15
DisplacedSubj 15
Obj_dir 13
Obj_ind 13
ExtraposedSubj 12
Mod-Head 8
Compounding 5
Vocative 1 3
Obj+Mod 4
Marker-Head 1 2
Obj+PredComp/Comp 1
ExtraposedObj 1