Retagging Joyce’s dialogue dash #9
So, I'm not sure if I see a big difference between My question is the following: why is speech kept within the preceding paragraph? For instance, <p><lb n="100026"/>Father Conmee was very glad to see the wife of Mr David Sheehy
<lb n="100027"/>M. P. looking so well and he begged to be remembered to Mr David Sheehy
<lb n="100028"/>M. P. Yes, he would certainly call.
<lb n="100029"/><said>--</said>Good afternoon, Mrs Sheehy.</p> I know that in the Gabler edition, there is no indentation for quoted speech (though there is in the Random House text), but I think of that as a sort of print convention of spacing, not an indication of a paragraph. New speakers mean new paragraphs; am I wrong? I would want the above example to look like this: <p><lb n="100026"/>Father Conmee was very glad to see the wife of Mr David Sheehy
<lb n="100027"/>M. P. looking so well and he begged to be remembered to Mr David Sheehy
<lb n="100028"/>M. P. Yes, he would certainly call.</p>
<lb n="100029"/><p><said>―Good afternoon, Mrs Sheehy.</said></p> In part I ask, because, given the existing markup, it seems like you could programmatically replace a So, two questions I guess:
|
I'm in favor of converting the double hyphens to quotation dashes. I'll go ahead and do that, since that should be an easy <said>— Il fait beau,</said> dit Robert. I like this syntax best. Even better when it has the That's an interesting question as to whether quoted lines are the beginning of their own paragraphs. I'm not sure I know the answer. For typographical purposes, at least, I think we're fine leaving it as-is, since we can always have an XSLT rule that makes every line beginning with a quotation dash flush left. More philosophically, there are some paragraphs, around line 200 in Wandering Rocks, for instance, that end with colons and are followed by quoted lines, which suggests some kind of paragraph-like continuity between the indented block and the quoted line. But that's about as much as I can come up with. |
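For reference, the global change from double hyphens to quotation dashes could be done along these lines. This is only a minimal sketch, assuming the inherited marker always appears literally as <said>--</said>; the function name retag_dialogue_dash is made up for illustration and is not part of the project.

```python
import re

# U+2015 HORIZONTAL BAR, the quotation dash proposed in this issue.
QUOTATION_DASH = "\u2015"

# Assumption: the inherited marker is always exactly <said>--</said>.
DOUBLE_HYPHEN_MARKER = re.compile(r"<said>--</said>")


def retag_dialogue_dash(xml_text: str) -> str:
    """Swap the double-hyphen marker for a quotation dash, leaving other hyphens alone."""
    return DOUBLE_HYPHEN_MARKER.sub(f"<said>{QUOTATION_DASH}</said>", xml_text)


sample = '<lb n="100029"/><said>--</said>Good afternoon, Mrs Sheehy.</p>'
print(retag_dialogue_dash(sample))
```

Matching the whole <said>--</said> element rather than every "--" keeps the replacement away from any double hyphens that are not dialogue markers.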
Direct speech would typically be assigned its own Hans’s sense (he writes in an email to me) is that, somewhere between A Portrait and Ulysses, Joyce realised that any marking used to bracket speech – such as the opening, intermediary and final dashes of the Dubliners manuscript – created the illusion of spoken words as existing outside the narrative and, for Joyce, this impression did not square with his increasingly sharpened sense of narrative. Instead, he shifted to the opening dash only, placed moreover in the left margin, to signal speech as integral to the narrative. Hans writes:
|
Thanks for making that global change, Jonathan. I think the syntax of the French example (†) looks very neat with, preferably, a † Minus the space between the quotation dash and the first word of direct speech. A colleague who’s into text mining suggested that nesting the quotation dash would simplify operations for his analysis purposes (so that a narrator’s “someone” and a spoken “―Someone” are not artificially distinguished), but maybe that’s something we just mention in the documentation (“Snip off quotation dashes”) rather than mark up explicitly throughout the corpus? That said, I’m in favour of retaining the <lb n="010008"/><said>―</said>Come up, Kinch! Come up, you fearful jesuit!</p> This example is easy but the task, more generally, might have to be crowdsourced. Although I wonder if we were to compile a dictionary of utterance markers (“said”; “cried”; “murmured” on p. 1 alone) would that help us to automatically detect the position of a closing A punctuation mark in close proximity to an utterance marker means a return to third-person narration. Any material following a full-stop in the third-person narration indicates resumed direct speech. |
Or, when there’s a cluster of <lb n="080202"/><said>―</said>O, Mr Bloom, how do you do?
<lb n="080203"/><said>―</said>O, how do you do, Mrs Breen?
<lb n="080204"/><said>―</said>No use complaining. How is Molly those times? Haven't seen her for
<lb n="080205"/>ages.
<lb n="080206"/><said>―</said>In the pink, Mr Bloom said gaily. Milly has a position down in Mullingar,
<lb n="080207"/>you know.
<lb n="080208"/><said>―</said>Go away! Isn't that grand for her?
<lb n="080209"/><said>―</said>Yes. In a photographer's there. Getting on like a house on fire. How are
<lb n="080210"/>all your charges? |
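The utterance-marker idea described above might be prototyped roughly as follows. It is a sketch under several assumptions: each utterance has already been gathered onto a single string (so <lb/> milestones are ignored), the marker list holds only the three verbs mentioned in this thread, and the name retag_utterance is hypothetical. Speech that itself contains a marker word, or abbreviations with full stops, will trip it up, so the output would still need checking by hand.

```python
import re

DASH = "\u2015"  # quotation dash (U+2015)
MARKER_PREFIX = f"<said>{DASH}</said>"

# Starter dictionary of utterance markers, taken from this thread; extend as needed.
UTTERANCE_MARKERS = ("said", "cried", "murmured")
_markers = "|".join(UTTERANCE_MARKERS)

# speech:    shortest run ending in punctuation that precedes a narration clause
# narration: a comma-free clause containing an utterance marker, closed by a full stop
# rest:      anything after that full stop, treated as resumed direct speech
SPLIT = re.compile(
    rf"^(?P<speech>.*?[,?!.])\s+"
    rf"(?P<narration>[^.?!,]*?\b(?:{_markers})\b[^.?!]*\.)"
    rf"(?:\s+(?P<rest>.+))?$"
)


def retag_utterance(line: str) -> str:
    """Move the closing </said> so that it wraps speech rather than only the dash."""
    if not line.startswith(MARKER_PREFIX):
        return line  # not a dialogue line; leave untouched
    body = line[len(MARKER_PREFIX):]
    match = SPLIT.match(body)
    if match is None:
        # No utterance marker found: treat the whole line as direct speech.
        return f"<said>{DASH}{body}</said>"
    out = f"<said>{DASH}{match['speech']}</said> {match['narration']}"
    if match["rest"]:
        out += f" <said>{match['rest']}</said>"
    return out


print(retag_utterance(MARKER_PREFIX + "O, no, Mr Bloom said. Poor Dignam, you know. The funeral is today."))
```

For a cluster of unattributed lines like the one above, no marker is found and each line is wrapped whole, which is the easy case.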
The idea of compiling some list of utterance markers to automatically detect where to put It has made me sensitive to a nesting problem related to the question of de-paragraphing. Consider this example from "Scylla and Charybdis." <p><lb n="090054"/>A. E. has been telling some yankee interviewer. Wall, tarnation strike
<lb n="090055"/>me!
<lb n="090056"/><said>--</said>The schoolmen were schoolboys first, Stephen said superpolitely.
<lb n="090057"/>Aristotle was once Plato's schoolboy.
<lb n="090058"/><said>--</said>And has remained so, one should hope, John Eglinton sedately said. One
<lb n="090059"/>can see him, a model schoolboy with his diploma under his arm.</p> How to best encode this? "De-paragraphed" (if I am understanding it) would look like this. <p><lb n="090054"/>A. E. has been telling some yankee interviewer. Wall, tarnation strike
<lb n="090055"/>me!
<lb n="090056"/><said>--The schoolmen were schoolboys first,</said> Stephen said superpolitely.
<said><lb n="090057"/>Aristotle was once Plato's schoolboy.</said>
<lb n="090058"/><said>--And has remained so, one should hope,</said> John Eglinton sedately said. <said>One
<lb n="090059"/>can see him, a model schoolboy with his diploma under his arm.</said></p> And that works--the narrative voice gets placed outside the If you tried to put |
Hi Chris, The larger unit would look like this, if it’s any help: <lb n="090046"/><said>―All these questions are purely academic,</said> Russell oracled out of his
<lb n="090047"/>shadow. <said>I mean, whether Hamlet is Shakespeare or James I or Essex.
[...]
<lb n="090053"/>ideas. All the rest is the speculation of schoolboys for schoolboys.</said></p>
<p><lb n="090054"/>A. E. has been telling some yankee interviewer. Wall, tarnation strike
<lb n="090055"/>me!
<lb n="090056"/><said>―The schoolmen were schoolboys first,</said> Stephen said superpolitely. |
Apologies if I'm unclear. This isn't supposed to be an interesting example; I'm trying to go for utterly typical. Above in this thread I had wondered about re-paragraphing the deparagraphed speech. But were one to do that how would you mark it up? I.e. if, as is typical, direct speech is given its own paragraph, how would that be marked up? Same passage as before, now trying to provide each instance of direct speech its own <p><lb n="090054"/>A. E. has been telling some yankee interviewer. Wall, tarnation strike
<lb n="090055"/>me!</p>
<lb n="090056"/><said><p>--The schoolmen were schoolboys first,</p></said> Stephen said superpolitely. So how would one mark up the narrative voice There is no problem with the "de-paragraphed" speech, where multiple instances of speech can be wrapped within a single paragraph. Perhaps the moral here is simply to leave speech "de-paragraphed." But am I missing an obvious alternative? |
But what would be gained by giving each instance of direct speech its own <p><lb n="090054"/>A. E. has been telling some yankee interviewer. Wall, tarnation strike
<lb n="090055"/>me!</p>
<p><lb n="090056"/><said>--The schoolmen were schoolboys first,</said> Stephen said superpolitely.</p> |
OK; I'm not sure why I was getting confused... apologies for the derailment. So I guess my question boils down to: preserve de-paragraphing or no? I think I prefer the original (un-deparagraphed) markup for speech, though my reasons are simply that it seems more consistent with other novels' (typographical) representation of speech. (And, I would argue, it is closer to what someone from outside this project coming to this markup would expect: principle of least surprise.) If there is a decision on that, I'll try to tackle marking up speech in an episode or two. Given where people's efforts are right now, I think that would be a way to contribute without operating at cross purposes to you and Jonathan. (I played around last night, and I think with some carefully designed regex, it will be possible to fix the markup of speech manually, but quickly.) |
Is this something that could be split into two separate tasks, Chris? Because while I’d love for the direct speech to be marked up – a huge task – I’d really rather the </p>
<p><lb n=xxx/><said>― I get your point about contributor / user expectations and about novelistic conventions, but Joyce is clearly doing something unusual with the convention for representing direct speech. Or at least that’s how Gabler saw it. His edition moved to flush left all instances of direct speech, a decision that departs from all (I think) previous editions of Ulysses and from the vast majority of novels, but which is consistent with / closer to the fair copies of the episodes. This isn’t the place for a discussion of early-twentieth-century norms and expectations for seeing a fair copy through the publication process because, at least for the moment, my sense is that we are working towards a TEI XML version of the Gabler Ulysses. So I’d like our encoding to be done in the spirit of that early digital edition, precisely because it means we now get to tackle aspects of the edition that couldn’t be realised in the seventies and eighties. |
Yes. Absolutely. I have no strong opinion here on how to nest paragraph and If I get a moment, I may try to tackle an episode--perhaps I'll try "Telemachus"--and share what I come up with via PR. |
Terrific! I’ll take a shot at “Nestor” then over the weekend. I’ll also tackle some of the Where would be the best place to store a list of values/speaker names? A comment on this thread that we would just continually edit/augment as the list grew? |
@who next. Would welcome commentary / feedback on the decisions I made at lines 101–108.
I just did the
We had already marked up the quoted direct speech using a <lb n="060146"/><said>―</said>Immense, Martin Cunningham said pompously. <said who="Tom Kernan">His singing of that simple
<lb n="060147"/>ballad, Martin, is the most trenchant rendering I ever heard in the whole
<lb n="060148"/>course of my experience.</said>
<lb n="060149"/><said>―</said><said who="Tom Kernan">Trenchant</said>, Mr Power said laughing. He's dead nuts on that. And the
<lb n="060150"/><said who="Tom Kernan">retrospective arrangement</said>.
I’ve just redone the tagging for each speaker (Martin C. and Mr Power) and now we have this: <lb n="060146"/><said>―Immense,</said> Martin Cunningham said pompously. <said><said who="Tom Kernan">His singing of that simple
<lb n="060147"/>ballad, Martin, is the most trenchant rendering I ever heard in the whole
<lb n="060148"/>course of my experience.</said></said>
<lb n="060149"/><said>―<said who="Tom Kernan">Trenchant</said>,</said> Mr Power said laughing. <said>He's dead nuts on that. And the
<lb n="060150"/><said who="Tom Kernan">retrospective arrangement</said>.</said> In other words, direct speech quoted by another speaker (and italicized in the text) now appears inside a double |
Will someone check the <said> encoding at U 16.1270? And I’m not sure about the newspaper at 16.1248 ff.
Regarding direct speech quoted by another speaker; are you only marking it up when it is italicized? I think that might be defensible. Other cases, without a typographic marker, seem potentially open to differing interpretations of whether or not the character is being quoted. |
Yes, I think so. We started tackling this phenomenon as part of the |
Following Chris’s lead, I’ve been shifting the closing <lb n="050090"/><said>―</said>Is there any ... no trouble I hope? I see you're ...
<lb n="050091"/><said>―</said>O, no, Mr Bloom said. Poor Dignam, you know. The funeral is today. becomes <lb n="050090"/><said>―Is there any ... no trouble I hope? I see you're ...</said>
<lb n="050091"/><said>―O, no,</said> Mr Bloom said. <said>Poor Dignam, you know. The funeral is today.</said> So far it’s been pretty straightforward if time consuming. I wonder is there any way to impose quality control on the results save by way of line-by-line or spot checking? Is it possible in GitHub for another user to revert to the earlier |
I didn’t do any of the <lb n="050090"/><said who="#jb">―Is there any ... no trouble I hope? I see you're ...</said>
<lb n="050091"/><said who="#lb">―O, no,</said> Mr Bloom said. <said who="#lb">Poor Dignam, you know. The funeral is today.</said> and in the header we would have something like: <listPerson>
<person xml:id="lb">
<persName>Leopold Bloom</persName>
</person>
<person xml:id="jb">
<persName>Josie Breen</persName>
</person>
</listPerson> And so on. Or is Q. What about ambiguity? It’s not always clear exactly who is speaking in scenes like the carriage-ride of “Hades.” |
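To show what the header declarations would buy us, here is a rough sketch of pulling each character's direct speech out of a file encoded this way. It assumes a simplified fragment with no TEI namespace (a real file would need namespace-qualified tag names), treats @who as a whitespace-separated list of #id pointers, and snips off the leading quotation dash as suggested earlier in the thread. The function name speech_by_speaker and the wrapping <text> element are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
DASH = "\u2015"

SAMPLE = """<text>
<listPerson>
<person xml:id="lb"><persName>Leopold Bloom</persName></person>
<person xml:id="jb"><persName>Josie Breen</persName></person>
</listPerson>
<p><lb n="050090"/><said who="#jb">\u2015Is there any ... no trouble I hope? I see you're ...</said>
<lb n="050091"/><said who="#lb">\u2015O, no,</said> Mr Bloom said. <said who="#lb">Poor Dignam, you know. The funeral is today.</said></p>
</text>"""


def speech_by_speaker(root):
    """Group the text of every <said who="..."> under the speaker's declared name."""
    names = {
        person.get(XML_ID): person.findtext("persName")
        for person in root.iter("person")
    }
    grouped = defaultdict(list)
    for said in root.iter("said"):
        who = said.get("who")
        if who is None:
            continue  # unattributed speech is skipped in this sketch
        text = "".join(said.itertext()).lstrip(DASH).strip()
        for pointer in who.split():
            # Fall back to the raw pointer if no matching <person> is declared.
            grouped[names.get(pointer.lstrip("#"), pointer)].append(text)
    return dict(grouped)


for speaker, lines in speech_by_speaker(ET.fromstring(SAMPLE)).items():
    print(speaker, lines)
```

One thing the sketch makes visible: because @who is split on whitespace, a full name such as "Tom Kernan" would be read as two separate values, which is an argument for moving to IDs eventually.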
A certain amount of the students’ dialogue is unattributed to specific, named speakers. I grouped these as @who="unidentified student".
There are a few unclears in the dialogue attribution. In each case, the candidates are only one or two speakers in the carriage. Is it possible to submit multiple values for @who and distinguish an instance of ambiguity from one of chorus?
Declaring characters in the header is a great idea. I love the idea of writing out all we can about them, too. That'll make it really easy to extract all the dialogue from female characters and from male characters, and to run analyses that look at patterns in their respective speech. There's something to be said for waiting on the XML IDs, though, for the moment, and just using the full names, since it's more human readable. If we invite more contributions from elsewhere (especially from people without much XML experience), it could be useful to make these dialogue attributions clear. As for ambiguity, the TEI docs have a great page about encoding certainty that could be helpful. I imagine that the most pertinent style is something like this: I have a <emph xml:id="CE-P3">bun</emph>.
<certainty target="#CE-P3" locus="value" assertedValue="gun" degree="0.8">
<desc>a gun makes more sense in a holdup</desc>
</certainty> But in our case it'd be a <lb n="060004"/><said xml:id="s060004-a" who="Cunningham">―Come on, Simon.
<certainty target="#s060004-a" match="@who" locus="value" assertedValue="Power" degree="0.5">
<desc>It's unclear here whether it's Cunningham or Power speaking.</desc>
</certainty>
</said> This is nice, since it can keep track of marginal uncertainties ( However, if this is too complicated, we could also just use GitHub issues to track uncertainties, or put them all in a separate file, called edge-cases.md, for instance. |
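If the certainty route is taken, the uncertain attributions could later be listed automatically instead of being tracked by hand. The following is a minimal sketch, assuming a namespace-free fragment and the attribute usage shown in the example above; the function name uncertain_attributions is hypothetical. The sample uses an ID that begins with a letter, since xml:id values have to be valid XML names and so cannot start with a digit.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<p><lb n="060004"/><said xml:id="s060004-a" who="Cunningham">\u2015Come on, Simon.
<certainty target="#s060004-a" match="@who" locus="value" assertedValue="Power" degree="0.5">
<desc>It's unclear here whether it's Cunningham or Power speaking.</desc>
</certainty>
</said></p>"""


def uncertain_attributions(root):
    """List each <certainty> that questions a @who value, with the encoded and alternative speaker."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    report = []
    for cert in root.iter("certainty"):
        if cert.get("match") != "@who":
            continue  # only speaker attributions are of interest here
        target_id = cert.get("target", "").lstrip("#")
        target = by_id.get(target_id)
        if target is None:
            continue  # dangling pointer; a fuller check might flag this
        report.append({
            "id": target_id,
            "encoded": target.get("who"),
            "alternative": cert.get("assertedValue"),
            "degree": float(cert.get("degree", "nan")),
        })
    return report


print(uncertain_attributions(ET.fromstring(SAMPLE)))
```

The same report could feed an edge-cases.md file if we decide to keep the uncertainties outside the XML after all.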
Having character profiles sitting somewhere will make for very interesting analysis. Bloom’s direct speech in “Eumaeus,” for example, is so different from anything else he says in the novel. I’d love to see that tackled properly. I’d also be really interested just to see the raw balance of dialogue between Bloom and Stephen. For now, though, let’s continue with character names as I like that solution for speaker ambiguity, Jonathan. What happens when there are more than two potential values, can I ask? For example, I’ve <p><lb n="060115"/>The carriage halted short.
<lb n="060116"/><said who="unclear">―What's wrong?</said>
<lb n="060117"/><said who="unclear">―We're stopped.</said>
<lb n="060118"/><said who="unclear">―Where are we?</said></p>
<p><lb n="060119"/>Mr Bloom put his head out of the window.
<lb n="060120"/><said who="Leopold Bloom">―The grand canal,</said> he said.</p> If we presume Bloom says none of these unattributed utterances, how best would we capture the ambiguity of the whole exchange? For example, if Cunningham says “What's wrong?” he hardly answers himself with “We're stopped”. Could a tagging encompass the three lines U 6.116–118 and say this is a conversation between two, perhaps three, men from the group Cunningham, Power, and Dedalus? (I presume Bloom is not speaking here, but maybe that’s my subjective reading?) For “Nestor”, I created a character called |
I was just looking at that incredible moment in “Proteus” when Stephen contemplates visiting his Aunt Sara and dreams up an entire imaginary conversation between his uncle Richie Goulding and himself (with guest vocals from cousin Walter). Here’s some of it:
Despite the quotation dash and the multiple speaking parts, none of this conversation actually happens outside of Stephen’s mind. There’s something similar in “Lestrygonians” when Bloom imagines an exchange between a “[h]otblooded young student” and a maid-turned-informer:
The Why it might be worth tackling/tagging this phenomenon is if we ever try to mark up the intrusion of other voices into interior monologue. For example, right before the imagined conversation in “Proteus,” Stephen thinks of “[m]y consubstantial father’s voice” and his interior monologue shifts into Simon Dedalus-ese, complete with Simon’s impersonations of his brother-in-law Richie and nephew Walter:
If we ever do get round to marking up interior monologue, we’d want some way of distinguishing when other voices and other characters appear or are quoted/recalled. |
We’ve inherited the following tagging convention for Joyce’s dialogue markers throughout the corpus (episodes 15, 17, and 18 excepted):
<said> is just tag abuse here. Eventually it will be used to tag the direct speech, but that’s likely a task for the crowd.
I propose the non-controversial (?) global changes of:
<said> nesting to be replaced with <q> nesting
So:
The double hyphen currently in use for the dialogue dash is probably a legacy from a time when the character palette was considerably smaller. But it has no place in the corpus now, I don’t think. Instead we should make a global replace with the quotation dash or horizontal bar (Unicode U+2015 or HTML ―).
So neither the hyphen (-), the en dash (–), nor the em dash/tiret (—) but the quotation dash (―).
Q. Will all platforms support the quotation bar? Markdown XML in Chrome has them looking like en dashes. (◔_◔)
Questions and refinements (controversy?) that occur to me:
Do we want to encode the dialogue dash at all?
(I don’t know if <q> can be an empty element.)
Or, when and if we have the direct speech marked up, we might want to omit both the hard-coded dialogue dash and the <q> tagging: