Possible modification suggestion?: Sentence boundary disambiguation, and sentence segmentation (each sentence on a new line) – “period” “space” with “period” "new line” #4
Comments
Thanks for writing in! That's a really interesting idea. Do you know if there are any scientific studies that have been done about the efficiency of sentence boundary disambiguation as compared to other reading aids, or the most effective way to implement it? This isn't something I'm particularly interested in pursuing myself, but that shouldn't stop you! The syntax you suggest looks like it should work perfectly; if you're interested in learning to program, figuring out how to make a tweak like that might be a great first step :). (Thanks for the heads-up on the broken link as well.)
Implementation
The people behind Grammarly (grammar checker software) have a short overview of sentence boundary disambiguation methods. http://tech.grammarly.com/blog/posts/How-to-Split-Sentences.html
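For reference, one of the simplest rule-based approaches treats a period as a sentence end only when the token it ends is not a known abbreviation and the next token starts with a capital letter. Below is a minimal sketch of that rule in JavaScript. The function name `splitSentences` and the tiny abbreviation list are assumptions made up for illustration, not code from Grammarly or from Literally, and a real list would need to be much longer.

```javascript
// Hypothetical, deliberately short abbreviation list (an assumption for this sketch).
const ABBREVIATIONS = new Set(["Ph.D.", "Mt.", "Dr.", "Mr.", "Mrs.", "e.g.", "i.e."]);

// Split text into sentences using the "period + capitalized next token,
// unless the token is a known abbreviation" rule.
function splitSentences(text, abbreviations = ABBREVIATIONS) {
  // Tokenize on whitespace; note that this discards the original spacing.
  const tokens = text.split(/\s+/).filter(Boolean);
  const sentences = [];
  let current = [];

  tokens.forEach((token, i) => {
    current.push(token);
    const next = tokens[i + 1];
    const endsWithPeriod = token.endsWith(".");
    const isAbbreviation = abbreviations.has(token);
    const nextIsCapitalized = next !== undefined && /^[A-Z]/.test(next);

    // Close the sentence at the end of the text or before a capitalized token.
    if (endsWithPeriod && !isAbbreviation && (next === undefined || nextIsCapitalized)) {
      sentences.push(current.join(" "));
      current = [];
    }
  });

  // Keep any trailing text that did not end with a period.
  if (current.length > 0) {
    sentences.push(current.join(" "));
  }
  return sentences;
}

const example = "She climbed Mt. Everest last year. Her Ph.D. thesis covers it. Great.";
console.log(splitSentences(example).join("\n"));
```

Because the sketch tokenizes on whitespace, it normalizes spacing and line breaks, so it would not preserve the formatting of the original text. It also only looks at periods, so question marks and exclamation marks are ignored.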
I’m not sure how these more accurate systems work. I’m guessing that if you were to try to implement one of these systems for quick online reading, it would be more difficult to retain the structure and formatting of the original text. As the sentence boundary disambiguation Wikipedia entry mentions, identifying a period, a capitalized token, and some special abbreviations (e.g. “Ph.D.” or “Mt.” for mountain) allows you to catch 95% of sentences.
Papers
Line length and readability: speed vs. user experience?
In the past, I’ve found references for line length and readability: Dyson and Kipping (1997), Dyson and Haselgrove (2001), Bernard, Fernandez, and Hull (2002), Ling and van Schaik (2006). A YouTube Google presentation on cognitive science also mentioned this.
Eye regressions and backtracking for semantic and syntactic errors?
I haven’t researched hard to find a paper, and scholarly titles are out of my league, but I just came across one possibly relevant paper about a topic that could be tested with sentence segmentation:
Braze D, Shankweiler D, Ni W, Palumbo LC (January 2002). "Readers' eye movements distinguish anomalies of form and content". J Psycholinguist Res 31 (1): 25–44. PMC 2850050. PMID 11924838. http://www.ncbi.nlm.nih.gov/pubmed/11924838
I think that it’s syntax versus semantics.
(examples from Braze, D., Shankweiler, D. P., & Tabor, W. (2004). Individual Differences in Processing Anomalies of Form and Content. Poster presented at the 17th CUNY Conference on Human Sentence Processing. College Park, MD. http://www.haskins.yale.edu/staff/braze/braze-cuny2004-2up.pdf)
Pragmatic anomalies increased reading times more than syntactic errors did. They talk about the eye regression landing sites.
I think that means that if your eyes are going to regress left for a syntax error, it’s going to happen right away at that error. For semantic errors, as you get further away from an error, you’re more likely to backtrack.
If you have to go back after experiencing a semantic error, you’re more likely to land closer to the beginning of a sentence than when you regress after a syntactic anomaly.
Thoughts: a simple test for search time only
So pragmatic anomalies take longer to read, and you regress further back for them than for syntactic errors. For one basic and preliminary test, I think that you could put general comprehension aside.
Counterargument: pragmatic anomalies are not natural – pragmatic anomalies = difficult-to-read material?
I think that a counterargument is that these pragmatic errors are artificially created, and will rarely appear when reading normal text. However, whether it’s pragmatic anomalies, or difficult-to-read material, I think that either could induce a similar confused state.
Counterargument: regressions to the beginning of the currently read sentence aren’t that far
If you need to repeat a sentence after failing to understand it, the beginning isn’t really that far, and you might not be saving that much time.
Other thoughts: backtracking and regression across multiple sentences and paragraphs – thought tangents, and working memory
I would also be interested in backtracking to previous sentences and sections. There’s also the “zoning out” that can occur where you’ve read text, but you weren’t paying attention.
-Reddit comments
Not comprehending a few sentences in a row due to a thought tangent might require a farther regression, and a longer search for a previous point to start. Lastly, while individual sentences might be fine, the overall content might be structured less adequately.
Future experiments: additional factors to manipulate: grammar, length, skill, new material
I think that if there are or will be experiments, additional factors that could create more challenging material, and thus a possibly more confused state, could be:
There could be sentences with multiple qualifiers, prepositions, classes, etc.
People can have very different reading comprehension abilities.
Fresh material with concepts that a user doesn’t already know can be harder to read.
Thanks for the start
I have somewhat of a repetitive strain injury of tendinosis (keep those wrists neutral, especially when you game), so I kind of move at a snail’s pace, but over 99% of the code is already written for me, and I truly believe that this could be useful for some people if it works out, so I should really get around to figuring out how to tweak it myself. Thanks for having this extension and code up in the first place.
Possible modification suggestion?: Sentence boundary disambiguation, and sentence segmentation (each sentence on a new line) – “period” “space” with “period” "new line”
Apologies in advance, as this isn’t an issue, but an off-topic idea.
Sentence boundary disambiguation, and sentence segmentation (each sentence on a new line) – search and replace
I’ve always been a terrible reader and slow learner, so to aid me in reading longer and more difficult pieces of text, I sometimes segment the text by sentence boundaries (put each sentence on a new line).
(wikipedia/org/wiki/Sentence_boundary_disambiguation)
This can allow me to quickly re-read portions of the text, as my eyes immediately find the start of sentences.
You also get a good view of the length of each sentence, so you might get a better idea of where the subject(s), verb(s), and object(s) of a sentence may lie.
This can be done in a word processor with a text replacement of “period” “space”, with “period” “manual line break”, "new line", or “paragraph break”.
i.e. Search for:
.
Replace:
.\n
or
or
To segment online text, Ditto (open-source clipboard manager) can be used to gather multiple clipboard copies, and then you can paste all of the collected text into a word processor.
Other formatting examples
I don’t know anything about programming, but I think it could be similar to how people use things like the pprint (“pretty-print”) Python module to help read longer, nested data structures (?).
e.g. of pprint:
Other examples:
XAlign Xcode plugin:
http://i.imgur.com/o0Ysfw8.gif
ClangFormat-Xcode plugin:
http://i.imgur.com/vYts5uv.gif
nshipster/com/xcode-plugins/
JavaScript search and replace
I’ve been wondering if a piece of JavaScript could be used to segment online text so that you wouldn’t have to keep transferring text to a word processor.
Perhaps the code here in “Literally” could be modified.
Again, I don’t know how to program, but maybe the following could be adjusted:
Replace this:
v = v.replace(/\bliterally\b/g, "figuratively");
with this?:
v = v.replace(/\.\s/g, ".\n");
(I’m not sure if the syntax and/or regular expression is correct)
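For what it's worth, here is a minimal sketch of that replacement as a standalone function, so it can be tried outside the extension. The name `segmentSentences` is hypothetical and not part of Literally. Compared with the one-liner above, it assumes you would also want to break after question marks and exclamation marks, and it collapses a run of whitespace after the punctuation into a single newline. The string literal needs straight quotes (`"`), since curly quotes are a syntax error in JavaScript.

```javascript
// Hypothetical helper (not from Literally): put each sentence on its own line.
function segmentSentences(text) {
  // Match a sentence-ending mark followed by one or more whitespace characters,
  // then keep the mark ($1) and replace the whitespace with a newline.
  return text.replace(/([.?!])\s+/g, "$1\n");
}

const sample = "This is one sentence. Is this another? Yes! The end.";
console.log(segmentSentences(sample));
// This is one sentence.
// Is this another?
// Yes!
// The end.
```

Two caveats. First, a plain pattern like this also breaks after abbreviations such as "Mt. " or "Ph.D. ", so it will split some sentences in the wrong place. Second, in a web page a "\n" inside ordinary text is normally rendered as a space, so the line breaks would probably only show up if the surrounding element used something like the CSS rule `white-space: pre-line`, or if the script inserted actual line-break elements instead.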
Installing a plug-in
The link to the .crx file in “README.md” wasn’t working:
I managed to grab it from the master branch on top:
https://github.com/lazerwalker/literally/blob/master/Literally.crx
I dragged Literally.crx into the “Extensions” area of Chrome, but I don’t think you can easily install user scripts in regular Chrome anymore, so Literally isn’t enabled.
However, I did manage to get the plug-in working on Firefox by downloading and installing literally.xpi.
Yeah, I’m just wondering, and throwing the thought out there.
I’d definitely purchase a sentence segmentation browser extension.