SRL for large text file #4
Comments
There is a hard limit of 1024 records that this particular SRL tool can
handle - I think it's because of the combinatorial explosion at larger
input sets. I had to keep my inputs under that limit and it worked fine.
Hope that helps ....
--
#########################################################
Matthew R. Versaggi,
Senior Director of Artificial Intelligence &
Machine Learning - Optum Technologies
Email: ProfVersaggi@gmail.com
M: 630-292-8422
LinkedIn: http://www.linkedin.com/in/versaggi
About Me: http://www.matt-versaggi.com/resume/
#########################################################
|
Is there any other approach to this? I am building a question-answering system which fetches a Wikipedia page from the web and stores the content in a text file. I need this entire text file. |
Yes - you can break up the entirety of the input file into chunks that are
under 1024 (like 1020 to be safe) records - that's what we did and it
worked. You'll just have to do good record keeping to keep things straight.
|
Hi,
I would be really grateful if you can share anything that would help me with this process. |
Sure - tell me what you need and I'll see if I can dig it up :-)
|
I am trying to read the contents from a URL, clean it using BeautifulSoup, tokenize it, and store it in a text file. I need to perform SRL on this entire text file to extract TMP, LOC, and PER labels. |
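The fetch → clean → tokenize pipeline described above might look roughly like the following. This is a Python 3 sketch using only the standard library (the thread mentions BeautifulSoup and Python 2.7; the stdlib `html.parser` and a naive regex split stand in here purely to keep the example self-contained):

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible page text, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self._skip = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_to_sentences(html):
    """Strip markup from an HTML page and split the text into sentences."""
    extractor = TextExtractor()
    extractor.feed(html)
    text = " ".join(extractor.parts)
    # Naive sentence split; nltk.sent_tokenize would be more robust.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


html = "<html><body><p>Paris is in France. It is large!</p></body></html>"
sentences = page_to_sentences(html)
# -> ['Paris is in France.', 'It is large!']
```

The resulting sentence list can then be written one sentence per line to the text file and fed to the SRL step in chunks.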
That part I cannot help you with - it's specific to the app you are
building.
What I can recommend is that you take a quick sizing of the entirety of the
data coming in right away, and then have a function that will break that up
into chunks of fewer than 1020 records (sentences, if I remember) and then
feed those chunks to the SRL engine - that's how we got around that limitation.
|
Thanks for the help. |
I have been trying to use SRL for a large text file containing many sentences. I am getting the following error:
Traceback (most recent call last):
File "C:\Users\anithachacko\Downloads\main_pjt_code1.py", line 370, in
tagged=StringIO(annotator.getAnnotations(string1)['srl'])
File "C:\Python27\lib\site-packages\practnlptools\tools.py", line 219, in getAnnotations
pos+=[senna_tag[1].strip()]
IndexError: list index out of range