SRL for large text file #4
Comments
There is a hard limit of 1024 records that this particular SRL tool can
handle - I think it's because of the combinatorial explosion at larger
input sets. I had to keep my inputs under that limit and it worked fine.
Hope that helps ....
--
#########################################################
Matthew R. Versaggi,
Senior Director of Artificial Intelligence &
Machine Learning - Optum Technologies
Email: ProfVersaggi@gmail.com
M: 630-292-8422
LinkedIn: http://www.linkedin.com/in/versaggi
About Me: http://www.matt-versaggi.com/resume/
#########################################################
|
Is there any other approach to this? I am building a question-answering system which fetches a Wikipedia page from the web and stores the content in a text file. I need this entire text file. |
Yes - you can break up the entirety of the input file into chunks that are
under 1024 (like 1020 to be safe) records - that's what we did and it
worked. You'll just have to do good record keeping to keep things straight.
|
Hi,
I would be really grateful if you can share anything that would help me with this process. |
Sure - tell me what you need and I'll see if I can dig it up :-)
|
I am trying to read the contents from a URL, clean it using BeautifulSoup, tokenize it, and store it in a text file. I need to perform SRL on this entire text file to extract TMP, LOC, and PER labels. |
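The fetch → clean → tokenize pipeline described above might look roughly like the following. This is a Python 3 sketch using only the standard library (the thread mentions BeautifulSoup and Python 2.7; the stdlib `html.parser` and a naive regex split stand in here purely to keep the example self-contained):

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible page text, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self._skip = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_to_sentences(html):
    """Strip markup from an HTML page and split the text into sentences."""
    extractor = TextExtractor()
    extractor.feed(html)
    text = " ".join(extractor.parts)
    # Naive sentence split; nltk.sent_tokenize would be more robust.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


html = "<html><body><p>Paris is in France. It is large!</p></body></html>"
sentences = page_to_sentences(html)
# -> ['Paris is in France.', 'It is large!']
```

The resulting sentence list can then be written one sentence per line to the text file and fed to the SRL step in chunks.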
That part I cannot help you with - it's specific to the app you are
building.
What I can recommend is that you take a quick sizing of the entirety of the
data coming in right away, and then have a function that will break that up
into chunks of fewer than 1020 records (sentences, if I remember) and then
feed those chunks to the SRL engine - that's how we got around that limitation.
|
Thanks for the help. |
I have been trying to use SRL for a large text file containing many sentences. I am getting the following error:
Traceback (most recent call last):
File "C:\Users\anithachacko\Downloads\main_pjt_code1.py", line 370, in
tagged=StringIO(annotator.getAnnotations(string1)['srl'])
File "C:\Python27\lib\site-packages\practnlptools\tools.py", line 219, in getAnnotations
pos+=[senna_tag[1].strip()]
IndexError: list index out of range