Help identifying java.lang.ArrayIndexOutOfBoundsException issues and causes #90

Open
kde9867 opened this issue Sep 26, 2023 · 3 comments

kde9867 commented Sep 26, 2023

I keep getting the following error:

2023-09-26 15:54:01,635 INFO [org.aksw.palmetto.Palmetto] - <Read 2 from file.>
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 65 out of bounds for length 33
        at org.apache.lucene.codecs.lucene41.ForUtil.readBlock(ForUtil.java:196)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.refillDocs(Lucene41PostingsReader.java:744)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextDoc(Lucene41PostingsReader.java:813)
        at org.aksw.palmetto.corpus.lucene.WindowSupportingLuceneCorpusAdapter.requestDocumentsWithWord(WindowSupportingLuceneCorpusAdapter.java:125)
        at org.aksw.palmetto.corpus.lucene.WindowSupportingLuceneCorpusAdapter.requestWordPositionsInDocuments(WindowSupportingLuceneCorpusAdapter.java:109)
        at org.aksw.palmetto.prob.window.BooleanSlidingWindowFrequencyDeterminer.determineCounts(BooleanSlidingWindowFrequencyDeterminer.java:59)
        at org.aksw.palmetto.prob.window.BooleanSlidingWindowFrequencyDeterminer.determineCounts(BooleanSlidingWindowFrequencyDeterminer.java:50)
        at org.aksw.palmetto.prob.AbstractProbabilitySupplier.getProbabilities(AbstractProbabilitySupplier.java:42)
        at org.aksw.palmetto.DirectConfirmationBasedCoherence.calculateCoherences(DirectConfirmationBasedCoherence.java:92)
        at org.aksw.palmetto.Palmetto.main(Palmetto.java:81)

We suspect one of the following causes.
1. There is a problem with the topic data.
Is there a specific limit on the word length of the data when using Palmetto?
Is the ArrayIndexOutOfBoundsException thrown because the data is too long?

The data I used lists the top words of one topic on each line, separated by spaces.
The data is 10 lines of 10 words.
Example:

screen lazy rude failed post hi limit href page missed
comment bot discord moderator server subreddit friendly repetitive action performed
exam professor college banned lawyer medical student trial school hack
gold subscription trading dollar investment chance invested buy spend worth
google bard search bing engine microsoft baidu chatgpt chatbot chrome
translation translate extension paste source copy india mobile copyright content
chatgpt played mode amp guy meant knew man lt alive
fan background github epic youtube watching worse final movie water
shouldnt bad shame worried trust biased alexa scared correct hurt
fish bar absolute blue christ reminds took ya steal turned
old chat bing dance screen gpt laugh dynamic youre siri
wondering average mirror mean think journey incorrect theory technically exactly
glad thanks thank awesome helpful wonderful appreciate sharing master openaichatgpt
fucking girl ridiculous garbage mad straight hilarious crazy joke insane
tag saved notion thread figured cover assignment generic designer interview
extreme coherent convincing sentence information ability impressive long knowledge believe
detect engine embrace skill writer dark ai test art remind
song poem rap dream write style wrote lyric sleep translate
whatsapp musk billion microsoft openai dalle introduced china midjourney connected
genius smarter closer invented machine engineering singularity gate define fighting

2. Is this caused by a mismatch between the Lucene version and Palmetto 0.1.5?
The Lucene version in the pom.xml is 4.4.0.
Could an incorrect version cause this error?
Which Lucene version should I use?

We would appreciate your response to this issue.
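
For readers who want to rule out the topic file itself, the format described above (one topic per line, words separated by spaces) can be sanity-checked before running Palmetto. The following is a minimal sketch and not Palmetto's own parser: the class name `TopicFileCheck` and the method `readTopics` are hypothetical, and skipping blank lines is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Palmetto.
public class TopicFileCheck {

    /** Splits every non-blank line into whitespace-separated words. */
    public static List<List<String>> readTopics(Path file) throws IOException {
        List<List<String>> topics = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are not topics
            }
            topics.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return topics;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TopicFileCheck <topic-file>");
            return;
        }
        List<List<String>> topics = readTopics(Paths.get(args[0]));
        System.out.println("Topics read: " + topics.size());
        for (int i = 0; i < topics.size(); i++) {
            // One row per topic: index, word count, words.
            System.out.println(i + "\t" + topics.get(i).size() + " words\t" + topics.get(i));
        }
    }
}
```

If the number of topics printed here differs from the `Read ... from file` count in Palmetto's log, the file Palmetto actually read may not be the one that was intended.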

@MichaelRoeder
Member

Thank you for your interest in Palmetto.

I tried your example file locally and I cannot reproduce the error. Instead, I get the following output for C_V:

2023-09-26 12:18:33,274 INFO [org.aksw.palmetto.Palmetto] - <Read 20 from file.>
14392
    0	0.22543	[screen, lazy, rude, failed, post, hi, limit, href, page, missed]
    1	0.22391	[comment, bot, discord, moderator, server, subreddit, friendly, repetitive, action, performed]
    2	0.48500	[exam, professor, college, banned, lawyer, medical, student, trial, school, hack]
    3	0.37765	[gold, subscription, trading, dollar, investment, chance, invested, buy, spend, worth]
    4	0.37760	[google, bard, search, bing, engine, microsoft, baidu, chatgpt, chatbot, chrome]
    5	0.45887	[translation, translate, extension, paste, source, copy, india, mobile, copyright, content]
    6	0.19467	[chatgpt, played, mode, amp, guy, meant, knew, man, lt, alive]
    7	0.30966	[fan, background, github, epic, youtube, watching, worse, final, movie, water]
    8	0.21816	[shouldnt, bad, shame, worried, trust, biased, alexa, scared, correct, hurt]
    9	0.24734	[fish, bar, absolute, blue, christ, reminds, took, ya, steal, turned]
   10	0.26910	[old, chat, bing, dance, screen, gpt, laugh, dynamic, youre, siri]
   11	0.28883	[wondering, average, mirror, mean, think, journey, incorrect, theory, technically, exactly]
   12	0.34330	[glad, thanks, thank, awesome, helpful, wonderful, appreciate, sharing, master, openaichatgpt]
   13	0.43697	[fucking, girl, ridiculous, garbage, mad, straight, hilarious, crazy, joke, insane]
   14	0.31246	[tag, saved, notion, thread, figured, cover, assignment, generic, designer, interview]
   15	0.39453	[extreme, coherent, convincing, sentence, information, ability, impressive, long, knowledge, believe]
   16	0.31439	[detect, engine, embrace, skill, writer, dark, ai, test, art, remind]
   17	0.51804	[song, poem, rap, dream, write, style, wrote, lyric, sleep, translate]
   18	0.11726	[whatsapp, musk, billion, microsoft, openai, dalle, introduced, china, midjourney, connected]
   19	0.31286	[genius, smarter, closer, invented, machine, engineering, singularity, gate, define, fighting]

Based on the stack trace, I would guess that there is a problem with the Lucene index you are trying to use. If you created the index yourself, I suggest double-checking whether something went wrong while creating it. If you downloaded the index you may want to download it again.
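
One inexpensive way to tell whether a download was damaged is to hash the downloaded file and compare it with the hash of a second download. The sketch below uses only the Java standard library. It assumes the index was downloaded as a single file, and since this thread does not mention a published checksum, it only compares two local copies. The class name `DownloadChecksum` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Palmetto or Lucene.
public class DownloadChecksum {

    /** Returns the SHA-256 digest of a file as a lowercase hex string. */
    public static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            // Stream the file so that large downloads do not have to fit in memory.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java DownloadChecksum <first-download> <second-download>");
            return;
        }
        String first = sha256Hex(Paths.get(args[0]));
        String second = sha256Hex(Paths.get(args[1]));
        System.out.println(first.equals(second)
                ? "Both downloads are identical."
                : "The downloads differ; at least one of them is probably corrupted.");
    }
}
```

Matching hashes only show that both downloads contain the same bytes. They do not prove that the index itself is valid.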

@kde9867
Author

kde9867 commented Sep 26, 2023

Thank you for your reply. I downloaded the Lucene index with the preprocessed Wikipedia by following the instructions in the Palmetto wiki. Would it help to download it again and retry with Palmetto?

@MichaelRoeder
Member

"If you downloaded the index you may want to download it again."

Something seems to be wrong with the index. At least that is the first idea that comes to my mind when looking at your error message. So I suggest downloading it again 😉
