IOException in HFST-Wrapper #20

Open
DavidNemeskey opened this issue Dec 27, 2016 · 1 comment

@DavidNemeskey

I get IOExceptions for some inputs to the HFST Analyzer module. More often the message is just "IO Exception: null"; I guess which one appears depends on where the error occurs, i.e. whether enough words have been written to the stdin of the dead process.

Example output:

IO Exception: null
IO Exception: null
IO Exception: null
IO Exception: null
IO Exception: null
IO Exception: null
IO Exception: null
java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:297)
	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
	at hu.nytud.hfst.Analyzer$WorkerProcess.write(Analyzer.java:218)
	at hu.nytud.hfst.Analyzer$WorkerProcess.run(Analyzer.java:206)
	at java.lang.Thread.run(Thread.java:745)

Example input from the Hungarian Webcorpus:
ioexception.input.txt

The culprit is the very long token Pécs-Nagykanizsa-Graz-Aussee-Ischl-Salzburg-Zürich-Luzern-Rigire-Zürich-München-Linz-Bécs-Győr-Mohács-Pécs, but presumably other inputs could trigger the error as well. What is strange is that if I run hfst-lookup directly, with the same parameters that GATE uses, the input is processed without a hitch:

cat ioexception.input.txt | ../linux/hfst-lookup.sh --cascade=composition --xfst=print-pairs --xfst=print-space --pipe-mode -t 2 ../hu.hfstol

@DavidNemeskey

The full example: I first ran the input through quntoken (quntoken qterror.txt) and extracted the non-whitespace tokens from its output. The resulting file is qterror.tokens.txt. I then ran hfst-lookup on it, as described above, and got no errors. When I tried the same input with GATE, I got the aforementioned problems. I also printed all the tokens sent to HFST-Wrapper, and they are exactly the same as those in qterror.tokens.txt. So the error must be somewhere in the wrapper.

qterror.txt

qterror.tokens.txt
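
To help narrow down where the wrapper and the command line diverge, here is a minimal sketch of a standalone probe. It is not the code in Analyzer.java: the class name PipeProbe, its Result type and the feed method are all hypothetical. It only assumes that the wrapper talks to the child over stdin/stdout, as the stack trace suggests. The probe writes tokens one per line, drains the child's output on a separate thread so a full pipe cannot stall the writer, and reports how many tokens were written before the child stopped accepting input, together with its exit code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper; not part of HFST-Wrapper. */
public class PipeProbe {

    /** Outcome of one run of the child process. */
    public static final class Result {
        public final List<String> lines;   // everything the child printed
        public final int tokensWritten;    // tokens flushed before a failure
        public final int exitCode;         // exit status of the child

        Result(List<String> lines, int tokensWritten, int exitCode) {
            this.lines = lines;
            this.tokensWritten = tokensWritten;
            this.exitCode = exitCode;
        }
    }

    public static Result feed(List<String> command, List<String> tokens)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr so it cannot fill up unread
        final Process p = pb.start();

        // Drain the child's output concurrently with the writes below.
        final List<String> out = Collections.synchronizedList(new ArrayList<String>());
        Thread reader = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.add(line);
                }
            } catch (IOException e) {
                // The stream was closed underneath us; keep what was read.
            }
        });
        reader.start();

        int written = 0;
        try (BufferedWriter w = new BufferedWriter(
                new OutputStreamWriter(p.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String token : tokens) {
                if (!p.isAlive()) {
                    break; // child already exited; a write would hit a broken pipe
                }
                w.write(token);
                w.newLine();
                w.flush(); // flush per token so a failure is tied to that token
                written++;
            }
        } catch (IOException e) {
            // Typically "Broken pipe": the child died between isAlive() and the write.
        }

        int exitCode = p.waitFor();
        reader.join();
        return new Result(new ArrayList<String>(out), written, exitCode);
    }
}
```

Calling PipeProbe.feed with the same command line that GATE uses and the tokens from qterror.tokens.txt should show whether the child exits at the long token and with which status. If the probe runs cleanly, that would point at how the wrapper reads or writes the pipes rather than at hfst-lookup itself.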
