Tool to extract objects with specified types #92
…rking on this assumption. This fix passes the unit tests and I think it will work on the cluster
…using the rewindableValues after all
…omething uncontrolled in environment in last run?
…; hopefully the logging code isn't making this work...
I have had some success running the following command
and even over all of the
Now, I did take out the logging code, but it really ought to work without it. (Although lately I've had a lot of cases where things started working when I added logging code...)
Ooooh... it looks like reducer 0 failed with the following message...
Note that our nodes are highly memory-constrained, so I can believe we are running out of heap. It's disturbing that the system seems to think that this task succeeded.
Note that two processes failed, and the other processes wound up like
Probably our prefix processing is too smart and doesn't just prepend the prefix, so we need to put skiing in all the types, unless we change it to make it dumber.
Here is what I am running now:
I get a total of 142 facts out of this, so things are obviously still terribly out of whack. What's going on?
…rables because the one for sets seemed to cause trouble in the past
The result from this is screwier than I expected. With the above, I am getting a total of 95 facts in the output, which is fewer than the number of ski areas we should be turning up. Stranger still, I find duplicates, which is completely unexpected.
I'm planning on running the following test case
once I've checked out a few possible ways this could happen. (Verify no dups in the raw data, look to see that we're not adding the same input path over and over again, etc.)
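For the "verify no dups in the raw data" step, a quick local check could look like the sketch below. It assumes a dump of the facts with one fact per line, compared as exact strings; the sample triples are made up for illustration and are not from the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateCheck {
    // Returns every line that occurs more than once, mapped to its occurrence count,
    // in order of first appearance. Lines are compared as exact strings.
    public static Map<String, Integer> duplicates(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < 2); // keep only repeated lines
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, check a local dump of facts (assumed one per line);
        // otherwise fall back to a tiny made-up sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Path.of(args[0]))
                : List.of("<a> a <SkiArea>", "<b> a <SkiArea>", "<a> a <SkiArea>");
        System.out.println(duplicates(lines)); // sample prints {<a> a <SkiArea>=2}
    }
}
```

The same `duplicates` method can be run over the list of input paths handed to the job, which covers the "same input path over and over again" case too.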
If we run that test case, we get 475 facts, which is close to the number of ski areas. All of those facts are 'a' facts, and there are still lots of dups...
…re now we're getting hung up because Hadoop is reusing Writables so they are not safe to store in a collection
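The Writable reuse problem can be reproduced without Hadoop. In the sketch below, `MutableText` is a hypothetical stand-in for a mutable Writable like `Text`, and `reusingValues` mimics a value iterator that refills one shared instance. Storing the references directly leaves every list entry pointing at the same object, which could produce exactly the kind of duplicates reported above; copying each value before storing it avoids that. In real Hadoop code the copy would be something like `new Text(value)` or `WritableUtils.clone(value, conf)`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ObjectReuseDemo {
    // Hypothetical stand-in for a mutable Writable such as Hadoop's Text.
    public static final class MutableText {
        private String value = "";
        public MutableText() {}
        public MutableText(MutableText other) { this.value = other.value; } // copies the current value
        public void set(String v) { this.value = v; }
        @Override public String toString() { return value; }
    }

    // Mimics the framework's value iterator: one shared instance is refilled for every value.
    public static Iterable<MutableText> reusingValues(List<String> raw) {
        return () -> new Iterator<MutableText>() {
            private final MutableText shared = new MutableText();
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.size(); }
            @Override public MutableText next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw.get(i++));
                return shared;
            }
        };
    }

    // Buggy: stores the same reference each time, so every entry shows the last value.
    public static List<String> collectNaively(Iterable<MutableText> values) {
        List<MutableText> kept = new ArrayList<>();
        for (MutableText v : values) kept.add(v);
        return render(kept);
    }

    // Fixed: copies each value before storing it.
    public static List<String> collectWithCopies(Iterable<MutableText> values) {
        List<MutableText> kept = new ArrayList<>();
        for (MutableText v : values) kept.add(new MutableText(v));
        return render(kept);
    }

    private static List<String> render(List<MutableText> kept) {
        List<String> out = new ArrayList<>();
        for (MutableText v : kept) out.add(v.toString());
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("a", "b", "c");
        System.out.println(collectNaively(reusingValues(raw)));    // [c, c, c]
        System.out.println(collectWithCopies(reusingValues(raw))); // [a, b, c]
    }
}
```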
This is necessary for skibase. A big part of this will be configuring it so that topics are grouped and sorted on ?s.
A simple and scalable strategy is for the reducer to run in two passes. The first one will check to see if the condition is met, and then the next will send the facts to the output if that is the case.
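A rough local model of the proposed two-pass reducer is sketched below. Facts are grouped by subject (?s) with subjects in sorted order, standing in for the shuffle. Each `reduce` call then makes one pass to check whether the group has a type fact whose object is one of the wanted types, and a second pass to emit the whole group if so. The `Fact` class, the `rdf:type` predicate string, and the sample data are assumptions for illustration. Because a Hadoop reducer's values can normally be traversed only once, `group` here stands for values already buffered into a list (copied, given the Writable reuse noted earlier).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TwoPassTypeFilter {
    // Hypothetical triple representation; the project's real fact type is not shown here.
    public static final class Fact {
        final String s, p, o;
        public Fact(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // Assumption: type facts use this predicate.
    static final String TYPE = "rdf:type";

    // Simulates the shuffle: group facts by subject (?s), subjects in sorted order.
    public static Map<String, List<Fact>> groupBySubject(List<Fact> facts) {
        Map<String, List<Fact>> groups = new TreeMap<>();
        for (Fact f : facts) groups.computeIfAbsent(f.s, k -> new ArrayList<>()).add(f);
        return groups;
    }

    // One reduce call over a buffered group: pass 1 checks, pass 2 emits.
    public static List<Fact> reduce(List<Fact> group, Set<String> wantedTypes) {
        boolean matches = false;
        for (Fact f : group) { // pass 1: is this subject of a wanted type?
            if (f.p.equals(TYPE) && wantedTypes.contains(f.o)) { matches = true; break; }
        }
        List<Fact> out = new ArrayList<>();
        if (matches) out.addAll(group); // pass 2: send every fact about the subject to the output
        return out;
    }

    public static List<Fact> run(List<Fact> facts, Set<String> wantedTypes) {
        List<Fact> out = new ArrayList<>();
        for (List<Fact> group : groupBySubject(facts).values()) out.addAll(reduce(group, wantedTypes));
        return out;
    }

    public static void main(String[] args) {
        List<Fact> facts = List.of(
                new Fact(":Alta", TYPE, ":SkiArea"),
                new Fact(":Alta", ":label", "\"Alta\""),
                new Fact(":Boston", TYPE, ":City"),
                new Fact(":Boston", ":label", "\"Boston\""));
        run(facts, Set.of(":SkiArea")).forEach(System.out::println); // prints only the :Alta facts
    }
}
```

Buffering keeps the sketch simple but holds a whole subject's facts in memory, which matters on memory-constrained nodes. A secondary sort that delivers type facts first within each ?s group would allow a single streaming pass instead.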