VowpalWabbitClassifier does not work with --oaa (One Against All) argument #804
Comments
Hey, why am I still getting this issue?
@imatiach-msft I do not want to open another issue because the error is the same as the one reported by this thread's author, and so is the stack trace.
@Ibtastic are you sure you are seeing the exact same stack trace, with the same line numbers as above? I think the line numbers should have changed, since the code has changed. The issue was supposedly fixed with this PR:
I will paste the stack trace in a while. Also, I thought of trying this with Spark 3.1.1, and I got
@Ibtastic Spark 3.1 is only supported on the latest master. You can use any master build; please try this walkthrough with pictures on Databricks: For example: Maven Coordinates
@imatiach-msft Thanks for pointing that out!
Adding @jackgerrits, any idea?
Have there been any updates on this issue? I'm seeing the same error. If this has been resolved, would it be possible for someone to provide a simple working example?
PySpark Version: 3.1.2
pred = model.transform(test_df_feat)
In [13]: pred.show(10)
22/05/26 21:47:19 ERROR TaskSetManager: Task 0 in stage 4.0 failed 4 times; aborting job
Py4JJavaError Traceback (most recent call last)
~/.venv/default/lib64/python3.7/site-packages/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)
~/.venv/default/lib64/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
~/.venv/default/lib64/python3.7/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
~/.venv/default/lib64/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
Py4JJavaError: An error occurred while calling o398.showString.
Driver stacktrace:
It's not implemented yet: SynapseML/vw/src/main/scala/com/microsoft/azure/synapse/ml/vw/VowpalWabbitClassifier.scala Line 64 in 9d16166
Let me follow up with Jack. Is there a minimal dataset we can use to repro?
Modified from the adult census example:
Thanks for the effort on the PR! I look forward to test driving it once it gets merged :)
Describe the bug
Vowpal Wabbit's One Against All classifier does not work via the MMLSpark interface.
To Reproduce
features is a column of sparse vectors (constructed via VowpalWabbitFeaturizer in my case), label is a column of integers with values {1, 2}.
Expected behavior
would show my testDF with predictedLabel column containing predictions.
Info (please complete the following information):
Stacktrace
To me, it looks like
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.vowpalwabbit.spark.prediction.ScalarPrediction at com.microsoft.ml.spark.vw.VowpalWabbitBaseModel$class.predictInternal(VowpalWabbitBaseModel.scala:84)
is the root cause. Could it be that --oaa outputs integers instead of doubles expected by MMLSpark?
Additional context
For context, this works fine in my setup on the same dataset with the same VowpalWabbitFeaturizer (although I have to convert labels to {1, 0}):
AB#1166568
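To illustrate the hypothesis above (that a one-against-all reduction yields an integer class label where the caller expects a scalar score), here is a minimal sketch in plain Python. It does not use Spark, MMLSpark, or Vowpal Wabbit; the function names `scalar_predict` and `oaa_predict` and the weight values are hypothetical and exist only for this illustration.

```python
from typing import List, Sequence


def scalar_predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Binary-style prediction: a single real-valued score (dot product)."""
    return sum(w * xi for w, xi in zip(weights, x))


def oaa_predict(per_class_weights: List[Sequence[float]], x: Sequence[float]) -> int:
    """One-against-all: score x with one linear model per class and
    return the 1-based index of the highest-scoring class as an int."""
    scores = [scalar_predict(w, x) for w in per_class_weights]
    return scores.index(max(scores)) + 1


# Hypothetical example data: one feature vector and made-up weights.
x = [1.0, 2.0]
binary_weights = [0.5, -0.25]
oaa_weights = [
    [0.5, -0.25],   # model for class 1
    [-0.5, 0.75],   # model for class 2
]

score = scalar_predict(binary_weights, x)  # a float score
label = oaa_predict(oaa_weights, x)        # an int class label

print(type(score).__name__, score)
print(type(label).__name__, label)
```

The two calls return values of different types, so code that assumes every prediction is a scalar score would need a separate path for the multiclass case. Whether this matches what happens inside the JNI layer is only the reporter's guess from the ClassCastException.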