
[BUG] Databricks 14.3 LTS usage of internal _jvm variable is no longer supported #2167

NathanNZ opened this issue Feb 7, 2024 · 0 comments
NathanNZ commented Feb 7, 2024

SynapseML version

com.microsoft.azure:synapseml_2.12:1.0.2

System information

  • Spark Version 3.5.0
  • Platform: Azure Databricks
  • Operating System: Ubuntu 22.04.3 LTS
  • Java: Zulu 8.74.0.17-CA-linux64
  • Scala: 2.12.15
  • Python: 3.10.12

Describe the problem

As of Databricks Runtime 14.3 LTS, there is no longer a dependency on the JVM when querying Apache Spark. As a consequence, internal APIs related to the JVM, such as _jsc, _jconf, _jvm, _jsparkSession, _jreader, _jc, _jseq, _jdf, _jmap, and _jcols, are no longer supported. SynapseML's Python wrapper for the Azure Search writer still relies on _jvm:

https://github.com/microsoft/SynapseML/blob/fa9ba2eac6ea5e219dcae0f0025ef2ca9313a081/cognitive/src/main/python/synapse/ml/services/search/AzureSearchWriter.py#L18
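The failing pattern can be illustrated without a Spark cluster: on a 14.3 LTS shared access mode cluster, the context object no longer exposes a usable _jvm gateway, so any wrapper that reaches for it fails. A minimal sketch of a defensive check (the _ConnectLikeContext class below is a hypothetical stand-in for such a context, not a real pyspark type):

```python
def jvm_gateway_available(spark_context) -> bool:
    """Return True only if the py4j JVM gateway attribute is present.

    Wrappers like AzureSearchWriter.py assume this is always the case,
    which no longer holds on Databricks 14.3 LTS shared access mode.
    """
    return getattr(spark_context, "_jvm", None) is not None


class _ConnectLikeContext:
    """Hypothetical stand-in: a context with no _jvm attribute."""


# On such a context, reaching for _jvm would raise; the guard reports it.
assert jvm_gateway_available(_ConnectLikeContext()) is False
```

This only demonstrates the missing attribute; the actual fix would require SynapseML to stop routing through py4j internals for this code path.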

Code to reproduce issue

from pyspark.sql.functions import lit, col, date_format, to_json
from synapse.ml.services import writeToAzureSearch
 
df = spark.read.table(index_delta_table)
 
df2 = df.select(
        col("id").alias("id"),
        col("subject").alias("subject"),
        lit("mergeOrUpload").alias("action")
    )
 
writeToAzureSearch(df2,
        subscriptionKey=ai_search_key,
        actionCol="action",
        serviceName=ai_search_name,
        indexName=index_name,
        batchSize='1000',
        keyCol="id"
    )

Other info / logs

Py4JJavaError: An error occurred while calling z:com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter.write.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
	at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.rowEnc$lzycompute(SparkBindings.scala:17)
	at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.rowEnc(SparkBindings.scala:17)
	at com.microsoft.azure.synapse.ml.core.schema.SparkBindings.makeFromRowConverter(SparkBindings.scala:26)
	at com.microsoft.azure.synapse.ml.io.http.ErrorUtils$.addErrorUDF(SimpleHTTPTransformer.scala:57)
	at com.microsoft.azure.synapse.ml.io.http.SimpleHTTPTransformer.$anonfun$makePipeline$1(SimpleHTTPTransformer.scala:135)
	at org.apache.spark.injections.UDFUtils$$anon$1.call(UDFUtils.scala:23)
	at org.apache.spark.sql.functions$.$anonfun$udf$91(functions.scala:8103)
	at com.microsoft.azure.synapse.ml.stages.Lambda.$anonfun$transform$1(Lambda.scala:55)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:163)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:160)
	at com.microsoft.azure.synapse.ml.stages.Lambda.logVerb(Lambda.scala:24)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform(SynapseMLLogging.scala:157)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform$(SynapseMLLogging.scala:156)
	at com.microsoft.azure.synapse.ml.stages.Lambda.logTransform(Lambda.scala:24)
	at com.microsoft.azure.synapse.ml.stages.Lambda.transform(Lambda.scala:56)
	at com.microsoft.azure.synapse.ml.stages.Lambda.transformSchema(Lambda.scala:64)
	at org.apache.spark.ml.PipelineModel.$anonfun$transformSchema$5(Pipeline.scala:317)
	at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
	at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
	at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:317)
	at com.microsoft.azure.synapse.ml.io.http.SimpleHTTPTransformer.transformSchema(SimpleHTTPTransformer.scala:170)
	at org.apache.spark.ml.PipelineModel.$anonfun$transformSchema$5(Pipeline.scala:317)
	at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
	at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:198)
	at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:317)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:72)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$2(Pipeline.scala:310)
	at org.apache.spark.ml.MLEvents.withTransformEvent(events.scala:148)
	at org.apache.spark.ml.MLEvents.withTransformEvent$(events.scala:141)
	at org.apache.spark.ml.util.Instrumentation.withTransformEvent(Instrumentation.scala:45)
	at org.apache.spark.ml.PipelineModel.$anonfun$transform$1(Pipeline.scala:309)
	at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:289)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:289)
	at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:308)
	at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.$anonfun$transform$1(CognitiveServiceBase.scala:548)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:163)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:160)
	at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.logVerb(CognitiveServiceBase.scala:495)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform(SynapseMLLogging.scala:157)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform$(SynapseMLLogging.scala:156)
	at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.logTransform(CognitiveServiceBase.scala:495)
	at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.transform(CognitiveServiceBase.scala:548)
	at com.microsoft.azure.synapse.ml.services.search.AddDocuments.super$transform(AzureSearch.scala:137)
	at com.microsoft.azure.synapse.ml.services.search.AddDocuments.$anonfun$transform$1(AzureSearch.scala:137)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:163)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:160)
	at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.logVerb(CognitiveServiceBase.scala:495)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform(SynapseMLLogging.scala:157)
	at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logTransform$(SynapseMLLogging.scala:156)
	at com.microsoft.azure.synapse.ml.services.CognitiveServicesBaseNoHandler.logTransform(CognitiveServiceBase.scala:495)
	at com.microsoft.azure.synapse.ml.services.search.AddDocuments.transform(AzureSearch.scala:138)
	at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.prepareDF(AzureSearch.scala:308)
	at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.write(AzureSearch.scala:432)
	at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter$.write(AzureSearch.scala:440)
	at com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter.write(AzureSearch.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)
File <command-2049793672345868>, line 25
      5 df = spark.read.table(index_delta_table)
      7 df2 = df.select(
      8         col("id").alias("id"),
      9         col("subject").alias("subject"),
   (...)
     22         lit("mergeOrUpload").alias("action")
     23     )
---> 25 writeToAzureSearch(df2,
     26         subscriptionKey=ai_search_key,
     27         actionCol="action",
     28         serviceName=ai_search_name,
     29         indexName=index_name,
     30         batchSize='1000',
     31         keyCol="id"
     32     )
File /local_disk0/spark/userFiles/com_microsoft_azure_synapseml_cognitive_2_12_1_0_2.jar/synapse/ml/services/search/AzureSearchWriter.py:28, in writeToAzureSearch(df, **options)
     26 jvm = SparkContext.getOrCreate()._jvm
     27 writer = jvm.com.microsoft.azure.synapse.ml.services.search.AzureSearchWriter
---> 28 writer.write(df._jdf, options)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:224, in capture_sql_exception.<locals>.deco(*a, **kw)
    222 def deco(*a: Any, **kw: Any) -> Any:
    223     try:
--> 224         return f(*a, **kw)
    225     except Py4JJavaError as e:
    226         converted = convert_exception(e.java_exception)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations