Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempting to shade scala-logging dependency #293

Closed
echeipesh opened this issue Aug 17, 2019 · 12 comments · Fixed by #379
Closed

Attempting to shade scala-logging dependency #293

echeipesh opened this issue Aug 17, 2019 · 12 comments · Fixed by #379
Assignees
Labels
bug When it really isn't a "feature".

Comments

@echeipesh
Copy link
Contributor

This issue was encountered while we were attempting to shade the com.typesafe.scala-logging classes in an effort to get RasterFrames running in DataBricks environment. DataBricks itself brings ins com.typesafe.scala-logging version 2.1.2 on the class path which is not binary compatible with version 3.9.0 used by RasterFrames and will result in InstantiationException

Reproduce

The failure to shade can be reproduced locally and shows ups as a failed import.

The attempt is done against current develop by trying to hook into the mechanism in RFAssemblyPlugin

diff --git a/project/RFAssemblyPlugin.scala b/project/RFAssemblyPlugin.scala
index 8f69776f..223257a2 100644
--- a/project/RFAssemblyPlugin.scala
+++ b/project/RFAssemblyPlugin.scala
@@ -53,7 +53,8 @@ object RFAssemblyPlugin extends AutoPlugin {
         "com.amazonaws",
         "org.apache.avro",
         "org.apache.http",
-        "com.google.guava"
+        "com.google.guava",
+        "com.typesafe.scalalogging"
       )
       shadePrefixes.map(p ⇒ ShadeRule.rename(s"$p.**" -> s"rf.shaded.$p.@1").inAll)
     },

Then creating assembly using:

sbt  'project pyrasterframes' assembly

Inspecting the assembly jar we see the shading took effect:

~/p/r/p/t/scala-2.11 ❯❯❯ unzip -l pyrasterframes-assembly-0.8.0-SNAPSHOT.jar | grep scalalogging                             develop ✚ ✱
        0  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/
      590  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/CanLog$class.class
      888  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/CanLog.class
     1002  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LazyLogging$class.class
      589  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LazyLogging.class
     3875  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/Logger$.class
    15727  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/Logger.class
     5104  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anon$1.class
     2766  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$1.class
     1366  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$2.class
     1604  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$3.class
     2196  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$4.class
     2203  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$debugMessageArgs$1.class
     2251  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$debugMessageArgsMarker$1.class
     2202  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$errorMessageArgs$1.class
     2250  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$errorMessageArgsMarker$1.class
     6090  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$formatArgs$1.class
     2199  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$infoMessageArgs$1.class
     2248  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$infoMessageArgsMarker$1.class
     2203  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$traceMessageArgs$1.class
     2252  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$traceMessageArgsMarker$1.class
     2199  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$warnMessageArgs$1.class
     2247  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$$anonfun$warnMessageArgsMarker$1.class
    31565  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro$.class
     9902  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerMacro.class
    13906  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicit.class
     2292  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$debugMessageArgs$1.class
     2340  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$debugMessageArgsMarker$1.class
     2291  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$errorMessageArgs$1.class
     2339  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$errorMessageArgsMarker$1.class
     2289  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$infoMessageArgs$1.class
     2337  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$infoMessageArgsMarker$1.class
     2292  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$traceMessageArgs$1.class
     2340  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$traceMessageArgsMarker$1.class
     2288  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$warnMessageArgs$1.class
     2337  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$$anonfun$warnMessageArgsMarker$1.class
    46510  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro$.class
    11097  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/LoggerTakingImplicitMacro.class
     1030  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/StrictLogging$class.class
      689  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/StrictLogging.class
      449  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/package$.class
      667  08-15-2019 18:38   rf/shaded/com/typesafe/scalalogging/package.class

Trying to use the assembly goes south:

~/p/r/p/t/scala-2.11 ❯❯❯ spark-shell --jars /Users/eugene/proj/rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.0-SNAPSHOT.jar
19/08/17 13:12:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://10.1.240.31:4040
Spark context available as 'sc' (master = local[*], app id = local-1566061963577).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_211)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.locationtech.rasterframes._
<console>:23: error: Symbol 'type <none>.scalalogging.LazyLogging' is missing from the classpath.
This symbol is required by 'package org.locationtech.rasterframes.package'.
Make sure that type LazyLogging is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
A full rebuild may help if 'package.class' was compiled against an incompatible version of <none>.scalalogging.
       import org.locationtech.rasterframes._
              ^

Trying to see what is going on in package org.locationtech.rasterframes.package everything seems hunky-dory:

~/p/r/p/t/s/j/o/l/rasterframes ❯❯❯ javap package\$                                                                                                                      develop ✚ ✱
Warning: Binary file package$ contains org.locationtech.rasterframes.package$
Compiled from "rasterframes.scala"
public final class org.locationtech.rasterframes.package$ implements org.locationtech.rasterframes.StandardColumns,org.locationtech.rasterframes.RasterFunctions,org.locationtech.rasterframes.util.ZeroSevenCompatibilityKit$RasterFunctions,org.locationtech.rasterframes.extensions.Implicits,org.locationtech.rasterframes.jts.Implicits,org.locationtech.rasterframes.encoders.StandardEncoders,org.locationtech.geomesa.spark.jts.DataFrameFunctions$Library,rf.shaded.com.typesafe.scalalogging.LazyLogging {
  public static final org.locationtech.rasterframes.package$ MODULE$;
  public static {};
  public rf.shaded.com.typesafe.scalalogging.Logger logger();
  public org.apache.spark.sql.TypedColumn<java.lang.Object, org.locationtech.jts.geom.Geometry> st_translate(org.apache.spark.sql.Column, org.apache.spark.sql.Column, org.apache.spark.sql.Column);

Note that it implements the shaded version rf.shaded.com.typesafe.scalalogging.LazyLogging

@metasim metasim self-assigned this Aug 19, 2019
@metasim metasim added the bug When it really isn't a "feature". label Aug 19, 2019
@metasim
Copy link
Member

metasim commented Aug 19, 2019

@echeipesh Note: I've not tried to run against Spark 2.4.x. So maybe there's something wrong there.

I wonder if some of this has to do with Scala logging using macros? Maybe an argument to get rid of it, and just use Java APIs?

If you're willing to try something, find any places where a package.scala class inherits from LazyLogger, and remove it, replacing call sites with something in the companion class or whatever... just to see if hiding it a bit more help.

@metasim
Copy link
Member

metasim commented Aug 19, 2019

@echeipesh Also, what version of Java is that running under?

@pomadchin
Copy link
Member

I would say that figuring out how to shade this dependency would be enough.
I don't have arguments to get rid of scala logging:
It is more performant: inlines calls and makes calls to the underlying logger only in case the warning level allows it (sounds a bit naive but even these two things will reduce logging overhead).

@echeipesh
Copy link
Contributor Author

@metasim This is all on Java 1.8
Based on your hunch removing the inheritance of LazyLogger from rasterframes package did the trick:

diff --git a/core/src/main/scala/org/locationtech/rasterframes/rasterframes.scala b/core/src/main/scala/org/locationtech/rasterframes/rasterframes.scala
index 1517e8f0..2b908c7f 100644
--- a/core/src/main/scala/org/locationtech/rasterframes/rasterframes.scala
+++ b/core/src/main/scala/org/locationtech/rasterframes/rasterframes.scala
@@ -21,7 +21,7 @@

 package org.locationtech
 import com.typesafe.config.ConfigFactory
-import com.typesafe.scalalogging.LazyLogging
+import com.typesafe.scalalogging.Logger
 import geotrellis.raster.isData
 import geotrellis.raster.{Tile, TileFeature}
 import geotrellis.spark.{ContextRDD, Metadata, SpaceTimeKey, SpatialKey, TileLayerMetadata}
@@ -33,6 +33,7 @@ import org.locationtech.rasterframes.encoders.StandardEncoders
 import org.locationtech.rasterframes.extensions.Implicits
 import org.locationtech.rasterframes.model.TileDimensions
 import org.locationtech.rasterframes.util.ZeroSevenCompatibilityKit
+import org.slf4j.LoggerFactory
 import shapeless.tag.@@

 import scala.reflect.runtime.universe._
@@ -43,8 +44,10 @@ package object rasterframes extends StandardColumns
   with Implicits
   with rasterframes.jts.Implicits
   with StandardEncoders
-  with DataFrameFunctions.Library
-  with LazyLogging {
+  with DataFrameFunctions.Library {
+
+  @transient
+  protected lazy val logger: Logger = Logger(LoggerFactory.getLogger(getClass.getName))

   @transient
   private[rasterframes]

I have not tested it on DataBricks yet, but this produces an assembly where the imports of RasterFrame packages happens fine. I'll keep you posted.

@metasim
Copy link
Member

metasim commented Aug 20, 2019

Reproduced. I'll go ahead and fix this and any similar cases.

@metasim
Copy link
Member

metasim commented Aug 20, 2019

A bit further, but serialization error (despite transient??):

$ spark-shell --jars  /Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.1-SNAPSHOT.jar
...

Spark context available as 'sc' (master = local[*], app id = local-1566321608265).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.locationtech.rasterframes._
import org.locationtech.rasterframes._
scala> import org.locationtech.rasterframes.datasource.raster._
import org.locationtech.rasterframes.datasource.raster._
scala> val rf = spark.read.raster.load("https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B02.tif")
java.lang.AssertionError: assertion failed: unsafe symbol Logger (child of package scalalogging) in runtime reflection universe
	at scala.reflect.internal.Symbols$Symbol.<init>(Symbols.scala:184)
	at scala.reflect.internal.Symbols$TypeSymbol.<init>(Symbols.scala:3009)
	at scala.reflect.internal.Symbols$ClassSymbol.<init>(Symbols.scala:3201)
	at scala.reflect.internal.Symbols$StubClassSymbol.<init>(Symbols.scala:3496)
	at scala.reflect.internal.Symbols$Symbol.newStubSymbol(Symbols.scala:498)
	at scala.reflect.internal.pickling.UnPickler$Scan.readExtSymbol$1(UnPickler

...

@metasim
Copy link
Member

metasim commented Aug 20, 2019

Calling spark.withRasterFrames triggers it too.

@metasim
Copy link
Member

metasim commented Aug 20, 2019

Caught the exception in the debugger after commenting out RF code triggering the AssertionError, and maybe (?) anything inheriting from LazyLogging has a problem? In the backtrace below, error occurred when Scala Reflection was attempting to resolve the type structure for Tile.

Screen Shot 2019-08-20 at 2 00 09 PM

Found this message on the stack in a frame further up, which makes me wonder if shading did't transform all references in all libraries:

missing or invalid dependency detected while loading class file 'geotrellis.raster.Tile'.
Could not access type LazyLogging in package com.typesafe.scalalogging,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'geotrellis.raster.Tile' was compiled against an incompatible version of com.typesafe.scalalogging.

@metasim
Copy link
Member

metasim commented Aug 20, 2019

Ran javap directly:

$ javap -cp  /Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.1-SNAPSHOT.jar geotrellis.raster.Tile | grep scalalogging
public interface geotrellis.raster.Tile extends geotrellis.raster.CellGrid, geotrellis.raster.IterableTile, geotrellis.raster.MappableTile<geotrellis.raster.Tile>, rf.shaded.com.typesafe.scalalogging.LazyLogging {

and in the spark-shell:

$ echo ':javap geotrellis.raster.Tile' | spark-shell --jars  /Users/sfitch/Coding/OSS/locationtech-rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.1-SNAPSHOT.jar | grep scalalogging
public interface geotrellis.raster.Tile extends geotrellis.raster.CellGrid, geotrellis.raster.IterableTile, geotrellis.raster.MappableTile<geotrellis.raster.Tile>, rf.shaded.com.typesafe.scalalogging.LazyLogging
    #3 = Utf8               Ljava/lang/Object;Lgeotrellis/raster/CellGrid;Lgeotrellis/raster/IterableTile;Lgeotrellis/raster/MappableTile<Lgeotrellis/raster/Tile;>;Lrf/shaded/com/typesafe/scalalogging/LazyLogging;
   #12 = Utf8               rf/shaded/com/typesafe/scalalogging/LazyLogging
   #13 = Class              #12           // rf/shaded/com/typesafe/scalalogging/LazyLogging
Signature: #3                           // Ljava/lang/Object;Lgeotrellis/raster/CellGrid;Lgeotrellis/raster/IterableTile;Lgeotrellis/raster/MappableTile<Lgeotrellis/raster/Tile;>;Lrf/shaded/com/typesafe/scalalogging/LazyLogging;

and both show all references as having been shaded. :-/

@metasim
Copy link
Member

metasim commented Aug 20, 2019

I think the problem might have something to do with Spark making heavy use of Scala TypeTags, which are compile-time generated constructs, likely reflecting the type paths as they were at compile time.

@metasim
Copy link
Member

metasim commented Aug 20, 2019

@echeipesh I'm worried that this problem isn't solvable without removing LazyLogging from Tile. My current theory is that you can't shade any type that is processed by the Spark SQL Encoder subsystem due to its dependency on compile-time TypeTags. Since Tile is core to RasterFrames, it's hard to see a workaround on this side of the fence, except perhaps some variant of this:

https://stackoverflow.com/a/40404371/296509

I'm not a fan because it means RasterFrames is essentially requiring an isolated version of  GeoTrellis that has scalalogging shaded for it before its used by RasterFrames.

metasim added a commit to s22s/rasterframes that referenced this issue Aug 20, 2019
… the trait hierachy.

See issue for specifics on why this didn't work.
@metasim
Copy link
Member

metasim commented Aug 20, 2019

Branch with moving LazyLogging around in RasterFrames.

s22s@c07ccea

metasim added a commit to s22s/rasterframes that referenced this issue Oct 7, 2019
metasim added a commit that referenced this issue Oct 7, 2019
Reworking of logging usage to fix #293
metasim added a commit to s22s/rasterframes that referenced this issue Oct 17, 2019
* develop:
  Fixed image link in README.md
  PR feedback.
  Removed `GeoTiffCollectionRelation` due to usage limitation and overlap with `RasterSourceDataSource` functionality.
  Bump pytest to 4.x series
  In rf_ipython, honor users pandas max_colwidth on Geometry objects
  Additional python version wrangling.
  merge pyspark-notebook:spark-2.3.4-hadoop-2.7 Dockerfile into this one
  Dropping pandas version requirement.
  Updated requirements.txt for Spark.
  Fixes to `UserDefinedAggregateFunction` implementations so they run in the Databricks environment. Not sure what the cause is, but this change set seems to fix it.
  Added shading of Typesafe config.
  Upgraded to Spark 2.4.4
  Bumped dev version.
  Set release version.
  Reworking of logging usage to fix locationtech#293
  Eyecandy for index page.
  Bumping pytest version.
  Added liblzma-dev to build environment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug When it really isn't a "feature".
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants