readClassAndObject throws KryoException: Buffer underflow. #128

Closed
ghost opened this issue Nov 11, 2013 · 16 comments

@ghost

ghost commented Nov 11, 2013

From Kolor...@gmail.com on August 22, 2013 09:36:08

This exception happens in my Kryo performance test, which invokes readClassAndObject 1,000,000 times across 100 threads.

What steps will reproduce the problem?
1. Invoke readClassAndObject from multiple threads.
2. Each thread keeps singleton Kryo, Input, and Output instances in a thread local.
3. Each thread gets its Kryo, Input, and Output instances from the thread local and invokes kryo.readClassAndObject.

What is the expected output? What do you see instead?
com.esotericsoftware.kryo.KryoException: Buffer underflow.
at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
at com.esotericsoftware.kryo.io.Input.readInt(Input.java:337)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at push.serializer.KryoSerializer.deserialize(KryoSerializer.java:40)
at push.KryoTest.decode(KryoTest.java:68)
at push.KryoTest.access$0(KryoTest.java:67)
at push.KryoTest$1.run(KryoTest.java:44)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

What version of Kryo are you using?
2.21

Please provide any additional information below.
/**
 *
 */
package push;

import java.util.HashMap;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import push.serializer.KryoSerializer;

/**
 * @author kolor
 */
public class KryoTest {

    private static final KryoSerializer serializer = new KryoSerializer();

    public static void main(String[] args) throws Exception {
        final HashMap<String, Object> map = new HashMap<String, Object>();
        for (int i = 0; i < 10; i++) {
            map.put(UUID.randomUUID().toString() + i, UUID.randomUUID().toString());
        }

        final byte[] bytes = encode(map);
        System.out.println(bytes.length);
        System.out.println(decode(bytes));
        final AtomicInteger seq = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(100);
        long bt = System.currentTimeMillis();

        for (int i = 0; i < 1000000; i++) {
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        // encode(map);
                        decode(bytes);
                    } catch (Exception e) {
                        if (seq.incrementAndGet() == 1) {
                            e.printStackTrace();
                        }
                    }
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(10000, TimeUnit.DAYS);

        long et = System.currentTimeMillis();

        System.out.println("cost " + (et - bt) + " fails:" + seq.get());
    }

    private static byte[] encode(HashMap<String, Object> map) {
        return serializer.serialize(map);
    }

    private static HashMap<String, Object> decode(byte[] data) throws Exception {
        return (HashMap<String, Object>) serializer.deserialize(data);
    }

}

Attachment: Serializer.java KryoSerializer.java KryoTest.java

Original issue: http://code.google.com/p/kryo/issues/detail?id=128
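
The attached KryoSerializer is not reproduced in this thread. A minimal sketch consistent with the description above (one Kryo, Input, and Output per thread, held in thread locals) might look like the following; the package, class, and method names come from the stack trace, everything else is an assumption rather than the attached code:

package push.serializer;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// Hypothetical reconstruction of the attached serializer: one Kryo/Input/Output per thread.
public class KryoSerializer {

    private static final ThreadLocal<Kryo> KRYO = new ThreadLocal<Kryo>() {
        protected Kryo initialValue() { return new Kryo(); }
    };
    private static final ThreadLocal<Output> OUTPUT = new ThreadLocal<Output>() {
        protected Output initialValue() { return new Output(4096, -1); }
    };
    private static final ThreadLocal<Input> INPUT = new ThreadLocal<Input>() {
        protected Input initialValue() { return new Input(); }
    };

    public byte[] serialize(Object object) {
        Output output = OUTPUT.get();
        output.clear();
        KRYO.get().writeClassAndObject(output, object);
        return output.toBytes();
    }

    public Object deserialize(byte[] data) {
        Input input = INPUT.get();
        // Every worker in the test wraps the SAME byte[] here, which is what the
        // discussion below identifies as the problem.
        input.setBuffer(data);
        return KRYO.get().readClassAndObject(input);
    }
}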

@ghost
Author

ghost commented Nov 11, 2013

From Kolor...@gmail.com on August 22, 2013 00:47:35

If I change the thread count to 1, it is OK,
so I think this issue is specific to the multi-threaded situation.

@ghost
Author

ghost commented Nov 11, 2013

From nathan.s...@gmail.com on August 22, 2013 11:49:19

https://code.google.com/p/kryo/#Threading
:)

Status: Invalid

@ghost
Author

ghost commented Nov 11, 2013

From Kolor...@gmail.com on August 22, 2013 19:53:05

Did you see my code?
In my code, each thread has its own Kryo, Input, and Output instance,
but it still throws this exception.

@ghost
Author

ghost commented Nov 11, 2013

From nathan.s...@gmail.com on August 23, 2013 04:08:58

Ah, sorry, didn't see the TLS in KryoSerializer.

Status: Accepted

@ghost
Author

ghost commented Nov 11, 2013

From romixlev on August 23, 2013 05:49:16

@nate:
Actually, this is a valid bug report and there is a bug in Input.readAscii(). It manipulates its buffer in-place, which may lead to problems in multi-threaded applications when the same byte buffer is shared by many Input objects.

I see two ways to handle this situation:

  1. Fix the Input.readAscii() method and allocate a temporary byte buffer, instead of doing in-place manipulations with a byte buffer of the Input object. This solves the problem but most likely affects performance.

  2. Provide a guideline in the documentation that the same byte buffer should not be used by multiple Input objects simultaneously in multi-threaded environments. In this case, we can leave the implementation as is.

@nate: What do you think? Which of these approaches should we take?

-Leo

@ghost
Author

ghost commented Nov 11, 2013

From romixlev on August 23, 2013 05:50:42

@nate: I was not sure you'd get the previous comment from me, so I added you to CC.

Cc: nathan.s...@gmail.com

@ghost
Author

ghost commented Nov 11, 2013

From nathan.s...@gmail.com on August 23, 2013 06:55:18

All comments on all issues get emailed to me, but the CC is fine too. Much thanks for pinpointing the problem! For others, it is here:
https://code.google.com/p/kryo/source/browse/trunk/src/com/esotericsoftware/kryo/io/Input.java#577
The last character in a 7-bit ASCII sequence has the 8th bit set to denote the end. Passing this to "new String" obviously would corrupt that character. The ASCII path is a fast path, so copying the buffer here would be suboptimal. Is it a real use case to deserialize the same byte[] concurrently? It comes up in tests and I suppose it could come up in real code, but it seems rare. I'm not sure it is worth changing our nice fast path to fix. My vote is to just update the documentation. We can continue discussion, but for now I've updated the webpage and javadocs.

@ghost
Author

ghost commented Nov 11, 2013

From romixlev on August 23, 2013 07:14:29

I second your vote, Nate. Using the same byte[] concurrently is a bit artificial, IMHO. After all, the client can create copies of this byte array if required.

@ghost
Author

ghost commented Nov 11, 2013

From romixlev on August 24, 2013 07:56:15

BTW, Nate, it would be nice to add this information about the issues with multi-threaded deserialization to the corresponding section on the main page of the project. It is sort of stated there, but IMHO it is not very clear for a new user. Maybe talking about Input streams backed by byte arrays would be clearer. And maybe we should provide a code example showing how it may happen, e.g.:
byte[] buf = new byte[1024];
// read buf from a file
Input in1 = new Input(buf);
Input in2 = new Input(buf);

Processing in1 and in2 concurrently may lead to problems. Don't do it. If you need to do something like this, it is better to create a dedicated copy of "buf" for each Input stream.
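
A possible safe variant of the snippet above, for the same documentation section (a sketch; it simply gives each Input its own copy of the hypothetical buf):

byte[] buf = new byte[1024];
// read buf from a file
Input in1 = new Input(java.util.Arrays.copyOf(buf, buf.length));
Input in2 = new Input(java.util.Arrays.copyOf(buf, buf.length));
// Each Input now owns its bytes, so processing in1 and in2 concurrently is safe
// with respect to the in-place readAscii manipulation.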

And if you'd update the page anyway, it would also be very nice to add the following information there:

@ghost
Author

ghost commented Nov 11, 2013

From Kolor...@gmail.com on September 02, 2013 00:49:10

Thanks for your attention

@ghost
Author

ghost commented Nov 11, 2013

From romixlev on October 02, 2013 08:07:58

(No comment was entered for this change.)

Status: WontFix

@ghost ghost closed this as completed Nov 11, 2013
@mirion

mirion commented Sep 15, 2014

I know that this bug was closed as WontFix, but I think the resolution is wrong and the issue has to be addressed thoroughly.

A brief explanation: I'm using Kryo to share information between applications running on several J2EE servers. This data is routed through the sessions, so it may be requested by two threads at the same time. If I follow your advice, I have to systematically duplicate the buffer in order to read it safely. Of course, this comes with a much bigger penalty than duplicating just the culprit, the encoded ASCII data.

Your software is great, but on the other hand I really don't understand why you are taking this approach: a sort of global memory variable that can't be shared (the data buffers). I understand that the Kryo instances themselves are not multi-threading enabled, and that is not a problem, but the data itself? Since this happens with byte arrays already loaded into memory, it is like having a house where, if somebody is looking through a window, nobody else can do so at the same time. This is a dramatic limitation and can't be addressed by systematically breaking the wall to open another window...

Your software is designed for speed, but first of all it should be designed for reliability, with optimization coming afterwards.

I understand that these design choices were made a long time ago, but maybe now, when the software is mature enough and in good shape, it is possible to fix some of these bugs/limitations.

Thanks

@NathanSweet
Member

For people not accessing the byte[] from multiple threads (which I think is extremely common), the readAscii fast path is nice and fast. It would be great to find an alternative that is just as fast without adding to the serialized size.

For now you could extend Output and implement your own writeString and extend Input and implement your own readString. You can either send an extra byte per string so the "end of ASCII" bit isn't in the string data, or you can copy the string bytes, mask the last one, then create the string.
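
A rough sketch of the first option (an assumption of how such subclasses could look, not code from Kryo itself): write strings as a length prefix plus UTF-8 bytes so no "end of ASCII" bit ever lands in the string data. Both sides must use these subclasses, and the resulting stream is not compatible with strings written by a stock Output.

import java.nio.charset.StandardCharsets;

import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// Sketch only: a string format that avoids the in-place "end of ASCII" unmasking.
class SafeStringOutput extends Output {
    SafeStringOutput(int bufferSize, int maxBufferSize) { super(bufferSize, maxBufferSize); }

    @Override
    public void writeString(String value) {
        if (value == null) { writeInt(0, true); return; }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeInt(bytes.length + 1, true);   // +1 so that 0 can encode null
        writeBytes(bytes);
    }
}

class SafeStringInput extends Input {
    SafeStringInput(byte[] buffer) { super(buffer); }

    @Override
    public String readString() {
        int length = readInt(true);
        if (length == 0) return null;
        byte[] bytes = readBytes(length - 1);  // copies out of the (possibly shared) buffer
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

The second option Nate mentions (copy the string bytes and mask the last one) keeps the stock wire format but needs access to Input's internal buffer and position, so it is more invasive than this sketch.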

@mirion

mirion commented Sep 15, 2014

You are really fast... Thank you

I understand that there are quick workarounds, but do you really think that turning this into the standard behavior would be so expensive? I doubt it. What is the cost of one byte compared to predictable and reliable behavior?

In my opinion, this inability to read data buffers from multiple threads is a limitation that, compared to the minor speed bump, becomes really big and, in the end, a pain for developers.

At the same time, multi-threading is common in web development, so the problem is not that infrequent. Probably most of those who hit it just avoided the issue through buffer duplication or another trick.

Please consider a general resolution of this kind of root problem. It will only make your software better.

Thanks

@savinov

savinov commented Oct 15, 2014

I get this very exception in a single-threaded program with one Input/Output instance.

The code is very simple:

Kryo kryo = new Kryo();
List objects = ...
Path storagePath = ...
Output output = new Output(Files.newOutputStream(storagePath));
kryo.writeClassAndObject(output, objects);

Input input = new Input(Files.newInputStream(storagePath));
List data = (List) kryo.readClassAndObject(input);

@romix
Collaborator

romix commented Oct 15, 2014

You probably need to close your output stream first before you read from the file, so that it flushes.
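
Concretely, the snippet from the previous comment would become something like this (same placeholder objects/storagePath; Output.close() flushes Kryo's internal buffer to the underlying stream):

Kryo kryo = new Kryo();
List objects = ...
Path storagePath = ...
Output output = new Output(Files.newOutputStream(storagePath));
kryo.writeClassAndObject(output, objects);
output.close();   // flush + close; without this the file is truncated and
                  // reading it back underflows

Input input = new Input(Files.newInputStream(storagePath));
List data = (List) kryo.readClassAndObject(input);
input.close();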

@serverperformance
Contributor

Hi, you didn't close the Output before you started reading...

See https://github.com/EsotericSoftware/kryo#quickstart

Cheers,

Tumi


@savinov

savinov commented Oct 15, 2014

I do close the output before reading; more detailed code looks like this:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * @author Guram Savinov
 */
public class KryoStorage {

    private static final Kryo KRYO = new Kryo();
    private static final Logger LOGGER = LoggerFactory.getLogger(KryoStorage.class);

    private Path storagePath;

    public KryoStorage(Path storagePath) {
        this.storagePath = storagePath;
    }

    public void saveContext(List objects) {
        Output output = null;
        try {
            output = new Output(Files.newOutputStream(storagePath));
            KRYO.writeClassAndObject(output, objects);
        } catch (IOException e) {
            LOGGER.warn("I/O error  while write to the storage file", e);
        } finally {
            IOUtils.closeQuietly(output);
        }
    }

    public List loadContext() {
        List result = new ArrayList();
        if (Files.isReadable(storagePath)) {
            Input input = null;
            try {
                input = new Input(Files.newInputStream(storagePath));
                result = (List) KRYO.readClassAndObject(input);
            } catch (IOException e) {
                LOGGER.warn("I/O error while reading from the storage file", e);
            } finally {
                IOUtils.closeQuietly(input);
            }
        }

        try {
            Files.deleteIfExists(storagePath);
        } catch (IOException e) {
            LOGGER.warn("I/O error while delete storage file", e);
        }

        return result;
    }
}

There is a pause of a few minutes between saveContext() and loadContext(); moreover, saveContext() and loadContext() execute in different JVMs: after saveContext() is done, the first JVM stops, then a second JVM process starts and executes loadContext().
saveContext() stored the data without any problems, but now I have a file that causes KryoException: Buffer underflow in loadContext() on every attempt to read data from it.
The file contains a single java.util.ArrayList which contains one java.util.Map<String, MyClass> with about 1,500,000 key/value pairs and is about 50 MB in size.
Unfortunately I can't give you this file for testing, because it contains confidential data.
Earlier this code worked without any problems with the same kind and size of data.
The Kryo version was 2.24.0 from the Maven repo; the 3.0.0 release also causes this exception when reading my data from the file.
Maybe my stack trace will be useful:

Caused by: com.esotericsoftware.kryo.KryoException: Buffer underflow.
        at com.esotericsoftware.kryo.io.Input.require(Input.java:181) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.io.Input.readVarInt(Input.java:355) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:135) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22) ~[kryo-2.24.0.jar:na]
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761) ~[kryo-2.24.0.jar:na]

@savinov

savinov commented Oct 15, 2014

It seems the problem is EOF detection; check out the topic on StackOverflow.

@romix
Collaborator

romix commented Oct 15, 2014

Could you provide a complete unit test? I quickly tested it on my machine and it works for me. I just replaced the loggers and Apache utils with simpler code that does not use any external dependencies, and it seems to work without any problems.

@savinov

savinov commented Oct 15, 2014

A unit test doesn't reproduce this bug; I have a file that Kryo successfully serialized with saveContext() but that Kryo can't read via loadContext().
I'm not the only one who has found this problem, see the StackOverflow posts and the comments at the top of this discussion:

Did you see my code?
In my code, each thread has its own Kryo, Input, and Output instance,
but it still throws this exception.
#128 (comment)

@romix
Collaborator

romix commented Oct 15, 2014

But can you try to reproduce this bug with a unit test? Add something to make it look more like your real test case. Add more complexity, etc., but try to make it trigger this bug. Otherwise it is difficult to fix.

BTW, there could be totally different root causes for buffer underflow exceptions. That's why it is important to have something reproducible.

@savinov

savinov commented Oct 15, 2014

I think the problem is the number of Map records; maybe Kryo writes a wrong value to the file and tries to read more than the file contains.
This unit test reproduces the problem (it uses the KryoStorage class from my previous comment). When the number of map records is 100 it's OK, 200 is OK too, but if I set it to 500 I get an assertion error: 500 written, but only 499 or 498 read. Why?

import org.junit.Test;

import java.nio.file.Paths;
import java.util.*;

/**
 * @author Guram Savinov
 */
public class KryoTest {

    private static final int NUMBER_OF_MAP_RECORDS = 500;

    @Test
    public void readStorage() {
        for (int i = 0; i < 10; i++) {
            writeAndRead();
        }
    }

    private void writeAndRead() {
        KryoStorage kryoStorage = new KryoStorage(Paths.get("/tmp/test.kryo"));
        Map<String, MyClass> dataMap = new HashMap<>();
        Random random = new Random();
        for (int i = 0; i < NUMBER_OF_MAP_RECORDS; i++) {
            dataMap.put(UUID.randomUUID().toString().substring(0, 5), new MyClass(random.nextInt(128), random.nextInt(128),
                    random.nextInt(128), UUID.randomUUID().toString()));
        }
        List dataToSave = new ArrayList();
        dataToSave.add(dataMap);
        kryoStorage.saveContext(dataToSave);
        List readData = kryoStorage.loadContext();
        org.junit.Assert.assertEquals(NUMBER_OF_MAP_RECORDS, ((Map) readData.get(0)).size());
    }

    public static class MyClass {
        private int num1;
        private int num2;
        private int num3;
        private String str1;

        public MyClass() {
        }

        public MyClass(int num1, int num2, int num3, String str1) {
            this.num1 = num1;
            this.num2 = num2;
            this.num3 = num3;
            this.str1 = str1;
        }

        public int getNum1() {
            return num1;
        }

        public void setNum1(int num1) {
            this.num1 = num1;
        }

        public int getNum2() {
            return num2;
        }

        public void setNum2(int num2) {
            this.num2 = num2;
        }

        public int getNum3() {
            return num3;
        }

        public void setNum3(int num3) {
            this.num3 = num3;
        }

        public String getStr1() {
            return str1;
        }

        public void setStr1(String str1) {
            this.str1 = str1;
        }
    }
}

@savinov

savinov commented Oct 15, 2014

Even more interesting: remove the for loop and set NUMBER_OF_MAP_RECORDS = 1000000

import org.junit.Test;

import java.nio.file.Paths;
import java.util.*;

/**
 * @author Guram Savinov
 */
public class KryoTest {

    private static final int NUMBER_OF_MAP_RECORDS = 1000000;


    @Test
    public void writeAndRead() {
        KryoStorage kryoStorage = new KryoStorage(Paths.get("/tmp/test.kryo"));
        Map<String, MyClass> dataMap = new HashMap<>();
        Random random = new Random();
        for (int i = 0; i < NUMBER_OF_MAP_RECORDS; i++) {
            dataMap.put(UUID.randomUUID().toString().substring(0, 5), new MyClass(random.nextInt(128), random.nextInt(128),
                    random.nextInt(128), UUID.randomUUID().toString()));
        }
        List dataToSave = new ArrayList();
        dataToSave.add(dataMap);
        kryoStorage.saveContext(dataToSave);
        List readData = kryoStorage.loadContext();
        org.junit.Assert.assertEquals(NUMBER_OF_MAP_RECORDS, ((Map) readData.get(0)).size());
    }

    public static class MyClass {
        private int num1;
        private int num2;
        private int num3;
        private String str1;

        public MyClass() {
        }

        public MyClass(int num1, int num2, int num3, String str1) {
            this.num1 = num1;
            this.num2 = num2;
            this.num3 = num3;
            this.str1 = str1;
        }

        public int getNum1() {
            return num1;
        }

        public void setNum1(int num1) {
            this.num1 = num1;
        }

        public int getNum2() {
            return num2;
        }

        public void setNum2(int num2) {
            this.num2 = num2;
        }

        public int getNum3() {
            return num3;
        }

        public void setNum3(int num3) {
            this.num3 = num3;
        }

        public String getStr1() {
            return str1;
        }

        public void setStr1(String str1) {
            this.str1 = str1;
        }
    }
}

Expected :1000000
Actual :645149

Where are the other 354,851 Map records?

@romix
Collaborator

romix commented Oct 15, 2014

Very interesting. I'll look into this.

@romix
Collaborator

romix commented Oct 15, 2014

Just tried it on my side. I can reproduce it. Looking into it.

@romix
Collaborator

romix commented Oct 15, 2014

OK. There is a bug in your test:
You use this as a key: UUID.randomUUID().toString().substring(0, 5)
But if you call it enough times, it starts to generate the same keys. So, if you replace it with
UUID.randomUUID().toString(), then it works properly.

You need to try harder if you want to reproduce your bug ;-)
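
For reference, the numbers line up with this explanation: substring(0, 5) of a UUID string is 5 hex characters, i.e. 16^5 = 1,048,576 possible keys, and drawing 1,000,000 random keys from that space is expected to leave roughly M * (1 - e^(-N/M)) ≈ 644,500 distinct entries, very close to the 645,149 the test read back. A quick sketch of that calculation:

// Sanity check of the key-collision explanation (standalone sketch).
public class KeyCollisionCheck {
    public static void main(String[] args) {
        double m = Math.pow(16, 5);   // distinct 5-character hex prefixes: 1,048,576
        double n = 1_000_000;         // keys the test inserts
        // Expected number of distinct values when drawing n times uniformly from m values.
        double expectedDistinct = m * (1 - Math.exp(-n / m));
        System.out.println(Math.round(expectedDistinct));  // ~644,500
    }
}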

sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this issue Oct 19, 2016
- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR apache#6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (EsotericSoftware/kryo#128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- optimized functions for inbuilt closures in SparkContext JobFunction1, JobFunction2 that
  avoids the heavy cost of closure serialization (and skip those two in ClosureCleaner)
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- avoid SerializationUtils for cloning Properties which is costly; instead clone by
  enumerating Properties in a normal manner (which includes the "defaults")
- cached serialized task bytes in Stage for re-use and remove TODO comment about this
- avoid serializing the task separately in LaunchTask since the whole object is serialized anyway
sumwale pushed a commit to TIBCOSoftware/snappydata that referenced this issue Oct 19, 2016
- new PooledKryoSerializer that does pooling of Kryo objects (else performance is bad if
    new instance is created for every call which needs to register and walk tons of classes)
- has an overridden version for ASCII strings to fix (EsotericSoftware/kryo#128);
  currently makes a copy but will be modified to use one extra byte to indicate end of string
- optimized external serializers for StructType, and Externalizable having readResolve() method;
  using latter for StorageLevel and BlockManagerId
- added optimized serialization for the closure used by SparkSQLExecuteImpl (now a proper class instead);
  copied part of changes for LIMIT from d17c094 on SNAP-1067 to avoid
  merge pains later
- fixed index column determination in RowFormatRelation (was off by 1 due to 0 based vs 1 based)
sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this issue Nov 25, 2016
- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR apache#6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (EsotericSoftware/kryo#128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- updated kryo to 4.0.0 to get the fix for kryo#342
sumwale pushed a commit to TIBCOSoftware/snappydata that referenced this issue Nov 25, 2016
- new PooledKryoSerializer that does pooling of Kryo objects (else performance is bad if
    new instance is created for every call which needs to register and walk tons of classes)
- has an overridden version for ASCII strings to fix (EsotericSoftware/kryo#128);
  currently makes a copy but will be modified to use one extra byte to indicate end of string
- optimized external serializers for StructType, and Externalizable having readResolve() method;
  using latter for StorageLevel and BlockManagerId
- added optimized serialization for the closure used by SparkSQLExecuteImpl (now a proper class instead)
- fixed index column determination in RowFormatRelation (was off by 1 due to 0 based vs 1 based)
sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this issue Nov 28, 2016
- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR apache#6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (EsotericSoftware/kryo#128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- updated kryo to 4.0.0 to get the fix for kryo#342
- actually fixing scalastyle errors introduced by d80ef1b
- set ordering field with kryo serialization in GenerateOrdering
- removed warning if non-closure passed for cleaning
sumwale pushed a commit to TIBCOSoftware/snappydata that referenced this issue Nov 28, 2016
#426)

- new PooledKryoSerializer that does pooling of Kryo objects (else performance is bad if
    new instance is created for every call which needs to register and walk tons of classes)
- has an overridden version for ASCII strings to fix (EsotericSoftware/kryo#128);
  currently makes a copy but will be modified to use one extra byte to indicate end of string
- optimized external serializers for StructType, and Externalizable having readResolve() method;
  using latter for StorageLevel and BlockManagerId
- added optimized serialization for the closure used by SparkSQLExecuteImpl (now a proper class instead)
- fixed index column determination in RowFormatRelation (was off by 1 due to 0 based vs 1 based)
- set serializer/codec options explicitly in ClusterManagerTestBase since it does not use Lead API
- formatting changes and fixed some compiler warnings
- Kryo serialization for RowFormatScanRDD, SparkShellRowRDD, ColumnarStorePartitionedRDD, SparkShellCachedBatchRDD and MultiBucketExecutorPartition
- added base RDDKryo to encapsulate serialization of bare minimum fields in RDD (using reflection where required)
- removed unused SparkShellRDDHelper.mapBucketsToPartitions
- updated log4j.properties for core/cluster tests
- change Attribute to StructField in columns decoders since StructType has an efficient serializer
as well as being cleaner since it doesn't depend on Attribute (with potentially invalid ExprId
    for remote node though those fields are not used)
- updating spark link to fix AQP dunits with the new kryo serializer
- skip DUnitSingleTest from the aqp test target since those really are dunits which should not be run like normal junit tests
- re-create snappy catalog connection for MetaException failures too (message says "... we don't support retries ...")
- clear the serializer/codec system properties when stopping Spark so that these are not carried through to subsequent tests in same JVMs
ymahajan pushed a commit to TIBCOSoftware/snappy-spark that referenced this issue Jan 18, 2017
- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR apache#6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (EsotericSoftware/kryo#128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- updated kryo to 4.0.0 to get the fix for kryo#342
- actually fixing scalastyle errors introduced by d80ef1b
- set ordering field with kryo serialization in GenerateOrdering
- removed warning if non-closure passed for cleaning
Conflicts:
	core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala
	core/src/main/scala/org/apache/spark/scheduler/Task.scala
	core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
	core/src/main/scala/org/apache/spark/util/collection/BitSet.scala
sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this issue Jul 8, 2017
- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR apache#6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (EsotericSoftware/kryo#128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- updated kryo to 4.0.0 to get the fix for kryo#342
- actually fixing scalastyle errors introduced by d80ef1b
- set ordering field with kryo serialization in GenerateOrdering
- removed warning if non-closure passed for cleaning
ymahajan pushed a commit to TIBCOSoftware/snappy-spark that referenced this issue Mar 5, 2018
- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR apache#6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (EsotericSoftware/kryo#128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- updated kryo to 4.0.0 to get the fix for kryo#342
- actually fixing scalastyle errors introduced by d80ef1b
- set ordering field with kryo serialization in GenerateOrdering
- removed warning if non-closure passed for cleaning

Conflicts:
	core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala
	core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala
	core/src/main/scala/org/apache/spark/scheduler/Task.scala
	core/src/main/scala/org/apache/spark/scheduler/TaskDescription.scala
	core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
	core/src/main/scala/org/apache/spark/storage/BlockId.scala
	core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateSafeProjection.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala
ahshahid added a commit to TIBCOSoftware/snappy-spark that referenced this issue Dec 15, 2019
* [SNAP-846][CLUSTER] Ensuring that Uncaught exceptions are handled in the Snappy side and do not cause a system.exit (#2)

Instead of using SparkUncaughtExceptionHandler, executor now gets the uncaught exception handler and uses it to handle the exception. But if it is a local mode, it still uses the SparkUncaughtExceptionHandler

A test has been added in the Snappy side PR for the same.

* [SNAPPYDATA] Updated Benchmark code from Spark PR#13899

Used by the new benchmark from the PR adapted for SnappyData for its vectorized implementation.

Build updated to set testOutput and other variables instead of appending to existing values
(causes double append with both snappydata build adding and this adding for its tests)

* [SNAPPYDATA] Spark version 2.0.1-2

* [SNAPPYDATA] fixing antlr generated code for IDEA

* [SNAP-1083] fix numBuckets handling (#15)

- don't apply numBuckets in Shuffle partitioning since Shuffle cannot create
  a compatible partitioning with matching numBuckets (only numPartitions)
- check numBuckets too in HashPartitioning compatibility

* [SNAPPYDATA] MemoryStore changes for snappydata

* [SNAPPYDATA] Spark version 2.0.1-3

* [SNAPPYDATA] Added SnappyData modification license

* [SNAPPYDATA] updating snappy-spark version after the merge

* [SNAPPYDATA] Bootstrap perf (#16)

Change involves:
1) Reducing the generated code size when writing struct having all fields of same data type.
2) Fixing an issue in WholeStageCodeGenExec, where a plan supporting CodeGen was not being prefixed by InputAdapter in case, the node did not participate in whole stage code gen.

* [SNAPPYDATA] Provide preferred location for each bucket-id in case of partitioned sample table. (#22)

These changes are related to AQP-79.
Provide preferred location for each bucket-id in case of partitioned sample table.

* [SNAPPYDATA] Bumping version to 2.0.3-1

* [SNAPPYDATA] Made two methods in Executor as protected to make them customizable for SnappyExecutors. (#26)

* [SNAPPYDATA]: Honoring JAVA_HOME variable while compiling java files
instead of using system javac. This eliminates problem when system jdk
is set differently from JAVA_HOME

* [SNAPPYDATA] Helper classes for DataSerializable implementation. (#29)

This is to provide support for DataSerializable implementation in AQP

* [SNAPPYDATA] More optimizations to UTF8String

- allow direct UTF8String objects in RDD data conversions to DataFrame;
  new UTF8String.cloneIfRequired to clone only if required used by above
- allow for some precision change in QueryTest result comparison

* [SNAP-1192] correct offsetInBytes calculation (#30)

corrected offsetInBytes in UnsafeRow.writeToStream

* [SNAP-1198] Use ConcurrentHashMap instead of queue for ContextCleaner.referenceBuffer (#32)

Use a map instead of queue for ContextCleaner.referenceBuffer. Profiling shows lot of time being spent removing from queue where a hash map will do (referenceQueue is already present for poll).

* [SNAP-1194] explicit addLong/longValue methods in SQLMetrics (#33)

This avoids runtime erasure for add/value methods that will result in unnecessary boxing/unboxing overheads.

- Adding spark-kafka-sql project
- Update version of deps as per upstream.
- corrected kafka-clients reference

* [SNAPPYDATA] Adding fixed stats to common filter expressions

Missing filter statistics in filter's logical plan is causing incorrect plan selection at times.
Also, join statistics always return sizeInBytes as the product of its child sizeInBytes which
result in a big number. For join, product makes sense only when it is a cartesian product join.
Hence, fixed the spark code to check for the join type. If the join is a equi-join,
  we now sum the sizeInBytes of the child instead of doing a product.

For missing filter statistics, adding a heuristics based sizeInBytes calculation mentioned below.
If the filtering condition is:
- equal to: sizeInBytes is 5% of the child sizeInBytes
- greater than less than: sizeInBytes is 50% of the child sizeInBytes
- isNull: sizeInBytes is 50% of the child sizeInBytes
- starts with: sizeInBytes is 10% of the child sizeInBytes

* [SNAPPYDATA] adding kryo serialization missing in LongHashedRelation

* [SNAPPYDATA] Correcting HashPartitioning interface to match apache spark

Addition of numBuckets as default parameter made HashPartitioning incompatible with upstream apache spark.
Now adding it separately so restore compatibility.

* [SNAP-1233] clear InMemorySorter before calling its reset (#35)

This is done so that any spill call (due to no EVICTION_DOWN) from within the spill
call will return without doing anything, else it results in NPE trying to read
page tables which have already been cleared.

* [SNAPPYDATA] Adding more filter conditions for plan sizing as followup

- IN is 50% of original
- StartsWith, EndsWith 10%
- Contains and LIKE at 20%
- AND is multiplication of sizing of left and right (with max filtering of 5%)
- OR is 1/x+1/y sizing of the left and right (with min filtering of 50%)
- NOT three times of that without NOT

* [SNAPPYDATA] reduced factors in filters a bit to be more conservative

* [SNAP-1240]  Snappy monitoring dashboard (#36)

* UI HTML, CSS and resources changes

* Adding new health status images

* Adding SnappyData Logo.

* Code changes for stting/updating Spark UI tabs list.

* Adding icon images for Running, Stopped and Warning statuses.

* 1. Adding New method for generating Spark UI page without page header text.
2. Updating CSS: Cluster Normal status text color is changed to match color of Normal health logo.

* Suggestion: Rename Storage Tab to Spark Cache.

*  Resolving Precheckin failure due to scala style comments
:snappy-spark:snappy-spark-core_2.11:scalaStyle
SparkUI.scala message=Insert a space after the start of the comment line=75 column=4
UIUtils.scala message=Insert a space after the start of the comment line=267 column=4

* [SNAP-1251] Avoid exchange when number of shuffle partitions > child partitions (#37)

- reason is that shuffle is added first with default shuffle partitions,
  then the child with maximum partitions is selected; now marking children where
  implicit shuffle was introduced then taking max of rest (except if there are no others
      in which case the negative value gets chosen and its abs returns default shuffle partitions)
- second change is to add a optional set of alias columns in OrderlessHashPartitioning
  for expression matching to satisfy partitioning in case it is on an alias for partitioning column
  (helps queries like TPCH Q21 where implicit aliases are introduced to resolve clashes in self-joins);
  data sources can use this to pass projection aliases, if any (only snappydata ones in embedded mode)

* [SNAPPYDATA] reverting lazy val to def for defaultNumPreShufflePartitions

use child.outputPartitioning.numPartitions for shuffle partition case instead of depending
on it being defaultNumPreShufflePartitions

* [SNAPPYDATA] Code changes for displaying product version details. (#38)

* [SNAPPYDATA] Fixes for Scala Style precheckin failure. (#39)

* [SNAPPYDATA] Removing duplicate RDD already in snappy-core

Update OrderlessHashPartitioning to allow multiple aliases for a partitioning column.

Reduce plan size statistics by a factor of 2 for groupBy.

* [SNAP-1256] (#41)

set the memory manager as spark's UnifiedMemoryManager, if spark.memory.manager is set as default

* SNAP-1257 (#40)

* SNAP-1257
1. Adding SnappyData Product documentation link on UI.
2. Fixes for SnappyData Product version not displayed issue.

* SNAP-1257:
 Renamed SnappyData Guide link as Docs.

Conflicts:
	core/src/main/scala/org/apache/spark/ui/UIUtils.scala

* [SNAPPYDATA] Spark Version 2.0.3-2

* [SNAP-1185] Guard logging and time measurements (#28)

- add explicit log-level check for some log lines in java code
  (scala code already uses logging arguments as pass-by-name)
- for System.currentTimeInMillis() calls that are used only by logging,
  guard it with the appropriate log-level check
- use System.nanoTime in a few places where duration is to be measured;
  also using a DoubleAccumulator to add results for better accuracy
- cache commonly used logging.is*Enabled flags
- use explicit flag variable in Logging initialized lazily instead of lazy val that causes hang
  in streaming tests for some reason even if marked transient
- renamed flags for consistency
- add handling for possible DoubleAccumulators in a couple of places that expect only
  LongAccumulators in TaskMetrics
- fixing scalastyle error due to 2c432045
Conflicts:
	core/src/main/scala/org/apache/spark/executor/Executor.scala
	core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
	core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
	core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
	core/src/main/scala/org/apache/spark/storage/BlockManager.scala

* SNAP-1281: UI does not show up if spark shell is run without snappydata (#42)

Fixes: Re-enabling the default spark redirection handler to redirect user to spark jobs page.

* [SNAP-1136] Kryo closure serialization support and optimizations (#27)

- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR https://github.com/apache/spark/pull/6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (https://github.com/EsotericSoftware/kryo/issues/128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- updated kryo to 4.0.0 to get the fix for kryo#342
- actually fixing scalastyle errors introduced by d80ef1b4
- set ordering field with kryo serialization in GenerateOrdering
- removed warning if non-closure passed for cleaning

* [SNAP-1190] Reduce partition message overhead from driver to executor (#31)

- DAGScheduler:
  - For small enough common task data (RDD + closure) send inline with the Task instead of a broadcast
  - Transiently store task binary data in Stage to re-use if possible
  - Compress the common task bytes to save on network cost
- Task: New TaskData class to encapsulate task compressed bytes from above, the uncompressed length
  and reference index if TaskData is being read from a separate list (see next comments)
- CoarseGrainedClusterMessage: Added new LaunchTasks message to encapsulate multiple
  Task messages to same executor
- CoarseGrainedSchedulerBackend:
  - Create LaunchTasks by grouping messages in ExecutorTaskGroup per executor
  - Actual TaskData is sent as part of TaskDescription and not the Task to easily
    separate out the common portions in a separate list
  - Send the common TaskData as a separate ArrayBuffer of data with the index into this
    list set in the original task's TaskData
- CoarseGrainedExecutorBackend: Handle LaunchTasks by splitting into individual jobs
- CompressionCodec: added bytes compress/decompress methods for more efficient byte array compression
- Executor:
  - Set the common decompressed task data back into the Task object.
  - Avoid additional serialization of TaskResult just to determine the serialization time.
    Instead now calculate the time inline during serialization write/writeExternal methods
- TaskMetrics: more generic handling for DoubleAccumulator case
- Task: Handling of TaskData during serialization to send a flag to indicate whether
  data is inlined or will be received via broadcast
- ResultTask, ShuffleMapTask: delegate handling of TaskData to parent Task class
- SparkEnv: encapsulate codec creation as a zero-arg function to avoid repeated conf lookups
- SparkContext.clean: avoid checking serializability in case non-default closure serializer is being used
- Test updates for above
Conflicts:
	core/src/main/scala/org/apache/spark/SparkEnv.scala
	core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
	core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
	core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
	core/src/main/scala/org/apache/spark/scheduler/Task.scala
	core/src/main/scala/org/apache/spark/scheduler/TaskResult.scala
	core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala

* [SNAP-1202] Reduce serialization overheads of biggest contributors in queries (#34)

- Properties serialization in Task now walks through the properties and writes to same buffer
  instead of using java serialization writeObject on a separate buffer
- Cloning of properties uses SerializationUtils which is inefficient. Instead added
  Utils.cloneProperties that will clone by walking all its entries (including defaults if requested)
- Separate out WholeStageCodegenExec closure invocation into its own WholeStageCodegenRDD
  for optimal serialization of its components including base RDD and CodeAndComment.
  This RDD also removes the limitation of having a max of only 2 RDDs in inputRDDs().

* [SNAP-1067] Optimizations seen in perf analysis related to SnappyData PR#381 (#11)

 - added hashCode/equals to UnsafeMapData and optimized hashing/equals for Decimal
   (assuming scale is same for both as in the calls from Spark layer)
 - optimizations to UTF8String: cached "isAscii" and "hash"
 - more efficient ByteArrayMethods.arrayEquals (~3ns vs ~9ns for 15 byte array)
 - reverting aggregate attribute changes (nullability optimization) from Spark layer and instead take care of it on the SnappyData layer; also reverted other changes in HashAggregateExec made earlier for AQP and nullability
 - copy spark-version-info in generateSources target for IDEA
 - updating snappy-spark version after the merge

Conflicts:
	build.gradle
	sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala

* [SNAP-1067] Optimizations seen in perf analysis related to SnappyData PR#381 (#11)

 - added hashCode/equals to UnsafeMapData and optimized hashing/equals for Decimal
   (assuming scale is same for both as in the calls from Spark layer)
 - optimizations to UTF8String: cached "isAscii" and "hash"
 - more efficient ByteArrayMethods.arrayEquals (~3ns vs ~9ns for 15 byte array)
 - reverting aggregate attribute changes (nullability optimization) from Spark layer and instead take care of it on the SnappyData layer; also reverted other changes in HashAggregateExec made earlier for AQP and nullability
- copy spark-version-info in generateSources target for IDEA
Conflicts:
	common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java

* [SNAPPYDATA] Bootstrap perf (#16)

1) Reducing the generated code size when writing struct having all fields of same data type.
2) Fixing an issue in WholeStageCodeGenExec, where a plan supporting CodeGen was not being
   prefixed by InputAdapter in case, the node did not participate in whole stage code gen.

* [SNAPPYDATA] Skip cast if non-nullable type is being inserted in nullable target

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

* [SNAPPYDATA] optimized versions for a couple of string functions

* [SNAPPYDATA] Update to gradle-scalatest version 0.13.1

* Snap 982 (#43)

* a) Added a method in SparkContext to manipulate addedJar. This is an workaround for SNAP-1133.
b) made repl classloader a variable in Executor.scala

* Changed Executor field variable to protected.

* Changed build.gradle of launcher and network-yarn to exclude netty dependecies , which was causing some messages to hang.
made urlclassLoader in Executor.scala a variable.

* Made Utils.doFetchFile method public.

* Made Executor.addReplClassLoaderIfNeeded() method as public.

* [SNAPPYDATA] Increasing the code generation cache eviction size to 300 from 100

* [SNAP-1398] Update janino version to latest 3.0.x

This works around some of the limitations of older janino versions causing SNAP-1398

* [SNAPPYDATA] made some methods protected to be used by SnappyUnifiedManager (#47)

* SNAP-1420

What changes were proposed in this pull request?

Logging level of cluster manager classes is changed to info in store-log4j.properties. But, there are multiple task level logs which generate lot of unneccessary info level logs. Changed these logs from info to debug.
Other PRs

#48
SnappyDataInc/snappy-store#168
SnappyDataInc/snappydata#573

* [SNAPPYDATA] Reducing file read/write buffer sizes

Reduced buffer sizes from 1M to 64K to reduce unaccounted memory overhead.
Disk read/write buffers beyond 32K don't help in performance in any case.

* [SNAP-1486] make QueryPlan.cleanArgs a transient lazy val (#51)

cleanArgs can end up holding transient fields of the class which can be
recalculated on the other side if required in any case.

Also added full exception stack for cases of task listener failures.

* SNAP-1420 Review

What changes were proposed in this pull request?

Added a task logger that does task based info logging. This logger has WARN as log level by default. Info logs can be enabled using the following setting in log4j.properties.

log4j.logger.org.apache.spark.Task=INFO
How was this patch tested?

Manual testing.
Precheckin.

* [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap (#53)

Merging Spark fix.
Radix sort require that half of array as free (as temporary space), so we use 0.5 as the scale factor to make sure that BytesToBytesMap will not have more items than 1/2 of capacity. Turned out this is not true, the current implementation of append() could leave 1 more item than the threshold (1/2 of capacity) in the array, which break the requirement of radix sort (fail the assert in 2.2, or fail to insert into InMemorySorter in 2.1).

This PR fix the off-by-one bug in BytesToBytesMap.

This PR also fix a bug that the array will never grow if it fail to grow once (stay as initial capacity), introduced by #15722 .
Conflicts:
	core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java

* SNAP-1545: Snappy Dashboard UI Revamping (#52)

Changes:
  - Adding new methods simpleSparkPageWithTabs_2 and commonHeaderNodes_2 for custom snappy UI changes
  - Adding javascript librarires d3.js, liquidFillGauge.js and snappy-dashboard.js for snappy UI new widgets and styling changes.
  - Updating snappy-dashboard.css for new widgets and UI content stylings
  - Relocating snappy-dashboard.css into ui/static/snappydata directory.

* [SNAPPYDATA] handle "prepare" in answer comparison inside Map types too

* [SNAPPYDATA] fixing scalastyle errors introduced in previous commits

* SNAP-1698: Snappy Dashboard UI Enhancements (#55)

* SNAP-1698: Snappy Dashboard UI Enhancements
Changes:
  - CSS styling and JavaScript code changes for displaying Snappy cluster CPU usage widget.
  - Removed Heap and Off-Heap usage widgets.
  - Adding icons/styling for displaying drop down and pull up carets/pointers to expand cell details.
  - Adding handler for toggling expand and collapse cell details.

* [SNAPPYDATA] reduce a byte copy reading from ColumnVector

When creating a UTF8String from a dictionary item from ColumnVector, avoid a copy
by creating it over the range of bytes directly.

Conflicts:
	sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java

* [SNAPPYDATA] moved UTF8String.fromBuffer to Utils.stringFromBuffer

This is done to maintain full compatibility with upstream spark-unsafe module.

Conflicts:
	sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java

* [SNAPPYDATA] reverting changes to increase DECIMAL precision to 127

The changes to DECIMAL precision were incomplete and broken in more ways than one.
The other reason being that future DECIMAL optimization for operations in
generated code will depend on value to fit in two longs and there does not seem
to be a practical use-case of having precision >38 (which is not supported
    by most mainstream databases either)

Renamed UnsafeRow.todata to toData for consistency.

Conflicts:
	sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java

* [SNAPPYDATA][MERGE-2.1] Some fixes after the merge

- Fix for SnappyResourceEventsDUnitTest from Rishi
- Scala style fixes from Sachin J
- deleting unwanted files
- reverting some changes that crept in inadvertently

More code changes:

- adding dependency for org.fusesource.leveldbjni, com.fasterxml.jackson.core,
  io.dropwizard.metrics, io.netty and org.apache.commons
- fixing compilation issues after merge
- adding dependency for jetty-client, jetty-proxy and mllib-local for graphx
- bumped up parquetVersion and scalanlp breeze
- fixed nettyAllVersion, removed hardcoded value
- bumped up version
- Implement Kryo.read/write for subclasses of Task
- Do not implement KryoSerializable in Task
- spark.sql.warehouse.dir moved to StaticSQLConf
- moved VECTORIZED_AGG_MAP_MAX_COLUMNS from StaticSQLConf to SQLConf
- corrected jackson-databind version

* [SNAPPYDATA][MERGE-2.1]

- Removed SimplifyCasts, RemoveDispensableExpressions
- Fixed precheckin failuers
- Fixed Task serialization issues
- Serialize new TaskMetrics using Kryo serializer
- Pass extraOptions in case of saveAsTable
- removed debug statement
- SnappySink for structured streaming query result

* [SNAPPYDATA][MERGE-2.1]

removed struct streaming classes

* [SNAPPYDATA][MERGE-2.1]

- Avoid splitExpressions for DynamicFoldableExpressions. This used to create a lot of codegen issues
- Bump up the Hadoop version, to avoid issues in IDEA.
- Modified AnalysisException to use getSimpleMessage

* [SNAPPYDATA][MERGE-2.1]

- Handled Array[Decimal] type in ScalaReflection,
  fixes SNAP-1772 (SplitSnappyClusterDUnitTest#testComplexTypesForColumnTables_SNAP643)
- Fixing scalaStyle issues
- updated .gitignore; gitignore build-artifacts and .gradle

* [SNAPPYDATA][MERGE-2.1] Missing patches and version changes

- updated optimized ByteArrayMethods.arrayEquals as per the code in Spark 2.1
  - adapt the word alignment code and optimize it a bit
  - in micro-benchmarks the new method is 30-60% faster than upstream version;
    at larger sizes it is 40-50% faster meaning its base word comparison loop itself is faster
- increase default locality time from 3s to 10s since the previous code to force
  executor-specific routing if it is alive has been removed
- added back cast removal optimization when types differ only in nullability
- add serialization and proper nanoTime handling from *CpuTime added in Spark 2.1.x;
  use DoubleAccumulator for these new fields like done for others to get more accurate results;
  also avoid the rare conditions where these cpu times could be negative
- cleanup handling of jobId and related new fields in Task with kryo serialization
- reverted change to AnalysisException with null check for plan since it is transient now
- reverted old Spark 2.0 code that was retained in InsertIntoTable and changed to Spark 2.1 code
- updated library versions and make them uniform as per upstream Spark for
  commons-lang3, metrics-core, py4j, breeze, univocity; also updated exclusions as
  per the changes to Spark side between 2.0.2 to 2.1.0
- added gradle build for the new mesos sub-project

* [SNAP-1790] Fix one case of incorrect offset in ByteArrayMethods.arrayEquals

The endOffset incorrectly uses current leftOffset+length when the leftOffset
may already have been incremented for word alignment.

* Fix from Hemant for failing :docs target during precheckin run (#61)

* SNAP-1794 (#59)

* Retaining Spark's CodeGenerator#splitExpressions changes

* [SNAP-1389] Optimized UTF8String.compareTo (#62)

- use unsigned long comparisons, followed by unsigned int comparison if possible,
  before finishing with unsigned byte comparisons for better performance
- use big-endian long/int for comparison since it requires the lower-index characters
  to be MSB positions
- no alignment attempted since we expect most cases to fail early in first long comparison itself

Detailed performance results in https://github.com/SnappyDataInc/spark/pull/62
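
A simplified, self-contained sketch of the comparison strategy described above (unsigned 64-bit words read in big-endian order, with an unsigned byte-by-byte tail). It uses ByteBuffer rather than the Unsafe/Platform accessors that the real UTF8String code relies on, so it only illustrates the idea, not the actual implementation:

```
import java.nio.{ByteBuffer, ByteOrder}

object UnsignedCompareSketch {
  // Lexicographic comparison of two byte arrays as unsigned bytes,
  // processed 8 bytes at a time where possible.
  def compare(a: Array[Byte], b: Array[Byte]): Int = {
    val len = math.min(a.length, b.length)
    // Big-endian reads keep lower-index bytes in the most significant
    // positions, so an unsigned long comparison preserves byte order.
    val bufA = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN)
    val bufB = ByteBuffer.wrap(b).order(ByteOrder.BIG_ENDIAN)
    var i = 0
    while (i + 8 <= len) {
      val la = bufA.getLong(i)
      val lb = bufB.getLong(i)
      if (la != lb) return java.lang.Long.compareUnsigned(la, lb)
      i += 8
    }
    while (i < len) {                       // unsigned tail comparison
      val cmp = (a(i) & 0xff) - (b(i) & 0xff)
      if (cmp != 0) return cmp
      i += 1
    }
    a.length - b.length
  }

  def main(args: Array[String]): Unit = {
    println(compare("abcdefghi".getBytes("UTF-8"), "abcdefghz".getBytes("UTF-8"))) // < 0
    println(compare("abc".getBytes("UTF-8"), "abc".getBytes("UTF-8")))             // 0
  }
}
```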

* [SNAPPYDATA][PERF] Fixes for issues found during concurrency testing (#63)

## What changes were proposed in this pull request?

Moved the regex patterns outside the functions into static variables to avoid their recreation.
Made WholeStageCodeGenRDD a case class so that its member variables can be accessed using productIterator.

## How was this patch tested?
Precheckin 

## Other PRs

https://github.com/SnappyDataInc/snappy-store/pull/247
https://github.com/SnappyDataInc/snappydata/pull/730

* [SNAPPYDATA][PERF] optimized pattern matching for byte/time strings

also added slf4j excludes to some imports

* SNAP-1792: Display snappy members logs on Snappy Pulse UI (#58)

Changes:
  - Adding snappy member details javascript for new UI view named SnappyData Member Details Page

* SNAP-1744: UI itself needs to consistently refer to itself as "SnappyData Pulse" (#64)

* SNAP-1744: UI itself needs to consistently refer to itself as "SnappyData Pulse"
Changes:
 - SnappyData Dashboard UI is named as SnappyData Pulse now.
 - Code refactoring and code clean up.

* Removed Array[Decimal] handling from spark layer as it only fixes embedded mode. (#66)

* Removed Array[Decimal] handling from spark layer as it only fixes embedded mode

* Snap 1890 : Snappy Pulse UI suggestions for 1.0 (#69)

* SNAP-1890: Snappy Pulse UI suggestions for 1.0
Changes:
 - SnappyData logo shifted to right most side on navigation tab bar.
 - Adding SnappyData's own new Pulse logo on left most side on navigation tab bar.
 - Displaying SnappyData Build details along with product version number on Pulse UI.
 - Adding CSS,HTML, JS code changes for displaying version details pop up.

* [SNAP-1377,SNAP-902] Proper handling of exception in case of Lead and Server HA (#65)

* [SNAP-1377] Added callback used for checking CacheClosedException

* [SNAP-1377] Added check for GemfirexdRuntimeException and GemfireXDException

* Added license header in source file

* Fix issue seen during precheckin

* Snap 1833 (#67)

Added a fallback path for WholeStageCodeGenRDD. As we dynamically change the classloader, the classloader in effect when the generated code was compiled and the runtime classloader might differ. There is no clean way to handle this apart from recompiling the generated code.
This code path is executed only for components having dynamically changing classloaders, i.e. Snappy jobs & UDFs. Other SQL queries won't be impacted by this.
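
A generic sketch of the shape of that fallback, with hypothetical names (compileWithLoader and the () => Int closure stand in for the real generated-code plumbing): use the cached generated code, and only recompile against the current context classloader when a ClassCastException shows the loaders have diverged.

```
object CodegenFallbackSketch {
  // Hypothetical compile step: the real code would hand the generated source to the
  // compiler with the given classloader; here it just returns a trivial closure.
  def compileWithLoader(source: String, loader: ClassLoader): () => Int =
    () => source.length

  def run(source: String, cached: () => Int): Int = {
    try {
      cached() // fast path: previously compiled generated code
    } catch {
      case _: ClassCastException =>
        // classloaders diverged (e.g. Snappy jobs/UDFs): recompile once and retry
        val recompiled =
          compileWithLoader(source, Thread.currentThread().getContextClassLoader)
        recompiled()
    }
  }
}
```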

* Refactored the executor exception handling for cache (#71)

Refactored the executor exception handling for cache closed exception.

* [SNAP-1930] Rectified a code in WholeStageCodeGenRdd. (#73)

This change avoids repeatedly invoking code compilation in case of a ClassCastException.

* Snap 1813 : Security - Add Server (Jetty web server) level user authentication for Web UI in SnappyData. (#72)

* SNAP-1813: Security - Add Server (Jetty web server) level user authentication for Web UI in SnappyData.
Changes:
 - Adding Security handler in jetty server with Basic Authentication.
 - Adding LDAP Authentication code changes for Snappy UI. Authenticator (SnappyBasicAuthenticator) is initialized by snappy leader.

* [SNAPPYDATA] fixing scalastyle failure introduced by last commit

merge of SNAP-1813 in 6b8f59e58f6f21103149ebacebfbaa5b7a5cbf00 introduced scalastyle failure

* Resized company logo (#74)

* Changes:
 - Adding resized SnappyData Logo for UI.
 - Displaying spark version in version details pop up.
 - Code/Files(unused logo images) clean up.
 - Updated CSS

* [SNAPPYDATA] update janino to latest release 3.0.7

* [SNAP-1951] move authentication handler bind to be inside connect (#75)

When bind to default 5050 port fails, then code clears the loginService inside
SecurityHandler.close causing the next attempt on 5051 to fail with
"IllegalStateException: No LoginService for SnappyBasicAuthenticator".

This change moves the authentication handler setting inside the connect method.

* Bump version spark 2.1.1.1-rc1, store 1.5.6-rc1 and sparkJobserver 0.6.2.6-rc1

* Updated the year in the Snappydata copyright header. (#76)

* [SNAPPYDATA] upgrade netty versions (SPARK-18971, SPARK-18586)

- upgrade netty-all to 4.0.43.Final (SPARK-18971)
- upgrade netty-3.8.0.Final to netty-3.9.9.Final for security vulnerabilities (SPARK-18586)

* Added code to dump generated code in case of exception (#77)

## What changes were proposed in this pull request?

Added code to dump generated code in case of exception on the server side. The hasNext function of the iterator is the one that fails in case of an exception. Added exception handling for next as well, just in case.

## How was this patch tested?

Manual. Precheckin.

* [SNAPPYDATA] more efficient passing of non-primitive literals

Instead of using CodegenFallback, add the value directly as reference object.
Avoids an unnecessary cast on every loop iteration (and a virtual call),
and the serialized object is also smaller.

* [SNAP-1993] Optimize UTF8String.contains (#78)

- Optimized version of UTF8String.contains that improves performance by 40-50%.
  However, it is still 1.5-3X slower than JDK String.contains (that probably uses JVM intrinsics
  since the library version is slower than the new UTF8String.contains)
- Adding native JNI hooks to UTF8String.contains and ByteArrayMethods.arrayEquals if 
  present.

Comparison when searching in decently long strings (100-200 characters from customers.csv treating full line as a single string).

Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01 on Linux 4.10.0-33-generic
Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
compare contains:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
UTF8String (orig)                              241 /  243          4.7         214.4       1.0X
UTF8String (opt)                               133 /  137          8.4         118.4       1.8X
String                                          97 /   99         11.6          86.4       2.5X
Regex                                          267 /  278          4.2         237.5       0.9X

* Fix to avoid dumping of gen code in case of low memory exception.  (#79)

* Don't log the generated code when a low memory exception is being thrown. Also, addressed a review comment to print an exception message before the generated code.

* [SNAPPYDATA][AQP-293] Native JNI callback changes for UTF8String (#80)

- added MacOSX library handling to Native; made minimum size to use JNI
  as configurable (system property "spark.utf8.jniSize")
- added compareString to Native API for string comparison
- commented out JNI for ByteArrayMethods.arrayEquals since it is seen to be less efficient
  for cases where match fails in first few bytes (JNI overhead of 5-7ns is far more)
- made the "memory leak" warning in Executor to be debug level; reason being that
  it comes from proper MemoryConsumers so its never a leak and it should not be
  required of MemoryConsumers to always clean up memory
  (unnecessary additional task listeners for each ParamLiteral)
- pass source size in Native to make the API uniform

* [SNAPPYDATA] update jetty version

update jetty to latest 9.2.x version in an attempt to fix occasional "bad request" errors
seen currently on dashboard

* [SNAP-2033] pass the original number of buckets in table via OrderlessHashPartitioning (#82)

also reduced parallel forks in tests to be same as number of processors/cores

* Update versions for snappydata 1.0.0, store 1.6.0, spark 2.1.1.1 and spark-jobserver 0.6.2.6

* [SNAPPYDATA] use common "vendorName" in build scripts

* [SPARK-21967][CORE] org.apache.spark.unsafe.types.UTF8String#compareTo Should Compare 8 Bytes at a Time for Better Performance

* Using 64 bit unsigned long comparison instead of unsigned int comparison in `org.apache.spark.unsafe.types.UTF8String#compareTo` for better performance.
* Making `IS_LITTLE_ENDIAN` a constant for correctness reasons (shouldn't use a non-constant in `compareTo` implementations and it definitely is a constant per JVM)

Build passes and the functionality is widely covered by existing tests as far as I can see.

Author: Armin <me@obrown.io>

Closes #19180 from original-brownbear/SPARK-21967.

* [SNAPPYDATA] relax access-level of Executor thread pools to protected

* [SNAPPYDATA] Fix previous conflict in GenerateUnsafeProjection (#84)

From @jxwr: remove two useless lines.

* [SPARK-18586][BUILD] netty-3.8.0.Final.jar has vulnerability CVE-2014-3488 and CVE-2014-0193

## What changes were proposed in this pull request?

Force update to latest Netty 3.9.x, for dependencies like Flume, to resolve two CVEs. 3.9.2 is the first version that resolves both, and, this is the latest in the 3.9.x line.

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #16102 from srowen/SPARK-18586.

* [SPARK-18951] Upgrade com.thoughtworks.paranamer/paranamer to 2.6

## What changes were proposed in this pull request?
I recently hit a bug of com.thoughtworks.paranamer/paranamer, which causes jackson to fail to handle a byte array defined in a case class. Then I found https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests that it is caused by a bug in paranamer. Let's upgrade paranamer. Since we are using jackson 2.6.5, and jackson-module-paranamer 2.6.5 uses com.thoughtworks.paranamer/paranamer 2.6, I suggest that we upgrade paranamer to 2.6.

Author: Yin Huai <yhuai@databricks.com>

Closes #16359 from yhuai/SPARK-18951.

* [SPARK-18971][CORE] Upgrade Netty to 4.0.43.Final

## What changes were proposed in this pull request?

Upgrade Netty to `4.0.43.Final` to add the fix for https://github.com/netty/netty/issues/6153

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16568 from zsxwing/SPARK-18971.

* [SPARK-19409][BUILD] Bump parquet version to 1.8.2

## What changes were proposed in this pull request?

According to the discussion on #16281 which tried to upgrade toward Apache Parquet 1.9.0, Apache Spark community prefer to upgrade to 1.8.2 instead of 1.9.0. Now, Apache Parquet 1.8.2 is released officially last week on 26 Jan. We can use 1.8.2 now.

https://lists.apache.org/thread.html/af0c813f1419899289a336d96ec02b3bbeecaea23aa6ef69f435c142%3Cdev.parquet.apache.org%3E

This PR only aims to bump Parquet version to 1.8.2. It didn't touch any other codes.

## How was this patch tested?

Pass the existing tests and also manually by doing `./dev/test-dependencies.sh`.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16751 from dongjoon-hyun/SPARK-19409.

* [SPARK-19409][BUILD][TEST-MAVEN] Fix ParquetAvroCompatibilitySuite failure due to test dependency on avro

## What changes were proposed in this pull request?

After using Apache Parquet 1.8.2, `ParquetAvroCompatibilitySuite` fails on **Maven** test. It is because `org.apache.parquet.avro.AvroParquetWriter` in the test code used new `avro 1.8.0` specific class, `LogicalType`. This PR aims to fix the test dependency of `sql/core` module to use avro 1.8.0.

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/2530/consoleFull

```
ParquetAvroCompatibilitySuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
  at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:144)
```

## How was this patch tested?

Pass the existing test with **Maven**.

```
$ build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver test
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:07 h
[INFO] Finished at: 2017-02-04T05:41:43+00:00
[INFO] Final Memory: 77M/987M
[INFO] ------------------------------------------------------------------------
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16795 from dongjoon-hyun/SPARK-19409-2.

* [SPARK-19411][SQL] Remove the metadata used to mark optional columns in merged Parquet schema for filter predicate pushdown

There is a metadata introduced before to mark the optional columns in merged Parquet schema for filter predicate pushdown. As we upgrade to Parquet 1.8.2 which includes the fix for the pushdown of optional columns, we don't need this metadata now.

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #16756 from viirya/remove-optional-metadata.

* [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/hacks due to bugs of old Parquet versions

## What changes were proposed in this pull request?

We've already upgraded parquet-mr to 1.8.2. This PR does some further cleanup by removing a workaround of PARQUET-686 and a hack due to PARQUET-363 and PARQUET-278. All three Parquet issues are fixed in parquet-mr 1.8.2.

## How was this patch tested?

Existing unit tests.

Author: Cheng Lian <lian@databricks.com>

Closes #16791 from liancheng/parquet-1.8.2-cleanup.

* [SPARK-20449][ML] Upgrade breeze version to 0.13.1

Upgrade breeze version to 0.13.1, which fixed some critical bugs of L-BFGS-B.

Existing unit tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #17746 from yanboliang/spark-20449.

(cherry picked from commit 67eef47acfd26f1f0be3e8ef10453514f3655f62)
Signed-off-by: DB Tsai <dbtsai@dbtsai.com>

* [SNAPPYDATA] version upgrades as per previous cherry-picks

Following cherry-picked versions for dependency upgrades that fix various issues:
553aac5, 1a64388, a8567e3, 26a4cba, 55834a8

Some were already updated in snappy-spark while others are handled in this.

* Snap 2044 (#85)

* Corrected SnappySession code.

* Snap 2061 (#83)

* added previous code for reference

* added data validation in the test

* Incorporated review comments. added test for dataset encoder conversion to dataframe.

* [SNAPPYDATA] build changes/fixes (#81)

- update gradle to 3.5
- updated many dependencies to latest bugfix releases
- changed provided dependencies to compile/compileOnly
- changed deprecated "<<" with doLast
- changed deprecated JavaCompile.forkOptions.executable with javaHome
- gradlew* script changes as from upstream release
  (as updated by ./gradlew wrapper --gradle-version 3.5.1)

* [SNAP-2061] fix scalastyle errors, add test

- fix scalastyle errors in SQLContext
- moved the Dataset/DataFrame nested POJO tests to JavaDatasetSuite from SQLContextSuite
- added test for Dataset.as(Encoder) for nested POJO in the same

* [SPARK-17788][SPARK-21033][SQL] fix the potential OOM in UnsafeExternalSorter and ShuffleExternalSorter

In `UnsafeInMemorySorter`, one record may take 32 bytes: 1 `long` for pointer, 1 `long` for key-prefix, and another 2 `long`s as the temporary buffer for radix sort.

In `UnsafeExternalSorter`, we set the `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to be `1024 * 1024 * 1024 / 2`, hoping the max size of the pointer array would be 8 GB. However this is wrong: `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow the pointer array before reaching this limit, we may hit the max-page-size error.

Users may see exception like this on large dataset:
```
Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes
at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
...
```

Setting `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to a smaller number is not enough, users can still set the config to a big number and trigger the too large page size issue. This PR fixes it by explicitly handling the too large page size exception in the sorter and spill.

This PR also change the type of `spark.shuffle.spill.numElementsForceSpillThreshold` to int, because it's only compared with `numRecords`, which is an int. This is an internal conf so we don't have a serious compatibility issue.

TODO

Author: Wenchen Fan <wenchen@databricks.com>

Closes #18251 from cloud-fan/sort.
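
The arithmetic behind the paragraph above, spelled out as a tiny standalone calculation (not code from the patch): at 32 bytes per record in UnsafeInMemorySorter, a threshold of 1024 * 1024 * 1024 / 2 records corresponds to a 16 GB array, well past the maximum page size shown in the exception above.

```
object SpillThresholdMath {
  def main(args: Array[String]): Unit = {
    val bytesPerRecord = 32L                   // pointer + key-prefix + 2 longs for radix sort
    val oldThreshold = 1024L * 1024 * 1024 / 2 // number of records before forced spill
    val arrayBytes = oldThreshold * bytesPerRecord
    // prints: 17179869184 bytes (16 GB), not the intended 8 GB
    println(s"$arrayBytes bytes (${arrayBytes / (1L << 30)} GB)")
  }
}
```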

* [SNAPPYDATA] add missing jersey-hk2 dependency

required after the upgrade to jersey 2.26 that does not include it automatically
(used by Executors tab in the GUI)

guard debug logs with "debugEnabled()"

* [SNAPPYDATA][SNAP-2120] make codegen cache size configurable (#87)

- use "spark.sql.codegen.cacheSize" to set codegenerator cache size else default to 1000
- also added explicit returns in MemoryPool else it does boxing/unboxing inside
  the sync block that also shows up in perf analysis (can be seen via decompiler too)
- avoid NPE for "Stages" tab of a standby lead
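
A small usage sketch for the cache-size knob in the first bullet above. It assumes the property is read like an ordinary Spark configuration entry with a default of 1000; how SnappyData actually wires it into the code generator is not shown here.

```
import org.apache.spark.SparkConf

object CodegenCacheSizeSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.sql.codegen.cacheSize", "2000") // override the assumed default
    val cacheSize = conf.getInt("spark.sql.codegen.cacheSize", 1000)
    println(s"code generator cache size = $cacheSize")
  }
}
```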

* Snap 2084 (#86)

If SnappyUMM is found in the classpath, SparkEnv will assign the memory manager to SnappyUMM. If the user has explicitly set the memory manager, that will take precedence.

* [SNAPPYDATA] some optimizations to ExecutionMemoryPool

- avoid multiple lookups into the map in ExecutionMemoryPool.releaseMemory
- avoid an unnecessary boxing/unboxing by adding explicit return from lock.synchronized blocks
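
A minimal sketch of the pattern in the second bullet; whether the explicit return really avoids boxing depends on how scalac compiles AnyRef.synchronized, so this only shows the shape of the change, not a verified optimization.

```
class SimpleMemoryPoolSketch(private var available: Long) {
  private val lock = new Object

  def releaseMemory(numBytes: Long): Long = lock.synchronized {
    available += numBytes
    // explicit return of the primitive result from inside the synchronized
    // block, as described in the entry above
    return available
  }
}
```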

* [SNAP-2087] fix ArrayIndexOutOfBoundsException with JSON data

- issue is the custom code generation added for homogeneous Struct types
  where isNullAt check used an incorrect index variable
- also cleaned up determination of isHomogeneousStruct in both safe/unsafe projection

* [SNAPPYDATA] fixing all failures in snappy-spark test suite

Three broad categories of issues fixed:

- handling of double values in JSON conversion layer of the metrics; upstream spark has all
  metrics as Long but snappy-spark has the timings one as double to give more accurate results
- library version differences between Spark's maven poms and SnappyData's gradle builds;
  these are as such not product issues but this checkin changes some versions to be
  matching to maven builds to be fully upstream compatible
- path differences in test resource files/jars when run using gradle rather than using maven

Other fixes and changes:

- the optimized Decimal.equals gave an incorrect result when the scales of the two values differ;
  this followed the Java BigDecimal convention of returning false if the scale is different,
  but that is incorrect as per Spark's conventions (a small example follows after this list);
  this should normally not happen from the catalyst layer but can happen in RDD operations
- correct accumulator result in Task to be empty rather than null when nothing present
- override the extended two argument DStream.initialize in MapWithStateDStream.initialize
- correct the UI path for Spark cache to be "/Spark Cache/" rather than "/storage/"
- avoid sending the whole child plan across in DeserializeToObjectExec to executors when
  only the output is required (also see SNAP-1840 caused due to this)
- rounding of some of the time statistics (that are accumulated as double) in Spark metrics
- SparkListenerSuite local metrics tests frequently failed due to deserialization time being zero
  (despite above change); the reason being the optimizations in snappy-spark that allow it to
  run much quicker and not registering even with System.nanoTime(); now extended the closure
  to force a 1 millisecond sleep in its readExternal method
- use spark.serializer consistently for data only and spark.closureSerializer for others
  (for the case the two are different)
- don't allow inline message size to exceed spark.rpc.message.maxSize
- revert default spark.locality.wait to be 3s in Spark (will be set at snappydata layer if required)
- make SparkEnv.taskLogger to be serializable if required (extend Spark's Logging trait)
- account for task decompression time in the deserialization time too
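
A tiny standalone illustration of the Decimal.equals pitfall called out in the list above: java.math.BigDecimal.equals is scale-sensitive, while numeric value equality (Spark's convention) needs compareTo.

```
object DecimalScaleExample {
  def main(args: Array[String]): Unit = {
    val a = new java.math.BigDecimal("1.0")  // scale 1
    val b = new java.math.BigDecimal("1.00") // scale 2
    println(a.equals(b))         // false: equals compares unscaled value AND scale
    println(a.compareTo(b) == 0) // true: same numeric value
  }
}
```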

The full spark test suite can be run either by:

- ./dev/snappy-build.sh && ./dev/snappy-build.sh test (or equivalent)
- ./gradlew check
- from SnappyData:
  - ./gradlew snappy-spark:check, OR
  - ./gradlew precheckin -Pspark (for full test suite run including snappydata suite)

For SnappyData product builds, one of the last two ways from SnappyData should be used

* [SNAPPYDATA] fixing one remaining failure in gradle runs

* Preserve the preferred location in MapPartitionRDD. (#92)

* * SnappyData Spark Version 2.1.1.2

* [SNAP-2218] honour timeout in netty RPC transfers (#93)

use a future for enforcing timeout (2 x configured value) in netty RPC transfers
after which the channel will be closed and fail
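
A generic sketch of that timeout pattern using a plain java.util.concurrent future; the real change operates on Spark's netty RPC channel, which is not reproduced here, so cancelling the future merely stands in for closing the channel.

```
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

object RpcTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newSingleThreadExecutor()
    val configuredTimeoutMs = 1000L
    // hypothetical transfer that takes longer than the allowed window
    val transfer = pool.submit(new Callable[String] {
      override def call(): String = { Thread.sleep(5000); "done" }
    })
    try {
      // enforce 2 x the configured value, as in the entry above
      println(transfer.get(2 * configuredTimeoutMs, TimeUnit.MILLISECONDS))
    } catch {
      case _: TimeoutException =>
        transfer.cancel(true) // stands in for closing the channel and failing the transfer
        println("transfer timed out")
    } finally {
      pool.shutdownNow()
    }
  }
}
```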

* Check for null connection. (#94)

If the connection is not established properly, the resulting null connection should be handled gracefully.

* [SNAPPYDATA] revert changes in Logging to upstream

reverting flag check optimization in Logging to be compatible with upstream Spark

* [SNAPPYDATA] Changed TestSparkSession in test class APIs to base SparkSession

This is to allow override by SnappySession extensions.

* [SNAPPYDATA] increased default codegen cache size to 2K

also added MemoryMode in MemoryPool warning message

* [SNAP-2225] Removed OrderlessHashPartitioning. (#95)

Handled join order in the optimization phase. Also removed custom changes in HashPartition. We won't store bucket information in HashPartitioning. Instead, based on the flag "linkPartitionToBucket", we can determine the number of partitions to be either numBuckets or the number of cores assigned to the executor.
Reverted changes related to numBuckets in Snappy Spark.

* [SNAP-2242] Unique application names & kill app by names (#98)

The standalone cluster should support unique application names. Since application names are user visible and easy to track, users can write scripts to kill applications by name.
Also, added support to kill Spark applications by name (case insensitive).

* [SNAPPYDATA] make Dataset.boundEnc as lazy val

avoid materializing it immediately (for point queries that won't use it)

* Fix for SNAP-2342: enclosing with braces when the child plan of aggregate nodes is not a simple relation or subquery alias (#101)

* Snap 1334 : Auto Refresh feature for Dashboard UI  (#99)

* SNAP-1334:

Summary:

- Fixed the JQuery DataTable Sorting Icons problem in the Spark UI by adding missing sort icons and CSS.

- Adding new snappy-commons.js JavaScript for common utility functions used by Snappy Web UI.

- Updated Snappy Dashboard and Member Details JavaScripts for the following
   1. Creating and periodically updating JQuery Data Tables for Members, Tables and External Tables tabular lists.
   2. Loading , creating and updating Google Charts.
   3. Creating and periodically updating the Google Line Charts for CPU and various Memory usages.
   4. Preparing and making AJAX calls to snappy specific web services.
   5. Updated/cleanup of Spark UIUtils class.

Code Change details:

- Spark's UIUtils.headerSparkPage customized to accommodate snappy specific web page changes.
- Removed snappy specific UIUtils.simpleSparkPageWithTabs as most of the content was similar to UIUtils.headerSparkPage.
- Adding snappy-commons.js javascript script for utility functions used by Snappy UI.
- JavaScript implementation of the new Members Grid on the Dashboard page for displaying member stats, which auto-refreshes periodically.
- JavaScript code changes for rendering collapsible details in members grid for description, heap and off-heap.
- JavaScript code changes for rendering progress bar for CPU and Memory usages.
- Display value as "NA" wherever applicable in case of Locator node.
- JavaScript code implementation for displaying Table stats and External Table stats.
- Changes for periodic updating of Table stats and External Table stats.
- CSS updated for page styling and code formatting.
- Adding Sort Control Icons for data tables.
- Code changes for adding, loading and rendering Google charts for snappy members usage trends.
- Displaying cluster level usage trends for Average CPU, Heap and Off-Heap with their respective storage and execution splits and Disk usage.
- Removed Snappy page specific javaScripts from UIUtils to respective page classes.
- Grouped all dashboard related ajax calls into single ajax call clusterinfo.
- Utility function convertSizeToHumanReadable is updated in snappy-commons.js to include TB size.
- All line charts updated to include crosshair pointer/marks.
- Chart titles updated with % sign and GB for size to indicate values are in percents or in GB.
- Adding function updateBasicMemoryStats to update members basic memory stats.
- Displaying Connection Error message whenever cluster goes down.
- Disable sorting on Heap and Off-Heap columns, as cell contains multiple values in different units.

* Fixes for SNAP-2376: (#102)

- Adding 5 seconds timeout for auto refresh AJAX calls.
- Displays request timeout message in case AJAX request takes longer than 5 seconds.

* [SNAP-2379] App was getting registered with error (#103)

This change pertains to the modification to Standalone cluster for not allowing applications with the same name.
The change was erroneous and was allowing the app to get registered even after determining a duplicate name.

* Fixes for SNAP-2383: (#106)

- Adding code changes for retaining page selection in tables during stats auto refresh.

* Handling of POJOs containing arrays of POJOs while creating data frames (#105)

* Handling of POJOs containing arrays of POJOs while creating data frames

* added bug test for SNAP-2384

* Spark compatibility (#107)

Made overrideConfs a variable and made a method protected.

* Fixes for SNAP-2400 : (#108)

- Removed (commented out) timeout from AJAX calls.

* Code changes for SNAP-2144: (#109)

* Code changes for SNAP-2144:
 - JavaScript and CSS changes for displaying CPU cores details on Dashboard page.
 - Adding animation effect to CPU Core details.

* Fixes for SNAP-2415: (#110)

- Removing z-index.

* Fixing scala style issue.

* Code changes for SNAP-2144:
  - Display only Total CPU Cores count and remove cores count break up (into locators, leads
    and data servers).

* Reverting previous commit.

* Code changes for SNAP-2144: (#113)

- Display only Total CPU Cores count and remove cores count break up (into locators, leads
    and data servers).

* Fixes for SNAP-2422: (#112)

  - Code changes for displaying error message if loading Google charts library fails.
  - Code changes for retrying loading of Google charts library.
  - Update Auto-Refresh error message to guide user to go to lead logs if there is any connectivity issue.

* Fix to SNAP-2247 (#114)

* This is a Spark bug.
Please see PR https://github.com/apache/spark/pull/17529
Needed to do similar change in the code path of prepared statement
where precision needed to be adjusted if smaller than scale.

* Fixes for SNAP-2437: (#115)

- Updating CSS, to fix the member description details alignment issue.

* SNAP-2307 fixes (#116)

SNAP-2307 fixes related to SnappyTableScanSuite

* reverting changes done in pull request #116 (#119)

Merging after discussing with Rishi

* Code changes for ENT-21: (#118)

- Adding skipHandlerStart flag based on which handler can be started, wherever applicable.
 - Updating access specifiers.

* * Bump up version to 2.1.1.3

* [SNAPPYDATA] fixed scalastyle

* * Version 2.1.1.3-RC1

* Code changes for SNAP-2471: (#120)

- Adding close button in the SnappyData Version Details Pop Up to close it.

* * [ENT-46] Mask sensitive information. (#121)

* Code changes for SNAP-2478: (#122)

 - Updating font size of members basic statistics on Member Details Page.
 - Display External Tables only if available.

* Fixes for SNAP-2377: (#123)

- To fix Trend charts layout issue, changing fixed width to width in percent for all trends charts on UI.

* [SNAPPY-2511] initialize SortMergeJoin build-side scanner lazily (#124)

Avoid sorting the build side of SortMergeJoin if the streaming side is empty.

This already works that way for inner joins with code generation where the build side
is initialized on first call from processNext (using the generated variable
   "needToSort" in SortExec). This change also enables the behaviour for non-inner
join queries that use "SortMergeJoinScanner", which instantiates the build side upfront.
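
A minimal sketch of that laziness using an ordinary Scala lazy val in place of SortMergeJoinScanner's internals: the (expensive) sort of the build side only happens if the streaming side actually produces a row.

```
object LazyBuildSideSketch {
  // naive equality join, only to show when the build-side sort is triggered
  def join(streamed: Iterator[Int], buildSide: Seq[Int]): Iterator[(Int, Int)] = {
    lazy val sortedBuild = {
      println("sorting build side") // never printed if 'streamed' is empty
      buildSide.sorted
    }
    streamed.flatMap(k => sortedBuild.filter(_ == k).map(v => (k, v)))
  }

  def main(args: Array[String]): Unit = {
    println(join(Iterator.empty, Seq(3, 1, 2)).toList)  // no sort happens
    println(join(Iterator(1, 2), Seq(3, 1, 2)).toList)  // build side sorted once
  }
}
```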

* [SPARK-24950][SQL] DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13

- Update DateTimeUtilsSuite so that when testing roundtripping in daysToMillis and millisToDays multiple skipdates can be specified.
- Updated test so that both new years eve 2014 and new years day 2015 are skipped for kiribati time zones. This is necessary as java versions pre 181-b13 considered new years day 2015 to be skipped while subsequent versions corrected this to new years eve.

Unit tests

Author: Chris Martin <chris@cmartinit.co.uk>

Closes #21901 from d80tb7/SPARK-24950_datetimeUtilsSuite_failures.

(cherry picked from commit c5b8d54c61780af6e9e157e6c855718df972efad)
Signed-off-by: Sean Owen <srowen@gmail.com>

* [SNAP-2569] remove explicit HiveSessionState dependencies

To enable using any SparkSession with Spark's HiveServer2, explicit
dependencies on HiveSessionState in processing have been removed.

* [SNAPPYDATA] make Benchmark class compatible with upstream

* [SNAPPYDATA] fix default bind-address of ThriftCLIService

- ThriftCLIService uses InetAddress.getLocalHost() as default address to be shown
  but hive thrift server actually uses InetAddress.anyLocalAddress()
- honour bind host property in ThriftHttpCLIService too

* [SNAPPYDATA] generate spark-version-info.properties in source path

spark-version-info.properties is now generated in src/main/extra-resources
rather than in build output so that IDEA can pick it up cleanly

remove Kafka-0.8 support from build: updated examples for Kafka-0.10

* [SNAPPYDATA] Increase hive-thrift shell history file size to 50000 lines

- skip init to set history max-size else it invokes load() in constructor
  that truncates the file to default 500 lines
- update jline to 2.14.6 for this new constructor (https://github.com/jline/jline2/issues/277)
- add explicit dependency on jline2 in hive-thriftserver to get the latest version

* [SNAPPYDATA] fix RDD info URLs to "Spark Cache"

- corrected the URL paths for RDDs to use /Spark Cache/ instead of /storage/
- updated affected tests

* [SNAPPYDATA] improved a gradle dependency to avoid unnecessary re-evaluation

* Changed the year from 2017 to 2018 in license headers.

* SNAP-2602 : On snappy UI, add column named "Overflown Size"/ "Disk Size" in Tables. (#127)

* Changes for SNAP-2602:
 - JavaScript changes for displaying tables' size overflowed to disk as Spill-To-Disk size.

* Changes for SNAP-2612: (#126)

- Displaying external tables fully qualified name (schema.tablename).

* SNAP-2661 : Provide Snappy UI User a control over Auto Update (#128)

* Changes for SNAP-2661 : Provide Snappy UI User a control over Auto Update
 - Adding JavaScript and CSS code changes for Auto Update ON/OFF Switch on Snappy UI (Dashboard and Member Details page).

* [SNAPPYDATA] Property to set if hive meta-store client should use isolated ClassLoader (#132)

- added a property to allow setting whether hive client should be isolated or not
- improved message for max iterations warning in RuleExecutor

* [SNAP-2751] Enable connecting to secure SnappyData via Thrift server (#130)

* * Changes from @sumwale to set the credentials from thrift layer into session conf.

* * This fixes an issue with RANGE operator in non-code generated plans (e.g. if too many target table columns)
* Patch provided by @sumwale

* avoid dumping generated code in quick succession for exceptions

* correcting scalastyle errors

* * Trigger authentication check irrespective of presence of credentials.

* [SNAPPYDATA] update gradle to version 5.0

- updated builds for gradle 5.0
- moved all embedded versions to top-level build.gradle

* change javax.servlet-api version to 3.0.1

* Updated the janino compiler version similar to upstream spark (#134)

Updated the Janino compiler dependency version similar/compatible with the spark dependencies.

* Changes for SNAP-2787: (#137)

- Adding an option "ALL" in Show Entries drop down list of tabular lists, in order to display all the table entries to avoid paging.

* Fixes for SNAP-2750: (#131)

- Adding JavaScript plugin code for JQuery Data Table to sort columns containing file/data sizes in human readable form.
- Updating HTML, CSS and JavaScript, for sorting, of tables columns.

* Changes for SNAP-2611: (#138)

- Setting configuration parameter for setting ordering column.

* SNAP-2457 - enabling plan caching for hive thrift server sessions. (#139)

* Changes for SNAP-2926: (#142)

- Changing default page size for all tabular lists from 10 to 50.
- Sorting Members List tabular view on Member Type so that all locators come first, then all leads and then all servers.

* Snap 2900 (#140)

Changes:
  * For SNAP-2900
    - Adding HTML, CSS, and JavaScript code changes for adding an Expand and Collapse control button against each members list entry. Clicking this control button displays or hides all additional cell details.
    - Similarly adding parent expand and collapse control to expand and collapse all rows in the table in single click.
    - Removing existing Expand and Collapse control buttons per cell, as those will be redundant.

  * For SNAP-2908
    - Adding third party library Jquery Sparklines to add sparklines (inline charts) in members list for CPU and Memory Usages.
    - Adding HTML, CSS, and JavaScript code changes for rendering CPU and Memory usages Sparklines.

  * Code clean up.
    - Removing unused icons and images.
    - removing unused JavaScript Library liquidFillGauge.js

* Changes for SNAP-2908: [sparkline enhancements] (#143)

[sparkline enhancements]
  * Adding text above sparklines to display units and time duration of charts.
  * Formatting sparkline tooltips to display numbers with 3 precision places.

* [SNAP-2934] Avoid double free of page that caused server crash due to SIGABORT/SIGSEGV (#144)

* [SNAP-2956] Wrap non fatal OOME from Spark layer in a LowMemoryException (#146)

* Fixes for SNAP-2965: (#147)

- Using disk store UUID as a unique identifier for each member node.

* [SNAPPYDATA] correcting typo in some exception messages

* SNAP-2917 - generating SparkR library along with snappy product (#141)

removing some unused build code

* [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in … (#149)

* [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in 
strong wolfe line search

## What changes were proposed in this pull request?

Update breeze to 0.13.1 for an emergency bugfix in strong wolfe line search
scalanlp/breeze#651

Most of the content of this PR is cherry-picked from https://github.com/apache/spark/commit/b35660dd0e930f4b484a079d9e2516b0a7dacf1d with 
minimal code changes done to resolve merge conflicts.

---
Faced one test failure (ParquetHiveCompatibilitySuite#"SPARK-10177 timestamp") while running
precheckin. This was due to a recent upgrade of the `jodd` library version to `5.0.6`. Downgraded the
`jodd` library version to `3.9.1` to fix this failure.
Note that this change is independent of the breeze version upgrade.

* Changes for SNAP-2974 : Snappy UI re-branding to TIBCO ComputeDB (#150)

* Changes for SNAP-2974: Snappy UI re-branding to TIBCO ComputeDB
  1. Adding TIBCO ComputeDB product logo
  2. Adding Help Icon, clicking on which About box is displayed
  3. Updating About Box content
     - Adding TIBCO ComputeDB product name and its Edition type
     - Adding Copyright information
     - Adding Assistance details web links
     - Adding Product Documentation link
  4. Removing or Changing user visible SnappyData references on UI to TIBCO ComputeDB.
  5. Renaming pages to just Dashboard, Member Details and Jobs
  6. Removing Docs link from tabs bar

* * Version changes

* Code changes for SNAP-2989: Snappy UI rebranding to Tibco ComputeDB iff it's Enterprise Edition  (#151)

Product UI updated for following:

 1. SnappyData is Community Edition
     - Displays Pulse logo on top left side.
     - Displays SnappyData logo on top right side.
     - About Box :
       Displays product name "Project SnappyData - Community Edition" 
       Displays product version, copyright information 
       Displays community product documentation link.

 2. TIBCO ComputeDB is Enterprise :
     - Displays TIBCO ComputeDB logo on top left side.
     - About Box:
       Displays product name "TIBCO ComputeDB - Enterprise Edition" 
       Displays product version, copyright information
       Displays enterprise product documentation link.

* * Updated some metainfo in prep for 1.1.0 release

* Changes for SNAP-2989: (#152)

- Removing SnappyData Community page link from Enterprise About Box.
- Fixes the issue where the SnappyData logo was displayed on first page load in the Enterprise edition.

* [SNAPPYDATA] fix scalastyle error

* Spark compatibility fixes (#153)

- Spark compatibility suite fixes to make them work both in Spark and SD
- expand PathOptionSuite to check for data after table rename
- use Resolver to check intersecting columns in NATURAL JOIN

* Considering jobserver class loader as a key for generated code cache - (#154)

## Considering jobserver class loader as a key for generated code cache
For each submission of a snappy-job, a new URI class loader is used.
 The first run of a snappy-job may generate some code and it will be cached.
 The subsequent run of the snappy job will end up using the generated code
 which was cached by the first run of the job. This can lead to issues as the
 class loader used for the cached code is the one from the first job submission
 and subsequent submissions will be using a different class loader. This
 change is done to avoid such failures.
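
A simplified sketch of the idea in this entry, with illustrative names only: make the classloader part of the cache key so that code cached for one snappy-job submission (which gets its own URI classloader) is never reused under a different loader.

```
import scala.collection.concurrent.TrieMap

object CodeCacheByLoaderSketch {
  // the key includes the classloader that will load the generated class
  private final case class CacheKey(loader: ClassLoader, source: String)

  private val cache = TrieMap.empty[CacheKey, AnyRef]

  // 'compile' stands in for the real code generation + compilation step
  def getOrCompile(source: String)(compile: String => AnyRef): AnyRef = {
    val key = CacheKey(Thread.currentThread().getContextClassLoader, source)
    cache.getOrElseUpdate(key, compile(source))
  }
}
```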

* SNAP-3054: Rename UI tab "JDBC/ODBC Server" to "Hive Thrift Server" (#156)

- Renaming tab name "JDBC/ODBC Server" to "Hive Thrift Server".

* SNAP-3015: Put thousands separators for Tables > Rows Count column in Dashboard. (#157)

- Adding thousands separators for table row count as per locale.

* Tracking spark block manager directories for each executor and cleaning
them in the next run if left orphan.

* [SNAPPYDATA] fix scalastyle errors introduced by previous commit

* Revert: Tracking spark block manager directories for each executor and cleaning them in the next run if left orphan.

* allow for override of TestHive session

* [SNAP-3010] Cleaning block manager directories if left orphan (#158)

## What changes were proposed in this pull request?
Tracking spark block manager directories for each executor and
 cleaning them in next run if left orphan.

The changes are for tracking the spark local directories (which
 are used by block manager to store shuffle data) and changes to
clean the local directories (which are left orphan due to abrupt
failure of JVM).

The changes to clean the orphan directory are also kept as part
of Spark module itself instead of cleaning it on Snappy Cluster start.
This is done because the changes to track the local directory has to
go in Spark and if the clean up is not done at the same place then
the metadata file used to track the local directories will keep 
growing while running spark cluster from snappy's spark distribution.

This cleanup is skipped when master is local because in local mode
driver and executors will end up writing `.tempfiles.list` file in the
same directory which may l…
sumwale pushed a commit to sumwale/spark that referenced this issue Nov 5, 2020
…BCOSoftware#27)

- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR apache#6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (EsotericSoftware/kryo#128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- set ordering field with kryo serialization in GenerateOrdering
- fixing scalastyle and compile errors
ashetkar pushed a commit to TIBCOSoftware/snappydata that referenced this issue Apr 20, 2021
- new PooledKryoSerializer that does pooling of Kryo objects (else performance is bad if
    a new instance is created for every call, which needs to register and walk tons of classes); a minimal pooling sketch follows after this list
- has an overridden version for ASCII strings to fix (EsotericSoftware/kryo#128);
  currently makes a copy but will be modified to use one extra byte to indicate end of string
- optimized external serializers for StructType, and Externalizable having readResolve() method;
  using latter for StorageLevel and BlockManagerId
- added optimized serialization for the closure used by SparkSQLExecuteImpl (now a proper class instead);
  copied part of changes for LIMIT from cf6976f on SNAP-1067 to avoid
  merge pains later
- fixed index column determination in RowFormatRelation (was off by 1 due to 0 based vs 1 based)
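
A minimal sketch of the pooling idea from the first bullet of the commit above (thread-local reuse of registered Kryo instances rather than constructing one per call); the actual PooledKryoSerializer is considerably more involved.

```
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}

object KryoPoolSketch {
  // one Kryo per thread: construction and class registration are expensive,
  // so instances are reused instead of being created on every call
  private val pool = new ThreadLocal[Kryo] {
    override def initialValue(): Kryo = {
      val kryo = new Kryo()
      kryo.register(classOf[java.util.HashMap[_, _]]) // register commonly used classes
      kryo
    }
  }

  def serialize(obj: AnyRef): Array[Byte] = {
    val output = new Output(4096, -1)
    pool.get().writeClassAndObject(output, obj)
    output.toBytes
  }

  def deserialize(bytes: Array[Byte]): AnyRef =
    pool.get().readClassAndObject(new Input(bytes))
}
```
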
sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this issue Jul 11, 2021
- added back configurable closure serializer in Spark which was removed in SPARK-12414;
  some minor changes taken from closed Spark PR apache#6361
- added optimized Kryo serialization for multiple classes; currently registration and
  string sharing fix for kryo (EsotericSoftware/kryo#128) is
  only in the SnappyData layer PooledKryoSerializer implementation;
  classes providing maximum benefit have added KryoSerializable notably Accumulators and *Metrics
- use closureSerializer for Netty messaging too instead of fixed JavaSerializer
- set ordering field with kryo serialization in GenerateOrdering
- fixing scalastyle and compile errors
This issue was closed.