Example Pull Request: using 9897 #1

Open · wants to merge 142 commits into master

Conversation

@munderseth (Member) commented Feb 6, 2018

Example only, to demonstrate how Testspace handles forked Pull Requests.

Uses presto#9897, which has test failures.

Go to fork-PR-1 and select Failures.

nezihyigitbasi and others added 30 commits January 16, 2018 09:50
Previously this method was just logging an error.
pageBufferClientCallbackExecutor is a fixed-sized thread pool, so
wrapping it with a BoundedExecutor is unnecessary.
Since memory allocation in `MemoryPool.reserve()` is instantaneous, we
should free the memory in the previous pool without delay.
This is "more correct", as pools are stateful.
Separate handling of joins and semi-joins.
This allows connectors extending `presto-base-jdbc` to support data types
other than those supported by `BaseJdbcClient` and `JdbcRecordCursor` out
of the box.
Previously the exception would say `No page sink provider for connector
'catalog-name'`.
toFailure(s) can create cycles in suppression or cause chains, given that it
is recursive. Use a hash set to detect already-seen failures.
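
As an illustration of the hash-set approach, here is a minimal sketch with hypothetical names (not Presto's actual Failures code): visited throwables are tracked in an identity-based set so a cyclic cause/suppressed graph cannot recurse forever.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Sketch only: walk a Throwable's cause/suppressed graph, stopping at nodes
// that were already visited so cycles cannot cause unbounded recursion.
final class FailureCycleSketch
{
    static void visit(Throwable throwable, Set<Throwable> seen)
    {
        if (throwable == null || !seen.add(throwable)) {
            return; // null or already seen: stop here
        }
        visit(throwable.getCause(), seen);
        for (Throwable suppressed : throwable.getSuppressed()) {
            visit(suppressed, seen);
        }
    }

    public static void main(String[] args)
    {
        RuntimeException a = new RuntimeException("a");
        RuntimeException b = new RuntimeException("b", a);
        a.addSuppressed(b); // a -> b -> a forms a cycle
        visit(a, Collections.newSetFromMap(new IdentityHashMap<>()));
        System.out.println("traversal terminated despite the cycle");
    }
}
```
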
It causes a compilation failure.
Previously, RENAME TABLE in presto-memory always renamed the table within
the same schema, even if a target schema was specified. This commit
changes the behavior to use the specified schema. If the schema does
not exist, a SchemaNotFoundException is thrown.
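
A minimal sketch of the corrected behavior, using a plain map and a generic exception in place of the connector's real metadata classes (all names here are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the rename honors the target schema, and a missing target
// schema is rejected up front (SchemaNotFoundException in Presto itself).
final class RenameTableSketch
{
    private final Map<String, Map<String, String>> schemas = new HashMap<>();

    void renameTable(String schema, String table, String newSchema, String newTable)
    {
        Map<String, String> target = schemas.get(newSchema);
        if (target == null) {
            throw new IllegalArgumentException("Schema not found: " + newSchema);
        }
        String definition = schemas.get(schema).remove(table);
        target.put(newTable, definition);
    }

    public static void main(String[] args)
    {
        RenameTableSketch sketch = new RenameTableSketch();
        sketch.schemas.put("default", new HashMap<>(Map.of("orders", "...")));
        sketch.schemas.put("archive", new HashMap<>());
        sketch.renameTable("default", "orders", "archive", "orders_2018"); // moves across schemas
        try {
            sketch.renameTable("archive", "orders_2018", "missing", "x");
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Schema not found: missing
        }
    }
}
```
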
The fact that `object`'s type matches the type's Java type is asserted earlier.
A "proper literal" means just a literal, or a literal wrapped with a CAST.
Previously, `toExpression(..., CHAR(x))` would return a "magic literal
function call", which is hard to analyze.
findepi and others added 30 commits February 6, 2018 15:27
This is only a refactor, moving `decrementReferenceCounts` from _after_
the call to `deleteGroup` into that method. Since there is exactly one
call site, this doesn't change current behavior.

Motivation: this lets `deleteGroup` maintain its own invariants.
We now run a query that runs for a longer time, to give the SqlQueryManager
thread more time to kill it. We also explicitly wait for the final query
state after running the query, instead of waiting on the SqlQueryManager
thread implicitly.
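
A minimal sketch of such an explicit wait (hypothetical names, not the test's actual helper), polling until the query reaches a terminal state or a timeout elapses:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Sketch only: poll the query state until it is terminal, instead of relying
// on a background manager thread's timing.
final class AwaitQueryStateSketch
{
    static boolean awaitTerminalState(Supplier<String> currentState, Duration timeout)
            throws InterruptedException
    {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            String state = currentState.get();
            if ("FAILED".equals(state) || "FINISHED".equals(state)) {
                return true; // terminal state reached
            }
            Thread.sleep(100);
        }
        return false; // timed out before the query reached a terminal state
    }

    public static void main(String[] args)
            throws InterruptedException
    {
        System.out.println(awaitTerminalState(() -> "FAILED", Duration.ofSeconds(5)));
    }
}
```
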
Add Hive storage format and respect table format session properties.
These two properties are experimental and should be reverted in the
future.
Previously, when setMemoryPool was called while an operator was allocating memory,
it was possible for these two threads to deadlock, because they acquired the monitors
in different orders. The thread calling setMemoryPool acquired the monitor of
the QueryContext instance (this) first and then the monitors of the user/revocable
memory contexts in the queryMemoryContext. However, the driver threads reserving memory
acquired the monitors of the user/revocable memory contexts first, and
then the monitor of the QueryContext instance.

This change solves the issue by not locking the user/revocable memory contexts
in the queryMemoryContext, getting the same information from the MemoryPool instead.
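
To make the ordering concrete, here is a minimal sketch of the hazard and of the shape of the fix (class and method names are illustrative, not Presto's actual ones):

```java
// Sketch only: one thread locks the query context and then a memory context;
// another locks them in the opposite order, so the two can deadlock.
final class LockOrderingSketch
{
    private final Object queryContextLock = new Object();
    private final Object memoryContextLock = new Object();
    private long reservedBytes;

    // Analogue of setMemoryPool(): query context monitor first, then memory context monitor.
    void switchPool()
    {
        synchronized (queryContextLock) {
            synchronized (memoryContextLock) {
                // move reservations to the new pool while holding both monitors
            }
        }
    }

    // Analogue of a driver thread reserving memory: opposite order -> possible deadlock.
    void reserve(long bytes)
    {
        synchronized (memoryContextLock) {
            synchronized (queryContextLock) {
                reservedBytes += bytes;
            }
        }
    }

    // Shape of the fix described above: the reservation is read from the pool
    // itself, so the query context never nests the memory context monitor
    // inside its own monitor.
    void switchPoolFixed(long reservationFromPool)
    {
        synchronized (queryContextLock) {
            reservedBytes = reservationFromPool; // no memory context monitor needed here
        }
    }
}
```
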
The generated code made the base state for aggregations a
SingleState type. Combine for the aggregation had to handle merging
that empty single state into the first group.

This makes it a group state, for consistency and to simplify group
states so they don't have to handle single state (they shouldn't know
anything about it).
Currently, an array of histogram objects per group-by-id is kept. This
is inefficient for two major reasons:

1. There is a large number of Java objects, which incurs overhead. Inside
the histogram implementation, collections like LongBigArray or
IntBigArray are used, which have minimum sizes in the 100-200+ byte
range. Histograms with a single item, or only a few, still incur this
penalty.

2. Across group-by-ids, the same value may appear multiple times.

* Anecdotal analysis showed we do have group-by histograms with few values.
* After implementation, we found with YourKit that use cases that were
holding 1 GB of RAM dropped to 45 MB of RAM.
* Profiling with JMH showed that the new version was equal to or
better than the old version.

what:

The solution here is to build a single object that includes the
group-by-id in the key within an internal hash table. There are
many details to support the appearance of a single histogram
per group-by table, both for transfer when doing intermediate
combining and for final output. See GroupedTypedHistogram for
the details (a sketch of the key idea follows this commit message).
In short, this reduces the number of Java objects to a constant
number. Values are also stored once in another custom hash
implementation.

We also found that changing the probe function from linear to
"sum of squares" (sort of super-quadratic) performed better in testing
with JMH.

This also introduces a new config variable, or rather uses one
from a previous commit, that allows either the legacy (existing) behavior
or the new one. We default to the new behavior, but allow rollback
to the old behavior in case of some unexpected failure.

Future work: for readability, private inner classes were introduced.
These don't appear to have affected performance (per JMH), but JFR confirms
they aren't elided by the JVM. This is an opportunity to improve performance
(they cause 4.5% of GC pressure in some testing).
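
An illustrative sketch of the key idea only (this is not GroupedTypedHistogram itself; the real implementation uses flat arrays rather than a HashMap to keep the Java object count constant): a single table keyed by (group-by-id, value) replaces one histogram object per group.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: pack the group id and the value into one key so a single
// table serves all groups, instead of one histogram object per group.
final class GroupedHistogramSketch
{
    private final Map<Long, Long> counts = new HashMap<>(); // (groupId, value) -> count

    private static long key(int groupId, int value)
    {
        return (((long) groupId) << 32) | (value & 0xFFFFFFFFL);
    }

    void add(int groupId, int value)
    {
        counts.merge(key(groupId, value), 1L, Long::sum);
    }

    long count(int groupId, int value)
    {
        return counts.getOrDefault(key(groupId, value), 0L);
    }

    public static void main(String[] args)
    {
        GroupedHistogramSketch histogram = new GroupedHistogramSketch();
        histogram.add(0, 7);
        histogram.add(0, 7);
        histogram.add(1, 7); // same value in another group, different key
        System.out.println(histogram.count(0, 7)); // 2
        System.out.println(histogram.count(1, 7)); // 1
    }
}
```
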
It's now a proper method in TupleDomain/Domain that can
be used in other places.
If the tuple domain contains too many terms, the resulting expression
can be very expensive to interpret in the call to shouldPrune.

This reduces the number of discrete elements in the domain to produce
smaller expression trees.
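
A hedged sketch of the idea (not Presto's actual Domain code): once a domain holds more discrete values than some threshold, collapse them into a single covering range so the expression tree built from it stays small.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: above an illustrative threshold, discrete values are summarized
// as a [min, max] range instead of being enumerated one by one.
final class DomainSimplifySketch
{
    static final int THRESHOLD = 32; // illustrative limit, not Presto's actual value

    static List<Integer> simplify(List<Integer> sortedValues)
    {
        if (sortedValues.size() <= THRESHOLD) {
            return sortedValues; // small enough to enumerate directly
        }
        int min = sortedValues.get(0);
        int max = sortedValues.get(sortedValues.size() - 1);
        return List.of(min, max); // stands in for "range [min, max]"
    }

    public static void main(String[] args)
    {
        List<Integer> many = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            many.add(i);
        }
        System.out.println(simplify(many)); // [0, 99]
    }
}
```
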
This avoids re-analyzing types and other overhead that occurred on each
invocation when evaluating the constraint passed to getTableLayouts().
Take a look at the query `SELECT 1 FROM orders WHERE orderstatus = 'O'`.
In that case the FilterNode is optimized into the `currentConstraint` of
the `TableScanNode`, and the `TableScanNode` does not output any columns.

In order to support that in TPCH, we need to filter out all rows which do
not match `currentConstraint`. This requires the
`TpchTable` to be accessible in `TpchRecordSet`, so we can check that the
columns used in `currentConstraint` have values contained in the domain
of `currentConstraint`.
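
A minimal sketch of the filtering described above (illustrative types, not the actual TpchRecordSet): each row is kept only if every column referenced by the constraint has a value inside the constraint's domain, so a scan that outputs no columns still returns the right row count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a map of column -> allowed values stands in for a Domain, and
// rows that fall outside the constraint are dropped during the scan.
final class ConstraintFilterSketch
{
    static List<Map<String, String>> filter(List<Map<String, String>> rows, Map<String, Set<String>> constraint)
    {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean matches = constraint.entrySet().stream()
                    .allMatch(entry -> entry.getValue().contains(row.get(entry.getKey())));
            if (matches) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<Map<String, String>> orders = List.of(
                Map.of("orderstatus", "O"),
                Map.of("orderstatus", "F"));
        // SELECT 1 FROM orders WHERE orderstatus = 'O' -> constraint {orderstatus: {O}}
        System.out.println(filter(orders, Map.of("orderstatus", Set.of("O"))).size()); // 1
    }
}
```
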
There is no practical reason one would prefer predicate pushdown to be
disabled. As for testing the cases with predicate pushdown enabled
and disabled, one can select columns which either support or do not
support predicates, according to test needs.
`BenchmarkPlanner` runs with the iterative optimizer both enabled and disabled.
Without the iterative optimizer, and with `forceSingleNode=true`, the
created plan is invalid and eventually fails.

This disables `forceSingleNode` so that we always produce a plan without
failure.