Example Pull Request: using 9897 #1

Open · wants to merge 142 commits into master

Conversation

@munderseth (Member) commented Feb 6, 2018

Example only, to demonstrate how Testspace handles forked Pull Requests.

Uses presto#9897, which has test failures.

Go to fork-PR-1 and select Failures.

nezihyigitbasi and others added 30 commits January 16, 2018 09:50
Previously this method was just logging an error.
pageBufferClientCallbackExecutor is a fixed-sized thread pool, so
wrapping it with a BoundedExecutor is unnecessary.
Since memory allocation in `MemoryPool.reserve()` is instantaneous, we
should free the memory in the previous pool without delay.
This is "more correct", as pools are stateful.
Separate handling of joins and semi-joins.
This allows connectors extending `presto-base-jdbc` to support data types
other than those supported by `BaseJdbcClient` and `JdbcRecordCursor` out
of the box.
Previously the exception would say `No page sink provider for connector
'catalog-name'`.
toFailure(s) can create cycles in suppression or cause chains, given that it
is recursive. Use a hash set to detect already-seen failures.
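
As an illustration of the hash-set approach, here is a minimal sketch with hypothetical names (not Presto's actual Failures code): visited throwables are tracked in an identity-based set so a cyclic cause/suppressed graph cannot recurse forever.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Sketch only: walk a Throwable's cause/suppressed graph, stopping at nodes
// that were already visited so cycles cannot cause unbounded recursion.
final class FailureCycleSketch
{
    static void visit(Throwable throwable, Set<Throwable> seen)
    {
        if (throwable == null || !seen.add(throwable)) {
            return; // null or already seen: stop here
        }
        visit(throwable.getCause(), seen);
        for (Throwable suppressed : throwable.getSuppressed()) {
            visit(suppressed, seen);
        }
    }

    public static void main(String[] args)
    {
        RuntimeException a = new RuntimeException("a");
        RuntimeException b = new RuntimeException("b", a);
        a.addSuppressed(b); // a -> b -> a forms a cycle
        visit(a, Collections.newSetFromMap(new IdentityHashMap<>()));
        System.out.println("traversal terminated despite the cycle");
    }
}
```
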
It causes a compilation failure.
Previously, RENAME TABLE in presto-memory always renamed the table within
the same schema, even if a target schema was specified. This commit
changes the behavior to use the specified schema. If the schema does
not exist, a SchemaNotFoundException is thrown.
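
A minimal sketch of the corrected behavior, using a plain map and a generic exception in place of the connector's real metadata classes (all names here are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the rename honors the target schema, and a missing target
// schema is rejected up front (SchemaNotFoundException in Presto itself).
final class RenameTableSketch
{
    private final Map<String, Map<String, String>> schemas = new HashMap<>();

    void renameTable(String schema, String table, String newSchema, String newTable)
    {
        Map<String, String> target = schemas.get(newSchema);
        if (target == null) {
            throw new IllegalArgumentException("Schema not found: " + newSchema);
        }
        String definition = schemas.get(schema).remove(table);
        target.put(newTable, definition);
    }

    public static void main(String[] args)
    {
        RenameTableSketch sketch = new RenameTableSketch();
        sketch.schemas.put("default", new HashMap<>(Map.of("orders", "...")));
        sketch.schemas.put("archive", new HashMap<>());
        sketch.renameTable("default", "orders", "archive", "orders_2018"); // moves across schemas
        try {
            sketch.renameTable("archive", "orders_2018", "missing", "x");
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Schema not found: missing
        }
    }
}
```
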
The fact that `object`'s type matches the type's Java type is asserted earlier.
A "proper literal" means just a literal, or a literal wrapped with a CAST.
Previously, `toExpression(..., CHAR(x))` would return a "magic literal
function call", which is hard to analyze.
findepi and others added 30 commits February 6, 2018 15:27
This is only a refactor, moving `decrementReferenceCounts` from _after_
the call to `deleteGroup` into that method. Since there is exactly one
call site, this doesn't change current behavior.

Motivation: this lets `deleteGroup` maintain its own invariants.
We now run a query that runs for a longer time, to give the SqlQueryManager
thread more time to kill it. We also explicitly wait for the final query
state after running the query, instead of waiting on the SqlQueryManager
thread implicitly.
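
A minimal sketch of such an explicit wait (hypothetical names, not the test's actual helper), polling until the query reaches a terminal state or a timeout elapses:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Sketch only: poll the query state until it is terminal, instead of relying
// on a background manager thread's timing.
final class AwaitQueryStateSketch
{
    static boolean awaitTerminalState(Supplier<String> currentState, Duration timeout)
            throws InterruptedException
    {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            String state = currentState.get();
            if ("FAILED".equals(state) || "FINISHED".equals(state)) {
                return true; // terminal state reached
            }
            Thread.sleep(100);
        }
        return false; // timed out before the query reached a terminal state
    }

    public static void main(String[] args)
            throws InterruptedException
    {
        System.out.println(awaitTerminalState(() -> "FAILED", Duration.ofSeconds(5)));
    }
}
```
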
Add Hive storage format and respect table format session properties.
These two properties are experimental and should be reverted in the
future.
Previously, when setMemoryPool was called while an operator was allocating memory,
it was possible for these two threads to deadlock, because they acquired the monitors
in different orders. The thread calling setMemoryPool acquired the monitor of
the QueryContext instance (this) first and then the monitors of the user/revocable
memory contexts in the queryMemoryContext. However, the driver threads reserving memory
acquired the monitors of the user/revocable memory contexts first, and
then the monitor of the QueryContext instance.

This change solves the issue by not locking the user/revocable memory contexts
in the queryMemoryContext, getting the same information from the MemoryPool instead.
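
To make the ordering concrete, here is a minimal sketch of the hazard and of the shape of the fix (class and method names are illustrative, not Presto's actual ones):

```java
// Sketch only: one thread locks the query context and then a memory context;
// another locks them in the opposite order, so the two can deadlock.
final class LockOrderingSketch
{
    private final Object queryContextLock = new Object();
    private final Object memoryContextLock = new Object();
    private long reservedBytes;

    // Analogue of setMemoryPool(): query context monitor first, then memory context monitor.
    void switchPool()
    {
        synchronized (queryContextLock) {
            synchronized (memoryContextLock) {
                // move reservations to the new pool while holding both monitors
            }
        }
    }

    // Analogue of a driver thread reserving memory: opposite order -> possible deadlock.
    void reserve(long bytes)
    {
        synchronized (memoryContextLock) {
            synchronized (queryContextLock) {
                reservedBytes += bytes;
            }
        }
    }

    // Shape of the fix described above: the reservation is read from the pool
    // itself, so the query context never nests the memory context monitor
    // inside its own monitor.
    void switchPoolFixed(long reservationFromPool)
    {
        synchronized (queryContextLock) {
            reservedBytes = reservationFromPool; // no memory context monitor needed here
        }
    }
}
```
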
The generated code made the base state for aggregations a
SingleState type. Combine for the aggregation had to handle merging
that empty single state into the first group.

This makes it a group state, for consistency and to simplify group
states so they don't have to handle single state (they shouldn't know
anything about it).
Currently, an array of histogram objects per group-by-id is kept. This
is inefficient for two major reasons:

1. There is a large number of Java objects, which incurs overhead. Inside
the histogram implementation, collections like LongBigArray or
IntBigArray are used, which have minimum sizes in the 100-200+ byte
range. Histograms with a single item, or only a few, still incur this
penalty.

2. Across group-by-ids, the same value may appear multiple times.

* Anecdotal analysis showed we do have group-by histograms with few values.
* After implementation, we found with YourKit that use cases that were
holding 1 GB of RAM dropped to 45 MB of RAM.
* Profiling with JMH showed that the new version was equal to or
better than the old version.

what:

The solution here is to build a single object that includes the
group-by-id in the key within an internal hash table. There are
many details to support the appearance of a single histogram
per group-by table, both for transfer when doing intermediate
combining and for final output. See GroupedTypedHistogram for
the details (a sketch of the key idea follows this commit message).
In short, this reduces the number of Java objects to a constant
number. Values are also stored once in another custom hash
implementation.

We also found that changing the probe function from linear to
"sum of squares" (sort of super-quadratic) performed better in testing
with JMH.

This also introduces a new config variable, or rather uses one
from a previous commit, that allows either the legacy (existing) behavior
or the new one. We default to the new behavior, but allow rollback
to the old behavior in case of some unexpected failure.

Future work: for readability, private inner classes were introduced.
These don't appear to have affected performance (per JMH), but JFR confirms
they aren't elided by the JVM. This is an opportunity to improve performance
(they cause 4.5% of GC pressure in some testing).
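
An illustrative sketch of the key idea only (this is not GroupedTypedHistogram itself; the real implementation uses flat arrays rather than a HashMap to keep the Java object count constant): a single table keyed by (group-by-id, value) replaces one histogram object per group.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: pack the group id and the value into one key so a single
// table serves all groups, instead of one histogram object per group.
final class GroupedHistogramSketch
{
    private final Map<Long, Long> counts = new HashMap<>(); // (groupId, value) -> count

    private static long key(int groupId, int value)
    {
        return (((long) groupId) << 32) | (value & 0xFFFFFFFFL);
    }

    void add(int groupId, int value)
    {
        counts.merge(key(groupId, value), 1L, Long::sum);
    }

    long count(int groupId, int value)
    {
        return counts.getOrDefault(key(groupId, value), 0L);
    }

    public static void main(String[] args)
    {
        GroupedHistogramSketch histogram = new GroupedHistogramSketch();
        histogram.add(0, 7);
        histogram.add(0, 7);
        histogram.add(1, 7); // same value in another group, different key
        System.out.println(histogram.count(0, 7)); // 2
        System.out.println(histogram.count(1, 7)); // 1
    }
}
```
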
It's now a proper method in TupleDomain/Domain that can
be used in other places.
If the tuple domain contains too many terms, the resulting expression
can be very expensive to interpret in the call to shouldPrune.

This reduces the number of discrete elements in the domain to produce
smaller expression trees.
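
A hedged sketch of the idea (not Presto's actual Domain code): once a domain holds more discrete values than some threshold, collapse them into a single covering range so the expression tree built from it stays small.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: above an illustrative threshold, discrete values are summarized
// as a [min, max] range instead of being enumerated one by one.
final class DomainSimplifySketch
{
    static final int THRESHOLD = 32; // illustrative limit, not Presto's actual value

    static List<Integer> simplify(List<Integer> sortedValues)
    {
        if (sortedValues.size() <= THRESHOLD) {
            return sortedValues; // small enough to enumerate directly
        }
        int min = sortedValues.get(0);
        int max = sortedValues.get(sortedValues.size() - 1);
        return List.of(min, max); // stands in for "range [min, max]"
    }

    public static void main(String[] args)
    {
        List<Integer> many = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            many.add(i);
        }
        System.out.println(simplify(many)); // [0, 99]
    }
}
```
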
This avoids re-analyzing types and other overhead that occurred on each
invocation when evaluating the constraint passed to getTableLayouts().
Take a look at the query `SELECT 1 FROM orders WHERE orderstatus = 'O'`.
In that case the FilterNode is optimized into the `currentConstraint` of
the `TableScanNode`, and the `TableScanNode` does not output any columns.

In order to support that in TPCH, we need to filter out all rows which do
not match `currentConstraint`. This requires the
`TpchTable` to be accessible in `TpchRecordSet`, so we can check that the
columns used in `currentConstraint` have values contained in the domain
of `currentConstraint`.
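
A minimal sketch of the filtering described above (illustrative types, not the actual TpchRecordSet): each row is kept only if every column referenced by the constraint has a value inside the constraint's domain, so a scan that outputs no columns still returns the right row count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a map of column -> allowed values stands in for a Domain, and
// rows that fall outside the constraint are dropped during the scan.
final class ConstraintFilterSketch
{
    static List<Map<String, String>> filter(List<Map<String, String>> rows, Map<String, Set<String>> constraint)
    {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean matches = constraint.entrySet().stream()
                    .allMatch(entry -> entry.getValue().contains(row.get(entry.getKey())));
            if (matches) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<Map<String, String>> orders = List.of(
                Map.of("orderstatus", "O"),
                Map.of("orderstatus", "F"));
        // SELECT 1 FROM orders WHERE orderstatus = 'O' -> constraint {orderstatus: {O}}
        System.out.println(filter(orders, Map.of("orderstatus", Set.of("O"))).size()); // 1
    }
}
```
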
There is no practical reason one would prefer predicate pushdown to be
disabled. As for testing the cases with predicate pushdown enabled
and disabled, one can select columns which either support or do not
support predicates, according to test needs.
`BenchmarkPlanner` runs with the iterative optimizer both enabled and disabled.
Without the iterative optimizer, and with `forceSingleNode=true`, the
created plan is invalid and eventually fails.

This disables `forceSingleNode` so that we always produce a plan without
failure.