Replies: 11 comments 53 replies
-
@odanoburu Yes, I think benchmarking Logica with TPC-H queries makes total sense. And for recursion: initial support is there, e.g. this script already runs: If you want to discuss details in depth I'd be happy to. |
Beta Was this translation helpful? Give feedback.
-
Okay, I've been working on the first TPC-H query (and learning Logica at the same time :P), and I had two difficulties:
|
Beta Was this translation helpful? Give feedback.
-
Another query (TPC-H Q2), another question(s):
|
Beta Was this translation helpful? Give feedback.
-
Hi @EvgSkv, I've been out a bit because of other stuff, but I've managed to build some benchmarking code that I could share if you'd like (I'm not posting it publicly before checking that I may share the TPC-H queries publicly!) Working on Q8 of the benchmark, I've had problems again with calling datetime functions. SQLite seems to not follow the standard very thoroughly here, so what would the best practice for handling vendor-specific built-in functions? I can't manage to use the extract built-in (which wouldn't work on sqlite, AFAIK) either (apparently that's still in development). I'm trying to translate strftime('%Y', o_orderdate) -- SQLite
-- extract(year from o_orderdate) -- original query There's a Thanks for the answers on the other thread, btw! I get it now. |
Beta Was this translation helpful? Give feedback.
-
@EvgSkv, on query 4 of the benchmark, there is an use of exists: select o_orderpriority,
count(*) as order_count
from orders
where o_orderdate >= date('1993-07-01') -- QDB
and o_orderdate < date('1993-07-01', '+3 month') -- QDB
and exists
(select *
from lineitem
where l_orderkey = o_orderkey
and l_commitdate < l_receiptdate )
group by o_orderpriority
order by o_orderpriority; What would the best way of writing this in Logica be? I can think of counting or using the negation of the subquery, but these will be probably generate less efficient queries. One can also use something akin to |
Beta Was this translation helpful? Give feedback.
-
@odanoburu , you said:
What is your concern about posting the queries publicly? Do you mean original queries, or your program? |
Beta Was this translation helpful? Give feedback.
-
Hi, @EvgSkv! While translating query 11 (see the original below), I had two problems. -- original TPC-H query
select ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from partsupp,
supplier,
nation
where ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY' -- QDB
group by ps_partkey
having sum(ps_supplycost * ps_availqty) >
(select sum(ps_supplycost * ps_availqty) * 0.0001 -- QDB
from partsupp,
supplier,
nation
where ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY') -- QDB
order by value desc; Unifying two similar subqueriesThis query is composed of two very similar subqueries, which could be defined as a single Logica predicate-function with an optional argument (I think, I wasn't able to do it). Currently I have the two predicates below:
I tried making the part argument in the latter predicate an optional argument, so that I could scrub the former predicate. I tried changing the output not accepted by SQLiteThe query as I have it looks like
which outputs the following SQL WITH t_0_TotalPartsValueInCountry AS (SELECT
nation.n_name AS country,
SUM(((t_1_partsupp.ps_supplycost) * (t_1_partsupp.ps_availqty))) AS value
FROM
partsupp AS t_1_partsupp, supplier, nation
WHERE
(t_1_partsupp.ps_suppkey = supplier.s_suppkey) AND
(supplier.s_nationkey = nation.n_nationkey)
GROUP BY country),
t_2_TotalPartValueInCountry AS (SELECT
t_5_nation.n_name AS country,
t_3_partsupp.ps_partkey AS part,
SUM(((t_3_partsupp.ps_supplycost) * (t_3_partsupp.ps_availqty))) AS value
FROM
partsupp AS t_3_partsupp, supplier AS t_4_supplier, nation AS t_5_nation
WHERE
(t_3_partsupp.ps_suppkey = t_4_supplier.s_suppkey) AND
(t_4_supplier.s_nationkey = t_5_nation.n_nationkey)
GROUP BY country, part)
SELECT
partsupp.ps_partkey AS ps_partkey,
TotalPartValueInCountry.value AS value
FROM
partsupp, t_0_TotalPartsValueInCountry AS TotalPartsValueInCountry, t_2_TotalPartValueInCountry AS TotalPartValueInCountry
WHERE
(TotalPartValueInCountry.value > ((0.0001) * (TotalPartsValueInCountry.value))) AND
(TotalPartValueInCountry.country = TotalPartsValueInCountry.country) AND
(TotalPartValueInCountry.part = partsupp.ps_partkey) AND
(TotalPartsValueInCountry.country = 'GERMANY')
GROUP BY ps_partkey, value ORDER BY value desc; which SQLite does not accept, complaining Thank you for the help! |
Beta Was this translation helpful? Give feedback.
-
Hi @EvgSkv, is there a way to use the IN and BETWEEN operators from Logica? E.g., to write something akin to and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and p_size between 1 and 5 without desugaring to and (p_container = 'SM CASE' or p_container == 'SM BOX' or p_container == 'SM PACK' or p_container =='SM PKG')
and p_size >= 1 and p_size <= 5 In the case of IN, I tried something like |
Beta Was this translation helpful? Give feedback.
-
Hello @odanoburu ! |
Beta Was this translation helpful? Give feedback.
-
Hi @EvgSkv , I'm having trouble translating Q15 (maybe I'm rusty after all this time). I'd say the problem is similar to the one I had with Q2, but your proposed solution for that problem (link) uses intra-record aggregation, which only allows me to get the variable we are aggregating from the 'subquery', but I need it's ID (too). In the case of Q2 I had an ID and I wanted a minimum related to it in some way, but here what I need is the maximum and the ID that pertains to it. I tried a custom aggregation function that would store the ID and the value, but that didn't work (and introduced JSON in the produced query, which I'd rather avoid). The closest example I found on the docs uses ArgMax and arrow syntax (where can I read more about it?), but it doesn't work for SQLite (is it a bigquery thing?)
I also tried finding the maximum with negation, and it worked, but it's slower.
query plans and timings:
The original query, of course, is:
|
Beta Was this translation helpful? Give feedback.
-
@odanoburu to avoid use of
Let me know if you discover any issues while trying this. |
Beta Was this translation helpful? Give feedback.
-
Hi @EvgSkv, how are you thinking of tackling a benchmark of Logica (#38)?
After some consideration, I thought that since recursion as a primitive is not supported, we should focus on traditional relational benchmarks like the ones from TPC. (After recursion is added, we could try to compare Logica + BigQuery versus, say, Neo4j + Cypher, or SPARQL + Allegrograph)
Specifically, as a first attempt, I thought of translating the queries in TPC-H http://tpc.org/tpch/default5.asp to Logica, and then comparing the execution time between the original queries and the ones compiled from Logica. Do you think this is a fruitful approach?
Beta Was this translation helpful? Give feedback.
All reactions