Skip to content
Open
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
159 changes: 159 additions & 0 deletions doc/sql.extensions/README.listagg
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
SQL Language Extension: LISTAGG

Function:
The current implementation has an aggregate function LIST which concatenates multiple row
fields into a blob. The SQL standard has a similar function called LISTAGG. The major
difference is that it also supports the ordered concatenation.

Authors:
Chudaykin Alex <chudaykinalex@gmail.com>

Format:
<listagg set function> ::=
LISTAGG <left paren> [ <set quantifier> ] <character value expression> <comma> <listagg separator> [ <listagg overflow clause> ] <right paren> [ <within group specification> ]

<listagg separator> ::=
<character string literal>

<listagg overflow clause> ::=
ON OVERFLOW <overflow behavior>

<overflow behavior> ::=
ERROR | TRUNCATE [ <listagg truncation filler> ] <listagg count indication>

<listagg truncation filler> ::=
<character string literal>

<listagg count indication> ::=
WITH COUNT | WITHOUT COUNT

<within group specification> ::=
WITHIN GROUP <left paren> ORDER BY <sort specification list> <right paren>

Syntax Rules:
The legacy LIST syntax is preserved for backward compatibility, LISTAGG is added to cover the
standard features.

There is a <listagg overflow clause> rule in the standard, which is intended to output an error
when the output value overflows. Since the LIST function always returns a BLOB, it was decided
that this rule would be meaningless. So the OVERFLOW clause is syntactically supported but
silently ignored if specified.

Examples:
CREATE TABLE TEST_T
(COL1 INT, COL2 VARCHAR(2), COL3 VARCHAR(2), COL4 VARCHAR(2), COL5 BOOLEAN, COL6 VARCHAR(2)
CHARACTER SET WIN1251);
COMMIT;
INSERT INTO TEST_T values(1, 'A', 'A', 'J', false, 'П');
INSERT INTO TEST_T values(2, 'B', 'B', 'I', false, 'Д');
INSERT INTO TEST_T values(3, 'C', 'A', 'L', true, 'Ж');
INSERT INTO TEST_T values(4, 'D', 'B', 'K', true, 'Й');
COMMIT;

SELECT LISTAGG (ALL COL4, ':') AS FROM TEST_T;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the syntax format section, <within group specification> is mandatory but does not exist in some examples.

Copy link
Member

@dyemanov dyemanov Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL specification declares <within group specification> as mandatory. However, IMHO it's quite restrictive and neither Oracle nor DB2 follows that rule, they have it optional. Given that LIST and LISTAGG share the same syntax in this PR, we've also made <within group specification> optional. So the easiest solution is to fix the README ;-)

Or we may go the standard way and separate the legacy LIST (leave it with the current grammar, without ordering) from LISTAGG (which is strictly standard-compliant) at the parser level. But IMHO it would be annoying for users to select either of them depending on whether you need ordering or not. So personally I'd keep everything "as is" and just fix the docs.

Other opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, LIST and LISTAGG should be complete synonyms.

I would also remove the mention of ON OVERFLOW from the documentation. It's standard, but we don't support it. It could be mentioned if it were simply ignored, but mentioning it leads to errors.

SELECT
LISTAGG(TRIM(RDB$RELATION_NAME), ';' ON OVERFLOW ERROR) WITHIN GROUP(ORDER BY RDB$RELATION_NAME) AS REL_NAMES
FROM RDB$RELATIONS;
Invalid token.
Dynamic SQL Error.
SQL error code = -104.
Token unknown - line 2, column 52.
ERROR.
----------------------------------
SQLCODE: -104
SQLSTATE: 42000
GDSCODE: 335544569
SELECT 
LISTAGG(TRIM(RDB$RELATION_NAME), ';' ON OVERFLOW TRUNCATE '...' WITHOUT COUNT) WITHIN GROUP(ORDER BY RDB$RELATION_NAME) AS REL_NAMES
FROM RDB$RELATIONS;
Invalid token.
Dynamic SQL Error.
SQL error code = -104.
Token unknown - line 2, column 52.
TRUNCATE.
----------------------------------
SQLCODE: -104
SQLSTATE: 42000
GDSCODE: 335544569

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange. I need to check it out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove ON OVERFLOW from the docs, then I believe we should remove it from the parser too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ON OVERFLOW can be kept if it will not cause errors.

=======
J:I:L:K

SELECT LISTAGG (DISTINCT COL4, ':') FROM TEST_T;
========
I:J:K:L

SELECT LISTAGG (DISTINCT COL3, ':') FROM TEST_T;
====
A:B

SELECT LISTAGG (DISTINCT COL3, ':') WITHIN GROUP (ORDER BY COL3 ASCENDING) FROM TEST_T;
====
A:B

SELECT LISTAGG (DISTINCT COL3, ':') WITHIN GROUP (ORDER BY COL3 DESCENDING) FROM TEST_T;
====
B:A

SELECT LISTAGG (DISTINCT COL3, ':') WITHIN GROUP (ORDER BY COL3 DESCENDING, COL4, COL5) FROM TEST_T;
====
B:A

SELECT LISTAGG (DISTINCT COL3, ':') WITHIN GROUP (ORDER BY COL4, COL3 DESCENDING, COL5) FROM TEST_T;
====
A:B

SELECT LISTAGG (DISTINCT COL3, ':') WITHIN GROUP (ORDER BY COL2) FROM TEST_T;
====
A:B

SELECT LISTAGG (DISTINCT COL3, ':') WITHIN GROUP (ORDER BY COL2 DESCENDING) FROM TEST_T;
====
A:B

SELECT LISTAGG (COL2, ':') WITHIN GROUP (ORDER BY COL2 DESCENDING) FROM TEST_T;
=======
D:C:B:A

SELECT LISTAGG (COL4, ':') WITHIN GROUP (ORDER BY COL3 DESC) FROM TEST_T;
=======
I:K:J:L

SELECT LISTAGG (COL3, ':') WITHIN GROUP (ORDER BY COL5 ASCENDING) FROM TEST_T;
=======
A:B:A:B

SELECT LISTAGG (COL4, ':') WITHIN GROUP (ORDER BY COL3 ASC) FROM TEST_T;
=======
J:L:I:K

SELECT LISTAGG (ALL COL2) WITHIN GROUP (ORDER BY COL4) FROM TEST_T;
=======
B,A,D,C

SELECT LISTAGG (COL2, ':') WITHIN GROUP (ORDER BY COL3 DESC, COL4 ASC) FROM TEST_T;
=======
B:D:A:C

SELECT LISTAGG (COL2, ':') WITHIN GROUP (ORDER BY COL3 DESC, COL4 DESC) FROM TEST_T;
=======
D:B:C:A

SELECT LISTAGG (COL2, ':') WITHIN GROUP (ORDER BY COL3 ASC, COL4 DESC) FROM TEST_T;
=======
C:A:D:B

SELECT LISTAGG (ALL COL6, ':') FROM TEST_T;
=======
П:Д:Ж:Й

SELECT LISTAGG (ALL COL6, ':') WITHIN GROUP (ORDER BY COL2 DESC) FROM TEST_T;
=======
Й:Ж:Д:П

SELECT LISTAGG (ALL COL2, ':') WITHIN GROUP (ORDER BY COL6) FROM TEST_T;
=======
B:C:D:A

SELECT LISTAGG (COL4, ':' ON OVERFLOW TRUNCATE '...' WITHOUT COUNT) WITHIN GROUP (ORDER BY COL3 ASC) FROM TEST_T;
=======
J:L:I:K

SELECT LISTAGG (COL4, ':' ON OVERFLOW TRUNCATE '...' WITH COUNT) WITHIN GROUP (ORDER BY COL3 DESC) FROM TEST_T;
======
I:K:J:L

SELECT LISTAGG (DISTINCT COL3, ':' ON OVERFLOW ERROR) WITHIN GROUP (ORDER BY COL3) FROM TEST_T;
===
A:B

INSERT INTO TEST_T values(5, 'E', NULL, NULL, NULL, NULL);
INSERT INTO TEST_T values(6, 'F', 'C', 'N', true, 'К');

SELECT LISTAGG (ALL COL2, ':') WITHIN GROUP (ORDER BY COL3) FROM TEST_T;
===========
E:A:C:B:D:F

SELECT LISTAGG (ALL COL2, ':') WITHIN GROUP (ORDER BY COL3 NULLS LAST) FROM TEST_T;
===========
A:C:B:D:F:E

SELECT LISTAGG (ALL COL2, ':') WITHIN GROUP (ORDER BY COL6 NULLS FIRST) FROM TEST_T;
===========
E:B:C:D:F:A

4 changes: 4 additions & 0 deletions src/common/ParserTokens.h
Original file line number Diff line number Diff line change
Expand Up @@ -205,6 +205,7 @@ PARSER_TOKEN(TOK_ENCRYPT, "ENCRYPT", true)
PARSER_TOKEN(TOK_END, "END", false)
PARSER_TOKEN(TOK_ENGINE, "ENGINE", true)
PARSER_TOKEN(TOK_ENTRY_POINT, "ENTRY_POINT", true)
PARSER_TOKEN(TOK_ERROR, "ERROR", true)
PARSER_TOKEN(TOK_ESCAPE, "ESCAPE", false)
PARSER_TOKEN(TOK_EXCEPTION, "EXCEPTION", true)
PARSER_TOKEN(TOK_EXCESS, "EXCESS", true)
Expand Down Expand Up @@ -292,6 +293,7 @@ PARSER_TOKEN(TOK_LIKE, "LIKE", false)
PARSER_TOKEN(TOK_LIMBO, "LIMBO", true)
PARSER_TOKEN(TOK_LINGER, "LINGER", true)
PARSER_TOKEN(TOK_LIST, "LIST", true)
PARSER_TOKEN(TOK_LISTAGG, "LISTAGG", false)
PARSER_TOKEN(TOK_LN, "LN", true)
PARSER_TOKEN(TOK_LATERAL, "LATERAL", false)
PARSER_TOKEN(TOK_LOCAL, "LOCAL", false)
Expand Down Expand Up @@ -522,6 +524,7 @@ PARSER_TOKEN(TOK_TRIGGER, "TRIGGER", false)
PARSER_TOKEN(TOK_TRIM, "TRIM", false)
PARSER_TOKEN(TOK_TRUE, "TRUE", false)
PARSER_TOKEN(TOK_TRUNC, "TRUNC", true)
PARSER_TOKEN(TOK_TRUNCATE, "TRUNCATE", false)
PARSER_TOKEN(TOK_TRUSTED, "TRUSTED", true)
PARSER_TOKEN(TOK_TWO_PHASE, "TWO_PHASE", true)
PARSER_TOKEN(TOK_TYPE, "TYPE", true)
Expand Down Expand Up @@ -558,6 +561,7 @@ PARSER_TOKEN(TOK_WHERE, "WHERE", false)
PARSER_TOKEN(TOK_WHILE, "WHILE", false)
PARSER_TOKEN(TOK_WINDOW, "WINDOW", false)
PARSER_TOKEN(TOK_WITH, "WITH", false)
PARSER_TOKEN(TOK_WITHIN, "WITHIN", false)
PARSER_TOKEN(TOK_WITHOUT, "WITHOUT", false)
PARSER_TOKEN(TOK_WORK, "WORK", true)
PARSER_TOKEN(TOK_WRITE, "WRITE", true)
Expand Down
91 changes: 80 additions & 11 deletions src/dsql/AggNodes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ string AggNode::internalPrint(NodePrinter& printer) const
NODE_PRINT(printer, dialect1);
NODE_PRINT(printer, arg);
NODE_PRINT(printer, asb);
NODE_PRINT(printer, sort);
NODE_PRINT(printer, indexed);

return aggInfo.name;
Expand Down Expand Up @@ -352,6 +353,8 @@ AggNode* AggNode::pass2(thread_db* tdbb, CompilerScratch* csb)
dsc desc;
getDesc(tdbb, csb, &desc);
impureOffset = csb->allocImpure<impure_value_ex>();
if (sort)
doPass2(tdbb, csb, sort.getAddress());

return this;
}
Expand All @@ -361,7 +364,7 @@ void AggNode::aggInit(thread_db* tdbb, Request* request) const
impure_value_ex* impure = request->getImpure<impure_value_ex>(impureOffset);
impure->vlux_count = 0;

if (distinct)
if (distinct || sort)
{
// Initialize a sort to reject duplicate values.

Expand All @@ -373,8 +376,8 @@ void AggNode::aggInit(thread_db* tdbb, Request* request) const

asbImpure->iasb_sort = FB_NEW_POOL(request->req_sorts.getPool()) Sort(
tdbb->getDatabase(), &request->req_sorts, asb->length,
asb->keyItems.getCount(), 1, asb->keyItems.begin(),
RecordSource::rejectDuplicate, 0);
asb->keyItems.getCount(), (distinct ? 1 : asb->keyItems.getCount()),
asb->keyItems.begin(), (distinct ? RecordSource::rejectDuplicate : nullptr), 0);
}
}

Expand Down Expand Up @@ -427,6 +430,46 @@ bool AggNode::aggPass(thread_db* tdbb, Request* request) const
ULONG* const pDummy = reinterpret_cast<ULONG*>(data + asb->length - sizeof(ULONG));
*pDummy = asbImpure->iasb_dummy++;

return true;
}
else if (sort)
{
fb_assert(asb);
// "Put" the value to sort.
impure_agg_sort* asbImpure = request->getImpure<impure_agg_sort>(asb->impure);
UCHAR* data;
asbImpure->iasb_sort->put(tdbb, reinterpret_cast<ULONG**>(&data));

MOVE_CLEAR(data, asb->length);

auto descOrder = asb->descOrder.begin();
auto keyItem = asb->keyItems.begin();

for (auto& nodeOrder : sort->expressions)
{
dsc toDesc = *(descOrder++);
toDesc.dsc_address = data + (IPTR) toDesc.dsc_address;
if (const auto fromDsc = EVL_expr(tdbb, request, nodeOrder))
{
if (IS_INTL_DATA(fromDsc))
{
INTL_string_to_key(tdbb, INTL_TEXT_TO_INDEX(fromDsc->getTextType()),
fromDsc, &toDesc, INTL_KEY_UNIQUE);
}
else
MOV_move(tdbb, fromDsc, &toDesc);
}
else
*(data + keyItem->getSkdOffset()) = TRUE;

// The first key for NULLS FIRST/LAST, the second key for the sorter
keyItem += 2;
}

dsc toDesc = asb->desc;
toDesc.dsc_address = data + (IPTR) toDesc.dsc_address;
MOV_move(tdbb, desc, &toDesc);

return true;
}
}
Expand Down Expand Up @@ -455,7 +498,7 @@ dsc* AggNode::execute(thread_db* tdbb, Request* request) const
impure->vlu_blob = NULL;
}

if (distinct)
if (distinct || sort)
{
impure_agg_sort* asbImpure = request->getImpure<impure_agg_sort>(asb->impure);
dsc desc = asb->desc;
Expand All @@ -478,7 +521,10 @@ dsc* AggNode::execute(thread_db* tdbb, Request* request) const
break;
}

desc.dsc_address = data + (asb->intl ? asb->keyItems[1].getSkdOffset() : 0);
if (distinct)
desc.dsc_address = data + (asb->intl ? asb->keyItems[1].getSkdOffset() : 0);
else
desc.dsc_address = data + (IPTR) asb->desc.dsc_address;

aggPass(tdbb, request, &desc);
}
Expand Down Expand Up @@ -877,28 +923,49 @@ AggNode* AvgAggNode::dsqlCopy(DsqlCompilerScratch* dsqlScratch) /*const*/
static AggNode::Register<ListAggNode> listAggInfo("LIST", blr_agg_list, blr_agg_list_distinct);

ListAggNode::ListAggNode(MemoryPool& pool, bool aDistinct, ValueExprNode* aArg,
ValueExprNode* aDelimiter)
ValueExprNode* aDelimiter, ValueListNode* aOrderClause)
: AggNode(pool, listAggInfo, aDistinct, false, aArg),
delimiter(aDelimiter)
delimiter(aDelimiter),
dsqlOrderClause(aOrderClause)
{
}

DmlNode* ListAggNode::parse(thread_db* tdbb, MemoryPool& pool, CompilerScratch* csb, const UCHAR blrOp)
{
ListAggNode* node = FB_NEW_POOL(pool) ListAggNode(pool,
(blrOp == blr_agg_list_distinct));
ListAggNode* node = FB_NEW_POOL(pool) ListAggNode(pool, (blrOp == blr_agg_list_distinct));
node->arg = PAR_parse_value(tdbb, csb);
node->delimiter = PAR_parse_value(tdbb, csb);
node->sort = PAR_sort(tdbb, csb, blr_sort, true);
Copy link
Member

@dyemanov dyemanov Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail while parsing the BLR restored from the previous versions, because blr_sort will be missing.
One solution is to add blr_agg_list2 and use it only when the ordering is specified. Another could be something like this:

if (csb->csb_blr_reader.peekByte() == blr_sort)
	node->sort = PAR_sort(tdbb, csb, blr_sort, true);

In this case, I'd also change genBlr() below to call GEN_sort() only if the ordering is specified -- this way we keep generating a compatible BLR for the legacy LIST syntax, making it possible to downgrade the database if required.

return node;
}

bool ListAggNode::dsqlMatch(DsqlCompilerScratch* dsqlScratch, const ExprNode* other, bool ignoreMapCast) const
{
if (!AggNode::dsqlMatch(dsqlScratch, other, ignoreMapCast))
return false;

const ListAggNode* o = nodeAs<ListAggNode>(other);
fb_assert(o);

if (dsqlOrderClause || o->dsqlOrderClause)
return PASS1_node_match(dsqlScratch, dsqlOrderClause, o->dsqlOrderClause, ignoreMapCast);

return true;
}

void ListAggNode::make(DsqlCompilerScratch* dsqlScratch, dsc* desc)
{
DsqlDescMaker::fromNode(dsqlScratch, desc, arg);
desc->makeBlob(desc->getBlobSubType(), desc->getTextType());
desc->setNullable(true);
}

void ListAggNode::genBlr(DsqlCompilerScratch* dsqlScratch)
{
AggNode::genBlr(dsqlScratch);
GEN_sort(dsqlScratch, blr_sort, dsqlOrderClause);
}

bool ListAggNode::setParameterType(DsqlCompilerScratch* dsqlScratch,
std::function<void (dsc*)> makeDesc, bool forceVarChar)
{
Expand All @@ -920,6 +987,7 @@ ValueExprNode* ListAggNode::copy(thread_db* tdbb, NodeCopier& copier) const
node->nodScale = nodScale;
node->arg = copier.copy(tdbb, arg);
node->delimiter = copier.copy(tdbb, delimiter);
node->sort = sort->copy(tdbb, copier);
return node;
}

Expand Down Expand Up @@ -985,7 +1053,7 @@ dsc* ListAggNode::aggExecute(thread_db* tdbb, Request* request) const
{
impure_value_ex* impure = request->getImpure<impure_value_ex>(impureOffset);

if (distinct)
if (distinct || sort)
{
if (impure->vlu_blob)
{
Expand All @@ -1005,7 +1073,8 @@ AggNode* ListAggNode::dsqlCopy(DsqlCompilerScratch* dsqlScratch) /*const*/
thread_db* tdbb = JRD_get_thread_data();

AggNode* node = FB_NEW_POOL(dsqlScratch->getPool()) ListAggNode(dsqlScratch->getPool(), distinct,
doDsqlPass(dsqlScratch, arg), doDsqlPass(dsqlScratch, delimiter));
doDsqlPass(dsqlScratch, arg), doDsqlPass(dsqlScratch, delimiter),
doDsqlPass(dsqlScratch, dsqlOrderClause));

dsc argDesc;
node->arg->make(dsqlScratch, &argDesc);
Expand Down
Loading
Loading