dj.Top restriction #1084

Closed
wants to merge 67 commits into from
67 commits
ece77d6
initial implementation attempt
A-Baji May 10, 2023
e32ed8f
subquery
A-Baji May 11, 2023
fb4dfc0
left and right hand restrictions
A-Baji May 11, 2023
b601439
remove distinct, limit=None
A-Baji May 11, 2023
1340958
optional limit
A-Baji May 11, 2023
b67f1fe
recursive top subqueries
A-Baji May 12, 2023
90dedd9
imeplement with `make_subquery`
A-Baji May 15, 2023
2df9a46
apply top from self instead of params
A-Baji May 16, 2023
07003b1
remove sorting params
A-Baji May 16, 2023
3e518df
simplify make_subquery
A-Baji May 16, 2023
32491a6
optional order_by
A-Baji May 16, 2023
01be0ab
handle top in QE.restrict
A-Baji May 16, 2023
e86ca94
dataclass decorator
A-Baji May 16, 2023
f1b9511
Merge branch 'master' of https://github.com/datajoint/datajoint-pytho…
A-Baji May 16, 2023
2a6ec2b
oops
A-Baji May 17, 2023
cc5720f
new top defaults and order by "KEY"
A-Baji May 17, 2023
95124eb
docstring
A-Baji May 17, 2023
0fca2cc
Changelog
A-Baji May 17, 2023
a816b6c
docstring
A-Baji May 18, 2023
4f3ef26
use Top instead of dict
A-Baji May 18, 2023
6547af8
simpler
A-Baji May 18, 2023
722e061
optimize subqeury usage
A-Baji May 19, 2023
f21173c
unnecessary copy
A-Baji May 22, 2023
cade78c
remove list
A-Baji May 22, 2023
297077c
simplify
A-Baji May 22, 2023
74f9762
handle dj.U.aggr with no PK
A-Baji May 22, 2023
a5c4c24
type check in `post_init`
A-Baji May 22, 2023
bab1732
move None conversion to post_init
A-Baji May 22, 2023
16da960
error msg
A-Baji May 22, 2023
dee963e
move error to post_init
A-Baji May 23, 2023
c4d0726
handle sorting in fetch.py
A-Baji May 23, 2023
5520d0a
limit to some large number
A-Baji May 23, 2023
a520c40
remove re import
A-Baji May 23, 2023
b6fedc3
more tests
A-Baji May 23, 2023
e737b8e
remove unused logger
A-Baji May 23, 2023
f830467
remove offset warning and warning test
A-Baji May 24, 2023
5fc96a3
better error test
A-Baji May 24, 2023
0393f95
unused import
A-Baji May 24, 2023
4257459
datajointerror -> typeerror
A-Baji May 24, 2023
ef61f42
simplify order_by typecheck
A-Baji May 24, 2023
554f577
offset err msg
A-Baji May 24, 2023
161c7d0
remove unused logger
A-Baji May 25, 2023
a48688a
move order_by list conversion to post_init
A-Baji May 25, 2023
63ed6b8
handle edge case
A-Baji May 25, 2023
778c6f9
redundant comment
A-Baji May 25, 2023
7ea05cf
also handle "KEY desc"
A-Baji May 25, 2023
0002cb3
fstrings
A-Baji May 30, 2023
6268ebf
regex matching for empty pk case
A-Baji May 30, 2023
37351db
use flatten_atribute_list
A-Baji May 30, 2023
c7020f7
simplify flatten calls
A-Baji May 30, 2023
59279ba
formatting
A-Baji May 31, 2023
163f8aa
Merge branch 'master' of https://github.com/datajoint/datajoint-pytho…
A-Baji May 31, 2023
69c2e97
escape keywords
A-Baji May 31, 2023
091e444
always escape
A-Baji May 31, 2023
97d6e55
fix
A-Baji Jun 1, 2023
c9a8001
keywork pk test cases
A-Baji Jun 6, 2023
f25e1ba
Merge branch 'master' of https://github.com/datajoint/datajoint-pytho…
A-Baji Jun 6, 2023
223ccdd
fix schema test
A-Baji Jun 6, 2023
b19910a
regex mismatch
A-Baji Jun 6, 2023
330e520
Merge branch 'master' of https://github.com/datajoint/datajoint-pytho…
A-Baji Jun 22, 2023
ca4736a
documentation
A-Baji Jun 22, 2023
b355bc6
bump nginx
A-Baji Jun 22, 2023
5bb8da4
fix
A-Baji Jun 22, 2023
f303549
more docs
A-Baji Jun 23, 2023
d6626a8
suggestiosn
A-Baji Jun 23, 2023
96245f0
italicise
A-Baji Jun 23, 2023
d9aabf2
typo
A-Baji Jun 23, 2023
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
 ## Release notes

+### Upcoming
+- Added - `dj.Top` restriction ([#1024](https://github.com/datajoint/datajoint-python/issues/1024)) PR [#1084](https://github.com/datajoint/datajoint-python/pull/1084)
+
 ### 0.14.1 -- Jun 02, 2023
 - Fixed - Fix altering a part table that uses the "master" keyword - PR [#991](https://github.com/datajoint/datajoint-python/pull/991)
 - Fixed - `.ipynb` output in tutorials is not visible in dark mode ([#1078](https://github.com/datajoint/datajoint-python/issues/1078)) PR [#1080](https://github.com/datajoint/datajoint-python/pull/1080)
2 changes: 1 addition & 1 deletion LNX-docker-compose.yml
@@ -44,7 +44,7 @@ services:
       interval: 15s
   fakeservices.datajoint.io:
     <<: *net
-    image: datajoint/nginx:v0.2.5
+    image: datajoint/nginx:v0.2.6
     environment:
       - ADD_db_TYPE=DATABASE
       - ADD_db_ENDPOINT=db:3306
3 changes: 2 additions & 1 deletion datajoint/__init__.py
@@ -37,6 +37,7 @@
     "Part",
     "Not",
     "AndList",
+    "Top",
     "U",
     "Diagram",
     "Di",
@@ -61,7 +62,7 @@
 from .schemas import VirtualModule, list_schemas
 from .table import Table, FreeTable
 from .user_tables import Manual, Lookup, Imported, Computed, Part
-from .expression import Not, AndList, U
+from .expression import Not, AndList, U, Top
 from .diagram import Diagram
 from .admin import set_password, kill
 from .blob import MatCell, MatStruct
31 changes: 31 additions & 0 deletions datajoint/condition.py
@@ -10,6 +10,8 @@
 import pandas
 import json
 from .errors import DataJointError
+from typing import Union, List
+from dataclasses import dataclass

 JSON_PATTERN = re.compile(
     r"^(?P<attr>\w+)(\.(?P<path>[\w.*\[\]]+))?(:(?P<type>[\w(,\s)]+))?$"
@@ -61,6 +63,35 @@ def append(self, restriction):
         super().append(restriction)


+@dataclass
+class Top:
+    """
+    A restriction to the top entities of a query.
+    In SQL, this corresponds to ORDER BY ... LIMIT ... OFFSET
+    """
+
+    limit: Union[int, None] = 1
+    order_by: Union[str, List[str]] = "KEY"
+    offset: int = 0
+
+    def __post_init__(self):
+        self.order_by = self.order_by or ["KEY"]
+        self.offset = self.offset or 0
+
+        if self.limit is not None and not isinstance(self.limit, int):
+            raise TypeError("Top limit must be an integer")
+        if not isinstance(self.order_by, (str, collections.abc.Sequence)) or not all(
+            isinstance(r, str) for r in self.order_by
+        ):
+            raise TypeError("Top order_by attributes must all be strings")
+        if not isinstance(self.offset, int):
+            raise TypeError("The offset argument must be an integer")
+        if self.offset and self.limit is None:
+            self.limit = 999999999999  # arbitrary large number to allow query
+        if isinstance(self.order_by, str):
+            self.order_by = [self.order_by]
+
+
 class Not:
     """invert restriction"""

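The normalization done in `__post_init__` can be tried without a database. The following is a minimal sketch, assuming a stand-in class named `TopSketch` (hypothetical, not imported from datajoint) that copies the checks shown in the diff; the attribute name `trial_id` is also made up for illustration.

```python
import collections.abc
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TopSketch:
    """Hypothetical stand-in that mirrors the Top dataclass in this diff."""

    limit: Optional[int] = 1
    order_by: Union[str, List[str]] = "KEY"
    offset: int = 0

    def __post_init__(self):
        # Falsy values (None, "", []) fall back to the defaults.
        self.order_by = self.order_by or ["KEY"]
        self.offset = self.offset or 0

        if self.limit is not None and not isinstance(self.limit, int):
            raise TypeError("Top limit must be an integer")
        if not isinstance(self.order_by, (str, collections.abc.Sequence)) or not all(
            isinstance(r, str) for r in self.order_by
        ):
            raise TypeError("Top order_by attributes must all be strings")
        if not isinstance(self.offset, int):
            raise TypeError("The offset argument must be an integer")
        if self.offset and self.limit is None:
            # An offset with no limit gets a very large limit substituted.
            self.limit = 999999999999
        if isinstance(self.order_by, str):
            # A single string is stored as a one-element list.
            self.order_by = [self.order_by]


default = TopSketch()
paged = TopSketch(limit=None, order_by="trial_id DESC", offset=10)
print(default)  # limit=1, order_by=['KEY'], offset=0
print(paged)    # limit replaced by the large placeholder value
```

Because `@dataclass` generates `__eq__`, two instances with the same normalized fields compare equal, which is what the `restrict` change in `expression.py` relies on when it decides whether a new subquery is needed.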
8 changes: 5 additions & 3 deletions datajoint/declare.py
@@ -443,9 +443,11 @@ def format_attribute(attr):
             return f"`{attr}`"
         return f"({attr})"

-    match = re.match(
-        r"(?P<unique>unique\s+)?index\s*\(\s*(?P<args>.*)\)", line, re.I
-    ).groupdict()
+    match = re.match(r"(?P<unique>unique\s+)?index\s*\(\s*(?P<args>.*)\)", line, re.I)
+    if match is None:
+        raise DataJointError(f'Table definition syntax error in line "{line}"')
+    match = match.groupdict()
+
     attr_list = re.findall(r"(?:[^,(]|\([^)]*\))+", match["args"])
     index_sql.append(
         "{unique}index ({attrs})".format(
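The `declare.py` change guards against `re.match` returning `None`, which previously surfaced as an `AttributeError` on `.groupdict()`. Below is a small standalone sketch of the same pattern, assuming a hypothetical helper `parse_index` and using `ValueError` in place of `DataJointError` so that it runs without datajoint installed.

```python
import re

# Same two regular expressions as in the diff.
INDEX_PATTERN = re.compile(
    r"(?P<unique>unique\s+)?index\s*\(\s*(?P<args>.*)\)", re.I
)
ARG_PATTERN = re.compile(r"(?:[^,(]|\([^)]*\))+")


def parse_index(line):
    """Hypothetical helper: return (is_unique, attribute list) for an index line."""
    match = INDEX_PATTERN.match(line)
    if match is None:
        # Fail with a readable message instead of an AttributeError.
        raise ValueError(f'Table definition syntax error in line "{line}"')
    parts = match.groupdict()
    attrs = [a.strip() for a in ARG_PATTERN.findall(parts["args"])]
    return bool(parts["unique"]), attrs


print(parse_index("unique index (a, b)"))
print(parse_index("index(x)"))
```

A malformed line such as `"indx (a)"` raises the explicit error rather than failing later on a `None` object.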
113 changes: 81 additions & 32 deletions datajoint/expression.py
@@ -9,6 +9,7 @@
 from .preview import preview, repr_html
 from .condition import (
     AndList,
+    Top,
     Not,
     make_condition,
     assert_join_compatibility,
@@ -52,6 +53,7 @@ class QueryExpression:
     _connection = None
     _heading = None
     _support = None
+    _top = None

     # If the query will be using distinct
     _distinct = False
@@ -119,17 +121,33 @@ def where_clause(self):
             else " WHERE (%s)" % ")AND(".join(str(s) for s in self.restriction)
         )

+    def sorting_clauses(self):
+        if not self._top:
+            return ""
+        clause = ", ".join(
+            _wrap_attributes(
+                _flatten_attribute_list(self.primary_key, self._top.order_by)
+            )
+        )
+        if clause:
+            clause = f" ORDER BY {clause}"
+        if self._top.limit is not None:
+            clause += f" LIMIT {self._top.limit}{f' OFFSET {self._top.offset}' if self._top.offset else ''}"
+
+        return clause
+
     def make_sql(self, fields=None):
         """
         Make the SQL SELECT statement.

         :param fields: used to explicitly set the select attributes
         """
-        return "SELECT {distinct}{fields} FROM {from_}{where}".format(
+        return "SELECT {distinct}{fields} FROM {from_}{where}{sorting}".format(
             distinct="DISTINCT " if self._distinct else "",
             fields=self.heading.as_sql(fields or self.heading.names),
             from_=self.from_clause(),
             where=self.where_clause(),
+            sorting=self.sorting_clauses(),
         )

     # --------- query operators -----------
# --------- query operators -----------
Expand Down Expand Up @@ -187,6 +205,14 @@ def restrict(self, restriction):
string, or an AndList.
"""
attributes = set()
if isinstance(restriction, Top):
result = (
self.make_subquery()
if self._top and not self._top.__eq__(restriction)
else copy.copy(self)
) # make subquery to avoid overwriting existing Top
result._top = restriction
return result
new_condition = make_condition(self, restriction, attributes)
if new_condition is True:
return self # restriction has no effect, return the same object
Expand All @@ -200,8 +226,10 @@ def restrict(self, restriction):
pass # all ok
# If the new condition uses any new attributes, a subquery is required.
# However, Aggregation's HAVING statement works fine with aliased attributes.
need_subquery = isinstance(self, Union) or (
not isinstance(self, Aggregation) and self.heading.new_attributes
need_subquery = (
isinstance(self, Union)
or (not isinstance(self, Aggregation) and self.heading.new_attributes)
or self._top
)
if need_subquery:
result = self.make_subquery()
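The `Top` branch in `restrict` only nests the query when a different `Top` is already present. A toy model can make that rule concrete; the `Query` and `Top` classes below are hypothetical simplifications and do not reproduce datajoint's `QueryExpression`.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Top:
    """Hypothetical minimal stand-in for the restriction object."""

    limit: Optional[int] = 1
    offset: int = 0


class Query:
    """Toy query: `source` is a table name or another Query (a subquery)."""

    def __init__(self, source):
        self.source = source
        self.top = None

    def make_subquery(self):
        return Query(self)

    def restrict_top(self, top):
        # Nest only if an existing, different Top would be overwritten.
        if self.top and self.top != top:
            result = self.make_subquery()
        else:
            result = copy.copy(self)
        result.top = top
        return result

    def depth(self):
        return (1 + self.source.depth()) if isinstance(self.source, Query) else 0


base = Query("session")
first = base.restrict_top(Top(5))
same = first.restrict_top(Top(5))    # equal Top: no extra nesting
nested = first.restrict_top(Top(2))  # different Top: wrapped in a subquery
print(first.depth(), same.depth(), nested.depth())
```

In this model the inner query keeps its own `Top`, so applying a second, different `Top` selects from the already limited result instead of replacing the first limit.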
@@ -537,19 +565,20 @@ def tail(self, limit=25, **fetch_kwargs):

     def __len__(self):
         """:return: number of elements in the result set e.g. ``len(q1)``."""
-        return self.connection.query(
+        result = self.make_subquery() if self._top else copy.copy(self)
+        return result.connection.query(
             "SELECT {select_} FROM {from_}{where}".format(
                 select_=(
                     "count(*)"
-                    if any(self._left)
+                    if any(result._left)
                     else "count(DISTINCT {fields})".format(
-                        fields=self.heading.as_sql(
-                            self.primary_key, include_aliases=False
+                        fields=result.heading.as_sql(
+                            result.primary_key, include_aliases=False
                         )
                     )
                 ),
-                from_=self.from_clause(),
-                where=self.where_clause(),
+                from_=result.from_clause(),
+                where=result.where_clause(),
             )
         ).fetchone()[0]
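The reason `__len__` wraps a `Top`-restricted expression in a subquery can be shown with plain SQL: a `LIMIT` placed on the counting statement itself applies to the single aggregated row, not to the rows being counted. The sketch below uses Python's built-in `sqlite3` and a made-up `trial` table purely for illustration; datajoint itself targets MySQL, so this only demonstrates the general SQL behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trial (trial_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO trial VALUES (?)", [(i,) for i in range(10)])

# LIMIT on the outer statement limits the one-row count result, not the input rows.
naive = conn.execute("SELECT count(*) FROM trial LIMIT 3").fetchone()[0]

# Putting ORDER BY ... LIMIT inside a subquery limits the rows before counting.
wrapped = conn.execute(
    "SELECT count(*) FROM "
    "(SELECT trial_id FROM trial ORDER BY trial_id LIMIT 3) AS t"
).fetchone()[0]

print(naive, wrapped)
conn.close()
```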

@@ -617,18 +646,12 @@ def __next__(self):
                     # -- move on to next entry.
                     return next(self)

-    def cursor(self, offset=0, limit=None, order_by=None, as_dict=False):
+    def cursor(self, as_dict=False):
         """
         See expression.fetch() for input description.
         :return: query cursor
         """
-        if offset and limit is None:
-            raise DataJointError("limit is required when offset is set")
         sql = self.make_sql()
-        if order_by is not None:
-            sql += " ORDER BY " + ", ".join(order_by)
-        if limit is not None:
-            sql += " LIMIT %d" % limit + (" OFFSET %d" % offset if offset else "")
         logger.debug(sql)
         return self.connection.query(sql, as_dict=as_dict)

@@ -699,21 +722,24 @@ def make_sql(self, fields=None):
         fields = self.heading.as_sql(fields or self.heading.names)
         assert self._grouping_attributes or not self.restriction
         distinct = set(self.heading.names) == set(self.primary_key)
-        return "SELECT {distinct}{fields} FROM {from_}{where}{group_by}".format(
-            distinct="DISTINCT " if distinct else "",
-            fields=fields,
-            from_=self.from_clause(),
-            where=self.where_clause(),
-            group_by=""
-            if not self.primary_key
-            else (
-                " GROUP BY `%s`" % "`,`".join(self._grouping_attributes)
-                + (
-                    ""
-                    if not self.restriction
-                    else " HAVING (%s)" % ")AND(".join(self.restriction)
-                )
-            ),
+        return (
+            "SELECT {distinct}{fields} FROM {from_}{where}{group_by}{sorting}".format(
+                distinct="DISTINCT " if distinct else "",
+                fields=fields,
+                from_=self.from_clause(),
+                where=self.where_clause(),
+                group_by=""
+                if not self.primary_key
+                else (
+                    " GROUP BY `%s`" % "`,`".join(self._grouping_attributes)
+                    + (
+                        ""
+                        if not self.restriction
+                        else f" HAVING ({')AND('.join(self.restriction)})"
+                    )
+                ),
+                sorting=self.sorting_clauses(),
+            )
         )

     def __len__(self):
@@ -772,14 +798,15 @@ def make_sql(self):
         ):
             # no secondary attributes: use UNION DISTINCT
             fields = arg1.primary_key
-            return "SELECT * FROM (({sql1}) UNION ({sql2})) as `_u{alias}`".format(
+            return "SELECT * FROM (({sql1}) UNION ({sql2})) as `_u{alias}`{sorting}".format(
                 sql1=arg1.make_sql()
                 if isinstance(arg1, Union)
                 else arg1.make_sql(fields),
                 sql2=arg2.make_sql()
                 if isinstance(arg2, Union)
                 else arg2.make_sql(fields),
                 alias=next(self.__count),
+                sorting=self.sorting_clauses(),
             )
         # with secondary attributes, use union of left join with antijoin
         fields = self.heading.names
@@ -931,3 +958,25 @@ def aggr(self, group, **named_attributes):
         )

     aggregate = aggr  # alias for aggr
+
+
+def _flatten_attribute_list(primary_key, attrs):
+    """
+    :param primary_key: list of attributes in primary key
+    :param attrs: list of attribute names, which may include "KEY", "KEY DESC" or "KEY ASC"
+    :return: generator of attributes where "KEY" is replaced with its component attributes
+    """
+    for a in attrs:
+        if re.match(r"^\s*KEY(\s+[aA][Ss][Cc])?\s*$", a):
+            if primary_key:
+                yield from primary_key
+        elif re.match(r"^\s*KEY\s+[Dd][Ee][Ss][Cc]\s*$", a):
+            if primary_key:
+                yield from (q + " DESC" for q in primary_key)
+        else:
+            yield a
+
+
+def _wrap_attributes(attr):
+    for entry in attr:  # wrap attribute names in backquotes
+        yield re.sub(r"\b((?!asc|desc)\w+)\b", r"`\1`", entry, flags=re.IGNORECASE)
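The two helpers above are pure functions of their inputs, so their behavior can be checked in isolation. The following sketch copies their regular expressions into standalone functions (named without the leading underscore here, and with made-up attribute names) to show how `"KEY DESC"` expands and how names are backquoted.

```python
import re


def flatten_attribute_list(primary_key, attrs):
    """Expand "KEY", "KEY ASC" and "KEY DESC" into the primary key attributes."""
    for a in attrs:
        if re.match(r"^\s*KEY(\s+[aA][Ss][Cc])?\s*$", a):
            if primary_key:
                yield from primary_key
        elif re.match(r"^\s*KEY\s+[Dd][Ee][Ss][Cc]\s*$", a):
            if primary_key:
                yield from (q + " DESC" for q in primary_key)
        else:
            yield a


def wrap_attributes(attrs):
    """Backquote every word that does not start with asc/desc (case-insensitive)."""
    for entry in attrs:
        yield re.sub(r"\b((?!asc|desc)\w+)\b", r"`\1`", entry, flags=re.IGNORECASE)


pk = ["subject_id", "session"]  # hypothetical primary key
expanded = list(flatten_attribute_list(pk, ["KEY DESC", "score asc"]))
wrapped = list(wrap_attributes(expanded))
print(expanded)
print(wrapped)

# With an empty primary key, "KEY" expands to nothing.
print(list(flatten_attribute_list([], ["KEY"])))

# The lookahead tests only the start of a word, so a name that begins with
# "asc" or "desc" is left unquoted by this pattern.
print(list(wrap_attributes(["description"])))
```

The last line points at a possible edge case of the pattern as written: an attribute such as `description` would pass through without backquotes, which may or may not matter depending on the attribute names in use.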
46 changes: 10 additions & 36 deletions datajoint/fetch.py
@@ -1,20 +1,18 @@
 from functools import partial
 from pathlib import Path
-import logging
 import pandas
 import itertools
-import re
 import json
 import numpy as np
 import uuid
 import numbers
+
+from datajoint.condition import Top
 from . import blob, hash
 from .errors import DataJointError
 from .settings import config
 from .utils import safe_write

-logger = logging.getLogger(__name__.split(".")[0])
-

 class key:
     """
@@ -119,21 +117,6 @@ def _get(connection, attr, data, squeeze, download_path):
     )


-def _flatten_attribute_list(primary_key, attrs):
-    """
-    :param primary_key: list of attributes in primary key
-    :param attrs: list of attribute names, which may include "KEY", "KEY DESC" or "KEY ASC"
-    :return: generator of attributes where "KEY" is replaces with its component attributes
-    """
-    for a in attrs:
-        if re.match(r"^\s*KEY(\s+[aA][Ss][Cc])?\s*$", a):
-            yield from primary_key
-        elif re.match(r"^\s*KEY\s+[Dd][Ee][Ss][Cc]\s*$", a):
-            yield from (q + " DESC" for q in primary_key)
-        else:
-            yield a
-
-
 class Fetch:
     """
     A fetch object that handles retrieving elements from the table expression.
@@ -174,13 +157,13 @@ def __call__(
         :param download_path: for fetches that download data, e.g. attachments
         :return: the contents of the table in the form of a structured numpy.array or a dict list
         """
-        if order_by is not None:
-            # if 'order_by' passed in a string, make into list
-            if isinstance(order_by, str):
-                order_by = [order_by]
-            # expand "KEY" or "KEY DESC"
-            order_by = list(
-                _flatten_attribute_list(self._expression.primary_key, order_by)
+        if offset or order_by or limit:
+            self._expression = self._expression.restrict(
+                Top(
+                    limit,
+                    order_by,
+                    offset,
+                )
             )

         attrs_as_dict = as_dict and attrs
@@ -212,13 +195,6 @@ def __call__(
                     'use "array" or "frame"'.format(format)
                 )

-        if limit is None and offset is not None:
-            logger.warning(
-                "Offset set, but no limit. Setting limit to a large number. "
-                "Consider setting a limit explicitly."
-            )
-            limit = 8000000000  # just a very large number to effect no limit
-
         get = partial(
             _get,
             self._expression.connection,
@@ -255,9 +231,7 @@ def __call__(
                 ]
             ret = return_values[0] if len(attrs) == 1 else return_values
         else:  # fetch all attributes as a numpy.record_array or pandas.DataFrame
-            cur = self._expression.cursor(
-                as_dict=as_dict, limit=limit, offset=offset, order_by=order_by
-            )
+            cur = self._expression.cursor(as_dict=as_dict)
             heading = self._expression.heading
             if as_dict:
                 ret = [