
problem dumping aggregate #175

Open
reedstrm opened this issue Nov 8, 2017 · 10 comments
@reedstrm
Contributor

reedstrm commented Nov 8, 2017

I have a custom aggregate for fulltext search vectors, built on top of tsvector_concat, defined as follows:

CREATE AGGREGATE tsvector_agg(tsvector) (
    SFUNC = tsvector_concat,
    STYPE = tsvector,
    INITCOND = ''
);

Attempting to dump it w/ dbtoyaml gives this traceback:

$ dbtoyaml pyrseas_test
Traceback (most recent call last):
  File "/home/reedstrm/src/rewrite/bin/dbtoyaml", line 11, in <module>
    load_entry_point('Pyrseas==0.8.dev0', 'console_scripts', 'dbtoyaml')()
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbtoyaml.py", line 49, in main
    dbmap = db.to_map()
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/database.py", line 483, in to_map
    dbmap.update(self.db.schemas.to_map(self.db, opts))
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/schema.py", line 342, in to_map
    schemas.update(self[sch].to_map(db, self, opts))
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/schema.py", line 121, in to_map
    schobjs.append((obj, obj.to_map(db, no_owner, no_privs)))
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/function.py", line 371, in to_map
    dct = self._base_map(db, no_owner, no_privs)
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/__init__.py", line 351, in _base_map
    deps -= self.get_implied_deps(db)
  File "/home/reedstrm/src/rewrite/local/lib/python2.7/site-packages/pyrseas/dbobject/function.py", line 404, in get_implied_deps
    deps.add(db.functions[sch, fnc, args])
KeyError: ('public', 'tsvector_concat', u'tsvector, tsvector')

This is tip-of-master code:

$ dbtoyaml --version
dbtoyaml 0.8.dev0
(master $)$ git describe --tags
v0.7.2-269-g42220a2
@reedstrm
Contributor Author

reedstrm commented Nov 8, 2017

seems related to #169

@jmafc
Member

jmafc commented Nov 8, 2017

I'm pretty sure it's unrelated to #169. The problem with your aggregate is that it depends on an SFUNC that we don't retrieve because, shall we say, we don't find it to be "of interest" (although we should). If you run the query starting with

SELECT nspname AS schema, proname AS name,

in psql, you'll see it returns no rows (it may return some rows in your db, but not tsvector_concat). If you remove the "nspname != 'pg_catalog' AND " from the WHERE clause then it will show up, but so will 2500 other pg_catalog functions.

I thought that perhaps the work done by @dvarrazzo on dependency tracking would catch issues like this but apparently that's not the case. Whenever someone specializes (in the OO sense) on a pg_catalog object, we're liable to miss some links between objects. In your specific case perhaps we can modify the query above to fetch tsvector_concat (or whichever other pg_catalog function is used as an SFUNC or similar aggregate auxiliary) by modifying the pg_depend subquery in the query above.

@dvarrazzo
Contributor

dvarrazzo commented Nov 8, 2017

The problem seems to me that pg_aggregate has fields defined as regproc instead of oid.

piro=# \d pg_aggregate
     Table "pg_catalog.pg_aggregate"
      Column      |   Type   | Modifiers 
------------------+----------+-----------
 aggfnoid         | regproc  | not null
 aggkind          | "char"   | not null
 aggnumdirectargs | smallint | not null
 aggtransfn       | regproc  | not null
 aggfinalfn       | regproc  | not null
...

well, not so much of a problem, but an inconsistency: for instance, relations are always referred to by oid, where they could be defined as regclass:

piro=# \d pg_index
        Table "pg_catalog.pg_index"
     Column     |     Type     | Modifiers 
----------------+--------------+-----------
 indexrelid     | oid          | not null
 indrelid       | oid          | not null
...

So when you read pg_index you normally obtain numbers; when you read pg_aggregate you normally obtain strings. Unfortunately, these strings are mangled according to the current search_path. See how much it sucks:

piro=# select aggfnoid, aggtransfn from pg_aggregate where aggfnoid::text ~ 'tsvector_agg';
   aggfnoid   |   aggtransfn    
--------------+-----------------
 tsvector_agg | tsvector_concat
(1 row)

piro=# set search_path to piro;
SET

piro=# select aggfnoid, aggtransfn from pg_aggregate where aggfnoid::text ~ 'tsvector_agg';
      aggfnoid       |   aggtransfn    
---------------------+-----------------
 public.tsvector_agg | tsvector_concat
(1 row)
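The rule behind this mangling can be sketched in plain Python (a simplified, hypothetical model of regproc's text output, not the actual backend code): a reference is rendered unqualified whenever its schema appears on the current search_path.

```python
# Hypothetical model of how regproc's text output qualifies names:
# the schema prefix is dropped when the schema is on the search_path.
def regproc_text(schema, name, search_path):
    if schema in search_path:
        return name
    return "%s.%s" % (schema, name)

# Default search_path effectively includes public:
regproc_text("public", "tsvector_agg", ["public", "pg_catalog"])
# -> 'tsvector_agg'

# After `set search_path to piro`, public is off the path:
regproc_text("public", "tsvector_agg", ["piro", "pg_catalog"])
# -> 'public.tsvector_agg'
```

Note that the round trip is lossy: from the bare string tsvector_concat alone there is no way to tell public from pg_catalog apart.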

I remember from working on pyrseas that disambiguating objects in different schemas was already tricky, and disambiguating functions even more so, because you can have functions with the same name and schema but different signatures.

I think the solution for aggregates is to use numeric oids (casting the regproc fields to oid) instead of what is returned natively when querying pg_aggregate. Namespaces and signatures stop being issues:

piro=# select aggfnoid::oid, aggtransfn::oid from pg_aggregate where aggfnoid::text ~ 'tsvector_agg';
 aggfnoid | aggtransfn 
----------+------------
  1145986 |       3625
(1 row)

piro=# select nspname, proname from pg_proc p join pg_namespace n on p.pronamespace = n.oid where  p.oid in (1145986,3625);
  nspname   |     proname     
------------+-----------------
 pg_catalog | tsvector_concat
 public     | tsvector_agg
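
To illustrate the OID-keyed idea (the OIDs and dict layout below are assumptions for illustration, not Pyrseas's actual structures): resolving a dependency becomes a single lookup, with no schema guessing and no signature string to build.

```python
# Toy illustration (assumed data): a function catalog keyed by OID.
functions_by_oid = {
    3625: ("pg_catalog", "tsvector_concat", "tsvector, tsvector"),
    1145986: ("public", "tsvector_agg", "tsvector"),
}

# Resolving the aggregate's transition function is a plain lookup:
schema, name, signature = functions_by_oid[3625]
# schema == 'pg_catalog', with no ambiguity against public
```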

Can't remember: is there support in pyrseas to refer to functions or other objects by oid?

Parsing nulls also sucks big time...

self.finalfunc = finalfunc if finalfunc != '-' else None

That's not even NULL: it's 0::oid. Why? What were they even thinking? :\

@dvarrazzo
Contributor

Whenever someone specializes (in the OO sense) on a pg_catalog object, we're liable to miss some links between objects.

In this case this is not the problem: Aggregate.get_implied_deps() seems to get it right (although, looking at the number of fields in pg_aggregate, we may be missing others?). The problem is the ambiguity between public and pg_catalog in the representation of the function as a regproc, under the default search_path. If we resolved the function correctly, we would notice it is a builtin and not consider it a dependency.

    def get_implied_deps(self, db):
        # List the previous dependencies
        deps = super(Aggregate, self).get_implied_deps(db)
        sch, fnc = split_schema_obj(self.sfunc)
        if 'ORDER BY' in self.arguments:
            args = self.arguments.replace(' ORDER BY', ',')
        else:
            args = self.stype + ', ' + self.arguments
        deps.add(db.functions[sch, fnc, args])
        for fn in ('finalfunc', 'mfinalfunc'):
            if getattr(self, fn) is not None:
                sch, fnc = split_schema_obj(getattr(self, fn))
                deps.add(db.functions[sch, fnc, self.mstype
                                      if fn[0] == 'm' else self.stype])
        for fn in ('msfunc', 'minvfunc'):
            if getattr(self, fn) is not None:
                sch, fnc = split_schema_obj(getattr(self, fn))
                args = self.mstype + ", " + self.arguments
                deps.add(db.functions[sch, fnc, args])
        return deps

(see also how much it sucks needing to generate a signature to resolve the functions it depends on, whereas using the oid this would not be a concern...)

@reedstrm
Contributor Author

reedstrm commented Nov 8, 2017

Yup, I just noticed when poking at this in pdb that the schema assigned is wrong: public rather than pg_catalog, because self.sfunc is the string tsvector_concat, with no schema qualification. Ah, I see, that's the default string output conversion for regproc. So the magic regproc type is lossy when using its string output function. It works properly if you join on it, though, or cast it to oid.
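
The lossy string is exactly what produces the KeyError in the traceback. A toy stand-in for the splitting helper (a hypothetical simplification, not Pyrseas's actual split_schema_obj) shows how the wrong key is built:

```python
def split_schema_obj(name, schema="public"):
    """Toy version of the helper: split 'schema.obj', falling back to
    a default schema for unqualified names (an assumed simplification)."""
    if "." in name:
        sch, obj = name.split(".", 1)
        return sch, obj
    return schema, name

# regproc rendered the SFUNC unqualified, so the key points at public:
key = split_schema_obj("tsvector_concat") + ("tsvector, tsvector",)
# key == ('public', 'tsvector_concat', 'tsvector, tsvector')
# ...but the function actually lives in pg_catalog, hence the KeyError.
```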

@jmafc
Member

jmafc commented Nov 8, 2017

Daniele, with all due respect, I think you're chasing the rabbit down the wrong hole. At the point where Ross gets the error, we're trying to find the function tsvector_concat in db.functions, but if you run the CREATE AGGREGATE on an empty database and then run dbtoyaml, at that point db.functions is empty. And why is it empty? Because the query that fetches functions doesn't fetch functions defined in the pg_catalog schema.

Re: the '-' in parsing regprocs (and other things): it's because PG catalogs define most attributes as NOT NULL (very appropriate IMHO), so you can't "store NULLs" in them (in quotes because it's illogical to store nothing).

The problem with disambiguating schemas is due to a rather unfortunate initial design decision: doing a set search_path to public, pg_catalog right after we connect. At one point I considered changing that to include only pg_catalog, but it would be a major refactoring.

OIDs can't be used because we have to produce an external (YAML) dump and accept that back as input. See #119 for additional details. I hope we can eventually get rid of internal use of OIDs and rely solely on external object keys.

@dvarrazzo
Contributor

Daniele, with all due respect, I think you're chasing the rabbit down the wrong hole. At the point where Ross gets the error, we're trying to find the function tsvector_concat in db.functions, but if you run the CREATE AGGREGATE on an empty database and then run dbtoyaml, at that point db.functions is empty. And why is it empty? Because the query that fetches functions doesn't fetch functions defined in the pg_catalog schema.

Yes but the error is:

KeyError: ('public', 'tsvector_concat', u'tsvector, tsvector')

Regardless of why it is wrong, this looks wrong to me:

piro=# \sf tsvector_concat 
CREATE OR REPLACE FUNCTION pg_catalog.tsvector_concat(tsvector, tsvector)

Re: the '-' in parsing regprocs (and other things) is because PG catalogs define most attributes as NOT NULL (very appropriate IMHO) so you can't "store NULLs" in them (in quotes because it's illogical to store nothing).

I would find it more appropriate to define a lack of existence with a NULL instead of with InvalidOid. However, this is an opinion and not material for this bug.

The problem with disambiguating schemas is due to a rather unfortunate initial design decision: doing a set search_path to public, pg_catalog right after we connect. At one point I considered changing that to include only pg_catalog, but it would be a major refactoring.

If you go down this road I suggest instead to drop everything from search_path and not special-case public: it would make your life way easier.

OIDs can't be used because we have to produce an external (YAML) dump and accept that back as input. See #119 for additional details. I hope we can eventually get rid of internal use of OIDs and rely solely on external object keys.

I'm not saying to store the oids; I know that is not possible. This is only to create dependency edges in the graph. It would be way easier if oids were used everywhere (still IMO, and still not necessary to fix this bug).

@jmafc
Member

jmafc commented Nov 9, 2017

Yes but the error is:

KeyError: ('public', 'tsvector_concat', u'tsvector, tsvector')

regardless of why is wrong, this looks wrong to me:

piro=# \sf tsvector_concat
CREATE OR REPLACE FUNCTION pg_catalog.tsvector_concat(tsvector, tsvector)

I'm not sure what looks wrong to you: something about the way psql displays the function? The fact that we're searching for public.tsvector_concat instead of pg_catalog.tsvector_concat? If it's the latter, it's because a few lines above we called split_schema_obj(self.sfunc). Even if we had supplied self.schema as a second argument to that call, it would not have made a difference. If what looks wrong to you is that the query result is tsvector_concat instead of pg_catalog.tsvector_concat, that's an issue for PGDG. We'll just have to assume that, for the time being, unqualified functions (or types like tsvector) are in pg_catalog (or public, until we change the search_path).

If you go down this road I suggest instead to drop everything from search_path and not special-case public: it would make your life way easier.

Special-casing public is definitely desirable, but perhaps keeping pg_catalog would make sense since then something like tsvector_concat can be assumed to be pg_catalog.tsvector_concat.
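
That convention could be sketched as follows (an assumption about the proposed behavior, not current Pyrseas code): with only pg_catalog left on the search_path, any unqualified name can be resolved to pg_catalog.

```python
def resolve_unqualified(name):
    """Resolve a possibly schema-qualified name, assuming anything
    without an explicit schema lives in pg_catalog (the convention
    proposed above; hypothetical, not current Pyrseas behavior)."""
    if "." in name:
        sch, obj = name.split(".", 1)
        return (sch, obj)
    return ("pg_catalog", name)

resolve_unqualified("tsvector_concat")
# -> ('pg_catalog', 'tsvector_concat')
resolve_unqualified("public.tsvector_agg")
# -> ('public', 'tsvector_agg')
```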

@jmafc
Member

jmafc commented Nov 17, 2017

pg_depend has no row for tsvector_concat after the aggregate is created (it only has a PIN dependency). So fetching from pg_depend (or modifying existing pg_depend queries) will not help to solve this issue.

@jmafc jmafc added the dbtoyaml label Nov 23, 2017
@jmafc jmafc added this to the Release 0.9 milestone Nov 23, 2017
jmafc added a commit that referenced this issue Jan 26, 2018
  The tests in tests/dbobject all pass, however the following have been
  marked xfail:

     test_type.py: BaseTypeToSql.test_drop_type
     test_operclass.py: ToSql.test_create_operclass_default
     test_trigger.py: ToSql.test_add_tsvector_trigger,
                      test_change_tsvector_trigger

  The first two fail with an exception in Database.dep_sorted indicating
  a loop in the dependency graph.  The other two fail because we don't
  retrieve info for pg_catalog objects such as 'tsvector_update_trigger'
  (see similar discussion in #175).

  The following functional tests currently fail: test_autodoc.py,
  test_autodoc_dir.py and test_pagila.py.
jmafc added a commit that referenced this issue Mar 2, 2018
…176.

  This correction allows the removal of the mark.xfail from the tests in
  test_operclass and test_type. However, three other tests in the former
  had to be marked xfail because Pyrseas has virtually no info on PG
  system objects (e.g., types such as 'integer', functions such as
  'ts_vector_concat' [see issue #175]).
@jmafc
Member

jmafc commented Sep 14, 2018

@reedstrm Please take a look at https://pyrseas.wordpress.com/2018/09/12/the-future-of-pyrseas-revisited/ . Solving this issue is dependent on issue #185, and I have no estimate of when that will be worked on.
