Import CSV files using existing APIs #260
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -277,20 +277,52 @@ async def upload_assignments( | |
csv_rows = d["assignments"] | ||
document_id = d["document_id"] | ||
|
||
# process CSV via temp tables | ||
session.execute(text("DROP TABLE IF EXISTS temploader")) | ||
session.execute(text("""CREATE TEMP TABLE temploader ( | ||
geo_id TEXT, | ||
zone INT | ||
)""")) | ||
cursor = session.connection().connection.cursor() | ||
with cursor.copy("COPY temploader (geo_id, zone) FROM STDIN") as copy: | ||
for record in csv_rows: | ||
- copy.write_row([record[0], int(record[1])]) | ||
+ if record[1] == "": | ||
+ copy.write_row([record[0], None]) | ||
+ else: | ||
+ copy.write_row([record[0], int(record[1])]) | ||
|
||
# insert into actual assignments table | ||
session.execute(text(""" | ||
INSERT INTO document.assignments (geo_id, zone, document_id) | ||
SELECT geo_id, zone, :document_id | ||
FROM temploader | ||
"""), {"document_id": document_id}) | ||
|
||
# find items to unshatter | ||
results = session.execute(text(""" | ||
SELECT DISTINCT(SUBSTR(geo_id, 1, 11)) AS vtd, zone | ||
👍 that makes sense. I also realized that I was doing this in reverse (un-shattering blocks, when on a fresh map I need to be shattering the VTDs which need block-level resolution) |
||
FROM document.assignments | ||
WHERE document_id = :document_id | ||
AND geo_id NOT LIKE 'vtd:%' | ||
GROUP BY vtd, zone | ||
HAVING COUNT(DISTINCT zone) = 1 | ||
"""), {"document_id": document_id}) | ||
zoneVTDs = {} | ||
for row in results: | ||
vtd, zone = row | ||
if zone not in zoneVTDs: | ||
zoneVTDs[zone] = [] | ||
zoneVTDs[zone].append('vtd:' + vtd) | ||
|
||
session.commit() | ||
|
||
# unshatter block imports | ||
for zone, vtds in zoneVTDs.items(): | ||
session.execute(text("SELECT unshatter_parent( :document_id, :vtds, :zone)"), { | ||
"document_id": document_id, | ||
"vtds": vtds, | ||
"zone": zone | ||
}) | ||
session.commit() | ||
|
||
return {"assignments_upserted": 1} | ||
H: I'd think we want a more informative return value. It's a bit misleading to say only a single assignment was inserted (also we're not upserting).
+1 |
||
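One possible shape for a more informative return value, as a minimal sketch: count the rows while parsing them, then return those counts instead of a constant. The helper name `parse_assignment_rows` and the keys `assignments_inserted` and `unassigned` are hypothetical and not part of this PR; the empty-string handling mirrors the `None` branch in the diff.

```python
from typing import Iterable, Optional, Sequence


def parse_assignment_rows(
    csv_rows: Iterable[Sequence[str]],
) -> tuple[list[tuple[str, Optional[int]]], dict[str, int]]:
    """Convert raw CSV rows to (geo_id, zone) tuples and count what was seen."""
    parsed: list[tuple[str, Optional[int]]] = []
    unassigned = 0
    for row in csv_rows:
        geo_id, raw_zone = row[0], row[1]
        # An empty zone cell means "no zone", as in the diff's None branch.
        zone = None if raw_zone == "" else int(raw_zone)
        if zone is None:
            unassigned += 1
        parsed.append((geo_id, zone))
    # Hypothetical summary keys; the real endpoint may want different names.
    summary = {"assignments_inserted": len(parsed), "unassigned": unassigned}
    return parsed, summary
```

The parsed tuples could feed `copy.write_row` directly, and `summary` could be returned from the endpoint in place of `{"assignments_upserted": 1}`.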
|
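A side note on the "find items to unshatter" query in the diff: because it groups by both `vtd` and `zone`, every group contains exactly one zone, so `HAVING COUNT(DISTINCT zone) = 1` never filters anything out. The sketch below illustrates this with SQLite standing in for Postgres; the geo_ids are made up, and the `document_id` and `'vtd:%'` filters are omitted for brevity. Grouping by `vtd` alone is one way to keep only VTDs whose blocks all share a zone, offered as an assumption about the intent rather than as the fix the authors chose.

```python
import sqlite3

# Made-up 11-character VTD prefixes, matching SUBSTR(geo_id, 1, 11).
VTD_A = "0" * 10 + "1"  # all blocks in zone 1
VTD_B = "0" * 10 + "2"  # blocks split across zones 1 and 2


def make_demo_conn() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assignments (geo_id TEXT, zone INT)")
    conn.executemany(
        "INSERT INTO assignments (geo_id, zone) VALUES (?, ?)",
        [(VTD_A + "b1", 1), (VTD_A + "b2", 1), (VTD_B + "b1", 1), (VTD_B + "b2", 2)],
    )
    return conn


def groups_as_written(conn: sqlite3.Connection) -> list:
    # Same grouping as the diff: one group per (vtd, zone) pair.
    return conn.execute("""
        SELECT DISTINCT SUBSTR(geo_id, 1, 11) AS vtd, zone
        FROM assignments
        GROUP BY vtd, zone
        HAVING COUNT(DISTINCT zone) = 1
        ORDER BY vtd, zone
    """).fetchall()


def groups_by_vtd(conn: sqlite3.Connection) -> list:
    # One group per VTD, so the HAVING clause can see mixed zones.
    return conn.execute("""
        SELECT SUBSTR(geo_id, 1, 11) AS vtd, MIN(zone) AS zone
        FROM assignments
        GROUP BY vtd
        HAVING COUNT(DISTINCT zone) = 1
        ORDER BY vtd
    """).fetchall()
```

With the demo data, `groups_as_written` still returns the split VTD once per zone, while `groups_by_vtd` returns only the uniformly assigned one. Note that `COUNT(DISTINCT zone)` ignores NULLs, so unassigned blocks would need separate handling.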
H: Loading the rest of the assignments this way assumes that all the geo_ids are valid. I think at some stage we need to join the blocks against the parent_child_edges in order to determine that they are valid.
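A rough sketch of what that validation join could look like, run against the temp table before the insert. It uses SQLite in place of Postgres so it is self-contained, and it assumes `parent_child_edges` exposes a `child_path` column holding block-level geo_ids; that column name and the helper `find_invalid_geo_ids` are assumptions, not something shown in this PR.

```python
import sqlite3


def find_invalid_geo_ids(conn: sqlite3.Connection) -> list[str]:
    """Return uploaded geo_ids with no matching child in parent_child_edges."""
    rows = conn.execute("""
        SELECT t.geo_id
        FROM temploader AS t
        LEFT JOIN parent_child_edges AS e
            ON e.child_path = t.geo_id  -- child_path is an assumed column name
        WHERE e.child_path IS NULL
        ORDER BY t.geo_id
    """)
    return [row[0] for row in rows]


def make_validation_demo() -> sqlite3.Connection:
    # Tiny in-memory stand-in for the real tables, with made-up ids.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent_child_edges (parent_path TEXT, child_path TEXT)")
    conn.execute("CREATE TEMP TABLE temploader (geo_id TEXT, zone INT)")
    conn.executemany(
        "INSERT INTO parent_child_edges VALUES (?, ?)",
        [("vtd:A", "block-1"), ("vtd:A", "block-2")],
    )
    conn.executemany(
        "INSERT INTO temploader VALUES (?, ?)",
        [("block-1", 1), ("block-2", 1), ("block-2010", 2)],
    )
    return conn
```

The endpoint could then reject the upload, or drop and report the unmatched ids, depending on what the team decides. Uploads that contain VTD-level ids would need a similar check against the parent side.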
PP: When bulk loading, specifying the partition in the table name
INSERT INTO document.assignments_{document_id} ...
will have a slight performance boost over inserting w/o the partition specified.
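Since a table name cannot be passed as a bound parameter, targeting the partition means building the SQL string by hand. A minimal sketch of doing that defensively, assuming `document_id` is a UUID and that the partition is named `assignments_<document_id>` as in the comment above; the helper name and the identifier quoting are assumptions, not code from this PR.

```python
import uuid


def partition_insert_sql(document_id: str) -> str:
    """Build an INSERT that targets one document's partition directly."""
    # uuid.UUID raises ValueError for anything that is not a UUID, so only
    # hex digits and hyphens can reach the interpolated table name.
    doc = str(uuid.UUID(document_id))
    # Quoted because hyphens are not valid in an unquoted identifier.
    table = f'document."assignments_{doc}"'
    return (
        f"INSERT INTO {table} (geo_id, zone, document_id) "
        "SELECT geo_id, zone, :document_id FROM temploader"
    )
```

The resulting string could be wrapped in `text(...)` and executed with the same `{"document_id": document_id}` parameters as the existing insert.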
H: About geoid validation, should we drop invalid/old geoids? What if someone uploads 2010 geoids?