Import CSV files using existing APIs #260

Open · wants to merge 13 commits into main
4 changes: 2 additions & 2 deletions app/src/app/uploader/page.tsx
@@ -16,7 +16,7 @@ export default function Uploader() {
  const [totalRows, setTotalRows] = useState<number>(0);
  const [mapLinks, setMapLinks] = useState<MapLink[]>([]);

-  const ROWS_PER_BATCH = 200000;
+  const ROWS_PER_BATCH = 20000000000;

  const gTable = useMapStore(state => state.mapDocument?.gerrydb_table);
  const upsertUserMap = useMapStore(state => state.upsertUserMap);
@@ -50,7 +50,7 @@ export default function Uploader() {
      const assignments: Assignment[] = [];
      const rows = results.data as Array<Array<string>>;
      rows.slice(rowCursor, rowCursor + ROWS_PER_BATCH).forEach(row => {
-        if (row.length == 2 && row[1] !== '' && !isNaN(Number(row[1]))) {
+        if (row.length == 2 && !isNaN(Number(row[1]))) {
          uploadRows.push([row[0], row[1]]);
        }
      });
34 changes: 33 additions & 1 deletion backend/app/main.py
@@ -277,20 +277,52 @@ async def upload_assignments(
    csv_rows = d["assignments"]
    document_id = d["document_id"]

    # process CSV via temp tables
    session.execute(text("DROP TABLE IF EXISTS temploader"))
    session.execute(text("""CREATE TEMP TABLE temploader (
        geo_id TEXT,
        zone INT
    )"""))
    cursor = session.connection().connection.cursor()
    with cursor.copy("COPY temploader (geo_id, zone) FROM STDIN") as copy:
        for record in csv_rows:
-            copy.write_row([record[0], int(record[1])])
+            if record[1] == "":
+                copy.write_row([record[0], None])
+            else:
+                copy.write_row([record[0], int(record[1])])
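
For context, the loader leans on psycopg 3's COPY protocol here. A minimal standalone sketch of the same pattern, outside the request handler (the DSN and sample rows are placeholders, not values from this PR):

import psycopg

# Stage rows through a temp table via COPY; blank zones become NULL,
# mirroring the handler's if/else above. DSN and rows are placeholders.
with psycopg.connect("postgresql://localhost/districtr") as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE TEMP TABLE temploader (geo_id TEXT, zone INT)")
        with cur.copy("COPY temploader (geo_id, zone) FROM STDIN") as copy:
            for geo_id, zone in [("block:001", "1"), ("block:002", "")]:
                copy.write_row((geo_id, int(zone) if zone else None))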

    # insert into actual assignments table
    session.execute(text("""
        INSERT INTO document.assignments (geo_id, zone, document_id)

H: Loading the rest of the assignments this way assumes that all the geo_ids are valid. I think at some stage we need to join the blocks against the parent_child_edges in order to determine that they are valid.
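
A hedged sketch of that validation join, slotting into the handler before the insert (the parent_path/child_path column names on parent_child_edges are assumptions for illustration, not taken from this PR):

# Keep only staged rows whose geo_id appears in parent_child_edges.
# parent_path/child_path are assumed column names.
valid_rows = session.execute(text("""
    SELECT t.geo_id, t.zone
    FROM temploader t
    WHERE EXISTS (
        SELECT 1
        FROM parent_child_edges e
        WHERE t.geo_id IN (e.parent_path, e.child_path)
    )
"""))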


PP: When bulk loading, specifying the partition in the table name INSERT INTO document.assignments_{document_id} ... will have a slight performance boost over inserting w/o the partition specified.
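
A sketch of that suggestion, using the partition naming given in the comment; document_id must already be validated (e.g. as a UUID) before interpolation, since it becomes part of a SQL identifier:

# Target the per-document partition directly when bulk loading.
# Partition naming follows the comment above; validate document_id first,
# as it is interpolated into the identifier rather than bound.
partition = f"document.assignments_{document_id}"
session.execute(
    text(f"""
        INSERT INTO {partition} (geo_id, zone, document_id)
        SELECT geo_id, zone, :document_id
        FROM temploader
    """),
    {"document_id": document_id},
)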


H: About geoid validation, should we drop invalid/old geoids? What if someone uploads 2010 geoids?

        SELECT geo_id, zone, :document_id
        FROM temploader
    """), {"document_id": document_id})

    # find items to unshatter
    results = session.execute(text("""
        SELECT DISTINCT(SUBSTR(geo_id, 1, 11)) AS vtd, zone
@nofurtherinformation commented Feb 11, 2025:

H: Unfortunately, VTD GEOIDs are not perfectly hierarchical with blocks. They diverge at the county level, so we can't rely on slicing the IDs to identify VTDs to heal/unshatter.

Diagram from Peter: [image]

Author:

👍 that makes sense. I also realized that I was doing this in reverse (un-shattering blocks, when on a fresh map I need to be shattering the VTDs which need block-level resolution).

        FROM document.assignments
        WHERE document_id = :document_id
        AND geo_id NOT LIKE 'vtd:%'
        GROUP BY vtd, zone
        HAVING COUNT(DISTINCT zone) = 1
    """), {"document_id": document_id})
    zoneVTDs = {}
    for row in results:
        vtd, zone = row
        if zone not in zoneVTDs:
            zoneVTDs[zone] = []
        zoneVTDs[zone].append('vtd:' + vtd)
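
An equivalent, slightly tighter grouping with collections.defaultdict, offered only as a sketch with the same behavior:

from collections import defaultdict

# Same grouping as above: one list of 'vtd:'-prefixed ids per zone.
zoneVTDs = defaultdict(list)
for vtd, zone in results:
    zoneVTDs[zone].append('vtd:' + vtd)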

    session.commit()

    # unshatter block imports
    for zone, vtds in zoneVTDs.items():
        session.execute(text("SELECT unshatter_parent(:document_id, :vtds, :zone)"), {
            "document_id": document_id,
            "vtds": vtds,
            "zone": zone
        })
    session.commit()

    return {"assignments_upserted": 1}

H: I'd think we want a more informative return value. It's a bit misleading to say only a single assignment was inserted (also we're not upserting).


+1
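
For illustration, a response along the lines being suggested might look like this (field names are assumptions, not from this PR):

# Report what was actually loaded instead of a constant; field names
# here are illustrative only.
return {
    "document_id": document_id,
    "assignments_loaded": len(csv_rows),
    "zones_unshattered": len(zoneVTDs),
}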
