Prior reviews of data from the aggregator found that a listing appearing in multiple weeks' scrapes shows up multiple times in the data. This is expected. Because no unique posting ID is captured, we first need to clarify what our current process is for removing these records.
One key place where this deduplication happens is the cleaner script, which runs the following step after titles are adjusted:
We need to confirm that records matching on all fields except the created date (the date we captured the record, not the `posting_date`) are being correctly removed before any further cleaning or analysis is done.
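As a starting point for that check, here is a minimal sketch in plain Python (standard library only) of the rule described above: two records count as duplicate captures when every field except the created date matches. The field names (`title`, `company`, `posting_date`, `created_date`) are hypothetical, as is the assumption that dates are ISO-8601 strings and that the earliest capture should be kept; the actual cleaner script may use different fields or logic. Note that `posting_date` is part of the comparison, so a relisted posting with a new posting date is treated as distinct.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical name of the capture-date field; adjust to the real schema.
DATE_FIELD = "created_date"


def capture_key(record: Record, date_field: str = DATE_FIELD) -> Tuple[Tuple[str, Any], ...]:
    """Key built from every field except the capture date, sorted by field name."""
    return tuple(sorted((k, v) for k, v in record.items() if k != date_field))


def find_duplicate_captures(records: List[Record], date_field: str = DATE_FIELD) -> List[List[Record]]:
    """Return groups of records that match on all fields except the capture date."""
    groups: Dict[Tuple, List[Record]] = defaultdict(list)
    for record in records:
        groups[capture_key(record, date_field)].append(record)
    return [group for group in groups.values() if len(group) > 1]


def drop_duplicate_captures(records: List[Record], date_field: str = DATE_FIELD) -> List[Record]:
    """Keep the earliest capture of each posting (assumption), in first-seen order."""
    kept: Dict[Tuple, Record] = {}
    for record in records:
        key = capture_key(record, date_field)
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in kept or record[date_field] < kept[key][date_field]:
            kept[key] = record
    return list(kept.values())


# Hypothetical example: one posting captured in two weekly scrapes,
# plus a relisting with a different posting_date.
records = [
    {"title": "Data Analyst", "company": "Acme", "posting_date": "2023-01-02", "created_date": "2023-01-09"},
    {"title": "Data Analyst", "company": "Acme", "posting_date": "2023-01-02", "created_date": "2023-01-16"},
    {"title": "Data Analyst", "company": "Acme", "posting_date": "2023-01-20", "created_date": "2023-01-23"},
]

deduped = drop_duplicate_captures(records)
```

Running `find_duplicate_captures` on the cleaner's output and expecting an empty list is one way to verify that the existing step removes these records.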
After this review, we will implement appropriate steps to remove duplicate captures of the same postings.