Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added New York city's Taxi and Limousine Commission For Hire Vehicle trip support #11

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

otegami
Copy link
Member

@otegami otegami commented Nov 2, 2022

Related Issue

What I did

  • added NYC TLC For Hire Vehicle trip support
  • added NYC TLC High Volume FHV Trips support
  • added simple unit test

Details

ref: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

Notes for reviewers

Please give me some advice.

High Volume FHV Trips Data has over 300MB which means the number of data cases is 1,143,691.
It will take a lot of time to execute tests and it's so hard to create over 1 million instances in TLC::HighVolumeFHVTrip#each test.
So I decided to skip TLC::HighVolumeFHVTrip#each test because it has already asserted HighVolumeFHVTrip data in the #to_arrow test.

@otegami otegami force-pushed the nyc-for-hire-vehicle-trip branch from f233d1c to a12125b Compare November 2, 2022 12:39
@otegami otegami force-pushed the nyc-for-hire-vehicle-trip branch from a12125b to e2eb492 Compare November 16, 2022 23:05
@otegami otegami marked this pull request as ready for review November 17, 2022 00:13
@otegami otegami changed the title [WIP] Added New York city's Taxi and Limousine Commission For Hire Vehicle trip support Added New York city's Taxi and Limousine Commission For Hire Vehicle trip support Nov 17, 2022
@otegami otegami force-pushed the nyc-for-hire-vehicle-trip branch from 8cac7a7 to a98f31d Compare November 22, 2022 00:19
@otegami otegami force-pushed the nyc-for-hire-vehicle-trip branch from a98f31d to a33050f Compare November 22, 2022 03:29
@otegami otegami force-pushed the nyc-for-hire-vehicle-trip branch from a33050f to a0c1e94 Compare November 22, 2022 03:39
@otegami otegami marked this pull request as draft January 17, 2023 11:19
@kou
Copy link
Member

kou commented Mar 12, 2023

So I decided to skip TLC::HighVolumeFHVTrip#each test because it has already asserted HighVolumeFHVTrip data in the #to_arrow test.

How about just checking the first record by each.first instead of skipping entirely.

If our #each uses return to_enum(__method__) {to_arrow.n_rows}, we can also check each.size.

@otegami
Copy link
Member Author

otegami commented Mar 28, 2023

Thank you for the comments.
I tried to check the only first record in the test. But the bottleneck is calling the raw_records method so it didn't improve.
I will fix this problem after implementing the Arrow::RecordBatch#each_raw_record.

# /lib/high-volume-fhv-trip.rb

def each
  return to_enum(__method__) unless block_given?

  to_arrow.raw_records.each do |raw_record| # <- Here
    record = Record.new(*raw_record)
    yield(record)
  end
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants