Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added New York city's taxi and limousine commission trip yellow taxi support #151

Conversation

otegami
Copy link
Member

@otegami otegami commented Aug 31, 2022

Related Issue

What I did

  • added red-parquet to runtime dependencies
  • added New York city's taxi and limousine commission trip yellow taxi support
  • added simple unit test

Details

What I Didn't

  • added following datasets related with NYC Taxi and Limousine Commission
    • Green Trips Data
    • FHV Trips Data
    • High Volume FHV Trips Data

What I checked

  • ran tlc-trip-yellow-taxi example
% ruby example/tlc-trip-yellow-taxi.rb 
[:creative_mobile_technologies, 2022-01-01 09:35:40 +0900, 2022-01-01 09:53:29 +0900, 2.0, 3.8, :standard_rate, false, 142, 236, :credit_card, 14.5, 3.0, 0.5, 3.65, 0.0, 0.3, 21.95, 2.5, 0.0]
[:creative_mobile_technologies, 2022-01-01 09:33:43 +0900, 2022-01-01 09:42:07 +0900, 1.0, 2.1, :standard_rate, false, 236, 42, :credit_card, 8.0, 0.5, 0.5, 4.0, 0.0, 0.3, 13.3, 0.0, 0.0]

Notes for reviewers

  • Would you give some advice about Class Name
    • TLCTripYellowTaxi or TLCTrip::YellowTaxi or Should I add NYC as a prefix
  • If you don't mind, I want to try to add the other following datasets too
    • Green Trips Data
    • FHV Trips Data
    • High Volume FHV Trips Data

@otegami otegami marked this pull request as ready for review August 31, 2022 09:08
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@metadata.url = "https://d37ci6vzurychx.cloudfront.net/trip-data"
@metadata.licenses = [
{
name: "Public",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: "Public",
spdx_id: "CC0-1.0",

Copy link
Member Author

@otegami otegami Sep 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: 5816dee Fixed metadata information
Although I was thinking I should use array style to express it like @metadata.licenses = ["CC0-1.0"], I couldn't find the license CC0-1.0 in NYC Open Data site so I used hash style.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rechecked this. The dataset site doesn't mention CC0-1.0 or public domain as you said. Sorry. We should not it.

I found https://opendata.cityofnewyork.us/overview/#termsofuse instead.

How about the following?

      @metadata.licenses = [
        {
          name: "NYC Open Data Terms of Use",
          url: "https://opendata.cityofnewyork.us/overview/#termsofuse",
        }
      ]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: ca35199 Thank you so much. It sounds perfect for me.🔆

@otegami
Copy link
Member Author

otegami commented Sep 4, 2022

Thank you for reviewing and giving me some advice . I dealt with all of them you pointed so far🙏


Would you help me to figure out why CI was failed on Ubuntu OS and what I should do in next step?
I'm thinking It happed because we failed to install libarrow-dev by using apt...

installing 'libarrow-dev' native package... failed
Failed to run '/usr/bin/sudo -p [sudo]\ password\ for\ %u\ to\ install
<libarrow-dev>:\ apt-get install -V -y libarrow-dev'.

ref: https://github.com/red-data-tools/red-datasets/runs/8173499574?check_suite_focus=true

@@ -0,0 +1,59 @@
module TLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this module because we don't share our test and YellowTaxiTripTest will not be conflicted.

If we want to add TLC information to distinct from other test cases, class TLCYellowTaxiTripTest to reduce indent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this module because we don't share our test and YellowTaxiTripTest will not be conflicted.

Thank you. I understood it. I will rename it to TLCYellowTaxiTripTest🙏

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: c75357d

@kou
Copy link
Member

kou commented Sep 4, 2022

We need to prepare the Apache Arrow's APT repository for Ubuntu like https://github.com/red-data-tools/red-datasets-arrow/blob/master/.github/workflows/test.yml#L56-L64

Because Apache Arrow package isn't included in the official Ubuntu repository yet.

We don't have to share our test and name of this test case won't be conflicted.
Apache Arrow package isn't included in the official Ubuntu repository yet.
ref: red-data-tools#151 (comment)
@otegami
Copy link
Member Author

otegami commented Sep 5, 2022

Thank you for reviewing this PR. I dealt with all of them your comments so far🙏

@kou kou merged commit 520694e into red-data-tools:master Sep 5, 2022
@kou
Copy link
Member

kou commented Sep 5, 2022

Thanks!

TODO: We should consider red-parquet dependency before we release a new version.

@otegami otegami deleted the new-york-city-taxi-n-limousine-commission-trip branch September 5, 2022 21:15
otegami added a commit to otegami/red-datasets that referenced this pull request Sep 22, 2022
… yellow taxi trip records (red-data-tools#151)"

We decided not to use Apache Parquet data. Instead of it, we can use it by Red Datasets Parquet.
The reason is why we don't want to add red-parquet dependency to red-datasets for easy to install.
red-parquet is an extension library. So it's difficult to install rather than a pure Ruby library.
kou pushed a commit that referenced this pull request Sep 22, 2022
… yellow taxi trip records" (#154)

## What I did
- Reverted #151

### Reasons
We decided not to use Apache Parquet data in Red Datasets because we
don't want to add red-parquet dependency to red-datasets for easy to
install.
In details, red-parquet is an extension library. So it's difficult to
install rather than a pure Ruby library.
Instead of it, we created [Red Datasets
Parquet](https://github.com/red-data-tools/red-datasets-parquet) to use
Apache Parquet data.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants