Pandas ValueError on computing line frequencies #48

Open
fxjung opened this issue Feb 22, 2024 · 0 comments

fxjung commented Feb 22, 2024

I tried to follow the README example for computing line frequencies, using a large GTFS feed (covering all of Germany) retrieved from here:

from gtfs_functions import Feed

time_windows = [0, 6, 9, 15.5, 19, 22, 24]

feed = Feed(
    str(gtfs_path),
    time_windows=time_windows,
    start_date="2024-02-22",
    end_date="2024-02-23",
)
line_freq = feed.lines_freq
line_freq.head()

Unfortunately, this fails with the following log output and traceback:

INFO:root:Reading "stop_times.txt".
INFO:root:get trips in stop_times
INFO:root:accessing trips
INFO:root:Reading "routes.txt".
INFO:root:Reading "trips.txt".
INFO:root:Reading "calendar.txt".
INFO:root:Reading "calendar_dates.txt".
INFO:root:The busiest date/s of this feed or your selected date range is/are:  ['2024-02-23'] with 854144 trips.
INFO:root:In the case that more than one busiest date was found, the first one will be considered.
INFO:root:In this case is 2024-02-23.
INFO:root:Reading "stop_times.txt".
INFO:root:_trips is defined in stop_times
INFO:root:Reading "stops.txt".
INFO:root:computing patterns
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_795641/934594691.py in ?()
----> 1 line_freq = feed.lines_freq
      2 line_freq.head()

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    224     @property
    225     def lines_freq(self):
    226         if self._lines_freq is None:
--> 227             self._lines_freq = self.get_lines_freq()
    228 
    229         return self._lines_freq

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    786         Returns the bus frequency in minutes/bus broken down by
    787         time window.
    788         """
    789 
--> 790         stop_times = self.stop_times
    791         shapes = self.shapes
    792         cutoffs = self.time_windows
    793 

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    203     @property
    204     def stop_times(self):
    205         if self._stop_times is None:
--> 206             self._stop_times = self.get_stop_times()
    207 
    208         return self._stop_times

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    675             logging.info('_trips is defined in stop_times')
    676             trips = self._trips
    677         else:
    678             logging.info('get trips in stop_times')
--> 679             trips = self.trips
    680         stops = self.stops
    681 
    682         # Fix data types

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    175         if self._trips is None:
    176             self._trips = self.get_trips()
    177 
    178         if self._patterns and self._trips_patterns is None:
--> 179             (trips_patterns, routes_patterns) = self.get_routes_patterns(
    180                     self._trips)
    181             self._trips_patterns = trips_patterns
    182             self._routes_patterns = routes_patterns

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self, trips)
    391         def version_hash(x):
    392             hash = hashlib.sha1(f"{x.route_id}{x.direction_id}{str(x.zipped_stops)}".encode("UTF-8")).hexdigest()
    393             return hash[:18]
    394 
--> 395         trips_with_stops['pattern_id'] = trips_with_stops.apply(
    396             version_hash, axis=1)
    397 
    398         # Count number of trips per pattern to identify the main one

~/anaconda3/envs/gendev/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, key, value)
   4285             self._setitem_frame(key, value)
   4286         elif isinstance(key, (Series, np.ndarray, list, Index)):
   4287             self._setitem_array(key, value)
   4288         elif isinstance(value, DataFrame):
-> 4289             self._set_item_frame_value(key, value)
   4290         elif (
   4291             is_list_like(value)
   4292             and not self.columns.is_unique

~/anaconda3/envs/gendev/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, key, value)
   4443 
   4444             return self.isetitem(locs, value)
   4445 
   4446         if len(value.columns) > 1:
-> 4447             raise ValueError(
   4448                 "Cannot set a DataFrame with multiple columns to the single "
   4449                 f"column {key}"
   4450             )

ValueError: Cannot set a DataFrame with multiple columns to the single column pattern_id

Am I doing anything wrong?

>>> import pandas as pd
>>> pd.__version__
'2.2.0'
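For what it's worth, pandas 2.2 raises this exact ValueError whenever `df[col] = ...` receives a DataFrame with more than one column. `DataFrame.apply(..., axis=1)` can return such a DataFrame when the frame is empty: pandas calls the row function once on a dummy row to guess the output shape and, if that call raises, returns a copy of the frame instead of a Series. Below is a minimal sketch of that mechanism, assuming (not confirmed) that `trips_with_stops` ends up empty for this feed. The frame and its missing `zipped_stops` column are hypothetical, and `version_hash` only mirrors the function shown in the traceback.

```python
import hashlib

import pandas as pd


def version_hash(x):
    # Mirrors the row hash from the traceback: route, direction and stop sequence.
    key = f"{x.route_id}{x.direction_id}{str(x.zipped_stops)}"
    return hashlib.sha1(key.encode("UTF-8")).hexdigest()[:18]


# Hypothetical frame: empty AND missing the zipped_stops column.
trips_with_stops = pd.DataFrame(columns=["route_id", "direction_id"])

# On an empty frame, pandas probes version_hash with a dummy row. The probe raises
# AttributeError (no zipped_stops), so apply returns a copy of the 2-column frame.
result = trips_with_stops.apply(version_hash, axis=1)
print(type(result).__name__)  # DataFrame

try:
    trips_with_stops["pattern_id"] = result
except ValueError as err:
    print(err)  # Cannot set a DataFrame with multiple columns to the single column pattern_id

# result_type="reduce" makes apply return a Series, even for an empty frame.
safe = trips_with_stops.apply(version_hash, axis=1, result_type="reduce")
trips_with_stops["pattern_id"] = safe
print(type(safe).__name__)  # Series
```

If this is the cause, `result_type="reduce"` (a documented `DataFrame.apply` parameter) would only hide the symptom; why the frame is empty for this feed would still need explaining.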

Also, regarding the warning note in the README, I checked stop_times.txt for missing times:

>>> stop_times['arrival_time'].isna().any()
False

and, similarly:

>>> stop_times['departure_time'].isna().any()
False
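Both checks can also be run in one pass with plain pandas, independently of gtfs_functions. This is a sketch, assuming an extracted stop_times.txt (or any file-like object with the same columns); the helper `missing_time_counts` is hypothetical. Times are read as strings because GTFS allows values past 24:00:00 (e.g. "25:10:00"), and empty fields still come back as NaN.

```python
import io

import pandas as pd


def missing_time_counts(source):
    """Count missing arrival/departure times in a GTFS stop_times.txt (path or file-like)."""
    stop_times = pd.read_csv(
        source,
        usecols=["trip_id", "arrival_time", "departure_time"],
        dtype=str,  # keep "25:10:00"-style times as plain strings
    )
    counts = stop_times[["arrival_time", "departure_time"]].isna().sum()
    return {col: int(n) for col, n in counts.items()}


# Tiny inline sample standing in for the real file.
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "t1,08:00:00,08:00:00,s1,1\n"
    "t1,,,s2,2\n"
    "t1,25:10:00,25:10:00,s3,3\n"
)
print(missing_time_counts(sample))  # {'arrival_time': 1, 'departure_time': 1}
```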

Any help is appreciated, as this library looks extremely promising.
