Keyerror for getattr(stage, classification) #14

hidde1977 · 2023-08-28T07:05:12Z

The line data = getattr(stage, classification)() raises an error that seems to come from inside the pcs code:

File "C:\Users%username%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\stage_scraper.py", line 298, in results
table = join_tables(table, table_parser.table, "rider_url")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users%username%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\utils.py", line 162, in join_tables
table.append({**table2_dict[row[join_key]], **row})
~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 'rider/laurens-de-plus'


themm1 · 2023-08-28T09:48:53Z

It's fixed. However, I had to remove riders who didn't finish the stage from TTT results, because it's not possible to get additional info about them (age, nationality...) from the GC, and almost all of their stage-specific fields would be unknown (time, points, bonus...). I hope that won't be a problem, since it's still possible to parse the startlist or the previous stage and see which riders are missing after the TTT. The new version will soon be available on PyPI.
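The startlist comparison suggested above could look roughly like the sketch below. It assumes both tables are lists of dicts that share a rider_url field, as the traceback's join key suggests. The helper name missing_riders and the sample rows are hypothetical and not part of procyclingstats.

```python
def missing_riders(startlist, results, key="rider_url"):
    """Return the startlist rows whose key does not appear in the results."""
    finished = {row[key] for row in results}
    return [row for row in startlist if row[key] not in finished]


# Hypothetical sample rows, shaped like the scraper's list-of-dicts output.
startlist = [
    {"rider_url": "rider/example-rider-a", "rider_number": 1},
    {"rider_url": "rider/laurens-de-plus", "rider_number": 2},
    {"rider_url": "rider/example-rider-b", "rider_number": 3},
]
results = [
    {"rider_url": "rider/example-rider-b", "rank": 1},
    {"rider_url": "rider/example-rider-a", "rank": 2},
]

dropped = missing_riders(startlist, results)
print([row["rider_url"] for row in dropped])
```

With real data, startlist would come from RaceStartlist(...).startlist() and results from Stage(...).results(), as in the script later in this thread.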

hidde1977 · 2023-08-28T11:21:35Z

🤟🏻

hidde1977 · 2023-08-28T12:36:34Z

Mmm, I still get the Laurens de Plus(ki) KeyError:

Traceback (most recent call last):
File "c:\Users%%\OneDrive\coding\py\wielerpoule\import_results_from_pcs_vuelta_23.py", line 156, in
data = getattr(stage, classification)()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users%%AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\stage_scraper.py", line 298, in results
table = join_tables(table, table_parser.table, "rider_url")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users%%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\utils.py", line 162, in join_tables
table.append({**table2_dict[row[join_key]], **row})
~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 'rider/laurens-de-plus'
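For context, the KeyError comes from a dict lookup on the join key: a rider present in one table has no entry in the other. The sketch below is not the library's join_tables. It is a hypothetical variant (join_tables_tolerant) that reuses the merge expression shown in the traceback but collects unmatched keys instead of raising.

```python
def join_tables_tolerant(table1, table2, join_key):
    """Merge rows of table1 with the matching rows of table2.

    Rows of table1 with no match in table2 are skipped and their keys
    are returned separately, instead of raising KeyError.
    """
    table2_by_key = {row[join_key]: row for row in table2}
    joined, unmatched = [], []
    for row in table1:
        match = table2_by_key.get(row[join_key])
        if match is None:
            unmatched.append(row[join_key])
            continue
        # Same merge order as in the traceback: table1 values win on conflicts.
        joined.append({**match, **row})
    return joined, unmatched


# Hypothetical sample rows for illustration only.
stage_rows = [
    {"rider_url": "rider/example-rider-a", "rank": 1},
    {"rider_url": "rider/laurens-de-plus", "rank": None},
]
gc_rows = [{"rider_url": "rider/example-rider-a", "age": 30}]

joined, unmatched = join_tables_tolerant(stage_rows, gc_rows, "rider_url")
print(joined)
print(unmatched)
```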

hidde1977 · 2023-08-28T12:39:57Z

or should I wait for an update in pip?

themm1 · 2023-08-28T18:47:27Z

Well, I don't know the exact URL where you get the error, but on this URL: https://www.procyclingstats.com/race/vuelta-a-espana/2023/stage-1 everything works fine in the latest PyPI version. (I published version 0.1.7 on PyPI today, so make sure that you have upgraded.)
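One way to confirm which version is actually installed is the standard library's importlib.metadata. The sketch below assumes plain numeric dotted version strings such as 0.1.7. The helper names are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.1.7' into (0, 1, 7). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version, minimum):
    """Compare two numeric dotted versions component by component."""
    return version_tuple(version) >= version_tuple(minimum)


def describe(package, minimum):
    """Build a short status message for the given package."""
    current = installed_version(package)
    if current is None:
        return f"{package} is not installed"
    if is_at_least(current, minimum):
        return f"{package} {current} is recent enough"
    return f"{package} {current} is older than {minimum}; upgrade with pip"


print(describe("procyclingstats", "0.1.7"))
```

If the installed version is too old, pip install --upgrade procyclingstats should pull the new release.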

hidde1977 · 2023-08-28T21:07:36Z

Thx.

I updated; now the error no longer seems related to the pcs code. I guess it has to do with S01 being a TTT: the code tries to get columns expected in "normal" stage results that are now unavailable. The stage type (TTT, etc.) is not in the dataframe of the stages, right? Any suggestion on how to filter this out with an if/else?

themm1 · 2023-08-28T22:50:30Z

It should return the same table as a normal stage, but now I see that the rider_number field is missing in TTT results. I will try to fix this soon. All other normal results fields are present in the TTT results table.
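Until the field is available, a script can tolerate a missing column rather than branching on the stage type. The sketch below assumes the results are a list of dicts. The helper select_columns and the sample row are hypothetical. Missing fields are filled with a placeholder so that every output row has the same columns.

```python
def select_columns(rows, columns, fill=None):
    """Keep only the given columns, filling absent fields with a placeholder."""
    return [{col: row.get(col, fill) for col in columns} for row in rows]


necessary_columns = ["rank", "rider_number", "rider_name", "team_name", "status"]

# Hypothetical TTT-style row without the rider_number field.
ttt_rows = [
    {"rank": 1, "rider_name": "EXAMPLE Rider", "team_name": "Example Team",
     "status": "DF", "age": 30},
]

subset = select_columns(ttt_rows, necessary_columns)
print(subset)
```

The resulting list can be passed to pd.DataFrame as before. The BIB column is then simply empty for TTT stages.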

hidde1977 · 2023-08-29T06:36:07Z

Thx. I am unsure whether it is the TTT thing - riders are not in these results anyway, although I try to create columns rider_name and rider_number in my data. Anyway, I get this traceback, also when I skip scraping stage 1 (the relevant part of the code is copied below):

(NB: my code generally generates CSVs, to be merged into an xlsx in the end, with all relevant results: startlist, all stage results, and all classifications after each stage)

Traceback (most recent call last):
File "c:\Users%%\OneDrive\coding\py\wielerpoule\import_results_from_pcs_vuelta_23.py", line 156, in
data = getattr(stage, classification)()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users%%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\stage_scraper.py", line 315, in results
table_parser.parse(fields)
File "C:\Users%%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\table_parser.py", line 112, in parse
self._make_times_absolute()
File "C:\Users%%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\table_parser.py", line 398, in _make_times_absolute
row[time_field] = add_times(first_time, row['time'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users%%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\utils.py", line 107, in add_times
tdelta2 = time_to_timedelta(format_time(time2))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users%%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\utils.py", line 76, in time_to_timedelta
[hours, minutes, seconds] = [int(value) for value in time.split(":")]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users%%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\utils.py", line 76, in
[hours, minutes, seconds] = [int(value) for value in time.split(":")]
^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '0-1'

code:

# loop1 - loop to create CSV's for results and classifications of each stage

for stage_number in stages:
    race = f'race/{race_name}/{race_year}/overview' 
    stage_url = f'race/{race_name}/{race_year}/stage-{stage_number}'
    startlist_url = f'race/{race_name}/{race_year}/startlist'
    try:
        stage = Stage(stage_url)
    except ValueError:
        print(f"No data found for stage {stage_number}. Exiting the loop.")
        break
    try:
        startlist = RaceStartlist(startlist_url)
        try:
            startlist_data = startlist.startlist('rider_name', 'rider_url', 'team_name', 'team_url', 'nationality', 'rider_number')
            startlist_df = pd.DataFrame(startlist_data, columns=['Rider', 'URL', 'Team', 'Team URL', 'Nationality', 'BIB'])
            # print(f"Startlist data: {startlist_data}") # debug to print full list
            # input("Press enter to continue")
        except ValueError:
            print("ValueError encountered while retrieving startlist.")
            print("Loop 1 - no startlist available yet - creating empty stage results file.")
            startlist_data = []
            print(f"Startlist data: {startlist_data}")
        for classification in classifications:
            try:
                # Call the method associated with the classification
                data = getattr(stage, classification)()
                #print(data)
                #input("press enter")
                if data:
                    # Convert list of dictionaries to DataFrame
                    df = pd.DataFrame(data)
                    # Different classifications have different columns
                    if classification == 'results':
                        necessary_columns = ['rank', 'rider_number', 'rider_name', 'team_name', 'status']
                        new_columns = ['Rnk', 'BIB', 'Rider', 'Team', 'Status']
                    elif classification == 'teams':
                        necessary_columns = ['rank', 'team_name']  
                        new_columns = ['Rnk', 'Team']
                    else:  # 'gc', 'points', 'kom', 'youth'
                        necessary_columns = ['rank', 'rider_number', 'rider_name', 'team_name']
                        new_columns = ['Rnk', 'BIB', 'Rider', 'Team']
                    # Ensure the necessary columns exist before subsetting
                    if set(necessary_columns).issubset(df.columns):
                        print(
                            f"Subsetting DataFrame for {classification} to include only {necessary_columns}. Available columns are: {df.columns.tolist()}")
                        df = df[necessary_columns]
                        df.columns = new_columns
                    else:
                        print(
                            f"Expected columns {necessary_columns} not found in DataFrame for {classification}. Available columns are: {df.columns.tolist()}")
                    filename = f'{stage_number}.csv' if classification == 'results' else f'Stage_S{stage_number}_{classification}.csv'
                else:
                    # In case 'data' is empty
                    if classification != 'teams':
                        df = pd.DataFrame(columns=['Rnk', 'BIB', 'Rider', 'Team', 'Status'])
                    else:
                        df = pd.DataFrame(columns=['Rnk', 'Team'])
                    filename = f'Stage_S{stage_number}_{classification}.csv'
                # print(f'Final DataFrame for {classification}:\n{df}')
                print(f'Writing data to {filename}')
                df.to_csv(f'C:\\Users\\hidde.reitsma\\OneDrive\\Wielerpoultjes\\{race_year}\\{race_name}\\codefiles\\{filename}', index=False)
                now = datetime.datetime.now()
                print(f'{now} - file {filename} updated')
            except ExpectedParsingError:
                df = pd.DataFrame()
                print(f"No data for stage {stage_number} classification {classification}. Available columns are: {df.columns.tolist()}")
                if classification == 'results':
                    df = pd.DataFrame(columns=['Rnk', 'BIB', 'Rider', 'Team', 'Status'])
                    filename = f'{stage_number}.csv'
                elif classification == 'teams':
                    df = pd.DataFrame(columns=['Rnk', 'Team'])
                    filename = f'Stage_S{stage_number}_{classification}.csv'
                else:
                    df = pd.DataFrame(columns=['Rnk', 'BIB', 'Rider', 'Team'])
                    filename = f'Stage_S{stage_number}_{classification}.csv'
                df.to_csv(f'C:\\Users\\hidde.reitsma\\OneDrive\\Wielerpoultjes\\{race_year}\\{race_name}\\codefiles\\{filename}', index=False)
                print(f'File {filename} updated with empty dataframe due to parsing error')
    except AttributeError:
        print("AttributeError encountered while retrieving startlist.")
        traceback.print_exc()  # This will print details about the error
        print("Loop 1 - no startlist available yet - creating empty stage results file.")
        startlist_data = []
        print(f"Startlist data: {startlist_data}")

hidde1977 · 2023-08-29T11:32:29Z

One more thing: I added a debug print line after data = getattr(stage, classification). The line print(f"Data for {classification}: {data}") gives:

Data for results: <bound method Stage.results of Stage(url='https://www.procyclingstats.com/race/vuelta-a-espana/2023/stage-1')>

so the wrong data for results seems to be scraped, right?
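A note on the debug output above: getattr(stage, classification) without trailing parentheses returns the method object itself, so printing it shows a bound method rather than scraped rows. That output alone does not indicate wrong data. A minimal sketch with a stand-in class (FakeStage is hypothetical and not part of procyclingstats):

```python
class FakeStage:
    """Stand-in for a scraper object with a results() method."""

    def results(self):
        return [{"rank": 1, "rider_name": "EXAMPLE Rider"}]


stage = FakeStage()
classification = "results"

method = getattr(stage, classification)   # looks the method up, does not call it
data = getattr(stage, classification)()   # looks it up and calls it

print(callable(method))  # the lookup alone yields a callable bound method
print(data)              # the call yields the actual rows
```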

themm1 · 2023-08-29T11:56:07Z

The mentioned error is caused by this line on the page you are parsing results from: https://www.procyclingstats.com/race/vuelta-a-espana/2023/stage-2 I can't do much about it, because the problem is in the PCS results page. I think they will fix the times soon.

hidde1977 · 2023-08-29T12:08:49Z

Ah! Yes, that's it.

hidde1977 · 2023-08-29T12:56:56Z

Mmmm...! https://twitter.com/HiddeReitsma/status/1696495817178177686

themm1 · 2023-08-29T15:46:16Z

Well, I don't really understand what the "-1:0-1:0" means. But I think it's weird that they list the times from 9k in the results from the finish, even if that counts toward the GC. It would make more sense to just use the real finish-line times in the stage results. If the results don't change in a few days, however, I will have to try to deal with that time notation.
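As a possible workaround on the caller's side, malformed time strings can be detected before conversion. The traceback shows the library splitting on ":" and calling int() on each part, which fails on '0-1'. The sketch below is a hypothetical parse_time helper, not the library's time_to_timedelta. It returns None for strings that do not look like H:MM:SS, M:SS or plain seconds.

```python
from datetime import timedelta


def parse_time(text):
    """Parse 'H:MM:SS', 'M:SS' or 'S' into a timedelta.

    Returns None when any part is not a plain non-negative integer,
    for example the '-1:0-1:0' notation discussed above.
    """
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(part.isdecimal() for part in parts):
        return None
    values = [int(part) for part in parts]
    # Pad on the left so that 'M:SS' and 'S' are read as 0 hours / 0 minutes.
    while len(values) < 3:
        values.insert(0, 0)
    hours, minutes, seconds = values
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


for raw in ["4:12:05", "0:23", "-1:0-1:0"]:
    print(raw, "->", parse_time(raw))
```

Rows for which parse_time returns None can then be logged or skipped instead of aborting the whole stage.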

themm1 closed this as completed Aug 28, 2023
