NCAAB Boxscore causes IndexError #591

Open
ChrisSBouchard opened this issue Feb 19, 2021 · 8 comments · May be fixed by #598

Comments

@ChrisSBouchard

Pulling an NCAAB Boxscore raises an IndexError in _parse_record.

boxscore = Boxscore('2021-02-17-19-virginia-military-institute')

Should pull the boxscore from the 02/17 VMI game. Throws same exception for any other boxscore index as well.

Traceback (most recent call last):
ncaam_scraper.py", line 24, in <module>
if __name__ == "__main__": main()
ncaam_scraper.py", line 12, in main
print(Boxscore('2021-02-17-19-virginia-military-institute'))
sportsipy\ncaab\boxscore.py", line 225, in __init__
self._parse_game_data(uri)
sportsipy\ncaab\boxscore.py", line 683, in _parse_game_data
value = self._parse_record(short_field, boxscore, index)
sportsipy\ncaab\boxscore.py", line 390, in _parse_record
return records[index]
IndexError: list index out of range

  • OS: Windows 10 Pro
  • Sportsipy Version: 0.6.0
@davefitz153

davefitz153 commented Feb 19, 2021

The issue is with parsing team records. It appears the HTML has changed. A temporary fix (if you don't need team records) is to comment out line 671 of boxscore.py and replace it with a dummy value. I'll leave the parsing issue to the experts since I couldn't figure it out.

#value = self._parse_record(short_field, boxscore, index)
value = '0-0'

@ChrisSBouchard
Author

Ahh the HTML change was my initial thought. Luckily I don't need team records; thank you for the fix! :)

@ericmk52

The issue is with parsing team records. It appears the HTML has changed. A temporary fix (if you don't need team records) is to comment out line 671 of boxscore.py and replace it with a dummy value. I'll leave the parsing issue to the experts since I couldn't figure it out.

#value = self._parse_record(short_field, boxscore, index)
value = '0-0'

Hi, I'm currently trying to obtain dataframe_extended for each team, but I am getting the "list index out of range" error. I tried commenting out the line and adding '0-0', but the problem persists. Do you have any other suggestions?

@ChrisSBouchard
Author

When I went to make the fix, I noticed that the self._parse_record call was past line 671 in my copy. Make sure the line you comment out is the one that says:

value = self._parse_record(short_field, boxscore, index)

then add:

value = '0-0'
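
For anyone who would rather not edit the installed boxscore.py, the same dummy-value workaround could probably be applied at runtime by wrapping the method. This is only a sketch: `with_fallback` and `StandInBoxscore` are hypothetical names, and the stand-in class merely imitates the `_parse_record(field, boxscore, index)` call shown in the traceback so the example runs without sportsipy installed.

```python
import functools


def with_fallback(method, default="0-0"):
    """Wrap a method so an IndexError yields `default` instead of propagating."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except IndexError:
            return default
    return wrapper


class StandInBoxscore:
    """Hypothetical stand-in for the library's Boxscore class."""

    def _parse_record(self, field, boxscore, index):
        records = []  # simulates the selector matching nothing
        return records[index]  # raises IndexError, as in the traceback


# Patch the class attribute so every instance uses the guarded version.
StandInBoxscore._parse_record = with_fallback(StandInBoxscore._parse_record)

patched_value = StandInBoxscore()._parse_record("away_record", None, 0)
print(patched_value)
```

With the real library the equivalent step would presumably be assigning `Boxscore._parse_record = with_fallback(Boxscore._parse_record)` before creating any Boxscore, but that is an assumption and untested here. As with the manual edit, team records become meaningless.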

@cdhayes

cdhayes commented Feb 23, 2021

Assuming I understand how PyQuery objects work: in my case the lookup fails while parsing the away_record field. The BOXSCORE_SCHEME entry for away_record is 'div#boxes div[class="section_heading"] h2', and for boxscores that parse correctly the HTML looks like:

<div class="section_heading" id="box-score-basic-cal-state-northridge_sh">
  <span class="section_anchor" id="box-score-basic-cal-state-northridge_link" data-label="Cal State Northridge (1-1)"></span><h2>Cal State Northridge (1-1)</h2>

Which seems to match up to the BOXSCORE_SCHEME for the away_record.

For the boxscores that give me an Index Exception the HTML looks like:

<div class="section_heading assoc_box-score-basic-texas-am-corpus-christi" id="box-score-basic-texas-am-corpus-christi_sh">
  <span class="section_anchor" id="box-score-basic-texas-am-corpus-christi_link" data-label="Texas A&M-Corpus Christi (3-15)"></span><h2>Texas A&M-Corpus Christi (3-15)</h2>

It appears that an extra class (assoc_box-score-basic-texas-am-corpus-christi) gets added to the class attribute next to "section_heading", and I can't find a pattern for why or when it happens. Because [class="section_heading"] only matches when the whole attribute value is exactly that string, these divs are skipped. Maybe somebody smarter than I will know how to tweak the PyQuery selector so it works in both scenarios.

*** Update
I updated both the away_record and home_record entries in BOXSCORE_SCHEME to use a substring match in the selector, so it now looks like: div#boxes div[class*="section_heading"] h2. I'm still running tests, but this appears to have resolved the IndexError and returns the expected data.
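
To illustrate why the selector change helps, here is a rough stdlib-only sketch that imitates the two attribute selectors as plain string comparisons on each div's class value. It does not use PyQuery itself; `DivClassCollector`, `exact_match`, and `substring_match` are made-up names, and the markup is trimmed down from the two snippets above.

```python
from html.parser import HTMLParser

# Trimmed versions of the two heading styles quoted above.
OLD_MARKUP = (
    '<div class="section_heading" '
    'id="box-score-basic-cal-state-northridge_sh"></div>'
)
NEW_MARKUP = (
    '<div class="section_heading assoc_box-score-basic-texas-am-corpus-christi" '
    'id="box-score-basic-texas-am-corpus-christi_sh"></div>'
)


class DivClassCollector(HTMLParser):
    """Collect the raw class attribute value of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.class_values = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.class_values.append(dict(attrs).get("class") or "")


def div_class_values(markup):
    collector = DivClassCollector()
    collector.feed(markup)
    collector.close()
    return collector.class_values


def exact_match(value):
    # Mirrors div[class="section_heading"]: the whole value must be equal.
    return value == "section_heading"


def substring_match(value):
    # Mirrors div[class*="section_heading"]: the value only has to contain it.
    return "section_heading" in value


for label, markup in (("old", OLD_MARKUP), ("new", NEW_MARKUP)):
    value = div_class_values(markup)[0]
    print(label, exact_match(value), substring_match(value))
```

The exact comparison rejects the new-style value, while the substring comparison accepts both. A plain class selector such as div.section_heading should also match both styles in standard CSS, though I have not tried that against the library.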

@criedel40

Was the updated fix presented above ever pushed out? I am still getting the error. I could apply the fix locally, but I would rather use an official release that includes it.

@michigandrew

michigandrew commented Mar 13, 2021

I posted this on another issue for the same error. It might be of use to you all.

I just worked my way through this issue -- I believe the format for boxscore pages has changed on sports reference.

@roclark I have it running locally by updating the away_record & home_record parts of the BOXSCORE_SCHEME to div#boxes div[class*="assoc_box-score-basic-"] h2. I also updated _parse_record in boxscore.py to

        records = boxscore(BOXSCORE_SCHEME[field])
        records = [x.text for x in records if x.text != ''] 
        
        if len(records) > index:
            return records[index]
        else:
            return ''

Not positive this is the correct way to fix the issue, but it's working for me. Great project by the way! I'd been trying to parse manually prior to finding it.

Edit: apologies for the double @, roclark!
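
The bounds-checked lookup in the snippet above can be tried on its own as a small sketch. `record_at` is a hypothetical helper that takes plain strings instead of PyQuery elements, and the heading strings are just sample values borrowed from earlier in this thread.

```python
def record_at(texts, index, default=""):
    """Return the index-th non-empty heading text, or `default` if missing."""
    records = [text for text in texts if text != ""]
    if len(records) > index:
        return records[index]
    return default


# Sample heading strings; the empty one is filtered out before indexing.
headings = ["Cal State Northridge (1-1)", "", "Texas A&M-Corpus Christi (3-15)"]

print(record_at(headings, 1))
print(record_at([], 0))
```

Returning an empty string keeps the parser from crashing when the selector matches nothing, at the cost of silently missing records. Note that the length check only guards non-negative indexes; a negative index on an empty list would still raise.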

@alexwisswolf linked a pull request on Mar 16, 2021 that will close this issue
@alexwisswolf

Attempting to fix with #598 based on the suggestion by @cdhayes since it didn't look like anyone had submitted a PR. Happy to change it based on feedback if there's a better option.

7 participants