NCAAB Boxscore causes IndexError #591

ChrisSBouchard · 2021-02-19T15:28:36Z

Pulling an NCAAB Boxscore gives an IndexError in _parse_record

boxscore = Boxscore('2021-02-17-19-virginia-military-institute')

Should pull the boxscore from the 02/17 VMI game. Throws same exception for any other boxscore index as well.

Traceback (most recent call last):
ncaam_scraper.py", line 24, in <module>
if __name__ == "__main__": main()
ncaam_scraper.py", line 12, in main
print(Boxscore('2021-02-17-19-virginia-military-institute'))
sportsipy\ncaab\boxscore.py", line 225, in __init__
self._parse_game_data(uri)
sportsipy\ncaab\boxscore.py", line 683, in _parse_game_data
value = self._parse_record(short_field, boxscore, index)
sportsipy\ncaab\boxscore.py", line 390, in _parse_record
return records[index]
IndexError: list index out of range

OS: Windows 10 Pro
Sportsipy Version: 0.6.0

davefitz153 · 2021-02-19T15:32:57Z

Issue is with parsing team records. Appears the HTML has changed. A temporary fix (if you don't need team records) is commenting line 671 of boxscore.py and replacing with a dummy value. I'll leave the parsing issue to the experts since I couldn't figure it out.

#value = self._parse_record(short_field, boxscore, index)
value = '0-0'

ChrisSBouchard · 2021-02-19T15:49:29Z

Ahh the HTML change was my initial thought. Luckily I don't need team records; thank you for the fix! :)

ericmk52 · 2021-02-19T22:28:09Z

Issue is with parsing team records. Appears the HTML has changed. A temporary fix (if you don't need team records) is commenting line 671 of boxscore.py and replacing with a dummy value. I'll leave the parsing issue to the experts since I couldn't figure it out.

#value = self._parse_record(short_field, boxscore, index)
value = '0-0'

Hi, I'm currently trying to obtain the dataframe_extended for each team, but I am getting the "list index out of range" error. I tried commenting out the line and adding '0-0', but the problem still persists. Do you have any other suggestions?

ChrisSBouchard · 2021-02-20T20:37:34Z

When I went to make the fix, I noticed that the self._parse_record call was further down than line 671 (the traceback above shows it at line 683). When you comment out the assignment to value, make sure it is the one that reads:

value = self._parse_record(short_field, boxscore, index)

then add:

value = '0-0'
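
For anyone who would rather not edit the installed boxscore.py, the same fallback can be sketched as a wrapper that catches the IndexError and returns the dummy value. This is only a minimal sketch: with_fallback and FakeBoxscore are hypothetical names used for illustration, and applying the wrapper to the real sportsipy Boxscore._parse_record is an untested assumption based on the call signature shown in the traceback.

```python
import functools


def with_fallback(parse, default="0-0"):
    """Wrap a parsing callable so an IndexError yields `default` instead."""
    @functools.wraps(parse)
    def wrapper(*args, **kwargs):
        try:
            return parse(*args, **kwargs)
        except IndexError:
            return default
    return wrapper


class FakeBoxscore:
    """Hypothetical stand-in for the library's Boxscore class."""

    def _parse_record(self, field, boxscore, index):
        records = boxscore.get(field, [])
        return records[index]  # raises IndexError when the record is missing


# Replace the method on the class, as one might for the real Boxscore
# (assumption: the real method has the same (field, boxscore, index) shape).
FakeBoxscore._parse_record = with_fallback(FakeBoxscore._parse_record)

game = FakeBoxscore()
print(game._parse_record("away_record", {"away_record": ["(1-1)"]}, 0))
print(game._parse_record("away_record", {}, 0))
```

Like the manual edit, this hides the parsing problem rather than fixing it, so the record fields would hold the placeholder whenever the lookup fails.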

cdhayes · 2021-02-23T00:18:22Z

Assuming I understand how the PyQuery objects work, in my case the lookup fails while parsing the away_record field. The BOXSCORE_SCHEME entry for away_record is: 'div#boxes div[class="section_heading"] h2'. In the boxscores that work, the HTML looks like:

<div class="section_heading" id="box-score-basic-cal-state-northridge_sh">
  <span class="section_anchor" id="box-score-basic-cal-state-northridge_link" data-label="Cal State Northridge (1-1)"></span><h2>Cal State Northridge (1-1)</h2>

Which seems to match up to the BOXSCORE_SCHEME for the away_record.

For the boxscores that give me an IndexError, the HTML looks like:

<div class="section_heading assoc_box-score-basic-texas-am-corpus-christi" id="box-score-basic-texas-am-corpus-christi_sh">
  <span class="section_anchor" id="box-score-basic-texas-am-corpus-christi_link" data-label="Texas A&M-Corpus Christi (3-15)"></span><h2>Texas A&M-Corpus Christi (3-15)</h2>

It appears that an additional class (assoc_box-score-basic-texas-am-corpus-christi) gets added to the div's class attribute alongside "section_heading", so the attribute no longer equals "section_heading" exactly, and I can't find a pattern to why or when it happens. Maybe somebody smarter than I will know how to tweak the PyQuery selector to make this work correctly given the two scenarios.

*** Update
I updated both the away_record and home_record entries in BOXSCORE_SCHEME to use a substring match as part of the selector, so it now looks like: div#boxes div[class*="section_heading"] h2. I'm running some tests now, but this does appear to have resolved the IndexError and returns the expected data.
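
To illustrate why the original selector stops matching, here is a small standard-library sketch that mimics how three CSS attribute-selector operators treat the class attribute. It does not use PyQuery itself; the helper names (DivClassCollector, div_classes, matches) are made up for this example, and the two HTML strings are trimmed versions of the snippets quoted above with closing tags added.

```python
from html.parser import HTMLParser

# Heading markup modeled on the working and failing snippets quoted above.
OLD_HTML = ('<div class="section_heading" id="box-score-basic-cal-state-northridge_sh">'
            '<h2>Cal State Northridge (1-1)</h2></div>')
NEW_HTML = ('<div class="section_heading assoc_box-score-basic-texas-am-corpus-christi" '
            'id="box-score-basic-texas-am-corpus-christi_sh">'
            '<h2>Texas A&amp;M-Corpus Christi (3-15)</h2></div>')


class DivClassCollector(HTMLParser):
    """Collect the raw class attribute of every <div> in a document."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.classes.append(dict(attrs).get("class") or "")


def div_classes(html):
    parser = DivClassCollector()
    parser.feed(html)
    return parser.classes


def matches(op, class_attr, value):
    """Mimic three CSS attribute-selector operators on a class attribute."""
    if op == "=":    # [class="value"]: the whole attribute must equal value
        return class_attr == value
    if op == "*=":   # [class*="value"]: value appears anywhere as a substring
        return value in class_attr
    if op == "~=":   # [class~="value"]: value is one of the space-separated words
        return value in class_attr.split()
    raise ValueError(f"unsupported operator: {op}")


for label, html in (("old", OLD_HTML), ("new", NEW_HTML)):
    for op in ("=", "*=", "~="):
        hits = [c for c in div_classes(html) if matches(op, c, "section_heading")]
        print(label, op, bool(hits))
```

Only the exact-match operator fails on the newer markup, which is consistent with the empty result list and the IndexError. The word-match form ([class~="section_heading"], equivalent to div.section_heading) would be a slightly stricter alternative to the substring match, though it has not been tried in this thread.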

criedel40 · 2021-03-09T03:04:49Z

Was the fix presented above ever pushed out? I am still getting the error. I could apply the fix myself, but I would rather use the official fixed code.

michigandrew · 2021-03-13T18:35:03Z

I posted this on another issue for the same error. It might be of use to you all.

I just worked my way through this issue -- I believe the format for boxscore pages has changed on sports reference.

@roclark I have it running locally by updating the away_record & home_record parts of the BOXSCORE_SCHEME to div#boxes div[class*="assoc_box-score-basic-"] h2. I also updated _parse_record in boxscore.py to

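        records = boxscore(BOXSCORE_SCHEME[field])
        # Keep only headings that have text; x.text may be None or ''.
        records = [x.text for x in records if x.text]

        # Return an empty string instead of raising IndexError.
        if len(records) > index:
            return records[index]
        return ''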

Not positive this is the correct way to fix the issue, but it's working for me. Great project by the way! I'd been trying to parse manually prior to finding it.

Edit: apologies for the double @, roclark!
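
The bounds check in the snippet above can be tried in isolation with plain strings. This is a minimal sketch rather than the library's code: record_at is a hypothetical helper, and the list of strings stands in for the text of the matched h2 elements.

```python
def record_at(texts, index, default=""):
    """Return the index-th non-empty text, or `default` if there are too few."""
    records = [t for t in texts if t]  # drops both None and ''
    if 0 <= index < len(records):
        return records[index]
    return default


headings = ["", "Cal State Northridge (1-1)", None, "Texas A&M-Corpus Christi (3-15)"]
print(record_at(headings, 0))
print(record_at(headings, 1))
print(repr(record_at([], 0)))
```

Returning an empty string keeps the caller from crashing, at the cost of silently hiding a selector that no longer matches anything.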

alexwisswolf · 2021-03-16T01:53:06Z

Attempting to fix with #598 based on the suggestion by @cdhayes since it didn't look like anyone had submitted a PR. Happy to change it based on feedback if there's a better option.

alexwisswolf linked a pull request Mar 16, 2021 that will close this issue

Fix failing NCAAB boxscore record parsing #598

Open

alexwisswolf mentioned this issue Mar 17, 2021

NCAAB Boxscore no longer working #597

Open

jkoestner mentioned this issue Mar 26, 2022

fix parsing ncaab boxscore #720

Closed
