QST: FutureWarning: Defining usecols with out of bounds indices is deprecated and will raise a ParserError in a future version. #48127

Closed
2 tasks done
Vondoe79 opened this issue Aug 17, 2022 · 10 comments
Labels
IO CSV read_csv, to_csv Usage Question

Comments

@Vondoe79
Vondoe79 commented Aug 17, 2022

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/73263189/pandas-excel-data-read-with-incorrect-output-no-getting-all-the-tabular-data-fr?noredirect=1#comment129454561_73263189

Question about pandas

I have an issue similar to the one in this conversation. I have a function, read_files(), which takes two arguments (a file name and a list). I traverse my project directory with a for loop to fetch the target files and pass them to read_files(), but I keep getting this warning: FutureWarning: Defining usecols with out of bounds indices is deprecated and will raise a ParserError in a future version.

I have posted an excerpt of my code on StackOverflow, with the same data file, at https://stackoverflow.com/questions/73263189/pandas-excel-data-read-with-incorrect-output-no-getting-all-the-tabular-data-fr?noredirect=1#comment129454561_73263189. However, I have not received a workable solution so far. I have even tried the suggestion by @ghost from March 2019 in #25623, but I still can't get my code to work.

I would therefore welcome any further ideas or suggestions from this forum on what I may be missing.

@Vondoe79 Vondoe79 added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Aug 17, 2022
@phofl
Member

phofl commented Aug 17, 2022

I don’t really understand your problem/question. Do you have a problem with the warning itself, or are you confused about why it shows up with your data?

If you define usecols with higher indices than your file has columns, then this warning is raised.
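
As a rough illustration of that rule, here is a sketch in plain Python. It is not pandas' actual implementation; `out_of_bounds` is a hypothetical helper and the 11-column table is an assumed example. An index is out of bounds when it is greater than or equal to the number of columns in the file.

```python
def out_of_bounds(usecols, n_columns):
    """Return the positional indices in usecols that a table with n_columns lacks."""
    return sorted(i for i in usecols if i >= n_columns)


# Hypothetical table with 11 columns (positions 0..10):
print(out_of_bounds(range(1, 11), 11))  # []
print(out_of_bounds(range(1, 13), 11))  # [11, 12]
```

An empty result means every requested column exists; any index in the result would trigger the warning.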

@Vondoe79
Author

Vondoe79 commented Aug 17, 2022

@phofl my issue is actually both. First, I do not understand why the warning appears at all, as I have not defined usecols with higher indices than my file has columns. Second, I am not sure why it shows up with my data. If I call read_files with a hard-coded file path and pass the list as the second parameter, like this: k_df = read_files('../data/test_input/ex_22.xlsx', loc_list=loca_list), it works. The issue happens when I traverse the directory with os.listdir() in a for loop and call read_files(fpath+fname, loc_list=loca_list).

I have shared an excerpt of my code at the Stack Overflow link provided earlier, but please let me know if you need me to share it here as well.

@phofl
Member

phofl commented Aug 17, 2022

Hm, can you check if we inadvertently modify usecols? I don't think so, but it does not hurt to be sure. Just print your usecols argument after every call.

Can you create a simple reproducer for us to debug?

@Vondoe79
Author

@phofl certainly, I can create a simple reproducer code here for review.

@mroeschke mroeschke added Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 17, 2022
@Vondoe79
Author

Vondoe79 commented Aug 17, 2022

@phofl kindly see below.

import pandas as pd
import os
import re


def read_files(file_name, loc_list=None):
    if loc_list is None:
        loc_list = []
    for nbs in loc_list:
        if nbs == 10:
            df_10 = pd.read_excel(file_name, sheet_name='Sheet1', skiprows=1, nrows=10, usecols=range(1, 11))
            df_10.columns = [k for k in range(1, len(df_10.columns) + 1)]
            df_10.index = df_10.index + 1

        if nbs == 12:
            df_12 = pd.read_excel(file_name, sheet_name='Sheet1', skiprows=1, nrows=12, usecols=range(1, 13))
            df_12.columns = [k for k in range(1, len(df_12.columns) + 1)]
            df_12.index = df_12.index + 1

    return df_10,  df_12


# excerpt of function keyword argument
loca_list = [10, 12]


# the function positional argument - read the Excel files from project dir using os.listdir() method
fdir = "../data/test_input/"
for fname in os.listdir(fdir):
    if fname.endswith('.xlsx') and re.findall('[0-9]+', fname) and 'ex' in fname:
        df_tuple = read_files(fdir+fname, loc_list=loca_list)  # this is where the issue has been happening from my tracking.

# print the shape of each df from df_tuple
for df in df_tuple:
    print(df.shape)

ex_10.xlsx
ex_12.xlsx

@phofl
Member

phofl commented Aug 17, 2022

Your code is buggy: you are trying to read ex_10 with 12 usecols.

@phofl phofl closed this as completed Aug 17, 2022
@phofl phofl added IO CSV read_csv, to_csv and removed Needs Info Clarification about behavior needed to assess issue labels Aug 17, 2022
@phofl phofl added this to the No action milestone Aug 17, 2022
@Vondoe79
Author

@phofl thanks for the feedback, but list(range(1,11)) produces 10 columns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Could you kindly clarify how this is 12 usecols? Thanks.

@phofl
Member

phofl commented Aug 17, 2022

It runs into both if blocks: the loop over loc_list runs for every file, so ex_10 is also read with usecols=range(1, 13). Just use a debugger and step through.
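
One possible way to avoid hitting both branches is to derive the table size from the file name and read each workbook exactly once. This is only a sketch: it assumes a file named ex_N.xlsx holds an N x N table starting in the second column, and `size_from_name`, `read_kwargs`, and this variant of `read_files` are hypothetical names, not part of the original code or of pandas.

```python
import os
import re


def size_from_name(fname):
    """Return the integer embedded in names like 'ex_10.xlsx', or None."""
    match = re.fullmatch(r"ex_(\d+)\.xlsx", fname)
    return int(match.group(1)) if match else None


def read_kwargs(size):
    """Keyword arguments for pd.read_excel that stay within a size x size table."""
    return {
        "sheet_name": "Sheet1",
        "skiprows": 1,
        "nrows": size,
        "usecols": list(range(1, size + 1)),
    }


def read_files(fdir, sizes):
    """Read each matching workbook once, with the usecols for its own size only."""
    import pandas as pd  # imported here so the helpers above run without pandas

    frames = {}
    for fname in sorted(os.listdir(fdir)):
        size = size_from_name(fname)
        if size in sizes:
            df = pd.read_excel(os.path.join(fdir, fname), **read_kwargs(size))
            df.columns = range(1, len(df.columns) + 1)
            df.index = df.index + 1
            frames[fname] = df
    return frames
```

Calling `read_files("../data/test_input/", [10, 12])` would then return one DataFrame per file, keyed by file name, instead of overwriting the result on every loop iteration.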

@Vondoe79
Author

Got it!

@xiaoguazh

I have the same issue here with pandas==2.0.2 when I try to handle all sheets with one function.

My Excel workbook has 3 sheets:
sheet1: columns A-M
sheet2: columns A-M
sheet3: columns A-D ** this sheet has fewer columns

I use pd.read_excel(file, sheet_name=v_sheets, usecols="A:M",...) to process the 3 sheets together. With the latest pandas, it now raises an error for sheet3:

pandas.errors.ParserError: Defining usecols without of bounds indices is not allowed. [4, 5, 6, 7, 8, 9, 10, 11, 12] are out of bounds. (sheet: 3)

With previous versions of pandas this was just a warning and the code ran successfully, but now the same code crashes.

I feel this change has more cons than pros.
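
A possible workaround, sketched under assumptions rather than taken from the pandas documentation: read each sheet without usecols, then keep only the requested positions that actually exist in that sheet. `clamp_usecols` and `read_sheets` are hypothetical helpers, and `usecols` here is a sequence of zero-based positions (A:M would be `range(13)`).

```python
def clamp_usecols(usecols, n_columns):
    """Keep only the positional indices that exist in a sheet with n_columns columns."""
    return [i for i in usecols if 0 <= i < n_columns]


def read_sheets(path, sheet_names, usecols):
    """Read several sheets, selecting the requested columns that each sheet has."""
    import pandas as pd  # imported here so clamp_usecols runs without pandas

    frames = {}
    for name in sheet_names:
        full = pd.read_excel(path, sheet_name=name)
        frames[name] = full.iloc[:, clamp_usecols(usecols, full.shape[1])]
    return frames
```

The trade-off is that every column of each sheet is parsed before the selection, which costs more than usecols on wide sheets but never raises for a narrower one.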


4 participants