
QST: Somehow parse_dates in pandas is messing up the data, as shown here: https://youtu.be/m_4pbEyXSds #42793



Closed
sv1900 opened this issue Jul 29, 2021 · 12 comments

Comments

@sv1900

sv1900 commented Jul 29, 2021

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.


Question about pandas

sv1900 added the Needs Triage and Usage Question labels Jul 29, 2021
@phofl
Member

phofl commented Jul 29, 2021

Please provide a copy-pastable example.

phofl added the Needs Info label and removed the Needs Triage and Usage Question labels Jul 29, 2021
@sv1900
Author

sv1900 commented Jul 29, 2021

Here is the code I am using (I do not get any errors when I run it):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# df1: first column used as the index and parsed as dates
df1 = pd.read_csv('ML_Input.csv', index_col=0, parse_dates=True)
# df2: same file without date parsing (default integer index)
df2 = pd.read_csv('ML_Input.csv')

sns.set(rc={'figure.figsize': (11, 4)})

# GPP plotted against the parsed date index
df1['GPP'].plot(linewidth=0.5);

# New figure so the second plot is not drawn onto the first plot's axes
plt.figure()
# GPP plotted against the row number
df2['GPP'].plot(linewidth=0.5);

[image]

Would you like me to upload sample data?

@phofl
Member

phofl commented Jul 29, 2021

We need a minimal and reproducible example where you provide the actual output and what you would expect. Please read https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports and provide your example accordingly
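For reference, a minimal, self-contained reproduction in the spirit of that request might look like the sketch below. The three-row CSV is hypothetical (it is not the attached ML_Input.csv) and assumes the data use DD/MM/YYYY dates, which is what the later reply about dayfirst points to. The data are inlined with io.StringIO so the example runs without any file. With parse_dates=True and no dayfirst, the index gets a datetime dtype, so nothing looks wrong at first glance, but each ambiguous date is read month-first, so three consecutive days come out a month apart.

```python
import io

import pandas as pd

# Hypothetical sample in DD/MM/YYYY format: 1, 2 and 3 February 2020.
csv_text = """date,GPP
01/02/2020,1.0
02/02/2020,2.0
03/02/2020,3.0
"""

# Same call as in the report: first column as index, parse_dates=True.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# A datetime64 dtype, so the index looks correctly parsed.
print(df.index.dtype)
print(df.index.tolist())
# Expected: 2020-02-01, 2020-02-02, 2020-02-03 (daily readings).
# Actual: every value is read month-first: 2020-01-02, 2020-02-02, 2020-03-02.
```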

@sv1900
Author

sv1900 commented Aug 3, 2021

ML_Input.csv

Please find attached the input file for the code above; with it you get the same error as shown in the YouTube video: https://www.youtube.com/watch?v=m_4pbEyXSds

@phofl
Member

phofl commented Aug 3, 2021

Could you please check the provided link and give an example accordingly?

@sv1900
Author

sv1900 commented Aug 3, 2021

Sorry, I am a beginner in Python. I can provide the .ipynb file, paste the code here (it is the same as in my first comment), or follow any other specific instructions you have. If you have used Jupyter, you should easily be able to run the code with the input file provided and see what I mean. Here is a screenshot of the same:
[image]

@phofl
Member

phofl commented Aug 3, 2021

If someone wants to fix this, they will have to debug the code, so a minimal example is required that clearly shows the error without the distraction of too much data.

You can search our issues for the IO CSV label to see other examples for the IO methods, or simply read the blog post. There are a lot of examples.

@sv1900
Author

sv1900 commented Aug 3, 2021

As the post says, "Lets be clear, this is hard and takes time", and I wish I had more of it :) Thanks anyway! I just hope other researchers take care when plotting data read from CSVs with parse_dates. Cheers!

@MarcoGorelli
Member

Hi @sv1900

You should use dayfirst=True, since your dates are in day-first (DD/MM) format: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv

dayfirst : bool, default False
    DD/MM format dates, international and European format.
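A minimal sketch of that fix, using a hypothetical three-row sample in DD/MM/YYYY format (not the attached file) inlined with io.StringIO. With dayfirst=True added to the same read_csv call, the ambiguous dates should be read day-first, giving three consecutive days in February.

```python
import io

import pandas as pd

# Hypothetical sample in DD/MM/YYYY format: 1, 2 and 3 February 2020.
csv_text = """date,GPP
01/02/2020,1.0
02/02/2020,2.0
03/02/2020,3.0
"""

# dayfirst=True makes 01/02/2020 read as 1 February rather than 2 January.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True, dayfirst=True)

# Three consecutive days in February 2020.
print(df.index.tolist())
```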

If you believe there's a bug in pandas, please let us know and include a reproducible example, and we'll reopen. For the future, please note that usage questions are best asked on StackOverflow.

MarcoGorelli added the Usage Question label and removed the Needs Info label Aug 3, 2021
MarcoGorelli added this to the No action milestone Aug 3, 2021
@sv1900
Author

sv1900 commented Aug 3, 2021

@MarcoGorelli Thanks very much for your reply. I don't think "use dayfirst=True" is a good solution, because no error is thrown when parse_dates is used without this argument. If no error is thrown, the tick labels and plot look almost like the expected dates, and date.index.dtype confirms a datetime type, then in my opinion it becomes dangerous for the general user or researcher who plots their data without thinking to validate it. I would change the function so that it throws an error if the dates are not read correctly, even though Python usually works well because it is forgiving. I've used R to plot the same data, and it throws an error and will not confirm the class unless 100% of the dates are converted. I think it would be wise to ask all researchers who have used this tool to double-check their plots.
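One way to get the "fail loudly" behaviour asked for here with existing pandas tools is to skip automatic guessing and give pd.to_datetime an explicit format string: with the default errors="raise", a value that does not match the format raises ValueError. The sketch below uses hypothetical data (not ML_Input.csv). The monotonic-index check is an extra sanity check added for illustration, not something pandas does for you.

```python
import io

import pandas as pd

# Hypothetical sample in DD/MM/YYYY format (not the attached ML_Input.csv).
csv_text = """date,GPP
01/02/2020,1.0
02/02/2020,2.0
13/02/2020,3.0
"""

# Read the date column as plain strings; no automatic date guessing here.
df = pd.read_csv(io.StringIO(csv_text))

# Explicit day-first format. With the default errors="raise", any value that
# does not match "%d/%m/%Y" raises ValueError instead of being guessed.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df = df.set_index("date")

# Extra sanity check: timestamps in a time series should normally be sorted.
assert df.index.is_monotonic_increasing

# The same strings parsed with the wrong (month-first) format fail loudly,
# because "13" is not a valid month.
month_first_rejected = False
try:
    pd.to_datetime(pd.Series(["01/02/2020", "13/02/2020"]), format="%m/%d/%Y")
except ValueError as err:
    month_first_rejected = True
    print("month-first format rejected:", err)
```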

@MarcoGorelli
Member

Yeah, there's an open issue about this #12585

I'd love to address this issue when I get a chance; this dayfirst=True thing has bitten me a few times in the past.

@sv1900
Author

sv1900 commented Aug 3, 2021

But you're still alive, so hopefully it has made you stronger. Thanks again, cheers! - Samir Vinchurkar
