Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added normal plot for time series #550

Merged
merged 28 commits into from
Mar 5, 2024

Conversation

Gerhardsa0
Copy link
Contributor

@Gerhardsa0 Gerhardsa0 commented Feb 21, 2024

Closes #549

Summary of Changes

Add more plot for time series:

  • TimeSeries.plot_lineplot
  • TimeSeries.plot_scatterplot

@Gerhardsa0 Gerhardsa0 requested a review from a team as a code owner February 21, 2024 10:42
@Gerhardsa0 Gerhardsa0 linked an issue Feb 21, 2024 that may be closed by this pull request
Copy link
Contributor

github-actions bot commented Feb 21, 2024

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ PYTHON black 3 0 0 0.83s
✅ PYTHON mypy 3 0 1.84s
✅ PYTHON ruff 3 0 0 0.16s
✅ REPOSITORY git_diff yes no 0.25s

See detailed report in MegaLinter reports
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security

Copy link

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (3415045) to head (ba11d24).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #550   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           50        50           
  Lines         2866      2915   +49     
=========================================
+ Hits          2866      2915   +49     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lars-reimann
Copy link
Member

Wouldn't it also make sense to plot numeric feature columns against the time column? In that case, this function should get a parameter column_name to select the column to plot.

@Gerhardsa0
Copy link
Contributor Author

Wouldn't it also make sense to plot numeric feature columns against the time column? In that case, this function should get a parameter column_name to select the column to plot.

Yes, it would give a user a more detailed view on, how the other data points are correlated to the target column. On the other hand, the plots can get messy, but only if a lot of columns are plotted. Also, it should be an optional argument, so the user can always easy plot the time series.

@lars-reimann
Copy link
Member

On the other hand, the plots can get messy, but only if a lot of columns are plotted.

I was only thinking of plotting one feature column against the time column.

Yes, it would give a user a more detailed view on, how the other data points are correlated to the target column. Also, it should be an optional argument, so the user can always easy plot the time series.

Then let's add an optional column_name parameter. If it's not set, we plot the target column. If it is set, we plot the selected column. Validation that the column is numeric should happen in any case.

@Gerhardsa0
Copy link
Contributor Author

Should a user choose between scatter and normal plot?

@lars-reimann
Copy link
Member

lars-reimann commented Feb 22, 2024

Should a user choose between scatter and normal plot?

In the Table class we have plot_lineplot and plot_scatterplot. Both take x_column_name and a y_column_name as required parameters.

We could also have those two methods in TimeSeries, but make x_column_name and y_column_name optional parameters. The default x_column_name would be the name of the time column and the default y_column_name would be the name of the feature column.

@Gerhardsa0
Copy link
Contributor Author

Should a user choose between scatter and normal plot?

In the Table class we have plot_lineplot and plot_scatterplot. Both take x_column_name and a y_column_name as required parameters.

We could also have those two methods in TimeSeries, but make x_column_name and y_column_name optional parameters. The default x_column_name would be the name of the time column and the default y_column_name would be the name of the feature column.

The X-Axis is usally the date in time series or the usual index column. Should I still add the x_column_name ? and should the time series take an optional parameter for a date column?

@lars-reimann
Copy link
Member

lars-reimann commented Feb 27, 2024

The X-Axis is usally the date in time series or the usual index column. Should I still add the x_column_name ?

I'd say yes due to two reasons:

  1. Symmetry with the Table class.
  2. It's useful in order to see correlation between two feature columns.

and should the time series take an optional parameter for a date column?

Hmm, probably not. The time column should include dates (timestamps). Otherwise, we would also need to support separate columns for year/month/day, hour/minute/second/nanos or variations thereof.

@Gerhardsa0
Copy link
Contributor Author

Should a user choose between scatter and normal plot?

In the Table class we have plot_lineplot and plot_scatterplot. Both take x_column_name and a y_column_name as required parameters.

We could also have those two methods in TimeSeries, but make x_column_name and y_column_name optional parameters. The default x_column_name would be the name of the time column and the default y_column_name would be the name of the feature column.

The default value of, y_column_name should be the index, so the user can simple plot the timeseries with a call like: TimeSeries.plot_time_series(). Or is this not wanted behavior?

Gerhardsa0 and others added 5 commits February 27, 2024 23:43
- added tests for scatter and line plot with the additional parameters
…-time-series' into 549-feat-normal-visualization-of-time-series

# Conflicts:
#	src/safeds/data/tabular/containers/_time_series.py
@lars-reimann
Copy link
Member

lars-reimann commented Mar 2, 2024

The default value of, y_column_name should be the index, so the user can simple plot the timeseries with a call like: TimeSeries.plot_time_series(). Or is this not wanted behavior?

Would the time column not make more sense as the default x _column_name? The default y _column_name should be the target.

@Gerhardsa0
Copy link
Contributor Author

The default value of, y_column_name should be the index, so the user can simple plot the timeseries with a call like: TimeSeries.plot_time_series(). Or is this not wanted behavior?

Would the time column not make more sense as the default x _column_name? The default y _column_name should be the target.

Yes, I didn't notice.

@Gerhardsa0
Copy link
Contributor Author

The default value of, y_column_name should be the index, so the user can simple plot the timeseries with a call like: TimeSeries.plot_time_series(). Or is this not wanted behavior?

Would the time column not make more sense as the default x _column_name? The default y _column_name should be the target.

I added this behavior, to the function, which throws up new questions.

The time column can be pretty useful for plotting, so the user could see dates in the plot. But we don't handle them as dates, only as a time dummy right now.

Should the time column only allow numerical values? As we don't want it to handle, “real” date-types.

@lars-reimann
Copy link
Member

Should the time column only allow numerical values? As we don't want it to handle, “real” date-types.

Data types would be a useful addition eventually. Maybe create an issue for this if we don't have one already.

Copy link
Member

@lars-reimann lars-reimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view, this is ready to merge. Thank you!

@lars-reimann lars-reimann merged commit dbdf11e into main Mar 5, 2024
9 checks passed
@lars-reimann lars-reimann deleted the 549-feat-normal-visualization-of-time-series branch March 5, 2024 08:57
lars-reimann pushed a commit that referenced this pull request Apr 3, 2024
## [0.20.0](v0.19.0...v0.20.0) (2024-04-03)

### Features

* add deterministic hash methods to all types ([#573](#573)) ([f6a3ca7](f6a3ca7))
* add fnn functionality ([#529](#529)) ([ce53153](ce53153)), closes [#522](#522)
* add suffixes to models to indicate their task ([#588](#588)) ([d490dee](d490dee))
* added lag_plot ([#548](#548)) ([0fb38d2](0fb38d2)), closes [#519](#519)
* added normal plot for time series ([#550](#550)) ([dbdf11e](dbdf11e)), closes [#549](#549)
* when using from table to time series feature must be given ([#572](#572)) ([ca23f0f](ca23f0f)), closes [#571](#571)

### Bug Fixes

* incorrect type hint for `number_of_bins` parameter ([#567](#567)) ([b434e53](b434e53))
* mark various API elements as internal ([#587](#587)) ([ea176fc](ea176fc)), closes [#582](#582) [#585](#585)
@lars-reimann
Copy link
Member

🎉 This PR is included in version 0.20.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released Included in a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: normal visualization of time series
3 participants