Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Microsoft.Data.Analysis DataFrameColumns.SetName method is weird #6129

Closed
Tracked by #6144
olavt opened this issue Mar 14, 2022 · 4 comments · Fixed by #6676
Closed
Tracked by #6144

Microsoft.Data.Analysis DataFrameColumns.SetName method is weird #6129

olavt opened this issue Mar 14, 2022 · 4 comments · Fixed by #6676
Labels
enhancement New feature or request Microsoft.Data.Analysis All DataFrame related issues and PRs P2 Priority of the issue for triage purpose: Needs to be fixed at some point.
Milestone

Comments

@olavt
Copy link

olavt commented Mar 14, 2022

.Net Core 3.1
Microsoft.Data.Analysis Nuget package version: 0.19.1

This would be the logical way to call the method:

dataFrame.Columns["Date_left"].SetName("Date");

But, this does not result in the column being properly renamed.

The method needs to be called like this in order to work properly:

dataFrame.Columns["Date_left"].SetName("Date", dataFrame);

Here's a program to reproduce the issue:

using Microsoft.Data.Analysis;
using System;
using System.Linq;

namespace TestDataFRame
{
    internal class Program
    {
        static void Main(string[] args)
        {
            DateTime?[] dates1 = { new DateTime(2022, 03, 01), new DateTime(2022, 03, 02), new DateTime(2022, 03, 03) };
            double?[] closePrices = { 10.5, 12.4, 11.3 };

            DateTime?[] dates2 = { new DateTime(2022, 03, 01), new DateTime(2022, 03, 02), new DateTime(2022, 03, 03), new DateTime(2022, 03, 04) };
            double[] shortPercentages = { 2.34, 2.36, 3.01, 3.04 };

            DataFrame dataFrame1 = new DataFrame();
            dataFrame1.Columns.Add(new PrimitiveDataFrameColumn<DateTime>("Date", dates1));
            dataFrame1.Columns.Add(new DoubleDataFrameColumn("ClosePrice", closePrices));

            var numbers1 = dataFrame1.Columns.GetDoubleColumn("ClosePrice").ToArray();

            DataFrame dataFrame2 = new DataFrame();
            dataFrame2.Columns.Add(new PrimitiveDataFrameColumn<DateTime>("Date", dates2));
            dataFrame2.Columns.Add(new DoubleDataFrameColumn("ShortPercentage", shortPercentages));

            var numbers2 = dataFrame2.Columns.GetDoubleColumn("ShortPercentage").ToArray();

            DataFrame dataFrame = dataFrame1.Merge<DateTime>(dataFrame2, "Date", "Date", joinAlgorithm: JoinAlgorithm.Left);
            dataFrame.Columns["Date_left"].SetName("Date");

            var dates = dataFrame.Columns.GetPrimitiveColumn<DateTime>("Date").ToArray();
        }
    }
}
@luisquintanilla luisquintanilla added the Microsoft.Data.Analysis All DataFrame related issues and PRs label Mar 14, 2022
@michaelgsharp
Copy link
Member

Looks like the columns themselves don't have a reference to the parent DataFrame, hence this behavior. I think the way to fix this would be to have the column have a reference to the DataFrame so it can perform the update as you point out. I dont think it would be a big change. @luisquintanilla

@michaelgsharp michaelgsharp added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label Mar 18, 2022
@michaelgsharp
Copy link
Member

Ok, the other option (or we could do both), is that dataFrame.Columns does have a SetColumnName method as well, it just for some reason requires you to pass in the whole column SetColumnName(DataFrameColumn column, string newName). We could just overload that method to just take the name of the current column and the name you want to set it to SetColumnName(string curName, string newName). Then you could just call dataFrame.Columns.SetColumnName("cur", "new");. Thoughts?

@luisquintanilla
Copy link
Contributor

Thanks for reporting this issue @olavt.

@michaelgsharp Thanks for looking into this and providing those solutions. Tracking this issue as part of DataFrame improvements.

@chrisxfire
Copy link

Note that, because SetName has a default value of null for the DataFrame parameter, if a user tries dataFrame["oldColumnName"]("newColumnName"), not only will this fail, but it will also not throw an exception, resulting in a bug that is hard to catch.

You can guess who the user is... 😀

@ghost ghost added the in-pr label May 14, 2023
@ghost ghost removed the in-pr label Jul 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request Microsoft.Data.Analysis All DataFrame related issues and PRs P2 Priority of the issue for triage purpose: Needs to be fixed at some point.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants