[BUG-REPORT] Issue using apply over a column #2447

Open
josemariagarcia95 opened this issue Dec 19, 2024 · 0 comments


Software information

  • Vaex version (import vaex; vaex.__version__):
{'vaex': '4.17.0',
 'vaex-core': '4.17.1',
 'vaex-viz': '0.5.4',
 'vaex-hdf5': '0.14.1',
 'vaex-server': '0.9.0',
 'vaex-astro': '0.9.3',
 'vaex-jupyter': '0.8.2',
 'vaex-ml': '0.18.3'}
  • Vaex was installed via: conda-forge
  • OS: Windows 11 Pro 23H2 (22631.4602)

Additional information
I've been given a CSV file from a previous project, and I'm supposed to prepare some Python scripts to plot the values it contains. The dataset in this CSV file holds electric and vibration signals. The data I'm interested in is stored in a column, "A", where each row holds a 16,000-element array of float values representing a vibration or electric signal.
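If every row really holds exactly 16,000 floats, the parsed rows form a regular 2-D array with one row per signal, which is convenient for plotting. The sketch below assumes, as the code later in this issue shows, that each cell is a JSON array string; the number of rows and the random values are hypothetical stand-ins for the real data.

```python
import json

import numpy as np

SIGNAL_LENGTH = 16_000  # per the description above
N_ROWS = 3  # hypothetical number of rows

# Hypothetical cells: each row's signal serialized as a JSON array string.
rng = np.random.default_rng(0)
cells = [json.dumps(rng.normal(size=SIGNAL_LENGTH).tolist()) for _ in range(N_ROWS)]

# When every row has the same length, the parsed lists stack into one dense
# 2-D float array: shape (number of rows, samples per signal).
signals = np.array([json.loads(c) for c in cells], dtype=np.float64)
print(signals.shape)  # (3, 16000)
```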

I want to use Vaex to take advantage of its performance features, but I found what I think is a bug when processing the signals. I started by adapting code that works in pandas:

```python
import json

import pandas as pd

signal_df = pd.read_csv('csv_test.csv', sep=';')
# The DecompressedValue column is stored as an array literal but is read as a long string,
# so json.loads() is applied to each value to turn it into a list
signal_df.DecompressedValue = signal_df.DecompressedValue.apply(lambda r: json.loads(r))
```
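A self-contained sketch of why the pandas version works: pandas stores the result of `apply` as an `object` column holding one Python list per row, so it never needs the rows to share a length. The two-row in-memory CSV, the `Id` column, and the values are hypothetical stand-ins for csv_test.csv, and the lengths differ on purpose.

```python
import io
import json

import pandas as pd

# Hypothetical CSV in the same layout: ';'-separated, signals as JSON array strings.
csv_text = 'Id;DecompressedValue\n1;"[0.1, 0.2, 0.3]"\n2;"[0.4, 0.5]"\n'
signal_df = pd.read_csv(io.StringIO(csv_text), sep=';')

# Before parsing, each cell is a plain string.
print(type(signal_df.loc[0, 'DecompressedValue']))  # <class 'str'>

# After apply(), each cell is a Python list; rows of different lengths are fine.
signal_df['DecompressedValue'] = signal_df['DecompressedValue'].apply(json.loads)
print(signal_df['DecompressedValue'].dtype)  # object
print(signal_df['DecompressedValue'].map(len).tolist())  # [3, 2]
```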

However, when I try to replicate the same functionality in Vaex, the code itself runs without errors, but accessing the dataframe afterwards raises an error (vaex_test.csv is attached for testing):

```python
import json

import vaex

# Shorter file for testing purposes (attached)
test = vaex.from_csv('vaex_test.csv', sep=';')
test['DecompressedValue'] = test['DecompressedValue'].apply(lambda r: json.loads(r))
test.head()
```

This produces a ValueError:

```
[12/19/24 12:50:48] ERROR    error evaluating: DecompressedValue at rows 0-5    dataframe.py:4101
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\Users\user\AppData\Local\anaconda3\envs\py310env\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\Users\user\AppData\Local\anaconda3\envs\py310env\lib\site-packages\vaex\expression.py", line 1629, in _apply
    result = np.array(result)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.
"""
```