Word cloud resolution error #1493

Open
3 tasks done
gonzalezhomar opened this issue Nov 1, 2023 · 11 comments
Labels
bug 🐛 Something isn't working information requested ❔ Cannot reproduce, waiting for minimum reproduction details.

Comments

@gonzalezhomar

Current Behaviour

The ProfileReport fails to generate, and I can't find whether the issue has already been solved or whether there is a way to bypass it.

I think it fails to generate the word cloud because of the size of my data.

The full error output is:


IndexError Traceback (most recent call last)
File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\wordcloud\wordcloud.py:458, in WordCloud.generate_from_frequencies(self, frequencies, max_font_size)
457 try:
--> 458 font_size = int(2 * sizes[0] * sizes[1]
459 / (sizes[0] + sizes[1]))
460 # quick fix for if self.layout_ contains less than 2 values
461 # on very small images it can be empty

IndexError: list index out of range

During handling of the above exception, another exception occurred:

IndexError Traceback (most recent call last)
File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\wordcloud\wordcloud.py:464, in WordCloud.generate_from_frequencies(self, frequencies, max_font_size)
463 try:
--> 464 font_size = sizes[0]
465 except IndexError:

IndexError: list index out of range

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\IPython\core\formatters.py:344, in BaseFormatter.__call__(self, obj)
342 method = get_real_method(obj, self.print_method)
343 if method is not None:
--> 344 return method()
345 return None
346 else:

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\profile_report.py:520, in ProfileReport._repr_html_(self)
518 def _repr_html_(self) -> None:
519 """The ipython notebook widgets user interface gets called by the jupyter notebook."""
--> 520 self.to_notebook_iframe()

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\profile_report.py:500, in ProfileReport.to_notebook_iframe(self)
498 with warnings.catch_warnings():
499 warnings.simplefilter("ignore")
--> 500 display(get_notebook_iframe(self.config, self))

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\report\presentation\flavours\widget\notebook.py:75, in get_notebook_iframe(config, profile)
73 output = get_notebook_iframe_src(config, profile)
74 elif attribute == IframeAttribute.srcdoc:
---> 75 output = get_notebook_iframe_srcdoc(config, profile)
76 else:
77 raise ValueError(
78 f'Iframe Attribute can be "src" or "srcdoc" (current: {attribute}).'
79 )

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\report\presentation\flavours\widget\notebook.py:29, in get_notebook_iframe_srcdoc(config, profile)
27 width = config.notebook.iframe.width
28 height = config.notebook.iframe.height
---> 29 src = html.escape(profile.to_html())
31 iframe = f'<iframe width="{width}" height="{height}" srcdoc="{src}" frameborder="0" allowfullscreen></iframe>'
33 return HTML(iframe)

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\profile_report.py:470, in ProfileReport.to_html(self)
462 def to_html(self) -> str:
463 """Generate and return complete template as lengthy string
464 for using with frameworks.
465
(...)
468
469 """
--> 470 return self.html

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\profile_report.py:277, in ProfileReport.html(self)
274 @property
275 def html(self) -> str:
276 if self._html is None:
--> 277 self._html = self._render_html()
278 return self._html

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\profile_report.py:385, in ProfileReport._render_html(self)
382 def _render_html(self) -> str:
383 from ydata_profiling.report.presentation.flavours import HTMLReport
--> 385 report = self.report
387 with tqdm(
388 total=1, desc="Render HTML", disable=not self.config.progress_bar
389 ) as pbar:
390 html = HTMLReport(copy.deepcopy(report)).render(
391 nav=self.config.html.navbar_show,
392 offline=self.config.html.use_local_assets,
(...)
400 version=self.description_set.package["ydata_profiling_version"],
401 )

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\profile_report.py:271, in ProfileReport.report(self)
268 @property
269 def report(self) -> Root:
270 if self._report is None:
--> 271 self._report = get_report_structure(self.config, self.description_set)
272 return self._report

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\report\structure\report.py:387, in get_report_structure(config, summary)
368 section_items: List[Renderable] = [
369 Container(
370 get_dataset_items(config, summary, alerts),
(...)
374 ),
375 ]
377 if len(summary.variables) > 0:
378 section_items.append(
379 Dropdown(
380 name="Variables",
381 anchor_id="variables-dropdown",
382 id="variables-dropdown",
383 is_row=True,
384 classes=["dropdown-toggle"],
385 items=list(summary.variables),
386 item=Container(
--> 387 render_variables_section(config, summary),
388 sequence_type="accordion",
389 name="Variables",
390 anchor_id="variables",
391 ),
392 )
393 )
395 scatter_items = get_interactions(config, summary.scatter)
396 if len(scatter_items) > 0:

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\report\structure\report.py:162, in render_variables_section(config, dataframe_summary)
160 variable_type = summary["type"]
161 render_map_type = render_map.get(variable_type, render_map["Unsupported"])
--> 162 template_variables.update(render_map_type(config, template_variables))
164 # Ignore these
165 if reject_variables:

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\report\structure\variables\render_text.py:81, in render_text(config, summary)
77 top_items.append(table)
79 if words and "word_counts" in summary:
80 mini_wordcloud = Image(
---> 81 plot_word_cloud(config, summary["word_counts"]),
82 image_format=config.plot.image_format,
83 alt="Mini wordcloud",
84 )
85 top_items.append(mini_wordcloud)
86 template_variables["top"] = Container(top_items, sequence_type="grid")

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\contextlib.py:75, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
72 @wraps(func)
73 def inner(*args, **kwds):
74 with self._recreate_cm():
---> 75 return func(*args, **kwds)

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\visualisation\plot.py:126, in plot_word_cloud(config, word_counts)
124 @manage_matplotlib_context()
125 def plot_word_cloud(config: Settings, word_counts: pd.Series) -> str:
--> 126 _plot_word_cloud(series=word_counts)
127 return plot_360_n0sc0pe(config)

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\ydata_profiling\visualisation\plot.py:38, in _plot_word_cloud(series, figsize)
36 for i, series_data in enumerate(series):
37 word_dict = series_data.to_dict()
---> 38 wordcloud = WordCloud(
39 background_color="white", random_state=123, width=300, height=200, scale=2
40 ).generate_from_frequencies(word_dict)
42 ax = plot.add_subplot(1, len(series), i + 1)
43 ax.imshow(wordcloud)

File ~\AppData\Roaming\jupyterlab-desktop\jlab_server\lib\site-packages\wordcloud\wordcloud.py:466, in WordCloud.generate_from_frequencies(self, frequencies, max_font_size)
464 font_size = sizes[0]
465 except IndexError:
--> 466 raise ValueError(
467 "Couldn't find space to draw. Either the Canvas size"
468 " is too small or too much of the image is masked "
469 "out.")
470 else:
471 font_size = max_font_size

ValueError: Couldn't find space to draw. Either the Canvas size is too small or too much of the image is masked out.

Expected Behaviour

I'm expecting the ProfileReport to generate, but skipping the wordclouds.

Maybe an option to turn the wordclouds off, so the profile generates but skips that.

Data Description

My data is private: about 2 million rows and more than 200 columns. I can't identify the column (or columns) causing the error.
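One way to narrow this down, sketched under assumptions: render each column on its own and record which ones raise. `find_failing_columns` and `render_one` are hypothetical names, not part of ydata-profiling; `render_one` stands for whatever renders a report for a single column.

```python
from typing import Callable, Dict, Iterable


def find_failing_columns(
    columns: Iterable[str],
    render_one: Callable[[str], object],
) -> Dict[str, str]:
    """Call render_one(column) for each column and collect the failures.

    Returns a dict mapping each failing column name to the error it raised.
    """
    failures: Dict[str, str] = {}
    for name in columns:
        try:
            render_one(name)
        except Exception as exc:  # keep going so every bad column is reported
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Assumed usage with the report from this issue (not run here):
#   from ydata_profiling import ProfileReport
#   bad = find_failing_columns(
#       df.columns,
#       lambda col: ProfileReport(df[[col]], minimal=True).to_html(),
#   )
```

On a table this large, running the check on a row sample first would likely be much faster, although a sample may not reproduce the error.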

Code that reproduces the bug

import pandas as pd
from ydata_profiling import ProfileReport
profile=ProfileReport(df, title="my_data_report", minimal=True)
profile

pandas-profiling version

v4.6.1

Dependencies

pandas==2.0.3

OS

Windows 11

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
@SoyGema

SoyGema commented Nov 7, 2023

Hello there.
I'm experiencing the same issue here.
So far, I've tried removing part of the string from the categorical column to make the string smaller, using the function below. Unfortunately, I haven't been able to make it work yet.

def remove_substring_from_column(df, column_name, substring, inplace=True):
    """
    Removes a specified substring from a particular column in a DataFrame.
    
    Parameters:
    - df (pandas.DataFrame): The input DataFrame.
    - column_name (str): The name of the column from which the substring should be removed.
    - substring (str): The substring to remove.
    - inplace (bool, optional): If True, modifies the input DataFrame directly. If False, returns a modified copy. 
                                Defaults to True.
    
    Returns:
    - pandas.DataFrame: The modified DataFrame. This is the input DataFrame itself
                        if inplace is True, otherwise a modified copy.
    """
    # Work on the caller's DataFrame or on a copy, depending on `inplace`.
    target_df = df if inplace else df.copy()

    if column_name in target_df.columns:
        print(f"Before removal: {target_df[column_name].head()}")
        # regex=False treats `substring` as literal text, not as a pattern.
        target_df[column_name] = target_df[column_name].str.replace(substring, '', regex=False)
        print(f"After removal: {target_df[column_name].head()}")
        print(f"# ----data removed----# {substring} {column_name}")
    return target_df

Maybe decreasing the font size or increasing the canvas size would work.
Does anyone have an idea of how to tackle this, beyond changing the string length or adding a label-encoding step?
🙏🙏 Thanks for the time dedicated to tackling this issue! 🙏🙏

@fabclmnt
Contributor

fabclmnt commented Dec 4, 2023

Hi @SoyGema and @gonzalezhomar,

Can you please share which Python version you are using, and how you are installing ydata-profiling in your environment (pip, conda, etc.)?

@fabclmnt fabclmnt added information requested ❔ Cannot reproduce, waiting for minimum reproduction details. and removed needs-triage labels Dec 4, 2023
@SoyGema

SoyGema commented Dec 5, 2023

Hi there. We are using the versions below; ydata-profiling is installed via pip.

Python 3.10.12
ydata-profiling 4.5.0

@jonathanyulan99

Following. I am running into the same problem with both ydata-profiling 4.5 and 4.6: I get the error message above, and the output is: <Figure size 600x400 with 0 Axes>

@sdbeuf

sdbeuf commented Dec 12, 2023

Same issue here

Python 3.11.5
pandas 2.0.3
ydata_profiling v4.6.2

@fabclmnt
Contributor

Thank you for the information provided!

This issue is related to the wordcloud plot. We will analyze it in more detail and consider it for the next release, expected in mid-January.

@fabclmnt fabclmnt added the bug 🐛 Something isn't working label Dec 14, 2023
@fabclmnt fabclmnt changed the title Bug Report Word cloud resolution error Dec 14, 2023
@fabclmnt fabclmnt moved this to Selected for next release in YData-profiling roadmap Dec 14, 2023
@BoPeng

BoPeng commented Dec 19, 2023

Just to get around this error, I changed generate_from_frequencies(word_dict) to generate_from_frequencies(word_dict, max_font_size=10) in

wordcloud = WordCloud(
    background_color="white", random_state=123, width=300, height=200, scale=2
).generate_from_frequencies(word_dict)

to avoid the automatic determination of max_font_size. This is of course not ideal, so I will be waiting for a proper solution.
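For context, the font-size step visible in the traceback can be paraphrased as below. This is a simplified sketch reconstructed from the traceback lines, not the wordcloud library's actual code; `pick_font_size` is a hypothetical name, and `sizes` stands for the font sizes of the words that could be placed in the trial layout.

```python
from typing import Optional, Sequence


def pick_font_size(sizes: Sequence[int], max_font_size: Optional[int] = None) -> int:
    """Paraphrase of the font-size choice shown in the traceback."""
    if max_font_size is not None:
        # An explicit maximum skips the automatic estimate entirely.
        return max_font_size
    try:
        # Harmonic mean of the first two placed words' font sizes.
        return int(2 * sizes[0] * sizes[1] / (sizes[0] + sizes[1]))
    except IndexError:
        try:
            # Only one word could be placed.
            return sizes[0]
        except IndexError:
            # No word could be placed at all: the error from this issue.
            raise ValueError("Couldn't find space to draw.") from None


print(pick_font_size([100, 50]))             # two words placed
print(pick_font_size([], max_font_size=10))  # explicit maximum, no estimate
```

With an empty `sizes` and no explicit maximum, the sketch raises the same `ValueError` message seen in the report, which is consistent with an explicit `max_font_size` avoiding it.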

@somnambWl

For me, the cause of the problem was that the dataframe also included very long ID strings. Once I removed them, the profiling started to work again :)
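A quick way to look for such columns, as a sketch: measure the longest string in each column and flag those above a threshold. `longest_string_per_column`, `suspicious_columns` and the 200-character cutoff are illustrative assumptions, not values taken from ydata-profiling or wordcloud.

```python
from typing import Dict, Iterable, List, Mapping


def longest_string_per_column(table: Mapping[str, Iterable[object]]) -> Dict[str, int]:
    """Return the length of the longest str value in each column.

    `table` can be a dict of lists or a pandas DataFrame; non-string
    values (numbers, None, NaN) are ignored.
    """
    result: Dict[str, int] = {}
    for name in table.keys():
        lengths = [len(v) for v in table[name] if isinstance(v, str)]
        result[name] = max(lengths, default=0)
    return result


def suspicious_columns(table, threshold: int = 200) -> List[str]:
    """Names of columns whose longest string exceeds `threshold` characters."""
    longest = longest_string_per_column(table)
    return [name for name, length in longest.items() if length > threshold]


# Small made-up table, only to show the call.
sample = {
    "city": ["Lima", "Oslo", None],
    "record_id": ["x" * 500, "y" * 480, "z" * 510],
    "amount": [1.5, 2.0, 3.25],
}
print(suspicious_columns(sample))
```

The flagged columns could then be excluded before profiling, for example with `df.drop(columns=...)`.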

@chalozin

chalozin commented Jul 8, 2024

I'm having a similar issue. @fabclmnt, was this issue resolved in the January version (01/2024, I guess)?
@gonzalezhomar, did you manage to turn off the word cloud?

@hschaeufler

hschaeufler commented Oct 19, 2024

I have the same issue (see below). What helped me as a workaround was setting a max_font_size value for the WordCloud in ydata_profiling/visualisation/plot.py:

        wordcloud = WordCloud(
            font_path=config.plot.font_path,
            background_color="white",
            max_font_size=100,
            random_state=123,
            width=300,
            height=200,
            scale=2,
        ).generate_from_frequencies(word_dict)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File ~/.local/share/virtualenvs/dartgen-k85DvGnY/lib/python3.12/site-packages/wordcloud/wordcloud.py:458, in WordCloud.generate_from_frequencies(self, frequencies, max_font_size)
    457 try:
--> 458     font_size = int(2 * sizes[0] * sizes[1]
    459                     / (sizes[0] + sizes[1]))
    460 # quick fix for if self.layout_ contains less than 2 values
    461 # on very small images it can be empty

IndexError: list index out of range

During handling of the above exception, another exception occurred:
...
ValueError: Couldn't find space to draw. Either the Canvas size is too small or too much of the image is masked out.
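If editing the installed `plot.py` is undesirable, the same default could in principle be injected at runtime. The sketch below assumes, as the traceback suggests, that `ydata_profiling.visualisation.plot` looks up `WordCloud` as a module-level name; `with_default_kwargs` is a hypothetical helper, and this has not been tested against any particular ydata-profiling release.

```python
import functools
import types


def with_default_kwargs(module, attr_name: str, **defaults) -> None:
    """Replace module.<attr_name> with a version that has default keyword arguments.

    Keyword arguments passed explicitly by the caller still override the defaults.
    """
    original = getattr(module, attr_name)
    setattr(module, attr_name, functools.partial(original, **defaults))


# Stand-in for the real class so the sketch runs without ydata-profiling installed.
def fake_word_cloud(width=300, height=200, max_font_size=None):
    return {"width": width, "height": height, "max_font_size": max_font_size}


fake_plot = types.SimpleNamespace(WordCloud=fake_word_cloud)
with_default_kwargs(fake_plot, "WordCloud", max_font_size=100)
print(fake_plot.WordCloud(width=300, height=200))

# Assumed real-world usage (not run here):
#   import ydata_profiling.visualisation.plot as plot
#   with_default_kwargs(plot, "WordCloud", max_font_size=100)
```

Unlike a change to the file on disk, a patch like this is not lost when the package is reinstalled, but it has to run before the report is rendered.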

@vhermecz

I'm also having this issue (font_size = int(2 * sizes[0] * sizes[1]). From the previous comments, I assume long values in some columns are causing it. I was hoping to simply disable the word cloud as a workaround, but I could not quickly find a way to do that.
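One thing that may be worth trying, as an unverified sketch: the traceback shows `render_text` guarding the word cloud with `if words and "word_counts" in summary`, and ydata-profiling's settings appear to include `vars.text.words` and `vars.cat.words` flags. Assuming those settings exist in the installed version and are what feeds `words`, passing them as keyword overrides might skip the word cloud. `NO_WORDS` and `build_report` are hypothetical names.

```python
# Assumed settings override; check the names against your installed version.
NO_WORDS = {
    "vars": {
        "text": {"words": False},  # assumed to gate the word cloud for text columns
        "cat": {"words": False},   # assumed equivalent for categorical columns
    }
}


def build_report(df, title: str = "my_data_report"):
    """Build a minimal report with word statistics switched off (assumption)."""
    # Imported lazily so the sketch can be read without the package installed.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, title=title, minimal=True, **NO_WORDS)
```

If the override has no effect, the flag names or the way they are merged into the configuration probably differ in that release.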

Projects
Status: Selected for next release
Development

No branches or pull requests