The in-memory storage for tracking rate limits #29908

Closed
3 tasks done
Habeeb556 opened this issue Aug 9, 2024 · 14 comments


Habeeb556 commented Aug 9, 2024

Bug description

The warnings below appear in the logs at startup:

superset[2032316]:  warnings.warn(
superset[2032276]: logging was configured successfully
superset[2032276]: 2024-08-09 22:52:32,367:INFO:superset.utils.logging_configurator:logging was configured successfully
superset[2032276]: 2024-08-09 22:52:32,371:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
superset[2032276]: /swloc/.virtualenvs/supersetvenv5/lib/python3.10/site-packages/flask_limiter/extension.py:333: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
superset[2032276]:  warnings.warn(
superset[2032328]: logging was configured successfully
superset[2032328]: 2024-08-09 22:52:32,492:INFO:superset.utils.logging_configurator:logging was configured successfully
superset[2032328]: 2024-08-09 22:52:32,519:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
superset[2032328]: /swloc/.virtualenvs/supersetvenv5/lib/python3.10/site-packages/flask_limiter/extension.py:333: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
superset[2032328]:  warnings.warn(

How to reproduce the bug

Fresh setup.

Screenshots/recordings

No response

Superset version

4.0.2

Python version

3.11

Node version

I don't know

Browser

Not applicable

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

dosubot bot commented Aug 9, 2024

To resolve the warning about using in-memory storage for tracking rate limits in Superset during startup, you need to configure a persistent storage backend for rate limiting. This can be done by setting the RATELIMIT_STORAGE_URI in your configuration file to a supported storage backend, such as Redis.

Here is how you can set it up in your superset_config.py:

# A storage location conforming to the scheme in storage-scheme. See the limits
# library for allowed values: https://limits.readthedocs.io/en/stable/storage.html
RATELIMIT_STORAGE_URI = "redis://host:port"

Replace "redis://host:port" with the actual URI of your Redis instance. This will configure Superset to use Redis for tracking rate limits instead of the default in-memory storage.
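As a rough sketch of how that URI could be assembled in superset_config.py: the helper build_redis_uri and the REDIS_RATELIMIT_DB variable below are hypothetical names introduced only for illustration, and the example assumes the storage backend accepts the usual redis://[:password@]host:port/db URL form.

```python
import os
from typing import Optional
from urllib.parse import quote


def build_redis_uri(host: str, port, db=0, password: Optional[str] = None) -> str:
    """Build a redis:// URI; the password (if any) is percent-encoded."""
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


# Hypothetical environment variables, mirroring the naming used in this thread.
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_RATELIMIT_DB = os.getenv("REDIS_RATELIMIT_DB", "2")  # assumed, not a Superset setting

RATELIMIT_STORAGE_URI = build_redis_uri(REDIS_HOST, REDIS_PORT, REDIS_RATELIMIT_DB)
```

Using a separate database number for rate limits is a design choice rather than a requirement; it just keeps those keys apart from cache and Celery data.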

Additionally, you can refer to the docker/pythonpath_dev/superset_config.py for an example configuration that includes Redis settings:

import os

REDIS_HOST = os.getenv("REDIS_HOST", "redis")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_CELERY_DB = os.getenv("REDIS_CELERY_DB", "0")
REDIS_RESULTS_DB = os.getenv("REDIS_RESULTS_DB", "1")

CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_HOST": REDIS_HOST,
    "CACHE_REDIS_PORT": REDIS_PORT,
    "CACHE_REDIS_DB": REDIS_RESULTS_DB,
}
DATA_CACHE_CONFIG = CACHE_CONFIG

class CeleryConfig:
    broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"

CELERY_CONFIG = CeleryConfig

Ensure that your Redis instance is running and accessible at the specified host and port. This will help you avoid the in-memory storage warning and improve the reliability of rate limiting in Superset [1][2].
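One way to check reachability before restarting Superset is a plain TCP connection test. This is only a sketch using the standard library: can_connect is a hypothetical helper name, and a successful connection shows that something is listening on the port, not that it is Redis or that authentication will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example (run manually): can_connect("localhost", 6379)
```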

To continue talking to Dosu, mention @dosu.

@Habeeb556
Author

Issue resolved and configured successfully. Awesome!

# A storage location conforming to the scheme in storage-scheme. See the limits
# library for allowed values: https://limits.readthedocs.io/en/stable/storage.html
RATELIMIT_STORAGE_URI = f"redis://{REDIS_HOST}:{REDIS_PORT}"

@Habeeb556
Author

@dosu could you also help me with this warning:

superset[1501968]: 2024-08-09 23:09:34,652:WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped

@Habeeb556 Habeeb556 reopened this Aug 9, 2024

dosubot bot commented Aug 9, 2024

The warning "Class 'werkzeug.local.LocalProxy' is not mapped" is likely due to the use of LocalProxy objects in the Superset code. To resolve this warning, ensure that these LocalProxy objects are not being inadvertently passed to SQLAlchemy or any other ORM that expects a mapped class. You can access the actual object behind the proxy before passing it to SQLAlchemy:

# Instead of passing the proxy directly, access the underlying object
actual_results_backend = results_backend._get_current_object()
# Use actual_results_backend in your ORM operations
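
To illustrate what "the object behind the proxy" means, here is a toy sketch. TinyProxy and FakeBackend are hypothetical stand-ins written for this example; they only mimic the relevant behaviour of werkzeug.local.LocalProxy and are not Superset or Werkzeug code.

```python
from typing import Any, Callable


class TinyProxy:
    """Hypothetical, simplified stand-in for werkzeug.local.LocalProxy."""

    def __init__(self, getter: Callable[[], Any]) -> None:
        self._getter = getter

    def _get_current_object(self) -> Any:
        # Return the real object the proxy currently points at.
        return self._getter()

    def __getattr__(self, name: str) -> Any:
        # Forward any other attribute access to the real object.
        return getattr(self._getter(), name)


class FakeBackend:
    """Hypothetical cache backend used only for this illustration."""

    def get(self, key: str) -> str:
        return f"value-for-{key}"


backend = FakeBackend()
results_backend = TinyProxy(lambda: backend)

print(type(results_backend).__name__)         # TinyProxy
print(results_backend.get("abc"))             # value-for-abc
actual_results_backend = results_backend._get_current_object()
print(type(actual_results_backend).__name__)  # FakeBackend
```

The proxy forwards attribute access, but its own type is still the proxy class, which is why code that inspects the type of the object can be confused until the proxy is unwrapped.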

Additionally, there has been a recent update in the Superset codebase that addresses this warning. The update is mentioned in the CHANGELOG/4.0.0.md file under the version 4.0.0 section:

- [#27164](https://github.com/apache/superset/pull/27164) fix: unlock and bump werkzeug (@dpgaspar)

This update includes changes that address the warning related to werkzeug.local.LocalProxy [1].


@Habeeb556
Author

@dosu how do I apply this line, and where?

# Instead of passing the proxy directly, access the underlying object
actual_results_backend = results_backend._get_current_object()
# Use actual_results_backend in your ORM operations


dosubot bot commented Aug 9, 2024

To use the line actual_results_backend = results_backend._get_current_object() in your Superset setup, you need to modify the parts of your code where results_backend is used. Here’s a step-by-step guide:

  1. Define actual_results_backend in your configuration: Ensure that actual_results_backend is defined in your Superset configuration file (e.g., superset_config.py).

  2. Modify the ResultsBackendManager class: Update the ResultsBackendManager class to use actual_results_backend.

Here is the modified code:

class ResultsBackendManager:
    def __init__(self) -> None:
        self._results_backend = None
        self._use_msgpack = False
        self._actual_results_backend = None

    def init_app(self, app: Flask) -> None:
        self._results_backend = app.config["RESULTS_BACKEND"]
        self._use_msgpack = app.config["RESULTS_BACKEND_USE_MSGPACK"]
        self._actual_results_backend = app.config.get("ACTUAL_RESULTS_BACKEND")

    @property
    def results_backend(self) -> Optional[BaseCache]:
        return self._results_backend

    @property
    def should_use_msgpack(self) -> bool:
        return self._use_msgpack

    @property
    def actual_results_backend(self) -> Optional[BaseCache]:
        return self._actual_results_backend
  3. Use actual_results_backend in ORM operations: Wherever you need to use actual_results_backend in your ORM operations, you can access it via the results_backend_manager instance.

Example usage in your ORM operations:

from superset.extensions import results_backend_manager

def some_orm_operation():
    backend = results_backend_manager.actual_results_backend
    if backend:
        # Perform operations using actual_results_backend
        pass

In the context of superset/commands/sql_lab/results.py, you can integrate actual_results_backend in the SqlExecutionResultsCommand class to store results using the actual results backend:

class SqlExecutionResultsCommand(BaseCommand):
    _key: str
    _rows: int | None
    _blob: Any
    _query: Query

    def __init__(
        self,
        key: str,
        rows: int | None = None,
    ) -> None:
        self._key = key
        self._rows = rows

    def validate(self) -> None:
        if not results_backend_manager.actual_results_backend:
            raise SupersetErrorException(
                SupersetError(
                    message=__("Actual results backend is not configured."),
                    error_type=SupersetErrorType.RESULTS_BACKEND_NOT_CONFIGURED_ERROR,
                    level=ErrorLevel.ERROR,
                )
            )

        read_from_results_backend_start = now_as_float()
        self._blob = results_backend_manager.actual_results_backend.get(self._key)
        stats_logger.timing(
            "sqllab.query.results_backend_read",
            now_as_float() - read_from_results_backend_start,
        )

        if not self._blob:
            raise SupersetErrorException(
                SupersetError(
                    message=__(
                        "Data could not be retrieved from the actual results backend. You "
                        "need to re-run the original query."
                    ),
                    error_type=SupersetErrorType.RESULTS_BACKEND_ERROR,
                    level=ErrorLevel.ERROR,
                ),
                status=410,
            )

        self._query = (
            db.session.query(Query).filter_by(results_key=self._key).one_or_none()
        )
        if self._query is None:
            raise SupersetErrorException(
                SupersetError(
                    message=__(
                        "The query associated with these results could not be found. "
                        "You need to re-run the original query."
                    ),
                    error_type=SupersetErrorType.RESULTS_BACKEND_ERROR,
                    level=ErrorLevel.ERROR,
                ),
                status=404,
            )

    def run(self) -> dict[str, Any]:
        """Runs arbitrary sql and returns data as json"""
        self.validate()
        payload = utils.zlib_decompress(
            self._blob, decode=not results_backend_use_msgpack
        )
        try:
            obj = _deserialize_results_payload(
                payload, self._query, cast(bool, results_backend_use_msgpack)
            )
        except SerializationError as ex:
            raise SupersetErrorException(
                SupersetError(
                    message=__(
                        "Data could not be deserialized from the actual results backend. The "
                        "storage format might have changed, rendering the old data "
                        "stale. You need to re-run the original query."
                    ),
                    error_type=SupersetErrorType.RESULTS_BACKEND_ERROR,
                    level=ErrorLevel.ERROR,
                ),
                status=404,
            ) from ex

        if self._rows:
            obj = apply_display_max_row_configuration_if_require(obj, self._rows)

        return obj

By following these steps, you can integrate actual_results_backend into your Superset setup and use it in your ORM operations [1][2][3].


@Habeeb556
Author

@dosu I now get the error below:

superset[1654904]: NameError: name 'results_backend' is not defined
superset[1654904]: Failed to create app
superset[1654904]: Traceback (most recent call last):
superset[1654904]:  File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/app.py", line 37, in create_app
superset[1654904]:    app.config.from_object(config_module)
superset[1654904]:  File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask/config.py", line 231, in from_object
superset[1654904]:    obj = import_string(obj)
superset[1654904]:          ^^^^^^^^^^^^^^^^^^
superset[1654904]:  File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/werkzeug/utils.py", line 595, in import_string
superset[1654904]:    __import__(import_name)
superset[1654904]:  File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/config.py", line 1699, in <module>
superset[1654904]:    override_conf = imp.load_source("superset_config", cfg_path)
superset[1654904]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
superset[1654904]:  File "/usr/local/lib/python3.11/imp.py", line 172, in load_source
superset[1654904]:    module = _load(spec)
superset[1654904]:             ^^^^^^^^^^^
superset[1654904]:  File "/swloc/.superset/superset_config.py", line 9, in <module>
superset[1654904]:    actual_results_backend = results_backend._get_current_object()
superset[1654904]:                             ^^^^^^^^^^^^^^^
superset[1654904]: NameError: name 'results_backend' is not defined

@Habeeb556 Habeeb556 reopened this Aug 10, 2024

dosubot bot commented Aug 10, 2024

To resolve the "NameError: name 'results_backend' is not defined" error in your superset_config.py file, you need to ensure that results_backend is properly defined and initialized.

Here is how you can define and initialize results_backend in your superset_config.py:

from flask_caching.backends.filesystemcache import FileSystemCache

# Initialize the results_backend
RESULTS_BACKEND = FileSystemCache("/app/superset_home/sqllab")

# Access the actual object behind the proxy
actual_results_backend = RESULTS_BACKEND._get_current_object()

This ensures that results_backend is properly defined and references the FileSystemCache instance. If you are using a different caching backend, make sure to import and initialize it accordingly.

Additionally, ensure you have the necessary dependencies installed for the caching backend you are using. For FileSystemCache, you need the flask-caching package:

pip install flask-caching

This should resolve the NameError and allow Superset to start correctly [1][2].


@Habeeb556
Author

@dosu now I get this:

superset[1660637]: AttributeError: 'FileSystemCache' object has no attribute '_get_current_object'
superset[1660637]: Failed to create app
superset[1660637]: Traceback (most recent call last):
superset[1660637]:  File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/app.py", line 37, in create_app
superset[1660637]:    app.config.from_object(config_module)
superset[1660637]:  File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/flask/config.py", line 231, in from_object
superset[1660637]:    obj = import_string(obj)
superset[1660637]:          ^^^^^^^^^^^^^^^^^^
superset[1660637]:  File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/werkzeug/utils.py", line 595, in import_string
superset[1660637]:    __import__(import_name)
superset[1660637]:  File "/swloc/.virtualenvs/supersetvenv4/lib/python3.11/site-packages/superset/config.py", line 1699, in <module>
superset[1660637]:    override_conf = imp.load_source("superset_config", cfg_path)
superset[1660637]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
superset[1660637]:  File "/usr/local/lib/python3.11/imp.py", line 172, in load_source
superset[1660637]:    module = _load(spec)
superset[1660637]:             ^^^^^^^^^^^
superset[1660637]:  File "/swloc/.superset/superset_config.py", line 12, in <module>
superset[1660637]:    actual_results_backend = RESULTS_BACKEND._get_current_object()
superset[1660637]:                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
superset[1660637]: AttributeError: 'FileSystemCache' object has no attribute '_get_current_object'


dosubot bot commented Aug 10, 2024

The FileSystemCache object does not have a _get_current_object method, which is causing the AttributeError. To resolve this, you should not use _get_current_object with FileSystemCache. Instead, you can directly use the RESULTS_BACKEND object.

Here’s how you can modify your superset_config.py:

  1. Remove the _get_current_object call: Directly use the RESULTS_BACKEND object without trying to access an underlying object.
from flask_caching.backends.filesystemcache import FileSystemCache

# Initialize the results_backend
RESULTS_BACKEND = FileSystemCache("/app/superset_home/sqllab")

# Directly use RESULTS_BACKEND without _get_current_object
actual_results_backend = RESULTS_BACKEND
  2. Ensure proper usage in your application: Wherever you need to use actual_results_backend, you can now directly use it without any additional method calls.

This should resolve the AttributeError and allow Superset to start correctly [1].


@Habeeb556
Author

@dosu but it doesn't solve the warning:

WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped


dosubot bot commented Aug 10, 2024

To address the warning WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped, you need to ensure that LocalProxy objects are properly handled in your Superset configuration. This warning typically occurs when LocalProxy objects are passed to SQLAlchemy or other ORMs that expect a mapped class.

Here are steps to resolve this:

  1. Access the underlying object behind the LocalProxy: Ensure that you are accessing the actual object behind the LocalProxy before passing it to any ORM operations.

  2. Update your configuration: Modify your superset_config.py to handle LocalProxy objects correctly.

Here is an example of how you can handle LocalProxy objects in your configuration:

from flask import Flask
from werkzeug.local import LocalProxy
from superset.extensions import cache_manager, results_backend_manager

def create_app():
    app = Flask(__name__)
    
    # Initialize your app and extensions here
    # ...

    # Explicitly map or handle LocalProxy objects
    app.config['CACHE'] = cache_manager.cache
    app.config['CONF'] = LocalProxy(lambda: app.config)
    app.config['RESULTS_BACKEND'] = LocalProxy(lambda: results_backend_manager.results_backend)
    app.config['RESULTS_BACKEND_USE_MSGPACK'] = LocalProxy(lambda: results_backend_manager.should_use_msgpack)
    app.config['DATA_CACHE'] = LocalProxy(lambda: cache_manager.data_cache)
    app.config['THUMBNAIL_CACHE'] = LocalProxy(lambda: cache_manager.thumbnail_cache)
    app.config['ASYNC_QUERY_MANAGER'] = LocalProxy(lambda: async_query_manager_factory.instance)
    app.config['EVENT_LOGGER'] = LocalProxy(lambda: _event_logger.get("event_logger"))
    app.config['SECURITY_MANAGER'] = LocalProxy(lambda: appbuilder.sm)

    return app

By explicitly mapping these proxies in your application configuration, you can help ensure that they are properly recognized and handled, which should address the warning [1].


@Habeeb556
Author

@dosu the warning below still appears. How can I suppress it?

WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped


dosubot bot commented Aug 10, 2024

To try to skip the warning WARNING:root:Class 'werkzeug.local.LocalProxy' is not mapped, you can add a filter with Python's warnings module. Here’s how you can do this:

  1. Suppress the specific warning using the warnings module:
import warnings

# Suppress specific warning
warnings.filterwarnings("ignore", message="Class 'werkzeug.local.LocalProxy' is not mapped")
  2. Add this code at the beginning of your main script or in the __init__.py file where the application is initialized:
# superset/__init__.py or your main script
import warnings

# Suppress specific warning
warnings.filterwarnings("ignore", message="Class 'werkzeug.local.LocalProxy' is not mapped")

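This ignores warnings raised through the warnings module whose message matches the given text. Note that the line in your logs has the form WARNING:root:..., which means it is emitted through the logging module's root logger, so a warnings filter alone may not remove it from your logs [1].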
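
Since the message appears with a WARNING:root: prefix, a filter on the root logger is another option to try. The sketch below uses only the standard logging module; DropMessageFilter and suppress_log_message are hypothetical names, and it assumes the record is logged directly on the root logger (logger-level filters are not applied to records propagated from child loggers) and that nothing later replaces the root logger's filters.

```python
import logging
from typing import Optional


class DropMessageFilter(logging.Filter):
    """Drop log records whose rendered message contains a given fragment."""

    def __init__(self, fragment: str) -> None:
        super().__init__()
        self.fragment = fragment

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells logging to discard the record.
        return self.fragment not in record.getMessage()


def suppress_log_message(
    fragment: str, logger: Optional[logging.Logger] = None
) -> DropMessageFilter:
    """Attach the filter to the given logger (root logger by default)."""
    target = logger if logger is not None else logging.getLogger()
    message_filter = DropMessageFilter(fragment)
    target.addFilter(message_filter)
    return message_filter


# For example, near the top of superset_config.py:
suppress_log_message("Class 'werkzeug.local.LocalProxy' is not mapped")
```

This only hides the log line; it does not change whatever code path produces it, so it is worth checking whether a newer Superset release removes the message at the source.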

