rpyc consumes CPU on "big" data #329
Some initial data
generated by
Confirmed that increasing
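A back-of-the-envelope sketch (plain Python, not rpyc itself) of why `MAX_IO_CHUNK` matters here: `SocketStream` writes at most `MAX_IO_CHUNK` bytes per send, so the number of socket writes for a payload scales inversely with it. The 8000-byte default used below is quoted from memory and may differ by rpyc version; the 40 MB figure assumes a 2000 x 2500 float64 DataFrame (2000 * 2500 * 8 bytes).

```python
# Illustrative only: count how many socket writes a payload needs when
# each write is capped at max_io_chunk bytes (as in rpyc's SocketStream).

def count_writes(payload_size, max_io_chunk):
    """Ceiling division: number of writes needed for payload_size bytes."""
    return -(-payload_size // max_io_chunk)

# ~40 MB pickled DataFrame with the (assumed) 8000-byte default chunk
# versus a 640 KB chunk:
print(count_writes(40_000_000, 8000))     # 5000 writes
print(count_writes(40_000_000, 640_000))  # 63 writes
```

Fewer, larger writes mean fewer trips through the reactor loop per payload, which is why other clients stop timing out once the chunk size is raised.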
Test to show this:

```python
from __future__ import print_function
import sys
import pickle  # noqa
import timeit
import rpyc
import unittest
from nose import SkipTest
import cfg_tests
try:
    import pandas as pd
    import numpy as np
except Exception:
    raise SkipTest("Requires pandas, numpy, and tables")

DF_ROWS = 2000
DF_COLS = 2500


class MyService(rpyc.Service):
    on_connect_called = False
    on_disconnect_called = False

    def on_connect(self, conn):
        self.on_connect_called = True

    def on_disconnect(self, conn):
        self.on_disconnect_called = True

    def exposed_write_data(self, dataframe):
        rpyc.classic.obtain(dataframe)

    def exposed_ping(self):
        return "pong"


class TestServicePickle(unittest.TestCase):
    """Issues #323 and #329 showed that for large objects there is an excessive number of round trips.

    This test case should check the interrelations of
    + MAX_IO_CHUNK
    + min twrite
    + occurrence rate of socket timeouts for other clients
    """
    config = {}

    def setUp(self):
        self.cfg = {'allow_pickle': True}
        self.server = rpyc.utils.server.ThreadedServer(MyService, port=0, protocol_config=self.cfg.copy())
        self.server.logger.quiet = False
        self.thd = self.server._start_in_thread()
        self.conn = rpyc.connect("localhost", self.server.port, config=self.cfg)
        self.conn2 = rpyc.connect("localhost", self.server.port, config=self.cfg)
        # globals are made available to timeit; prepare them
        cfg_tests.timeit['conn'] = self.conn
        cfg_tests.timeit['conn2'] = self.conn2
        cfg_tests.timeit['df'] = pd.DataFrame(np.random.rand(DF_ROWS, DF_COLS))

    def tearDown(self):
        self.conn.close()
        self.server.close()
        self.thd.join()
        cfg_tests.timeit.clear()

    def test_dataframe_pickling(self):
        # The proxy will sync with the pickle handle and default protocol and provide
        # this as the argument to pickle.load. By timing how long pickle.dumps and
        # pickle.loads take without any round trips, the overhead of the RPyC
        # protocol can be found.
        rpyc.core.channel.Channel.COMPRESSION_LEVEL = 1
        # rpyc.core.stream.SocketStream.MAX_IO_CHUNK = 65355 * 10
        level = rpyc.core.channel.Channel.COMPRESSION_LEVEL
        max_chunk = rpyc.core.stream.SocketStream.MAX_IO_CHUNK
        repeat = 3
        number = 1
        pickle_stmt = 'pickle.loads(pickle.dumps(cfg_tests.timeit["df"]))'
        write_stmt = ('rpyc.lib.spawn(cfg_tests.timeit["conn"].root.write_data, cfg_tests.timeit["df"]); '
                      '[cfg_tests.timeit["conn2"].root.ping() for i in range(30)]')
        t = timeit.Timer(pickle_stmt, globals=globals())
        tpickle = min(t.repeat(repeat, number))
        t = timeit.Timer(write_stmt, globals=globals())
        twrite = min(t.repeat(repeat, number))
        headers = ['sample', 'tpickle', 'twrite', 'bytes', 'level', 'max_chunk']  # noqa
        data = [repeat, tpickle, twrite, sys.getsizeof(cfg_tests.timeit['df']), level, max_chunk]
        data = [str(d) for d in data]
        print(','.join(headers), file=open('/tmp/time.csv', 'a'))
        print(','.join(data), file=open('/tmp/time.csv', 'a'))


if __name__ == "__main__":
    unittest.main()
```
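The min-of-repeats timing pattern in the test can be reproduced without a server, pandas, or `cfg_tests`; this standalone sketch (the payload and names are illustrative stand-ins) times a pure pickle round trip the same way, so the difference between it and a timed remote write is the protocol overhead:

```python
import pickle
import timeit

# Standalone version of the timing pattern used in the test: time a
# pickle dumps/loads round trip, taking the min over several repeats
# to suppress scheduler noise. The list stands in for the DataFrame.
payload = list(range(100_000))

timer = timeit.Timer(
    'pickle.loads(pickle.dumps(payload))',
    globals={'pickle': pickle, 'payload': payload},
)
tpickle = min(timer.repeat(repeat=3, number=1))
print(tpickle)
```

Taking the minimum rather than the mean is the standard `timeit` practice: the fastest run is the one least disturbed by other processes.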
For now, the improvements made should be sufficient to close this issue. Other optimizations aren't specific to this issue.
* Added warning to `_remote_tb` when the major version of local and remote mismatch (tomerfiliba-org#332)
* Added `include_local_version` to DEFAULT_CONFIG to allow for configurable security controls (e.g. `include_local_traceback`)
* Update readme.txt
* Added break to client process loop when everything is dead
* Increased chunk size to improve multi-client response time and throughput of large data (tomerfiliba-org#329)
* Improved test for response of client 1 while transferring a large amount of data to client 2
* Cleaned up coding style of test_service_pickle.py
* Updated issue template
* Added VS Code testing cfgs; updated gitignore venv
* Changed settings.json to use env USERNAME
* Name pack casted in `_unbox` to fix IronPython bug. Fixed tomerfiliba-org#337
* Fixed `netref.class_factory` id_pack usage per tomerfiliba-org#339 and added test cases
* Added .readthedocs.yml and requirements to build
* Make OneShotServer terminate after the client connection ends
* Added unit test for OneShotServer. Fixed tomerfiliba-org#343
* Fixed 2.6 backwards incompatibility for format syntax
* Updated change log and bumped version --- 4.1.1
* Added support for chained connections which result in a netref being passed to `get_id_pack`. Fixed tomerfiliba-org#346
* Added tests for `get_id_pack`
* Added a test for issue tomerfiliba-org#346
* Corrected the connection used to inspect a netref
* Refactored `__cmp__` getattr
* Extended rpyc-over-rpyc unit testing and removed port parameter from TestRestricted
* Added comment explaining the inspect for intermediate proxy. Fixed tomerfiliba-org#346
* Improved docstring for `serve_threaded` to address when and when not to use the method. Done tomerfiliba-org#345
* Release 4.1.2
* Fixed versions referred to in security.rst
* Link docs instead of MITRE
* Set up logging with a better formatter
* Fix bug when proxy context-manager is being exited with an exception (#1)
* Logging: add a rotating file log handler
Environment
Minimal example
Server:
Client:
but it works fine:
passed too: