Dataset file size #587

Open
jay-m-dev opened this issue Jun 27, 2023 · 1 comment

@jay-m-dev
Contributor

Loading large datasets (about 150 MB) fails at the get_metafeatures step with an unknown error. The error is raised both when uploading through the GUI and when placing the file in the data/datasets/user directory.

alirogpt-lab-1 | 0|lab | child process exited with code null
alirogpt-lab-1 | 0|lab | Error, pythonProcessAsync process exited with status undefined, args: 'ai/metalearning/get_metafeatures.py,649b30d19a7e2b0140231744,-target,target,-identifier_type,fileid,-prediction_type,classification', stderr: 'null', stdout: 'null'
alirogpt-lab-1 | 0|lab | Error: Error, pythonProcessAsync process exited with status undefined, args: 'ai/metalearning/get_metafeatures.py,649b30d19a7e2b0140231744,-target,target,-identifier_type,fileid,-prediction_type,classification', stderr: 'null', stdout: 'null'
alirogpt-lab-1 | 0|lab | at ChildProcess.<anonymous> (/appsrc/lab/pyutils.js:200:11)
alirogpt-lab-1 | 0|lab | at ChildProcess.emit (node:events:513:28)
alirogpt-lab-1 | 0|lab | at maybeClose (node:internal/child_process:1100:16)
alirogpt-lab-1 | 0|lab | at Socket.<anonymous> (node:internal/child_process:458:11)
alirogpt-lab-1 | 0|lab | at Socket.emit (node:events:513:28)
alirogpt-lab-1 | 0|lab | at Pipe.<anonymous> (node:net:301:12)
alirogpt-lab-1 | 1|ai | ai: INFO: 2023 06:57:46 PM UTC: checking results...
alirogpt-lab-1 | PM2 | App [lab:0] exited with code [1] via signal [SIGINT]
alirogpt-lab-1 | PM2 | App [lab:0] starting in -fork mode-
alirogpt-lab-1 | 1|ai | api_utils: ERROR: Unexpected error in LabApi.__request for path 'POST:http://lab:5080/api/experiments':<class 'requests.exceptions.ConnectionError'>
alirogpt-lab-1 | 1|ai | ai: ERROR: Unhanded exception caught: <class 'requests.exceptions.ConnectionError'>
alirogpt-lab-1 | 1|ai | ai: INFO: Shutting down AI engine...
alirogpt-lab-1 | 1|ai | ai: INFO: ...Shutting down Request Manager...
alirogpt-lab-1 | 1|ai | ai: INFO: Goodbye
alirogpt-lab-1 | 1|ai | Traceback (most recent call last):
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
alirogpt-lab-1 | 1|ai | chunked=chunked,
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 449, in _make_request
alirogpt-lab-1 | 1|ai | six.raise_from(e, None)
alirogpt-lab-1 | 1|ai | File "<string>", line 3, in raise_from
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 444, in _make_request
alirogpt-lab-1 | 1|ai | httplib_response = conn.getresponse()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 1373, in getresponse
alirogpt-lab-1 | 1|ai | response.begin()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 319, in begin
alirogpt-lab-1 | 1|ai | version, status, reason = self._read_status()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 280, in _read_status
alirogpt-lab-1 | 1|ai | line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
alirogpt-lab-1 | 1|ai | return self._sock.recv_into(b)
alirogpt-lab-1 | 1|ai | ConnectionResetError: [Errno 104] Connection reset by peer
alirogpt-lab-1 | 1|ai | During handling of the above exception, another exception occurred:
alirogpt-lab-1 | 1|ai | Traceback (most recent call last):
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
alirogpt-lab-1 | 1|ai | timeout=timeout
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 788, in urlopen
alirogpt-lab-1 | 1|ai | method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 550, in increment
alirogpt-lab-1 | 1|ai | raise six.reraise(type(error), error, _stacktrace)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 769, in reraise
alirogpt-lab-1 | 1|ai | raise value.with_traceback(tb)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
alirogpt-lab-1 | 1|ai | chunked=chunked,
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 449, in _make_request
alirogpt-lab-1 | 1|ai | six.raise_from(e, None)
alirogpt-lab-1 | 1|ai | File "<string>", line 3, in raise_from
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 444, in _make_request
alirogpt-lab-1 | 1|ai | httplib_response = conn.getresponse()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 1373, in getresponse
alirogpt-lab-1 | 1|ai | response.begin()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 319, in begin
alirogpt-lab-1 | 1|ai | version, status, reason = self._read_status()
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/http/client.py", line 280, in _read_status
alirogpt-lab-1 | 1|ai | line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
alirogpt-lab-1 | 1|ai | return self._sock.recv_into(b)
alirogpt-lab-1 | 1|ai | urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
alirogpt-lab-1 | 1|ai | During handling of the above exception, another exception occurred:
alirogpt-lab-1 | 1|ai | Traceback (most recent call last):
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
alirogpt-lab-1 | 1|ai | "__main__", mod_spec)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
alirogpt-lab-1 | 1|ai | exec(code, run_globals)
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 661, in <module>
alirogpt-lab-1 | 1|ai | main()
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 636, in main
alirogpt-lab-1 | 1|ai | if pennai.check_results():
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 366, in check_results
alirogpt-lab-1 | 1|ai | last_update=self.last_update)
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/api_utils.py", line 236, in get_new_experiments_as_dataframe
alirogpt-lab-1 | 1|ai | data = self.get_new_experiments(last_update)
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/api_utils.py", line 221, in get_new_experiments
alirogpt-lab-1 | 1|ai | res = self.__request(path=self.exp_path, payload=payload)
alirogpt-lab-1 | 1|ai | File "/appsrc/ai/api_utils.py", line 482, in __request
alirogpt-lab-1 | 1|ai | headers=headers)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 61, in request
alirogpt-lab-1 | 1|ai | return session.request(method=method, url=url, **kwargs)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
alirogpt-lab-1 | 1|ai | resp = self.send(prep, **send_kwargs)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
alirogpt-lab-1 | 1|ai | r = adapter.send(request, **kwargs)
alirogpt-lab-1 | 1|ai | File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
alirogpt-lab-1 | 1|ai | raise ConnectionError(err, request=request)
alirogpt-lab-1 | 1|ai | requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

@jay-m-dev
Contributor Author

jay-m-dev commented Jun 27, 2023

The largest file I've been able to upload successfully via the GUI is 144 MB.
Using the data/datasets/user directory, I've been able to upload a 175 MB file.
One thing I noticed in the error above is that the get_metafeatures script is called with -identifier_type set to fileid, even when the dataset is loaded directly from the data/datasets/user directory. A filepath option exists, but it does not appear to be used. Passing the filepath instead might allow large datasets to be loaded directly from that directory.
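
To make the difference concrete, here is a minimal Python sketch of the two argument lists. The script path and the -target, -identifier_type and -prediction_type flags are copied from the args in the log above. The helper name `build_metafeatures_args` and the file name `large_dataset.csv` are hypothetical, and the real caller lives in lab/pyutils.js rather than in Python, so treat this as an illustration of the idea and not as the project's actual code.

```python
from typing import List

# Script path and flag names are taken from the args shown in the error log.
SCRIPT = "ai/metalearning/get_metafeatures.py"
IDENTIFIER_TYPES = ("fileid", "filepath")


def build_metafeatures_args(identifier: str,
                            target: str,
                            prediction_type: str = "classification",
                            identifier_type: str = "fileid") -> List[str]:
    """Hypothetical helper: build the argument list for get_metafeatures.py."""
    if identifier_type not in IDENTIFIER_TYPES:
        raise ValueError(f"unsupported identifier_type: {identifier_type!r}")
    return [
        SCRIPT, identifier,
        "-target", target,
        "-identifier_type", identifier_type,
        "-prediction_type", prediction_type,
    ]


# Current behaviour seen in the log: the dataset is referenced by its file id.
by_id = build_metafeatures_args("649b30d19a7e2b0140231744", "target")

# Proposed alternative: reference the dataset by its path on disk.
# The file name below is an assumption made up for this example.
by_path = build_metafeatures_args(
    "data/datasets/user/large_dataset.csv",
    "target",
    identifier_type="filepath",
)

print(",".join(by_id))
print(",".join(by_path))
```

The first call reproduces the args string from the failing run. The second differs only in the identifier and the -identifier_type value, which is the change suggested above. Whether it actually avoids the failure for large files still needs to be tested.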

@jay-m-dev jay-m-dev self-assigned this Jun 27, 2023
@jay-m-dev jay-m-dev changed the title Dataset file size (Medium) Dataset file size Jun 27, 2023
@jay-m-dev jay-m-dev changed the title (Medium) Dataset file size [Medium] Dataset file size Jun 27, 2023
@jay-m-dev jay-m-dev changed the title [Medium] Dataset file size Dataset file size Jun 27, 2023