ClusterODM is dropping a high number of uploads #92

Closed · FJEANNOT opened this issue Feb 8, 2022 · 4 comments · Fixed by #93
FJEANNOT (Contributor) commented Feb 8, 2022:

What is the problem?

Since yesterday my WebODM has been failing every task after I restarted it. I noticed it pulled a newer image from Docker Hub, and there are no previous versions available on Docker Hub.
After investigating, I noticed that ClusterODM is closing a lot of POST HTTP requests on the route /task/new/upload/<task_id>.
The error message displayed in WebODM is sometimes Connection error: HTTPSConnectionPool(host='example.com', port=443): Read timed out. (read timeout=30) and other times just a 502.

Even the smallest jobs are failing; I had this issue with a dataset containing only 5 images.

On the ClusterODM web interface, I can still launch a task, but during the uploads I get a lot of messages saying Upload of IMG_NAME.jpg failed, retrying...

After seeing this, I did a clean install of my entire stack (WebODM web app & worker, ClusterODM and one locked NodeODM for the autoscaler) on completely different infrastructure and had the exact same problem.

What should be the expected behavior?

Uploading files through the WebODM or ClusterODM UI should work.

How can we reproduce this? (What steps did you do to trigger the problem? If applicable, please include multiple screenshots of the problem! Be detailed)

Install WebODM and ClusterODM and try to upload files to launch a task.
My current installation is on a Kubernetes cluster hosted on Scaleway. I can provide the manifests I'm using if needed.
WebODM version: 1.9.11
ClusterODM version: latest on Docker Hub

FJEANNOT (Contributor, Author) commented Feb 9, 2022:
I forked the project today and started troubleshooting.

Removing the handleClose function seems to be the solution for me. Maybe saveStream.close() or fs.unlink() is taking too long? See the sketch after the snippet below for the kind of guard I have in mind.

ClusterODM/libs/taskNew.js, lines 170 to 183 at 311dbb0:
```js
const handleClose = () => {
    if (saveStream){
        saveStream.close();
        saveStream = null;
    }
    if (fs.exists(saveTo, exists => {
        params.imagesCount--;
        fs.unlink(saveTo, err => {
            if (err) logger.error(err);
        });
    }));
};
req.on('close', handleClose);
req.on('abort', handleClose);
```
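
For illustration only, this is the kind of guard I have in mind (a sketch, not the code in the repo): skip the cleanup when the upload has already finished, so a late 'close' event cannot delete a file that was written successfully. The uploadFinished flag and the 'finish' hook are hypothetical additions of mine; saveStream, saveTo, params, logger and req are the variables from the snippet above.

```js
// Sketch only: guard the cleanup so that a 'close' event arriving after a
// successful upload does not decrement the image count or delete the file.
let uploadFinished = false;                        // hypothetical flag, set below
saveStream.on('finish', () => { uploadFinished = true; });

const handleClose = () => {
    if (uploadFinished) return;                    // upload completed, nothing to undo

    if (saveStream){
        saveStream.close();
        saveStream = null;
    }
    fs.exists(saveTo, exists => {                  // deprecated API, kept to match the snippet above
        if (exists){
            params.imagesCount--;
            fs.unlink(saveTo, err => {
                if (err) logger.error(err);
            });
        }
    });
};
req.on('close', handleClose);
req.on('abort', handleClose);
```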

pierotofy (Member) commented:
I wonder if this is due to the fact that the Docker image is based off node:lts; I remember there were some breaking changes in Node.js that would lead to issues in ClusterODM. Wonder what happens if you simply downgrade the Node version to 12 or 14, along the lines of the sketch below.
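
A sketch of what that change could look like, assuming the image is built from a Dockerfile whose base image line is currently node:lts (adjust the tag as needed):

```dockerfile
# Sketch: pin the base image to a known-good Node major instead of the moving lts tag
FROM node:14
```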

FJEANNOT (Contributor, Author) commented Feb 9, 2022:

Alright, I'm going to try this.

FJEANNOT (Contributor, Author) commented Feb 9, 2022:

Alright, it works like a charm with Node 14. I can submit a pull request to close this.
