Update disk completion status when files are discarded #52
Comments
Hm, that does not sit very well with what NGAS' original scope was. Nevertheless, as far as I've understood, the utility script is actually using NGAS DISCARD commands to remove those files. If that is the case I agree that we should add an automatic reset of the 'completed' status; if instead the script is simply removing files from the disk, we should not do that. I don't think it will be hard to implement that as part of the DISCARD command implementation in ngamsServer/commands/discard.py. The other, fully supported option would be to recycle the disks, but that might not be what you want.
We have some test NGAS servers with limited disk capacity, which we use intensively for testing other software applications. To prevent the disks from filling up completely, I have a utility script that runs once per week and discards all files more than one day old. However, the tests have been more intensive than normal and the disks were filled to capacity. This highlighted an issue.
When the disks reach capacity they are marked as completed. My utility script then discarded old files, freeing up lots of disk space. However, NGAS does not automatically update the completed status; I have to manually edit the database table. I think it would be very useful if NGAS were smart enough to check and update the completion status when files are discarded. Perhaps the janitor or data check thread could monitor the completion status.
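For reference, the cleanup script in question boils down to something like the sketch below: select files older than the cutoff and issue a DISCARD request per file. The query parameter names (`disk_id`, `file_id`, `file_version`, `execute`) are assumptions about the DISCARD interface and may differ between NGAS versions; the host and port are placeholders.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

def discard_requests(files, server="http://ngas-host:7777", max_age_days=1,
                     now=None):
    """Build one DISCARD URL per file older than max_age_days.

    `files` is a list of (disk_id, file_id, file_version, ingestion_date)
    tuples, e.g. as read from the ngas_files table. The parameter names
    are illustrative; check them against your NGAS version.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_age_days)
    urls = []
    for disk_id, file_id, version, ingested in files:
        if ingested < cutoff:
            query = urlencode({"disk_id": disk_id, "file_id": file_id,
                               "file_version": version, "execute": 1})
            urls.append("%s/DISCARD?%s" % (server, query))
    return urls
```

The actual script then GETs each URL (e.g. with `urllib.request` or `curl`); only the selection logic is shown here.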
The janitor thread actually already checks the disk space:
https://github.com/ICRAR/ngas/blob/3ffd06050ce14728c938d3109e44b75f66a29c98/src/ngamsServer/ngamsServer/janitor/disk_space_checker.py#L32-L50
But the only thing it does is bring the server to Offline state if the disk is full; there's no flag clearing. Apart from the options that Andreas mentioned, I'd say adding some extra logic wouldn't hurt. If we wanted to be conservative we could have a new option to enable the behavior.
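The conservative, option-gated janitor variant could be sketched roughly as follows. The function name, the `enabled` flag, and the idea of returning candidate mount points (rather than updating the database directly) are all illustrative assumptions, not existing NGAS code.

```python
import shutil

def janitor_reset_candidates(mount_points, min_free_bytes, enabled=False):
    """Config-gated janitor step: return mount points whose 'completed'
    flag could be cleared because the volume has free space again.

    Disabled by default (the conservative new-option approach); a real
    implementation would then issue the corresponding DB update.
    """
    if not enabled:
        return []
    candidates = []
    for mp in mount_points:
        usage = shutil.disk_usage(mp)  # same OS-level check the janitor does
        if usage.free > min_free_bytes:
            candidates.append(mp)
    return candidates
```

With the option off, behavior is unchanged; with it on, the janitor would clear the flag only for volumes back above the threshold.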
Using only the janitor thread function to reset the flag is dangerous, since it simply uses an OS function to check how much space is free on a volume. I know we had been (mis-)using the min_free setting in the config file to limit the amount of data NGAS would write to a volume in those cases where that volume was actually shared between different applications. If we started resetting the flag in the janitor it would obviously conflict with that kind of setting; as a minimum we would also need to check the min_free setting before doing that.

As an alternative, I think the logic could be implemented in the DISCARD command itself, essentially doing the opposite of what is done during ARCHIVE. See ngamsArchiveUtils.checkDiskSpace.

That also points to another potential issue, regarding the main and replication disks. I know that is not used much anymore, but the functionality is still there. The completed flag is actually set for both disks if either one reaches the limit, depending on the synch setting in the config file, so we would need to revert it for both as well to keep that consistent.
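The constraints above (honor the min_free-style setting, and revert main and replication disks together when the synch setting couples them) can be captured in a small decision helper. This is a sketch only: the parameter names and the dict shape are assumptions, not the actual NGAS config keys or data structures.

```python
def disks_to_reset(main, replication, free_mb, min_free_mb, synchronised):
    """Return the ids of disks whose 'completed' flag should be cleared
    after a DISCARD, reversing the check done on ARCHIVE.

    `main` and `replication` are dicts like {"id": ..., "completed": bool}.
    The flag is only cleared when the volume has more free space than the
    configured minimum (the min_free-style setting), and when the config
    couples main/replication disks both flags are reverted together.
    """
    if free_mb <= min_free_mb:
        return []  # still at or below the limit: keep disks 'completed'
    reset = []
    if main["completed"]:
        reset.append(main["id"])
    if synchronised and replication and replication["completed"]:
        reset.append(replication["id"])
    return reset
```

The DISCARD handler would call something like this after removing the files and update the `completed` column for the returned disk ids in one transaction.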
OK, that's good. One more small thing to consider is that the janitor is not immediate, and could in fact be switched off or set to run only once a day; the default period is 10 minutes. In any case you would only see the disk becoming available again after, at most, the configured period.
@awicenec I am using the NGAS DISCARD command for removing old files and reclaiming disk space, so making changes in the DISCARD command would work for us.
@rtobar adding an extra config property to enable new functionality in the janitor to update the completion status of disks would also work for us.
Basically either solution would be great. Anything that avoids manually poking at the database would help us a lot.