Deal with large CamTracker log files #107

dostuffthatmatters · 2022-08-27T00:30:25Z

Right now, the CamTracker log files can get pretty big. On Md I just saw a size of > 560 MB for the LEARN_Az_Elev.dat-file and 225 MB for the SunIntensity.dat-file.

The LEARN_Az_Elev.dat-file uses about 65 bytes per log line and logs at a maximum rate of 0.2 lines per second. 16 hours of operation therefore produce about 11,500 log lines, i.e. about 750 KB. Hence, at one such 16-hour run per day, these 560 MB are logs from at least two years of operation.
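The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, the inputs are the figures quoted in this comment, and the 560 MB are assumed to be decimal megabytes.

```python
# Back-of-the-envelope estimate; all inputs are the numbers quoted above.
BYTES_PER_LINE = 65
LINES_PER_SECOND = 0.2  # maximum logging rate
HOURS_PER_DAY = 16
FILE_SIZE_BYTES = 560 * 1000 * 1000  # 560 MB, assuming decimal megabytes

lines_per_day = LINES_PER_SECOND * HOURS_PER_DAY * 3600
bytes_per_day = lines_per_day * BYTES_PER_LINE
days_of_logs = FILE_SIZE_BYTES / bytes_per_day

print(f"{lines_per_day:.0f} lines per day")
print(f"{bytes_per_day / 1000:.0f} KB per day")
print(f"{days_of_logs:.0f} days ({days_of_logs / 365:.1f} years)")
```

Because 0.2 lines per second is the maximum rate, the resulting number of days is a lower bound.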

Should we deal with that?

A possible fix could be to empty these log files (except for the header) at the beginning of each day (when CamTracker is not running).
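A minimal sketch of such a daily cleanup, assuming the header is the first line of the file (the real CamTracker files may use a different number of header lines). The function name `truncate_log_keep_header` and the `header_lines` parameter are hypothetical and only serve the illustration.

```python
import os
import tempfile


def truncate_log_keep_header(path: str, header_lines: int = 1) -> None:
    """Drop all log lines but keep the first `header_lines` lines.

    Assumption: the header is the first line of the file. Check the
    actual file layout before using this on real logs.
    """
    with open(path, "rb") as f:
        header = [f.readline() for _ in range(header_lines)]

    # write to a temporary file first, then swap it in, so that a crash
    # in between never leaves a half-written log file behind
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.writelines(header)
    os.replace(tmp_path, path)


# usage example on a throwaway file (not a real CamTracker log)
with tempfile.TemporaryDirectory() as d:
    demo_path = os.path.join(d, "demo.dat")
    with open(demo_path, "w") as f:
        f.write("header\nline 1\nline 2\n")
    truncate_log_keep_header(demo_path)
    with open(demo_path, "r") as f:
        remaining = f.read()
```

Replacing the file can fail while another process still has it open, so this should only run while CamTracker is stopped.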

In addition to emptying the file periodically, we can seek the last line directly instead of reading the whole file. See https://stackoverflow.com/a/54278929/8255842.

dostuffthatmatters · 2023-10-04T21:44:41Z

I will use the code from the link mentioned.

On my machine with a file of 256 MB, with no break between reads, this new way of reading these files is approximately 4000 times as fast as the old way. With breaks this number could be even higher because of cache evictions. Apart from the time, we obviously save a ton of read bandwidth.

import os

def read_last_line_stupid() -> str:
    """Reads the last non empty line of a file"""

    with open("a.txt", "r") as f:
        return f.read().strip("\n ").split("\n")[-1]


def read_last_line_smart() -> str:
    """Reads the last non empty line of a file"""

    with open("a.txt", "rb") as f:
        try:
            f.seek(-64, os.SEEK_END)
            while True:
                currently_last_block = f.read(64).strip(b"\n ")
                f.seek(-128, os.SEEK_CUR)
                if len(currently_last_block) > 0:
                    break
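            # walk backwards until a block containing a newline is found
            while b"\n" not in f.read(64):
                f.seek(-128, os.SEEK_CUR)
            # step back so the block holding that newline is read again;
            # otherwise the start of the last line could be cut off
            f.seek(-64, os.SEEK_CUR)
        except OSError:
            # reached the start of the file (e.g. a one-line file)
            f.seek(0)
        return f.read().decode().strip("\n ").split("\n")[-1]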

dostuffthatmatters · 2023-10-24T12:23:31Z

Script to test this:

import os
import time

with open("c.txt", "w") as f:
    f.write("abcdefg")

with open("c.txt", "rb") as f:
    f.seek(-1, os.SEEK_END)
    try:
        f.seek(-20, os.SEEK_CUR)
    except OSError:
        f.seek(0)
    print(f.read(1))

# exit(0)

with open("a.txt", "w") as f:
    f.write("a" * 63 + "\n")
    f.write("b" * 63 + "\n")
    f.write(("b" * 63 + "\n") * pow(2, 22))
    f.write("d" * 63 + "\n")
    f.write("e" * 63 + "\n")


def read_last_line_stupid() -> str:
    """Reads the last non empty line of a file"""

    with open("a.txt", "r") as f:
        return f.read().strip("\n ").split("\n")[-1]


def read_last_line_smart(ignore_trailing_whitespace: bool = True) -> str:
    """Reads the last non empty line of a file"""

    last_line: bytes = b""
    new_character: bytes = b""

    with open("a.txt", "rb") as f:
        f.seek(-1, os.SEEK_END)

        if ignore_trailing_whitespace:
            while f.read(1) in [b"\n", b" "]:
                try:
                    f.seek(-2, os.SEEK_CUR)
                except OSError:
                    # reached the beginning of the file
                    return ""

            f.seek(-1, os.SEEK_CUR)
            # now the cursor is right before the last
            # character that is not a newline or a space

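        while True:
            new_character = f.read(1)
            if new_character == b"\n":
                break
            last_line += new_character
            try:
                f.seek(-2, os.SEEK_CUR)
            except OSError:
                # reached the beginning of the file (one-line file)
                break

    # the bytes were collected back to front: reverse them before
    # decoding so that multi-byte UTF-8 characters stay intact
    return last_line[::-1].decode().strip()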


sum_stupid: float = 0
sum_smart: float = 0

for _ in range(10):
    t1 = time.time()
    a = read_last_line_stupid()
    t2 = time.time()
    b = read_last_line_smart()
    t3 = time.time()

    sum_stupid += t2 - t1
    sum_smart += t3 - t2

    assert a == b, f'"{a}" (a) != "{b}" (b)'

print(f"Stupid: {sum_stupid}")
print(f"Smart: {sum_smart}")
print(f"Speedup: x {sum_stupid / sum_smart:.2f}")
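The functions in this script hard-code a.txt, and the second one reads a single byte per call. A variant that takes the path as a parameter, reads larger chunks from the end, and also copes with empty and one-line files might look like the following. This is a sketch, not the code that was committed; `read_last_line` and `chunk_size` are names chosen here for illustration.

```python
import os
import tempfile


def read_last_line(path: str, chunk_size: int = 4096) -> str:
    """Return the last non-empty line of the file at `path`.

    Returns "" for empty or whitespace-only files.
    """
    with open(path, "rb") as f:
        position = f.seek(0, os.SEEK_END)
        tail = b""
        while position > 0:
            # move one chunk towards the start of the file and prepend it
            step = min(chunk_size, position)
            position -= step
            f.seek(position)
            tail = f.read(step) + tail
            # ignore trailing newlines/spaces, then look for the newline
            # that precedes the last line
            stripped = tail.rstrip(b"\r\n ")
            if b"\n" in stripped:
                return stripped.rsplit(b"\n", 1)[1].decode()
        # start of file reached without finding another newline
        return tail.strip(b"\r\n ").decode()


# usage example on a throwaway file; the tiny chunk size forces
# several backward steps
with tempfile.TemporaryDirectory() as d:
    demo_path = os.path.join(d, "demo.txt")
    with open(demo_path, "w") as f:
        f.write("first\nsecond\nthird\n\n")
    last = read_last_line(demo_path, chunk_size=4)
```

Splitting only at newline bytes keeps the decode step safe for UTF-8, because the newline byte never occurs inside a multi-byte character.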

dostuffthatmatters added the backlog not included in a planned update yet label Aug 27, 2022

dostuffthatmatters changed the title ~~What to do about large CamTracker logs files~~ Deal with large CamTracker log files Aug 27, 2022

dostuffthatmatters added Update: Performance Improvements and removed backlog not included in a planned update yet labels Sep 9, 2022

dostuffthatmatters added this to the 4.0.TBD - Performance Improvements milestone Nov 13, 2022

dostuffthatmatters added status:planned is assigned to a specific milestone type:feature scope:camtracker and removed update: performance improvements labels Nov 13, 2022

dostuffthatmatters added backlog not included in a planned update yet and removed status:planned is assigned to a specific milestone labels Feb 7, 2023

dostuffthatmatters added scope:camtracker and removed type:feature scope:camtracker labels Jul 10, 2023

dostuffthatmatters modified the milestones: 4.X.X - Performance Improvements, 4.1.0 - Upload & Other Improvements Oct 4, 2023

dostuffthatmatters added status:in-progress is being work on in some dev branch and removed backlog not included in a planned update yet labels Oct 4, 2023

dostuffthatmatters added a commit that referenced this issue Oct 4, 2023

#107 - Deal with large CamTracker log files (1)

18f43a1

Add function to read the last line efficiently

dostuffthatmatters added a commit that referenced this issue Oct 4, 2023

#107 - Deal with large CamTracker log files (2)

7c77ca1

Use function

dostuffthatmatters added status:implemented has been implemented in some dev branch and removed status:in-progress is being work on in some dev branch labels Oct 4, 2023

dostuffthatmatters added a commit that referenced this issue Oct 9, 2023

#107 - Deal with large CamTracker log files (3)

b9a09e6

Improve function to read last line

dostuffthatmatters self-assigned this Oct 25, 2023

dostuffthatmatters mentioned this issue Nov 5, 2023

Integration of 4.1.0 #197

Merged

dostuffthatmatters closed this as completed Dec 6, 2023
