Deal with large CamTracker log files #107

Closed
dostuffthatmatters opened this issue Aug 27, 2022 · 2 comments
Labels
scope:camtracker status:implemented has been implemented in some dev branch

Comments

@dostuffthatmatters
Member

dostuffthatmatters commented Aug 27, 2022

Right now, the CamTracker log files can get pretty big. On Md I just saw a size of more than 560 MB for the LEARN_Az_Elev.dat file and 225 MB for the SunIntensity.dat file.

The LEARN_Az_Elev.dat file uses about 65 bytes per log line and logs at a maximum rate of 0.2 lines per second. 16 hours of operation therefore produce about 11,500 log lines, or roughly 750 KB. Hence, the 560 MB correspond to roughly 750 such 16-hour days of logging, i.e. at least two years of operation.
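The estimate can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, and the 16 hours are treated as one day of operation.

```python
BYTES_PER_LINE = 65      # approximate size of one LEARN_Az_Elev.dat log line
LINES_PER_SECOND = 0.2   # maximum logging rate
HOURS_PER_DAY = 16       # assumed daily operating time

lines_per_day = LINES_PER_SECOND * HOURS_PER_DAY * 3600
bytes_per_day = lines_per_day * BYTES_PER_LINE
days_for_560_mb = 560e6 / bytes_per_day

print(f"{lines_per_day:.0f} lines per day")
print(f"{bytes_per_day / 1e3:.0f} KB per day")
print(f"{days_for_560_mb:.0f} days, about {days_for_560_mb / 365:.1f} years")
```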

Should we deal with that?

A possible fix could be to empty these log files (except for the header) at the beginning of each day (when CamTracker is not running).
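A minimal sketch of such a cleanup step, assuming the header consists of a fixed number of leading lines. The function name and the `header_lines` parameter are hypothetical; the real header length of the CamTracker files is not stated here and would have to be checked first.

```python
def truncate_log_keep_header(path: str, header_lines: int = 1) -> None:
    """Drop all data lines of a log file and keep only its header.

    `header_lines` is an assumption, not a documented CamTracker value.
    """
    # Read only the header lines instead of loading the whole file
    with open(path, "rb") as f:
        header = [f.readline() for _ in range(header_lines)]

    # Rewrite the file with the header only; this must only run
    # while CamTracker is not writing to the file
    with open(path, "wb") as f:
        f.writelines(header)
```

Working in binary mode leaves the line endings of the header untouched.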

In addition to emptying the file periodically, we can seek the last line directly instead of reading the whole file. See https://stackoverflow.com/a/54278929/8255842.

@dostuffthatmatters dostuffthatmatters added the backlog not included in a planned update yet label Aug 27, 2022
@dostuffthatmatters dostuffthatmatters changed the title from "What to do about large CamTracker logs files" to "Deal with large CamTracker log files" Aug 27, 2022
@dostuffthatmatters dostuffthatmatters added Update: Performance Improvements and removed backlog not included in a planned update yet labels Sep 9, 2022
@dostuffthatmatters dostuffthatmatters added this to the 4.0.TBD - Performance Improvements milestone Nov 13, 2022
@dostuffthatmatters dostuffthatmatters added backlog not included in a planned update yet and removed status:planned is assigned to a specific milestone labels Feb 7, 2023
@dostuffthatmatters dostuffthatmatters modified the milestones: 4.X.X - Performance Improvements, 4.1.0 - Upload & Other Improvements Oct 4, 2023
@dostuffthatmatters dostuffthatmatters added status:in-progress is being work on in some dev branch and removed backlog not included in a planned update yet labels Oct 4, 2023
@dostuffthatmatters
Member Author

dostuffthatmatters commented Oct 4, 2023

I will use the code from the Stack Overflow answer linked above.

On my machine with a file of 256 MB, with no break between reads, this new way of reading these files is approximately 4000 times as fast as the old way. With breaks this number could be even higher because of cache evictions. Apart from the time, we obviously save a ton of read bandwidth.

import os

def read_last_line_stupid() -> str:
    """Reads the last non empty line of a file"""

    with open("a.txt", "r") as f:
        return f.read().strip("\n ").split("\n")[-1]


def read_last_line_smart() -> str:
    """Reads the last non empty line of a file"""

    with open("a.txt", "rb") as f:
        try:
            f.seek(-64, os.SEEK_END)
            while True:
                currently_last_block = f.read(64).strip(b"\n ")
                f.seek(-128, os.SEEK_CUR)
                if len(currently_last_block) > 0:
                    break
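            while b'\n' not in f.read(64):
                f.seek(-128, os.SEEK_CUR)
            # rewind to the start of the block containing the newline, so a
            # last line that begins inside that block is not cut off
            f.seek(-64, os.SEEK_CUR)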
        except OSError:
            # catch OSError in case of a one line file
            f.seek(0)
        return f.read().decode().strip("\n").split("\n")[-1]

dostuffthatmatters added a commit that referenced this issue Oct 4, 2023
Add function to read the last line efficiently
dostuffthatmatters added a commit that referenced this issue Oct 4, 2023
@dostuffthatmatters dostuffthatmatters added status:implemented has been implemented in some dev branch and removed status:in-progress is being work on in some dev branch labels Oct 4, 2023
dostuffthatmatters added a commit that referenced this issue Oct 9, 2023
Improve function to read last line
@dostuffthatmatters
Member Author

Script to test this:

import os
import time

with open("c.txt", "w") as f:
    f.write("abcdefg")

with open("c.txt", "rb") as f:
    f.seek(-1, os.SEEK_END)
    try:
        f.seek(-20, os.SEEK_CUR)
    except OSError:
        f.seek(0)
    print(f.read(1))

# exit(0)

with open("a.txt", "w") as f:
    f.write("a" * 63 + "\n")
    f.write("b" * 63 + "\n")
    f.write(("b" * 63 + "\n") * pow(2, 22))
    f.write("d" * 63 + "\n")
    f.write("e" * 63 + "\n")


def read_last_line_stupid() -> str:
    """Reads the last non empty line of a file"""

    with open("a.txt", "r") as f:
        return f.read().strip("\n ").split("\n")[-1]


def read_last_line_smart(ignore_trailing_whitespace: bool = True) -> str:
    """Reads the last non empty line of a file"""

    last_line: bytes = b""
    new_character: bytes = b""

    with open("a.txt", "rb") as f:
        f.seek(-1, os.SEEK_END)

        if ignore_trailing_whitespace:
            while f.read(1) in [b"\n", b" "]:
                try:
                    f.seek(-2, os.SEEK_CUR)
                except OSError:
                    # reached the beginning of the file
                    return ""

            f.seek(-1, os.SEEK_CUR)
            # now the cursor is right before the last
            # character that is not a newline or a space

        while True:
            new_character = f.read(1)
            if new_character == b"\n":
                break
            last_line += new_character
            try:
                f.seek(-2, os.SEEK_CUR)
            except OSError:
                # reached the beginning of the file (one-line file)
                break

    # reverse the bytes before decoding so multi-byte characters stay intact
    return last_line[::-1].decode().strip()


sum_stupid: float = 0
sum_smart: float = 0

for _ in range(10):
    t1 = time.time()
    a = read_last_line_stupid()
    t2 = time.time()
    b = read_last_line_smart()
    t3 = time.time()

    sum_stupid += t2 - t1
    sum_smart += t3 - t2

    assert a == b, f'"{a}" (a) != "{b}" (b)'

print(f"Stupid: {sum_stupid}")
print(f"Smart: {sum_smart}")
print(f"Speedup: x {sum_stupid / sum_smart:.2f}")
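The benchmark above only exercises one large file with 64-byte lines. For edge cases (empty file, one-line file, trailing blank lines, lines longer than one block), a block-wise variant that takes the path as an argument is easier to check. The following is a rough sketch under those assumptions, not the function committed to the repository; `read_last_line` and `block_size` are illustrative names.

```python
import os


def read_last_line(path: str, block_size: int = 64) -> str:
    """Return the last non-empty line of a file, reading backwards in blocks."""
    tail = b""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        while position > 0:
            # Never step back beyond the beginning of the file
            step = min(block_size, position)
            position -= step
            f.seek(position)
            tail = f.read(step) + tail
            # Stop once a newline precedes the last non-whitespace content
            if b"\n" in tail.rstrip(b"\n "):
                break
    # Only the bytes after the last newline are decoded, so a block
    # boundary inside an earlier line cannot break the decoding
    return tail.rstrip(b"\n ").split(b"\n")[-1].decode()
```

Tracking the position explicitly avoids relying on `OSError` to detect the start of the file.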
