Deal with large CamTracker log files #107
dostuffthatmatters changed the title from "What to do about large CamTracker logs files" to "Deal with large CamTracker log files" on Aug 27, 2022
dostuffthatmatters added the "Update: Performance Improvements" label and removed the "backlog" (not included in a planned update yet) label on Sep 9, 2022
dostuffthatmatters added the "status:planned" (is assigned to a specific milestone), "type:feature", and "scope:camtracker" labels and removed the "update: performance improvements" label on Nov 13, 2022
dostuffthatmatters added the "backlog" (not included in a planned update yet) label and removed the "status:planned" (is assigned to a specific milestone) label on Feb 7, 2023
dostuffthatmatters added the "scope:camtracker" label and removed the "type:feature" and "scope:camtracker" labels on Jul 10, 2023
dostuffthatmatters modified the milestones: 4.X.X - Performance Improvements, 4.1.0 - Upload & Other Improvements on Oct 4, 2023
dostuffthatmatters added the "status:in-progress" (is being worked on in some dev branch) label and removed the "backlog" (not included in a planned update yet) label on Oct 4, 2023
I will use the code from the link mentioned. On my machine, with a 256 MB file and no break between reads, this new way of reading the files is approximately 4000 times as fast as the old way. With breaks between reads, that factor could be even higher because of cache evictions. Apart from the time, we obviously save a ton of read bandwidth.

```python
import os


def read_last_line_stupid() -> str:
    """Reads the last non-empty line of a file by reading the whole file."""
    with open("a.txt", "r") as f:
        return f.read().strip("\n ").split("\n")[-1]


def read_last_line_smart() -> str:
    """Reads the last non-empty line of a file by scanning backwards."""
    with open("a.txt", "rb") as f:
        try:
            # skip trailing 64-byte blocks that contain only newlines/spaces
            f.seek(-64, os.SEEK_END)
            while True:
                currently_last_block = f.read(64).strip(b"\n ")
                f.seek(-128, os.SEEK_CUR)
                if len(currently_last_block) > 0:
                    break
            # scan backwards, block by block, until a newline is found
            while b"\n" not in f.read(64):
                f.seek(-128, os.SEEK_CUR)
        except OSError:
            # catch OSError in case of a one-line file
            f.seek(0)
        return f.read().decode().strip("\n").split("\n")[-1]
```
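For reuse outside this snippet, the same idea can be parameterized; the following variant is purely illustrative (not the code from the referenced commit, and `path`/`block_size` are assumed names). It buffers the scanned blocks, so a last line longer than one block is returned intact:

```python
import os


def read_last_line_buffered(path: str, block_size: int = 64) -> str:
    """Backwards block scan that accumulates blocks into a buffer.

    Illustrative sketch only; not part of the original snippet.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        remaining = f.tell()
        buffer = b""
        while remaining > 0:
            step = min(block_size, remaining)
            remaining -= step
            f.seek(remaining, os.SEEK_SET)
            buffer = f.read(step) + buffer
            # once the buffer (minus trailing whitespace) contains a
            # newline, the complete last line has been read
            if b"\n" in buffer.strip(b"\n "):
                break
        return buffer.decode().strip("\n ").split("\n")[-1]
```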
dostuffthatmatters added a commit that referenced this issue on Oct 4, 2023: "Add function to read the last line efficiently"
dostuffthatmatters added the "status:implemented" (has been implemented in some dev branch) label and removed the "status:in-progress" (is being worked on in some dev branch) label on Oct 4, 2023
dostuffthatmatters added a commit that referenced this issue on Oct 9, 2023: "Improve function to read last line"
Script to test this:

```python
import os
import time

# sanity check: seeking past the beginning of a file raises OSError
with open("c.txt", "w") as f:
    f.write("abcdefg")
with open("c.txt", "rb") as f:
    f.seek(-1, os.SEEK_END)
    try:
        f.seek(-20, os.SEEK_CUR)
    except OSError:
        f.seek(0)
    print(f.read(1))
# exit(0)

# build a ~256 MB test file (2^22 lines of 64 bytes each, plus a few extra)
with open("a.txt", "w") as f:
    f.write("a" * 63 + "\n")
    f.write("b" * 63 + "\n")
    f.write(("b" * 63 + "\n") * pow(2, 22))
    f.write("d" * 63 + "\n")
    f.write("e" * 63 + "\n")


def read_last_line_stupid() -> str:
    """Reads the last non-empty line of a file by reading the whole file."""
    with open("a.txt", "r") as f:
        return f.read().strip("\n ").split("\n")[-1]


def read_last_line_smart(ignore_trailing_whitespace: bool = True) -> str:
    """Reads the last non-empty line of a file byte by byte, backwards.

    Note: assumes a non-empty file.
    """
    last_line: bytes = b""
    new_character: bytes = b""
    with open("a.txt", "rb") as f:
        f.seek(-1, os.SEEK_END)
        if ignore_trailing_whitespace:
            # skip over trailing newlines and spaces
            while f.read(1) in [b"\n", b" "]:
                try:
                    f.seek(-2, os.SEEK_CUR)
                except OSError:
                    # reached the beginning of the file
                    return ""
            f.seek(-1, os.SEEK_CUR)
        # now the cursor is right before the last
        # character that is not a newline or a space
        while True:
            new_character = f.read(1)
            if new_character == b"\n":
                break
            last_line += new_character
            try:
                f.seek(-2, os.SEEK_CUR)
            except OSError:
                # reached the start of a file whose first
                # line has no preceding newline
                break
    # characters were collected back to front, so reverse them
    return last_line.decode().strip()[::-1]


sum_stupid: float = 0
sum_smart: float = 0
for _ in range(10):
    t1 = time.time()
    a = read_last_line_stupid()
    t2 = time.time()
    b = read_last_line_smart()
    t3 = time.time()
    sum_stupid += t2 - t1
    sum_smart += t3 - t2
    assert a == b, f'"{a}" (a) != "{b}" (b)'
print(f"Stupid: {sum_stupid}")
print(f"Smart: {sum_smart}")
print(f"Speedup: x {sum_stupid / sum_smart:.2f}")
```
Right now, the CamTracker log files can get pretty big. On Md, I just saw a size of > 560 MB for the LEARN_Az_Elev.dat file and 225 MB for the SunIntensity.dat file.

The LEARN_Az_Elev.dat file uses about 65 bytes per log line and logs at a maximum rate of 0.2 lines per second. 16 hours of operation will therefore produce ~11,500 log lines, or ~750 KB. Hence, these 560 MB are logs from at least 2+ years of operation. Should we deal with that?
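As a quick sanity check of those numbers (pure arithmetic, using the figures quoted above):

```python
bytes_per_line = 65               # ~65 bytes per log line
lines_per_second = 0.2            # maximum logging rate
seconds_of_operation = 16 * 3600  # 16 hours of operation per day

lines_per_day = lines_per_second * seconds_of_operation  # 11,520 lines
kb_per_day = lines_per_day * bytes_per_line / 1000       # ~749 KB
days_for_560_mb = 560_000 / kb_per_day                   # ~748 days, i.e. 2+ years
print(f"{lines_per_day:.0f} lines/day, {kb_per_day:.0f} KB/day, "
      f"560 MB ≈ {days_for_560_mb:.0f} days of logs")
```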
A possible fix could be to empty these log files (except for the header) at the beginning of each day (when CamTracker is not running).
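A minimal sketch of that idea, assuming a single header line (the helper name and the header size are hypothetical):

```python
def empty_log_keeping_header(path: str, header_lines: int = 1) -> None:
    """Hypothetical helper: drop all log lines except the header."""
    with open(path, "r") as f:
        header = [f.readline() for _ in range(header_lines)]
    with open(path, "w") as f:
        f.writelines(header)


# e.g. once per day, while CamTracker is not running:
# empty_log_keeping_header("LEARN_Az_Elev.dat", header_lines=1)
```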
In addition to emptying the file periodically, we can seek to the last line directly instead of reading the whole file. See https://stackoverflow.com/a/54278929/8255842.
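The pattern from that answer looks roughly like this (a sketch, not the eventual implementation): seek near the end and walk backwards byte by byte until a newline is found:

```python
import os


def read_last_line(path: str) -> str:
    """Sketch of the backwards-seek approach from the linked answer."""
    with open(path, "rb") as f:
        try:
            f.seek(-2, os.SEEK_END)    # jump to the second-to-last byte
            while f.read(1) != b"\n":  # walk backwards until a newline
                f.seek(-2, os.SEEK_CUR)
        except OSError:
            f.seek(0)                  # one-line file: read from the start
        return f.readline().decode()
```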