Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support websockets and redirect python stdout to the socket #56

Closed
wants to merge 9 commits into from

Conversation

josephjclark
Copy link
Collaborator

I've opened this as a separate PR to the main server development stuff.

The main thread works fine, but when you call out to it from the CLI you don't see any of the internal python logging (see #51). Not the end of the world but it does make for a pretty poor UX when calling the server from our lovely CLI.

The plan then is to use a websocket instead of a regular post to the service, and have the websocket stream results down. So the server now supports a HTTP or WS interface and you can connect to any service with either one. This all works great.

The Difficulty

The problem is that python's stdout, although it'll be displayed in the server's log stream, doesn't appear in bun's stdout. So when python says print(), it goes to a different stream than buns stdout stream. Which means bun cannot trap it to send into the websocket.

This is a bit if a bummer. Without redirection support from node-calls-python, it means we have to use a child process to get the stdout. And even if we did that we might have to run some filters on the stdout, or find a way to make sure we're capturing the logs for the right job.

Solution

My solution, which is pretty nasty, is to redirect python logs to file and read that file back into node.

It goes like this:

  • I provide a utility API to get a logger
  • the logger will log both to a file and to stdout
  • Each "run" of a service will be given a unique path to log to
  • bun will listen for writes to the log file, pull out the latest, and launch it out of the websocket
  • When the job is done, it writes a special log line to tell bun that no more logs are coming
  • Bun will then resolve the promise and return the result (AFTER all the logs have run)
  • The log file is then removed

The worst bit of the solution is that the only way I've found to read the log back into node is this:

const child = spawn("tail", ["-f", logfile]);
  const rl = readline.createInterface({
    input: child.stdout
  });
  rl.on("line", (line) => {
    if (line === "***END***") {
    // ....
  });

Yep, I'm spawning a child process anyway.

Failed Attempts

The solution is probably the dumbest and worst solution I could fine. But it kinda works.

Here are other things I tried:

  • node-calls-python has no means of sending an IPC message between the python thread and the main thread. I suspect there's a way to do this but I don't think the maintainer has any interest and I do not have the knowledge.
  • I couldn't really work out any other way to send a message from the python process to the main node process
  • I did not try to emit the logs via HTTP. If the file logger is dumb and overcomplicated, then this seems dumber and overcomplicateder.
  • Bun and node both have file streaming APIs, but neither is designed for live/continuous reading. They'll stop streaming when they hit the end of the file.
  • Bun has a beautiful shell API, and instead of using node's child process (which remember is the very thing I'm trying to avoid with node-calls-python), I could do $`tail -f file.log` . But that also has no streaming interface - you have to await it, and of course tail -f won't return the promise.

Bit frustrating really.

Issues

Well, if this works we can close:

#55 #51

Further Work

We could probably release this when done... but I think there is a high chance that logs will interfere if two services happen to be running concurrently.

But I'd rather add either (a) a very simple blocking queue or (b) plug in workerpool and ensure each python context only runs one thing at a time.

I don't think b) is a huge amount of work and probably would need doing anyway at some stage.

@josephjclark josephjclark changed the base branch from main to bun-server May 9, 2024 17:26
@josephjclark
Copy link
Collaborator Author

We need to ensure that the logs from python are visible in the docker stdout. I think. And if not work out where/how to redirect them

@josephjclark josephjclark changed the base branch from bun-server to main May 15, 2024 14:05
@josephjclark
Copy link
Collaborator Author

This has got me thinking AGAIN about the long running python idea.

Right now, every python module creates a logger when it is inited:

logger = createLogger("code_generator.prompts")

This is then shared through the module scope through the module's lifetime.

This means that multiple calls to the same service will use the same logger instance. Unless the logger is re-initialised in main and passed through to each function. Would it get properly GC'd after the run or will references remain? Anyway - this feels like an awkward pattern and just the kind of constraint I don't want to put on devs.

As a result of this PR, that's a real problem, because each logger needs to be re-intialised to write to a different file. Ok, maybe main can do logger.initialize() and we can update the output handler, but this does feel kinda messy.

It's just the sort of problem that doesn't exist if we just spawned and destroyed the python process each time.

I could code a solution around this. But I'm realising that this approach makes contributors have to think about their python in a certain sort of way. What code needs to be re-initialized each time main is called? What secrets could potentially be held in memory, or maybe run stale? I just don't want our contributors worrying about these heavy engineering things.

So that might be it. I am 95% sure we're removing node-calls-python and just running a child process every time.

The websocket streaming stuff needs to stay, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

1 participant