Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iroh start & stop commands #360

Merged
merged 30 commits into from
Oct 21, 2022
Merged

iroh start & stop commands #360

merged 30 commits into from
Oct 21, 2022

Conversation

b5
Copy link
Member

@b5 b5 commented Oct 18, 2022

supersedes #340. closes n0-computer/beetle#166. closes n0-computer/beetle#158, closes n0-computer/beetle#211

re-re-re worked start & stop commands that effectively do this on start:

#/bin/bash
nohup iroh-store > /dev/null &
nohup iroh-p2p > /dev/null &
nohup iroh-gateway > /dev/null &

all daemons now use lockfiles, and all lockfiles contain the PID of the owning process. iroh stop uses this PID to send SIGINT to each process. iroh stop also makes a best effort to remove stray lockfiles.

TODO:

  • docs strings
  • windows support (in a follow-up PR)
  • smoke test on linux
  • tests, really of any kind
  • confirm this isn't breaking android builds

cc @fabricedesre, this modifies lockfiles, writing the PID to the lockfile itself. I'd like to confirm you're ok with these changes. I'd also like to propose we exit with a non-zero status when a new process can't acquire the lock.

@b5 b5 added the c-ctl label Oct 18, 2022
@b5 b5 self-assigned this Oct 18, 2022
@b5 b5 added this to the v0.1.0 milestone Oct 18, 2022
iroh-gateway/src/main.rs Outdated Show resolved Hide resolved
/// Attempt to remove a stray lock file that wasn't cleaned up, returns true
/// if a lock is successfully deleted, and will only attempt to delete if the
/// lock is not currently held
pub fn try_cleanup_dead_lock(prog_name: &str) -> Result<bool> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a test, and I need to review it when I'm not braindead, please scrutinize.

iroh-util/src/lock.rs Outdated Show resolved Hide resolved
iroh-util/src/lock.rs Outdated Show resolved Hide resolved
localops/Cargo.toml Outdated Show resolved Hide resolved
@fabricedesre
Copy link
Contributor

cc @fabricedesre, this modifies lockfiles, writing the PID to the lockfile itself. I'd like to confirm you're ok with these changes. I'd also like to propose we exit with a non-zero status when a new process can't acquire the lock.

Sure, that works for me. I'm not a fan of the binary pid format, but not a hill I want to die on either. Exiting with a non-zero status totally makes sense.

@b5 b5 marked this pull request as ready for review October 18, 2022 16:56
@b5
Copy link
Member Author

b5 commented Oct 18, 2022

@fabricedesre, sounds good. I've changed the format to be a serialized UFT-8 string representation.

@b5 b5 requested review from dignifiedquire and faassen and removed request for dignifiedquire October 18, 2022 16:57
@b5
Copy link
Member Author

b5 commented Oct 18, 2022

@faassen I'd appreciate you smoke testing this on linux, iirc you have a fedora distro lying around, right? I'll also be testing on ubuntu

@@ -0,0 +1,23 @@
//! iroh exit codes
Copy link
Member Author

@b5 b5 Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also move this into an errors package. I've just crammed it here for now, and don't think it's a big deal to move later once we start getting more rigorous about error categorization. Happy to move it now if someone wants to name the crate this belongs in

@faassen
Copy link
Contributor

faassen commented Oct 18, 2022

@faassen I'd appreciate you smoke testing this on linux, iirc you have a fedora distro lying around, right?

Yes, Fedora is my main platform so will do some testing tomorrow. In fact this command will be handy when I do manual testing with the CLI.

@b5
Copy link
Member Author

b5 commented Oct 18, 2022

note that it requires iroh binaries on $PATH. cargo xtask install-dev --build from project root should set you up

@ramfox
Copy link
Contributor

ramfox commented Oct 18, 2022

Finding two bugs:

  1. If you start up one of the binaries individually, the respective lock is not getting cleaned up on shutdown.
  2. This ignores cfg input, so you can't currently have more than one process running on a single machine. Which puts a big damper on local testing. (previously, you could just pass in a path to a custom config file & have the different binaries use different configuration to ensure no port/rpc conflicts & have multiple iroh setups running at the same time)

The second problem is a way bigger conversation: do we want folks to be able to run more than one setup on their machine, if so how do we want them to do it, where would non-default locks live, etc

@@ -0,0 +1,15 @@
[package]
name = "localops"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not private, should be iroh-localops for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we for now just put this functionality into iroh-util?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd heavily prefer iroh-localops, even just for the amount of conditional compilation that needs to go into this crate. Most of this work is about vetting dependencies for cross-platform compatibility

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does that change when putting it in utils? we already have a lot of crates :/

}

#[cfg(any(target_os = "macos", target_os = "linux"))]
fn stop_process(pid: u32) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make all apis that use u32s currently, use a new type Pid, that looks sth like this

#[cfg(unix)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pid(nix::Pid);
#[cfg(not(unix))]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pid; // TODO: fill in for each plattform when supported

#[cfg(unix)]
impl From nix::Pid for Pid {
 // .. 
}

impl std::fmt::Display for Pid {
 // ..
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've marked this as a TODO for later

//! ```

/// Alias for the numeric type that holds system exit codes.
pub type ExitCode = i32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use existing rust types instead of defining new ones :) https://doc.rust-lang.org/std/process/struct.ExitCode.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From that page:

ExitCode is intended to be consumed only by the standard library (via Termination::report()), and intentionally does not provide accessors like PartialEq, Eq, or Hash. Instead the standard library provides the canonical SUCCESS and FAILURE exit codes as well as From for ExitCode for constructing other arbitrary exit codes.

the definition is irritating:

pub struct ExitCode(imp::ExitCode);

this means you can't define a nonstandard exit code as a constant. I'd argue that std authors are pointing us in the right direction: these are iroh exit codes, and should have their own type.

I'm going to rename the type to IrohExitCode to further disambiguate, and await your clarification on why this isn't the right move 😄

Copy link
Contributor

@ramfox ramfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mac bug:
iroh start says "starting" and "success", but doesn't seem to create any lock files or launch any processes.

the individual binaries create and remove their lock files appropriately

if I start the individual binaries up, I can use iroh stop to shut them down

iroh stop also seems to have 4 output lines, but should only be reporting about 3 processes.

/// Failure to parse contents of lock file
#[error("Corrupt lock file contents at {0}")]
CorruptLock(PathBuf),
#[error("Cannot detrmine status of process holding this lock")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detrmine -> determine

@b5 b5 requested a review from ramfox October 21, 2022 16:28
Err(anyhow!("daemonizing processes on windows is not supported"))
}

pub fn stop(pid: i32) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for later: you don't really need this indirection, you can just cfg the public method


self.iter += 1;
if current == None {
self.iter = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it on purpose that this iterator is infinite?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it doesn't seem to be, I am confused about this logic here. what are you trying to do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean yes, but now I realize that that doesn't make sense b/c why would you re-use iterators in Rust. I'll remove this reset entirely

/// is already running.
/// The lock is released either when the object is dropped
/// or when the program stops.
pub fn acquire_or_exit(lock: &mut ProgramLock) -> &mut ProgramLock {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a method on ProgramLock

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused, why does this exist here and down below again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried a few variations, and couldn't get the runtime to hang onto the reference at the call site:

what I wanted:

fn main() {
  let lock = ProgramLock::new("iroh-gateway")?.acquire_or_exit();

which caused the compiler to yell about an unused variable. changing lock -> _lock means the lock is Dropd right away, an makes the lock not work at all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 forgot to remove this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lock -> _lock means the lock is Dropd right away,

That is not true, this is how drop guards work, they are kept for the current scope. It is only dropped immediately if it is _.

.unwrap_or_else(|| std::ffi::OsStr::new(""))
.to_str()
.unwrap()
.split('.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to split on . here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.lock extension in the file name. There's a forthcoming method in rust nightly that'll make this much cleaner

if let Ok(file) = File::create(&self.path) {
file_guard::try_lock(&file, Lock::Exclusive, 0, 1).is_err()
// path exists, examine lock PID
let pid = read_lock(&self.path)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a lot of duplication from is_locked

#[test]
fn test_locks() {
use nix::unistd::{fork, ForkResult::*};
use std::io::{Read, Write};
use std::time::Duration;

// Start we no lock file.
// Start with no lock file.
let _ = std::fs::remove_file("test1.lock");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well, please use a temp directory

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is more complicated than it seems. I'll make an issue & address in a follow up

/// running.
pub async fn start(api: &impl Api) -> Result<()> {
// start_services(api, HashSet::from(["store"])).await
start_services(api, HashSet::from(["store", "p2p", "gateway"])).await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should really not be strings but an enum

});

// construct a new set from the intersection of missing & expected services
let missing_services: HashSet<&str> = services
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashSet::difference is your friend


/// stop all services by sending SIGINT to any active daemons identified
/// by lockfiles
pub async fn stop(api: &impl Api) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this take a configuration of services as well?

@b5 b5 merged commit 10378af into main Oct 21, 2022
@dignifiedquire dignifiedquire deleted the b5/iroh_start_simple branch February 20, 2023 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

when starting check for other instances iroh start command iroh stop command
5 participants