The file backend does not update the runtime config on file change #398

Open
jdoss opened this issue Feb 1, 2024 · 2 comments

jdoss commented Feb 1, 2024

Expected Behavior

The file backend should update the runtime config with new changes when the backend-file-path file is updated.

Current Behavior

It currently does not update the runtime config as far as I can tell. Running the following command:

smee -backend-kube-enabled=false -backend-file-enabled -backend-file-path=/opt/smee/example.yaml -syslog-enabled=false -dhcp-addr=0.0.0.0:67 -dhcp-ip-for-packet=172.16.0.22 -http-addr=172.16.0.22:80 -dhcp-http-ipxe-binary-url="http://172.16.0.22/ipxe/" -dhcp-http-ipxe-script-url="http://172.16.0.22/auto.ipxe" -dhcp-tftp-ip=172.16.0.22:69

with the following backend-file-path /opt/smee/example.yaml

03:cc:aa:c6:0b:36:
  ipAddress: "172.16.0.29"
  subnetMask: "255.255.240.0"
  defaultGateway: "172.16.0.17"
  nameServers:
    - "8.8.8.8"
    - "1.1.1.1"
  hostname: "infra-2"
  domainName: "example.com"
  broadcastAddress: "172.16.0.31"
  ntpServers:
    - "132.163.96.2"
    - "132.163.96.3"
  leaseTime: 86400
  domainSearch:
    - "example.com"
  netboot:
    allowPxe: true
    ipxeScriptUrl: "https://boot.netboot.xyz"

When I then make a change and save /opt/smee/example.yaml, the runtime config is not updated. For example, I have been changing the IP address from 172.16.0.29 to 172.16.0.28, and the server I am testing with always gets 172.16.0.29 from DHCP.

The info log message https://github.com/tinkerbell/smee/blob/main/internal/backend/file/file.go#L210 never shows up in stdout.

Possible Solution

Something is dun goofed up in https://github.com/tinkerbell/smee/blob/main/internal/backend/file/file.go

Steps to Reproduce (for bugs)

See above.

Context

I am trying to automate my datacenter deployment with iPXE booting in my rack and smee seems pretty great for that job. In the short term I plan on using the file backend to automate imaging Fedora CoreOS servers. Long term I hope that maybe a more programmatic backend like Redis could be added to make it easier to add and remove servers that need to be iPXE booted via Smee.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
root@compute-1:~# cat /etc/os-release 
NAME="Fedora Linux"
VERSION="39.20240104.3.0 (CoreOS)"
ID=fedora
VERSION_ID=39
PLATFORM_ID="platform:f39"
PRETTY_NAME="Fedora CoreOS 39.20240104.3.0"
  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

Podman and the CLI for now, while I test smee out for my use case. I am going to port it to Nomad with the Podman driver if I can get things working.

jdoss commented Feb 1, 2024

Looks like it works as expected on my Fedora 39 workstation that has btrfs for a file system on /home. My compute node has XFS for a file system.

$ podman run -it --rm --cap-add NET_ADMIN -v ../../backend/file/testdata/example.yaml:/smee/example.yaml:Z --network host quay.io/tinkerbell/smee:v0.11.0 -backend-file-enabled -backend-kube-enabled=false -http-addr=0.0.0.0:4588 -backend-file-path=/smee/example.yaml 
{"level":"info","ts":1706764194.8098412,"caller":"smee/main.go:124","msg":"starting","version":"02731c4"}
{"level":"info","ts":1706764194.809936,"caller":"smee/main.go:129","msg":"starting syslog server","bind_addr":"192.168.1.11:514"}
{"level":"info","ts":1706764194.809962,"caller":"smee/main.go:158","msg":"starting tftp server","bind_addr":"192.168.1.11:69"}
{"level":"info","ts":1706764194.8101096,"caller":"smee/main.go:220","msg":"serving http","addr":"0.0.0.0:4588","trusted_proxies":[]}
{"level":"info","ts":1706764194.8101823,"caller":"smee/main.go:233","msg":"starting dhcp server","bind_addr":"0.0.0.0:67"}
{"level":"info","ts":1706764194.8102882,"logger":"github.com/tinkerbell/ipxedust","caller":"ipxedust@v0.0.0-20231215220341-a535c5deb47a/ipxedust.go:201","msg":"serving iPXE binaries via TFTP","service":"github.com/tinkerbell/smee","addr":"192.168.1.11:69","blocksize":512,"timeout":5,"singlePortEnabled":true}
{"level":"info","ts":1706764194.8103902,"caller":"server/dhcp.go:35","msg":"Server listening on","addr":"0.0.0.0:67"}
{"level":"info","ts":1706764263.0691502,"caller":"file/file.go:210","msg":"file changed, updating cache"}
{"level":"info","ts":1706764263.0691965,"caller":"file/file.go:210","msg":"file changed, updating cache"}
{"level":"info","ts":1706764263.0700104,"caller":"file/file.go:210","msg":"file changed, updating cache"}
{"level":"info","ts":1706764263.0700002,"caller":"file/file.go:210","msg":"file changed, updating cache"}

That is super weird. I am not out of inotify user watches either.

root@compute-1:~# cat /proc/sys/fs/inotify/max_user_watches 
1048576
root@compute-1:~# lsof | grep inotify | wc -l
76

jdoss commented Feb 1, 2024

Just vim things....

So on my compute nodes I do most of my file editing with vim and I use vscode on my workstation. This is why I couldn't reproduce the problem locally.

Here is a file edit with nano. It writes to the file as expected.

# podman run -it --rm -v /opt/inotify_test:/test:Z docker.io/library/golang:latest bash
root@92a50b52246d:/go# git clone https://github.com/fsnotify/fsnotify.git
Cloning into 'fsnotify'...
remote: Enumerating objects: 2439, done.
remote: Counting objects: 100% (239/239), done.
remote: Compressing objects: 100% (111/111), done.
remote: Total 2439 (delta 185), reused 168 (delta 125), pack-reused 2200
Receiving objects: 100% (2439/2439), 795.17 KiB | 5.93 MiB/s, done.
Resolving deltas: 100% (1652/1652), done.
root@92a50b52246d:/go# cd fsnotify/
root@92a50b52246d:/go/fsnotify# go run ./cmd/fsnotify watch /test/joe 
go: downloading golang.org/x/sys v0.4.0
05:59:54.0437 ready; press ^C to exit
06:00:00.6919   1 WRITE         "/test/joe"
06:00:00.6920   2 WRITE         "/test/joe"
06:00:12.6677   3 WRITE         "/test/joe"
06:00:12.6678   4 WRITE         "/test/joe"
^Csignal: interrupt

If I use nano to edit /opt/smee/example.yaml, we can see that smee reloads the data as expected:

# podman run -it --rm --cap-add NET_ADMIN -v /opt/smee/example.yaml:/smee/example.yaml:Z --network host quay.io/tinkerbell/smee:v0.11.0 -backend-kube-enabled=false -backend-file-enabled -backend-file-path=/smee/example.yaml -dhcp-addr=0.0.0.0:67 -dhcp-ip-for-packet=172.16.0.22 -http-addr=0.0.0.0:80 -dhcp-http-ipxe-binary-url="http://172.16.0.22/ipxe/" -dhcp-http-ipxe-script-url="http://172.16.0.22/auto.ipxe" -dhcp-tftp-ip=0.0.0.0:69 -log-level=debug
{"level":"info","ts":1706766884.5203207,"caller":"smee/main.go:124","msg":"starting","version":"02731c4"}
{"level":"info","ts":1706766884.5204117,"caller":"smee/main.go:129","msg":"starting syslog server","bind_addr":"172.16.0.22:514"}
{"level":"info","ts":1706766884.5204427,"caller":"smee/main.go:158","msg":"starting tftp server","bind_addr":"172.16.0.22:69"}
{"level":"info","ts":1706766884.520599,"caller":"smee/main.go:220","msg":"serving http","addr":"0.0.0.0:80","trusted_proxies":[]}
{"level":"info","ts":1706766884.5206847,"caller":"smee/main.go:233","msg":"starting dhcp server","bind_addr":"0.0.0.0:67"}
{"level":"info","ts":1706766884.520827,"caller":"server/dhcp.go:35","msg":"Server listening on","addr":"0.0.0.0:67"}
{"level":"info","ts":1706766884.5208106,"logger":"github.com/tinkerbell/ipxedust","caller":"ipxedust@v0.0.0-20231215220341-a535c5deb47a/ipxedust.go:201","msg":"serving iPXE binaries via TFTP","service":"github.com/tinkerbell/smee","addr":"172.16.0.22:69","blocksize":512,"timeout":5,"singlePortEnabled":true}
{"level":"info","ts":1706766893.4178863,"caller":"file/file.go:210","msg":"file changed, updating cache"}
{"level":"info","ts":1706766893.4179363,"caller":"file/file.go:210","msg":"file changed, updating cache"}
{"level":"info","ts":1706766893.4181042,"caller":"file/file.go:210","msg":"file changed, updating cache"}
{"level":"info","ts":1706766893.4181373,"caller":"file/file.go:210","msg":"file changed, updating cache"}

Here is the same test with vim. Judging by the events below, vim renames the original file to a backup (joe~) and then writes the new contents to a newly created file at the original path. You can see everything it does in the directory when I edit the file.

root@92a50b52246d:/go/fsnotify# go run ./cmd/fsnotify watch /test     
06:01:09.7768 ready; press ^C to exit
06:01:12.2588   1 CREATE        "/test/.joe.swp"
06:01:12.2589   2 CREATE        "/test/.joe.swpx"
06:01:12.2589   3 REMOVE        "/test/.joe.swpx"
06:01:12.2590   4 REMOVE        "/test/.joe.swp"
06:01:12.2590   5 CREATE        "/test/.joe.swp"
06:01:12.2590   6 WRITE         "/test/.joe.swp"
06:01:12.2590   7 CHMOD         "/test/.joe.swp"
06:01:15.5136   8 WRITE         "/test/.joe.swp"
06:01:20.8006   9 CREATE        "/test/4913"
06:01:20.8007  10 CHMOD         "/test/4913"
06:01:20.8007  11 REMOVE        "/test/4913"
06:01:20.8007  12 RENAME        "/test/joe"
06:01:20.8007  13 CREATE        "/test/joe~"
06:01:20.8008  14 CREATE        "/test/joe"
06:01:20.8008  15 WRITE         "/test/joe"
06:01:20.8010  16 WRITE         "/test/joe"
06:01:20.8018  17 CHMOD         "/test/joe"
06:01:20.8018  18 CHMOD         "/test/joe"
06:01:20.8018  19 CHMOD         "/test/joe"
06:01:20.8019  20 WRITE         "/test/.joe.swp"
06:01:20.8020  21 REMOVE        "/test/joe~"
06:01:20.8035  22 REMOVE        "/test/.joe.swp"

If we just watch the file, fsnotify records the above as a RENAME instead of a WRITE, which I assume is why smee is not picking up my changes when using vim.

root@8d0ad50f38c2:/go/src/fsnotify# go run ./cmd/fsnotify watch /test/joe 
05:52:47.1502 ready; press ^C to exit
05:53:01.1088   1 RENAME        "/test/joe"
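
As a rough illustration of why a watch on the file itself goes quiet: inotify watches are attached to the underlying inode rather than to the path, so a save that swaps in a new file leaves the watch on the old one. The sketch below uses only the Go standard library, not fsnotify or smee code. The helper names (saveInPlace, saveViaBackupRename, sameFileAfterSave) are made up for this example, and the backup-and-rename sequence is an assumption based on the vim event log above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// saveInPlace overwrites the existing file. os.WriteFile truncates and
// rewrites the file that already exists at path.
func saveInPlace(path string, data []byte) error {
	return os.WriteFile(path, data, 0o644)
}

// saveViaBackupRename mimics the sequence seen in the vim event log (an
// assumption for illustration): rename the original to a backup, create a
// new file at the original path, then remove the backup.
func saveViaBackupRename(path string, data []byte) error {
	backup := path + "~"
	if err := os.Rename(path, backup); err != nil {
		return err
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return err
	}
	return os.Remove(backup)
}

// sameFileAfterSave creates a scratch file, applies save to it, and reports
// whether the path still refers to the same underlying file afterwards.
func sameFileAfterSave(save func(string, []byte) error) (bool, error) {
	dir, err := os.MkdirTemp("", "inode-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "joe")
	if err := os.WriteFile(path, []byte("ipAddress: 172.16.0.29\n"), 0o644); err != nil {
		return false, err
	}
	before, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	if err := save(path, []byte("ipAddress: 172.16.0.28\n")); err != nil {
		return false, err
	}
	after, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return os.SameFile(before, after), nil
}

func main() {
	cases := []struct {
		name string
		save func(string, []byte) error
	}{
		{"in-place write", saveInPlace},
		{"backup + rename", saveViaBackupRename},
	}
	for _, c := range cases {
		same, err := sameFileAfterSave(c.save)
		if err != nil {
			fmt.Println(c.name, "error:", err)
			continue
		}
		fmt.Printf("%-16s same file afterwards: %v\n", c.name, same)
	}
}
```

If the second case reports a different file, that would match what the log shows: the watcher is left holding the renamed original and never sees writes to the new file at the same path.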

The only improvement I can think of is to change the fsnotify watch so that it also reloads the config when it sees a RENAME. This issue has been discussed a lot on the fsnotify GitHub issues page: fsnotify/fsnotify#372 covers the problem in depth, and https://github.com/fsnotify/fsnotify?tab=readme-ov-file#watching-a-file-doesnt-work-well also covers it and includes an example of a better way to watch a file. That may be worth checking out if this is to be addressed in the future.
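
A minimal sketch of the "watch the parent directory and filter by file name" idea from the fsnotify README, written against stand-in types so it runs with the standard library alone. Op, Event, and shouldReload are hypothetical names for this example, not the fsnotify API or smee's code. It also assumes that reloading on CREATE or WRITE of the target is the desired behavior, because reacting to the RENAME itself could try to read the path before vim has recreated the file.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Op is a hypothetical stand-in for a file watcher's operation bitmask.
type Op uint32

const (
	Create Op = 1 << iota
	Write
	Remove
	Rename
	Chmod
)

// Event is a hypothetical stand-in for a watcher event: a path plus ops.
type Event struct {
	Name string
	Op   Op
}

// shouldReload reports whether an event observed while watching the parent
// directory means that target now has new content worth re-reading.
func shouldReload(target string, ev Event) bool {
	if filepath.Clean(ev.Name) != filepath.Clean(target) {
		return false // swap files, backups, and other siblings are ignored
	}
	// CREATE covers the file being recreated after a rename; WRITE covers
	// in-place edits. RENAME, REMOVE, and CHMOD alone carry no new content.
	return ev.Op&(Create|Write) != 0
}

func main() {
	target := "/test/joe"
	// A shortened version of the vim event sequence from the log above.
	events := []Event{
		{"/test/.joe.swp", Write},
		{"/test/4913", Create},
		{"/test/joe", Rename},
		{"/test/joe~", Create},
		{"/test/joe", Create},
		{"/test/joe", Write},
		{"/test/joe", Chmod},
	}
	for _, ev := range events {
		fmt.Printf("%-16s op=%d reload=%v\n", ev.Name, ev.Op, shouldReload(target, ev))
	}
}
```

With a real watcher, the same filter would be applied to events from a watch on the directory containing the config file rather than on the file itself. A single save can produce several matching events, so a caller may want to coalesce reloads.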

I'll leave this issue open so you folks can figure out what you want to do here. I am OK with closing it, since this is not really a bug but a really, really stupid edge case caused by vim or any other editor that does atomic updates to files.
