Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[microTVM] Zephyr: Add support for FVP #12125

Merged
merged 1 commit into from
Aug 15, 2022

Conversation

mkatanbaf
Copy link
Contributor

@mkatanbaf mkatanbaf commented Jul 19, 2022

This PR adds corstone300 FVP to the platforms supported by the zephyr. We use the Iris debugger to communicate with the emulator via semihosting, due to the FVP serial port's faulty behavior (It might drop a byte in messages longer than 1 byte).
This PR also changes the generated micro-projects build system from make to ninja.

cc @alanmacd @gromero @mehrdadh

@mkatanbaf mkatanbaf force-pushed the zephyr-fvp-support branch 6 times, most recently from 1635136 to 4801af4 Compare July 25, 2022 18:57
Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mkatanbaf , left some comments

@mkatanbaf
Copy link
Contributor Author

cc @gromero

@mkatanbaf mkatanbaf force-pushed the zephyr-fvp-support branch 4 times, most recently from 12b6a69 to a0c0b05 Compare August 4, 2022 06:33
@mkatanbaf mkatanbaf marked this pull request as ready for review August 4, 2022 17:17
@mkatanbaf
Copy link
Contributor Author

cc @mehrdadh

@mehrdadh mehrdadh mentioned this pull request Aug 5, 2022
Copy link
Member

@mehrdadh mehrdadh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkatanbaf This is really awesome! thanks for working on this.
Mostly good, I added some comments

tests/scripts/task_python_microtvm.sh Outdated Show resolved Hide resolved
@mkatanbaf mkatanbaf force-pushed the zephyr-fvp-support branch 2 times, most recently from d2d8a79 to ae50647 Compare August 8, 2022 21:39
Copy link
Member

@mehrdadh mehrdadh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Since this PR has significant changes I suggest to test it with microtvm hardware CI as well before merging.

@mkatanbaf
Copy link
Contributor Author

thanks, @mehrdadh. I tested this commit with microTVM hardware CI, and compared the errors against nightly build 316. No new error was introduced.

Copy link
Contributor

@gromero gromero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkatanbaf Thanks for the persistence on making it work despite the issues you found when trying to communicate with the target via FVP's buggy serial-TCP-IP port. think the PR body -- so the final commit message too - could at least mention that due to such FVP serial port limitation the semihosting approach was used as an alternative to talk to the target mps3 board.

Overall LGTM! There are just a couple o minor issues I'd like to address and clarify -- please see comments inline.


env = dict(os.environ)
if self._is_fvp(zephyr_board, emu_platform == "armfvp"):
env["ARMFVP_BIN_PATH"] = str(API_SERVER_DIR / "fvp-hack")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can avoid str() cast since API_SERVER_DIR is already of pathlib.Path() type.

Why it's necessary to chmod the FVP_Corstone_SSE-300_Ethos-U55 bash script? It's already add as an executable to the repo.

So, if all this holds, I think the code from like 631 to 637 can be simply something like:

-            env["ARMFVP_BIN_PATH"] = str(API_SERVER_DIR / "fvp-hack")
-            env["ARMFVP_BIN_PATH"] = os.path.realpath(env["ARMFVP_BIN_PATH"])
-            st = os.stat(env["ARMFVP_BIN_PATH"] + "/FVP_Corstone_SSE-300_Ethos-U55")
-            os.chmod(
-                env["ARMFVP_BIN_PATH"] + "/FVP_Corstone_SSE-300_Ethos-U55",
-                st.st_mode | stat.S_IEXEC,
-            )
+            env["ARMFVP_BIN_PATH"] = (API_SERVER_DIR / "fvp-hack").resolve()

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FVP_Corstone_SSE-300_Ethos-U55 was not an executable on the CI and the tests were failing. I had to explicitly change that with chmod. It was OK on my local machine though!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I checked and the str cast is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm got it. odd. yeah it's ok on my local machine as well...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me see if I can understand what's happening, this is odd.

Copy link
Contributor

@gromero gromero Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@driazati Any clue why this file here: https://github.com/apache/tvm/blob/ebd98ce8215eb7499ead0d509457fd432eca7d32/apps/microtvm/zephyr/template_project/fvp-hack/FVP_Corstone_SSE-300_Ethos-U55 (which is indeed an executable) could turn into a non executable when copied inside the CI container as @mkatanbaf said above? It's copied this way to the new location:

shutil.copytree(API_SERVER_DIR / "fvp-hack", project_dir / "fvp-hack")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the script gets sent between a build and a test step in Jenkins it goes through AWS S3 which doesn't have any notion of file permissions, so we add them back in here: https://github.com/apache/tvm/blob/main/ci/jenkins/Build.groovy.j2#L36-L43. Changing that is a little tricky since those changes wont be run in CI unless the source branch is under apache/tvm and not from a fork, so keeping the chmod in code here seems reasonable for now. We can follow up with a change that cleans it up and moves it out to Build.groovy.j2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkatanbaf Could you please add a comment on top of the chmod code explaining it's a CI requirement as per David's explanation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -344,6 +357,18 @@ def _get_nrf_device_args(options):
type="str",
help="Path to the CMSIS directory.",
),
server.ProjectOption(
"arm_fvp_path",
optional=["generate_project"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this options is accessed directly in open() (so used by open_transport when FVP is used), hence it seems a required option for open_transport Project API method? Could you please clarify its domain / use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the "open_transport" to the list of domains the option is used.

args.append("-v")
args.append("run")
env = dict(os.environ)
env["FVP_BIN_PATH"] = str(pathlib.Path(self.options["arm_fvp_path"]).parent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add arm_fvp_path as required or optional in the server options? It seems required here -- or is a check missing that would use options.get()? But if it's a required option by FVP I'm not sure how to add it since this option doesn't apply to QEMU or other boards that don't use FVP. I think currently there is no way to inform that complexity using the server options ... so if it's indeed require at least assert or throw an error if arm_fvp_path is not given here?

I also wonder why we have both FVP_BIN_PATH and ARMFVP_BIN_PATH here. The fvp-hack script uses FVP_BIN_PATH. Couldn't we use only the ARMFVP_BIN_PATH here and in the fvp-hack. In this case the Python code here would set ARMFVP_BIN_PATH to use the fvp-hack script and the fvp-hack, by its turn, would be called by Zephyr build / run system (when subprocess() is called). Then the fvp-hack, when called by Zephyr, would add the iris flags etc?

Or at least add a comment why there are two similar bin paths to the FVP, saying how that's used by the fvp-hack script..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding the second point, we can only use the ARMFVP_BIN_PATH. The other one was left-over code from my attempts to understand why tests were failing on CI addressed here. I removed the FVP_BIN_PATH.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding the first point,you are correct to say that the option is required in open, but only for the FVP projects. (we still need the arm_fvp_path in open transport in _import_iris function. ) As you suggested, I added an assert there to make it clear.

@gromero gromero changed the title Zephyr fvp support [microTVM] Zephyr: Add support for FVP Aug 11, 2022
Copy link
Contributor

@gromero gromero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkatanbaf Thanks for v3. I think that after addressing the unnecessary startwith (see comment in line) and once the CI gets green the PR is ready to land.

emu_platform = None
with open(API_SERVER_DIR / CMAKELIST_FILENAME) as cmake_f:
for line in cmake_f:
if line.startswith("set(EMU_PLATFORM"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkatanbaf This line can be removed now that re.match is in place.

Copy link
Contributor

@gromero gromero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkatanbaf Thanks for v4. LGTM!

@gromero
Copy link
Contributor

gromero commented Aug 12, 2022

@tvm-bot rerun

@gromero
Copy link
Contributor

gromero commented Aug 12, 2022

@mkatanbaf @areusch @mehrdadh I've re-triggered the tests, but apparently @mkatanbaf have done it already once... it seems that one of the FVP processes got stuck for about ~3h:

[2022-08-12T07:40:53.224Z] INFO:__main__:dummy log...
[2022-08-12T07:40:53.224Z] 
[2022-08-12T07:40:53.224Z] INFO:__main__:microTVM Zephyr runtime - running
[2022-08-12T07:40:53.224Z] 
[2022-08-12T07:40:53.224Z] INFO:__main__:IRIS semihosting initialized.
[2022-08-12T07:40:53.224Z] [07:40:53] /workspace/src/runtime/micro/micro_session.cc:368: remote: microTVM Zephyr runtime - running
[2022-08-12T10:25:46.234Z] Sending interrupt signal to process
[2022-08-12T10:25:50.665Z] script returned exit code 143

:(

Full log here: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-12125/runs/55/nodes/432/steps/1475/log/?start=0

@mkatanbaf
Copy link
Contributor Author

@gromero Thanks for re-triggering the tests. This has happened locally a few times before (out of many runs!), and I talked with @areusch about it. This is difficult to reproduce, I ran the tests in a loop overnight for the past few nights, and this issue didn't come up!

@mkatanbaf
Copy link
Contributor Author

There are some failing ethosu codegen tests on the rerun, but I don't think those are caused by this PR

@gromero
Copy link
Contributor

gromero commented Aug 12, 2022

@mkatanbaf Got it! Well, feels like déjà-vu to me (when using QEMU). When you ran locally in a loop did you use any container? Anyways, I know how hard it can be. But how about the last error in test_binary_add_with_non_4d_shapes :

https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-12125/runs/56/nodes/432/steps/1490/log/?start=0

It seems not related to that change, so I'm confused...

@gromero
Copy link
Contributor

gromero commented Aug 12, 2022

There are some failing ethosu codegen tests on the rerun, but I don't think those are caused by this PR

@mkatanbaf yeah I agree. So it means the CI is not "quite" stable ... and that other glitches can happen :(

@gromero
Copy link
Contributor

gromero commented Aug 12, 2022

We can re-trigger and hope for the best, but I wonder if the first issue -- in FVP hanging -- is a blocker to merge it. @areusch Thoughts? :)

@mkatanbaf
Copy link
Contributor Author

mkatanbaf commented Aug 12, 2022

@mkatanbaf Got it! Well, feels like déjà-vu to me (when using QEMU). When you ran locally in a loop did you use any container?

yes, I'm using the cortexm docker container (formerly qemu). What I've seen (and trying to reproduce) is that it gets stuck when it tries to close the transport. My guess is that something prevents the subprocess termination.

adds corstone300 FVP to the platforms supported by the zephyr. We use the Iris debugger to communicate with the emulator via semihosting, due to the FVP serial port's faulty behavior.
also changes the generated micro-projects build system from make to ninja.
@mkatanbaf
Copy link
Contributor Author

@tvm-bot rerun

4 similar comments
@mkatanbaf
Copy link
Contributor Author

@tvm-bot rerun

@mkatanbaf
Copy link
Contributor Author

@tvm-bot rerun

@mkatanbaf
Copy link
Contributor Author

@tvm-bot rerun

@mkatanbaf
Copy link
Contributor Author

@tvm-bot rerun

@areusch
Copy link
Contributor

areusch commented Aug 15, 2022

ok let's give this a shot, we can always disable the test in CI if it becomes flaky.

@areusch areusch merged commit 71d772f into apache:main Aug 15, 2022
@gromero
Copy link
Contributor

gromero commented Aug 16, 2022

@areusch really?! We did that for the mps3 board with QEMU ...

@areusch
Copy link
Contributor

areusch commented Aug 18, 2022

@gromero i think we should try it, and CI Monitoring rotation will catch any flakes a lot faster.

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
adds corstone300 FVP to the platforms supported by the zephyr. We use the Iris debugger to communicate with the emulator via semihosting, due to the FVP serial port's faulty behavior.
also changes the generated micro-projects build system from make to ninja.

Co-authored-by: Andrew Reusch <areusch@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants