First Class Python Support (4000USD Bounty) #3928

Open
17 of 33 tasks
lihaoyi opened this issue Nov 9, 2024 · 37 comments · Fixed by #4000
Labels
bounty pythonlib Issues related to Mill's python support

Comments

@lihaoyi
Member

lihaoyi commented Nov 9, 2024


From the maintainer Li Haoyi: I'm putting a 4000USD bounty on this issue, payable by bank transfer on merged PRs implementing this. Standard bounty terms apply


Python is common in industry, and lacks good build tools. Mill could be a good tool to help manage large multi-module Python projects (see What Makes Mill Unique?), providing automatic caching and parallelism for the various workflows (dependency resolution, typechecking, testing, packaging, publishing) that are core to the local development experience, helping keep them fast and responsive.

The goal of this ticket is to generate a set of Python build examples that match the Java/Scala/Kotlin equivalents. We already have an example PythonModule for demo purposes, but we would need to flesh it out using the equivalent Python tools and libraries.

For the purposes of this ticket, each example should match the Java/Scala/Kotlin equivalents as closely as possible, to provide a useful minimal-but-still-educational code example, along with associated English documentation and explanations. You should read through the relevant sections of the Building Java with Mill documentation before proceeding, even if you don't know Java, just to get a feel for what the documentation and examples for each section should cover.

  1. example/pythonlib/basic/ (500USD)

    • 1-simple/: A minimal Python module demonstrating typechecking/running/testing/pex
      • We should also demonstrate how to use the Python REPL hooked up to your module
    • 2-custom-build-logic/: A Python module with custom build logic
    • 3-multi-modules/: Multiple inter-related Python modules
  2. example/pythonlib/dependencies/ (500USD)

    • 1-pip-deps/
    • 2-unmanaged-wheels/
    • 3-downloading-unmanaged-wheels/
    • 4-repository-config/: examples of how to use alternate PyPI mirrors or repositories
  3. example/pythonlib/linting/ (1000USD)

    • 1-black/: equivalent to 1-scalafmt but using Black
    • 2-code-coverage/: not sure what the popular Python code coverage lib is, but they should have something
    • 3-ruff/: using Ruff to lint the Python code
  4. example/pythonlib/module/ (500USD)

    • 1-common-config/
    • 2-custom-tasks/
    • 3-override-tasks/
    • 4-compilation-execution-flags/
    • 5-resources/
    • 6-pex-config/
  5. example/pythonlib/testing/ (500USD)

    • 1-test-suite/: we should have examples of Python being used with at least two different testing frameworks: pytest and unittest
    • 2-test-deps/
    • 3-integration-suite/
  6. example/pythonlib/publishing/1-publish-module/, example/pythonlib/basic/4-realistic/ (500USD)

    • publishing/1-publish-module should demonstrate how to publish a Python package to PyPI
    • basic/4-realistic is only doable after all the previous bullets are done, so I'm grouping its bounty together with publishing/
  7. example/pythonlib/web/ (500USD)

    • 1-hello-flask/: hello world website using Flask framework
    • 2-todo-flask/: TodoMVC webapp using Flask framework
  8. example/pythonlib/web/ (500USD)

    • 3-hello-django/: hello world website using Django framework
    • 4-todo-django/: TodoMVC webapp using Django framework
@himanshumahajan138
Contributor

himanshumahajan138 commented Nov 10, 2024

@lihaoyi I'm ready for this, but I'll need your guidance along the way.

I think we should do this in parts...

@lihaoyi
Member Author

lihaoyi commented Nov 10, 2024

@himanshumahajan138 yes, will need to think about the best way to approach this. It would be a pretty large scale effort and will definitely need to be broken down into parts

@lihaoyi lihaoyi changed the title First Class Python Support (??? Bounty) First Class Python Support (3500 Bounty) Nov 13, 2024
@lihaoyi lihaoyi changed the title First Class Python Support (3500 Bounty) First Class Python Support (3500USD Bounty) Nov 13, 2024
@lihaoyi lihaoyi added the bounty label Nov 13, 2024
@lihaoyi lihaoyi changed the title First Class Python Support (3500USD Bounty) First Class Python Support (5000USD Bounty) Nov 13, 2024
@lihaoyi lihaoyi changed the title First Class Python Support (5000USD Bounty) First Class Python Support (4000USD Bounty) Nov 13, 2024
@himanshumahajan138
Contributor

Aaah! Great approach for subparts

I'm working on this. It will require a large codebase, so I will start small and build up.

Let's do this ✨

@himanshumahajan138
Contributor

himanshumahajan138 commented Nov 13, 2024

@lihaoyi I want (and need) to create a proper PythonModule.scala based on the code from the last PR (#3882), and I need some time to build it out fully for each of the above-mentioned tasks.

So, to avoid clashes, if anyone else is trying to do this, please leave a comment first...
(I know bounties are available to everyone, but if possible, please assign this to me, since code written by one person keeps a consistent flow 🤞)

@mpkocher

@lihaoyi Perhaps it would be useful to clarify how pyproject.toml fits (or doesn't fit) into your vision for adding Python support to Mill.

https://peps.python.org/pep-0621/

@lihaoyi
Member Author

lihaoyi commented Nov 14, 2024

@mpkocher I don't actually know yet, we'll probably need to figure that out as we flesh out the implementation

lihaoyi added a commit that referenced this issue Nov 15, 2024
This sets up the scaffolding for
#3928 and
#3927 to be implemented
@lihaoyi
Member Author

lihaoyi commented Nov 15, 2024

For anyone reading, I merged #3970 which sets up the basic scaffolding:

  • Code in pythonlib/src/
  • Unit test code in pythonlib/test/src/, run via ./mill pythonlib.test
  • Unit test code in example/pythonlib/, run via ./mill example.pythonlib.__.local.test

From there you should be able to iterate on the examples necessary for this ticket's bullet points to flesh out the necessary functionality

@himanshumahajan138
Contributor

@lihaoyi that's great, you just laid the foundation brick for the building ✨

@lihaoyi
Member Author

lihaoyi commented Nov 15, 2024

Also, the example Python module walkthrough is mandatory reading for anyone who wants to start on this: https://mill-build.org/mill/extending/example-python-support.html

@himanshumahajan138
Contributor

@lihaoyi I will work on this, expanding the previous code, right after completing the Kotlin Spring Boot work.

@sideeffffect
Contributor

There's this new Python tool, https://rye.astral.sh, which may be a source of inspiration for Mill's support for Python. What is innovative about Rye, compared to previous Python tools, is that it aims to be a one-stop tool for anything related to working with a Python project:

  • not just managing packages/dependencies, or just
  • managing a venv, but crucially also
  • managing a Python version/implementation (there's e.g. PyPy besides CPython, etc)
  • running the project
  • running the tests in the project
  • publishing the package to a repository
  • running formatters
  • running linters
  • running type checkers
  • etc...

If Mill covered a broad spectrum of the workflows that people working with Python need, coupled with Mill's goodies like implicit parallelism and aggressive caching, it could be a tempting proposition for at least part of the Python community. Good luck with this! 🤞

@himanshumahajan138
Contributor

himanshumahajan138 commented Nov 17, 2024

@lihaoyi I was working on this and got stuck on the testing part. According to the official Python docs, Python uses the unittest and pytest libraries for this purpose,
but when I checked the test folder I saw that you used a different approach (similar to what is done in javalib and the others).

So the question is: which approach should we follow?
If you are targeting something like javalib and the others, could you give some guidance or a link to Mill's testing docs?

By the way, I have used unittest for now; we can polish it once I raise the PR...

Thanks...

@lqhuang

lqhuang commented Nov 17, 2024

> There's this new Python tool https://rye.astral.sh/

Features of rye have all merged into uv https://docs.astral.sh/uv/. uv is promising and definitely the best practice for now.

> Python use unittest and pytest library for this purpose

Yeah, unittest is part of the built-in standard library. For a demo project it's quick and easy, though almost every mainstream library adopts pytest as its test runner.
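
To make the comparison concrete, here is a minimal sketch of one test module that both runners accept. The names `add`, `AddTest` and `run_unittest_suite` are hypothetical, made up for illustration. pytest collects `unittest.TestCase` subclasses as well as plain `test_*` functions, while unittest only picks up the `TestCase` subclass.

```python
import unittest


def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


class AddTest(unittest.TestCase):
    """unittest style: a TestCase subclass using assert* helper methods."""

    def test_add(self) -> None:
        self.assertEqual(add(1, 2), 3)


def test_add_plain() -> None:
    """pytest style: a plain function with a bare assert."""
    assert add(1, 2) == 3


def run_unittest_suite() -> unittest.TestResult:
    """Run the TestCase above with the standard-library runner."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_unittest_suite()
    test_add_plain()
    print("unittest ok:", result.wasSuccessful())
```

Running the file directly exercises the unittest path; pointing pytest at the same file would run both tests.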

@himanshumahajan138
Contributor

> > Python use unittest and pytest library for this purpose
>
> Yeah, unittest is part of the built-in standard library. For a demo project it's quick and easy, though almost every mainstream library adopts pytest as its test runner.

Yes! But I am concerned about the way @lihaoyi is doing it (check the other examples in javalib and the others), because I couldn't find any docs and haven't been able to get up to speed 😅

I think @lihaoyi will have to provide some guidance and throw some light on it.

@jodersky
Member

jodersky commented Nov 20, 2024

Cross-posting a comment from #3992 here.

I'll work on the missing fundamentals (test support, repl etc), then @himanshumahajan138 will take care of the basic examples, while I work on the dependencies part

@jodersky
Member

I'll work on dependencies next, now that #3992 is almost done

lihaoyi pushed a commit that referenced this issue Nov 20, 2024
This is in view of #3928, specifically some essential things required
for the first task, `example/pythonlib/basic/`.

You can look at commits individually if you like. The gist of this PR is
to improve the scaffolding we currently have to unblock the previously
mentioned tasks. It can be summarized by the following points (roughly
each corresponding to one commit):

- [x] make `run` task the default, following the same convention used in
`ScalaModule` and `JavaModule`
- [x] add a command to run an interactive REPL. This is named `console`,
again following the conventions of scalalib
- [x] rework the way source files are handled:
  - support multiple source directories, again following similar
  conventions of scalalib
  - instead of aggregating python scripts in one synthetic directory, keep
  them where they are and instead manipulate PYTHONPATH or pass in
  parameters to various directories where appropriate
- [x] define a module for unit tests, similar to the TestModule of
scalalib
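
The PYTHONPATH approach described in the list above can be sketched in plain Python. This is only an illustration of the idea, not Mill's implementation: the helper name `run_with_python_path` and the directory and module names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_python_path(source_dirs, code, cwd=None):
    """Run `code` in a child interpreter that can import from every directory in `source_dirs`.

    The source files stay where they are; only PYTHONPATH changes.
    """
    env = dict(os.environ)
    # Replace any inherited PYTHONPATH so the example is reproducible.
    env["PYTHONPATH"] = os.pathsep.join(str(d) for d in source_dirs)
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def demo():
    """Create two separate source directories and import one module from each."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        custom_src = Path(tmp, "custom-src")
        src.mkdir()
        custom_src.mkdir()
        (src / "foo.py").write_text("NAME = 'foo'\n")
        (custom_src / "foo2.py").write_text("NAME = 'foo2'\n")
        return run_with_python_path(
            [src, custom_src],
            "import foo, foo2; print(foo.NAME, foo2.NAME)",
            cwd=tmp,
        )


if __name__ == "__main__":
    print(demo())  # prints "foo foo2"
```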
@jodersky
Member

While working on the dependency examples, I published a dummy package to test.pypi.org, to show how custom indexes can be defined. While doing so, I started looking into how we could also add publish support to Mill, so I can pick that up next too.

One thing I'm wondering is how we should handle building platform-dependent wheels? I'm not so sure how this is typically done, so I'm open to suggestions. It would be very nice if we could replicate the logic as build tasks directly in mill, without resorting to more external tools (which typically are build tools themselves and have their own notion of dependencies), but if it becomes too complex we may need to explore other options.

@lihaoyi
Member Author

lihaoyi commented Nov 26, 2024

@jodersky I don't know, for now maybe we can punt on the issue and stick with pure-python? Compiling platform-dependent wheels would first depend on us having a decent story for compiling native C/C++ code, which we currently do not

@himanshumahajan138
Contributor

@jodersky Are we stuck?

I was reading the jvm.scala file and saw that you used the general jvm.runSubprocess, which doesn't seem to support running in the background (I think, but I'm not sure). See this:

  /**
   * Runs a generic subprocess and waits for it to terminate. If process exited with non-zero code, exception
   * will be thrown. If you want to manually handle exit code, check [[runSubprocessWithResult]]
   */
  def runSubprocess(
      commandArgs: Seq[String],
      envArgs: Map[String, String],
      workingDir: os.Path
  ): Unit = {
    runSubprocessWithResult(commandArgs, envArgs, workingDir).getOrThrow
    ()
  }

  /**
   * Runs a generic subprocess and waits for it to terminate.
   *
   * @return Result with exit code.
   */
  def runSubprocessWithResult(
      commandArgs: Seq[String],
      envArgs: Map[String, String],
      workingDir: os.Path
  ): Result[Int] = {
    val process = spawnSubprocessWithBackgroundOutputs(
      commandArgs,
      envArgs,
      workingDir,
      backgroundOutputs = None
    )
    val shutdownHook = new Thread("subprocess-shutdown") {
      override def run(): Unit = {
        System.err.println("Host JVM shutdown. Forcefully destroying subprocess ...")
        process.destroy()
      }
    }
    Runtime.getRuntime().addShutdownHook(shutdownHook)
    try {
      process.waitFor()
    } catch {
      case e: InterruptedException =>
        System.err.println("Interrupted. Forcefully destroying subprocess ...")
        process.destroy()
        // rethrow
        throw e
    } finally {
      Runtime.getRuntime().removeShutdownHook(shutdownHook)
    }
    if (process.exitCode() == 0) Result.Success(process.exitCode())
    else Result.Failure(
      "Interactive Subprocess Failed (exit code " + process.exitCode() + ")",
      Some(process.exitCode())
    )
  }

So how can we add runBackground support? Should we use the second function shown above (the one that passes backgroundOutputs)?

I don't think we can use the other runSubprocess variant, which involves jars, classpaths, a main file and so on.

@jodersky @lihaoyi please advise; this is important and I need to learn about it...

@himanshumahajan138
Contributor

@jodersky sorry to say, but I'm not getting the runBackground part. I read jvm.scala and runmodule.scala, but I'm still not able to add a runBackground task.
Could you please add it to the existing PythonModule? After that everything will be unblocked and I will finish writing the web examples.

In the meantime I will add the module and linting examples.

Hope you understand...

@jodersky
Member

I don't know too much about it. I can take a look in a couple of days

@jodersky jodersky added the pythonlib Issues related to Mill's python support label Nov 30, 2024
@himanshumahajan138
Contributor

himanshumahajan138 commented Nov 30, 2024

> I don't know too much about it. I can take a look in a couple of days

Yes, I am currently working on the module examples. I've written some and will open a pull request tomorrow; after that we can look into runBackground.

@himanshumahajan138
Contributor

himanshumahajan138 commented Nov 30, 2024

@jodersky Sir, there is a convention or rule (I'm not sure; it comes from my experience, not from the docs) that when we use importlib.resources, the directory one level above the actual resource must be on the PYTHONPATH. So for now I am using a trick: I add a res folder and include its parent (resources and custom-resources) in the PYTHONPATH. Should we do it like this? See the directory structure:

# structure

example/pythonlib/module
└── 1-common-config
    ├── build.mill
    ├── custom-resources
    │   └── res
    │       └── MyOtherResources.txt
    ├── custom-src
    │   └── foo2.py
    ├── resources
    │   └── res
    │       └── MyResource.txt
    └── src
        └── foo.py
        
# see PYTHONPATH

{
    "value": [
        "ref:v0:489b58dc:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1/src",
        "ref:v0:cf21b233:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1/custom-src",
        "ref:v0:6ca7c1d8:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1/resources",
        "ref:v0:d4265ae0:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1/custom-resources",
        "ref:v0:061699e6:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1/out/generatedSources.dest/generatedSources"
    ],
    "valueHash": 1395812861,
    "inputsHash": -1673489972
}

# and the way i am using it 

# at the top of the module
import importlib.resources

    # method of the example class
    def read_resource(self, package: str, resource_name: str) -> str:
        """Reads the content of a resource file."""
        try:
            with importlib.resources.open_text(package, resource_name) as file:
                return file.read()
        except FileNotFoundError:
            return f"Resource '{resource_name}' not found."


        # Reading resources (called from another method of the same class)
        print(f"MyResource: {self.read_resource('res', 'MyResource.txt')}")
        print(f"MyOtherResource: {self.read_resource('res', 'MyOtherResources.txt')}")

please give your suggestions and improvements...

@himanshumahajan138
Contributor

Another workaround: when we add millSourcePath to the PYTHONPATH, we can use custom-resources and resources as package names in importlib.resources.
See:

# structure

example/pythonlib/module
└── 1-common-config
    ├── build.mill
    ├── custom-resources
    │   └── MyOtherResources.txt
    ├── custom-src
    │   └── foo2.py
    ├── resources
    │   └── MyResource.txt
    └── src
        └── foo.py



# PYTHONPATH
{
    "value": [
        "ref:v0:489b58dc:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1/src",
        "ref:v0:72f129a0:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1",
        "ref:v0:972ee54c:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1/resources",
        "ref:v0:84efb034:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1",
        "ref:v0:061699e6:/root/openSource/mill/out/example/pythonlib/module/1-common-config/local/testCached.dest/sandbox/run-1/out/generatedSources.dest/generatedSources"
    ],
    "valueHash": -756641350,
    "inputsHash": 258663986
}

# usage 

        # Reading resources
        print(f"MyResource: {self.read_resource('resources', 'MyResource.txt')}")
        print(f"MyOtherResource: {self.read_resource('custom-resources', 'MyOtherResources.txt')}")
        
# build.mill 

  def sources = Task.Sources {
    super.sources() ++ Seq(PathRef(millSourcePath))
  }

  def resources = Task.Sources {
    super.resources() ++ Seq(PathRef(millSourcePath))
  }

One more thing: what if we include millSourcePath in the PYTHONPATH by default? This would let us use millSourcePath as a resource pool and its directories as packages.

@lihaoyi and @jodersky, please have a look and give some feedback on this as soon as possible...

@himanshumahajan138
Contributor

himanshumahajan138 commented Nov 30, 2024

Short Story:

We should add millSourcePath to the PYTHONPATH by default to ensure a resource pool and a package directory structure:

  def localPythonPath: T[Seq[PathRef]] = Task {
    Seq(PathRef(millSourcePath)) ++ sources() ++ resources() ++ generatedSources() ++ unmanagedPythonPath()
  }

@himanshumahajan138
Contributor

himanshumahajan138 commented Dec 1, 2024

@lihaoyi A new question arises:

In the case of scalalib (also javalib and kotlinlib), the MILL_TEST_RESOURCE_DIR value is /root/openSource/mill/out/example/scalalib/module/7-resources/local/testCached.dest/sandbox/run-1/foo/test/resources, but in the case of pythonlib it's /root/openSource/mill/example/pythonlib/module/5-resources

Why this behaviour?

@jodersky
Member

jodersky commented Dec 3, 2024

I'll take on some of the linting stuff next, specifically black and ruff. This should be fairly quick I think.

@himanshumahajan138 regarding your questions:

  • runBackground: it's true that the current system does not work with running arbitrary processes, only JVM processes. Currently, runBackground spawns a new java process which itself invokes the actual main() and handles cleanup if some files change on disk. See https://github.com/com-lihaoyi/mill/blob/7457601efdc5c959aef8bfd7bcb8aeb2513cdab3/scalalib/backgroundwrapper/src/mill/scalalib/backgroundwrapper/MillBackgroundWrapper.java. We can't use that strategy for non-JVM subprocesses, so we'd need to either:

    1. implement the same thing as a python wrapper specifically. Basically, a python script which is called by mill, does the same watching as MillBackgroundWrapper and then calls the main script
    2. implement a more generic system in mill which works for any processes

    I'd favor option 2, since it is more future-proof. However it is also more complex and we'll need to start a discussion on how to handle long-running processes from mill.
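
For a sense of scale, option 1 might look roughly like the sketch below. It assumes the wrapper is given a single token file to watch and stops the main script once that file's content changes. The function name `run_until_changed`, the polling loop and the token file are all assumptions made for illustration; they are not taken from MillBackgroundWrapper.

```python
import subprocess
import sys
import tempfile
import threading
import time
from pathlib import Path


def run_until_changed(command, token_file, poll_seconds=0.1):
    """Run `command` until it exits on its own or `token_file` changes.

    Returns "exited" if the process finished by itself and "stopped"
    if it was terminated because the token file changed or vanished.
    """
    token_file = Path(token_file)
    initial = token_file.read_text()
    process = subprocess.Popen(command)
    try:
        while process.poll() is None:
            try:
                current = token_file.read_text()
            except FileNotFoundError:
                current = None
            if current != initial:
                process.terminate()
                try:
                    process.wait(timeout=5)
                except subprocess.TimeoutExpired:
                    process.kill()
                    process.wait()
                return "stopped"
            time.sleep(poll_seconds)
        return "exited"
    finally:
        # Never leave the child running if the wrapper itself is interrupted.
        if process.poll() is None:
            process.kill()
            process.wait()


def demo():
    """Stop a long-running child by rewriting the token, then run a short-lived one."""
    with tempfile.TemporaryDirectory() as tmp:
        token = Path(tmp, "token")
        token.write_text("run-1")
        timer = threading.Timer(0.5, token.write_text, args=("run-2",))
        timer.start()
        sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
        first = run_until_changed(sleeper, token, poll_seconds=0.05)
        timer.join()
        second = run_until_changed([sys.executable, "-c", "pass"], token)
        return first, second


if __name__ == "__main__":
    print(demo())
```

A generic version (option 2) would need the same loop to live in Mill itself rather than in a per-language wrapper.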

  • regarding importlib: I don't think it's possible to read resources from the "root" package, but I also don't think it's a common thing to want to do. Typically, resources will live in at least one directory corresponding to the name of a python module. For example, you could have the following source layout:

    foo/
      - src/foo/main.py
      - resources/foo/data.txt
    

    then access the data.txt in main.py with importlib.resources.open_text("foo", "data.txt")
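
A runnable sketch of the resources half of that layout follows. It assumes `resources/foo/` contains an `__init__.py`, so that `foo` is a regular package; how `importlib.resources` treats packages split across several directories depends on the Python version, so that case is left out. It uses `importlib.resources.files` (available since Python 3.9) rather than `open_text`, and the helper names `read_packaged_text` and `demo` are hypothetical.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_packaged_text(package: str, resource_name: str) -> str:
    """Read a text resource that lives inside an importable package."""
    resource = importlib.resources.files(package).joinpath(resource_name)
    return resource.read_text(encoding="utf-8")


def demo() -> str:
    """Build resources/foo/data.txt in a temp dir and read it through the package name."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp, "resources", "foo")
        package_dir.mkdir(parents=True)
        # Assumption: an __init__.py makes `foo` a regular package.
        (package_dir / "__init__.py").write_text("")
        (package_dir / "data.txt").write_text("hello from data.txt\n", encoding="utf-8")

        # The directory one level above the package goes on the import path,
        # which is what putting `resources` on PYTHONPATH achieves.
        resources_root = str(package_dir.parent)
        sys.path.insert(0, resources_root)
        importlib.invalidate_caches()
        try:
            return read_packaged_text("foo", "data.txt").strip()
        finally:
            sys.path.remove(resources_root)
            sys.modules.pop("foo", None)


if __name__ == "__main__":
    print(demo())  # prints "hello from data.txt"
```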

@himanshumahajan138
Contributor

@jodersky I agree and understand. Could you please review my PR? That will allow me to move forward and complete it.

@lihaoyi
Member Author

lihaoyi commented Dec 4, 2024

Seems like Python standalone distributions are now a thing and will continue to be a thing going forward

We can consider using these in Mill, especially once #3930 lands and makes downloading and caching such large binary files convenient

@jodersky
Member

jodersky commented Dec 4, 2024

This is pretty neat, we could use it to replace hostPythonCommand which we currently use to bootstrap a python venv. It would also allow easy cross-builds against multiple python versions.
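
For readers unfamiliar with the bootstrap step, here is a rough sketch of creating a venv from a host interpreter using the standard-library `venv` module. It is not how Mill implements `hostPythonCommand`; the helper name `bootstrap_venv` is hypothetical, the interpreter running the script plays the role of the host Python, and pip is skipped to keep the example quick.

```python
import sys
import tempfile
import venv
from pathlib import Path


def bootstrap_venv(target_dir: Path) -> Path:
    """Create a venv in `target_dir` and return the path of its interpreter."""
    # with_pip=False skips installing pip into the new environment.
    venv.create(target_dir, with_pip=False)
    if sys.platform == "win32":
        return target_dir / "Scripts" / "python.exe"
    return target_dir / "bin" / "python"


def demo() -> bool:
    """Bootstrap a throwaway venv and check that it looks complete."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp, "venv")
        interpreter = bootstrap_venv(target)
        return (target / "pyvenv.cfg").is_file() and interpreter.exists()


if __name__ == "__main__":
    print("venv created:", demo())
```

With a standalone distribution, the host interpreter would be the downloaded binary instead of whatever Python happens to be installed on the machine.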

jodersky added a commit that referenced this issue Dec 4, 2024
Adds the capability to build wheels and sdists, and publish them.

The idea is to roughly follow [Pants' approach to publishing python
packages](https://www.pantsbuild.org/stable/docs/python/overview/building-distributions):
act as a build frontend which calls out to various specialized packaging
tools.

Part of #3928
jodersky added a commit that referenced this issue Dec 9, 2024
# Pull Request 
Added First Class Python Support [Module Examples]
Part of: #3928

## Description
Module Examples for Python `1-common-config`, `2-custom-tasks`,
`3-override-tasks`, `4-compilation-execution-flags`, `5-resources`,
`6-pex-config` Examples

## Related Issues
- Link to related issue #3928.

## Checklist

- [x]  1-common-config
- [x]  2-custom-tasks
- [x]  3-override-tasks
- [x]  4-compilation-execution-flags
- [x]  5-resources
- [x]  6-pex-config
- [x] Updated Documentation

## Status
Added & Require Review!!!
@jodersky jodersky reopened this Dec 9, 2024
@himanshumahajan138
Contributor

himanshumahajan138 commented Dec 17, 2024

@lihaoyi and @jodersky, I think the web examples are done and only need a little polishing, if required, after review.

Working on the testing examples now.

These are the final examples: @jodersky is working on linting, the web examples are done by me, and only testing is left, so I'm working on it...

@himanshumahajan138
Contributor

@lihaoyi and @jodersky, there are more testing libraries in Python, such as robot, doctest, testify, slash and ward.

Should I include them in the TestModule.scala for Python? Please give a suggestion as soon as possible...

@lihaoyi
Member Author

lihaoyi commented Dec 19, 2024

Looking at the JetBrains Python survey (https://lp.jetbrains.com/python-developers-survey-2023/), it seems pytest and unittest account for the bulk of test framework usage in Python, so I think there's no need to do the others for now.

@himanshumahajan138
Contributor

> Looking at the Jetbrains python survey (https://lp.jetbrains.com/python-developers-survey-2023/) it seems pytest and unittest are the bulk of test framework usage in Python, so I think no need to do the others for now

Woah! Great survey...

OK, then I'll go with pytest and unittest.
