Skip to content
This repository has been archived by the owner on May 3, 2024. It is now read-only.

feat(lib): add support for git repositories as data source #49

Merged

Conversation

rockchalkwushock
Copy link
Contributor

What does this PR do?

Adds support for users to use git repositories as a data source for PardallMarkdown.

Example

config :pardall_markdown, PardallMarkdown.Content,
  root_path: "https://github.com/rockchalkwushock/codybrunner.dev.git",

Closes #31


def local_path, do: @local_path

def list_files(folder \\ Settings.posts_folder()) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alfredbaudisch So my thoughts here are we can default to the root of the repository and document that in the docs. We can then do one of two things:

  1. Add another optional setting in the config for the git_repo_data_path,
  2. Have root_path accept a different data type, perhaps a keyword list would be a good option.

For example my website is written in NextJS with MDX all the md(x) files live at data:

config :pardall_markdown, PardallMarkdown.Content,
  root_path: ["https://github.com/rockchalkwushock/codybrunner.dev.git", "data"]

What are your thoughts on how to expose this API to the end user?

@alfredbaudisch
Copy link
Owner

alfredbaudisch commented Oct 12, 2021

This is a very good start, thanks for sharing. We still have some work to do, the repository provider has to be connected to a watcher, the watcher also needs to pool the repository (as opposed to the current FileWatcher which detects file changes on the fly), and then when getting new content, parse it with the FileParser through the current PardallMarkdown workflow.

blogit/server.ex has a good start for a Repository watcher. It needs to be adjusted into PardallMarkdown but the main core is there.

Our current FileWatcher and the new RepositoryWatcher can be loaded and started side by side. Then, RepositoryWatcher could clone the files into root_path and then FileWatcher picks up the changes (as it currently is). By following this, RepositoryWatcher won't have to talk to FileParser and we can rely on the current implementation.

Do you want to give it a try? Otherwise I can do this next step in the weekend.

@rockchalkwushock
Copy link
Contributor Author

Hi @alfredbaudisch I'd like to take another shot at implementing the RepositoryWatcher. I likely won't get a chance until tomorrow evening or Thursday evening at the earliest. I should have a rough implementation up before this weekend though.

Thank you for your guidance and patience. It's challenging, but also exciting to be contributing to an Elixir project! Even more so a project I am absolutely going to use in the coming weeks for a side project!

@rockchalkwushock
Copy link
Contributor Author

@alfredbaudisch I am working on the repository_watcher now. My assumption is that when you mean the watchers can be started "side-by-side" the GenServers will be executed in the application.ex supervisor tree. I apologize if this gets trivial my OTP is pretty noob level.

# application.ex
def start do 
  children = [
    ...
    # Start the repository_watcher.ex first
    # How does the GenServer respond if no config value provided? 
    # Do we just not invoke the watcher?
    # Would it make sense to look for the config here an provide it to the keyword list?
    # { ..., remote: Application.get_env(:pardall_mardown, :remote_source) }
    {PardallMarkdown.RepositoryWatcher, name: PardallMarkdown.RepositoryWatcher},
    {PardallMarkdown.FileWatcher, name: PardallMarkdown.FileWatcher, dirs: [PardallMarkdown.Content.Utils.root_path()]]},
    ...
  ]
  ...
  Supervisor.start_link(children, opts)
end

With PardallMarkdown running as an application inside the mix project if a remote_source is provided the @local_path in the git module needs to be an absolute path "relative" to the environment (local machine or container) and that becomes the root_path for FileWatcher and FileParser to take over from correct?

Now it is my understanding that when a process is created on the beam those processes are isolated from one another so I am slightly confused as to how the FileWatcher is going to be able to listen for changes on RepositoryWatcher. I unfortunately have not had time to work on this side project much any more, but this post is where my understanding of GenServer and OTP stopped.

From your YT video PardallMarkdown is "framework agnostic". You don't have to use it with Phoenix. It can be used in a standalone mix project with no problem so adding something like the Phoenix.PubSub dependency I would assume is not something you would want to do here since that (assumption) would lock the package into Phoenix and require more dependencies. I am not sure in the ecosystem or internally what Elixir provides for passing state between two isolated processes (or perhaps there is an underlying Erlang module that can help with this?)

RepositoryWatcher POC

defmodule PardallMarkdown.RepositoryWatcher do
  use GenServer
  require Logger

  alias PardallMarkdown.RepositoryProvider, as: Repository

  @recheck_interval Application.get_env(:pardall_markdown, PardallMarkdown.Content)[:recheck_pending_remote_events_interval]

  def start_link(args) do
    GenServer.start_link(__MODULE__, args, args)
  end

  def init(args) do
    # Fetch the repository or create a local git repo if remote source not found.
    Repository.repository()

    # Is it possible to update the config at runtime so the `root_path` is updated?
    # If not how does the FileWatcher subscribe to the path being created here?
    # My assumption is that if a `remote_source` is provided and we are executing in Mix.env() == :dev
    # the local_path is pointing to the local git repository and any polling of the remote is disabled.
    # When executing in Mix.env() == :prod then the callbacks associated with listening on the remote are enabled.

    # Start Polling
    schedule_next_recheck()

     # Send out message on the initialized state of the GenServer.
     {:ok, %{}}
  end

  def handle_info() when Mix.env() == :dev, do: # do things with the local git repository "FileWatcher"
  def handle_info() when Mix.env() == :prod, do: # do things with remote_source "FileWatcher + RepositoryWatcher"

  defp schedule_next_recheck, do: Process.send_after(self(), :check_pending_events, @recheck_interval)
end

Other Thoughts Not Relevant To the Above 😂

Also just a thought, but I would figure that this feature could extend past just git providers. For example CMS providers like Contentful et.al. Perhaps this should be renamed to remote_provider and remote_watcher. Under the remote_providers/ there can be implementations for git as well as what I am sure would require specific modules for each CMS.

Thank you for your patience with a noobie 🙏🏻

@alfredbaudisch
Copy link
Owner

alfredbaudisch commented Oct 16, 2021

Nice questions, @rockchalkwushock.

so adding something like the Phoenix.PubSub dependency I would assume is not something you would want

It's because it's not necessary.

I am slightly confused as to how the FileWatcher is going to be able to listen for changes on RepositoryWatcher

  • FileWatcher won't know anything about RepositoryWatcher and RepositoryWatcher won't know anything about FileWatcher.
  • There will be no changes made in the implementation of FileWatcher
  • The function of the RepositoryWatcher is to handle and pool a RepositoryProvider.
  • The RepositoryProvider pulls Git files/changes inside the root_folder. Check your RepositoryProvider.Git.git_repository. In this first implementation, RepositoryProvider.Git.local_path will point to root_path.
  • FileWatcher will keep watching root_folder and it will keep reacting to changes to that folder and its contents, as it currently is.

I would figure that this feature could extend past just git providers.

Yes, but then we would implement something like webhooks, so the headless CMS can call PardallMarkdown webhooks and we refresh the content. But this is separate from implementing the Git provider. We can also always refactor later.

Your OPT remarks in how to start the RepositoryWatcher

Use the same format as FileWatcher is started, passing configuration at runtime, either via a helper function or directly via Application.get_env. Put RepositoryWatcher after FileWatcher.

      {
        PardallMarkdown.FileWatcher,
        name: PardallMarkdown.FileWatcher, dirs: [PardallMarkdown.Content.Utils.root_path()]
      }

Is it possible to update the config at runtime so the root_path is updated? If not how does the FileWatcher subscribe to the path being created here?

I think this is answered above, but let's explain the order of execution of a PardallMarkdown application:

Fresh Start

  • root_path: /home/content/
  • PardallMarkdown starts.
  • FileWatcher starts and checks root_path. It's empty.
  • FileWatcher starts watching for changes (forever).
  • RepositoryWatcher starts, with the target folder as root_path
  • RepositoryWatcher pools RepositoryProvider, which then RepositoryProvider pulls the remote_repository Git repository.
  • Contents are pulled from Git inside /home/content/, creating files and subfolders.
  • FileWatcher detects that contents of root_path were change and recompiles the content.

Lifecycle

  • RepositoryWatcher keeps pooling RepositoryProvider and RepositoryProvider keeps pulling the Git repo for changes
  • FileWatcher keeps watching root_path and if changed - in case RepositoryWatcher pulled changes from the repository - , FileWatcher recompiles the content.

@rockchalkwushock
Copy link
Contributor Author

@alfredbaudisch sorry for the late reply I am just now seeing this. I will see if I can get around to writing the implementation tomorrow for this.

@rockchalkwushock
Copy link
Contributor Author

@alfredbaudisch I think I've just about got it where it needs to be. There are a few known bugs at the moment

  1. RepositoryWatcher will crash if the local_path already exists. Pretty easy condition check to fix this, but my question for you is should that check live in the Git.git_repository/0 function or should that check be made inside the RepositoryWatcher.init/1?
  2. I am still trying to manually mock that the fetch polling is working.

@alfredbaudisch
Copy link
Owner

Thank you! Gonna check it out by the weekend.

@alfredbaudisch
Copy link
Owner

alfredbaudisch commented Oct 24, 2021

@rockchalkwushock I committed the final changes needed to make the feature work. My changes also answer your last questions.

I greatly simplified repository_providers/git.ex, and added the necessary fixes/additions to make it correctly work with :root_path and with the existing path/non-existing conditions.

RepositoryWatcher is loaded and started only in case a remote_repository_url is provided.

FileWatcher correctly ignores .git files and do not send files to be rebuilt in case of no changes in the content itself.

Do you want to add something more? If no, then it is ready to be merged.

The CI build fail is not related to this feature, so we can ignore it.

@rockchalkwushock
Copy link
Contributor Author

@alfredbaudisch looking at this now

@rockchalkwushock
Copy link
Contributor Author

rockchalkwushock commented Oct 24, 2021

@alfredbaudisch everything looks great! Good catch with the hidden files I had not considered that. I also was racking my brain with how to conditionally start the RepositoryWatcher if no remote_url was provided. My JavaScript brain kept saying if/else...still learning how to think the "Elixir Way". Thanks for all your help with getting this to the finish line.

Merge when ready

@alfredbaudisch alfredbaudisch merged commit 7590cb2 into alfredbaudisch:master Oct 24, 2021
@alfredbaudisch
Copy link
Owner

@rockchalkwushock well done. Published https://hex.pm/packages/pardall_markdown/0.4.0 - you are also mentioned in the README and the README has instructions regarding to this new feature.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support to watch external sources, such as GitHub repositories
2 participants