cache compiler returns filenames, WIP #54

Draft
mdavis-xyz wants to merge 1 commit into master
Conversation

mdavis-xyz (Contributor)

Solves #47

I haven't been able to run the new test for this. The download just hangs forever. I'm not sure if AEMO is throttling me or not.
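(For the hang itself: one way to turn it into a diagnosable error is to put a timeout on the HTTP call. A minimal sketch, assuming the download goes through requests; the real download code here may differ:

import requests

def download(url, local_path):
    # timeout=(connect, read) in seconds: raises requests.Timeout instead of hanging forever
    response = requests.get(url, stream=True, timeout=(10, 60))
    response.raise_for_status()
    with open(local_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1 << 20):
            f.write(chunk)

If the hang then turns into a Timeout exception, that points at throttling or a dead connection rather than a bug in the loop.)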

I find the logic in _dynamic_data_fetch_loop quite confusing. It's likely there's some edge case I haven't considered (e.g. what if the user passes select_columns, there are multiple chunks, the previous run was for a different format, and ...).

Can you please look closely at that?
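(As an aside, one way to defuse the format/columns edge case would be to encode everything that affects a cached file's contents into its filename, so a stale cache from a different run can never be picked up by accident. A hypothetical sketch, none of these names exist in the codebase:

import hashlib

def cache_filename(table, year, month, chunk, format, select_columns):
    # Hash the column selection so different select_columns values never collide
    cols = ','.join(sorted(select_columns)) if select_columns else 'all'
    cols_key = hashlib.sha1(cols.encode()).hexdigest()[:8]
    return f"{table}_{year}{month:02d}_chunk{chunk}_{cols_key}.{format}"
)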

Or perhaps this change is a good opportunity to refactor that function? E.g. having a while loop that just tries to download the next chunk until some error happens is a bit confusing.

Why not just do something like:

from typing import List
import os

import pandas as pd

@cache()
def list_urls_for_month(year, month) -> List[str]:
    """Returns a list of all files (all tables, all chunks) for this month"""

def cache_compiler(start_date, end_date, table, cache_dir, format) -> List[str]:
    output_paths = []
    for year, month in get_range(start_date, end_date):
        for url in list_urls_for_month(year, month):
            raw_path = os.path.join(cache_dir, os.path.basename(url))
            if not os.path.exists(raw_path):
                download(url, raw_path)
            csv_path = get_final_path(raw_path)  # can look inside the zip to get CSV filename
            output_path = swap_ext(csv_path, format)

            if not os.path.exists(output_path):
                if not os.path.exists(csv_path):
                    extract(raw_path, csv_path)
                convert(csv_path, output_path)

            output_paths.append(output_path)
    return output_paths

def dynamic_compilers(start_date, end_date, table, cache_dir):
    filenames = cache_compiler(start_date, end_date, table, cache_dir, format='feather')
    return pd.concat([pd.read_feather(f) for f in filenames])
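The upside of this structure is that each file's existence is checked independently, so an interrupted run resumes where it left off, and the format is baked into the filename, so a cache written in a different format can't be silently reused. Usage would then look something like this (dates and table name are placeholders):

from datetime import datetime

df = dynamic_compilers(
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 2, 1),
    table='DISPATCHPRICE',
    cache_dir='/tmp/aemo_cache',
)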
    

mdavis-xyz marked this pull request as draft on January 25, 2025, 13:02