Swapping Out Underlying Parquet Files #987
-
Hi! I'm considering using ParadeDB in a scenario where we replace multiple tables' underlying Parquet files with files exported nightly from our ERP, as a sort of brute-force update. Is this feasible without causing disruptions or other problems? The database load is otherwise very low at these times.
-
Hi @RalfNorthman! Thanks for opening this. Can you share more about how you're planning to replace the underlying Parquet files? Would you simply overwrite the existing files in place? It would also help to hear a bit more about your use case and your interest in ParadeDB vs. any other tool you might be considering here. Re: your question, @rebasedming can provide the best answer, but I would suspect that it will work properly if you follow the naming pattern of the existing files and use the same path.
-
Hello!
Yes, overwriting to the same file names using odbc2parquet. Side note: one of the ERP tables uses EAV, but it is exported in a pivoted, wide format to make it easier to query. In this new format it has roughly 600 columns; would that pose a problem for a Postgres parquet table? (Even if not, we couldn't use Diesel with it, since Diesel only supports up to 128 columns.)
We are currently using MariaDB with the CONNECT plugin to run joins between foreign ERP tables and our custom data in native tables, but we are not happy with this solution, especially for analytical workloads. As a workaround, we have used Polars with exported Parquet files for some queries. It would be much nicer to have everything in one Diesel-compatible database, like Postgres. We're happy to see that work on joins between parquet and heap tables is underway in #918.
Sounds promising, but I'm curious to hear what he has to say.
-
Manually copying Parquet files into the Postgres data directory won't work, because we maintain delta logs on top of the Parquet files, and those logs are only generated when you write SQL DML commands. One way to load Parquet files into Postgres is via a Parquet FDW: create a foreign table over the Parquet file, then copy its contents into an actual Postgres table. I've seen one or two such extensions that you can try. Alternatively, since you already use Polars, you can probably convert your Parquet files into CSV and COPY them into Postgres.
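For what it's worth, the FDW route could look roughly like this. This is only a sketch: the extension name `parquet_fdw`, the server name, the table/column names, and the file paths are illustrative assumptions, so check the documentation of whichever Parquet FDW you end up using:

```sql
-- Hypothetical sketch, assuming a parquet_fdw-style extension;
-- all names and paths below are placeholders.
CREATE EXTENSION parquet_fdw;
CREATE SERVER parquet_srv FOREIGN DATA WRAPPER parquet_fdw;

-- Expose the nightly export as a foreign table.
CREATE FOREIGN TABLE orders_parquet (
    order_id bigint,
    amount   numeric
) SERVER parquet_srv
  OPTIONS (filename '/data/exports/orders.parquet');

-- Copy the contents into a real Postgres table, so the
-- delta logs are written through normal DML.
INSERT INTO orders SELECT * FROM orders_parquet;

-- Alternative route: convert the Parquet file to CSV first
-- (e.g. with Polars) and load it directly.
COPY orders FROM '/data/exports/orders.csv' CSV HEADER;
```

Either way, the data goes through ordinary SQL writes rather than file swaps, which is what keeps the delta logs consistent.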
-
Okay, thanks! If any other relevant information comes to mind, I would be happy to hear it.
-
Hi again!
EDIT: I found this in the
So I guess the answer to my first question is: yes, with