Set conditions using the PAR model #1320
Comments
Hi @tosador, thanks for the feature request. It would be helpful if you could describe what you mean by 3 timeseries. Are …
Hi @npatki, thanks for your reply and sorry for the lack of clarity.
Please find below a snippet of the code used to build this example:
When I run the code, the synthetic data do not satisfy the constraint used to build the dataset:
Hi @tosador, the PAR model is suited for data that has multiple sequences within a single table. I think if you have no … I wonder if you'll be better off applying a tabular model, such as CTGAN or GaussianCopula? You can then apply constraints to hardcode logic that … BTW, if you haven't already, I'd recommend upgrading to the new SDV 1.0 releases, as they fix some bugs and offer a cleaner API and more functionality. Some relevant docs:
Hi @npatki, thanks for your reply. I think that if I apply a tabular model, the time dependency of each timeseries will be lost. I will be able to set the constraints, but the correlation between, for example, ts1[i] and ts1[i-1] will be different. What I would like to generate is synthetic financial OHLC bars. So, the dataset is built from 4 time series where:
then, there is a correlation between each time step. In my understanding the PAR model should be the right one, but sometimes the above constraints are not satisfied by the synthetic data. Thanks, I will upgrade to the new SDV 1.0 since I have not done it yet!
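For readers following along: the list of constraints in the comment above did not survive, so the sketch below assumes the conventional OHLC definition (the high is at least as large as both open and close, and the low is at most as small as both). The column names `open`, `high`, `low`, `close` and the helper `invalid_ohlc_rows` are hypothetical and not part of SDV. It only shows one way to find the synthetic bars that break those bounds.

```python
import pandas as pd


def invalid_ohlc_rows(bars: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that break the conventional OHLC bounds (assumed here)."""
    # The high must sit at or above both open and close.
    body_top = bars[["open", "close"]].max(axis=1)
    # The low must sit at or below both open and close.
    body_bottom = bars[["open", "close"]].min(axis=1)
    valid = (bars["high"] >= body_top) & (bars["low"] <= body_bottom)
    return bars[~valid]


# Toy bars standing in for synthetic output; the middle bar closes above its high.
bars = pd.DataFrame({
    "open":  [10.0, 11.0, 12.0],
    "high":  [10.5, 11.2, 12.1],
    "low":   [9.8, 10.9, 11.5],
    "close": [10.2, 11.4, 11.9],
})
bad = invalid_ohlc_rows(bars)
```

Running this on sampled data gives a quick measure of how often the bounds are violated, which is easier to compare across training runs than inspecting the output by eye.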
Hi @npatki, I have upgraded SDV to 1.0 and I really appreciate the cleaner API and the new functionality. Using SDV 1.0 I wrote the code to generate what I need. However, I am getting the UserWarning below: I think that once the PARSynthesizer handles constraints, it will be possible to generate synthetic financial bars. Do you know when it will be possible to handle constraints using the PAR model? Below, for reference, is the code that creates the timeseries and constraints and produces the UserWarning:
Hi @tosador, I'm glad you're finding the new API clearer and more useful. That was our main goal 😄 With SDV 1.0, certain "constraints" are automatically met, such as enforcing that the min and max values in the synthetic data are within the appropriate ranges. But more complex constraints like … We have an open issue #570 for tracking constraints on the PAR model. While it has not yet been prioritized, seeing more usage and demand for this will definitely help us add it to our roadmap. So if you want to add your use case to that issue (including how you ultimately want to use the synthetic data), that will be helpful.
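Until constraints are supported on the PAR model, one possible workaround (not something proposed in this thread, and not an SDV feature) is to reparameterize the data so the ordering holds by construction: train on ts1 plus the logarithms of the gaps, then invert the transform on the sampled output. The sketch below assumes columns named `ts1`, `ts2`, `ts3` with strictly increasing values in the real data; `encode_gaps` and `decode_gaps` are hypothetical helpers, and random noise stands in for the synthesizer.

```python
import numpy as np
import pandas as pd


def encode_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Replace ts2 and ts3 with log-gaps; the input must satisfy ts1 < ts2 < ts3."""
    return pd.DataFrame({
        "ts1": df["ts1"],
        "log_gap12": np.log(df["ts2"] - df["ts1"]),
        "log_gap23": np.log(df["ts3"] - df["ts2"]),
    })


def decode_gaps(encoded: pd.DataFrame) -> pd.DataFrame:
    """Invert encode_gaps; exp() is positive, so the ordering is restored."""
    ts2 = encoded["ts1"] + np.exp(encoded["log_gap12"])
    ts3 = ts2 + np.exp(encoded["log_gap23"])
    return pd.DataFrame({"ts1": encoded["ts1"], "ts2": ts2, "ts3": ts3})


real = pd.DataFrame({
    "ts1": [1.0, 2.0, 3.0],
    "ts2": [1.5, 2.5, 3.5],
    "ts3": [2.0, 3.5, 4.0],
})
encoded = encode_gaps(real)

# Stand-in for a synthesizer: perturb the encoded columns with Gaussian noise.
rng = np.random.default_rng(0)
fake_encoded = encoded + rng.normal(scale=0.5, size=encoded.shape)
decoded = decode_gaps(fake_encoded)
```

In a real pipeline the encoded table would also carry the sequence key and time index, and the model would be fit on it in place of the raw columns. The ordering holds up to floating-point precision; very negative sampled log-gaps can collapse to a gap of zero.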
Problem Description
I'm using the PAR model to generate synthetic timeseries data and I noticed that it does not seem possible to set conditions when generating more than one timeseries together.
Given 3 timeseries, for example, when the first one must always be lower than the second one and the second one lower than the third one, I noticed that sometimes the boundaries are not satisfied and I am not sure if increasing the number of epochs is a feasible solution.
Expected behavior
When simulating timeseries as in the example described above, I expect every simulated row of the 3 timeseries ts1, ts2 and ts3 to satisfy the condition ts1 < ts2 < ts3.
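The expected behavior can be checked mechanically on any sampled table. The following is a minimal sketch, assuming the synthetic output is a pandas DataFrame with columns named `ts1`, `ts2` and `ts3`; the helper `count_order_violations` is hypothetical and not part of SDV.

```python
import numpy as np
import pandas as pd


def count_order_violations(df: pd.DataFrame, columns: list[str]) -> int:
    """Count rows where the given columns are not strictly increasing left to right."""
    values = df[columns].to_numpy()
    # Differences between neighbouring columns must all be positive.
    ok = (np.diff(values, axis=1) > 0).all(axis=1)
    return int((~ok).sum())


# Toy data standing in for synthetic output; the last row breaks ts1 < ts2.
synthetic = pd.DataFrame({
    "ts1": [1.0, 2.0, 5.0],
    "ts2": [2.0, 3.0, 4.0],
    "ts3": [3.0, 4.0, 6.0],
})
violations = count_order_violations(synthetic, ["ts1", "ts2", "ts3"])
```

Tracking this count while varying the number of epochs would show whether longer training actually reduces violations or merely makes them rarer.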