Add Synthetic CrossCat datasets #175

Merged
merged 3 commits into from
Aug 21, 2024

Conversation

srvasude
Collaborator

Add four synthetic CrossCat datasets for each of the data types. I will add unit tests later that verify we get the appropriate clusterings with these.

@@ -0,0 +1,4 @@
col1 ~ bernoulli(id)
Collaborator

Doesn't the relation name here have to match the second column of the .obs file? Here it is "col1" but the .obs file uses "has_col1".

Collaborator Author

Yeah this was a bad push. Fixed here and elsewhere.

@@ -0,0 +1,4 @@
col1 ~ stringcat[strings="a:b:c:d",delim=:](id)
Collaborator

In addition to the "has_" issue, there also appears to be an off-by-one error. Here, "col1" is the relation that has a/b/c/d values, but in the .obs file it is "has_col0" that has those values.

Similarly for the other relations.

Collaborator Author

Fixed as well.
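
The naming mismatch discussed in this thread could be caught mechanically. The following is a minimal sketch, not part of this PR: it assumes each schema line has the form `name ~ distribution(...)` and each .obs line is whitespace-separated with the value first and the relation name second (as the review comment describes). The helper names `schema_relations`, `obs_relations`, and `mismatches` are hypothetical.

```python
import re

# Assumed schema line shape: "<relation> ~ <distribution>(<domains>)".
SCHEMA_LINE = re.compile(r"^\s*(\w+)\s*~")


def schema_relations(schema_text):
    """Collect relation names declared on the left of '~' in a schema."""
    names = set()
    for line in schema_text.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def obs_relations(obs_text):
    """Collect relation names from the second column of an .obs file."""
    names = set()
    for line in obs_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[1])
    return names


def mismatches(schema_text, obs_text):
    """Return (names only in schema, names only in .obs), each sorted."""
    declared = schema_relations(schema_text)
    observed = obs_relations(obs_text)
    return sorted(declared - observed), sorted(observed - declared)
```

Calling `mismatches` on a schema that declares `col1` and an .obs file that uses `has_col1` would report both names, one in each list. Note that this only compares names; it would not detect the off-by-one case where both names exist but the values sit under the wrong relation.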

NUM_SAMPLES_1 = 33
NUM_SAMPLES_2 = 50
NUM_SAMPLES = 100

Collaborator

What do you think about adding some tests for this program?

Collaborator Author

I'm adding integration tests in a later PR that will test the files (checking some basic invariants, like creating two IRMs, etc.).

@srvasude srvasude merged commit f7a39b8 into master Aug 21, 2024
2 checks passed