StructuredRegex

Data and code for the paper Benchmarking Multimodal Regex Synthesis with Complex Structures.

Data

We provide three versions of the data: raw, tokenized, and anonymized (with constant strings replaced by symbols).

The natural language (NL) descriptions in the raw version are the original annotations from Turkers. In the tokenized version, the descriptions are preprocessed and tokenized. In the anonymized version, the constants mentioned in the descriptions are additionally replaced with anonymous symbols. For example, given the NL-regex pair [it must contain the string "ABC". --> contain(<ABC>)], we replace "ABC" with the symbol const0 in both the NL and the regex, so the const-anonymized NL-regex pair is [it must contain the string const0. --> contain(const0)].
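The anonymization step can be illustrated with a short sketch. This is not the preprocessing script used to build the dataset; it assumes, based only on the example above, that constants appear in double quotes in the description and in angle brackets in the regex. The function name `anonymize` is hypothetical.

```python
import re

def anonymize(description, regex):
    """Replace quoted constants with const0, const1, ... in both strings.

    Assumption: constants are written as "VALUE" in the description and
    as <VALUE> in the regex, as in the example above.
    """
    const_values = {}
    # Assign symbols in order of first appearance in the description.
    for value in re.findall(r'"([^"]*)"', description):
        if value not in const_values.values():
            const_values[f"const{len(const_values)}"] = value
    for symbol, value in const_values.items():
        description = description.replace(f'"{value}"', symbol)
        regex = regex.replace(f"<{value}>", symbol)
    return description, regex, const_values

nl, rx, mapping = anonymize('it must contain the string "ABC".', "contain(<ABC>)")
print(nl)       # it must contain the string const0.
print(rx)       # contain(const0)
print(mapping)  # {'const0': 'ABC'}
```

The returned mapping plays the role of the const_values field described below: it lets you restore the real strings after a model predicts a regex over the anonymous symbols.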

All data is in TSV format, with the following fields:

problem_id -- unique ID of the target regex.
description -- Turker-annotated description.
regex -- target regex.
pos_examples -- positive examples.
neg_examples -- negative examples.
const_values -- mapping from symbols to the real string values; present only in the anonymized version.
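A minimal sketch of reading one of these files with Python's standard csv module is shown below. It assumes the first row is a header containing the field names listed above, which you should verify against the actual files. The sample row (its problem_id and its example strings) is made up for illustration, and `read_tsv` is a hypothetical helper name.

```python
import csv
import io

# Fields shared by all three versions; const_values is only in the anonymized one.
FIELDS = ["problem_id", "description", "regex", "pos_examples", "neg_examples"]

def read_tsv(handle):
    """Read tab-separated rows into dicts, assuming a header row."""
    # QUOTE_NONE keeps literal double quotes inside descriptions untouched.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [name for name in FIELDS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# Made-up sample standing in for a real data file.
sample = (
    "problem_id\tdescription\tregex\tpos_examples\tneg_examples\n"
    '0\tit must contain the string "ABC".\tcontain(<ABC>)\txABCx\txyz\n'
)
rows = read_tsv(io.StringIO(sample))
print(rows[0]["regex"])
```

To read a real file, pass an open file object instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`.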

Code

Requirements

pytorch > 1.0.0

We've attached pretrained checkpoints in code/checkpoints/pretrained.tar, which are ready to use. You can also reproduce the experimental results by following the steps below (run the commands from the code directory).

Train

python train.py StReg --model_id <model_id>. The models will be stored in checkpoints/StReg directory with names following model_id*.tar.

Decode

python decode.py StReg <model_id> --split test*. The derivations will be generated using checkpoints/StReg/<model_id>.tar and written to the decodes/StReg/ directory.

Evaluate

python eval.py StReg <decode_id> --split test*. Note that we report DFA accuracy (refer to the paper for more details).
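If you want to script the three steps, the sketch below assembles the commands exactly as given above and optionally runs them in order. The helper names `pipeline_commands` and `run_pipeline` are hypothetical, and the values passed for the model ID, decode ID, and split are placeholders that you need to supply yourself.

```python
import subprocess

def pipeline_commands(model_id, decode_id, split, dataset="StReg"):
    """Build the train, decode, and eval commands from the steps above."""
    return [
        ["python", "train.py", dataset, "--model_id", model_id],
        ["python", "decode.py", dataset, model_id, "--split", split],
        ["python", "eval.py", dataset, decode_id, "--split", split],
    ]

def run_pipeline(commands, cwd="code"):
    """Run each command from the code directory, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)

# Placeholder arguments; nothing is executed until run_pipeline is called.
commands = pipeline_commands("my_model", "my_decode", "my_split")
for cmd in commands:
    print(" ".join(cmd))
```

Building the commands as lists rather than shell strings avoids quoting problems if an ID contains unusual characters.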

Sampling Regexes and I/O Examples

See the README and usage_example.py in toolkit.

Easy API for Checking Equivalence and I/O Consistency

See easy_eval/usage_example.py.

It also contains code for parsing a specification into an AST that is easy to work with, and code skeletons that can be completed to convert a specification in our DSL into a standard regex.
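To give a sense of what parsing a specification into an AST involves, here is a minimal recursive-descent sketch. It is not the parser shipped in easy_eval; it only assumes the function-call syntax visible in the example contain(<ABC>), where arguments are nested calls, `<...>` tokens, or bare symbols such as const0. The `Node` class and the operator names `wrap` and `pair` are made up for illustration, and a constant that itself contains `>` is not handled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                  # operator name or leaf token
    children: list = field(default_factory=list)

def _skip_ws(spec, pos):
    while pos < len(spec) and spec[pos].isspace():
        pos += 1
    return pos

def _parse_node(spec, pos):
    pos = _skip_ws(spec, pos)
    if pos < len(spec) and spec[pos] == "<":
        end = spec.index(">", pos)             # raises ValueError if unclosed
        return Node(spec[pos:end + 1]), end + 1
    start = pos
    while pos < len(spec) and spec[pos] not in "(),":
        pos += 1
    name = spec[start:pos].strip()
    if not name:
        raise ValueError(f"expected a name at position {start}")
    node = Node(name)
    if pos < len(spec) and spec[pos] == "(":
        pos += 1
        while True:
            child, pos = _parse_node(spec, pos)
            node.children.append(child)
            pos = _skip_ws(spec, pos)
            if pos >= len(spec) or spec[pos] not in ",)":
                raise ValueError(f"expected ',' or ')' at position {pos}")
            pos += 1
            if spec[pos - 1] == ")":
                break
    return node, pos

def parse(spec):
    """Parse a whole specification string into a Node tree."""
    node, pos = _parse_node(spec, 0)
    if _skip_ws(spec, pos) != len(spec):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return node

tree = parse("contain(<ABC>)")
print(tree.name, [child.name for child in tree.children])
nested = parse("wrap(contain(const0), pair(const1, 3))")
print(len(nested.children))
```

Once a specification is in tree form, a converter to standard regex syntax can be written as a recursive function that dispatches on each node's name.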
