docs: add faqs for clex
rootCircle committed Feb 23, 2024
1 parent 2ad9d46 commit 85d4fd6
Showing 5 changed files with 100 additions and 27 deletions.
17 changes: 5 additions & 12 deletions .idea/workspace.xml


2 changes: 1 addition & 1 deletion README.md
@@ -138,7 +138,7 @@ cpast generate "S[10,'U']"
## Language Specification

The `clex` language generator is based on a custom grammar specification. It allows you to define input patterns for testing.
-For more information on the `clex` language and its usage, please refer to the [Grammar Rules for Clex Generator](./CLEX_LANGUAGE.md).
+For more information on the `clex` language and its usage, please refer to the [Grammar Rules for Clex Generator](./clex.specs.md).

## Roadmap

33 changes: 19 additions & 14 deletions clex.specs.md
@@ -1,6 +1,8 @@
# Clex Language

-Clex is a generator language, that can generate a set of random numbers/string based on a given grammar rules. It doesn't supports arithmetic, logical or any other relationship except in case of back-references. The AST is same for a language in all the case, while the generated string from the language will vary.
+Clex is a generator language that can generate random numbers/strings based on a given set of grammar rules.
+
+It doesn't support arithmetic, logical, or any other relationships except back-references. The AST for a given language is always the same, while the strings generated from it vary.

For example, `S[4,'U']` can generate "GAHS", "JHAS", etc.
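As an illustration of the `S[4,'U']` example, the following Python sketch draws a fixed-length string from a character set. It assumes `'U'` means the uppercase letters A-Z (consistent with the examples further down) and that characters are picked uniformly; `generate_string` and `CHARSETS` are hypothetical names, not cpast APIs.

```python
import random
import string

# Assumption: 'U' denotes uppercase letters, as in the examples; other
# character-set codes are left out because their meaning is not shown here.
CHARSETS = {"U": string.ascii_uppercase}


def generate_string(length: int, charset: str, rng: random.Random) -> str:
    """Build a string of `length` characters drawn from the named character set."""
    if length < 0:
        raise ValueError("string length must be non-negative")
    alphabet = CHARSETS[charset]
    return "".join(rng.choice(alphabet) for _ in range(length))


# Same expression, different seeds: the structure is fixed, the output varies.
for seed in (1, 2):
    print(generate_string(4, "U", random.Random(seed)))
```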

@@ -68,18 +70,21 @@ Denotes a set of characters from which a string is going to be randomly generate

### CharacterSet

-Just _Character_ enclosed within singles quotes to represent the character set. They are by design enclosed in single quotes, so as to differentiate character set from _DataType_, so as to avoid ambiguity.
+_Character_(s) enclosed in single quotes represent the character set. They are enclosed in single quotes by design, to distinguish a character set from a _DataType_ and so avoid ambiguity.

### GroupNo

-Represents the group number for back-referencing. One awesome thing about clex language is its support for dynamic back-references as compared to static ones as found in regex. Each _CapturingGroup_ captures and stores a element by value indexed from 1. Obviously, it can't be more than the number of _CapturingGroup_ present in _ClexLanguage_.
+Represents the group number for back-referencing. One notable thing about the Clex language is its support for dynamic back-references, as opposed to the static ones found in regex. Each _CapturingGroup_ captures and stores one element by value, indexed from 1. Naturally, a _GroupNo_ can't exceed the number of _CapturingGroup_ instances present in the _ClexLanguage_.

### Reference

_Reference_ can be a back-reference to a capturing group (GroupNo) or a numeric value (i64). It is used in _Range_ to specify the bounds. If not specified, default values are used. The prime purpose of _Reference_ is to act as an abstraction layer that stores either a literal value or a reference to a value that is guaranteed to be available when it is used.

Back-referencing is done using `"\\" GroupNo`; the value stored in that specific group is dereferenced upon use and substituted in as a value.

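Clex uses 1-based indexing for back-references, so group number 1 refers to the first _CapturingGroup_. This is the same convention as back-references in common regular expression engines, where group 0 is reserved for the entire match.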
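The dereferencing step can be sketched in Python as a lookup into a group register. This is an illustration under stated assumptions, not cpast's implementation: `GroupRef` and `resolve` are hypothetical names, and captured values are assumed to be stored in a list in group order.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class GroupRef:
    """Hypothetical back-reference to a capturing group (1-based)."""
    group_no: int


# A Reference is either a literal i64 value or a back-reference.
Reference = Union[int, GroupRef]


def resolve(ref: Reference, groups: List[int]) -> int:
    """Return a literal as-is, or dereference a group from the register."""
    if isinstance(ref, GroupRef):
        # Group numbers start at 1, so group 1 lives at list index 0.
        if not 1 <= ref.group_no <= len(groups):
            raise IndexError(f"group {ref.group_no} has not been captured")
        return groups[ref.group_no - 1]
    return ref


register = [7, 3]                         # values captured by groups 1 and 2
print(resolve(5, register))               # 5 (literal)
print(resolve(GroupRef(1), register))     # 7 (value of group 1)
```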


### PositiveReference

_PositiveReference_ is similar to _Reference_ but ensures that the referenced value is non-negative. It is used in _PositiveRange_.
@@ -90,11 +95,11 @@ _Quantifier_ specify the number of occurrences for the preceding expression. The

### Range

-_Range_ specifies a domain of values for numeric _DataType_ (Integer and Float) from which its value will be generated during generator phase. It includes _Reference_(s) for the lower and the upper bound for the number to be generated. If not specified, default values(INT64_MIN, INT64_MAX) are used. The upper and lower bound is always an integer.
+_Range_ specifies a domain of values for a numeric _DataType_ (Integer and Float) from which its value is generated during the generation phase. It includes _Reference_(s) for the lower and upper bounds of the number to be generated. If not specified, the default values (INT64_MIN, INT64_MAX) are used. The upper and lower bounds are always integers, even when defining a range for the float data type. Ranges are always inclusive, so `[m, n]` means the value can be anywhere from `m` up to and including `n`.
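A minimal Python sketch of the inclusive-range rule, assuming uniform sampling (the specification does not say which distribution is used). `gen_integer`, `gen_float`, and `_bounds` are hypothetical helpers; missing bounds fall back to the INT64_MIN/INT64_MAX defaults described above.

```python
import random
from typing import Optional, Tuple

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def _bounds(lo: Optional[int], hi: Optional[int]) -> Tuple[int, int]:
    """Apply the default i64 bounds and reject an empty range."""
    lo = INT64_MIN if lo is None else lo
    hi = INT64_MAX if hi is None else hi
    if lo > hi:
        raise ValueError(f"empty range [{lo}, {hi}]")
    return lo, hi


def gen_integer(lo: Optional[int], hi: Optional[int], rng: random.Random) -> int:
    lo, hi = _bounds(lo, hi)
    return rng.randint(lo, hi)  # randint includes both endpoints


def gen_float(lo: Optional[int], hi: Optional[int], rng: random.Random) -> float:
    # Bounds stay integers even for floats; uniform() returns a value in [lo, hi].
    lo, hi = _bounds(lo, hi)
    return rng.uniform(lo, hi)


rng = random.Random(0)
print(gen_integer(1, 6, rng))       # some value from 1 to 6, both included
print(gen_float(-100, 100, rng))    # comparable to F[-100,100]
```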

### PositiveRange

-_PositiveRange_ is similar to _Range_ but ensures that the specified references are non-negative(using _PositiveReference_). It includes _PositiveReference_ for the lower and the upper bound for the number to be generated. If not specified, default values(0, INT64_MAX) are used. The upper and lower bound is always an non-negative integer.
+_PositiveRange_ is similar to _Range_ but ensures that the specified references are non-negative (using _PositiveReference_). It includes _PositiveReference_(s) for the lower and upper bounds of the number to be generated. If not specified, the default values (0, INT64_MAX) are used. The upper and lower bounds are always non-negative integers.
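The same idea carries over to _PositiveRange_, with a different default lower bound and an explicit non-negativity check. Again a sketch only: `gen_positive` is a hypothetical helper and uniform sampling is an assumption.

```python
import random
from typing import Optional

INT64_MAX = 2 ** 63 - 1


def gen_positive(lo: Optional[int], hi: Optional[int], rng: random.Random) -> int:
    """Draw from a PositiveRange: inclusive, non-negative, defaults (0, INT64_MAX)."""
    lo = 0 if lo is None else lo
    hi = INT64_MAX if hi is None else hi
    if lo < 0 or hi < 0:
        raise ValueError("PositiveRange bounds must be non-negative")
    if lo > hi:
        raise ValueError(f"empty range [{lo}, {hi}]")
    return rng.randint(lo, hi)


print(gen_positive(None, 10, random.Random(7)))  # somewhere in 0..10
```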

### StringModifier

@@ -108,7 +113,7 @@ _DataType_ represents different types of data that can be generator. It includes

A _NonCapturingGroup_ is a _UnitExpression_ that groups other expressions without capturing the generated value, i.e. no entry is kept for it in the group register. The "(?:" and ")" denote the start and end of the non-capturing group. It can contain other unit expressions and may have associated quantifiers. A _NonCapturingGroup_ can be nested and can also contain _CapturingGroup_ instances.

-However, it's worth mentioning that if the _NonCapturingGroup_ is repeated using _Quantifier_ and there is a _CapturingGroup_ inside that _NonCapturingGroup_, then the _CapturingGroup_ will only have only one group number, not many for each iterations.
+However, it's worth mentioning that if a _NonCapturingGroup_ is repeated using a _Quantifier_ and contains a _CapturingGroup_, that _CapturingGroup_ has only one group number, not one per iteration.

Example: `(?:(N)){3}`: here the group number of `N` is always 1, irrespective of how many times it is generated. It won't be 1, 2, 3.
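Since group numbers follow the position of each capturing group in the expression text rather than how many times it is generated, counting them is a purely textual matter. The Python sketch below is a simplification: the helper is hypothetical, and it assumes every `(` not followed by `?:` opens a capturing group, ignoring any other syntax a real lexer would handle.

```python
def count_capturing_groups(expr: str) -> int:
    """Count '(' that opens a capturing group, i.e. is not followed by '?:'."""
    return sum(
        1
        for i, ch in enumerate(expr)
        if ch == "(" and not expr.startswith("?:", i + 1)
    )


# The quantifier repeats the non-capturing group, but there is only one
# capturing group in the text, so N always has group number 1.
print(count_capturing_groups("(?:(N)){3}"))  # 1
```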

@@ -141,7 +146,7 @@ In essence, ClexLanguage is the top-level structure that encapsulates the entire

- Whitespace introduced at any stage is discarded completely by the lexer, so spaces are treated the same way as comments in other languages.

-- _PositiveReference_ must have their dereferenced values always positive. This rule is enforced by ensuring that the value generated in any _CapturingGroup_ is always an non-negative integer.
+- The dereferenced value of a _PositiveReference_ must always be non-negative. This rule is enforced by ensuring that the value generated in any _CapturingGroup_ is always a non-negative integer.

- At any given instance, _GroupNo_ CANNOT EXCEED the **total number of occurrences of _CapturingGroup_** in that specific language. So, if there are only three capturing groups in that language, the language will not allow _GroupNo_ > 3.
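The whitespace and _GroupNo_ rules above can be checked before generation. A rough Python sketch, assuming back-references appear as a single backslash followed by digits once shell escaping is removed; `validate_group_refs` is a hypothetical helper, not part of cpast, and it performs only this one check.

```python
import re


def validate_group_refs(expr: str) -> None:
    """Raise ValueError if any back-reference exceeds the capturing-group count."""
    # Whitespace is insignificant, so discard it first, as the lexer does.
    compact = "".join(expr.split())
    total = sum(
        1
        for i, ch in enumerate(compact)
        if ch == "(" and not compact.startswith("?:", i + 1)
    )
    # A back-reference is a backslash followed by a group number.
    for match in re.finditer(r"\\(\d+)", compact):
        group_no = int(match.group(1))
        if not 1 <= group_no <= total:
            raise ValueError(
                f"group {group_no} referenced, but only {total} capturing group(s) exist"
            )


validate_group_refs(r"(N) (?:N){\1}")      # accepted: one group, \1 is valid
try:
    validate_group_refs(r"(N) (?:N){\2}")  # rejected: there is no group 2
except ValueError as err:
    print(err)
```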

@@ -165,10 +170,10 @@ In essence, ClexLanguage is the top-level structure that encapsulates the entire

## Examples

-- N{2}
-- (N) (?:N){\\1}
-- (N) (?:S[\\1,])
-- (N) (?:S[\\1,'U'])
-- N S C
-- F[-100,100]
-- (N[1,100]) (?:N[1,1000]){\\1} N[1,10000]
+- `N{2}` : Generates two random integers.
+- `(N) (?:N){\\1}` : Generates a random integer, then that many additional integers.
+- `(N) (?:S[\\1,])` : Generates a random integer, then a string of that length.
+- `(N) (?:S[\\1,'U'])` : Generates a random integer, then a string of uppercase letters whose length equals that integer.
+- `N S C` : Generates a random integer, a string, and a character.
+- `F[-100,100]` : Generates a random floating-point number between -100 and 100 (inclusive).
+- `(N[1,100]) (?:N[1,1000]){\\1} N[1,10000]` : Captures a random integer between 1 and 100, then generates that many integers between 1 and 1000, followed by another integer between 1 and 10000.
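To make the last example concrete, the Python function below hand-codes what `(N[1,100]) (?:N[1,1000]){\\1} N[1,10000]` describes. It does not parse Clex; it only mirrors the semantics stated above, assumes uniform sampling, and joins the values with spaces purely for display (cpast's actual output formatting is not shown here). `sample_case` is a hypothetical name.

```python
import random
from typing import List


def sample_case(rng: random.Random) -> List[int]:
    """Hand-written mirror of (N[1,100]) (?:N[1,1000]){\\1} N[1,10000]."""
    n = rng.randint(1, 100)                             # (N[1,100]) is captured as group 1
    values = [n]
    values += [rng.randint(1, 1000) for _ in range(n)]  # (?:N[1,1000]) repeated \1 times
    values.append(rng.randint(1, 10000))                # trailing N[1,10000]
    return values


print(" ".join(map(str, sample_case(random.Random(2024)))))
```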
File renamed without changes.
75 changes: 75 additions & 0 deletions docs/clex/FAQs.md
@@ -0,0 +1,75 @@
## Clex Frequently Asked Questions (FAQs)

This document is a compilation of frequently asked questions (FAQs) about the Clex language and its grammar. The questions and answers are based on a conversation between a large language model and another entity knowledgeable about Clex.

**General:**

* **What is Clex?**
Clex is a language for generating random text. It uses a simplified grammar similar to regular expressions but focuses on generating random values rather than matching text patterns.

* **What are the strengths of Clex?**
Clex is designed for simplicity, efficiency, and ease of use. Its limited feature set makes it intuitive to learn and use, particularly for users familiar with basic regular expressions.

* **What are some areas for potential improvement in Clex?**
While simplicity is valuable, some users might benefit from features that enhance expressiveness without compromising efficiency. Additionally, user customization options and a potential community around Clex could encourage exploration and expansion of the language.

**Grammar:**

* **What are capturing groups in Clex?**
Capturing groups capture single values during random text generation and can be referenced later in the expression using backreferences. They are numbered based on their order of appearance, starting from 1.

* **Can capturing groups be nested?**
No, nesting of capturing groups is not supported in Clex.

* **How does Clex handle overlapping capturing groups?**
Clex does not support overlapping capturing groups. Whether they will be disallowed outright or allowed through special syntax is left to future development decisions.

* **What are quantifiers in Clex?**
Quantifiers specify how many times a preceding element can be repeated during random text generation. Quantifiers can be applied to non-capturing groups but not directly to capturing groups.

* **Can quantifiers be nested?**
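Only through non-capturing groups. Quantified expressions can be placed inside a non-capturing group that is itself quantified, but a quantifier cannot be applied directly to a capturing group.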

* **Can ranges in data types specify non-numeric values?**
No, ranges within data types are currently limited to numeric values. Supporting character ranges or string length ranges is considered for future development, but potential complexity is a concern.

* **Does Clex support advanced features like conditional branching or lookarounds?**
No, Clex does not currently support features like conditional branching or lookarounds, as these would significantly increase complexity without clear benefits for its core use cases.

* **Does Clex offer user-defined functions or macros for customization?**
No, the current design of Clex does not include user-defined functions or macros. However, future versions might explore alternative mechanisms for customization without compromising simplicity.

* **What happens if a generated integer falls outside the specified range?**
It can't. A generated number always respects the minimum and maximum of the specified range (both bounds inclusive), so it never falls outside the range.

* **How are potential precision issues dealt with for floating-point numbers?**
Clex uses the standard `double` type (64-bit IEEE 754 double precision) for floating-point numbers, providing the same level of precision as `double` in languages like C and C++.

* **Does Clex have built-in mechanisms for error handling or validation?**
Yes, Clex has error handlers at various levels: the lexer, the parser, and the generator. When an error occurs, it is raised to the caller, and the user/developer may choose whether or not to handle it.

* **What are common use cases or applications for Clex?**
- Debugging programming problems (like testing CP/DSA questions)
- Live hacking during/after coding contests
- Generating test cases for problem setters
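On the floating-point precision question above: if Clex floats are standard 64-bit doubles, as the answer says, the usual rounding caveats apply. Python's `float` is the same C `double` on mainstream platforms, so the snippet below can illustrate why a checker comparing generated floats should use a tolerance rather than exact equality; the tolerance value is an arbitrary example.

```python
import math
import sys

# On mainstream platforms Python's float is an IEEE 754 double, the same
# 64-bit format as `double` in C and C++.
print(sys.float_info.mant_dig)  # 53 significand bits (about 15-17 significant decimal digits)

# Decimal fractions are rounded to the nearest representable double,
# so exact equality checks on generated floats are fragile.
a = 0.1 + 0.2
print(a == 0.3)                            # False
print(math.isclose(a, 0.3, rel_tol=1e-9))  # True
```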

**Specific Questions:**

* **When specifying a character set for string generation, can it include individual characters (e.g., 'a', 'z') alongside character classes (e.g., 'N', 'L')?**
No, including individual characters within the character set for string generation is not currently supported in Clex. This design choice prioritizes random text generation, where character classes offer more flexibility.

* **How is the ambiguity between reference to a group number and a literal numeric value resolved during parsing?**
For back-references, a backslash (`\`) prefix denotes a reference to a captured group. So `1` always refers to a literal value, while `\1` refers to the first captured group.

* **Can quantifiers be nested? For example, is (N{2}){3} a valid expression?**
No, while nesting quantifiers within non-capturing groups is allowed, applying quantifiers directly to capturing groups is not supported in Clex.
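The backslash rule from the back-reference answer above makes the literal-versus-group distinction purely lexical. A hedged Python sketch of how a tokenizer might classify a range bound; the regular expression, the `parse_reference` name, and the tuple return shape are illustrative assumptions, not cpast's parser.

```python
import re
from typing import Tuple

# A bound is either "\<digits>" (back-reference) or an optionally signed integer.
_REFERENCE = re.compile(r"\\(?P<group>\d+)|(?P<literal>-?\d+)")


def parse_reference(token: str) -> Tuple[str, int]:
    """Classify a range bound as a literal value or a group back-reference."""
    match = _REFERENCE.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a valid reference: {token!r}")
    if match.group("group") is not None:
        return ("group", int(match.group("group")))
    return ("literal", int(match.group("literal")))


print(parse_reference("1"))     # ('literal', 1)
print(parse_reference(r"\1"))   # ('group', 1)
```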

**Comparison to Regular Expressions:**

* **How does Clex compare to traditional regex engines like PCRE or RE2?**
Clex shares some syntax and concepts with regex, but the two serve different purposes: regex matches text patterns, while Clex generates random text. This leads to differences in expressiveness and limitations. Clex prioritizes simplicity and efficiency for random generation, while regex offers more complex features for pattern matching.

**Additional Notes:**

* This FAQ is based on the current understanding of Clex and its limitations. Future development might introduce changes or enhancements to the language.
* If you have further questions or suggestions regarding Clex, feel free to explore the language and its potential, or engage with any community that forms around it.
