Error: undeclared identifier: 'TK_UNKNOWN' #1

Closed
JewishLewish opened this issue Dec 12, 2022 · 32 comments

Comments

@JewishLewish

image

import tables
import toktok
import lexbase


tokens:
    Plus      > '+'
    Minus     > '-'
    Multi     > '*'
    Div       > '/'
    Assign    > '='
    Comment   > '#' .. EOL      # anything from `#` to end of line
    CommentAlt > "/*" .. "*/"   # anything starting with `/*` to `*/`
    Var       > "var"
    Let       > "let"
    Const     > "const"
    BTrue     > @["TRUE", "True", "true", "YES", "Yes", "yes", "y"]
    BFalse    > @["FALSE", "False", "false", "NO", "No", "no", "n"]

when isMainModule:
    var lex = Lexer.init(fileContents = readFile("sample.txt"))
    if lex.hasError:
        echo lex.getError
    else:
        while true:
            var curr = lex.getToken()           # tuple[kind: TokenKind, value: string, wsno, col, line: int]
            echo curr

I get these errors when just running the default sample code

@JewishLewish
Author

Looking inside the code, it shows that it has something to do with this code:
image

@georgelemon
Member

Thanks! Just updated the README.

Call the settings proc before the tokens macro, with uppercase enabled and the prefix shown here. I will find a solution for the hardcoded TK_UNKNOWN.

static:
    Program.settings(
        uppercase = true,
        prefix = "Tk_"
    )

For more inspiration you can check Tim Engine https://github.com/openpeep/tim

@JewishLewish
Author

There also seems to be an error with adding values.

I added "LCol" and "RCol" to the token list:

tokens:
    Plus      > '+'
    Minus     > '-'
    Multi     > '*'
    Div       > '/'
    LCol      > '('
    RCol      > ')'

However, the output doesn't seem to recognize them:

(kind: TK_LCOL, value: "", wsno: 0, line: 1, col: 0, pos: 0)
(kind: TK_STRING, value: "test", wsno: 0, line: 1, col: 1, pos: 1)
(kind: TK_RCOL, value: "", wsno: 0, line: 1, col: 7, pos: 7)

The file contents are:

("test")

@georgelemon
Member

Seems fine to me, you have TK_LCOL (pos 0) and TK_RCOL (pos 7).

If you're talking about the empty value field, well, there is no need to store the ( and ) characters.

@JewishLewish
Author

Ah I didn't notice the "pos" / .col syntax. In that case, it works perfectly!

@JewishLewish
Author

Is there a Program.settings for removing whitespace?

@georgelemon
Member

No. Whitespace is counted and stored in the wsno field.

@georgelemon
Member

This is not in the docs yet, but here is how you can handle tokens of 2 or more characters that share a common first character.

For example, > which can also start >=

tokens:
    GT           > '>':
        GTE      ? '='

    LT           > '<':
        LTE      ? '='

    # string based tokens TK_AT, TK_INCLUDE and TK_MIXIN
    At           > '@':
        Include  ? "include"
        Mixin    ? "mixin"
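
To make the idea behind these variants concrete, here is a minimal hand-written sketch of the same one-character lookahead. It is not toktok's implementation; the Kind enum and the scanOperator proc are hypothetical names used only for illustration, and the sketch assumes pos points at a valid character.

```nim
# Hypothetical hand-written equivalent of `GT > '>': GTE ? '='`.
# This is an illustration only, not how toktok generates its lexer.
type
  Kind = enum
    kGT, kGTE, kLT, kLTE, kUnknown

proc scanOperator(src: string, pos: var int): Kind =
  ## Reads one operator at `pos`, preferring the two-character variant.
  let c = src[pos]
  # Peek at the next character without consuming it.
  let hasEq = pos + 1 < src.len and src[pos + 1] == '='
  case c
  of '>': result = (if hasEq: kGTE else: kGT)
  of '<': result = (if hasEq: kLTE else: kLT)
  else: result = kUnknown
  # Consume two characters only when the longer variant matched.
  pos += (if hasEq and result != kUnknown: 2 else: 1)

var i = 0
echo scanOperator(">= 1", i)
```

The base token is tried first and the variant only wins when the following character matches, which is the behaviour the `?` lines above describe.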

@JewishLewish
Author

JewishLewish commented Dec 13, 2022

I mean if I had:

echo "Test"

echo "Test2"

How do you separate multiple lines when it doesn't even tokenize NEWLINE ("\n")?
The only way I can think of (or at least try) is to add ";" at the end of each command, so that when the code sees it, it breaks there, runs the current command, and continues with the next.

@georgelemon
Member

There is no need to tokenize new lines or whitespace. Imagine how many useless tokens there would be.
That's why you have the wsno and line fields on each TokenTuple returned by the getToken proc.

So, based on your example, the output will be

(kind: TK_ECHO, value: "echo", wsno: 0, line: 1, col: 0, pos: 0)
(kind: TK_STRING, value: "Test", wsno: 1, line: 1, col: 5, pos: 5)
(kind: TK_ECHO, value: "echo", wsno: 0, line: 2, col: 0, pos: 0)
(kind: TK_STRING, value: "Test", wsno: 1, line: 2, col: 5, pos: 5)

@georgelemon
Member

Anything else should be implemented at the parser and AST level.

@JewishLewish
Author

That is smart, but there is still the challenge of telling the code that "this is a new line."
I could make it check each token's line number, and when the line changes, execute the code collected so far and continue. 🤔
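
A rough sketch of that line-number idea, assuming a simplified token shape: Tok here is a hypothetical stand-in with only three of the fields shown in the output above, and splitStatements is an illustrative name, not part of toktok.

```nim
type
  # Simplified stand-in for the lexer's token tuple (assumption: only
  # kind, value and line are needed to split statements).
  Tok = tuple[kind: string, value: string, line: int]

proc splitStatements(toks: seq[Tok]): seq[seq[Tok]] =
  ## Starts a new statement whenever the line number changes.
  for t in toks:
    if result.len == 0 or result[^1][^1].line != t.line:
      result.add(@[t])      # first token on a new line opens a statement
    else:
      result[^1].add(t)     # same line: extend the current statement

let toks: seq[Tok] = @[
  (kind: "TK_ECHO", value: "echo", line: 1),
  (kind: "TK_STRING", value: "Test", line: 1),
  (kind: "TK_ECHO", value: "echo", line: 2),
  (kind: "TK_STRING", value: "Test2", line: 2)
]
for stmt in splitStatements(toks):
  echo stmt.len, " token(s) on line ", stmt[0].line
```

This only works while every statement fits on one line; anything spanning several lines would need real parser logic instead.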

@JewishLewish
Author

JewishLewish commented Dec 13, 2022

edit:
I am starting to understand what "wsno" is now; I was confused about where it was getting that value. That is smart!!!

My man you truly have won the internet.

@georgelemon
Member

georgelemon commented Dec 13, 2022

Haha. Glad you like this! There are still many things to do.

And this would be a minimal parser for your toktok tokens:

import std/strutils  # provides the `%` format operator used in setError

type
    Parser* = object
        lex: Lexer
        prev, current, next: TokenTuple
        error: string

proc setError(p: var Parser, msg: string) =
    ## Set parser error
    p.error = "Error ($2:$3): $1" % [msg, $p.current.line, $p.current.pos]

proc walk(p: var Parser, offset = 1) =
    var i = 0
    while offset != i:
        p.prev = p.current
        p.current = p.next
        p.next = p.lex.getToken()
        inc i

var p = Parser(lex: Lexer.init(fileContents = readFile("sample.txt")))
p.current = p.lex.getToken()
p.next = p.lex.getToken()

while p.current.kind != TK_EOF and p.error.len == 0:
    case p.current.kind:
    of TK_LET, TK_VAR, TK_CONST:
        let this = p.current
        if p.next.kind == TK_IDENTIFIER:
            discard # handle var declaration
        else:
            p.setError("Invalid variable declaration, expected identifier")
            break
    of TK_PLUS, TK_MINUS, TK_MULTI, TK_DIV:
        discard # handle math
    else: discard # and so on
    walk(p) # walk to next token

if p.error.len != 0:
    echo p.error

@JewishLewish
Author

JewishLewish commented Dec 14, 2022

There also seems to be an error with block comments.

When I put:

*This is a test*

/* This is a test*/

I get:

TK_MULTI
TK_IDENTIFIER
TK_IDENTIFIER
TK_IDENTIFIER
TK_IDENTIFIER
TK_MULTI
TK_DIV
TK_MULTI
TK_IDENTIFIER
TK_IDENTIFIER
TK_IDENTIFIER
TK_IDENTIFIER
TK_MULTI
TK_DIV
TK_EOF

However, the "#" seems to work perfectly.

TK_COMMENT
TK_EOF

@georgelemon
Member

Yep. That does not work right now. I hope it will be fixed soon.

georgelemon added a commit that referenced this issue Dec 14, 2022
Signed-off-by: George Lemon <georgelemon@protonmail.com>
@georgelemon
Member

Ok, now this should work. Reinstall your toktok package.

tokens:
        Div       > '/':
            BlockComment ? '*' .. "*/"            # everything starting from /* to */ (tail should be a string)
            InlineComment ? '/' .. EOL            # everything starting from // to EOL (end of line)

This is a work in progress (Markdown use case):

tokens:
        H1   > '#' .. EOL:
            H2 ? '#' .. EOL
            H3 ? '#' .. EOL

georgelemon added a commit that referenced this issue Dec 14, 2022
Signed-off-by: George Lemon <georgelemon@protonmail.com>
@JewishLewish
Author

JewishLewish commented Dec 14, 2022

The reason I was asking is that I am having trouble designing a truly functioning if statement / while statement.
Making one work is easy, but making it work INSIDE another is difficult.

The tokenizer needs some form of indicator of which part of the code is INSIDE another part of the code.
Ex.

If "True" == "True" {
  //Code Here
}

The {} can be used, but when we put 2 if statements inside another, that becomes complicated. That's why I asked if there was a way to strip whitespace or merge lines, similar to:
If "Test" == "Test" {//Code Here}

@georgelemon
Member

georgelemon commented Dec 14, 2022

This is a generic lexer, that's why you should write this kind of logic at parser level.

By the way, instead of tables, I recommend you implement AST nodes using Nim objects.

I made a functional Parser + AST nodes, based on current toktok Lexer that does what you need. You can start from this.
https://github.com/openpeep/toktok/blob/main/examples/program.nim

if 2 == 2 { /* something cool */ }

will produce the following AST

{
  "nodes": [
    {
      "nodeName": "NTCondition",
      "nodeType": 3,
      "ifCond": {
        "nodeName": "NTInfix",
        "nodeType": 6,
        "infixOp": 1,
        "infixLeft": {
          "nodeName": "NTInt",
          "nodeType": 0,
          "intVal": 2
        },
        "infixOpSymbol": "EQ",
        "infixRight": {
          "nodeName": "NTInt",
          "nodeType": 0,
          "intVal": 2
        }
      },
      "ifBody": {
        "nodeName": "NTStmtList",
        "nodeType": 7,
        "stmtList": [
          {
            "nodeName": "NTBlockComment",
            "nodeType": 5,
            "comment": " something cool "
          }
        ]
      },
      "elseBody": null,
      "elifBranch": []
    }
  ]
}

While this can't be valid, because the inline comment runs to the end of the line and swallows the closing }

if 1 >= 1 { // this fails}

so it will throw an error

Error (4:26): EOF reached before closing condition body

Pretty simple
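
As a rough illustration of the two suggestions above (nesting handled at parser level, AST nodes as Nim objects), here is a small self-contained sketch. It is not the code from examples/program.nim; NodeKind, Node and parseBlock are hypothetical names, tokens are plain strings, and the condition is assumed to be a single token to keep the example short.

```nim
type
  NodeKind = enum
    nkIf, nkStmt
  Node = ref object
    # Object variant: each node kind carries only its own fields.
    case kind: NodeKind
    of nkIf:
      cond: string
      body: seq[Node]
    of nkStmt:
      text: string

proc parseBlock(toks: seq[string], pos: var int): seq[Node] =
  ## Parses statements until the matching "}" or the end of input.
  while pos < toks.len and toks[pos] != "}":
    if toks[pos] == "if":
      # Layout assumed in this sketch: if <cond> { <body> }
      let cond = toks[pos + 1]
      doAssert toks[pos + 2] == "{", "expected '{' after condition"
      pos += 3
      let body = parseBlock(toks, pos)   # recursion handles any nesting depth
      doAssert pos < toks.len, "EOF reached before closing condition body"
      inc pos                            # skip the closing "}"
      result.add Node(kind: nkIf, cond: cond, body: body)
    else:
      result.add Node(kind: nkStmt, text: toks[pos])
      inc pos

var pos = 0
let ast = parseBlock(@["if", "a", "{", "if", "b", "{", "x", "}", "}", "y"], pos)
echo ast.len
```

Because each nested block is parsed by a recursive call that stops at its own closing brace, the lexer never needs to know which tokens are "inside" which block.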

@JewishLewish
Author

Oh damn that's beautiful.
I, myself, am not that big in the field of computer science however I do love programming and data science.

I started working on a side-project programming language called "Barcelona" that is supposed to make working with database/statistics/request data much easier and more straight to the point.
It is an experimental project that mixes Rust's memory safety tools, Python's simple syntax and Nim's performance. In addition, it has built-in translators to convert code to Python or Rust, or to let Python files execute Barcelona code to make request processing quicker.

I am going to have a 6-week college break, so I plan to work seriously on Barcelona, but I am new to understanding how a programming language works.

Any advice would be appreciated.

@georgelemon
Member

georgelemon commented Dec 14, 2022

Nim is a good start. Also thanks for trying toktok. Hobby projects are awesome.

Maybe you don't have to write a micro language, you have Nim macros!
https://nim-lang.org/docs/macros.html

Also, check Computer Programming with the Nim Programming Language

@JewishLewish
Author

Nim is an interesting language. I started learning it a few days ago and it looks pretty good. I love how it's a mixture of Python's simplicity, C's performance and Lisp's flexibility. The fact that you can compile code to C, C++ and Obj-C (also JS) is really cool.

The only problem I am having is trying to select CERTAIN elements in a list.
Ex. with a Python list, you can do List[3:-1], which gets everything from index 3 (the 4th value) up to the 2nd-last value. Nim doesn't seem to have that ability (or I can't find it).

Also, Windows Defender seems to flag it as a virus after version 1.4.0 (idk why). Cool and unique language to learn.

@georgelemon
Member

georgelemon commented Dec 14, 2022

Regarding Windows Defender, that's a false positive. See nim-lang/Nim#17820. Report it as a false positive, if possible.

You mean something like this?

var a = @["bread", "cake", "tomorrow"]
echo a[1 .. ^1]     # output @["cake", "tomorrow"]
echo a[2 .. ^1]     # output @["tomorrow"]

Bookmark this: String Functions: Nim vs Python. Most of those examples also work with seq and array / openarray.
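
For the specific Python slice asked about earlier in the thread, List[3:-1], a small sketch of the Nim counterpart follows. The main thing to keep in mind is that Nim's `..` range is inclusive at both ends, while Python's stop index is exclusive. The items variable is just sample data.

```nim
let items = @["a", "b", "c", "d", "e", "f"]

# Python: items[3:-1]  (index 3 up to, but not including, the last element)
# Nim ranges include the end, so stop at ^2, the 2nd-last element.
echo items[3 .. ^2]

# Python: items[:2]
echo items[0 ..< 2]

# Python: items[-2:]
echo items[^2 .. ^1]
```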

@JewishLewish
Author

Regarding Windows Defender, that's a false positive. nim-lang/Nim#17820. Report as a false positive, if is possible.

Ah that would make a lot of sense tbh. I recall there was a time where you can trigger people's windows defenders via discord just by sending pictures / videos.

You mean something like this?

Something like that! Thank you!

@JewishLewish
Author

image

I tried to make a separate Nim file to import into the main Nim file, and it seems to give an interesting error.

@JewishLewish
Author

Update: it appears the error occurs because I am importing std/strutils from the 2nd file (not the main file).

@JewishLewish
Author

JewishLewish commented Dec 17, 2022

Also
image

UPDATE THIS WAS FIXED!

@JewishLewish
Author

Do you think you can add a token for float numbers?

The tokens for:
2.0 4 4.0

Currently come out as:
(kind: TK_INTEGER, value: "2", wsno: 0, line: 1, col: 0, pos: 0)
(kind: TK_PERIOD, value: "", wsno: 0, line: 1, col: 1, pos: 1)
(kind: TK_INTEGER, value: "44", wsno: 0, line: 1, col: 2, pos: 2)
(kind: TK_PERIOD, value: "", wsno: 0, line: 1, col: 5, pos: 5)
(kind: TK_INTEGER, value: "0", wsno: 0, line: 1, col: 6, pos: 6)
(kind: TK_EOF, value: "", wsno: 0, line: 1, col: 7, pos: 7)

@georgelemon
Member

Thanks for that! Update your local toktok
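
For readers curious what float support involves at the lexer level, here is a minimal sketch of scanning a number with an optional fraction. It is an assumption about the general technique, not the change that was made in toktok; scanNumber is a hypothetical name.

```nim
import std/strutils

proc scanNumber(src: string, pos: var int): tuple[isFloat: bool, text: string] =
  ## Consumes digits, then an optional ".digits" fraction, starting at `pos`.
  let start = pos
  while pos < src.len and src[pos] in Digits: inc pos
  # Treat '.' as part of the number only when a digit follows it,
  # so that "2." followed by something else still yields an integer.
  if pos + 1 < src.len and src[pos] == '.' and src[pos + 1] in Digits:
    result.isFloat = true
    inc pos
    while pos < src.len and src[pos] in Digits: inc pos
  result.text = src[start ..< pos]

var i = 0
echo scanNumber("2.0 4 4.0", i)
```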

@JewishLewish
Author

Another suggestion I would offer is the idea of multi-threading. Do you think it would be possible for it to be implemented?

@JewishLewish
Author

image

Not sure why, but it gives this "duplicate" import warning from toktok.

@georgelemon
Member

Try using toktok in a separate file, let's say "tokens.nim", where you define your tokens. Then import tokens.nim in your parser.

tokens.nim

import toktok

tokens:
  # here

parser.nim

import ./tokens
