Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebuild index for new prop #3332

Merged
merged 4 commits into from
Dec 28, 2021
Merged

rebuild index for new prop #3332

merged 4 commits into from
Dec 28, 2021

Conversation

sworduo
Copy link
Contributor

@sworduo sworduo commented Nov 18, 2021

What type of PR is this?

  • bug
  • feature
  • enhancement

What does this PR do?

rebuild tag index with old schema version value

Which issue(s)/PR(s) this PR relates to?

#3274 (reference)

Special notes for your reviewer, ex. impact of this fix, etc:

Additional context:

Checklist:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatible (If it is incompatible, please describe it and add corresponding label.)
  • Need to cherry-pick (If need to cherry-pick to some branches, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to reflect in release notes and how to describe:

                                                            `

@Sophie-Xie Sophie-Xie added the ready-for-testing PR: ready for the CI test label Nov 18, 2021
@critical27
Copy link
Contributor

Thx for contribution, does this PR origin from https://discuss.nebula-graph.com.cn/t/topic/6376?

@sworduo
Copy link
Contributor Author

sworduo commented Nov 22, 2021

Thx for contribution, does this PR origin from https://discuss.nebula-graph.com.cn/t/topic/6376?

Yes

Copy link
Contributor

@critical27 critical27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suggestion to fix this issue:

  1. Perhaps don't modify the RowReader or RowReaderWrapper, because RowReader is bound to a specific schema by design, and the setLatestSchema method will introduce meny problem. (e.g. read from an old schema by latest schema will have undefined behaviors)
  2. To fix this issue, when we call reader->getValueByName in IndexKeyUtils::collectIndexValues, we will return NullType::UNKNOWN_PROP for a not existed property in old schema (see line 55 in RowReaderV2.cpp). When you find the value is unknown, we can use latest schema to check if it is nullable or has default value:
    • return error when it isn't nullable and does not have default value
    • or use the predefined value to build the index

@critical27 critical27 added the community Source: who proposed the issue label Nov 22, 2021
@sworduo
Copy link
Contributor Author

sworduo commented Nov 22, 2021

I have a suggestion to fix this issue:

  1. Perhaps don't modify the RowReader or RowReaderWrapper, because RowReader is bound to a specific schema by design, and the setLatestSchema method will introduce meny problem. (e.g. read from an old schema by latest schema will have undefined behaviors)

  2. To fix this issue, when we call reader->getValueByName in IndexKeyUtils::collectIndexValues, we will return NullType::UNKNOWN_PROP for a not existed property in old schema (see line 55 in RowReaderV2.cpp). When you find the value is unknown, we can use latest schema to check if it is nullable or has default value:

    • return error when it isn't nullable and does not have default value
    • or use the predefined value to build the index

1.The scheme used to read data is the schema included in the row data instead of the latest schema. The latest schema is only used in IndexKeyUtils::collectIndexValues when the schema included in the data is different from the latest schema. Besides, since the index will be deleted when the associated prop is deleted, I'm not very sure what problem will be introduced.

2.In this case, a new parameters named latestSchema need to be added in IndexKeyUtils::collectIndexValues and everywhere call this function need to modify as well. Of course, that's OK.

@sworduo
Copy link
Contributor Author

sworduo commented Nov 22, 2021

By the way, I wonder how to call pre-commit by my-self before commit...

@critical27
Copy link
Contributor

critical27 commented Nov 22, 2021

By the way, I wonder how to call pre-commit by my-self before commit...

You mean the lint failed? We need to do the git clang-format

@critical27
Copy link
Contributor

critical27 commented Nov 22, 2021

1.The scheme used to read data is the schema included in the row data instead of the latest schema. The latest schema is only used in IndexKeyUtils::collectIndexValues when the schema included in the data is different from the latest schema. Besides, since the index will be deleted when the associated prop is deleted, I'm not very sure what problem will be introduced.

2.In this case, a new parameters named latestSchema need to be added in IndexKeyUtils::collectIndexValues and everywhere call this function need to modify as well. Of course, that's OK.

Perhaps I didn't clarify very clearly. The reason I don't suggest to add setLatestSchema is we don't need to do it (RowReader should not have latest schema). And the for each time getTagPropReader and getEdgePropReader we need to call schemaMan->getLatestTagSchemaVersion, it could raise a performance issue. The latest schema only need to be fetched once.

As for a new parameters named latestSchema you said, either use a default nullptr parameter, or add a overloaded version which has 3 parameters of IndexKeyUtils::collectIndexValues. Both ways LGTM.

@sworduo
Copy link
Contributor Author

sworduo commented Nov 22, 2021

1.The scheme used to read data is the schema included in the row data instead of the latest schema. The latest schema is only used in IndexKeyUtils::collectIndexValues when the schema included in the data is different from the latest schema. Besides, since the index will be deleted when the associated prop is deleted, I'm not very sure what problem will be introduced.
2.In this case, a new parameters named latestSchema need to be added in IndexKeyUtils::collectIndexValues and everywhere call this function need to modify as well. Of course, that's OK.

Perhaps I didn't clarify very clearly. The reason I don't suggest to add setLatestSchema is we don't need to do it (RowReader should not have latest schema). And the for each time getTagPropReader and getEdgePropReader we need to call schemaMan->getLatestTagSchemaVersion, it could raise a performance issue. The latest schema only need to be fetched once.

As for a new parameters named latestSchema you said, either use a default nullptr parameter, or add a overloaded version which has 3 parameters of IndexKeyUtils::collectIndexValues. Both ways LGTM.

That's OK. The point is to avoid call getLatestSchem lots of time.

Copy link
Contributor

@critical27 critical27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, only one thing to check, please see the inline comments.

PS: some lint failed, please fix them according to https://github.com/vesoft-inc/nebula/runs/4319257516?check_suite_focus=true

const std::string propName,
const meta::SchemaProviderIf* latestSchema) {
auto value = reader->getValueByName(propName);
if (latestSchema == nullptr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps there is one more condition here to check if the value is a "Null" (literally, sorry about the naming). You only handle the UNKNOWN_PROP.

See the code in RowReader
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition NullType::NULL is handled by IndexKeyUtils::checkValue. Maybe I don't need to handle it in function readValueWithLatestSche.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if the value in RowReader is the NullType::NULL, we should directly return it. What do you think?

Copy link
Contributor

@critical27 critical27 Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my bad, I see the point here (when != UNKNOWN_PROP will return the value). Never mind.

critical27
critical27 previously approved these changes Nov 25, 2021
Copy link
Contributor

@critical27 critical27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. Thanks a lot.

@critical27
Copy link
Contributor

Em.. the code format failed again. You could try to read the conduct

@sworduo
Copy link
Contributor Author

sworduo commented Nov 25, 2021

Em.. the code format failed again. You could try to read the conduct

Thx, I will try it

@yixinglu
Copy link
Contributor

Em.. the code format failed again. You could try to read the conduct

Thx, I will try it

hi @sworduo you can lookup the readme in the tests directory to fix the feature format lint error.

@sworduo
Copy link
Contributor Author

sworduo commented Nov 29, 2021

Em.. the code format failed again. You could try to read the conduct

Thx, I will try it

hi @sworduo you can lookup the readme in the tests directory to fix the feature format lint error.

Sorry about the format error. I will fix it tomorrow.

@sworduo
Copy link
Contributor Author

sworduo commented Nov 30, 2021

Em.. the code format failed again. You could try to read the conduct

Thx, I will try it

hi @sworduo you can lookup the readme in the tests directory to fix the feature format lint error.

Hi @yixinglu , when I run make fmt in tests directory. The result is

All done! 💥 💔 💥
136 files reformatted, 3 files left unchanged, 8 files failed to reformat.

A lot of files have been reformatted. And the change is like follows :

     When executing query:
-      """
-      CREATE TAG tag1()
-      """
+    """
+    CREATE TAG tag1()
+    """
     Then the execution should be successful
     # if not exists
     When executing query:
-      """
-      CREATE TAG IF NOT EXISTS tag1()
-      """
+    """
+    CREATE TAG IF NOT EXISTS tag1()
+    """

However, it is failed to reformat index.feature. The error is :

Error: cannot format /data/luoshangjun/source/nebula/tests/tck/features/index/Index.feature: INTERNAL ERROR: Invalid file contents are produced:
Parser errors:
(939:2): expected: #EOF, #TableRow, #StepLine, #TagLine, #ScenarioLine, #ScenarioOutlineLine, #Comment, #Empty, got '`name`(64)'
(940:1): expected: #EOF, #TableRow, #StepLine, #TagLine, #ScenarioLine, #ScenarioOutlineLine, #Comment, #Empty, got ')" |'
(938:7): inconsistent cell count within the table
(948:2): expected: #EOF, #TableRow, #StepLine, #TagLine, #ScenarioLine, #ScenarioOutlineLine, #Comment, #Empty, got '`age`'
(949:1): expected: #EOF, #TableRow, #StepLine, #TagLine, #ScenarioLine, #ScenarioOutlineLine, #Comment, #Empty, got ')" |'
(947:7): inconsistent cell count within the table
Please report a bug on https://github.com/ducminh-phan/reformat-gherkin/issues.
This invalid output might be helpful:
/tmp/rfmt-ghk_hr7pzs20.log

I also run make check-and-diff -C tests which produce the same results. I have no idea how to format it.

@critical27
Copy link
Contributor

You probably need to rebase onto latest master, which will works fine.

@sworduo
Copy link
Contributor Author

sworduo commented Dec 2, 2021

You probably need to rebase onto latest master, which will works fine.

I have been rebased the latest master while produce the same results. Fortunately, formatting is successful in docker...

@Sophie-Xie Sophie-Xie requested review from cangfengzhs and removed request for yixinglu and jievince December 21, 2021 09:32
@cangfengzhs
Copy link
Contributor

LGTM and please deal with the conflict.

@sworduo
Copy link
Contributor Author

sworduo commented Dec 28, 2021

LGTM and please deal with the conflict.

That's OK. I have been resolved the conflict in my machine. However, I can not upgrade to third-party 3.0 to compile the code. The information is here #3462 (comment)

@Sophie-Xie
Copy link
Contributor

LGTM and please deal with the conflict.

That's OK. I have been resolved the conflict in my machine. However, I can not upgrade to third-party 3.0 to compile the code. The information is here #3462 (comment)

Can you push your code, then try to compile it by CI of repo first. 😂

@yixinglu yixinglu merged commit 4db974b into vesoft-inc:master Dec 28, 2021
@sworduo sworduo deleted the rebuildIndexForNewProp branch December 29, 2021 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Source: who proposed the issue ready for review ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rebuild tag index with old schema version value
5 participants