"article_id": int(article_id),
"paragraph_pos_in_article": int(paragraph_pos_in_article),
According to the docstring, article_id and paragraph_pos_in_article are already of type int. Why is a cast to int added?
Since Python doesn't enforce static typing, if someone passes a type other than int, things can get messy.
In particular, if we pass an np.int64, the SQL query breaks, which is exactly what was happening in our case.
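A minimal sketch of the defensive cast being discussed. To stay dependency-free it does not import numpy: the hypothetical FakeInt64 class stands in for numpy.int64, which, like it, converts cleanly with int() but is not a subclass of int. The function name build_query_params is also made up for illustration.

```python
from typing import Any, Dict


class FakeInt64:
    """Hypothetical stand-in for numpy.int64: int-like, but not a subclass of int."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __int__(self) -> int:
        return self._value

    def __index__(self) -> int:
        return self._value


def build_query_params(article_id: Any, paragraph_pos_in_article: Any) -> Dict[str, int]:
    """Build SQL bind parameters, normalizing int-like inputs to built-in int."""
    return {
        # int() turns int-like objects (such as numpy integer scalars) into plain
        # Python ints, the type database drivers know how to bind
        "article_id": int(article_id),
        "paragraph_pos_in_article": int(paragraph_pos_in_article),
    }


raw = FakeInt64(7)
print(isinstance(raw, int))  # False: a driver expecting int may reject or mishandle it
params = build_query_params(raw, FakeInt64(2))
print(params, type(params["article_id"]))
```

One design choice worth weighing: int() also silently truncates floats (int(2.9) == 2). operator.index() is a stricter alternative that accepts true integer types, numpy integers included, and raises TypeError for floats.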
I see. Was it spotted by bandit?
Anyway, should we consider... type annotations for all of BBS then? ;) Or data validation frameworks for Python arguments?
For type annotations: as I said in the past, I am not against them ;) I am still a bit annoyed that one would have to write them by hand for each function. It would be cool if we could use the NumPy docstrings we already have to auto-generate type annotations (if PyCharm can do it, I hope there is a way).
But AFAIK type annotations wouldn't have raised any exception here, right? They are just useful to help your IDE or a developer know which type is "expected".
For data validation frameworks: I do not know any, do you have one in mind in particular?
I feel Python's duck typing is a key feature, so unless it's really needed I would prefer avoiding type checking...
> auto-generate type annotations
According to what I have found, there is tooling to convert docstring types into type annotations.
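As a rough idea of what such tooling does, the toy sketch below reads the Parameters section of a NumPy-style docstring and renders an annotated signature. It is not one of the existing converters and only handles the simple "name : type" layout; docstring_param_types, annotated_signature and get_paragraph are hypothetical names.

```python
import inspect
import re
from typing import Callable, Dict

# A parameter line such as "article_id : int" (its description lines are indented)
PARAM_LINE = re.compile(r"^(\w+)\s*:\s*(.+)$")


def docstring_param_types(func: Callable) -> Dict[str, str]:
    """Map parameter names to the type strings of a NumPy-style Parameters section."""
    lines = inspect.cleandoc(func.__doc__ or "").splitlines()
    types: Dict[str, str] = {}
    in_params = False
    # Pair each line with the next one to detect "Header" + "-------" section titles
    for line, next_line in zip(lines, lines[1:] + [""]):
        if line.strip() and set(next_line.strip()) == {"-"}:
            # Entering "Parameters" turns collection on; any other section turns it off
            in_params = line.strip() == "Parameters"
            continue
        match = PARAM_LINE.match(line)
        if in_params and match:
            types[match.group(1)] = match.group(2).strip()
    return types


def annotated_signature(func: Callable) -> str:
    """Render the function signature with annotations taken from its docstring."""
    types = docstring_param_types(func)
    params = [
        f"{name}: {types[name]}" if name in types else name
        for name in inspect.signature(func).parameters
    ]
    return f"def {func.__name__}({', '.join(params)}): ..."


def get_paragraph(article_id, paragraph_pos_in_article):
    """Fetch one paragraph.

    Parameters
    ----------
    article_id : int
        Identifier of the article.
    paragraph_pos_in_article : int
        Position of the paragraph inside the article.

    Returns
    -------
    str
        The paragraph text.
    """


print(annotated_signature(get_paragraph))
# def get_paragraph(article_id: int, paragraph_pos_in_article: int): ...
```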
> type annotations wouldn't have raised any exception here right?
The IDE would have complained at the call sites of the function we are discussing.
A static type checker like mypy, run by the CI, would have complained.
So, with this, we would never have reached the point where the exception was thrown.
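To illustrate the point made in this exchange, here is a small sketch with a hypothetical helper, paragraph_key: the annotations are stored on the function but never enforced when it runs, so a wrong argument type is only caught by an IDE or by a static checker such as mypy run in CI.

```python
def paragraph_key(article_id: int, paragraph_pos_in_article: int) -> str:
    """Hypothetical helper whose annotations document the expected types."""
    return f"{article_id}-{paragraph_pos_in_article}"


# The annotations are only metadata attached to the function...
print(paragraph_key.__annotations__)

# ...so this call runs without any error at runtime, even though "7" is a str.
# A static type checker such as mypy would report the incompatible argument type
# before the code is ever executed.
print(paragraph_key("7", 2))  # prints 7-2
```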
> For data validation frameworks: I do not know any, do you have one in mind in particular?
For validating inputs, I have pydantic in mind.
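For comparison, runtime input validation could look like the hand-rolled sketch below. It is not pydantic's API (pydantic does much more, such as type coercion and nested models); validate_types and paragraph_key are hypothetical names. It checks arguments against plain-class annotations at call time, so a numpy.int64, which is not a subclass of int, would be rejected with a clear TypeError instead of failing later inside the SQL query.

```python
import functools
import inspect
from typing import get_type_hints


def validate_types(func):
    """Hypothetical decorator: check arguments against plain-class annotations at call time."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes such as int or str are checked; typing constructs
            # like typing.List[int] are not classes and are skipped
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@validate_types
def paragraph_key(article_id: int, paragraph_pos_in_article: int) -> str:
    return f"{article_id}-{paragraph_pos_in_article}"


print(paragraph_key(7, 2))  # prints 7-2
try:
    paragraph_key("7", 2)
except TypeError as exc:
    print(exc)  # article_id must be int, got str
```

This is exactly the cost raised below: every decorated call pays for the check, and duck typing is lost for the checked arguments.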
> I feel Python's duck typing is a key feature, so unless it's really needed I would prefer avoiding type checking...
I agree. However, I would weigh it against two other points:
- How well are exceptions managed?
- How critical is it if the process stops?
That being said, I think that checking docstring types or Python type annotations would already make us more comfortable with runtime issues. I wouldn't go as far as input validation.
Very nice:) I learned a couple of new things:)
Hey guys, awesome job. See my comments.
🏴‍☠️ Passing Bandit Tests 🏴‍☠️
Several lines were changed in the code to pass various bandit tests. This is what we learned.
- eval() is unsafe; use ast.literal_eval() instead, but keep in mind that it only evaluates Python literals (strings, bytes, numbers, tuples, lists, dicts, sets, booleans and None)!
- input() is unsafe in Python 2, but OK in Python 3. This is because in Python 3 it replaces the older raw_input(), which was the safe version in Python 2 (Python 2's input() evaluated whatever the user typed as Python code).
- assert statements should not be used in production code, because they are removed when Python runs in optimized mode (python -O). Raise specific exceptions instead.
- If you engine.execute() a SQL query built with an f-string containing parameters, you are exposed to potential SQL injection attacks. So use the following syntax whenever possible.
--> 😡 Bad
--> 😃 Good
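A sketch of the Bad/Good contrast, under the assumption that the standard-library sqlite3 module with an in-memory toy table is an acceptable stand-in for SQLAlchemy's engine.execute(); the schema and the find_bad/find_good names are hypothetical. SQLAlchemy's text() construct uses the same :name placeholder style for bound parameters.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a toy in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (article_id INTEGER, title TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)", [(1, "public"), (2, "secret")]
    )
    return conn


def find_bad(conn: sqlite3.Connection, title: str) -> list:
    # Bad: the value is pasted into the SQL text, so it can rewrite the query
    query = f"SELECT article_id FROM articles WHERE title = '{title}'"
    return conn.execute(query).fetchall()


def find_good(conn: sqlite3.Connection, title: str) -> list:
    # Good: the value is passed separately as a bound parameter,
    # so the driver never interprets it as SQL
    query = "SELECT article_id FROM articles WHERE title = :title"
    return conn.execute(query, {"title": title}).fetchall()


conn = make_db()
malicious = "x' OR '1'='1"
print(find_bad(conn, malicious))   # both rows leak out
print(find_good(conn, malicious))  # [] since no title equals that string
print(find_good(conn, "secret"))   # [(2,)]
```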
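The eval() and assert lessons can be sketched in a few lines as well; parse_literal and check_positive are hypothetical helpers. ast.literal_eval() parses literals such as lists and dicts but raises ValueError on anything else, like a function call, and an explicit raise keeps working under python -O, where an assert would be stripped.

```python
import ast


def parse_literal(text: str):
    """Parse a string holding a Python literal (e.g. a list or dict) without executing code."""
    # Unlike eval(), ast.literal_eval() raises ValueError for anything that is not a literal
    return ast.literal_eval(text)


def check_positive(n: int) -> int:
    """Validate n with an explicit exception rather than an assert."""
    # `assert n > 0` would be removed when Python runs with -O;
    # this check always runs
    if n <= 0:
        raise ValueError(f"expected a positive integer, got {n!r}")
    return n


print(parse_literal("[1, 2, {'a': 3}]"))
try:
    parse_literal("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
print(check_positive(5))
```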
🏴‍☠️ Bandit Evaders 🏴‍☠️
We decided for the time being to ignore 5 SQL injection errors from bandit. We ignored these lines by flagging them with a # nosec comment. The reason is that we would have needed a major code refactoring, probably introducing SQLAlchemy's ORM or something even worse.
Useful to know for the future: apparently you cannot bind the name of a table or of a column in a SQL query (neither with the syntax above nor with any other syntax).
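Because only values can be bound, a table or column name that varies at runtime still has to be formatted into the query text. A common mitigation, sketched below with sqlite3 and a hypothetical ALLOWED_TABLES allowlist, is to check the name against a fixed set of known identifiers before formatting it in; the # nosec marker then tells bandit that the flagged line has been reviewed. This is an illustration, not the code that was actually silenced.

```python
import sqlite3

# Hypothetical allowlist of the tables this code is allowed to query
ALLOWED_TABLES = {"articles", "paragraphs"}


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    """Count the rows of a table whose name is only known at runtime."""
    # Identifiers cannot be bound, so the name has to be formatted into the SQL text;
    # restricting it to a fixed allowlist is what keeps that safe
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    query = f"SELECT COUNT(*) FROM {table}"  # nosec
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (article_id INTEGER)")
conn.executemany("INSERT INTO articles VALUES (?)", [(1,), (2,), (3,)])

# A placeholder in the table position is rejected by SQLite itself
try:
    conn.execute("SELECT COUNT(*) FROM ?", ("articles",))
except sqlite3.Error as exc:
    print("cannot bind a table name:", exc)

print(count_rows(conn, "articles"))  # 3
```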
More precisely, these are the errors that we decided to silence.