Skip to content

CPP: Permit more typedefs in WrongTypeFormatArguments.ql #189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 3, 2018

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Sep 13, 2018

Adds a test for the WrongTypeFormatArguments.ql issue found earlier this week by @nick. The fix is to make the query a lot less picky about unspecified types - for example:

typedef int wchar_t;
typedef wchar_t WCHAR_T; // WCHAR_T -> wchar_t -> int
typedef int MYCHAR; // MYCHAR -> int (notably not via the wchar_t typedef)

wchar_t *myString1;
WCHAR_T *myString2;
int *myString3;
MYCHAR *myString4;

printf("%S", myString1); // this was always acceptable
printf("%S", myString2); // this was always acceptable
printf("%S", myString3); // this is acceptable now
printf("%S", myString4); // this is acceptable now

(Nick's example looks very different but is also about underlying types of typedefs)

This is something I've been thinking about doing for a while. It reduces the noise caused by results that, while somewhat questionable in theory, are likely OK in practice and are typically caused by combining different pieces of code that use different typedefs. The change also makes the query logic simpler (though there's still a lot to be done in that regard).

I'm going to try and run query differences on this.

@jonas - it will be good to get in before I make a PR for the char16_t issues with the same query. There's some crossover in real world results between the two issues that confuses things without this change.

@Felicity - please check whether I've done the right thing with the change note.

@geoffw0 geoffw0 added the C++ label Sep 13, 2018
@geoffw0
Copy link
Contributor Author

geoffw0 commented Sep 13, 2018

Sorry, messed up my mentions. Should be @nickrolfe @jbj @felicity-semmle

@jbj jbj self-assigned this Sep 14, 2018
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of results going away. Is the idea that we won't report a mismatch when the types have the same unspecified type even though their unspecified types may not coincide on other platforms?

I look forward to seeing the query differences results.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Sep 20, 2018

That's a lot of results going away. Is the idea that we won't report a mismatch when the types have the same unspecified type even though their unspecified types may not coincide on other platforms?

The short answer is yes. It seems we need to give developers more benefit of doubt than we already were doing.

The long answer:

The query has a notion of the 'expected' type of a format argument, which is usually a primitive type such as int for %i. We already permit typedefs of int using a mechanism similar to getUnspecifiedType.

However in some cases the expected type is itself a typedef - a common case being wchar_t (a typedef for something like int on some compilers), and @nickrolfe's example was about ptrdiff_t (which was a typedef for long). Compared to a primitive type this gives us two additional cases to think about, the underlying type of the expected type, and a typedef to that. For example:

long <-----------------\
 ^                      \
ptrdiff_t (expected)     MY_LONG
 ^
MY_PTRDIFF_T

In the above case we would have accepted only a ptrdiff_t or MY_PTRDIFF_T argument for %td, the reasoning being that though long and MY_LONG are really the same type of data they're conceptually different so (1) you might not get a reasonable formatted result and (2) as you pointed out, they seem more likely to actually be something different on another platform compared to MY_PTRDIFF_T.

After the change we also accept long or MY_LONG above. The reason is it's quite common for people / libraries to make their own typedefs for things conceptually like MY_PTRDIFF_T that actually point to the underlying type as MY_LONG does, often looking something like this:

#ifdef SOME_PLATFORM
  typedef long MY_PTRDIFF_T; // this is supposed to be used as a ptrdiff_t but is actually the same as MY_LONG in the diagram above
#else
  ...
#endif

I believe this pattern was responsible for quite a lot of false positive results, and blocking it will only hide a small number of interesting results (over-represented in the tests). The query differences should hopefully confirm this.

I look forward to seeing the query differences results.

Me too, my first few attempts failed but I'm still hoping to produce them.

@jbj
Copy link
Contributor

jbj commented Sep 20, 2018

Me too, my first few attempts failed but I'm still hoping to produce them.

Did you make the same mistake I made in #202 (comment)?

@geoffw0
Copy link
Contributor Author

geoffw0 commented Sep 24, 2018

Did you make the same mistake I made in #202 (comment)?

No, I think I pushed code that pointed to a ql submodule I hadn't pushed.

@jbj
Copy link
Contributor

jbj commented Sep 28, 2018

Any luck with CPP-Differences?

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 2, 2018

Query differences, at last!

https://jenkins.internal.semmle.com/job/Query-Changes/job/CPP-Differences/456/

Most of the changes seem to be where %zd (ssize_t, a typedef for long) was used with a typedef that resolves to unsigned long. This is a signedness change (%zu would be ideal), but the query already explicitly allows signedness changes.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the query differences are about %zd used where it should have been %zu. You're right that the query was always supposed to allow this, but apparently it wasn't working for ssize_t.

@jbj jbj merged commit 4ad4b19 into github:master Oct 3, 2018
aibaars added a commit that referenced this pull request Oct 14, 2021
smowton pushed a commit to smowton/codeql that referenced this pull request Jan 17, 2022
Fix type access extraction in field declarations
@geoffw0 geoffw0 deleted the wrongtypedef branch February 10, 2023 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants