Update Input Validation (#1429)

* Update Input Validation

  - Clarify Server-side & Client-side recommendations
  - Add references
  - Style changes for uniformity

  Signed-off-by: otkd <conrad.bhuiyanvolkoff@owasp.org>

* Update cheatsheets/Input_Validation_Cheat_Sheet.md

  Co-authored-by: Shlomo Zalman Heigh <shlomozalmanheigh@gmail.com>

---------

Signed-off-by: otkd <conrad.bhuiyanvolkoff@owasp.org>
Co-authored-by: Shlomo Zalman Heigh <shlomozalmanheigh@gmail.com>
OWASP · Jun 17, 2024 · a289f52
1 parent 2ff8461
commit a289f52
Showing 1 changed file with 22 additions and 16 deletions.
diff --git a/cheatsheets/Input_Validation_Cheat_Sheet.md b/cheatsheets/Input_Validation_Cheat_Sheet.md
@@ -12,17 +12,16 @@ Data from all potentially untrusted sources should be subject to input validatio
 
 Input Validation should not be used as the *primary* method of preventing [XSS](Cross_Site_Scripting_Prevention_Cheat_Sheet.md), [SQL Injection](SQL_Injection_Prevention_Cheat_Sheet.md) and other attacks which are covered in respective [cheat sheets](https://cheatsheetseries.owasp.org/) but can significantly contribute to reducing their impact if implemented properly.
 
-## Input validation strategies
+## Input Validation Strategies
 
-Input validation should be applied on both **syntactical** and **Semantic** level.
+Input validation should be applied at both syntactic and semantic levels:
 
-**Syntactic** validation should enforce correct syntax of structured fields (e.g. SSN, date, currency symbol).
-
-**Semantic** validation should enforce correctness of their *values* in the specific business context (e.g. start date is before end date, price is within expected range).
+- **Syntactic** validation should enforce correct syntax of structured fields (e.g. SSN, date, currency symbol).
+- **Semantic** validation should enforce correctness of their *values* in the specific business context (e.g. start date is before end date, price is within expected range).
 
 It is always recommended to prevent attacks as early as possible in the processing of the user's (attacker's) request. Input validation can be used to detect unauthorized input before it is processed by the application.
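
The two levels described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `parse_iso_date` and `validate_booking` and the `YYYY-MM-DD` format are hypothetical and do not come from the cheat sheet. The syntactic step checks the shape of each field; the semantic step checks the business rule that the start date precedes the end date.

```python
import re
from datetime import date

# Hypothetical format choice for this sketch: ASCII digits in YYYY-MM-DD form.
_ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_iso_date(value: str) -> date:
    """Syntactic validation: the field must look like, and be, a calendar date."""
    if not _ISO_DATE.fullmatch(value):
        raise ValueError("expected a date in YYYY-MM-DD form")
    # fromisoformat rejects impossible dates such as month 13 with ValueError.
    return date.fromisoformat(value)


def validate_booking(start_raw: str, end_raw: str) -> tuple[date, date]:
    """Semantic validation: the values must make sense in the business context."""
    start = parse_iso_date(start_raw)
    end = parse_iso_date(end_raw)
    if start >= end:
        raise ValueError("start date must be before end date")
    return start, end
```

Rejecting the request at this point, before any other processing, matches the advice to stop bad input as early as possible.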
 
-## Implementing input validation
+## Implementing Input Validation
 
 Input validation can be implemented using any programming technique that allows effective enforcement of syntactic and semantic correctness, for example:
 
@@ -45,7 +44,7 @@ If it's well structured data, like dates, social security numbers, zip codes, em
 
 If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.
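
A minimal Python sketch of the exact-match rule for fixed option sets is shown here. The name `validate_choice` and the example country codes are hypothetical, chosen only for illustration. The submitted value is compared as-is, with no trimming or case folding, so anything that was not offered to the user is rejected.

```python
# Hypothetical option set, standing in for the values rendered in a drop-down.
ALLOWED_COUNTRIES = frozenset({"IE", "DE", "JP"})


def validate_choice(value: object, allowed: frozenset) -> str:
    """Accept the value only if it is exactly one of the offered options."""
    if not isinstance(value, str):
        raise ValueError("expected a single string value")
    if value not in allowed:
        raise ValueError("value is not one of the offered options")
    return value
```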
 
-### Validating free-form Unicode text
+### Validating Free-form Unicode Text
 
 Free-form text, especially with Unicode characters, is perceived as difficult to validate due to a relatively large space of characters that need to be allowed.
 
@@ -54,8 +53,8 @@ It's also free-form text input that highlights the importance of proper context-
 The primary means of input validation for free-form text input should be:
 
 - **Normalization:** Ensure canonical encoding is used across all the text and no invalid characters are present.
-- **Character category allow-listing:** Unicode allows listing categories such as "decimal digits" or "letters" which not only covers the Latin alphabet but also various other scripts used globally (e.g. Arabic, Cyrillic, CJK ideographs etc).
-- **Individual character allow-listing:** If you allow letters and ideographs in names and also want to allow apostrophe `'` for Irish names, but don't want to allow the whole punctuation category.
+- **Character category allowlisting:** Unicode allows allowlisting whole categories such as "decimal digits" or "letters", which cover not only the Latin alphabet but also various other scripts used globally (e.g. Arabic, Cyrillic, CJK ideographs).
+- **Individual character allowlisting:** Use this when, for example, you allow letters and ideographs in names and also want to allow the apostrophe `'` for Irish names, but don't want to allow the whole punctuation category.
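
The three steps in the list above could be combined along these lines, using Python's standard `unicodedata` module. This is a sketch, not a definitive implementation: the function name `validate_name`, the choice of NFC as the normalization form, and the inclusion of the space character alongside the apostrophe are assumptions made for this example.

```python
import unicodedata

# Assumption: allow every Unicode "Letter" general category (Lu, Ll, Lt, Lm, Lo).
ALLOWED_CATEGORY_PREFIX = "L"
# Assumption: individually allowed characters outside those categories.
ALLOWED_EXTRA_CHARS = frozenset({"'", " "})


def validate_name(raw: str) -> str:
    """Normalize to NFC, then allow only letters plus a few named characters."""
    normalized = unicodedata.normalize("NFC", raw)
    for ch in normalized:
        if ch in ALLOWED_EXTRA_CHARS:
            continue
        if unicodedata.category(ch).startswith(ALLOWED_CATEGORY_PREFIX):
            continue
        raise ValueError(f"character U+{ord(ch):04X} is not allowed")
    return normalized
```

Because only letter categories are allowed, a combining mark that NFC cannot compose into a single letter is rejected here; whether that is acceptable depends on the names an application needs to support.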
 
 References:
 
@@ -64,7 +63,7 @@ References:
 - [UAX 15: Unicode Normalization Forms](https://www.unicode.org/reports/tr15/)
 - [UAX 24: Unicode Script Property](https://unicode.org/reports/tr24/)
 
-### Regular expressions
+### Regular Expressions (Regex)
 
 Developing regular expressions can be complicated, and is well beyond the scope of this cheat sheet.
 
@@ -119,13 +118,13 @@ Some Allowlist validators have also been predefined in various open source packa
 
 - [Apache Commons Validator](http://commons.apache.org/proper/commons-validator/)
 
-## Client Side vs Server Side Validation
+## Client-side vs Server-side Validation
 
-Be aware that any JavaScript input validation performed on the client can be bypassed by an attacker that disables JavaScript or uses a Web Proxy. Ensure that any input validation performed on the client is also performed on the server.
+Input validation **must** be implemented on the server side before any data is processed by an application’s functions, because any JavaScript-based input validation performed on the client side can be circumvented by an attacker who disables JavaScript or uses a web proxy. The recommended approach is to implement both: client-side JavaScript validation for UX and server-side validation for security, using each for its respective strength.
 
 ## Validating Rich User Content
 
-It is very difficult to validate rich content submitted by a user. For more information, please see the XSS cheatsheet on [Sanitizing HTML Markup with a Library Designed for the Job](Cross_Site_Scripting_Prevention_Cheat_Sheet.md).
+It is very difficult to validate rich content submitted by a user. For more information, please see the XSS cheat sheet on [Sanitizing HTML Markup with a Library Designed for the Job](Cross_Site_Scripting_Prevention_Cheat_Sheet.md).
 
 ## Preventing XSS and Content Security Policy
 
@@ -152,15 +151,15 @@ Check the [File Upload Cheat Sheet](File_Upload_Cheat_Sheet.md).
 - Use a new filename to store the file on the OS. Do not use any user controlled text for this filename or for the temporary filename.
 - When a file is uploaded, rename it on storage. For example, if the uploaded filename is *test.JPG*, rename it to *JAI1287uaisdjhf.JPG*, a random filename. This prevents direct file access and stops ambiguous filenames such as `test.jpg;.asp` or `/../../../../../test.jpg` from evading the filter.
 - Uploaded files should be analyzed for malicious content (anti-malware, static analysis, etc).
-- The file path should not be able to specify by client side. It's decided by server side.
+- The client must not be able to specify the file path; the server decides it.
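
The renaming and server-decided path points in the list above might look like this in Python. It is a sketch under assumptions: `UPLOAD_DIR`, `ALLOWED_EXTENSIONS`, and `storage_path_for` are hypothetical names, and the extension allowlist is only one of the checks the list calls for (content analysis is not shown).

```python
import secrets
from pathlib import Path

# Hypothetical server-side settings; the client never influences either of them.
UPLOAD_DIR = Path("/srv/uploads")
ALLOWED_EXTENSIONS = frozenset({".jpg", ".jpeg", ".png"})


def storage_path_for(client_filename: str) -> Path:
    """Return a server-chosen path; only the allowlisted extension is reused."""
    ext = Path(client_filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("file extension is not allowed")
    # A random name, so no user-controlled text ends up in the stored filename.
    return UPLOAD_DIR / f"{secrets.token_hex(16)}{ext}"
```

With this approach a name like `test.jpg;.asp` is rejected because its final extension is `.asp`, and the directory components of `/../../../../../test.jpg` are discarded because only the extension is read from the client-supplied name.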
 
 ### Public Serving of Uploaded Content
 
 - Ensure uploaded images are served with the correct content-type (e.g. image/jpeg, application/x-xpinstall)
 
-### Beware of "special" files
+### Beware of Specific File Types
 
-The upload feature should be using an allow-list approach to only allow specific file types and extensions. However, it is important to be aware of the following file types that, if allowed, could result in security vulnerabilities:
+The upload feature should be using an allowlist approach to only allow specific file types and extensions. However, it is important to be aware of the following file types that, if allowed, could result in security vulnerabilities:
 
 - **crossdomain.xml** / **clientaccesspolicy.xml:** allows cross-domain data loading in Flash, Java and Silverlight. If permitted on sites with authentication this can permit cross-domain data theft and CSRF attacks. Note this can get pretty complicated depending on the specific plugin version in question, so it's best to just prohibit files named "crossdomain.xml" or "clientaccesspolicy.xml".
 - **.htaccess** and **.htpasswd:** Provides server configuration options on a per-directory basis, and should not be permitted. See [HTACCESS documentation](http://en.wikipedia.org/wiki/Htaccess).
@@ -237,3 +236,10 @@ Many mail providers (such as Microsoft Exchange) do not support sub-addressing.
 Some users will use a different _tag_ for each website they register on, so that if they start receiving spam to one of the sub-addresses they can identify which website leaked or sold their email address.
 
 Because it could allow users to register multiple accounts with a single email address, some sites may wish to block sub-addressing by stripping out everything between the `+` and `@` signs. This is not generally recommended, as it suggests that the website owner is either unaware of sub-addressing or wishes to prevent users from identifying them when they leak or sell email addresses. Additionally, it can be trivially bypassed by using [disposable email addresses](#disposable-email-addresses), or simply registering multiple email accounts with a trusted provider.
+
+## References
+
+- [OWASP Top 10 Proactive Controls 2024: C3: Validate all Input & Handle Exceptions](https://owasp.org/www-project-proactive-controls/v4/en/c3-validate-all-input)
+- [CWE-20 Improper Input Validation](https://cwe.mitre.org/data/definitions/20.html)
+- [OWASP Top 10 2021: A03:2021-Injection](https://owasp.org/Top10/A03_2021-Injection/)
+- [Snyk: Improper Input Validation](https://learn.snyk.io/lesson/improper-input-validation/)