Cookbook regexp2 #2404

Open
wants to merge 5 commits into
base: main
31 changes: 31 additions & 0 deletions data/cookbook/regex-misc/00-str.ml
@@ -0,0 +1,31 @@
---
packages: []
discussion: |
- **Understanding `Str`:** `str` is a library that ships with OCaml. It provides many functions for working with regular expressions. The documentation of the `Str` module is in the [API reference](https://v2.ocaml.org/api/Str.html).
- **Alternative Libraries:** The `re` package provides regular expression functions and supports multiple `regexp` syntaxes (Perl, POSIX, Emacs, and Glob). Its functions are also purely functional. In contrast, `Str.matched_group` and `Str.matched_string` rely on global state, which prevents two `regexp` matching sequences from being used concurrently. Other packages provide `regexp` functions: `mikmatch`, `ocamlregexkit`, `ppx_regexp`, `pcre`/`pcre2` (compatible with Perl `regexp`), `re2`, `re_parser`, `tyre` (which comes with a PPX preprocessor `ppx_tyre`), and `human-re`. The `ppx_tyre` package defines a `function%tyre` keyword. It works like native OCaml pattern matching, but on regular expressions. `ppx_regexp` works in the same way with the `re` package.
---

(* Compiling a regular expression: Note that `{regexp|...|regexp}` is a normal string literal. This quoted-string syntax avoids having to write each backslash as `\\`. The `regexp` identifier is optional, but it signals to the reader that the string contains a regular expression. *)
let regexp = Str.regexp {regexp|\([0-9]+\)-\([0-9]+\)-\([0-9]+\)|regexp}

(* Testing whether a string matches the `regexp`: The index (0) is the position in the string at which matching starts. `string_match` only tries to match the regular expression at exactly the given index, while `search_forward` tries the given index and then each following index until a match is found, raising `Not_found` if there is none: *)
let () =
  if Str.string_match regexp "1971-01-23" 0 then
    print_string "The string matches\n"
  else
    print_string "The string doesn't match\n"

let () =
  let str = "Date: 1971-01-23" in
  let index = Str.search_forward regexp str 0 in
  Printf.printf "Date found at index %d (%s)\n" index
    (Str.matched_string str)

(* Getting group substrings: Each `\\(` / `\\)` pair captures the substring matched by the enclosed `regexp`. By convention, group 0 is the whole substring matching the `regexp`, and the first explicit group is 1: *)
let () =
  let str = "Date: 1971-01-23" in
  let _index = Str.search_forward regexp str 0 in
  let year = Str.matched_group 1 str
  and month = Str.matched_group 2 str
  and day = Str.matched_group 3 str in
  Printf.printf "year=%s, month=%s, day=%s\n" year month day
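
The recipe above assumes a match always exists. As a minimal sketch (not part of the PR), the search can be wrapped so that a missing match becomes `None` instead of an uncaught `Not_found`. The names `date_re` and `find_date` are hypothetical helpers introduced for illustration, and the code assumes the `str` library is linked:

```ocaml
(* Requires the [str] library that ships with OCaml. *)

(* Same pattern as in the recipe: three groups of digits separated by dashes. *)
let date_re = Str.regexp {|\([0-9]+\)-\([0-9]+\)-\([0-9]+\)|}

(* [find_date s] is a hypothetical helper: it returns
   [Some (year, month, day)] for the first date found in [s],
   or [None] when [Str.search_forward] raises [Not_found]. *)
let find_date s =
  match Str.search_forward date_re s 0 with
  | _index ->
    (* Read the groups right away: [Str.matched_group] refers to the
       most recent match, which is kept in global state. *)
    Some (Str.matched_group 1 s, Str.matched_group 2 s, Str.matched_group 3 s)
  | exception Not_found -> None

let () =
  match find_date "Date: 1971-01-23" with
  | Some (y, m, d) -> Printf.printf "year=%s, month=%s, day=%s\n" y m d
  | None -> print_endline "No date found"
```

Returning the groups as a tuple immediately after the search keeps the dependence on the global match state confined to one function.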

37 changes: 37 additions & 0 deletions data/cookbook/regex-misc/01-ppx_regexp.ml
@@ -0,0 +1,37 @@
---
packages:
- name: "ppx_regexp"
tested_version: "0.5.1"
used_libraries:
- ppx_regexp
- name: "re"
tested_version: "1.10.4"
used_libraries:
- re
discussion: |
- **Understanding `re`:** The `re` library offers several advantages over the `Str` library, which ships with OCaml. It supports multiple syntaxes, and because it has no global state, it permits concurrent pattern matching. It is complemented by `ppx_regexp`, which makes the library easier to use. However, `ppx_regexp` supports only the PCRE syntax.
- **Reference:** `ppx_regexp` is described on [its page](https://github.com/paurkedal/ppx_regexp). That description can be supplemented with the [PCRE syntax](https://www.pcre.org/original/doc/html/pcresyntax.html) reference or any [PCRE cheat sheet](https://www.debuggex.com/cheatsheet/regex/pcre).
---

(* To match a string against a regular expression, we use the `match%pcre` keyword, much like an ordinary OCaml `match`: *)

let () =
  match%pcre "Date: 1972-01-23 " with
  | {re|?<date>(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)|re} ->
    Printf.printf "Date found: (%s)\n" date;
    Printf.printf "Year: (%s)\n" year;
    Printf.printf "Month: (%s)\n" month;
    Printf.printf "Day: (%s)\n" day
  | _ -> print_string "Date not found\n"

(* Similarly, `function%pcre` performs the same kind of matching on the function's argument: *)

let all_digits =
  function%pcre
  | {re|^\d*$|re} -> true
  | _ -> false

let () =
  assert (all_digits "1234")

let () =
  assert (not @@ all_digits "12x34")
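
For comparison, here is a minimal sketch (not part of the PR) of the same date extraction using the `re` library directly, without the PPX. It assumes the `re` package is available. The names `date_re` and `parse_date` are hypothetical and introduced only for illustration:

```ocaml
(* Requires the [re] package (library [re]), as declared in the recipe. *)

(* Compile a Perl-syntax pattern once. Group 0 is the whole match,
   and groups 1 to 3 are the parenthesised parts. *)
let date_re = Re.compile (Re.Perl.re {|(\d+)-(\d\d)-(\d\d)|})

(* [parse_date s] is a hypothetical helper returning the first date
   found in [s] as [Some (year, month, day)], or [None]. *)
let parse_date s =
  match Re.exec_opt date_re s with
  | Some groups ->
    (* The match result is an ordinary value, not global state. *)
    Some (Re.Group.get groups 1, Re.Group.get groups 2, Re.Group.get groups 3)
  | None -> None

let () =
  match parse_date "Date: 1972-01-23 " with
  | Some (y, m, d) -> Printf.printf "Year: (%s) Month: (%s) Day: (%s)\n" y m d
  | None -> print_string "Date not found\n"
```

Because each call returns its own `Re.Group.t` value, two matches can be inspected side by side, which is the property the discussion contrasts with `Str`.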
44 changes: 23 additions & 21 deletions data/cookbook/tasks.yml
@@ -18,7 +18,7 @@ categories:
C library, using the OCaml foreign function interface.
- title: Compression
tasks:
- title: Read a gzip compressed text file
- title: Read a gzip Compressed Text File
slug: read-gzip-text-file
- title: Decompressing a Tarball
slug: decompressing-a-tarball
@@ -30,7 +30,7 @@ categories:
tasks:
- title: Create and Await Promises
slug: create-and-await-promises
- title: Parallelism & Multi-Threading
- title: Parallelism & Multithreading
tasks:
- title: Spawn a Thread and Receive a Response
slug: spawn-a-thread-and-receive-a-response
@@ -40,9 +40,9 @@ categories:
tasks:
- title: Calculate the SHA-256 Digest of a File
slug: calculate-sha-256-digest-of-file
- title: Sign and Verify a Message with an HMAC Digest
- title: Sign and Verify a Message With an HMAC Digest
slug: sign-and-verify-hmac-digest
- title: Salt and Hash a Password with PBKDF2
- title: Salt and Hash a Password With PBKDF2
slug: salt-and-hash-password-with-pkgdf2
- title: Data Structures & Algorithms
subcategories:
@@ -105,23 +105,23 @@ categories:

- title: Display Formatted Date and Time
slug: display-formatted-date-time
- title: Parse Date and Time from String
- title: Parse Date and Time From String
slug: parse-date-time-from-string
- title: Debugging
tasks:
- title: Debug Print a Value
slug: debug-print-a-value
- title: Log a Debug / Error Message to Stdout / Stderr
slug: log-debug-error-message
- title: Log to the UNIX Syslog
- title: Log to the Unix Syslog
slug: log-to-unix-syslog
- title: Encoding
tasks:
- title: URL- / Percent-Encode a String
slug: url-percent-encode-string
- title: Encode a String as application/x-www-form-urlencoded
slug: encode-x-www-form-urlencoded
- title: Encode and Decode Bytestrings from Hex-Strings
- title: Encode and Decode Bytestrings From Hex-Strings
slug: encode-decode-hex
- title: Encode and Decode Base64
slug: encode-decode-base64
@@ -176,7 +176,7 @@ categories:
subcategories:
- title: Vector & Matrix Operations
tasks:
- title: Normalize a Vector
- title: Normalise a Vector
slug: normalize-vector
- title: Matrix Addition and Multiplication
slug: matrix-addition-multiplication
@@ -214,47 +214,49 @@ categories:
slug: regex-validate-email
- title: Extract Phone Numbers from Text
slug: regex-extract-phone-numbers
- title: Replace All Occurrences of a Text Pattern with Another Pattern
- title: Replace All Occurrences of a Text Pattern With Another Pattern
slug: regex-replace-pattern
- title: Miscellaneous
slug: regex-misc
- title: Web Programming
subcategories:
- title: HTTP Clients
tasks:
- title: Make a HTTP GET Request
- title: Make an HTTP GET Request
slug: make-http-get-request
description: >
Make an HTTP GET request, process response code, follow redirects
- title: Make a HTTP GET Request with Basic Authentication
- title: Make an HTTP GET Request With Basic Authentication
slug: make-http-get-basic-auth
- title: Download a File to a Temporary Directory
slug: download-file-to-temporary-dir
- title: Make a Partial Download with HTTP Range Header
- title: Make a Partial Download With HTTP Range Header
slug: make-partial-download-with-http-range-header
- title: Dealing with HTML
- title: Dealing With HTML
tasks:
- title: Render a HTML Template
- title: Render an HTML Template
slug: render-html-template
- title: Extract all Links from a HTML String
- title: Extract All Links From an HTML String
slug: extract-links-from-html
- title: Check a Webpage for Broken Links
slug: check-webpage-for-broken-links
- title: Running a Web Server
tasks:
- title: Start a Web Server with a Hello World Endpoint
- title: Start a Web Server With a "Hello World" Endpoint
slug: start-a-web-server-hello-world
- title: Start a Web Server that Serves a HTML Template
slug: start-a-web-server-html-template
- title: Use Basic Authentication to Secure a Route
slug: use-basic-auth-on-web-server
- title: Media Types (MIME)
tasks:
- title: Get MIME Type from String
- title: Get MIME Type From String
slug: get-mime-type-from-string
- title: Get MIME Type from Filename
- title: Get MIME Type From Filename
slug: get-mime-type-from-filename
- title: Parse MIME Type of a HTTP Response
- title: Parse MIME Type of an HTTP Response
slug: parse-mime-type-of-http-response
- title: URL and URI processing
- title: URL and URI Processing
tasks:
- title: Parse a URL from String And Access Individual Parts
- title: Parse a URL From String and Access Individual Parts
slug: parse-url-from-string