Add wildcard pattern matching algorithm using FFT #12014

[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
pre-commit-ci[bot] committed Oct 12, 2024
commit 56441f81921c47c3a64312321c055754ed9bee54
24 changes: 14 additions & 10 deletions strings/wildcard_pattern_matching_fft.py
@@ -1,6 +1,7 @@
import numpy as np
from numpy.fft import fft, ifft


def preprocess_text_and_pattern(text: str, pattern: str) -> tuple[list[int], list[int]]:

As there is no test file in this pull request nor any test function or class in the file strings/wildcard_pattern_matching_fft.py, please provide doctest for the function preprocess_text_and_pattern

"""Preprocesses text and pattern for pattern matching.

@@ -11,7 +12,7 @@
Returns:
A tuple containing:
- A list of integers representing the text characters.
- A list of integers representing the pattern characters, with 0 for wildcards.

Check failure (GitHub Actions / ruff, E501): strings/wildcard_pattern_matching_fft.py:15:89: E501 Line too long (91 > 88)

Examples:
>>> preprocess_text_and_pattern("abcabc", "abc*")
@@ -19,12 +20,14 @@
>>> preprocess_text_and_pattern("hello", "he*o")
([3, 2, 4, 4, 5], [3, 2, 0, 5])
"""

unique_chars = set(text + pattern)
- char_to_int = {char: i + 1 for i, char in enumerate(unique_chars)} # Unique non-zero integers
+ char_to_int = {
+     char: i + 1 for i, char in enumerate(unique_chars)
+ } # Unique non-zero integers

# Replace pattern '*' with 0, other characters with their unique integers
- pattern_int = [char_to_int[char] if char != '*' else 0 for char in pattern]
+ pattern_int = [char_to_int[char] if char != "*" else 0 for char in pattern]
text_int = [char_to_int[char] for char in text]

return text_int, pattern_int
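
The expected outputs in the doctests above depend on the iteration order of `set(text + pattern)`, which Python does not guarantee for strings across runs, so those doctests may pass or fail unpredictably. A minimal sketch of a deterministic mapping follows, assuming sorted order is acceptable and that `*` is the only wildcard. The name `preprocess_deterministic` and the `wildcard` parameter are hypothetical and not part of this pull request.

```python
def preprocess_deterministic(
    text: str, pattern: str, wildcard: str = "*"
) -> tuple[list[int], list[int]]:
    """Hypothetical variant: map characters to 1..k in sorted order."""
    # Exclude the wildcard from the pattern's alphabet; it is encoded as 0.
    alphabet = sorted(set(text) | (set(pattern) - {wildcard}))
    char_to_int = {char: i + 1 for i, char in enumerate(alphabet)}

    text_int = [char_to_int[char] for char in text]
    pattern_int = [0 if char == wildcard else char_to_int[char] for char in pattern]
    return text_int, pattern_int


print(preprocess_deterministic("hello", "he*o"))
# ([2, 1, 3, 3, 4], [2, 1, 0, 4])
```

With a sorted alphabet the integer codes are stable, so the doctest output can be written down once and checked on every run.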
@@ -44,10 +47,10 @@
>>> fft_convolution(np.array([1, 2, 3]), np.array([0, 1, 0.5]))
array([0. , 1. , 2.5, 3. , 1.5])
"""

n = len(input_seq_a) + len(input_seq_b) - 1
A = fft(input_seq_a, n)

Check failure (GitHub Actions / ruff, N806): strings/wildcard_pattern_matching_fft.py:52:5: N806 Variable `A` in function should be lowercase

B = fft(input_seq_b, n)

Check failure (GitHub Actions / ruff, N806): strings/wildcard_pattern_matching_fft.py:53:5: N806 Variable `B` in function should be lowercase

return np.real(ifft(A * B))
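
The two N806 failures above could be resolved by renaming the spectra to lowercase names. The sketch below does that and compares the result with `np.convolve`. The names `fft_convolution_sketch`, `spectrum_a`, and `spectrum_b` are placeholders chosen for illustration, not names from the pull request.

```python
import numpy as np


def fft_convolution_sketch(seq_a, seq_b) -> np.ndarray:
    """Linear convolution via FFT, zero-padded to the full output length."""
    n = len(seq_a) + len(seq_b) - 1
    spectrum_a = np.fft.fft(seq_a, n)  # lowercase names satisfy N806
    spectrum_b = np.fft.fft(seq_b, n)
    return np.real(np.fft.ifft(spectrum_a * spectrum_b))


result = fft_convolution_sketch(np.array([1, 2, 3]), np.array([0, 1, 0.5]))
expected = np.convolve([1, 2, 3], [0, 1, 0.5])
print(np.allclose(result, expected))
```

For these inputs the direct convolution has `1 * 2 * 0.5 + 3 * 1 = 4` at index 3, which is worth checking against the expected output in the doctest. Because the FFT result is floating point, comparing with `np.allclose` is likely more robust than matching a printed array exactly.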


@@ -65,29 +68,30 @@
>>> compute_a_fft([1, 2, 3, 1, 2, 3], [1, 2, 3, 0])
array([...]) # Replace with the expected output based on your implementation
"""

n = len(text_int)
m = len(pattern_int)

# Power transforms of the pattern and text based on the formula
p1 = np.array(pattern_int)
- p2 = np.array([p ** 2 for p in pattern_int])
- p3 = np.array([p ** 3 for p in pattern_int])
+ p2 = np.array([p**2 for p in pattern_int])
+ p3 = np.array([p**3 for p in pattern_int])

t1 = np.array(text_int)
- t2 = np.array([t ** 2 for t in text_int])
- t3 = np.array([t ** 3 for t in text_int])
+ t2 = np.array([t**2 for t in text_int])
+ t3 = np.array([t**3 for t in text_int])

# Convolution to calculate the terms for A[i]
sum1 = fft_convolution(p3[::-1], t1)
sum2 = fft_convolution(p2[::-1], t2)
sum3 = fft_convolution(p1[::-1], t3)

# Calculate A[i] using the convolution results
- A = sum1[:n - m + 1] - 2 * sum2[:n - m + 1] + sum3[:n - m + 1]
+ A = sum1[: n - m + 1] - 2 * sum2[: n - m + 1] + sum3[: n - m + 1]

Check failure (GitHub Actions / ruff, N806): strings/wildcard_pattern_matching_fft.py:90:5: N806 Variable `A` in function should be lowercase

return A
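
A note on the slicing, assuming the intended quantity is `A[i] = sum_j p[j] * t[i + j] * (p[j] - t[i + j]) ** 2`, which is zero exactly when the pattern matches at offset `i`. When the reversed pattern is convolved with the text, the value for offset `i` sits at index `i + m - 1` of the full convolution, so the slice `[: n - m + 1]` appears to keep the leading partial-overlap entries instead. The sketch below uses `[m - 1 : n]` and a lowercase result name. `match_scores` is a hypothetical helper, not the function in this pull request.

```python
import numpy as np


def match_scores(text_int: list[int], pattern_int: list[int]) -> np.ndarray:
    """Hypothetical helper: scores[i] == 0 iff the pattern matches at offset i."""
    n, m = len(text_int), len(pattern_int)
    if m > n:
        return np.zeros(0, dtype=np.int64)

    t = np.array(text_int, dtype=np.float64)
    p = np.array(pattern_int, dtype=np.float64)[::-1]  # reverse for correlation
    size = n + m - 1

    def conv(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        return np.fft.irfft(np.fft.rfft(a, size) * np.fft.rfft(b, size), size)

    # Expansion of p * t * (p - t) ** 2 = p**3 * t - 2 * p**2 * t**2 + p * t**3
    full = conv(p**3, t) - 2 * conv(p**2, t**2) + conv(p, t**3)

    # Offset i of the pattern corresponds to index i + m - 1 of the convolution.
    return np.rint(full[m - 1 : n]).astype(np.int64)


scores = match_scores([1, 2, 3, 1, 2, 3], [1, 2, 3, 0])
print(scores.tolist())
# [0, 20, 20]
```

Here only offset 0 scores zero, which agrees with matching `abc*` against `abcabc` by hand when `*` stands for a single character. Rounding before the zero test guards against floating-point noise from the FFT.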


# Main function to run the matching
if __name__ == "__main__":
# Example test case