The Bear's Den

Enter at your own risk

13 December 2024

Challenge 299

Search and Replace

Task 1: Replace Words
Task 2: Word Search

Task 1: Replace Words

Submitted by: Mohammad Sajid Anwar

You are given an array of words and a sentence.

Write a script to replace all words in the given sentence that start with any of the words in the given array.

Example 1

Input: @words = ("cat", "bat", "rat")
       $sentence = "the cattle was rattle by the battery"
Output: "the cat was rat by the bat"

Example 2

Input: @words = ("a", "b", "c")
       $sentence = "aab aac and cac bab"
Output: "a a a c b"

Example 3

Input: @words = ("man", "bike")
       $sentence = "the manager was hit by a biker"
Output: "the man was hit by a bike"

Solution

We search globally for any of the given words in the string that is not preceded by a word character and that is followed by any number of word characters - which will be removed.

use strict;
use warnings;

sub replace_words {
    my $sentence = shift;
    my $words = join '|', map quotemeta, @_;

    $sentence =~ s/(?<!\w)(?:$words)\K\w*//gr;
}

See the full solution to task 1.

Task 2: Word Search

Submitted by: Mohammad Sajid Anwar

You are given a grid of characters and a string.

Write a script to determine whether the given string can be found in the given grid of characters. You may start anywhere and take any orthogonal path, but may not reuse a grid cell.

Example 1

Input: @chars = (['A', 'B', 'D', 'E'],
                 ['C', 'B', 'C', 'A'],
                 ['B', 'A', 'A', 'D'],
                 ['D', 'B', 'B', 'C'])
      $str = 'BDCA'
Output: true

Example 2

Input: @chars = (['A', 'A', 'B', 'B'],
                 ['C', 'C', 'B', 'A'],
                 ['C', 'A', 'A', 'A'],
                 ['B', 'B', 'B', 'B'])
      $str = 'ABAC'
Output: false

Example 3

Input: @chars = (['B', 'A', 'B', 'A'],
                 ['C', 'C', 'C', 'C'],
                 ['A', 'B', 'A', 'B'],
                 ['B', 'B', 'A', 'A'])
      $str = 'CCCAA'
Output: true

Solution

Preliminary Considerations

Generalizing this task.

We consider an undirected graph \(G = (V, E)\) that has vertices \(V\) connected by edges \(E\). Each vertex \(v \in V\) has an attached label \(l(v)\). Now we search for a path \(p = (v_1, v_2,\ldots,v_k)\) such that the corresponding labels \((l(v_1), l(v_2),\ldots,l(v_k))\) equal a given sequence \((l_1,l_2,\ldots,l_k)\).

The graph can be described by a symmetric adjacency matrix \(A = (a_{ij})\) where

\[a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}\]

Let \(u = (u_i) \text{ and } v = (v_i)\) be two vectors. We define the element-wise vector product \(w = u \cdot v \text{ as } (w_i) = (u_i v_i)\). When \(u\) and \(v\) represent subsets \(U\) and \(V\), then \(u \cdot v\) represents the intersection \(U \cap V\).

Furthermore, we define a characteristic vector \(\chi_L = (c_i^L)\) for the label \(L\) as

\[c_i^L = \begin{cases} 1 & \text{if } l(i) = L \\ 0 & \text{otherwise} \end{cases}\]

In the following we’ll not sharply distinguish between a subset and its representation by an indicator vector and we’ll permit indicator vectors to have any non-negative values where a zero signals the absence of an element and a positive value its presence.

We take an indicator vector \(v_1\) of a subset of vertices \(v_1 \subseteq V\) having the label \(L_1\). The equation \(v_1 = \chi_{L_1} \cdot v_1\) then holds.

The matrix product \(A v_1\) has non-zero elements for all vertices that are direct neighbors of the vertices represented by \(v_1\). Thus the vector \(v_2 = \chi_{L_2} \cdot (A v_1)\) represents the subset \(v_2 \subseteq V\) that are direct neighbors of \(v_1\) and having a label \(L_2\). If \(v_2\) is a zero vector, then there is no path in the graph extending from the vertices in \(v_1\) to any vertex labeled \(L_2\).

Solution Procedure

Based on these considerations we may now formulate a recursive approach to search for a path in \(G\) having a given label sequence.

We are given a problem \(P = (A, L, s, t)\) where \(A\) is an adjacency matrix, \(L\) are the vertices’ labels, \(s\) is a set of starting vertices and \(t\) is a label sequence. Then we proceed as follows:

We find the vector \(n\) of neighbor vertices of \(s\) having a label \(t_1\) as \(n = \chi_{t_1} \cdot (A s)\).
If \(n\) is null, then the problem fails.
If \(n\) is not null and \(t\) has only one element, the the problem succeeds.
Otherwise we loop over the individual elements \(\hat{s} \in s\):
- Find the neighbor vertices of \(\hat{s}\) that are labeled with \(t_1\) as the new starting set \(s^* = \chi_{t_1} \cdot (A \hat{s})\).
- Construct a new adjacency matrix \(A^*\) where \(\hat{s}\) is disconnected from all other vertices by setting the row and column corresponding to \(\hat{s}\) to zero. This ensures that the traversed vertices form a path.
- Build a new label sequence \(t^*\) from \(t\) with its first element removed.
- Recursively solve the problem \(P(A^*, L, s^*, t^*)\).
- When the sub-problem succeeds, the current problem succeeds, too.
- Otherwise we continue looping with the next \(\hat{s} \in s\).
If we fall to the end of the loop, the problem fails.

The idea behind using the vector \(n\) is some kind of “pool testing”: With a single calculation we may find out if a target label is reachable from any of the start vertices and we may stop if it is not. However, if we find out that the label is reachable, we do not know from where and thus we need to try all of the start vertices individually (unless we hit the end of the path, when this doesn’t matter). And we may go one step further: The product \(A n\) indicates the neighbor vertices of \(n\). We may remove any vertex from \(s\) that is not in \(A n\), as it doesn’t have a neighbor labeled \(L_1\), i.e. we may reduce \(s \mapsto s \cdot (A n)\). Finally, when there is only one start vertex (which is not uncommon), we may use \(s^* = n\) directly without further calculation.

Preparation

To solve this task, we need to do construct the initial adjacency matrix representing the allowed “orthogonal paths” as given by the \(M \times N\) grid. Thereto we construct a 4-dimensional \(M \times N \times M \times N\) adjacency “matrix” that has ones on the secondary diagonals for rows and columns. Afterwards this hypermatrix is flattened and symmetrized to a ordinary \(M \cdot N \times M \cdot N\) adjacency matrix.

With \(w_1\) as the first letter from the given word we use \(\chi_{w_1}\) as the starting vector.

Operating on matrices and vectors once again calls for PDL.

Implementation Details

the matrix multiplication is overloaded on the x operator
The element-wise vector product is overloaded on the * operator.
Comparing a vector with a scalar produces a binary indicator vector
The character grid’s structure is needed for the adjacency matrix only. Afterwards a linear indexing of the vertices is used.

Improvements

The adjacency matrix containing mostly zeros could be represented in a sparse format like PDL::CCS::Nd.
If the edge lengths of the letter grid are larger than the search word’s length, the starting vector can be reduced to vertices having a maximum distance of this length to any vertex labeled with the last letter of the search word.

Code

use strict;
use warnings;
use PDL;
use PDL::NiceSlice;
use PDL::Char;
use experimental 'signatures';

sub find_word ($matrix, $word) {
    my $m = PDL::Char->new($matrix);
    my $label = $m->clump(1, 2);

    my $w = PDL::Char->new([$word]);
    my $start = ($label == $w(0))->long;;

    my $adj4 = zeroes +($m((0))->dims) x 2;
    $adj4( 0:-2,,1:-1)->clump(0, 1)->clump(1, 2)->diagonal(0, 1) .= 1;
    $adj4(,0:-2,,1:-1)->clump(0, 1)->clump(1, 2)->diagonal(0, 1) .= 1;
    my $adj = $adj4->clump(0, 1)->clump(1, 2)->sever;
    $adj |= $adj->xchg(0, 1)->sever;

    find_tail($adj, $label, $start, $w(1:-1));
}

sub find_tail ($adj, $label, $start, $tail) {
    my $l = ($label == $tail(0))->long;
    my $next = $l * ($adj x $start);
    return 0 if !any $next;
    return 1 if $tail->dim(0) == 1;

    $start = ($start * ($adj x $next))->long->hclip(1)
        if which($start)->nelem > 1;

    my $ws = which $start;
    for my $v ($ws->list) {
        my $start1 = $ws->nelem == 1 ? $next :
            $l * ($adj x zeroes($start)->set(0,$v,1));

        my $adj1 = $adj->copy;
        $adj1( $v) .= 0;
        $adj1(,$v) .= 0;
        return 1 if find_tail($adj1, $label, $start1, $tail(1:-1));
    }
    0;
}

See the full solution to task 2 containing trace statements.

There is a follow-up for this post.

If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.