13 December 2024 | Challenge 299 |
Search and Replace
Task 1: Replace Words
Submitted by: Mohammad Sajid Anwar
You are given an array of words and a sentence.
Write a script to replace all words in the given sentence that start with any of the words in the given array.
Example 1
Input: @words = ("cat", "bat", "rat")
$sentence = "the cattle was rattle by the battery"
Output: "the cat was rat by the bat"
Example 2
Input: @words = ("a", "b", "c")
$sentence = "aab aac and cac bab"
Output: "a a a c b"
Example 3
Input: @words = ("man", "bike")
$sentence = "the manager was hit by a biker"
Output: "the man was hit by a bike"
Solution
We search globally for any of the given words in the string that is not preceded by a word character and that is followed by any number of word characters - which will be removed.
use strict;
use warnings;
sub replace_words {
my $sentence = shift;
my $words = join '|', map quotemeta, @_;
$sentence =~ s/(?<!\w)(?:$words)\K\w*//gr;
}
See the full solution to task 1.
Task 2: Word Search
Submitted by: Mohammad Sajid Anwar
You are given a grid of characters and a string.
Write a script to determine whether the given string can be found in the given grid of characters. You may start anywhere and take any orthogonal path, but may not reuse a grid cell.
Example 1
Input: @chars = (['A', 'B', 'D', 'E'],
['C', 'B', 'C', 'A'],
['B', 'A', 'A', 'D'],
['D', 'B', 'B', 'C'])
$str = 'BDCA'
Output: true
Example 2
Input: @chars = (['A', 'A', 'B', 'B'],
['C', 'C', 'B', 'A'],
['C', 'A', 'A', 'A'],
['B', 'B', 'B', 'B'])
$str = 'ABAC'
Output: false
Example 3
Input: @chars = (['B', 'A', 'B', 'A'],
['C', 'C', 'C', 'C'],
['A', 'B', 'A', 'B'],
['B', 'B', 'A', 'A'])
$str = 'CCCAA'
Output: true
Solution
Preliminary Considerations
Generalizing this task.
We consider an undirected graph \(G = (V, E)\) that has vertices \(V\) connected by edges \(E\). Each vertex \(v \in V\) has an attached label \(l(v)\). Now we search for a path \(p = (v_1, v_2,\ldots,v_k)\) such that the corresponding labels \((l(v_1), l(v_2),\ldots,l(v_k))\) equal a given sequence \((l_1,l_2,\ldots,l_k)\).
The graph can be described by a symmetric adjacency matrix \(A = (a_{ij})\) where
\[a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}\]Let \(u = (u_i) \text{ and } v = (v_i)\) be two vectors. We define the element-wise vector product \(w = u \cdot v \text{ as } (w_i) = (u_i v_i)\). When \(u\) and \(v\) represent subsets \(U\) and \(V\), then \(u \cdot v\) represents the intersection \(U \cap V\).
Furthermore, we define a characteristic vector \(\chi_L = (c_i^L)\) for the label \(L\) as
\[c_i^L = \begin{cases} 1 & \text{if } l(i) = L \\ 0 & \text{otherwise} \end{cases}\]In the following we’ll not sharply distinguish between a subset and its representation by an indicator vector and we’ll permit indicator vectors to have any non-negative values where a zero signals the absence of an element and a positive value its presence.
We take an indicator vector \(v_1\) of a subset of vertices \(v_1 \subseteq V\) having the label \(L_1\). The equation \(v_1 = \chi_{L_1} \cdot v_1\) then holds.
The matrix product \(A v_1\) has non-zero elements for all vertices that are direct neighbors of the vertices represented by \(v_1\). Thus the vector \(v_2 = \chi_{L_2} \cdot (A v_1)\) represents the subset \(v_2 \subseteq V\) that are direct neighbors of \(v_1\) and having a label \(L_2\). If \(v_2\) is a zero vector, then there is no path in the graph extending from the vertices in \(v_1\) to any vertex labeled \(L_2\).
Solution Procedure
Based on these considerations we may now formulate a recursive approach to search for a path in \(G\) having a given label sequence.
We are given a problem \(P = (A, L, s, t)\) where \(A\) is an adjacency matrix, \(L\) are the vertices’ labels, \(s\) is a set of starting vertices and \(t\) is a label sequence. Then we proceed as follows:
- We find the vector \(n\) of neighbor vertices of \(s\) having a label \(t_1\) as \(n = \chi_{t_1} \cdot (A s)\).
- If \(n\) is null, then the problem fails.
- If \(n\) is not null and \(t\) has only one element, the the problem succeeds.
- Otherwise we loop over the individual elements \(\hat{s} \in s\):
- Find the neighbor vertices of \(\hat{s}\) that are labeled with \(t_1\) as the new starting set \(s^* = \chi_{t_1} \cdot (A \hat{s})\).
- Construct a new adjacency matrix \(A^*\) where \(\hat{s}\) is disconnected from all other vertices by setting the row and column corresponding to \(\hat{s}\) to zero. This ensures that the traversed vertices form a path.
- Build a new label sequence \(t^*\) from \(t\) with its first element removed.
- Recursively solve the problem \(P(A^*, L, s^*, t^*)\).
- When the sub-problem succeeds, the current problem succeeds, too.
- Otherwise we continue looping with the next \(\hat{s} \in s\).
- If we fall to the end of the loop, the problem fails.
The idea behind using the vector \(n\) is some kind of “pool testing”: With a single calculation we may find out if a target label is reachable from any of the start vertices and we may stop if it is not. However, if we find out that the label is reachable, we do not know from where and thus we need to try all of the start vertices individually (unless we hit the end of the path, when this doesn’t matter). And we may go one step further: The product \(A n\) indicates the neighbor vertices of \(n\). We may remove any vertex from \(s\) that is not in \(A n\), as it doesn’t have a neighbor labeled \(L_1\), i.e. we may reduce \(s \mapsto s \cdot (A n)\). Finally, when there is only one start vertex (which is not uncommon), we may use \(s^* = n\) directly without further calculation.
Preparation
To solve this task, we need to do construct the initial adjacency matrix representing the allowed “orthogonal paths” as given by the \(M \times N\) grid. Thereto we construct a 4-dimensional \(M \times N \times M \times N\) adjacency “matrix” that has ones on the secondary diagonals for rows and columns. Afterwards this hypermatrix is flattened and symmetrized to a ordinary \(M \cdot N \times M \cdot N\) adjacency matrix.
With \(w_1\) as the first letter from the given word we use \(\chi_{w_1}\) as the starting vector.
Operating on matrices and vectors once again calls for PDL
.
Implementation Details
- the matrix multiplication is overloaded on the
x
operator - The element-wise vector product is overloaded on the
*
operator. - Comparing a vector with a scalar produces a binary indicator vector
- The character grid’s structure is needed for the adjacency matrix only. Afterwards a linear indexing of the vertices is used.
Improvements
- The adjacency matrix containing mostly zeros could be represented in a sparse format like
PDL::CCS::Nd
. - If the edge lengths of the letter grid are larger than the search word’s length, the starting vector can be reduced to vertices having a maximum distance of this length to any vertex labeled with the last letter of the search word.
Code
use strict;
use warnings;
use PDL;
use PDL::NiceSlice;
use PDL::Char;
use experimental 'signatures';
sub find_word ($matrix, $word) {
my $m = PDL::Char->new($matrix);
my $label = $m->clump(1, 2);
my $w = PDL::Char->new([$word]);
my $start = ($label == $w(0))->long;;
my $adj4 = zeroes +($m((0))->dims) x 2;
$adj4( 0:-2,,1:-1)->clump(0, 1)->clump(1, 2)->diagonal(0, 1) .= 1;
$adj4(,0:-2,,1:-1)->clump(0, 1)->clump(1, 2)->diagonal(0, 1) .= 1;
my $adj = $adj4->clump(0, 1)->clump(1, 2)->sever;
$adj |= $adj->xchg(0, 1)->sever;
find_tail($adj, $label, $start, $w(1:-1));
}
sub find_tail ($adj, $label, $start, $tail) {
my $l = ($label == $tail(0))->long;
my $next = $l * ($adj x $start);
return 0 if !any $next;
return 1 if $tail->dim(0) == 1;
$start = ($start * ($adj x $next))->long->hclip(1)
if which($start)->nelem > 1;
my $ws = which $start;
for my $v ($ws->list) {
my $start1 = $ws->nelem == 1 ? $next :
$l * ($adj x zeroes($start)->set(0,$v,1));
my $adj1 = $adj->copy;
$adj1( $v) .= 0;
$adj1(,$v) .= 0;
return 1 if find_tail($adj1, $label, $start1, $tail(1:-1));
}
0;
}
See the full solution to task 2 containing trace statements.
There is a follow-up for this post.
If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.