14 February 2025 | Challenge 308 |
Common Encodings
Task 1: Count Common
Submitted by: Mohammad Sajid Anwar
You are given two array of strings, @str1
and @str2
.
Write a script to return the count of common strings in both arrays.
Example 1
Input: @str1 = ("perl", "weekly", "challenge")
@str2 = ("raku", "weekly", "challenge")
Output: 2
Example 2
Input: @str1 = ("perl", "raku", "python")
@str2 = ("python", "java")
Output: 1
Example 3
Input: @str1 = ("guest", "contribution")
@str2 = ("fun", "weekly", "challenge")
Output: 0
Solution
Generalizing the task to any number of arrays.
As the task makes no restrictions on the strings in each array, we must not assume the strings being unique.
This raises the question, how to deal with non-unique strings.
I presume they count in their multiplicity, i.e. the same string appearing n
times in the first array and m
times in the second array counts as min(n, m)
common strings.
This assumption directly leads to the following solution:
For each string we record the number of appearances in each array.
Then we take the sum over the
minimum number of appearances over all arrays for each word.
use strict;
use warnings;
use List::Util qw(min sum);
sub count_common {
my %count;
my @z = (0) x @_;
while (my ($i, $words) = each @_) {
($count{$_} //= [@z])->[$i]++ for @$words;
}
sum map min(@$_), values %count;
}
See the full solution to task 1.
Task 2: Decode XOR
Submitted by: Mohammad Sajid Anwar
You are given an encoded array and an initial integer.
Write a script to find the original array that produced the given encoded array. It was encoded such that encoded[i] = orig[i] XOR orig[i + 1]
.
Example 1
Input: @encoded = (1, 2, 3), $initial = 1
Output: (1, 0, 2, 1)
Encoded array created like below, if the original array was (1, 0, 2, 1)
$encoded[0] = (1 xor 0) = 1
$encoded[1] = (0 xor 2) = 2
$encoded[2] = (2 xor 1) = 3
Example 2
Input: @encoded = (6, 2, 7, 3), $initial = 4
Output: (4, 2, 0, 7, 4)
Solution
We may XOR
both sides of the given equation with orig[i]
.
Taking into account that
x XOR x == 0
x XOR 0 == x
we find:
encoded[i] XOR orig[i] = orig[i + 1]
Using List-Utilities
This is a nice use case for reductions.
$initial
is given as the first argument.
use strict;
use warnings;
use List::Util 'reductions';
use experimental 'bitwise';
sub decode_xor {
reductions {$a ^ $b} @_;
}
For the sake of completeness, here is the corresponding encoder. In this case slide comes handy.
use List::MoreUtils 'slide';
sub encode_xor {
slide {$a ^ $b} @_;
}
Using PDL
Though the implementations based on List::Util
and List::MoreUtils
appear as simple as possible, it is interesting to take a look at PDL
.
The encoder is straightforward. We shift the original list by one element and perform an XOR operation between both.
use PDL;
use PDL::NiceSlice;
sub encode_xor_pdl {
my $orig = long @_;
$orig(0:-2) ^ $orig(1:-1);
}
When it comes to decoding, things are a bit more complicated.
Here the input for the XOR operation and its output access the same data and therefore we cannot use the overloaded ^
operator.
Instead we need to call the core function explicitly, providing all parameters.
The second operand to XOR and the result are overlapping slices of the same ndarray $orig
.
It is initialized with $init
followed by zeroes.
The zeroes are being overwritten with the recovered original values in the course of the internal loop.
sub decode_xor_pdl {
my $init = shift;
my $enc = long @_;
my $orig = zeroes(long, $enc->dim(0) + 1)->set(0, $init);
PDL::xor($enc, $orig(0:-2), $orig(1:-1), 0);
$orig;
}
Both PDL subs may be called with the list of numbers as an array, an array ref or an ndarray. While it doesn’t make a big difference for the list implementations if the interface is based on an array or an array ref, the impact on the PDL implementations is significant. When given an array or an array reference, most effort involves converting the input into an ndarray, where in the former case this takes even longer than the whole process in the list implementations. To unleash the full power of PDL, the input has to be provided as an ndarray. Though such a comparison between the list and PDL implementations might not be fair, it is at least impressive.
Here is a benchmark for all four subs, operating on \(2^{18}\) random long
numbers.
Rate dec_list enc_list dec_pdl enc_pdl
dec_list 77.4/s -- -9% -96% -99%
enc_list 84.9/s 10% -- -96% -99%
dec_pdl 1895/s 2350% 2134% -- -70%
enc_pdl 6218/s 7938% 7227% 228% --
See the full solution to task 2.
If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.