The Bear's Den

Enter at your own risk

15 March 2024

Challenge 260

Uniqe Multisets

Task 1: Unique Occurrences
Task 2: Dictionary Rank

Task 1: Unique Occurrences

Submitted by: Mohammad Sajid Anwar

You are given an array of integers, @ints.

Write a script to return 1 if the number of occurrences of each value in the given array is unique or 0 otherwise.

Example 1

Input: @ints = (1,2,2,1,1,3)
Output: 1

The number 1 occurred 3 times.
The number 2 occurred 2 times.
The number 3 occurred 1 time.

All occurrences are unique, therefore the output is 1.

Example 2

Input: @ints = (1,2,3)
Output: 0

Example 3

Input: @ints = (-2,0,1,-2,1,1,0,1,-2,9)
Output: 1

Solution

The implementation is straightforward: Count the occurrences of each number and then check if there are no duplicate counts.

use strict;
use warnings;
use List::Utils 'pairvalues';
use List::MoreUtils qw(frequency duplicates);

sub uniq_occur {
    ! duplicates pairvalues frequency @_;
}

See the full solution.

Task 2: Dictionary Rank

Submitted by: Mark Anderson

You are given a word, $word.

Write a script to compute the dictionary rank of the given word.

Example 1

Input: $word = 'CAT'
Output: 3

All possible combinations of the letters:
CAT, CTA, ATC, TCA, ACT, TAC

Arrange them in alphabetical order:
ACT, ATC, CAT, CTA, TAC, TCA

CAT is the 3rd in the list.
Therefore the dictionary rank of CAT is 3.

Example 2

Input: $word = 'GOOGLE'
Output: 88

Example 3

Input: $word = 'SECRET'
Output: 255

Solution

Theoretical Analysis

The background of this task are permutations of multisets. Wikipedia’s verbal descriptions for a multiset:

In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that element in the multiset.

and for permutations of multisets:

If $M$ is a finite multiset, then a multiset permutation is an ordered arrangement of elements of $M$ in which each element appears a number of times equal exactly to its multiplicity in $M$. An anagram of a word having some repeated letters is an example of a multiset permutation. If the multiplicities of the elements of $M$ (taken in some order) are $m_{1}, m_{2}, ..., m_{l}$ and their sum (that is, the size of $M$) is $n$, then the number of multiset permutations of $M$ is given by the multinomial coefficient,
\[{n \choose m_{1},m_{2},\ldots ,m_{l}}={\frac {n!}{m_{1}!\,m_{2}!\,\cdots \,m_{l}!}}={\frac {\left(\sum_{i=1}^{l}{m_{i}}\right)!}{\prod_{i=1}^{l}{m_{i}!}}}\]

shall be sufficient for this task.

However, this task does not ask for the total number of multiset permutations (mperms), but for the position of a given mperm in the lexicographical ordered list of all mperms. In the following we assume a multiset $M$ having elements $(1, 2, ...,l)$ with multiplicities $m_{1}, m_{2}, ..., m_{l}$.

Let us denote $\mathcal{P}(M)$ as the set of all mperms of $M$. Then we have $P(M) := |\mathcal{P}(M)| = {n \choose m_{1},m_{2},\ldots ,m_{l}}$ as the number of mperms for the multiset $M$, where $n = \sum_{i=1}^l m_i = |M|$.

As the multiset $M$ is ordered, there is an induced order on $\mathcal{P}(M)$ given by the lexicographical order and we can define the rank of an mperm $p \in \mathcal{P}(M)$ as $R(p) := |\{q \in \mathcal{P}(M) | q \lt p\}|$, i.e. the number of preceding mperms.

If $M = \{m\}$ is a singleton, then $R((m)) = 0$ as there is only one permutation and it has no predecessor.

Let $n = |M| > 1$ and $p = (p_1, ...,p_n) \in \mathcal{P}(M)$ a mperm of $M$.

For every $m \in M, m < p_1$ and every mperm $p^* = (p_2^*, ..., p_n^*) \in \mathcal{P}(M \setminus \{m\})$ we can see that $(m, p^*) \in \mathcal{P}(M)$ and $(m, p^*) \lt p$ as these start with a smaller element than $p$. The number of mperms that can be formed from the remaining elements $\{p_2^*, ..., p_n^*\}$ is known by its multinomial coefficient $P(M \setminus \{m\})$.

This enables us to count all mperms of $M$ that start with an element less than $p_1$. The remaining mperms of $M$ that precede $p$ must all start with $p_1$ and their number is $R((p_2,...,p_n))$ leading to a recursive formula for the rank of $p = (p_1, ...,p_n)$:

$$R((p_1, ...,p_n)) = \sum_{m < p_1} P(M \setminus \{m\}) + R((p_2, ...,p_n))$$

For an implementation of this formula it is important to note that the involved multinomial coefficients can easily be derived from a single initial coefficient. Consider

$$ P(M) = {n \choose m_{1},m_{2},\ldots ,m_{l}} = {\frac {n!}{m_{1}!\,m_{2}!\,\cdots \,m_{l}!}} $$

and for $i \in M$

$$ P(M \setminus \{i\}) = {n - 1 \choose m_{1},m_{2},\ldots m_{i} - 1,\ldots ,m_{l}} = {\frac {(n - 1)!}{m_{1}!\,m_{2}!\,\cdots \,(m_{i} - 1)!\,\cdots \,m_{l}!}} = {\frac{m_i P(M)}{n}} $$

Exploiting the latter relation, this approach has a time complexity $\mathcal{O}(n^2)$ in contrast to the exponential effort of an enumeration of mperms.

See examples in the full solution on larger words where enumeration becomes inapplicable.

Implementation

A mperm may be represented as an array @p of its elements in their order. Though the underlying multiset of a mperm may be derived from the mperm itself, it comes handy to have a separate representation of this multiset as an array @m of the elements multiplicities.

To gain the representations of the mperm and its multiset, a few steps need to be performed:

count the multiplicity of every character in $word
sort the pairs (character, multiplicity) ascending by character
map each unique character in $word to an index from 0 .. l
convert $word to a mperm of the indices 0 .. l
convert $word to a multiset represented by the array of multiplicities
calculate the initial multinomial coefficient for the multiset

The latter two give the starting point to the rank calculation as described in the previous section:

pick the leading element from the mperm
return the rank if it is the last element
loop over all unique elements in the multiset that are less than the current leading element
- calculate and accumulate the cardinality of the mperm set for the multiset without the current loop element
remove the leading element from the multiset and adjust the multinomial coefficient
recursively accumulate the rank for the reduced mperm

Finally a tail-call optimization can be performed: The next recursion level may overwrite the current context. For this purpose the subroutine arguments are given by reference and are modified in place.

The calculated rank is zero-based and needs to be adjusted for the given task.

use v5.24;
use warnings;
use bigint;
use Math::Prime::Util qw(vecsum vecprod factorial vecreduce);
use List::AllUtils qw(sort_by count_by pairs);
use experimental qw(refaliasing signatures);

sub dictionary_rank {
    no bigint;
    my @chars = split //, shift;
    my @freq = sort_by {$_->[0]} pairs count_by {$_} @chars;
    (\my %chartoidx)->@{map $_->[0], @freq} = 0 .. $#freq;
    my @mperm = @chartoidx{@chars};
    my @mset = map $_->[1], @freq;

    use bigint;
    my $mult = factorial(@mperm) / vecprod map factorial($_), @mset;
    multipermtonum(0->copy, $mult, \@mperm, \@mset);
}

sub multipermtonum ($, $, $perm, $set) {
    \my $num = \$_[0];
    \my $mult = \$_[1];
    my $n = @$perm;
    return $num if $n == 1;
    my $first = shift @$perm;

    $num += vecreduce {
        $a + $mult * $set->[$b] / $n;
    } 0, grep $set->[$_], 0 .. $first - 1;

    $mult = $mult * $set->[$first]-- / $n;

    goto &multipermtonum;
}

See the full solution.

If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.