15 March 2024 | Challenge 260 |

# Uniqe Multisets

## Task 1: Unique Occurrences

**Submitted by: Mohammad Sajid Anwar**

You are given an array of integers, `@ints`

.

Write a script to return 1 if the number of occurrences of each value in the given array is unique or 0 otherwise.

### Example 1

```
Input: @ints = (1,2,2,1,1,3)
Output: 1
The number 1 occurred 3 times.
The number 2 occurred 2 times.
The number 3 occurred 1 time.
All occurrences are unique, therefore the output is 1.
```

### Example 2

```
Input: @ints = (1,2,3)
Output: 0
```

### Example 3

```
Input: @ints = (-2,0,1,-2,1,1,0,1,-2,9)
Output: 1
```

### Solution

The implementation is straightforward: Count the occurrences of each number and then check if there are no duplicate counts.

```
use strict;
use warnings;
use List::Utils 'pairvalues';
use List::MoreUtils qw(frequency duplicates);
sub uniq_occur {
! duplicates pairvalues frequency @_;
}
```

See the full solution.

## Task 2: Dictionary Rank

**Submitted by: Mark Anderson**

You are given a word, `$word`

.

Write a script to compute the dictionary rank of the given word.

### Example 1

```
Input: $word = 'CAT'
Output: 3
All possible combinations of the letters:
CAT, CTA, ATC, TCA, ACT, TAC
Arrange them in alphabetical order:
ACT, ATC, CAT, CTA, TAC, TCA
CAT is the 3rd in the list.
Therefore the dictionary rank of CAT is 3.
```

### Example 2

```
Input: $word = 'GOOGLE'
Output: 88
```

### Example 3

```
Input: $word = 'SECRET'
Output: 255
```

### Solution

#### Theoretical Analysis

The background of this task are permutations of multisets.
Wikipedia’s verbal descriptions for a **multiset**:

In mathematics, a

multiset(orbag, ormset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that element in the multiset.

and for **permutations of multisets**:

If \(M\) is a finite multiset, then a

\[{n \choose m_{1},m_{2},\ldots ,m_{l}}={\frac {n!}{m_{1}!\,m_{2}!\,\cdots \,m_{l}!}}={\frac {\left(\sum_{i=1}^{l}{m_{i}}\right)!}{\prod_{i=1}^{l}{m_{i}!}}}\]multiset permutationis an ordered arrangement of elements of \(M\) in which each element appears a number of times equal exactly to its multiplicity in \(M\). An anagram of a word having some repeated letters is an example of a multiset permutation. If the multiplicities of the elements of \(M\) (taken in some order) are \(m_{1}, m_{2}, ..., m_{l}\) and their sum (that is, the size of \(M\)) is \(n\), then the number of multiset permutations of \(M\) is given by themultinomial coefficient,

shall be sufficient for this task.

However, this task does not ask for the total number of multiset permutations (*mperm*s), but for the position of a given *mperm* in the lexicographical ordered list of all *mperm*s.
In the following we assume a multiset \(M\) having elements \((1, 2, ...,l)\) with multiplicities
\(m_{1}, m_{2}, ..., m_{l}\).

Let us denote
\(\mathcal{P}(M)\) as the set of all *mperm*s of \(M\).
Then we have
\(P(M) := |\mathcal{P}(M)| = {n \choose m_{1},m_{2},\ldots ,m_{l}}\)
as the number of *mperm*s for the multiset \(M\), where \(n = \sum_{i=1}^l m_i = |M|\).

As the multiset \(M\) is ordered, there is an induced order on \(\mathcal{P}(M)\) given by the lexicographical order and we can define the *rank* of an *mperm* \(p \in \mathcal{P}(M)\) as
\(R(p) := |\{q \in \mathcal{P}(M) | q \lt p\}|\), i.e. the number of preceding *mperm*s.

If \(M = \{m\}\) is a singleton, then \(R((m)) = 0\) as there is only one permutation and it has no predecessor.

Let \(n = |M| > 1\) and \(p = (p_1, ...,p_n) \in \mathcal{P}(M)\) a *mperm* of \(M\).

For every \(m \in M, m < p_1\) and every *mperm*
\(p^* = (p_2^*, ..., p_n^*) \in \mathcal{P}(M \setminus \{m\})\)
we can see that \((m, p^*) \in \mathcal{P}(M)\) and \((m, p^*) \lt p\) as these start with a smaller element than \(p\).
The number of *mperm*s that can be formed from the remaining elements \(\{p_2^*, ..., p_n^*\}\) is known by its multinomial coefficient \(P(M \setminus \{m\})\).

This enables us to count all *mperm*s of \(M\) that start with an element less than \(p_1\).
The remaining *mperm*s of \(M\) that precede \(p\) must all start with \(p_1\) and their number is
\(R((p_2,...,p_n))\)
leading to a recursive formula for the rank of \(p = (p_1, ...,p_n)\):

For an implementation of this formula it is important to note that the involved multinomial coefficients can easily be derived from a single initial coefficient. Consider

and for \(i \in M\)

Exploiting the latter relation, this approach has a time complexity \(\mathcal{O}(n^2)\) in contrast to the exponential effort of an enumeration of *mperm*s.

See examples in the full solution on larger words where enumeration becomes inapplicable.

#### Implementation

A *mperm* may be represented as an array `@p`

of its elements in their order.
Though the underlying multiset of a *mperm* may be derived from the *mperm* itself, it comes handy to have a separate representation of this multiset as an array `@m`

of the elements multiplicities.

To gain the representations of the *mperm* and its multiset, a few steps need to be performed:

- count the multiplicity of every character in
`$word`

- sort the pairs (character, multiplicity) ascending by character
- map each unique character in
`$word`

to an index from`0 .. l`

- convert
`$word`

to a*mperm*of the indices`0 .. l`

- convert
`$word`

to a multiset represented by the array of multiplicities - calculate the initial multinomial coefficient for the multiset

The latter two give the starting point to the rank calculation as described in the previous section:

- pick the leading element from the
*mperm* - return the rank if it is the last element
- loop over all unique elements in the multiset that are less than the current leading element
- calculate and accumulate the cardinality of the
*mperm*set for the multiset without the current loop element

- calculate and accumulate the cardinality of the
- remove the leading element from the multiset and adjust the multinomial coefficient
- recursively accumulate the rank for the reduced
*mperm*

Finally a tail-call optimization can be performed: The next recursion level may overwrite the current context. For this purpose the subroutine arguments are given by reference and are modified in place.

The calculated rank is zero-based and needs to be adjusted for the given task.

```
use v5.24;
use warnings;
use bigint;
use Math::Prime::Util qw(vecsum vecprod factorial vecreduce);
use List::AllUtils qw(sort_by count_by pairs);
use experimental qw(refaliasing signatures);
sub dictionary_rank {
no bigint;
my @chars = split //, shift;
my @freq = sort_by {$_->[0]} pairs count_by {$_} @chars;
(\my %chartoidx)->@{map $_->[0], @freq} = 0 .. $#freq;
my @mperm = @chartoidx{@chars};
my @mset = map $_->[1], @freq;
use bigint;
my $mult = factorial(@mperm) / vecprod map factorial($_), @mset;
multipermtonum(0->copy, $mult, \@mperm, \@mset);
}
sub multipermtonum ($, $, $perm, $set) {
\my $num = \$_[0];
\my $mult = \$_[1];
my $n = @$perm;
return $num if $n == 1;
my $first = shift @$perm;
$num += vecreduce {
$a + $mult * $set->[$b] / $n;
} 0, grep $set->[$_], 0 .. $first - 1;
$mult = $mult * $set->[$first]-- / $n;
goto &multipermtonum;
}
```

See the full solution.

If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.