jack

Problem: What is the shortest base-d sequence which contains all sequences of n d-digits?

I think I have a solution, but it's a bit of a sketch. The best possible is length d^n, assuming all adjacent sequences overlap (n-1), and I aim to show how to make one that length.

Instead of a sequence, consider a loop. We write this as a sequence, but consider the ends to be joined. This is because there's enough symmetry we can probably solve the one problem if we can solve the other, and because we can combine sequences without having to have n-1 tails hanging off the ends.

For instance, for 3-sequences of 2-digits, the loop 00010111 contains all 2^3 3-sequences, two of them wrapping round the end.

Lemma: If two loops contain a lot of n-sequences once each, and somewhere contain the same (n-1)-sequence, they can be combined into one loop which does so, having the sum of their lengths.

Write them as [n-1][loop1] and [n-1][loop2]. Then [n-1][loop1][n-1][loop2] is the sum of the lengths, and contains every sequence the preceeding did, since any n-subsequence corresponds to one in one of the originals.

We use (strong) Induction on d and n[1] to show that there is always such a sequence

Find a solution for n=d=2, and then for arbitrary n,d assume we can produce a desired sequence S(m,c) for any s,m smaller than n,d.

What are the n-sequences we need to include? Consider the groups:

Those which contain no 0s.
Those which contain one 0.
...
Those which contain n-1 0s.
The sequence of n 0s.

The first is all the sequences of d-1 digits length n, so we have a loop s(d-1,n) of length (d-1)^n.

The second. For each position of the zero, the rest of the sequence is determined by an n-1-sequence of d-1 digits, producing n*(d-1)^(n-1). Construct a loop s(d-1,n-1) of length (d-1)^(n-1), writing it as abc...m... (where m represents the n-1'th digit)

Consider loops like that but with a 0 inserted every n digits:

0abc...m0...
a0bc...m.0..
ab0c...m..0.
abc0...m...0
...
abc...0m...

Observe that there are n-1 copies of the original loop, and an extra 0 for each position in the original loop, making a total length of n*(d-1)^(n-1). Also, sequence in the original loop is in one of the new loops with a 0 inserted into any position (just read down the column, noting the zeros move one place right each time).

So these loops contain every sequence of the second category exactly once.

They can be stitched together by considering sequences of nearly all 1s -- one loop must have 0111..1 and a different 1011..1, which contain the n-1 subsequence 0111..1 so can be attatched, etc.

This loops can be stitched the first category loop between 0111..1 and 1111..1. Now we have a loop containing all d-digit n-sequences containing no or one 0.

This is approaching the order desired, and some examples make me believe the other categories can be treated the same way with more concentration on the permutations, but I can't formalise it. I think I've done it by hand for up to 4-sequences of any-digits.

If anyone else wants to object, or fill in the gaps, feel free.

[1] OK, I'm using induction a lot, but look, I'm not using proof by contradiction. It's an actual constructive proof you could in theory turn into an algorithm :)

Flat | Top-Level Comments Only

From:

simont

There is a known algorithm which solves this problem, and you're right that it can always be done in d^n. I know the algorithm (and in fact I have a small script that implements it lying around somewhere or other), but I don't think I've ever seen a proof of why it always works.

cartesiandaemon.livejournal.com

Thank you.

(I assumed someone must have worked it out already, but Ian brought it up at Sonic's party, and everyone else was hot to work it out, so I felt honour-bound to work it out for myself.)

damerell.livejournal.com

I just wrote a script which starts 9999 and appends the lowest digit that does not repeat a previously seen sequence. I don't know if this is your algorithm but it generates a 10003 digit sequence ending 999 (hence equivalent to a 10000 digit loop).

At the party I was thinking "assume some sequence ABCD is repeated" but didn't get very far showing a contradiction from that.

I don't know if this is your algorithm

It is, yes. The source I got it from (which I think was a book by Ian Stewart) asserted that this method always succeeds, for any n and d, but unfortunately didn't include a proof of this claim. However, I've tried it myself on a variety of (n,d) pairs and not seen a failure, and Ian Stewart is generally reliable, so I have no reason to think there isn't a proof. I just don't know what it is.

What is the shortest sequence that contains all n-sequences?

What is the shortest sequence that contains all n-sequences?

no subject

no subject

no subject

no subject

Active Recent Entries