Problem: What is the shortest base-d sequence which contains all sequences of n d-digits?
I think I have a solution, but it's a bit of a sketch. The best possible is length d^n, assuming all adjacent sequences overlap (n-1), and I aim to show how to make one that length.
Instead of a sequence, consider a loop. We write this as a sequence, but consider the ends to be joined. This is because there's enough symmetry we can probably solve the one problem if we can solve the other, and because we can combine sequences without having to have n-1 tails hanging off the ends.
For instance, for 3-sequences of 2-digits, the loop 00010111 contains all 2^3 3-sequences, two of them wrapping round the end.
Lemma: If two loops contain a lot of n-sequences once each, and somewhere contain the same (n-1)-sequence, they can be combined into one loop which does so, having the sum of their lengths.
Write them as [n-1][loop1] and [n-1][loop2]. Then [n-1][loop1][n-1][loop2] is the sum of the lengths, and contains every sequence the preceeding did, since any n-subsequence corresponds to one in one of the originals.
We use (strong) Induction on d and n[1] to show that there is always such a sequence
Find a solution for n=d=2, and then for arbitrary n,d assume we can produce a desired sequence S(m,c) for any s,m smaller than n,d.
What are the n-sequences we need to include? Consider the groups:
Those which contain no 0s.
Those which contain one 0.
...
Those which contain n-1 0s.
The sequence of n 0s.
The first is all the sequences of d-1 digits length n, so we have a loop s(d-1,n) of length (d-1)^n.
The second. For each position of the zero, the rest of the sequence is determined by an n-1-sequence of d-1 digits, producing n*(d-1)^(n-1). Construct a loop s(d-1,n-1) of length (d-1)^(n-1), writing it as abc...m... (where m represents the n-1'th digit)
Consider loops like that but with a 0 inserted every n digits:
0abc...m0...
a0bc...m.0..
ab0c...m..0.
abc0...m...0
...
abc...0m...
Observe that there are n-1 copies of the original loop, and an extra 0 for each position in the original loop, making a total length of n*(d-1)^(n-1). Also, sequence in the original loop is in one of the new loops with a 0 inserted into any position (just read down the column, noting the zeros move one place right each time).
So these loops contain every sequence of the second category exactly once.
They can be stitched together by considering sequences of nearly all 1s -- one loop must have 0111..1 and a different 1011..1, which contain the n-1 subsequence 0111..1 so can be attatched, etc.
This loops can be stitched the first category loop between 0111..1 and 1111..1. Now we have a loop containing all d-digit n-sequences containing no or one 0.
This is approaching the order desired, and some examples make me believe the other categories can be treated the same way with more concentration on the permutations, but I can't formalise it. I think I've done it by hand for up to 4-sequences of any-digits.
If anyone else wants to object, or fill in the gaps, feel free.
[1] OK, I'm using induction a lot, but look, I'm not using proof by contradiction. It's an actual constructive proof you could in theory turn into an algorithm :)
I think I have a solution, but it's a bit of a sketch. The best possible is length d^n, assuming all adjacent sequences overlap (n-1), and I aim to show how to make one that length.
Instead of a sequence, consider a loop. We write this as a sequence, but consider the ends to be joined. This is because there's enough symmetry we can probably solve the one problem if we can solve the other, and because we can combine sequences without having to have n-1 tails hanging off the ends.
For instance, for 3-sequences of 2-digits, the loop 00010111 contains all 2^3 3-sequences, two of them wrapping round the end.
Lemma: If two loops contain a lot of n-sequences once each, and somewhere contain the same (n-1)-sequence, they can be combined into one loop which does so, having the sum of their lengths.
Write them as [n-1][loop1] and [n-1][loop2]. Then [n-1][loop1][n-1][loop2] is the sum of the lengths, and contains every sequence the preceeding did, since any n-subsequence corresponds to one in one of the originals.
We use (strong) Induction on d and n[1] to show that there is always such a sequence
Find a solution for n=d=2, and then for arbitrary n,d assume we can produce a desired sequence S(m,c) for any s,m smaller than n,d.
What are the n-sequences we need to include? Consider the groups:
Those which contain no 0s.
Those which contain one 0.
...
Those which contain n-1 0s.
The sequence of n 0s.
The first is all the sequences of d-1 digits length n, so we have a loop s(d-1,n) of length (d-1)^n.
The second. For each position of the zero, the rest of the sequence is determined by an n-1-sequence of d-1 digits, producing n*(d-1)^(n-1). Construct a loop s(d-1,n-1) of length (d-1)^(n-1), writing it as abc...m... (where m represents the n-1'th digit)
Consider loops like that but with a 0 inserted every n digits:
0abc...m0...
a0bc...m.0..
ab0c...m..0.
abc0...m...0
...
abc...0m...
Observe that there are n-1 copies of the original loop, and an extra 0 for each position in the original loop, making a total length of n*(d-1)^(n-1). Also, sequence in the original loop is in one of the new loops with a 0 inserted into any position (just read down the column, noting the zeros move one place right each time).
So these loops contain every sequence of the second category exactly once.
They can be stitched together by considering sequences of nearly all 1s -- one loop must have 0111..1 and a different 1011..1, which contain the n-1 subsequence 0111..1 so can be attatched, etc.
This loops can be stitched the first category loop between 0111..1 and 1111..1. Now we have a loop containing all d-digit n-sequences containing no or one 0.
This is approaching the order desired, and some examples make me believe the other categories can be treated the same way with more concentration on the permutations, but I can't formalise it. I think I've done it by hand for up to 4-sequences of any-digits.
If anyone else wants to object, or fill in the gaps, feel free.
[1] OK, I'm using induction a lot, but look, I'm not using proof by contradiction. It's an actual constructive proof you could in theory turn into an algorithm :)
no subject
Date: 2006-08-28 04:38 pm (UTC)Broadly speaking I do think that you're correct and that this proof works. The main bits that you need to formalise are:
a) Proof that you've got all combinations in the loops
b) Proof that you have no duplication of combinations in the loops
c) Proof that it's always possible to stick the loops together, and you don't (say) have situations in which the loops form disjoint sets.
Until you can provide proofs of that, though, it'll still claim it's wishful thinking ;)
[What I mean by wishful thinking is that it *may* be possible to get it to work, but it may equally possibly not be possible to get it to work. It's a bit like saying that you can solve the four colour theorum as follows:
"Assume for contractiction that there's a smallest map that cannot be four-coloured. Evidently we can five-colour this map with at most one country being the fifth colour. [Proof: Take out the country with the fewest borders. Four colour the resulting map. Return the country. Turn it into the fifth colour]. If this country borders three countries, it can be coloured the fourth colour. Contradiction, QED. If this country borders four countries, follow this set of proceedures. Contradiction, QED. If this country borders five countries, follow this set of procedures. Contradiction, QED. If the country borders six countries, you have a special case in which all countries have six borders, and this can be three-coloured. Contradiction, QED."
Which works, except for the irritating fact that proving a set of procedures works for the five border case is ... difficult. And while nothing convinces me that the proof can't be patched up to deal with this fact, the fact remains that I, and a lot of very clever people, have spent ages attempting to do the aforementioned patching, and the lack of result speaks for itself ;).]