
This was an excellent question to ask me, but unfortunately, I'm completely blocked on an answer! But I think it's interesting anyway, I will come back with an actual answer after December if one occurs to me.
OK, my theory of puzzles. I think a quintessential puzzle is one where, when you guess the answer or it's pointed out, it's immediately obvious that it's right (else it's not a puzzle, it's just difficult), but it's not obvious in advance. And it's hard to quantify this, but it should not just be a matter of exhaustively trying possible answers, there should be different potential avenues and be able to make a leap to the right answer (eg. "find the prime factors of $largenumber" isn't really a puzzle).
A common sort of puzzle is riddle-puzzles where you need to relax some assumption which you made, but explicitly wasn't included in the statement of the puzzle. These are normally carefully constructed, but can happen in real research too, where some insight that "hey, what if we DIDN'T do this" suddenly makes everything else fall into place.
Until I wrote the previous paragraph I hadn't thought of this as a category, but I asked, are there quintessential puzzles which are not riddle-puzzles? And I guess, they tend to be ones where the answer is reached by a chain of reasoning where each step isn't that hard, but it's impossible to see the next step until you've done the previous one.
Another sort of puzzle that shows up especially amongst mathematicians is an open-ended puzzle. Like, if there's a puzzle "five people have to do X in less than forty seconds", mathematicians don't even hear "five" and "forty", they hear "if N people have to do X, what's the least time it's possible to do it in? provide an example and proof there's no smaller answer". But often it's the case that the question asked is a good place to start, and if you can solve that, it's a similar, harder problem to solve it generalised in various ways. Obviously any question _could_ be generalised like this, but only some actually get more interesting as you do so: sometimes all the generalisations are either the same, or impossible, or a list of special cases.
I really need some specific examples, but can't think of anything good. Anyone else suggest any?
But I'll end on one that I remember, at school, proving that moving a tower of hanoi from one needle to another takes 2^n-1 moves. It was one of the first times I worked something out for myself and wrote it all up with a proof and conclusion, at mum's suggestion.