Yes, there is a misunderstanding. Let's try this...
Let's say you get three consecutive frames lasting 100, 200 and 50 milliseconds. What happens with each approach:
1) Naive approach - You move everything X pixels each frame. The result is that the real-world movement speed depends entirely on the frame rate: things crawl through the long 200ms frame and shoot ahead during the short 50ms frame. This is almost always a bad thing.
2) DeltaTime - You move everything deltaTime * X pixels each frame, where deltaTime is 100, then 200, then 50, assuming you're timing frames correctly. This way, the movement speed per unit of real time stays the same no matter how long each frame takes. For simple applications, this is often enough.
But it's not perfect. As an extremely simple example, let's say that a collision is happening from t=150 to t=250 milliseconds. Your movement logic gets the deltaTimes 100, 200 and 50, so it only ever evaluates the world at t=100, t=300 and t=350, and thus completely misses this collision. A different set of deltaTimes would result in a very different outcome. Throw in the complexity of a more realistic game and things can get pretty inconsistent, to say the least.
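For reference, here's roughly what 2) looks like in code. This is just a minimal sketch, not anyone's actual implementation: the Player struct, the field names and the speed value are made up for illustration, and I'm keeping the post's millisecond units.

```cpp
#include <chrono>

// Made-up player state, purely for illustration.
struct Player {
    float x = 0.0f;              // position in pixels
    float speedPxPerMs = 0.2f;   // desired speed in pixels per millisecond
};

// 1) Naive: a fixed amount per frame, so real-world speed depends on frame length.
void updateNaive(Player& p) {
    p.x += 2.0f;
}

// 2) DeltaTime: scale the movement by the frame duration so the distance
//    covered per real millisecond stays constant.
void updateWithDelta(Player& p, float deltaMs) {
    p.x += p.speedPxPerMs * deltaMs;
}

int main() {
    using clock = std::chrono::steady_clock;
    Player player;
    auto previous = clock::now();

    while (true) { // stand-in for the real game loop and its exit condition
        auto now = clock::now();
        float deltaMs =
            std::chrono::duration<float, std::milli>(now - previous).count();
        previous = now;

        updateWithDelta(player, deltaMs);
        // render(player); // drawing omitted
    }
}
```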
3) Fixed timestep - You move everything X pixels at a time, but how many times you do that each frame varies based on the deltaTime. Let's say you want 20 game logic "frames" per second, or one iteration of your game logic every 50 milliseconds. Then on the 100ms frame, you execute the game logic twice. On the 200ms frame, you execute it four times. On the 50ms frame, you execute it once. If you get a frame of, say, 75ms, you execute the logic once and carry a remainder of 25ms over into the next frame, so you never "leak" any time.
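In code, this is the classic accumulator loop. Again just a sketch with made-up names; the actual game logic and rendering are stubbed out.

```cpp
#include <chrono>

// Placeholder for whatever advances the simulation by exactly one step
// (movement, collision checks, etc.). The name is made up.
void stepGameLogic(float stepMs) {
    (void)stepMs; // advance positions, check collisions, ...
}

// Placeholder for drawing the current state.
void render() {}

int main() {
    using clock = std::chrono::steady_clock;
    const float stepMs = 50.0f;  // 20 logic updates per second
    float accumulatorMs = 0.0f;  // unspent time carried between frames

    auto previous = clock::now();
    while (true) { // stand-in for the real game loop and its exit condition
        auto now = clock::now();
        float frameMs =
            std::chrono::duration<float, std::milli>(now - previous).count();
        previous = now;

        // Add this frame's time, then consume it in fixed 50ms steps:
        // a 100ms frame runs the logic twice, a 200ms frame four times,
        // and a 75ms frame runs it once and leaves 25ms for next frame.
        accumulatorMs += frameMs;
        while (accumulatorMs >= stepMs) {
            stepGameLogic(stepMs);
            accumulatorMs -= stepMs;
        }

        render();
    }
}
```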
Hopefully it's obvious why this solves the problem described above: the game logic always advances in the same 50ms steps, so it evaluates the same points in time regardless of how the frame durations happen to fall, and a collision window like t=150 to t=250 can no longer slip between checks. As an additional bonus, I'm fairly sure you need a fixed timestep in order to implement replays or time-rewinding mechanics, and it probably helps a lot with keeping networked multiplayer games in sync too.