Yes, the drift could come from the receiver, but we can rule that possibility out, as most KiwiSDRs are locked to a GPS reference and will be stable.
I have historically found Sound Blaster cards to be stable as well. The relevant sound card setting is the sample rate, i.e. 44100 Hz, 48000 Hz, etc. Make sure that the "playback" and "recording" devices are all set to the same sample rate; otherwise the audio will be resampled. Also make sure that no "enhancements" such as virtual sound or loudness equalization are turned on.
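To illustrate why mismatched device rates matter: if playback and recording disagree, every sample has to be converted by the ratio of the two rates. The short sketch below only does that arithmetic; the helper names `resample_ratio` and `check_rates` are hypothetical, and 44100 Hz and 48000 Hz are just the example rates mentioned above, not values read from any driver.

```python
from fractions import Fraction


def resample_ratio(playback_hz: int, recording_hz: int) -> Fraction:
    """Exact ratio the audio must be converted by when the two devices disagree.

    A ratio of exactly 1 means the rates match and no resampling is needed.
    """
    return Fraction(recording_hz, playback_hz)


def check_rates(playback_hz: int, recording_hz: int) -> str:
    """Report whether the two hypothetical device settings match."""
    ratio = resample_ratio(playback_hz, recording_hz)
    if ratio == 1:
        return f"OK: both devices at {playback_hz} Hz, no resampling"
    return f"Mismatch: audio resampled by {str(ratio)} ({float(ratio):.6f})"


print(check_rates(48000, 48000))  # OK: both devices at 48000 Hz, no resampling
print(check_rates(44100, 48000))  # Mismatch: audio resampled by 160/147 (1.088435)
```

A non-integer ratio such as 160/147 means the conversion has to interpolate between samples, which is one more place where timing errors can creep in; matching the rates avoids that step entirely.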
“What U Hear” is a feature implemented in the Sound Blaster software. Windows XP was the last version of the OS to support this feature natively in the audio mixer, so I cannot comment on its use in later versions. Is it possible that other "Windows sounds", such as e-mail notifications, or even hum from the microphone or line input, are getting in? “What U Hear” will pick up everything and feed it to MMSSTV.
That is a large amount of shear (slant) in that audio calibration line, so something is upsetting the sampling rate. I agree that the MMSSTV calibration is poorly explained. The first thing you should do is right-click on the WWV tick line. This moves the dotted green line on the right side of your screen on top of the WWV sample. Then left-click at the bottom of the sample, followed by another left-click at the top of the sample. That will get you calibrated, but it will not cure the shear issue.
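The reason the two clicks work is simple arithmetic: WWV ticks once per second, and each line of the display is assumed to hold exactly one nominal second of samples. If the card's real clock differs from the nominal rate, each tick lands a little further along the line, so the offset between the bottom and top of the trace, divided by the number of seconds between them, is the rate error. The sketch below is not MMSSTV's actual code; the function names, the 11025 Hz nominal rate, and the sign convention (positive shift means the card is running fast) are all assumptions for illustration.

```python
def corrected_rate(nominal_hz: float, shift_samples: float, elapsed_s: float) -> float:
    """Estimate the true sampling rate from the slant of a 1-per-second tick line.

    nominal_hz:    rate the software assumes (one display line = nominal_hz samples).
    shift_samples: horizontal offset of the tick between the bottom and top clicks,
                   in samples; positive is assumed to mean the card is running fast.
    elapsed_s:     number of lines (seconds) between the two clicks.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # Every real second the card delivers nominal_hz plus a few extra samples,
    # and those extras push the next tick that far along the following line.
    return nominal_hz + shift_samples / elapsed_s


def error_ppm(nominal_hz: float, actual_hz: float) -> float:
    """Clock error in parts per million relative to the nominal rate."""
    return (actual_hz - nominal_hz) / nominal_hz * 1e6


# Hypothetical reading: the tick moved 11.025 samples over 10 lines.
actual = corrected_rate(11025, 11.025, 10)
print(f"corrected rate: {actual:.4f} Hz, error: {error_ppm(11025, actual):.1f} ppm")
```

A steady clock error gives a straight slanted line that this correction removes; a line that wanders or bends points to something else disturbing the sample stream, which is why calibration alone does not fix the underlying problem.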