For JD from Austin — this time the donkey actually sings

v2 of the singing-face capability: whisper-cli transcribes the vocals, eng-to-ipa maps each word to a phoneme stream, and a six-class viseme overlay paints a geometrically distinct mouth shape per sound: open AH, wide EE, round OH, tight OO, pressed MM, closed REST. Not a loudness flap: the mouth follows the phonemes, not the volume.
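
A minimal sketch of the phoneme-to-viseme step, assuming the phoneme stream arrives as a plain IPA string with optional stress marks. The table `VISEME_BY_PHONEME`, the helper `visemes_for_ipa`, and the grouping of phonemes into the six classes are hypothetical illustrations, not the mapping this change actually ships.

```python
# Hypothetical phoneme-to-viseme table (an assumption for illustration).
# Any symbol not listed falls back to the closed REST shape.
VISEME_BY_PHONEME = {
    # open AH
    "ɑ": "AH", "a": "AH", "æ": "AH", "ʌ": "AH", "ə": "AH",
    # wide EE
    "i": "EE", "ɪ": "EE", "e": "EE", "ɛ": "EE", "j": "EE",
    # round OH
    "o": "OH", "ɔ": "OH",
    # tight OO
    "u": "OO", "ʊ": "OO", "w": "OO",
    # pressed MM
    "m": "MM", "b": "MM", "p": "MM",
}

# Stress and length marks carry no mouth shape of their own.
SKIPPED_MARKS = "ˈˌː"


def visemes_for_ipa(ipa: str) -> list[str]:
    """Map an IPA string to viseme classes, collapsing consecutive repeats."""
    sequence: list[str] = []
    for symbol in ipa:
        if symbol in SKIPPED_MARKS:
            continue
        viseme = VISEME_BY_PHONEME.get(symbol, "REST")
        # Only record a change of mouth shape, not every phoneme.
        if not sequence or sequence[-1] != viseme:
            sequence.append(viseme)
    return sequence


print(visemes_for_ipa("mun"))  # ['MM', 'OO', 'REST']
```

Because the shape is looked up per phoneme, a held vowel keeps its shape no matter how loud it is sung, which is the difference from an amplitude-driven mouth flap. Timing each viseme against the transcript's word timestamps is left out of this sketch.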