I did something similar for a math game I made, where the game speaks numbers aloud, and the results weren't that bad. I recorded each number from zero through 20 and the separate scaling factors (hundred, thousand, million, billion, trillion), then wrote a script that strings the clips together to speak a number. The results are obviously robotic, since I didn't record different inflections to match where a clip falls in the sentence, or even within the same word. That would have been more work than I was willing to invest.
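To give an idea of what that kind of script involves, here is a minimal sketch in Python. It is not the original script: the clip names are hypothetical, and the tens clips (thirty through ninety) are an assumption, since only zero through 20 and the scale words are listed above. It only works out which clips to play and in what order; actual audio playback is left out.

```python
# Hypothetical clip names; each one would correspond to a recorded audio file.
SMALL = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]
# Assumption: separate clips for the tens, which the post does not mention.
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}
# Scale word for each group of three digits, lowest group first.
SCALES = ["", "thousand", "million", "billion", "trillion"]


def clips_below_thousand(n):
    """Return the clip names for a value in the range 1..999."""
    clips = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        clips += [SMALL[hundreds], "hundred"]
    if rest > 20:
        tens, ones = divmod(rest, 10)
        clips.append(TENS[tens])
        if ones:
            clips.append(SMALL[ones])
    elif rest:
        clips.append(SMALL[rest])
    return clips


def number_to_clips(n):
    """Return the ordered list of clip names that speaks the integer n."""
    if n < 0 or n >= 1000 ** len(SCALES):
        raise ValueError("number out of range for the recorded clips")
    if n == 0:
        return ["zero"]
    # Split into groups of three digits, lowest group first.
    groups = []
    while n:
        n, group = divmod(n, 1000)
        groups.append(group)
    clips = []
    # Speak the highest group first, skipping groups that are zero.
    for index in reversed(range(len(groups))):
        if groups[index] == 0:
            continue
        clips += clips_below_thousand(groups[index])
        if SCALES[index]:
            clips.append(SCALES[index])
    return clips
```

For example, `number_to_clips(1234)` returns `["one", "thousand", "two", "hundred", "thirty", "four"]`. Playing those clips back to back is exactly where the robotic sound comes from, because every clip has the same inflection no matter where it lands.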
For generalized speech synthesis, which is what you seem to be describing, you're looking at a ton of work for results that will still sound painfully robotic. At the very least, you'd need to record leading, neutral, and trailing inflections for every possible syllable.
I think you'll get better results by recording generic announcements and tying them to players in some other way.