“TaleSinger” is an RPG without combat, set in mythic Iron Age Wales. You play as the young bard Gwen, using music and stories to inspire, persuade or shame audiences into standing up to the invading Romans. It features a unique performance mechanic to change the emphasis of songs, and a symbolic magic system to craft totems based on thematic associations of ingredients. The map is divided into a series of open-world hubs, and travelling between them costs time. To succeed, you must discover the stories of the Celts and turn them to your own ends.
“TaleSinger” is about using story and song to provoke an emotional response, whether that means motivating a village of Celtic warriors to fight back against an encroaching Roman garrison or eliciting sympathy from a cruel captor to gain your release.
Our characters will need to talk, interact and emote believably, and our players will need to be able to read and understand the emotions being portrayed on screen. There is a lot of dialogue: over 150 speaking characters, with almost two dozen primary roles. Player decisions lead to branching dialogue, so we need to be able to choose between, and seamlessly blend across, multiple lines of dialogue that run the whole gamut of human emotions.
Animating speech is the process of matching the mouth movements of your animation to the phonemes of your audio track, and is commonly referred to as “lip-synch”. What is a phoneme, you ask? Wikipedia defines it as “one of the units of sound that distinguish one word from another in a particular language.” “Ooo” and “Aah”, for instance, are two different phonemes.
The first people to explore facial animation in any depth were the pioneers of cartoon animation in the early days of Disney, Warner Bros and MGM. Through a mixture of observational research and trial and error, they concluded that an animator could get by with a minimum of 9 main mouth expressions to match the major phoneme sounds. It’s estimated that there are about 35 phonemes in the English language, but the Disney animators realised that several sounds can be represented by the same mouth position. For instance, a single closed-mouth shape represents the sounds M, B and P.
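The many-to-one idea above can be sketched as a simple lookup table. This is a minimal Python illustration, not the studios’ actual chart or TaleSinger’s pipeline: the nine shape names, the groupings and the ARPAbet-style phoneme symbols are all assumptions; only the shared M, B and P shape comes from the text.

```python
# Hypothetical table of nine mouth shapes, each covering several phonemes.
# Only the M/B/P grouping comes from the article; the rest is illustrative.
MOUTH_SHAPES: dict[str, set[str]] = {
    "closed": {"M", "B", "P"},
    "open": {"AA", "AH", "AY"},
    "round": {"OW", "UW", "W"},
    "teeth_lip": {"F", "V"},
    "wide": {"IY", "EH", "EY"},
    "tongue": {"L", "TH", "DH"},
    "narrow": {"S", "Z", "T", "D", "N", "K", "G"},
    "pucker": {"SH", "CH", "JH", "ZH"},
    "rest": {"SIL"},
}

# Invert the table so each phoneme resolves to its shape in one lookup.
PHONEME_TO_SHAPE = {p: shape for shape, group in MOUTH_SHAPES.items() for p in group}


def mouth_shapes(phonemes: list[str]) -> list[str]:
    """Map a phoneme sequence to mouth shapes, falling back to 'rest'."""
    return [PHONEME_TO_SHAPE.get(p, "rest") for p in phonemes]


# "mop" in ARPAbet-style notation: M AA P
print(mouth_shapes(["M", "AA", "P"]))  # ['closed', 'open', 'closed']
```

Inverting the table keeps each lookup cheap, and any sound missing from the table falls back to a neutral rest pose rather than failing.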
Early 2D games, such as “The Secret of Monkey Island”, used this style of animation for their dialogue-based cut-scenes. In the move to 3D, dialogue animation has become more nuanced in its attempts to be as realistic as possible, but the underlying principle of matching face shapes to phonemes hasn’t changed.
Basic human emotions
Emotions are an incredibly important aspect of human life. They compel us to take action and influence the decisions we make, both large and small. Social communication is an essential part of our daily lives and relationships, and emotions let other people understand how we are feeling so that they can act or intervene if they wish, and vice versa.
For more than 40 years, the American psychologist Paul Ekman has supported the view that emotions are discrete, measurable and physiologically distinct. His most influential work, however, has revolved around the finding that certain emotions are universally recognised, even in pre-literate cultures that could not have learned them through any form of media. He concluded that these emotions are essentially hard-wired into our DNA. His findings led to the classification of six basic emotions (happiness, sadness, fear, anger, surprise and disgust) and a description of how to differentiate them.
This is facial animation at its most rudimentary. Nine phoneme shapes and six basic emotions are a good base, but they will only get you so far. They may work for a low-budget pre-school TV animation, but they won’t cut the mustard for a AAA console RPG played by experienced students of human emotion, not if you want to make any money, anyway. Our emotional range is far more nuanced. What about confusion, anxiety or jealousy? Many emotions blend together or share common attributes, fear and disgust for instance. Emotions also vary in intensity, and an emotion pushed far enough can become a different one: surprise can turn into shock through the simple act of widening the eyes, opening the mouth and raising the eyebrows.
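One common way to get intensity and mixing on a 3D face is to treat each emotion as a set of facial control weights that can be scaled and interpolated. The minimal Python sketch below assumes hypothetical control names (`brow_raise`, `eye_widen` and so on) and made-up values; it illustrates surprise intensifying into shock and two emotions blending, and is not the tooling described in this series.

```python
# Hypothetical facial controls, each weighted 0.0 (neutral) to 1.0 (maximum).
Pose = dict[str, float]

# Illustrative emotion poses; control names and values are assumptions.
SURPRISE: Pose = {"brow_raise": 0.5, "eye_widen": 0.4, "jaw_open": 0.3}
FEAR: Pose = {"brow_raise": 0.4, "brow_knit": 0.6, "eye_widen": 0.6, "lip_stretch": 0.5}


def scale(pose: Pose, intensity: float) -> Pose:
    """Scale every control by an intensity factor, clamped to [0, 1]."""
    return {k: min(1.0, max(0.0, v * intensity)) for k, v in pose.items()}


def blend(a: Pose, b: Pose, t: float) -> Pose:
    """Linearly interpolate from pose a (t=0) to pose b (t=1).

    Controls missing from one pose are treated as neutral (0.0).
    """
    keys = sorted(a.keys() | b.keys())
    return {k: (1 - t) * a.get(k, 0.0) + t * b.get(k, 0.0) for k in keys}


# Surprise pushed to double intensity: higher brows, wider eyes, open jaw.
shock = scale(SURPRISE, 2.0)
print(shock)  # {'brow_raise': 1.0, 'eye_widen': 0.8, 'jaw_open': 0.6}

# A half-and-half mix of surprise and fear.
print(blend(SURPRISE, FEAR, 0.5))
```

Clamping keeps an exaggerated intensity from driving a control past its sculpted maximum, and treating missing controls as neutral lets poses authored with different sets of controls still blend cleanly.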
So, we have our research and our asset requirements. It became clear very early in the development process that we were going to need a way of generating a wide range of believable emotions across hundreds of characters, plus an efficient means of sharing animation data between those characters. Our next step was to identify the tools that would help us achieve these goals to a high standard, on time and on budget.
[END OF PART TWO]