minutes100907
- present: Michaela, Timo, David
- more on predicting end-of... turns? utterances?
- important: need to add certain amount of silence at end of each
training unit, to see if certainty about turn end is higher in
reactive model.
- binning of predictions: end is 3 secs or longer away, between 2 and
3, between 1.8 and 2, etc. Bins get progressively larger at
higher distances. Or, equivalently: prediction has higher temporal
resolution for closer end-events than for more distant ones.
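A minimal sketch of the binning idea, to make the scheme concrete. Only the edges at 1.8, 2 and 3 seconds come from the notes above; all edges below 1.8 s, and the names BIN_EDGES and time_to_end_bin, are assumptions made for illustration.

```python
from bisect import bisect_right

# Bin boundaries in seconds until the end event.
# Only 1.8, 2.0 and 3.0 are taken from the minutes; every edge below 1.8
# is a hypothetical choice (finer bins close to the end, coarser further away).
BIN_EDGES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
             0.8, 1.0, 1.2, 1.4, 1.6, 1.8,
             2.0, 3.0]


def time_to_end_bin(seconds_to_end: float) -> int:
    """Map the remaining time until the end event to a bin index.

    Bin 0 is the bin closest to the end; the highest index
    (len(BIN_EDGES)) means "3 seconds or longer away".
    """
    if seconds_to_end < 0:
        raise ValueError("seconds_to_end must not be negative")
    # bisect_right counts how many edges are <= seconds_to_end,
    # which is exactly the index of the bin the value falls into.
    return bisect_right(BIN_EDGES, seconds_to_end)
```

With these assumed edges there are 15 classes; the classifier would predict the bin index rather than the raw time.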
- use our infrastructure for this:
- player tool that reads in file for utterance and plays the
utterance to the other modules; these modules compute features
which are either recorded in a file (for training) or are sent
to classifier (for live mode / online evaluation).
- should be able to handle various amounts of gold info.
- e.g., minimal version of file:
"utterance ID, file, start.time, end.time"
- version w/ some gold info:
"utterance ID, file, WORD, POS, F0-summary, start.time, end.time"
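One possible way for the player to read both file variants, as a sketch only. It assumes one comma-separated record per line, with the first two fields (utterance ID, file) and the last two fields (start time, end time) fixed and any fields in between treated as gold info. The names Record and parse_records are hypothetical, not part of the existing infrastructure.

```python
import csv
from dataclasses import dataclass, field
from io import StringIO
from typing import List


@dataclass
class Record:
    utterance_id: str
    file: str
    start_time: float
    end_time: float
    # Optional gold fields (e.g. WORD, POS, F0-summary); empty in the minimal version.
    gold: List[str] = field(default_factory=list)


def parse_records(text: str) -> List[Record]:
    """Parse records with a varying amount of gold info between the fixed fields."""
    records = []
    for row in csv.reader(StringIO(text)):
        row = [cell.strip() for cell in row]
        if not row:
            continue  # skip blank lines
        if len(row) < 4:
            raise ValueError(f"need at least 4 fields, got {len(row)}: {row}")
        records.append(
            Record(
                utterance_id=row[0],
                file=row[1],
                start_time=float(row[-2]),
                end_time=float(row[-1]),
                gold=row[2:-2],
            )
        )
    return records
```

Indexing the times from the end of the row is what lets the same reader accept files with more or fewer gold columns.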
- evaluation:
- online: factors in time used up by classifier. While classifier
is computing its prediction, point that it predicts as end may
already have been reached! So the measure here is "when could
system have made a decision?"
- more complicated online version: assume certain time needed to
compute a response, and another (shorter) time to release
pre-planned utterance. Given these constraints, how good is
system at always having something ready when turn is yielded,
and how responsive is it?
- a good model of human turn-taking performance could be expected
to make the same systematic errors, e.g. overlap over optional
phrases (talking over tag-questions). Should be factored in /
considered when doing error analysis.
- offline: can be done on corpus in the same format as for
training & in the usual way of feeding a vector to classifier
and comparing its decision to the truth.
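The simple online measure could be sketched roughly as follows. This is an illustration under assumptions, not an agreed design: the classifier is assumed to take its input at input_time, need classifier_latency seconds, and output a predicted time-to-end in seconds. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OnlineResult:
    decision_time: float  # when the prediction actually becomes available
    predicted_end: float  # point in time that the classifier predicts as end
    usable: bool          # False if the predicted end was already reached
    lead_time: float      # actual_end - decision_time; negative = too late


def evaluate_online(input_time: float,
                    classifier_latency: float,
                    predicted_time_to_end: float,
                    actual_end: float) -> OnlineResult:
    """Answer "when could the system have made a decision?" for one prediction."""
    decision_time = input_time + classifier_latency
    predicted_end = input_time + predicted_time_to_end
    return OnlineResult(
        decision_time=decision_time,
        predicted_end=predicted_end,
        # The prediction only helps if it arrives before the end it predicts.
        usable=decision_time <= predicted_end,
        lead_time=actual_end - decision_time,
    )
```

For example, evaluate_online(1.0, 0.5, 0.25, 2.0) yields usable=False: the classifier finishes at 1.5 s, after the end it predicted at 1.25 s.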
- open question / parameter to play with: what are the increments at
which
- gold info is encoded
- features are created for training and testing
(doesn't have to be the same, larger increments in gold can be
hacked into smaller ones by player)
- each utterance generates (utt_length / increment_length) vectors
for learning & testing.
---> to do: build minimal version with only one feature: how long
has the utterance already gone on for?
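A sketch of what the minimal version might produce per utterance, assuming one vector per increment, the elapsed time as the single feature, and the remaining time to the utterance end as the target. The function name and the trailing_silence parameter are hypothetical; during trailing silence the remaining time is negative, meaning the end has already passed.

```python
from typing import List, Tuple


def elapsed_time_vectors(utt_length: float,
                         increment_length: float,
                         trailing_silence: float = 0.0) -> List[Tuple[float, float]]:
    """Return (elapsed, time_to_end) pairs, one per increment.

    Without trailing silence this gives utt_length / increment_length vectors.
    """
    if increment_length <= 0:
        raise ValueError("increment_length must be positive")
    total = utt_length + trailing_silence
    # Small epsilon guards against float division landing just below an integer.
    n_vectors = int(total / increment_length + 1e-9)
    vectors = []
    for i in range(1, n_vectors + 1):
        elapsed = i * increment_length  # the single feature
        vectors.append((elapsed, utt_length - elapsed))
    return vectors
```

A 2-second utterance with 0.5-second increments gives four vectors; adding one second of trailing silence gives six, the last two with a negative time-to-end.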
das, 09/10/07 03:57 (GMT)
Keywords: emmy, inpro, learning, meetings, minutes, predicting TRPs