das :: Activity :: Just Me
minutes080508 | das | page | Thu May 08
- present: M, T, D
- priority: close the processing chain. Finally get something from ASR to parser to DM to TTS, even if it is only a parrot system!
- Dialogue Manager:
  - can be something like Dipper, i.e. information-state update based
  - or an FSA (specified in SCXML or similar?)
  - but the rules can be simple anyway, simple FSA stuff:
    - identification -> confirmation (repeat on negative) -> orientation -> confirmation (repeat on negative) -> placement (repeat on negative)
    - S: "Welches Teil?" (Which piece?) U: "Das zweite von links" (The second from the left) S: "Das hier?" (This one?) U: "Nein, daneben" (No, the one next to it). [ --> need to be able to deal with context-dependent utterances ]
- do WOz pretty soon? The wizard hears the user utterances and can trigger simple prompts:
  - "Welches Teil?" (Which piece?) "Soll ich es drehen?" (Should I rotate it?) "Wohin?" (Where to?); "So?" (Like this?) "Hier?" (Here?)
  - to hide that the wizard is human, let the GUI do the mouse movements? I.e., the wizard selects the parameters of the action (selecting a piece, rotating it, dragging it), then selects a prompt ("So?"); this is then sent to the system, which executes the action (e.g., computes and executes the mouse path; plays a synchronised utterance).
  - This won't allow us to test reactions to smooth turn-taking (since it is non-incremental; the wizard has to fully specify the action), but it will allow us to test user reactions & learn about the complexity of their speech. Especially the reactions to CRs like "So?". E.g., "Nein, eins weiter hoch" (No, one further up).
THE FRIGGING WIKI IS BROKEN. You can find the complete minutes on my weblog.
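The simple FSA described in the minutes above could be sketched roughly as follows. This is a minimal illustration, not code from the project: the state names, the `next_state` and `run` helpers, and the assumption that placement also gets its own confirmation state are all hypothetical.

```python
# Hypothetical sketch of the simple FSA dialogue manager from the minutes.
# State names and helper functions are illustrative assumptions.

TASK_ORDER = ["identification", "orientation", "placement"]

# Prompts quoted in the minutes, attached to states by assumption.
PROMPTS = {
    "identification": "Welches Teil?",
    "confirm_identification": "Das hier?",
    "orientation": "Soll ich es drehen?",
    "confirm_orientation": "So?",
    "placement": "Wohin?",
    "confirm_placement": "Hier?",
}


def next_state(state, user_confirms=None):
    """Return the successor state; a negative confirmation repeats the task."""
    if state == "done":
        return "done"
    if state.startswith("confirm_"):
        task = state[len("confirm_"):]
        if not user_confirms:
            return task  # repeat on negative
        i = TASK_ORDER.index(task)
        return TASK_ORDER[i + 1] if i + 1 < len(TASK_ORDER) else "done"
    # Any user input in a task state leads to its confirmation state.
    return "confirm_" + state


def run(confirmations):
    """Walk the FSA, consuming one yes/no answer per confirmation state."""
    answers = iter(confirmations)
    state = TASK_ORDER[0]
    trace = [state]
    while state != "done":
        if state.startswith("confirm_"):
            state = next_state(state, next(answers))
        else:
            state = next_state(state)
        trace.append(state)
    return trace
```

A call such as `run([False, True, True, True])` repeats the identification step once before moving on. Note that a bare yes/no is all this sketch understands; a correction like "Nein, daneben" would need the context-dependent interpretation flagged in the minutes.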
Home Page | das | page | Thu May 08
Meeting minutes (newest first)
05/05/08 minutes080508
14/04/08 minutes140408
03/02/08 minutes030208b
04/12/07 minutes041207
26/11/07 @Timo
19/11/07 minutes191107
13/11/07 minutes131107
05/11/07 minutes051107
22/10/07 minutes221007
01/10/07 minutes2007_10_01
10/09/07 minutes100907
23/08/07 minutes230807
03/07/07 minutes030707
19/06/07 minutes190607
05/06/07 minutes050607_zeitwort2
21/05/07 minutes210507
Miscellaneous
minutes140408 | das | page | Mon Apr 14
- InPro, meeting, minutes, 14/04/08
- present: M, T, G, D
- Gabriel demo'ed the current state of Higgins. It displays the duration of vocal actions (both recognised and own) on a timeline and uses a simple, threshold-based boundary tone classification (up, down) to base its decisions on. (This is mostly a test of the architecture at the moment; the strategies are very simple.)
- dysfluencies: what to do with aborted words? Most likely, sphinx will recognise rubbish. Including aborted versions of all words would make recognition too unrestrictive; adding other methods (e.g., using prosodic info) would require too many changes at the low level of the ASR. (Hm. But at some point we'll have frame-level / syllable-level prosodic info anyway. It shouldn't be too hard to let a classifier judge whether a word was perhaps misrecognised because it was a different, aborted one.)
- Timo and Gabriel will work together on getting a better classifier for boundary tone detection to work. Does it need to do speaker adaptation?
- first step on the syntax side: a toy grammar for the Pento domain in the Higgins parser (``Nimm das {Kreuz | Teil | lange Ding} aus der Mitte links oben'', i.e. "Take the {cross | piece | long thing} from the middle, top left").
- using a grammar as the linguistic model in sphinx apparently doesn't work incrementally (it doesn't return results before the top category has been found), but using a statistical LM does work. (There are still technical problems, but it looks promising.)
- even if we can't use a grammar, we can still bootstrap an n-gram LM with utterances generated from a domain grammar.
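The last point, bootstrapping an n-gram LM from grammar-generated utterances, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the slot-list encoding of the toy grammar, the helper names, and the use of unsmoothed bigram estimates. It does not show how such a model would be handed to sphinx.

```python
from collections import Counter
from itertools import product

# Toy grammar as a sequence of slots with alternatives, loosely following
# the example sentence in the minutes (encoding is a hypothetical choice).
SLOTS = [
    ["nimm"],
    ["das"],
    ["kreuz", "teil", "lange ding"],
    ["aus der mitte links oben"],
]


def generate(slots):
    """Enumerate every sentence the slot grammar licenses, as word lists."""
    for choice in product(*slots):
        yield " ".join(choice).split()


def bigram_counts(sentences):
    """Count bigrams, padding each sentence with start and end markers."""
    counts = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def bigram_prob(counts, prev, word):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, word)] / total if total else 0.0


sentences = list(generate(SLOTS))
counts = bigram_counts(sentences)
```

With this grammar, `bigram_prob(counts, "das", "kreuz")` is one third, since the three alternatives are generated equally often. A usable LM would also need smoothing, because any word sequence outside the grammar gets probability zero here.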
030208cont | das | page | Mon Mar 03
- short-term projects:
  - bababa2, SIGdial poster
  - to-dos, unprioritised:
    a) syllable boundaries, taken from the pronunciation dictionary;
    b) use real audio, Kiel corpus;
    c) use ASR, words, n-grams;
    d) better speech states, phrase boundaries (for BCs);
    e) better TT strategies;
    f) simulation, constant time < (or >) real-time;
    g) better evaluation;
    h) interruption management;
    i) BC management;
    j) parametrisation (chattiness, interruption probability, etc.);
    k) adaptivity
- possible approaches for a paper:
  - in the direction of David T., `believable, non-scripted content-free background chatter'. Not very convincing; for something that has to be generated online, it is rather resource-hungry. Nobody would seriously deploy this just for background chatter.
  - `simple rules create realistic turn-taking patterns'. SSJ rules as *generative* rules, not just descriptive. Shows that such a set of rules, together w/ some audio magic, is enough to produce patterns that are `natural' (in a way that needs to be defined properly). Again sort of an upper bound; to get something like this working properly within a real system, here's what we would need in terms of components.
    - to do first: b), d), e), g).
    - needed: a more principled metric for the `naturalness' of the resulting corpus. Multi-dimensional: distribution of gaps & overlaps, balance btw speakers, turn length (in time, but also # of utterances).
  - `syntactic and prosodic language modelling for incremental utterance segmentation', for Coling. Utterance end pointing, but in an incremental setup. Needed to know where to clear the chart of the parser. Connected to a well-researched task (i.e., easy to motivate & compare), but different in that we don't allow (as much?) right context.
    - method:
      - select only multi-utterance turns; the EOUs to find are the turn-internal ones.
      - use the original data & variants w/ various WER. Those need plausible time information. How much does this degrade performance?
    - what's a good way to evaluate this? Follow-on effects of wrong decisions: an insert, for example, makes us restart the parser, and hence get other things wrong?
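As a starting point for the multi-dimensional `naturalness' metric mentioned above, the dimensions named in the notes (gaps & overlaps, balance between speakers, turn length in time) could be computed along these lines. This is only a sketch under assumptions: the `(speaker, start, end)` turn format, the function names, and the example data are invented for illustration, and turn length in number of utterances is left out because it would need utterance-level segmentation.

```python
from statistics import mean

# Invented example data: (speaker, start, end) in seconds, sorted by start.
example_turns = [
    ("A", 0.0, 2.0),
    ("B", 2.5, 4.0),
    ("A", 3.5, 5.5),
]


def floor_transitions(turns):
    """Offsets at speaker changes: positive = gap, negative = overlap."""
    offsets = []
    for (spk_a, _, end_a), (spk_b, start_b, _) in zip(turns, turns[1:]):
        if spk_a != spk_b:
            offsets.append(start_b - end_a)
    return offsets


def summarise(turns):
    """Collect gap/overlap durations, speaking-time shares and turn length."""
    offsets = floor_transitions(turns)
    talk = {}
    for spk, start, end in turns:
        talk[spk] = talk.get(spk, 0.0) + (end - start)
    total = sum(talk.values())
    return {
        "gaps": [o for o in offsets if o > 0],
        "overlaps": [-o for o in offsets if o < 0],
        "share": {spk: t / total for spk, t in talk.items()},
        "mean_turn_length": mean(end - start for _, start, end in turns),
    }


summary = summarise(example_turns)
```

Comparing such summaries between a generated corpus and a real one is one conceivable way to make `natural' measurable; how to weight the dimensions against each other remains open.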