
Timo Baumann :: Friends blog

November 04, 2010

How about the following: you feed the first verse of your favourite poem or song lyrics into an ASR and have a TTS read back what the ASR understood. It is very likely to rhyme! If you also read back a nice chorus, people will love it. Now you just need your TTS to do decent singing and you're set:

U: "Ick heff mol in Hamburg een Veermaster sehn"
S: "gegen $l aber hin <sil> der hach<sil> also eben"
S (canned): "to my hooday, to my hooday"
U: "De masten so scheep as den Schipper sien Been"
S: "hm mach <sil> nach $t es in der <sil> warm $w"
S (canned): "to my hooday, hooday ho."

By the way, this is how singing works in a soccer stadium and the principle behind Chinese whispers.

Keywords: ASR, Echo DM, Gedichtgenerator, poetry generation, silly idea, singing, TTS

Posted by Timo Baumann | 0 comment(s)

April 17, 2010

I've finally written the one script that was missing from the interwebs and that I have longed to have for so long:

#!/usr/bin/perl
# Copyright (C) 2010 Timo Baumann
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License,
# or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See the GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

use strict;
use warnings;
use Audio::Wav;
use Audio::Wav::Read;
use File::Find;

# usage: audio-duration.pl [path-or-file1 path-or-file2 ...]
# without arguments, the current directory is searched

my @paths = @ARGV ? @ARGV : (".");
my @files;
# collect all .wav files below the given paths (file names may contain spaces)
find({ wanted => sub { push @files, $_ if -f $_ && /\.wav$/ }, no_chdir => 1 }, @paths);
#print join " ", @files;
my $duration = 0.0;
my $wav = Audio::Wav->new;
for my $file (@files) {
    my $read = $wav->read($file);
    $duration += $read->length_seconds();
}
# convert to something readable
my $readableDuration = "";
if ($duration > 600) {
    my $seconds = int($duration + .5);
    # derive the minutes from the rounded seconds, so that we never print 60 seconds
    my $minutes = int($seconds / 60);
    $seconds -= $minutes * 60;
    my $hours = int($minutes / 60);
    $minutes -= $hours * 60;
    $readableDuration = "(" . ($hours > 0 ? "$hours:" : "") . "$minutes'$seconds\") ";
}
print "$duration seconds ", $readableDuration, "in ", ($#files + 1), " wave files.\n";

Running this in any directory will yield the total duration of all audio files (only .wav files) in this directory. If you supply arguments, it will look into the given directories (or files) and tell you the summed duration.

A must-have for any corpus-linguist dealing with loads of audio files!

Keywords: audio, perl

Posted by Timo Baumann | 1 comment(s)

October 11, 2009

I've been back from Stockholm for a while now, have gone to Interspeech, SIGdial and YRRSDS, and am now back at work in Golm. I'm now again (professionally) centered around two things: getting our next prototype of an incremental SDS up and running in the next few months, and continuing to work on my PhD thesis, which I hope to finish some time next year.

Keywords: update

Posted by Timo Baumann | 0 comment(s)

July 01, 2009

I finally got around to packaging the pitch tracker and some of our incremental result filtering (which was the reason for me travelling to Boulder, USA) as add-ons for Sphinx. Find them on my website.

Keywords: ASR, incremental, pitch, Sphinx

Posted by Timo Baumann | 0 comment(s)

May 20, 2009

I have safely arrived in Stockholm, where I will be a visiting researcher at KTH's speech lab for the summer. I will also be travelling to NAACL-HLT 2009 in Boulder to present our paper, and to a smaller workshop in Bielefeld. In addition, I will participate in Dialholmia as a student volunteer. Many chances to meet and greet!

Keywords: travel

Posted by Timo Baumann | 0 comment(s)

January 30, 2009

I still need Windows for one piece of software that I use occasionally. So, since I've moved to Ubuntu, I've been using VMware for this (as it was the only solution at that time).

My VMware stopped working under Hardy. Luckily, I did not need my Windows app for half a year. In Intrepid, I was able to just install VMware from their webpage and it restored my Windows session from a year ago. Probably a record-breaking uptime for Windows...

So, yesterday I played around with bootchart and found out that the VMware services took 4 precious seconds of my (and my battery's) lifetime on every boot. Not really worth it, as I'm unlikely to use my Windows app anytime soon. So, here's what I did:

remove the links in /etc/rc*.d/*vmware

as the first command in /usr/bin/vmplayer add:

gksu -D "Need root privileges to start vmware services." /etc/init.d/vmware restart

Works like a charm.

Keywords: bootchart, howto, ubuntu, vmware

Posted by Timo Baumann | 0 comment(s)

January 20, 2009

The title says it all: I am looking for a generic implementation that tells me the edit distance of two lists. The implementations on CPAN all seem to work on string-data. Which is OK for finding typos but makes WER calculation tedious.

So, I want a generic implementation that takes a comparator-function (as in sort {$a <=> $b} @list) and two lists and outputs the edit distance. Nice to have would be distance-weights and really nifty if the value of the comparator function (not only !=0 but how much lower or higher) was taken into account.

Luckily I don't need it now, so I don't have to write it. But it would be a great finger exercise for a Perl-in-NLP class.

EDIT: The obvious module Text::Levenshtein on CPAN actually *miscalculates* Levenshtein-distance for some input. Luckily I wondered what the 3 bugs in the module were about before I just happily used that code... So I ended up slightly modifying an implementation by Eli Bendersky, which already uses lists internally. So I left out the part about the comparator interface for now and just calculate standard WER, which is all I need right now.
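
For the record, here is roughly what I had in mind, as a minimal sketch in Java rather than Perl. It is not the modified Bendersky implementation mentioned above; the class and method names (ListEditDistance, distance, wer) are made up for this illustration. It takes two lists and a comparator, treats two elements as equal when the comparator returns 0, and uses unit costs for insertion, deletion and substitution. Distance weights are left out.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ListEditDistance {

    /** Levenshtein distance between two lists; elements match when cmp returns 0. */
    public static <T> int distance(List<T> ref, List<T> hyp, Comparator<? super T> cmp) {
        int[] prev = new int[hyp.size() + 1];
        int[] curr = new int[hyp.size() + 1];
        for (int j = 0; j <= hyp.size(); j++) {
            prev[j] = j; // empty reference prefix: j insertions
        }
        for (int i = 1; i <= ref.size(); i++) {
            curr[0] = i; // empty hypothesis prefix: i deletions
            for (int j = 1; j <= hyp.size(); j++) {
                int subst = cmp.compare(ref.get(i - 1), hyp.get(j - 1)) == 0 ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + subst,          // match or substitution
                          Math.min(prev[j] + 1, curr[j - 1] + 1)); // deletion, insertion
            }
            int[] tmp = prev; // swap the two rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[hyp.size()];
    }

    /** Word error rate: edit distance divided by the number of reference words. */
    public static double wer(String reference, String hypothesis) {
        List<String> ref = Arrays.asList(reference.trim().split("\\s+"));
        List<String> hyp = Arrays.asList(hypothesis.trim().split("\\s+"));
        return (double) distance(ref, hyp, Comparator.<String>naturalOrder()) / ref.size();
    }

    public static void main(String[] args) {
        // one deletion plus one substitution against four reference words
        System.out.println(wer("die Anwendung wird entwickelt", "phantasie wird entwickelt"));
    }
}
```

Because the comparator is a parameter, the same routine works for words, phones or any other token type. Taking the magnitude of the comparator value into account would mean replacing the 0/1 substitution cost with a function of that value.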

Keywords: fixme, helpme, perl

Posted by Timo Baumann | 0 comment(s)

December 18, 2008

If praat (on ubuntu) doesn't want to play any audio, it tells you to consult some Sound-HOWTO (which at least doesn't exist on ubuntu). Unfortunately, http://ubuntuforums.org/archive/index.php/t-64383.html is of little help (and doesn't allow posting anymore as its active phase has expired).

The solution for us was easy: We use an external USB sound card and have deactivated the mainboard sound. For some reason, the sound device is now called "/dev/dsp1" (and "/dev/audio1") and there is no "/dev/dsp" (nor "/dev/audio"). Adding symlinks from "/dev/dsp" to "/dev/dsp1" (and from "/dev/audio" to "/dev/audio1") fixed the problem. Hope this helps.

Keywords: howto, praat, sound, ubuntu

Posted by Timo Baumann | 0 comment(s)

May 15, 2008

Klotz hin Gnubbel

This is what I get with the current acoustic model and an LM whose training data even included the correct sentence (und füge es ein in den Bauch des Elephanten).

Even using just the correct sentence as a grammar returns und füge es instead of the complete sentence. The alignment shows that es supposedly spans the complete ein in den Bauch des.

I read that the current models are severely overtrained on one speaker, so I tried one of his utterances (de43-01, die Anwendung wird entwickelt) which is correctly understood if I use it as a grammar (effectively resulting in forced alignment) and which results in the beautiful phantasie wird entwickelt if I include this one sentence in the statistical LM as above. 

Thus, the bad results are probably due to the bad acoustic model. I've already uploaded the PentoNamingCorpus to Voxforge, so hopefully the acoustic models will improve eventually. But if worse comes to worst, we'll have to train on KCoRS and Verbmobil...

Keywords: ASR, Sphinx

Posted by Timo Baumann | 0 comment(s)

April 25, 2008

If you just upgraded your Ubuntu to 8.04 and use sox, then you may get the error "sox soxio: Failed reading `some.file': unknown file type `auto'"

In Hardy, all audio formats for sox have been refactored into separate packages libsox-fmt-XYZ. So, either install just the base formats from libsox-fmt-base or get all possible formats with libsox-fmt-all.

Keywords: howto, sox, ubuntu

Posted by Timo Baumann | 0 comment(s)

April 16, 2008

I am currently investigating the ton of classes that implement the Sphinx interface "SearchSpace", or one of the three sub-interfaces. There are 19 in total and I am likely to have to add another one for the feature that I have in mind.

Anyway, I decided that I need something, preferably an Eclipse-plugin to visualize class dependencies, and there are actually a few options:

  • X-Ray would probably do the job, but it doesn't work. Maybe I just don't know how to install it correctly.
  • Byecycle has a great screencast on its page and friendlier installation instructions. It shows a dependency graph between classes and automatically and incrementally optimizes the graph layout. Infinitely. Using 20% of your processor(s). It's quite slow and it seems to be limited to showing dependencies within a package, while the dependencies I'm interested in often cross package boundaries (classes from different packages implementing an interface).

Also, I found Fat Jar, an Eclipse plugin that turns your whole project into a single jar. That's something my colleague asked me about the other day.

Keywords: eclipse, java, sphinx

Posted by Timo Baumann | 0 comment(s)

April 12, 2008

This has been programmed before, but here it is for you to see (and use):

IPATextField, a simple descendant of JTextField that will only accept phonetic input (either SAMPA, or IPA if you know your Unicode code points by heart) and show IPA symbols.

You can try it out directly, as a main routine is included. It's even useful as your tiny copy-and-paste-IPA-editor.

/** Copyright (C) 2008 Timo Baumann
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the
 * Free Software Foundation; either version 2 of the License,
 * or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 * See the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, see <http://www.gnu.org/licenses/>.
 **/

import java.awt.event.ActionEvent;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

import javax.swing.AbstractAction;
import javax.swing.JFrame;
import javax.swing.JTextField;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

public class IPATextField extends JTextField {

    public IPATextField(int cols) {
        super(cols);
    }

    protected Document createDefaultModel() {
        return new IPADocument();
    }

    static class IPADocument extends PlainDocument {

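        String search  = "Q?ɡTDSZCχʁNR=" + "IEA{OUY29@6:_";
        char[] replace = "ʔʔgθðʃʒçxrŋʀ\u0329ɪɛɑæɔʊʏøœəɐː\u032F".toCharArray();
        // new String(replace): concatenating the char[] directly would append its object reference, not its contents
        String pass = search + new String(replace) + "pbtdkgfvszwjxhmnlrieaouy .";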
       
        public void insertString(int offs, String str, AttributeSet a)
           throws BadLocationException
        {
           if (str == null) {
                return;
           }
           StringBuffer sb = new StringBuffer();
           CharacterIterator ci = new StringCharacterIterator(str);
           while (ci.getIndex() < ci.getEndIndex()) {
               char[] c = new char[1];
               c[0] = ci.current();
               String s = new String(c);
               if (pass.contains(s)) {
                   int replaceIndex = search.indexOf(s);
                   if (replaceIndex == -1) {
                       sb.append(s);
                   } else {
                       sb.append(replace[replaceIndex]);
                   }
               }
               ci.next();
           }
           super.insertString(offs, sb.toString(), a);
        }
    }
   
    /**
     * pretty much stolen from the Swing-Tutorial...
     * @param args does not take any arguments
     */
    public static void main(String[] args) {

        //Create the top-level container and add contents to it.
        JFrame frame = new JFrame("VoxforgeDE Lexicon Tool");
        final IPATextField tf = new IPATextField(10);
        tf.addActionListener(new AbstractAction() {
            public void actionPerformed(ActionEvent e) {
                System.out.println(tf.getText());
            }
        });
        frame.add(tf);

        //Finish setting up the frame, and show it.
        frame.addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                System.exit(0);
            }
        });
        frame.pack();
        frame.setVisible(true);
    }
}
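
If you just want to play with the substitution without opening a window, the core of IPADocument can be pulled out into a plain function. The following is a minimal sketch and not part of the class above: the name SampaToIpa and its convert method are made up for this illustration, it only covers the ASCII SAMPA keys (the three non-ASCII keys of the original table are left out), and all IPA symbols are written as Unicode escapes so that the source file stays pure ASCII.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SampaToIpa {

    // SAMPA key -> IPA symbol (ASCII keys only)
    private static final Map<Character, Character> TABLE = new LinkedHashMap<>();
    static {
        TABLE.put('Q', '\u0294'); // glottal stop
        TABLE.put('?', '\u0294'); // glottal stop (alternative key)
        TABLE.put('T', '\u03B8'); // theta
        TABLE.put('D', '\u00F0'); // eth
        TABLE.put('S', '\u0283'); // esh
        TABLE.put('Z', '\u0292'); // ezh
        TABLE.put('C', '\u00E7'); // c with cedilla
        TABLE.put('N', '\u014B'); // eng
        TABLE.put('R', '\u0280'); // small capital R
        TABLE.put('=', '\u0329'); // combining vertical line below (syllabic)
        TABLE.put('I', '\u026A'); // small capital I
        TABLE.put('E', '\u025B'); // open e
        TABLE.put('A', '\u0251'); // script a
        TABLE.put('{', '\u00E6'); // ash
        TABLE.put('O', '\u0254'); // open o
        TABLE.put('U', '\u028A'); // upsilon
        TABLE.put('Y', '\u028F'); // small capital Y
        TABLE.put('2', '\u00F8'); // o with stroke
        TABLE.put('9', '\u0153'); // oe ligature
        TABLE.put('@', '\u0259'); // schwa
        TABLE.put('6', '\u0250'); // turned a
        TABLE.put(':', '\u02D0'); // length mark
        TABLE.put('_', '\u032F'); // combining inverted breve below (non-syllabic)
    }

    // characters that are accepted unchanged
    private static final String PLAIN = "pbtdkgfvszwjxhmnlrieaouy .";

    /** Replaces SAMPA keys by IPA symbols, keeps plain letters and IPA, drops the rest. */
    public static String convert(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            Character ipa = TABLE.get(c);
            if (ipa != null) {
                sb.append(ipa); // SAMPA key: substitute
            } else if (PLAIN.indexOf(c) >= 0 || TABLE.containsValue(c)) {
                sb.append(c);   // plain letter or pasted IPA symbol: keep as is
            }
            // any other character is silently dropped, as in the text field
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("Sa:l")); // SAMPA in, IPA out
    }
}
```

A map with one entry per line is longer than two parallel strings, but it makes it much harder for the two tables to drift out of step.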
 

Keywords: howto, java

Posted by Timo Baumann | 0 comment(s)

February 11, 2008

Continuing from the last post, assume you want your OAA agent to react to certain data changes. You set up a trigger with something like this:

oaaAddTrigger(data, otherSpeechEnd(_), oaaSolve(startTalking(), [reply(none)]), [on(add), recurrence(whenever)])

Right? No! Well, yes, but that's not enough. You have to make sure that the data (otherSpeechEnd(X)) is already known to the facilitator.

So, in order for the trigger to work, you need two lines:

oaaAddData(otherSpeechEnd(_), [])
oaaAddTrigger(data, otherSpeechEnd(_), oaaSolve(startTalking(), [reply(none)]), [on(add), recurrence(whenever)])

Very nasty behaviour, because the bug only occurs when you've restarted the facilitator and the data type is still unknown.

Keywords: howto, oaa

Posted by Timo Baumann | 0 comment(s)

January 22, 2008

Now, assume you have programmed your great custom RTP payload codec (for whatever reason) following this example (http://java.sun.com/products/java-media/jmf/2.1.1/solutions) in the tutorial.

And it doesn't work. How would you fix it? Here's the answer:

rm ~/.jmf-resource

Everything should be back to normal now. 

Keywords: howto, jmf

Posted by Timo Baumann | 0 comment(s)

November 13, 2007

There is just one problem with the SRI language modeling toolkit: it doesn't come with a configure script and the makefiles don't work out of the box. After an hour of searching through make output, we found out that Ubuntu does not use gawk but mawk as its standard awk implementation. Later on in the build process, this leads to weird errors.

I'll attach the changed common/Makefile.machine.i686 , so you (and I) don't have to redo the work later.

SRILM-Makefile for Ubuntu [document/unknown]

Keywords: howto, srilm, ubuntu

Posted by Timo Baumann | 0 comment(s)

September 11, 2007

Voilà, a tiny example of how to use the Weka API: WekaTest [text/plain]

Keywords: example, weka

Posted by Timo Baumann | 0 comment(s)

August 23, 2007

...

Keywords: general, trivial

Posted by David Schlangen | 0 comment(s)

July 09, 2007

10000 messages à 1 IclINT: 217616 ms
10000 messages à 1 IclList with 320 IclINTs: 291248 ms
10000 messages à 320 bytes as IclDataQ: 781685 ms

Either I have to fix IclDataQ (unpacking the DataQ seems to be horribly slow) or we could omit using OAA for audio transmission and use a TCP stream directly.
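
To put the totals into perspective, here is the same data as cost per message, as a small Java sketch. The class name OaaTiming and the method msPerMessage are made up for this illustration; the numbers are the three measurements above.

```java
import java.util.Locale;

public class OaaTiming {

    /** Average cost of a single message in milliseconds. */
    public static double msPerMessage(long totalMs, int messages) {
        return (double) totalMs / messages;
    }

    public static void main(String[] args) {
        int n = 10000;            // messages per run
        long iclInt = 217616;     // total ms, 1 IclINT per message
        long iclList = 291248;    // total ms, 1 IclList with 320 IclINTs per message
        long iclDataQ = 781685;   // total ms, 320 bytes as IclDataQ per message

        System.out.println(String.format(Locale.ROOT, "IclINT:   %.2f ms/message", msPerMessage(iclInt, n)));
        System.out.println(String.format(Locale.ROOT, "IclList:  %.2f ms/message", msPerMessage(iclList, n)));
        System.out.println(String.format(Locale.ROOT, "IclDataQ: %.2f ms/message", msPerMessage(iclDataQ, n)));
        // how much slower the IclDataQ run is compared to the IclList run
        System.out.println(String.format(Locale.ROOT, "IclDataQ / IclList: %.2f", (double) iclDataQ / iclList));
    }
}
```

That is roughly 22, 29 and 78 ms per message, so the IclDataQ variant takes about 2.7 times as long as the IclList variant for the same payload.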

Keywords: OAA, performance

Posted by Timo Baumann | 0 comment(s)

Ever wondered why execution takes so long, and what exactly is taking the time?

Sure, for my own program I ought to know, but if I use toolkit XYZ, which of the operations I call is particularly expensive?

A profiler answers this by analyzing the runtime behaviour of the program. For Java, JRat does the job ( http://jrat.sourceforge.net/quickstart.html#9.%20Examine%20the%20JR ).

Very interesting, but it did not answer my question of why IclDataQ packets are three times as slow to unpack as IclList packets containing the corresponding bytes either. Stupid OAA...

Keywords: Java, JRat, OAA, performance

Posted by Timo Baumann | 0 comment(s)

June 17, 2007

test....

Posted by David Schlangen | 0 comment(s)

more tests...

Posted by David Schlangen | 0 comment(s)

June 12, 2007

After some struggling with CLASSPATHs and awkward type casting exceptions[1], I have successfully built my first OAA-agent! 

It's a data processor that lives in the sphinx-frontend and hands out samples (or other data, depending on where it is positioned in the frontend) to any other agent that asks[2].

Next step is to split the frontend-pipeline in two parts and connect them via OAA. On we go!

 

[1]: The OAA documentation does not say that IclList(java.util.List args) only likes lists that contain IclTerms...

[2]: Currently this means using oaa_shell to ask for a solution of the goal "audioData(X)" 

Posted by Timo Baumann | 0 comment(s)

June 07, 2007

Spent a couple of days at DECALOG, this year's SemDial (International Workshop on the Semantics and Pragmatics of Dialogue). Raquel presented some of the results from our DEAWU project. All in all, an enjoyable conference, as in the previous years. Only the weather wasn't quite as expected, given that it was Italy...

Keywords: conferences, public

Posted by David Schlangen | 1 comment(s)

May 09, 2007

Yesterday I was still completely stumped by the mountain of code: I could add a ResultListener to the Sphinx demo WavFile without any problems, but it simply did not produce any intermediate results.

Today, the very first success: the demo's configuration makes the decoder consume the entire audio file at once, so it produces no intermediate results. The solution: in config.xml, set the variable featureBlockSize to the desired number of frames to be decoded at a time:

    <component name="digitsDecoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
        <property name="featureBlockSize" value="1"/>
    </component>
 

And now there are as many intermediate results as you like.

The first thing I want to investigate is at what latency the intermediate recognition results become usable. My current plan for this is in the wiki at LatenzCheck.

By the way, a good overview of Sphinx is given in: http://research.sun.com/techrep/2004/smli_tr-2004-139.pdf

Keywords: literature, Sphinx

Posted by Timo Baumann | 3 comment(s)

May 07, 2007

The project has now indeed officially started. Timo Baumann has joined the team as Research Assistant, PhD student, and resident prosody processing Meister.

Keywords: frontp, general, inpro, public

Posted by David Schlangen | 0 comment(s)
