Wednesday, June 30, 2010

Digital Signal Processor and Text-to-Speech

This is the second post in a series on Text-to-Speech for eLearning written by Dr. Joel Harband and edited by me (which turns out to be a great way to learn).  The first post, Text-to-Speech Overview and NLP Quality, introduced the text to speech voice and discussed issues of quality related to its first component – the natural language processor (NLP). In this post we’ll look at the second component of a text to speech voice: the digital signal processor (DSP) and its measures of quality.

Digital Signal Processor (DSP)

The digital signal processor translates the phonetic language specification of the text produced by the NLP into spoken speech. The main challenge of the DSP is to produce a voice that is both intelligible and natural.  Two methods are used:

  • Formant Synthesis.  Formant synthesis seeks to model the human voice with computer-generated sounds, using an acoustic model. Typically, this method produces intelligible, but not very natural, speech. These are the robotic voices, like MS Mike, that people often associate with text to speech. Although not acceptable for eLearning, these voices are small, fast programs, so they find application in embedded systems and wherever naturalness is not required, as in toys and assistive technology.
  • Concatenative Synthesis. To achieve the remarkable naturalness of Paul and Heather, concatenative synthesis is used. A recording of a real human voice is broken down into acoustic units: phonemes, syllables, words, phrases and sentences and stored in a database. The processor retrieves acoustic units from the database in real time and connects (concatenates) them together to best match the input text.

Concatenative Synthesis and Quality

Think about how concatenative synthesis works, joining together many smaller sounds to form the voice, and it becomes clear where glitches can arise. They occur either because the database has no recording of exactly the sound that is needed, or because two segments don't come together quite right where they are joined. The main strategy is to choose database segments that are as long as possible – phrases and even sentences – to minimize the number of joins and therefore the number of connection glitches.
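To make the "longest segment" strategy concrete, here is a minimal Python sketch. It is a toy, not how any vendor's engine works: the unit database, its contents, and the function names are all made up for illustration, and units are plain text strings rather than audio. It uses a simple greedy longest-match, whereas real engines weigh many candidate units against each other.

```python
# Toy unit database: each entry stands for a recorded "acoustic unit".
# The contents are hypothetical and exist only for this illustration.
UNIT_DB = {
    "bright eyes",      # a whole phrase recorded as one unit
    "bright", "eyes",   # the same words recorded separately
    "your", "own",
}

def select_units(words, db):
    """Greedily pick the longest recorded unit available at each position."""
    units, i = [], 0
    while i < len(words):
        # Try the longest candidate span first, then shrink it.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in db:
                units.append(candidate)
                i = j
                break
        else:
            raise KeyError(f"no recorded unit for {words[i]!r}")
    return units

def count_joins(units):
    """Each boundary between adjacent units is a place a glitch could occur."""
    return max(len(units) - 1, 0)

text = "your own bright eyes".split()
with_phrase = select_units(text, UNIT_DB)
without_phrase = select_units(text, UNIT_DB - {"bright eyes"})
print(with_phrase, count_joins(with_phrase))
print(without_phrase, count_joins(without_phrase))
```

With the phrase "bright eyes" in the database, the sentence needs one join fewer than when only the single words are available, which is the whole point of storing longer units.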

Here is an example of a glitch in Paul when joining the two words “bright” and “eyes”. (It wasn’t easy to find a glitch in Paul – finally found one in a Shakespeare sonnet!)

  • Mike - bright eyes
  • Heather - bright eyes
  • Paul - bright eyes

The output from the best concatenative systems is often indistinguishable from real human voices. Maximum naturalness typically requires a very large speech database: the larger the database, the higher the quality. Typical TTS voice databases that are acceptable for eLearning are on the order of 100-200 MB. For lower-fidelity applications like telephony, the acoustic unit files can be made smaller by using a lower sampling rate without sacrificing intelligibility and naturalness, which yields a smaller database (smaller footprint).

By the way, the database is only used to generate the sounds which are then stored as .wav, .mp3, etc.  It is not brought along with the eLearning piece itself.  So a large database is generally a good thing.

Here is a list of the TTS voices offered by NeoSpeech, Acapela and Nuance with their file sizes and sampling rates.

Voice      Vendor      Sampling rate (kHz)   File size (MB)   Applications
Paul       NeoSpeech   8                     270 (Max DB)     Telephone
Paul       NeoSpeech   16                    64               Multi-media
Paul       NeoSpeech   16                    490 (Max DB)     Multi-media
Kate       NeoSpeech   8                     340 (Max DB)     Telephone
Kate       NeoSpeech   16                    64               Multi-media
Kate       NeoSpeech   16                    610 (Max DB)     Multi-media
Heather    Acapela     22                    110              Multi-media
Ryan       Acapela     22                    132              Multi-media
Samantha   Nuance      22                    48               Multi-media
Jill       Nuance      22                    39               Multi-media

The file size depends on both the sampling rate and the database size, where the database size is related to the number of acoustic units stored. For example, the second and third voices in the list (both Paul at 16 kHz) have the same sampling rate, but the third has a much bigger file size because of its larger database. In general, the higher sampling rates are used for multimedia applications and the lower sampling rates for telecommunications.  Often larger sizes also indicate a higher price point.
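A quick back-of-the-envelope calculation shows how the two factors interact. The Python sketch below assumes uncompressed 16-bit mono audio, which is purely an assumption for illustration: the vendors do not say how their databases are encoded, and real voice files also hold index data. Treat the results as a feel for proportions, not as actual recording lengths.

```python
def pcm_seconds(size_mb, sample_rate_khz, bytes_per_sample=2, channels=1):
    """Seconds of audio that fit in size_mb megabytes of raw PCM.

    Assumes uncompressed 16-bit mono by default (an illustrative assumption,
    not a documented property of any vendor's voice files).
    """
    bytes_per_second = sample_rate_khz * 1000 * bytes_per_sample * channels
    return size_mb * 1_000_000 / bytes_per_second

# Sizes and sampling rates taken from the list of voices above.
for name, rate_khz, size_mb in [("Paul", 16, 64), ("Paul (Max DB)", 16, 490)]:
    minutes = pcm_seconds(size_mb, rate_khz) / 60
    print(f"{name}: roughly {minutes:.0f} minutes of raw audio")
```

At the same sampling rate, file size scales directly with the amount of recorded material, so the larger Paul database would hold several times as many acoustic units to choose from. Halving the sampling rate halves the bytes needed per second, which is why telephone voices can carry a large unit inventory in a smaller file.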

The DSP voice quality is then a combination of the two factors: the sampling rate, which determines the voice fidelity and the database size which determines the quality of concatenation and frequency of glitches – the more acoustic units stored in the database, the better the chances of achieving a perfect concatenation without glitches.

And don’t forget to factor in Text-to-Speech NLP Quality.  Together with DSP quality you get the overall quality of different Text-to-Speech solutions.

Saturday, June 26, 2010

MindGenius Version 3.6 now with File convertor and Gantt improvements

FYI

On Monday, June 28th, MindGenius will release Version 3.6, which will add some new features to the Gantt View as well as the ability to open MindManager (Version 7 and above) files and OPML formatted files in MindGenius. So if you are a MindGenius 3 user, I would suggest that you go over to the MindGenius website and download the free upgrade.

Version 3.6 now with file convertor and Gantt improvements


We are pleased to announce we are continuing the major development program we started with the release of Version 3 last year. Version 3.6, one of our regular maintenance releases, will be available from Monday 28th June and is free to all MindGenius Version 3 customers.

As well as routine maintenance work, MindGenius Version 3.6 includes functional updates to the Gantt View and the addition of a file convertor which allows users to import data from MindManager and OPML formats.

The Gantt view has been extremely well received since its release in March. User feedback on the Gantt View regarding the timeline being fixed units has been addressed with the addition of the ability to set the timeline scale to an appropriate unit of time such as Quarters, Months, Weeks, Days and Hours, as well as a zoom facility to scale the Gantt view to match your overall project duration.

In response to the number of organisations participating in a wholesale deployment of MindGenius to all desktops, we have developed the ability to be able to import maps from Mindjet’s MindManager. Any legacy maps held in the MindManager format can be seamlessly brought across into MindGenius and enable the entire organisation to move forward with MindGenius as the single map format, thus making the sharing of maps considerably easier.


OPML (Outline Processor Markup Language) is an XML format for defining hierarchical data and is suitable for many different types of data lists and is used mainly to exchange data lists on the web. It is also used as a file format for certain iPhone/iPad applications such as MindNode & iThoughts.
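For readers who have not seen OPML, here is a small Python sketch of what the format looks like and how its hierarchy can be read with the standard library. The sample outline is invented for illustration, and this is not a description of how MindGenius implements its import; it only shows that a nested `outline` structure maps naturally onto the branches of a mind map.

```python
import xml.etree.ElementTree as ET

# A made-up OPML document: nested <outline> elements form the hierarchy.
OPML_SAMPLE = """<opml version="2.0">
  <head><title>Project plan</title></head>
  <body>
    <outline text="Project plan">
      <outline text="Research">
        <outline text="Interview users"/>
      </outline>
      <outline text="Build"/>
    </outline>
  </body>
</opml>"""

def walk(outline, depth=0):
    """Yield (depth, text) for an outline element and all of its descendants."""
    yield depth, outline.get("text", "")
    for child in outline.findall("outline"):
        yield from walk(child, depth + 1)

def read_opml(xml_text):
    """Return the outline hierarchy as a flat list of (depth, text) pairs."""
    root = ET.fromstring(xml_text)
    body = root.find("body")
    nodes = []
    for top in body.findall("outline"):
        nodes.extend(walk(top))
    return nodes

# Print the outline as an indented list.
for depth, text in read_opml(OPML_SAMPLE):
    print("  " * depth + text)
```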

Simply select File \ Open to import any of these file types into MindGenius.

This Import functionality has the ability to be extended to import from other sources and suggestions from our customers are very welcome.

Derek Jack, Company Director for MindGenius said:

“Fundamentally, MindGenius is a client driven business. Each release reflects the priorities placed upon us by our client feedback. We appear to have struck a chord with our recent addition of the GANTT functionality within the map. Many customers are enjoying the seamless integration between unstructured data gathering and planning, and then distilling this data into a fully functioning GANTT view. This latest 3.6 release is a reflection of the scale of adoption we have witnessed and the consistent feedback around key improvements we can make to this specific feature. We trust you will enjoy the new capabilities and openly encourage your continued feedback.”

Wednesday, June 23, 2010

Learning Flash

My posts around the Beginning of Long Slow Death of Flash and my post from a CTO perspective that I Cannot Bet on Flash for new development stirred up quite a bit of response.  A lot of it said quite correctly that HTML5 is not there yet.  And that Flash provides things that you can’t do in HTML/JavaScript.  However, there are some pretty amazing things you can do without Flash.

The bottom line is that none of the feedback I’ve received has convinced me that choosing Flash as a delivery option for a new product or project would be a good idea today, especially if I want it to play on mobile and live for 5 years.

But then I received a great question via a comment:

I am a Masters student enrolled in an Instructional Design course with Walden University. I am somewhat new to the field and this article intrigues me. Should I hold off on learning Flash... and focus more on learning HTML5? Or would it be best to learn both? I know a very little about Flash and made it a goal to learn more, but now I wonder. You input is greatly appreciated.

What a great question and kudos to this student for being so on top of things to ask it!

And it was somewhat the inspiration for this month’s Big Question - Tools to Learn.  If you’ve not done so already, you should go read each of the posts there.  They have different perspectives and taken together they provide a pretty good roadmap of how to think about what tools you should learn.

Jeff Goldman in Development Tools I Would Learn If I Were You - Jeff's response to June’s Big Question tells us:

Flash: Yes, Flash is still very much alive and well in e-learning and because it is so embedded in our industry and there is nothing at this time that can provide the rich interactive elements that it provides, I do not see it being “dead” in our field anytime soon. The fact is HTML5 is not there yet and if it ever does get there it will probably be more than 5 years before it is at the level of quality and ease of development that Flash currently provides. However, see my comments under HTML/HTML5.

To me the question is more about where you choose to spend your time.  The lists of tools that Harold and Holly provide are pretty lengthy.  And Jeff suggests both Flash and HTML 5.  If you have so much time that you can afford to learn all of these tools, then go ahead.

However, if you have to prioritize Flash vs. HTML 5 vs. ??? … then I would put learning Flash (especially scripting in Flash) way down on the priority list at this point.  Remember End of an Era – Authorware – another Macromedia/Adobe product.  These things do eventually die out.  How valuable are your Authorware scripting skills at this point?

Learning Flash today is like learning Authorware in 1997.

So, yes, hold off on learning Flash and focus more on learning HTML 5.

Tuesday, June 22, 2010

Online Exam Preparation and Tutoring – Hot Market

Inc. Magazine published an article The Best Industries for Starting a Business In 2010.  Not sure what to make of most of the article, but they did include Exam Preparation and Tutoring as one of the top ten.

Parents always want their kids to do better on tests. A large number of adults returning to school are also looking for an edge. Given the low barrier to entry, this field is competitive. But if you carve out the right niche, it could be lucrative.

The industry, which includes tutoring in such fields such as special education, language, and music, grew about 7 percent last year.

And it seems like there are lots of eLearning Startups that are taking aim at different aspects of the Business of Learning.  My 12 eLearning Predictions for 2009 included

Increase in Consumer/Education Social Learning Solutions

2008 was an interesting year that saw a myriad of new start-ups offering content through interesting new avenues. Social learning solutions like social homework help provided by Cramster; CampusBug, Grockit, TutorVista, EduFire, English Cafe, and the list goes on and on.

And it seems like Inc. is maybe just a little bit late as there are a bunch of startups going after online exam preparation and online tutoring.  Some eLearning startups roughly in this space:

  • Knewton focuses on test preparation online using test experts to help students study.
  • TutorJam offers online tutoring programs for students in K-12, AP classes, and college.
  • Brightstorm focuses on helping students prepare for AP tests, as well as standardized tests.
  • Sums Online provides a wide range of math activities to help at home learners.
  • DreamBox Learning is an education start-up that provides math games for kids. This was recently acquired by Netflix founder Reed Hastings.
  • ProProfs – SAT and certification quizzes.
  • PrepMe – personalized prep for SAT, ACT, PSAT.
  • Tutor.com – online tutoring.

And there are a bunch more out there.  As Inc. tells us – low barrier to entry.  So we should expect lots more.

Saturday, June 19, 2010

ISTE 2010- PaperShow for Teachers

In about a week's time I will be heading to the Mile High City to participate in what I consider to be one of the most exciting educational technology conferences in the United States. I have been fortunate to be able to attend the ISTE conferences over the past couple of years and am looking forward to attending ISTE 2010, which is being held in Denver this year. I will be helping out at the PaperShow Booth as they launch PaperShow for Teachers, a new and exciting presentation tool for the classroom. I have been using PaperShow for the last year or so in my classroom and it is exciting to see the product evolve. PaperShow has given me the flexibility to go into any classroom and make my lessons more interactive, and it allows me to capture and share classroom notes almost instantaneously as a PowerPoint or PDF file.

PaperShow for Teachers will add a whole new array of tools and a set of interactive papers that I know teachers will find a welcome addition to their educational technology classroom toolkit. It will add a wide array of new color palettes as well as math tools for calculations and the measurement of lines and angles. Teachers will be able to use the Reveal tool to hide part of the screen, which can come in handy for displaying questions and their solutions. So whether you want to use PaperShow for Teachers as a digital flip chart or to annotate your PowerPoint or Keynote presentations, you will find PaperShow to be an affordable and easy-to-use tool to jazz up classroom lessons. If you plan on attending ISTE 2010, stop by and say hello; I will be in the PaperShow Booth #792, where you can see for yourself what all the excitement is about as we launch PaperShow for Teachers.


Thursday, June 17, 2010

eLearning Learning Sponsored by Rapid Intake

As you probably know, eLearning Learning has been steadily growing and is now one of the top eLearning sites on the web.  I wanted to let you know about an exciting development for eLearning Learning that’s being announced this morning in the eLearning DevCon Keynote.

Garin Hess and the team from Rapid Intake have stepped in to help me keep the site going both from an effort and financial standpoint.

I'm very happy to have Garin involved because I've known him for years and he's always done a good job of helping to build the larger eLearning community through conferences that you probably already know about:

Garin was really excited to support this broad community of bloggers.  We both believe that while this is a loose network, it provides an important and really valuable voice.  It's somewhat the whole reason I started eLearning Learning - many people in the world of eLearning miss the great stuff that is going on in blogs.  Of course, if you are reading this, that’s probably not you.  That said – I still believe that everyone should be Subscribed to Best of eLearning Learning.

Otherwise you’ve been missing things like:

And even though I subscribe to most of the blogs that are part of eLearning Learning, I still use the Best Of to make sure I’ve not been missing really good content.

By the way, if you want to know more about the site and/or see ways you could be involved, take a look at: Curator Editor Research Opportunities on eLearning Learning.

Garin - thanks for stepping up to help!

Wednesday, June 16, 2010

iThoughtsHD- Mind Mapping for the iPad

Two weeks ago I went out and purchased an Apple iPad to see what all the excitement was about. I have to say it is a very seductive device and one that is hard to put down. I am enamored by its potential as a tool for some of the students that I work with who have writing, reading and organizational difficulties. The iPad is very fast at doing almost anything you want it to do (except play Flash media) and the battery life is really incredible.

Over the last couple of weeks I have investigated and bought many productivity apps to get an idea of how the iPad could be used as a complementary device to all of the other computers in my household. As someone who loves to mind map, I quickly came across iThoughtsHD, which I had heard so much about. iThoughtsHD is a full-featured mind mapping application that runs on the iPad. It did take me a little while to get used to the fact that I didn't have a mouse and had to touch the screen to create new nodes, but once I got the hang of it, it was easy to use. I quickly created my first mind map in iThoughtsHD and was able to move my ideas around the map and relink them. Adding icons and formatting the nodes was quite intuitive and fast. There are some nice features to allow you to align your ideas to keep the formatting looking clean and your nodes equidistant. I have also used iThoughtsHD with my Bluetooth keyboard, which made it even faster for me to create a mind map. While the interface is rather spartan, there is a lot of functionality under the hood and you will find getting around iThoughtsHD to be quick and efficient. Within your mind map you can easily attach links as well as insert task completion icons to keep track of tasks as they are completed.

One of my favorite features is being able to send my iThoughtsHD mind maps to the cloud or transfer them to my computer using the WiFi Transfer option. Using the Box.net service, it is rather straightforward to save your mind maps to the cloud, where you can then access them from any computer connected to the internet. iThoughtsHD supports a wide range of exporting features, which makes it ideal if you intend to open your mind map with another mind mapping desktop application. iThoughtsHD supports exporting to: MindManager, MindView, NovaMind, iMindmap, xMind, and Freemind. iThoughtsHD also supports the use of Task Completion icons and Start and Due dates for tasks on your mind map.

All in all, iThoughtsHD is a formidable mind mapping application for the iPad and works well with your desktop mind mapping applications. If you purchased an iPad and are looking for a top of the line mind mapping app look no further and download iThoughtsHD.

Monday, June 14, 2010

Power Markers 2 Released-Powerful Project Management Tool

Last week I had a chance to chat with Nick Dufill, the developer of Power Markers 2, a powerful project management add-in for MindManager. Nick has been involved with mind mapping for some time now and is well known in the mind mapping arena. Nick was kind enough to be interviewed and what follows is our conversation.

Brian S. Friedlander: Can you tell us a little bit about your company and how you got started in using mind mapping?

Nick Dufill: I have been using mind mapping software since about 1996, and have been working with Mindjet and MindManager since 1997. I began by providing product support in the UK, and worked on the MindManager X5 product itself, and have developed many dozens of content-based and software-based extensions for MindManager. I cofounded MindManuals.com and Gyronx, and was also the technical editor for Wiley's "MindManager for Dummies". Currently I am helping MindManager customers with specialised applications of the software, with a focus on business use and information management. I think there is a lot of unexplored potential in the professional business uses of "mind mapping" software, much more than is being used today. Many maps have a short lifetime, and this limits their value, both actual and perceived - they can become the electronic equivalent of the tear-off flip chart pad. The move from drawing tool to information management tool is a significant way to get more value from the investment.

Brian S. Friedlander: Can you share with us how it came about that you developed Power Markers?

Nick Dufill: I was finding that although maps are very easy to create, it can be hard to find things again, especially if you are in a hurry. Most of my work is in MindManager maps, and I needed to be able to dive into a map, make an update or check something, and get out again just as quickly. If it is a project I am currently working on, this is easily done. But I found that maps that were written a few months ago required a lot more browsing before they "clicked" back into place. Working within the large-scale visualisation is a cumbersome way to keep an eye on a small number of critical points that can bite you. You only need to see the fin to know what to do - reviewing the whole shark every time is not necessary.

Originally, Power Markers was only going to be a roll-up of key markers to the centre of the map, to make it easy to drill down to areas that needed attention. The "Hot List" task pane was added as an afterthought, but it turned out to be far more useful than I expected, so version 2 has developed more around the idea of extracting to-do lists and status lists from maps. Power Markers was never designed on paper first and then implemented, but has grown organically in response to continuous use. This takes longer, but means that the problem it solves is a very practical one.


Brian S. Friedlander: How do you envision users will use Power Markers with their mind maps?

Nick Dufill: Power Markers is really designed to help users implement "dashboard" maps. A dashboard map is a snapshot of a running project or situation that you visit frequently and keep up to date. Of course, this is only one way to use mind mapping software, but given that maps are perfect for storing all kinds of information related to a project or area of interest, many maps naturally turn into dashboard maps. Power Markers gives you a way to quickly navigate to the essential points in a map, and reflects the status of the map at a glance - you don't need to surf the whole map to review what is happening, you only need to check what is in the "Hot Lists" that show the most important items.


Brian S. Friedlander: What inherent problems will Power Markers solve for users who create project dashboards?

Nick Dufill: First and foremost is consistency in the way that information is visualised. Many features of mind mapping software include an element of "user interpretation" in the definition of meaning. Partly because of the richness of mind mapping software features, and partly because of changing habits, we often use many different ways to code meaning into maps. As an example, the humble "action item" could be coded in a dozen different ways, ranging from a highlight colour through to grouping actions together in one part of the map, and including no mark-up at all - you just *know* it is an action item because you wrote it, so given the context, what else could it be? True today and tomorrow, but in three months it will not be nearly as clear. Because Power Markers uses MindManager's Map Markers, and only works if you use them consistently, this helps users to discover the value of consistent labelling of the content of their maps, so that Power Markers can generate useful and accurate lists. This is a big step towards higher value maps.

Once you have got a reliable set of lists that give you a snapshot of the status of the map, the principal benefit is saved time; you can check a map in a few seconds instead of a few minutes, and feel confident that it is an accurate check. Power Markers does not tell you anything that you would not find in the map if you were to explore it by hand, but it does it much, much faster and more reliably.

Brian S. Friedlander: What new features did you add to Power Markers 2 that will make it an even easier and more powerful solution?

Nick Dufill: There were three practical problems that I wanted to solve in my dashboard maps; first, I wanted the status to be date-sensitive, so that I knew whether I needed to do stuff right now, or whether it could wait. I also wanted to be able to go straight to hyperlinks and attachments in a map, as most of the time I embed links and useful documents within the context of a project, rather than keeping separate folders and lists elsewhere, e.g. in browser favourites. I also wanted to easily copy subsets of useful Power Marker configurations from one map to another, so that I could build a dashboard from useful parts of other maps. The first two issues were solved with the addition of "Automatic Markers", where Power Markers detects a particular condition on a topic, and then sets a map marker that can be handled just the same as a manually applied one. The third was solved with "Active Legends" - reversing the way that the Map Marker legend works. Today, MindManager can create a legend tree in your map from the map markers that you have defined. But with Power Markers, you can design the legend in your map first, then import it back into the map marker configuration. This is a natural way to design your map, and means you can make the marker legends a valuable part of the map - not something that gets overwritten each time you change your mind about the markers that you use. It also means that by copying and pasting a legend tree (or a part of it) to another map, you can copy map marker configurations when building new dashboards.


Brian S. Friedlander: What are the biggest benefits that users will accrue when they use Power Markers 2?

Nick Dufill: While Power Markers is not a complex idea, the long-term benefits are a bit more subtle. Having a fast and focused navigation system into the heart of your map is cool, but the presence of this list has important implications for the map too. It means that you do not need to worry about trying to keep lists in your map, which can seriously compromise its design. For example, I have seen a "Getting Things Done" template map that organises information by building lists in the map. This can only work if everything in your map belongs on one and only one list, otherwise you have to choose whether to duplicate things in multiple locations, or deliberately omit things from a list to which they properly belong, just to keep your map under control. This is the basic weakness of trees. But by moving list-making activities into a separate window, you remove all the associated constraints and compromises from the map itself, and can focus on the best way to organise your map for comprehension. This is where the tree structure comes into its own by layering detail. So Power Markers has a lot of impact on the fundamental design of maps. One of the dashboard templates in version 2 is a GTD template where the lists are in the Hot List pane, not in the map itself, which means that the same item can appear on as many lists as are necessary, with only one instance in the map itself.

The second long-term benefit arises from thinking about how you can use a set of lists to profile a project, situation or knowledge resource. To design the lists (or map marker groups), you have to stop and think "What do I *need* to know, in order to take action?" You get better at questioning the value of a list - is it something that is just nice to see, or is it actionable? Who will use this information, and how will they use it? Designing the lists for a dashboard map is effectively designing the way that you measure status, which bridges the gap from what is sometimes seen as the less well defined activity of "mapping things" to the realities of business processes. Power Markers can be used to model established processes through the design of the Hot Lists, reflecting an instance of the process. Mind Maps have always had bad press when it comes to visualising processes, because a tree is not a flow chart and never will be. But by using the process as a way to *profile* a map instead of trying to draw it in the map, a lot of new possibilities are opened up. I am looking forward to working with people who use MindManager as a platform for implementing either formal or home-grown processes, to understand how Power Markers can be improved further to make it easy to reflect status in the language of the process.


Brian S. Friedlander: Where can MindManager users purchase Power Markers 2?

Nick Dufill: Power Markers is available on the Olympic web site at

www.olympic-limited.co.uk/mindmanager-add-ins/power-markers/

Power Markers is available in two editions - Standard and Pro. The Standard version is free, requires no license key, and will work for up to 15 lists per map. For more than 15 lists per map, the Pro version is required. There is also a free White Paper on designing dashboard maps with MindManager and Power Markers, which explains why dashboard maps are different to other kinds of map, and the steps in their design.

I would be glad to answer any questions either here or at www.beyond-mind-mapping.com.

Tuesday, June 8, 2010

Text-to-Speech Overview and NLP Quality

This post is a new kind of thing for me. Dr. Joel Harband wrote most of this post and I worked with him on the focus, the content and a little bit of editing - actually I couldn't help myself and I edited this a lot. So this is really a combined effort at this point.

As you know, Text-to-Speech is something that's very interesting to me, and Joel knows a lot about it as CEO of Tuval Software Industries, maker of Speech-Over Professional. This software adds text-to-speech voice narration to PowerPoint presentations and is used for training and eLearning at major corporations.

Joel was nice enough to jump in and share his knowledge of applying text-to-speech technology to eLearning.

Please let me know if this kind of thing makes sense and maybe I'll do more of it. It certainly makes sense given all that's going on in my personal life.

Text-to-Speech Poised for Rapid Growth in eLearning

Text-to-speech (TTS) is now at the point where virtual classrooms were about 4 years ago when they reached a technological maturity where they were mainstream. It took a couple more years for me to say (in 2009) that virtual classrooms reached a tipping point.

Text-to-speech has reached the point of technical maturity. As such, we are standing at the threshold of a technology shift in our industry: text-to-speech voices are set to replace professional voice talents for adding voice narration in e-learning presentations. Text-to-speech can create professional voice narration without any recording which provides significant advantages:

  • keeps narrated presentations continuously up to date (it's too time consuming/expensive to rerecord human narration)
  • faster development - streamlined workflow
  • lower costs.

It's being adopted today in major corporations, but it's still early in the adoption cycle. That said, at a developer’s conference in 2004, Bill Gates made the statement that although speech technology was one of the most difficult areas, even partial advances can spawn successful applications. This is now the case for text-to-speech: it’s not yet perfect, but it is good enough for a whole class of applications, especially eLearning and training. The reason is that most people learn out of necessity and will accept a marginal reduction in naturalness as long as the speech is clear and intelligible.

There's a lot going on behind the scenes to make text-to-speech work in eLearning. Like most major innovations it needs to be accompanied by a slew of minor supporting innovations that make it practical, easy to use and effective: modulating the voice with speed, pitch and emphasis, adding silent delays, adding subtitles, pronouncing difficult words and coordinating voice with visuals.

Over the course of a few posts, we will attempt to bring readers up to speed on different aspects of this interesting and important subject. The focus of this post is the quality of text-to-speech as determined by natural language processing.

Text-to-speech Basics

To understand how to think about text-to-speech voices and how they compare, it's important to have some background about what they are. Text-to-speech (TTS) is the automatic production of spoken speech from any text input.

The quality criteria for Text-to-Speech Voices are pretty simple. They are:

  • Naturalness
  • Intelligibility

Due to recent improvements in processing speed, speech recognition and synthesis, and the availability of large text and speech databases for modeling, text-to-speech systems now exist that meet both criteria to an amazing degree.

A TTS voice is a computer program that has two major parts:

  1. a natural language processor (NLP) which reads the input text and translates it into a phonetic language and
  2. a digital signal processor (DSP) that converts the phonetic language into spoken speech.

Each of these parts has a specific role and by understanding a bit more about what they do, you can better evaluate quality of the result.

Natural Language Processor (NLP) and Quality

The natural language processor is what knows the rules of English grammar and word formation (morphology). The natural language processor is able to determine the part of speech of each word in the text and thus to determine its pronunciation. More precisely, here's what the natural language processor does:

  1. Expands abbreviations and similar shorthand to full text according to a dictionary.
  2. Determines all possible parts of speech for each word, according to its spelling (morphological analysis).
  3. Considers the words in context, which allows it to narrow down and determine the most probable part of speech of a word (contextual analysis).
  4. Translates the incoming text into a phonetic language, which specifies exactly how each word is to be pronounced (Letter-To-Sound (LTS) module).
  5. Assigns a “neutral” prosody based on division of the sentence into phrases.

This will make more sense by going through examples. And this also provides a roadmap to test quality.

We’ll compare the quality of three TTS voices:

  • Mike - a voice provided by Microsoft in Windows XP (old style).
  • Paul - a voice by NeoSpeech, the voice used in Adobe Captivate.
  • Heather - a voice by Acapela Group.

Actually, let me have them introduce themselves. Click on the links below to hear them:

  • I'm Mike, an old style robotic voice provided by Microsoft in Windows XP.
  • I'm Paul, a state of the art voice provided by NeoSpeech.
  • I'm Heather, a state of the art voice provided by Acapela-Group.

So, let's put these voices through their paces to see how they do. Actually, in this section, we are going to be testing the natural language processor and its ability to resolve ambiguities of parts of speech in the text.

1. Ambiguity in noun and verb

“Present” can be a noun or a verb, depending on the context. Let’s see how the voices do with the sentence:

“No time like the present to present this present to you.”

  • Mike
  • Paul
  • Heather

Paul and Heather resolve this ambiguity with ease.

Another example: “record” can be a noun or a verb:

“Record the record in record time.”
  • Mike
  • Paul
  • Heather

Again, Paul and Heather resolve this ambiguity with ease.

2. Ambiguity in verb and adjective

The word “separate” can be a verb or an adjective.

“Separate the cards into separate piles”

  • Mike
  • Paul
  • Heather

Only Paul gets it right.
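
The contextual analysis being tested in these examples can be illustrated with a toy rule. The following is only a sketch under loud assumptions: the `HOMOGRAPHS` table, its respellings, and the single rule (a word is a verb when it starts the sentence or follows "to") are all hypothetical and far simpler than what a commercial voice such as Paul or Heather actually does.

```python
import re

# Hypothetical pronunciation table: word -> (verb form, non-verb form).
HOMOGRAPHS = {
    "present": ("pri-ZENT", "PREZ-uhnt"),
    "record": ("ri-KAWRD", "REK-erd"),
    "separate": ("SEP-uh-rayt", "SEP-er-it"),
}


def pronounce_homographs(sentence):
    """Return (word, role, pronunciation) for each homograph in the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    results = []
    for i, word in enumerate(words):
        if word not in HOMOGRAPHS:
            continue
        previous = words[i - 1] if i > 0 else None
        # Toy contextual rule (an assumption, not a real tagger):
        # sentence-initial (imperative) or directly after "to" means verb.
        is_verb = previous is None or previous == "to"
        verb_form, other_form = HOMOGRAPHS[word]
        role = "verb" if is_verb else "non-verb"
        results.append((word, role, verb_form if is_verb else other_form))
    return results


for item in pronounce_homographs(
    "No time like the present to present this present to you."
):
    print(item)
```

A rule this small breaks easily (for instance, "to" used as a preposition), which is why real voices weigh many more contextual cues before settling on a part of speech.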

3. Word Emphasis (Prosody)

Another type of ambiguity is word emphasis in a sentence. The intended meaning of a spoken sentence often depends on which word is emphasized, as in “HE reads well”, “He READS well”, and “He reads WELL”. This is called prosody, and it is impossible to determine from plain text alone. The voices aim for a “neutral” prosody intended to cover all possible meanings. A better way is to use modulation tags to directly emphasize a word. We’ll discuss that in a later post.
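
To give a feel for what a modulation tag can look like, here is a minimal sketch that assumes the voice accepts SSML, the W3C speech markup standard, which defines an `emphasis` element. Whether any of the voices discussed here honors that element is an assumption, and the `emphasize` helper is hypothetical.

```python
from xml.sax.saxutils import escape


def emphasize(sentence, index):
    """Wrap the word at position `index` (0-based) in an SSML emphasis tag."""
    words = sentence.split()
    tagged = [
        f'<emphasis level="strong">{escape(word)}</emphasis>'
        if i == index
        else escape(word)
        for i, word in enumerate(words)
    ]
    return "<speak>" + " ".join(tagged) + "</speak>"


# One marked-up variant per possible emphasis of the example sentence.
for position in range(3):
    print(emphasize("He reads well", position))
```

The point of the tag is that the author, not the voice, decides which of the three readings is meant.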

4. Abbreviations

Most voices are equipped to translate common abbreviations.

The temperature was 30F, which is -1C.

It weighed 2 kg, which is about 4.5 lb.

Let's meet at 12:00

  • Mike
  • Paul
  • Heather

Heather does the best job.
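
Dictionary-based expansion of this kind can be sketched in a few lines. This is an illustration only: the `UNITS` table and the regular expression are hypothetical and cover just the units in the sentences above, and times such as 12:00 are deliberately left alone.

```python
import re

# Hypothetical unit dictionary; a real voice ships with a far larger one.
UNITS = {
    "F": "degrees Fahrenheit",
    "C": "degrees Celsius",
    "kg": "kilograms",
    "lb": "pounds",
}

# A number (optionally negative or decimal), an optional space, then a known unit.
PATTERN = re.compile(r"(-?)(\d+(?:\.\d+)?)\s?(kg|lb|F|C)\b")


def expand_abbreviations(text):
    """Rewrite number-plus-unit abbreviations as full words."""
    def replace(match):
        sign, number, unit = match.groups()
        prefix = "minus " if sign else ""
        return f"{prefix}{number} {UNITS[unit]}"

    return PATTERN.sub(replace, text)


print(expand_abbreviations("The temperature was 30F, which is -1C."))
print(expand_abbreviations("It weighed 2 kg, which is about 4.5 lb."))
```

The sketch ignores details a good voice has to handle, such as singular versus plural ("1 degree") and reading the digits themselves aloud.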

5. Technical Words

Unless they are equipped with specialized dictionaries, TTS voices will occasionally fail to read technical words correctly. However, they can always be taught to say them correctly by using a phonetic language. Here are some examples. Each voice says the word twice: first by itself (incorrectly) and second after being taught (correctly).

Deoxyribonuclease (dee-ok-si-rahy-boh-noo-klee-ace)

  • Mike
  • Paul

Chymotrypsinogen (kahy-moh-trip-sin-uh-juhn)

  • Mike
  • Paul
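
One simple way to picture "teaching" a voice is a user lexicon that swaps a technical word for a respelling before the text reaches the voice. The sketch below is an assumption about the general idea, not how any particular product does it: `LEXICON` and `apply_lexicon` are hypothetical names, and real voices usually take a proper phonetic alphabet rather than a respelling.

```python
import re

# Hypothetical user lexicon, using the respelling given in the example above.
LEXICON = {
    "deoxyribonuclease": "dee-ok-si-rahy-boh-noo-klee-ace",
}


def apply_lexicon(text, lexicon=LEXICON):
    """Replace each known technical word with its respelling before synthesis."""
    def replace(match):
        word = match.group(0)
        # Look up case-insensitively; unknown words pass through unchanged.
        return lexicon.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


print(apply_lexicon("Deoxyribonuclease cuts DNA."))
```

Because the lexicon is consulted word by word, adding a new term is a one-line change and does not affect how ordinary words are read.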


Monday, June 7, 2010

MindMeister Releases Mind Mapping Extension for Google Wave

While I haven't spent enough time using Google Wave to form an opinion, several of the mind mapping companies are finding it fertile ground for extensions that add a collaborative mind mapping application running inside the threaded discussion. Today I learned that MindMeister released their Google Wave mind mapping extension, which adds the ability to create a simple mind map within a threaded discussion. To the left is a screenshot of MindMeister within a Google Wave discussion. Granted, the features are limited, but it is just the beginning for this platform.

Wednesday, June 2, 2010

CS Odessa Announces Update to ConceptDraw MindWave



CS Odessa Announces Update to ConceptDraw MindWave, a No-Cost Mind Mapping Gadget for Google Wave

San Jose, California, June 2, 2010 – CS Odessa has released an updated version of its popular mind mapping tool, ConceptDraw MindWave for Google Wave. This is a no-charge Google Wave add-in, which enables users to rapidly collaborate with any number of people, from 2 to 2000.

ConceptDraw MindWave provides teams the opportunity to interactively collaborate on a mind map while building a map structure that represents the focus of collaboration. The mind map can then take full advantage of Google Wave playback, or be downloaded to one’s desktop for further development using ConceptDraw MINDMAP.

Improved Functionality of ConceptDraw MindWave 2:

  • Getting Started mind map on first launch to introduce new users to functionality.
  • Horizontal scroll bar for more versatile navigation.
  • Automatic hyperlinking of URL topic text.
  • Added help and privacy resources to gadget.
  • New shortcuts and many usability improvements.

Olin Reams, General Manager for the Americas at CS Odessa, observes, “We demonstrated ConceptDraw MindWave at the recent Google I/O Developers Conference. ConceptDraw MindWave was presented in the Sandbox and was a huge success.”

“Many of the developers and spectators in attendance were able to immediately see the great potential in being able to build and store a mind map whose access can be embedded anywhere HTML appropriate, such as a web page or enterprise portal. Participants are then free to discover and interact with the wave mind map, join the conversation, and add or edit information. The end result is a great collaborative productivity tool.”

ConceptDraw MindWave is a no cost gadget for Google Wave. To install ConceptDraw MindWave, please visit: http://www.conceptdraw.com/mindwave.

ABOUT CS ODESSA

Founded in 1993, Computer Systems Odessa supplies cross-platform productivity tools and graphics technologies to professional and corporate users around the world. With headquarters in Odessa, Ukraine and an office in California, CS Odessa sells products internationally through resellers in over 25 countries. The ConceptDraw Productivity Line of products has won numerous awards and is used by hundreds of thousands of people all over the world.

www.conceptdraw.com