speech recognition – AI News

Researchers achieve 94% power reduction for on-device AI tasks (17 September 2020)

Researchers from Applied Brain Research (ABR) have significantly reduced power consumption for a range of AI-powered devices.

ABR designed a new neural network called the Legendre Memory Unit (LMU). With the LMU, on-device AI tasks – such as those on speech-enabled devices like wearables, smartphones, and smart speakers – can consume up to 94 percent less power.

The reduction in power consumption achieved through the LMU will be particularly beneficial to smaller form-factor devices such as smartwatches, which struggle with small batteries. IoT devices that carry out AI tasks – but may have to last months, if not years, before they’re replaced – should also benefit.

The LMU is described as a recurrent neural network (RNN) that enables lower-power and more accurate processing of time-varying signals.

ABR says the LMU can be used to build AI networks for all time-varying tasks—such as speech processing, video analysis, sensor monitoring, and control systems.
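
To make the idea more concrete, the sketch below implements only the fixed ‘Legendre memory’ update that gives the unit its name, following the state-space formulation in the original NeurIPS paper; the full LMU layer also adds learned encoders and a nonlinear hidden state, and the function names and simple Euler discretisation here are illustrative assumptions rather than ABR’s implementation.

```python
import numpy as np

def lmu_matrices(d):
    """Fixed state-space matrices for a d-dimensional Legendre memory
    (formulation from the original LMU paper; zero-based indices)."""
    A = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            A[i, j] = (2 * i + 1) * (-1.0 if i < j else (-1.0) ** (i - j + 1))
    B = np.array([(2 * i + 1) * (-1.0) ** i for i in range(d)]).reshape(d, 1)
    return A, B

def legendre_memory(signal, d=8, theta=100.0, dt=1.0):
    """Compress a 1-D signal into a d-dimensional rolling memory of roughly
    the last `theta` timesteps, using the Euler update m += (dt/theta)(Am + Bu)."""
    A, B = lmu_matrices(d)
    m = np.zeros((d, 1))
    states = []
    for u in signal:
        m = m + (dt / theta) * (A @ m + B * u)
        states.append(m.ravel().copy())
    return np.stack(states)

# Example: an 8-number memory summarising a 500-step noisy sine wave.
x = np.sin(np.linspace(0, 10, 500)) + 0.1 * np.random.randn(500)
print(legendre_memory(x).shape)  # (500, 8)
```

Because A and B are derived analytically rather than learned, the memory itself contributes few or no trainable parameters, which helps explain the much smaller network sizes quoted below.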

The AI industry’s current go-to model is the Long Short-Term Memory (LSTM) network. The LSTM was first proposed back in 1995 and underpins most of today’s popular speech recognition and translation services, including those from Google, Amazon, Facebook, and Microsoft.

Last year, researchers from the University of Waterloo debuted the LMU as an RNN alternative to the LSTM. Those researchers went on to form ABR, which now consists of 20 employees.

Peter Suma, co-CEO of Applied Brain Research, said in an email:

“We are a University of Waterloo spinout from the Theoretical Neuroscience Lab at UW. We looked at how the brain processes signals in time and created an algorithm based on how “time-cells” in your brain work.

We called the new AI, a Legendre-Memory-Unit (LMU) after a mathematical tool we used to model the time cells. The LMU is mathematically proven to be optimal at processing signals. You cannot do any better. Over the coming years, this will make all forms of temporal AI better.”

ABR presented a paper at the NeurIPS conference in late 2019 demonstrating that the LMU is 1,000,000x more accurate than the LSTM while encoding 100x more time-steps.

In terms of size, the LMU model is also smaller: it uses 500 parameters versus the LSTM’s 41,000 (a 98 percent reduction in network size).

“We implemented our speech recognition with the LMU and it lowered the power used for command word processing to ~8 millionths of a watt, which is 94 percent less power than the best on the market today,” says Suma. “For full speech, we got the power down to 4 milli-watts, which is about 70 percent smaller than the best out there.”

Suma says the next step for ABR is to work on video, sensor, and drone-control AI processing, to make those models smaller and better too.

A full whitepaper detailing the LMU and its benefits can be found on the preprint repository arXiv.

Esteemed consortium launch AI natural language processing benchmark (15 August 2019)

A research consortium featuring some of the greatest minds in AI is launching a benchmark to measure natural language processing (NLP) abilities.

The consortium includes Google DeepMind, Facebook AI, New York University, and the University of Washington. Each of the consortium’s members believes that NLP needs a more comprehensive benchmark than current solutions provide.

The result is a benchmarking platform called SuperGLUE, which replaces an older platform called GLUE with a “much harder benchmark with comprehensive human baselines,” according to Facebook AI.

SuperGLUE puts NLP abilities to the test where previous benchmarks were beginning to prove too simple for the latest systems.

In 2018, Google released BERT (Bidirectional Encoder Representations from Transformers), which Facebook calls one of the biggest breakthroughs in NLP. Facebook took Google’s open-source work and identified changes that improve its effectiveness, which led to RoBERTa (Robustly Optimized BERT Pretraining Approach).

RoBERTa basically “smashed it,” as the kids would say, in commonly-used benchmarks:

“Within one year of release, several NLP models (including RoBERTa) have already surpassed human baseline performance on the GLUE benchmark. Current models have advanced a surprisingly effective recipe that combines language model pretraining on huge text data sets with simple multitask and transfer learning techniques,” Facebook explains.

For the SuperGLUE benchmark, the consortium decided on tasks which meet four criteria:

  1. Have varied formats.
  2. Use more nuanced questions.
  3. Are yet-to-be-solved using state-of-the-art methods.
  4. Can be easily solved by people.

The new benchmark includes eight diverse and challenging tasks, among them a Choice of Plausible Alternatives (COPA) causal reasoning task. COPA provides the system with the premise of a sentence, and the system must determine either the cause or the effect of that premise from two possible choices. Humans achieve 100 percent accuracy on COPA, while BERT achieves just 74 percent.
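
To illustrate the format, a COPA item pairs a short premise with two alternatives and specifies whether the system should pick the more plausible cause or effect; the item below is a made-up example in the style of the task, not one drawn from the actual dataset.

```python
# Illustrative COPA-style item (invented for this example, not from the dataset).
copa_item = {
    "premise": "The pavement was wet this morning.",
    "question": "cause",   # the system must pick the more plausible cause
    "choice1": "It rained overnight.",
    "choice2": "The street lights were switched on.",
    "label": 0,            # index of the correct choice
}

def score(predictions, items):
    """COPA is scored by simple accuracy: the fraction of items where the
    predicted choice index matches the labelled one."""
    return sum(p == item["label"] for p, item in zip(predictions, items)) / len(items)

print(score([0], [copa_item]))  # 1.0
```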

Across SuperGLUE’s tasks, RoBERTa is currently the leading NLP system and isn’t far behind the human baseline.

You can find a full breakdown of SuperGLUE and its various benchmarking tasks in a Facebook AI blog post.

Google details Project Euphonia work to improve voice recognition inclusivity (14 August 2019)

Google has provided details of its Project Euphonia work designed to improve the inclusivity of voice recognition for people with disabilities that impair their speech.

Degenerative diseases like amyotrophic lateral sclerosis (ALS) are known for causing speech impairments. Today’s voice recognition systems often cannot recognise the speech of individuals suffering from such diseases, even though those individuals are arguably the ones set to benefit most from the automation the technology offers.

Google has set out to solve the problem with Project Euphonia.

Dimitri Kanevsky, a Google researcher who himself has impaired speech, has demonstrated a system called Parrotron that converts his speech into speech Google Assistant can understand.

The researchers provide background on Project Euphonia’s origins:

“ASR [automatic speech recognition] systems are most often trained from ‘typical’ speech, which means that underrepresented groups, such as those with speech impairments or heavy accents, don’t experience the same degree of utility.

…Current state-of-the-art ASR models can yield high word error rates (WER) for speakers with only a moderate speech impairment from ALS, effectively barring access to ASR reliant technologies.”

As the researchers highlight, part of the problem is that training sets primarily consist of ‘typical speech’ without the much-needed variety to represent all parts of society (this even includes heavy accents, to some degree).

The researchers set out to record dozens of hours of voice recordings from individuals with ALS to help train their AI. However, the resulting training set is still not ideal, as each person with ALS sounds unique depending on the progression of the disease and how it affects them.

Google was able to reduce its word error rate by taking a baseline voice recognition model, experimenting with some tweaks, and training it on the new recordings.

The method substantially improved recognition, but the researchers found it could occasionally struggle with phonemes in one of two key ways:

  1. The phoneme isn’t recognised, and therefore neither is the word it belongs to.
  2. The model has to guess which phoneme the speaker meant.

The second problem is fairly trivial to solve. By analysing the rest of the sentence’s context, the AI can often determine the correct phoneme. For example, if the AI hears “I’m reading off to the cub,” it can probably determine the user meant “I’m heading off to the pub”.
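
The sketch below shows the general idea of using sentence context to choose between acoustically similar hypotheses; the tiny word-pair table stands in for a real language model, and every name and entry in it is invented for illustration rather than taken from Google’s system.

```python
# Toy illustration of context-based disambiguation between acoustically
# similar hypotheses. The "language model" here is just a hand-made table
# of plausible word pairs; a real system would use a trained model.
PLAUSIBLE_PAIRS = {
    ("heading", "off"), ("off", "to"), ("to", "the"), ("the", "pub"),
    ("reading", "a"), ("a", "book"),
}

def lm_score(sentence):
    """Count how many adjacent word pairs the toy model considers plausible."""
    words = sentence.lower().split()
    return sum((a, b) in PLAUSIBLE_PAIRS for a, b in zip(words, words[1:]))

def pick_best(hypotheses):
    """Return the hypothesis whose word-pair context looks most plausible."""
    return max(hypotheses, key=lm_score)

candidates = ["I'm reading off to the cub", "I'm heading off to the pub"]
print(pick_best(candidates))  # "I'm heading off to the pub"
```

A production recogniser would do this kind of rescoring with a neural language model over full hypothesis lattices, but the principle is the same: context, not acoustics alone, breaks the tie.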

You can read the full paper on arXiv ahead of its presentation at the Interspeech conference in Austria next month.

Speech and facial recognition combine to boost AI emotion detection (17 January 2019)

Researchers have combined speech and facial recognition data to improve the emotion detection abilities of AIs.

The ability to recognise emotions is a longstanding goal of AI researchers. Accurate recognition enables things such as detecting tiredness at the wheel, anger that could lead to a crime being committed, or perhaps even signs of sadness or depression at suicide hotspots.

Nuances in how people speak and move their facial muscles to express moods have presented a challenge. Researchers at the University of Science and Technology of China in Hefei have made some progress, detailed in a paper on arXiv.

In the paper, the researchers wrote:

“Automatic emotion recognition (AER) is a challenging task due to the abstract concept and multiple expressions of emotion.

Inspired by this cognitive process in human beings, it’s natural to simultaneously utilize audio and visual information in AER … The whole pipeline can be completed in a neural network.”

Breaking down the process as much as I can, the system is made up of two parts: one for visual data and one for audio.

For the video system, frames of faces are run through two further computational layers: a basic face detection algorithm, and three facial recognition networks optimised to be ‘emotion-relevant’.

As for the audio system, speech spectrograms are fed into sound-processing algorithms to help the AI model focus on the areas most relevant to emotion.

Measurable characteristics are extracted from the four facial recognition algorithms in the video system and matched with speech features from the audio counterpart to capture the associations between them for a final emotion prediction.
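
The paper’s actual network is more elaborate, but the late-fusion pattern described above (separate visual and audio feature extractors whose outputs are combined for a single prediction) can be sketched roughly as follows; the PyTorch layer sizes, names, and the use of simple concatenation are illustrative assumptions rather than the paper’s architecture.

```python
import torch
import torch.nn as nn

class LateFusionEmotionNet(nn.Module):
    """Illustrative late-fusion model: separate visual and audio encoders
    whose features are concatenated and classified into seven emotions."""

    def __init__(self, visual_dim=512, audio_dim=128, num_emotions=7):
        super().__init__()
        # Stand-ins for the face networks and spectrogram processing in the paper.
        self.visual_encoder = nn.Sequential(nn.Linear(visual_dim, 256), nn.ReLU())
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU())
        self.classifier = nn.Linear(256 + 256, num_emotions)

    def forward(self, visual_feats, audio_feats):
        v = self.visual_encoder(visual_feats)
        a = self.audio_encoder(audio_feats)
        fused = torch.cat([v, a], dim=-1)  # combine the two modalities
        return self.classifier(fused)      # logits over the emotion classes

model = LateFusionEmotionNet()
logits = model(torch.randn(4, 512), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 7])
```

Combining both feature streams before classification is what lets the model pick up associations between face and voice that neither modality captures on its own.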

The AI was fed 653 video clips and their corresponding audio from AFEW 8.0, a database of film and television clips used for a sub-challenge of EmotiW 2018.

In the challenge, the researchers’ AI performed admirably – it correctly determined the emotions ‘angry,’ ‘disgust,’ ‘fear,’ ‘happy,’ ‘neutral,’ ‘sad,’ and ‘surprise’ 62.48 percent of the time.

Overall, the AI performed better on emotions like ‘angry,’ ‘happy,’ and ‘neutral,’ which have obvious characteristics, and struggled more with nuanced emotions such as ‘disgust’ and ‘surprise’.
