Baidu Opens Access To Its Speech Recognition Technology, But Dreams Elsewhere
Summary: What stage has Baidu reached in speech recognition and artificial intelligence? What new progress has it made in speech recognition? And why did it decide to open access to its speech recognition technology?
Back at the 2016 Baidu Conference held this September, Robin Li, founder and CEO of Chinese internet giant Baidu, said that “speech and image are replacing text as the mainstream way of expression”. At this year’s conference, Baidu Brain, which combines Baidu’s artificial intelligence, big data and deep learning technology, was the absolute focus. After the conference, Baidu synthesized the voice of Chinese pop singer Zhang Guorong with its speech synthesis technology and put considerable effort into promoting its progress in autonomous driving. Baidu is clearly betting on artificial intelligence in hopes of regaining its past glory. At the recently concluded World Internet Conference in the eastern Chinese town of Wuzhen, Li even maintained that “the era of mobile internet has ended”.
On November 22, Baidu marked the third anniversary of its open speech recognition platform and announced four new technologies: Emotion Synthesis, Long-Distance Solution, Awake (Version 2) and Long-Time Speech Recognition Solution. All of them will be open to users and developers. Wu En’da, Chief Scientist of Baidu, said:
“There’s huge potential in these new technologies. It is probable that they could significantly improve the efficiency of human-machine interaction. Speech recognition technology will be applied in various scenarios in the future, and will certainly change the way human and machine interact.”
These updates also show Baidu’s efforts to improve both the user experience and the usage scenarios of speech recognition. With Emotion Synthesis, machines will be able to produce speech that sounds closer to a human voice. With the Long-Distance Solution, Baidu’s system can recognize speech from three to five meters away with an accuracy of 93-94%, which will help smart home developers build better speech-controlled devices. With the Long-Time Solution, stenographers and journalists will be able to save much of the effort they now spend on long recordings.
“While it will take another one to two years for scientists to achieve major breakthroughs in some areas of artificial intelligence, speech recognition technology is already quite mature in many areas,” Wu told TMTpost.
According to figures Baidu has made public, the accuracy of its speech recognition system has reached 97%, and over 140,000 developers have adopted its technology through the open platform. Speaking of the future of artificial intelligence, Wu said:
“Artificial intelligence has already created significant value for lots of companies, including Baidu, and Baidu has already found a path to revolutionize many different industries through artificial intelligence.”
The sheer range of areas in which speech recognition is already used hints at the huge potential of artificial intelligence. Since Baidu cannot explore all of that potential by itself, it has decided to share its speech recognition technology with third-party companies and the wider developer community.
In fact, many tech giants have been increasing their investment in speech recognition, one of the core interaction technologies in artificial intelligence. This August, Microsoft launched the fourth generation of Cortana, which has not only been upgraded technically but also given more social functions.
For example, the upgraded Cortana will take a more active part in human-machine communication. In Japan, Cortana even debuted a song. Moreover, Cortana is becoming more knowledgeable in various specialized fields.
Microsoft used to be very cautious about attaching commercial elements to Cortana, but this year it has made several attempts to apply its speech recognition assistant in various fields. The shift can be understood from the words of Rocky Lu, global executive vice president of Microsoft:
“Cortana is a carrier of all the progress Microsoft has made in artificial intelligence, search engine and big data in the past two decades. At Microsoft, human-machine communication is not only a foundation, but also a key element in ushering in the era of artificial intelligence.”
This March, Google opened access to its API for speech search and input. The Google Cloud Speech API covers over eighty languages and can be used for various forms of speech recognition and translation.
Apple has also been trying to make a difference in speech recognition. By building Siri into the iPhone, Apple has already gained tens of millions of users. Amazon acquired speech recognition company Yap in 2011 and released its speech recognition assistant Echo in 2014. Meanwhile, thousands of startups are focusing on this area in hopes of achieving breakthroughs and revolutionizing the entire industry.
As Wu said, the path of artificial intelligence is becoming clearer and clearer, so many companies are willing to give it a try. By comparison, speech recognition, the natural entry point to artificial intelligence, is already quite mature. This might explain why competition is not so fierce in this area and why many companies are even willing to share their technologies with other developers.
[This article is published and edited with authorization from the author @Zhang Lin. Please credit the source and include a hyperlink when reproducing it.]
Translated by Levin Feng (Senior Translator at PAGE TO PAGE), working for TMTpost.