Our models are on par with premium Google models and also really simple to use.
We are proud to announce that we have built from the ground up and released high-quality speech-to-text models (i.e. on par with premium Google models) for the following languages:
You can find all of our models in our repository, together with examples and quality and performance benchmarks. We also invested some time into making our models as accessible as possible: you can try our examples as well as the PyTorch, ONNX, and TensorFlow checkpoints. You can also load our models via TorchHub.
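As a rough illustration of the TorchHub route, here is a minimal sketch. It assumes the entry point name `silero_stt` and the `language` and `device` arguments as described in the repository README at the time of writing; the helper names `hub_load_kwargs` and `load_stt` are hypothetical and exist only for this example, and the `"en"` default is an assumption too.

```python
def hub_load_kwargs(language="en", device="cpu"):
    """Collect the arguments passed to torch.hub.load.

    The entry point name and argument names are assumptions based on
    the repository README; check the README for the current values.
    """
    return {
        "repo_or_dir": "snakers4/silero-models",  # GitHub repo with the hub entry points
        "model": "silero_stt",                    # assumed entry point name
        "language": language,                     # assumed language code, e.g. "en"
        "device": device,
    }


def load_stt(language="en", device="cpu"):
    """Download (on first call) and load a speech-to-text model via TorchHub."""
    import torch  # imported lazily so the helper above works without PyTorch

    kwargs = hub_load_kwargs(language, torch.device(device))
    # Per the README, the entry point is expected to return (model, decoder, utils).
    return torch.hub.load(**kwargs)
```

Calling `model, decoder, utils = load_stt("en")` needs PyTorch installed and network access on the first run, since TorchHub fetches the repository and the checkpoint and then caches them locally.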
The original table is available at https://github.com/snakers4/silero-models#getting-started
Speech-to-text has traditionally had high barriers to entry for a number of reasons:
Here are some of the typical problems that existing ASR solutions and approaches had before our release:
First, we tried to alleviate some of these problems for the community by publishing the largest Russian spoken corpus in the world (see our Habr post here). Now we are trying to solve these problems as follows:
We believe that modern technology should be embarrassingly simple to use. In our work we follow these design principles:
So far, the smallest size we have been able to compress our models to is around 50 megabytes. We still plan to compress our Enterprise Edition models down to ~20 megabytes without loss of fidelity. We are also planning to release Community Edition models for other popular languages.
Originally published at https://habr.com.