How to use Microsoft’s speech-to-text API to create a dataset, and then train a model with it on the Peltarion platform.
Create a text dataset from voice recordings

Data scientists are always on the lookout for new datasets, but despite the vast amount of data generated and processed every day, suitable ones can be surprisingly hard to find. Creating your own dataset is therefore often the most time-efficient way to go if you want to start exploring a project idea quickly.
Here’s one way to create a text dataset using Microsoft’s speech-to-text API, and then use it to train a model on the Peltarion platform.
- Start with Microsoft’s instructions for their speech-to-text APIs. You can choose to work with recorded audio files or to talk into your microphone.
- Create your dataset by recording people speaking and attaching a label to each recorded snippet.
- Turn it into a text dataset using the speech-to-text API according to Microsoft’s instructions above.
- Follow one of our tutorials to train a text classification model with a BERT natural language processing model. Use English BERT (our tutorial shows you how to build a model that detects the sentiment of reviews) or multilingual BERT (our tutorial shows you how to predict the genre of a book from an extract), depending on which languages your recordings are in.
- Deploy the model using our one-click REST API to start using it.
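As a rough illustration, the recording and transcription steps above could be scripted along these lines in Python. This is a minimal sketch, not Microsoft’s or Peltarion’s reference code: it assumes the `azure-cognitiveservices-speech` package, short WAV snippets with one utterance each, and a two-column CSV whose column names `text` and `label` are our own choice. The helper names `azure_transcriber` and `build_dataset`, the file names, and the key and region values are hypothetical.

```python
import csv
import io
from typing import Callable, Iterable, TextIO, Tuple


def azure_transcriber(key: str, region: str) -> Callable[[str], str]:
    """Return a function that transcribes one audio file with the Azure Speech SDK."""
    # Imported lazily so the rest of the sketch works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)

    def transcribe(path: str) -> str:
        audio_config = speechsdk.audio.AudioConfig(filename=path)
        recognizer = speechsdk.SpeechRecognizer(
            speech_config=speech_config, audio_config=audio_config
        )
        # recognize_once() stops after the first utterance, so it suits short snippets.
        result = recognizer.recognize_once()
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            return result.text
        return ""  # nothing recognized, or the request was cancelled

    return transcribe


def build_dataset(
    recordings: Iterable[Tuple[str, str]],
    transcribe: Callable[[str], str],
    out_file: TextIO,
) -> int:
    """Write one (text, label) row per recording and return the number of rows written."""
    writer = csv.writer(out_file)
    writer.writerow(["text", "label"])  # assumed column names
    rows = 0
    for path, label in recordings:
        text = transcribe(path).strip()
        if text:  # skip recordings that produced no transcript
            writer.writerow([text, label])
            rows += 1
    return rows


# Example usage (hypothetical file names, key, and region):
# recordings = [("review_01.wav", "positive"), ("review_02.wav", "negative")]
# with open("dataset.csv", "w", newline="", encoding="utf-8") as f:
#     build_dataset(recordings, azure_transcriber("YOUR_KEY", "westeurope"), f)
```

Passing the transcriber in as a function keeps the CSV logic independent of the speech service, so you can try it out with a stand-in before spending API calls on real recordings.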
For more about text-based AI models or multilingual BERT, check out our intro series on the topic.