🎁 Big news is here! 📚
We have added a new beta feature to our platform: text classification, for example for sentiment analysis. This is the first step in a long list of Natural Language Processing (NLP) capabilities to come.
You can now upload sentences and paragraphs of text to the platform in a CSV file. For better parsing, make sure your text feature is double-quoted before you upload the CSV file, e.g., "This is a sentence for text feature example used in the CSV!". The other CSV rules apply as before: all columns should be well formatted and must not contain empty or NULL values.
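If you generate the CSV file from Python, the standard `csv` module can quote every field for you and escape embedded quotes by doubling them. This is a minimal sketch; the column names `text` and `sentiment` are hypothetical, and the empty-value check simply mirrors the "no empty or NULL values" rule above.

```python
import csv
import io

# Hypothetical example rows: one text feature and one label column.
rows = [
    {"text": 'This is a sentence, with a comma and "quotes".', "sentiment": "positive"},
    {"text": "Short and sad.", "sentiment": "negative"},
]

def to_csv(rows):
    """Serialize rows to CSV text with every field double-quoted."""
    # Refuse empty or missing values, since the platform does not accept them.
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                raise ValueError(f"empty value in column {column!r}")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        quoting=csv.QUOTE_ALL,  # wrap every field in double quotes
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)  # embedded quotes are escaped as "" by default
    return buffer.getvalue()

print(to_csv(rows))
```

Quoting every field is the simplest way to keep commas and line breaks inside a sentence from being read as column separators.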
Before saving the dataset version for text classification, set the language model that the text feature should be encoded with. Currently, we provide English, Swedish, and Finnish language models. The Sequence length parameter defines the maximum number of words used per sample: if a sample has more words, it is truncated; if it has fewer, it is padded.
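The effect of the Sequence length parameter can be illustrated with a small sketch. It assumes naive whitespace tokenization, padding at the end, and a hypothetical `<pad>` token; the platform's actual tokenizer and padding side may differ.

```python
def fit_to_sequence_length(text, sequence_length, pad_token="<pad>"):
    """Truncate or pad a sample to exactly `sequence_length` words."""
    words = text.split()  # naive whitespace tokenization (an assumption)
    if len(words) >= sequence_length:
        return words[:sequence_length]  # cap samples that are too long
    # Pad samples that are too short with a placeholder token.
    return words + [pad_token] * (sequence_length - len(words))

print(fit_to_sequence_length("I loved this film", 6))
# ['I', 'loved', 'this', 'film', '<pad>', '<pad>']
```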
Building a text classifier
Unlike images, text is a one-dimensional data type, so to make processing text easier, we have also added 1D convolution and pooling capabilities to the model builder. Most importantly, when using text as an input feature, make sure to add the new Text embedding (beta) block before the convolution filters. In the Text embedding (beta) block, you define how your input text is tokenized before it is fed to the neural network. You can use a randomly initialized embedding or one of the CC-licensed pre-trained FastText embeddings.
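To show why the embedding has to come before the 1D convolutions, here is a NumPy sketch of the data flow: token ids are looked up in a randomly initialized embedding table, a 1D convolution slides along the sequence axis, and global max pooling reduces the result to one value per filter. The vocabulary, sizes, and ReLU activation are illustrative assumptions, not the platform's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"<pad>": 0, "great": 1, "movie": 2, "boring": 3, "plot": 4}  # hypothetical vocabulary
embedding_dim, num_filters, kernel_size = 8, 4, 2

# Randomly initialized embedding table: one vector per vocabulary entry.
embeddings = rng.normal(size=(len(vocab), embedding_dim))
# 1D convolution weights with shape (kernel_size, embedding_dim, num_filters).
kernels = rng.normal(size=(kernel_size, embedding_dim, num_filters))

def encode(tokens):
    """Map tokens to embedding vectors: shape (seq_len, embedding_dim)."""
    ids = np.array([vocab[t] for t in tokens])
    return embeddings[ids]

def conv1d_valid(x, w):
    """'Valid' 1D convolution along the sequence axis, followed by ReLU."""
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, w.shape[2]))
    for i in range(out.shape[0]):
        window = x[i:i + k]  # k consecutive word vectors
        out[i] = np.einsum("ke,kef->f", window, w)
    return np.maximum(out, 0.0)

def global_max_pool(x):
    """Keep the strongest response of each filter over the whole sequence."""
    return x.max(axis=0)

x = encode(["great", "movie", "<pad>", "<pad>"])
features = global_max_pool(conv1d_valid(x, kernels))
print(features.shape)  # (4,)
```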
Testing the text classifier
To make it easier to quickly test your model on new sentences, we have provided a Text Classifier web app. Define your deployment URL, token, and input parameter, enter new text, and press ▶️ to predict!
Testing over cURL works the same way as for any other deployment; help is available in our Knowledge center.
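Outside the web app, the same kind of HTTP request can be built with Python's standard library. This is a sketch only: the URL, the token, the `rows` payload shape, the Bearer authorization header, and the `text` input name are assumptions for illustration, so check the deployment API help page in the Knowledge center for the actual request format.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real values from your Deployment view.
DEPLOYMENT_URL = "https://example.com/deployment/your-deployment-id/forward"
DEPLOYMENT_TOKEN = "your-deployment-token"

def build_prediction_request(url, token, rows):
    """Build (but do not send) a JSON POST request for a deployment."""
    # Assumed payload shape: a list of rows keyed by input parameter name.
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

request = build_prediction_request(
    DEPLOYMENT_URL, DEPLOYMENT_TOKEN, [{"text": "What a great movie!"}]
)
# To actually call the deployment:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```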
🔄 More transfer learning capabilities! We've now also added pretrained Resnetv2 weights, trained on ImageNet, for large-sized images. Resnetv2 is the recommended architecture for computer vision tasks such as image classification. If you've previously tried out the VGG network in our Car damage tutorial, you might find it interesting to see how the Resnetv2 network performs compared with VGG.
When using a pretrained snippet for transfer learning, make sure to keep the weights of the feature extraction layers locked (not trainable) and adjust the number of nodes in the last Dense layer to match the number of categories to predict.
✅⏯ Simplified navigation buttons added to the projects, datasets, experiments and deployments lists. Now you can easily perform the main actions like Run, Duplicate, Pause or navigate to the Evaluation graphs directly from the experiment in the experiments list. This makes iterating over different model ideas and evaluating their performance so much faster and smoother!
🔡 The order of the features in a feature set is now explicit and editable. When creating a new feature set, you can select the features you need and reorder them in the feature set window to best fit your model.
📚 More tutorials available! You can go through the basic steps of transfer learning with our image classification case in the Car damage tutorial. We've also added an example of solving a segmentation problem on the Peltarion platform in the Skin lesion detection tutorial. Try them out!
👩🏻🏫 Spread the AI knowledge through easy invites. Are you on our new free Community Edition offer? Community Edition comes with 50 free computing hours and 2 seats per organization, so you can invite a friend to the platform and collaborate on building new AI models. Go to the Members page and invite a friend using their email address. If you change your mind, you can remove the friend from your organization and invite someone else. Each invited friend also gets their own organization with a free computing quota!
🏋🏽 Transfer learning capabilities enabled! Building well-performing neural networks is a complex task that requires specific skills, knowledge, resources, and data. We've taken a leap towards reducing this complexity and helping you succeed with deep learning by adding our first Pretrained snippets. The first available pretrained snippet is the VGG feature extractor trained on 1.2M images from the ImageNet dataset.
Pretrained snippets are an extremely powerful feature with numerous benefits, such as reducing the time and skills needed to get started and lowering costs. They will enable many companies that don't own large datasets to get value in their specific domain, since pretrained networks have already learned basic representations of the data and can be trained further on a small domain-specific dataset. We have also grouped the deep neural network blocks to hide unnecessary complexity and fit the model on the canvas. You can always expand and collapse the groups to see what they consist of, and add blocks at the end to adjust the model's function.
How to use transfer learning with pretrained snippets
Follow these simple guidelines to get value from transfer learning in an image classification task:
Import a well-formatted, labeled image dataset with images of at least 32x32 px. As an example, you can use Sidekick to prepare and upload the HAM10000 dataset.
Create a new experiment and add a VGG feature extractor from the Pretrained snippets section. In the dialog window, make sure the “Trainable” setting is set to NO.
Set the input feature to your images.
Add 2 Dense blocks and a Target block. Make sure the number of target nodes matches the number of classes you need to predict, set the loss to Categorical crossentropy, and set the activation in the last layer to Softmax.
Define a suitable batch size so that your model fits in memory, and click Run!
Tip: To make the most of the pretrained weights, initially set all blocks to not trainable except for the last 2 blocks. These 2 blocks learn the class representations of your dataset. As your network succeeds during training, you can gradually duplicate the experiment and unlock more layers for training as you see fit.
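The idea behind setting "Trainable" to NO can be sketched in NumPy: features come from a fixed extractor whose weights are never updated, and only a small softmax head, with one output node per class, is trained with categorical crossentropy. A fixed random projection stands in for the pretrained VGG network here, and the toy data, learning rate, and step count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
num_classes, input_dim, feature_dim, num_samples = 3, 20, 16, 300

# Stand-in for a pretrained feature extractor (NOT VGG): fixed, never updated.
W_frozen = rng.normal(size=(input_dim, feature_dim)) / np.sqrt(input_dim)
W_frozen_before = W_frozen.copy()

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)

# Trainable head: the output size matches the number of classes to predict.
W_head = np.zeros((feature_dim, num_classes))
b_head = np.zeros(num_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(probs, y_onehot):
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))

# Toy labeled data: each class has its own mean, so the task is learnable.
labels = rng.integers(0, num_classes, size=num_samples)
centers = rng.normal(size=(num_classes, input_dim)) * 2
x = centers[labels] + rng.normal(size=(num_samples, input_dim))
y = np.eye(num_classes)[labels]

features = extract_features(x)  # computed once, since the extractor is frozen
losses = []
learning_rate = 0.02
for step in range(200):
    probs = softmax(features @ W_head + b_head)
    losses.append(categorical_crossentropy(probs, y))
    grad_logits = (probs - y) / num_samples  # gradient of the mean loss
    W_head -= learning_rate * features.T @ grad_logits
    b_head -= learning_rate * grad_logits.sum(axis=0)
```

Because only the head is updated, the expensive feature computation happens once; unlocking more layers later corresponds to also updating `W_frozen`.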
This is just the first step. Stay tuned for more pretrained snippets and helpful tutorials available soon!
🖌 The Dataset view has been redesigned!
Feature settings are now available in the Inspector panel and are shown when you select a specific feature. This way, all configurations are always available in the same familiar location, and there is more space for showing data examples in the dataset area. In the feature settings, you can edit the proposed label of the feature and choose an appropriate encoding for it, or leave it unencoded. The platform suggests suitable encoding options based on the data types of your features.
Missing the one-hot encoding option? It is now called Categorical and can be used for both text and numbers. For numeric features (including images and float tensors), standardization and min-max encoding are available.
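For reference, these are the textbook definitions of the two numeric encodings, sketched in NumPy. The exact statistics the platform uses (for example, population versus sample standard deviation, or how constant features are handled) are assumptions here.

```python
import numpy as np

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.zeros_like(values)  # constant feature: nothing to scale
    return (values - values.mean()) / std

def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

print(min_max([2.0, 4.0, 6.0]))  # [0.  0.5 1. ]
```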
Note that you no longer have to form feature sets to use features as input and target in the model. However, if your dataset has many features, we've added a Combined feature capability for your convenience. It lets you combine many features into a single combined feature, which makes it easier to use as the model input. All shape calculations are done automatically by the platform.
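One common way to combine features, shown here as an assumption rather than a description of how the platform does it internally, is to flatten each feature per sample and concatenate the results. The sketch makes the shape bookkeeping visible; the feature names are hypothetical.

```python
import numpy as np

def combine_features(*features):
    """Flatten each feature per sample and concatenate along the last axis."""
    num_samples = features[0].shape[0]
    flat = [f.reshape(num_samples, -1) for f in features]
    return np.concatenate(flat, axis=1)

# Hypothetical features for 2 samples.
age = np.array([[31.0], [45.0]])
income = np.array([[52000.0], [61000.0]])
history = np.zeros((2, 3, 4))  # a small float tensor per sample

combined = combine_features(age, income, history)
print(combined.shape)  # (2, 14): 1 + 1 + 3*4 values per sample
```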
⚽ Newly released Peltarion Platform Sidekick Beta project at your service!
Great news from the Peltarion Data Science/Machine Learning team warriors: we've released an open-source public library under the Apache 2.0 license to make your life easier while working on the end-to-end tasks of your AI project on the Platform. Sidekick helps you with two main tasks:
Preparing data in a suitable format for Platform ingestion.
Running predictions through the Platform REST API once you've deployed your trained model.
Feel free to pull, use, and contribute. Follow the guidelines in the README to get started quickly!
📚 Yay! The Knowledge center is alive and kicking. Damn good-looking, but there's more under the hood:
Focus on user experience. We want to help all our users become AI superheroes.
Findability. All articles are search engine optimized, and we’ve added a search capability.
Future-ready. Knowledge center will keep improving, always focusing on enabling all our users to do great stuff on our Platform.
🖇 The aggregation method for precision and recall in single-label multi-class problems has been changed from micro-averaging to macro-averaging.
For this type of problem, micro-averaging makes both precision and recall exactly equal to accuracy, so they provide no additional information about the model's performance. Macro-averaged precision and recall complement the overall accuracy, since they are low for models that perform well only on the common classes and poorly on the rare ones.
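A small worked example makes the difference concrete. In the sketch below, a hypothetical model always predicts the common class: micro-averaged precision and recall equal the accuracy (0.8), while the macro averages expose the poor performance on the rare classes. Treating a class that is never predicted as having precision 0 is a convention, assumed here.

```python
from collections import Counter

def precision_recall(y_true, y_pred, average):
    """Micro- or macro-averaged precision and recall for single-label data."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    if average == "micro":
        # Pool counts over all classes; every error is one FP and one FN.
        total_tp = sum(tp.values())
        return (total_tp / (total_tp + sum(fp.values())),
                total_tp / (total_tp + sum(fn.values())))
    if average == "macro":
        # Compute per-class scores, then take the unweighted mean.
        precisions = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in classes]
        recalls = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in classes]
        return sum(precisions) / len(classes), sum(recalls) / len(classes)
    raise ValueError(f"unknown average: {average}")

y_true = ["cat"] * 8 + ["dog", "bird"]
y_pred = ["cat"] * 10  # only ever predicts the common class

print(precision_recall(y_true, y_pred, "micro"))  # (0.8, 0.8), same as accuracy
print(precision_recall(y_true, y_pred, "macro"))  # much lower: about (0.27, 0.33)
```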
Support for multi-class classification models with higher-dimensional targets is now available on the platform.
Previously, each row in the dataset had to correspond to exactly one class. We now allow higher-dimensional targets, e.g., a target that is a vector of different classes, or a target that is an image with one class per pixel. This unlocks use cases such as multi-class semantic segmentation of images.
To train a multi-class target model, represent the target data as a NumPy array whose last axis holds the class label; this axis needs to be one-hot encoded before you import the data into the platform.
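For a segmentation mask stored as one integer class index per pixel, indexing an identity matrix is a compact way to produce the one-hot encoded last axis. The mask values and class meanings below are hypothetical.

```python
import numpy as np

num_classes = 3
# Integer class per pixel, shape (height, width).
# Hypothetical classes: 0 = background, 1 and 2 = two lesion types.
mask = np.array([[0, 0, 1],
                 [0, 2, 1]])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the mask appends a one-hot class axis at the end.
target = np.eye(num_classes, dtype=np.float32)[mask]
print(target.shape)  # (2, 3, 3): height, width, classes
```

The resulting array can then be written to disk, for example with `np.save`, as part of the dataset you upload.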
Visualizations for higher-dimensional targets are now available on the Evaluation page.
Previously, the metrics under Model evaluation were only computed when the target corresponded to exactly one class or exactly one numeric value. We now also provide the graphs for multi-dimensional targets, for example a vector of numeric values or a target image with one class per pixel.
For a classification problem with a multi-dimensional target, the confusion matrix is sampled to a maximum of 500 000 values. Each value in the confusion matrix corresponds to a vector element or a pixel in the target image, so the total number of values in the confusion matrix can be much larger than the number of samples in the dataset.
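The sketch below shows why a pixel-wise confusion matrix grows with image size rather than with the number of samples: every pixel contributes one count. Uniform random sampling without replacement is an assumed way of enforcing the value cap; the platform's actual sampling strategy is not described here.

```python
import numpy as np

def pixelwise_confusion_matrix(y_true, y_pred, num_classes, max_values=500_000, seed=0):
    """Rows are true classes, columns are predicted classes; one count per pixel."""
    true_flat = np.asarray(y_true).reshape(-1)
    pred_flat = np.asarray(y_pred).reshape(-1)
    if true_flat.size > max_values:
        # Assumed strategy: uniform sampling without replacement.
        rng = np.random.default_rng(seed)
        idx = rng.choice(true_flat.size, size=max_values, replace=False)
        true_flat, pred_flat = true_flat[idx], pred_flat[idx]
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(matrix, (true_flat, pred_flat), 1)  # unbuffered counting
    return matrix

# 4 hypothetical target images of 8x8 pixels with 3 classes.
rng = np.random.default_rng(0)
masks_true = rng.integers(0, 3, size=(4, 8, 8))
masks_pred = rng.integers(0, 3, size=(4, 8, 8))
print(pixelwise_confusion_matrix(masks_true, masks_pred, 3).sum())  # 256 values from only 4 samples
```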
For a regression problem with a multi-dimensional target, each dot in the scatter plot represents an element of the target vector or a pixel value in the target image. For visibility reasons, the scatter plot is sampled to show a maximum of 500 data points. The error distribution plot is based on 5000 sampled values.
Minimap is dead, RIP minimap!
Instead, say hello to zooming on the Modeling canvas. We know you've been longing for this, so we've introduced zooming to the model builder and added some basic keyboard shortcuts, like Cmd/Ctrl+A, Cmd/Ctrl+C, and Cmd/Ctrl+V for quick block selection and copy/paste. Option/Alt+click, hold, and drag still works for panning around the whole model canvas.
The Blocks and Settings tabs have had a facelift
Both dataset and runtime settings are now defined on the Settings tab in the Inspector. When selecting a block on the Modeling canvas, you can adjust the block parameters in the Blocks tab. When you Shift+Click to select more than one block, you can change their common settings together!
All error and warning messages are now collected in the Information center pop-up
The Information center pop-up is located in the lower left corner of the Modeling canvas. Clicking an error message takes you directly to the problematic area, so you can solve issues with just a click or two!
Quick overview of the Running jobs queue on your organization's Projects page
Anyone in your team can now get a better overview of who's training which model and where the GPU hours are spent. If all GPUs are busy, newly initiated jobs appear with Queued status, and recently completed or paused experiments are listed as Trained experiments.
Note that if you select a specific project in the Projects list, you will get to see the actual GPU and storage usage as well as Running jobs queue for this specific project!
Notification reminder a few days before your quota plan expires
Make sure to run your experiments in time, and contact us at email@example.com to extend your payment plan. After your quota plan has expired, you can still view, access, and delete your data, experiments, and deployments for 90 days.
Binary prediction labels are flipped for the confusion matrix on the Evaluation view.
When you are solving a binary classification problem, our computation engine now assigns 1 to positive and 0 to negative predictions, which makes the confusion matrix intuitive to read when evaluating model performance.
Data for already trained experiments is not recalculated. However, when you duplicate or resume model training, the confusion matrix will show flipped values for epochs from before and after the resume. If the model has epoch checkpoints saved both before and after the change, checkpoints from before the change show the confusion matrix with the 0 and 1 labels switched, while checkpoints from after the change show it with the correct labels.
This does not change how precision, recall, AUC and binary crossentropy are computed, so the experiment values are still comparable from before and after this change!
R² computation improved for regression problems.
Previously, we had to compute R² separately for each batch. We've changed our metrics library to compute the total sum of squares and the residual sum of squares separately, accumulated over batches. This means the resulting R² will more closely resemble the value you would get by computing it once over the entire dataset. This change does not affect other regression metrics, such as MSE, RMSE, MAE, and MAPE, nor does it affect classification metrics.
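The difference between averaging per-batch R² and accumulating the sums of squares can be shown with a small NumPy sketch. The streaming class is one possible way to accumulate the sums (running totals of y, y², and squared residuals), assumed for illustration rather than taken from our metrics library. With sorted targets, each batch has little variance of its own, so per-batch R² is badly pessimistic, while the accumulated version matches the whole-dataset value.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, computed over the given arrays."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

class StreamingR2:
    """Accumulate the sums needed for SS_res and SS_tot across batches."""

    def __init__(self):
        self.n = 0
        self.sum_y = 0.0
        self.sum_y2 = 0.0
        self.ss_res = 0.0

    def update(self, y_true, y_pred):
        self.n += y_true.size
        self.sum_y += float(np.sum(y_true))
        self.sum_y2 += float(np.sum(y_true ** 2))
        self.ss_res += float(np.sum((y_true - y_pred) ** 2))

    def result(self):
        # SS_tot = sum(y^2) - (sum(y))^2 / n, i.e. deviations from the global mean.
        ss_tot = self.sum_y2 - self.sum_y ** 2 / self.n
        return 1.0 - self.ss_res / ss_tot

y_true = np.arange(100, dtype=float)
y_pred = y_true + 5.0 * np.sin(y_true)  # deterministic prediction errors

streaming = StreamingR2()
batch_scores = []
for start in range(0, 100, 10):
    t, p = y_true[start:start + 10], y_pred[start:start + 10]
    streaming.update(t, p)
    batch_scores.append(r2_score(t, p))
per_batch_mean = float(np.mean(batch_scores))

print(r2_score(y_true, y_pred), streaming.result(), per_batch_mean)
```

The `sum_y2 - sum_y**2 / n` form is simple but can lose precision for large values with small variance; a running-mean formulation such as Welford's algorithm is more robust.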
The compiler option for “Data access seed” in the “Setup and run” dialog is now randomized for each experiment, both when creating a completely new experiment and when duplicating a previous experiment.
The data access seed controls the order in which data is accessed during training. Randomizing the seed makes experiments independent of each other, since data is accessed in a different order in each experiment. This is the desired behavior when comparing performance between models and runs.
Note that to achieve deterministic training behavior, you can still manually set the same seed across experiments.
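The role of the seed can be illustrated with Python's `random` module: the same seed always yields the same data access order, while a freshly drawn seed gives each new experiment its own order. The function names and the seed range are hypothetical; this is not the platform's implementation.

```python
import random

def data_access_order(num_samples, seed):
    """Return the order in which sample indices are visited for a given seed."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)  # seeded, reproducible shuffle
    return order

def new_experiment_seed():
    """Draw a fresh random seed, as when a new experiment is created."""
    return random.SystemRandom().randrange(2**31)

# Same seed, same order: useful for deterministic comparisons.
assert data_access_order(10, 123) == data_access_order(10, 123)
print(data_access_order(10, new_experiment_seed()))
```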
New set of deep neural network snippets available. We have added a handy list of well-known, well-performing networks to the Snippets panel to help you get started, including Resnet, Densenet, Inception, Tiramisu, and more. Check out each snippet's tooltip, find the one best suited to your problem type and input data, add it to the modeling canvas, and start experimenting!
Beware that the snippets are currently not pretrained, and training a deep network on a large dataset may consume a significant amount of GPU time.
Parameters, blocks, and settings panels on the Dataset, Modeling, and Evaluation views are now collapsible! When working with large datasets with many features and deep models with many layers, it's helpful to have more space for exploring data and building models. We've also added a toolbar above the working area in each view so you always find the necessary buttons in the same place. Note that some buttons have moved from their usual location to the upper right corner.
The calculation method for model performance metrics on the Evaluation view has changed. Previously, the metrics for regression problems were calculated on normalized data; they are now calculated on denormalized data.
This affects the following metrics: MSE, MAE, MAPE, MSLE.
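A small example shows why the values change. Assuming the target was standardized with its mean and standard deviation, MSE on normalized data is the denormalized MSE divided by the variance, and MAE is divided by the standard deviation; MAPE cannot be recovered by any rescaling, because normalization shifts the values it divides by. The data below is made up for illustration.

```python
import numpy as np

y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

# Assumed normalization: standardize with the target's mean and std.
mean, std = y_true.mean(), y_true.std()

def normalize(values):
    return (values - mean) / std

def mse(t, p):
    return np.mean((t - p) ** 2)

def mae(t, p):
    return np.mean(np.abs(t - p))

def mape(t, p):
    return 100.0 * np.mean(np.abs((t - p) / t))

mse_denorm, mse_norm = mse(y_true, y_pred), mse(normalize(y_true), normalize(y_pred))
mae_denorm, mae_norm = mae(y_true, y_pred), mae(normalize(y_true), normalize(y_pred))
mape_denorm, mape_norm = mape(y_true, y_pred), mape(normalize(y_true), normalize(y_pred))

print(mse_denorm)  # 100.0, in the squared units of the original target
print(mse_norm, mae_denorm, mae_norm, mape_denorm, mape_norm)
```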
Note that if you resume training of a paused experiment after this change, the metrics graph may look irregular at the point where training resumed.
If you have training processes running, the metrics for those will continue to be calculated on normalized data.
Historical metric values for completed experiments are not changed. This change does not affect the experiment loss. We've also added a few new metrics: RMSE and Gradient norm!
More help and guidance for model serving through the deployment API is now available. The OpenAPI doc can be downloaded directly from the Deployment view, and we've added a link to the deployment API help page with code snippets in the Knowledge center. Check out how to call the deployment API directly from the terminal or a Python notebook.
Projects list now has search and filter capability! Search for your own or other team members' projects without scrolling through long lists!
Tagging capability now available for experiments lists. Add tags to your experiments to quickly identify, search and filter specific experiments. Tags are inherited when an experiment is duplicated.
Organization members list now available with purchased quota plan information and membership management capabilities for administrators. Invite co-workers to collaborate or remove accounts that are obsolete with a few clicks.
New deployment solution released with persistent deployment. No more 48-hour limitation! Create a new persistent deployment, choose a suitable experiment checkpoint and enable the deployment for API calls! Once obsolete, disable the deployment to save resources.
Major redesign of Evaluation view graphs with additional context-specific performance metrics (for classification problems) and more loss functions (for regression problems) published during training.
Easier graph settings with wall time capability now available.
Consistent search and filter of experiments by name, creator, loss, and experiment status now also on Evaluation view.
Quick navigation links between Modeling, Evaluation and Deployment views added for each experiment.
Improved user experience during statistics calculations. Histograms and other feature statistics are updated incrementally while being calculated.
Vastly improved the time from uploading a dataset to when it’s available for training, including faster statistics calculation and saving the dataset version.
Advanced optimizer parameters are now available in the experiment settings panel.
Dataset, training, and validation subsets pre-selection heuristics added for faster new experiment definition.
Dataset statistics for features are now available.
Groups and selections renamed to feature sets and subsets for clarity.
Copy with weights for transfer learning with selected blocks.
Updated tutorials for datasets and modeling.
Improved accuracy calculation for model evaluation.
New search and filter of experiments by name, creator, loss and experiment status.
Modeling view redesign.
Filtered list of experiments on Deployment view.
Go to experiment from Deployment view.