Data cleaning. Data cleaning. Data cleaning. You know this, and I don't have to tell you that the quality of your training and validation datasets is key. What I do want to tell you is that we have added Outlier handling capabilities to the platform. Now, how about that?
Visualize and manage your outliers on the platform
So, what are outliers, and why is it important to stay alert to them?
An outlier is a datapoint in your dataset whose value, for whatever reason, lies far from the main group. Outliers are not uncommon. They can appear, for example, when a value was entered incorrectly, when it was collected under unusual circumstances, or simply as a result of natural variation. Either way, the outlier is a value in your dataset that does not represent the norm.
For a model to perform well, the quality of the data it is trained on is of utmost importance. Outliers can distort what the model learns from the data, and you might end up with a badly performing model.
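To make that concrete, here is a minimal sketch in plain Python of how a single mistyped value can skew a basic statistic. The numbers are made up for illustration (350 stands in for a typo of 35), and the helper name summarize is hypothetical. It is not part of the platform.

```python
from statistics import mean, median

# Made-up ages for illustration; 350 stands in for a mistyped 35.
ages = [31, 35, 29, 40, 33, 350]

# The same data with the suspicious entry left out.
clean = [a for a in ages if a != 350]

def summarize(values):
    """Return the mean and median of a list of numbers."""
    return {"mean": mean(values), "median": median(values)}

with_outlier = summarize(ages)
without_outlier = summarize(clean)

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

The mean moves a long way because of one bad entry, while the median barely changes. A model fitted to such a feature can be pulled off course in a similar way.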
Ok. Now how can I handle outlier values on the Peltarion platform?
Glad you asked.
The Outlier handling option is available for numerical data inputs. It shows you histogram views of your data, so you can identify potential outliers, see the number of outliers per feature, and manage them by setting a valid value range (min and max). Values outside of the set range will be removed from the dataset version.
The feature can easily be found via the Data cleaning tab on the Datasets view. Choose the dataset features to manage, select an appropriate range for each, and click on the Apply changes button to remove the outliers from your sample.
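If you want a feel for what a min/max range does before you try it on the platform, the idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the platform's implementation: the function names, the sample rows and the ranges are all hypothetical, the bounds are assumed to be inclusive, and a row is assumed to be dropped as a whole when any of its values falls outside its range.

```python
def count_outliers(rows, ranges):
    """Count, per feature, how many values fall outside [min, max]."""
    counts = {}
    for feature, (low, high) in ranges.items():
        counts[feature] = sum(
            1 for row in rows if not low <= row[feature] <= high
        )
    return counts

def remove_outliers(rows, ranges):
    """Keep only the rows whose values are inside every feature's range.

    Assumption: bounds are inclusive and the whole row is dropped.
    """
    return [
        row for row in rows
        if all(low <= row[feature] <= high
               for feature, (low, high) in ranges.items())
    ]

# Made-up sample data for illustration.
rows = [
    {"age": 31, "income": 42000},
    {"age": 350, "income": 39000},  # age far outside a plausible range
    {"age": 29, "income": -5},      # negative income
    {"age": 40, "income": 51000},
]

# Hypothetical valid ranges (min, max) per feature.
ranges = {"age": (0, 120), "income": (0, 1_000_000)}

outlier_counts = count_outliers(rows, ranges)
cleaned = remove_outliers(rows, ranges)

print(outlier_counts)
print(cleaned)
```

Counting before removing mirrors the workflow described above: first see how many values each range would flag, then apply the change.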

Sounds interesting?
You can learn more about how to handle outliers on the Peltarion platform by visiting this page.
Happy data cleaning!