Visualize and manage your outliers on the platform

November 9, 2021 / 6 min read

Data cleaning. Data cleaning. Data cleaning. You know this, and I don't have to tell you that the quality of your training and validation datasets is key. What I do want to tell you is that we have added Outlier handling capabilities to the platform. Now, how about that?

So, what are outliers and why is it important to watch out for them?

An outlier is a datapoint in your dataset that, for any reason, has a value far from the main group. Outliers are not uncommon. They can appear, for example, when a value is entered incorrectly by mistake, when it is collected under unusual circumstances, or simply as a result of natural variation. Either way, an outlier is a value in your dataset that does not represent the norm.

For a model to perform well, the quality of the data it is trained on is of utmost importance. Outliers can distort what the model learns from the main group of values, and you might end up with a badly performing model.

Ok. Now how can I handle outlier values on the Peltarion platform?

Glad you asked.

The Outlier handling option is available for numerical data inputs. It presents histogram views of your data so that you can identify potential outliers, see the number of outliers per feature, and manage them by setting a valid value range (min and max) for each feature. Values outside the set range will be removed from the dataset version.

The feature can easily be found via the Data cleaning tab in the Datasets view. Choose the dataset features to manage, select an appropriate range for each, and click the Apply changes button to remove the outliers from your sample.
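If you like to think in code, the idea behind setting a valid range can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the platform's implementation: the function names, the example features and values, the inclusive range bounds, and the choice to drop the whole example when one value is out of range are all assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]
Ranges = Dict[str, Tuple[float, float]]


def count_outliers(rows: List[Row], ranges: Ranges) -> Dict[str, int]:
    """Count, per feature, how many values fall outside its (min, max) range."""
    counts = {feature: 0 for feature in ranges}
    for row in rows:
        for feature, (low, high) in ranges.items():
            # Assumption: the range bounds themselves count as valid values.
            if not low <= row[feature] <= high:
                counts[feature] += 1
    return counts


def remove_outliers(rows: List[Row], ranges: Ranges) -> List[Row]:
    """Keep only the rows where every ranged feature is inside its range."""
    # Assumption: one out-of-range value removes the whole example.
    return [
        row
        for row in rows
        if all(low <= row[feature] <= high for feature, (low, high) in ranges.items())
    ]


# Hypothetical dataset with two numerical features.
rows = [
    {"age": 34, "income": 52000},
    {"age": 29, "income": 48000},
    {"age": 290, "income": 51000},   # probably entered incorrectly
    {"age": 41, "income": 9900000},  # far from the main group
    {"age": 38, "income": 61000},
]

# Hypothetical valid ranges (min, max) chosen after looking at the histograms.
valid_ranges = {"age": (0, 120), "income": (0, 500000)}

outlier_counts = count_outliers(rows, valid_ranges)
cleaned_rows = remove_outliers(rows, valid_ranges)

print(outlier_counts)
print(len(cleaned_rows), "of", len(rows), "rows kept")
```

Counting before removing mirrors the workflow described above: first see how many values each range would exclude, then apply the change.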

Sounds interesting?

You can learn more about how to handle outliers on the Peltarion platform by visiting this page.

Happy data cleaning!

  • Susanne Björkman

    Product Marketing Manager

    Susanne Björkman is part of the commercial team at Peltarion, where she is Product Marketing Manager. She is passionate about data-driven insights, user experience and product development, and comes from a professional background in Enterprise Cloud Data Management and Analytics.
