Import files and data sources to the Platform

The Peltarion Platform makes it easy to import files and data sources, combine them, and prepare them for experiments.

Requirements on imported datasets

Data preprocessing

ETL (Extract, Transform, Load) must be done outside the Platform.

Dataset file size

The maximum size of an uploaded file is 2 GB.

Note
There is no size limit if you import a dataset file from a URL.

Data formats supported by the Peltarion Platform

File formats supported for data import:

Csv file specifications

The csv file is a tabular, comma-separated data file with a header row. The supported encoding is UTF-8, so data containing special characters must be UTF-8 encoded. Find out more in known issues.

Every row of the csv file must have the same number of columns.

The data needs to be well-formatted, that is, the data type in each column is consistent and no column contains empty or null values.

Example: A list of traffic at a specific location at a specific date and time.
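The csv requirements above (UTF-8 encoding, a constant number of columns, no empty values) can be checked locally before you upload. The following is a minimal sketch using only the Python standard library. The function name check_csv and the traffic column names are illustrative, and the sketch does not verify that the data type in each column is consistent.

```python
import csv
import io


def check_csv(raw):
    """Return a list of problems found in the raw bytes of a csv file."""
    try:
        text = raw.decode("utf-8")  # the supported encoding is UTF-8
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    records = list(csv.reader(io.StringIO(text, newline="")))
    if not records:
        return ["file is empty"]

    problems = []
    header = records[0]
    # Record 1 is the header, so data records are numbered from 2.
    for number, record in enumerate(records[1:], start=2):
        if len(record) != len(header):
            problems.append(
                f"record {number}: expected {len(header)} columns, got {len(record)}"
            )
        elif any(cell.strip() == "" for cell in record):
            problems.append(f"record {number}: empty value")
    return problems


good = "time,location,vehicles\n2019-01-01 08:00,A1,120\n".encode("utf-8")
bad = "time,location,vehicles\n2019-01-01 08:00,A1\n2019-01-01 09:00,,95\n".encode("utf-8")
print(check_csv(good))
print(check_csv(bad))
```

An empty result list means none of the checked problems were found. It does not guarantee that the Platform accepts the file.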

Supported subtypes

  • f4, 32-bit floating-point number

  • string

  • integer

  • text (paragraphs of words and sentences, including numbers, punctuation, etc.)

Csv file limitations

  • One-hot encoding categories: 2 000

  • Maximum number of rows: 10 000 000

  • Maximum number of columns: 5 000
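A file can be checked against these limits before import. The sketch below rests on two assumptions that the list above does not confirm: that the row limit counts data rows without the header, and that the one-hot encoding limit applies to the number of distinct values in each column you intend to one-hot encode. The function check_limits and its parameters are hypothetical names.

```python
import csv
import io

# Limits quoted in the list above.
MAX_ROWS = 10_000_000
MAX_COLUMNS = 5_000
MAX_CATEGORIES = 2_000


def check_limits(csv_file, categorical_columns=(), max_rows=MAX_ROWS,
                 max_columns=MAX_COLUMNS, max_categories=MAX_CATEGORIES):
    """Return a list of limit violations for an open csv file with a header.

    categorical_columns names the columns you plan to one-hot encode
    (an assumption of this sketch); only those are counted for categories.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    distinct = {name: set() for name in categorical_columns}
    row_count = 0
    for row in reader:  # streams the file, so large files are not held in memory
        row_count += 1
        for name in distinct:
            distinct[name].add(row.get(name))

    problems = []
    if row_count > max_rows:
        problems.append(f"{row_count} rows exceeds the limit of {max_rows}")
    if len(columns) > max_columns:
        problems.append(f"{len(columns)} columns exceeds the limit of {max_columns}")
    for name, values in distinct.items():
        if len(values) > max_categories:
            problems.append(
                f"column {name!r}: {len(values)} categories exceeds the limit of {max_categories}"
            )
    return problems


sample = "city,weekday\nOslo,Mon\nOslo,Tue\nBergen,Wed\n"
print(check_limits(io.StringIO(sample, newline=""), ["weekday"]))
```

For a file on disk, pass a handle opened with open(path, newline="", encoding="utf-8") instead of the in-memory sample.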

Npy file specifications

The npy file is a binary file format that stores a single NumPy array. NumPy is a library for the Python programming language.

Supported subtypes

  • f4, 32-bit floating-point number

Upload

An npy file can be used to import images, in which case each pixel is represented as a floating-point number.

In a raw npy file, samples are assumed to be arranged along the first dimension. For example, if the npy array has dimensions (1000, 20, 10, 3), the platform will treat it as 1000 separate tensors with height 20, width 10, and 3 channels.

Little-endian byte order is supported. Fortran order is not supported.
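The format requirements above (f4 values, little-endian byte order, no Fortran order, samples along the first dimension) can be met with standard NumPy calls. This is a minimal sketch, assuming NumPy is installed. The helper name to_platform_npy is illustrative, and the random array only stands in for real image data.

```python
import io

import numpy as np


def to_platform_npy(array):
    """Serialize an array as npy bytes: float32, little-endian, C order."""
    # "<f4" is NumPy's code for a little-endian 32-bit float.
    # ascontiguousarray forces C (row-major) order instead of Fortran order.
    prepared = np.ascontiguousarray(array, dtype="<f4")
    buffer = io.BytesIO()
    np.save(buffer, prepared)
    return buffer.getvalue()


# 1000 samples of height 20, width 10, and 3 channels, samples on the first axis.
# The input is deliberately float64 in Fortran order to show the conversion.
images = np.asfortranarray(np.random.rand(1000, 20, 10, 3))
loaded = np.load(io.BytesIO(to_platform_npy(images)))
print(loaded.shape, loaded.dtype, loaded.flags["C_CONTIGUOUS"])
```

To write a file for upload rather than an in-memory buffer, np.save also accepts a file path as its first argument.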

The platform supports multi-class classification for targets of higher dimensionality, e.g., a target that is a vector of different classes, or a target that is an image with one class per pixel.

In this kind of classification problem, the target data is represented by a NumPy array, where the last axis is interpreted as the class label and needs to be one-hot encoded before importing into the platform.
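One way to produce such a target is to start from integer class labels and append a one-hot encoded last axis. The sketch below assumes NumPy is installed and that classes are numbered from 0. The function name one_hot_last_axis and the tiny example masks are illustrative.

```python
import numpy as np


def one_hot_last_axis(labels, num_classes):
    """Turn integer class labels into a float32 one-hot array.

    A new last axis of length num_classes is appended, so labels of shape
    (samples, height, width) become (samples, height, width, num_classes).
    """
    labels = np.asarray(labels)
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError("labels must lie in the range 0..num_classes-1")
    # Indexing the identity matrix with the labels picks one one-hot row per label.
    return np.eye(num_classes, dtype=np.float32)[labels]


# Two 2x2 segmentation masks with 3 possible classes per pixel.
masks = np.array([[[0, 1], [2, 1]],
                  [[2, 2], [0, 0]]])
encoded = one_hot_last_axis(masks, num_classes=3)
print(encoded.shape)
```

The result is already float32, so it can be saved as an npy file of subtype f4.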

Visualization on the Datasets view

Visualization of npy files depends on the number of dimensions. Find out more in known issues.

Example: A 10x10 pixel grayscale image will consist of 100 floating-point numbers. If the image is a 10x10 pixel RGB image it will consist of 300 floating-point numbers.

Png file specifications

Supported subtypes

  • 24 bit, 256x256 pixel

Note
When png files are imported into the Platform they are converted to npy.

Zip file specifications

The zip file is used to bundle image or NumPy files together. Supported formats for the individual files are png, jpg, and npy.

The zip file must also contain a csv file with at least one column with one row per image containing the relative path to each image. The csv file may also contain other columns which will be handled the same way as in a standalone csv file.

The name of the csv file does not matter as long as the zip file contains a single csv file. If the zip file contains multiple csv files, you must name the one intended to be used as index.csv to remove ambiguity.

All images or NumPy files must be of the same format and have the same dimensions.

Example:

File structure illustration
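A zip file with this layout can be assembled with the Python standard library. The following is a minimal sketch: the function name build_zip, the column names image and label, and the images/ folder are illustrative choices, not names the Platform requires. The index file is called index.csv so that it stays unambiguous even if other csv files are added later.

```python
import csv
import io
import zipfile


def build_zip(images, labels):
    """Return zip bytes holding the image files plus an index.csv.

    images: dict mapping a relative path inside the zip to the file's bytes.
    labels: dict mapping the same relative paths to a label value.
    """
    index = io.StringIO(newline="")
    writer = csv.writer(index)
    writer.writerow(["image", "label"])  # illustrative column names
    for path in images:
        # One row per image, holding the relative path to that image.
        writer.writerow([path, labels[path]])

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, data in images.items():
            zf.writestr(path, data)
        zf.writestr("index.csv", index.getvalue())
    return archive.getvalue()


# Placeholder bytes stand in for real png data in this sketch.
payload = build_zip(
    {"images/a.png": b"png-bytes-a", "images/b.png": b"png-bytes-b"},
    {"images/a.png": "cat", "images/b.png": "dog"},
)
with zipfile.ZipFile(io.BytesIO(payload)) as zf:
    names = zf.namelist()
    index_text = zf.read("index.csv").decode("utf-8")
print(names)
print(index_text)
```

In practice you would read the bytes of real image files, all with the same format and dimensions, and write the returned bytes to a .zip file.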

Text file specifications

Datasets with text for sentiment analysis need to include:

  • 1 column with text to be used as input

  • 1 column with the label to be used as the predicted category in the Target block.

Max length of text

The maximum length of text in a single cell is 20480 characters.

Use quotes "" for optimal formatting

Enclose the text in double quotes, and write each double quote inside the text as two double quotes, for optimal formatting.
Example: "This is a text with a ""quoted string"" easy peasy"

It works with new lines as well.
Example: "This is a text
with a new line."
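Python's csv module applies this quoting convention for you. The sketch below writes every field in double quotes and doubles any embedded double quote. The helper name to_csv_text, the column names, and the label values are illustrative.

```python
import csv
import io


def to_csv_text(rows):
    """Write rows as csv text with every field quoted."""
    buffer = io.StringIO(newline="")
    # QUOTE_ALL wraps each field in double quotes; the writer's default
    # doublequote=True turns an embedded " into "".
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ["text", "label"],
    ['This is a text with a "quoted string" easy peasy', "positive"],
    ["This is a text\nwith a new line.", "neutral"],
]
csv_text = to_csv_text(rows)
print(csv_text)

# Reading the text back recovers the original cells, including the line break.
round_trip = list(csv.reader(io.StringIO(csv_text, newline="")))
```

When writing to a file instead, open it with newline="" and encoding="utf-8" so that line breaks inside quoted cells are preserved as written.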

How to import a file or a data source

To import a file or data source, you can either upload it or import it via a URL.

Upload files

In the Upload files tab, you can either drag and drop the file(s) to the drop area or click Choose files to upload the file(s) from your computer.

Import files

In the Import files tab, enter the URL of a public data file in the URL field. This is recommended for files larger than 2 GB.

Combining data types into one dataset

You can create rich datasets by combining different data types into a joint dataset. For example, you can join labelled image data with additional features.

When importing multiple files or file archives into the same dataset, the examples are assumed to be in the same order and joined automatically.
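Because the join relies on order, row n of one file is paired with row n of the other. The sketch below illustrates that idea on two small csv texts. It is not the Platform's implementation, and the function name join_by_position and the example columns are hypothetical.

```python
import csv
import io


def join_by_position(csv_a, csv_b):
    """Join two csv texts row by row, pairing the n-th rows of each."""
    rows_a = list(csv.reader(io.StringIO(csv_a, newline="")))
    rows_b = list(csv.reader(io.StringIO(csv_b, newline="")))
    if len(rows_a) != len(rows_b):
        raise ValueError(f"row counts differ: {len(rows_a)} vs {len(rows_b)}")
    # Rows are matched purely by position, so both inputs must share one order.
    return [a + b for a, b in zip(rows_a, rows_b)]


labels = "image,label\nimages/a.png,cat\nimages/b.png,dog\n"
features = "weight_kg\n4.2\n11.5\n"
joined = join_by_position(labels, features)
for row in joined:
    print(row)
```

If one file is sorted differently from the other, a positional join silently pairs the wrong examples, so it is worth confirming the order and row counts of all files before importing them into the same dataset.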