The Peltarion Platform makes it easy to import files and data sources, combine them, and prepare them for experiments.
ETL (Extract, Transform, Load) must be done outside the Platform.
Dataset file size
The maximum size of an uploaded file is 2 GB.
There is no size limit if you import a dataset file from a URL.
The csv file is a tabular, comma-separated data file with headers. The supported encoding is UTF-8. Data containing special characters is supported only in UTF-8-encoded files. Find out more in known issues.
The csv file must contain a constant number of columns.
The data needs to be well-formatted, that is, the data type in each column must be consistent and no column may contain empty or null values.
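The requirements above can be checked before import. The following is a minimal sketch, not part of the Platform: the function names, the set of strings treated as null, and the number/text distinction used for "consistent data type" are all assumptions made for illustration.

```python
import csv
import io

# Assumption: these spellings count as "null"; adjust to match your data.
NULL_MARKERS = {"", "null", "none", "nan"}


def is_number(value):
    """Return True if the text can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_csv(lines):
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.reader(lines)
    header = next(reader)
    # Per column: "number" or "text", learned from the first non-null value.
    kinds = [None] * len(header)
    problems = []
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} columns, got {len(row)}"
            )
            continue
        for i, value in enumerate(row):
            if value.strip().lower() in NULL_MARKERS:
                problems.append(
                    f"row {row_number}: empty or null value in column '{header[i]}'"
                )
                continue
            kind = "number" if is_number(value) else "text"
            if kinds[i] is None:
                kinds[i] = kind
            elif kinds[i] != kind:
                problems.append(
                    f"row {row_number}: column '{header[i]}' mixes {kinds[i]} and {kind}"
                )
    return problems


# Hypothetical traffic data with one missing value in the last row.
sample = io.StringIO(
    "time,location,vehicles\n"
    "2019-05-01 08:00,Main St,12\n"
    "2019-05-01 09:00,Main St,\n"
)
print(check_csv(sample))
```

To check a file on disk, pass an open file instead, for example open("traffic.csv", encoding="utf-8", newline=""), where the file name is hypothetical. Opening with encoding="utf-8" also raises UnicodeDecodeError while reading if the file is not valid UTF-8.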
Example: A list of traffic at a specific location on a specific date and time.
f4, 32-bit floating-point number
Csv file limitations
Categorical encoding categories: 2 000
Maximum number of rows: 10 000 000
Maximum number of columns: 5 000
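The limits above can be verified locally before an upload. This is a rough sketch under stated assumptions: the row limit is taken to count data rows (excluding the header), and the category limit is taken to mean the number of distinct values in a column that will be categorically encoded. The function and parameter names are hypothetical.

```python
import csv
import io

MAX_ROWS = 10_000_000
MAX_COLUMNS = 5_000
MAX_CATEGORIES = 2_000


def check_csv_limits(lines, categorical_columns=(), max_rows=MAX_ROWS,
                     max_columns=MAX_COLUMNS, max_categories=MAX_CATEGORIES):
    """Return a list of exceeded limits; an empty list means none were exceeded."""
    reader = csv.DictReader(lines)
    column_count = len(reader.fieldnames or [])
    # Collect the distinct values of each column that will be categorically encoded.
    distinct = {name: set() for name in categorical_columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name, values in distinct.items():
            values.add(row[name])

    problems = []
    if row_count > max_rows:
        problems.append(f"{row_count} rows exceed the limit of {max_rows}")
    if column_count > max_columns:
        problems.append(f"{column_count} columns exceed the limit of {max_columns}")
    for name, values in distinct.items():
        if len(values) > max_categories:
            problems.append(
                f"column '{name}' has {len(values)} categories, limit is {max_categories}"
            )
    return problems


# Hypothetical data: 'weekday' is the column to be categorically encoded.
data = io.StringIO("weekday,vehicles\nmon,12\ntue,15\nmon,9\n")
print(check_csv_limits(data, categorical_columns=["weekday"]))
```

The file is streamed row by row, so the check does not need to hold a large csv file in memory.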
The npy file is a binary file format that stores a single NumPy array. NumPy is a library for the Python programming language.
f4, 32-bit floating-point number
The npy file can be used to import images, in which case each pixel is represented as a floating-point number.
In a raw npy file, samples are assumed to be arranged along the first dimension. For example, if the npy array has dimensions (1000, 20, 10, 3), the platform will treat it as 1000 separate tensors with height 20, width 10, and 3 channels.
Little-endian byte order is supported. Fortran order is not supported.
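An array can be converted to match these constraints before it is saved. The sketch below uses standard NumPy calls; the function name save_for_import, the file name, and the random sample data are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_for_import(array, path):
    """Save an array as 32-bit little-endian floats in C (row-major) order."""
    # "<f4" is little-endian float32; ascontiguousarray removes Fortran order.
    prepared = np.ascontiguousarray(array, dtype="<f4")
    np.save(path, prepared)
    return prepared


# Hypothetical data: 5 RGB images, 20 pixels high and 10 wide, deliberately
# created as float64 in Fortran order to show the conversion.
images = np.asfortranarray(np.random.rand(5, 20, 10, 3))

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "images.npy")
    save_for_import(images, path)
    loaded = np.load(path)

# Samples lie along the first dimension: 5 tensors of shape (20, 10, 3).
print(loaded.shape, loaded.dtype, loaded.flags["C_CONTIGUOUS"])
```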
The platform supports multi-class classification for targets of higher dimensionality, e.g., a target that is a vector of different classes, or a target that is an image with one class per pixel.
In this kind of classification problem the target data is represented by a NumPy array, where the last axis is interpreted as the class label and needs to be categorically encoded before importing into the platform.
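As an illustration, a per-pixel target can be encoded along a new last axis as follows. This sketch assumes that "categorically encoded" means one-hot encoded and that the class labels are integers starting at 0; the function name and the sample mask are hypothetical.

```python
import numpy as np


def one_hot_last_axis(labels, num_classes):
    """One-hot encode integer class labels along a new last axis."""
    labels = np.asarray(labels, dtype=np.int64)
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError("labels must lie in the range 0..num_classes-1")
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="<f4")[labels]


# Hypothetical target: one 2x2 image with one class (0, 1 or 2) per pixel.
mask = np.array([[[0, 2],
                  [1, 1]]])
encoded = one_hot_last_axis(mask, num_classes=3)
print(encoded.shape)     # (1, 2, 2, 3)
print(encoded[0, 0, 1])  # one-hot vector for class 2
```

The result keeps the samples along the first dimension and uses 32-bit floats, so it can be saved as an npy file in the same way as the input data.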
Visualization on the Datasets view
Visualization of npy files depends on the number of dimensions. Find out more in known issues.
Example: A 10x10 pixel grayscale image will consist of 100 floating-point numbers. If the image is a 10x10 pixel RGB image it will consist of 300 floating-point numbers.
The zip file is used to bundle image or NumPy files together. Supported formats for the individual files are png, jpg and npy.
The zip file must also contain an index.csv file. This file needs at least one column that holds the relative path to each image, with one row per image. The index.csv file may also contain other columns, which will be handled the same way as in a standalone csv file.
All images or NumPy files must be of the same format and have the same dimensions.
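A zip file with this layout can be assembled with the Python standard library. The following is a sketch under assumptions: the column names "image" and "label", the images/ folder, and the placement of index.csv at the root of the archive are illustrative choices, not requirements confirmed by this page.

```python
import csv
import io
import zipfile

import numpy as np


def build_zip(target, samples):
    """Write npy files plus an index.csv into a zip archive.

    samples is a list of (relative_path, array, label) tuples.
    """
    shapes = {np.asarray(array).shape for _, array, _ in samples}
    if len(shapes) > 1:
        raise ValueError(f"all arrays must have the same dimensions, got {sorted(shapes)}")

    index = io.StringIO()
    writer = csv.writer(index)
    writer.writerow(["image", "label"])  # hypothetical column names
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for relative_path, array, label in samples:
            buffer = io.BytesIO()
            np.save(buffer, np.ascontiguousarray(array, dtype="<f4"))
            archive.writestr(relative_path, buffer.getvalue())
            # One row per file, holding its path relative to the archive root.
            writer.writerow([relative_path, label])
        archive.writestr("index.csv", index.getvalue())


# Hypothetical samples: two 20x10 RGB arrays with a text label each.
samples = [
    ("images/0.npy", np.zeros((20, 10, 3)), "cat"),
    ("images/1.npy", np.ones((20, 10, 3)), "dog"),
]
archive_bytes = io.BytesIO()  # a file path works here as well
build_zip(archive_bytes, samples)

with zipfile.ZipFile(archive_bytes) as archive:
    names = archive.namelist()
    index_text = archive.read("index.csv").decode("utf-8")
print(names)
print(index_text)
```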
To import a file or data source you can either upload it or import it via a URL.
In the Upload files tab you can either drag and drop the file(s) to the drop area or click Choose files to upload the file(s) from your computer.
In the Import files tab, enter the URL to a public data file in the URL field. This is recommended for files bigger than 2 GB.
You can create rich datasets by combining different data types into a joint dataset. For example, you can join labelled image data with additional features.
When importing multiple files or file archives into the same dataset, the examples are assumed to be in the same order and are joined automatically.
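Because the join relies on order, it can help to confirm locally that the files contain the same number of examples before importing them. This is a minimal sketch, assuming one csv file with a header row and one npy array with samples along the first dimension; the function names and sample data are hypothetical, and the check cannot detect rows that are present but in a different order.

```python
import csv
import io

import numpy as np


def count_csv_rows(lines):
    """Count the data rows of a csv file, excluding the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header
    return sum(1 for _ in reader)


def check_same_length(csv_lines, array):
    """Raise ValueError if the csv rows and the array samples differ in number."""
    rows = count_csv_rows(csv_lines)
    samples = array.shape[0]
    if rows != samples:
        raise ValueError(f"csv has {rows} rows but the array has {samples} samples")
    return rows


# Hypothetical data: three labels to be joined with three 20x10 RGB images.
labels = io.StringIO("label\ncat\ndog\ncat\n")
images = np.zeros((3, 20, 10, 3), dtype="<f4")
matched = check_same_length(labels, images)
print(matched)
```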