Import files and data sources to the Platform
The Peltarion Platform makes it easy to import files and data sources, combining them and preparing them for experiments.
Requirements on imported datasets
ETL (Extract, Transform, Load) must be done outside the Platform.
Dataset file size
The maximum size for an uploaded file is 2 GB.
|There is no size limit if you import a dataset file from a URL.|
Data formats supported by the Peltarion Platform
Csv file specifications
The csv file is a tabular comma-separated data file containing headers. Supported encoding is UTF-8. Data containing special characters is only supported in case of UTF-8 encoding. Find out more in known issues.
The csv file must contain a constant number of columns.
The data needs to be well-formatted, that is, the data type in each column must be consistent and the column must contain no empty or null values.
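The two requirements above can be checked locally before importing. The following is a minimal sketch using only the Python standard library; the function name check_csv is hypothetical and is not part of the Platform.

```python
import csv
import io


def check_csv(text):
    """Return a list of problems found in csv text; an empty list means none."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    n_cols = len(rows[0])  # the header row defines the expected column count
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != n_cols:
            problems.append(
                f"record {record_no}: expected {n_cols} columns, got {len(row)}"
            )
        elif any(cell.strip() == "" for cell in row):
            problems.append(f"record {record_no}: empty value")
    return problems


print(check_csv("date,count\n2019-01-01,12\n2019-01-02,\n"))
```

The sketch does not check that the data type in each column is consistent; that check depends on the types you intend to use.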
Example: List traffic on a specific location at a specific date and time.
f4, 32-bit floating-point number
text (paragraphs of words and sentences, including numbers, punctuation, etc.)
Csv file limitations
One-hot encoding categories: 2 000
Maximum number of rows: 10 000 000
Maximum number of columns: 5 000
Npy file specifications
The npy file is a binary file format for a single NumPy array, where NumPy is a library for the Python programming language.
f4, 32-bit floating-point number
An npy file can be used to import images, in which case each pixel is represented as a floating-point number.
In a raw npy file, samples are assumed to be arranged along the first dimension. For example, if the npy array has dimensions (1000, 20, 10, 3), the Platform will treat it as 1000 separate tensors with height 20, width 10, and 3 channels.
Little-endian byte order is supported. Fortran order is not supported.
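As an illustration, an array can be converted to 32-bit little-endian floats in C order with NumPy before it is saved. This is a sketch under the assumption that your samples are already stacked along the first dimension; the helper name to_platform_array is hypothetical, and the example saves to an in-memory buffer rather than a file.

```python
import io

import numpy as np


def to_platform_array(samples):
    """Return the data as little-endian float32 ("<f4") in C (row-major) order."""
    return np.ascontiguousarray(samples, dtype="<f4")


# 1000 samples, each with height 20, width 10, and 3 channels.
images = np.zeros((1000, 20, 10, 3))
arr = to_platform_array(images)

# np.save accepts a file name or a file-like object.
buffer = io.BytesIO()
np.save(buffer, arr)
print(arr.shape, arr.dtype, arr.flags["C_CONTIGUOUS"])
```

To write a file instead, pass a file name such as "images.npy" to np.save.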
The Platform supports multi-class classification for targets of higher dimensionality, e.g., a target that is a vector of different classes, or a target that is an image with one class per pixel.
In this kind of classification problem, the target data is represented by a NumPy array in which the last axis is interpreted as the class label. The array needs to be one-hot encoded before it is imported into the Platform.
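One possible way to one-hot encode such a target with NumPy is sketched below. It assumes the labels start out as integer class indices, for example one index per pixel; the function name one_hot_last_axis is hypothetical.

```python
import numpy as np


def one_hot_last_axis(labels, num_classes):
    """Add a last axis of length num_classes holding a one-hot class vector."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="<f4")[labels]


# One 2x2 "image" whose pixels belong to classes 0, 1, or 2.
mask = np.array([[[0, 1], [2, 1]]])
encoded = one_hot_last_axis(mask, num_classes=3)
print(encoded.shape)  # (1, 2, 2, 3)
```

The samples stay along the first dimension, and the class axis is appended last.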
Visualization on the Datasets view
Visualization of npy files depends on the number of dimensions. Find out more in known issues.
Example: A 10x10 pixel grayscale image will consist of 100 floating-point numbers. If the image is a 10x10 pixel RGB image it will consist of 300 floating-point numbers.
Png file specifications
24 bit, 256x256 pixel
|When png files are imported into the Platform they are converted to npy.|
Zip file specifications
The zip file is used to bundle image or numpy files together. Supported formats for the individual files are png, jpg and npy.
The zip file must also contain a csv file with at least one column with one row per image containing the relative path to each image. The csv file may also contain other columns which will be handled the same way as in a standalone csv file.
The name of the csv file does not matter as long as the zip file contains a single csv file.
If the zip file contains multiple csv files, you must name the one intended to be used as index.csv to remove ambiguity.
All images or numpy files must be of the same format and have the same dimensions.
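The layout described above can be assembled with the Python standard library, as in the following sketch. The column names image and label, the file paths, and the function name build_dataset_zip are assumptions for illustration, and the file contents are placeholder bytes rather than real png data.

```python
import csv
import io
import zipfile


def build_dataset_zip(files, labels):
    """Bundle files with an index.csv that lists each relative path and its label.

    files maps relative path -> file bytes; labels maps relative path -> label.
    Returns the zip archive as bytes.
    """
    index = io.StringIO()
    writer = csv.writer(index, lineterminator="\n")
    writer.writerow(["image", "label"])  # hypothetical column names
    for path in files:
        writer.writerow([path, labels[path]])

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("index.csv", index.getvalue())
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()


files = {"images/0.png": b"placeholder", "images/1.png": b"placeholder"}
labels = {"images/0.png": "cat", "images/1.png": "dog"}
zip_bytes = build_dataset_zip(files, labels)
```

Naming the csv file index.csv is only required when the archive holds more than one csv file, but it does no harm otherwise.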
Text file specifications
Datasets with text for sentiment analysis need to include:
1 column with text to be used as input
1 column as label to be used as predicted category in the Target block.
Max length of text
The maximum length of text in a single cell is 20 480 characters.
Use quotes "" for optimal formatting
For optimal formatting, enclose the text in double quotes and write any double quote inside the text as two double quotes.
"This is a text with a ""quoted string"" easy peasy"
It works with new lines as well.
"This is a text
with a new line."
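Python's csv module produces this quoting automatically. The sketch below quotes every field, doubles embedded double quotes, and rejects cells above the length limit stated earlier; the column names text and label and the function name write_text_rows are assumptions.

```python
import csv
import io

MAX_CELL_CHARS = 20480  # per-cell limit stated in this section


def write_text_rows(rows):
    """Return csv text for (text, label) pairs with every field quoted."""
    out = io.StringIO()
    # QUOTE_ALL wraps each field in double quotes; embedded double quotes
    # are doubled because doublequote=True is the csv module default.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["text", "label"])  # hypothetical column names
    for text, label in rows:
        if len(text) > MAX_CELL_CHARS:
            raise ValueError(f"text has {len(text)} characters, limit is {MAX_CELL_CHARS}")
        writer.writerow([text, label])
    return out.getvalue()


rows = [
    ('This is a text with a "quoted string" easy peasy', "positive"),
    ("This is a text\nwith a new line.", "negative"),
]
print(write_text_rows(rows))
```

Reading the result back with csv.reader returns the original text, including the embedded new line.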
How to import a file or a data source
To import a file or data source, you can either upload it or import it via a URL.
In the Upload files tab, you can either drag and drop the file(s) onto the drop area or click Choose files to upload the file(s) from your computer.
In the Import files tab, enter the URL of a public data file in the URL field. This is recommended for files bigger than 2 GB.
Combining data types into one dataset
You can create rich datasets by combining different data types into a joint dataset. For example, you can join labelled image data with additional features.
When importing multiple files or file archives into the same dataset, the examples are assumed to be in the same order and joined automatically.