Import files and data sources to the Platform

The Peltarion Platform makes it easy to import data from a variety of sources, and to set it up to train your own models.

Upload data that is organized as many examples of one or more features, where each feature can be a numerical value, a category, an image, or text. You can select one or many features to use as input to your models, and set which feature is the target that your models will learn to predict.
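As an illustration, a dataset in this shape can be written as a csv file in which each row is one example and each column is one feature. The sketch below uses Python's standard csv module. The column names, the values, the `images/` paths, and the idea of using `species` as the target are all hypothetical.

```python
import csv

# Each dict is one example; each key is one feature (column).
# Feature names and values are made up for illustration:
#   weight_kg -> numerical, species -> category (a possible target),
#   photo -> path to an image, note -> free text.
examples = [
    {"weight_kg": 4.2, "species": "cat", "photo": "images/001.jpg", "note": "indoor, shy"},
    {"weight_kg": 31.0, "species": "dog", "photo": "images/002.jpg", "note": "likes to swim"},
    {"weight_kg": 0.03, "species": "hamster", "photo": "images/003.jpg", "note": "nocturnal"},
]

# Write one header row, then one row per example.
# DictWriter quotes values that contain commas, such as "indoor, shy".
with open("pets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(examples[0].keys()))
    writer.writeheader()
    writer.writerows(examples)
```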

There are three main ways to import datasets into your project: from the Data library, from a data warehouse, or by uploading your own files.

Data library: ready-made datasets

The Data library contains datasets that are ready to be added to your project.

In the Datasets overview page, click the Data library button to open the Data library window. Use it to:

  • Quickly import the data required to follow one of the tutorials

  • Get started training and evaluating models using reference datasets

  • Prototype models on data similar to your own, and use transfer learning when you are ready to train on your own data

Please note that datasets, machine-learning models, weights, topologies, research papers and other content, including open source software, (collectively referred to as Content) provided and/or suggested by Peltarion for use in the Platform and otherwise, may be subject to separate third party terms of use or license terms. You are solely responsible for complying with the applicable terms. Peltarion makes no representations or warranties about Content. You expressly relieve us from any and all liability, loss or risk arising (directly or indirectly) from Your use of any third party content.

The datasets in the data library may come from third party sources and are provided for convenience. Read the dataset licenses to know the particular terms of each dataset.

Data warehouse: import datasets from Azure Synapse and BigQuery

If you have data stored in Azure Synapse or BigQuery online storage, you can import it to the Platform and start training models on it.
The import takes three steps: connect to your data warehouse, select the dataset you want to import, and preview the content to make sure it is correct.

The exact steps depend on your service provider. Currently we support Azure Synapse and BigQuery.

File import

Import files that you have prepared with the data you want to use.
You can upload existing files to the Platform in several ways:

  • Click Choose files to upload files directly from your local computer, or drag and drop them in the dotted area.
    Use this method only for files of 5 GB or smaller, to reduce the risk of connection issues during upload.

  • Use the Data API to let a script (or program) upload the files you want into a new dataset.

  • Import data from a URL, if your files are hosted online. This is recommended for large files, as it usually provides a better and faster connection.
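Before choosing an upload method, a short script can flag files that exceed the 5 GB guideline for local upload. This is a hypothetical helper, not part of the Platform, and reading "5 GB" as 5 × 1024³ bytes is an assumption.

```python
import os

# Assumption: "5 GB" is read as 5 * 1024**3 bytes.
# Use 5 * 10**9 instead for a slightly stricter, decimal limit.
LOCAL_UPLOAD_LIMIT_BYTES = 5 * 1024**3


def suggest_upload_method(size_bytes: int) -> str:
    """Suggest 'local' (Choose files / drag and drop) or 'url' (import from a URL)."""
    return "local" if size_bytes <= LOCAL_UPLOAD_LIMIT_BYTES else "url"


def check_files(paths):
    """Map each file path to a suggested upload method based on its size on disk."""
    return {p: suggest_upload_method(os.path.getsize(p)) for p in paths}
```

For example, `check_files(["train.csv", "images.zip"])` would return `"url"` for any file larger than the limit, suggesting it be hosted online and imported from a URL instead.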

File formats supported

You can upload three types of files to the Platform:

  • csv: A comma-separated values text file, the easiest way to upload data with heterogeneous features

  • npy: A saved NumPy array file, where different examples are listed along the first array dimension

  • zip: A compressed zip file, containing one or more csv, npy, or image files
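To make the npy layout concrete, the sketch below saves a NumPy array whose first dimension indexes the examples. The array shape (100 examples of 28 × 28 values), the random contents, and the file name `features.npy` are illustrative assumptions.

```python
import numpy as np

# 100 examples, each a 28x28 array (shape chosen only for illustration).
num_examples = 100
rng = np.random.default_rng(seed=0)
features = rng.random((num_examples, 28, 28)).astype(np.float32)

# Examples are stacked along the first dimension (axis 0),
# so features[i] is the i-th example.
np.save("features.npy", features)

# Loading the file back gives the same shape: (examples, height, width).
loaded = np.load("features.npy")
print(loaded.shape)  # (100, 28, 28)
```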

Each file type has its own specifications and requirements.

Importing images

To work with images, you can use any of the following formats:

  • jpg/jpeg

  • png

  • gif

If you have the images locally, archive them in a zip file, with or without an index file.
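The sketch below packs local images and a csv index into one zip archive using Python's standard zipfile module. The layout it produces (an `images/` folder plus an `index.csv` with hypothetical `image` and `label` columns) is an assumption for illustration; check the Platform's zip and index file requirements for the exact format it expects.

```python
import csv
import io
import os
import zipfile


def build_image_zip(image_paths, labels, zip_path):
    """Pack local images plus a csv index into one zip archive.

    Hypothetical layout: images/<file name> for each image, and an
    index.csv with columns 'image' (path inside the zip) and 'label'.
    """
    if len(image_paths) != len(labels):
        raise ValueError("Need exactly one label per image")

    index = io.StringIO()
    writer = csv.writer(index)
    writer.writerow(["image", "label"])

    seen = set()
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, label in zip(image_paths, labels):
            arcname = "images/" + os.path.basename(path)
            if arcname in seen:
                raise ValueError(f"Duplicate file name in archive: {arcname}")
            seen.add(arcname)
            zf.write(path, arcname=arcname)   # copy the image into the archive
            writer.writerow([arcname, label])  # one index row per image
        zf.writestr("index.csv", index.getvalue())
```

For example, calling `build_image_zip(["photos/cat1.jpg", "photos/dog1.jpg"], ["cat", "dog"], "pets_images.zip")` would produce an archive with both images under `images/` and a two-row index.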

You can also use images that are stored in Google Cloud Storage or Amazon S3.
In that case, specify the image URL as a feature, and the Platform will import them as image features.

Uploading several files

You can upload several files into the same dataset.
In that case, additional files are assumed to contain additional features (columns) for the same examples, and not additional examples (rows) of the same features.

This can be useful if your training examples have several features which are saved in different files.
If you want to upload multiple files, keep in mind that:

  • All the files must be uploaded when the dataset is created

  • All the files must contain the same number of examples (the number of examples is determined from the first file you upload)

  • All the files must contain the examples in the same order
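A quick check before uploading can catch files that do not line up. The sketch below handles csv files only and is a hypothetical helper, not a Platform feature: it uses the first file to set the expected number of examples, as the Platform does, and, if you pass an ID column that appears in every file (an assumption about your data), it also verifies that the examples appear in the same order.

```python
import csv


def load_rows(path):
    """Read a csv file into a list of dicts, one per example."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def check_aligned(paths, id_column=None):
    """Return a list of problems; an empty list means the files look aligned.

    The first file sets the expected number of examples. If id_column is
    given (a hypothetical column present in every file), the IDs must also
    appear in the same order in every file.
    """
    problems = []
    reference = load_rows(paths[0])
    ref_ids = [r[id_column] for r in reference] if id_column else None
    for path in paths[1:]:
        rows = load_rows(path)
        if len(rows) != len(reference):
            problems.append(f"{path}: {len(rows)} examples, expected {len(reference)}")
        elif id_column is not None:
            ids = [r[id_column] for r in rows]
            if ids != ref_ids:
                problems.append(f"{path}: examples are not in the same order as {paths[0]}")
    return problems
```

For npy files, the number of examples is the size of the first dimension, which `np.load(path).shape[0]` returns.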
