
Data Related Objects¶

Datasources¶

class driftai.data.datasource.Datasource(data_uri)[source]¶

Bases: abc.ABC

Abstract datasource

get_data()[source]¶

Get all datasource data

get_info()[source]¶

Datasource summary

Returns:Dictionary used to serialize DriftAI Datasource instance
Return type:dict
get_infolist()[source]¶

Get list of labeled indices

Returns:First element of the tuple is the index and the second element is the label
Return type:list of tuples
get_path()[source]¶

Get the location of datasource

Returns:File system datasource location
Return type:str
get_uri()[source]¶

Get the datasource location formatted as a URI

Returns:Datasource location
Return type:str
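
The difference between get_path() and get_uri() can be illustrated with the standard library. The following is a minimal sketch, not DriftAI's implementation: to_uri and to_path are hypothetical helpers, and it is an assumption that a location is either a plain filesystem path or a file:// URI.

```python
import os
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def to_uri(location):
    """Return a file URI for a location (hypothetical helper, not part of DriftAI)."""
    # Assumption: anything starting with file:// is already a URI
    if location.startswith("file://"):
        return location
    # Path.as_uri() requires an absolute path
    return Path(os.path.abspath(location)).as_uri()


def to_path(location):
    """Return a filesystem path for a location (hypothetical helper, not part of DriftAI)."""
    if location.startswith("file://"):
        # Decode percent-escapes and convert to the local path convention
        return url2pathname(urlparse(location).path)
    return os.path.abspath(location)
```

Both helpers accept either form, so a caller can pass whichever one it has and still obtain the representation it needs.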
static load_from_data(data)[source]¶

Create datasource from serialized data

Parameters:data (dict) – Dictionary containing serialized datasource data
Returns:
Return type:Datasource
class driftai.data.datasource.DirectoryDatasource(path, parsing_pattern)[source]¶

Bases: driftai.data.datasource.Datasource

Parameters:
  • path (str) –
    Location of the dataset. Accepted formats are:
    • Filesystem path
    • File URI
  • parsing_pattern (str) – Pattern used to get the label and data from a file path. Example: {testset}/{class}/{filename}.[txt|tsv]
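
A parsing pattern can be read as a template: each {name} captures one path segment (or part of one) and the bracketed suffix lists the accepted extensions. The sketch below shows one way such a pattern might be turned into a regular expression. pattern_to_regex and parse_path are hypothetical helpers, and DriftAI's actual parsing rules may differ.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern such as '{testset}/{class}/{filename}.[txt|tsv]'.

    Hypothetical helper that illustrates the idea only.
    """
    # Separate the path template from the extension alternatives, e.g. '[txt|tsv]'
    match = re.fullmatch(r"(?P<body>.*)\.\[(?P<exts>[^\]]+)\]", pattern)
    if match is None:
        raise ValueError("pattern must end with .[ext1|ext2|...]")
    body = match.group("body")
    exts = match.group("exts").split("|")

    parts = []
    # re.split with a capturing group keeps the {name} placeholders as tokens
    for token in re.split(r"(\{[^}]*\})", body):
        if token.startswith("{") and token.endswith("}"):
            name = token[1:-1]
            if name:
                # Named placeholder: capture text up to the next path separator
                parts.append("(?P<%s>[^/]+)" % name)
            else:
                # Anonymous placeholder '{}': match but do not capture
                parts.append("[^/]+")
        else:
            parts.append(re.escape(token))
    ext_group = "|".join(re.escape(ext) for ext in exts)
    return re.compile("".join(parts) + r"\.(?:" + ext_group + ")$")


def parse_path(pattern, path):
    """Return the named fields found in path, or None if path does not match."""
    found = pattern_to_regex(pattern).search(path)
    return found.groupdict() if found else None
```

With this sketch, parse_path("{testset}/{class}/{filename}.[txt|tsv]", "data/train/spam/msg1.txt") yields the fields testset, class and filename, while a path with an unlisted extension yields None.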
get_data()[source]¶

Get all data under the datasource path

Returns:First element of the tuple is the index and the second element is the label
Return type:list of tuples
get_info()[source]¶

Directory datasource summary

Returns:Dictionary used to serialize a DriftAI DirectoryDatasource instance
Return type:dict
get_infolist()[source]¶

Get list of labeled indices

Returns:First element of the tuple is the index and the second element is the label
Return type:list of tuples
loader¶
class driftai.data.datasource.FileDatasource(path, label=None, first_line_heading=True)[source]¶

Bases: driftai.data.datasource.Datasource

Datasource subclass responsible for handling datasets that come from a local file, such as a CSV file

Parameters:
  • path (str) –
    Location of the dataset. Accepted formats are:
    • Filesystem path
    • File URI
  • label (str, optional) – Name of the label. If label is left as None, the label is assumed to be the last column
  • first_line_heading (bool, optional) – If True, the first line is treated as the header
get_data()[source]¶

Get the content of the CSV file

Returns:DataFrame wrapping the csv content
Return type:pandas.DataFrame
get_info()[source]¶

Datasource summary

Returns:Dictionary used to serialize DriftAI Datasource instance
Return type:dict
get_infolist()[source]¶

Get list of labeled indices

Returns:First element of the tuple is the index and the second element is the label
Return type:list of tuples
Raises:OptAppFileDatasourceNotCompatibeException – If file extension is not compatible with DriftAI
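
To make the (index, label) tuples concrete, here is a minimal sketch that builds such a list from CSV text using only the standard library. build_infolist is a hypothetical helper. Using the row position as the index, and rejecting a label name when there is no header line, are assumptions rather than documented DriftAI behaviour.

```python
import csv
import io


def build_infolist(csv_text, label=None, first_line_heading=True):
    """Return [(index, label), ...] for CSV text (illustrative, not DriftAI's code)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    if first_line_heading:
        header, rows = rows[0], rows[1:]
        # Default label column is the last one, as described for FileDatasource
        label_col = header.index(label) if label is not None else len(header) - 1
    else:
        if label is not None:
            # Assumption: a label name cannot be resolved without a header
            raise ValueError("a label name requires a header line")
        label_col = -1
    # Assumption: the index is the zero-based position of the data row
    return [(idx, row[label_col]) for idx, row in enumerate(rows)]
```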
label¶
class driftai.data.datasource.ImageDatasource(path, parsing_pattern='{testset}/{class}_{}.[png|jpg|jpeg]')[source]¶

Bases: driftai.data.datasource.DirectoryDatasource

loader(idx)[source]¶

Dataset¶

class driftai.data.dataset.Dataset[source]¶

Indexed dataset over a datasource

Parameters:
  • datasource (Datasource) – Datasource of the dataset
  • problem_type (str, optional) – Objective of the algorithm. If the problem type is not set manually, driftai will infer it automatically. Possible values are: binary_clf, clf or regression
  • creation_date (datetime) – Creation date of the dataset. Should not be set manually
  • id (str) – Unique identifier for Dataset
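
The documentation does not describe how the problem type is inferred. Purely as an illustration, a simple heuristic could look at the labels themselves. infer_problem_type below is a hypothetical function, and its rules (non-integer floats mean regression, exactly two distinct labels mean binary classification) are assumptions, not DriftAI's logic.

```python
def infer_problem_type(labels):
    """Guess 'binary_clf', 'clf' or 'regression' from labels (illustrative heuristic)."""
    labels = list(labels)
    if not labels:
        raise ValueError("cannot infer a problem type without labels")
    # Assumption: any non-integer float suggests a continuous target
    if any(isinstance(value, float) and not value.is_integer() for value in labels):
        return "regression"
    # Assumption: exactly two distinct labels means binary classification
    return "binary_clf" if len(set(labels)) == 2 else "clf"
```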
static collection()[source]¶

Get table containing datasets

Returns:
Return type:TinyDB instance
static from_dir(path, path_pattern=None, datatype='img')[source]¶

Create a Dataset from a directory

Parameters:
  • path (str) – DataSource location path
  • path_pattern (str, optional) – Pattern used to generate metadata. If path_pattern is left as None, the default path_pattern is used
  • datatype (str, optional) – Directory datatype
Returns:
Return type:DirectoryDatasource

generate_subdataset(method, by)[source]¶

Creates a subdataset of the current Dataset

Parameters:
  • method (str) – Evaluation sets split approach. Can be: train_test, k_fold
  • by (float, int) – If the train_test method is specified, by represents the training set size (for example, .85). If the k_fold method is specified, by is the number of folds
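
The two split methods can be sketched with plain index lists. This is an assumption-laden illustration, not DriftAI's code: train_test_indices and k_fold_indices are hypothetical helpers, the seed argument is invented for reproducibility, and keying the k_fold result by fold number is a guess at a layout the documentation does not specify.

```python
import random


def train_test_indices(n_items, by, seed=0):
    """Split range(n_items) so that a fraction `by` goes to the training set."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_items * by))
    return {
        "method": "train_test",
        "indices": {"train": sorted(indices[:cut]), "test": sorted(indices[cut:])},
    }


def k_fold_indices(n_items, by, seed=0):
    """Split range(n_items) into `by` folds; each fold is the test set once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into `by` folds
    folds = [indices[start::by] for start in range(by)]
    result = {}
    for k, test_fold in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # Assumption: subsets are keyed by fold number as a string
        result[str(k)] = {"train": sorted(train), "test": sorted(test_fold)}
    return {"method": "k_fold", "indices": result}
```

For example, train_test_indices(20, .85) puts 17 indices in the training set and 3 in the test set, and k_fold_indices(10, 5) produces five folds whose test sets do not overlap.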
get_data()[source]¶

Get datasource data

get_info()[source]¶

Get info to serialize a Dataset instance

Returns:Dictionary containing a Dataset object summary:
{
    "datasource": dict containing path, first_line_heading and label of the datasource,
    "infolist": <TODO>,
    "problem_type": <multiclass clf, regression, binary clf>,
    "creation_date": <creation date of the dataset>,
    "id": <unique identifier>
}
Return type:dict
get_labels()[source]¶

Get all the labels

Returns:List with all labels
Return type:list
id¶

Get the unique identifier of the Persistent instance

Returns:Unique identifier
Return type:str
classmethod load_from_data(data)[source]¶

Creates a Dataset object from serialized JSON data coming from TinyDB

Parameters:data (dict) – JSON data from TinyDB
Raises:OptAppInvalidStructureException – In case file keys are incorrect
Returns:New Dataset instance
Return type:driftai.Dataset
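
Before rebuilding an object from serialized data, it is common to check that the expected keys are present. The sketch below validates a dictionary against the keys listed in the get_info() summary above. check_dataset_info and InvalidStructureError are hypothetical stand-ins, and the real exception raised by DriftAI may be triggered under different conditions.

```python
# Keys taken from the documented get_info() summary of a Dataset
REQUIRED_KEYS = {"datasource", "infolist", "problem_type", "creation_date", "id"}


class InvalidStructureError(Exception):
    """Hypothetical stand-in for the library's invalid-structure exception."""


def check_dataset_info(data):
    """Return data unchanged if every required key is present, else raise."""
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise InvalidStructureError("missing keys: " + ", ".join(sorted(missing)))
    return data
```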
static read_file(path, label=None, first_line_heading=True)[source]¶

Create a Dataset from a file

Parameters:
  • path (str) – DataSource location path
  • label (str, optional) – Name of the label. If label is left as None, the label is assumed to be the last column
  • first_line_heading (bool, optional) – If True, the first line is treated as the header

SubDataset¶

class driftai.data.dataset.SubDataset[source]¶
Parameters:
  • dataset (Dataset) – DriftAI dataset which the current subdataset inherits from
  • method (str) – Evaluation sets split approach. Can be: train_test, k_fold
  • by (float, int, optional) – If the train_test method is specified, by represents the training set size (for example, .85). If the k_fold method is specified, by is the number of folds
  • indices (dict) –

    Contains the number of sets and the indices of each set:

    {
        "method": str,
        "indices": {
            "train": list of int,
            "test": list of int
        }
    }
    
    Should not be set by the developer
    
  • id (str, optional) – Unique identifier
  • creation_date (str, datetime, optional) – Creation date of the subdataset. Should not be set manually
static collection()[source]¶

Get table containing subdatasets

Returns:
Return type:TinyDB instance
get_info()[source]¶

Get info to serialize a SubDataset instance

Returns:Contains subdataset essential information:
{
    "dataset": str, parent dataset path,
    "creation_date": str, Subdataset creation date,
    "id": str,
    "indices": dict, structure specified in the constructor parameters documentation,
    "path": str, subdataset path
}
Return type:dict
get_test_data(subset)[source]¶

Get the test data of a subset

Parameters:subset (str) – subset identifier
Returns:Dictionary containing all instances that belong to the test set, with their labels:
{
    "X": list,
    "y": list
}
Return type:dict
get_test_labels(subset)[source]¶

Get the test set labels of a specific subset

Parameters:subset (str) – subset identifier
Returns:Ground truths of subset’s test data
Return type:list
get_train_data(subset)[source]¶

Get the training data of a subset

Parameters:subset (str) – subset identifier
Returns:Dictionary containing each training set instance with its label:
{
    "X": list,
    "y": list
}
Return type:dict
get_train_labels(subset)[source]¶

Get the training set labels of a specific subset

Parameters:subset (str) – subset identifier
Returns:Ground truths of subset’s training data
Return type:list
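
The four accessors above return either an {"X": list, "y": list} dictionary or just the label list for one side of a split. The following sketch shows how stored indices could be used to produce those shapes. select_subset and select_labels are hypothetical helpers, and treating instances and labels as two parallel lists is an assumption.

```python
def select_subset(instances, labels, indices):
    """Return {"X": [...], "y": [...]} for the given positions (illustrative)."""
    return {
        "X": [instances[i] for i in indices],
        "y": [labels[i] for i in indices],
    }


def select_labels(labels, indices):
    """Return only the ground-truth labels for the given positions."""
    return [labels[i] for i in indices]


# Example data: four instances and a train/test split in the documented layout
instances = [[5.1, 1.4], [6.2, 4.5], [5.9, 5.1], [4.7, 1.3]]
labels = ["setosa", "versicolor", "virginica", "setosa"]
split = {"train": [0, 1, 3], "test": [2]}

train = select_subset(instances, labels, split["train"])
test_labels = select_labels(labels, split["test"])
```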
id¶

Get the unique identifier of the Persistent instance

Returns:Unique identifier
Return type:str
classmethod load_from_data(data)[source]¶

Loads a subdataset from data coming from TinyDB

Parameters:data (dict) – JSON data
Raises:OptAppSubDatasetInfoFileWrongStructureException – If data has wrong keys
Returns:New SubDataset instance
Return type:driftai.SubDataset

© Copyright 2019, Francesc Guitart Bravo, Guillem Orellana Trullols. Created using Sphinx 1.8.4.