mindspore.mindrecord

Introduction to mindrecord:

MindRecord is a module for reading, writing, searching and converting datasets in the MindSpore format. Users can load MindRecord data through FileReader and modify it through FileWriter. Users can also convert datasets in other formats to MindRecord data through the corresponding sub-module.

class mindspore.mindrecord.FileWriter(file_name, shard_num=1)[source]
Class to write user defined raw data into MindRecord File series.
- Parameters
- - file_name (str) – File name of MindRecord File.
  - shard_num (int, optional) – Number of MindRecord files (default=1). It should be in the range [1, 1000].
- Raises
- ParamValueError – If file_name or shard_num is invalid.
- add_index(index_fields)[source]
- Select index fields from schema to accelerate reading.
  - Parameters
  - index_fields (list[str]) – Fields to be set as indexes; each should be of a primitive type.
  - Returns
  - MSRStatus, SUCCESS or FAILED.
  - Raises
  - - ParamTypeError – If index field is invalid.
    - MRMDefineIndexError – If index field is not primitive type.
    - MRMAddIndexError – If failed to add index field.
- add_schema(content, desc=None)[source]
- Return a schema id if the schema is added successfully; otherwise raise an exception.
  - Parameters
  - - content (dict) – Dict of user defined schema.
    - desc (str, optional) – String of schema description (default=None).
  - Returns
  - int, schema id.
  - Raises
  - - MRMInvalidSchemaError – If schema is invalid.
    - MRMBuildSchemaError – If failed to build schema.
    - MRMAddSchemaError – If failed to add schema.
- commit()[source]
- Flush data to disk and generate the corresponding db files.
  - Returns
  - MSRStatus, SUCCESS or FAILED.
  - Raises
  - - MRMOpenError – If failed to open MindRecord File.
    - MRMSetHeaderError – If failed to set header.
    - MRMIndexGeneratorError – If failed to create index generator.
    - MRMGenerateIndexError – If failed to write to database.
    - MRMCommitError – If failed to flush data to disk.
- classmethod open_for_append(file_name)[source]
- Open MindRecord file and get ready to append data.
  - Parameters
  - file_name (str) – String of MindRecord file name.
  - Returns
  - Instance of FileWriter.
  - Raises
  - - ParamValueError – If file_name is invalid.
    - FileNameError – If path contains invalid character.
    - MRMOpenError – If failed to open MindRecord File.
    - MRMOpenForAppendError – If failed to open file for appending data.
- set_header_size(header_size)[source]
- Set the size of header.
  - Parameters
  - header_size (int) – Size of header, between 16KB and 128MB.
  - Returns
  - MSRStatus, SUCCESS or FAILED.
  - Raises
  - MRMInvalidHeaderSizeError – If failed to set header size.
- set_page_size(page_size)[source]
- Set the size of Page.
  - Parameters
  - page_size (int) – Size of page, between 32KB and 256MB.
  - Returns
  - MSRStatus, SUCCESS or FAILED.
  - Raises
  - MRMInvalidPageSizeError – If failed to set page size.
- write_raw_data(raw_data, validate=True)[source]
- Write raw data and generate sequential pair of MindRecord File.
  - Parameters
  - - raw_data (list[dict]) – List of raw data.
    - validate (bool, optional) – If True, validate data according to the schema; otherwise validate data according to blob fields (default=True).
  - Raises
  - - ParamTypeError – If index field is invalid.
    - MRMOpenError – If failed to open MindRecord File.
    - MRMValidateDataError – If data does not match blob fields.
    - MRMSetHeaderError – If failed to set header.
    - MRMWriteDatasetError – If failed to write dataset.
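
The FileWriter methods above are usually called in a fixed order: add a schema, select index fields, write the rows, then commit. The sketch below shows that order under stated assumptions. The helper name `write_dataset`, the sample schema and the sample rows are hypothetical, and the layout of the schema dict is an assumption because this page does not specify it. The helper accepts any object that exposes the documented methods, so it does not import MindSpore itself.

```python
def write_dataset(writer, schema, index_fields, rows, desc=None):
    """Drive a FileWriter-like object through the documented call order.

    `writer` is expected to provide add_schema, add_index, write_raw_data
    and commit as described above. Illustrative helper, not part of MindSpore.
    """
    schema_id = writer.add_schema(schema, desc)  # int schema id
    writer.add_index(index_fields)               # fields should be primitive type
    writer.write_raw_data(rows)                  # rows is list[dict]; validate=True by default
    status = writer.commit()                     # flush to disk, generate db files
    return schema_id, status


# Hypothetical schema and rows; the schema dict layout is an assumption.
example_schema = {"file_name": {"type": "string"}, "label": {"type": "int32"}}
example_rows = [
    {"file_name": "a.jpg", "label": 0},
    {"file_name": "b.jpg", "label": 1},
]

# With MindSpore installed, usage might look like this (not executed here):
#   from mindspore.mindrecord import FileWriter
#   writer = FileWriter("example.mindrecord", shard_num=1)
#   write_dataset(writer, example_schema, ["label"], example_rows, "demo")
```

Passing the writer in as an argument keeps the call order easy to check with a stand-in object before touching real files.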

class mindspore.mindrecord.FileReader(file_name, num_consumer=4, columns=None, operator=None)[source]
Class to read MindRecord File series.
- Parameters
- - file_name (str) – File name of MindRecord File.
  - num_consumer (int, optional) – Number of consumer threads which load data to memory (default=4). It should not be smaller than 1 or larger than the number of CPUs.
  - columns (list[str], optional) – List of fields whose corresponding data will be read (default=None).
  - operator (int, optional) – Reserved parameter for operators (default=None).
- Raises
- ParamValueError – If file_name, num_consumer or columns is invalid.
- close()[source]
- Stop reader worker and close File.
- finish()[source]
- Stop reader worker.
  - Raises
  - MRMFinishError – If failed to finish worker threads.
- get_next()[source]
- Yield a batch of data according to columns at a time.
  - Yields
  - dict – keys are the same as columns.
  - Raises
  - MRMUnsupportedSchemaError – If schema is invalid.
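
Because get_next() yields dicts and close() stops the workers, a read loop can be wrapped so that the reader is always closed, even when iteration stops early. The following is a minimal sketch; `read_all` and its `limit` parameter are hypothetical names introduced here, and the helper works with any object that exposes get_next() and close() as documented above.

```python
def read_all(reader, limit=None):
    """Collect items from a FileReader-like object, then close it.

    `limit` (hypothetical) caps how many items are collected.
    """
    rows = []
    try:
        for item in reader.get_next():  # each item is a dict keyed by column
            rows.append(item)
            if limit is not None and len(rows) >= limit:
                break
    finally:
        reader.close()                  # stop reader worker and close file
    return rows


# With MindSpore installed, usage might look like this (not executed here):
#   from mindspore.mindrecord import FileReader
#   reader = FileReader("example.mindrecord", num_consumer=4, columns=["label"])
#   rows = read_all(reader, limit=10)
```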

class mindspore.mindrecord.MindPage(file_name, num_consumer=4)[source]
Class to read MindRecord File series in pagination.
- Parameters
- - file_name (str) – File name of MindRecord File.
  - num_consumer (int, optional) – Number of consumer threads which load data to memory (default=4). It should not be smaller than 1 or larger than the number of CPUs.
- Raises
- - ParamValueError – If file_name, num_consumer or columns is invalid.
  - MRMInitSegmentError – If failed to initialize ShardSegment.
- property candidate_fields
- Return candidate category fields.
  - Returns
  - list[str], by which data could be grouped.
- property category_field
- Getter function for category field
- get_category_fields()[source]
- Return candidate category fields.
- read_at_page_by_id(category_id, page, num_row)[source]
- Query by category id in pagination.
  - Parameters
  - - category_id (int) – Category id, as returned by read_category_info.
    - page (int) – Index of page.
    - num_row (int) – Number of rows in a page.
  - Returns
  - List, list[dict].
  - Raises
  - - ParamValueError – If any parameter is invalid.
    - MRMFetchDataError – If failed to read by category id.
    - MRMUnsupportedSchemaError – If schema is invalid.
- read_at_page_by_name(category_name, page, num_row)[source]
- Query by category name in pagination.
  - Parameters
  - - category_name (str) – String of category field’s value, as returned by read_category_info.
    - page (int) – Index of page.
    - num_row (int) – Number of rows in a page.
  - Returns
  - str, read at page.
- read_category_info()[source]
- Return category information when data is grouped by indicated category field.
  - Returns
  - str, description of group information.
  - Raises
  - MRMReadCategoryInfoError – If failed to read category information.
- set_category_field(category_field)[source]
- Set category field for reading.
  - Note
  - Should be a candidate category field.
  - Parameters
  - category_field (str) – String of category field name.
  - Returns
  - MSRStatus, SUCCESS or FAILED.
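
Paginated reading with read_at_page_by_id can be wrapped in a loop that requests successive pages. The sketch below rests on two assumptions that this page does not state: page indexes start at 0, and a page past the end comes back as an empty list. The name `iter_category_rows` and the `max_pages` guard are hypothetical.

```python
def iter_category_rows(page_reader, category_id, num_row, max_pages=1000):
    """Yield the rows of one category, fetching one page at a time.

    Assumes 0-based page indexes and an empty list past the last page.
    `max_pages` bounds the loop in case that assumption does not hold.
    """
    for page in range(max_pages):
        rows = page_reader.read_at_page_by_id(category_id, page, num_row)
        if not rows:
            return
        yield from rows


# With MindSpore installed, usage might look like this (not executed here):
#   from mindspore.mindrecord import MindPage
#   reader = MindPage("example.mindrecord")
#   reader.set_category_field("label")   # should be a candidate category field
#   for row in iter_category_rows(reader, category_id=0, num_row=100):
#       ...
```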

class mindspore.mindrecord.Cifar10ToMR(source, destination)[source]
Class to transform the cifar10 dataset into MindRecord.
- Parameters
- - source (str) – the cifar10 directory to be transformed.
  - destination (str) – the MindRecord file path to transform into.
- Raises
- ValueError – If source or destination is invalid.
- transform(fields=None)[source]
- Executes transformation from cifar10 to MindRecord.
  - Parameters
  - fields (list[str], optional) – list of index fields, e.g. [“label”] (default=None).
  - Returns
  - SUCCESS/FAILED, whether successfully written into MindRecord.

class mindspore.mindrecord.Cifar100ToMR(source, destination)[source]
Class to transform the cifar100 dataset into MindRecord.
- Parameters
- - source (str) – the cifar100 directory to be transformed.
  - destination (str) – the MindRecord file path to transform into.
- Raises
- ValueError – If source or destination is invalid.
- transform(fields=None)[source]
- Executes transformation from cifar100 to MindRecord.
  - Parameters
  - fields (list[str]) – list of index fields, e.g. [“fine_label”, “coarse_label”].
  - Returns
  - SUCCESS/FAILED, whether successfully written into MindRecord.

class mindspore.mindrecord.ImageNetToMR(map_file, image_dir, destination, partition_number=1)[source]
Class to transform the imagenet dataset into MindRecord.
- Parameters
- - map_file (str) – the map file which indicates the labels. The map file content should look like this:

        n02119789 1 pen
        n02100735 2 notebook
        n02110185 3 mouse
        n02096294 4 orange

  - image_dir (str) – image directory which contains the n02119789, n02100735, n02110185 and n02096294 directories.
  - destination (str) – the MindRecord file path to transform into.
  - partition_number (int, optional) – partition size (default=1).
- Raises
- ValueError – If map_file, image_dir or destination is invalid.
- transform()[source]
- Executes transformation from imagenet to MindRecord.
  - Returns
  - SUCCESS/FAILED, whether successfully written into MindRecord.
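
Since an invalid map_file or image_dir raises ValueError, it can help to check both before starting a conversion. The sketch below parses map file text in the three-column layout shown above and reports which class directories are missing from image_dir. The helper names `parse_map_file` and `missing_class_dirs` are hypothetical, and treating the columns as whitespace-separated id, integer label and name is an assumption drawn from the sample content only.

```python
import os


def parse_map_file(text):
    """Parse map file text into (directory_id, label, name) tuples.

    Assumes whitespace-separated columns: id, integer label, optional name.
    """
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            raise ValueError(f"line {line_no}: expected at least 2 columns")
        name = parts[2] if len(parts) > 2 else ""
        entries.append((parts[0], int(parts[1]), name))
    return entries


def missing_class_dirs(entries, image_dir):
    """Return the ids from the map file that have no directory in image_dir."""
    return [
        dir_id
        for dir_id, _label, _name in entries
        if not os.path.isdir(os.path.join(image_dir, dir_id))
    ]


SAMPLE = """\
n02119789 1 pen
n02100735 2 notebook
n02110185 3 mouse
n02096294 4 orange
"""
```

If `missing_class_dirs` returns an empty list, the directory layout matches the map file as far as this check can tell; it does not inspect the images themselves.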

class mindspore.mindrecord.MnistToMR(source, destination, partition_number=1)[source]
Class to transform the Mnist dataset into MindRecord.
- Parameters
- - source (str) – directory which contains t10k-images-idx3-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and train-labels-idx1-ubyte.gz.
  - destination (str) – the MindRecord file directory to transform into.
  - partition_number (int, optional) – partition size (default=1).
- Raises
- ValueError – If source/destination/partition_number is invalid.
- transform()[source]
- Executes transformation from Mnist to MindRecord.
  - Returns
  - SUCCESS/FAILED, whether successfully written into MindRecord.
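
The source directory for MnistToMR is expected to hold the four archives listed above, so a quick pre-check can catch an incomplete download before conversion. This is a minimal sketch using only the standard library; the helper name `missing_mnist_files` is hypothetical and the check looks only at file presence, not at file contents.

```python
import os

# File names taken from the `source` parameter description above.
MNIST_FILES = (
    "t10k-images-idx3-ubyte.gz",
    "train-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
)


def missing_mnist_files(source):
    """Return the expected MNIST archive names that are absent from `source`."""
    return [
        name
        for name in MNIST_FILES
        if not os.path.isfile(os.path.join(source, name))
    ]


# With MindSpore installed, usage might look like this (not executed here):
#   from mindspore.mindrecord import MnistToMR
#   if not missing_mnist_files("./mnist"):
#       MnistToMR("./mnist", "./mnist_mr", partition_number=1).transform()
```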