mindspore.mindrecord

Introduction to mindrecord:

Mindrecord is a module that implements reading, writing, searching, and converting for MindSpore format datasets. Users can load (modify) mindrecord data through FileReader (FileWriter). Users can also convert datasets in other formats to mindrecord data through the corresponding sub-module.

  • class mindspore.mindrecord.FileWriter(file_name, shard_num=1)
  • Class to write user-defined raw data into a MindRecord File series.

    • Parameters
      • file_name (str) – File name of MindRecord File.

      • shard_num (int, optional) – Number of MindRecord Files (default=1). It should be in the range [1, 1000].

    • Raises

    • ParamValueError – If file_name or shard_num is invalid.

    • add_index(index_fields)

    • Select index fields from schema to accelerate reading.

      • Parameters
      • index_fields (list[str]) – Fields to be set as index; each should be of a primitive type.

      • Returns

      • MSRStatus, SUCCESS or FAILED.

      • Raises

        • ParamTypeError – If index field is invalid.

        • MRMDefineIndexError – If index field is not primitive type.

        • MRMAddIndexError – If failed to add index field.

    • add_schema(content, desc=None)

    • Return a schema id if the schema is added successfully; otherwise raise an exception.

      • Parameters
        • content (dict) – Dict of user-defined schema.

        • desc (str, optional) – String of schema description (default=None).

      • Returns

      • int, schema id.

      • Raises

        • MRMInvalidSchemaError – If schema is invalid.

        • MRMBuildSchemaError – If failed to build schema.

        • MRMAddSchemaError – If failed to add schema.

    • commit()

    • Flush data to disk and generate the corresponding db files.

      • Returns
      • MSRStatus, SUCCESS or FAILED.

      • Raises

        • MRMOpenError – If failed to open MindRecord File.

        • MRMSetHeaderError – If failed to set header.

        • MRMIndexGeneratorError – If failed to create index generator.

        • MRMGenerateIndexError – If failed to write to database.

        • MRMCommitError – If failed to flush data to disk.

    • classmethod open_for_append(file_name)

    • Open MindRecord file and get ready to append data.

      • Parameters
      • file_name (str) – String of MindRecord file name.

      • Returns

      • Instance of FileWriter.

      • Raises

        • ParamValueError – If file_name is invalid.

        • FileNameError – If path contains invalid character.

        • MRMOpenError – If failed to open MindRecord File.

        • MRMOpenForAppendError – If failed to open file for appending data.

    • set_header_size(header_size)

    • Set the size of header.

      • Parameters
      • header_size (int) – Size of header, between 16KB and 128MB.

      • Returns

      • MSRStatus, SUCCESS or FAILED.

      • Raises

      • MRMInvalidHeaderSizeError – If failed to set header size.
    • set_page_size(page_size)

    • Set the size of page.

      • Parameters
      • page_size (int) – Size of page, between 32KB and 256MB.

      • Returns

      • MSRStatus, SUCCESS or FAILED.

      • Raises

      • MRMInvalidPageSizeError – If failed to set page size.
    • write_raw_data(raw_data, validate=True)

    • Write raw data and generate sequential pair of MindRecord File.

      • Parameters
        • raw_data (list[dict]) – List of raw data.

        • validate (bool, optional) – If True, validate data against the schema; otherwise validate data against the blob fields (default=True).

      • Raises

        • ParamTypeError – If index field is invalid.

        • MRMOpenError – If failed to open MindRecord File.

        • MRMValidateDataError – If data does not match blob fields.

        • MRMSetHeaderError – If failed to set header.

        • MRMWriteDatasetError – If failed to write dataset.
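
The FileWriter methods above can be combined as add_schema, add_index, write_raw_data, then commit. The following is a minimal sketch of that flow under stated assumptions: the schema layout ({"type": ...} entries), the field names, and the helpers build_schema, build_records, write_records and RecordingWriter are illustrative and do not come from this page. RecordingWriter is a stand-in so the sketch runs without MindSpore installed; omitting writer_cls would import the real FileWriter instead.

```python
def build_schema():
    """Schema assumed for illustration; field names and type strings are hypothetical."""
    return {
        "file_name": {"type": "string"},
        "label": {"type": "int32"},
        "data": {"type": "bytes"},
    }


def build_records(count):
    """Create `count` small in-memory records matching build_schema()."""
    return [
        {"file_name": f"{i}.jpg", "label": i % 10, "data": bytes([i % 256]) * 4}
        for i in range(count)
    ]


def write_records(file_name, records, writer_cls=None, shard_num=1):
    """Run the add_schema -> add_index -> write_raw_data -> commit sequence."""
    if writer_cls is None:
        # Lazy import: only needed when writing real MindRecord files.
        from mindspore.mindrecord import FileWriter as writer_cls
    writer = writer_cls(file_name, shard_num)
    schema_id = writer.add_schema(build_schema(), "demo schema")
    writer.add_index(["file_name", "label"])  # both are primitive-typed fields
    writer.write_raw_data(records)
    writer.commit()
    return schema_id


class RecordingWriter:
    """Stand-in (not part of MindSpore) that records which methods were called."""

    log = []  # shared list so the call order can be inspected afterwards

    def __init__(self, file_name, shard_num=1):
        self.log.append("init")

    def add_schema(self, content, desc=None):
        self.log.append("add_schema")
        return 0

    def add_index(self, index_fields):
        self.log.append("add_index")

    def write_raw_data(self, raw_data, validate=True):
        self.log.append("write_raw_data")

    def commit(self):
        self.log.append("commit")


schema_id = write_records("demo.mindrecord", build_records(3), writer_cls=RecordingWriter)
print(schema_id, RecordingWriter.log)
```

Passing the writer class as a parameter keeps the call sequence testable without touching the file system.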

  • class mindspore.mindrecord.FileReader(file_name, num_consumer=4, columns=None, operator=None)
  • Class to read MindRecord File series.

    • Parameters
      • file_name (str) – File name of MindRecord File.

      • num_consumer (int, optional) – Number of consumer threads which load data to memory (default=4). It should not be smaller than 1 or larger than the number of CPUs.

      • columns (list[str], optional) – List of fields whose corresponding data will be read (default=None).

      • operator (int, optional) – Reserved parameter for operators (default=None).

    • Raises

    • ParamValueError – If file_name, num_consumer or columns is invalid.

    • close()

    • Stop reader worker and close File.

    • finish()

    • Stop reader worker.

      • Raises
      • MRMFinishError – If failed to finish worker threads.
    • get_next()

    • Yield a batch of data according to columns at a time.

      • Yields
      • dict – keys are the same as columns.

      • Raises

      • MRMUnsupportedSchemaError – If schema is invalid.
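
Reading with FileReader amounts to iterating over get_next() and calling close() when done. The sketch below illustrates that pattern with a hypothetical helper, read_column, and a stand-in FakeReader class (neither is part of MindSpore) so it runs without MindSpore installed. It assumes each yielded dict represents one record.

```python
def read_column(reader, column):
    """Collect one column from every record yielded by reader.get_next()."""
    values = []
    try:
        for record in reader.get_next():
            values.append(record[column])
    finally:
        reader.close()  # release the reader even if iteration fails
    return values


class FakeReader:
    """Stand-in (not part of MindSpore) with the same get_next()/close() shape."""

    def __init__(self, records):
        self._records = records
        self.closed = False

    def get_next(self):
        for record in self._records:
            yield record

    def close(self):
        self.closed = True


reader = FakeReader([{"label": 3}, {"label": 7}])
labels = read_column(reader, "label")
print(labels)

# With MindSpore installed, a real reader could be passed instead (not run here):
#   from mindspore.mindrecord import FileReader
#   labels = read_column(FileReader("demo.mindrecord", columns=["label"]), "label")
```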
  • class mindspore.mindrecord.MindPage(file_name, num_consumer=4)
  • Class to read MindRecord File series in pagination.

    • Parameters
      • file_name (str) – File name of MindRecord File.

      • num_consumer (int, optional) – Number of consumer threads which load data to memory (default=4). It should not be smaller than 1 or larger than the number of CPUs.

    • Raises

      • ParamValueError – If file_name, num_consumer or columns is invalid.

      • MRMInitSegmentError – If failed to initialize ShardSegment.

    • property candidate_fields

    • Return candidate category fields.

      • Returns
      • list[str], by which data could be grouped.
    • property category_field

    • Getter function for category field.

    • get_category_fields()

    • Return candidate category fields.

    • read_at_page_by_id(category_id, page, num_row)

    • Query by category id in pagination.

      • Parameters
        • category_id (int) – Category id; refer to the return value of read_category_info.

        • page (int) – Index of page.

        • num_row (int) – Number of rows in a page.

      • Returns

      • List, list[dict].

      • Raises

        • ParamValueError – If any parameter is invalid.

        • MRMFetchDataError – If failed to read by category id.

        • MRMUnsupportedSchemaError – If schema is invalid.

    • read_at_page_by_name(category_name, page, num_row)

    • Query by category name in pagination.

      • Parameters
        • category_name (str) – String of category field's value; refer to the return value of read_category_info.

        • page (int) – Index of page.

        • num_row (int) – Number of rows in a page.

      • Returns

      • str, read at page.
    • read_category_info()

    • Return category information when data is grouped by indicated category field.

      • Returns
      • str, description of group information.

      • Raises

      • MRMReadCategoryInfoError – If failed to read category information.
    • set_category_field(category_field)

    • Set category field for reading.

      • Note
      • Should be a candidate category field.

      • Parameters
      • category_field (str) – String of category field name.

      • Returns

      • MSRStatus, SUCCESS or FAILED.
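
Paged reading with read_at_page_by_id can be wrapped in a loop that requests one page after another. The sketch below is illustrative only: iter_pages and FakePager are hypothetical names, and it assumes that page indices start at 0 and that a request past the last page returns an empty list. A real MindPage may behave differently, for example by raising MRMFetchDataError.

```python
def iter_pages(pager, category_id, num_row, max_pages=1000):
    """Yield successive pages for one category until an empty page comes back."""
    for page in range(max_pages):  # max_pages guards against an endless loop
        rows = pager.read_at_page_by_id(category_id, page, num_row)
        if not rows:
            break
        yield rows


class FakePager:
    """Stand-in (not part of MindSpore) that pages over in-memory rows."""

    def __init__(self, rows_by_category):
        self._rows = rows_by_category

    def read_at_page_by_id(self, category_id, page, num_row):
        rows = self._rows.get(category_id, [])
        start = page * num_row
        return rows[start:start + num_row]


pager = FakePager({0: [{"id": i} for i in range(5)]})
pages = list(iter_pages(pager, category_id=0, num_row=2))
print([len(p) for p in pages])
```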

  • class mindspore.mindrecord.Cifar10ToMR(source, destination)
  • Class for transforming cifar10 to MindRecord.

    • Parameters
      • source (str) – the cifar10 directory to be transformed.

      • destination (str) – the MindRecord file path to transform into.

    • Raises

    • ValueError – If source or destination is invalid.

    • transform(fields=None)

    • Executes transformation from cifar10 to MindRecord.

      • Parameters
      • fields (list[str], optional) – list of index fields, e.g. ["label"] (default=None).

      • Returns

      • SUCCESS/FAILED, whether successfully written into MindRecord.
  • class mindspore.mindrecord.Cifar100ToMR(source, destination)
  • Class for transforming cifar100 to MindRecord.

    • Parameters
      • source (str) – the cifar100 directory to be transformed.

      • destination (str) – the MindRecord file path to transform into.

    • Raises

    • ValueError – If source or destination is invalid.

    • transform(fields=None)

    • Executes transformation from cifar100 to MindRecord.

      • Parameters
      • fields (list[str]) – list of index fields, e.g. ["fine_label", "coarse_label"].

      • Returns

      • SUCCESS/FAILED, whether successfully written into MindRecord.
  • class mindspore.mindrecord.ImageNetToMR(map_file, image_dir, destination, partition_number=1)
  • Class for transforming imagenet to MindRecord.

    • Parameters
      • map_file (str) – the map file which indicates the labels. The map file content should look like this:

        n02119789 1 pen
        n02100735 2 notebook
        n02110185 3 mouse
        n02096294 4 orange

      • image_dir (str) – image directory containing the n02119789, n02100735, n02110185, n02096294 directories.

      • destination (str) – the MindRecord file path to transform into.

      • partition_number (int, optional) – partition size (default=1).

    • Raises

    • ValueError – If map_file, image_dir or destination is invalid.

    • transform()

    • Executes transformation from imagenet to MindRecord.

      • Returns
      • SUCCESS/FAILED, whether successfully written into MindRecord.
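
The map file layout described for ImageNetToMR (directory id, numeric label, class name, separated by whitespace) can be checked before conversion. The sketch below parses text in that layout; parse_map_file and SAMPLE_MAP are hypothetical names used for illustration, and this is not how ImageNetToMR itself reads the file.

```python
def parse_map_file(text):
    """Parse 'directory-id label name' lines into (id, label, name) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        name = parts[2].strip() if len(parts) == 3 else ""
        entries.append((parts[0], int(parts[1]), name))
    return entries


# Sample content copied from the map_file description above.
SAMPLE_MAP = """n02119789 1 pen
n02100735 2 notebook
n02110185 3 mouse
n02096294 4 orange
"""

entries = parse_map_file(SAMPLE_MAP)
for directory_id, label, name in entries:
    print(directory_id, label, name)
```

The first column of each entry is expected to match a sub-directory name inside image_dir.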
  • class mindspore.mindrecord.MnistToMR(source, destination, partition_number=1)
  • Class for transforming Mnist to MindRecord.

    • Parameters
      • source (str) – directory which contains t10k-images-idx3-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz, train-labels-idx1-ubyte.gz.

      • destination (str) – the MindRecord file directory to transform into.

      • partition_number (int, optional) – partition size (default=1).

    • Raises

    • ValueError – If source/destination/partition_number is invalid.

    • transform()

    • Executes transformation from Mnist to MindRecord.

      • Returns
      • SUCCESS/FAILED, whether successfully written into MindRecord.
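
Since MnistToMR expects four specific archives in the source directory, a pre-flight check can report which ones are absent before the conversion is attempted. The sketch below uses only the standard library; missing_mnist_files is a hypothetical helper, not part of MindSpore.

```python
import os

# File names taken from the description of the source parameter above.
MNIST_FILES = (
    "t10k-images-idx3-ubyte.gz",
    "train-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
)


def missing_mnist_files(source):
    """Return the expected archive names that are not present in `source`."""
    return [
        name for name in MNIST_FILES
        if not os.path.isfile(os.path.join(source, name))
    ]


# Example: the current directory most likely lacks all four archives.
print(missing_mnist_files("."))
```

An empty result suggests the directory is ready to be passed as source; any names returned point to the archives still to be downloaded.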