mindarmour.detectors

This module includes detector methods for distinguishing adversarial examples from benign examples.

  • class mindarmour.detectors.ErrorBasedDetector(auto_encoder, false_positive_rate=0.01, bounds=(0.0, 1.0))[source]
  • The detector reconstructs input samples, measures reconstruction errors and rejects samples with large reconstruction errors.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

  • Parameters
    • auto_encoder (Model) – A trained auto-encoder which represents the input by a reduced encoding.

    • false_positive_rate (float) – Detector’s false positive rate. Default: 0.01.

    • bounds (tuple) – (clip_min, clip_max). Default: (0.0, 1.0).

  • detect(inputs)[source]

  • Detect if input samples are adversarial or not.

    • Parameters
    • inputs (numpy.ndarray) – Suspicious samples to be judged.

    • Returns

    • list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
  • detect_diff(inputs)[source]

  • Detect the distance between the original samples and reconstructed samples.

    • Parameters
    • inputs (numpy.ndarray) – Input samples.

    • Returns

    • float, the distance between reconstructed and original samples.
  • fit(inputs, labels=None)[source]

  • Find a threshold for a given dataset to distinguish adversarial examples.

    • Parameters
      • inputs (numpy.ndarray) – Input samples.

      • labels (numpy.ndarray) – Labels of input samples. Default: None.

    • Returns

    • float, threshold to distinguish adversarial samples from benign ones.
  • set_threshold(threshold)[source]

  • Set the parameter threshold.

    • Parameters
    • threshold (float) – Detection threshold. Default: None.
  • transform(inputs)[source]

  • Reconstruct input samples.

    • Parameters
    • inputs (numpy.ndarray) – Input samples.

    • Returns

    • numpy.ndarray, reconstructed images.
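
Examples

The following is an illustrative sketch (not from the original reference), assuming `ae_net` is an already trained auto-encoder network and `benign`/`suspicious` are placeholder numpy.ndarray batches within the (0.0, 1.0) bounds:

>>> from mindspore import Model
>>> from mindarmour.detectors import ErrorBasedDetector
>>> # ae_net is a placeholder for a trained auto-encoder network
>>> detector = ErrorBasedDetector(Model(ae_net), false_positive_rate=0.01)
>>> # calibrate the rejection threshold on benign samples
>>> threshold = detector.fit(benign)
>>> detector.set_threshold(threshold)
>>> # res[i] == 1 marks the i-th suspicious sample as adversarial
>>> res = detector.detect(suspicious)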
  • class mindarmour.detectors.DivergenceBasedDetector(auto_encoder, model, option='jsd', t=1, bounds=(0.0, 1.0))[source]
  • This class implements a divergence-based detector.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

  • Parameters
    • auto_encoder (Model) – Encoder model.

    • model (Model) – Targeted model.

    • option (str) – Method used to calculate divergence. Default: “jsd”.

    • t (int) – Temperature used to overcome numerical problems. Default: 1.

    • bounds (tuple) – Upper and lower bounds of data, in the form of (clip_min, clip_max). Default: (0.0, 1.0).

  • detect_diff(inputs)[source]

  • Detect the distance between original samples and reconstructed samples.

The distance is calculated by JSD.

    • Parameters
    • inputs (numpy.ndarray) – Input samples.

    • Returns

    • float, the distance.

    • Raises

    • NotImplementedError – If the param option is not supported.
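
Examples

An illustrative sketch (not from the original reference), assuming `ae_net` is a trained auto-encoder network, `target_net` is the classifier under attack, and `suspicious` is a placeholder numpy.ndarray batch:

>>> from mindspore import Model
>>> from mindarmour.detectors import DivergenceBasedDetector
>>> # ae_net and target_net are placeholders for trained networks
>>> detector = DivergenceBasedDetector(Model(ae_net), Model(target_net), option='jsd', t=1)
>>> # JSD-based distance between predictions on original and reconstructed samples
>>> dist = detector.detect_diff(suspicious)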

  • class mindarmour.detectors.RegionBasedDetector(model, number_points=10, initial_radius=0.0, max_radius=1.0, search_step=0.01, degrade_limit=0.0, sparse=False)[source]
  • This class implements a region-based detector.

Reference: Mitigating evasion attacks to deep neural networks via region-based classification

  • Parameters
    • model (Model) – Target model.

    • number_points (int) – The number of samples generated from the hyper cube of the original sample. Default: 10.

    • initial_radius (float) – Initial radius of hyper cube. Default: 0.0.

    • max_radius (float) – Maximum radius of hyper cube. Default: 1.0.

    • search_step (float) – Increment of the radius during search. Default: 0.01.

    • degrade_limit (float) – Acceptable decrease of classification accuracy. Default: 0.0.

    • sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: False.

Examples

>>> detector = RegionBasedDetector(model)
>>> detector.fit(Tensor(ori), Tensor(labels))
>>> adv_ids = detector.detect(Tensor(adv))
  • detect(inputs)[source]
  • Tell whether input samples are adversarial or not.

    • Parameters
    • inputs (numpy.ndarray) – Suspicious samples to be judged.

    • Returns

    • list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
  • detect_diff(inputs)[source]

  • Return raw prediction results and region-based prediction results.

    • Parameters
    • inputs (numpy.ndarray) – Input samples.

    • Returns

    • numpy.ndarray, raw prediction results and region-based prediction results of input samples.
  • fit(inputs, labels=None)[source]

  • Train detector to decide the best radius.

    • Parameters
      • inputs (numpy.ndarray) – Input samples.

      • labels (numpy.ndarray) – Labels of input samples. Default: None.

    • Returns

    • float, the best radius.
  • set_radius(radius)[source]

  • Set radius.

    • Parameters
    • radius (float) – Radius of the hyper cube.

  • transform(inputs)[source]

  • Generate hyper cube for input samples.

    • Parameters
    • inputs (numpy.ndarray) – Input samples.

    • Returns

    • numpy.ndarray, hyper cubes corresponding to every sample.
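
A fuller illustrative walk-through (not from the original reference) wiring fit, set_radius and detect together; `net`, `ori`, `labels` and `adv` are placeholders for a trained network, benign samples, their labels and suspicious samples:

>>> from mindspore import Model, Tensor
>>> from mindarmour.detectors import RegionBasedDetector
>>> detector = RegionBasedDetector(Model(net), number_points=10, search_step=0.01)
>>> # search for the best hyper cube radius on benign samples
>>> radius = detector.fit(Tensor(ori), Tensor(labels))
>>> detector.set_radius(radius)
>>> adv_ids = detector.detect(Tensor(adv))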
  • class mindarmour.detectors.SpatialSmoothing(model, ksize=3, is_local_smooth=True, metric='l1', false_positive_ratio=0.05)[source]
  • Detect method based on spatial smoothing.

    • Parameters
      • model (Model) – Target model.

      • ksize (int) – Smooth window size. Default: 3.

      • is_local_smooth (bool) – If True, trigger local smoothing. If False, trigger non-local smoothing. Default: True.

      • metric (str) – Distance method. Default: ‘l1’.

      • false_positive_ratio (float) – False positive rate over benign samples. Default: 0.05.

Examples

>>> detector = SpatialSmoothing(model)
>>> detector.fit(Tensor(ori), Tensor(labels))
>>> adv_ids = detector.detect(Tensor(adv))
  • detect(inputs)[source]
  • Detect if an input sample is an adversarial example.

    • Parameters
    • inputs (numpy.ndarray) – Suspicious samples to be judged.

    • Returns

    • list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
  • detect_diff(inputs)[source]

  • Return the raw distance value (before applying the threshold) between the input sample and its smoothed counterpart.

    • Parameters
    • inputs (numpy.ndarray) – Suspicious samples to be judged.

    • Returns

    • float, distance.
  • fit(inputs, labels=None)[source]

  • Train detector to decide the threshold. The proper threshold makes sure the actual false positive rate over benign samples is less than the given value.

    • Parameters
      • inputs (numpy.ndarray) – Input samples.

      • labels (numpy.ndarray) – Labels of input samples. Default: None.

    • Returns

    • float, threshold; a distance larger than this is reported as positive, i.e. adversarial.
  • set_threshold(threshold)[source]

  • Set the parameter threshold.

    • Parameters
    • threshold (float) – Detection threshold. Default: None.
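
An illustrative sketch (not from the original reference) of the threshold-calibration workflow; `net`, `benign` and `suspicious` are placeholders for a trained network, benign samples and samples to be judged:

>>> from mindspore import Model
>>> from mindarmour.detectors import SpatialSmoothing
>>> detector = SpatialSmoothing(Model(net), ksize=3, false_positive_ratio=0.05)
>>> # pick a threshold so that at most ~5% of benign samples are flagged
>>> threshold = detector.fit(benign)
>>> detector.set_threshold(threshold)
>>> # raw distances between samples and their smoothed counterparts
>>> dists = detector.detect_diff(suspicious)
>>> res = detector.detect(suspicious)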
  • class mindarmour.detectors.EnsembleDetector(detectors, policy='vote')[source]
  • Ensemble detector.

    • Parameters
      • detectors (Union[tuple, list]) – List of detector methods.

      • policy (str) – Decision policy, could be ‘vote’, ‘all’ or ‘any’. Default: ‘vote’.

    • detect(inputs)[source]

    • Detect adversarial examples from input samples.

      • Parameters
      • inputs (numpy.ndarray) – Input samples.

      • Returns

      • list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

      • Raises

      • ValueError – If policy is not supported.
    • detect_diff(inputs)[source]

    • This method is not available in this class.

    • fit(inputs, labels=None)[source]

    • Fit detector like a machine learning model. This method is not available in this class.

    • transform(inputs)[source]

    • Filter adversarial noises in input samples. This method is not available in this class.
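
An illustrative sketch (not from the original reference); `detector_a`, `detector_b` and `detector_c` are placeholders for detectors that have already been constructed and fitted individually, and `suspicious` is a placeholder numpy.ndarray batch:

>>> from mindarmour.detectors import EnsembleDetector
>>> # with policy='vote', a sample is flagged when the majority of sub-detectors flag it
>>> ensemble = EnsembleDetector([detector_a, detector_b, detector_c], policy='vote')
>>> res = ensemble.detect(suspicious)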

  • class mindarmour.detectors.SimilarityDetector(trans_model, max_k_neighbor=1000, chunk_size=1000, max_buffer_size=10000, tuning=False, fpr=0.001)[source]
  • The detector measures similarity among adjacent queries and rejects queries which are remarkably similar to previous queries.

Reference: Stateful Detection of Black-Box Adversarial Attacks by Steven Chen, Nicholas Carlini, and David Wagner, at arXiv 2019.

  • Parameters
    • trans_model (Model) – A MindSpore model to encode input data into a lower-dimension vector.

    • max_k_neighbor (int) – The maximum number of the nearest neighbors. Default: 1000.

    • chunk_size (int) – Buffer size. Default: 1000.

    • max_buffer_size (int) – Maximum buffer size. Default: 10000.

    • tuning (bool) – Calculate the average distance for the nearest k neighbors. If tuning is True, k=K; if False, k=1,…,K. Default: False.

    • fpr (float) – False positive ratio on legitimate query sequences. Default: 0.001.

Examples

>>> detector = SimilarityDetector(model)
>>> detector.fit(Tensor(ori), Tensor(labels))
>>> adv_ids = detector.detect(Tensor(adv))
  • clear_buffer()[source]
  • Clear the buffer memory.

  • detect(inputs)[source]

  • Process queries to detect black-box attack.

    • Parameters
    • inputs (numpy.ndarray) – Query sequence.

    • Raises

    • ValueError – If the parameters threshold or num_of_neighbors are not available.
  • detect_diff(inputs)[source]

  • Detect adversarial samples from input samples, like the predict_proba function in common machine learning models.

    • Parameters
    • inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.

    • Raises

    • NotImplementedError – This function is not available in class SimilarityDetector.
  • fit(inputs, labels=None)[source]

  • Process input training data to calculate the threshold. A proper threshold should make sure the false positive rate is under a given value.

    • Parameters
      • inputs (numpy.ndarray) – Training data to calculate the threshold.

      • labels (numpy.ndarray) – Labels of training data. Default: None.

    • Returns

      • list[int], number of the nearest neighbors.

      • list[float], calculated thresholds for different K.

    • Raises

    • ValueError – If the number of training data is less than max_k_neighbor.

  • get_detected_queries()[source]
  • Get the indexes of detected queries.

    • Returns
    • list[int], sequence number of detected malicious queries.
  • get_detection_interval()[source]

  • Get the interval between adjacent detections.

    • Returns
    • list[int], number of queries between adjacent detections.
  • set_threshold(num_of_neighbors, threshold)[source]

  • Set the parameters num_of_neighbors and threshold.

    • Parameters
      • num_of_neighbors (int) – Number of the nearest neighbors.

      • threshold (float) – Detection threshold. Default: None.

  • transform(inputs)[source]

  • Filter adversarial noises in input samples.
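
An illustrative end-to-end sketch (not from the original reference) wiring fit, set_threshold and detect together; `encoder_net`, `benign_queries` and `query_sequence` are placeholders for a trained encoder network, a legitimate query sequence used for calibration, and the query stream to monitor:

>>> from mindspore import Model
>>> from mindarmour.detectors import SimilarityDetector
>>> detector = SimilarityDetector(Model(encoder_net), max_k_neighbor=1000, fpr=0.001)
>>> # calibrate per-K thresholds on a legitimate query sequence
>>> num_of_neighbors, thresholds = detector.fit(benign_queries)
>>> # choose one (K, threshold) pair before running detection
>>> detector.set_threshold(num_of_neighbors[-1], thresholds[-1])
>>> detector.detect(query_sequence)
>>> flagged = detector.get_detected_queries()
>>> intervals = detector.get_detection_interval()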