TimeSeriesPrediction
Introduction for TimeSeriesPrediction
Knowing the future makes things easier for us.
Many businesses are naturally cyclical in time series, especially for those that directly or indirectly serve “people”. This periodicity is determined by the regularity of people’s daily activities. For example, people are accustomed to ordering take-out at noon and in the evenings; there are always traffic peaks in the morning and evening; even for services that don’t have such obvious patterns, such as searching, the amount of requests at night is much lower than that during business hours. For applications related to this kind of business, it is a natural idea to infer the next day’s metrics from the historical data of the past few days, or to infer the coming Monday’s access traffic from the data of last Monday. With predicted metrics or traffic patterns in the next 24 hours, we can better manage our application instances, stabilize our system, and meanwhile, reduce the cost.
TimeSeriesPrediction is used to forecast the kubernetes object metric. It is based on PredictionCore to do forecast.
Features
A TimeSeriesPrediction sample yaml looks like below:
apiVersion: prediction.crane.io/v1alpha1
kind: TimeSeriesPrediction
metadata:
name: node-resource-percentile
namespace: default
spec:
targetRef:
kind: Node
name: 192.168.56.166
predictionWindowSeconds: 600
predictionMetrics:
- resourceIdentifier: node-cpu
type: ResourceQuery
resourceQuery: cpu
algorithm:
algorithmType: "percentile"
percentile:
sampleInterval: "1m"
minSampleWeight: "1.0"
histogram:
maxValue: "10000.0"
epsilon: "1e-10"
halfLife: "12h"
bucketSize: "10"
firstBucketSize: "40"
bucketSizeGrowthRatio: "1.5"
- resourceIdentifier: node-mem
type: ResourceQuery
resourceQuery: memory
algorithm:
algorithmType: "percentile"
percentile:
sampleInterval: "1m"
minSampleWeight: "1.0"
histogram:
maxValue: "1000000.0"
epsilon: "1e-10"
halfLife: "12h"
bucketSize: "10"
firstBucketSize: "40"
bucketSizeGrowthRatio: "1.5"
- spec.targetRef defines the reference to the kubernetes object including Node or other workload such as Deployment.
- spec.predictionMetrics defines the metrics about the spec.targetRef.
- spec.predictionWindowSeconds is a prediction time series duration. the TimeSeriesPredictionController will rotate the predicted data in spec.Status for consumer to consume the predicted time series data.
PredictionMetrics
apiVersion: prediction.crane.io/v1alpha1
kind: TimeSeriesPrediction
metadata:
name: node-resource-percentile
namespace: default
spec:
predictionMetrics:
- resourceIdentifier: node-cpu
type: ResourceQuery
resourceQuery: cpu
algorithm:
algorithmType: "percentile"
percentile:
sampleInterval: "1m"
minSampleWeight: "1.0"
histogram:
maxValue: "10000.0"
epsilon: "1e-10"
halfLife: "12h"
bucketSize: "10"
firstBucketSize: "40"
bucketSizeGrowthRatio: "1.5"
MetricType
There are three types of the metric query:
ResourceQuery
is a kubernetes built-in resource metric such as cpu or memory. crane supports only cpu and memory now.RawQuery
is a query by DSL, such as prometheus query language. now support prometheus.ExpressionQuery
is a query by Expression selector.
Now we only support prometheus as data source. We define the MetricType
to orthogonal with the datasource. but now maybe some datasources do not support the metricType.
Algorithm
Algorithm
define the algorithm type and params to do predict for the metric. Now there are two kinds of algorithms:
dsp
is an algorithm to forcasting a time series, it is based on FFT(Fast Fourier Transform), it is good at predicting some time series with seasonality and periods.percentile
is an algorithm to estimate a time series, and find a recommended value to represent the past time series, it is based on exponentially-decaying weights historgram statistics. it is used to estimate a time series, it is not good at to predict a time sequences, although the percentile can output a time series predicted data, but it is all the same value. so if you want to predict a time sequences, dsp is a better choice.