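Supported Operators

Paddle-Lite currently supports 204 operators in total: 78 basic operators and 127 extra operators.

Basic Operators

Operators compiled by default, 78 in total.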

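A Host kernel is a pure C/C++ implementation of an operator that runs on any CPU. Because it is highly portable, it generally serves as a supplement to the operator implementations for specific platforms.

For example, when Paddle-Lite deploys a model on ARM and an operator in the model has no ARM kernel but does have a Host kernel, the model optimization stage selects the Host kernel for that operator, so the model can still be deployed successfully.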
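The fallback described above can be illustrated with a small Python sketch. It is not Paddle-Lite's actual selection logic: the `KERNELS` dictionary and the `pick_kernel` and `plan` helpers are hypothetical names introduced for illustration, and the support data is an abridged copy of a few rows from the basic-operator table.

```python
# Hypothetical illustration of falling back to a Host kernel.
# This is NOT Paddle-Lite's real optimizer; the names below are assumptions.
# Support sets are abridged from a few rows of the basic-operator table.
KERNELS = {
    "conv2d": {"X86", "CUDA", "ARM", "OpenCL"},
    "fill_constant": {"Host"},
    "multiclass_nms": {"Host", "FPGA"},
    "density_prior_box": {"ARM"},
}


def pick_kernel(op, preferred, fallback="Host"):
    """Return the target chosen for `op`: `preferred` if available, else `fallback`."""
    targets = KERNELS.get(op, set())
    if preferred in targets:
        return preferred
    if fallback in targets:
        return fallback
    raise LookupError(f"no {preferred} or {fallback} kernel for {op!r}")


def plan(ops, preferred):
    """Map every operator in `ops` to the target selected by pick_kernel."""
    return {op: pick_kernel(op, preferred) for op in ops}


if __name__ == "__main__":
    # conv2d has an ARM kernel; the other two only reach ARM devices via Host.
    print(plan(["conv2d", "fill_constant", "multiclass_nms"], "ARM"))
```

In this sketch an operator with neither a kernel for the preferred target nor a Host kernel raises `LookupError`, which corresponds to a model that cannot be deployed on that target as is.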

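| OP Name | Host | X86 | CUDA | ARM | OpenCL | FPGA | Huawei NPU | Baidu XPU | Rockchip NPU | MediaTek APU | Imagination NNA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| affine_channel |   |   |   | Y |   |   |   |   |   |   |   |
| affine_grid |   |   |   | Y |   |   |   |   |   |   |   |
| arg_max |   |   |   | Y |   |   |   |   |   |   |   |
| assign_value |   |   | Y | Y |   |   |   |   |   |   |   |
| batch_norm |   | Y |   | Y |   |   | Y | Y | Y |   |   |
| bilinear_interp |   |   | Y | Y | Y |   | Y |   |   |   |   |
| box_coder |   |   |   | Y | Y |   |   |   |   |   |   |
| calib |   |   | Y | Y |   | Y |   |   |   |   |   |
| cast |   | Y |   | Y |   |   |   | Y |   |   |   |
| concat |   | Y | Y | Y | Y |   | Y |   | Y | Y |   |
| conv2d |   | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| conv2d_transpose |   |   |   | Y |   |   | Y |   |   | Y |   |
| density_prior_box |   |   |   | Y |   |   |   |   |   |   |   |
| depthwise_conv2d |   | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| depthwise_conv2d_transpose |   |   |   |   |   |   |   |   |   |   |   |
| dropout |   | Y | Y | Y | Y | Y | Y | Y |   |   |   |
| elementwise_add |   | Y | Y | Y | Y | Y | Y | Y | Y | Y |   |
| elementwise_div |   |   |   | Y |   |   | Y |   | Y |   |   |
| elementwise_max |   |   |   | Y |   |   |   |   |   |   |   |
| elementwise_mod |   |   |   | Y |   |   |   |   |   |   |   |
| elementwise_mul |   | Y | Y | Y | Y | Y | Y |   | Y | Y |   |
| elementwise_pow |   |   |   |   |   |   |   |   |   |   |   |
| elementwise_sub |   | Y | Y | Y | Y |   | Y |   | Y |   |   |
| elu |   |   |   | Y |   |   |   |   |   |   |   |
| expand | Y |   |   |   | Y |   | Y |   |   |   |   |
| expand_as | Y |   |   |   |   |   |   |   |   |   |   |
| fc |   | Y | Y | Y | Y | Y | Y |   | Y | Y | Y |
| feed | Y |   | Y |   |   | Y |   |   |   |   |   |
| fetch | Y |   |   |   |   | Y |   |   |   |   |   |
| fill_constant | Y |   |   |   |   |   |   |   |   |   |   |
| fill_constant_batch_size_like | Y | Y |   |   |   |   |   |   |   |   |   |
| flatten | Y |   |   |   | Y |   |   |   | Y |   |   |
| flatten2 | Y |   |   |   | Y |   |   |   | Y |   |   |
| fusion_elementwise_add_activation |   |   | Y | Y | Y | Y | Y |   |   | Y |   |
| fusion_elementwise_div_activation |   |   |   | Y |   |   | Y |   |   |   |   |
| fusion_elementwise_max_activation |   |   |   | Y |   |   |   |   |   |   |   |
| fusion_elementwise_mul_activation |   |   | Y | Y |   |   | Y |   |   |   |   |
| fusion_elementwise_sub_activation |   |   | Y | Y | Y |   | Y |   |   |   |   |
| grid_sampler |   |   |   | Y | Y |   |   |   |   |   |   |
| instance_norm |   |   |   | Y | Y |   | Y |   |   |   |   |
| io_copy |   |   | Y |   | Y | Y |   |   |   |   |   |
| io_copy_once |   |   | Y |   | Y | Y |   |   |   |   |   |
| layout |   |   | Y | Y | Y | Y |   |   |   |   |   |
| leaky_relu |   | Y | Y | Y | Y |   | Y |   |   |   |   |
| matmul |   | Y | Y | Y |   |   | Y | Y |   |   |   |
| mul |   | Y | Y | Y |   |   | Y | Y |   |   |   |
| multiclass_nms | Y |   |   |   |   | Y |   |   |   |   |   |
| multiclass_nms2 | Y |   |   |   |   |   |   |   |   |   |   |
| nearest_interp |   |   | Y | Y | Y |   | Y |   |   |   |   |
| pad2d |   |   |   | Y | Y |   | Y |   | Y |   |   |
| pool2d |   | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| prelu |   |   |   | Y |   |   |   |   |   |   |   |
| prior_box |   |   |   | Y |   | Y |   |   |   |   |   |
| range |   |   |   | Y |   |   |   |   |   |   |   |
| reduce_mean |   |   |   | Y |   |   | Y |   |   |   |   |
| relu |   | Y | Y | Y | Y |   | Y |   | Y | Y | Y |
| relu6 |   |   |   | Y | Y |   | Y |   | Y |   |   |
| reshape | Y | Y |   |   | Y |   | Y | Y |   |   |   |
| reshape2 | Y | Y |   |   | Y |   | Y | Y | Y |   |   |
| scale |   | Y | Y | Y | Y | Y | Y | Y | Y |   |   |
| search_fc |   | Y | Y |   |   |   |   |   |   |   |   |
| sequence_topk_avg_pooling |   | Y | Y |   |   |   |   |   |   |   |   |
| shuffle_channel |   |   |   | Y |   |   | Y |   |   |   |   |
| sigmoid |   | Y | Y | Y | Y |   | Y |   | Y |   |   |
| slice |   | Y |   | Y | Y |   |   | Y |   |   |   |
| softmax |   | Y | Y | Y |   |   | Y | Y | Y | Y |   |
| split |   |   |   | Y |   |   | Y |   |   |   |   |
| squeeze | Y |   |   |   |   |   |   |   |   |   |   |
| squeeze2 | Y |   |   |   |   |   |   |   |   |   |   |
| stack |   | Y |   | Y |   |   |   | Y |   |   |   |
| subgraph |   |   |   |   |   |   | Y | Y | Y | Y |   |
| tanh |   | Y | Y | Y | Y |   | Y | Y |   |   |   |
| thresholded_relu |   |   |   | Y |   |   | Y |   |   |   |   |
| transpose |   | Y | Y | Y | Y |   | Y | Y |   |   |   |
| transpose2 |   | Y | Y | Y | Y |   | Y | Y | Y |   |   |
| unsqueeze | Y |   |   |   |   |   | Y |   |   |   |   |
| unsqueeze2 | Y |   |   |   |   |   | Y |   |   |   |   |
| yolo_box |   |   | Y | Y |   |   |   | Y |   |   |   |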

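Extra Operators

There are 127 extra operators. They are compiled only when the --build_extra=ON switch is turned on at build time; for details, see the build parameter description.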
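One way to decide whether a model needs a build with --build_extra=ON is to compare the operator types it uses against the basic-operator list. The Python sketch below shows only the idea: `BASIC_OPS` is an abridged copy of the basic list (a real check would need all 78 names), `ops_outside_basic_set` is a hypothetical helper, and extracting the operator types from a model file is assumed to happen elsewhere.

```python
# Abridged copy of the basic-operator list, for illustration only.
# A real check would need every name from the basic-operator table.
BASIC_OPS = {"conv2d", "pool2d", "relu", "softmax", "fc", "concat"}


def ops_outside_basic_set(model_ops, basic_ops=BASIC_OPS):
    """Return, sorted, the operator types in `model_ops` that are not basic operators."""
    return sorted(set(model_ops) - set(basic_ops))


if __name__ == "__main__":
    # Hypothetical operator list of a model; lstm and gru are extra operators.
    missing = ops_outside_basic_set(["conv2d", "relu", "lstm", "gru"])
    if missing:
        print("needs --build_extra=ON for:", ", ".join(missing))
    else:
        print("basic operators are sufficient")
```

An empty result means every operator of the model is in the basic set; the check says nothing about whether a kernel exists for the intended target, which still has to be read from the tables.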

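| OP Name | Host | X86 | CUDA | ARM | OpenCL | FPGA | Huawei NPU | Baidu XPU | Rockchip NPU | MediaTek APU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| abs |   |   | Y | Y |   |   |   |   |   |   |
| anchor_generator |   |   |   | Y |   |   |   |   |   |   |
| assign | Y |   |   |   |   |   |   |   |   |   |
| attention_padding_mask |   |   |   |   |   |   |   |   |   |   |
| axpy |   |   |   | Y |   |   |   |   |   |   |
| beam_search_decode |   |   |   | Y |   |   |   |   |   |   |
| beam_search_decode |   |   |   | Y |   |   |   |   |   |   |
| box_clip |   |   |   | Y |   |   |   |   |   |   |
| calib_once |   |   | Y | Y |   | Y |   |   |   |   |
| clip |   |   |   | Y |   |   |   |   |   |   |
| collect_fpn_proposals |   |   |   | Y |   |   |   |   |   |   |
| conditional_block | Y |   |   |   |   |   |   |   |   |   |
| crf_decoding | Y |   |   |   |   |   |   |   |   |   |
| crop |   |   |   | Y |   |   |   |   |   |   |
| ctc_align | Y |   |   |   |   |   |   |   |   |   |
| decode_bboxes |   |   |   | Y |   |   |   |   |   |   |
| deformable_conv |   |   |   | Y |   |   |   |   |   |   |
| distribute_fpn_proposals |   |   |   | Y |   |   |   |   |   |   |
| equal | Y |   |   |   |   |   |   |   |   |   |
| exp |   |   |   | Y | Y |   |   |   |   |   |
| fake_channel_wise_dequantize_max_abs |   |   |   |   |   |   |   |   |   |   |
| fake_dequantize_max_abs |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_abs_max |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_dequantize_abs_max |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_dequantize_moving_average_abs_max |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_moving_average_abs_max |   |   |   |   |   |   |   |   |   |   |
| fake_quantize_range_abs_max |   |   |   |   |   |   |   |   |   |   |
| floor |   |   |   | Y |   |   |   |   |   |   |
| gather |   | Y |   | Y |   |   |   | Y |   |   |
| gelu |   | Y |   |   |   |   |   |   |   |   |
| generate_proposals |   |   |   | Y |   |   |   |   |   |   |
| greater_equal | Y |   |   |   |   |   |   |   |   |   |
| greater_than | Y |   |   |   |   |   |   |   |   |   |
| group_norm |   |   |   | Y |   |   |   |   |   |   |
| gru |   | Y | Y | Y |   | Y |   |   |   |   |
| gru_unit |   |   |   | Y |   |   |   |   |   |   |
| hard_sigmoid |   |   |   | Y | Y |   | Y |   |   |   |
| hard_swish |   |   |   | Y |   |   |   |   |   |   |
| im2sequence |   |   |   | Y |   |   |   |   |   |   |
| increment |   |   |   | Y |   |   | Y |   |   |   |
| is_empty | Y |   |   |   |   |   |   |   |   |   |
| layer_norm |   | Y |   | Y |   |   | Y | Y |   |   |
| layout_once |   |   | Y | Y |   | Y |   |   |   |   |
| less_equal | Y |   |   |   |   |   |   |   |   |   |
| less_than | Y |   |   |   |   |   | Y |   |   |   |
| lod_reset |   |   |   | Y |   |   |   |   |   |   |
| log |   |   |   | Y |   |   | Y |   |   |   |
| logical_and | Y |   |   |   |   |   |   |   |   |   |
| logical_not | Y |   |   |   |   |   |   |   |   |   |
| logical_or | Y |   |   |   |   |   |   |   |   |   |
| logical_xor | Y |   |   |   |   |   |   |   |   |   |
| lookup_table |   | Y | Y | Y |   |   |   | Y |   |   |
| lookup_table_dequant |   |   |   | Y |   |   |   |   |   |   |
| lookup_table_v2 |   | Y | Y | Y |   |   |   |   |   |   |
| lrn |   |   |   | Y | Y |   |   |   |   |   |
| lstm |   |   |   | Y |   |   |   |   |   |   |
| match_matrix_tensor |   | Y | Y |   |   |   |   |   |   |   |
| max_pool2d_with_index |   |   |   |   |   |   |   |   |   |   |
| mean |   |   |   | Y |   |   |   |   |   |   |
| merge_lod_tensor |   |   |   | Y |   |   |   |   |   |   |
| negative |   |   |   | Y |   |   |   |   |   |   |
| norm |   |   |   | Y |   | Y |   |   |   |   |
| not_equal | Y |   |   |   |   |   |   |   |   |   |
| one_hot | Y |   |   |   |   |   |   |   |   |   |
| pixel_shuffle | Y |   |   | Y | Y |   |   |   |   |   |
| pow |   |   |   | Y |   |   |   |   |   |   |
| print | Y |   |   |   |   |   |   |   |   |   |
| read_from_array | Y |   |   |   |   |   |   |   |   |   |
| reciprocal |   |   |   | Y |   |   |   |   |   |   |
| reduce_max |   |   |   | Y |   |   |   |   |   |   |
| reduce_prod |   |   |   | Y |   |   |   |   |   |   |
| reduce_sum |   | Y |   |   |   |   |   | Y |   |   |
| relu_clipped |   |   |   | Y |   |   | Y |   |   |   |
| retinanet_detection_output | Y |   |   |   |   |   |   |   |   |   |
| roi_align |   |   |   | Y |   |   |   |   |   |   |
| rsqrt |   |   |   | Y |   |   |   |   |   |   |
| search_aligned_mat_mul |   | Y | Y |   |   |   |   |   |   |   |
| search_attention_padding_mask |   | Y | Y |   |   |   |   |   |   |   |
| search_grnn |   | Y | Y |   |   |   |   |   |   |   |
| search_group_padding |   | Y | Y |   |   |   |   |   |   |   |
| search_seq_arithmetic |   | Y | Y |   |   |   |   |   |   |   |
| search_seq_depadding |   | Y | Y |   |   |   |   |   |   |   |
| search_seq_fc |   | Y | Y |   |   |   |   |   |   |   |
| search_seq_softmax |   | Y | Y |   |   |   |   |   |   |   |
| sequence_arithmetic |   | Y | Y |   |   |   |   |   |   |   |
| sequence_concat |   | Y | Y |   |   |   |   |   |   |   |
| sequence_conv |   | Y |   | Y |   |   |   |   |   |   |
| sequence_expand |   |   |   | Y |   |   |   |   |   |   |
| sequence_expand_as |   | Y |   |   |   |   |   |   |   |   |
| sequence_mask |   |   | Y |   |   |   |   |   |   |   |
| sequence_pad |   |   | Y |   |   |   |   |   |   |   |
| sequence_pool |   | Y | Y | Y |   |   |   |   |   |   |
| sequence_pool_concat |   |   | Y |   |   |   |   |   |   |   |
| sequence_reshape |   | Y |   |   |   |   |   |   |   |   |
| sequence_reverse |   | Y | Y |   |   |   |   |   |   |   |
| sequence_reverse_embedding |   |   | Y |   |   |   |   |   |   |   |
| sequence_softmax |   |   |   | Y |   |   |   |   |   |   |
| sequence_unpad | Y |   | Y |   |   |   |   |   |   |   |
| shape | Y | Y |   |   |   |   |   |   |   |   |
| sign |   |   |   |   |   |   |   |   |   |   |
| softsign |   | Y |   |   |   |   | Y |   |   |   |
| split_lod_tensor |   |   |   | Y |   |   |   |   |   |   |
| sqrt |   |   |   |   |   |   | Y |   |   |   |
| square |   | Y |   | Y |   |   | Y |   |   |   |
| swish |   |   |   | Y | Y |   |   |   |   |   |
| top_k |   |   |   | Y |   |   |   |   |   |   |
| topk_pooling |   |   | Y |   |   |   |   |   |   |   |
| uniform_random |   |   |   |   |   |   |   |   |   |   |
| var_conv_2d |   | Y | Y |   |   |   |   |   |   |   |
| where_index | Y |   |   |   |   |   |   |   |   |   |
| while | Y |   |   |   |   |   |   |   |   |   |
| write_to_array | Y |   |   |   |   |   |   |   |   |   |
| __xpu__conv2d |   |   |   |   |   |   |   | Y |   |   |
| __xpu__embedding_with_eltwise_add |   |   |   |   |   |   |   | Y |   |   |
| __xpu__fc |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_bid_emb_att |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_bid_emb_grnn_att |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_bid_emb_grnn_att2 |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_match_conv_topk |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_merge_all |   |   |   |   |   |   |   | Y |   |   |
| __xpu__mmdnn_search_attention |   |   |   |   |   |   |   | Y |   |   |
| __xpu__multi_encoder |   |   |   |   |   |   |   | Y |   |   |
| __xpu__resnet_cbam |   |   |   |   |   |   |   | Y |   |   |
| __xpu__resnet50 |   |   |   |   |   |   |   | Y |   |   |
| __xpu__sfa_head |   |   |   |   |   |   |   | Y |   |   |
| matrix_nms | Y |   |   |   |   |   |   |   |   |   |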