Performance of Prediction Library

Result

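| Method                   | Backbone Size | Network Resolution | Operator API (FPS) | Stream API (FPS) | Other Framework (FPS) | Batch Size |
|--------------------------|---------------|--------------------|--------------------|------------------|-----------------------|------------|
| OpenPose COCO            | 209.3 MB      | 656 x 368          | 19.78              | 27.32            | 8 (OpenPose)          | 8          |
| Tiny VGG + PAF           | 34.7 MB       | 384 x 256          | 66.62              | 124.925          | /                     | 8          |
| MobileNet + PAF          | 17.9 MB       | 432 x 368          | 50.89              | 84.32            | /                     | 8          |
| ResNet50 + PAF           | 45.0 MB       | 432 x 368          | 50.89              | 84.32            | 8.5 (TF-Pose)         | 8          |
| ResNet18 + Pose Proposal | 50.3 MB       | 384 x 384          | 212.42             | 349.17           | /                     | 64         |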

Environment: System@Ubuntu 18.04, GPU@1070Ti, CPU@i7 (12 logical cores).

Test Video Source: Crazy Updown Funk (resolution@640x360, frame_count@7458, source@YouTube)

The OpenPose framework itself was not tested with batch processing, as batching does not appear to be implemented there (see here).
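To make the table easier to compare, the figures can be turned into two derived numbers: the speedup of the Stream API over the Operator API, and the time needed to process the 7458-frame test video. The snippet below is only a sketch of that arithmetic. The FPS values are copied from the table above, while the names `RESULTS`, `stream_speedup` and `seconds_for_video` are illustrative and not part of the library.

```python
# Throughput copied from the results table: (Operator API FPS, Stream API FPS).
RESULTS = {
    "OpenPose COCO": (19.78, 27.32),
    "Tiny VGG + PAF": (66.62, 124.925),
    "MobileNet + PAF": (50.89, 84.32),
    "ResNet50 + PAF": (50.89, 84.32),
    "ResNet18 + Pose Proposal": (212.42, 349.17),
}

FRAME_COUNT = 7458  # frames in the test video


def stream_speedup(operator_fps, stream_fps):
    """Ratio of Stream API throughput to Operator API throughput."""
    return stream_fps / operator_fps


def seconds_for_video(fps, frame_count=FRAME_COUNT):
    """Wall-clock seconds needed to process the video at a steady FPS."""
    return frame_count / fps


for name, (operator_fps, stream_fps) in RESULTS.items():
    print(
        f"{name}: stream/operator = {stream_speedup(operator_fps, stream_fps):.2f}x, "
        f"video time (stream) = {seconds_for_video(stream_fps):.1f} s"
    )
```

With these numbers the Stream API is roughly 1.4x to 1.9x faster than the Operator API, and the ResNet18 + Pose Proposal configuration would go through the test video in about 21 seconds.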

Suggestions

  • PAF post-processing is slow. Batch processing speeds up only the DNN inference, not the PAF post-processing, so it brings little improvement in overall speed.

  • Pose Proposal post-processing is fast (over 8k FPS on a single core), so any optimization of the DNN inference (e.g. batch processing) noticeably improves the throughput of the pipeline. For example, we measured 164 FPS with batch size 8, 349 FPS with batch size 64, and 383 FPS with batch size 128.
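The batch-size figures for Pose Proposal can be checked with some quick arithmetic. The sketch below computes the speedup relative to batch size 8 and an upper bound on the share of per-frame time spent in post-processing. It assumes, purely for illustration, that inference and post-processing run one after another and that post-processing runs at exactly 8000 FPS (the text only says "over 8k", so the real share is smaller). The names `BATCH_FPS`, `POST_FPS`, `speedup` and `post_share` are hypothetical helpers, not library APIs.

```python
# Measured Pose Proposal pipeline throughput per batch size (from the text).
BATCH_FPS = {8: 164.0, 64: 349.0, 128: 383.0}

# Assumption: "over 8k FPS" is treated as exactly 8000 FPS, a lower bound.
POST_FPS = 8000.0


def speedup(fps, baseline_fps):
    """Throughput gain relative to a baseline configuration."""
    return fps / baseline_fps


def post_share(pipeline_fps, post_fps=POST_FPS):
    """Upper bound on the fraction of per-frame time spent in post-processing.

    Assumes the stages run sequentially, so the share is
    (1 / post_fps) / (1 / pipeline_fps) = pipeline_fps / post_fps.
    """
    return pipeline_fps / post_fps


baseline = BATCH_FPS[8]
for batch_size, fps in sorted(BATCH_FPS.items()):
    print(
        f"batch {batch_size}: {speedup(fps, baseline):.2f}x vs batch 8, "
        f"post-processing share <= {post_share(fps):.1%}"
    )
```

Under these assumptions, going from batch size 8 to 64 gives a bit more than 2x, while doubling again to 128 adds only about 10%, and post-processing stays below roughly 5% of the per-frame time in every case. That is consistent with inference, not post-processing, being the stage worth optimizing for Pose Proposal.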