FACS Manual Coding vs. Automated AU Detection Tools: Comparison
Manual FACS (certified human coders using the 2002 Ekman/Friesen/Hager manual) remains the gold standard for precision and completeness. Automated tools scale efficiently but trade off accuracy, coverage, and nuance.
Key Comparison Table
| Aspect | Manual FACS (Certified Coders) | Automated AU Tools (e.g., OpenFace, FaceReader) |
| --- | --- | --- |
| Accuracy / Reliability | High (inter-coder agreement of 0.70 required for certification; often 0.80–0.90 among experts). Rule-based, though some human judgment is involved. | Variable: AUC 0.65–0.81 across systems. Strong on common AUs (e.g., AU12, the smile); weaker on subtle or rare ones. Often below certified human levels. |
| AU Coverage | Full set (~44–46 AUs), plus combinations, intensity (A–E), laterality, and timing. Handles non-additive interactions. | Limited (typically 17–20 AUs). Many AUs excluded; struggles with complex combinations. |
| Speed & Scalability | Very slow: 50–100 hours of training; 50–60 minutes of coding per minute of video. | Real-time or near real-time (e.g., 30 FPS). Processes hours of video quickly. |
| Cost | High (training, time, multiple coders for reliability). | Lower ongoing cost (software licenses or open-source). |
| Best For | Research requiring precision (psychology, deception, clinical work, subtle expressions). | Large-scale screening, real-time apps, animation, initial analysis. |
| Strengths | Anatomically grounded, context-aware, handles occlusion and pose variability with human judgment, full nuance. | Objective, consistent within a model, processes massive datasets, no fatigue. |
| Limitations | Time-intensive, expensive, potential coder drift/bias (mitigated by certification). | Sensitive to lighting, pose, occlusion, and ethnicity/age diversity; less sensitive to subtle expressions and intensity variations; often trained mostly on posed data. |
| Validation | Gold standard; used to validate automated systems. | Often validated against manual FACS but shows gaps (e.g., FaceReader ~0.67–0.81 agreement). |
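The agreement figures in the table are not defined here. As a rough illustration, the sketch below assumes the ratio commonly used for FACS reliability: twice the number of AUs both coders scored, divided by the total number of AUs scored by the two coders. The AU sets are made up for the example, and published tool validations may report other metrics (kappa, ICC, F1, AUC) instead.

```python
def agreement_index(coder_a, coder_b):
    """FACS-style agreement: 2 * (AUs scored by both) / (AUs scored by A + AUs scored by B)."""
    a, b = set(coder_a), set(coder_b)
    total = len(a) + len(b)
    if total == 0:
        # Neither coder scored anything; treating this as full agreement is an assumption.
        return 1.0
    return 2 * len(a & b) / total


# Hypothetical codings of the same facial event.
manual = {"AU1", "AU2", "AU6", "AU12"}   # certified human coder
automated = {"AU6", "AU12", "AU25"}      # automated tool output

score = agreement_index(manual, automated)
print(round(score, 3))  # 2 * 2 / (4 + 3) -> 0.571
```

On this toy event the tool agrees with the coder on two AUs, misses two, and adds one, which lands below the 0.70 certification threshold mentioned in the table.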
Popular Automated Tools
OpenFace (open-source, widely used): Good for research; its dynamic AU models calibrate to the person and are intended for video rather than single images; strong on some AUs (e.g., AU6, AU12) but covers a limited set with variable accuracy.
FaceReader (Noldus): Commercial; denser face model (several hundred points vs. OpenFace’s 68 landmarks); better FACS agreement in some validations (~0.70); marketed for reliability but still not equivalent to certified humans.
Others: Affectiva, AFAR, Py-Feat, RealEye, emerging AI models. Performance varies by dataset (posed > spontaneous).
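To make the tool output concrete, here is a minimal sketch of summarizing AU12 from an OpenFace-style CSV. OpenFace writes per-frame columns such as AU12_r (intensity, 0 to 5) and AU12_c (presence, 0 or 1) along with confidence and success; the rows below are invented, and the helper names read_rows and au_presence_rate are hypothetical. Check the column set of your own OpenFace version before relying on it.

```python
import csv
import io

# Invented excerpt in the style of an OpenFace FeatureExtraction CSV (not real output).
SAMPLE = """frame, timestamp, confidence, success, AU06_r, AU12_r, AU12_c
1, 0.000, 0.98, 1, 0.40, 0.20, 0
2, 0.033, 0.97, 1, 1.80, 2.60, 1
3, 0.067, 0.35, 0, 0.00, 0.00, 0
4, 0.100, 0.96, 1, 2.10, 3.10, 1
"""


def read_rows(text):
    """Parse CSV text into dicts, stripping spaces around headers and values."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, (v.strip() for v in row))) for row in reader if row]


def au_presence_rate(rows, au="AU12"):
    """Share of successfully tracked frames in which the AU is marked present."""
    tracked = [r for r in rows if r["success"] == "1"]
    if not tracked:
        return 0.0
    active = sum(1 for r in tracked if r[f"{au}_c"] == "1")
    return active / len(tracked)


rows = read_rows(SAMPLE)
rate = au_presence_rate(rows)
print(f"AU12 present in {rate:.0%} of tracked frames")
```

Frames where tracking failed are dropped before computing the rate, since AU values on those frames are not meaningful.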
Performance Insights (from studies)
Automated systems excel on common expressions (happiness, surprise) but lag on negative or subtle ones and in naturalistic settings (e.g., detection rates as low as ~25% in some real-world videos due to occlusion and lighting).
They are far faster than manual coding but fall short of it in comprehensive encoding.
Hybrids (automated pre-screening followed by human verification) are increasingly common for best results.
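A hybrid workflow can be sketched as a simple triage step: accept automated output where the tool looks reliable and queue the rest for a certified coder. This is only an illustration. The FrameResult fields, the 0.8 confidence cutoff, and the 0.5 to 1.5 "ambiguous intensity" band are all assumptions, not values from any study or tool.

```python
from dataclasses import dataclass


@dataclass
class FrameResult:
    frame: int
    confidence: float    # tracker confidence in [0, 1] (hypothetical field)
    au_intensity: dict   # AU name -> estimated intensity on a 0-5 scale


def needs_human_review(result, min_confidence=0.8, ambiguous=(0.5, 1.5)):
    """Flag frames with weak tracking or borderline AU intensities."""
    if result.confidence < min_confidence:
        return True
    low, high = ambiguous
    return any(low <= v <= high for v in result.au_intensity.values())


def triage(results, **kwargs):
    """Split frame numbers into auto-accepted and human-review lists."""
    auto, review = [], []
    for r in results:
        (review if needs_human_review(r, **kwargs) else auto).append(r.frame)
    return auto, review


# Invented automated output for four frames.
results = [
    FrameResult(1, 0.95, {"AU6": 0.1, "AU12": 3.2}),  # clear smile, good tracking
    FrameResult(2, 0.60, {"AU6": 0.0, "AU12": 0.0}),  # poor tracking
    FrameResult(3, 0.92, {"AU6": 1.1, "AU12": 2.8}),  # borderline AU6
    FrameResult(4, 0.97, {"AU4": 0.0, "AU12": 0.2}),  # clearly neutral
]

auto_accepted, for_review = triage(results)
print("auto:", auto_accepted, "review:", for_review)
```

In practice the thresholds would be tuned against a manually coded subset, so that the human coders' time goes to the frames where the tool is least trustworthy.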
Bottom Line: Use manual FACS when scientific rigor or subtle details matter (in line with experts such as Ekman, Rosenberg, and Fonagy in your chart). Use automated tools for volume, prototyping, or when perfect precision isn’t critical. Many researchers combine both.