Machine Learning Approach for Skill Evaluation in Robotic-Assisted Surgery
Evaluating surgeon skill has predominantly been a subjective task. The development of objective methods for surgical skill assessment is of increasing interest. Recently, with technological advances such as robotic-assisted minimally invasive surgery (RMIS), new opportunities for objective and automated assessment frameworks have arisen. In this paper, we apply machine learning methods to automatically evaluate the performance of surgeons in RMIS. Six important movement features are used in the evaluation: completion time, path length, depth perception, speed, smoothness and curvature. Different classification methods were applied to discriminate between expert and novice surgeons. We test our method on real surgical data for the suturing task and compare the classification results with ground-truth data obtained by manual labeling. The experimental results show that the proposed framework can classify surgical skill level with a relatively high accuracy of 85.7%. This study demonstrates the ability of machine learning methods to automatically classify expert and novice surgeons using movement features for different RMIS tasks. Due to the simplicity and generalizability of the introduced classification method, it is easy to implement in existing trainers.
Skill assessment, Surgeon skill, Robotic-assisted surgery, Classification, Machine learning.
Despite advances in computer systems and simulation methods, surgical training today is still based on manual assessment involving significant expert monitoring [1, 2]. For many years, surgical skills have been learned in the operating room under the direct supervision of expert surgeons. These methods suffer from a lack of consistency, reliability and efficiency due to the subjective nature of the experts' intervention. Subjective skill assessment methods are being superseded by more structured techniques such as the Objective Structured Assessment of Technical Skills (OSATS). Using OSATS, an expert surgeon scores surgical trainees against predefined criteria such as flow of surgery, motion time and final product, by observing the surgery in person or watching a recorded video of the operation.
New technological innovations such as robotic-assisted minimally invasive surgery (RMIS) open great opportunities for automated objective skill assessment that were not available before. The ability to record motion and video data has motivated new automatic RMIS skill assessment systems [7, 8]. Current systems like the da Vinci (Intuitive Surgical, Sunnyvale, CA) record motion and video data, enabling the development of computational models to analyze surgical skills through data-driven approaches. However, the development of such models has lagged behind, and it is quite clear that any framework that automatically evaluates surgical skills requires a more rigorous model of surgical procedures.
A number of researchers have developed skill assessment methods by decomposing surgical tasks into pre-defined surgical gestures. Most existing work in this area uses statistical approaches such as Hidden Markov Models (HMM) [13, 14, 15] and descriptive curve coding (DCC). Although these methods have the ability to find the underlying structure of MIS/RMIS tasks, they are context-based and require a large number of training samples and complex parameter tuning, resulting in a lack of robustness in the results. On the other hand, most research in objective surgical skill assessment has focused on motion features because of their simplicity of implementation and interpretation. Metrics such as operation time, speed, number of hand movements, force and torque signatures, path length and motion smoothness [17, 20] have been widely used to identify the relation between these features and the surgical tool movement patterns of experts and novices during laparoscopic surgery.
Although previous work built the foundation of objective surgical skill assessment, the current state of the art has a few shortcomings. First, most studies focus on descriptive statistical methods to show the dependency between surgical skill level and GFMs. However, these measures alone are not an adequate proficiency measurement; more advanced techniques such as data mining and machine learning algorithms need to be applied. While machine learning techniques have been used extensively in other fields [23, 24, 25] because of their advantages over traditional statistical methods, such as robustness, better predictive ability and higher tolerance to violations of assumptions (e.g., normality or independence of data) [26, 27], it is only recently that these methods have been considered for analyzing RMIS tasks [28, 29]. Additionally, human factors studies should be developed to better understand this aspect of surgical training [30, 31]. Thus, quantitative classification techniques that can automatically and accurately evaluate surgical skills need to be investigated.
2 Surgical Skill Evaluation Framework
In this paper, we develop a predictive framework for objective skill assessment based on the trajectory movement of the surgical robot arms. For this, we quantify the surgical task by extracting movement features from the raw motion data for the suturing task. Different classifiers, including logistic regression and support vector machines, have been applied. The classifier with the highest accuracy can then be used to automatically predict the skill level of a surgeon.
2.1 Features Extraction
Surgical tasks have different characteristics, such as smoothness, straightness or response orientation, which can account for competence based on instrument motion alone. For instance, studies have shown that the tool motion of an experienced surgeon has more clearly defined features than that of a less experienced surgeon performing the same task. Figure 1 illustrates the Cartesian position plots of an expert and a novice surgeon performing a four-throw suturing task on the da Vinci surgical robot.
In order to transform these parameters into quantitative metrics, we applied kinematic analysis theory that has been successfully used in previous work to study psychomotor skills. Metrics such as task completion time, path length, depth perception and velocity capture some aspects of a surgeon's dexterity. However, other aspects such as smoothness, curvature, torsion and complexity of the motion need to be quantified as well. In the following, we explain the six features that are important from the clinical point of view.
Time to Complete (TTC): is defined as total time required to complete the task, measured in seconds.
Path Length (PL): is the length of the curve described by the tip of the instrument while performing the task (in cm). We calculate it as the sum of the Euclidean distances between all consecutive pairs of points, $PL = \sum_{i=1}^{N-1} \lVert p_{i+1} - p_i \rVert$.
Depth Perception (DP): is the total distance traveled by the instrument along its axis (in cm).
Speed: is defined as the magnitude of velocity and calculated as the rate of position change from the previous time step, $v_i = d(p_i, p_{i-1}) / \Delta t$, where $d(p_i, p_{i-1})$ is the Euclidean distance between point $p_i$ and point $p_{i-1}$ (in cm/s). Given that the time difference between two consecutive frames in our signal is constant, $\Delta t$ is equal to 1.
Motion Smoothness: is a measure of the rhythmic pattern of acceleration and deceleration. Smoothness has most often been based on minimizing jerk, the third time derivative of position, which represents a change in acceleration (in cm/s$^3$).
Curvature: measures the straightness of the path and is calculated at each point by the following equation
$$\kappa = \frac{\lVert \mathbf{v} \times \mathbf{a} \rVert}{\lVert \mathbf{v} \rVert^{3}},$$
where $\mathbf{v}$ and $\mathbf{a}$ are the instantaneous velocity and acceleration of the instrument tips, respectively, which can be calculated directly by computing the first and second derivatives of the positions of the instrument tips. The curvature measures how fast a curve changes direction at a given point. For straight and smooth movement, the mean curvature is close to zero, while larger values indicate curved and jerky movements.
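As an illustrative sketch (not the authors' implementation), the six features above can be computed from an (N, 3) array of Cartesian tip positions with NumPy. The assumption that depth perception corresponds to travel along the z coordinate is ours:

```python
import numpy as np

def movement_features(pos, dt=1.0 / 30):
    """Compute the six movement features from an (N, 3) array of
    Cartesian tip positions sampled at a fixed interval dt (seconds)."""
    pos = np.asarray(pos, dtype=float)
    diffs = np.diff(pos, axis=0)                       # frame-to-frame displacements
    step = np.linalg.norm(diffs, axis=1)               # Euclidean step lengths
    ttc = (len(pos) - 1) * dt                          # time to complete (s)
    pl = step.sum()                                    # path length
    dp = np.abs(diffs[:, 2]).sum()                     # depth perception (assumed z = tool axis)
    vel = diffs / dt                                   # first derivative of position
    speed = step / dt                                  # speed magnitude at each step
    acc = np.diff(vel, axis=0) / dt                    # second derivative
    jerk = np.diff(acc, axis=0) / dt                   # third derivative (jerk)
    smooth = np.linalg.norm(jerk, axis=1).mean()       # mean jerk magnitude
    v = vel[1:]                                        # align velocity with acceleration samples
    cross = np.cross(v, acc)
    denom = np.clip(np.linalg.norm(v, axis=1) ** 3, 1e-12, None)
    kappa = np.linalg.norm(cross, axis=1) / denom      # curvature ||v x a|| / ||v||^3
    return {"ttc": ttc, "path_length": pl, "depth": dp,
            "speed_mean": speed.mean(), "speed_std": speed.std(),
            "smoothness": smooth, "curvature_mean": kappa.mean()}
```

For a perfectly straight, constant-speed trajectory, the jerk-based smoothness and the mean curvature both vanish, matching the clinical intuition described above.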
2.2 Surgical Skill Classification
Features that are extracted in the previous section are used to quantify the movement pattern of surgeons with different levels of dexterity. Our aim is to build a discriminative model to differentiate between surgeons with different levels of expertise while doing RMIS tasks. Surgeons are categorized into two skill levels, expert and novice. Thus, this is a binary classification problem that can be resolved by machine learning algorithms. In particular, we compared two frequently used machine learning techniques, Logistic regression  and Support Vector Machine .
2.2.1 Logistic Regression (LR)
One well-established statistical model for a categorical dependent variable is logistic regression. In this model, the logit transformation of a linear combination of the features is used to solve a binary classification problem. Formally, the logistic regression model can be written as
$$p(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \sum_{i=1}^{d} \beta_i x_i)}},$$
where $\beta_i$ is the coefficient for the corresponding feature $x_i$ and $p$ is the probability of belonging to one of the classes.
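A minimal sketch of this model, fit by plain gradient descent on the log-loss (the paper does not specify the fitting procedure, so this is an illustrative choice, not the authors' implementation):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression coefficients by gradient descent.
    X: (n, d) feature matrix; y: (n,) labels in {0, 1}."""
    Xb = np.hstack([np.ones((len(X), 1)), X])     # prepend intercept column (beta_0)
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))      # logistic (sigmoid) link
        beta -= lr * Xb.T @ (p - y) / len(y)      # gradient of the average log-loss
    return beta

def predict_logistic(beta, X):
    """Predict 0/1 class labels by thresholding the probability at 0.5."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (1.0 / (1.0 + np.exp(-Xb @ beta)) >= 0.5).astype(int)
```

In practice an off-the-shelf solver would be used, but the sketch makes the role of the coefficients $\beta_i$ explicit.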
2.2.2 Support Vector Machine (SVM)
Support vector machine (SVM) is an important classification method that constructs a hyperplane and tries to maximize the margin separating the two classes of data. The ability to learn non-linearly separable functions by mapping the data to a higher-dimensional space makes this classifier well suited to many classification problems. Linear SVM can be formalized as
$$\min_{\mathbf{w}, b} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} \quad \text{s.t.} \quad y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, n,$$
where $y_i \in \{-1, +1\}$ is the class label for data point $\mathbf{x}_i$. In order to solve non-linear classification problems, SVM uses a kernel transformation. In this study we applied the radial basis function (RBF), one of the most popular kernel functions used with SVM, defined as
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right),$$
where $\gamma$ controls the width of the RBF function.
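For illustration, the RBF Gram matrix used by a kernel SVM can be computed as follows (a sketch; the width parameter plays the role described above):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    for rows of X1 (n1, d) against rows of X2 (n2, d)."""
    # squared distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clamp tiny negatives from round-off
```

A larger gamma makes the kernel more local, so each training point influences a narrower neighborhood of the feature space.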
3 Experimental Results
In this section, we describe the experimental method including the dataset and feature extraction in detail for each surgical task. We also explain the performance metrics that we used to evaluate the proposed automated surgical skill assessment framework.
3.1 Dataset and Feature Extraction
We implemented our framework on real robotic surgery data presented in . The dataset comprises recordings of the suturing task (see Figure 2). Eight right-handed surgeons with different skill levels performed the suturing task approximately five times each. We analyze kinematic data captured through the API of the da Vinci robot at 30 Hz to extract features. The data include manual annotations of surgeon skill based on a global rating score (GRS); surgeons are divided into two categories, experts and novices, based on their scores.
Features are extracted for both hands using the Cartesian positions of the right and left patient-side manipulator end-effectors of the da Vinci arms. Before computing the features, the raw data were filtered using a local regression weighted linear least-squares method, which reduces the noise in the signal while keeping the detail of the pattern. Speed, motion smoothness and curvature are temporal features and were calculated at each point in the data; descriptive statistics, namely the mean and standard deviation, are therefore derived for these features. In total, 17 features are derived from each trajectory. We then employed principal component analysis (PCA), a dimensionality reduction technique based on an orthogonal transformation, to reduce the set of possibly correlated features to a smaller set of uncorrelated features that are linear combinations of the original features.
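A minimal PCA sketch via SVD of the mean-centered feature matrix (illustrative only; the study's exact preprocessing parameters are not specified in the text):

```python
import numpy as np

def pca_reduce(X, k):
    """Project an (n, d) feature matrix onto its top-k principal
    components, computed from the SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # scores in the reduced space
```

Because the components are orthogonal directions of maximal variance, data that lies on a k-dimensional subspace loses nothing under a rank-k projection.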
3.2 Performance Evaluation
Classifier validation was conducted using two model validation schemas as suggested in . The first is leave-one-super-trial-out (LOSO), where one trial for each one of the surgeons is left out for testing. The second is leave-one-user-out (LOUO), where we leave out all the trials from one surgeon for testing. While the first validation method evaluates the robustness of a method for repeating a task by leaving out one trial for all subjects, the second setup evaluates the robustness of a method when a subject (i.e., surgeon) is not previously seen in the training data. The performance of the different classification methods was determined by classification accuracy, which is expressed in terms of percentage of subjects in the test set that are classified correctly.
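The LOUO splitting logic described above can be sketched as a simple generator over surgeon identifiers (illustrative, not the authors' code):

```python
import numpy as np

def leave_one_user_out(groups):
    """Yield (train_idx, test_idx) index pairs, holding out all trials
    of one surgeon (group label) per fold -- the LOUO schema."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        test = np.where(groups == g)[0]    # every trial by surgeon g
        train = np.where(groups != g)[0]   # trials by all other surgeons
        yield train, test
```

The LOSO schema is analogous, except the group label is the trial index within each surgeon rather than the surgeon identity, so one trial per surgeon is held out in each fold.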
The results of the two classification methods, logistic regression and SVM, using LOSO and LOUO for suturing are shown in Table 1. The best accuracy was obtained when all movement features were combined. Table 1 shows that the best overall accuracy achieved for suturing is 85.7% under LOSO and 71.9% under LOUO. The results also show that logistic regression under the LOSO schema and SVM under the LOUO schema provide the best classification performance.
As the results in Table 1 show, the classification accuracy improves when a combination of spatial and curvature features is used. This is consistent with previous studies, which emphasized that task completion time and distance traveled are insufficient to explain all aspects of surgical assessment. The features used in this study can be applied globally to RMIS tasks, as they have the potential to capture additional aspects of surgeon skill level that cannot be quantified by task completion time and distance traveled alone.
Table 1 also shows that for almost all experiments, the best overall accuracy was obtained with logistic regression under the LOSO schema, while SVM gives the best result under LOUO. It should be noted that the LOUO result provides insight into the ability of the algorithms to evaluate the skill level of a surgeon who was unseen during the training phase. We can therefore conclude that the underlying pattern shared by different surgeons with the same skill level is not linear, so SVM with a nonlinear kernel, such as the RBF kernel applied in this study, has better ability to assess the skill level of surgeons not previously seen in the training data. In other words, SVM generalizes better in this context. Interestingly, experts can be classified with higher accuracy than novices, owing to the consistency of the movement feature values for experts. It is also important to mention that the overall accuracy of surgical skill classification decreases by 16% when we switch from the LOSO validation schema to LOUO. This suggests that repeated trials by the same surgeon are more similar to each other than trials performed by different surgeons of the same expertise level.
This study demonstrates the ability of machine learning methods to automatically distinguish between expert and novice performance in robotic-assisted surgical tasks. It is generally accepted that not only does the skill level of surgeons vary, but each surgical task also has a different level of complexity. This complexity is not only captured through the features extracted from trajectory movement data, but also requires more advanced machine learning methods to model the underlying pattern of surgical skill level. The results presented in this paper could form a basis for decision support tools that effectively, objectively and automatically evaluate a surgeon's dexterity and provide more personalized skill assessment and online feedback to trainees based on their performance. Furthermore, the proposed method can be applied at a more granular level of robotic-assisted surgical tasks, such as surgical gestures, to provide more insight into skill-level differences between surgeons. Future research could focus on performing more validation studies with a larger number of participants on different surgical tasks, which would yield a larger training set with the potential for improving the classification results.
-  T. P. Grantcharov, L. Bardram, P. Funch-Jensen, and J. Rosenberg, “Assessment of technical surgical skills,” The European journal of surgery, vol. 168, no. 3, pp. 139–144, 2002.
-  M. J. Fard, A. K. Pandya, R. B. Chinnam, M. D. Klein, and R. D. Ellis, “Distance-based time series classification approach for task recognition with application in surgical robot autonomy,” The International Journal of Medical Robotics and Computer Assisted Surgery, 2016. [Online]. Available: http://dx.doi.org/10.1002/rcs.1766
-  R. K. Reznick, “Teaching and testing technical skills,” The American journal of surgery, vol. 165, no. 3, pp. 358–361, 1993.
-  B. Schout, A. Hendrikx, F. Scheele, B. Bemelmans, and A. Scherpbier, “Validation and implementation of surgical simulators: a critical review of present, past, and future,” Surgical endoscopy, vol. 24, no. 3, pp. 536–546, 2010.
-  J. Martin, G. Regehr, R. Reznick, H. MacRae, J. Murnaghan, C. Hutchison, and M. Brown, “Objective structured assessment of technical skill (osats) for surgical residents,” British Journal of Surgery, vol. 84, no. 2, pp. 273–278, 1997.
-  F. Lalys and P. Jannin, “Surgical process modelling: a review,” International journal of computer assisted radiology and surgery, vol. 9, no. 3, pp. 495–511, 2014.
-  C. E. Reiley, H. C. Lin, D. D. Yuh, and G. D. Hager, “Review of methods for objective surgical skill evaluation,” Surgical endoscopy, vol. 25, no. 2, pp. 356–366, 2011.
-  M. J. Fard, S. Ameri, R. B. Chinnam, and R. D. Ellis, “Soft boundary approach for unsupervised gesture segmentation in robotic-assisted surgery,” IEEE Robotics and Automation Letters, vol. 2, no. 1, pp. 171–178, Jan 2017.
-  G. Guthart and J. K. Salisbury Jr, “The Intuitive™ telesurgery system: Overview and application,” in ICRA, 2000, pp. 618–621.
-  M. Jahanbani Fard, “Computational modeling approaches for task analysis in robotic-assisted surgery,” 2016.
-  A. Pandya, L. A. Reisner, B. King, N. Lucas, A. Composto, M. Klein, and R. D. Ellis, “A review of camera viewpoint automation in robotic and laparoscopic surgery,” Robotics, vol. 3, no. 3, pp. 310–329, 2014.
-  L. MacKenzie, J. Ibbotson, C. Cao, and A. Lomax, “Hierarchical decomposition of laparoscopic surgery: a human factors approach to investigating the operating room environment,” Minimally Invasive Therapy & Allied Technologies, vol. 10, no. 3, pp. 121–127, 2001.
-  J. Rosen, M. Solazzo, B. Hannaford, and M. Sinanan, “Task decomposition of laparoscopic surgery for objective evaluation of surgical residents’ learning curve using hidden markov model,” Computer Aided Surgery, vol. 7, no. 1, pp. 49–61, 2002.
-  L. Tao, E. Elhamifar, S. Khudanpur, G. D. Hager, and R. Vidal, “Sparse hidden markov models for surgical gesture classification and skill evaluation,” in Information Processing in Computer-Assisted Interventions. Springer, 2012, pp. 167–177.
-  B. Varadarajan, C. Reiley, H. Lin, S. Khudanpur, and G. Hager, “Data-derived models for segmentation with application to surgical assessment and training,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009. Springer, 2009, pp. 426–434.
-  N. Ahmidi, P. Poddar, J. D. Jones, S. S. Vedula, L. Ishii, G. D. Hager, and M. Ishii, “Automated objective surgical skill assessment in the operating room from unstructured tool motion in septoplasty,” International Journal of Computer Assisted Radiology and Surgery, vol. 10, no. 6, pp. 981–991, 2015.
-  M. K. Chmarra, S. Klein, J. C. F. De Winter, F. W. Jansen, and J. Dankelman, “Objective classification of residents based on their psychomotor laparoscopic skills,” Surgical Endoscopy and Other Interventional Techniques, vol. 24, no. 5, pp. 1031–1039, 2010.
-  V. Datta, S. Mackay, M. Mandalia, and A. Darzi, “The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model,” Journal of the American College of Surgeons, vol. 193, no. 5, pp. 479–485, 2001.
-  J. Rosen, B. Hannaford, C. G. Richards, and M. N. Sinanan, “Markov modeling of minimally invasive surgery based on tool/tissue interaction and force/torque signatures for evaluating surgical skills,” Biomedical Engineering, IEEE Transactions on, vol. 48, no. 5, pp. 579–591, 2001.
-  S. Cotin, N. Stylopoulos, M. P. Ottensmeyer, P. F. Neumann, D. W. Rattner, and S. Dawson, “Metrics for Laparoscopic Skills Trainers: The Weakest Link!” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2002, vol. 2488, 2002, pp. 35–43.
-  T. N. Judkins, D. Oleynikov, and N. Stergiou, “Objective evaluation of expert and novice performance during robotic surgical training tasks,” Surgical endoscopy, vol. 23, no. 3, pp. 590–597, 2009.
-  S. Dreiseitl and M. Binder, “Do physicians value decision support? A look at the effect of decision support systems on physician opinion,” Artificial Intelligence in Medicine, vol. 33, no. 1, pp. 25–30, 2005.
-  S. Ameri, M. J. Fard, R. B. Chinnam, and C. K. Reddy, “Survival analysis based framework for early prediction of student dropouts,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016, pp. 903–912.
-  M. J. Fard, S. Chawla, and C. K. Reddy, “Early-stage event prediction for longitudinal data,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2016, pp. 139–151.
-  M. J. Fard, S. Ameri, and A. Zeinal Hamadani, “Bayesian approach for early stage reliability prediction of evolutionary products,” in Proceedings of the International Conference on Operations Excellence and Service Engineering. Orlando, Florida, USA, 2015, pp. 361–371.
-  K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, aug 2012.
-  M. J. Fard, P. Wang, S. Chawla, and C. K. Reddy, “A bayesian perspective on early stage event prediction in longitudinal data,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3126–3139, Dec 2016.
-  Y. Kassahun, B. Yu, A. T. Tibebu, D. Stoyanov, S. Giannarou, J. H. Metzen, and E. Vander Poorten, “Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions,” International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 4, pp. 553–568, 2016.
-  M. J. Fard, S. Ameri, and R. D. Ellis, “Toward personalized training and skill assessment in robotic minimally invasive surgery,” arXiv preprint arXiv:1610.07245v2, 2016.
-  R. D. Ellis, M. J. Fard, K. Yang, W. Jordan, N. Lightner, and S. Yee, “Management of medical equipment reprocessing procedures: A human factors/system reliability perspective,” in Advances in Human Aspects of Healthcare. CRC Press, 2012, pp. 689–698.
-  K. Yang, N. Lightner, S. Yee, M. J. Fard, and W. Jordan, “Using computerized technician competency validation to improve reusable medical equipment reprocessing system reliability,” in Advances in Human Aspects of Healthcare. CRC Press, 2012, pp. 556–564.
-  H. C. Lin, I. Shafran, D. Yuh, and G. D. Hager, “Towards automatic skill evaluation: Detection and segmentation of robot-assisted surgical motions,” Computer Aided Surgery, vol. 11, no. 5, pp. 220–230, 2006.
-  D. G. Kleinbaum and M. Klein, Logistic Regression, ser. Statistics for Biology and Health. New York, NY: Springer New York, 2010.
-  C. Cortes and V. Vapnik, “Support-Vector Networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
-  V. Vapnik, Statistical Learning Theory, 1998.
-  Y. Gao, S. S. Vedula, C. E. Reiley, N. Ahmidi, B. Varadarajan, H. C. Lin, L. Tao, L. Zappella, B. Béjar, D. D. Yuh et al., “JHU-ISI gesture and skill assessment working set (JIGSAWS): A surgical activity dataset for human motion modeling,” in Modeling and Monitoring of Computer Assisted Interventions (M2CAI) – MICCAI Workshop, 2014.
-  H. Abdi and L. J. Williams, “Principal component analysis,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, jul 2010.