Toward Personalized Training and Skill Assessment in Robotic Minimally Invasive Surgery
Despite immense technological advances in surgery, the criteria for assessing surgical skill remain based on subjective standards. The advent of robotic-assisted minimally invasive surgery (RMIS) has introduced new opportunities for objective and autonomous skill assessment. Previous work in this area is mostly based on structure-based methods such as Hidden Markov Models (HMMs), which need extensive pre-processing. In contrast, in this study we develop a new shape-based framework for automatic skill assessment and personalized surgical training with minimal parameter tuning. Our work addresses two main aspects of skill evaluation: developing a gesture recognition model directly on the temporal kinematic signal of robotic-assisted surgery, and building an automated personalized RMIS gesture training framework which provides online augmented feedback. We show that our method, with an average accuracy of 82% for suturing, 70% for needle passing and 85% for knot tying, performs as well as or better than state-of-the-art methods, while requiring minimal pre-processing and parameter tuning and providing surgeons with online feedback on their performance during training.
Mahtab J. Fard, Sattar Ameri, and R. Darin Ellis
Manuscript received July 23, 2016; revised August 8, 2016. M. J. Fard, S. Ameri and R. D. Ellis are with the Department of Industrial and System Engineering, Wayne State University, Detroit, MI, 48202 USA (e-mail: email@example.com).
Robotic Surgery, Gesture classification, Surgical skill assessment, Time series classification, Dynamic time warping.
The hospital operating room is a challenging work environment where surgical skills have been learned under the direct supervision of expert surgeons for many years. This procedure is very time-consuming and subjective, which makes the evaluation of a surgeon’s skill non-robust and unreliable. Different surgical gestures have different levels of complexity, and the skill level of a surgeon varies and can be enhanced with teaching and training [3, 4]. Hence, it is important to find the underlying signature of a surgeon for each surgical gesture in order to assess and evaluate the quality of the skills that were learned. The aim of this study is to build a personalized framework for surgical training and skill assessment using quantitative methods.
With new technological innovations such as minimally invasive surgery and, more recently, robotic minimally invasive surgery (RMIS), the need for objective skill evaluation has arisen. Although these technologies introduce new challenges in skill assessment due to their steep learning curve, they also open new opportunities for objective and automated surgical assessment that were not available before. Current systems such as the da Vinci (Intuitive Surgical, Sunnyvale, CA) record motion and video data, enabling the development of computational models that recognize and analyze a surgeon’s skills and performance through data-driven approaches.
The key step for autonomous skill evaluation of surgeons is to develop techniques that are capable of accurately recognizing surgical gestures. These can then form the basis for creating quantitative measures of surgical skill and, consequently, for automatically annotating the gestures that need more training. A range of techniques have been developed to assess the surgical skills of junior surgeons. Most prior work extracts features from kinematic and video data and builds gesture classification models using approaches based on Hidden Markov Models (HMMs) [11, 12, 13] and descriptive curve coding (DCC). However, these methods are very time-consuming, interactive and subjective, which results in a lack of consistency, reliability and efficiency in real-time feedback.
To address these drawbacks, one natural approach is to develop shape-based time series classification methods directly on the temporal kinematic signal captured during surgery. In this paper, we extend our previous work [17, 18] to investigate the feasibility of building a personalized gesture training and skill assessment framework. In this framework, the similarity of two time series is determined by comparing their individual point values using Dynamic Time Warping (DTW), a well-known technique for time series classification. The similarity derived from DTW can be used as an input to the $k$-Nearest Neighbors ($k$NN) algorithm, a popular classification method, to classify a new sample based on its similarity to other samples. Our work addresses two main aspects of skill evaluation: developing a gesture recognition model directly on the temporal kinematic signal of robotic-assisted surgery, and building an automated personalized RMIS gesture training framework which provides online augmented feedback using the model trained in the classification step. Using the proposed framework, one can also compare the skill of novice and expert surgeons.
The aim of our work is to build a personalized framework for surgical task training. The skill evaluation framework developed in this study consists of two key components, as shown in Fig. 1. The first component measures the similarity between surgemes performed by different surgeons and recognizes them using the $k$-Nearest Neighbors approach. Then, based on the classification result, we can evaluate the performance of a surgeon on each gesture. Consequently, gestures that need more training can be identified.
II-A Similarity Measures and Gesture Recognition
The first component of the framework measures the similarity between the motion time series signals of surgemes performed by surgeons. Time series are one example of longitudinal data [21, 22, 23], for which different methods can be used to measure similarity. We use the distance between two time series as a similarity measure. Shape-based similarity measures are among the well-developed methods in this area; they determine the similarity of the overall shape of two time series by directly comparing their individual point values. This is in contrast with feature-based and structure-based methods, where features first need to be extracted in order to find higher-level structures of the series.
One of the simplest ways to estimate the distance between two time series is to use any $L_p$ norm. Given two time series $A$ and $B$, their distance can be determined by comparing local point values as

$$d_{L_p}(A, B) = \left( \sum_{i=1}^{N} |a_i - b_i|^{p} \right)^{1/p} \tag{1}$$

where $p$ is a positive integer, $N$ is the length of the time series, and $a_i$ and $b_i$ are the $i$-th elements of time series $A$ and $B$, respectively. If $p=2$, Eq. (1) defines the Euclidean distance, the most common distance measure for time series. Despite the simplicity and efficiency that make the Euclidean distance the most popular distance measure, it has major drawbacks: it requires that both input sequences be of the same length, and it is sensitive to distortions such as shifting, noise and outliers. To handle this problem, warping distances such as Dynamic Time Warping (DTW) have been proposed to search for the best alignment between two time series.
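Consider $A = (a_1, \dots, a_n)$ and $B = (b_1, \dots, b_m)$, where $A$ and $B$ have length $n$ and $m$, respectively. The two sequences can be arranged as an $n \times m$ matrix, like the sides of a grid, in which the distance between every possible pair of time instances $a_i$ and $b_j$ is stored. To find the best match between the two sequences, a path through the grid that minimizes the overall distance is needed. This path can be found efficiently using dynamic programming. If the cumulative distance $\gamma(i, j)$ is defined as the distance $d(a_i, b_j)$ for the current cell plus the minimum of the cumulative distances of the adjacent elements, the DTW distance is defined as

$$\gamma(i, j) = d(a_i, b_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$

where $d(a_i, b_j)$ can be calculated using the norm-based distance defined above with $p=2$.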
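The recurrence above can be illustrated with a short dynamic-programming sketch. This is a minimal illustration rather than the implementation used in the experiments: the function names `euclidean` and `dtw_distance` are hypothetical, no warping-window constraint is applied, and each time sample may be either a scalar or a vector of kinematic variables.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two samples (scalars or equal-length vectors)."""
    if isinstance(x, (int, float)):
        x, y = [x], [y]
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dtw_distance(a, b):
    """DTW distance between sequences a (length n) and b (length m)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # gamma[i][j] holds the cumulative cost of the best warping path
    # that aligns the first i samples of a with the first j samples of b.
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(a[i - 1], b[j - 1])
            gamma[i][j] = cost + min(
                gamma[i - 1][j - 1],  # match both samples
                gamma[i - 1][j],      # advance in a only
                gamma[i][j - 1],      # advance in b only
            )
    return gamma[n][m]


a = [1.0, 2.0, 3.0]
b = [1.0, 2.0, 2.0, 3.0]   # same shape as a, with one sample repeated
print(dtw_distance(a, b))  # 0.0: the repeated sample is absorbed by warping
```

Unlike the Euclidean distance, the two inputs here have different lengths, which is the situation that motivates the use of a warping distance.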
The second component of the framework is a classification algorithm based on the $k$-Nearest Neighbors ($k$NN) approach. $k$NN is a supervised, distance-based classification method that has shown promising results in the time series classification domain. Despite its simplicity, $k$NN has been very successful in classification problems. The $k$NN classifier is an instance-based learner: instead of constructing a general model, it simply stores instances of the training data. During the classification phase, a majority vote of the $k$ nearest neighbors of each query point is computed, and the query point is assigned the label most represented among those neighbors.
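As a rough illustration of the voting step, the sketch below classifies one query gesture from its precomputed distances (for example, DTW distances) to the stored training instances. The function name `knn_predict`, the tie-breaking rule and the example values are assumptions made for illustration; the paper does not specify how ties are resolved or which value of $k$ is used.

```python
from collections import Counter


def knn_predict(distances, labels, k=1):
    """Label a query from its distances to the training instances.

    distances[i] is the distance from the query to training instance i,
    and labels[i] is the gesture label of that instance.
    """
    if len(distances) != len(labels):
        raise ValueError("distances and labels must have the same length")
    if not 1 <= k <= len(labels):
        raise ValueError("k must be between 1 and the number of instances")
    # Indices of the k training instances closest to the query.
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    nearest = order[:k]
    votes = Counter(labels[i] for i in nearest)
    top = max(votes.values())
    # Assumed tie-break: among the most-voted labels, pick the one
    # whose neighbor is closest to the query.
    for i in nearest:
        if votes[labels[i]] == top:
            return labels[i]


distances = [0.5, 2.0, 0.7, 3.0]        # hypothetical distances to 4 instances
labels = ["G1", "G2", "G1", "G3"]
print(knn_predict(distances, labels, k=3))  # G1
```

Because the classifier only needs distances, any similarity measure can be substituted without changing the voting logic.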
II-B Gesture Performance Evaluation
Once we have the classification result for each individual surgeon, we can find the gestures that need more training or practice. For this purpose, we first tabulate the gesture classification results for each surgeon into a corresponding confusion matrix (Table I). True positives (TP) are the instances of a gesture that are correctly classified as that gesture, and true negatives (TN) are the instances that do not belong to the gesture and are correctly classified as not belonging to it. If an instance of another gesture is incorrectly assigned to the gesture, it is a false positive (FP), and if an instance of the gesture is not classified as such, it is a false negative (FN).
|Actual \ Predicted||Gesture X||Not Gesture X|
|Gesture X||TP||FN|
|Not Gesture X||FP||TN|
Based on the values in the confusion matrix, the overall accuracy for gesture classification can be defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Gestures that need more training can be identified using recall, which is also called sensitivity and is defined as

$$\text{Recall} = \frac{TP}{TP + FN}$$

Finally, gestures that need a more robust definition can be identified using precision,

$$\text{Precision} = \frac{TP}{TP + FP}$$

as it is a measure of result relevancy.
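To make these definitions concrete, the following sketch computes recall and precision per gesture from a multi-class confusion matrix whose rows are the actual gestures and whose columns are the predicted gestures. The function name `per_gesture_metrics` and the counts are hypothetical, and the overall accuracy is taken here as the fraction of correctly classified instances, which is an assumption about how the binary definition extends to several gestures.

```python
def per_gesture_metrics(confusion, gestures):
    """Return overall accuracy and per-gesture recall and precision.

    confusion[i][j] is the number of instances of gestures[i]
    that were classified as gestures[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(gestures)))
    accuracy = correct / total
    metrics = {}
    for i, gesture in enumerate(gestures):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                  # rest of the row
        fp = sum(row[i] for row in confusion) - tp   # rest of the column
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        metrics[gesture] = {"recall": recall, "precision": precision}
    return accuracy, metrics


# Hypothetical counts for three gestures (not results from the paper).
gestures = ["G1", "G2", "G3"]
confusion = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]
accuracy, metrics = per_gesture_metrics(confusion, gestures)
lowest_recall = min(metrics, key=lambda g: metrics[g]["recall"])
print(lowest_recall)  # G2: the gesture that would be flagged for more training
```

In this toy example G2 has the lowest recall, so under the criterion above it is the gesture a trainee would be advised to practice.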
III Experimental Setup
As briefly explained in the introduction, we use the JIGSAWS dataset [27]. It comprises data for different fundamental surgical tasks performed by eight right-handed surgeons with different skill levels (expert, intermediate and novice). Each user performed around five trials of each task. For each task, we analyze kinematic data captured using the API of the da Vinci at 30 Hz. The data include 76 motion variables, consisting of 19 features for each of the four robotic arms: the left and right master-side arms and the left and right patient-side arms. In this paper we build a personalized training framework for suturing, needle passing and knot tying (Figure 5).
The three surgical tasks are defined as follows:
Suturing (SU): the surgeon picks up the needle, proceeds to the incision, and passes the needle through the tissue. After the needle pass, the surgeon extracts the needle from the tissue.
Needle-Passing (NP): the surgeon picks up the needle and passes it through four small metal hoops from right to left.
Knot-Tying (KT): the surgeon picks up one end of a suture tied to a flexible tube attached at its ends to the surface of the bench-top model, and ties a single loop knot.
In order to compare the accuracy of our proposed gesture recognition framework with other methods [28, 13], we used the leave-one-user-out (LOUO) setup of the dataset. In LOUO, eight folds are created, one for each surgeon, with 50 iterations. LOUO shows the robustness of a method when a subject has not previously been seen in the training data. Thus, it helps us to personalize skill assessment for each individual surgeon.
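The fold construction in LOUO can be sketched as follows. This is only an illustration of the splitting logic under assumed names: `leave_one_user_out` and the user identifiers are hypothetical, and the 50 iterations mentioned above are not modeled.

```python
def leave_one_user_out(trials):
    """Yield (held_out_user, train, test) for each user in turn.

    trials is a list of (user_id, sample) pairs; all samples of the
    held-out user form the test set, all other samples the training set.
    """
    users = sorted({user for user, _ in trials})
    for held_out in users:
        train = [sample for user, sample in trials if user != held_out]
        test = [sample for user, sample in trials if user == held_out]
        yield held_out, train, test


# Hypothetical trials from three users (placeholders for gesture segments).
trials = [("B", "b1"), ("C", "c1"), ("B", "b2"), ("D", "d1")]
for held_out, train, test in leave_one_user_out(trials):
    print(held_out, len(train), len(test))
```

With eight surgeons this yields eight folds, and the model evaluated on each fold has never seen data from the surgeon being assessed.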
|Task||Gesture Index and Description|
|SU/NP/KT||G1: Reaching for needle with right hand|
|SU/NP||G2: Positioning needle|
|SU/NP||G3: Pushing needle through tissue|
|SU/NP||G4: Transferring needle from left to right|
|SU/NP||G5: Moving to center with needle in grip|
|SU/NP||G6: Pulling suture with left hand|
|SU/NP||G8: Orienting needle|
|SU||G9: Using right hand to help tighten suture|
|SU||G10: Loosening more suture|
|SU/NP/KT||G11: Dropping suture at end and moving to end points|
|KT||G12: Reaching for needle with left hand|
|KT||G13: Making C loop around right hand|
|KT||G14: Reaching for suture with right hand|
|KT||G15: Pulling suture with both hands|
IV Results and Discussion
In this section, we report the experimental results of gesture recognition using the proposed classification method. Then, we will discuss the personalized skill evaluation framework that provides assessment to surgeons during their RMIS training.
IV-A Distance-based Skill Evaluation
We start with a frequency analysis of the different surgical gestures during RMIS tasks. Figure 9 presents the average number of occurrences of each surgical gesture in one surgery trial. It shows that some surgical gestures are very infrequent (such as G9 and G10 in suturing) and are mostly performed by novice surgeons. This suggests that those gestures are intermediate or corrective positioning moves; thus, they are not good indicators when the performance of classifiers is measured. Figure 9 also indicates that, in needle passing, the number of times each surgical gesture is performed by novices differs significantly from that of experts, while for suturing and knot tying it is almost the same. Hence, one can conclude that novice surgeons might need more training for the needle passing task.
To better understand the patterns of experts and novices during RMIS tasks, we compute the pairwise DTW distance within the group of expert surgeons and compare it with the DTW distance between novices and experts. Figure 10 presents the boxplots for the different RMIS tasks. It shows that the patterns of expert surgeons are more similar to one another than the patterns of novices are to those of experts. This conclusion also holds for each surgical gesture separately, as shown in Figure 14. It also indicates the feasibility of using the DTW distance as a skill evaluation metric. These results align with the intuition behind the DTW distance approach proposed in this paper for gesture classification and skill assessment.
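The group comparison described above can be sketched as below. The helper `group_distances` is hypothetical and accepts any distance function; in this illustration a simple sum of absolute differences on equal-length toy sequences stands in for the DTW distance, and the data are invented for the example.

```python
from itertools import combinations, product
from statistics import median


def group_distances(experts, novices, dist):
    """Distances within the expert group and between novices and experts."""
    within = [dist(a, b) for a, b in combinations(experts, 2)]
    between = [dist(n, e) for n, e in product(novices, experts)]
    return within, between


def l1_distance(a, b):
    """Stand-in distance for equal-length sequences (DTW would be used in practice)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy trajectories, not data from the experiments.
experts = [[0, 1, 2], [0, 1, 3], [0, 2, 2]]
novices = [[3, 3, 3]]
within, between = group_distances(experts, novices, l1_distance)
print(median(within), median(between))  # 1 5
```

A smaller typical distance within the expert group than between novices and experts is the pattern that the boxplots are reported to show.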
IV-B Surgical Gesture Classification
Table III shows the accuracy of the proposed method for each surgeon performing the different tasks. From the table, it can be observed that knot tying has the lowest overall standard deviation, which suggests that this task is possibly performed in a more similar pattern across surgeons. On the other hand, the larger differences among surgeons for needle passing and suturing suggest that experts and novices can be separated more distinctly for these tasks than for knot tying.
|Suturing||Needle Passing||Knot Tying|
IV-C Personalized Skill Assessment
Finally, we examine the proposed personalized skill assessment framework for RMIS tasks. First, the results in Table III should be expanded for each surgical gesture. Thus, we need to tabulate the gesture classification outcomes into a corresponding confusion matrix. Tables IV-VI show the confusion matrices for the three tasks. The diagonal shows the correctly classified surgical gestures. The results in Tables IV and V show that, for suturing and needle passing, gestures G5 and G8 have both low recall and low precision. This suggests that these two surgical gestures might need more training or, alternatively, that they might not be defined properly. It is worth mentioning that G1 and G11 can be considered idle positions for both suturing and needle passing. For knot tying, Table VI suggests more training for gesture G12 and a more precise definition of G1 and G11 for this task.
To have a personalized skill assessment system that is capable of providing online feedback to a surgeon during training, one should find, for each surgeon, which surgical gestures do not follow the same pattern as those of an expert surgeon; in other words, which surgical gestures are not recognized correctly. In this regard, detailed confusion matrices in the form of heatmaps for an expert and a novice surgeon are shown in Figure 21. From this figure we can observe, as an example, that compared to the expert, a novice surgeon performing suturing might need more training for gestures G4, G6 and G3. One can also conclude that needle passing is the most challenging task and that the novice surgeon needs more training for almost all gestures except G3. On the other hand, knot tying seems to be the most straightforward task, and the results suggest more training on G12 for the novice surgeon.
Despite the tremendous improvements in the adequacy of surgical treatments in recent years, the criteria for assessing a surgeon’s skill remain subjective. With the advent of robotic surgery, the need for trained surgeons increases. Consequently, there is a huge demand for objective evaluation of surgical skill and for more efficient surgical training. Moreover, the importance of recognizing gestures during surgery is evident from the fact that many applications deal with motion and gesture signals. In this paper, we proposed a personalized skill assessment and training framework based on a time series similarity measure. We developed a surgical gesture recognition model on the temporal kinematic signal of robotic-assisted surgery. Based on the proposed framework, we built an automated personalized RMIS gesture training system which provides online augmented feedback. Despite the simplicity of the proposed method, it has been shown in the literature that it is difficult to beat. The performance of the proposed framework is encouraging, with accuracies of approximately 80% for suturing, 70% for needle passing and 85% for knot tying. These results establish the feasibility of applying time series classification methods to RMIS temporal kinematic signal data to recognize different surgical gestures during robotic minimally invasive surgery. A key advantage of our approach is its simplicity: it operates directly on the temporal kinematic signal of robotic-assisted surgery. However, there may be utility in extending our work by adding noise or other tasks (besides those in the training set) to the data in order to build a more robust gesture recognition method.
It should be noted that, from the results in this paper, one can conclude that the global surgical gesture dictionary and its definitions (Table II) need modification in order to arrive at a universal language of surgery. More importantly, the accuracy and robustness of any supervised classification method rely on a priori labels, which serve as the ground truth in machine learning methods. In this experiment, the labels were given by expert surgeons, which implies a subjective annotation in the surgical gesture labelling system. Imprecise labels, however, can result in significantly different classification accuracy. Furthermore, reliable classification is possible when the model learns its parameters from precise training data with less human involvement. One way to overcome this challenge is to build an unsupervised classification model that automatically decomposes a task into its gestures.
This paper presents a step toward automatic recognition of surgical gestures and provides insight into online feedback during training. Perhaps most excitingly, the proposed framework can lay the groundwork for the development of semi-autonomous robot behaviors, such as automatic camera control during robotic-assisted surgery, by recognizing online the surgical gestures that are being performed [17]. Additionally, human factors studies should be conducted to better understand this aspect of surgical training [30, 31]. Our intuitive approach for finding similarities between two time series is based on the DTW distance, which is applied directly to the temporal kinematic signal data. Despite the promising results in this paper, a future technical challenge will be to build more generalized models that are capable of capturing abnormal patterns of a surgeon during surgery by applying rare event classification approaches.
-  R. K. Reznick, “Teaching and testing technical skills,” The American journal of surgery, vol. 165, no. 3, pp. 358–361, 1993.
-  A. Darzi, S. Smith, and N. Taffinder, “Assessing operative skill: needs to become more objective,” BMJ: British Medical Journal, vol. 318, no. 7188, p. 887, 1999.
-  Y. Kassahun, B. Yu, A. T. Tibebu, D. Stoyanov, S. Giannarou, J. H. Metzen, and E. Vander Poorten, “Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions,” International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 4, pp. 553–568, 2016.
-  R. D. Ellis, A. J. Munaco, L. A. Reisner, M. D. Klein, A. M. Composto, A. K. Pandya, and B. W. King, “Task analysis of laparoscopic camera control schemes,” The International Journal of Medical Robotics and Computer Assisted Surgery, 2015, DOI:10.1002/rcs.1716.
-  P. Van Hove, G. Tuijthof, E. Verdaasdonk, L. Stassen, and J. Dankelman, “Objective assessment of technical surgical skills,” British journal of surgery, vol. 97, no. 7, pp. 972–987, 2010.
-  F. Lalys and P. Jannin, “Surgical process modelling: a review,” International journal of computer assisted radiology and surgery, vol. 9, no. 3, pp. 495–511, 2014.
-  G. Guthart and J. K. Salisbury Jr, “The Intuitive™ telesurgery system: Overview and application,” in ICRA, 2000, pp. 618–621.
-  M. J. Fard, S. Ameri, R. B. Chinnam, and R. D. Ellis, “Soft boundary approach for unsupervised gesture segmentation in robotic-assisted surgery,” IEEE Robotics and Automation Letters, vol. 2, no. 1, pp. 171–178, Jan 2017.
-  M. Jahanbani Fard, “Computational modeling approaches for task analysis in robotic-assisted surgery,” 2016.
-  C. E. Reiley, H. C. Lin, D. D. Yuh, and G. D. Hager, “Review of methods for objective surgical skill evaluation,” Surgical endoscopy, vol. 25, no. 2, pp. 356–366, 2011.
-  J. Rosen, B. Hannaford, C. G. Richards, and M. N. Sinanan, “Markov modeling of minimally invasive surgery based on tool/tissue interaction and force/torque signatures for evaluating surgical skills,” Biomedical Engineering, IEEE Transactions on, vol. 48, no. 5, pp. 579–591, 2001.
-  C. E. Reiley and G. D. Hager, “Task versus subtask surgical skill evaluation of robotic minimally invasive surgery,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009. Springer, 2009, pp. 435–442.
-  L. Zappella, B. Béjar, G. Hager, and R. Vidal, “Surgical gesture classification from video and kinematic data.” Medical image analysis, vol. 17, no. 7, pp. 732–45, Oct. 2013.
-  N. Ahmidi, P. Poddar, J. D. Jones, S. S. Vedula, L. Ishii, G. D. Hager, and M. Ishii, “Automated objective surgical skill assessment in the operating room from unstructured tool motion in septoplasty,” International Journal of Computer Assisted Radiology and Surgery, vol. 10, no. 6, pp. 981–991, 2015.
-  B. Schout, A. Hendrikx, F. Scheele, B. Bemelmans, and A. Scherpbier, “Validation and implementation of surgical simulators: a critical review of present, past, and future,” Surgical endoscopy, vol. 24, no. 3, pp. 536–546, 2010.
-  T.-c. Fu, “A review on time series data mining,” Engineering Applications of Artificial Intelligence, vol. 24, no. 1, pp. 164–181, 2011.
-  M. J. Fard, A. K. Pandya, R. B. Chinnam, M. D. Klein, and R. D. Ellis, “Distance-based time series classification approach for task recognition with application in surgical robot autonomy,” The International Journal of Medical Robotics and Computer Assisted Surgery, 2016. [Online]. Available: http://dx.doi.org/10.1002/rcs.1766
-  M. J. Fard, S. Ameri, R. B. Chinnam, A. K. Pandya, M. D. Klein, and R. D. Ellis, “Machine learning approach for skill evaluation in robotic-assisted surgery,” in Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016, 2016, pp. 433–437.
-  D. Bernad, “Finding patterns in time series: a dynamic programming approach,” Advances in knowledge discovery and data mining, 1996.
-  N. Bhatia et al., “Survey of nearest neighbor techniques,” International Journal of Computer Science and Information Security, vol. 8, no. 2, pp. 302–305, 2010.
-  M. J. Fard, S. Ameri, and A. Zeinal Hamadani, “Bayesian approach for early stage reliability prediction of evolutionary products,” in Proceedings of the International Conference on Operations Excellence and Service Engineering. Orlando, Florida, USA, 2015, pp. 361–371.
-  M. J. Fard, S. Chawla, and C. K. Reddy, “Early-stage event prediction for longitudinal data,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2016, pp. 139–151.
-  M. J. Fard, P. Wang, S. Chawla, and C. K. Reddy, “A bayesian perspective on early stage event prediction in longitudinal data,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3126–3139, Dec 2016.
-  E. Keogh and S. Kasetty, “On the need for time series data mining benchmarks: a survey and empirical demonstration,” Data Mining and knowledge discovery, vol. 7, no. 4, pp. 349–371, 2003.
-  J. Lin, S. Williamson, K. Borne, and D. DeBarr, “Pattern recognition in time series,” Advances in Machine Learning and Data Mining for Astronomy, vol. 1, pp. 617–645, 2012.
-  W. A. Chaovalitwongse, Y.-j. Fan, and R. C. Sachdeo, “On the Time Series K-Nearest Neighbor Classification of Abnormal Brain Activity,” IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 37, no. 6, pp. 1005–1016, 2007.
-  Y. Gao, S. S. Vedula, C. E. Reiley, N. Ahmidi, B. Varadarajan, H. C. Lin, L. Tao, L. Zappella, B. Béjar, D. D. Yuh et al., “JHU-ISI gesture and skill assessment working set (JIGSAWS): A surgical activity dataset for human motion modeling,” in Modeling and Monitoring of Computer Assisted Interventions (M2CAI), MICCAI Workshop, 2014.
-  C. E. Reiley, H. C. Lin, B. Varadarajan, B. Vagvolgyi, S. Khudanpur, D. D. Yuh, and G. D. Hager, “Automatic recognition of surgical motions using statistical modeling for capturing variability,” Studies in health technology and informatics, vol. 132, no. 1, pp. 396–401, Jan. 2008.
-  P. Schäfer, “Towards time series classification without human preprocessing,” in Machine Learning and Data Mining in Pattern Recognition. Springer, 2014, pp. 228–242.
-  R. D. Ellis, M. J. Fard, K. Yang, W. Jordan, N. Lightner, and S. Yee, “Management of medical equipment reprocessing procedures: A human factors/system reliability perspective,” in Advances in Human Aspects of Healthcare. CRC Press, 2012, pp. 689–698.
-  K. Yang, N. Lightner, S. Yee, M. J. Fard, and W. Jordan, “Using computerized technician competency validation to improve reusable medical equipment reprocessing system reliability,” in Advances in Human Aspects of Healthcare. CRC Press, 2012, pp. 556–564.