
# sWSI: A Low-cost and Commercial-quality Whole Slide Imaging System on Android and iOS Smartphones

In this paper, scalable Whole Slide Imaging (sWSI), a novel high-throughput, cost-effective and robust whole slide imaging system on both Android and iOS platforms, is introduced and analyzed. With sWSI, most mainstream smartphones connected to the optical eyepiece of any manually controlled microscope can be automatically controlled to capture sequences of mega-pixel fields of view that are synthesized into giga-pixel virtual slides. Remote servers carry out the majority of computation asynchronously, so that clients run at satisfactory frame rates without sacrificing image quality or robustness. A typical 15x15mm sample can be digitized in 30 seconds with 4X or in 3 minutes with 10X objective magnification, costing under $1. Surveyed pathologists consider the virtual slide quality comparable to that of existing high-end scanners and thus satisfactory for clinical usage. The scan procedure, with features such as support for magnification up to 100x, recording of z-stacks, specimen-type neutrality and real-time feedback, is deemed workflow-friendly and reliable.

Keywords: Mobile health, Image processing, Cloud computing for healthcare, Whole slide imaging

## 1 Introduction

Virtual slides generated by whole slide imaging (WSI) systems are an essential component of the digitized diagnostic process, as they provide extended fields of view (FoVs) under microscopes without handling specimens physically [1]. However, the automated scanners commonly used to capture and process such data cost approximately $50,000 or more up-front, even for low-frequency usage.

In many developing countries, this financial cost alone has significantly impeded the modernization of related hospital departments, such as pathology departments in China. The lack of digitization in turn undermines productivity and diagnostic accuracy, which widely leads to less administrative attention and tighter budgets.

In recent years, two alternative solutions have attracted much academic and commercial interest. One abandons automation and leaves the operator to control the microscope manually, reducing the product package to a dedicated digital camera and software [2][3], costing as low as $10,000. The other attempts to make the best use of smartphones, which not only have integrated capturing and processing ability but also are widely distributed among clinical professionals, thus lowering the start-up cost to near zero. A small number of products in the latter category, in either the research or the commercial stage, have been evaluated by clinical professionals [4], but to the limited knowledge of the authors, all of them are made exclusively for high-end iPhones and are not commercially available yet. Although rarely explained explicitly, the robustness needed to guarantee successful virtual slide generation could be a serious obstacle between publishable research and commercial products. Additionally, diversity in hardware and operating systems might be the reason that Android phones, though dominating the handset market in developing countries, are largely ignored.

In this paper, a recently released, publicly available WSI system for mainstream smartphones with commercial quality and low cost, named scalable WSI (sWSI) [6], is introduced and evaluated. It offers fast and reliable WSI on most handsets, average Androids and flagship iPhones alike, reducing the up-front cost to about $100 with an average service cost under $1 per scan. Pathologists recognize it as an attractive alternative to stand-alone scanners, especially for quick scans such as those of frozen sections as well as for medium- and low-frequency usage.

The rest of this paper is organized as follows. In Section 2, an overview of the system architecture is given.
In Section 3, the functions of the client and the server as well as the major techniques to guarantee robustness are analyzed. In Section 4, the on-the-fly distortion correction model is formulated and a solution algorithm is presented. In Section 5, subjective performance evaluations by surveyed clinical users of both automated scanners and sWSI are summarized, with a conclusion drawn in Section 6.

## 2 System Overview

There are two essential and costly components in a typical WSI scanner: the capturing unit, typically a set of lenses with a distortion-calibrated digital eyepiece, and on-board or external high-performance computers. As with any dedicated device, since both parts are specifically built for the system, they are mostly non-productive when the system is idle and thus waste much of their value when under-used. Unfortunately, this is commonly the case for smaller hospitals, where complicated pathological diagnosis occurs only occasionally. This situation, coupled with the performance of consumer electronics approaching that of medical-grade tools, led to the idea of creating sWSI.

### 2.1 Hardware

To provide full WSI functionality at a dramatically lower cost, sWSI aims to reduce the cost of both hardware necessities. For the optical part, it reversibly upgrades existing microscopes with the built-in cameras of smartphones and compensates for the unknown optical distortions computationally, as discussed in detail in Section 4. For the computing part, it utilizes smartphones for light-weight real-time processing and transfers the bulk of the work to shared remote servers, allowing time multiplexing that improves utilization and shares cost.

Even though the prices of mainstream smartphones spread widely, much of the difference comes from user-friendly features such as security or battery life that are largely irrelevant to sWSI.
Thanks to the fast expansion of smartphone markets, their cameras, which used to be the critical link in such clinical applications, are now on par with many dedicated digital eyepieces [7]. Overall, newer smartphone models can easily meet the minimal requirements listed in Table 1 at prices as low as $100. It should also be noted that higher-end models that meet the optional specification for better performance may be bought at deep discounts as used or refurbished units, which may suffer from short battery life or a repaired screen without affecting the performance of sWSI.

Considering that most professionals in research and health-care services already own a handset at least as capable as specified above, sWSI requires installing only one adapter for each pair of existing smartphone and optical microscope. These microscope-smartphone adapters are available as many commercial options as well as open-source designs for DIY 3D printing, though those built specifically for each phone model are preferred so as to minimize the need for adjusting camera-eyepiece alignment and to block disruptive light sources. One setup is demonstrated in Fig. 3: a used iPhone 6 costing $200 installed on an Olympus BH2-BHS microscope with a scalScope adapter, which took about 15 seconds to set up.

### 2.2 Software

In addition to image compression, transfer and virtual slide synthesis, as needed in any whole slide scanning system, the software in sWSI is also responsible for automatically measuring and compensating for hardware diversity. Unfortunately, running many of these functions entirely on the device is beyond the reach of mass-produced mobile devices. Synthesizing the virtual slide from FoVs requires at least several GB of RAM, and sequentially processing hundreds of FoVs at full resolution can take an hour or more on a mobile CPU. Besides, since the virtual slides will be stored remotely anyway, there is little extra cost in moving the bulk of processing onto remote servers, as implemented in sWSI. The downside of this distributed computing model is that splitting the processing workflow into asynchronous stages introduces significant risks of failure, but in sWSI this is solved as explained in Section 3.

Another practical issue worth noting is that, due to architecture and driver support issues beyond the scope of this paper, most Android phones only support JPEG image capture at higher resolutions, which cannot be processed pixel-by-pixel.
This significantly constrains data throughput, since each FoV taken has to go through an extra encoding-decoding process costing several hundred milliseconds, depending on CPU power and resolution. As a result, the sWSI Android app limits the capturing resolution to about 3MP and generally achieves a throughput of about 1 to 3 FoVs per second, except on certain models whose drivers offer high-resolution pixel data of captured images, such as the OnePlus X.

## 3 Fail-proof Distributed Processing

### 3.1 Basic scan procedures and interaction

In sWSI, a smartphone client app is responsible for gathering the user's input, capturing and processing the FoVs, and guiding users interactively, with the user interface during scan presented in Fig. 3. The scanning procedure with sWSI, shown in Fig. 4, differs very little from that practised by most microscope users, except for requiring magnification updates, and is thus not discussed further here.

### 3.2 Real-time feedback on Clients

The client's share of processing focuses on speed and robustness instead of accuracy and thus uses down-sampled copies of the camera input. It roughly estimates the pairwise translation of FoVs by stitching each captured FoV with the last one through key point detection and matching with the SURF algorithm [8]. This translation is then used in three ways: updating a mini-map illustrating the current location on the slide, feeding a finite-state machine that manages the kernel asynchronously, and providing feedback to users as guidance for operating the microscope. The feedback types and their triggers include:

#### Moving too fast

The translation is too large, so the key point matching in SURF may be unreliable.

#### Lost

No reliable translation can be obtained. The causes cannot be further distinguished by the machine but should be noticeable to the users, such as moving so fast that there is little overlap between the current pair of FoVs, or the camera being out of focus.
#### Touching a boundary

There are few key points detected, so the FoV is likely near a boundary.

#### No error

The translation is reliable.

With users following the hints, sWSI essentially creates a closed feedback loop that allows scan-time intervention against potential failures, such as the inability to focus properly on thick samples or to track position on barren ones. This mechanism thus catches most failures due to sample preparation and user operation before much time is spent completing the scan, which is a common issue with automatic scanners.

### 3.3 Full resolution processing with a-priori knowledge on Servers

The cloud servers in sWSI are the primary powerhouses of computation. With full-resolution FoVs and scan results from clients, servers re-stitch adjacent FoVs at maximal accuracy, correct distortion and generate the virtual whole slide.

The asynchronous two-stage stitching performed respectively on the clients and servers, however, has inherent weak spots in both stability and efficiency. On one hand, the pairwise FoV stitching is based on key point detection and matching, whose outcome is in turn resolution-dependent. As a result, the outcomes at down-sampled and original resolution may be significantly inconsistent. In many cases, as in almost every virtual slide constructed from 100 FoVs or more with prototypes of sWSI, the full-resolution stitching produces unreliable matching at least once. On the other hand, by the law of large numbers, it is desirable to match as many key point pairs as possible for accurate estimation of the FoV-wise matching function, especially where this function has many degrees of freedom, as is the case in sWSI where raw images are non-linearly distorted in unknown patterns. The computational cost of brute-force key point matching, however, grows quadratically with the number of key points.
To counter both issues at once, sWSI employs an a-priori-knowledge-based SURF key point (KP) detection and matching algorithm on the server. Recall that SURF detects KPs from a virtual image pyramid that has lower resolution on higher layers. In sWSI, instead of detecting with one threshold across all layers, multiple thresholds are chosen adaptively as follows. First, one threshold is picked to ensure that, in total over the layers from $l_{ds}$ up to the uppermost layer, at least a constant multiple of the number of KPs detected in the down-sampled copy during scan is detected, where $l_{ds}$ is derived from the power-of-two down-sample rate $r_{ds}$ used during scan as

$$l_{ds} = \log_2 r_{ds}, \quad l_{ds} \in \mathbb{N}. \tag{1}$$

Next, the threshold for detection on each lower layer is adaptively chosen so that a target number of KPs, set by another constant, is detected on each of the lower layers. With this thresholding approach, most KPs from the scan stage can be detected at full resolution, with additional ones from lower layers that are more localized but at higher resolution, while the total number of KPs is controlled by the chosen constants and thus does not over-expand.

Afterwards, instead of brute-force matching, which calculates the difference of all pairs of KPs and picks the optimal set of matches, sWSI selectively calculates only the differences of pairs lying within a constant distance of the coordinates indicated by the up-sampled scan-stage translation, and assumes all others to be infinitely large. For a pair of FoVs, brute-force matching needs to calculate as many differences as the product of the two FoVs' KP counts, while the proposed method calculates only about as many as one FoV's KP count multiplied by the number of other KPs within each KP's selected matching region. Considering that the KP count per FoV is in the range of thousands while the number of KPs per matching region is usually under 10, the proposed modification dramatically reduces calculation yet yields nearly identical results. Since KP matching takes a large number of floating-point operations and thus consumes a large portion of time, this reduction markedly improves the overall efficiency of sWSI.
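The windowed matching described above can be illustrated with a minimal Python sketch. It is not the sWSI implementation: the function name `windowed_match`, the use of Euclidean distance as the KP "difference", and the plain linear scan used to find candidates inside the window are all assumptions made for illustration.

```python
import math

def windowed_match(pts_a, desc_a, pts_b, desc_b, prior_shift, radius):
    """Match key points of FoV A to FoV B inside a prior-guided window.

    pts_*: lists of (x, y) key point coordinates.
    desc_*: lists of descriptor tuples, one per key point.
    prior_shift: (dx, dy) translation from A to B estimated during scan
                 (hypothetical input, assumed already up-sampled).
    radius: constant window radius around the predicted location.
    Returns (matches, n_compared), where matches holds (idx_a, idx_b).
    """
    matches, n_compared = [], 0
    for i, (xa, ya) in enumerate(pts_a):
        predicted = (xa + prior_shift[0], ya + prior_shift[1])
        best_j, best_d = None, math.inf
        for j, pb in enumerate(pts_b):
            if math.dist(predicted, pb) > radius:
                continue  # outside the window: difference treated as infinite
            n_compared += 1  # only these pairs pay for a descriptor comparison
            d = math.dist(desc_a[i], desc_b[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches.append((i, best_j))
    return matches, n_compared

# Tiny example: three key points shifted by (10, 5), plus one distractor in B.
pts_a = [(0, 0), (100, 0), (0, 100)]
desc_a = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
pts_b = [(10, 5), (110, 5), (10, 105), (12, 6)]
desc_b = [(1.0, 0.1), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
matches, n_compared = windowed_match(pts_a, desc_a, pts_b, desc_b, (10, 5), 5.0)
```

In this toy case only 4 descriptor comparisons are made instead of the 12 a brute-force search would need. The candidate lookup here still scans every key point of B with a cheap 2D distance test; a spatial index such as a grid could replace that scan, which is a design choice outside what the text specifies.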
## 4 On-the-fly Image Distortion Correction

When stitching each FoV pair to match KPs, the projection function can take any form so long as it minimizes error without over-fitting. Combining all FoVs into a single continuous view, however, requires the projection to be linear, so the non-linear distortion must be corrected first. Otherwise, the order of the stacked-up non-linear transfer function of each FoV onto the whole slide keeps growing with each FoV and becomes very inefficient to solve.

Designed to fit any combination of microscope and smartphone models, sWSI assumes a generalized high-order polynomial (HOP) inverse-distortion model [9], which approximates any sufficiently smooth function with marginal error if the order is sufficiently high, as follows from Taylor's theorem. Specifically, it is assumed that for each scan procedure there exists a constant but unknown HOP projection function that maps the raw FoVs into a corrected 2D space, where all matched pixel pairs in overlapping FoVs share the same offset for that FoV pair. In other words, after the raw FoVs are corrected by a HOP function, each adjacent FoV pair can be stitched by a mere translation with small error.

In sWSI, this HOP projection matrix is solved iteratively based on FoV-pair-wise KP matching, formulated as follows. First, assume the HOP model has $O_p$ orders. Consider the $i$-th FoV pair, in which one FoV is stitched onto the other. For the $j$-th matched point on the first FoV, with 2D coordinate $(u, v)$, the polynomial expansion kernel $\bar{x}_{i,j}$ is derived as in [9] by evaluating

$$\bar{\phi}_{O_p}(u,v) = (1, u, v, u^2, uv, v^2, \ldots, v^{O_p}) \tag{2}$$

at that coordinate, and thus has $N_e = (O_p+1)(O_p+2)/2$ dimensions. Similarly, the point's exact match in the second FoV has its own coordinate and a kernel $\bar{y}_{i,j}$. For simpler notation, also let $\hat{x}_{i,j}$ denote the raw 2D coordinate $(u, v)$ of the point. Next, denote the correction projection matrix as $\beta$. The linear projection used to stitch the pair after correction is an affine one, where $\bar{T}_i$ and $R_i$ are the translation and rotation components, respectively.
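To make the polynomial expansion kernel of Eq. 2 concrete, the following minimal Python sketch enumerates the monomials in the same order and applies a correction projection matrix, stored with one row per kernel element and two columns, to a raw coordinate. The function names and the example matrix values are hypothetical and chosen only for illustration; they are not taken from sWSI.

```python
def poly_kernel(u, v, order):
    """Monomials ordered by total degree: (1, u, v, u^2, uv, v^2, ..., v^order)."""
    return [u ** (d - b) * v ** b
            for d in range(order + 1)   # total degree 0..order
            for b in range(d + 1)]      # power of v within that degree

def correct_point(u, v, beta, order):
    """Project one raw point through a correction matrix beta.

    beta: list of rows, one per kernel element, each row holding the
          coefficients for the corrected x and y outputs.
    """
    k = poly_kernel(u, v, order)
    return tuple(sum(k[n] * beta[n][c] for n in range(len(k))) for c in range(2))

# Hypothetical order-2 matrix: identity linear part plus a small u^2 term on x.
beta_example = [
    [0.0, 0.0],   # 1
    [1.0, 0.0],   # u
    [0.0, 1.0],   # v
    [0.01, 0.0],  # u^2 (illustrative non-linear correction)
    [0.0, 0.0],   # uv
    [0.0, 0.0],   # v^2
]

kernel = poly_kernel(2, 3, 2)
corrected = correct_point(2, 3, beta_example, 2)
```

For order 2 the kernel has 6 elements and for order 3 it has 10, in line with the count $(O_p+1)(O_p+2)/2$ of monomials in two variables up to degree $O_p$.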
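The whole model would ideally satisfy

$$\bar{x}_{i,j}\beta R_i + \bar{T}_i - \bar{y}_{i,j}\beta = 0 \tag{3}$$

for all point pairs across all FoV pairs, where $\beta$ is constant and $\bar{T}_i$, $R_i$ are FoV-pair-dependent but point-pair-independent. In reality, correction error exists and the process turns into solving a constrained optimization problem

$$\beta, \{\bar{T}_i, R_i\} = \arg\min_{\beta, \{\bar{T}_i, R_i\}} \sum_i S_i(\beta, \bar{T}_i, R_i), \tag{4}$$

where

$$S_i(\beta, \bar{T}_i, R_i) = \sum_j \left[ \left\| \bar{x}_{i,j}\beta R_i + \bar{T}_i - \bar{y}_{i,j}\beta \right\|^2 + \lambda \left\| \bar{x}_{i,j}\beta - \hat{x}_{i,j} \right\|^2 \right]. \tag{5}$$

The $\lambda$ term here prevents the projections from collapsing into all zeros, with $\lambda$ a fixed weight. It should be noted that, based on the assumption that $\beta$ should correct non-linear distortions and nothing else, the elements in

$$\beta = \begin{bmatrix} \beta_{00}, \beta_{10}, \ldots, \beta_{N_e-1,0} \\ \beta_{01}, \beta_{11}, \ldots, \beta_{N_e-1,1} \end{bmatrix}^T = \begin{bmatrix} \beta_{00}, \beta_{10}, \beta_{20}, \tilde{\beta}_0 \\ \beta_{01}, \beta_{11}, \beta_{21}, \tilde{\beta}_1 \end{bmatrix}^T \tag{6}$$

satisfy

$$\beta_{00} = \beta_{01} = \beta_{20} = \beta_{11} = 0, \quad \beta_{10} = \beta_{21} = 1. \tag{7}$$

Then, the multi-variable non-linear problem of Eq. 4 can be solved by iteratively fixing either $\beta$ or $\{\bar{T}_i, R_i\}$ and finding the least-squares solution of the other until convergence. Specifically, by freezing $\beta$,

$$[\bar{T}_i^T, R_i^T]^T = \left( \sum_j \left( \bar{y}_{i,j}\beta\, \check{x}_{i,j}^T \right) \right) \left( \sum_j \left( \check{x}_{i,j} \check{x}_{i,j}^T \right) \right)^{-1} \tag{8}$$

where $\check{x}_{i,j}$ is the corrected coordinate $\bar{x}_{i,j}\beta$ augmented with a leading constant 1. By keeping $\{\bar{T}_i, R_i\}$ constant and splitting the elements as

$$\bar{T}_i = \begin{bmatrix} T_{i,0} & T_{i,1} \end{bmatrix}, \quad R_i = \begin{bmatrix} R_{i,00} & R_{i,01} \\ R_{i,10} & R_{i,11} \end{bmatrix} \tag{9}$$

the variable elements in $\beta$ can be solved as

$$\begin{bmatrix} \tilde{\beta}_0 \\ \tilde{\beta}_1 \end{bmatrix} = \sum_{i,j} \left( (M_{i,j}^T M_{i,j})^{-1} \right) \sum_{i,j} \left( M_{i,j}^T \tilde{L}_{i,j} \right) \tag{10}$$

where

$$M_{i,j} = \begin{bmatrix} M_{i,j,00}(\tilde{x}, \tilde{y}) & M_{i,j,01}(\tilde{x}, \tilde{y}) \\ M_{i,j,10}(\tilde{x}, \tilde{y}) & M_{i,j,11}(\tilde{x}, \tilde{y}) \end{bmatrix}, \quad \tilde{L}_{i,j} = \begin{bmatrix} \tilde{L}_{i,j,0}(\tilde{x}_{i,j}, \tilde{y}_{i,j}) \\ \tilde{L}_{i,j,1}(\tilde{x}_{i,j}, \tilde{y}_{i,j}) \end{bmatrix}. \tag{11}$$

Omitting the subscripts $i$ and $j$ for simplicity, the elements in Eq. 11 are calculated from

$$\begin{aligned}
M_{00} &= (R_{00}\tilde{x} - \tilde{y})^T (R_{00}\tilde{x} - \tilde{y}) + R_{01}^2\, \tilde{x}^T\tilde{x} + \lambda\, \tilde{x}^T\tilde{x} \\
M_{01} &= R_{10} (R_{00}\tilde{x} - \tilde{y})^T \tilde{x} + R_{01}\, \tilde{x}^T (R_{11}\tilde{x} - \tilde{y}) \\
M_{10} &= R_{10}\, \tilde{x}^T (R_{00}\tilde{x} - \tilde{y}) + R_{01} (R_{11}\tilde{x} - \tilde{y})^T \tilde{x} \\
M_{11} &= R_{10}^2\, \tilde{x}^T\tilde{x} + (R_{11}\tilde{x} - \tilde{y})^T (R_{11}\tilde{x} - \tilde{y}) + \lambda\, \tilde{x}^T\tilde{x}
\end{aligned} \tag{12}$$

$$\begin{aligned}
\tilde{L}_0 &= -\tilde{T}_0 (R_{00}\tilde{x} - \tilde{y})^T - R_{01} \tilde{T}_1\, \tilde{x}^T \\
\tilde{L}_1 &= -R_{10} \tilde{T}_0\, \tilde{x}^T - \tilde{T}_1 (R_{11}\tilde{x} - \tilde{y})^T
\end{aligned} \tag{13}$$

where $\tilde{x}$ and $\tilde{y}$ are the respective sub-vectors of $\bar{x}$ and $\bar{y}$ calculated as

$$\tilde{\phi}_{O_p}(u,v) = (u^2, uv, v^2, u^3, u^2v, \ldots, v^{O_p}). \tag{14}$$

Experiments identify a polynomial order $O_p$ that is generally sufficient, and the model takes about 10 iterations to converge.

## 5 Results and Evaluations

Currently, sWSI services are in open beta test for iPhones internationally, with a special version in P.R. China for both Android phones and iPhones.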
Due to national Internet gateway issues, using a version from the opposite side of the border may result in a slower connection. Sample slides produced by both Android phones and iPhones can be accessed on the product's homepage [6], with data available for further evaluation by contacting the authors.

The sWSI systems have been used by both trained pathologists and assistant technicians to scan hundreds of samples with satisfaction, and are sometimes preferred over automatic scanners for versatility and robustness. Thirty sWSI users (including 22 pathologists) from 8 hospitals in Shanghai, Nanjing, and Luoyang, P.R. China were randomly chosen and continuously followed for at least one month to complete a survey. Most of them report a throughput of about 1 FoV per second after using the system just a few times, and no failures as long as they operated it properly. Statistics of their ratings of the sWSI experience compared to taking still images and using automatic scanners are summarized in Table 5, on a scale of 1 (Poor) to 5 (Perfect). It should be noted that many of these respondents use automatic scanners almost daily and thus may have overrated their cost-effectiveness compared to those who work in lower-level hospitals and use them much less often.

Yet, there are a few issues to be solved in future research and development. Firstly, the FoVs are often unevenly illuminated, leaving noticeable brightness discontinuities in the virtual slide, possibly caused by improper installation of handsets on the adapter or a poor light source. Secondly, some parameters on Android phones cannot be controlled through the API on older Android OS versions. Weakened control may lead to improper configuration, such as a long exposure time causing blur. Last but not least, the OpenGL driver, which offers GPGPU computing potential, is very tricky to work with and produces unexpected results on many smartphone models for unknown reasons.
Preliminary research using GPGPU on iPhones brought a dramatic boost in processing speed of over 60%, but older models upgraded to iOS 10 no longer work properly.

## 6 Conclusion

In this paper, an ultra-low-cost whole slide imaging system with clients hosted on mainstream Android and iOS smartphones is introduced and analyzed. Compared to automatic scanners and high-end-computer-based solutions, this alternative dramatically reduces the setup cost to as low as $100 per unit, with a service cost under $1 per scan.

By employing distributed image processing, both robustness and efficiency are addressed. Through high-performance computing and real-time feedback, user friendliness is optimized with minimal manual input, leaving most interface-kernel coordination and even image distortion correction fully automated. The 30 surveyed clinical professionals give sWSI higher scores than automatic scanners on most aspects, except for slightly poorer image quality and lower throughput.

### References

1. Ghaznavi F, Evans A, Madabhushi A, et al. Digital Imaging in Pathology: Whole-Slide Imaging and Beyond. Annual Review of Pathology-mechanisms of Disease, 2013, 8(1): 331-359.
2. Gherardi A, Bevilacqua A. Real-time whole slide mosaicing for non-automated microscopes in histopathology analysis. Journal of Pathology Informatics, 2013, 4(2).
3. Piccinini F, Bevilacqua A, Lucarelli E, et al. Automated image mosaics by non-automated light microscopes: the MicroMos software tool. Journal of Microscopy, 2013, 252(3): 226-250.
4. Auguste L, Palsana D. Mobile Whole Slide Imaging (mWSI): a low resource acquisition and transport technique for microscopic pathological specimens. BMJ Innovations, 2015: bmjinnov-2015-000040.
5. Bay H, Tuytelaars T, Van Gool L, et al. SURF: speeded up robust features. Lecture Notes in Computer Science, 2006: 404-417.
6. Dr. Terry Information Technology. http://www.mydigipath.com/.
7. Ozdalga, Errol. "The Smartphone in Medicine: A Review of Current and Potential Use Among Physicians and Students." Journal of Medical Internet Research 14.5 (2012).
8. Juan, Luo, and Gwun Oubong. "SURF applied in panorama image stitching." International Conference on Image Processing (2010): 495-499.
9. Kaynig, Verena, et al. "Fully automatic stitching and distortion correction of transmission electron microscope images." Journal of Structural Biology 171.2 (2010): 163-173.