The link to the formal publication is via http://dx.doi.org/10.1016/j.compeleceng.2017.06.008.
An Analysis Framework for Hardware and Software Implementations with Applications from Cryptography
Abstract
With the richness of present-day hardware architectures, tightening the synergy between hardware and software has attracted great attention. The interest in unified approaches paved the way for new frameworks that target hardware and software co-design. This paper confirms that a unified statistical framework can successfully classify algorithms based on a combination of the heterogeneous characteristics of their hardware and software implementations. The proposed framework produces customizable indicators for any hybridization of processing systems and can be contextualized for any area of application. The framework is used to develop the Lightness Indicator System (LIS) as a case study that targets a set of cryptographic algorithms that are known in the literature to be tiny and light. The LIS targets state-of-the-art multicore processors and high-end Field Programmable Gate Arrays (FPGAs). The presented work includes a generic benchmark model that aids the clear presentation of the framework, as well as extensive performance analysis and evaluation.
Keywords: Analysis, Hardware, Software, Gate Arrays, Algorithms, Cryptography
1 Introduction
With the advancements in high-performance computing, algorithms have a wide range of efficient implementation options. Current computers can be equipped with multicore processors, Graphics Processing Units (GPUs), and high-end programmable devices, such as FPGAs. This variety of processing options is supported by a wealth of co-design tools that facilitate hardware and software implementations JD072 (); KDH08 (). Nevertheless, several questions remain about which algorithm best suits an implementation option, and vice versa. How would an algorithm perform within hybrid processing systems, and how can it be evaluated based on heterogeneous performance measurements?
The core of any performance measurement includes measures, metrics, and indicators. Indicators are defined as qualitative or quantitative factors, or variables, that provide simple and reliable means to measure achievement. A qualitative performance indicator is a descriptive characteristic, an opinion, a property, or a trait. In contrast, a quantitative performance indicator is a specific numerical measurement resulting from counting, adding, averaging, or other computations asbjorn1995benchmarking (). Qualitative and quantitative measurements can be combined to define measurement frameworks and benchmarks damajsustainability (). There is a large number of hardware and software benchmarks in the literature. Yet, little research has addressed the development of analysis frameworks for heterogeneous hardware and software implementations.
In this paper, we present a statistical analysis framework for performance profiling of related algorithms running under different hardware and software subsystems. The framework comprises criteria, indicators, and measurements obtained from heterogeneous sources. The measurements are statistically combined to produce indicators that capture the algorithmic, software, and hardware characteristics of the assessed algorithms. The developed framework enables deep and thorough reasoning about each hardware and software subsystem, and combines heterogeneous characteristics to provide overall ratings, rankings, and classifications. The proposed framework is customizable for any hybridization of processing systems and can target any model of computation or area of application.
The paper includes the development of a generic benchmark model that serves as a specification pattern of analysis and evaluation frameworks. The model captures the activities, resources, implementation, mathematical formulation, and intended measurements of an analysis framework or a benchmark. The developed model can be used to describe any benchmark with simplicity and clarity. The model is adopted to present the proposed analysis framework.
To validate the proposed framework in practice, a case study from cryptography is carried out. The case study enables the development of the LIS with its bouquet of statistical indicators. The LIS formulates the proposed framework within the context of lightweight cryptographic algorithms. The proposed performance analysis classifies the investigated algorithms according to a combination of their mathematical, software, and hardware characteristics. The two main targeted high-performance computing devices are multicore processors for software implementations and FPGAs for hardware implementations.
The rest of the paper is organized as follows. Section 2 surveys the literature. Section 3 presents the motivation, research questions, and contribution of the paper. Section 4 presents the generic benchmark model and the analysis framework. Section 5 introduces the LIS according to the generic model. A thorough performance analysis and evaluation is presented in Section 6. Section 7 concludes the paper and sets the ground for future work.
2 Related Work
2.1 Benchmarks
Benchmarks are widely addressed in the literature. Well-known benchmarks include Whetstone, LINPACK, Dhrystone, and the Standard Performance Evaluation Corporation (SPEC) suites, among others Linpack2011 (); weicker1984dhrystone (); henning2000spec (). Several embedded-system benchmarks are developed by the Embedded Microprocessor Benchmark Consortium (EEMBC). EEMBC helps system designers select the optimal processors, smartphones, tablets, and networking appliances. EEMBC mainly targets the hardware and software of embedded systems. EEMBC organizes its benchmark suites to target automotive, digital media, multicore processors, networking, automation, signal processing, handheld devices, and browsers. The benchmarks developed by EEMBC include AutoBench, BrowsingBench, AndEBench, and MiBench EEMBC2014 (). Cryptography benchmarks are designed to measure the performance of different cryptographic algorithms running on different systems, such as GPUs or other processors. Rukhin et al. rukhin2001statistical () present a statistical test suite for random and pseudorandom number generators for cryptographic applications. Yue et al. yue2006npcryptbench () present a cryptographic benchmark suite for network processors (NPCryptBench).
2.2 Hardware/Software Co-Design Evaluation Frameworks
Performance analysis and evaluation within hardware/software (HW/SW) co-design investigations are usually based on a variety of metrics. Besides standard metrics, such as execution time, maximum frequency, throughput, hardware resource utilization, and power consumption, several metrics are identified within the context of the application. Jain-Mendon and Sass JainMendon2014873 () proposed a HW/SW co-design approach for implementing sparse matrix-vector multiplication on FPGAs. Within the context of the application, the authors evaluated their approach by analyzing the hardware and software implementations in terms of floating-point processing speed, bandwidth efficiency, data block size, communication time, etc. Lumbiarres-Lopez et al. LumbiarresLopez2016324 () implemented, within a co-design environment, a countermeasure against side-channel analysis attacks. The application-specific metrics used comprise the change of input current over time and the correlations between data and power consumption. All the aforementioned investigations also employed the standard co-design metrics. In Park20131578 (), the performance of block tridiagonal solvers was evaluated on heterogeneous multicore processors and GPUs. The evaluation was mainly based on analyzing memory performance and measuring the total execution times of different scenarios.
The standard co-design metrics apply to partitioned hardware and software implementations. In partitioned implementations, the focus is the analysis and evaluation of the developed subsystems, with the aim of finding the best possible partitioning strategy. Wu et al. Shi20147 () studied the performance and algorithmic aspects of a proposed heuristic partitioning algorithm. The produced implementations were analyzed with respect to execution time, resource utilization, and the attained solution quality as related to the smallest possible error. In Jemai2015259 (), Jemai and Ouni proposed a partitioning strategy based on control data graphs. The partitioning algorithm was deployed within three different case studies. The metrics analyzed across the three studies comprised the number of partitions, software execution time, hardware resource utilization, software resource utilization, etc. The authors adopted a pattern chart that summarizes the performance analysis results and aids the evaluation of the algorithm.
Although evaluation approaches for HW/SW co-design consider various aspects, few attempts have been made to combine multiple measurements and characteristics into unified indicators. Spacey et al. Spacey2009159 () investigated the automatic quantification of acceleration opportunities for programs across a wide range of heterogeneous architectures. The investigation focused on allowing designers to identify promising implementation platforms before investing in a particular HW/SW co-design and a specific partitioning scenario. The authors unified many hardware and software characteristics into a single execution time estimate. The incorporated hardware characteristics included cycle time, number of parallel execution units, execution efficiency, bus latency, bus width, and hardware size capacity. The employed software characteristics included the execution time, the number of parallel execution slots, program code unit iterations, control flows, data flows, and code size. Additional composite indicators were developed to calculate speedup factors. The combined characteristics were used to calculate co-design performance estimates and to evaluate opportunities for hardware acceleration.
3 Research Objectives
The modern trend in computer systems is clearly toward further hybridization using high-end co-processing systems. Hybrid systems are mainly studied within HW/SW co-design and co-analysis frameworks. To increase the effectiveness of co-analysis, and accordingly of co-design, the following research opportunities are highlighted:

The identification of commonly used metrics in HW/SW co-design

Addressing the need for the contextualization of application-specific metrics, properties, and key indicators in HW/SW co-design applications

Evaluating implementations given the heterogeneous characteristics of the targeted systems

The identification of optimized combinations of hardware and/or software implementations based on co-analysis

The limited attempts in the literature to combine multiple measurements and characteristics in unified indicators that can rank, rate, classify, and evaluate algorithms for hardware and software implementations

The limited work in the literature to develop co-analysis frameworks that target cryptographic algorithms
The main research objective of this paper is to develop a statistical framework that can combine heterogeneous characteristics of algorithms and their implementations in hardware and software. The framework aims to be portable across different hardware and software systems, customizable, scalable, and able to target any area of application. In addition, the framework aids the composition of a bouquet of indicators to capture specific desirable properties and enable classifying, ranking, rating, and evaluating algorithms and implementations. The objectives also include the following:

Provide a generic benchmark model that serves as a specification pattern of analysis and evaluation frameworks. The developed model aims at being clear, simple, and highly reusable. The model is used to present the developed analysis framework.

Validate the proposed framework by developing the Lightness Indicator System for cryptographic ciphers.

Perform a thorough analysis and evaluation based on the LIS for a set of cryptographic ciphers.

Study the integration of the developed framework within an Integrated Development Environment (IDE) that can connect to various hardware and software implementation and analysis tools.
The paper includes a thorough evaluation of the framework and a discussion on its usefulness.
4 The Generic Model and The Analysis Framework
4.1 The Generic Benchmark Model
The proposed generic model diagrams the continuum of important elements of benchmarks and analysis frameworks. The generic model defines the goal, inputs, activities, outputs, outcomes, and the desired performance profile of a benchmark. The model captures the relationships among the resources, implementation, mathematical formulation, and the obtained results. Moreover, it standardizes an evaluation process that can be applied to any benchmark. The proposed model consists of the following six elements, which are diagrammed in Figure 1:

Goal: the definition of the aim of the benchmark or analysis framework, and of what it mainly provides.

Input: the identification of the algorithms under study, implementation environments, reference algorithm, performance metrics, etc.

Activities: the implementation of the algorithms under the identified environments and the collection of results.

Output: the formulation of the key indicators and the development of their rubrics, if any.

Outcomes: the formulation of the statistical assessment as combinations of the Output.

Performance: the application of the developed assessment framework to profile and classify algorithms according to the obtained results.
The proposed model provides a generic profiling pattern that can be used for any benchmark or analysis framework.
4.2 The Analysis Framework
The proposed analysis framework classifies the heterogeneous sources of measurements into analysis profiles (APs), such as general algorithmic, hardware, and/or software profiles. The development of each profile includes the identification of a set of Key Indicators (KIs). The indicators are the most extensive part of the measurement framework and should be carefully developed within the context of application. The measurements associated with the identified indicators may include quantities, scores from scale rubrics, etc. Each measured indicator is then divided by the corresponding measurement of a reference algorithm, which normalizes it and produces a performance ratio. Combined Measurement Indicators (CMIs) are then calculated as the Geometric Mean of the KI ratios. The generic equation of CMIs is as follows:
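$$CMI = \left( \prod_{j} \prod_{i} \frac{KI_{ij}}{KI_{ij}^{ref}} \right)^{1/n}$$
Where $KI_{ij}$ is the $i$-th KI of the $j$-th AP, $KI_{ij}^{ref}$ is the reference measurement of that indicator, and $n$ is the total number of KI ratios in the product.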
The Geometric Mean is used in the CMI equation because it measures the central tendency of values that are obtained as ratios. Using the Geometric Mean ensures the following two important properties bullen2003handbook (); denscombe2014good (); hennessy2011computer ():

The Geometric Mean of the ratios is the same as the ratio of the Geometric Means.

The ratio of the Geometric Means is equal to the Geometric Mean of the performance ratios, which implies that, when comparing the performance of two different implementations, the choice of the reference implementation is irrelevant hennessy2011computer ().
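The reference-independence property can be checked numerically. The following is a minimal Python sketch, assuming plain direct ratios (each KI divided by its reference value); the helper names `geometric_mean` and `cmi` and all KI vectors are hypothetical illustrations, not data or code from this work.

```python
from math import isclose, prod


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def cmi(measurements, reference):
    """CMI sketch: geometric mean of the KI-to-reference ratios."""
    return geometric_mean(m / r for m, r in zip(measurements, reference))


# Hypothetical KI vectors for two implementations and two candidate references.
impl_x = [120.0, 8.0, 35.0]
impl_y = [60.0, 16.0, 7.0]
ref_a = [10.0, 4.0, 5.0]
ref_b = [200.0, 1.0, 70.0]

# Comparing X with Y gives the same answer whichever reference normalizes the
# KIs, and it equals the geometric mean of the direct X/Y ratios.
x_over_y_a = cmi(impl_x, ref_a) / cmi(impl_y, ref_a)
x_over_y_b = cmi(impl_x, ref_b) / cmi(impl_y, ref_b)
direct = geometric_mean(x / y for x, y in zip(impl_x, impl_y))
print(x_over_y_a, x_over_y_b, direct)
print(isclose(x_over_y_a, x_over_y_b) and isclose(x_over_y_a, direct))  # True
```

The two references differ wildly, yet the X-to-Y comparison agrees up to floating-point rounding, which is why a single fixed reference algorithm is sufficient for ranking.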
The developed statistical framework can be applied to different application areas and with different APs. Each application requires contextualizing the KIs of every AP according to the characteristics of the targeted area.
5 The Application of the LIS System to Cryptographic Algorithms
In the following sections, we present the LIS based on the generic model.
5.1 Goal
The LIS is a HW/SW statistical framework that provides performance profiling for lightweight cryptographic algorithms running under different hardware and software systems. The LIS combines a bouquet of performance metrics that include speed, algorithmic complexity, memory efficiency, algorithmic strength, hardware size, etc.
5.2 Input
The input identifies the targeted algorithms, computing systems, and performance metrics. The LIS targets a set of cryptographic algorithms that are designed for applications running on low-resource platforms. The selected ciphers are classified in the literature as tiny, small, lightweight, or ultra-lightweight ciphers. The targeted ciphers are Skipjack, 3WAY, XTEA, KATAN and KTANTAN, and HIGHT. The reference cipher is AES daemen2002design ().
The two targeted high-performance computing devices are a Dell Precision T7500 workstation, with dual quad-core Xeon processors and 24 GB of RAM, and an Altera Stratix IV FPGA.
The identified performance metrics of the LIS are classified into general algorithmic, hardware, and software profiles. The general algorithmic profile includes the complexity of the algorithms and their security strength. Within the multicore environment, the software profile includes execution times, clock cycles per instruction, throughput, and a cache analysis. Within the FPGA environment, the hardware profile includes the resource utilization, propagation delay, throughput, power consumption, etc.
5.3 Activities
The activities include software implementations in C and hardware implementations in VHDL. The tools used for implementation and profiling are Quartus and ModelSim for hardware, and Intel VTune Amplifier under Visual Studio for software.
5.4 Output
The outputs of the analysis framework are measures and indicators. The measures are the general algorithmic, hardware, and software profiles. The indicators of the general algorithmic profile are intended to capture the complexity and ciphering strength of the algorithm; the KIs are the following:

Algorithm Complexity (AC): asymptotic complexity analysis using the $O$, $\Omega$, and $\Theta$ notations

Cipher Strength: based on Key Size (KS), Number of Rounds (NR), and the text Block Size (BS)
Complexity analysis of algorithms is the determination of the amount of resources necessary to execute them. To analyze the complexity of studied algorithms, we study their asymptotic behavior. The asymptotic behavior classifies algorithms according to their rate of growth with respect to the increase in input size. The following standard complexity analysis classification is adopted cormen2001introduction ():

$O(g(n))$: the rate of growth of an algorithm is asymptotically no worse than the function $g(n)$ but can be equal to it.

$\Omega(g(n))$: the rate of growth of an algorithm is asymptotically no better than the function $g(n)$.

$\Theta(g(n))$: the rate of growth of an algorithm is asymptotically equal to the function $g(n)$.
Here, $n$ is the size of the input.
To facilitate the assessment of the studied ciphers, a rubric is created. The rubric scale points are Logarithmic Low (LL), Logarithmic High (LH), Linear (L), Almost Quadratic (AQ), and Higher than Quadratic (HQ). For instance, LL describes the case when the complexity is asymptotically no worse than $\log n$ but can be equal to it; such a complexity is formulated as $O(\log n)$. The complete description of the rubric is shown in Table 1. In preparation for the statistical formulation, we map these qualitative properties onto quantities: each scale point, from LL to HQ, is mapped onto a fixed value of 20%, 40%, 60%, 80%, and 100%, respectively.
General  Scale  

Indicator  Logarithmic Low  Logarithmic High  Linear  Almost Quadratic  Higher than Quadratic 
Complexity Analysis  but better than Linear  but worse than Linear 
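The rubric mapping described above amounts to a lookup followed by normalization against the reference. In this minimal Python sketch, the scale points and their mapped values come from the text, while the helper names and the inverse orientation of the AC ratio (lower complexity scoring above 1, in line with the lightness goal stated in Section 5.5) are our assumptions.

```python
# Rubric scale points and their fixed mapped values, as stated in the text.
AC_RUBRIC = {"LL": 0.20, "LH": 0.40, "L": 0.60, "AQ": 0.80, "HQ": 1.00}


def mapped_ac(scale_point):
    """Map a qualitative complexity rating (e.g. "AQ") onto its quantity."""
    try:
        return AC_RUBRIC[scale_point]
    except KeyError:
        raise ValueError(f"unknown rubric point: {scale_point!r}") from None


def ac_ratio(scale_point, reference_point="AQ"):
    """Normalized AC ratio against the reference rating.

    The default reference "AQ" matches the mapped value 0.8 listed for AES.
    Assumed inverse orientation: lower complexity gives a ratio above 1.
    """
    return mapped_ac(reference_point) / mapped_ac(scale_point)


print(mapped_ac("AQ"), ac_ratio("AQ"), ac_ratio("L"))
```

Because every cipher in Table 2, including the AES reference, is mapped to 0.8, the AC ratio is 1.0 throughout this case study; it only starts to discriminate once ciphers of different complexity classes are compared.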
Cipher Strength is an assessment of the algorithm based on a variety of aspects that can include Key Size, the Number of Rounds, and the Block Size jorstad1997cryptographic (). Key size, or key length, is the size, measured in bits, of the key used in a cryptographic algorithm. The security of a cryptographic algorithm is a function of the length of its key. For some algorithms, such as those targeted in this investigation, the longer the key, the more resistant the algorithm menezes2010handbook (). However, in the broader context, the relation between key length and security can be more delicate lenstra2004key (). For example, key sizes of 80, 160, and 1024 bits, although different, imply comparable security when 80 bits is the key size of a symmetric cipher, 160 bits is a hash length, and 1024 bits is an RSA modulus size lenstra2004key (). In addition, Elliptic Curve Cryptography (ECC) is known for attaining its strength at relatively small key sizes. For instance, a comparable security level can be achieved using RSA with a key size of 15360 bits and ECC with a key size of only 512 bits maletsky2015rsa (). Investigations relating the level of security, and the strength of the algorithm, to the key size are given wide and careful attention in the literature lenstra2004key (); maletsky2015rsa (); Keylength2017 (). The most recent standardized key size requirements for security are published at Keylength2017 ().
Furthermore, block ciphers transform a plaintext block of several bits into an encrypted block. To keep the cryptographic scheme secure, the block size cannot be too short; in general, the larger the block size, the greater the cipher strength menezes2010handbook (). In addition, rounds are important to the strength of ciphers; a single round usually applies mixing, permutation, substitution, and shift operations to the text being encrypted. In most cases, more rounds lead to greater confusion and diffusion, and hence stronger security. Indeed, indicators such as Key Size, Number of Rounds, and Block Size should be carefully adopted and specified within the scope of the targeted cryptographic algorithms. The proposed indicators are not necessarily applicable to all cryptographic algorithms.
The software profile includes the following indicators PattHenn2013 ():

Execution Time (ET): the time between the start and the completion of a task.

Throughput (TH): the total amount of work done in a given time.

Clock Cycles per Instruction (CPI): the average number of clock cycles each instruction takes to execute.

Cache Miss Ratio (CMR): the fraction of memory accesses that miss in the cache.
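The software indicators above are simple ratios of raw profiler counts. The Python sketch below derives TH, CPI, and CMR from a hypothetical record of one profiled run; the `SoftwareRun` class and its counter fields are illustrative assumptions, not the output format of Intel VTune Amplifier. The TH column of Table 3 is numerically equal to BS divided by ET (Skipjack: 64 / 0.410 ≈ 156.098; AES: 128 / 23.210 ≈ 5.515); reading this as ET covering 10^6 encrypted blocks, so that the result comes out in Mbps, is our assumption.

```python
from dataclasses import dataclass


@dataclass
class SoftwareRun:
    """Hypothetical raw counters for one profiled software run."""
    block_bits: int     # cipher block size in bits (BS)
    blocks: int         # number of blocks encrypted in the run (assumed)
    elapsed_s: float    # execution time in seconds (ET)
    cycles: int         # total clock cycles
    instructions: int   # instructions retired
    mem_accesses: int   # memory accesses
    cache_misses: int   # accesses that missed in the cache

    def throughput_mbps(self) -> float:
        # TH: megabits encrypted per second.
        return self.block_bits * self.blocks / self.elapsed_s / 1e6

    def cpi(self) -> float:
        # CPI: average clock cycles per instruction.
        return self.cycles / self.instructions

    def cache_miss_ratio(self) -> float:
        # CMR: fraction of memory accesses that miss in the cache.
        return self.cache_misses / self.mem_accesses


# BS and ET for Skipjack from Table 3, with 10**6 blocks assumed;
# the cycle, instruction, and cache counters are hypothetical.
run = SoftwareRun(block_bits=64, blocks=10**6, elapsed_s=0.410,
                  cycles=3_000_000, instructions=2_000_000,
                  mem_accesses=50_000, cache_misses=1_000)
print(round(run.throughput_mbps(), 3))  # 156.098
print(run.cpi(), run.cache_miss_ratio())
```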
The hardware profile includes ET, TH, and the following indicators:

Propagation Delay (PD): the time required for a signal from an input pin to propagate through combinational logic and appear at an external output pin.

Lookup Table (LUT): the number of combinational adaptive lookup tables (ALUTs) required to implement an algorithm in hardware. The number of LUTs is an indicator of hardware size in Altera devices. In other devices, the area could be measured in terms of the total number of gates, logic elements, slices, etc.

Logic Register (LR): the total number of logic registers in the design.

Power Consumption (PC): the power consumption of the developed hardware in Watts.
5.5 Outcomes
The Outcomes element is the formulation of CMIs as functions of KIs. The Lightness Indicator (LI) is the main CMI in the presented statistical analysis framework. The LI is calculated in terms of several APs, three in the current study: the General Algorithmic Profile (GAP), Software Profile (SWP), and Hardware Profile (HWP). The simplified form of LI is shown in Equation 1:
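$$LI = \left( \prod_{i} \frac{KI_{i,GAP}}{KI_{i,GAP}^{ref}} \times \prod_{i} \frac{KI_{i,SWP}}{KI_{i,SWP}^{ref}} \times \prod_{i} \frac{KI_{i,HWP}}{KI_{i,HWP}^{ref}} \right)^{1/n} \qquad (1)$$
and hence
$$LI = \left( \prod_{k=1}^{n} r_k \right)^{1/n}$$
Where $r_k$ is the $k$-th normalized KI ratio and $n$ is the number of key indicators.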
The weighted version of LI, denoted wLI, is given in Equation 2. The weighted version allows specific indicators to be emphasized. If all the assigned weights are equal, wLI is the same as LI.
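$$wLI = \left( \prod_{k=1}^{n} r_k^{w_k} \right)^{1/\sum_{k=1}^{n} w_k} \qquad (2)$$
Where $w_k$ is the weight of the $k$-th normalized KI ratio $r_k$.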
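The weighted form can be sketched as a weighted geometric mean, assuming wLI takes the form $(\prod_k r_k^{w_k})^{1/\sum_k w_k}$, which satisfies the stated property that equal weights give back LI. The function names and the ratio and weight values in this Python sketch are hypothetical.

```python
from math import isclose, prod


def lightness_indicator(ratios):
    """LI: unweighted geometric mean of n normalized KI ratios."""
    return prod(ratios) ** (1.0 / len(ratios))


def weighted_lightness_indicator(ratios, weights):
    """wLI sketch: weighted geometric mean (prod r_k**w_k) ** (1 / sum w_k)."""
    if len(ratios) != len(weights):
        raise ValueError("exactly one weight per ratio is required")
    return prod(r ** w for r, w in zip(ratios, weights)) ** (1.0 / sum(weights))


ratios = [2.0, 0.5, 8.0]  # hypothetical normalized KI ratios r_1..r_3

li = lightness_indicator(ratios)
equal = weighted_lightness_indicator(ratios, [3, 3, 3])
emphasized = weighted_lightness_indicator(ratios, [1, 1, 2])  # stress r_3
print(li, equal, emphasized)
print(isclose(li, equal))  # True
```

Doubling the weight of $r_3$ pulls wLI toward that ratio (from 2.0 to about 2.83 here), which is how a designer would emphasize, for instance, power consumption over area.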
The LIS enables the classification of cryptographic algorithms according to their lightness. A higher LI is achieved through higher throughput, more efficient memory performance, more compact size, lower complexity, lower power consumption, and lower resource utilization. Accordingly, the LI is either directly or inversely proportional to each indicator. The chosen proportions emphasize lightness; they could be modified to capture other properties. The master LIS formula using the developed indicators is shown in Equation 3. The indicators that are common to the software (sw) and hardware (hw) profiles are labeled with the profile name.
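$$LI = \left( \frac{AC^{ref}}{AC} \cdot \frac{KS^{ref}}{KS} \cdot \frac{NR^{ref}}{NR} \cdot \frac{BS^{ref}}{BS} \cdot \frac{ET_{sw}^{ref}}{ET_{sw}} \cdot \frac{TH_{sw}}{TH_{sw}^{ref}} \cdot \frac{CPI^{ref}}{CPI} \cdot \frac{CMR^{ref}}{CMR} \cdot \frac{ET_{hw}^{ref}}{ET_{hw}} \cdot \frac{TH_{hw}}{TH_{hw}^{ref}} \cdot \frac{PD^{ref}}{PD} \cdot \frac{LUT^{ref}}{LUT} \cdot \frac{LR^{ref}}{LR} \cdot \frac{PC^{ref}}{PC} \right)^{1/14} \qquad (3)$$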
The LIS provides the following set of combined statistical indicators:
Complexity Indicator (CI):
Security Strength Indicator (SSI):
Hardware Lightness Indicator (HLI):
Software Lightness Indicator (SLI):
Speed Indicator (SI):
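The KI subsets behind CI, SSI, HLI, SLI, and SI are not spelled out in the list above, so the Python sketch below rests on stated assumptions: throughputs enter as direct ratios, every other KI enters as an inverse ratio (reference / value), and each CMI uses the KI subset named in the hypothetical `CMI_KEYS` table. These choices were reverse-checked: recomputing from the rounded values in Tables 2, 3, and 4 reproduces the Table 5 entries for Skipjack, 3WAY, and KATAN32 to two decimals. HIGHT is left out because its tabulated CMR of 0.000 is a rounded value whose inverse is undefined. All dictionary and function names are illustrative, not part of the LIS tooling.

```python
from math import prod

# KI values copied from Tables 2-4 for three ciphers and the AES reference.
KEYS = ("AC", "KS", "NR", "BS", "ET_sw", "TH_sw", "CPI", "CMR",
        "ET_hw", "TH_hw", "PD", "ALUT", "LR", "PC")
TABLE = {
    "Skipjack": (0.8, 80, 32, 64, 0.410, 156.098, 1.327, 0.164,
                 7.49, 8.55, 11.90, 554, 142, 331.01),
    "3WAY": (0.8, 128, 11, 96, 2.320, 41.379, 1.107, 0.036,
             0.80, 120.00, 3.82, 77, 167, 331.01),
    "KATAN32": (0.8, 80, 254, 32, 27.460, 1.165, 0.634, 0.006,
                1.47, 21.77, 43.57, 2145, 540, 328.63),
    "AES": (0.8, 192, 12, 128, 23.210, 5.515, 1.235, 0.004,
            1.46, 6.83, 5.35, 3998, 750, 654.87),
}

# Assumed orientation: throughputs are direct (bigger is lighter);
# every other KI is inverted, since smaller is lighter.
DIRECT = {"TH_sw", "TH_hw"}

# Assumed KI subsets per CMI, reverse-checked against Table 5.
CMI_KEYS = {
    "LI": KEYS,
    "CI": ("AC", "KS", "NR", "BS", "CPI"),
    "SSI": ("AC", "KS", "NR", "BS"),
    "HLI": ("ET_hw", "TH_hw", "PD", "ALUT", "LR", "PC"),
    "SLI": ("ET_sw", "TH_sw", "CPI", "CMR"),
    "SI": ("ET_sw", "TH_sw", "ET_hw", "TH_hw", "PD"),
}


def ki_ratio(key, value, ref):
    """Normalized KI ratio, oriented so that a lighter cipher scores higher."""
    return value / ref if key in DIRECT else ref / value


def combined_indicator(cipher, cmi, reference="AES"):
    """Geometric mean of the oriented KI ratios selected for one CMI."""
    row = dict(zip(KEYS, TABLE[cipher]))
    ref = dict(zip(KEYS, TABLE[reference]))
    ratios = [ki_ratio(k, row[k], ref[k]) for k in CMI_KEYS[cmi]]
    return prod(ratios) ** (1.0 / len(ratios))


for cipher in TABLE:
    scores = {cmi: round(combined_indicator(cipher, cmi), 2) for cmi in CMI_KEYS}
    print(cipher, scores)
```

Under these assumptions the LI is the 14th root of the product of all fourteen oriented ratios, and the AES row returns exactly 1.0 for every indicator, consistent with its role as the reference.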
5.6 Performance
The analysis based on the LIS Output and Outcomes provides measurements for all KIs and enables the calculation of the defined CMIs. The results include rating, ranking, and classifying the targeted algorithms. The analysis and evaluation of results are presented in Section 6. The six elements of the LIS are summarized in Figure 2.
5.7 Programming Interface
The developed statistical framework is embedded in a sample co-design IDE. The IDE is implemented in Java under NetBeans; the code editor is implemented using the RSyntaxTextArea Java framework, and the IDE theme using the JTattoo Java framework. The implementation and performance evaluation tools comprise Altera Quartus and Altera ModelSim for hardware implementation and analysis, and Intel VTune Amplifier under Visual Studio for software analysis. The IDE connects to Altera Quartus through Tcl commands to synthesize designs, generate timing analyses and pin assignments for FPGA boards, and generate bit files to program the targeted FPGAs. The IDE connects to Intel VTune Amplifier through command-line calls and batch files to perform the software analysis and calculate the total execution time, CPI, etc. The generated hardware and software analysis files are exported to MS Excel to produce the complete analysis profile and charts.
6 Analysis and Evaluation
6.1 Performance Analysis
The LIS is an application of an analysis framework that can be the core part of a benchmark within HW/SW codesign. The developed framework is applied by developing the LIS on several cryptographic ciphers that are presented in the literature as lightweight, tiny, small, or minute. The developed system enables the validation of the lightness of the algorithms through measurements and statistical analysis.
In accordance with our generic benchmark model, and upon the identification of the system Goal and Input, the Activities are done according to the following procedure:

Implement hardware using VHDL under Quartus

Analyze the hardware profile using Quartus and ModelSim

Implement software using the C language under Visual Studio and its integrated Intel VTune Amplifier

Analyze the software profile using Intel VTune Amplifier

Derive and analyze the general algorithmic profile

Combine and analyze the results from all profiles using a statistical software tool
With the finalization of Activities, the following steps complete the elements of the framework:

Produce the Output key indicators

Calculate the combined indicators of the Outcomes

Build the overall Performance report
Tables 2, 3, and 4 present the derivation and implementation results of the general algorithmic, software, and hardware profiles, respectively. At the level of simple indicators, the Skipjack algorithm achieved the highest software throughput, 156.098 Mbps, while the highest hardware throughput, 480.00 Mbps, is achieved by the KTANTAN48 algorithm. The 3WAY algorithm attained the smallest hardware area, with 77 ALUTs.
Algorithm Name  AC  Mapped AC  KS(bits)  NR  BS(bits)

Skipjack  AQ  0.8  80  32  64
XTEA  AQ  0.8  96  64  64
3WAY  AQ  0.8  128  11  96
HIGHT  AQ  0.8  128  32  64
KATAN32  AQ  0.8  80  254  32
KATAN48  AQ  0.8  80  254  48
KATAN64  AQ  0.8  80  254  64
KTANTAN32  AQ  0.8  80  254  32
KTANTAN48  AQ  0.8  80  254  48
KTANTAN64  AQ  0.8  80  254  64
AES  AQ  0.8  192  12  128
Algorithm Name  BS  ET(sec)  TH(Mbps)  CPI  CMR 

Skipjack  64.000  0.410  156.098  1.327  0.164 
XTEA  64.000  2.570  24.903  0.729  0.033 
3WAY  96.000  2.320  41.379  1.107  0.036 
HIGHT  64.000  8.640  7.407  1.330  0.000 
KATAN32  32.000  27.460  1.165  0.634  0.006 
KATAN48  48.000  40.330  1.190  0.634  0.004 
KATAN64  64.000  52.830  1.211  0.627  0.003 
KTANTAN32  32.000  791.080  0.040  0.986  0.001 
KTANTAN48  48.000  803.320  0.060  0.975  0.001 
KTANTAN64  64.000  821.830  0.078  0.965  0.001 
AES  128.000  23.210  5.515  1.235  0.004 
Algorithm Name  ET(sec)  TH(Mbps)  PD(nsec)  ALUT  LR  PC(mW)

Skipjack  7.49  8.55  11.90  554.00  142.00  331.01 
XTEA  6.18  10.35  11.10  2799.00  135.00  332.77 
3WAY  0.80  120.00  3.82  77.00  167.00  331.01 
HIGHT  1.85  34.59  127.78  2036.00  72.00  332.66 
KATAN32  1.47  21.77  43.57  2145.00  540.00  328.63 
KATAN48  1.89  25.40  79.94  3982.00  556.00  329.95 
KATAN64  2.38  26.89  78.31  4315.00  572.00  330.94 
KTANTAN32  0.09  372.09  40.03  1947.00  112.00  328.58 
KTANTAN48  0.10  480.00  72.78  3662.00  128.00  329.81 
KTANTAN64  0.15  438.36  79.30  4075.00  144.00  331.00 
AES  1.46  6.83  5.35  3998.00  750.00  654.87 
The Lightness, Complexity, Security Strength, Hardware Lightness, Software Lightness, and Speed indicators are shown in Table 5. An algorithm that attained a larger indicator value is lighter, less complex, of higher security strength, or faster than an algorithm with a lower indicator value. Figures 3, 4, 5, and 6 present a per-CMI comparison. 3WAY got the best lightness index of 2.52, while KATAN64 attained the lowest with an index of 0.76. The best CI index is 1.19, attained by the 3WAY algorithm. The best SSI, HLI, and SI indices are 1.22, 5.24, and 5.08, all attained by the 3WAY algorithm. The HIGHT algorithm achieved the highest SLI index of 3.40.
Algorithm Name  LI  CI  SSI  HLI  SLI  SI 

Skipjack  1.57  1.11  1.16  1.42  2.46  2.81 
XTEA  1.22  1.05  0.93  1.18  1.70  1.48 
3WAY  2.52  1.19  1.22  5.24  1.75  5.08 
HIGHT  1.93  1.01  1.03  2.02  3.40  1.43 
KATAN32  0.89  0.98  0.82  1.12  0.69  0.59 
KATAN48  0.79  0.90  0.74  0.90  0.70  0.47 
KATAN64  0.76  0.85  0.69  0.86  0.71  0.44 
KTANTAN32  1.04  0.89  0.82  3.88  0.18  0.48 
KTANTAN48  0.95  0.83  0.74  3.14  0.20  0.47 
KTANTAN64  0.89  0.78  0.69  2.76  0.21  0.44 
AES  1.00  1.00  1.00  1.00  1.00  1.00 
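Ranking and a simple reference-relative classification follow directly from an indicator column. The Python sketch below uses the LI values of Table 5; treating LI > 1 as "lighter than AES" is an illustrative split assumed here, not a classification rule defined by the framework, and the function names are hypothetical.

```python
# LI values from Table 5 (AES is the reference, LI = 1.00).
LI = {
    "Skipjack": 1.57, "XTEA": 1.22, "3WAY": 2.52, "HIGHT": 1.93,
    "KATAN32": 0.89, "KATAN48": 0.79, "KATAN64": 0.76,
    "KTANTAN32": 1.04, "KTANTAN48": 0.95, "KTANTAN64": 0.89,
    "AES": 1.00,
}


def rank(indicator):
    """Order algorithms from highest to lowest indicator value.

    sorted() is stable, so ties (KATAN32 and KTANTAN64) keep insertion order.
    """
    return sorted(indicator, key=indicator.get, reverse=True)


def classify(indicator, reference="AES"):
    """Label each algorithm relative to the reference's indicator value."""
    ref = indicator[reference]
    return {name: ("lighter" if value > ref else "not lighter")
            for name, value in indicator.items() if name != reference}


print(rank(LI))
print(classify(LI))
```

Any other column of Table 5 can be passed in the same way, for example to rank by SI when speed rather than overall lightness is the quality of interest.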
6.2 General Evaluation
The current investigation can be evaluated at the levels of framework development, application, and contextualization. The developed framework is unique in combining algorithmic, hardware, and software characteristics to provide unified performance evaluation criteria and useful performance indicators. The framework addresses the need for methods that can analyze performance and deal with the hybrid nature of modern computing systems. The investigation proposes the creation of unified indexes/indicators, such as those of the LIS, that capture specific qualities in terms of a wide range of heterogeneous key performance indicators. The LI serves as a master CMI, while an indicator such as the SI is developed with a focus on speed. Indeed, the framework is scalable and upgradeable without changing the statistical computation or the structure of the measurement. For instance, an additional AP can be incorporated into the calculations of the LIS to include the performance characteristics of GPUs.
At the application level, the developed framework can be used to examine qualities of importance and interest to developers or users. For example, the presented LIS enables the indexing and classification of cryptographic algorithms. Here, the qualities of importance are lightness, speed, complexity, and security strength. The LIS can be applied to examine the same qualities in a similar area of application, such as signal processing. In signal processing, the SLI and HLI can be reused. However, the LI and CI need to be redefined within the context of signal processing, and the SSI is not applicable. Signal processors are usually embedded within real-time applications and characterized by their numerical accuracy, acceleration schemes, and ability to perform fast computations and data access ingle2016digital (). A Reliability and Accuracy Indicator (RAI) CMI can be created to combine the desired characteristics and capture, index, and aid in the classification of signal processing algorithms.
The contextualization of the framework in relation to the targeted area of application produces a rich and comprehensive set of reference KIs. KIs such as ET, TH, CPI, CMR, PD, LUT, LR, and PC are independent of the context of application and are thus highly reusable. Other KIs, such as KS, NR, and BS, are specific to cryptographic algorithms. KIs can measure quantities or describe qualities. Qualities can be specified using rubrics and mapped onto quantities that can be substituted into the indicator equations. KIs should be carefully identified and developed by experts in the targeted area of application, and their applicability should be evident. For the proposed RAI in signal processing, the contextualization can include KIs such as Memory Access Time as a quantitative measurement, while the availability of Specialized Addressing Modes can be captured as a qualitative indicator.
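The mapping of qualities onto quantities via rubrics can be sketched as follows. This is a minimal illustration, not the paper's procedure: the rubric levels and their scores, the measured values, and the helper names `rubric_score` and `normalize` are all hypothetical. It also shows one detail any such mapping needs: KIs where lower is better (such as an access time) must be inverted before they are combined with KIs where higher is better.

```python
# Hypothetical rubric for a qualitative KI: availability of specialized
# addressing modes. Levels and scores are illustrative, not from the paper.
ADDRESSING_MODES_RUBRIC = {"none": 1.0, "partial": 2.0, "full": 3.0}


def rubric_score(rubric, level):
    """Map a qualitative rubric level onto a number usable in indicator equations."""
    if level not in rubric:
        raise ValueError(f"unknown rubric level: {level!r}")
    return rubric[level]


def normalize(value, reference, higher_is_better=True):
    """Express a KI as a ratio to the reference, oriented so that > 1 is better."""
    if value <= 0 or reference <= 0:
        raise ValueError("KI values must be positive")
    return value / reference if higher_is_better else reference / value


# Quantitative KI: memory access time in ns (lower is better); made-up values.
mat = normalize(40.0, 80.0, higher_is_better=False)

# Qualitative KI: the candidate offers "full" support, the reference "partial".
modes = normalize(
    rubric_score(ADDRESSING_MODES_RUBRIC, "full"),
    rubric_score(ADDRESSING_MODES_RUBRIC, "partial"),
)
print(mat, modes)  # 2.0 1.5
```

Once both KIs are on the same oriented, reference-relative scale, they can be fed into whatever combination rule the RAI adopts.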
This paper presents a generic model that specifies the elements of benchmarks and/or analysis frameworks. The benchmark model is used to present the LIS; nevertheless, it can be used to describe any benchmark. The developed model is generic, simple, and concise, and it aids the clear description of benchmarks using a unified pattern.
The developed statistical framework is applied through a case study that targets a class of cryptographic algorithms. The selected algorithms are presented in the literature as tiny, small, minute, and light. The case study provided unified classification criteria that include the LI. The proposed framework successfully classified the targeted algorithms according to their hardware, software, and algorithmic characteristics. The addressed algorithms are widely implemented, analyzed, and evaluated in the literature. However, the work presented in the literature is limited to single-algorithm evaluations or single-system implementations (either hardware or software), yet it still makes holistic claims of lightness based on a limited number of indicators. The reference implementation used is the AES cipher with a key size of bits. The use of other key sizes, such as bits, does not change the algorithm classifications or invalidate the analysis, as the reference is applied consistently to all the targeted algorithms. However, different indicator values are expected.
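Why changing the reference can alter indicator values without altering the classification can be checked with a small sketch. It assumes, as an illustration rather than as the framework's stated rule, that each algorithm's score is the geometric mean of its KI ratios against the reference. Under that assumption, switching to a different reference (for example, the same cipher with another key size) multiplies every algorithm's score by the same constant, so the ranking is preserved; an arithmetic mean of ratios offers no such guarantee. The algorithms `A` and `B`, their KIs, and both references are hypothetical.

```python
from math import prod


def geo_mean(xs):
    xs = list(xs)
    return prod(xs) ** (1.0 / len(xs))


def arith_mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def scores(raw, reference, mean):
    """Score each algorithm as the mean of its KI ratios against the reference."""
    return {name: mean(v / r for v, r in zip(kis, reference)) for name, kis in raw.items()}


def ranking(score_map):
    return sorted(score_map, key=score_map.get, reverse=True)


# Hypothetical raw KIs: two higher-is-better measurements per algorithm.
raw = {"A": (8.0, 1.0), "B": (2.0, 2.0)}
ref_1 = (1.0, 1.0)  # hypothetical reference implementation
ref_2 = (8.0, 0.5)  # hypothetical alternative reference

for mean in (geo_mean, arith_mean):
    s1 = scores(raw, ref_1, mean)
    s2 = scores(raw, ref_2, mean)
    print(mean.__name__, ranking(s1), ranking(s2))
# geo_mean ['A', 'B'] ['A', 'B']
# arith_mean ['A', 'B'] ['B', 'A']
```

With the geometric mean, every score under `ref_2` is exactly half its value under `ref_1`, which matches the expectation of different indicator values but an unchanged classification.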
The developed framework is intended to capture hardware and software properties. The current investigation is limited to non-partitioned implementations, where the whole computation is delegated to a coprocessor. Partitioned implementations can be evaluated with the proposed framework by analyzing the KIs of the hardware and software subsystems. The obtained KI measurements capture the subsystem characteristics. In addition, carefully defined CMIs can rank, rate, and classify different partitioning strategies per optimization target, such as area, speed, and power consumption.
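Ranking partitioning strategies per optimization target with a CMI can be sketched as below. This is a hypothetical construction, not one defined in the paper: it assumes each strategy's KIs are normalized ratios (above 1 is better) and that a target is expressed as a set of weights in a weighted geometric mean. The strategy names, KI values, and weights are illustrative only.

```python
from math import isclose, prod

# Hypothetical per-target weights over normalized KIs (each set sums to 1).
TARGET_WEIGHTS = {
    "area": {"area": 0.6, "speed": 0.2, "power": 0.2},
    "speed": {"area": 0.2, "speed": 0.6, "power": 0.2},
}


def weighted_geo_mean(kis, weights):
    """Weighted geometric mean: prod(x_i ** w_i) with weights summing to 1."""
    if not isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return prod(kis[name] ** w for name, w in weights.items())


def rank_strategies(strategies, target):
    """Order partitioning strategies from best to worst for one target."""
    weights = TARGET_WEIGHTS[target]
    scores = {name: weighted_geo_mean(kis, weights) for name, kis in strategies.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical normalized KIs (ratio to a reference, > 1 is better).
strategies = {
    "all_software": {"area": 2.0, "speed": 0.5, "power": 1.0},
    "split": {"area": 1.0, "speed": 1.0, "power": 1.0},
    "all_hardware": {"area": 0.5, "speed": 2.0, "power": 1.0},
}
for target in TARGET_WEIGHTS:
    print(target, rank_strategies(strategies, target))
# area ['all_software', 'split', 'all_hardware']
# speed ['all_hardware', 'split', 'all_software']
```

The same measurements yield opposite orderings depending on the target, which is the kind of per-target classification the text anticipates for partitioned implementations.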
A similar investigation is presented by Spacey et al. Spacey2009159 (). The authors combined several hardware and software characteristics within a heuristic to produce a single execution time estimate. The best time estimate is identified based on the heterogeneous performance and architectural characteristics of different hardware and software partitions. Our proposed framework could enrich such investigations by providing versatile estimates through CMIs such as the HLI, SLI, CI, SI, and/or other customized indicators.
7 Conclusion
Modern high-performance computers are hybrids of multicore processors, GPUs, FPGAs, etc. In this paper, a statistical framework is developed to provide thorough analysis and evaluation of algorithms and their implementations on different processing systems. A generic benchmark model is created to present the framework clearly. The framework categorizes processing subsystems into profiles, each of which can be contextualized according to a specific application. The statistical framework is adopted to analyze and evaluate a set of cryptographic algorithms that are claimed to be small, tiny, and efficient. The proposed framework enabled the creation of several key indicators, including the lightness, complexity, security strength, and speed indicators. The two main targeted high-performance computing devices are multicore processors for software implementations and high-end FPGAs for hardware implementations. The developed lightness indicator ranks the 3-Way algorithm as the lightest among all, with an LI of 2.52. Hight achieves the second-best lightness, with a score of 1.93. The lowest score, 0.79, was attained by KATAN64. The case study validates the statistical framework and leads to a successful classification of the targeted algorithms. The obtained results are based on a combination of three profiles: the algorithmic, software, and hardware profiles. The presented framework is scalable, upgradeable, and portable across applications. Future work includes incorporating additional processing systems, targeting other areas of application, embedding the framework within a co-design IDE, and targeting partitioned implementations.
References
 (1) I. Damaj, Parallel algorithms development for programmable devices with application from cryptography, International Journal of Parallel Programming 35 (6) (2007) 529–572.
 (2) S. Kasbah, I. Damaj, R. Haraty, Multigrid solvers in reconfigurable hardware, Computational and Applied Mathematics, Elsevier 213 (1) (2008) 79–94.
 (3) A. Rolstadås (Ed.), International Federation for Information Processing, Benchmarking: theory and practice, Vol. 148, Chapman & Hall, 1995.
 (4) I. Damaj, A. A. Kranov, The sustainability of technical education: A measurement framework, in: The American Society of Engineering Education Mid-Atlantic Conference, ASEE, New York, US, 2013, pp. 47–59.
 (5) J. Dongarra, P. Luszczek, Encyclopedia of Parallel Computing, Springer US, 2011, Ch. LINPACK Benchmark, pp. 1033–1036.
 (6) R. P. Weicker, Dhrystone: a synthetic systems programming benchmark, Communications of the ACM 27 (10) (1984) 1013–1030.
 (7) J. L. Henning, SPEC CPU2000: Measuring CPU performance in the new millennium, Computer 33 (7) (2000) 28–35.
 (8) EEMBC, Website (2017). URL http://www.eembc.org/
 (9) A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, A statistical test suite for random and pseudorandom number generators for cryptographic applications, Tech. rep., DTIC Document (2001).
 (10) Y. Yue, C. Lin, Z. Tan, NPCryptBench: a cryptographic benchmark suite for network processors, ACM SIGARCH Computer Architecture News 34 (1) (2006) 49–56.
 (11) S. Jain-Mendon, R. Sass, A hardware-software co-design approach for implementing sparse matrix vector multiplication on FPGAs, Microprocessors and Microsystems 38 (8) (2014) 873–888. doi:10.1016/j.micpro.2014.02.004.
 (12) R. Lumbiarres-Lopez, M. Lopez-Garcia, E. Canto-Navarro, A new countermeasure against side-channel attacks based on hardware-software co-design, Microprocessors and Microsystems 45, Part B (2016) 324–338. doi:10.1016/j.micpro.2016.06.009.
 (13) A. J. Park, K. S. Perumalla, Efficient heterogeneous execution on large multicore and accelerator platforms: Case study using a block tridiagonal solver, Journal of Parallel and Distributed Computing 73 (12) (2013) 1578–1591, Heterogeneity in Parallel and Distributed Computing. doi:10.1016/j.jpdc.2013.07.012.
 (14) W. Shi, J. Wu, S.-K. Lam, T. Srikanthan, Algorithmic aspects for bi-objective multiple-choice hardware/software partitioning, 2014, pp. 7–12. doi:10.1109/PAAP.2014.42.
 (15) M. Jemai, B. Ouni, Hardware software partitioning of control data flow graph on system on programmable chip, Microprocessors and Microsystems 39 (4–5) (2015) 259–270. doi:10.1016/j.micpro.2015.04.006.
 (16) S. Spacey, W. Luk, P. Kelly, D. Kuhn, Rapid design space visualisation through hardware/software partitioning, 2009, pp. 159–164. doi:10.1109/SPL.2009.4914913.
 (17) P. Bullen, D. Mitrinovic, M. Vasic, Handbook of means and their inequalities (2003).
 (18) M. Denscombe, The good research guide: for small-scale social research projects, McGraw-Hill Education (UK), 2014.
 (19) J. L. Hennessy, D. A. Patterson, Computer architecture: a quantitative approach, Elsevier, 2011.
 (20) J. Daemen, V. Rijmen, The design of Rijndael: AES - the advanced encryption standard, Springer, 2002.
 (21) T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, et al., Introduction to algorithms, Vol. 2, MIT press Cambridge, 2001.
 (22) N. D. Jorstad, T. Landgrave, Cryptographic algorithm metrics, in: 20th National Information Systems Security Conference, 1997.
 (23) A. J. Menezes, P. C. Van Oorschot, S. A. Vanstone, Handbook of applied cryptography, CRC press, 2010.
 (24) A. K. Lenstra, Key length. Contribution to the Handbook of Information Security.
 (25) K. Maletsky, RSA vs ECC comparison for embedded systems, White Paper, Atmel (2015) 5.
 (26) BlueKrypt, Website (2017). URL https://www.keylength.com
 (27) D. A. Patterson, J. L. Hennessy, Computer Organization and Design: the Hardware/Software Interface, 5th Edition, Morgan Kaufmann, 2013.
Author biographies
Issam Damaj, PhD ME BE, is an Associate Professor of Computer Engineering at the American University of Kuwait. His research interests include hardware/software co-design, embedded system design, automation, the Internet of Things, and engineering education. He is a Senior Member of the IEEE and a Professional Member of the ASEE. He maintains an academic website at www.idamaj.net.
Safaa Kasbah, MSc BSc, received a Master's degree in Computer Science in 2006 from the Lebanese American University. She received a Bachelor's degree in Computer Science with a minor in Physics from the American University of Beirut in 2004. Her main research interests are iterative methods, hardware/software co-design, reconfigurable computing, quantum computing, and information and knowledge management.