Search-based Tier Assignment for Optimising Offline Availability in Multi-tier Web Applications
Submitted 2017-04-07, published 2017-12-06. Year 2018, volume 2, issue 2, article number 3.
Perspective: art. Area: Distributed systems programming.
CCS concepts: Software and its engineering, Client-server architectures; Software and its engineering, Distributed programming languages; Software and its engineering, Source code generation.
Application developers target the web platform more and more: not only is it supported on a plethora of devices, it is also an ideal stage for rich, interactive and collaborative applications. Such rich internet applications have a thick client that is more than just a static user interface. The client integrates results from external services, e.g., a Twitter feed, and updates its UI reactively. Moreover, not every click results in a roundtrip to the server and data can be stored locally, thus making (parts of) the application offline available as well.
Multi-tier or tierless programming effectively deals with these problems by enabling the development of multi-tier applications as a single artefact. Typical for this approach is that the same language can be used to develop all three tiers. This enables a desktop-like style of development, because the accidental complexity resulting from the communication between the different tiers is hidden from the programmer. Multi-tier programming was introduced over a decade ago and has since proven successful in solving several problems related to the web domain.
Until now, multi-tier programming approaches have focussed on eliminating the impedance mismatch between the different tiers in a multi-tier application. To do so, every approach supports tier-specific annotations in the source code that dictate which parts of the code have to run on each of the tiers. For this paper, our hypothesis is that this explicit division of the source code into tiers limits the configurability and flexibility of the resulting application. More specifically, we argue that different crosscutting concerns such as performance, offline availability, security, etc. require different configurations of the same application. For example, depending on the application, a developer might not know upfront which parts of the code need to run on the client, which parts on the server and which parts need to be replicated in order to maximise offline availability. General-purpose data replication, specifically, is a concern that has not yet been considered in the context of multi-tier programming.
In this paper we present an approach where applications are developed as a single artefact consisting of different slices. Each slice represents a part of the application that is written in a tier-agnostic way. These slices serve as building blocks for configuring an application. Such a configuration is specified in a placement specification in which each slice is assigned to a specific tier. This assignment can be done manually or by means of a recommender system that automatically tries to calculate the configuration that optimises a given metric for a certain crosscutting concern. Assigning slices to tiers in order to optimise this metric is a task that is often neither deterministic nor exact. For this reason our recommender system employs an evolutionary algorithm to calculate the assignment.
After assigning each slice to a tier the recommender system reports back to the programmer about the computed assignment of slices to specific tiers. On top of that, our recommender system will also suggest slice refinements that can potentially positively impact the result. The application developer can then split up or merge particular slices and insert or change specific annotations according to the suggested refinements. After this step, a next iteration of the recommender system can be started.
We argue that the combination of slices and the recommender system enables developers to focus first on the essential complexity of their applications. Crosscutting concerns can then be tackled later by changing the assignment of slices to specific tiers or by adding annotations in the code based on the suggestions of the recommender system.
The division of an application into tier-agnostic slices and the recommender system are the main contributions of this work. To introduce slice-based web programming, we first discuss our previous work on multi-tier programming. We then consider possible approaches for the recommender system and give a concrete example of one that tries to optimise the offline availability of a web application. We evaluate the approach by comparing different versions of the same application, starting from a minimal number of slices and incorporating the feedback from the system in each subsequent version. We measure how this affects the offline availability, together with the granularity of the slices. We also implement the same application in a library-based and a language-based multi-tier approach, and compare the runtime and code characteristics of the three versions.
2 Slice-based web development
2.1 Motivating example
Throughout this paper we use a small example of a rich internet application, called Uni-corn. The idea is that PhD students can use this app to manage their “uni-versity” career, by keeping track of tasks, meetings, teaching schedule, etc. Through a built-in calendar view and charts page the student can monitor his or her progress. This application has four main services: viewing the calendar, viewing progress via the charts, viewing/adding/updating meetings and tasks. To keep it simple we focus on a single server - single client application, meaning that there is no synchronisation between different clients.
The main focus of our motivating example is that the application has a high offline availability, meaning that the four services can still be used when no connection is at hand. For instance, entered data should not be lost when the connection drops, and ideally the newly added input should already be rendered. This way the user is not hampered by the lost connection and can keep using the application in the same way as before.
2.2 Slice-based web development with pre-determined tiers
In order for a developer to transform tierless code into tierful code, different code blocks have to be annotated with either @client or @server annotations. The tier-splitting process will then automatically detect inter-tier communication and transform the source code into two program slices, one for the server and one for the client. Annotations for remote communication, data sharing and failure handling are supported as well; an overview can be found in Appendix A.
Based on the AST of the multi-tier code, we build the corresponding Program Dependence Graph (PDG) representation of the program. It is a directed graph in which the nodes correspond to statements in the program and the edges represent control and data dependencies between the nodes. PDGs are often used for program slicing [Weiser:1981], a technique used for program comprehension and debugging. Informally, a program slice is an executable subset of a program that has a direct or indirect effect on the values computed at a certain location, called the slicing criterion.
We extended the PDG and the program slicing algorithm with the notion of tiers, depicted by Distributed Component (i.e. client and server) nodes in the graph. Dependencies between nodes that belong to different distributed components are remote dependencies: e.g., a remote data or call reference. Figure 1 gives an example of a PDG for the code given in Listing 1. Our adapted slicing algorithm takes a distributed component as a slicing criterion and returns the subset of the code that is needed to execute that component, without including nodes that belong to another distributed component. Please note that not every node must belong to a distributed component. We call the nodes that do not shared nodes, because they end up in the program slices of those distributed components that use them.
After the program slicing step we end up with two program slices: one for the client tier and one for the server tier. The program transformation step now transforms these selected nodes to their distributed variants by injecting distributed communication into the multi-tier code. This means that local calls must become remote procedure calls and that local functions that are called by another tier must become remote functions. The program transformation step calls on the information of the PDG to decide whether such transformations must take place.
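The decision of which calls must become remote can be illustrated with a minimal sketch. It is not the tier splitter itself: the `Call` record, the `tier_of` mapping and the function names are hypothetical, and the sketch assumes that the analysis has already determined, for every function, the tier it belongs to (shared functions are simply absent from the mapping).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str  # name of the function that performs the call
    callee: str  # name of the function that is called


def classify_calls(tier_of, calls):
    """Label every call as 'local' or 'remote'.

    tier_of maps a function name to 'client' or 'server'. Functions that are
    missing from tier_of are treated as shared nodes: they are copied into
    every slice that uses them, so calls involving them stay local.
    """
    labels = {}
    for call in calls:
        caller_tier = tier_of.get(call.caller)
        callee_tier = tier_of.get(call.callee)
        shared = caller_tier is None or callee_tier is None
        same_tier = caller_tier == callee_tier
        labels[call] = "local" if shared or same_tier else "remote"
    return labels


# Hypothetical example: a client function calling a server function,
# another client function and a shared helper.
tier_of = {"display": "client", "render": "client", "get_tasks": "server"}
calls = [
    Call("display", "get_tasks"),
    Call("display", "render"),
    Call("display", "format_date"),
]
labels = classify_calls(tier_of, calls)
```

In this sketch only the call from `display` to `get_tasks` is labelled remote, so only that call would be rewritten into a remote procedure call.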
In the work presented in this section code blocks are coupled to tier-specific placement annotations (@client and @server). However, in this paper we advocate that placement should be decoupled from the source code into a separate configuration step. Our approach facilitates this decoupling because it relies on static analysis to determine at compile time what calls are between different tiers. Nothing in the source code distinguishes local from remote calls and only at compile time our transformation tool decides what calls should be transformed into their asynchronous counterparts. Furthermore, we show that this decoupling of placement annotations and tier-agnostic slices improves the flexibility of the resulting application. Different crosscutting concerns such as performance, offline availability, security, etc. that require different configurations can be easily accommodated. On top of that we present a recommender system that tries to calculate the optimal configuration and suggests refinements based on the results of that configuration.
2.3 Untangling tier placement from web slices
A number of cross-cutting concerns cannot be expressed by means of a simple annotation in the source code. For example, when a developer wants to improve the offline availability of his or her application, this can have an impact on various locations in the code base. A developer might choose to replicate a collection, but this will also affect all the code that uses that collection. Meanwhile, the offline availability is improved but the overall security of the application might be lowered. There is no way to visualise or measure how well a certain configuration of the application handles these various concerns. Generally speaking, handling cross-cutting concerns leads to code scattering and tangling between the tiers. Maintaining the application becomes harder as well: because of code duplication, the code that handles the concern must be maintained on both the client and the server tier.
For this paper, instead of taking @client and @server slices as the building units of the web application, we decouple the slices from their tier specification. A slice is a unit of code that has a unique name and can be mapped to a tier in an external specification. Two new location-based annotations are thus added: @slice to define a new slice and @config to map slices to a certain tier.
Listing 4 shows how we could implement the part of the Uni-corn application that keeps track of the tasks of the student. We omitted the bodies of the functions for brevity. We define four slices and give each of them a name: data, sorting, statistics and browser. Each slice has its own block statement that consists of the code belonging to that slice. In this example the data slice is responsible for declaring the replicated data, and the sorting slice defines a function to sort the tasks based on each task’s priority. The statistics slice has a function that calculates how many tasks are finished, in progress or have yet to be started. The browser slice has a function to display the tasks and the task statistics, after making sure the tasks are sorted.
Because our approach is based on a program analysis a dependence graph can be constructed that enables a visualisation of the dependencies between different slices, see Figure 2. It is actually a PDG as introduced in Section 2.2 where the Distributed Component nodes (thus the slices) are collapsed. Whenever a call or data dependency crosses the boundaries of a slice, it becomes a remote dependency regardless of the location of slices. As can be seen on the graph, the browser slice has only outgoing dependencies. The data slice on the other hand is a supportive slice: it has only incoming dependencies.
The configuration of the slices (Listing 4) gives a fixed placement for the browser and data slices. The browser slice must be placed on the client tier because it updates the user interface in the browser. The data slice must be placed on the server tier because we want our server to manage a centralised set of tasks. The slices that have no fixed placement in the configuration (in this case statistics and sorting) are not bound to a particular tier and their placement can vary depending on the output of our recommender system.
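The split between fixed and unplaced slices can be pictured with a small sketch. It does not reproduce the @slice and @config syntax; it only mirrors the information they carry, using plain Python data and a hypothetical helper `unplaced_slices`.

```python
# Slice names taken from the task-tracking example; the dictionary below is
# an assumed stand-in for the external placement specification.
SLICES = ["data", "sorting", "statistics", "browser"]
FIXED = {"browser": "client", "data": "server"}


def unplaced_slices(slices, fixed):
    """Return the slices whose tier is left to the recommender system."""
    unknown = set(fixed) - set(slices)
    if unknown:
        raise ValueError(f"configuration mentions unknown slices: {sorted(unknown)}")
    return [name for name in slices if name not in fixed]


todo = unplaced_slices(SLICES, FIXED)
```

Here `todo` contains the sorting and statistics slices, which are the ones the recommender system still has to assign.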
Placing the slices on a certain tier is the task of the recommender system. This component is responsible for taking a crosscutting concern into account when assigning slices to tiers. For example, one system can try to maximise the offline availability of an application while another can try to optimise the memory usage. The result of the recommender system is an allocation of every slice to a tier. However, how well our recommender system can decide on a placement heavily depends on the number of slices and their granularity. For this reason the recommender system has an additional step in which it reports potential refinements of the source code back to the developer, given a placement for every slice. By following these suggestions, the developer can potentially improve how well a cross-cutting concern is handled within the application. In the following sections we discuss the tier assignment and slice refinement process.
2.4 Search-based tier assignment to user-defined slices
The recommender system must be applied to allocate the unplaced slices to a tier, given one or more crosscutting concerns. As an input it takes the configuration of fixed slices and the multi-tier program and as output it gives a mapping where every slice is assigned to a tier. The system can resort to several techniques to calculate the most ideal placement: a dependence-driven analysis, code instrumentation, etc.
Because the number of possible configurations increases exponentially with each additional slice, it quickly becomes infeasible to consider every potential configuration. Therefore we opted for a genetic search algorithm in order to reach, or at least approximate, the optimal placement to satisfy a crosscutting concern. Genetic search algorithms have their origin in natural selection: the search starts from a population of individuals, called a generation, from which the fittest candidates are selected to form a new generation by performing mutations and combinations (crossovers). This process repeats until a certain end condition is satisfied or a certain number of generations have been computed.
The only required inputs for a genetic search algorithm are a starting population, a fitness function and a method for generating offspring (mutations) from an existing population. The only part of that input that is specific to a certain cross-cutting concern is the fitness function. The following paragraphs give a brief overview of each component of the search algorithm and give an example fitness function for optimising the offline availability of an application. This fitness function can be altered to focus on other concerns. Currently, the search we apply is single-objective: during the search we optimise only one objective. We leave the case where multiple objectives are optimised at the same time, with trade-offs between possibly conflicting objectives, as future work.
The individuals of the population are mappings of unplaced slices to a tier, in this case client, server or both. Every iteration of the genetic search produces a new generation of a fixed size, originating from the previous generation. The initial seed for the search algorithm is a mapping where every unplaced slice is assigned to a random tier. Because every unplaced slice has three options, we have a search space of order $3^n$, with n the number of unplaced slices. In the selection process we discuss how we reduce this search space.
Because a single-objective search is conducted, we have a single fitness value that the algorithm tries to maximise. For this example, our aim is to get a high offline availability, thus our fitness function calculates a real number in the [0,1] range that represents how offline available the current configuration is. Before we define the fitness function, we need to be able to define how offline a certain slice, placed on a given tier, is.
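As a rough illustration of what an individual looks like and how fast the search space grows, consider the following sketch. The names `random_individual` and `search_space_size` are hypothetical and not part of the tool.

```python
import random

TIERS = ("client", "server", "both")


def random_individual(unplaced, rng):
    """Build one individual: every unplaced slice gets a random tier."""
    return {name: rng.choice(TIERS) for name in unplaced}


def search_space_size(n_unplaced):
    """Number of distinct individuals for n unplaced slices (three tiers each)."""
    return len(TIERS) ** n_unplaced


rng = random.Random(42)  # fixed seed so that the sketch is reproducible
individual = random_individual(["sorting", "statistics"], rng)
```

With two unplaced slices there are only 9 candidate mappings, but with 13 unplaced slices there are already 1,594,323, which is why an exhaustive enumeration does not scale.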
For this we calculate the number of local calls performed in the slice, which are calls to functions defined in the slice itself or in slices placed on the same tier. This is divided by the total number of calls performed in that slice, which also includes calls to functions defined in slices placed on other tiers. We do not claim that this is an ideal function for measuring the offline availability of a certain slice. However, our experimental results have shown that this function gives a good enough approximation.
Our fitness function considers every slice and calculates the weighted mean of the scores of all slices, where the weight of a slice is the number of calls it performs. This way slices that perform a large number of calls have more influence on the final offline availability score of the configuration.
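The score described above can be sketched as follows. This is an approximation under stated assumptions, not the implementation used in the evaluation: calls are given as hypothetical (caller slice, callee slice) pairs, a slice without calls gets a score of 1 with weight 0, and a call counts as local when the callee is present on every tier on which the caller is placed (the text does not spell out how the tier "both" is treated).

```python
# Tiers on which a slice is present for each placement value.
TIER_SETS = {
    "client": {"client"},
    "server": {"server"},
    "both": {"client", "server"},
}


def is_local(caller, callee, placement):
    """Assumed rule: local if the callee lives wherever the caller lives."""
    if caller == callee:
        return True
    return TIER_SETS[placement[caller]] <= TIER_SETS[placement[callee]]


def slice_offline_score(name, calls, placement):
    """Fraction of the calls performed in one slice that are local."""
    own = [call for call in calls if call[0] == name]
    if not own:
        return 1.0  # assumption: a slice without calls never needs the network
    local = sum(1 for caller, callee in own if is_local(caller, callee, placement))
    return local / len(own)


def fitness(calls, placement):
    """Mean of the slice scores, weighted by the number of calls per slice."""
    if not calls:
        return 1.0
    weighted = 0.0
    for name in placement:
        weight = sum(1 for call in calls if call[0] == name)
        weighted += weight * slice_offline_score(name, calls, placement)
    return weighted / len(calls)


# Hypothetical call structure for the four slices of the task example.
calls = [
    ("browser", "sorting"),
    ("browser", "statistics"),
    ("browser", "data"),
    ("sorting", "data"),
    ("statistics", "data"),
]
placement = {
    "browser": "client",
    "data": "server",
    "sorting": "client",
    "statistics": "client",
}
score = fitness(calls, placement)
```

For this hypothetical input two of the five calls are local, so the score is 0.4. Because the weights are the call counts, the weighted mean reduces to the overall fraction of local calls.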
Note that the calculation of this fitness function is partly based on results from our static analysis tool. Whether a call is local or remote is part of the PDG generated by the analysis. Our current implementation uses the JIPDA abstract interpretation framework [nicolay2015detecting]. Designing such a fitness function thus requires expert knowledge of our framework, and we envision that a number of these functions are developed upfront, one for each cross-cutting concern that can be considered.
The creation of a new generation is achieved by performing selections and mutations on the current generation. A mutation consists of randomly picking a slice from the placement mapping and assigning it to a random tier. A selection is a tournament selection, which randomly picks a number of mappings from the current generation and returns the one with the best fitness value.
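A minimal sketch of these two operators is given below. The function names, the tournament size and the toy fitness function are assumptions made for illustration.

```python
import random

TIERS = ("client", "server", "both")


def mutate(individual, rng):
    """Copy the mapping and reassign one randomly picked slice to a random tier."""
    child = dict(individual)
    name = rng.choice(sorted(child))
    child[name] = rng.choice(TIERS)
    return child


def tournament_select(population, fitness_of, rng, size=3):
    """Pick `size` random mappings and return the fittest of them."""
    contenders = rng.sample(population, min(size, len(population)))
    return max(contenders, key=fitness_of)


def toy_fitness(individual):
    """Toy stand-in for a real fitness function: share of slices on the client."""
    return sum(1 for tier in individual.values() if tier == "client") / len(individual)


rng = random.Random(7)
parent = {"sorting": "server", "statistics": "server"}
child = mutate(parent, rng)
population = [
    {"sorting": "client", "statistics": "client"},
    {"sorting": "server", "statistics": "client"},
    {"sorting": "server", "statistics": "server"},
]
winner = tournament_select(population, toy_fitness, rng, size=3)
```

Note that a mutation may pick the tier the slice already has, in which case the child equals its parent. With a tournament size equal to the population size, the selection degenerates to picking the overall fittest mapping.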
However, not every generated mapping is a valid one. Consider, for example, a slice that defines a function that is called by a (fixed) server slice and a (fixed) client slice. The slice itself performs no calls to functions defined in another slice. If we place this slice on the client tier, we get a higher offline availability, because the client tier now gains local calls instead of making remote calls. However, the server slice that also calls the function now has to perform a remote call to a slice on the client tier. At runtime, this means that the server instance must pick one or more connected clients to call this function and wait for the result. Because it is not clear whether this call should be a broadcast to all clients or a call to one specific client, this is not a valid placement of the slices. If the server performs the call on all clients, what should happen with possible return values from the clients? And in the case of a specific client, which client is selected? Please note that the broadcast and reply annotations from Table 6 can be used for server-to-client calls. If the call in the server slice is annotated with one of these annotations, the program is valid.
These invalid placements could have a higher fitness score than valid placements, which is why the selection process is responsible for not selecting these configurations as a base for the next generation.
We run the genetic search until either a mapping has been found that produces a fitness equal to 1 or a certain threshold of generations has been reached. In the latter case, the fittest individual of the latest generation is returned. Because a genetic search is non-deterministic, different runs of the algorithm can yield different placements of the slices. Considering more generations generally gives the search more opportunity to find a placement that addresses the concern well.
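The validity rule from the example can be sketched as a simple filter. The representation is hypothetical: calls are (caller slice, callee slice) pairs, and `annotated` holds the calls that carry a broadcast or reply annotation. The sketch also assumes that a caller placed on both tiers counts as running on the server.

```python
def is_valid(placement, calls, annotated=frozenset()):
    """Reject placements that need an unannotated server-to-client call."""
    for caller, callee in calls:
        runs_on_server = placement[caller] in ("server", "both")
        client_only = placement[callee] == "client"
        if runs_on_server and client_only and (caller, callee) not in annotated:
            return False
    return True


# A utility slice that is called from a fixed server slice and a fixed
# client slice, as in the example in the text (names are illustrative).
calls = [("api", "util"), ("ui", "util")]
fixed = {"api": "server", "ui": "client"}

on_client = is_valid({**fixed, "util": "client"}, calls)
on_both = is_valid({**fixed, "util": "both"}, calls)
with_annotation = is_valid(
    {**fixed, "util": "client"}, calls, annotated={("api", "util")}
)
```

In this sketch placing the utility slice on the client only is rejected, while placing it on both tiers, or annotating the server-side call, is accepted. A selection step could use such a check to skip invalid individuals regardless of their fitness.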
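Putting the pieces together, the overall loop could look like the sketch below. It is a simplified reading of the description, not the actual implementation: it uses mutation only, the parameter values are arbitrary, and it keeps the fittest individual in every new generation (elitism), which is an assumption that the text does not state. A fitness function that returns a negative value for invalid placements would keep them from winning tournaments.

```python
import random

TIERS = ("client", "server", "both")


def genetic_search(unplaced, fitness_of, rng, population_size=20,
                   max_generations=50, tournament_size=3):
    """Return the fittest placement found and the best fitness per generation."""
    if not unplaced:
        return {}, [fitness_of({})]
    population = [{name: rng.choice(TIERS) for name in unplaced}
                  for _ in range(population_size)]
    history = []
    for _ in range(max_generations):
        best = max(population, key=fitness_of)
        history.append(fitness_of(best))
        if history[-1] >= 1.0:
            break  # end condition: a placement with fitness 1 was found
        offspring = [best]  # elitism (assumption): never lose the current best
        while len(offspring) < population_size:
            contenders = rng.sample(population, min(tournament_size, population_size))
            parent = max(contenders, key=fitness_of)
            child = dict(parent)
            child[rng.choice(unplaced)] = rng.choice(TIERS)  # mutation
            offspring.append(child)
        population = offspring
    return max(population, key=fitness_of), history


def toy_fitness(individual):
    """Toy objective: share of slices that are available on the client."""
    if not individual:
        return 1.0
    available = sum(1 for tier in individual.values() if tier in ("client", "both"))
    return available / len(individual)


slices = ["sorting", "statistics", "charts", "calendar"]
best, history = genetic_search(slices, toy_fitness, random.Random(1))
```

Because of the elitism assumption, the best fitness per generation recorded in `history` never decreases. Different seeds can still return different placements with the same fitness.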
2.5 Recommending slice refinements after assignment
After a placement configuration is calculated by the genetic search algorithm an additional step is taken. The result of this step is advice that the programmer can consider to integrate such that the resulting web application can score better on the concern that is under consideration.
Our recommendation system currently only supports suggestions for improving offline availability. However, in the future other recommendation functions can be considered.
For offline availability, in the recommendation phase, we consider all slices and every data or function declaration made within those slices. Data declarations can either be tied to a tier or be replicated. Generally speaking, replicating data has a positive effect on the offline availability of a web application as it makes sure that the client has a local copy of the data. Therefore, we look for data declarations that are not indicated to be replicated (by means of an annotation), but that are often used in functions that are called remotely by another tier. For example, we declare data on the server side and define a number of accessors for retrieving and manipulating that data. If these functions are heavily called by the client tier, it might be a better solution to replicate that data such that the client has to communicate less with the server.
To detect this, we use the dependencies present in the PDG. We look at every declaration and follow its data dependencies to statements in the same tier. If these statements reside in a function that has more remote calls than local calls, we give the programmer the advice to replicate the data.
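A possible shape of this check is sketched below. The inputs are hypothetical summaries of what the PDG provides, and the sketch reads "more remote calls than local calls" as counting the calls made to the function, which is an interpretation of the text.

```python
def replication_advice(declarations, used_by, call_counts):
    """Return the data declarations that could be worth replicating.

    declarations: declaration name -> True if it is already replicated
    used_by:      declaration name -> functions on the same tier that use it
    call_counts:  function name -> (local calls, remote calls) to that function
    """
    advice = []
    for name, replicated in declarations.items():
        if replicated:
            continue
        for function in used_by.get(name, []):
            local, remote = call_counts.get(function, (0, 0))
            if remote > local:
                advice.append(name)
                break  # one qualifying function is enough
    return advice


# Illustrative input: `tasks` is accessed through a function that is mostly
# called from the other tier, `settings` is only used locally.
declarations = {"tasks": False, "meetings": True, "settings": False}
used_by = {
    "tasks": ["get_tasks", "add_task"],
    "meetings": ["get_meetings"],
    "settings": ["get_theme"],
}
call_counts = {
    "get_tasks": (1, 4),
    "add_task": (2, 1),
    "get_meetings": (0, 5),
    "get_theme": (3, 0),
}
advice = replication_advice(declarations, used_by, call_counts)
```

Only `tasks` is reported here: `meetings` is already replicated and `settings` is never reached through a remotely called function.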
A function declaration might be a good candidate to move to another tier, or to a new slice, if it is called more by another tier than by the one it is defined in. We define "more" here as follows: the percentage difference between the number of remote calls and the number of local calls to that function is bigger than a certain threshold. To calculate the number of (remote) calls we also use the dependencies in the PDG. If we observe functions that fulfil this requirement, we advise the programmer to move them to a new slice. This way, the recommender system can figure out on which tier(s) the function definition should end up.
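The threshold test can be sketched as follows. The exact definition of the percentage difference is not given in the text, so the sketch assumes it is the surplus of remote calls relative to all calls to the function; the threshold value is likewise arbitrary.

```python
def move_advice(call_counts, threshold=0.2):
    """Return the functions that could be moved to a slice of their own.

    call_counts maps a function name to (local calls, remote calls) to it.
    Assumed metric: (remote - local) / (remote + local) > threshold.
    """
    advice = []
    for function, (local, remote) in call_counts.items():
        total = local + remote
        if total == 0:
            continue  # a function that is never called gives no evidence
        if (remote - local) / total > threshold:
            advice.append(function)
    return advice


# Illustrative call counts, not measurements.
call_counts = {
    "sort_tasks": (1, 4),
    "count_finished": (3, 3),
    "render": (5, 0),
    "unused": (0, 0),
}
advice = move_advice(call_counts)
```

With these numbers only `sort_tasks` is reported, because three of its five calls are surplus remote calls (0.6), which exceeds the assumed threshold of 0.2.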
In certain cases it would be beneficial to detect these declarations and move them automatically to a new slice. However, if these declarations are defined in a tier with a fixed placement, it might be done on purpose by the programmer. If the recommender system automatically makes declarations replicated by adding an annotation on top of it, we might break some validation or security checks that are performed on the functions that manipulate the data. Because the client can never be trusted, the programmer’s intention could be that the data must remain on the server side and is only accessible through these functions. For this reason we only give advice and don’t perform transformations on the fixed slices.
Our evaluation aims to show that a genetic search algorithm can recommend a good placement strategy for slice-based web development and that our recommender system can further improve upon this by recommending refinements to existing slices. We currently only support offline availability as a cross-cutting concern, but our approach is applicable for other concerns as well.
In order to validate our approach we seek to answer the following research questions:
RQ1: Can the recommender system identify a location assignment of the slices and recommendations that lead to a web application with a high(er) offline availability?
RQ2: Does the slice granularity have an impact on the effectiveness of the recommender system?
RQ3: How does our approach compare to other multi-tier approaches with respect to the offline availability of the resulting web applications?
Because offline availability is one of the defining characteristics of today’s web applications, we use this concern as our main goal for the recommender system. This means that in each of the following evaluations we measure and try to maximise the offline availability of the Uni-corn application.
The code used for the evaluation is publicly available at https://github.com/lphilips/multitier-approaches (accessed November 2017).
3.1 Recommender system for offline availability
To evaluate the strength of our recommender system for enabling offline availability we have implemented several versions of the Uni-corn application. We start from a first version that contains only two slices with a fixed placement: one on the client tier, the other on the server tier. In each step we follow all of the refinements suggested by our recommender system. This means we either follow the advice to add data sharing annotations to certain declarations or move certain functions to new slices. We keep doing this until we end up with a version that receives a fitness value of 100 %. (In our example, a fitness of 100 % is achievable with the correct placement strategy. However, this is not necessarily true for all applications.) In total we have six versions. Figure 3 shows how these versions evolve and Table 1 lists several properties of the code, such as the number of annotations categorised according to Table 6.
Every version of the application has at least two slices that are tied to a certain tier and thus have a fixed placement. In the final version these slices contain the smallest subset of the source code that is bound to either the server or client tier. The recommender system is in charge of coming up with a placement of the remaining unplaced slices.
| ver. | # slices | # fixed slices | @placement | @communication | @sharing |
For this evaluation we run the genetic search 100 times for each version and analyse the outcomes. Table 2 summarises for each version the minimum, maximum and median number of slices placed on every tier (denoted as minC, maxC and medC respectively for the client tier; minS, maxS and medS for the server tier; minB, maxB and medB for both tiers). It also lists the fitness value for every version, which is the same for the 100 runs because the search always returns the individual with the highest fitness; the configuration of the slices, however, can differ between runs. We also specify the number of data declarations that could be replicated (data adv.) according to the recommender system and the number of functions that could be moved to a new slice (slice adv.).
| ver. | gen. | medC | minC | maxC | medS | minS | maxS | medB | minB | maxB | offline % | data adv. | slice adv. |
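The per-tier statistics reported in Table 2 can be derived from the placements returned by repeated runs, as in the following sketch. The function name and the three example runs are made up for illustration; they are not results from the evaluation.

```python
from statistics import median

TIERS = ("client", "server", "both")


def summarise_runs(runs):
    """For every tier, compute min, max and median slice counts over all runs."""
    if not runs:
        raise ValueError("at least one run is required")
    summary = {}
    for tier in TIERS:
        counts = [sum(1 for placed in run.values() if placed == tier) for run in runs]
        summary[tier] = {
            "min": min(counts),
            "max": max(counts),
            "median": median(counts),
        }
    return summary


# Three made-up placements of two unplaced slices.
runs = [
    {"sorting": "client", "statistics": "both"},
    {"sorting": "client", "statistics": "client"},
    {"sorting": "both", "statistics": "server"},
]
summary = summarise_runs(runs)
```

For the client tier the counts over the three runs are 1, 2 and 0, which yields a minimum of 0, a maximum of 2 and a median of 1.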
From Table 2 we can see that following up on the advice has a positive impact on the offline fitness score of the application. Version 6 has a fitness value of 100 %, because more slices have been added and data has been replicated. The versions that have a higher fitness score also place more slices on both client and server tier. Table 1 indicates that adding annotations for replicated data reduces the number of communication annotations that must be used.
From this evaluation we can answer the first research question (RQ1). We have successfully used the recommender system to evolve the initial web application to one where the offline availability according to the recommender system’s report is higher than that of the initial application. Integrating the recommender system’s advice leads to an increase of the fitness score. The distribution plan computed by the recommender system that focuses on offline availability clearly prefers to put slices on the client tier or duplicates them between client and server. The different runs of the genetic search on the same version lead to distribution plans that might differ in the number of slices that are put on the client tier or on both.
3.2 Slice granularity
Because the recommender system decides on a distribution plan based on the unplaced slices, its effectiveness depends on the programmer to make a program with enough slices. In the ideal case, the fixed slices only contain code that is restricted to one of the tiers and all remaining code is put inside unplaced slices. Table 2 shows that the version of the Uni-corn application with only two slices has a low offline availability score and receives a considerable amount of feedback.
For this reason we implemented an extension of our tier splitter that automatically follows up on the advice. For example, if the recommender system indicates that a declaration should be shared, we adapt the original AST of the program such that it now has an @replicated annotation. If the recommendation indicates that a function should be moved to a new slice, we remove the function from its original slice, add a new slice with no fixed tier to the program and add the function as the only expression in that slice.
We ran the same experiments as before for the six versions of the Uni-corn app, but in between runs we automatically incorporated the suggestions made by the recommender system. The results are given in Table 3.
| ver. | max. for OFFLINE % found after | original nr. of slices | nr. slices in end configuration |
The results make clear that integrating the advice automatically produces the desired rise in offline availability immediately. In the first version, where only two slices were initially present, 13 new slices were introduced, each defining a function that was advised to be moved to a new slice (as can be seen in Table 3). Following the advice concerning replicated data does not result in a new slice, but merely adapts the original code.
We can conclude from this table that automatically integrating the advice into the code results in a higher offline availability. This answers the second research question: slice granularity has an impact on the effectiveness of the recommender system. More slices leave more room for the recommender system to optimise the distribution plan, while few and big slices may lead to inferior results.
However, the automatic integration adds a lot of new slices or annotations to the code, thus making it harder for the developer to map the transformed code back to the multi-tier input. As mentioned before, we do not perform these changes automatically but rather give them to the developer as feedback. The reason is that code that was originally in a fixed slice should not always be moved automatically, because the moved portions could contain code that cannot be shared with or moved to another tier.
3.3 Evaluation comparison of the three multi-tier approaches
For every implementation we count the total SLOC for server, client and UI code as well. For the Stip.js version we have code that belongs to slices that are not assigned to a tier, so we count that code as undecided. As can be seen, the three versions fall into the same category for the total number of SLOC. However, the Hop.js variation focuses more on the server part, while the two other implementations have a small server setup. Please note that for the Stip.js version this is not the only code for the server tier: parts of the undecided code could be server-side code as well. We also enumerated the number of annotations: these are the tier-switching operators ~and $ for Hop.js and any annotation from Table 6 for Stip.js. For the library-based version there are no annotations present, because all the tier information is hidden away in the usage of the framework’s libraries. For that reason we counted the number of calls to libraries for data sharing, reactive updates, and so on. The Hop.js version has significantly more annotations to escape from the server to client tier and vice versa.
| SLOC | SLOC client | SLOC server | SLOC UI | SLOC undecided | annotations / library calls |
We evaluate the three multi-tier variants of the Uni-corn app through several scenarios. Because the app offers four services to the user, we first evaluate these services individually in an online and an offline setting. This means that we view, add and update meetings and tasks, and view the schedule and charts. We measure the number of requests that are made, the number of failed requests and the kilobytes transferred. Then we check how many steps of the scenario succeeded, i.e., for how many steps the application produces the correct results. For example, adding a new meeting should first of all not produce an error and the entered data should not be lost, even in an offline setting. Moreover, the meeting should be added to the list of meetings and the user should be able to retrieve the details of the meeting to alter it later on. The results of evaluating the different services this way can be found in Appendix B. We did the same for three scenarios, given in Table 8 in Appendix B, that combine more of the services, e.g., adding a task and viewing the progress charts later on.
To measure these characteristics we used Chrome Developer Tools on the Google Chrome browser, version 56. To inspect network traffic for the Meteor version we used Meteor Devtools (https://github.com/bakery/meteor-devtools, accessed November 2017), because Meteor uses its own communication protocol that cannot be inspected via Chrome Developer Tools.
| number of requests | failed requests | kb transferred | number of steps |
As can be seen from Table 5, both Meteor and Stip.js perform the different steps correctly, even in an offline setting. This means that no errors were shown to the user, no data loss occurred, and the user was actually unaware of the lost connection. The reason is that Meteor has replicated Collections that buffer every change on the client side to send to the server when the connection is restored.
A significant difference between the approaches can be seen in the number of requests performed: the Stip.js version performs more requests in the first two scenarios. This is because the run-time library for remote communication and data sharing (https://github.com/lphilips/asyncCall-with-shared-data, accessed November 2017) works on a fine-grained level when changes to a replicated object are made: every property change on a replicated object is sent to the server. For example, two changes are communicated when adding an element to a replicated array: one for the index at which the object is added and one for the length that has changed. The library we use keeps track of the connection with the server through a heart-beat system, such that communication is automatically buffered when no connection is present and thus produces no failed requests. Another consequence of having replicated data available on the client is that fewer kilobytes need to be transferred.
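The fine-grained behaviour described above can be illustrated with a minimal sketch based on a standard JavaScript `Proxy`. This is not the implementation of the run-time library used by Stip.js: the names `makeReplicated` and `makeChannel` are hypothetical, the server is simulated by an in-memory array, and the heart-beat is reduced to a manually toggled `online` flag.

```javascript
// Hypothetical sketch: makeChannel and makeReplicated are illustrative names,
// not part of the Stip.js run-time library.
function makeChannel() {
  return {
    online: true,
    sent: [],   // messages delivered to the (simulated) server
    buffer: [], // messages held back while there is no connection
    send(message) {
      // Buffer instead of failing when the connection is lost.
      if (this.online) this.sent.push(message);
      else this.buffer.push(message);
    },
    reconnect() {
      // Flush the buffered changes once the connection is restored.
      this.online = true;
      this.sent.push(...this.buffer);
      this.buffer = [];
    },
  };
}

function makeReplicated(target, channel) {
  return new Proxy(target, {
    // Every property write is applied locally and reported individually.
    set(obj, prop, value) {
      obj[prop] = value;
      channel.send({ prop: String(prop), value });
      return true;
    },
  });
}

const channel = makeChannel();
const tasks = makeReplicated([], channel);

// Online: one push yields two messages, for index "0" and for "length".
tasks.push({ title: "write report" });

// Offline: the two messages of the second push are buffered, not failed.
channel.online = false;
tasks.push({ title: "review slides" });
channel.reconnect();
```

In this sketch a single `push` on the replicated array triggers the `set` trap twice, once for the new index and once for `length`, which mirrors why a property-level replication scheme produces more requests than a scheme that ships whole objects.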
In the Hop.js version the HTML content is rendered on the server side, which results in more data being transferred. At the time of writing, Hop.js has no support for data replication or reactive UI updates, so updating and retrieving the data in an offline setting do not work. Both the Meteor and Stip.js versions only have to communicate the changes made to the replicated objects, and the user interface is updated automatically.
This table shows that certain multi-tier approaches excel at optimising certain crosscutting concerns. However, focusing on other crosscutting concerns often forces the developer to extend the framework or language by hand. The Meteor framework scores highly on offline availability because its built-in Collections are replicated to every client. The Hop.js language has no support yet for data replication and offline availability, thus leading to more communication with the server, which fails when no connection is available.
Both the runtime and code characteristics of the three versions of the Uni-corn application give us an answer to the third research question. For this application the code characteristics are comparable, although Stip.js requires fewer lines of code and annotations. On the one hand, Stip.js is not tailored towards runtime optimisation, resulting in more web requests being performed. On the other hand, because the resulting application focuses on offline availability, less data is transferred with each request. The language-based multi-tier approach is not tailored towards offline availability, resulting in all data in the shared collections being transferred each time.
Threats to validity
For the motivating example, this evaluation shows that the recommender system can successfully help the developer to achieve full offline availability. However, we were also able to manually achieve full offline availability for the motivating example. Whether our recommender system is better at finding placement strategies than a manual approach, or whether it can help developers find an optimal strategy faster, can only be established through an extensive user study on much larger applications. Additionally, the evaluation currently only covers offline availability as a single-objective optimisation. Showing that the approach is also applicable to other single- and multi-objective optimisations would require additional exploration. Finally, the evaluation compares the approach with two other tierless programming platforms. A complete comparison that covers a wider range of platforms would strengthen our comparison with the related work.
4 Related Work
Current techniques that infer a distributed placement mostly focus on minimising the communication cost between the distributed tiers of the application. When implemented on top of a new language, the placement analysis is often guided by primitive operations: e.g., all gui_... operations must be allocated on the client tier. From there on the location analysis propagates throughout the program and assigns each operation/declaration to a set of possible tier locations. We give an overview of approaches that can be used for web applications, but most of them handle distributed applications in general.
4.1 Multi-tier Approaches
We classify multi-tier approaches into three main categories: language-based, library-based and transformation-based approaches. Whereas language-based approaches introduce a dedicated language, library-based and transformation-based approaches target a general-purpose language, thus facilitating the reuse of existing developer tools. We now discuss multi-tier approaches that support a location analysis to assign expressions to a tier.
The multi-tier language Opa [opa:2013] supports annotations to indicate where functions or data should be located. In addition, Opa’s slicer takes the type information and makes a call graph to decide which function/values end up on what tier. By default values appear on both tiers whenever possible, but are constrained by security concerns or the behaviour they execute. DOM-related operations end up on the client whereas database operations are always placed on the server tier. It is not clear how the slicer makes the decision and what the goal of the placement algorithm is. In contrast with Opa, our approach is extensible such that the programmer can try out different location strategies. These location decisions are deeply embedded in Opa’s slicer and the programmer must first test the slicer’s result before adding more specific annotations to influence the decisions made by the slicer.
Distributed Orc [distributedorc:2016] is a distributed extension of the Orc programming language. The language introduces location transparency not by abstracting away distributed concerns, but by explicitly making local and remote semantics uniform. As a consequence, asynchronous calls and failure handling are consistent throughout the whole program, even for local operations. Its location analysis tracks the locations of data and decides, whenever data is used, whether to use a copy of that data, to migrate the execution to another location, or to proceed in another manner. It is ongoing research at this point and the authors state that it remains an open question whether these optimisers yield a gain in communication costs. At the moment the programmer cannot give an initial or partial distribution specification and it is not clear which analysis would drive the distribution decisions.
The placement inferencer for a client-server calculus presented in [Neubauer:2005, placementinference:2008] is guided by the fixed placement of primitive operators. The location analysis allocates every operation to a set of locations and propagates these assignments through the program. Unlike many other techniques, Stip.js among them, this technique does not require the programmer to give an initial seed to the placement decision algorithm. However, this approach is not applicable to general-purpose languages. For example, primitive operators to update the DOM could easily be renamed, or the same effect could be achieved by means of an external library. It is difficult to infer this automatically, and an initial seed is thus necessary when targeting general-purpose languages.
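To make the idea of propagating placements from primitive operators concrete, the following is a strongly simplified sketch and not the calculus of the cited work. It assumes, purely for illustration, that an operation may only run on a tier where everything it calls directly can also run; the operator names `gui_show` and `db_query` and the toy program are hypothetical.

```javascript
// Fixed placement of primitive operators (assumed names, for illustration).
const PRIMITIVES = { gui_show: "client", db_query: "server" };

// Toy program: each operation lists the operations it calls directly.
const program = {
  renderTasks: ["gui_show", "formatTask"],
  formatTask: [],
  loadTasks: ["db_query"],
  refresh: ["loadTasks", "renderTasks"],
};

function inferPlacement(program, primitives) {
  // Every non-primitive operation starts with both tiers as candidates.
  const locations = {};
  for (const op of Object.keys(program)) {
    locations[op] = new Set(["client", "server"]);
  }
  const locationsOf = (name) =>
    name in primitives ? new Set([primitives[name]]) : locations[name];

  // Shrink candidate sets until nothing changes (a fixpoint is reached).
  let changed = true;
  while (changed) {
    changed = false;
    for (const [op, callees] of Object.entries(program)) {
      for (const callee of callees) {
        for (const tier of [...locations[op]]) {
          if (!locationsOf(callee).has(tier)) {
            locations[op].delete(tier);
            changed = true;
          }
        }
      }
    }
  }
  return locations;
}

const placement = inferPlacement(program, PRIMITIVES);
for (const [op, tiers] of Object.entries(placement)) {
  console.log(op, [...tiers]);
}
```

Under this simplifying assumption an empty candidate set, as obtained for `refresh`, flags an operation that cannot run entirely on one tier and would therefore need a remote call. The sketch also shows the weakness discussed above: the analysis depends entirely on recognising the primitive operators by name.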
4.2 Location Transparency
Location transparency abstracts away the place of execution of certain parts of a distributed program. Several languages and frameworks therefore hide the semantics of remote communication between these parts and support a uniform way of communication. Location transparency has been criticised [Waldo94anote], mainly because concerns such as latency, memory access, partial failures, concurrency, etc. should not be abstracted away from the programmer. The reason is that remote calls have a fundamentally different semantics to local calls, and thus fundamentally different failure handling code must be provided by the programmer as well.
Many distributed languages support mobility of distributed components, thus decoupling the components in space by communicating in an asynchronous way.
Actors are an example of such decoupled components [kim1995efficient]. Erlang [Armstrong:2007:PES] is an example of a programming language that supports location transparent actors. Every actor has a unique address and can be accessed through an actor reference. There is no differentiation between local or remote actor references, and all inter-actor communication is done via asynchronous message passing.
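The following minimal sketch illustrates what a location-transparent actor reference looks like from the sender's perspective. It is a single-process simulation and not Erlang's implementation: the class `ActorSystem`, its methods and the node names are hypothetical, and asynchrony is modelled by mailboxes that are drained explicitly.

```javascript
// Hypothetical single-process simulation of location-transparent actors.
class ActorSystem {
  constructor() {
    this.actors = new Map(); // address -> { node, behaviour, mailbox }
    this.nextId = 0;
  }

  // The node is recorded at spawn time but never exposed to senders.
  spawn(node, behaviour) {
    const address = `actor-${this.nextId++}`;
    this.actors.set(address, { node, behaviour, mailbox: [] });
    return address;
  }

  // Sending only enqueues the message: it returns before it is processed,
  // and looks the same whether the receiver is "local" or "remote".
  send(address, message) {
    this.actors.get(address).mailbox.push(message);
  }

  // Deliver queued messages until every mailbox is empty.
  runUntilIdle() {
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (const [address, actor] of this.actors) {
        while (actor.mailbox.length > 0) {
          const message = actor.mailbox.shift();
          actor.behaviour(message, {
            self: address,
            send: (to, reply) => this.send(to, reply),
          });
          progressed = true;
        }
      }
    }
  }
}

const system = new ActorSystem();
const log = [];
const logger = system.spawn("server-node", (msg) => log.push(msg.text));
const greeter = system.spawn("client-node", (msg, ctx) =>
  ctx.send(msg.replyTo, { text: `hello ${msg.name}` })
);

system.send(greeter, { name: "Ada", replyTo: logger });
system.runUntilIdle();
console.log(log);
```

The sender only ever holds an address, so moving an actor to another node would not require any change to the code that communicates with it.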
Modular design is a field of software engineering that is present in many programming languages and architectures. Remote-OSGi (or R-OSGi) [Rellermeyer2007] uses centralised module management as a starting point for distributed Java applications. Modules are the units of distribution that implement a service-oriented architecture. Local and remote service invocation is indistinguishable, and communication failures are handled as local module events.
Our approach offers location transparency, because the execution location of parts of the program is decoupled from the code itself. Just like other approaches, we support a uniform way of communication, opting for local communication. It is the burden of the transformation tool to transform local into remote communication. However, issues such as failure handling are not completely abstracted away from the programmer, but are added by means of annotations. These annotations work both for communication that stays local and for communication that is transformed into remote calls.
4.3 Automatic Partitioning
Code partitioning is a process that computes a mapping from code partitions to the nodes of a distributed system. This process can take several factors into account, such as the hardware characteristics of certain nodes, profiling information, etc.
Coign [Hunt:1999:CAD] is an automatic partitioning system for distributed applications that consist of components. The partitioning depends on a graph representation of the remote communication based on scenario-based profiling. The profiling step gathers information while the user runs the application, but an automated testing tool can be used as well. All communication that crosses the boundaries of the components is instrumented, together with the amount of transferred data. A graph slicing algorithm is then applied to split the application and come up with an optimal placement of components. While the programmer can tweak the results, overriding the distribution decisions does not seem possible. Coign resembles the approach taken by Stip.js, but we use a static analysis instead of scenario-based profiling. The recommender system of Stip.js could use the profiling and instrumentation step of Coign, possibly leading to a more accurate report.
Secure program partitioning [Zdancewic:2002:SPP] is a splitting process that protects the confidentiality of data in a distributed application. The splitter takes an annotated program and a set of trust declarations and produces programs that satisfy all security policies. This automated partitioning enables the programmer to write a program independent of its distributed setting, but with strong guarantees about information flow. The partitioning has been extended with data and code replication [Zheng:2003:URP] to increase the flexibility of the splitter while keeping the same guarantees. The emphasis of the work is on security and respecting user-defined security contracts. In contrast to our approach, expressing contracts for other crosscutting concerns is not supported.
5 Discussion and Future Work
Our approach employs a non-deterministic search algorithm in order to prune the search space of potential placement configurations. However, for applications with a small number of slices a deterministic approach could be used. In our example, the non-deterministic strategy was able to successfully reach an ideal placement with a fitness of 1. However, additional experiments are required to further strengthen our claim that a genetic search algorithm is a good fit for finding placement strategies. Additional experiments also need to be conducted to validate that this approach does not get stuck in local optima when exploring the search space. We have also not yet experimented with different versions of the fitness function.
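For a small number of slices, the deterministic alternative mentioned above can be sketched as an exhaustive enumeration of all placements. The sketch below is an illustration under stated assumptions and not our implementation: the slice names, the call graph and in particular the fitness function (the fraction of calls originating on the client whose callee is also on the client) are hypothetical stand-ins.

```javascript
// Hypothetical input: slice names, fixed tiers and calls are illustrative.
const TIERS = ["client", "server"];
const fixed = { ui: "client", db: "server" }; // slices with a given tier
const freeSlices = ["tasks", "charts", "mailer"]; // slices to be placed
const calls = [
  ["ui", "tasks"],
  ["ui", "charts"],
  ["charts", "tasks"],
  ["mailer", "db"],
];

// Assumed fitness: share of client-originating calls that stay on the client,
// i.e. that would still succeed without a connection to the server.
function fitness(placement) {
  const fromClient = calls.filter(([caller]) => placement[caller] === "client");
  if (fromClient.length === 0) return 1;
  const local = fromClient.filter(([, callee]) => placement[callee] === "client");
  return local.length / fromClient.length;
}

// Enumerate all 2^n assignments of the free slices and keep the best one.
function bestPlacement() {
  let best = null;
  for (let mask = 0; mask < 2 ** freeSlices.length; mask++) {
    const placement = { ...fixed };
    freeSlices.forEach((slice, i) => {
      placement[slice] = TIERS[(mask >> i) & 1];
    });
    const score = fitness(placement);
    if (best === null || score > best.score) best = { placement, score };
  }
  return best;
}

const result = bestPlacement();
console.log(result.score, result.placement);
```

The enumeration grows exponentially with the number of free slices, which is why a search heuristic such as a genetic algorithm becomes attractive for larger applications; a different cross-cutting concern would only require swapping out `fitness`.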
We have currently only explored offline availability as a cross-cutting concern. However, we believe that our framework can easily be extended to other cross-cutting concerns such as memory usage, security, communication cost, etc. This can be done by adding the appropriate fitness function and extending the recommender to incorporate advice that is specific to that cross-cutting concern.
Our fitness function for offline availability currently employs the results of our static analysis in order to evaluate different configurations. However, other fitness functions could also take a runtime analysis or other information into account.
In most tierless programming languages or frameworks today communication between different tiers is marked with an explicit boundary between source code fragments. This means that developers are required to think upfront about the placement of the different components of their application. Moreover, several non-functional, cross-cutting concerns such as offline availability and security are impacted by this placement strategy. This means that changing the placement of certain code fragments in order to improve on some of these concerns over time becomes impractical. All of these concerns have to be taken into account in the initial design of the application. This leads to applications that are hard to maintain.
The main contribution of this paper is slice-based web development as a tier-agnostic development strategy. The placement of different slices onto the different tiers can be done separately. We show that such a configuration can be used to tackle several cross-cutting concerns without the need to significantly change the original source code. Moreover, we show that a recommender system can be built to automatically suggest a placement strategy to optimise for certain concerns. Our recommender suggests a specific placement of the different slices onto the different tiers and gives advice for slice refinements. Such advice could be to add annotations in the code, move certain parts of the code to a new slice, etc.
Additionally, we provide a preliminary evaluation of our approach by applying it to our motivating example (the Uni-corn app) for one specific cross-cutting concern, namely offline availability. We start from an implementation with the minimal number of slices, one for the client and one for the server, and gradually incorporate the feedback from our recommender system. The results of our evaluation show that our recommender system can successfully help developers optimise for offline availability. We also assessed the slice granularity and its impact on the results.
To evaluate our approach against the state of the art in tierless programming, the Uni-corn application is also implemented in a library-based and a language-based approach. We scrutinised the runtime and code characteristics of the three implementations and how they handle the offline availability concern. We conclude that slices in combination with a recommender system allow the programmer to focus on the application logic and take away the burden of distributing the code by hand. The effectiveness of the recommender system is conditional on the slice granularity, and we showed that applying automatic transformations to fix this can be beneficial.
Appendix A Tier-specific annotations present in previous work
| block level | call or function level | declaration level | tier or call level |
Appendix B Scenario-based evaluation
| number of requests | failed requests | kb transferred | number of steps |
|Tasks: view, add, update|
|Tasks offline: view, add, update|
|Meetings: view, add, update|
|Meetings offline: view, add, update|
|Schedule offline: view|
|Charts offline: view|
| Scenario 1 | Scenario 2 | Scenario 3 |
|---|---|---|
| View all tasks | View all meetings | Go offline |
| Add a new task | Add a new meeting | View all tasks |
| Update the newly added task | Update the newly added meeting | Add a new task |
| Go offline | Go offline | View all meetings |
| Add a new task | Add a new meeting | Add a new meeting |
| Update the newly added task | Update the newly added meeting | View the charts section |
| View the charts section | View the calendar section | Go online |
| Go online | Go online | View the charts section |
| View all tasks | View all meetings | |