Profile-based optimal matchings in the Student/Project Allocation problem
A preliminary version of this paper appeared in the proceedings of IWOCA 2014: the 25th International Workshop on Combinatorial Algorithms.
In the Student/Project Allocation problem (spa) we seek to assign students to individual or group projects offered by lecturers. Students provide a list of projects they find acceptable in order of preference. Each student can be assigned to at most one project and there are constraints on the maximum number of students that can be assigned to each project and lecturer. We seek matchings of students to projects that are optimal with respect to profile, which is a vector whose $r$th component indicates how many students have their $r$th-choice project. We present an efficient algorithm for finding a greedy maximum matching in the spa context, that is, a maximum matching whose profile is lexicographically maximum. We then show how to adapt this algorithm to find a generous maximum matching, that is, a maximum matching whose reverse profile is lexicographically minimum. Our algorithms involve finding optimal flows in networks. We demonstrate how this approach can allow for additional constraints, such as lecturer lower quotas, to be handled flexibly. Finally we present results obtained from an empirical evaluation of the algorithms.
Keywords: Greedy maximum matching; Generous maximum matching; Matching profile; Augmenting path
In most academic programmes students are usually required to take up individual or group projects offered by lecturers. Students may be required to rank a subset of the projects they find acceptable in order of preference. Each project is offered by a unique lecturer who may also be allowed to rank the projects she offers or the students who are interested in taking her projects in order of preference. Each student can be assigned to at most one project and there are usually constraints on the maximum number of students that can be assigned to each project and lecturer. The problem then is to assign students to projects in a manner that satisfies these capacity constraints while taking into account the preferences of the students and lecturers involved. This problem has been described in the literature as the Student-Project Allocation problem (spa) [4, 21, 5, 16]. Variants of spa also exist in which lower quotas are assigned to projects and/or lecturers. These lower quotas indicate the minimum number of students to be assigned to each project and lecturer.
Although described in an academic context, applications of spa need not be limited to assigning students to projects but may extend to other scenarios, such as the assignment of employees to posts in a company where available posts are offered by various departments. Applications of spa in an academic context can be found at the University of Glasgow , the University of York [7, 18, 27], the University of Southampton [6, 10] and the Geneva School of Business Administration . As previously stated, it is widely accepted that matching problems (like spa) are best solved by centralised matching schemes where agents submit their preferences and a central authority computes an optimal matching that satisfies all the specified criteria . Moreover the potentially large number of students and projects involved in these schemes motivates the need to discover efficient algorithms for finding optimal matchings.
1.1 Two-sided preferences and stability
In spa, students are always required to provide preference lists over projects. However, variants of the problem may be defined depending on the presence and nature of lecturer preference lists. Some variants of spa require both students and lecturers to provide preference lists. These variants include: (i) the Student/Project Allocation problem with lecturer preferences over Students (spa-s)  which requires each lecturer to rank the students who find at least one of her offered projects acceptable, in order of preference, (ii) the Student/Project Allocation problem with lecturer preferences over Projects (spa-p) [21, 16] which involves lecturers ranking the projects they offer in order of preference and (iii) the Student/Project Allocation problem with lecturer preferences over Student-Project pairs (spa-(s,p)) [4, 5] where lecturers rank student-project pairs in order of preference. These variants of spa have been studied in the context of the well-known stability solution criterion for matching problems . The general stability objective is to produce a matching $M$ in which no student-project pair that is not currently matched in $M$ can simultaneously improve by being paired together (thus in the process potentially abandoning their partners in $M$). A full description of the results relating to these spa variants can be found in .
1.2 One-sided preferences and profile-based optimality
In many practical spa applications it is considered appropriate to allow only students to submit preferences over projects. When preferences are specified by only one set of agents in a two-sided matching problem, the notion of stability becomes irrelevant. This motivates the need to adopt alternative solution criteria when lecturer preferences are not allowed. In this subsection we describe some of these solution criteria and briefly present results relating to them. These criteria consider the size of the matchings produced as well as the satisfaction of the students involved.
When preference lists of lecturers are absent, the spa problem becomes a two-sided matching problem with one-sided preferences. We assume students’ preference lists can contain ties in these spa variants. Various optimality criteria for such problems have been studied in the literature . Some of these criteria depend on the profile or the cost of a matching. In the spa context, the profile of a matching is a vector whose $r$th component indicates the number of students obtaining their $r$th-choice project in the matching. The cost of a matching (w.r.t. the students) is the sum of the ranks of the assigned projects in the students’ preference lists (that is, the sum of $r \cdot x_r$ taken over all components of the profile, where $x_r$ is the $r$th component value).
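As a small illustration of these two quantities, the following sketch computes the profile and the cost of a matching. The instance, the student and project names, and the helper names `profile` and `cost` are assumptions made up for this example; they do not come from the paper.

```python
from typing import Dict, List


def profile(matching: Dict[str, str], ranks: Dict[str, Dict[str, int]], R: int) -> List[int]:
    """Return the profile: entry r-1 counts students matched to their r-th choice."""
    prof = [0] * R
    for student, project in matching.items():
        prof[ranks[student][project] - 1] += 1
    return prof


def cost(prof: List[int]) -> int:
    """Sum of r * x_r over all components x_r of the profile."""
    return sum(r * x for r, x in enumerate(prof, start=1))


# Hypothetical instance: ranks[s][p] is the rank of project p in s's list.
ranks = {
    "s1": {"p1": 1, "p2": 2},
    "s2": {"p1": 1, "p3": 2},
    "s3": {"p2": 1, "p3": 2, "p1": 3},
}
matching = {"s1": "p1", "s2": "p3", "s3": "p2"}

example_profile = profile(matching, ranks, R=3)
example_cost = cost(example_profile)
```

Here two students obtain their first choice and one obtains her second choice, so the cost counts one unit for each first choice and two units for the second choice.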
A minimum cost maximum matching is a maximum cardinality matching with minimum cost. A rank-maximal matching is a matching that has lexicographically maximum profile [15, 13]. That is, the maximum number of students are assigned to their first-choice project and, subject to this, the maximum number of students are assigned to their second-choice project, and so on. However a rank-maximal matching need not be a maximum matching in the given instance (see, e.g., [20, p.43]). Since it is usually important to match as many students as possible, we may first optimise the size of the matching before considering student satisfaction. Thus we define a greedy maximum matching [14, 22, 11] to be a maximum matching that has lexicographically maximum profile. The intuition behind both rank-maximal and greedy maximum matchings is to maximise the number of students matched with higher ranked projects. This may lead to some students being matched to projects that are relatively low on their preference lists. An alternative approach is to find a generous maximum matching, which is a maximum matching in which the minimum number of students are matched to their $R$th-choice project (where $R$ is the maximum length of any student’s preference list) and, subject to this, the minimum number of students are matched to their $(R-1)$th-choice project, and so on. Greedy and generous maximum matchings have been used to assign students to projects in the School of Computing Science, and students to elective courses in the School of Medicine, both at the University of Glasgow, since 2007. Figure 1 shows a sample spa instance together with a greedy maximum matching and a generous maximum matching.
A special case of spa, where each project is offered by a unique lecturer with an infinite upper quota and zero lower quota, can be modelled as the Capacitated House Allocation problem with Ties (chat). This is a variant of the well-studied House Allocation problem (ha) [12, 30] which involves the allocation of a set of indivisible goods (which we call houses) to a set of applicants. In chat, each applicant is required to rank a subset of the houses in order of preference with the houses having no preference over applicants. The applicants play the role of students and the houses play the role of projects and lecturers. As in the case of spa, we seek to find a many-to-one matching comprising applicant-house pairs. Efficient algorithms for finding profile-based optimal matchings in chat have been studied in the literature [11, 14, 25, 22]. The most efficient of these is the algorithm for finding rank-maximal, greedy maximum and generous maximum matchings in chat problems due to Huang et al.  where is the maximum rank of any applicant in the matching, is the sum of all the preference list lengths and is the total number of applicants and houses. These models however fail to address the issue of load balancing among lecturers. In order to keep the assignment of students fair each lecturer will typically have a minimum (lower quota) and maximum (capacity/upper quota) number of students they are expected to supervise. These numbers may vary for different lecturers according to other administrative and academic commitments.
The chat algorithms mentioned above are based on modelling the problem in terms of a bipartite graph with the aim of finding a matching in the graph which satisfies the stated criteria. However a more flexible approach would be to model the problem as a network with the aim of finding a flow that can be converted to a matching which satisfies the stated criteria. spa has also been investigated in the network flow context [2, 29] where a minimum cost maximum flow algorithm is used to find a minimum cost maximum matching and other profile-based optimal matchings. The model presented in  allows for lower quotas on lecturers and projects as well as alternative lecturers to supervise each project. By an appropriate assignment of edge weights in the network it is shown that a minimum cost maximum flow algorithm (due to Orlin ) can find rank maximal, generous maximum and greedy maximum matchings in a spa instance. This takes time in the worst case, where and are the number of vertices and edges in the network respectively. In the spa context this takes time where is the numbers of students and is the sum of all the students’ preference list lengths. However this approach involves assigning exponentially large edge weights (see, e.g., [20, p.405]), which may be computationally infeasible for larger problem instances due to floating point inaccuracies in dealing with such high numbers. For example given a large spa instance involving say, students each ranking projects in order of preference, edge weights could potentially be of the order (and arithmetic involving such weights could easily require more than the - significant figures available in a -bit double-precision floating representation). Since the flow algorithms involve comparing these edge weights, floating point precision errors could easily cause them to fail in practice. 
Moreover using the standard assumption that arithmetic on numbers of magnitude takes constant time, arithmetic on edge weights of magnitude would add an additional factor of onto the running time of Orlin’s algorithm.
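The precision issue can be illustrated with a short experiment. The sketch below assumes, purely for illustration, an instance with 1000 students and preference lists of length 10, and an edge weight of the form `n ** (R - 1)`; both the figures and the weight formula are assumptions and are not taken from the models cited above.

```python
n = 1000  # assumed number of students (illustrative only)
R = 10    # assumed preference list length (illustrative only)

# An exponentially large weight, held exactly as a Python integer.
weight = n ** (R - 1)

# Exact integer arithmetic can tell weight and weight + 1 apart.
distinct_as_int = (weight + 1) != weight

# A double has a 53-bit significand, so adding 1 is lost at this magnitude.
as_float = float(weight)
distinct_as_float = (as_float + 1.0) != as_float

# Number of bits needed to store the weight exactly.
bits_needed = weight.bit_length()
```

With these assumed values the weight needs more bits than a double-precision significand provides, so comparisons between nearby weights silently fail in floating point, while arbitrary-precision integers remain exact at the price of non-constant-time arithmetic.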
1.3 Other spa models and approaches
The variants of spa already discussed above have been motivated by both practical and theoretical interests. These variants are usually distinguished by the (i) feasibility and (ii) optimality criteria specific to them. In this section, we discuss some more spa models found in the literature as well as other approaches that have been used to solve these problems. The techniques employed include Integer Programming (IP) [6, 28, 24, 17], [24, 17], Constraint Programming (CP) [7, 27], and others [26, 10, 19].
In , an IP model for spa was presented with the aim of optimising the overall satisfaction of the students and the lecturers offering the projects (i.e., minimising the overall cost on both sides). In  an IP model was presented for spa problems involving individual and group projects. Various objective functions were also employed (often in a hierarchical manner). These include minimising the cost, balancing the work-load among lecturers, maximising the number of students assigned and maximising the number of first-choice assignments (w.r.t. student preferences). In  a more general IP model for spa which allows project lower quotas was also presented. However none of these models simultaneously consider profile-based optimality as well as upper and lower quota constraints.
1.4 Our contribution
In Section 2 we formally define the spa model. In Section 3 we present an time algorithm for finding a greedy maximum matching given a spa instance and prove its correctness. The algorithm takes lecturer upper quotas into consideration. In Section 4 we show how this algorithm can be modified in order to find a generous maximum matching. Section 5 introduces lecturer lower quotas to the spa model and shows how our algorithm can be modified to handle this variant. In Section 6 we present results from an empirical evaluation of the algorithms described. We conclude the paper in Section 7 by presenting some open problems.
2 Preliminary definitions
An instance $I$ of the spa problem consists of a set $S$ of students, a set $P$ of projects and a set $L$ of lecturers. Each student $s_i \in S$ ranks a set $A_i \subseteq P$ of projects that she considers acceptable in order of preference. This preference list of projects may contain ties. Each project $p_j \in P$ has an upper quota $c_j$ indicating the maximum number of students that can be assigned to it. Each lecturer $l_k \in L$ offers a set of projects $P_k \subseteq P$ and has an upper quota $d_k^+$ indicating the maximum number of students that can be assigned to $l_k$. Unless explicitly mentioned, we assume that all lecturer lower quotas are equal to $0$. The sets $\{P_k : l_k \in L\}$ partition $P$. If project $p_j \in P_k$, then we denote $l_k = l(p_j)$.
An assignment $M$ in $I$ is a subset of $S \times P$ such that:
Student-project pair $(s_i, p_j) \in M$ implies $p_j \in A_i$.
For each student $s_i \in S$, $|\{(s_i, p_j) \in M : p_j \in A_i\}| \le 1$.
If $(s_i, p_j) \in M$ we denote $M(s_i) = p_j$. For a project $p_j$, $M(p_j)$ is the set of students assigned to $p_j$ in $M$. Also if $(s_i, p_j) \in M$ and $p_j \in P_k$ we say student $s_i$ is assigned to project $p_j$ and to lecturer $l_k$ in $M$. We denote the set of students assigned to a lecturer $l_k$ as $M(l_k)$. A matching in this problem is an assignment $M$ that satisfies the capacity constraints of the projects and lecturers. That is, $|M(p_j)| \le c_j$ for all projects $p_j \in P$ and $|M(l_k)| \le d_k^+$ for all lecturers $l_k \in L$.
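Given a student $s_i$ and a project $p_j$ on $s_i$’s preference list, we define $rank(s_i, p_j)$ as $1$ plus the number of projects that $s_i$ prefers to $p_j$. Let $R$ be the maximum rank of a project in any student’s preference list. We define the profile $\rho(M)$ of a matching $M$ in the instance $I$ as an $R$-tuple $(x_1, x_2, \ldots, x_R)$ where for each $r$ ($1 \le r \le R$), $x_r$ is the number of students $s_i$ assigned in $M$ to a project $p_j$ such that $rank(s_i, p_j) = r$. Let $\alpha = (x_1, \ldots, x_R)$ and $\beta = (y_1, \ldots, y_R)$ be any two profiles. We define the empty profile $O_R = (o_1, \ldots, o_R)$ where $o_r = 0$ for all $r$ ($1 \le r \le R$). We also define the negative infinity profile $B_R^- = (b_1, \ldots, b_R)$ where $b_r = -\infty$ ($1 \le r \le R$) and the positive infinity profile $B_R^+ = (b_1, \ldots, b_R)$ where $b_r = \infty$ ($1 \le r \le R$). We define the sum of two profiles $\alpha$ and $\beta$ as $\alpha + \beta = (x_1 + y_1, \ldots, x_R + y_R)$. Given any $q$ ($1 \le q \le R$), we define $\alpha + q = (x_1, \ldots, x_{q-1}, x_q + 1, x_{q+1}, \ldots, x_R)$. We define $\alpha - q$ in a similar way.
We define the total order $\succ_L$ on profiles as follows. We say $\alpha$ left dominates $\beta$, denoted by $\alpha \succ_L \beta$, if there exists some $r$ ($1 \le r \le R$) such that $x_{r'} = y_{r'}$ for $1 \le r' < r$ and $x_r > y_r$. We define weak left domination as follows. We say $\alpha \succeq_L \beta$ if $\alpha = \beta$ or $\alpha \succ_L \beta$. We may also define an alternative total order $\prec_R$ on profiles as follows. We say $\alpha$ right dominates $\beta$ ($\alpha \prec_R \beta$) if there exists some $r$ ($1 \le r \le R$) such that $x_{r'} = y_{r'}$ for $r < r' \le R$ and $x_r < y_r$. We also define weak right domination as follows. We say $\alpha \preceq_R \beta$ if $\alpha = \beta$ or $\alpha \prec_R \beta$.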
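The two orders on profiles can be sketched in a few lines. The function names `left_dominates` and `right_dominates` are hypothetical. The reading of right domination used here (the profile that is smaller at the last position where the two differ dominates) is an assumption chosen to agree with the generous criterion, in which the reverse profile is lexicographically minimum.

```python
from typing import Sequence


def left_dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if a is larger than b at the first position where they differ."""
    assert len(a) == len(b)
    # Python compares lists lexicographically from the left.
    return list(a) > list(b)


def right_dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if a is smaller than b at the last position where they differ
    (assumed reading, see the paragraph above)."""
    assert len(a) == len(b)
    return list(reversed(a)) < list(reversed(b))


greedy_like = [2, 0, 1]    # more first choices, but one third choice
generous_like = [1, 2, 0]  # fewer first choices, nobody on a third choice
```

On these two profiles, `greedy_like` left dominates `generous_like`, while `generous_like` right dominates `greedy_like`, which is exactly the tension between the greedy and generous criteria.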
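The spa problem can be modelled as a network flow problem. Given a spa instance $I$, we construct a flow network $N(I) = \langle G, c \rangle$ where $G = (V, E)$ is a directed graph and $c$ is a non-negative capacity function defining the maximum flow allowed through each edge in $E$. The network consists of a single source vertex $v_s$ and sink vertex $v_t$ and is constructed as follows. Let $V = \{v_s, v_t\} \cup S \cup P \cup L$ and $E = E_1 \cup E_2 \cup E_3 \cup E_4$ where $E_1 = \{(v_s, s_i) : s_i \in S\}$, $E_2 = \{(s_i, p_j) : s_i \in S, p_j \in A_i\}$, $E_3 = \{(p_j, l_k) : p_j \in P_k\}$ and $E_4 = \{(l_k, v_t) : l_k \in L\}$ (here $S$, $P$ and $L$ are the sets of students, projects and lecturers, $A_i$ is the set of projects acceptable to $s_i$, and $P_k$ is the set of projects offered by $l_k$). We set the capacities as follows: $c(v_s, s_i) = 1$ for all $(v_s, s_i) \in E_1$, $c(s_i, p_j) = 1$ for all $(s_i, p_j) \in E_2$, $c(p_j, l_k) = c_j$ for all $(p_j, l_k) \in E_3$ and $c(l_k, v_t) = d_k^+$ for all $(l_k, v_t) \in E_4$, where $c_j$ and $d_k^+$ are the upper quotas of $p_j$ and $l_k$ respectively.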
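The construction can be sketched as follows. This is a minimal illustration under assumed data structures: the function `build_network`, the vertex labels `"source"` and `"sink"`, and the dictionary-based instance encoding are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def build_network(
    acceptable: Dict[str, List[str]],   # student -> acceptable projects
    project_quota: Dict[str, int],      # project -> upper quota
    lecturer_of: Dict[str, str],        # project -> lecturer offering it
    lecturer_quota: Dict[str, int],     # lecturer -> upper quota
) -> Dict[Edge, int]:
    """Return the capacity function as a dict mapping each edge to its capacity."""
    capacity: Dict[Edge, int] = {}
    for student, projects in acceptable.items():
        capacity[("source", student)] = 1        # source -> student
        for project in projects:
            capacity[(student, project)] = 1     # student -> acceptable project
    for project, quota in project_quota.items():
        capacity[(project, lecturer_of[project])] = quota  # project -> lecturer
    for lecturer, quota in lecturer_quota.items():
        capacity[(lecturer, "sink")] = quota     # lecturer -> sink
    return capacity


# Hypothetical instance with two students, two projects and one lecturer.
network = build_network(
    acceptable={"s1": ["p1", "p2"], "s2": ["p1"]},
    project_quota={"p1": 1, "p2": 2},
    lecturer_of={"p1": "l1", "p2": "l1"},
    lecturer_quota={"l1": 2},
)
```

Ties in preference lists do not affect the network itself, since every student-to-project edge has unit capacity regardless of rank; the ranks only matter when profiles of augmenting paths are compared.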
We call a path from to some project a partial augmenting path if can be extended adding the edges and to form an augmenting path with respect to flow . Given a partial augmenting path from to , we define the profile of , denoted , as follows:
where additions are done with respect to the and operations on profiles. Unlike the profile of a matching, the profile of an augmenting path may contain negative values. Also if can be extended to a full augmenting path with respect to flow by adding the edges and where and are the endpoints of , then we define the profile of , denoted by , to be . Multiple partial augmenting paths may exist from to , thus we define the maximum profile of a partial augmenting path from to with respect to , denoted , as follows:
An augmenting path is called a maximum profile augmenting path if
Let be an integral flow in . We define the matching in induced by as follows: . Clearly by construction of , is a matching in , such that . If is a flow and is an augmenting path with respect to then where and is the flow obtained by augmenting along . Also given a matching in , we define a flow in corresponding to as follows:
We define a student to be exposed if meaning that there is no flow through . Similarly we define a project to be exposed if and where .
Let be a matching of size in . We say that is a greedy -matching if there is no other matching such that and . If is the size of a maximum cardinality matching in , we call a greedy maximum matching in . Also we say that is a generous -matching if there is no other matching such that and . If is the size of a maximum cardinality matching in , we call a generous maximum matching in . We also define the degree of a matching to be the rank of one of the worst-off students matched in or if is an empty set.
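The definitions of greedy and generous maximum matchings can be checked on a tiny instance by exhaustive enumeration. The sketch below is only a brute-force illustration of the definitions, not the flow-based algorithm developed in this paper; the instance and all function names are assumptions made up for the example.

```python
from itertools import product


def profile(m, ranks, R):
    prof = [0] * R
    for s, p in m.items():
        prof[ranks[s][p] - 1] += 1
    return prof


def all_matchings(ranks, proj_quota, lecturer_of, lect_quota):
    """Yield every assignment that respects project and lecturer upper quotas."""
    students = sorted(ranks)
    options = [[None] + sorted(ranks[s]) for s in students]
    for choice in product(*options):
        m = {s: p for s, p in zip(students, choice) if p is not None}
        proj_load, lect_load = {}, {}
        for p in m.values():
            proj_load[p] = proj_load.get(p, 0) + 1
            l = lecturer_of[p]
            lect_load[l] = lect_load.get(l, 0) + 1
        if all(proj_load[p] <= proj_quota[p] for p in proj_load) and \
           all(lect_load[l] <= lect_quota[l] for l in lect_load):
            yield m


def greedy_and_generous_profiles(ranks, proj_quota, lecturer_of, lect_quota):
    R = max(max(r.values()) for r in ranks.values())
    matchings = list(all_matchings(ranks, proj_quota, lecturer_of, lect_quota))
    k = max(len(m) for m in matchings)            # size of a maximum matching
    profiles = [profile(m, ranks, R) for m in matchings if len(m) == k]
    greedy = max(profiles)                         # lexicographically maximum profile
    generous = min(profiles, key=lambda x: x[::-1])  # minimum reverse profile
    return greedy, generous


# Hypothetical instance: ranks[s][p] is the rank of p in s's preference list.
ranks = {
    "s1": {"p1": 1, "p2": 2},
    "s2": {"p1": 1, "p2": 2, "p3": 3},
    "s3": {"p2": 1, "p3": 2},
}
proj_quota = {"p1": 1, "p2": 1, "p3": 1}
lecturer_of = {"p1": "l1", "p2": "l1", "p3": "l2"}
lect_quota = {"l1": 2, "l2": 1}

greedy_profile, generous_profile = greedy_and_generous_profiles(
    ranks, proj_quota, lecturer_of, lect_quota
)
```

In this instance every maximum matching assigns all three students. The greedy maximum matching gives two students their first choice at the price of one third choice, whereas the generous maximum matching avoids any third choice.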
3 Greedy maximum matchings in spa
In this section we present the algorithm Greedy-max-spa for finding a greedy maximum matching given a spa instance. The algorithm is based on the general Ford-Fulkerson algorithm for finding a maximum flow in a network . We obtain maximum profile augmenting paths by adopting techniques used in the bipartite matching approach for finding a greedy maximum matching in ha  and chat .
The Greedy-max-spa algorithm shown in Algorithm 1 takes in a spa instance as input and returns a greedy maximum matching in . A flow network is constructed as described in Section 2. Given a flow in that yields a greedy -matching in , if is not the size of a maximum flow in , we seek to find a maximum profile augmenting path with respect to in such that the new flow obtained by augmenting along yields a greedy -matching in . Lemmas 3.1 and 3.2 show the correctness of this approach. We firstly show that if is smaller than the size of a maximum flow in then such a path is bound to exist.
Let be an instance of spa and let denote the size of a maximum matching in . Let be given and suppose that is a greedy -matching in . Let and . Then there exists an augmenting path with respect to in such that if is the result of augmenting along then is a greedy -matching in .
Let be a new instance of spa obtained from as follows. Firstly we add all students in to . Next, for every project , we add clones to each of capacity . We then add all lecturers in to . If in , we add to for all . If is in , we add to for all . Also if , we set for all . Let be the underlying graph in involving only the student and project clones. With respect to the matching , we construct a cloned matching in as follows. If project is assigned students in we add to for all . Hence is a greedy -matching in .
Let be a greedy -matching in (this exists because ). Then is a greedy -matching in . Let . Then each connected component of is either (i) an alternating cycle, (ii) an even-length alternating path or (iii) an odd-length alternating path in (with no restrictions on which matching the end edges belong to). The aim is to show that, by eliminating a subset of , we are left with a set of connected components which can be transformed into a single augmenting path with respect to in and subsequently a single augmenting path with respect to in .
Eliminating connected components of : Suppose is a type (i) connected component of or a type (ii) connected component of whose end vertices are students (we may call this a type (ii)(a) component). Suppose also that . A new matching in of cardinality can be created from by replacing all the -edges in with the -edges in (i.e. by augmenting along ). Since the upper quota constraints of the lecturers involved are not violated after creating from , it follows that is also a valid spa matching in . Moreover which is a contradiction to the fact that is a greedy -matching in . A similar contradiction (to the fact that is a greedy - matching in ) exists if we assume . Thus .
From the argument above, no type (i) or type (ii)(a) connected component of contributes to a change in the size or profile as we augment from to or vice versa. In fact, this is true for any even-length connected component of which does not cause lecturer upper quota constraints to be violated as we augment from to or vice versa. The claim can further be extended to certain groups of connected components which, when considered together, (i) have equal numbers of and edges and (ii) do not cause lecturer upper quota constraints to be violated as we augment from to or vice versa. In all these cases, it is possible to eliminate such components (or groups of components) from consideration. Using the above reasoning, we begin by eliminating all type (i) and type (ii)(a) connected components of .
Let be the union of all the edges in type (i) and type (ii)(a) connected components of . Let . Then it follows that for some greedy -matching in which can be constructed by augmenting along all type (i) and type (ii)(a) components of . Thus contains
(1) even-length alternating paths whose end vertices are project clones (we call these type (ii)(b) paths),
(2) odd-length alternating paths whose end edges are in (we call these type (iii)(a) paths) and
(3) odd-length alternating paths whose end edges are in (we call these type (iii)(b) paths).
Although these alternating paths are vertex disjoint, there are special cases where two alternating paths in may be joined together by pairing their end project clone vertices.
Joining alternating paths: Consider some lecturer and project . We extend the notation to include all clones of (i.e. for all ). Let
Thus is the set of end edges incident to project clones belonging to a subset of the type (ii)(b) and type (iii)(a) paths in . Let
Thus is the set of end edges incident to project clones belonging to a subset of the type (ii)(b) and type (iii)(b) paths in . Also let
Thus and where and are the number of unassigned positions that has in and respectively.
Note that if and only if . If then all the paths with end edges in can be considered as valid alternating paths in (i.e. if they are used to augment , ’s upper quota will not be violated in the resulting matching). Since then all the paths with end edges in can be considered as valid alternating paths in (i.e. if they are used to augment , ’s upper quota will not be violated in the resulting matching).
On the other hand, assume . Then . Let be an arbitrary subset of of size and let be an arbitrary subset of of size . Thus all paths with end edges in and can be considered as valid alternating paths in and respectively. Also . We can thus form a correspondence between the edges in and those in . Let and be the end edges of two alternating paths in . The paths can be joined together by pairing the clones of both end projects thus forming a project pair at . These project pairs can be formed from all edges in and .
In the cases where project pairs are formed, the resulting path (which we call a compound path) may be regarded as a single path along which or may be augmented. In some cases, the two projects being paired may be end vertices of a single (or compound) alternating path. Thus pairing them together will form a cycle. Since the cycle is of even length and the lecturer’s upper quota will not be violated if it is used to augment or it can be eliminated right away. For each lecturer , once the pairings between alternating paths in and have been carried out (where applicable) and any formed cycles have been eliminated, we are left with a set of single or compound alternating paths of the following types (for simplicity we call all remaining alternating paths compound paths even though they may consist of only one path).
A compound type (ii)(a) path - a compound path with an even number of edges with both end vertices being students. This path will contain a type (iii)(a) path at one end, and a type (iii)(b) path at the other end with zero or more type (ii)(b) paths in between (See Figure 2(a)). Such a path can be eliminated from consideration.
A compound type (ii)(b) path - a compound path with an even number of edges with both end vertices being project clones. This path will contain one or more type (ii)(b) paths joined together. Such a path can also be eliminated from consideration as its end edges are incident to exposed project clones.
A compound type (iii)(a) path - a compound path with an odd number of edges with both end edges being matched in . This path will contain a type (iii)(a) path at one end with zero or more type (ii)(b) paths joined to it (See Figure 2(b)). We will consider these paths for elimination later in this proof.
A compound type (iii)(b) path - a compound path with an odd number of edges with both end edges being matched in . This path will contain a type (iii)(b) path at one end with zero or more type (ii)(b) paths joined to it (See Figure 2(c)). We will consider these paths for elimination later in this proof.
Eliminating compound paths: At this stage we are left with only compound type (iii)(a) and compound type (iii)(b) paths in . These paths, if considered independently, decrease and increase the size of by respectively. Since then there are type (iii)(a) paths and type (iii)(b) paths. Consider some compound type (iii)(b) path and some compound type (iii)(a) path . Then we can consider the combined effect of augmenting or along . Suppose that . A new matching in of cardinality can be created by augmenting along . Since the upper quota constraints on the lecturers involved are not violated after creating from , then is also a valid spa matching in . Thus which is a contradiction to the fact that is a greedy -matching in . A similar contradiction (to the fact that is a greedy -matching in ) exists if we assume . Thus . It follows that, considering and together, the size and profile of the matching is unaffected as we augment from to or vice versa and so both and can be eliminated from consideration.
Generating an augmenting path in : Once all these eliminations have been done, since it is easy to see that there remains only one path left in which is a compound type (iii)(b) path. The path can then be transformed to a component in (where is basically the undirected counterpart of without capacities) by replacing all the project clones in with the original project and, for every joined pair of project clones (), adding the lecturer in between them. Thus a project may now appear more than once in . A lecturer may also appear more than once in .
Consider some project that appears more than once. Then let be the path consisting of edges between the first and last occurrence of the clones in ( corresponds to a collection of cycles belonging to in involving ). Thus is of even length and both end projects of are clones of the same project. Augmenting or along will not violate the lecturer upper quota constraints or affect the size or profile of the matching obtained (again using the same arguments presented above). Thus can be eliminated from consideration. Although this potentially breaks into two separate paths in it still remains connected in . Similarly consider some lecturer that appears more than once. Then let be the path consisting of edges between the first and last occurrence of the clones in ( corresponds to a collection of type (ii)(b) paths with project clones offered by ). Thus augmenting or along will not violate the lecturer upper quota constraints or affect the size or profile of the matching obtained (again using the same arguments presented above). Thus can be eliminated from consideration. Doing the above steps continually for all projects and lecturers that occur more than once in eventually yields a valid path in in which all nodes are visited only once.
Finally we describe how the path in , obtained after removing duplicate projects and lecturers, can be transformed to an augmenting path in (i.e. we establish the direction of flow from to through in ). Firstly we add the edge to where is the exposed student in . Next for every edge we add a forward edge to . Also for every edge we add a backward edge to . Finally we add the edges and to where is the end project vertex in . Thus is an augmenting path with respect to in such that if is the flow obtained when is augmented along then is a greedy -matching in . ∎
Let be a flow in and let . Suppose that is a greedy -matching. Let be a maximum profile augmenting path with respect to . Let be the flow obtained by augmenting along . Now let . Then is a greedy -matching.
Suppose for a contradiction that is not a greedy -matching. By Lemma 3.1, there exists an augmenting path with respect to such that if is the result of augmenting along then is a greedy -matching. Hence . Since and , it follows that , a contradiction to the assumption that is a maximum profile augmenting path. ∎
The Get-max-aug algorithm shown in Algorithm 2 accepts a flow network and flow as input and finds an augmenting path of maximum profile relative to or reports that none exists. The latter case implies that is already a greedy maximum matching. The method consists of three phases: an initialisation phase (lines 2-16), the main phase which is a loop containing two other loops (lines 17-44) and a final phase (lines 45-53) where the augmenting path is generated and returned.
For each project the Get-max-aug method maintains a variable describing the profile of a partial augmenting path from some exposed student to . It also maintains, for every project , a pointer to the student or lecturer preceding in . For every lecturer a pointer is also used to refer to any project preceding in . Thus the final augmenting path produced will pass through each lecturer or project at most once. The initialisation phase of the method involves setting all pointers to null and profiles to . Next, the method seeks to find, for each project , a partial augmenting path from the source, through an exposed student to should one exist. In the presence of multiple paths satisfying this criterion, the path with the best profile (w.r.t. ) is selected. The variables and are updated accordingly. Thus at the end of this phase indicates the maximum profile of an augmenting path of length via some exposed student to should one exist. If such a path does not exist then and remain and null respectively.
In the main phase, the algorithm then runs iterations, at each stage attempting to increase the quality (w.r.t. ) of the augmenting paths described by the profiles. Each iteration runs two loops. Each loop identifies cases where the flow through one edge in the network can be reduced in order to allow the flow through another to be increased while improving the profile of the projects involved. In both loops, the decision on whether to switch the flow between candidate edges is made based on an edge relaxation operation similar to that used in the Bellman-Ford algorithm for solving the single source shortest path problem in which edge weights may be negative. In the first loop, we seek to evaluate the gain that may be derived from switching the flow through a student from one project to another. Given an edge with a flow of in and edge with no flow in , we define to be the resulting profile of if the partial augmenting path ending at is to be extended (via ) to . Thus will become the new value of should this extension take place. If (i.e. if the proposed profile is better than the current one), we extend the augmenting path to and update and .
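The relaxation step of the first loop can be sketched as follows. This is a simplified illustration under assumptions, not a transcription of Algorithm 2: the names `rho`, `pred`, `shift` and `relax_switch` are hypothetical, profiles are stored as tuples, and the negative infinity profile is modelled with `float("-inf")` entries. The candidate profile is obtained by removing the student's current rank from the profile at the project she leaves and adding her rank for the project she moves to.

```python
NEG_INF = float("-inf")


def shift(profile, rank, delta):
    """Return a copy of profile with delta added to the component for this rank."""
    out = list(profile)
    out[rank - 1] += delta
    return tuple(out)


def relax_switch(rho, pred, student, p_from, p_to, rank):
    """Try to extend the partial path ending at p_from to p_to via student.

    Assumes the student currently has one unit of flow to p_from and none to p_to.
    Updates rho and pred in place and reports whether an improvement was made.
    """
    sigma = shift(rho[p_from], rank[student][p_from], -1)
    sigma = shift(sigma, rank[student][p_to], +1)
    if sigma > rho[p_to]:          # tuple comparison is lexicographic from the left
        rho[p_to] = sigma
        pred[p_to] = student
        return True
    return False


# Hypothetical state: an exposed student reaches p1 with a first-choice edge,
# and student "s" is assigned to p1 (her second choice) but prefers p2.
R = 2
rank = {"s": {"p1": 2, "p2": 1}}
rho = {"p1": (1, 0), "p2": (NEG_INF,) * R}
pred = {"p1": "s_exposed", "p2": None}

updated = relax_switch(rho, pred, "s", "p1", "p2", rank)
```

After the call the profile stored for `p2` has a negative second component, which reflects the remark above that the profile of an augmenting path, unlike that of a matching, may contain negative values.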
In the second loop, we seek to evaluate the gain that may be derived from switching flow to some lecturer from one project to another. Given a lecturer , let be the set of projects offered by with positive outgoing flow and be the set of projects offered by that are undersubscribed in . Then we seek to determine if an improvement can be obtained by switching a unit of flow from some project to some other project . This is achieved by comparing the and profiles and updating , and if where represents the profile of a partial augmenting path that does not already pass through (i.e., ). This means that the partial augmenting path ending at can be extended further (via ) to while improving its profile. The intuition is that, after augmenting along such a path, gains an extra student while loses one.
During the final phase, we iterate through all exposed projects and find the one with the largest profile with respect to (say ). An augmenting path is then constructed through the network using the values of the projects and lecturers and the matched edges in starting from . The generated path is returned to the calling algorithm. If no exposed project exists, the method returns null. We next show that the Get-max-aug method produces such a maximum profile augmenting path in with respect to should one exist.