A MATHEURISTIC APPROACH FOR THE MAXIMUM BALANCED SUBGRAPH OF A SIGNED GRAPH

A graph G = (V, E) with its edges labeled in the set {+,−} is called a signed graph. It is balanced if its set of vertices V can be partitioned into two sets V1 and V2, such that all positive edges connect nodes within V1 or V2, and all negative edges connect nodes between V1 and V2. The maximum balanced subgraph problem (MBSP) for a signed graph is the problem of finding a balanced subgraph with the maximum number of vertices. In this work, we present the first polynomial integer linear programming formulation for this problem and a matheuristic to obtain good quality solutions in a short time. The results obtained for different sets of instances show the effectiveness of the matheuristic, optimally solving several instances and finding better results than the exact method in a much shorter


Introduction
Let = ( , ) be an undirected graph where is the set of vertices and is the set of edges. Consider the set of labels {+, −} and the function : → {+, −} that assigns a label (or sign) to each edge in . An edge ∈ is called negative if ( ) = − and positive if ( ) = +. For some interesting problems an edge can be both positive and negative. In this case, it is called a parallel edge ( ( ) = ±). In this context, an undirected graph together with a function is called a signed graph = ( , , ) or briefly an s-graph.
Signed graphs were introduced by Heider [13] for describing sentiment relations between people of the same social group. In the conventional approach for modeling a social network, a positive edge represents a friendship or similarity, while a negative edge represents some enmity. Signed graphs have been studied by researchers in different scientific areas, including portfolio analysis in risk management [12,14], biological systems [5] and detection of embedded matrix structures [11].
The concept of balanced signed graphs was formalized by Cartwright and Harary [4], as an extension of Heider's theory. The authors stated that a balanced signed graph could be partitioned into two vertex sets, being that all edges within the sets are positive, and all those between the sets are negative. It is clear that not every signed graph is balanced, and there are different situations in which finding a balanced signed subgraph results in a problem of interest. Despite this fact, the problem of finding a balanced subgraph with maximum number of vertices (MBSP) has been applied in almost as many applications as the problem of finding signed graphs [11,12,14].
To formally define the MBSP problem, we present some additional notations used in the rest of the paper. Let = ( , , ) denote a signed graph where − ⊆ and + ⊆ denote, respectively, the set of negative and positive edges in . Similarly, for a vertex ∈ , we define by − ( ) and + ( ) the set of vertices adjacent to connected by negative and positive edges. Moreover, for a vertex set ′ ⊆ , let [ ′ ] = {{ , } ∈ | , ∈ ′ } be the subset of edges induced by ′ and ′ = ( ′ , [ ′ ], ) its vertex induced subgraph of . To simplify notation we will assume that | ′ | = | ′ |. We also define that a signed graph = ( , , ) is balanced if its vertex set can be partitioned into sets 1 and 2 in such a way that [ 1 ] ∪ [ 2 ] = + . Thus, given a signed graph = ( , , ), the MBSP is the problem of finding a subset of vertices ′ ⊆ such that the subgraph ′ is balanced and maximizes the cardinality of ′ . Gülpinar et al. [11] proved that MBSP is NP-hard, and the literature presents some heuristic and exact approaches to solve this problem [8,9,11]. Branch-and-cut methods presented in [8,9] use mathematical models based on the fact that a signed graph is balanced if and only if it does not contain a parallel edge or a cycle with an odd number of negative edges [2,11,21]. Also, heuristics and metaheuristics were developed for this problem, such as the GGMZ heuristic [11] and a GRASP metaheuristic in [8]. More recently, a new heuristic for this problem, based on negative chordless cycles, was presented by Marinelli and Parente [16].
Different strategies appear in the literature to combine exact and heuristics algorithms. In this context, matheuristics [7,10,17] have attracted the attention of the scientific community. Matheuristics may use some features of the mathematical models of one problem to customize heuristics to solve these problems or use heuristics to improve time effectiveness of mathematical programming techniques.
In this work, we propose an alternative optimization formulation for the one proposed in [8,9]. In contrast with their formulation, the size of our optimization model is polynomial in the input size. We also introduce a matheuristic developed to solve the MBSP to obtain good quality solutions quickly.
The remainder of this paper is organized as follows. Section 2 presents a new formulation for the MBSP problem. Section 3 describes the main algorithms used in the proposed matheuristic. In Section 4, we present the results obtained by the matheuristic and the comparison with previous heuristics. Finally, conclusions are discussed in Section 5.

A new formulation for the MBSP
Previous mathematical formulations are based in the fact that a signed graph is balanced if and only if it does not contain a parallel edge or a cycle with an odd number of negative edges. The branch-and-cut algorithm proposed in [9] and extended in [8] is based on this result and consider a possible exponential set of odd negative cycles in , denoted as − .
The formulation used in this algorithm is the following: max ∑︁ ∈ (2.1) subject to: The above model is said to be a clustering formulation. The objective function (2.5) maximizes the total number of vertices in 1 ∪ 2 . Constraints (2.6) indicate that any vertex in the solution has to belong to only one set, 1 or 2 . Constraints (2.7) and (2.8) ensure that there are no negative edges with both endpoints in 1 and 2 , respectively. Moreover, constraints (2.9) indicate that there are no positive edges that have one endpoint in 1 and the other in 2 . Constraints (2.10) ensure that if there is a parallel edge between two vertices, both vertices cannot be in the solution.

A matheuristic for the MBSP
In Figueiredo and Frota [8], a GRASP heuristic was proposed to solve the MBSP and provided the best quality results among the heuristics developed for this problem for the instances analyzed. In addition, the NCCH heuristic presented in [16], obtained the best quality results for the instances used in that work. This section details our proposed matheuristic to tackle the MBSP.

The NCCH heuristic
The greedy heuristic NCCH proposed in [16] is based on the observation that a vertex set ⊆ induces a balanced subgraph if and only if contains no negative cycles, i.e., cycles with an odd number of negative edges [2]. Therefore, this greedy approach tries to reduce negative cycles progressively by applying the Negative Cycle Contraction (NCC) rule shown in Algorithm 1 [16]. This procedure preserves the sign of all negative cycles and shortens those containing an edge { , } that is not a parallel edge. Any balanced induced subgraph of the signed graph obtained by applying the NCC-rule to all the vertices of is a balanced induced subgraph of .
Algorithm 2 presents the NCCH heuristic using this rule.

Algorithm 1: NCC rule.
Input: The signed graph = ( , , ) and edge Input: The signed graph = ( , , ), function . Output: The balanced subgraph . This greedy procedure runs from lines 3 to 9, while there are vertices from to be examined. In each step, it selects a vertex by using a greedy criterion defined by the function (line 4). Then it applies the NCC rule to every vertex adjacent to (line 5). Later (lines 6-8), the method deletes all parallel edges generated after applying the NCC-rule. At the end of Algorithm 2 (line 10), the vertex set induces a balanced signed graph (see details in [16]).
The function should prioritize vertices that have few and small negative cycles without chords associated with them. Marinelli and Parente [16] proposed three options for , and showed that the performance of the heuristic varies depending on the function and instance structure.

Construction procedure based on negative chordless cycles
Based on the NCCH heuristic, we developed a construction procedure using a Restricted Candidate List (RCL) [19]. This procedure, shown in Algorithm 3, is similar to the above algorithm. The only difference between the original NCCH heuristic and this new one is that the former greedily selects the vertex using the function , while the latter creates a list to select the vertex from that list randomly. The list contains all vertices with the minimum value for the function.
Choosing a random vertex from this list at each step causes the final solution to differ from the solution obtained by the NCCH heuristic. Thus, Algorithm 3 can be used to generate different feasible solutions. In this work ( ) = ( ) was selected, where ( ) represents the degree of vertex ∈ . This is one of the functions proposed in [16], and it was chosen because it demands less computational time to be calculated.
Apply NCC-rule to every vertex adjacent to 7 foreach adjacent to do

Local Search
The purpose of this procedure is to find locally optimal solutions with respect to some neighborhood in order to improve the initial solution provided by Algorithm 3. The local search in [8] obtains neighboring solutions by removing one or two vertices from the current solution, and including as many vertices as possible (maintaining feasibility), in order to reach an improved solution.
If we consider the removal of all possible pairs of elements, a quadratic component is incorporated in the local search. This neighborhood would be computationally expensive in large graphs. So, our local search procedure obtains neighboring solutions by randomly removing ≤ | ′ | vertices from a feasible solution ′ = ( ′ , [ ′ ], ) (shrinking phase) and reapplying the NCCH heuristic shown in Algorithm 2 (expansion phase). The parameter is determined by the input parameter that indicates a percentage of the number of vertices of the input graph (line 2). Thereby, we define the -neighborhood, denoted by , as the family of all solutions obtained by removing vertices from ′ and reapplying the NCCH heuristic described in Algorithm 2. As the number of neighbors grows exponentially, only neighbors in are analyzed in each local search step. The local search procedure, shown in Algorithm 4, tries to increase the solution's cardinality returned by the construction phase. Each time it finds an enhancement, the current best solution best is updated (lines 6-8). The method stops if there is no further improvement of the solution after exploring neighbors in .

The MS NCCH heuristic
Using the constructive procedure and the local search algorithm we propose the MS NCCH heuristic described in Algorithm 5, based on a multi-start schema.
Each MS NCCH iteration starts with an initial solution generated for the problem (lines 6-9). For the first iteration, we use the greedy NCCH heuristic described in Algorithm 2. Then, we use the constructive procedure defined in Section 3.2 for the remaining iterations. Once a feasible solution is created, the local search tries to improve the quality of that solution (line 10). The best global solution best is updated (line 12) if the solution found by the local search improves it. The method maintain a pool of elite solutions (line 13) that is updated with new solutions found in the local search procedure that are better than those in , respecting the size limit of . That implies that all neighboring solutions obtained in the local search stage are considered. As has a limited size, when a solution has a better value than the worst solution of , the new solution enters the set, while the worst solution leaves . From lines 14 to 17 the set of elite solutions is checked. If some new solution enters the set, then the counter for the number of iterations without change in is restarted.

Algorithm 4: Local Search.
Input: The balanced subgraph , a maximum number of attempts and the percentage of vertices to remove . Output: The balanced subgraph with at least as many vertices as .
Update with new solutions found in the local search that are better than those in . 14 if there is a change in then The algorithm stops if it reaches the maximum running time , the maximum number of overall iterations , or there are no changes in the elite set during iterations.

The matheuristic MH MBSP
The MH MBSP is a matheuristic developed using the proposed formulation presented in Section 2 and the solutions inserted in the elite set by the MS NCCH heuristic. The proposed compact formulation can be described using a polynomial number of variables and constraints. This feature allows the formulation to be integrated with the MS NCCH heuristic in order to quickly find feasible solutions after fixing a subset of variables. The quality of solution depends on the choice of which and how many variables will be selected to be fixed.
In our first approach we try to identify a subset of vertices ⊆ that appear in all solutions of the elite set in the end of the method MS NCCH. Then, we include Constraints 3.1 into the formulation, indicating that these vertices must be present in the solution: Unfortunately, this approach turned out to be very restrictive to the formulation, not improving the solutions. So, we decided to use the subset of vertices that appear in all solutions that were ever inserted in the elite set. This strategy improved the results and is adaptive for each instance, as the number of solutions inserted in the elite set varies depending on the instance. If the time used by the heuristic reached the time limit , it returns the solution found by the MS NCCH heuristic (line 5); otherwise, the model is created according to Section 2 (line 6), and expanded by Constraints 3.1 defined by vertex set , indicating that these vertices must be fixed in the solution, belonging to any of solution sets 1 or 2 .
Subsequently, the formulation is solved using ( − ) as the time limit (line 7). The matheuristic result is the best solution between the one found by the MS NCCH heuristic and the one found by solving the model with the initialized variables (lines 8-11).

Computational experiments
This section reports the computational experiments carried out with the heuristics methods described in the previous sections. We performed several computational experiments to validate the effectiveness of the proposed methods. First, we analyzed the same set of instances of [8], which contains several groups of instances (DMERN, Portfolio, and Random) having signed graphs with different sizes and densities. Then, we analyzed the set of instances proposed in [16], which contains three groups of instances (N, P, and R instances).

Exact method
The new proposed mathematical formulation described in Section 2 is the starting point for our exact method used in the matheuristic. We executed it for an hour to verify its performance. Experiments for the exact method were executed on an Intel (R) Core i5-4460S CPU @ 2.90 GHz and 8 Gb of RAM using the operating system Fedora 22. All methods were programmed in C++ language using the gcc compiler. The IBM ILOG CPLEX 12.6 was used with a single thread of execution, and all other CPLEX parameters were left to their default values.
In [8], the authors showed the results obtained by their exact method based on removing cycles with an odd number of negative edges. These experiments were executed on an Intel (R) Pentium (R) 4 CPU 3.06 GHz, equipped with 3 GB of RAM. In order to establish a more similar environment, we calculated the associated time scaling factor of 0.34 (based on Pass mark benchmark), representing the ratio between the speed of the processor in [8] and the speed of our processor. Based on this, we show their time values multiplied by this associated factor. Table 1 shows the results obtained for instances set DMERN, composed of 61 instances with the number of vertices | | varying from 144 to 8317. Columns 2, 3, and 4 present, respectively, the upper bound provided by the root node relaxation, the best lower bound value, and the time to reach this value for the exact model shown in [8]. Columns 5, 6, and 7 have the same meaning, but for our exact method. The symbol "-" means that the optimal solution was not found within the execution time limit.
According to Table 1, our method could solve nine more instances than the method in [8]. On the other hand, the new formulation presents a much weaker relaxation at the root node. However, the resolution times were short, and our exact method presents high-quality lower bounds (feasible solutions). Together with the fact that the new formulation is compact, these features indicate that our proposed formulation may be a good choice to be integrated with a heuristic to search for high-quality and fast feasible solutions.

Choosing parameters
We used the Irace package [15] to select the parameters for the MS NCCH heuristic. This package includes the iterated F-race algorithm [3] and several extensions for finding the most appropriate parameter settings for an optimization algorithm. It needs the specification of the parameters to be configured and instances for training. According to Section 3.4, the parameters to be configured are: -: The percentage of vertices to be removed from the solution set in the local search phase. Its values ranged from 5% to 50%. -: The maximum number of neighbors explored during the local search. Its values were in the range of 10 to 1000. -: The number of iterations without a change in the elite set, tested in the range of 1 to 50.
We configure the Irace to run 1000 experiments using 10 instances, none of them present in previous works or used in later comparisons. The instances used were:boeing1, boeing2, capri, degen2, file150.7, finnis, scfxm1, scfxm3, scsd6 and standata, coming from a set of general mixed integer programs in Netlib (https://www. netlib.org/lp/data). The instances were selected with different sizes and densities to provide parameters that work well with different input structures.
From the results of Irace, the final configurations were: = 33.3%, = 1000 and = 20. The parameter was set to 10, as used in Martins et al. [18] and de Holanda Maia et al. [6]. For all experiments, we used = 100 and = 30 s.

Analyzing the performance of the heuristics
Tests were developed on an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz with 8 Gb of RAM using Ubuntu 16.04. All methods were programmed in C++ language using the gcc compiler. We used the instances presented in [16] to analyze the NCCH, MS-NCCH, and MH-MBSP heuristics performance. The goal of these tests was to verify the impact on the solution of each developed strategy. For all heuristics, the function is the vertex degree.
The three sets of instances are N-instances, P-instances, and R-instances. N-instances consist of 34 signed graphs obtained from a set of preprocessed and scaled Netlib matrices. P-instances are 50 randomly generated   power-law signed graphs, five instances for each size | | = 100 with ∈ {1, . . . , 10}. R-instances are random signed graphs generated with the function ℎ( , ) provided by Mathematica [20], with the number of vertices ranging from 50 to 200 and density in the set {0.1, 0.3, 0.5, 0.7}. Table 2 shows the results for N-instances. Column 2 shows the results of the greedy heuristic NCCH. The time in seconds, used by this heuristic, is presented in Column 3. Columns 4 and 5 show the average result and time of MS-NCCH heuristic, respectively, after 10 runs. Next two columns show the same information but for the MH-MBSP matheuristic. The final column shows the gap between the best solution found by the exact method (ILP) and the average result found by the matheuristic (Col. 6). The gap is calculated by the following equation: where Best ILP is the best solution found by our exact method in the time limit of 1 h. Since the exact optimization phase of the matheuristic starts when the MS-NCCH heuristic finishes, the results and time of MS-NCCH are always worse and less than or equal to those obtained by MH-MBSP. For the same reason, the NCCH results are always worse than or equal to those obtained by the MS-NCCH heuristic. Cells with an asterisk in the last column indicate that the problem was not optimally solved in 1 h and the gap is calculated related to the best feasible solution found. Moreover, the value "-" in the time column indicates that the method reached the maximum assigned time of 30 s.
The N-instances set contains "easy" instances, and the greedy heuristic NCCH was able to solve optimally 22 of the 34 instances. For those 12 instances not solved optimally by the NCCH heuristic, the MS-NCCH heuristic solved optimally eight new instances and improved the results for other two unsolved instances. The exact optimization phase of the matheuristic did not improve the results of the remaining four unsolved instances. When analyzing these instances, we can see that they are instances in which the MS-NCCH heuristic used all execution time, and there was no available time to perform the exact optimization phase in three of them. Table 3 shows the results for P-instances. Columns of this table have the same meaning as in Table 2. These instances are a little more complex than N-instances, and the results reflect that. In this set, the NCCH heuristic found seven optimal results. For the remaining 43 instances, the MS-NCCH heuristic always obtained a better result than the NCCH heuristic and found 10 additional optimal solutions. For the remaining 33 unsolved instances, the matheuristic MH-MBSP improved all results and found 19 new optimal values.
The R-instances are classified in [16] as hard instances because they are graphs connected with many negative cycles. Table 4 shows the results for these instances, and its columns have the same meaning as in the previous tables. The MS-NCCH heuristic improved all results found by the NCCH heuristic. Moreover, the MS-NCCH The above experiments showed the impact that each NCC-rule-based heuristic has on the quality of the solutions. The next section compares the MH-MBSP matheuristic with the heuristic GRASP developed in [8], which is the best-known metaheuristic for the problem.

DMERN, portfolio and random graphs
This section compares the MH-MBSP matheuristic to the GRASP method proposed in [8], which is the best metaheuristic in the literature for the problem. As the authors kindly provided the GRASP source code, we  ran both heuristics in the same machine and executed each instance ten times with the same time limit as MH-MBSP (30 s). Table 5 shows the results obtained for the set of instances DMERN. Columns 2 and 3 show the average GRASP solution and the gap between the best solution found by the exact method (ILP) and the average result found by the GRASP. Columns 4 and 6 have the same meaning as columns 2 and 3 but for the MH-MBSP matheuristic. Moreover, column 5 shows the average execution time for MH-MBSP, because the optimization phase could stop before the time limit. Those results in bold font are strictly better than the results from the other methods. An asterisk in the gap cells indicates that the exact method could not find an optimal solution in one hour, and the gap is calculated using the best solution found. The value "-" in the time cells indicates that the method reached the maximum assigned time of 30 s. Table 5 shows that the GRASP heuristic obtained the best average results for two instances, which are the smallest in the group and correspond to graphs with 144 vertices (danoint) and 184 vertices (bienst1). In these cases, the GRASP heuristic always found the optimal value, while the matheuristic MH-MBSP found the optimal value sometimes. For the 53 remaining instances, the MH-MBSP got better results than GRASP.
In general, as the instances' size increases, it is more difficult for GRASP to find good quality solutions in a limited time. It may happen because GRASP has a strictly quadratic and more intensive local search. Thus, for small instances, this more intensive search can find good values in a short time, but it would take much more time to achieve a better result for larger ones. In contrast, our local search explores a limited number of neighbors that is a fraction of the total number of vertices. Thus, we work with a local search with a linear behavior, which obtains good quality results for large instances without spending so much time.
We also analyzed the performance of the proposed methods on the Portfolio instances set, which is composed of instances with the number of vertices | | varying in the set {390, 420, 450, 480, 510}, and the threshold value used in the creation of the instances varying in the set {0.300, 0.325, 0.350, 0.375, 0.400} (see Huffner et al. [14] for more details). For each combination of these values, there are ten different signed graphs, resulting in 250 instances. Table 6 shows the results for the Portfolio instances. This table is presented in a format similar to that used by Figueiredo and Frota [8]. In this case, each line represents a group of 10 instances generated with the same threshold . Columns 3-7 in this table have the same meaning as columns 2-6 in Table 5.
The matheuristics MH-MBSP always got a better result than the GRASP heuristic. Moreover, the value obtained by the matheuristic was very accurate as the gap column values are at a maximum distance of 1 from the optimal solution. In three groups of instances, the matheuristic MH-MBSP found all optimal values. For instances with 420 vertices and parameter = 350, it found a better solution than the one found by the exact algorithm in one hour of processing. Additionally, the average resolution time obtained by MH-MBSP was much less than the time limit of 30 s.
In the last experiment, we consider the results obtained for the Random set of instances, which is composed by randomly generated instances classified into two groups. Group 1 is the set of random signed graphs without parallel edges ( Table 7 presents the results where each row represents a group of 27 graphs. Similarly to Table 6, we give the average information per group of instances. For these instances, the GRASP heuristic obtained the best average results in all groups. However, the gaps were small for both heuristics, being at an average distance of only 1 from the best value obtained by the exact algorithm. We verified that the time used by the MS-NCCH heuristic is 0.6% of the time used by the exact phase of the MH-MBSP for these instances. We believe that if the local search of the MS-NCCH is more exhaustive for small instances like these, the elite set's quality may improve and accelerate the exact phase.     We can conclude from Tables 5-7 that, up to 200 vertices, the GRASP and MH-MBSP heuristics are quite similar in terms of the results' quality. But for instances with more than 200 vertices, the heuristic MH-MBSP always achieved better results, both in terms of solution quality and CPU time. For the last experiment, we evaluated and compared the run-time distributions (also know as time-to-target plots) of the GRASP heuristic and the MH-MBSP matheuristic for some instances. Time-to-target plots display on the ordinate axis the probability that an algorithm will find a solution at least as good as a given target value in a given running time, shown on the abscissa axis.
The heuristics were performed 200 times in each instance, with different seeds for the pseudo-random number generator. Then, the empirical probability distributions of each heuristic time spent to find a target solution value are plotted. The methodology described in [1] was adopted to trace each heuristic's empirical distribution. The run times ( ) are sorted in ascending order, and a probability = ( − 1 2 )/200 is associated with each . Then, the = ( , ) points for = 1, . . . , 200 are plotted. The best algorithm is the one that presents the corresponding leftmost plots.
We chose two instances from different sizes to analyze the performance of the methods (air05 and ns1688347 ). Table 8 shows the instance's name, size, and target value. Both heuristics reached all chosen target values in the previous experiments since our interest was the convergence analysis. Figure 1 shows the cumulative probability for the MH-MBSP for the instance air05, while Figure 2 shows the cumulative probability for both heuristics. In Figure 1, we can see how MH-MBSP reaches the target value very fast, with a high probability of finding the target in a time close to 0.16 s. Also, compared with the GRASP heuristic, the figure shows that GRASP needs much more time to reach the target value. In addition, for a probability of finding the target value greater than 50%, the GRASP heuristic needs at least 50 s. Figures 3 and 4 show the cumulative probability for the ns1688347 instance. Again, the MH-MBSP quickly reaches (less than one second) the target value for all iterations. While Figure 4 shows that GRASP presents a probability of 70% of reaching the target value. According to the obtained results, the minimum time to find the target solution for this instance by the GRASP heuristic was 3.3 s. Moreover, it was necessary to use a time close to 8.8 s to reach the target value with a probability greater than 50%.

Conclusions
In this work, we developed methods to obtain heuristic solutions for the MBSP problem. We also provided an alternative formulation for the problem based on the basic definition of a balanced signed graph, whose size grows polynomially as a function of the input size.
Our main contribution in this work is the development of a matheuristic for the MBSP problem. Using different strategies as a multi-start scheme and a fast local search combined with an exact optimization phase allowed our method to obtain quickly good quality solutions for the tested sets of instances.
The results show that the proposed matheuristic is an appropriate choice to find good quality solutions for the MBSP problem. Furthermore, for large instances (with more than 200 vertices), the matheuristic MH MBSP achieved in 30 s similar or better results than those obtained by solving exactly the problem using the proposed formulation in one hour of processing.
Our current research focus on the investigation of the structure of the polynomial size formulation in order to develop robust and efficient exact methods for the MBSP. Another direction for future research involves extending the solution method to similar balanced subgraph problems, such as KMBSP, where the goal is to find -balanced components in the graph.