### abstract ###
Boosting has attracted great interest in the machine learning community because of its impressive performance on classification and regression problems
The success of boosting algorithms may be interpreted in terms of the margin theory   CITATION
Recently, it has been shown that a bound on the generalization error of classifiers can be obtained by explicitly taking the margin distribution of the training data into account
Most boosting algorithms used in practice optimize a convex loss function and do not make use of the margin distribution
In this work we design a new boosting algorithm, termed margin-distribution boosting (MDBoost), which directly maximizes the average margin and minimizes the margin variance at the same time
In this way, the margin distribution is optimized
A totally-corrective optimization algorithm based on column generation is proposed to implement MDBoost
Experiments on various  datasets show that MDBoost outperforms AdaBoost and LPBoost in most cases
### introduction ###
Boosting offers a method for improving existing classification algorithms
Given a training dataset, boosting builds a   strong  classifier using only a  weak  learning algorithm   CITATION
Typically, a weak (or base) classifier generated by the weak learning algorithm has a misclassification error that is only slightly better than random guessing
A strong classifier has a much better test error
In this sense,  boosting algorithms can boost the weak learning algorithm to obtain a much stronger classifier
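To make the weak-to-strong idea concrete, the following is a minimal sketch of standard discrete AdaBoost, not of the MDBoost algorithm proposed in this paper. It assumes labels in {-1, +1} and uses one-feature threshold classifiers (decision stumps) as the weak learner; the function names `train_stump`, `stump_predict`, `adaboost` and `predict` are hypothetical and chosen only for illustration.

```python
import math

def train_stump(X, y, w):
    """Exhaustively pick the (error, feature, threshold, polarity) stump
    with the lowest weighted training error under sample weights w."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                pred = [pol if x[j] <= thr else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, x):
    """Output of a stump on one example: +1 or -1."""
    _, j, thr, pol = stump
    return pol if x[j] <= thr else -pol

def adaboost(X, y, rounds):
    """Return a list of (coefficient, stump) pairs after `rounds` iterations."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the weighted error away from 0 and 1 to keep the log finite.
        err = min(max(stump[0], 1e-12), 1 - 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

On a toy one-dimensional dataset such as `X = [[0], [1], [2], [3], [4], [5]]` with `y = [1, 1, -1, -1, 1, 1]`, no single stump separates the classes, whereas the weighted vote of three stumps does.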
Boosting was originally proposed as an ensemble learning method, which depends on majority voting of multiple individual classifiers
Later, Breiman  CITATION  and Friedman  CITATION  observed that many boosting algorithms can be viewed as gradient descent optimization in functional space
Mason  CITATION  developed  AnyBoost for boosting arbitrary loss functions with a similar idea
Despite the considerable practical success of these boosting algorithms, there are still open questions about why and how boosting works
Inspired by the large-margin theory in kernel methods,  Schapire   CITATION  presented a margin-based bound for AdaBoost, which tries to interpret AdaBoost's success with the margin theory
Although the margin theory provides a qualitative explanation of the effectiveness  of boosting, the bounds are quantitatively weak
A recent work  CITATION  has provided new, tighter margin bounds, which may be useful for quantitative predictions
Arc-Gv   CITATION , a variant of the AdaBoost algorithm, was designed  by Breiman to empirically test AdaBoost's convergence properties
It is very similar to AdaBoost, differing only in how the coefficient associated with each weak classifier is calculated, but it increases margins even more aggressively than AdaBoost
Breiman's experiments on Arc-Gv show results contrary to the margin theory: Arc-Gv always attains a minimum margin that is provably larger than that of AdaBoost, yet Arc-Gv performs worse in terms of test error  CITATION
Grove and Schuurmans  CITATION  observed the same phenomenon
In the literature, much work has focused on maximizing the minimum margin  CITATION
Recently, Reyzin and Schapire  CITATION  re-ran Breiman's experiments while controlling the complexity of the weak classifiers
They found that a better margin distribution is more important than the minimum margin
It is important to have a large minimum margin, but not at the expense of other factors
They thus conjectured that maximizing the average margin rather than the minimum margin may lead to improved boosting algorithms
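The distinction between the minimum margin and the rest of the margin distribution can be illustrated with a short sketch. It assumes the usual normalized margin of an example, namely its label times the weighted vote of the weak classifiers divided by the sum of the coefficients; the helper names `normalized_margins` and `summarize` and the small matrix `H` are hypothetical.

```python
def normalized_margins(H, y, alpha):
    """H[i][j] is the +1/-1 output of weak classifier j on example i,
    y[i] is the +1/-1 label, alpha[j] >= 0 is the coefficient of classifier j.
    Returns one margin in [-1, 1] per training example."""
    total = sum(alpha)
    return [yi * sum(a * h for a, h in zip(alpha, row)) / total
            for row, yi in zip(H, y)]

def summarize(margins):
    """Minimum, average and (population) variance of a list of margins."""
    n = len(margins)
    mean = sum(margins) / n
    var = sum((m - mean) ** 2 for m in margins) / n
    return {"min": min(margins), "mean": mean, "var": var}

# Hypothetical outputs of three weak classifiers on four examples.
H = [[1, 1, 1],
     [1, -1, 1],
     [-1, 1, 1],
     [1, 1, -1]]
y = [1, 1, 1, 1]
alpha = [1.0, 1.0, 2.0]

stats = summarize(normalized_margins(H, y, alpha))
print(stats)
```

Two ensembles can share the same minimum margin while differing substantially in the average and spread of the remaining margins, which is the quantity the conjecture above is concerned with.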
We try to verify this conjecture in this work
Recently, Garg and Roth  CITATION  introduced a margin distribution based complexity measure for learning classifiers and developed margin distribution based generalization bounds
Competitive classification results have been shown by optimizing this bound
Another relevant work is  CITATION
CITATION  applies a boosting method to optimize the margin distribution based generalization bound obtained by   CITATION
Experiments show that the new boosting methods achieve considerable improvements over AdaBoost
The optimization of this new boosting method is based on the AnyBoost framework  CITATION
Aligned with these attempts, we propose a new boosting algorithm through optimization of margin distribution (termed MDBoost)
Instead of minimizing a margin distribution based generalization bound, we directly optimize the margin distribution: maximizing the average margin and at the same time minimizing the variance of the margin distribution
The theoretical justification for MDBoost is that AdaBoost itself approximately maximizes the average margin and minimizes the margin variance
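As a rough illustration of what optimizing both quantities at once could look like, the sketch below scores a set of margins by their average minus a multiple of their variance. The function name `margin_distribution_score`, the `tradeoff` parameter and its default of 0.5, and the use of the population variance are all assumptions of this sketch; the exact formulation used by MDBoost is the one derived later in the paper and may differ.

```python
import statistics

def margin_distribution_score(margins, tradeoff=0.5):
    """Higher is better: reward a large average margin and penalize
    a large spread. `tradeoff` is a hypothetical weighting parameter."""
    mean = statistics.fmean(margins)
    var = statistics.pvariance(margins)
    return mean - tradeoff * var

# Two hypothetical margin distributions with the same average margin.
concentrated = [0.5, 0.5, 0.5, 0.5]
spread_out = [0.0, 1.0, 0.0, 1.0]

print(margin_distribution_score(concentrated))
print(margin_distribution_score(spread_out))
```

Under such a criterion, two margin distributions with the same average are ranked by their variance, so the concentrated one is preferred.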
The main contributions of our work are as follows
We propose a new totally-corrective boosting algorithm, MDBoost, by optimizing the margin distribution directly
The optimization procedure of MDBoost is based on the idea of  column generation that has been widely used in large-scale linear programming
We empirically demonstrate that MDBoost outperforms AdaBoost and LPBoost on most UCI datasets used in our experiments
The success of MDBoost verifies the conjecture in    CITATION
Our results also show that MDBoost achieves classification performance similar to (or better than) that of AdaBoost-CG  CITATION
AdaBoost-CG  is also totally-corrective in the sense  that all the linear coefficients of the weak classifiers are updated during the training
An advantage of MDBoost is that,  at each iteration, MDBoost solves a quadratic program while AdaBoost-CG needs to solve a general convex program
Throughout the paper, a matrix is denoted by an upper-case letter ( SYMBOL ); a column vector is denoted by a bold lower-case letter ( SYMBOL )
The  SYMBOL th row of  SYMBOL  is denoted by  SYMBOL  and the  SYMBOL th column by  SYMBOL
We use  SYMBOL  to denote the identity matrix
SYMBOL  and    SYMBOL  are column vectors of  SYMBOL 's and  SYMBOL 's, respectively
Their sizes will be clear from the context
We use  SYMBOL  to denote component-wise inequalities
The rest of the paper is structured as follows
In Section  we present the main idea
In Section  the dual of MDBoost's optimization problem is derived, which enables us to design an LPBoost-like boosting algorithm based on column generation
We provide an experimental comparison of the algorithms on UCI data in Section , and conclude the paper in Section
