.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_examples_example_detect_outliers.py>` to download the full example code
.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_example_detect_outliers.py:


Outlier detection via leave-one-out
===================================

Outliers can sometimes be identified by assessing the influence of each
datapoint. To assess the influence of one point, we fit the dataset while the
point and compare the result with the fit of the full dataset. The code below
shows how to do this with lmfit. Note that the presented method is very basic.


.. code-block:: default

    from collections import defaultdict

    import matplotlib.pyplot as plt
    import numpy as np

    import lmfit

    plt.rcParams['figure.dpi'] = 130
    plt.rcParams['figure.autolayout'] = True






Generate test data and model. Apply the model to the data


.. code-block:: default

    x = np.linspace(0.3, 10, 100)
    np.random.seed(1)
    y = 1.0 / (0.1 * x) + 2.0 + 3 * np.random.randn(x.size)

    params = lmfit.Parameters()
    params.add_many(('a', 0.1), ('b', 1))


    def func(x, a, b):
        return 1.0 / (a * x) + b


    # Make 5 points outliers
    idx = np.random.randint(0, x.size, 5)
    y[idx] += 10 * np.random.randn(idx.size)

    # Fit the data
    model = lmfit.Model(func, independent_vars=['x'])
    fit_result = model.fit(y, x=x, a=0.1, b=2)







and gives the plot and fitting results below:


.. code-block:: default


    fit_result.plot_fit()
    plt.plot(x[idx], y[idx], 'o', color='r', label='outliers')
    plt.show()
    print(fit_result.fit_report())




.. image:: /examples/images/sphx_glr_example_detect_outliers_001.png
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /Users/Newville/Codes/lmfit-py/examples/example_detect_outliers.py:46: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
      plt.show()
    [[Model]]
        Model(func)
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 12
        # data points      = 100
        # variables        = 2
        chi-square         = 1338.34458
        reduced chi-square = 13.6565773
        Akaike info crit   = 263.401856
        Bayesian info crit = 268.612196
    [[Variables]]
        a:  0.08937623 +/- 0.00590168 (6.60%) (init = 0.1)
        b:  1.51298992 +/- 0.46229147 (30.55%) (init = 2)
    [[Correlations]] (unreported correlations are < 0.100)
        C(a, b) =  0.601



Fit the dataset while omitting one data point


.. code-block:: default


    best_vals = defaultdict(lambda: np.zeros(x.size))
    stderrs = defaultdict(lambda: np.zeros(x.size))
    chi_sq = np.zeros_like(x)
    for i in range(x.size):
        idx2 = np.arange(0, x.size)
        idx2 = np.delete(idx2, i)
        tmp_x = x[idx2]
        tmp = model.fit(y[idx2],
                        x=tmp_x,
                        a=fit_result.params['a'],
                        b=fit_result.params['b'])
        chi_sq[i] = tmp.chisqr
        for p in tmp.params:
            tpar = tmp.params[p]
            best_vals[p][i] = tpar.value
            stderrs[p][i] = (tpar.stderr / fit_result.params[p].stderr)







Plot the influence on the red. chisqr of each point


.. code-block:: default


    fig, ax = plt.subplots()
    ax.plot(x, (fit_result.chisqr - chi_sq) / chi_sq)
    ax.scatter(x[idx],
               fit_result.chisqr / chi_sq[idx] - 1,
               color='r',
               label='outlier')
    ax.set_ylabel(r'Relative red. $\chi^2$ change')
    ax.set_xlabel('x')
    ax.legend()




.. image:: /examples/images/sphx_glr_example_detect_outliers_002.png
    :class: sphx-glr-single-img




Plot the influence on the parameter value and error of each point


.. code-block:: default


    fig, axs = plt.subplots(4, figsize=(4, 7), sharex='col')
    axs[0].plot(x, best_vals['a'])
    axs[0].scatter(x[idx], best_vals['a'][idx], color='r', label='outlier')
    axs[0].set_ylabel('best a')

    axs[1].plot(x, best_vals['b'])
    axs[1].scatter(x[idx], best_vals['b'][idx], color='r', label='outlier')
    axs[1].set_ylabel('best b')

    axs[2].plot(x, stderrs['a'])
    axs[2].scatter(x[idx], stderrs['a'][idx], color='r', label='outlier')
    axs[2].set_ylabel('err a change')

    axs[3].plot(x, stderrs['b'])
    axs[3].scatter(x[idx], stderrs['b'][idx], color='r', label='outlier')
    axs[3].set_ylabel('err b change')

    axs[3].set_xlabel('x')



.. image:: /examples/images/sphx_glr_example_detect_outliers_003.png
    :class: sphx-glr-single-img





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  1.518 seconds)


.. _sphx_glr_download_examples_example_detect_outliers.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download

     :download:`Download Python source code: example_detect_outliers.py <example_detect_outliers.py>`



  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: example_detect_outliers.ipynb <example_detect_outliers.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
