大主宰之灵路天蚕土豆,有声

新聞中心

這里有您想知道的互聯(lián)網(wǎng)營銷解決方案

時(shí)間序列平滑法中邊緣數(shù)據(jù)的處理技術(shù)

我們可以用來更好地理解趨勢(或幫助模式識別/預(yù)測算法)的一種方法是時(shí)間序列平滑。以下傳統(tǒng)的方法:

移動平均線——簡單、容易、有效(但會給時(shí)間序列數(shù)據(jù)一個(gè)“滯后”的觀測)，Savitzky-Golay過濾器——有效但更復(fù)雜，它包含了有一些直觀的超參數(shù)

還有一個(gè)不太傳統(tǒng)的方法是解熱方程，它有更直觀的超參數(shù)，而且也非?？?在本文中，我們將考慮一個(gè)稍微復(fù)雜一些的方程，但它具有保存邊緣的效果。

這個(gè)方程叫做Perona-Malik PDE (偏微分方程)，它的平滑效果可以在下面的動圖中看到:

上圖是該保持邊緣平滑方法在用于于特斯拉(TSLA)在2022年的收盤價(jià)的效果。標(biāo)題中的“t=x”對應(yīng)于我們平滑級數(shù)的時(shí)間(以非維度單位)。

如果你對上面的效果感興趣，那么本文將解釋以下內(nèi)容:

Perona-Malik PDE(偏微分方程)，以及為什么要使用它
如何求解偏微分方程。
和熱方程的比較

Perona-Malik PDE

下面是將要處理的方程公式:

Perona-Malik PDE。式中u是我們要平滑的時(shí)間序列，α是控制邊保的參數(shù)(α越小對應(yīng)的邊保越多)。

看著有點(diǎn)復(fù)雜，我們繼續(xù)解釋。當(dāng)我們試圖解決這個(gè)PDE的原始形式時(shí)，它會導(dǎo)致一些問題;所以我們要考慮一種修改后的形式:

基本上，函數(shù)g的內(nèi)部進(jìn)行了一次高斯函數(shù)卷積(也就是說，它變得更平滑了)。這被稱為正則化，我們只要知道它是可解的就可以了

這個(gè)一個(gè)可怕的等式比上面更復(fù)雜了，但是這我們沒有多個(gè)空間維度，我們在平滑的是一個(gè)時(shí)間序列，所以它只有一個(gè)維度。所以需要解的方程是:

這樣看著就簡單多了，至少所有的字母我們都認(rèn)識了，但是如果想了解細(xì)節(jié)，可以通過擴(kuò)展和簡化得到上面的公式進(jìn)行推導(dǎo)，雖然不建議這么做，但是如果你喜歡的話那隨意。

我們剛提到處理的時(shí)間序列是一維的，但是為什么偏微分方程是二維的?

這個(gè)偏微分方程是根據(jù)時(shí)間來求解的。從本質(zhì)上講時(shí)間上的每一步都使數(shù)據(jù)進(jìn)一步平滑。所以t越大，時(shí)間序列越平滑，這意味著空間變量x表示時(shí)間序列中的“時(shí)間”，后面的求解會詳細(xì)解釋。

為什么要用這個(gè)方程呢?

熱方程的問題是它不能很好地保存邊。保留這些邊緣來捕捉價(jià)格的大幅快速波動可能是可取的，但要去除任何小但高頻的噪聲。

這種方法比熱方程更難，因?yàn)镻erona-Malik PDE是非線性的(不像熱方程是線性的)。一般來說，非線性方程不像線性方程那么容易求解。

如何求解這個(gè)偏微分方程

我們將使用一種稱為有限差分（finite differences）的方法。它是一種求偏微分（或常微分）方程和方程組定解問題的數(shù)值解的方法。你可以將其視為每次我們在下圖中遇到交集時(shí)找到解決方案：

隨著時(shí)間和空間分裂成離散間隔的圖示。這里空間中的離散區(qū)間是從 [0, 1] 開始的，時(shí)間上的離散區(qū)間是從 t=0 到 t=sk，其中 s 是我們獲取的區(qū)間。線的交點(diǎn)是我們找到偏微分方程解的位置。

在處理數(shù)字之前，我們需要用數(shù)學(xué)方法來定義整個(gè)問題。由于方程在空間上是二階的，在時(shí)間上是一階的，所以需要兩個(gè)邊界條件和一個(gè)初始條件:

我們將求解以平滑時(shí)間序列的方程組（這個(gè)方程看起來比代碼復(fù)雜得多?。?，我們的起點(diǎn)是股票價(jià)格時(shí)間序列，并且終點(diǎn)總是具有相同的價(jià)格。

那么我們?nèi)绾螐臄?shù)值上開始求解呢?我們最初的方法是用這些導(dǎo)數(shù)的有限差分近似，Perona-Malik PDE中導(dǎo)數(shù)的近似值，這些導(dǎo)數(shù)的推導(dǎo)超出了本文的范圍，所以就不詳細(xì)寫了。

上面公式中，h和k分別是空間和時(shí)間離散點(diǎn)之間的距離。這里可以使用相同的公式計(jì)算 c 的導(dǎo)數(shù)。

在我們的問題中，空間中的離散點(diǎn)被分開一天，所以 h = 1。建議使用 k < 0.1 的值來保持有限差分公式的一些準(zhǔn)確性。因?yàn)槿绻鹝值太大，則有限差分方案會變得不穩(wěn)定。

我們現(xiàn)在需要將這些近似值放入偏微分方程中……這會讓公式看起來更加復(fù)雜，這是我的計(jì)算代數(shù)軟件給出的結(jié)果:

這就是Perona-Malik PDE的離散形式，越來越復(fù)雜了。有沒有更好的方法呢？

我們可以偷懶并使用微分矩陣。因?yàn)闀r(shí)間序列是一組離散點(diǎn)，所以可以使用矩陣向量乘積進(jìn)行微分。

如果以前從未聽說過這個(gè)，那么這里有一些代碼展示了如何使用矩陣向量積計(jì)算多項(xiàng)式的簡單導(dǎo)數(shù)：

import numpy as np
 
 import plotly.io as pio
 import plotly.graph_objects as go
 pio.renderers.default='browser'
 
 
 if __name__ == '__main__':
     
    n = 100 # The number of discrete points
    a = -4 # The interval starting point
    b = 4 # The interval ending point
     
    # The discrete points and spacing
    x = np.linspace(a, b, n)
    h = (b-a)/n
     
    # Function to differentiate, and its analytical derivative for checking
    f = 10*x**3 + x**2
    fx = 30*x**2 + 2*x
     
    # Form the differentiation matrix
    Dx = (
        np.diag(np.ones(n-1), 1)
        - np.diag(np.ones(n-1), -1)
    )/(2*h)
     
    # Calculate the numerical derivative on the interior only
    Ux = Dx[1:-1, :]@f
     
    # Plot the figure
    fig = go.Figure()
    fig.add_trace(go.Scatter(x = x, y = f, name = 'Function'))
    fig.add_trace(go.Scatter(x = x, y = fx, name = 'Analyitcal Derivative'))
    fig.add_trace(
        go.Scatter(
            x = x[1:-1],
            y = Ux,
            name='Finite Difference Derivative',
            line={'dash': 'dash'}
        )
    )
     
    fig.update_layout(
        yaxis = {'title': 'f(x), f_x(x)'},
        xaxis = {'title': 'x'},
        title = 'Finite Difference Derivative Test',
        legend = {'x': 0, 'y': 1.075, 'orientation': 'h'},
        width = 700,
        height = 700,
    )
     
    fig.show()

以上代碼應(yīng)該產(chǎn)生以下輸出:

使用矩陣向量積可以對簡單多項(xiàng)式求導(dǎo)。它本質(zhì)上是一階導(dǎo)數(shù)的有限差分逼近

已轉(zhuǎn)化為矩陣向量乘積，使用下面的代碼：

Dx = (
    np.diag(np.ones(n-1), 1) # u_{r+1, s} terms
    - np.diag(np.ones(n-1), -1) # u_{r-1, s} terms
 )/(2*h)# Calculate the numerical derivative on the interior only
 Ux = Dx[1:-1, :]@f

通過使用這種有限差分公式，我們只能找到內(nèi)部點(diǎn)的導(dǎo)數(shù)。比如在域的第一個(gè)點(diǎn) (x = r = 0) 有近似值：

雖然這是沒有意義的，因?yàn)樾枰挠?jì)算點(diǎn)在域之外。但是這對我們來說不是一個(gè)問題——因?yàn)槲覀冎唤鈨?nèi)部點(diǎn)的偏微分方程，而這些解在端點(diǎn)處是固定的。

我們使用一個(gè)簡單的小系統(tǒng)的離散方程(比如有5個(gè)離散點(diǎn))，上面的解釋可能會清晰得多。

還有最后一個(gè)問題卷積是如何執(zhí)行的?最簡單的方法是使用scipy. nimage.gaussian_filter，但是這里我選擇的方法是解熱方程，通過一點(diǎn)數(shù)學(xué)技巧，可以證明高斯卷積可以解決熱方程。換句話說，我們要解

這可以用離散形式表示為：

高斯濾波中的標(biāo)準(zhǔn)差(σ)與我們通過σ2(τ) = 2τ求解上述方程的“時(shí)間”量有關(guān)，所以，要解的時(shí)間越長，標(biāo)準(zhǔn)差越大，時(shí)間序列就越平滑。

為什么要用這種方式進(jìn)行卷積?偏微分方程到卷積的連接非常簡潔!并且因?yàn)榭梢詫⑵⒎址匠糖蠼膺壿嬘簿幋a為循環(huán)，所以將其包裝在@numba.jit裝飾器中，提高了計(jì)算效率。

Python實(shí)現(xiàn)

現(xiàn)在我們已經(jīng)在數(shù)學(xué)方面做了艱苦的工作，編碼就變得非常直接了!

卷積的實(shí)現(xiàn)如下所示:

import numpy as np
 
 def convolve_PDE(U: np.array,
                  sigma: float = 1,
                  k: float = 0.05) -> np.array:
    '''
    Perform Gaussian convolution by solving the heat equation with Neumann
    boundary conditions
    Parameters
    ----------
    U : np.array
        The array to perform convolution on.
    sigma : float, optional
        The standard deviation of the guassian convolution.
    k : float, optional
        The step-size for the finite difference scheme (keep < 0.1 for accuracy)
    Returns
    -------
    U : np.array
        The convolved function
    '''
     
    t = 0
    t_end = sigma**2/2
     
    while t < t_end:
         
        # Implementing the nuemann boundary conditions
        U[0] = 2*k*U[1] + (1-2*k)*U[0]
        U[-1] = 2*k*U[-2] + (1-2*k)*U[-1]
         
        # Scheme on the interior nodes
        U[1:-1] = k*(U[2:] + U[:-2]) + (1-2*k)*U[1:-1]
         
        t += k
     
    return U

代碼看起來就很簡單了，對吧?Perona-Malik求解器的實(shí)現(xiàn)也不復(fù)雜：

def get_diff_mat(n: int) -> Tuple[np.array, np.array]:
    '''
    Get the first and second differentiation matrices, which are truncated
    to find the derivative on the interior points.
    Parameters
    ----------
    n : int
        The total number of discrete points
    Returns
    -------
    [Dx, Dxx] : np.array
        The first and second differentiation matrices, respecitvely.
    '''
     
    Dx = (
        np.diag(np.ones(n-1), 1)
        - np.diag(np.ones(n-1), -1)
    )/2
     
    Dxx = (
        np.diag(np.ones(n-1), 1)
        - 2*np.diag(np.ones(n), 0)
        + np.diag(np.ones(n-1), -1)
    )
     
    # Truncate the matrices so that we only determine the derivative on the
    # interior points (i.e. we don't calculate the derivative on the boundary)
    return Dx[1:-1, :], Dxx[1:-1, :]
 
 
 def perona_malik_smooth(p: np.array,
                        alpha: float = 10.0,
                        k: float = 0.05,
                        t_end: float = 5.0) -> np.array:
    '''
    Solve the Gaussian convolved Perona-Malik PDE using a basic finite
    difference scheme.
    Parameters
    ----------
    p : np.array
        The price array to smoothen.
    alpha : float, optional
        A parameter to control how much the PDE resembles the heat equation,
        the perona malik PDE -> heat equation as alpha -> infinity
    k : float, optional
        The step size in time (keep < 0.1 for accuracy)
    t_end : float, optional
        When to termininate the algorithm the larger the t_end, the smoother
        the series
    Returns
    -------
    U : np.array
        The Perona-Malik smoothened time series
    '''
     
    Dx, Dxx = get_diff_mat(p.shape[0])
     
    U = deepcopy(p)
    t = 0
 
    while t < t_end:
         
        # Find the convolution of U with the guassian, this ensures that the
        # PDE problem is well posed
        C = convolve_PDE(deepcopy(U))
         
        # Determine the derivatives by using matrix multiplication
        Cx = Dx@C
        Cxx = Dxx@C
         
        Ux = Dx@U
        Uxx = Dxx@U
         
        # Find the spatial component of the PDE
        PDE_space = (
            alpha*Uxx/(alpha + Cx**2)
            - 2*alpha*Ux*Cx*Cxx/(alpha + Cx**2)**2
        )
         
        # Solve the PDE for the next time-step
        U = np.hstack((
            np.array([p[0]]),
            U[1:-1] + k*PDE_space,
            np.array([p[-1]]),
        ))
         
        t += k
         
    return U

因?yàn)槭褂昧宋⒎志仃嚕曰旧峡梢园凑瘴覀儾榭捶匠痰母袷街袑懗銎⒎址匠痰碾x散形式。這使得調(diào)試比硬編碼有限差分方程更簡單。

但是這會不會引入數(shù)據(jù)泄漏？

如果平滑一個(gè)大的時(shí)間序列，然后將該序列分割成更小的部分，那么絕對會有數(shù)據(jù)泄漏。所以最好的方法是先切碎時(shí)間序列，然后平滑每個(gè)較小的序列。這樣根本不會有數(shù)據(jù)泄露！

與熱方程的比較

到目前為止，還沒有說Perona-Malik PDE參數(shù)α。如果你取α非常大，趨于無窮，就可以將Perona-Malik PDE化簡為熱方程。

對于大的 α，基本上有一個(gè)擴(kuò)散主導(dǎo)的機(jī)制，其中邊緣保留是有限的。我們最終會得到這個(gè)方程組：

這里一維的熱方程，以及問題的適當(dāng)?shù)某跏?邊界條件。用我們的微分矩陣法可以很好很容易地寫他的代碼:

def heat_smooth(p: np.array,
                k: float = 0.05,
                t_end: float = 5.0) -> np.array:
    '''
    Solve the heat equation using a basic finite difference approach.
    Parameters
    ----------
    p : np.array
        The price array to smoothen.
    k : float, optional
        The step size in time (keep < 0.1 for accuracy)
    t_end : float, optional
        When to termininate the algorithm the larger the t_end, the smoother
        the series
    Returns
    -------
    U : np.array
        The heat equation smoothened time series
    '''
     
    # Obtain the differentiation matrices
    _, Dxx = get_diff_mat(p.shape[0])
     
    U = deepcopy(p)
    t = 0
     
    while t < t_end:
 
        U = np.hstack((
            np.array([p[0]]),
            U[1:-1] + k*Dxx@U,
            np.array([p[-1]]),
        ))
         
        t += k
         
    return U

運(yùn)行Perona-Malik PDE其中設(shè)α = 1,000,000,和熱方程,基本得到了相同的結(jié)果。

最后讓我們看看一些對比結(jié)果的動圖

Perona-Malik PDE與熱方程的比較。參數(shù)σ = 1， α = 20, k=0.05。

上圖是比較Perona-Malik、熱方程和指數(shù)移動平均方法對MSFT股價(jià)在2022年期間的時(shí)間序列數(shù)據(jù)進(jìn)行平滑處理。

總結(jié)

總的來說，Perona-Malik 方法更好一些。雖然他的數(shù)學(xué)求解要復(fù)雜的多，但它確實(shí)對數(shù)據(jù)產(chǎn)生了非常好的結(jié)果。就個(gè)人而言，建議在開發(fā)過程中同時(shí)考慮 Perona Malik 和熱方程方法，看看哪種方法可以為我們解決的問題提供更好的結(jié)果。

網(wǎng)頁名稱：時(shí)間序列平滑法中邊緣數(shù)據(jù)的處理技術(shù)
本文路徑：http://fisionsoft.com.cn/article/dpcijgi.html