## RESULTS

We focus on the Cascadia subduction using the slip history on the megathrust derived from the inversion of geodetic records over more than 10 years, from 2007.0 to 2017.632 (*20*). SSEs in Cascadia show a first-order segmentation consisting of 13 major segments, which produced repeating ruptures over the studied time period (*18*) (Fig. 1). We low-pass filter the slip time series to calculate the slip rate, and then integrate it over the area of a given segment to obtain the slip potency rate time series for all the 13 segments

; Fig. 2, figs. S1 to S7, and the “Slip potency rate” and “Filters” sections]. The results shown here were obtained using the filter adopted by (*20*), i.e., an equiripple filter (EF) with normalized passband and stopband frequencies of 35^{−1} and 21^{−1}

, passband ripple of 1 dB, and stopband attenuation of 60 dB. The type and characteristics of the filter have a limited influence on our results (“Filters” and “Surrogate data” sections). We resort to the embedding theory (ET) (*21*) and the extreme value theory (EVT) (*22*) to study the dynamics of the system (see Methods). The system that we study is limited to the slow slipping belt on the megathrust, at a depth of about 30 to 40 km (Fig. 1). We treat it as an isolated dynamical system, thus neglecting the potential interactions with other parts either of the megathrust or of the surrounding medium. Since we are focusing on the slow slipping region, it follows that we are dealing with only a part of the whole phase space. The phase space is that space where all possible states of the system can be represented, i.e., that space having for axes all the variables needed to explain the dynamics of the system. Because of the assumption of isolated system, we refer to the region representing the slow slipping phenomenon as the phase space, and to the trajectories in it, which represent how the state of the dynamical system evolves with time, as the attractor of the system. The slip rate (and thus the slip potency rate, which is directly proportional to it) is considered to be the main observable variable governing the evolution of friction in laboratory experiments (*5*). To apply ET and EVT, we need the temporal evolution of at least one of the variables needed to explain the system’s dynamics, and the slip potency rate is therefore a good candidate. It shows episodic bursts with the most extreme events occurring quite regularly (Fig. 2 and figs. S1 to S7).

A quantity of interest to characterize a dynamical system is its dimension, a geometrical property of the underlying attractor. There are multiple definitions of an attractor’s dimension, but they should agree in telling how many effective degrees of freedom (dof) are needed to explain the dynamical system (*23*). The ET methods indicate an embedding dimension *m* ≃ 10 for all the segments, consisting in a correlation dimension of the attractor ν ≲ 5 (figs. S8 to S12 and “Embedding theory” section). The application of EVT for the calculation of the average attractor dimension (*D*) (see “Extreme value theory” section) confirms low average dimension, for which we get a value <4 for all the segments (3.1 < *D* < 3.5; green dots in Fig. 3). However, the signal-to-noise ratio (SNR) is high enough only for segments located in the northern section of the megathrust (segments #1 and #2). This is evident when surrogate data (*24*) are used to test the significance of the observed low dimensionality of the system (Fig. 3, “Surrogate data” section, and figs. S13 and S14).

The dimension, *D*, being an average in the phase space, does not capture the information about transient instabilities. Recent advances in EVT applied to dynamical systems theory have proven that it is possible to characterize these transient instabilities via two instantaneous properties (*25*): the instantaneous dimension (*d*) and the instantaneous extremal index (θ). These quantities refer to the state of the system in a given location of the phase space and, for this reason, are also referred as local. As in (*25*), here we name them “instantaneous,” because they depend on the specific time, and we want to avoid confusion with “local” in a geographical sense. Their distribution and temporal evolution are shown in figs. S15 to S19. The instantaneous dimension, *d*, indicates the number of variables needed to explain the dynamics of the system in a specific phase space location. We expect to find high values of *d* when a metastable state is approached. In this situation, the segment is transitioning from a stable to an unstable regime, or vice versa, and more variables are then needed to explain the state of the system. This is apparent from Fig. 4A, where a section of the attractor is plotted with *d* color coded, with high values of *d* marking the onset and end of large SSEs.

The instantaneous extremal index (θ) measures the inverse of the average persistence time around a given state in a region of the phase space (Fig. 4B). In this sense, it is a local and instantaneous quantity. The extremal index (Θ), a parameter in the range [0,1], measures the degree of clustering of extremes in a stationary process

$[{\stackrel{\xb7}{P}}_{s}(t)$ in our case] and can be defined as the reciprocal of the mean cluster size (*26*). A relationship between Θ and the metric entropy *H* of a system has been recently demonstrated (*27*)

(1)

The relationship between Θ and θ is not as straightforward as the one between *D* and *d*, but we must have Θ ∈ [θ_{min}, θ_{max}]. We can, thus, deduce a range of values for *H*. Given the values of θ that we have found (figs. S15 to S19) and the fact that the metric entropy is equal to the sum of all the positive Lyapunov exponents, we deduce that there is at least one positive Lyapunov exponent in our system. The positive Lyapunov exponents tell us how rapidly two trajectories in the phase space exponentially diverge. We can thus derive a predictability time *t** as the inverse of the metric entropy, and we find *t** ranging from a few days to about 2 months (Table 1). A similar conclusion, with predictability horizons of the order of days to weeks, is reached by applying nonlinear forecasting analysis (NFA) techniques (based on ET, see Methods) to estimate *H* (figs. S20 and S21) (*28*–*29*). The NFA provides only a lower bound estimate of *H* (*29*). It follows that the NFA estimate of *t** is an upper bound, which, for all the segments, is in the range estimated from EVT (Table 1).

## DISCUSSION

According to the EVT, the dynamics revealed by the filtered data requires at least four variables, which is the first integer above the fractal value of *D*. The description of stick-slip friction in the laboratory can be schematized by a spring-slider system and, in a fully inertial model, must involve slip, slip rate, and at least one additional “state” variable to allow fault healing after slip events so that they can repeat. The underlying physics and the number of state variables needed to explain laboratory friction remain matters of debate (*30*). It is important to clarify here the difference between the state variables coming from the rate-and-state description of friction (RS-state variables) and the state variables that determine the state of the system. Typically, a point in the phase space is defined by a state vector, with elements given by the variables needed to explain the dynamics of the system. An RS-state variable is only one among all the state variables belonging to a state vector.

In the absence of coupling with other spring-slider systems, the expected number of dof is 2 (because the system is described by a second-order differential equation) plus the number of RS-state variables, and the dimension should be between dof-1 and dof. For a noninertial single spring-slider, adding a second RS-state variable is a viable way to enact chaotic behavior, and a Kaplan-Yorke dimension of 2.119 ± 0.001 has been derived (*8*), consistent with the fact that we can drop 1 dof because of the noninertial assumption. A one-dimensional, fluid-infiltrated, rate-and-state friction spring-slider has 5 dof, i.e., its dynamics can be described by five variables: loading shear stress (which is proportional to slip in case of constant loading rate), slip rate, one RS-state variable (potentially related to the effective contact area) (*31*), porosity, and pore pressure (*32*). To describe SSEs, we can assume a quasi-static condition, and under this hypothesis, the number of state variables drops to four, consistent with the number of dof recovered from our analysis in Cascadia where pressurized fluid has been invoked to justify SSEs’ existence (*33*).

In a coupled system, the loading term would also depend on the variables associated with neighboring fault segments, which need to be introduced in the state vector to describe the system’s evolution. In the case of a one-dimensional system, similar to the one observed in Cascadia with along-strike segmentation, we would expect dimensions as high as 12, with the extra 8 dof carried in by the two adjacent segments. The neighbors’ effect may be particularly relevant in starting or stopping a slipping event. The maximum retrieved instantaneous dimension is in the range between 11 and 18 for the segments with high SNR. Considering the first-order segmentation here adopted and the available noisy data, we consider this result in good agreement with the maximum expected dimension of 12, as in a one-dimensional spring-and-slider chain. Potentially, an along-dip segmentation would also increase the number of neighbors, i.e., the number of dof. The proposed description considers only short-range elastic interactions, like in the Burridge-Knopoff model. Long-range interactions (i.e., interactions with segments further than the adjacent neighbors) would further increase the instantaneous dimension of the attractor.

Since the filter parameters may affect our findings, we perform the same analyses on the unfiltered time series as well as on time series filtered with various parameters and different filters. The surrogate data test on unfiltered time series suggests that we cannot reject the null hypothesis for which data can be described via a linear stochastic model (fig. S22). Furthermore, for unfiltered time series, we find extremal index values close to 1, with predictability horizons smaller than the data sampling rate, indicating a random system (fig. S26 for an example relative to segment #1). A similar conclusion would be reached by ET-based techniques (figs. S20, S21, and S27). This shows that, if not filtered, the noise in the data is dominating, masking the SSE dynamical structure. A more detailed discussion of the filter effects (and the tested filters) is reported in the “Filter” and “Surrogate data” sections. We stress here that filters applied to pure noise can potentially produce the spurious identification of finite correlation dimension (*34*), and for this reason, we further test the effects of the filters on the predictability of the time series generating (pseudo-)random time series and applying the same filters to them. The number of generated random time series is equal to the number of subfaults in segment #1, which is the segment with the largest number of subfaults. We then generate surrogate data and calculate the average dimension on both the filtered random time series and the surrogate data. The result shows that we would not be able to reject the null hypothesis according to which the time series were generated by a random process (fig. S28). We thus consider unlikely that the filters here adopted are introducing an apparent chaotic dynamics, and we notice that, for the cases that pass the surrogate data test, *D* is typically between 3 and 4 for all segments, independently of the chosen filter parameters (figs. S22 and S23).

Not all segments pass the surrogate data test. We conclude that SSEs on the northernmost segments of the Cascadia subduction zone (segments #1 and #2) result from deterministic chaos, while the central part (segments #3 to #5) shows ambiguous results, and for the southernmost portion of the fault (segments #6 to #13), we cannot infer deterministic chaos with the data at our disposal (figs. S13, S14, and S22 to S25). Our analysis implies that it should be possible to forecast the onset of large SSE ahead of time based on an explicit deterministic representation of the system dynamics or some machine learning algorithm that would implicitly capture it. Along that line, it is notable that the increase in instantaneous dimensionality seems to constitute a reliable precursor of the large SSEs (segments #1 and #2; Fig. 2 and fig. S1). The causal filter adopted here introduces a group delay larger than the predictability horizon time, implying that this approach cannot be used for real-time forecasting. Alternative noise reduction or forecasting techniques need to be investigated for this purpose, or more accurate data would be needed to avoid the filtering step.

In conclusion, we have shown that continuous spatiotemporal models of SSE evolution (*20*) provide insight into fault dynamics because, where the SNR is sufficiently high, we have shown that SSEs in Cascadia manifest chaotic dynamics. In other words, SSEs can be described as a deterministic, albeit chaotic system rather than as a random or stochastic process. The estimated predictability horizon is of the order of days or weeks that is equivalent to a fraction of the typical large SSE duration in Cascadia, which is the representative time scale of the instability. If the dynamics derived from the filtered time series is representative of the true underlying dynamics, then long-term prediction of SSEs (i.e., over a time horizon much longer than their duration) seems intrinsically impossible. This implies that for long-term predictions, a stochastic approach remains the best tool at our disposal, while short-term deterministic predictions should be achievable. As SSEs might be regarded as earthquakes in slow motion (*18*), regular earthquakes might be similarly chaotic and predictable. If the relation between predictability horizon and event duration holds also for regular earthquakes, then this would imply that long-term predictions of earthquakes are intrinsically impossible, and the predictable horizon would be only a fraction of the regular earthquakes’ typical duration (10 to 100 s for *M*_{w} > 6 earthquakes). Furthermore, it might be possible to augment the capabilities of early warning systems with a forecast of the final magnitude while the event is still ongoing.

## METHODS

### Slip potency rate

Given a subfault (*n*) of a given segment (*s* = 1, …,13), we apply a low-pass filter to the slip history

from (*20*) to obtain a smoothed slip history [*u*_{sn}(*t*)]. Since chaotic systems are extremely sensitive to initial conditions, we test several low-pass filters (details are provided in the “Filters” section). We then multiply the smoothed slip history by the area of each subfault (*A*_{sn}), getting the slip potency for each subfault

(2)

The total slip potency for the *s*-th segment is calculated as the slip potency integral over the entire slipping area. In a discretized case where there are *N _{s}* subfaults belonging to the

*s*-th segment, the total slip potency is calculated as

(3)

We also construct slip potency rate curves for each subfault

$${\stackrel{\xb7}{p}}_{\text{sn}}(t)={A}_{\text{sn}}({v}_{0\text{sn}}+{\stackrel{\xb7}{u}}_{\text{sn}}(t))$$(4)and the total slip potency rate for each segment

$${\stackrel{\xb7}{P}}_{s}(t)={\displaystyle \sum _{n=1}^{{N}_{s}}}{A}_{\text{sn}}({v}_{0\text{sn}}+{\stackrel{\xb7}{u}}_{\text{sn}}(t))$$(5)where *v*_{0sn} is the long-term reference loading velocity for a given patch belonging to a specific segment as derived from (*20*).

The choice of the variable to use for the dynamics reconstruction is important because it can bias the correlation dimension estimation. In particular, the correlation dimension can be lowered when using a variable strongly correlated only with a few variables of the system (*35*). Having in mind the rate-and-state formalism for friction (*5*), we hypothesize that the slip rate, and thus the slip potency rate, is not only more strongly coupled than the slip potency to the other variables of the system, but it is also coupled with more variables (e.g., the state variables). We find smaller values for the correlation dimension of the attractor ν when using the slip potency as observable (figs. S8 to S12 and S29 to S33 and “Embedding theory” section), confirming our intuition that slip potency rate is a more suitable observable.

### Filters

The effect of low-pass filters on dynamical systems’ dimension estimation has been extensively studied. Ideal linear low-pass filters introduce further equations in the dynamical system, leading to an increase in dimensionality (*36*). Finite order nonrecursive filters instead do not modify quantities estimated via ET, but this is not strictly proven for practical cases where finite time series are available (*37*). Proper low-pass filtering data can improve the dynamical system characterization (*38*), and we thus restrain ourselves to finite order nonrecursive filters, which implies that the filtered value *u*_{sn}(*t*) is obtained from the current and past *N _{b}* nonfiltered values,

, according to

$${u}_{\text{sn}}(t)={\displaystyle \sum _{i=0}^{{N}_{b}}}{b}_{i}{u}_{\text{sn}}^{\text{rough}}(t-i\mathrm{d}t)$$(6)where d*t* is the data sampling rate, i.e., 1 day. First, we test an EF with fixed passband ripple and stopband attenuation of 1 and 60 dB, respectively, and we test 10 combinations of normalized stopband and passband frequencies:

and

${\left\{{f}_{i}^{\text{stopband}}\right\}}_{i=1}^{10}={(7i)}^{-1}$. We follow the MATLAB notation for which the normalized frequency is unitary when it is equal to the Nyquist frequency. The filters are designed using the built-in function designfilt, from which we extract the corresponding transfer function coefficients

${\{{b}_{i}\}}_{i=0}^{{N}_{b}}$ via the function tf. The coefficients are then used by the function filter to generate the smoothed signal *u _{sn}*(

*t*). With the tested combinations, the order of the filter (i.e.,

*N*) takes the following values when increasing

_{b}*i*: 48, 125, 230, 363, 484, 664, 871, 1106, 1368, and 1658. For each designed filter, we show the filter response in figs. S34 and S35. The results shown in the main text are those obtained using an EF and the parameters adopted in (

*20*) to automatically pick 64 SSEs from the dataset here studied, i.e.,

and

${f}_{i}^{\text{stopband}}={21}^{-1}$. We test also a Hamming Window filter (HWF; which is a weighted moving average filter), commonly used in signal processing, which offers an easier way to tune the filter order (*N _{b}*) and its normalized frequency cutoff (

*f*

_{cutoff}). We test both a variable

*f*

_{cutoff}fixing

*N*to 60 (i.e., we use 2 months of data before the current epoch to smooth the signal; figs. S36 and S37) and a variable

_{b}*N*fixing

_{b}*f*

_{cutoff}to 28

^{−1}(figs. S38 and S39). In the first case, we vary

, while in the second case, we vary

${\left\{{N}_{bi}\right\}}_{i=1}^{10}=7i-1$, i.e., we use data of epoch *t* and the 6 epochs before, 13 epochs before, and so on until using a total of 10 weeks of data. We use the MATLAB built-in function fir1, which adopts a normalized gain of the filter at *f*_{cutoff} of −6 dB. We observe that reducing the cutoff frequency lets the underlying dynamics emerge, and we consistently observe a finite fractal dimension significantly smaller than what would be calculated from surrogate data (fig. S22). We refer the reader to the “Surrogate data” section for more details.

All tested filters have a constant group delay (figs. S34 to S39). The information on the group delay is relevant when discussing about the predictability horizon of the system (see Discussion). The effects of the EF filters on the time series are shown in figs. S40 to S44. For readability, the filtered time series have been shifted back in time of the corresponding group delay to be in phase with the original time series. To test whether there is any difference between causally and noncausally filtered time series, for the filter

${\text{EF}}_{1/35}^{1/21}$, we also construct a noncausal filter, knowing that noncausal filters cannot be adopted for real-time applications, because they use information from future epochs. We use the function FiltFiltM (https://www.mathworks.com/matlabcentral/fileexchange/32261-filterm) to generate zero-phase shift (i.e., noncausal) filtered time series, and we find similar results to those obtained with a causal filter (figs. S13, S26, and S28). In all tested cases, we retrieve similar average dimensions, lower than those determined from nonfiltered time series (fig. S45). The application of a low-pass filter reduces the dimensionality of the system from values typically >6 to values <4.

### Embedding theory

We apply two methodologies based on ET to determine the correlation dimension of the strange attractor, and the NFA, also based on ET, to estimate the metric entropy associated with the time series.

The correlation dimension is defined as (*39*)

(7)where *C*(*r*, *T*) is the correlation integral, *r* is a variable threshold distance, *T* is the time series length, and we use the base-10 logarithm Log.

The construction of the correlation integral typically involves two parameters: the delay time (τ) and the embedding dimension (*m*). Given a scalar time series *x*(*t*), for example

, its values are used to construct an *m*-dimensional vector delaying in time the time series of an amount τ for *m* − 1 times

(8)

We can now define the correlation integral as (*40*)

(9)where we introduce a cutoff parameter *w* > 1 to improve the convergence of the classical algorithm (*w* = 1) toward its limit *T* → ∞, and H is the Heaviside function. Values of *w*~τ_{corr} are recommended, where τ_{corr} is the autocorrelation time for the specific scalar time series under examination (*40*). We calculate τ_{corr} using the batch means method (*41*).

Basically, the algorithm counts how many *m*-dimensional vectors are closer than *r* at different times. If *m* ≥ 2ν + 1, then **F** is an embedding function of the strange attractor (*42*), and we expect the correlation dimension to be independent of *m*. Figures S8 to S12 and S29 to S33 show the ν versus Log(*r*) curves for different *m*. The selected delay times τ are chosen to emphasize the scaling region. For every segment, we have tested values of τ = 2*i* + 1 days, with *i* from 1 to 6, and *m* = 4*j*, with *j* from 1 to 5. In the case of a stochastic signal, we might expect that a scaling relationship between *C*(*r*) and *r* does not hold, and larger values of ν are calculated when increasing *m*, i.e., ν does not saturate. Nonetheless, autocorrelated noise in short time series can fool the described algorithm (*24*). For this reason, surrogate data are typically introduced, but instead of evaluating the plateau for ν, which is quite subjective, we prefer to apply the methodology from EVT for this calculation. Given the time delays retrieved from ET (see figs. S8 to S12), one accurate slip rate data per week might be sufficient to reconstruct an embedding for SSEs.

The second methodology derived from ET needs only the definition of the delay time to determine the minimum embedding dimension (*43*). The algorithm still uses the embedding function **F**, but it exploits the nearest neighbors counting in the reconstructed phase space to detect false neighbors when changing the embedding dimension *m*.

The following two quantities are defined

$$E(m)=\frac{1}{T-m\mathrm{\tau}}{\displaystyle \sum _{t=1}^{T-m\mathrm{\tau}}}\frac{\Vert \mathbf{F}(t,\mathrm{\tau},m+1)-{\mathbf{F}}_{n(t,m)}(t,\mathrm{\tau},m+1)\Vert}{\Vert \mathbf{F}(t,\mathrm{\tau},m)-{\mathbf{F}}_{n(t,m)}(t,\mathrm{\tau},m)\Vert}$$(10)

$${E}^{*}(m)=\frac{1}{T-m\mathrm{\tau}}{\displaystyle \sum _{t=1}^{T-m\mathrm{\tau}}}\Vert x(t+m\mathrm{\tau})-{x}_{n(t,m)}(t+m\mathrm{\tau})\Vert $$(11)where 1 ≤ *n*(*t*, *m*) ≤ *T* − *m*τ is an integer such that **F**_{n(t, m)}(*t*, τ, *m*) is the nearest neighbor of **F**(*t*, τ, *m*). From these two quantities, the ratio at two subsequent embedding dimensions is calculated

(12)

$${E}_{2}(m)=\frac{{E}^{*}(m+1)}{{E}^{*}(m)}$$(13)

The quantity *E*_{1} is relevant because if the studied time series is the result of a dynamic process, i.e., it comes from an attractor, then *E*_{1}(*m*) saturates (*43*). In other words, we reach a value

such that *E*_{1} stops changing increasing *m* above

. The minimum embedding dimension will then be

$\widehat{m}+1$. The second quantity, *E*_{2}, is introduced to check for randomness in the data. In theory, for stochastic time series, *E*_{1} should never saturate, but in practical cases, it can be unclear whether *E*_{1} is slowly increasing with increasing *m* or not. If the time series of interest is the result of a stochastic process, then we expect future data points to be independent from the previous ones. This means that *E*_{2}(*m*) = 1 for each *m*. If the studied time series is instead the result of a deterministic process, then *E*_{2} is not constant, and there must exist some *m* values such that *E*_{2}(*m*) ≠ 1.

We show in fig. S27 the results on both causally filtered (i.e., present values depend only on the past and the present, such that the statistic *E*_{2} can be used) with

and nonfiltered time series for the example of segment #1. A minimum embedding dimension *m* ≃ 10 is detected from filtered data, implying a correlation dimension ν ≤ (*m* − 1)/2 ≲ 4.5, while for nonfiltered data, *E*_{2} remains almost constant at unitary values. This result is consistent with the results from EVT, for which a θ close to 1 is calculated for the nonfiltered time series (e.g., fig. S26).

Another quantity of interest to characterize a chaotic system is its metric entropy *H*. NFA also works using nearest neighbors in the phase space, and it is a powerful approach not only to estimate *H* from the short time series but also to assess the chaotic or stochastic nature of the time series under study (*28*–*29*). We apply here a first-order NFA approach (*28*). Being based on ET, we first embed a time series in a phase space of dimension *m* using delay coordinates with delay time τ. The idea then is to use the first half of the time series (training set) to learn a nonlinear map to make a prediction *T _{p}* time steps in the future using a local linear approximation in the phase space. We then use the second half of the data (test set) to test the learned prediction and assess the chaoticity of the time series. In practice, the following are the steps of the procedure:

1)Embed the time series *x*(*t*) like in Eq. 8 to create the observation **F**(*t*, τ, *m*) at time *t*.

2)For a given point in the test set **F**(*t*, τ, *m*), find the *k* nearest neighbors

belonging to the training set. [We must have at least *k* = *m* + 1, but, as recommended in (*28*), to ensure stability of the solution, it is better to take *k* > *m* + 1. Here, we use *k* = 2*m* + 1.]

3)Use the *k* nearest neighbors in the training set [for which *x*(*t* + *T*_{p}) is known] to find the best local linear predictor *A*.

4)Use *A* to predict the value of *x* at time *t* + *T _{p}* in the test set:

.

5)Compare *x*(*t* + *T _{p}*) and

in the test set to estimate the root mean square error

$(\text{RMSE})={\u3008{[\widehat{x}(t,{T}_{p})-x(t+{T}_{p})]}^{2}\u3009}^{1/2}$that we use as a measure of predictive error.

6)Normalize the prediction error by the root mean square deviation (RMSD) of the data in the test set:

$\mathrm{\u03f5}=\frac{\text{RMSE}}{\text{RMSD}}$, with RMSD = 〈[ *x*(*t*) − 〈*x*(*t*)〉]^{2}〉^{1/2}.

If ϵ = 0, then the prediction is perfect; if instead ϵ = 1, then the prediction is as good as a constant predictor

$\widehat{x}(t,{T}_{p})=\u3008x(t)\u3009$. For a stochastic system, we expect ϵ to be independent of *T _{p}*, while for a chaotic system, we expect ϵ to increase when increasing

*T*. We have tested various embedding dimensions

_{p}*m*and delay times τ, obtaining similar results. In figs. S20 and S21, we show the results relative to

*m*= 9 (i.e., we assume an average dimension between 3 and 4 for the underlying attractor, as estimated via EVT) and τ = 7 days. We show the evolution of ϵ versus

*T*for all the segments using the same filter adopted in (

_{p}*20*) (fig. S20), and for segment #1 varying the filter parameters (fig. S21). The black lines indicate the value calculated for the unfiltered time series, while the blue and red lines correspond to ϵ for the various segments or filter parameters. We find that for the unfiltered time series, ϵ ≈ 1, confirming the dominant role of stochastic noise, while for sufficiently effective filters, ϵ increases with

*T*, as expected for a chaotic system. These graphs can be used to estimate the metric entropy

_{p}*H*using the following equation, valid in the limit ϵ ≪ 1

(14)where *N* is the number of data points and *D* is the (average) attractor dimension (for which we use the values estimated from EVT). It follows that

(15)and *C* and *H* are estimated via least square using points for which ϵ ≤ 0.3 (green dashed line in figs. S20 and S21) to fit Eq. 15.

Another estimate of *H*, which does not depend on a particular forecasting technique, can be obtained using the predictions to calculate the rate of loss of information from the time series (*29*). We calculate the correlation coefficient (ρ) between

and *x*(*t* + *T _{p}*) in the testing set, which will be a function of

*T*. After some approximations, the following relation can be derived (

_{p}*29*)

(16)from which we can estimate *c* and *H*. The number of points to use for the estimate is not strictly defined. Previous studies have used 2 (*29*) or 6 (*44*) points for example. The resulting values of *t** for all segments using the same EF adopted in the main text are reported also in Table 1.

### Extreme value theory

We use recent results of EVT to overcome some of the issues encountered when using ET algorithms. In particular, we would like (i) a method to rigorously calculate a particular statistic (for example, the attractor’s dimension) and, consequently, test whether the calculated quantity is the result of a random process or not, and (ii) a method to calculate also instantaneous properties of the attractor. The main idea behind the usage of EVT for the characterization of a dynamical system is to connect the statistics of extreme events to the Poincaré recurrence theorem (*25*, *45*).

Let us consider a generic point ζ on a strange attractor. The instantaneous dimension is a quantity that measures the density of neighbors in the phase space around ζ. To calculate this density, we can ask ourselves the following question: What is the probability to visit again a region of the phase space close to (i.e., in an arbitrary small radius ε from) ζ? If we had access to all the possible states of the system (*z*), then we could calculate the distances from the actual state under study, δ(*z*, ζ). Then, we would like to construct a random variable related to δ(*z*, ζ) and use the second theorem of EVT (*46*) to be able to gain information about the density of neighbors around ζ. The second theorem of EVT states that, given a random variable *Z* with nonvanishing probability distribution, we can set a threshold value *q* such that, for *q* sufficiently large, values of *Z* that exceed *q* (or exceedances) follow a generalized Pareto distribution (GPD).

For real case scenarios, we might only have one scalar time series, and here, we consider the univariate case. Similarly to what is done to characterize atmospheric flows (*25*), we assume that our observed scalar time series (i.e., the slip potency rate) approximates possible states of the system. The only requirement to apply this methodology is the observed time series to be sampled from an underlying ergodic system. A possible improvement would be to consider a multivariate case with slip potency rates and/or slip potencies from adjacent segments, but we defer this complication to future investigations. For our case of interest, to generate the pool of all possible *z*, we consider the slip potency rates of all subfaults belonging to the segment under examination (Eq. 4). We then select ζ to be equal to the slip potency rate at a certain epoch *t* and calculate the distance from all other possible states that we have recorded at different times. Recent studies (*45*, *47*) have shown that if we construct the random variable given by the negative log distances, *Z* = − ln (δ(*z*, ζ)), then we can use the GPD shape parameter (σ) to calculate the instantaneous dimension as

(17)

This assumes that the exceedances follow a GPD. We visually inspected the Q-Q plots of the exceedances versus a GPD, and discrepancies between the observed quantiles and those predicted by a GPD are noticed especially in the high quantile of the distribution. This probably reflects the fact that not many extreme events have been observed and a longer slip history may provide better results. Repeating this calculation for different values of ζ extracted from the pool of values *z*, we can then estimate the attractor dimension by simply averaging over *d*: *D* = 〈*d*(ζ)〉. This is a very powerful result for two reasons. We can calculate the attractor dimension (i) without the need of an embedding and (ii) simply setting one parameter: a threshold *q* on the negative log distances. Here, we have used *q* = 0.98 percentile and *q* = 0.99 percentile of the negative log distance, and we have tested both L-1 and L-2 norm distances. The results are overall similar (green dots in figs. S13 and S14). We can now apply the method of surrogate data to each segment and compare the reconstructed attractor dimensions with the one calculated from real data. We refer to the “Surrogate data” section for more details. We notice that using a threshold *q* = 0.99 reduces the average dimension to values <3. The threshold quantile *q* does not affect the conclusion that a chaotic deterministic dynamics can be successfully detected only for the northernmost segments. It affects though the interpretation of the total number of dof needed to explain the system. Following the spring-and-slider interpretation, if 3 < *D* < 4, then a second RS-state variable may be needed (see Discussion). Longer time series will help to resolve this ambiguity, since the value of *q* = 0.99 may be too high for the amount of available data.

Another quantity of interest that can be derived is the instantaneous extremal index of the system θ(ζ). In the main text, we have described the extremal index (Θ) as the reciprocal of the mean cluster size (*26*). The instantaneous extremal index can instead be defined as the inverse of the persistence in the phase space, where the persistence tells us how long the trajectory sticks in the proximity of a certain point in the phase space. In other words, while *d* is related to the density of points in a certain neighborhood of the phase space, i.e., how many times a certain region of the phase space is visited, the persistence time indicates for how long the system stays in a region in the neighborhood of a given state. If a state ζ is a fixed point, then we expect an infinite persistence, i.e., null θ. On the other hand, if we are studying a stochastic process, then we expect the persistence to tend to 0, i.e., unitary θ. In other words, we expect that at a certain epoch *t*, the system will be in a state ζ, and then at a subsequent epoch *t* + Δ*t* to be in a region of the phase space far from the one occupied at time *t*. Looking at the extremal index from this angle gives us the intuition that it can be related to the predictability of the system. As mentioned in the main text, in a recent study (*27*) this connection has been made explicit (see Eq. 1).

We calculate θ via a maximum likelihood estimator (*48*). When we look at the calculated θ (figs. S15 to S19 for noncausal filter and fig. S26 for causal and noncausal filter relative to segment #1), we see that both causally and noncausally filtered data show values far from 1. This already indicates that, no matter the causality or not of the filter, the system at the selected frequencies shows predictability features. We observe a different situation when performing the analysis on nonfiltered data (fig. S26). The values of θ are now very close to 1, implying a predictability horizon shorter than the sampling time. This is consistent with the fact that, in such nonfiltered slip potency rate time series, the high-frequency noise dominates, which is a random process.

### Surrogate data

The concept behind surrogate data techniques is rooted in statistical hypothesis testing. The method requires to state a null hypothesis, and, using the words in (*24*), “The idea is to test the original time series *against* the null hypothesis by checking whether the discriminating statistic computed for the original time series differs significantly from the statistics computed for each of the surrogate sets.” The necessity to use surrogate data derives from the intrinsic finiteness of the available data: It is always possible to generate the observations with a particular random process.

The null hypothesis that we test consists in assuming that the data can be described via a linear stochastic model. Surrogate data should be generated before any filtering (*49*); thus, we first generate the surrogate data from the original slip potency time series, then filter both the actual and surrogate data, and lastly calculate the slip potency rate. Since we are using slip potency rates on multiple subfaults, when shuffling the signal phases, we want to preserve not only the autocorrelation of each slip potency rate time series but also the cross correlations between subfaults, and we thus use a generalization of the phase-randomized Fourier transform algorithm (*50*). Despite the fact that filtering should not compromise the actual chaotic nature of the system (*37*), the estimate of *D* might depend on the applied filter, and we consequently test the method on both causally and noncausally filtered time series for the

(figs. S13 and S14). When reducing the passband and stopband frequencies of the EF (which also corresponds to a reduction in the *f*_{cutoff}), we increase its order and we witness a decrease in *D* (figs. S22 and S45). We notice that among the tested EFs, those that do not remove relatively high frequencies

are unable to pass the surrogate data test even for the segments with high SNR (fig. S22). Also, an EF of the order larger than the average recurrence time (*N _{b}* ≳ 365 to 425, i.e., 12 to 14 months) is unable to pass the surrogate data test. The

amplitudes during the slipping phase are decreased (figs. S40 to S44). This incapability to pass the surrogate data test may be due to the fact that the EF response is not flat in the low-frequency domain, but it has some ripples (figs. S34 and S35). Consequently, we have some low frequencies reduced and others amplified, and the SNR actually decreases. As noticed in the “Filters” section, the HWF offers an easier way to independently tune the filter order and *f*_{cutoff}, and we observe a clear stabilization of *D* at values between 3 and 4 for all segments when decreasing *f*_{cutoff} (figs. S22 to S25 and S45). Moreover, we notice that when applying a 60th-order HWF, we would reject the null hypothesis here tested for the segments from #1 to #5 and not only for the two northernmost.