### Physical set-up of a magnetic device

Considering a magnetic device consisting of a slender rectangular system with cylindrical injectors (refer to Fig. 1c). The dimensions of the device are *L* × *L* × *D*. Under a consistent external magnetic field, the magnetization aligns with the *z* direction. Electric current obtained from time series data is injected into *N*_{p} injectors with the radius *a* and the same height *D* as the device. The spin-torque generated by the current propels the magnetization **m**(**x**, *t*) and spin-waves, as illustrated in Fig. 1c. The magnetization is measured at the cylindrical nanocontacts, situated on the same circle as the injectors. Our system’s dimensions are *L* = 1000 nm and *D* = 4 nm. Unless specified, *a* is set to 20 nm. *R* is fixed at 250 nm in the tasks for MC, IPC, NARMA10, and forecasting chaotic time-series section. However, *R* is varied in the Scaling of system size and wave speed section as a characteristic length scale.

### Specifics of preprocessing input data

In the time-multiplexing technique, the input time-series \({{{\bf{U}}}}=({U}_{1},{U}_{2},\ldots ,{U}_{T})\in {{\mathbb{R}}}^{T}\) is transformed into piece-wise constant time-series \(\tilde{U}(t)={U}_{n}\) with *t* = (*n*−1)*N*_{v}*θ* + *s* under *k* = 1,…,*T* and *s* = [0, *N*_{v}*θ*) (refer to Fig. 2–(1)). This signifies that the same input persists during the time interval *τ*_{0} = *N*_{v}*θ*. To leverage the benefits of physical and virtual nodes, the actual input *j*_{i}(*t*) at the *i*th physical node is *U*(*t*) multiplied by a *τ*_{0}-periodic random binary filter \({{{{\mathcal{B}}}}}_{i}(t)\). Here, \({{{{\mathcal{B}}}}}_{i}(t)\in \{0,1\}\) remains constant during the time *θ*. Different realizations of the binary filter are used at each physical node, as depicted in Fig. 2-(2). The input masks serve as weights for the ESN without virtual nodes. Since different masks are employed for different physical nodes, each physical and virtual node might have varying information about the input time series.

The masked input is further converted into input for the physical system. Although numerical micromagnetic simulations operate in discrete time intervals, the physical system functions in continuous time *t*. The injected current density *j*_{i}(*t*) at time *t* for the *i*th physical node is established as \({j}_{i}(t)=2{j}_{c}{\tilde{U}}_{i}(t)=2{j}_{c}{{{{\mathcal{B}}}}}_{i}(t)U(t)\) with *j*_{c} = 2 × 10^{−4}/(*π**a*^{2})A/m^{2}. For a given input time series of length *T*, a constant current is applied during the time *θ*, then the current is updated at the subsequent step. The same input current with different masks is injected for different virtual nodes. Consequently, the total simulation time is *T**θ**N*_{v}.

### Micromagnetic simulations

An examination of the magnetization dynamics of the Landau-Lifshitz-Gilbert (LLG) equation is performed using the micromagnetic simulator mumax^{3} ^{61}. The LLG equation for the magnetization **M**(**x**, *t*) yields

$$\begin{array}{l}{\partial }_{t}{{{\bf{M}}}}({{{\bf{x}}}},t)\,=\,-\frac{\gamma {\mu }_{0}}{1+{\alpha }^{2}}{{{\bf{M}}}}\times {{{{\bf{H}}}}}_{{{{\rm{eff}}}}}\\\qquad\qquad\qquad-\frac{\alpha \gamma {\mu }_{0}}{{M}_{s}(1+{\alpha }^{2})}{{{\bf{M}}}}\times \left({{{\bf{M}}}}\times {{{{\bf{H}}}}}_{{{{\rm{eff}}}}}\right)\\\qquad\qquad\qquad+\,\frac{\hslash P\gamma }{4{M}_{s}^{2}eD}J({{{\bf{x}}}},t){{{\bf{M}}}}\times \left({{{\bf{M}}}}\times {{{{\bf{m}}}}}_{{{{\rm{f}}}}}\right).\end{array}$$

(10)

In the micromagnetic simulations, Eq. (10) is analyzed with the effective magnetic field **H**_{eff} = **H**_{ext} + **H**_{demag} + **H**_{exch} comprising the external field, demagnetization, and the exchange interaction. The specific form of each term is

$${{{{\bf{H}}}}}_{{{{\rm{eff}}}}}={{{{\bf{H}}}}}_{{{{\rm{ext}}}}}+{{{{\bf{H}}}}}_{{{{\rm{demag}}}}}+{{{{\bf{H}}}}}_{{{{\rm{exch}}}}},$$

(11)

$${{{{\bf{H}}}}}_{{{{\rm{ext}}}}}={H}_{0}{{{{\bf{e}}}}}_{z}$$

(12)

$${{{{\bf{H}}}}}_{{{{\rm{ms}}}}}=-\frac{1}{4\pi }\int\nabla \nabla \frac{{\bf{M}}({\bf{x}}^{\prime})}{| {{{\bf{x}}}}-{{{{\bf{x}}}}}^{{\prime}}|}d{{{{\bf{x}}}}}^{{\prime}}$$

(13)

$${{{{\bf{H}}}}}_{{{{\rm{exch}}}}}=\frac{2{A}_{{{{\rm{ex}}}}}}{{\mu }_{0}{M}_{s}}\Delta {{{\bf{M}}}},$$

(14)

Where **H**_{ext} represents the external magnetic field, **H**_{ms} is the magnetostatic interaction, and **H**_{exch} stands for the exchange interaction with the exchange parameter *A*_{ex}. Spin waves are initiated by Slonczewski spin-transfer torque^{62}, described by the last term in Eq. (10). The driving term is proportional to the DC current, i.e., *J*(**x**, *t*) = *j*_{i}(*t*) at the *i*th nanocontact and *J*(**x**, *t*) = 0 in other regions.

Parameters for the micromagnetics simulations are selected as follows: The mesh points are 200 in the *x* and *y* directions, and 1 in the *z* direction. The ferromagnet utilized is the Co_{2}MnSi Heusler alloy, which possesses low Gilbert damping and high spin polarization with the parameters *A*_{ex} = 23.5 pJ/m, *M*_{s} = 1000 kA/m, and *α* = 5 × 10^{−4} ^{51,52,53,63,64}. An out-of-plane magnetic field *μ*_{0}*H*_{0} = 1.5 T is applied to ensure the magnetization points out-of-plane. The spin-polarized current field is incorporated by the Slonczewski model^{62} with polarization parameter *P* = 1 and spin torque asymmetry parameter *λ* = 1, considering the reduced Planck constant *$\hbar$* and the charge of an electron *e*. The uniform fixed layer magnetization is **m**_{f} = **e**_{x}. Absorbing boundary layers for spin waves are utilized to ensure the magnetization diminishes at the system’s boundary^{65}. The initial magnetization is set as **m** = **e**_{z}. The magnetization dynamics are updated using the fourth-order Runge-Kutta method (RK45) with adaptive time-stepping and a maximum error of 10^{−7}.

The reference time scale in this system is *t*_{0} = 1/*γ**μ*_{0}*M*_{s} ≈ 5 ps, where *γ* represents the gyromagnetic ratio, *μ*_{0} is permeability, and *M*_{s} is saturation magnetization. The reference length scale is the exchange length *l*_{0} ≈ 5 nm. The pertinent parameters consist of Gilbert damping *α*, the time scale of the input time series *θ* (view Section Learning with reservoir computing), and the characteristic length between the input nodes *R*.

### Theoretical analysis using response function

To comprehend the mechanism of effective learning through spin wave propagation, a model employing the response function of the spin wave dynamics is considered. By linearizing the magnetization around **m** = (0, 0, 1) without inputs, the linear response of the magnetization at the *i*th readout *m*_{i} = *m*_{x,i} + *i**m*_{y,i} to the input is expressed as

$$\begin{array}{l}{m}_{i}(t)\,=\,\mathop{\sum }\limits_{j=1}^{{N}_{p}}\int\,d{t}^{{\prime} }{G}_{ij}(t,{t}^{{\prime} }){U}^{(j)}({t}^{{\prime} }),\end{array}$$

(15)

where the response function *G*_{ij} for the same node is expressed as

$$\begin{array}{l}{G}_{ii}(t-\tau )\,=\,\frac{1}{2\pi }{e}^{-\tilde{h}(\alpha +i)(t-\tau )}\frac{-1+\sqrt{1+\frac{{a}^{2}}{{(d/4)}^{2}{(\alpha +i)}^{2}{(t-\tau )}^{2}}}}{\sqrt{1+\frac{{a}^{2}}{{(d/4)}^{2}{(\alpha +i)}^{2}{(t-\tau )}^{2}}}\end{array}$$

(16)

and for different nodes,

$$\begin{array}{l}{G}_{ij}(t-\tau )\,=\,\frac{{a}^{2}}{2\pi }{e}^{-\tilde{h}(\alpha +i)(t-\tau )}\\\qquad\qquad\;\;\times \;\;\frac{1}{{(d/4)}^{2}{(\alpha +i)}^{2}{(t-\tau )}^{2}{\left(1+\frac{| {{{{{\bf{R}}}}}}_{i}-{{{{\bf{R}}}}}_{j}{| }^{2}}{{(d/4)}^{2}{(\alpha +i)}^{2}{(t-\tau )}^{2}}\right)}^{3/2}}.\end{array}$$

(17)

The comprehensive derivation of the response function is presented in Supplementary Information Section VII. Here, *U*^{(j)}(*t*) represents the input time series at the *j*th nanocontact. The response function comprises a self part *G*_{ii}, indicating input and readout nanocontacts are identical, and the propagation part *G*_{ij}, marking the distance between the input and readout nanocontacts as ∣**R**_{i}−**R**_{j}∣. In Eqs. (16) and (17), we assume the spin waves propagate solely via dipole interaction. The effect of the exchange interaction is discussed in Supplementary Information Section VII A. It’s important to note that the analytical form of Eq. (17) is acquired under specific assumptions, which are discussed in Supplementary Information Section VII A.

We employ the quadratic nonlinear readout, which exhibits a structure

$$\begin{array}{l}{m}_{i}^{2}(t)\,=\,\mathop{\sum}\limits_{{j}_{1}=1}^{{N}_{p}}\mathop{\sum}\limits_{{j}_{2}=1}^{{N}_{p}}\int\,d{t}_{1}\int\,d{t}_{2}\\\qquad\qquad{