Distributed ARIMA models for ultra-long time series (2024)

Xiaoqian Wang (xiaoqianwang@buaa.edu.cn), Yanfei Kang (yanfeikang@buaa.edu.cn), Rob J Hyndman (rob.hyndman@monash.edu), Feng Li (feng.li@cufe.edu.cn)
School of Economics and Management, Beihang University, Beijing 100191, China
Department of Econometrics and Business Statistics, Monash University, Clayton, VIC 3800, Australia
School of Statistics and Mathematics, Central University of Finance and Economics, Beijing 102206, China

Abstract

Providing forecasts for ultra-long time series plays a vital role in various activities, such as investment decisions, industrial production arrangements, and farm management. This paper develops a novel distributed forecasting framework to tackle the challenges of forecasting ultra-long time series using the industry-standard MapReduce framework. The proposed model combination approach retains the local time dependency and uses a straightforward splitting across samples to facilitate distributed forecasting: the local estimators of time series models delivered by the worker nodes are combined by minimizing a global loss function. In this way, instead of unrealistically assuming that the data generating process (DGP) of an ultra-long time series stays invariant, we make assumptions only on the DGP of subseries spanning shorter time periods. We investigate the performance of the proposed approach with AutoRegressive Integrated Moving Average (ARIMA) models using a real data application as well as numerical simulations. Compared with directly fitting the whole data with ARIMA models, our approach yields improved computational efficiency and better accuracy for both point forecasts and prediction intervals, especially for longer forecast horizons. Moreover, we explore some potential factors that may affect the forecasting performance of our approach.

keywords:

Ultra-long time series, Distributed forecasting, ARIMA models, Least squares approximation, MapReduce


1 Introduction

Ultra-long time series (i.e., time series data observed over a long time interval) are becoming increasingly common. Examples include hourly electricity demand spanning several years, stock indices observed every minute over several months, daily maximum temperatures recorded for hundreds of years, and streaming data continuously generated in real time. Forecasting these data plays a vital role in investment decisions, industrial production arrangements, farm management, and business risk identification. However, such long series are challenging to handle with traditional time series forecasting approaches.

We identify three significant challenges associated with forecasting ultra-long time series. First, the optimization of parameters when training forecasting algorithms is time-consuming due to the time-dependent nature of time series. Second, processing a time series spanning such a long time interval entails substantial storage requirements, especially during the training process, which a standalone computer can hardly handle. The third and most serious difficulty is that standard time series models do not perform well for ultra-long time series [22]. One possible reason is that it is usually unrealistic to assume that the data generating process (DGP) of a time series has remained invariant over an ultra-long period, so there is an apparent gap between the models we use and the actual DGP. A more realistic assumption is that the DGP stays locally stable over short time windows.

Forecasters have made some attempts to address these limitations in forecasting ultra-long time series. A straightforward approach is to discard the earliest observations and use the resulting shorter time series for model fitting. However, this approach works well only for forecasting a few future values and is not an efficient use of the available historical data. A better approach is to allow the model itself to evolve over time. For example, ARIMA (AutoRegressive Integrated Moving Average) models and ETS (ExponenTial Smoothing) models can address this issue by allowing the trend and seasonal components to vary over time [22]. An alternative, proposed by [11], is to apply model-free prediction, assuming that the series changes slowly and smoothly with time. However, the aforementioned methods require considerable computational time for model fitting and parameter optimization, making them less practically useful in modern enterprises.

In industry, distributed computing platforms usually lack forecasting modules. For example, it is well known that Spark supports time series forecasting poorly, especially multi-step forecasting. To support large-scale time series forecasting on such platforms, practitioners commonly have to adopt inadequate but available methods on distributed platforms [39, 16]. For instance, one has to utilize the regression models in Spark's MLlib to implement autoregressive-type regressions and artificially convert the multi-step prediction problem into multiple subproblems, so as to fit the Spark framework for time series forecasting with improved computational efficiency [16].

On the other hand, the unprecedented scale of data collection has driven a vast literature studying statistical estimation and inference problems in a distributed manner, both in a frequentist setting [29, 56, 26, 9] and in a Bayesian setting [47, 53, 36, 10]. Among the existing methods, the divide-and-conquer approach, one of the most important and easiest-to-implement algorithms on distributed platforms, provides an algorithm design paradigm in which a given problem is divided into a set of related subproblems that are simple to process; their solutions are then aggregated using proper aggregation strategies. Ideally, the subproblems can be solved in parallel, which is computationally more manageable than dealing with the original problem. Due to its simplicity for parallel processing, the divide-and-conquer strategy has been widely used in the statistical literature to untangle large-scale problems with independent data (see, e.g., [57, 29, 32, 58, 41]).
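To fix ideas, the following toy, single-machine sketch in R (with simulated data and an arbitrary number of chunks) illustrates the one-shot divide-and-conquer pattern: local estimates are computed independently on each chunk and then aggregated by a simple average.

```r
# Toy one-shot divide-and-conquer: estimate a mean from 10 equal-sized chunks
# processed independently, then aggregate the chunk-level estimates.
set.seed(2024)
x <- rnorm(1e6, mean = 3)                        # simulated "large" data set
chunks <- split(x, rep(1:10, each = length(x) / 10))
local_means <- vapply(chunks, mean, numeric(1))  # solvable in parallel
global_mean <- mean(local_means)                 # equals mean(x) for equal-sized chunks
```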

In this paper, we follow the idea of divide-and-conquer and provide a novel approach to addressing large-scale time series forecasting problems in distributed environments. Specifically, we propose a distributed time series forecasting framework, in which the long time series is first divided into several subseries spanning short time periods, and models can be applied to each subseries under the reasonable assumption that the DGP remains invariant over a short time. In this view, our framework has the flavor of a "varying coefficient model" [15] for a long time series. However, unlike varying coefficient models, we combine the local estimators trained on each subseries using weighted least squares to minimize a global loss function.

Considering a general loss function enables our framework, in principle, to be applied to any statistical time series forecasting model that can be cast as a parametric regression problem. However, properly combining the local estimators is both important and challenging in practice. For example, directly combining the original parameters of ARIMA models trained on subseries sometimes leads to a global ARIMA model with roots inside the unit circle, thus violating stationarity, causality, or invertibility conditions. Therefore, when applying the proposed distributed forecasting framework to a specific type of model, the key issue is how to process the local estimators based on the model settings so that they can be combined properly. ARIMA models are among the most widely used forecasting models because (i) they can handle non-stationary and seasonal patterns, and (ii) they frequently serve as benchmark methods owing to their excellent performance [40, 54]. Nonetheless, such models are difficult to scale up on current distributed platforms such as Spark due to their time-dependent nature, making them infeasible for large-scale time series forecasting. Therefore, we restrict our attention to ARIMA models and propose the Distributed ARIMA (DARIMA) models, in which a necessary linear transformation step is involved to address the aforementioned issue of estimator combination.

The proposed method preserves the local time dependency and uses a straightforward splitting across samples to make distributed forecasting possible for ultra-long time series with only one round of communication. Thus, the proposed method can be naturally integrated into industry-standard distributed systems with a MapReduce architecture. Such a MapReduce algorithm requires only one "master-and-worker" communication for each worker node and avoids further iterative steps; no direct communication between workers is required. It is therefore highly efficient in terms of communication. Additionally, our proposed method is scalable to large forecast horizons, which is the typical case when dealing with ultra-long time series, and is therefore superior to the distributed forecasting methods (reviewed in Section 2.1) in which multiple trainings or iterations are required to perform multi-step forecasting.

Our Distributed ARIMA modeling framework is built on an efficient distributed computing algorithm without modifying the underlying estimation scheme for individual time series models, making it possible to incorporate a wide variety of forecasting models. In both the real data application and the simulations, we show that our approach consistently achieves improved forecasting accuracy over conventional global time series modeling approaches, both in point forecasts and prediction intervals. The performance improvements become more pronounced with increasing forecast horizon. Moreover, our approach delivers substantially improved computational efficiency for ultra-long time series.

The rest of the paper is organized as follows. Section 2 describes distributed systems and ARIMA models, and highlights our contributions to ultra-long time series forecasting. Section 3 introduces the framework of the proposed forecasting approach in distributed systems, further elaborated through its core components. Section 4 applies the proposed method to a real data set and provides a sensitivity analysis. A simulation study is performed in Section 5 to further investigate and justify the proposed method in terms of forecasting accuracy and computational cost. Section 6 discusses further potential of the framework and suggests possible avenues of research. Finally, Section 7 concludes the paper.

2 Background

This section surveys the forecasting challenges on distributed systems with a special focus on ARIMA models, and highlights the contributions of our framework.

2.1 Forecasting with Distributed Systems

A typical distributed system consists of two core components: the Distributed File System (DFS) and the MapReduce framework. See Appendix A for an overview of distributed systems. The DFS provides the primary storage infrastructure for distributed systems. By storing files on multiple devices, the DFS effectively eliminates the negative impacts of data loss and data corruption. It enables devices to handle large-scale data sets and to access data files in parallel. MapReduce provides the batch-based computing framework for distributed systems. The MapReduce framework involves two steps, namely Map and Reduce. The input data set is first split into a collection of independent data tuples, represented by key/value pairs. The Map task takes the assigned tuples, processes them in a parallel manner, and creates a set of key/value pairs as the output, illustrated by $\langle k1, v1\rangle \rightarrow Map(\cdot) \rightarrow list(\langle k2, v2\rangle)$. Subsequently, the Reduce task takes the intermediate outputs of the Map tasks as its inputs, combines these data, and produces a new set of key/value pairs as the output, which can be described as $\langle k2, list(v2)\rangle \rightarrow Reduce(\cdot) \rightarrow list(\langle k3, v3\rangle)$. The main advantage of the MapReduce computing paradigm is that data processing can easily be extended over multiple computing nodes with high computational efficiency.
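As a toy, single-machine illustration of this key/value flow (not a real distributed job), R's higher-order Map() and Reduce() functions can mimic the two phases on a word-count example; the document names and contents below are made up for illustration.

```r
# Map phase: each input <doc_id, text> pair emits intermediate <word, 1> pairs.
docs <- list(d1 = "time series forecasting", d2 = "distributed time series")
mapped <- Map(function(text) {
  words <- strsplit(text, " ")[[1]]
  setNames(rep(1, length(words)), words)          # named vector of <word, 1> pairs
}, docs)

# Reduce phase: group the intermediate pairs by key (word) and sum the counts.
counts <- Reduce(function(acc, kv) {
  for (w in names(kv)) {
    acc[w] <- if (w %in% names(acc)) acc[w] + kv[[w]] else kv[[w]]
  }
  acc
}, mapped, init = numeric(0))
counts  # time = 2, series = 2, forecasting = 1, distributed = 1
```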

Many distributed systems were designed for processing massive independent data. However, a distinct feature of time series data is that the observations are serially dependent, so additional considerations are required when processing time series with distributed systems. How to efficiently bridge time series forecasting with distributed systems is therefore of crucial importance in the forecasting domain.

We identify four main challenges associated with using distributed systems for time series forecasting: (i) time series partitioning for MapReduce tasks [33]; (ii) independent subseries modeling with distributed systems; (iii) model communication between worker nodes; and (iv) distributed time series forecasting, especially multi-step prediction [16].

Some attempts have been made to allow for efficient implementations of large-scale time series analysis. For example, [27] propose a Hadoop-based analysis framework for efficient data collection and accurate preprocessing, which are at least as important as algorithm selection for a given forecasting problem. Unlike [27], [33] focus on model selection. Specifically, MapReduce is used to perform cross-validation and facilitate parallel rolling-window forecasting using the training set of a large-scale time series. The forecasting errors are then collected and used to calculate the accuracy, allowing model comparison and selection. The method is therefore not designed to address the challenges associated with forecasting using ultra-long historical data. [48] and [16] manage multi-step forecasting for ultra-long time series from a machine learning perspective by computing on the Spark platform. Specifically, [48] require $H$ iterations to perform multi-step forecasting, while [16] formulate the forecasting problem as $H$ parallel forecasting subproblems to support multivariate regression with the MLlib library, where $H$ is the forecast horizon. Consequently, these approaches scale poorly with the forecast horizon. However, ultra-long time series forecasting is typically characterized by the need to forecast many future values in practice. A recent overview of forecasting with big data time series is provided in Section 2.7 of the encyclopedic review in [42]. In summary, the attempts mentioned above provide a premise for distributed time series forecasting, but it remains impractical to apply these methods to ultra-long time series for the direct purpose of forecasting many future observations.

In recent years, research has been deeply engaged in distributed statistical inference, with concerns about the practical computational cost of large-scale data processing. The most prominent papers build upon the divide-and-conquer strategy. The communication cost, which refers to the time spent on data communication between different computer nodes, is often recognized as one of the key challenges faced by distributed statistical inference. The first stream of research employs one-shot aggregation and averages the local estimators computed on each batch of the entire data to obtain the global estimators. Examples include, but are not limited to, the subsampled average algorithm based on an additional level of sampling on worker nodes [57], generative model parameter estimation via the maximum likelihood estimator [35], quantile regression processes [51], parametric regression handled by least squares approximation [58], and nonparametric and infinite-dimensional regression [56].

Another stream of research looks at iterative algorithms with multi-round communications and aggregations for distributed optimization. Multiple iterations are used so that the aggregated estimators match the global estimators more closely. These iterations unfortunately result in considerable "master-and-worker" communication, which is expensive. For example, [44] propose a distributed approximate Newton algorithm to solve general subproblems available locally before averaging computations, requiring a constant number of iterations (and thus rounds of communication). Based on their framework, recent works by [52] and [26] propose communication-efficient iterative methods for distributed sparse learning and Bayesian inference. [9] focus on refining the quantile regression estimator via multiple rounds of aggregation in a distributed computing environment. In addition, another popular approach is the alternating direction method of multipliers (ADMM) [5], developed for distributed convex optimization; it blends the decomposability of dual ascent with the superior convergence properties of the method of multipliers.

Both streams are difficult to apply directly to time series forecasting models. The one-shot averaging strategy is straightforward to implement and requires only a single round of communication. However, naively merging local estimators that are computed separately may yield inference procedures that are highly biased and variable, leading to inefficient estimation in most cases; see [57], [44], [26], and [41] for further discussions. The divide-and-conquer strategy is also difficult to adapt to time series forecasting, so some research focuses on distributed learning by splitting across features in the frequency domain for different time series, rather than across samples (different timestamps) [46, 19]. Distributed algorithms such as ADMM suffer from two crucial limitations: (i) they require a reimplementation of each estimation scheme on distributed systems, and (ii) they can be very slow to converge to high accuracy compared with existing algorithms designed for standalone computers; see [5] for more details. Additionally, communication cost is recognized as a key challenge for statistical computation on distributed systems [26, 58].

The studies reviewed above focus on scaling up optimization algorithms for solving large-scale statistical tasks in a distributed manner, while our study adopts a model combination strategy in which model parameters obtained from multiple subseries are aggregated to minimize the global loss function. In this way, our proposed method can achieve lower computational and implementation complexity than modeling the whole ultra-long time series with a single model.

2.2 ARIMA Models

An ARIMA (AutoRegressive Integrated Moving Average) model is composed of differencing, autoregressive (AR), and moving average (MA) components [4]. We refer to an ARIMA model as ARIMA$(p,d,q)$, where $p$ is the order of the AR component, $d$ is the number of differences required for a stationary series, and $q$ is the order of the MA component. An ARIMA model can be extended to a seasonal ARIMA model by including additional seasonal terms to deal with time series exhibiting strong seasonal behavior. A seasonal ARIMA model is generally denoted as ARIMA$(p,d,q)(P,D,Q)_m$, where the uppercase $P$, $D$, and $Q$ refer to the AR order, the number of differences required for a seasonally stationary series, and the MA order of the seasonal component, respectively, while $m$ denotes the period of the seasonality [50].

An ARIMA$(p,d,q)(P,D,Q)_m$ model for a time series $\{y_t, t\in\mathbb{Z}\}$ can be written in backshift notation as

$$\bigg(1-\sum_{i=1}^{p}\phi_{i}B^{i}\bigg)\bigg(1-\sum_{i=1}^{P}\Phi_{i}B^{im}\bigg)(1-B)^{d}(1-B^{m})^{D}y_{t}=\bigg(1+\sum_{i=1}^{q}\theta_{i}B^{i}\bigg)\bigg(1+\sum_{i=1}^{Q}\Theta_{i}B^{im}\bigg)\varepsilon_{t},\tag{1}$$

where $B$ is the backward shift operator, $\varepsilon_t$ is white noise, $m$ is the length of the seasonal period, $\phi_i$ and $\Phi_i$ are the AR parameters of the non-seasonal and seasonal parts, and $\theta_i$ and $\Theta_i$ are the MA parameters of the non-seasonal and seasonal parts, respectively.

Combinations of the non-seasonal orders $p,d,q$ and the seasonal orders $P,D,Q$ provide a rich variety of ARIMA models. Consequently, identifying the best model among these possibilities is of crucial importance in obtaining good forecasting performance with ARIMA models. Fortunately, many automatic ARIMA model selection schemes have been developed. One of the most widely used is the auto.arima() function for automatic time series forecasting with ARIMA models in the R package forecast [23]. Although such algorithms allow the order selection process to be carried out with relative ease on a standalone computer for short time series, efficient ARIMA model selection for ultra-long time series remains challenging, even in modern distributed computational environments.

[Figure 1: the model search process of the auto.arima() function.]

We take the algorithm in the auto.arima() function to describe the ARIMA model selection process; other algorithms follow a similar fashion. Figure 1 depicts how the auto.arima() function conducts a search over possible models. The algorithm consists of three main steps: stationarity tests, order selection, and model refitting. First, the stationarity tests decide the orders of first-differencing and seasonal-differencing, using a KPSS test [31] to estimate $d$ and either a Canova-Hansen test [8] or a measure of seasonality [22] to estimate $D$. Second, the order selection process chooses the model orders via an information criterion such as the AIC, AICc, or BIC. There are two options for the order selection approach, namely (greedy) stepwise selection and global selection, which can be chosen according to time series characteristics such as length and seasonality. Such selection can be time-consuming because each information criterion value requires a model fitting process. Finally, the selected model orders are used to refit the best models without approximation if the information criteria used for model selection were approximated.
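For concreteness, the sketch below shows how such a search is typically invoked on a standalone machine; the simulated series y and the argument values are illustrative stand-ins, not the settings used in the paper.

```r
library(forecast)

y <- ts(rnorm(5000), frequency = 24)   # stand-in for an hourly demand series

fit <- auto.arima(y,
                  d = NA, D = NA,        # d and D chosen via unit-root/seasonality tests
                  ic = "aicc",           # information criterion for order selection
                  stepwise = TRUE,       # greedy stepwise search (FALSE = global search)
                  approximation = TRUE)  # approximate the IC during the search;
                                         # the selected model is then refitted exactly
summary(fit)
```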

Automatic ARIMA modeling has been extended in many ways by forecasting researchers (e.g., [7, 45, 38]). Despite its superb performance in forecasting time series, several difficulties hinder the extension of this approach to ultra-long time series. We use the electricity demand of the Northeast Massachusetts (NEMASSBOST) zone from the Global Energy Forecasting Competition 2017 (GEFCom2017) [20] to elaborate on the following challenges of extending the auto.arima() function to ultra-long time series data.

  1. Modeling the whole time series with a single model relies on an unrealistic assumption that the DGP of the time series has remained invariant over an ultra-long period. Note that the assumption is made on the DGP of the target time series, rather than on the time series itself.

  2. Order selection is an extremely time-consuming process, which requires fitting all candidate models. Even if we select model orders using the global order selection approach with parallelism, fitting a single model to an ultra-long time series still takes a long time, and the computational time grows rapidly with the length of the series.

  3. Multiple models may be considered in the model refitting process because the auto.arima() function carries out several strict checks for unit roots, also resulting in a loss of computational efficiency.

  4. The existing approaches to model fitting, such as CSS (conditional sum of squares), ML (maximum likelihood), and the hybrid CSS-ML (find starting values by CSS, then apply ML), are hard to parallelize due to the nature of time dependency. The ML approach is the most commonly used, but also time-consuming, approach for fitting ARIMA models [23]. Figure 2 compares the execution time of the auto.arima() function under the CSS and CSS-ML fitting methods, and shows the impact of the fitting method on the function's execution time (a timing sketch is given after this list). When minimizing the conditional sum of squares is used as the estimation method for ARIMA models, the auto.arima() function in the forecast package for R commonly uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm for optimization. The computational complexity of ARIMA modeling is $\mathcal{O}(n^{2}T)$ when the non-seasonal orders ($p,d,q$) and seasonal orders ($P,D,Q$) are given, where $n$ is the number of parameters (i.e., $n=p+q+P+Q$) and $T$ is the length of the time series of interest.

  5. The length of the time series has a significant impact on automatic ARIMA modeling. A standalone computer may not have sufficient resources to fit an ultra-long time series. From Figure 2, we find that longer time series yield much longer computation times, which further explains why the order selection and model refitting processes are time-consuming.

  6. Most model selection schemes only allow a small range of lag values in ARIMA modeling to limit the computation. The maximum values of the model orders directly determine the models available in the order selection process. If the model orders are allowed to take a broader range of values, the number of candidate models increases rapidly. Therefore, relaxing the restriction on the maximum model orders becomes an obstacle to automating ARIMA modeling for ultra-long time series.
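As mentioned in item 4 above, a hedged timing sketch follows: stats::arima() exposes the CSS, ML, and hybrid CSS-ML estimation methods, and the simulated series length and model orders are arbitrary stand-ins chosen only to make the timing difference visible.

```r
# Compare estimation time under CSS and the default hybrid CSS-ML method
# for a fixed model specification on a long simulated series.
set.seed(1)
y <- arima.sim(list(ar = c(0.5, -0.2), ma = 0.3), n = 50000)

system.time(arima(y, order = c(2, 0, 1), method = "CSS"))
system.time(arima(y, order = c(2, 0, 1), method = "CSS-ML"))
```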

[Figure 2: execution time of the auto.arima() function under the CSS and CSS-ML fitting methods for different series lengths.]

In this study, the proposed algorithm is designed to tackle these challenges by estimating ARIMA models on distributed systems, making it suitable for processing big data time series. More specifically, the proposed algorithm retains the local time dependency and uses a straightforward splitting across samples to facilitate distributed forecasting of ultra-long time series with only one round of communication. To the best of our knowledge, this study is the first distributed forecasting approach that integrates distributed computing and forecast combinations to process time series spanning long time periods, in which weighted least squares is used to combine the local estimators trained on each subseries by minimizing a global loss function. The proposed framework helps extend existing forecasting methods to distributed ultra-long time series forecasting by merely making assumptions about the DGP of subseries spanning short time intervals. Our proposed approach makes the following contributions compared to the existing literature and implementations.

  1. We extend the distributed least squares approximation (DLSA) method [58], which is designed to solve pure regression-type problems with observable covariates, to address the challenges associated with forecasting ultra-long time series in a distributed manner. The theoretical and empirical work of [58] guarantees the statistical properties of DLSA for independent data.

  2. Although the DLSA method ensures that the estimators are consistent with the global model, in which all data are used to fit a single model, this is not our only concern in time series forecasting, because a conventional time series model usually, and unrealistically, assumes that the DGP of an ultra-long time series is invariant. In our paper, the DGPs of the subseries are allowed to vary over time, thus enabling the possible evolution of trend and seasonality across consecutive subseries. With the support of DLSA, the local estimators computed on each subseries are aggregated using weighted least squares in order to minimize the global loss function. This further prevents overfitting as a result of averaging multiple models rather than selecting a single model.

  3. Compared with algorithm-level parallelization such as [5], our framework is intuitive and easy to implement. In principle, many forecasting models can be accommodated within our framework. Moreover, our framework outperforms competing methods with improved computational efficiency, especially for long-term forecasting, which is needed in many fields such as investment decisions, industrial production arrangements, and farm management.

  4. Time series models, such as ARIMA or GARCH models, also model the error terms in a parametric form. Our study shows that directly applying DLSA, which focuses on the coefficients of regression-type problems, to all the parameters of a time series model may violate stationarity, causality, or invertibility conditions. Our work tackles this issue through a necessary ARIMA linear transformation step.

  5. Our approach retains a solid theoretical foundation, and the proposed scheme can also be viewed as a model combination approach in the sense that it combines model parameters from multiple subseries, in contrast to classic forecast combinations of different forecasting methods (e.g., [40, 28, 34, 54]).

3 Distributed Forecasting for Ultra-long Time Series

Given a time series spanning a long stretch of time, $\{y_t; t=1,2,\dots,T\}$, we aim to develop a novel framework that works well for forecasting the next $H$ observations. Define $\mathcal{S}=\{1,2,\dots,T\}$ to be the timestamp sequence of the time series $\{y_t\}$. Then the parameter estimation problem can be formulated as $f(\theta,\Sigma\mid y_t, t\in\mathcal{S})$, where $f$ is a parameter estimation algorithm, $\theta$ denotes the global parameters, and $\Sigma$ denotes the covariance matrix of the global parameters.

Nevertheless, the above statement relies on the assumption that the underlying DGP of the time series remains the same over a long stretch of time, which is unlikely in reality. Alternatively, suppose the whole time series is split into $K$ subseries with contiguous time intervals; that is, $\mathcal{S}=\cup_{k=1}^{K}\mathcal{S}_{k}$, where $\mathcal{S}_{k}$ collects the timestamps of the $k$th subseries. Then $T=\sum_{k=1}^{K}T_{k}$, where $T_{k}$ is the length of the $k$th subseries. Consequently, we divide an ultra-long time series into several subseries under a realistic assumption about the DGP of each subseries, as illustrated in Figure 3. In this way, the parameter estimation problem is transformed into $K$ subproblems and one combination problem as follows:

$$f(\theta,\Sigma\mid y_{t},t\in\mathcal{S})=g\big(f_{1}(\theta_{1},\Sigma_{1}\mid y_{t},t\in\mathcal{S}_{1}),\ldots,f_{K}(\theta_{K},\Sigma_{K}\mid y_{t},t\in\mathcal{S}_{K})\big),$$

where $f_k$ denotes the parameter estimation problem for the $k$th subseries, and $g$ is a combination algorithm applied to combine the local estimators of the subseries. Here we combine the models before forecasting, rather than forecasting from each model and then combining the forecasts [28, 54]. In the simplest situation, $g(\cdot)$ could just be a mean function, and our framework could then be viewed as a model averaging approach. Note that the optimal length of the split subseries can differ between time series. One has to balance the computational resources against the length needed to capture the underlying structure of the time series. For example, the subseries of an hourly time series should be long enough to capture the daily pattern, while the monthly pattern can be ignored under limited computational resources, or handled by preprocessing steps such as time series decomposition before distributed computing.
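A minimal single-machine sketch of this splitting and local modeling (the first two steps of the framework described below) is given here; the simulated series, the number of subseries K, and the use of lapply() in place of a true distributed map are illustrative assumptions.

```r
library(forecast)

y <- ts(rnorm(24 * 365 * 3), frequency = 24)   # stand-in ultra-long hourly series
K <- 30                                        # illustrative number of subseries

# Split the timestamps 1..T into K contiguous, roughly equal blocks.
blocks <- split(seq_along(y), cut(seq_along(y), breaks = K, labels = FALSE))

# Fit an ARIMA model to each subseries, assuming a locally stable DGP.
# On a real cluster, lapply() would be replaced by a distributed map operation.
local_fits <- lapply(blocks, function(idx) {
  auto.arima(ts(y[idx], frequency = frequency(y)))
})
```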

[Figure 3: splitting an ultra-long time series into K subseries with contiguous time intervals.]

Figure 4 outlines the proposed framework for forecasting an ultra-long time series on distributed systems. We assume that the historical observations and their timestamps are stored in the DFS before being processed by our framework. We document the pseudocode for the Mapper and Reducer of the proposed approach in Appendix B. Such a MapReduce algorithm requires only one "master-and-worker" communication for each worker node and avoids the further iterative steps required by several distributed methods; see Section 2.1 for further discussion. Hence, our proposed method is highly efficient in terms of communication. In simple terms, the proposed framework consists of the following steps.

  Step 1: Preprocessing. Split the whole time series into $K$ subseries, as shown in Figure 3; this is done automatically by the distributed system.

  Step 2: Modeling. Train a model for each subseries on the worker nodes, assuming that the DGP of each subseries remains the same over its short time window.

  Step 3: Linear transformation. Convert the models trained in Step 2 into $K$ linear representations, as described in Section 3.1.

  Step 4: Estimator combination. Combine the local estimators obtained in Step 3 by minimizing the global loss function, as described in Section 3.2.

  Step 5: Forecasting. Forecast the next $H$ observations using the combined estimators, as described in Sections 3.3 and 3.4.

[Figure 4: the proposed framework for forecasting an ultra-long time series on distributed systems.]

In this paper, we illustrate our approach with the ARIMA model. Since we split the time series of interest into $K$ subseries with contiguous time intervals, the computational complexity of ARIMA modeling for each subseries is reduced to $\mathcal{O}(n^{2}T/K)$ when the model orders are specified. As a result, when forecasting an ultra-long time series with an extremely large $T$, our proposed method is computationally more efficient than ARIMA, which has a computational complexity of $\mathcal{O}(n^{2}T)$, because it solves the large-scale computation problem in a distributed fashion.

The rest of this section elaborates on the steps and approaches of the framework. Section 3.1 provides the details of how to convert a general ARIMA model into a linear representation. Section 3.2 entails solving the problem of combining the local estimators of the subseries models, while Sections 3.3 and 3.4 describe multi-step point and interval forecasting, respectively.

3.1 Linear Representations of ARIMA Models

The order selection process identifies the model with the minimum value of the specified information criterion for each split subseries, using the automatic ARIMA modeling implemented in the forecast package for R [23]. Moreover, several checks are carried out to avoid an output ARIMA model with roots inside the unit circle, thus ensuring time series properties such as stationarity, causality, and invertibility. Employing distributed systems to forecast ultra-long time series requires that the local models fitted on the subseries can be combined into a global model for the whole series. However, directly combining the original parameters of ARIMA models trained on subseries may sometimes lead to a global ARIMA model with roots inside the unit circle. As a result, directly combining all parameters to produce an auxiliary global ARIMA model could be ill-behaved, resulting in numerically unstable forecasts. Consequently, in this subsection, we convert a general ARIMA model into a linear representation, which can be regarded as a regression problem, to facilitate the parameter-based combination.

Following Equation (1), a general seasonal ARIMA model with intercept, time trend, and covariates, as used in the forecast R package, is formally given by

$$\bigg(1-\sum_{i=1}^{p}\phi_{i}B^{i}\bigg)\bigg(1-\sum_{i=1}^{P}\Phi_{i}B^{im}\bigg)(1-B)^{d}(1-B^{m})^{D}\bigg(y_{t}-\mu_{0}-\mu_{1}t-\sum_{j=1}^{l}\eta_{j}\gamma_{j,t}\bigg)=\bigg(1+\sum_{i=1}^{q}\theta_{i}B^{i}\bigg)\bigg(1+\sum_{i=1}^{Q}\Theta_{i}B^{im}\bigg)\varepsilon_{t},\tag{2}$$

where $\mu_0$ is the intercept, $\mu_1$ is the slope of the linear time trend, $\gamma_{j,t}$ ($j=1,2,\dots,l$) is a covariate at time $t$, and $\eta_j$ is its coefficient. The automatic ARIMA modeling provides flexibility in whether to include the intercept and time trend terms. The time trend coefficient may be non-zero when $d+D=1$, while the intercept may be non-zero when $d+D=0$.

Let $x_{t}=y_{t}-\mu_{0}-\mu_{1}t-\sum_{j=1}^{l}\eta_{j}\gamma_{j,t}$. Then the seasonal ARIMA model for the time series $\{y_t, t\in\mathbb{Z}\}$ is transformed into a seasonal ARIMA model for the time series $\{x_t, t\in\mathbb{Z}\}$ without the intercept, time trend, and covariate terms. First, we convert the seasonal ARIMA model into a non-seasonal ARMA model by polynomial multiplication. We denote the converted non-seasonal model by ARMA$(u,v)$, where $u=p+mP+d+mD$ and $v=q+mQ$. The (possibly non-stationary) ARMA$(u,v)$ model is defined as

$$\bigg(1-\sum_{i=1}^{u}\phi_{i}^{\prime}B^{i}\bigg)x_{t}=\bigg(1+\sum_{i=1}^{v}\theta_{i}^{\prime}B^{i}\bigg)\varepsilon_{t},\tag{3}$$

where $\phi_{i}^{\prime}$ and $\theta_{i}^{\prime}$ refer to the AR and MA parameters, respectively. The converted ARMA model after polynomial multiplication still satisfies the invertibility condition because of the unit root checks performed in the original ARIMA modeling, so that all roots of the MA characteristic polynomial lie outside the unit circle.
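A hedged sketch of this polynomial expansion follows, representing polynomials in $B$ as coefficient vectors c(1, c1, c2, ...); the example orders and coefficient values are arbitrary.

```r
# Multiply two polynomials in B given as coefficient vectors (constant term first).
polymult <- function(a, b) {
  res <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    j <- i:(i + length(b) - 1)
    res[j] <- res[j] + a[i] * b
  }
  res
}

# Example: ARIMA(1,1,1)(1,0,0)_24 with phi_1 = 0.5, Phi_1 = 0.3, theta_1 = 0.4.
m <- 24
ar_ns <- c(1, -0.5)                    # 1 - phi_1 B
ar_s  <- c(1, rep(0, m - 1), -0.3)     # 1 - Phi_1 B^m
d_ns  <- c(1, -1)                      # (1 - B)^d with d = 1

phi_prime   <- polymult(polymult(ar_ns, ar_s), d_ns)  # expanded AR side phi'(B)
theta_prime <- c(1, 0.4)                               # MA side 1 + theta_1 B
u <- length(phi_prime) - 1                             # u = p + mP + d + mD = 26
```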

The next task is to convert the ARMA$(u,v)$ model for the time series $\{x_t\}$ into its linear representation. Given the polynomials $\phi^{\prime}(B)=\big(1-\sum_{i=1}^{u}\phi_{i}^{\prime}B^{i}\big)$ and $\theta^{\prime}(B)=\big(1+\sum_{i=1}^{v}\theta_{i}^{\prime}B^{i}\big)$ with roots outside the unit circle, we have

$$\pi(B)x_{t}=\frac{\phi^{\prime}(B)}{\theta^{\prime}(B)}x_{t}=\varepsilon_{t},$$

where $\pi(B)=\big(1-\sum_{i=1}^{\infty}\pi_{i}B^{i}\big)$. The parameters of the converted AR($\infty$) model can be obtained by a recursive process. Consequently, the linear representation of the original seasonal ARIMA model in Equation (2) is given by

$$y_{t}=\beta_{0}+\beta_{1}t+\sum_{i=1}^{\infty}\pi_{i}y_{t-i}+\sum_{j=1}^{l}\eta_{j}\bigg(\gamma_{j,t}-\sum_{i=1}^{\infty}\pi_{i}\gamma_{j,t-i}\bigg)+\varepsilon_{t},\tag{4}$$

where

$$\beta_{0}=\mu_{0}\bigg(1-\sum_{i=1}^{\infty}\pi_{i}\bigg)+\mu_{1}\sum_{i=1}^{\infty}i\pi_{i}\qquad\text{and}\qquad\beta_{1}=\mu_{1}\bigg(1-\sum_{i=1}^{\infty}\pi_{i}\bigg).$$

In practice, the infinite order of the AR representation is truncated at a large order $p^{*}$, so that the AR($p^{*}$) model approximates the true AR($\infty$) process arbitrarily closely.
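The recursion can be sketched as follows, using the expanded coefficient vectors phi_prime and theta_prime from the previous sketch; the recursion formula is obtained by equating coefficients in pi(B) theta'(B) = phi'(B), and the truncation order p_star and the function name are illustrative.

```r
# AR(infinity) weights pi_i of phi'(B)/theta'(B), truncated at p_star, via
# pi_k = phi'_k + theta'_k - sum_{j=1}^{k-1} theta'_j * pi_{k-j}.
ar_inf_weights <- function(phi_prime, theta_prime, p_star = 300) {
  phi   <- c(-phi_prime[-1], numeric(p_star))[1:p_star]   # phi'_i, zero-padded
  theta <- c(theta_prime[-1], numeric(p_star))[1:p_star]  # theta'_i, zero-padded
  pi_w  <- numeric(p_star)
  for (k in seq_len(p_star)) {
    past <- if (k > 1) sum(theta[1:(k - 1)] * pi_w[(k - 1):1]) else 0
    pi_w[k] <- phi[k] + theta[k] - past
  }
  pi_w
}

pi_w <- ar_inf_weights(phi_prime, theta_prime)
```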

3.2 The Distributed Least Squares Approximation Method

Suppose we have obtained appropriate models for the individual subseries by traversing the model space. The next stage entails combining the local estimators of the subseries models so that multi-step forecasts can be produced. Inspired by [58], we aim to solve the time series modeling problem while respecting the dependency structure. The local ARIMA models trained on the subseries are unified into AR representations with a large order, making it possible to estimate the global parameters on the master node by combining the local estimators delivered from the worker nodes.

Let the model parameters in Equation (4) be $\theta=(\beta_{0},\beta_{1},\pi_{1},\pi_{2},\dots,\pi_{p^{*}},\eta_{1},\dots,\eta_{l})^{\top}$. If $\mathcal{L}(\theta;y_{t})$ is a twice-differentiable loss function, the global loss function is given by $\mathcal{L}(\theta)=T^{-1}\sum_{t=1}^{T}\mathcal{L}(\theta;y_{t})$ and the local loss function for the $k$th subseries by $\mathcal{L}_{k}(\theta)=T_{k}^{-1}\sum_{t\in\mathcal{S}_{k}}\mathcal{L}(\theta;y_{t})$. Using Taylor's theorem and the relationship between the Hessian and covariance matrix for Gaussian random variables [55], we have

$$\mathcal{L}(\theta)=\frac{1}{T}\sum_{k=1}^{K}\sum_{t\in\mathcal{S}_{k}}\mathcal{L}(\theta;y_{t})\approx\sum_{k=1}^{K}(\theta-\widehat{\theta}_{k})^{\top}\left(\frac{T_{k}}{T}\widehat{\Sigma}_{k}^{-1}\right)(\theta-\widehat{\theta}_{k})+c_{2},\tag{5}$$

where $\widehat{\theta}_{k}=\arg\min_{\theta}\mathcal{L}_{k}(\theta)$ is the minimizer of the local loss function, $c_{1}$ and $c_{2}$ are constants, and $\widehat{\Sigma}_{k}$ is the covariance estimate for the local estimator of the $k$th subseries.

Consequently, the objective of minimizing the global loss function is achieved by minimizing the weighted least squares expression in Equation (5). The global estimator takes the following form:

$$\widetilde{\theta}=\left(\sum_{k=1}^{K}T_{k}\widehat{\Sigma}_{k}^{-1}\right)^{-1}\left(\sum_{k=1}^{K}T_{k}\widehat{\Sigma}_{k}^{-1}\widehat{\theta}_{k}\right). \qquad (6)$$

Then the covariance matrix of the estimated global parameters is given by $\widetilde{\Sigma}=T\left(\sum_{k=1}^{K}T_{k}\widehat{\Sigma}_{k}^{-1}\right)^{-1}$.

Note that instead of simply averaging the estimated parameters, the analytical solution in Equation (6) approximates the global estimator by taking a weighted average of the local parameters with weights $\widehat{\Sigma}_{k}^{-1}$, which limits the influence of outliers on forecast performance when some subseries are poorly fitted by ARIMA models. Furthermore, the simple averaging method requires the subseries to be homogeneous across workers to guarantee statistical efficiency. This is highly questionable in practice because heteroskedasticity and imbalance are common phenomena in distributed data.

The analytical form of the global estimator in Equation (6) can be used to estimate the global parameters in distributed forecasting. The difficulty with this method is that it requires knowledge of $\widehat{\Sigma}_{k}$. Specifically, the local parameters of a subseries in Equation (4) are derived from a seasonal ARIMA model, and it may not be possible to obtain a good estimate of the covariance matrix directly. Moreover, a large order $p^{*}$ is often used to better approximate the initial ARIMA model trained on each subseries, resulting in a set of local parameters in which some entries are close or equal to zero, and in a high-dimensional covariance matrix whose dimension $p^{*}$ is likely to exceed the length of the subseries. Therefore, following the literature on high-dimensional covariance matrix estimation fan2008high (12, 13, 21), we assume $\widehat{\Sigma}_{k}$ is sparse and approximate it by $\hat{\sigma}_{k}^{2}I$ for each subseries in this study, which greatly simplifies the computations and further reduces the communication costs in distributed systems.
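For concreteness, the following Python sketch shows how the combination in Equation (6) might be computed on the master node once the local parameter vectors, residual variances, and subseries lengths have been collected from the workers. The function name and data structures are illustrative and not part of the darima package.

```python
import numpy as np

def combine_local_estimators(theta_hats, sigma2_hats, lengths):
    """Combine local estimators as in Equation (6), approximating each
    covariance matrix by sigma2_k * I (a sketch, not the darima package API).

    theta_hats  : list of K local parameter vectors (each of length p)
    sigma2_hats : list of K residual-variance estimates sigma2_k
    lengths     : list of K subseries lengths T_k
    """
    p = len(theta_hats[0])
    weight_sum = np.zeros((p, p))           # sum_k T_k * Sigma_k^{-1}
    weighted_theta = np.zeros(p)            # sum_k T_k * Sigma_k^{-1} * theta_k
    for theta_k, s2_k, T_k in zip(theta_hats, sigma2_hats, lengths):
        w_k = (T_k / s2_k) * np.eye(p)      # T_k * (sigma2_k I)^{-1}
        weight_sum += w_k
        weighted_theta += w_k @ np.asarray(theta_k)
    theta_tilde = np.linalg.solve(weight_sum, weighted_theta)
    T = sum(lengths)
    Sigma_tilde = T * np.linalg.inv(weight_sum)   # covariance of the global estimator
    return theta_tilde, Sigma_tilde
```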

3.3 Point Forecasts

After combining the local estimators from each subseries by minimizing the global loss function, the coefficients of the global estimator are calculated as illustrated in Section 3.2. By using a large order $p^{*}$ instead of the infinite order in the converted AR representation for each subseries, the combined global model can be written as follows:

$$y_{t}=\tilde{\beta}_{0}+\tilde{\beta}_{1}t+\sum_{i=1}^{p^{*}}\tilde{\pi}_{i}y_{t-i}+\sum_{j=1}^{l}\tilde{\eta}_{j}\left(\gamma_{j,t}-\sum_{i=1}^{p^{*}}\tilde{\pi}_{i}\gamma_{j,t-i}\right)+e_{t}, \qquad (7)$$

where $\tilde{\theta}=(\tilde{\beta}_{0},\tilde{\beta}_{1},\tilde{\pi}_{1},\cdots,\tilde{\pi}_{p^{*}},\tilde{\eta}_{1},\cdots,\tilde{\eta}_{l})^{\top}$ is the vector of global model coefficients, and $e_{t}$ is the observed residual.

Given the time series $\{y_{t}\}$, suppose that we are at time $T$ and are interested in forecasting the next $H$ observations, where the time index $T$ is the forecast origin. The $h$-step-ahead forecast can be calculated with relative ease as

$$\hat{y}_{T+h|T}=\tilde{\beta}_{0}+\tilde{\beta}_{1}(T+h)+\sum_{j=1}^{l}\tilde{\eta}_{j}\left(\gamma_{j,T+h}-\sum_{i=1}^{p^{*}}\tilde{\pi}_{i}\gamma_{j,T+h-i}\right)+\begin{cases}\sum_{i=1}^{p^{*}}\tilde{\pi}_{i}y_{T+1-i},&h=1\\[4pt] \sum_{i=1}^{h-1}\tilde{\pi}_{i}\hat{y}_{T+h-i|T}+\sum_{i=h}^{p^{*}}\tilde{\pi}_{i}y_{T+h-i},&1<h<p^{*}\\[4pt] \sum_{i=1}^{p^{*}}\tilde{\pi}_{i}\hat{y}_{T+h-i|T},&h\geq p^{*}.\end{cases}$$

In this way, the point forecasts of the next $H$ observations can be calculated recursively for $h=1,\dots,H$.
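The recursion above takes only a few lines to implement. The Python sketch below (with hypothetical names, assuming the series is at least $p^{*}$ observations long) iterates the AR part of the combined model, plugging earlier forecasts back in as they become available; the intercept, trend, and seasonal-dummy terms of Equation (7) are abstracted into an optional callable.

```python
import numpy as np

def recursive_forecast(y, pi, H, deterministic=None):
    """Recursive h-step-ahead point forecasts from the AR part of the
    combined global model (a sketch; assumes len(y) >= len(pi)).

    y             : observed series, most recent observation last
    pi            : global AR coefficients pi_1, ..., pi_{p*}
    H             : forecast horizon
    deterministic : optional callable h -> intercept/trend/seasonal term,
                    standing in for the beta and eta terms of Equation (7)
    """
    p_star = len(pi)
    history = list(y)                       # observed values; forecasts appended later
    forecasts = []
    for h in range(1, H + 1):
        recent = history[-p_star:][::-1]    # y_{T+h-1}, y_{T+h-2}, ..., y_{T+h-p*}
        ar_part = float(np.dot(pi, recent))
        det_part = deterministic(h) if deterministic is not None else 0.0
        y_hat = det_part + ar_part
        forecasts.append(y_hat)
        history.append(y_hat)               # plug the forecast back into the history
    return np.array(forecasts)
```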

3.4 Prediction Intervals

As described in Section 3.1, the linear representation of the seasonal ARIMA model, which is trained for each subseries of $\{y_{t}\}$, is derived from the following AR model for the time series $\{x_{t}\}$:

$$x_{t}=\sum_{i=1}^{p^{*}}\pi_{i}x_{t-i}+\varepsilon_{t},$$

where the infinite order of the converted AR model is replaced by a large order $p^{*}$. As is well known, once we estimate the coefficients of the AR regression and the standard deviation of the residuals, the standard error of the $h$-step-ahead forecast is uniquely determined. Thus, the forecast variances of the linear representation in Equation (4) are not affected by the intercept and time trend terms of the seasonal ARIMA model (ignoring estimation error). Consequently, the forecast variances of the combined global model in Equation (7) depend only on the AR part of the model, that is, the term $\sum_{i=1}^{p^{*}}\tilde{\pi}_{i}y_{t-i}+e_{t}$.

To compute these variances, we convert the AR model to an MA model with infinite order brockwell2016introduction (6):

$$e_{t}+\sum_{i=1}^{\infty}\tilde{\psi}_{i}e_{t-i}.$$

Then, in the global model, the variance $\tilde{\sigma}_{h}^{2}$ of the $h$-step-ahead forecast error is given by

$$\tilde{\sigma}_{h}^{2}=\begin{cases}\tilde{\sigma}^{2},&h=1\\[4pt] \tilde{\sigma}^{2}\left(1+\sum_{i=1}^{h-1}\tilde{\psi}_{i}^{2}\right),&h>1,\end{cases}$$

where $\tilde{\sigma}$ is the standard deviation of the residuals for the combined global model and is unknown. As illustrated in Section 3.2, we suggest replacing the covariance estimate $\widehat{\Sigma}_{k}$ of the local estimators with $\hat{\sigma}_{k}^{2}I$. Subsequently, the covariance estimate of the global estimator is calculated as $\widetilde{\Sigma}=\left(\sum_{k=1}^{K}(T_{k}/T)(\hat{\sigma}_{k}^{2}I)^{-1}\right)^{-1}$, and we can estimate $\tilde{\sigma}^{2}=\operatorname{tr}(\widetilde{\Sigma})/p$.

Assuming normally distributed errors, the central $100(1-\alpha)\%$ prediction interval for the $h$-step-ahead forecast is given by

$$\hat{y}_{T+h|T}\pm\Phi^{-1}(1-\alpha/2)\,\tilde{\sigma}_{h},$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.
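A minimal sketch of this interval calculation, assuming the global AR coefficients $\tilde{\pi}_{i}$ and the variance estimate $\tilde{\sigma}^{2}$ are already available, is given below. The $\tilde{\psi}$ weights are obtained from the standard AR-to-MA recursion, and all names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def psi_weights(pi, H):
    """MA(infinity) weights implied by the AR coefficients pi_1, ..., pi_{p*}:
    psi_0 = 1 and psi_j = sum_{i=1}^{min(j, p*)} pi_i * psi_{j-i}."""
    psi = np.zeros(H + 1)
    psi[0] = 1.0
    for j in range(1, H + 1):
        for i in range(1, min(j, len(pi)) + 1):
            psi[j] += pi[i - 1] * psi[j - i]
    return psi[1:]                                  # psi_1, ..., psi_H

def prediction_intervals(point_forecasts, pi, sigma2_tilde, alpha=0.05):
    """Normal prediction intervals for the combined global model (a sketch)."""
    point_forecasts = np.asarray(point_forecasts)
    H = len(point_forecasts)
    psi = psi_weights(pi, H)
    # sigma_h^2 = sigma^2 (1 + sum_{i=1}^{h-1} psi_i^2); the sum is empty for h = 1.
    cum = np.concatenate(([0.0], np.cumsum(psi ** 2)))[:H]
    sigma_h = np.sqrt(sigma2_tilde * (1.0 + cum))
    z = norm.ppf(1 - alpha / 2)
    return point_forecasts - z * sigma_h, point_forecasts + z * sigma_h
```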

4 Application to Electricity Data

In this section, we describe in detail the electricity demand data set used to investigate the performance of the proposed distributed ARIMA models and the experimental design. We analyze the performance of the proposed distributed ARIMA models and explore the factors that affect their forecasting performance.

4.1 Data Description

To illustrate our proposed approach, we forecast the time series of GEFCom2017 hong2019global (20). The data, publicly available at https://github.com/camroach87/gefcom2017data, were initially made available by ISO New England. They comprise the electricity load, holiday information, and weather data consisting of dew point and dry bulb temperatures. To assess the benefits of the proposed distributed ARIMA model, we restrict our attention to the electricity load data in the following analysis.

[Figure 5: hourly electricity load for all bottom-level zones and two aggregated zones.]

The electricity load data set consists of 10 time series of hourly data, ranging from 1 March 2003 to 30 April 2017. As mentioned in Section 2.2, the computational time for automatic ARIMA modeling grows exponentially with the series length, reaching 10 minutes at a length of 1,000. Allowing a wider range of model orders will further increase the execution time beyond an acceptable range (see Section 4.5 for more experimental results). In this respect, the electricity load data, spanning 124,171 time points, are long enough that distributed computing is desired, consistent with the applicable scenarios of our proposed approach. Dealing with data of this magnitude is uncommon in traditional forecasting applications, but it should be noted that load forecasting problems routinely have cases with available data spanning more than one million time points, most likely for several locations at the same time. The forecasts are primarily used in power systems operations, such as energy trading and unit commitment.

Figure 5 presents the hourly electricity load for all bottom-level zones and two aggregated zones. We train the distributed ARIMA model using data from 1 March 2003 to 31 December 2016, while data from 1 January 2017 to 30 April 2017 are used for testing. In this way, we provide four-month (2879-step) ahead point forecasts and the corresponding prediction intervals with multiple confidence levels. The original GEFCom2017 hong2019global (20) only requires one-month-ahead forecasting. However, forecasting at longer horizons in energy and many other high-frequency time series domains is in great demand, as it allows for earlier management plans. Note that we restrict our attention to time series forecasting on distributed systems without considering the data's hierarchical configuration.

Electricity demand may exhibit periodic patterns such as time of day, day of week, and month of year. Although the day-of-week effect can easily be included in our proposed model as covariates, our ex-ante analysis shows no significant distinction in patterns between days of the week: whether or not the day-of-the-week pattern is included in ARIMA models or distributed ARIMA models makes little difference to both point and interval forecasting accuracy. Moreover, the subseries are not long enough to allow us to consider monthly seasonality in a distributed computing environment, although monthly seasonality can be handled with preprocessing steps such as time series decomposition. To focus on assessing the benefits of the proposed distributed ARIMA models over standard ARIMA models, we only consider the time-of-day effect in the following analysis by using the seasonal components of ARIMA models for the hourly subseries ($m=24$).

4.2 Experimental Setup

We partition each time series into 150 subseries in the experiment, with each subseries of length about 800. The setup is inspired by the M4 competition makridakis2020m4 (38): the length of the hourly time series there ranges from 700 to 900. For time series of such lengths, traditional forecasting models perform well on a standalone computer, and the time consumed by the automatic ARIMA modeling process is within 5 minutes, which is empirically acceptable. The analysis exploring the forecasting performance under different settings of the number of subseries is presented in Section 4.5.
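The partitioning step itself is straightforward; a possible Python sketch (not the authors' implementation) simply cuts the series into $K$ contiguous, non-overlapping blocks:

```python
import numpy as np

def split_into_subseries(y, K):
    """Split a long series into K contiguous, non-overlapping subseries of
    (almost) equal length, preserving the time order."""
    return np.array_split(np.asarray(y), K)   # earlier chunks absorb any remainder

# Example: a series of length 124,171 split into K = 150 pieces gives
# subseries of roughly 800 observations each.
subseries = split_into_subseries(np.random.randn(124_171), 150)
lengths = [len(s) for s in subseries]
```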

As illustrated in Section 3.1, the AR representation with infinite order is obtained from the seasonal ARIMA model for each subseries to facilitate the parameter-based combination. We approximate the infinite-order AR model with a sizeable finite order, balancing model complexity against the quality of the approximation to the original ARIMA model. In practice, one can perform an ex-ante analysis using several candidate order values and select the smallest value for which the forecasting results of the original ARIMA model and its converted linear representation show no significant differences. Alternatively, if no ex-ante analysis has been conducted, a sizeable order is always recommended, as it only adds negligible computational complexity to the combination of local parameters in the master node, while allowing a broader range of order values for the original ARIMA models. Accordingly, we set the AR order $p^{*}$ to 2000 in this experiment; see Section 4.4 for more details.

In this paper, the distributed ARIMA model is designed to facilitate ARIMA modeling for ultra-long time series in a distributed manner with high computational efficiency and potentially improved forecasting performance, rather than to run a horse race between the proposed method and other forecasting methods. Therefore, the main comparison we are interested in is DARIMA versus ARIMA. Moreover, the most widely used forecasting approaches developed for distributed systems scale poorly with the forecast horizon, making it impractical to apply them when many future values are required (see Section 2.1 for more details). In this regard, we only compare the proposed approach to ARIMA models fitted to the whole time series, as well as an additional standard for comparison: ETS models.

For comparison purposes, the argument configuration of the automatic ARIMA modeling for the whole series and the subseries is shown in Table 1. To make the algorithm execution times comparable, we consider global order selection with parallelism when fitting models to the whole time series, while using non-parallel stepwise order selection when modeling the subseries. Furthermore, we apply the CSS method to fit ARIMA models instead of CSS-ML (see Section 2.2 for details). With the fitting method CSS-ML, we may have to fit more than one model in the model refitting process, since the model with the appropriate order identified in the order selection process may be rejected by several strict checks for unit roots. Due to this uncertainty, the comparison of execution time between the ARIMA model on the whole series and the distributed ARIMA model would be unreliable if we used CSS-ML. Finally, the experiment is limited to specific maximum values of model orders. We further discuss the importance of model orders to forecasting performance in Section 4.5. Moreover, implementations of ETS models are available in the forecast R package with the ets() function. Given a single time series, ETS modeling cannot be extended to parallel computing.

Table 1: Argument configuration of the automatic ARIMA modeling for the whole series (ARIMA) and for the subseries (DARIMA).

Argument              ARIMA   DARIMA
max.p; max.q          5       5
max.P; max.Q          2       2
max.order             5       5
fitting method        CSS     CSS
parallel (multicore)  True    False
stepwise              False   True

As for the system environment, the experiments are carried out on a Spark-on-YARN cluster on Alibaba Cloud Server composed of one master node and two worker nodes. Each node contains 32 logical cores, 64 GB RAM, and two 80 GB SSD local hard drives. The algorithm is developed on the Spark platform (2.4.5), and both Python and R interfaces are freely available at https://github.com/xqnwang/darima.
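To make the distributed workflow concrete, the following is a rough Spark sketch of the map step (fitting a seasonal ARIMA to each subseries on the workers) and the collection of local summaries on the master. It uses pmdarima's auto_arima purely as a stand-in for the automatic ARIMA step, omits the conversion of each fitted model to its AR representation, and is not the authors' implementation; see the darima repository above for that.

```python
import numpy as np
import pmdarima as pm
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("darima-sketch").getOrCreate()
sc = spark.sparkContext

def fit_local(subseries):
    # Worker step: automatic seasonal ARIMA for one subseries (m = 24 for hourly
    # data), returning only the small summaries needed for the combination step.
    model = pm.auto_arima(subseries, seasonal=True, m=24,
                          stepwise=True, suppress_warnings=True)
    return {"theta": np.asarray(model.params()),
            "sigma2": float(np.var(model.resid())),
            "length": len(subseries)}

# `subseries` is assumed to be a list of K numpy arrays from the splitting step.
local_fits = (sc.parallelize(subseries, numSlices=len(subseries))
                .map(fit_local)
                .collect())
# The master node then combines `local_fits` as in Equation (6).
```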

4.3 Evaluation Measures

To assess the performance of the point forecasts, we consider the mean absolute scaled error (MASE; hyndman2006another, 24) as the measure of forecasting accuracy. MASE is recommended because of its excellent mathematical properties: it is scale-independent and less sensitive to outliers. Besides, hyndman2006another (24) suggest MASE as the standard measure for comparing forecasting accuracy across multiple time series. The formula for computing the MASE is the following:

$$\text{MASE}=\frac{\frac{1}{H}\sum_{t=T+1}^{T+H}|y_{t}-\hat{y}_{t|T}|}{\frac{1}{T-m}\sum_{t=m+1}^{T}|y_{t}-y_{t-m}|}.$$
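A direct Python translation of this formula might look as follows (the function name is hypothetical; $m$ is the seasonal period used in the scaling term):

```python
import numpy as np

def mase(y_train, y_test, y_hat, m=24):
    """Mean absolute scaled error: out-of-sample absolute errors scaled by the
    in-sample mean absolute error of the seasonal naive method (period m)."""
    y_train = np.asarray(y_train)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(np.asarray(y_test) - np.asarray(y_hat))) / scale
```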

We evaluate the accuracy of prediction intervals using the mean scaled interval score (MSIS; gneiting2007strictly, 18), given by

$$\text{MSIS}=\frac{\frac{1}{H}\sum_{t=T+1}^{T+H}\left[(U_{t|T}-L_{t|T})+\frac{2}{\alpha}(L_{t|T}-y_{t})\mathbf{1}\{y_{t}<L_{t|T}\}+\frac{2}{\alpha}(y_{t}-U_{t|T})\mathbf{1}\{y_{t}>U_{t|T}\}\right]}{\frac{1}{T-m}\sum_{t=m+1}^{T}|y_{t}-y_{t-m}|},$$

where $L_{t|T}$ and $U_{t|T}$ are the lower and upper bounds of the generated $100(1-\alpha)\%$ prediction interval, respectively. The scoring rule balances the width of the generated prediction intervals and the penalty for true values lying outside the prediction intervals.
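MSIS can be computed analogously; the sketch below assumes vectors of lower and upper interval bounds aligned with the test observations and uses the same seasonal-naive scaling as MASE:

```python
import numpy as np

def msis(y_train, y_test, lower, upper, m=24, alpha=0.05):
    """Mean scaled interval score for 100(1 - alpha)% prediction intervals."""
    y_train = np.asarray(y_train)
    y_test, lower, upper = map(np.asarray, (y_test, lower, upper))
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    width = upper - lower
    below = (2.0 / alpha) * (lower - y_test) * (y_test < lower)   # penalty below L
    above = (2.0 / alpha) * (y_test - upper) * (y_test > upper)   # penalty above U
    return np.mean(width + below + above) / scale
```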

4.4 Distributed Forecasting Results

We now investigate the performance of the proposed distributed ARIMA models on the GEFCom2017 data set compared to that of ARIMA models as well as ETS models in terms of MASE and MSIS. Execution time is also considered as an important metric describing the computational efficiency of the algorithms. For conciseness, our proposed algorithm, the distributed ARIMA model, is hereinafter referred to as DARIMA.

To verify whether the approximating order of 2000 for the AR representation (as described in Section 4.2) is large enough to make the AR model close to its original seasonal ARIMA model, we present the forecasting results of the ARIMA model and its AR(2000) representation on the GEFCom2017 data set in Table 2. We observe that there is no difference between the forecasting performance of the ARIMA model and that of the converted AR model (as measured by MASE and MSIS), to the degree of $10^{-3}$.

Table 2: Forecasting performance (MASE and MSIS) of DARIMA, ARIMA, the AR representation of ARIMA, and ETS on the GEFCom2017 data set.

                     MASE                       MSIS
                     Mean    Median   SD        Mean     Median    SD
DARIMA               1.297   1.218    0.284     15.078   14.956     1.021
ARIMA                1.430   1.325    0.351     19.733   16.498     7.446
AR representation    1.430   1.325    0.351     19.733   16.498     7.446
ETS                  1.491   1.338    0.408     53.783   49.109    15.834

Table 2 also compares the forecasting performance of DARIMA against ARIMA and ETS for the whole time series in terms of the mean, median, and standard deviation (SD) of the MASE and MSIS values. As expected, DARIMA outperforms the benchmark methods for both point forecasts and prediction intervals. Specifically, for point forecasting, DARIMA achieves substantial performance improvements compared to the benchmark methods, at least approximately 9.3% for the mean MASE value and 8.1% for the median, with a smaller degree of variation. Meanwhile, DARIMA yields a statistically significant improvement (at least 23.6%) over the benchmark methods in terms of the mean of the MSIS values. Therefore, implementing ARIMA models on distributed systems by splitting the whole time series into several subseries dominates ARIMA and ETS in both point forecasting and interval forecasting.

We proceed by observing how the forecasting performance of the distributed ARIMA models changes with the forecast horizon. Figure 6 depicts the accuracy of DARIMA over various forecast horizons against the benchmark methods, ARIMA and ETS. First, the left panel shows that the point forecasting performance of DARIMA differs little from that of ARIMA and ETS when we are interested in forecasting only the first few future values. We also observe that DARIMA yields slightly larger MASE values than ARIMA when focusing on forecasting the next 1000 observations. This difference tapers off with increasing forecast horizon, and finally DARIMA significantly outperforms both ARIMA and ETS for long-term observations. On the other hand, the right panel illustrates that DARIMA provides much better performance than ARIMA and ETS according to the MSIS values when we turn our attention to forecasting more than 100 future values. The achieved performance improvements become more pronounced as the forecast horizon increases. Furthermore, one possible reason why DARIMA and ARIMA models perform substantially better than ETS models is that, for seasonal data, there are many more ARIMA models than possible models in the exponential smoothing class Hyndman2008b (23). In simple terms, we conclude that if long-term observations are considered, DARIMA is favorable, both in point forecasts and prediction intervals.

Figure 7 shows the forecasting performance of DARIMA compared to ARIMA on the electricity demand series for the NEMASSBOST zone. We observe from the forecasts that, compared to ARIMA, DARIMA captures the yearly seasonal pattern of the original series. Even for large forecast horizons, DARIMA results in forecasts closer to the true future values than ARIMA. These conclusions are consistent with the previous results shown in Table 2 and Figure 6.

[Figure 6: forecasting accuracy (MASE and MSIS) of DARIMA, ARIMA, and ETS across forecast horizons.]
[Figure 7: DARIMA and ARIMA forecasts for the NEMASSBOST zone.]

Figure 8 presents the MSIS results of forecasting with DARIMA, ARIMA, and ETS across confidence levels varying from 50% to 99%. We observe that DARIMA persistently results in better forecasting accuracy than ARIMA and ETS in terms of MSIS across various confidence levels. Besides, the superiority of DARIMA tends to be more substantial as the confidence level increases.

[Figure 8: MSIS of DARIMA, ARIMA, and ETS across confidence levels from 50% to 99%.]

The aforementioned results mainly focus on the forecasting accuracy of DARIMA against the benchmark methods. Now we compare DARIMA to ARIMA and ETS in terms of execution time to investigate the computational efficiency of DARIMA, as shown in Figure 9. The forecasting performance of DARIMA varies with the number of subseries (see Section 4.5 for details), but not with the number of executors in the Spark system. More specifically, the number of executors determines how many tasks (i.e., subproblems of ARIMA modeling) are assigned to each executor, which greatly affects the runtime on individual executors. When it comes to ARIMA modeling for the whole time series, increasing the number of cores used also gives a significant speedup, as the specification search is done in parallel. However, for a given ultra-long time series, ETS modeling cannot be implemented in a parallel manner, so the execution time of ETS modeling does not change with the number of virtual cores used. Figure 9 shows improved computational efficiency of both ARIMA and DARIMA with increasing numbers of executors/cores. Besides, DARIMA persistently requires less execution time than ARIMA and ETS when using more than two executors/cores. In our application, fitting a DARIMA model to an ultra-long time series with a length of about 120,000 takes an average of 1.22 minutes with 32 cores, while ARIMA modeling takes an average of 5.16 minutes and ETS takes an average of 5.38 minutes. Therefore, our approach results in significantly improved forecasting accuracy with remarkably less execution time compared to ARIMA and ETS models.

[Figure 9: execution time of DARIMA, ARIMA, and ETS against the number of executors/cores.]

4.5 Sensitivity Analysis

This subsection focuses on the factors that may affect the forecast performance of the distributed ARIMA models. In the following analysis, we consider two main factors: the number of split subseries and the maximum values of model orders. Other potential factors will be discussed in Section 6.

We first explore the relationship between the forecasting performance of the distributed ARIMA models and the number of subseries $K$ preset in the preprocessing step, as presented in Figure 10. In essence, this relationship also depicts the importance of the length of the subseries to the functioning of the distributed ARIMA models. As the number of subseries $K$ varies from 10 to 100, there is a considerable drop in the MASE values of DARIMA. The MASE then fluctuates slightly when $K$ is between 100 and 300, and grows sharply when $K$ equals 350. Subsequently, the MASE values of DARIMA move up and down widely for larger $K$. Besides, the MSIS of DARIMA shows an overall trend of first decreasing and then increasing. Therefore, we conclude that the number of subseries should be controlled within a reasonable range, with too few or too many subseries causing poor forecasting performance. In our study, we should limit the number of subseries to between 100 and 300.

[Figure 10: forecasting performance (MASE and MSIS) of DARIMA against the number of subseries K.]

Table 3 compares the forecasting performance of DARIMA with that of ARIMA under different settings of the maximum values of model orders in terms of MASE and MSIS. The maximum value of $p+q+P+Q$ only applies to the global order selection. Therefore, when we keep the maximum values of the non-seasonal and seasonal parts fixed, changes in the maximum value of $p+q+P+Q$ result in some changes in the forecasting accuracy of ARIMA, but no changes in that of DARIMA. If the model orders are allowed to range more widely, ARIMA achieves better forecasting performance for both point forecasts and prediction intervals. The main reason is that the broader range of model orders provides more candidate models in the order selection process. In contrast, DARIMA yields higher MASE when more candidate models are provided. One possible reason for this result is that allowing more extensive choices of model orders may lead to overfitting for subseries with short lengths. Moreover, Table 3 shows that the maximum values of model orders have a limited impact on forecasting performance: the changes in performance for both ARIMA and DARIMA gradually disappear as the maximum orders increase. We also compare the results using the symmetric mean absolute percentage error (sMAPE; makridakis1993accuracy, 37) and obtain almost identical results to those with MASE. As expected, DARIMA outperforms ARIMA under all settings of the maximum values of model orders for both point forecasts and prediction intervals.

We proceed by comparing our proposed DARIMA with ARIMA regarding execution time under different settings of the maximum values of model orders, as shown in Table 3. The results indicate that DARIMA is more computationally efficient than ARIMA across multiple settings of the maximum values of model orders. When the model orders are allowed to take a broader range of values, both DARIMA and ARIMA take more time modeling the time series. The execution time spent on ARIMA modeling increases markedly, while DARIMA keeps its modeling time within a reasonable and acceptable range. For example, DARIMA is 53 times more efficient than ARIMA when max.orders equals (8, 4, 10). The improved efficiency makes it possible for DARIMA to search for an appropriate model for each subseries across a broader range of candidate models.

Table 3: Forecasting performance and execution time of ARIMA and DARIMA under different settings of the maximum model orders.

Max orders    Method   MASE    MSIS     Execution time (mins)
(5, 2, 5)     ARIMA    1.430   19.733     4.596
              DARIMA   1.297   15.078     1.219
(5, 2, 7)     ARIMA    1.410   18.695    14.189
              DARIMA   1.297   15.078     1.211
(6, 2, 7)     ARIMA    1.410   18.695    15.081
              DARIMA   1.298   15.108     1.326
(6, 3, 7)     ARIMA    1.413   15.444    21.072
              DARIMA   1.324   12.590     1.709
(6, 3, 10)    ARIMA    1.413   15.654    76.272
              DARIMA   1.324   12.590     1.769
(7, 3, 10)    ARIMA    1.413   15.654    83.077
              DARIMA   1.327   12.561     1.829
(7, 4, 10)    ARIMA    1.409   13.667   111.292
              DARIMA   1.338   12.079     2.267
(8, 4, 10)    ARIMA    1.409   13.667   117.875
              DARIMA   1.335   12.076     2.224

5 Numerical Simulations

In this section, we perform a simulation study to further investigate and justify our proposedDARIMA method in terms of forecasting accuracy and computational cost.

5.1 Simulation Setup

We consider daily, hourly, and half-hourly series and generate 1,000 time series samples in each case. Each series is generated by an ARIMA$(p,d,q)(P,D,Q)_{m}$ process with $d$ randomly sampled from Bernoulli(0.9), $D$ randomly sampled from Bernoulli(0.4), $p$ and $q$ each taking values from a uniform distribution on $\{0,1,2,3,4,5\}$, and $P$ and $Q$ each taking the values 0, 1, and 2 with equal probability. The periods $m$ of the simulated series are set to 7, 24, and 48 to match daily, hourly, and half-hourly time series. For each generated series, the parameters of the process are randomly chosen from the uniform distribution U($-2$, $2$) over the stationary and invertible space.

We divide each series into three parts: the first $m\times 10$ observations are discarded as burn-in, the following $T$ observations are used as a training set for estimating parameters, and the last $H$ observations are used for testing. For daily, hourly, and half-hourly series, $T$ takes the values 8,000, 100,000, and 200,000, respectively, while $H$ takes the values 100, 2,000, and 4,000, respectively.

Final forecasts are produced using four different methods: (i) distributed ARIMA modeling as introduced in Section 3 (DARIMA); (ii) simple averaging of the estimated parameters of the split subseries when implementing DARIMA (DARIMA_SA); (iii) automatic ARIMA modeling for the whole series with a single model (ARIMA); and (iv) ETS modeling for the whole time series (ETS). For comparison purposes, DARIMA and DARIMA_SA share the same settings for time series partitioning. Specifically, each daily series is partitioned into 21 subseries with each subseries spanning at least 52 weeks, and each hourly and half-hourly series is split into 138 subseries so that each subseries spans at least 30 days. Other settings are consistent with those in the application to electricity data; see Section 4.2 for more details.
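As a rough illustration of this setup, the following Python sketch samples random orders and ARMA coefficients with rejection on stationarity and invertibility using statsmodels. The seasonal terms and seasonal differencing are omitted for brevity, so it approximates only part of the full simulation design.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

rng = np.random.default_rng(2024)

def draw_orders():
    """Sample random orders following the distributions described above."""
    d = rng.binomial(1, 0.9)
    D = rng.binomial(1, 0.4)
    p, q = rng.integers(0, 6, size=2)        # uniform on {0, ..., 5}
    P, Q = rng.integers(0, 3, size=2)        # 0, 1, 2 with equal probability
    return int(p), int(d), int(q), int(P), int(D), int(Q)

def draw_arma_process(p, q):
    """Draw nonseasonal ARMA coefficients from U(-2, 2), rejecting draws that
    are not stationary and invertible (may take several tries for large p, q)."""
    while True:
        ar = np.r_[1, -rng.uniform(-2, 2, p)]   # (1 - phi_1 B - ... - phi_p B^p)
        ma = np.r_[1, rng.uniform(-2, 2, q)]    # (1 + theta_1 B + ... + theta_q B^q)
        proc = ArmaProcess(ar, ma)
        if proc.isstationary and proc.isinvertible:
            return proc

# Mimic the hourly case: burn-in of m*10 = 240, training length 100,000, test length 2,000.
p, d, q, P, D, Q = draw_orders()
proc = draw_arma_process(p, q)
x = proc.generate_sample(nsample=240 + 100_000 + 2_000)
y = np.cumsum(x) if d == 1 else x              # apply the nonseasonal difference order d
```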

5.2 Results

We choose three metrics to evaluate the performance of our proposed method: MASE, MSIS, and ACD (absolute coverage difference). To assess prediction intervals, we set $\alpha=0.05$ (corresponding to 95% prediction intervals). As a supplemental scoring rule, ACD measures the absolute difference between the actual coverage of the target method and the nominal coverage, where the coverage measures the rate at which the true values lie inside the prediction intervals the method provides.
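ACD is straightforward to compute from the interval bounds; a small sketch with hypothetical names:

```python
import numpy as np

def acd(y_test, lower, upper, nominal=0.95):
    """Absolute coverage difference: |empirical coverage - nominal coverage|."""
    y_test, lower, upper = map(np.asarray, (y_test, lower, upper))
    coverage = np.mean((y_test >= lower) & (y_test <= upper))
    return abs(coverage - nominal)
```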

Table 4 displays the forecasting accuracy of DARIMA as well as the three methods considered as benchmarks in this study. The accuracy is reported for short-term (four weeks) and long-term (remaining periods) horizons separately, as well as across all forecast horizons. Moreover, the multiple comparisons with the best (MCB; koning2005m3, 30) test is performed on each data frequency to identify whether the average ranks of the four models considered, based on MASE and MSIS, are significantly different, as presented in Figure 11. If the intervals of two methods do not overlap, this indicates a statistically different performance.

The results indicate that, for daily and half-hourly series, the DARIMA method consistently achievesthe best forecast accuracy in terms of the mean values of MASE and MSIS, especially for long-termforecasting. The corresponding MCB results indicate that DARIMA achieves the best-ranked performanceas well, except that it ranks second in MASE for the half-hourly frequency but without significantlydiffering from the best.

In terms of hourly time series, DARIMA provides better forecasts, measured by the average MASE and MSIS values, than ARIMA and DARIMA_SA, but worse forecasts than ETS. However, the main comparison we are interested in is DARIMA versus ARIMA. Moreover, the corresponding MCB results show that DARIMA ranks third but very close to the top two methods in terms of MASE, while it ranks best and displays a clear gap from the remaining methods when MSIS is used for the MCB test. A more in-depth analysis shows that DARIMA provides about 20.77% more accurate interval forecasts than ETS, based on the median of MSIS, with a smaller degree of variation. This demonstrates that DARIMA tends to provide more stable forecasts than the competing methods. Additionally, when there are subseries that are poorly fitted by ARIMA models, the use of DARIMA helps eliminate the influence of outliers on the forecast performance, which cannot be achieved using the DARIMA_SA method.

However, we observe that DARIMA may result in lower-than-nominal coverage and yield ACD values that are higher than those of ARIMA, but lower than or comparable to those of DARIMA_SA. The loss of estimator efficiency may be attributed to the simplified treatment of replacing $\widehat{\Sigma}_{k}$ with $\hat{\sigma}_{k}^{2}I$ in Section 3.2. Furthermore, DARIMA, on average, ranks higher in MSIS than ARIMA, DARIMA_SA, and ETS, as shown in Figure 11, thus supporting decision making with a comprehensive understanding of the uncertainty and the resulting risks. Thus, we conclude that DARIMA not only provides stable forecasts across different frequencies, but also achieves improved or at least comparable forecasts relative to the benchmark methods.

Table 4: Forecasting accuracy (MASE, MSIS, and ACD) of DARIMA and the benchmark methods for the simulated daily, hourly, and half-hourly series, reported for short-term, long-term, and all forecast horizons.

             Daily                    Hourly                         Half-hourly
             Short   Long    Total    Short    Long      Total       Short     Long      Total
MASE
DARIMA       0.700   1.985   1.626    3.277    11.543     8.766      2.622     8.472     6.506
DARIMA_SA    0.704   1.997   1.635    4.002†   12.911†    9.917†     2.879†    9.044†    6.973†
ARIMA        0.793   2.670   2.144    3.609    14.517    10.852      3.991    15.144    11.397
ETS          0.838   2.143   1.778    2.442     7.764     5.976      8.705    24.663    19.301
MSIS
DARIMA       3.921   11.574   9.431   44.671   200.993   148.469     24.608   104.287    77.515
DARIMA_SA    4.027   11.815   9.634   74.645†  256.492†  195.391†    36.263†  134.006†  101.164†
ARIMA        4.683   20.993  16.426   62.030   490.435   346.491     90.331   710.849   502.355
ETS          5.688   17.713  14.346   30.671   130.105    96.696    137.094   614.441   454.053
ACD
DARIMA       0.003   0.006   0.005    0.099    0.140     0.126       0.067     0.125     0.106
DARIMA_SA    0.007   0.002   0.003    0.131    0.173     0.159       0.097     0.150     0.132
ARIMA        0.002   0.011   0.009    0.000    0.001     0.001       0.003     0.012     0.009
ETS          0.004   0.032   0.024    0.107    0.140     0.129       0.039     0.086     0.070

[Figure 11: MCB test results for the four methods based on MASE and MSIS.]
[Figure 12: execution time of DARIMA and the benchmark methods for each data frequency.]

In order to investigate the computational cost of our proposed method, we proceed by visualising the computational time of DARIMA and the three benchmark methods for each data frequency, as presented in Figure 12. It should be noted that the execution time of DARIMA_SA is not included, because there is no significant difference in execution time between DARIMA and DARIMA_SA: they differ only in how the estimated parameters trained on each subseries are combined, and the time required for the combination step is negligible. Moreover, the execution time of ETS does not change with the number of virtual cores used, since ETS modeling for a single time series cannot be extended to parallel processing. We observe that, as the length of the time series of interest increases, both DARIMA and ARIMA take more time modeling the time series. The runtime required for both ARIMA and DARIMA modeling decreases as the number of executors/cores increases. However, DARIMA is computationally more efficient than ARIMA and ETS and keeps its runtime within a reasonable and acceptable range when using more than eight executors.

6 Discussion

Advances in technology have given rise to an increasing demand for forecasting time series data spanning a long time interval, which is extremely challenging. Attempts to tackle the challenge using MapReduce technology typically focus on two mainstream directions: combining forecasts from multiple models bendre2019time (3) and splitting the multi-step forecasting problem into $H$ (forecast horizon) subproblems galicia2018novel (16). On the other hand, statistical computation can be implemented on a distributed system by aggregating the information about local estimators transmitted from worker nodes fan2019distributed (14, 58). This approach results in a combined estimator that is proven to be statistically as efficient as the global estimator. Inspired by this solution, this study provides a new way to forecast ultra-long time series on a distributed system.

One highlight of our developed framework is that it averages the DGPs of the subseries to develop a trustworthy global model for time series forecasting. To this end, instead of unrealistically assuming the DGP of the time series remains invariant over an ultra-long time period hyndman2018forecasting (22), we customize the optimization process of the model parameters for each subseries by only assuming that the DGP of each subseries stays invariant over a short period, and then aggregate these local parameters to produce the combined global model. In this way, we provide a completely novel perspective on forecasting ultra-long time series, with significantly improved computational efficiency.

As illustrated in Section 3, this study focuses on implementing the distributed time series forecasting framework using general ARIMA models that allow the inclusion of additional seasonal terms to deal with strong seasonal behavior. Nevertheless, it is also possible to apply the framework with other statistical models, such as state-space models, VAR models, and ETS, since a general loss function is considered. However, special attention should be paid to how to properly convert the local estimators to avoid inefficiency in the combined estimators. In this work, we restrict our attention to ARIMA models and involve a linear transformation step, in which ARIMA models trained on subseries are converted into linear representations to avoid the stationarity, causality, and invertibility problems that may be caused by directly combining the original parameters of ARIMA models. Similar to ARIMA models, ETS models share the virtue of allowing the trend and seasonal components to vary over time hyndman2002state (25). We hope to shed some light on using distributed ETS models in the future.

The forecasting performance of the distributed ARIMA models is affected by various factors. Two factors, the number of split subseries and the maximum values of model orders, are considered in Section 4.5. Our results show that the number of subseries should be limited to a reasonable range to achieve improved performance in point forecasts and prediction intervals. Specifically, we recommend that the subseries length range from 500 to 1200 for hourly time series. Moreover, compared to ARIMA models, a smaller maximum value of model order is sufficient for the distributed ARIMA models to fit models for all subseries and obtain improved forecasting results from the combined estimators.

Many other potential factors may hold sway over the forecasting performance of our proposed approach. For example, whether to set an overlap between successive subseries may be a practical consideration when implementing the proposed distributed forecasting framework. Through repeated surveys, scott1974analysis (43) explore the effect of overlapping the random samples at each period on the estimation of population parameters. They illustrate that considering the overlap between samples offers reductions in variance; they also discuss the optimum proportion of overlap. Therefore, we believe that a study on setting overlap between successive subseries will further improve our framework, and our framework and computer code are generally applicable to such a scenario. To take another example, we may consider adding time-dependent weights for each subseries when combining the local estimators delivered from the worker nodes. The time-dependent weights help assign higher weights to subseries closer to the forecast origin and smaller weights to subseries that are further away from the forecast origin.

7 Conclusions

In this paper, we propose a novel framework for ultra-long time series forecasting on adistributed system. Unlike previous attempts in the forecasting literature, this studyfacilitates distributed time series forecasting by taking a weighted average of the localestimators delivered from worker nodes to minimize the global loss function. To this end,an ultra-long time series spanning a long stretch of time is divided into severalsubseries spanning short time periods. Specifically, in this study, we focus onimplementing our proposed framework on ARIMA models to enable the ARIMA estimation ofultra-long time series in a distributed manner.

We investigate the performance of the distributed ARIMA models in both the real dataapplication and the simulations and compare the proposed approach against ARIMA modelsconcerning point forecasts, prediction intervals, and execution time. We find that thedistributed ARIMA models outperform ARIMA models in both point forecasting and uncertaintyestimation. The achieved performance improvements become more pronounced as the forecasthorizon increases. Finally, the comparison of execution time shows that our approach alsoachieves better forecasting performance with improved computational efficiency. We alsoexplore various factors that may affect the forecasting performance of the distributedARIMA models, such as the number of split subseries, the maximum values of model orders,overlap between successive subseries, and time-dependent weights for subseries. To furtherimprove the research on distributed forecasting methods, we suggest some possible researchavenues. For example, it would be meaningful to explore distributed ETS models in thefuture.

Acknowledgments

The authors are grateful to the editors and two anonymous reviewers for helpful commentsthat improved the contents of the paper.

Yanfei Kang is supported by the National Natural Science Foundation of China (No. 72171011), and Feng Li is supported by the Emerging Interdisciplinary Project of CUFE and the Beijing Universities Advanced Disciplines Initiative (No. GJJ2019163). This research is supported by Alibaba Group through the Alibaba Innovative Research Program and the high-performance computing (HPC) resources at Beihang University.

References

  • (1) Robin Anil et al. “Apache Mahout: Machine Learning on Distributed Dataflow Systems”. In Journal of Machine Learning Research 21.127, 2020, pp. 1–6.
  • (2) Apache Software Foundation. “Apache Spark”, 2020. URL: https://spark.apache.org
  • (3) Mininath Bendre and Ramchandra Manthalkar. “Time series decomposition and predictive analytics using MapReduce framework”. In Expert Systems with Applications 116, Elsevier, 2019, pp. 108–120.
  • (4) George EP Box, Gwilym M Jenkins, Gregory C Reinsel and Greta M Ljung. “Time series analysis: forecasting and control”. John Wiley & Sons, 2015.
  • (5) Stephen Boyd, Neal Parikh and Eric Chu. “Distributed optimization and statistical learning via the alternating direction method of multipliers”. In Foundations and Trends in Machine Learning 3.1, 2011, pp. 1–122.
  • (6) Peter J Brockwell and Richard A Davis. “Introduction to time series and forecasting”. Switzerland: Springer International Publishing, 2016.
  • (7) Rodrigo N Calheiros, Enayat Masoumi, Rajiv Ranjan and Rajkumar Buyya. “Workload prediction using ARIMA model and its impact on cloud applications’ QoS”. In IEEE Transactions on Cloud Computing 3.4, IEEE, 2014, pp. 449–458.
  • (8) Fabio Canova and Bruce E Hansen. “Are seasonal patterns constant over time? A test for seasonal stability”. In Journal of Business & Economic Statistics 13.3, Taylor & Francis Group, 1995, pp. 237–252.
  • (9) Xi Chen, Weidong Liu and Yichen Zhang. “Quantile regression under memory constraint”. In Annals of Statistics 47.6, Institute of Mathematical Statistics, 2019, pp. 3244–3273.
  • (10) Angelo Coluccia and Giuseppe Notarstefano. “A Bayesian framework for distributed estimation of arrival rates in asynchronous networks”. In IEEE Transactions on Signal Processing 64.15, IEEE, 2016, pp. 3984–3996.
  • (11) Srinjoy Das and Dimitris N Politis. “Predictive inference for locally stationary time series with an application to climate data”. In Journal of the American Statistical Association 116.534, Taylor & Francis, 2020, pp. 919–934.
  • (12) Jianqing Fan, Yingying Fan and Jinchi Lv. “High dimensional covariance matrix estimation using a factor model”. In Journal of Econometrics 147.1, Elsevier, 2008, pp. 186–197.
  • (13) Jianqing Fan, Yuan Liao and Martina Mincheva. “High dimensional covariance matrix estimation in approximate factor models”. In Annals of Statistics 39.6, NIH Public Access, 2011, pp. 3320.
  • (14) Jianqing Fan, Dong Wang, Kaizheng Wang and Ziwei Zhu. “Distributed estimation of principal eigenspaces”. In Annals of Statistics 47, NIH Public Access, 2019, pp. 3009–3031.
  • (15) Jianqing Fan and Wenyang Zhang. “Statistical methods with varying coefficient models”. In Statistics and its Interface 1, NIH Public Access, 2008, pp. 179–195.
  • (16) Antonio Galicia, José F Torres, Francisco Martínez-Álvarez and A Troncoso. “A novel spark-based multi-step forecasting algorithm for big data time series”. In Information Sciences 467, Elsevier, 2018, pp. 800–818.
  • (17) Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung. “The Google file system”. In Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, pp. 29–43.
  • (18) Tilmann Gneiting and Adrian E Raftery. “Strictly proper scoring rules, prediction, and estimation”. In Journal of the American Statistical Association 102.477, Taylor & Francis, 2007, pp. 359–378.
  • (19) Carla Gonçalves, Ricardo J Bessa and Pierre Pinson. “A critical overview of privacy-preserving approaches for collaborative forecasting”. In International Journal of Forecasting 37.1, Elsevier, 2021, pp. 322–342.
  • (20) Tao Hong, Jingrui Xie and Jonathan Black. “Global energy forecasting competition 2017: Hierarchical probabilistic load forecasting”. In International Journal of Forecasting 35.4, Elsevier, 2019, pp. 1389–1399.
  • (21) Rob J Hyndman, Roman A Ahmed, George Athanasopoulos and Han Lin Shang. “Optimal combination forecasts for hierarchical time series”. In Computational Statistics & Data Analysis 55.9, Elsevier, 2011, pp. 2579–2589.
  • (22) Rob J Hyndman and George Athanasopoulos. “Forecasting: principles and practice”. OTexts, 2021. URL: https://OTexts.com/fpp3
  • (23) Rob J Hyndman and Yeasmin Khandakar. “Automatic Time Series Forecasting: The forecast Package for R”. In Journal of Statistical Software 27, 2008, pp. 1–22. DOI: 10.18637/jss.v027.i03
  • (24) Rob J Hyndman and Anne B Koehler. “Another look at measures of forecast accuracy”. In International Journal of Forecasting 22.4, Elsevier, 2006, pp. 679–688.
  • (25) Rob J Hyndman, Anne B Koehler, Ralph D Snyder and Simone Grose. “A state space framework for automatic forecasting using exponential smoothing methods”. In International Journal of Forecasting 18.3, Elsevier, 2002, pp. 439–454.
  • (26) Michael I Jordan, Jason D Lee and Yun Yang. “Communication-Efficient Distributed Statistical Inference”. In Journal of the American Statistical Association 114.526, Taylor & Francis, 2019, pp. 668–681.
  • (27) Mirko Kämpf and Jan W Kantelhardt. “Hadoop.TS: large-scale time-series processing”. In International Journal of Computer Applications 74.17, Foundation of Computer Science, 2013, pp. 1–8.
  • (28) Yanfei Kang, Rob J Hyndman and Feng Li. “GRATIS: GeneRAting TIme Series with diverse and controllable characteristics”. In Statistical Analysis and Data Mining 13, 2020, pp. 354–376.
  • (29) Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar and Michael I Jordan. “A scalable bootstrap for massive data”. In Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76.4, Wiley Online Library, 2014, pp. 795–816.
  • (30) Alex J Koning, Philip Hans Franses, Michele Hibon and Herman O Stekler. “The M3 competition: Statistical tests of the results”. In International Journal of Forecasting 21.3, Elsevier, 2005, pp. 397–409.
  • (31) Denis Kwiatkowski, Peter CB Phillips, Peter Schmidt and Yongcheol Shin. “Testing the null hypothesis of stationarity against the alternative of a unit root”. In Journal of Econometrics 54.1-3, 1992, pp. 159–178.
  • (32) Jason D Lee, Qiang Liu, Yuekai Sun and Jonathan E Taylor. “Communication-efficient sparse regression”. In Journal of Machine Learning Research 18.5, 2017, pp. 1–30.
  • (33) Lei Li, Farzad Noorian, Duncan JM Moss and Philip HW Leong. “Rolling window time series prediction using MapReduce”. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014), IEEE, 2014, pp. 757–764.
  • (34) Xixi Li, Yanfei Kang and Feng Li. “Forecasting with time series imaging”. In Expert Systems with Applications 160, 2020, pp. 113680.
  • (35) Qiang Liu and Alexander Ihler. “Distributed estimation, information loss and exponential families”. In Advances in Neural Information Processing Systems, 2014, pp. 1098–1106.
  • (36) Dougal Maclaurin and Ryan Prescott Adams. “Firefly Monte Carlo: Exact MCMC with subsets of data”. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  • (37) Spyros Makridakis. “Accuracy measures: theoretical and practical concerns”. In International Journal of Forecasting 9.4, Elsevier, 1993, pp. 527–529.
  • (38) Spyros Makridakis, Evangelos Spiliotis and Vassilios Assimakopoulos. “The M4 Competition: 100,000 time series and 61 forecasting methods”. In International Journal of Forecasting 36.1, Elsevier, 2020, pp. 54–74.
  • (39) Xiangrui Meng et al. “MLlib: Machine learning in Apache Spark”. In Journal of Machine Learning Research 17.1, JMLR.org, 2016, pp. 1235–1241.
  • (40) Pablo Montero-Manso, George Athanasopoulos, Rob J Hyndman and Thiyanga S Talagala. “FFORMA: Feature-based forecast model averaging”. In International Journal of Forecasting 36.1, Elsevier, 2020, pp. 86–92.
  • (41) Rui Pan et al. “A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating”. In Journal of Business and Economic Statistics, in press, 2021. DOI: 10.1080/07350015.2021.1961789
  • (42) Fotios Petropoulos et al. “Forecasting: theory and practice”. In International Journal of Forecasting, 2022.
  • (43) AJ Scott and TMF Smith. “Analysis of repeated surveys using time series methods”. In Journal of the American Statistical Association 69.347, Taylor & Francis Group, 1974, pp. 674–678.
  • (44) Ohad Shamir, Nati Srebro and Tong Zhang. “Communication-efficient distributed optimization using an approximate Newton-type method”. In International Conference on Machine Learning, PMLR, 2014, pp. 1000–1008.
  • (45) Han Lin Shang and Rob J Hyndman. “Grouped functional time series forecasting: An application to age-specific mortality rates”. In Journal of Computational and Graphical Statistics 26.2, Taylor & Francis, 2017, pp. 330–343.
  • (46) Benedikt Sommer, Pierre Pinson, Jakob W Messner and David Obst. “Online distributed learning in wind power forecasting”. In International Journal of Forecasting 37.1, Elsevier, 2021, pp. 205–223.
  • (47) Marc A Suchard et al. “Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures”. In Journal of Computational and Graphical Statistics 19.2, Taylor & Francis, 2010, pp. 419–438.
  • (48) R Talavera-Llames, Rubén Pérez-Chacón, A Troncoso and Francisco Martínez-Álvarez. “Big data time series forecasting based on nearest neighbours distributed computing with Spark”. In Knowledge-Based Systems 161, Elsevier, 2018, pp. 12–25.
  • (49) Andrew S Tanenbaum and Maarten Van Steen. “Distributed systems: principles and paradigms”. Prentice-Hall, 2007.
  • (50) Ruey S Tsay. “Analysis of financial time series”. Hoboken, NJ, USA: John Wiley & Sons, 2010.
  • (51) Stanislav Volgushev, Shih-Kang Chao and Guang Cheng. “Distributed inference for quantile regression processes”. In Annals of Statistics 47.3, Institute of Mathematical Statistics, 2019, pp. 1634–1662.
  • (52) Jialei Wang, Mladen Kolar, Nathan Srebro and Tong Zhang. “Efficient distributed learning with sparsity”. In International Conference on Machine Learning, PMLR, 2017, pp. 3636–3645.
  • (53) Xiangyu Wang and David B Dunson. “Parallelizing MCMC via Weierstrass sampler”. In arXiv preprint arXiv:1312.4605, 2013.
  • (54) Xiaoqian Wang, Yanfei Kang, Fotios Petropoulos and Feng Li. “The uncertainty estimation of feature-based forecast combinations”. In Journal of the Operational Research Society, Taylor & Francis, 2021. DOI: 10.1080/01605682.2021.1880297
  • (55) K.V. Yuen. “Bayesian methods for structural dynamics and civil engineering”. Singapore: John Wiley & Sons, 2010.
  • (56) Yuchen Zhang, John Duchi and Martin Wainwright. “Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates”. In Journal of Machine Learning Research 16.1, JMLR.org, 2015, pp. 3299–3340.
  • (57) Yuchen Zhang, John C Duchi and Martin J Wainwright. “Communication-efficient algorithms for statistical optimization”. In Journal of Machine Learning Research 14.1, JMLR.org, 2013, pp. 3321–3363.
  • (58) Xuening Zhu, Feng Li and Hansheng Wang. “Least-Square Approximation for a Distributed System”. In Journal of Computational and Graphical Statistics 30.4, 2021, pp. 1004–1018. DOI: 10.1080/10618600.2021.1923517

Appendix A Background of Distributed Systems

A distributed system, usually used for distributed computing, is a system with a group of interacting computing nodes connected by a network tanenbaum2007distributed (49). These autonomous computers share resources, work together, and coordinate their activities to fulfill specified tasks, working like a single computer through a MapReduce framework. When dealing with large-scale problems, distributed systems provide a new solution that sends the computing code to each computing node where the data are also stored in a distributed manner. MapReduce is shorthand for this “move-code-to-data” computing architecture, which enables us to scale horizontally by adding more computing nodes rather than vertically by upgrading a single node’s hardware.

Inspired by the Google File System paper ghemawat2003google (17), which describes Google’s approach to distributed data-intensive applications, the Hadoop ecosystem has been developed in the data science community as an open-source platform for the distributed storage and processing of large-scale data sets. This ecosystem is the de facto standard for large-scale distributed computing in the data analytics sector. Nonetheless, existing distributed systems are equipped with machine learning libraries meng2016mllib (39, 1) but lack forecasting support, so forecasters have to make unrealistic independence assumptions to fit large-scale time series modeling into the ecosystem. Developing and integrating forecasting methods into such distributed systems therefore has great potential.

Apache Spark spark (2) is the most popular distributed execution engine for big data processing in the distributed ecosystem. With in-memory processing, Spark avoids spending excess time moving data in and out of disk, which makes computation significantly faster (up to 100 times). Besides, Spark supports computer languages that are widely used in the machine learning and forecasting domains (e.g., Java, Scala, R, and Python), making it developer-friendly. Spark also offers a stack of libraries, such as MLlib for machine learning, Spark Streaming for real-time processing, Spark SQL for interactive queries, and GraphX for graph processing, which provide easy-to-use analytics in many cases.
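
As a rough sketch of how the Mapper step described in Appendix B might be dispatched on Spark (our own illustration under simplifying assumptions, not the paper’s implementation), subseries indices can be parallelized across worker nodes so that only the small local estimates travel back to the driver; the application name, the value of K, and fit_subseries with its dummy return values are all hypothetical placeholders.

    from pyspark import SparkContext
    import numpy as np

    sc = SparkContext(appName="distributed-arima-sketch")
    K = 150  # hypothetical number of subseries

    def fit_subseries(i):
        # Placeholder for Steps 1-3 of the Mapper: extract subseries i, fit an
        # ARIMA model, convert it to linear form, and return (theta_hat, Sigma_hat).
        theta_hat = np.zeros(3)
        sigma_hat = np.eye(3)
        return (i, (theta_hat, sigma_hat))

    # Move the code to the data: each worker fits the subseries assigned to it.
    local_estimates = sc.parallelize(range(1, K + 1)).map(fit_subseries).collect()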

Appendix B Pseudocode for Distributed Forecasting

This section provides the pseudocode for the Mapper and Reducer of the proposed distributed forecasting approach in Section 3.

Mapper
Input: ⟨key, value⟩
Output: ⟨key, (θ̂, Σ̂)⟩

1:  Start
2:  y_t ← time series data
3:  T ← time series length
4:  K ← number of split subseries
5:  index ← index vector of the assigned subseries
6:
7:  n = floor(T / K)
8:  for i in index do
9:      lbound = n × (i − 1) + 1
10:     ubound = ifelse(i ≥ K, T, n × i)
11:     y = y_t[lbound : ubound]              ▷ Step 1
12:     fit = model(y)                        ▷ Step 2
13:     fit′ = model.to.linear(fit)           ▷ Step 3
14:     θ̂ = fit′.coef
15:     Σ̂ = fit′.var.coef
16: end for
17: Stop

Reducer
Input: ⟨key, list(θ̂, Σ̂)⟩
Output: ⟨key, forecvalues⟩

1:  Start
2:  y_t ← time series data
3:  H ← forecast horizon
4:  level ← confidence levels for prediction intervals
5:
6:  fit̃ = Comb.method(list(θ̂, Σ̂))            ▷ Step 4
7:  forec = forecast(fit̃, y_t, H, level)      ▷ Step 5
8:  pred = forec.mean                         ▷ point forecasts
9:  lower = forec.lower                       ▷ lower bound of prediction intervals
10: upper = forec.upper                       ▷ upper bound of prediction intervals
11: Stop
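
For completeness, the following is a minimal sketch of one way the Comb.method step (Step 4) could aggregate the local estimates, assuming an inverse-covariance (precision-weighted) average in the spirit of the least squares approximation of Zhu, Li and Wang (58); the estimator actually used in this paper is defined in Section 3.

    import numpy as np

    def combine_local_estimators(thetas, sigmas):
        """Combine local estimates (theta_k, Sigma_k) from the worker nodes into
        a single global estimate by inverse-covariance weighting."""
        precision_sum = np.zeros_like(sigmas[0])
        weighted_theta_sum = np.zeros_like(thetas[0])
        for theta_k, sigma_k in zip(thetas, sigmas):
            w_k = np.linalg.inv(sigma_k)          # local precision matrix
            precision_sum += w_k
            weighted_theta_sum += w_k @ theta_k
        sigma_tilde = np.linalg.inv(precision_sum)      # combined covariance
        theta_tilde = sigma_tilde @ weighted_theta_sum  # combined coefficients
        return theta_tilde, sigma_tilde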
