Beijing Fujitsu R&D Center Internship Report

Qiu Cheng (邱诚)

Report Topics

Work at Fujitsu

Introduction to the Auto-Regressive and Moving Average Model (ARMA)

Introduction to RHadoop

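Work at Fujitsu

Studied data selection methods: TBSC, the mean method, and indicative segments;

Optimized the ARMA model and the SVR model;

Dynamically combined the ARMA model and the SVR model;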

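Mean Method Description

Basic steps

Find the five days whose values for hours 1 to 9 are closest, by Euclidean distance, to those of the forecast day;

Expand the set of five days using the Euclidean distance over hours 10 to 20;

Cluster all days obtained in the first two steps into two clusters with k-means;

Take the most recent same weekday before the forecast day as the reference day, compute its Euclidean distance to the two cluster centers, and select the cluster with the smaller distance;

Average the days in the selected cluster to obtain the forecast.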
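The steps above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not the implementation used in the internship: the function name `mean_method_forecast`, the days-by-24-hours layout of `history`, the neighbour counts `k_near` and `k_expand`, and the reading of the "expand" step (adding each selected day's nearest neighbours on hours 10 to 20) are all hypothetical.

```r
# Sketch of the mean method (assumed data layout: one row per day,
# 24 columns of hourly values; all names here are hypothetical).
mean_method_forecast <- function(history, today_morning, ref_day,
                                 k_near = 5, k_expand = 5) {
  morning <- 1:9
  daytime <- 10:20

  # Step 1: days closest to the forecast day on hours 1-9.
  d1 <- sqrt(rowSums(sweep(history[, morning, drop = FALSE], 2, today_morning)^2))
  near <- order(d1)[seq_len(k_near)]

  # Step 2 (assumed reading of "expand"): for each selected day, add its
  # nearest neighbours measured on hours 10-20.
  expanded <- near
  for (i in near) {
    d2 <- sqrt(rowSums(sweep(history[, daytime, drop = FALSE], 2, history[i, daytime])^2))
    # k_expand + 1 because day i itself is always at distance 0.
    expanded <- union(expanded, order(d2)[seq_len(k_expand + 1)])
  }

  # Step 3: cluster the candidate days into two groups with k-means.
  candidates <- history[expanded, , drop = FALSE]
  km <- kmeans(candidates, centers = 2, nstart = 10)

  # Step 4: pick the cluster whose center is closest to the reference day
  # (distance over all 24 hours is an assumption).
  d_center <- sqrt(rowSums(sweep(km$centers, 2, history[ref_day, ])^2))
  chosen <- which.min(d_center)

  # Step 5: the forecast is the hourly average of the chosen cluster.
  colMeans(candidates[km$cluster == chosen, , drop = FALSE])
}

# Synthetic usage: 20 "low-load" days followed by 20 "high-load" days.
set.seed(1)
history <- rbind(matrix(rnorm(20 * 24, mean = 10), 20, 24),
                 matrix(rnorm(20 * 24, mean = 50), 20, 24))
today_morning <- rep(10, 9)   # hours 1-9 observed on the forecast day
fc <- mean_method_forecast(history, today_morning, ref_day = 20)
```

With this synthetic data the forecast day resembles the low-load group, so `fc` is a 24-value profile drawn from those days.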

Introduction to the ARMA Model

ARMA model principles

ARMA model optimization

Using ARMA models in R

ARMA Basic Principles

Auto-Regressive model

Moving Average model

[Figure: plot with axis ticks from 0 to 100 and points labeled X0 to X12]

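ARMA Basic Principles

The auto-regressive model describes the relationship between the current value and historical values;

The moving average model describes the accumulated error of the auto-regressive part;

The ARMA model combines the prediction of the auto-regressive model with the accumulated error;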
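In the usual textbook notation (the symbols below are not taken from the slides and are introduced here only for illustration), the three models can be written as follows, where $\varepsilon_t$ is white noise, $\varphi_i$ are the auto-regressive coefficients, and $\theta_j$ are the moving average coefficients:

```latex
% AR(p): current value as a linear function of the p previous values
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t

% MA(q): current value as a linear function of the q previous errors
X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% ARMA(p, q): the two parts combined
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```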

ARMA Model Optimization

Akaike’s Information Criterion (AIC)

AIC, Bias Corrected (AICc)

Bayesian Information Criterion (BIC)

All of the criteria above apply to ARMA models fitted by maximum likelihood estimation

AIC Criterion

$\mathrm{AIC} = -2\ln L + 2k$

$L$: the maximized likelihood;

$k$: the number of model parameters;
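As a small illustration, the three criteria can be computed by hand from the log-likelihood of a model fitted with maximum likelihood. This sketch uses simulated data and the standard definitions AIC = -2 ln L + 2k, AICc = AIC + 2k(k+1)/(n-k-1), and BIC = -2 ln L + k ln n; the series and the chosen order are assumptions made for the example. Note that R counts the innovation variance as one of the k parameters.

```r
set.seed(42)
# Simulated ARMA(1,1) series (illustrative data only).
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 300)

# Fit by maximum likelihood, as the criteria require.
fit <- arima(x, order = c(1, 0, 1), method = "ML")

ll     <- logLik(fit)
loglik <- as.numeric(ll)
k      <- attr(ll, "df")     # estimated parameters, including sigma^2
n      <- attr(ll, "nobs")   # observations used in the fit

aic  <- -2 * loglik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
bic  <- -2 * loglik + k * log(n)

print(c(AIC = aic, AICc = aicc, BIC = bic))
```

The hand-computed `aic` and `bic` should agree with `fit$aic` and `BIC(fit)`; lower values indicate a better trade-off between fit and model size.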

Using ARMA Models in R

arima

auto.arima

arima Function

arima(x,
      order = c(0, 0, 0),
      seasonal = list(order = c(0, 0, 0), period = NA),
      xreg = NULL,
      include.mean = TRUE,
      transform.pars = TRUE,
      fixed = NULL,
      init = NULL,
      method = c("CSS-ML", "ML", "CSS"),
      n.cond,
      optim.method = "BFGS",
      optim.control = list(),
      kappa = 1e6)

arima Parameters in R
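A minimal usage sketch may help here. In `arima`, `order = c(p, d, q)` gives the auto-regressive order, the degree of differencing, and the moving average order, `method` selects the fitting method, and `include.mean` controls whether a mean term is estimated. The simulated series and the chosen order below are assumptions for illustration only.

```r
set.seed(123)
# Simulated ARMA(1,1) series with known coefficients (illustrative data).
x <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 500)

# ARMA(1,1) is ARIMA with d = 0; fit by maximum likelihood.
fit <- arima(x, order = c(1, 0, 1), method = "ML")
print(coef(fit))   # ar1, ma1 and the estimated mean ("intercept")

# Forecast the next 10 points with standard errors.
fc <- predict(fit, n.ahead = 10)
print(fc$pred)
print(fc$se)
```

The estimated `ar1` and `ma1` should land near the values used in the simulation, and `fc$se` grows with the forecast horizon.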

auto.arima Function

auto.arima(x,
           d=NA, D=NA,
           max.p=5, max.q=5, max.P=2, max.Q=2, max.order=5,
           start.p=2, start.q=2, start.P=1, start.Q=1,
           stationary=FALSE,
           ic=c("aicc","aic", "bic"),
           stepwise=TRUE, trace=FALSE,
           approximation=(length(x)>100 | frequency(x)>12),
           xreg=NULL,
           test=c("kpss","adf","pp"),
           seasonal.test=c("ocsb","ch"),
           allowdrift=TRUE, lambda=NULL,
           parallel=FALSE, num.cores=NULL)
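`auto.arima` is provided by the forecast package and selects the model order automatically using an information criterion. The base-R sketch below imitates only the core idea with an exhaustive search over small (p, q) values ranked by AIC; it is not the algorithm `auto.arima` uses (no stepwise search, differencing tests, or seasonal terms), and the helper name `select_arma_order` is hypothetical.

```r
# Hypothetical helper: try every ARMA(p, q) up to the given limits and
# keep the fit with the lowest AIC.
select_arma_order <- function(x, max.p = 3, max.q = 3) {
  best <- list(aic = Inf, order = NULL, fit = NULL)
  for (p in 0:max.p) {
    for (q in 0:max.q) {
      # Some orders may fail to converge; skip those.
      fit <- tryCatch(arima(x, order = c(p, 0, q), method = "ML"),
                      error = function(e) NULL)
      if (!is.null(fit) && fit$aic < best$aic) {
        best <- list(aic = fit$aic, order = c(p, 0, q), fit = fit)
      }
    }
  }
  best
}

set.seed(7)
# Simulated AR(2) series (illustrative data only).
x <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 400)
best <- select_arma_order(x)
print(best$order)
print(best$aic)

# With the forecast package installed, the corresponding call would be:
#   forecast::auto.arima(x, ic = "aic")
```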

Nowadays, we have lots of data. BIG DATA!

What is R?

Why R?

What is needed?

There is a need for more than counts and averages on these big data sets

Analyzing all of the data can lead to insights that sampling or subsets can’t reveal

Why R and Hadoop?

Introduction to RHadoop

RHadoop Purpose

The open-source RHadoop project makes it easier to extract data from Hadoop for analysis with R, and to run R within the nodes of the Hadoop cluster -- essentially, to transform Hadoop into a massively-parallel statistical computing cluster based on R.

RHadoop

rhdfs

Manipulate HDFS directly from R

Mimic as much of the HDFS Java API as possible

rhdfs Functions

rmr

Designed to be the simplest and most elegant way to write MapReduce programs

Gives the R programmer the tools necessary to perform data analysis in a way that is “R” like

Provides an abstraction layer to hide the implementation details

rmr mapreduce Function
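To make the map, group-by-key, reduce flow concrete without a Hadoop cluster, here is a single-process stand-in written in base R. It does not call rmr or Hadoop at all; `local_mapreduce` is a hypothetical helper that only mirrors the shape of a MapReduce job, using word counting as the example.

```r
# Hypothetical local stand-in for a MapReduce job (not the rmr API).
local_mapreduce <- function(input, map, reduce) {
  # Map: each input record yields one list(key = ..., val = ...) pair.
  pairs <- lapply(input, map)
  keys  <- vapply(pairs, function(p) as.character(p$key), character(1))
  vals  <- lapply(pairs, function(p) p$val)

  # Shuffle: group the values that share a key.
  grouped <- split(vals, keys)

  # Reduce: collapse each key's values into one result.
  mapply(reduce, names(grouped), grouped, SIMPLIFY = FALSE)
}

# Word count over a small in-memory input.
words <- c("r", "hadoop", "r", "hdfs", "r", "hadoop")
counts <- local_mapreduce(
  words,
  map    = function(w) list(key = w, val = 1),
  reduce = function(k, vs) sum(unlist(vs))
)
print(counts)
```

In a real job the map and reduce functions would run in parallel on the cluster nodes, with the framework handling the grouping step between them.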

Thank you!
