Korea Spring User Group, 27 Nov 2014
Spring Batch
A quickstart guide to running batch applications with Spring
최찬영 (Choi Chan-young), 에듀앤텍 Co., Ltd.
Agenda
●Batch processing
●Spring Batch high-level overview
●Quick start using Spring Batch
●Batch Specification Language
●General Principles and Guidelines
What are the characteristics of batch processing?
● Long-running
– Often outside office hours
● Non-interactive
– Often include logic for handling errors or restarts
● Process large volumes of data
– More than fits in memory or a single transaction
Batch processing
● Close of business processing
– Order processing
– Business reporting
– Account reconciliation
● Import/export handling
– a.k.a. ETL jobs (Extract-Transform-Load)
– Instrument/position import
– Data warehouse synchronization
● Large-scale output jobs
– Loyalty scheme emails
– Bank statements
Batch Domain
● The batch domain adds value to a plain business process by introducing new concepts:
– A job has an identity: it defines what needs to be done
– A job has steps
– A job instance can be restarted after a failure, which creates a new execution
– Each execution has a start time, stop time, and status
– The job instance has an overall status
– Each execution can tell us how many items were processed and how many commits, rollbacks, and skips occurred
● Adds value through robustness, reliability, and traceability (SLA)
Job: “The EndOfDay Job”
Job instance: “The EndOfDay Job for 2014/11/27”
Job execution: “The first attempt at the EndOfDay Job for 2014/11/27”
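The three quoted examples correspond to a job, a job instance, and a job execution. Below is a minimal plain-Java sketch of that identity model, assuming (hypothetically) that an instance is keyed by the job name plus a single date parameter. JobRepositoryModel and its methods are illustrative names, not the Spring Batch API; in Spring Batch, identifying JobParameters and the JobRepository play this role.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified in-memory model of the batch domain; NOT the Spring Batch API.
// Job name + identifying parameter -> one job instance; each attempt -> one execution.
class JobRepositoryModel {

    enum Status { STARTED, COMPLETED, FAILED }

    static final class JobExecution {
        final String instanceKey;
        final int attempt;              // 1 for the first attempt, 2 for the first restart, ...
        Status status = Status.STARTED;

        JobExecution(String instanceKey, int attempt) {
            this.instanceKey = instanceKey;
            this.attempt = attempt;
        }
    }

    // All executions recorded so far, grouped by job instance.
    private final Map<String, List<JobExecution>> executionsByInstance = new HashMap<>();

    // A job instance is identified by the job name plus its identifying parameters
    // (here a single, hypothetical "date" parameter).
    static String instanceKey(String jobName, String date) {
        return jobName + "[date=" + date + "]";
    }

    // Starts a new execution of the instance. Restarting is allowed only after a
    // failure: an instance that completed, or is still running, is rejected.
    JobExecution start(String jobName, String date) {
        String key = instanceKey(jobName, date);
        List<JobExecution> attempts =
                executionsByInstance.computeIfAbsent(key, k -> new ArrayList<>());
        for (JobExecution previous : attempts) {
            if (previous.status != Status.FAILED) {
                throw new IllegalStateException(
                        "Cannot start " + key + ": previous attempt is " + previous.status);
            }
        }
        JobExecution execution = new JobExecution(key, attempts.size() + 1);
        attempts.add(execution);
        return execution;
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    public static void main(String[] args) {
        JobRepositoryModel repository = new JobRepositoryModel();
        JobExecution first = repository.start("EndOfDay", "2014/11/27");
        first.status = Status.FAILED;                                      // first attempt fails
        JobExecution second = repository.start("EndOfDay", "2014/11/27"); // restart
        second.status = Status.COMPLETED;
        System.out.println(second.instanceKey + ", attempt " + second.attempt);
    }
}
```

Restarting the failed 2014/11/27 run produces attempt 2 of the same instance, while a run for 2014/11/28 would create a new instance.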
Batch Applications for the Java Platform
API for robust batch processing targeted to Java EE and Java SE
● ItemReader reads the input data one item at a time (usually a single record);
● ItemProcessor applies business and domain logic to each item;
● ItemWriter receives the items after processing, aggregated into a chunk, and writes them out.
[Figure: JobOperator, Job, Step, Job Repository, ItemReader, ItemProcessor, ItemWriter]
Sample: Import flat files to database
[Figure: a Step in which an ItemReader reads from a File, an ItemProcessor processes each item, and an ItemWriter writes to a Database]
Job Configuration
<job id="myJob">
    <step id="myStep">
        <tasklet>
            <!-- Read, process, and write items in chunks; commit every 100 items -->
            <chunk reader="myItemReader"
                   processor="myItemProcessor"
                   writer="myItemWriter"
                   commit-interval="100" />
        </tasklet>
    </step>
</job>

<bean id="myItemReader" class="...MyItemReader" />
<bean id="myItemProcessor" class="...MyItemProcessor" />
<bean id="myItemWriter" class="...MyItemWriter" />
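Batch Applications with Java Config
// Beans declared inside an @Configuration class annotated with @EnableBatchProcessing,
// which makes a StepBuilderFactory available for injection.
// Person and PersonItemProcessor are application classes defined elsewhere.
@Bean
public ItemReader<Person> reader() {
    FlatFileItemReader<Person> reader = new FlatFileItemReader<Person>();
    ... // reader configuration elided on the slide
    return reader;
}

@Bean
public ItemProcessor<Person, Person> processor() {
    return new PersonItemProcessor();
}

@Bean
public ItemWriter<Person> writer(DataSource dataSource) {
    JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<Person>();
    ... // writer configuration elided on the slide
    return writer;
}

// A chunk-oriented step: items are read, processed, and written 10 at a time per transaction.
@Bean
public Step step1(StepBuilderFactory stepBuilder, ItemReader<Person> reader,
        ItemWriter<Person> writer, ItemProcessor<Person, Person> processor) {
    return stepBuilder.get("step1")
            .<Person, Person>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}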
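PersonItemProcessor is referenced above but not shown. Below is a minimal sketch of what it could look like, following the upper-casing example in the spring.io "gs-batch-processing" guide. To keep it compilable without Spring, a local ItemProcessor interface with the same process(I) signature stands in for org.springframework.batch.item.ItemProcessor, and Person is a hypothetical value class.

```java
import java.util.Locale;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>, so this
// sketch compiles without Spring on the classpath.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Minimal immutable Person (hypothetical; the slide does not show this class).
class Person {
    private final String firstName;
    private final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }

    @Override
    public String toString() {
        return firstName + " " + lastName;
    }
}

// Business logic for the step: upper-case both names, returning a new Person
// instead of mutating the item that was read.
class PersonItemProcessor implements ItemProcessor<Person, Person> {
    @Override
    public Person process(Person person) {
        return new Person(person.getFirstName().toUpperCase(Locale.ROOT),
                          person.getLastName().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(new PersonItemProcessor().process(new Person("Jill", "Doe"))); // JILL DOE
    }
}
```

Returning a new object keeps the processor free of side effects, so the same input item always yields the same output if a chunk is retried.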
● Spring Batch provides a framework for building, deploying, and running batch applications. Spring Batch influenced JSR 352, which addresses three critical concerns:
– a programming model: application developers have clear, reusable interfaces for constructing batch-style applications;
– a job specification language: job writers have a powerful expression language for how to execute the steps of a batch execution;
– a batch runtime: solution integrators have a runtime API for initiating and controlling batch execution.
Batch Applications for the Java Platform
Batch Applications for the Java Platform, also known as JSR-352, offers application developers a model for developing robust batch processing systems. The core of this programming model is a development pattern borrowed from Spring Batch, coined the Reader-Processor-Writer pattern, in which developers are encouraged to adopt chunk-oriented processing.
Batch Programming Artifact Overview
JSR 352
– Codifies key batch programming constructs: Reader, Processor, Writer, Listener, and more
– The batch runtime orchestrates the flow based on well-known patterns
Chunk-Oriented Processing
● Input and output can be grouped together
● Input collects items into a chunk of size N before outputting
● Optional ItemProcessor
– Delegates business logic
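The chunk-oriented loop can be sketched in plain Java, assuming the reader returns null at end of input and the processor returns null to filter an item (conventions that Spring Batch's ItemReader and ItemProcessor also use). ChunkLoop is a hypothetical name; a real step additionally wraps each chunk in a transaction and handles retry and skip.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-Java sketch of chunk-oriented processing (not Spring Batch's implementation).
class ChunkLoop {

    // reader:    returns the next item, or null when the input is exhausted
    // processor: transforms an item, or returns null to filter it out
    // writer:    receives a whole chunk at once (one write/commit per chunk)
    // Returns the number of chunks handed to the writer.
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunksWritten = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            // Read up to chunkSize items; filtered items still count toward the chunk.
            for (int read = 0; read < chunkSize; read++) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                O processed = processor.apply(item);
                if (processed != null) {
                    chunk.add(processed);
                }
            }
            // Simplification: skip the writer when every item in the chunk was filtered.
            if (!chunk.isEmpty()) {
                writer.accept(chunk);
                chunksWritten++;
            }
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        Iterator<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        List<List<Integer>> written = new ArrayList<>();
        Supplier<Integer> reader = () -> input.hasNext() ? input.next() : null;
        Function<Integer, Integer> dropEvens = x -> x % 2 == 0 ? null : x * 10;
        Consumer<List<Integer>> writer = written::add;
        int chunks = run(reader, dropEvens, writer, 3);
        System.out.println(chunks + " chunks: " + written); // 3 chunks: [[10, 30], [50], [70]]
    }
}
```

With a chunk size of 3, the seven input items are read in three chunks; the even numbers are filtered out, so the writer receives [10, 30], [50], and [70].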
Batch Usage Patterns
[Figure: the Client calls start() on the JobLauncher, which calls execute() on the Job; the Job runs the business logic (doStuff()) and, when done, a JobExecution is returned with ExitStatus.COMPLETED or FAILED]
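The launch sequence can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Spring Batch's JobLauncher: it only shows that the client always gets an execution back, and that its exit status records whether the business logic completed or failed.

```java
// Simplified model of the launch sequence; NOT the Spring Batch JobLauncher API.
// The launcher runs the job's business logic and always hands back an execution
// whose exit status is COMPLETED or FAILED.
class LauncherSketch {

    enum ExitStatus { COMPLETED, FAILED }

    static final class JobExecution {
        final ExitStatus exitStatus;
        final String failureMessage; // null when the job completed

        JobExecution(ExitStatus exitStatus, String failureMessage) {
            this.exitStatus = exitStatus;
            this.failureMessage = failureMessage;
        }
    }

    // start(): run the business logic ("doStuff()") and record the outcome on the
    // execution instead of letting the exception propagate to the client.
    static JobExecution start(Runnable businessLogic) {
        try {
            businessLogic.run();
            return new JobExecution(ExitStatus.COMPLETED, null);
        } catch (RuntimeException e) {
            return new JobExecution(ExitStatus.FAILED, e.getMessage());
        }
    }

    public static void main(String[] args) {
        JobExecution ok = start(() -> System.out.println("doing stuff"));
        JobExecution failed = start(() -> { throw new IllegalStateException("input file missing"); });
        System.out.println(ok.exitStatus + " / " + failed.exitStatus
                + " (" + failed.failureMessage + ")");
    }
}
```

Recording the failure on the execution, rather than throwing, is what lets a scheduler in the run tier decide what to do next based on the exit status alone.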
More Readers and Writers
● Spring Batch provides many implementations of ItemReader and ItemWriter, e.g.
– Flat files
– XML
– JDBC: cursor & driving query
– Hibernate
– JMS
● Some simple jobs can be implemented with off-the-shelf components
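Off-the-shelf readers share one contract: each call returns the next item, and null marks the end of the input (Spring Batch's ItemReader.read() follows this convention). Below is a minimal plain-Java sketch of a delimited flat-file reader built on that contract. DelimitedLineReader is a hypothetical name, and it omits what Spring Batch's FlatFileItemReader adds, such as mapping lines to domain objects and saving restart state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch of a delimited flat-file reader that follows the ItemReader contract:
// each read() returns one record, and null signals the end of the input.
// Hypothetical class; not Spring Batch's FlatFileItemReader.
class DelimitedLineReader {
    private final BufferedReader in;
    private final String delimiter;

    DelimitedLineReader(Reader source, String delimiter) {
        this.in = new BufferedReader(source);
        this.delimiter = delimiter;
    }

    // Returns the fields of the next non-blank line, or null at end of input.
    String[] read() {
        try {
            String line;
            do {
                line = in.readLine();
                if (line == null) {
                    return null;
                }
            } while (line.trim().isEmpty());
            // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields.
            return line.split(Pattern.quote(delimiter), -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DelimitedLineReader reader =
                new DelimitedLineReader(new StringReader("Jill,Doe\n\nJoe,Doe\n"), ",");
        String[] record;
        while ((record = reader.read()) != null) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```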
● Run Tier is concerned with scheduling and launching the application. A vendor product is typically used in this tier to provide time-based and interdependent scheduling of batch jobs as well as parallel processing capabilities.
● Job Tier is responsible for the overall execution of a batch job. It executes batch steps sequentially, ensuring that all steps are in the correct state and that all appropriate policies are enforced.
● Application Tier contains the components required to execute the program. It contains specific modules that address the required batch functionality and enforces policies around module execution (e.g., commit intervals, capture of statistics, etc.).
● Data Tier provides integration with the physical data sources, which might include databases, files, or queues.
Note: In some cases the Job Tier can be missing entirely, and in other cases one job script can start several batch job instances.
General Principles and Guidelines
● A batch architecture typically affects the online architecture, and vice versa. Design with both architectures and environments in mind, using common building blocks when possible.
● Simplify as much as possible and avoid building complex logical structures in single batch applications.
● Process data as close as possible to where it physically resides, or vice versa (i.e., keep your data where your processing occurs).
● Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.
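● Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, look for these four common flaws:
– Reading data for every transaction when the data could be read once and cached or kept in working storage
– Rereading data for a transaction when the data was read earlier in the same transaction
– Causing unnecessary table or index scans
– Not specifying key values in the WHERE clause of an SQL statement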
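One way to follow the advice to keep operations in internal memory is to load reference data once and reuse it across records, instead of querying it again for every item. Below is a minimal sketch, assuming a hypothetical loader function in place of a real SQL lookup; ReferenceDataCache is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: keep reference data in memory after the first lookup instead of
// re-querying it for every record. "loader" stands in for a real database
// query (hypothetical); loadCount() shows how often it actually ran.
class ReferenceDataCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> cache = new HashMap<>();
    private int loads = 0;

    ReferenceDataCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent calls the loader only for keys not cached yet
        // (a loader returning null is not cached and will be called again).
        return cache.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        ReferenceDataCache<String, String> branches =
                new ReferenceDataCache<String, String>(code -> "Branch " + code);
        for (String branchCode : new String[] {"SEL", "BUS", "SEL", "SEL", "BUS"}) {
            branches.get(branchCode);
        }
        System.out.println(branches.loadCount() + " loads for 5 records"); // 2 loads for 5 records
    }
}
```

Five records that reference two branch codes trigger only two loads. This fits reference data that does not change during the run; data that may change mid-job needs a different strategy.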
General Principles and Guidelines
● Allocate enough memory at the beginning of a batch application to avoid time-consuming reallocation during the process.
● Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.
● Implement checksums for internal validation where possible. For example, flat files should have a trailer record giving the total number of records in the file and an aggregate of the key fields.
● Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
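The trailer-record checksum described above can be sketched as follows. The file layout (comma-separated detail lines with an amount field, followed by a final TRAILER line carrying the record count and the sum of amounts) is a hypothetical example, not a format defined by Spring Batch; TrailerCheck is likewise an illustrative name.

```java
import java.util.Arrays;
import java.util.List;

// Validates a flat file against its trailer record. Hypothetical layout:
//   detail line:  <accountId>,<amountInCents>
//   last line:    TRAILER,<recordCount>,<sumOfAmountsInCents>
class TrailerCheck {

    static boolean isValid(List<String> lines) {
        if (lines.isEmpty()) {
            return false;
        }
        String[] trailer = lines.get(lines.size() - 1).split(",");
        if (trailer.length != 3 || !trailer[0].equals("TRAILER")) {
            return false;
        }
        try {
            long count = 0;
            long sum = 0;
            for (String line : lines.subList(0, lines.size() - 1)) {
                String[] fields = line.split(",");
                count++;
                sum += Long.parseLong(fields[1].trim());
            }
            // Both the record count and the aggregate of the key field must match.
            return count == Long.parseLong(trailer[1].trim())
                    && sum == Long.parseLong(trailer[2].trim());
        } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
            return false; // a malformed line is treated as a failed check
        }
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("A1,1000", "A2,250", "TRAILER,2,1250");
        System.out.println(isValid(file));               // true
        System.out.println(isValid(file.subList(1, 3))); // false: one record missing
    }
}
```

Checking both the count and the sum catches a truncated file as well as a file whose record count is right but whose values were altered.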
Questions?
Spring Batch Admin
● Subproject of Spring Batch
● Provides a web UI and a RESTful interface to manage batch processes
http://static.springsource.org/spring-batch-admin/index.html
● Manager, Resources, Sample WAR
– Deployed with the batch job(s) as a single app, to be able to control and monitor jobs
– Or monitors external jobs only, via a shared database
[Screenshots: Home Page, Registered Jobs, Launching Jobs, Details for Job Execution]
General Principles and Guidelines
● There are a great many extension points in Spring Batch for the framework developer (as opposed to the implementor of business logic). Clients are expected to create their own more specific strategies that can be plugged in to control things like commit intervals (CompletionPolicy), rules about how to deal with exceptions (ExceptionHandler), and many others.
● Generally you can expect anything at the top level of the source tree in packages org.springframework.batch.* to be public, but not necessarily sub-classable. Extending the concrete implementations of most strategies is discouraged in favour of a composition or forking approach. If your code can use only the interfaces from Spring Batch, that gives you the greatest possible portability.
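As an illustration of a pluggable strategy combined by composition rather than by subclassing a concrete implementation, here is a sketch of a chunk completion rule. ChunkCompletionPolicy and its implementations are hypothetical simplifications; Spring Batch's actual CompletionPolicy interface works with a RepeatContext and has more methods.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical strategy interface: decides when the current chunk is
// complete. (Spring Batch's own CompletionPolicy has a richer RepeatContext-based API.)
interface ChunkCompletionPolicy {
    boolean isComplete(int itemsInChunk, long elapsedMillis);
}

// Complete after a fixed number of items (a commit interval).
class CountPolicy implements ChunkCompletionPolicy {
    private final int limit;

    CountPolicy(int limit) { this.limit = limit; }

    @Override
    public boolean isComplete(int itemsInChunk, long elapsedMillis) {
        return itemsInChunk >= limit;
    }
}

// Complete once a time budget is used up, however few items were read.
class TimeoutPolicy implements ChunkCompletionPolicy {
    private final long timeoutMillis;

    TimeoutPolicy(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    @Override
    public boolean isComplete(int itemsInChunk, long elapsedMillis) {
        return elapsedMillis >= timeoutMillis;
    }
}

// Composition instead of subclassing: complete as soon as any delegate says so.
class AnyOfPolicy implements ChunkCompletionPolicy {
    private final List<ChunkCompletionPolicy> delegates;

    AnyOfPolicy(ChunkCompletionPolicy... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public boolean isComplete(int itemsInChunk, long elapsedMillis) {
        for (ChunkCompletionPolicy delegate : delegates) {
            if (delegate.isComplete(itemsInChunk, elapsedMillis)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ChunkCompletionPolicy policy = new AnyOfPolicy(new CountPolicy(100), new TimeoutPolicy(5_000));
        System.out.println(policy.isComplete(40, 1_000));  // false
        System.out.println(policy.isComplete(100, 1_000)); // true: commit interval reached
        System.out.println(policy.isComplete(40, 6_000));  // true: time budget used up
    }
}
```

Because AnyOfPolicy depends only on the interface, new rules can be added without touching or extending the existing classes.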
General Principles and Guidelines
● A specific implementation of the Step deals with the concern of breaking apart the business logic and sharing it efficiently between parallel processes or processors (see PartitionStep).
● There are a number of technologies that could play a role here. The essence is just a set of concurrent remote calls to distributed agents that can handle some business processing.
● One implementation that we have had some experience with is a set of remote web services handling the business processing. We send a specific range of primary keys for the inputs to each of a number of remote calls.
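Sending each remote call its own range of primary keys requires a way to split the key space. Below is a minimal sketch, assuming numeric keys whose minimum and maximum are already known (for example, from a MIN/MAX query); KeyRangePartitioner is a hypothetical helper. In Spring Batch, a Partitioner implementation would typically place each range in its own ExecutionContext for a PartitionStep to dispatch.

```java
import java.util.ArrayList;
import java.util.List;

// Splits an inclusive primary-key range [minId, maxId] into at most gridSize
// contiguous sub-ranges, one per remote call. Hypothetical helper.
class KeyRangePartitioner {

    static List<long[]> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize >= 1 and minId <= maxId");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        List<long[]> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new long[] {start, Math.min(start + size - 1, maxId)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : partition(1, 10, 3)) {
            System.out.println(range[0] + ".." + range[1]); // 1..4, 5..8, 9..10
        }
    }
}
```

Splitting keys 1 to 10 across three workers yields the ranges 1..4, 5..8, and 9..10; ceiling division keeps the ranges contiguous, with the last one possibly shorter. Equal-width key ranges only balance the load when keys are spread fairly evenly.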