IBM Software Group
© 2015 IBM Corporation
SIMD
Spring 2015
Peter Elderon
IBM Software Group | Rational software
SIMD
Overview
PL/I use in SEARCH and VERIFY
PL/I use in INLIST
Possible PL/I use in the future
Overview
SIMD/Vector Overview
SIMD – Single Instruction Multiple Data
Also referred to as Vector Instructions
Each vector contains multiple data elements of a fixed size:
16 bytes
8 halfwords
4 fullwords
2 doublewords
1 quadword
So each vector is 16 bytes long
There are 32 vector registers, named V0, V1, …, V31
SIMD/Vector Overview
Vector instructions can operate on all of the elements in one or more vectors
So, a vector add of V1 and V2 as 16 one-byte integers into V0 would perform 16 adds in one instruction
[Figure: V1 + V2 = V0, sixteen one-byte additions performed in parallel]
SIMD/Vector Overview
And a vector add of V1 and V2 as 8 2-byte integers would perform 8 adds in one instruction
Bits in the instruction encode the element size, and the instruction mnemonic will reveal it as well
[Figure: V1 + V2 = V0, eight two-byte additions performed in parallel]
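The two adds described above can be modelled with a short sketch. It is only an illustration in Python, not the hardware definition: the function name `vector_add` is hypothetical, and wrap-around within each element on overflow is an assumption that the slides do not state.

```python
def vector_add(v1: bytes, v2: bytes, element_size: int) -> bytes:
    """Add two 16-byte vectors element by element.

    Assumption: each element wraps around on overflow, and no carry
    crosses from one element into its neighbour.
    """
    assert len(v1) == len(v2) == 16 and 16 % element_size == 0
    mask = (1 << (8 * element_size)) - 1
    out = bytearray()
    for i in range(0, 16, element_size):
        a = int.from_bytes(v1[i:i + element_size], "big")
        b = int.from_bytes(v2[i:i + element_size], "big")
        out += ((a + b) & mask).to_bytes(element_size, "big")
    return bytes(out)


v1 = bytes(range(16))        # 00 01 02 ... 0F
v2 = bytes([0xFF] * 16)

as_bytes = vector_add(v1, v2, 1)      # 16 independent one-byte adds
as_halfwords = vector_add(v1, v2, 2)  # 8 independent two-byte adds
```

The same 32 input bytes give different results for the two element sizes, which is why the element size has to be encoded in the instruction.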
SIMD/Vector Overview
Most vector instructions do not set/change the condition code
Instead a vector instruction can be used to extract a result
Vector loads and stores
Can handle any byte alignment
Are most efficient with 8 byte boundaries
There is also a vector load instruction that will load only those bytes up to the next 4K page boundary
Very useful in handling null-terminated strings
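A rough sketch of why a boundary-limited load helps with null-terminated strings: the scan never touches bytes on the following page unless the string really continues there. The helper names `bytes_before_boundary` and `c_strlen` are hypothetical, and the Python `bytes` object merely stands in for storage.

```python
PAGE_SIZE = 4096
VECTOR_SIZE = 16


def bytes_before_boundary(address: int) -> int:
    """Number of bytes a 16-byte load may fetch without crossing a 4K page."""
    return min(VECTOR_SIZE, PAGE_SIZE - address % PAGE_SIZE)


def c_strlen(memory: bytes, start: int) -> int:
    """Length of a null-terminated string, scanned in boundary-limited chunks."""
    pos = start
    while True:
        n = bytes_before_boundary(pos)
        chunk = memory[pos:pos + n]
        if not chunk:
            raise ValueError("no terminating zero found")
        zero = chunk.find(0)
        if zero >= 0:
            return pos + zero - start
        pos += n


# A 10-byte string that straddles the page boundary at address 4096.
memory = bytearray(8192)
memory[4090:4100] = b"HELLOWORLD"
length = c_strlen(bytes(memory), 4090)
```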
Overlaid Vector/Floating-Point registers
The first 16 of the 32 vector registers overlay the 16 FPRs
Bits 0:63 of SIMD registers 0-15 will correspond to FPRs 0-15
When using an FPR, bits 64:127 of the corresponding vector register will become unpredictable
[Figure: FPRs 0-15 occupy bits 0:63 of vector registers 0-15; vector registers 0-31 each span bits 0:127]
Application Considerations
Be very aware that any use of an FPR will change all 16 bytes of the corresponding VR
Linkage Convention (caller may assume across a call)
– VRs 0 to 7 are volatile
– VRs 8 to 15
– bytes 0-7 are non-volatile
– bytes 8-15 are volatile
– VRs 16 to 23 are non-volatile
– VRs 24 to 31 are volatile
Instruction Overview
There are 4 classes of vector instructions
Support instructions
Integer instructions
Floating-point instructions
String instructions
The (many) integer instructions allow for add, multiply, compare, logical and, shifts, etc.
Currently not exploited by PL/I
The floating-point instructions support only IEEE float binary
hence of little use to PL/I (or COBOL)
String instructions - VFAE
Vector Find Any Equal
VFAE v1,v2,v3,m4,m5
Compares v2 to v3 from left to right looking for an element in v2 equal to any of the elements in v3
Stores the byte index (0-15) of the leftmost hit or 16 if none in v1
m4 is a 4-bit nibble indicating the element size
0 – byte for 16 * 16 compares
1 – halfword for 8 * 8 compares
2 – word for 4 * 4 compares
m5 is a 4-bit nibble providing some variations
String instructions - VFAE
Vector Find Any Equal
VFAE v1,v2,v3,m4,m5
if m5 is ‘1…’b, then
Compares v2 to v3 from left to right looking for an element in v2 not equal to any of the elements in v3
Stores the byte index (0-15) of the leftmost hit or 16 if none in v1
So, m5 equal to ‘0…’b is useful for SEARCH
And, m5 equal to ‘1…’b is useful for VERIFY
When m5 is ‘1…’b, the instruction is also known as
Vector Find Any Not Equal
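The behaviour described on these two slides can be modelled as follows. This is a sketch of the description given here, not of the full instruction: the function name `vfae` and its `invert` parameter (standing in for the '1...'b bit of m5) are hypothetical, and ASCII bytes are used purely for readability.

```python
def vfae(v2: bytes, v3: bytes, element_size: int = 1, invert: bool = False) -> int:
    """Byte index of the leftmost hit in v2, or 16 if there is none.

    invert=False: a hit is an element of v2 equal to any element of v3.
    invert=True:  a hit is an element of v2 equal to no element of v3.
    """
    assert len(v2) == len(v3) == 16 and 16 % element_size == 0
    candidates = {v3[i:i + element_size] for i in range(0, 16, element_size)}
    for i in range(0, 16, element_size):
        in_set = v2[i:i + element_size] in candidates
        if in_set != invert:
            return i
    return 16


HEX_DIGITS = b"0123456789ABCDEF"

search_index = vfae(b"hello, world 7!!", HEX_DIGITS)               # SEARCH-like
verify_index = vfae(b"DEADBEEFxyz12345", HEX_DIGITS, invert=True)  # VERIFY-like
all_valid = vfae(HEX_DIGITS, HEX_DIGITS, invert=True)
```

The index is 0-based, whereas SEARCH and VERIFY return a 1-based position and 0 when nothing is found, so generated code has to translate a result of 16 accordingly.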
String instructions - VFAE
Vector Find Any Equal
VFAE v1,v2,v3,m4,m5
if m5 is ‘..1.’b, then
The comparison will stop if an element in v2 is equal to zero
Stores the byte index (0-15) of the leftmost hit or 16 if none in v1
So, m5 equal to ‘..1.’b is useful for VARYINGZ strings
When m5 is ‘..1.’b, the instruction is also known as
Vector Find Any Equal or Zero
Vector Find Any Not Equal or Zero
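A sketch of how the zero-search variation might serve a VERIFY of a null-terminated (VARYINGZ) string. It assumes that when a zero element comes first, the instruction reports the index of that zero, so the caller must inspect the byte to tell a terminator from a real hit; the slide does not spell this out. The names `vfae_zs` and `verifyz` are hypothetical.

```python
def vfae_zs(v2: bytes, v3: bytes, invert: bool = False) -> int:
    """Byte-sized find-any-(not-)equal that also stops at a zero byte in v2.

    Assumption: the index of the zero is returned if it precedes any hit.
    """
    assert len(v2) == len(v3) == 16
    for i in range(16):
        if v2[i] == 0:
            return i
        if (v2[i] in v3) != invert:
            return i
    return 16


def verifyz(buffer: bytes, allowed: bytes) -> int:
    """1-based position of the first byte not in `allowed`, or 0 if the
    terminating zero is reached first. `allowed` must be 16 bytes long."""
    pos = 0
    while True:
        # Pad the last chunk with zeros so the scan always terminates.
        chunk = buffer[pos:pos + 16].ljust(16, b"\0")
        i = vfae_zs(chunk, allowed, invert=True)
        if i < 16:
            return 0 if chunk[i] == 0 else pos + i + 1
        pos += 16


HEX_DIGITS = b"0123456789ABCDEF"
ok = verifyz(b"CAFE0123456789ABCDEF00\0", HEX_DIGITS)
bad = verifyz(b"12G4\0", HEX_DIGITS)
```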
String instructions - VSTRC
Vector String Range Compare
VSTRC v1,v2,v3,v4,m5,m6
Compares each element in v2 against the ranges defined by the even-odd element pairs in v3, with each comparison performed according to the indicator bits in the corresponding even-odd element pairs in v4
Stores the byte index (0-15) of the leftmost hit or 16 if none in v1
m5 is a 4-bit nibble indicating the element size
0 – byte for 16 sets of 8 range compares
1 – halfword for 8 sets of 4 range compares
2 – word for 4 sets of 2 range compares
m6 is a 4-bit nibble providing variations a la VFAE
String instructions - VSTRC
Typically the even-odd element pairs defining the comparisons will be constants indicating GE and LE
But they could consist of EQ and EQ for a degenerate range
Duplicate ranges are allowed
The number of range pairs may also be fewer than the vector can hold
This instruction is very useful for tests of WCHAR
a la VFAE, the m6 value determines
if it implements SEARCH or VERIFY
if it also looks for a terminating zero (for VARYINGZ)
PL/I use in SEARCH and VERIFY
SEARCH and VERIFY of CHAR
SEARCH and VERIFY of CHAR are done inline if arch < 11
But the old TRT instruction is used only if the first argument is
NONVARYING with length known at compile time
Otherwise the characters are tested one at a time
But with the vector instructions, we do much better
SEARCH and VERIFY of CHAR
For example, this simple code tests if a VARYING CHAR string is hex
ishex: proc( s );
dcl s char(*) var;
dcl x char value( '0123456789ABCDEF' );
dcl sx fixed bin(31);
sx = verify( s, x );
if sx > 0 then ...
It can now be done with a loop of vector find-any-not-equal
SEARCH and VERIFY of CHAR
Namely with a series of FANE testing up to 16 bytes at a time
E720 1000 0006 VL v2,+CONSTANT_AREA(,r1,0)
E3F0 E000 0095 LLH r15,_shadow2(,r14,0)
4120 E002 LA r2,#AddressShadow(,r14,2)
ECFC 003F 007E CIJNH r15,H'0',@1L9
@1L5 DS 0H
A7FE 0010 CHI r15,H'16'
4140 0010 LA r4,16
B9F2 404F LOCRL r4,r15
B9FA 00E2 ALRK r14,r2,r0
E704 E000 0037 VLL v0,r4,_shadow1(r14,0)
E700 2080 0082 VFAE v0,v0,v2,b'0000',b'1000'
E7E0 0001 2021 VLGV r14,v0,1,2
EC4E 000C 2076 CRJH r4,r14,@1L6
A7FA FFF0 AHI r15,H'-16'
A70A 0010 AHI r0,H'16'
ECFC 0024 007E CIJNH r15,H'0',@1L9
A7F4 FFE5 J @1L5
@1L6 DS 0H
SEARCH and VERIFY of CHAR
In this example, the value string ‘0123456789ABCDEF’ was 16 bytes long
So it was loaded into a vector register that was repeatedly tested
If the value string were ‘0123456789ABCDEFabcdef’, this would not work
But then the vector range compare instruction could be used
The test string would then be compared 16 bytes at a time to see if any was
not in one of the ranges 0-9, A-F or a-f
SEARCH and VERIFY of CHAR
So, under arch(11), for SEARCH(x,y) and VERIFY(x,y) where x is char, if y is a
literal with 1 <= length(y) <= 16, then
the compiler will generate code using the vector find-any-equal (VFAE) instruction
(or VFANE for verify)
If y is a literal that the compiler can regroup as 8 or fewer ranges, then
It will generate code using the vector string range compare instruction
And this will be done if x is NONVARYING, VARYING or VARYINGZ
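The selection rule above can be sketched as follows. This is only an illustration of the stated limits (a set of at most 16 characters, or at most 8 ranges): the names `to_ranges` and `pick_instruction` are hypothetical, the fallback label "other" is a placeholder, and Unicode code points are used where the compiler would work with the actual character encoding.

```python
def to_ranges(chars: str) -> list[tuple[int, int]]:
    """Regroup a set of characters into maximal runs of consecutive code points."""
    ranges: list[tuple[int, int]] = []
    for code in sorted({ord(c) for c in chars}):
        if ranges and code == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], code)   # extend the current run
        else:
            ranges.append((code, code))          # start a new run
    return ranges


def pick_instruction(y: str, max_set: int = 16, max_ranges: int = 8) -> str:
    """Which vector instruction the rule above would select for literal y."""
    if not y:
        return "other"
    if len(y) <= max_set:
        return "VFAE"
    if len(to_ranges(y)) <= max_ranges:
        return "VSTRC"
    return "other"


hex_upper = pick_instruction("0123456789ABCDEF")
hex_mixed = pick_instruction("0123456789ABCDEFabcdef")
hex_mixed_ranges = to_ranges("0123456789ABCDEFabcdef")
```

The 22-character literal no longer fits in one vector as a set, but it collapses to three ranges (0-9, A-F, a-f), well within the limit of 8.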
SEARCH and VERIFY of WIDECHAR
SEARCH and VERIFY of WIDECHAR are done via library calls if arch < 11
But under arch(11), for SEARCH(x,y) and VERIFY(x,y) where x is widechar
when y is a literal with 1 <= length(y) <= 8,
the compiler will generate code using the vector find-any-equal (VFAE) instruction
(or VFANE for verify)
Length(y) must be <= 8 since the set of wchars to be tested has to fit in one
vector of 16 bytes
SEARCH and VERIFY of WIDECHAR
For example, this simple code tests if a UTF-16 string is octal
woctal: proc( s );
dcl s wchar(*) var;
dcl o wchar value( '01234567' );
dcl sx fixed bin(31);
sx = verify( s, o );
if sx > 0 then ...
It is done with an expensive library call with ARCH <= 10
SEARCH and VERIFY of WIDECHAR
With ARCH(11), the vector instruction facility is used to inline it as
E720 1000 0006 VL v2,+CONSTANT_AREA(,r1,0)
@1L5 DS 0H
A7FE 0010 CHI r15,H'16'
4140 0010 LA r4,16
B9F2 404F LOCRL r4,r15
B9FA 00E2 ALRK r14,r2,r0
E704 E000 0037 VLL v0,r4,_shadow1(r14,0)
E700 2080 1082 VFAE v0,v0,v2,b'0001',b'1000'
E7E0 0001 2021 VLGV r14,v0,1,2
EC4E 000C 2076 CRJH r4,r14,@1L6
A7FA FFF0 AHI r15,H'-16'
A70A 0010 AHI r0,H'16'
ECFC 0026 007E CIJNH r15,H'0',@1L9
A7F4 FFE5 J @1L5
@1L6 DS 0H
SEARCH and VERIFY of WIDECHAR
Here the value string ‘01234567’ has 8 wide characters
So it can be loaded as a vector of 8 2-byte integers
And then the test string can be compared against the vector with up to 8
characters tested at a time
If the value string had more than 8 wchars, FAE and FANE could not be used
But a vector string range compare could be used instead
SEARCH and VERIFY of WIDECHAR
For example, this simple code tests if a UTF-16 string is numeric
wnumb: proc( s );
dcl s wchar(*) var;
dcl n wchar value( '0123456789' );
dcl sx fixed bin(31);
sx = verify( s, n );
if sx > 0 then ...
It is done with an expensive library call with ARCH <= 10
SEARCH and VERIFY of WIDECHAR
With ARCH(11), the vector instruction facility is used to inline it as
E700 E000 0006 VL v0,+CONSTANT_AREA(,r14,0)
E740 E010 0006 VL v4,+CONSTANT_AREA(,r14,16)
@1L2 DS 0H
A74E 0010 CHI r4,H'16'
4150 0010 LA r5,16
B9F2 4054 LOCRL r5,r4
B9FA F0E2 ALRK r14,r2,r15
E725 E000 0037 VLL v2,r5,_shadow1(r14,0)
E722 0180 408A VSTRC v2,v2,v0,v4,b'0001',b'1000'
E7E2 0001 2021 VLGV r14,v2,1,2
EC5E 000D 2076 CRJH r5,r14,@1L3
A74A FFF0 AHI r4,H'-16'
A7FA 0010 AHI r15,H'16'
EC4C 000E 007E CIJNH r4,H'0',@1L4
A7F4 FFE5 J @1L2
SEARCH and VERIFY of WIDECHAR
Where the range and comparison vectors are
0030_0039 0000_0000 0000_0000 0000_0000
A000_C000 0000_0000 0000_0000 0000_0000
If the string to test were ‘_0123456789’, then the vectors would be:
0030_0039 005F_005F 0000_0000 0000_0000
A000_C000 8000_8000 0000_0000 0000_0000
Where the second “range” is a degenerate range to test for a wchar _
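The two pairs of vectors above can be decoded with a small model. The bit meanings are inferred from the values shown (A000 paired with C000 for a GE/LE range, 8000 paired with 8000 for EQ/EQ): 8000 is taken as "equal", 4000 as "less than" and 2000 as "greater than". That reading, and the function names, are assumptions of this sketch.

```python
EQ, LT, GT = 0x8000, 0x4000, 0x2000


def compare(ch: int, bound: int, control: int) -> bool:
    """True if ch relates to bound in any way the control bits allow."""
    return bool(((control & EQ) and ch == bound)
                or ((control & LT) and ch < bound)
                or ((control & GT) and ch > bound))


def in_ranges(ch: int, bounds: list[int], controls: list[int]) -> bool:
    """True if ch satisfies both comparisons of at least one even-odd pair."""
    return any(compare(ch, bounds[i], controls[i])
               and compare(ch, bounds[i + 1], controls[i + 1])
               for i in range(0, len(bounds), 2))


def vstrc(elements: list[int], bounds: list[int], controls: list[int],
          invert: bool = False) -> int:
    """Byte index of the leftmost halfword hit, or 16 if there is none."""
    for idx, ch in enumerate(elements):
        if in_ranges(ch, bounds, controls) != invert:
            return 2 * idx
    return 16


digits = ([0x0030, 0x0039, 0, 0, 0, 0, 0, 0],
          [0xA000, 0xC000, 0, 0, 0, 0, 0, 0])
digits_or_underscore = ([0x0030, 0x0039, 0x005F, 0x005F, 0, 0, 0, 0],
                        [0xA000, 0xC000, 0x8000, 0x8000, 0, 0, 0, 0])

text = [ord(c) for c in "12345_78"]          # 8 halfword elements
first_bad = vstrc(text, *digits, invert=True)
none_bad = vstrc(text, *digits_or_underscore, invert=True)
```

Unused pairs with all-zero control bits never match anything in this model, which is consistent with the slide's remark that fewer range pairs than the vector can hold are allowed.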
SEARCH and VERIFY of WIDECHAR
For SEARCH of wchar under arch(11),
when y is a literal with 16 or fewer ranges, the compiler will generate code
using the vector string range compare (VSTRC) instruction
If there are more than 4 ranges, the source bytes are loaded once and
repeated VSTRC tests are made against that source vector until a range is hit
SEARCH and VERIFY of WIDECHAR
For VERIFY of wchar under arch(11),
when y is a literal with 4 or fewer ranges, the compiler will generate code
using the inverse vector string range compare (VSTRC) instruction
However, loading the source once and using repeated inverse VSTRC tests
against that vector won't work as simply with VERIFY (unlike SEARCH)
SEARCH and VERIFY of WIDECHAR
For example, suppose the literal defines a set of 6 ranges
c-d g-h k-l o-p s-t w-x
To perform SEARCH of ‘quvx’ against this set of ranges, we can simply test
to see if any of the characters fall in one of the first 4 ranges, and if not, in
one of the next 4, and so on, i.e. test first against
c-d g-h k-l o-p
and then, if necessary, test against
s-t w-x
SEARCH and VERIFY of WIDECHAR
But for VERIFY of ‘clot’ against this set of ranges, we would find a character
not in the first 4 ranges and 3 characters not in the next 2 ranges
c-d g-h k-l o-p s-t w-x
That would lead us to produce a non-zero result
But every character is in the full set of ranges and we want a result of zero!
The key here is that every vector string range compare instruction is
comparing multiple characters against a set of ranges – unlike a traditional,
simple test of a single character against a set of ranges
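The difference can be reproduced with a small model in which each pass tests a whole chunk of characters against at most 4 ranges. All function names are hypothetical. The SEARCH helper takes the minimum position over all passes so that the reported position is exact; whether the generated code does this is not stated in the slides.

```python
RANGES = [("c", "d"), ("g", "h"), ("k", "l"), ("o", "p"), ("s", "t"), ("w", "x")]


def first_hit(text, ranges, invert):
    """One range compare over the whole chunk: 0-based index of the
    leftmost hit, or None if no character qualifies."""
    for i, ch in enumerate(text):
        inside = any(lo <= ch <= hi for lo, hi in ranges)
        if inside != invert:
            return i
    return None


def search_in_passes(text, ranges, per_pass=4):
    """SEARCH split into passes of at most `per_pass` ranges."""
    hits = [first_hit(text, ranges[i:i + per_pass], False)
            for i in range(0, len(ranges), per_pass)]
    hits = [h for h in hits if h is not None]
    return min(hits) + 1 if hits else 0


def naive_verify_in_passes(text, ranges, per_pass=4):
    """The tempting but wrong VERIFY: an inverse test per pass."""
    for i in range(0, len(ranges), per_pass):
        hit = first_hit(text, ranges[i:i + per_pass], True)
        if hit is not None:
            return hit + 1
    return 0


def verify(text, ranges):
    """Reference VERIFY against the full set of ranges."""
    hit = first_hit(text, ranges, True)
    return 0 if hit is None else hit + 1


search_result = search_in_passes("quvx", RANGES)
naive_result = naive_verify_in_passes("clot", RANGES)
true_result = verify("clot", RANGES)
```

Here the naive version reports position 4 because 't' is missing from the first four ranges, although the reference VERIFY correctly returns 0.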
SEARCH and VERIFY of WIDECHAR
This would suggest limiting VERIFY of wchar to 4 ranges
But that restriction is worse than it might seem
Testing no more than 8 ranges for char may be ok since there are only 256
char values and 8 ranges of 16 cover half of that
But there are 64K wchar values and 4 ranges won’t cover much of that
And one major European bank runs some important code (over 1M times a
day) that has a VERIFY against this string
SEARCH and VERIFY of WIDECHAR
With 16 ranges
dcl test_chars wchar value(
'002B002C002D002E'wx
|| '0030003100320033003400350036003700380039'wx
|| '0660066106620663066406650666066706680669'wx
|| '06F006F106F206F306F406F506F606F706F806F9'wx
|| '0966096709680969096A096B096C096D096E096F'wx
|| '09E609E709E809E909EA09EB09EC09ED09EE09EF'wx
|| '0A660A670A680A690A6A0A6B0A6C0A6D0A6E0A6F'wx
|| '0AE60AE70AE80AE90AEA0AEB0AEC0AED0AEE0AEF'wx
|| '0B660B670B680B690B6A0B6B0B6C0B6D0B6E0B6F'wx
|| '0BE70BE80BE90BEA0BEB0BEC0BED0BEE0BEF'wx
|| '0C660C670C680C690C6A0C6B0C6C0C6D0C6E0C6F'wx
|| '0CE60CE70CE80CE90CEA0CEB0CEC0CED0CEE0CEF'wx
|| '0D660D670D680D690D6A0D6B0D6C0D6D0D6E0D6F'wx
|| '0E500E510E520E530E540E550E560E570E580E59'wx
|| '0ED00ED10ED20ED30ED40ED50ED60ED70ED80ED9'wx
|| '0F200F210F220F230F240F250F260F270F280F29'wx );
SEARCH and VERIFY of WIDECHAR
However, we can finesse this problem:
We flip VERIFY( x, y ) to SEARCH( x, not y )
And so VERIFY and SEARCH for widechar will both be inlined if the number
of ranges is 16 or less
Although for VERIFY this may require testing 17 ranges
SEARCH and VERIFY of WIDECHAR
For example, suppose the literal defines a set of 6 ranges
c-d g-h k-l o-p s-t w-x
VERIFY against this is the same as SEARCH against the “missing” ranges,
and so we can inline this via two normal (non-inverse) VSTRC tests against
this set of ranges
a-b e-f i-j m-n q-r u-v y-z
But note that the 6 ranges when flipped became 7 ranges - hence if there are
16 ranges, we might have to test against 17
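The flip can be sketched by complementing the ranges and then running an ordinary SEARCH. The function names are hypothetical. The example computes the complement both within a-z, as in the illustration above, and within the full 16-bit range a wchar can take.

```python
def complement(ranges, lo, hi):
    """Ranges covering every code in lo..hi that is NOT in `ranges`."""
    out = []
    nxt = lo                          # lowest code not yet accounted for
    for a, b in sorted(ranges):
        if a > nxt:
            out.append((nxt, a - 1))  # gap before this range
        nxt = max(nxt, b + 1)
    if nxt <= hi:
        out.append((nxt, hi))         # tail after the last range
    return out


def search(text, ranges):
    """1-based position of the first character inside any range, else 0."""
    for pos, ch in enumerate(text, start=1):
        if any(a <= ord(ch) <= b for a, b in ranges):
            return pos
    return 0


def verify_via_search(text, ranges):
    """VERIFY(x, y) computed as SEARCH(x, not y) over 16-bit code points."""
    return search(text, complement(ranges, 0x0000, 0xFFFF))


SET = [(ord(a), ord(b)) for a, b in ("cd", "gh", "kl", "op", "st", "wx")]

missing_in_alphabet = complement(SET, ord("a"), ord("z"))
missing_in_wchar = complement(SET, 0x0000, 0xFFFF)
all_inside = verify_via_search("clot", SET)
one_outside = verify_via_search("clay", SET)
```

Both complements contain 7 ranges for the 6 original ones: a gap can open on each side of the set, so n ranges can flip to as many as n + 1.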
PL/I use in INLIST
INLIST
This built-in function is useful in determining if a value belongs to a set of
values and allows you to put a SELECT in the middle of an IF
It requires a minimum of 3 arguments and accepts a maximum of 64
INLIST( x, a, b, c, … ) is equivalent to ( x = a ) | ( x = b ) | ( x = c ) …
All the arguments must have computational type
The compiler will optimize this when possible
INLIST
If the first argument is “nice” and the rest are all similar, “close” values, then
the compiler will turn the inlist reference into a branch table. For example,
inlist( x, 2, 3, 5, 7, 11, 13, 17, 19 )
would become a branch table if x is FIXED BIN(31) or if x is FIXED DEC(5)
And if all are CHAR(1), a simple table look-up is generated
But if all are CHAR(2) or CHAR(4) or WCHAR(1) or WCHAR(2), a series of
compares is generated (since the values are unlikely to be “close” and the
branch table would be huge)
INLIST
But, consider this snippet of code to validate a 2-byte country code
checkcc:
proc( countryCode )
options(nodescriptor);
dcl countryCode char(2);
if inlist( countryCode,
'AT', 'DE', 'CH', 'NL', 'DK', 'FI', 'SE', 'NO' ) then;
else
signal error;
INLIST
Under arch(10) and opt(3), it becomes 8 compares and branches
5810 1000 L r1,_addrCOUNTRYCODE(,r1,0)
4800 1000 LH r0,_shadow1(,r1,0)
A70E C1E3 CHI r0,H'-15901'
A784 0026 JE @1L13
A70E C4C5 CHI r0,H'-15163'
A784 0022 JE @1L13
A70E C3C8 CHI r0,H'-15416'
A784 001E JE @1L13
A70E D5D3 CHI r0,H'-10797'
A784 001A JE @1L13
A70E C4D2 CHI r0,H'-15150'
A784 0016 JE @1L13
A70E C6C9 CHI r0,H'-14647'
A784 0012 JE @1L13
A70E E2C5 CHI r0,H'-7483'
A784 000E JE @1L13
A70E D5D6 CHI r0,H'-10794'
A784 000A JE @1L13
INLIST
But under arch(11) and opt(3), one vector-find-any-equal and one branch do
it faster and more simply!
5810 1000 L r1,_addrCOUNTRYCODE(,r1,0)
4100 0002 LA r0,2
E700 1000 0037 VLL v0,r0,_shadow1(r1,0)
E720 E000 0006 VL v2,+CONSTANT_AREA(,r14,0)
E700 2000 1082 VFAE v0,v0,v2,b'0001',b'0000'
E700 0001 2021 VLGV r0,v0,1,2
EC08 000B 007E CIJE r0,H'0',@1L4
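A rough model of what this sequence appears to do: the eight 2-byte codes fill one 16-byte constant, the 2-byte argument is loaded into the first halfword of a vector, and a halfword find-any-equal reports byte index 0 when that first halfword matches one of the codes. Zero-filling of the rest of the loaded vector is an assumption, the name `inlist_vector` is hypothetical, and Python's `cp037` codec is used as a representative EBCDIC encoding.

```python
CODES = ["AT", "DE", "CH", "NL", "DK", "FI", "SE", "NO"]

# Eight 2-byte EBCDIC codes fill exactly one 16-byte vector constant.
CONSTANT = "".join(CODES).encode("cp037")


def inlist_vector(code: str) -> bool:
    """Model of the load, halfword find-any-equal and index test."""
    # Load 2 bytes; the remaining 14 bytes are assumed to be zero.
    v0 = code.encode("cp037").ljust(16, b"\0")
    halfwords = {CONSTANT[i:i + 2] for i in range(0, 16, 2)}
    index = 16
    for i in range(0, 16, 2):
        if v0[i:i + 2] in halfwords:
            index = i
            break
    return index == 0      # only a hit on the first halfword counts


# 'AT' as a signed halfword, as it appears in the arch(10) compare listing.
at_as_signed_halfword = int.from_bytes("AT".encode("cp037"), "big") - 0x10000
```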
INLIST
And if there were 16 codes to be tested, then instead of 16 compares and
branches, under arch(11), 2 vector-find-any-equal and 2 branches suffice!
5810 1000 L r1,_addrCOUNTRYCODE(,r1,0)
4100 0002 LA r0,2
C0E0 0000 LARL r14,F'48'
E720 1000 0037 VLL v2,r0,#AddressShadow(r1,0)
E700 E000 0006 VL v0,+CONSTANT_AREA(,r14,0)
E702 0000 1082 VFAE v0,v2,v0,b'0001',b'0000'
E700 0001 2021 VLGV r0,v0,1,2
EC08 0017 007E CIJE r0,H'0',@1L6
E700 E010 0006 VL v0,+CONSTANT_AREA(,r14,16)
E702 0000 1082 VFAE v0,v2,v0,b'0001',b'0000'
E700 0001 2021 VLGV r0,v0,1,2
EC08 000B 007E CIJE r0,H'0',@1L6
INLIST
And one vector operation will also suffice for
8 compares of WCHAR(1)
4 compares of CHAR(4)
4 compares of WCHAR(2)
Possible PL/I use in the future
BETWEEN
This built-in function is useful in determining if a value is in an interval
It requires exactly 3 arguments
BETWEEN( x, a, b ) is equivalent to ( x >= a ) & ( x <= b )
All the arguments must be ordinals or have real numeric type
The compiler will optimize this when possible
For example, if x, a, and b are all FIXED BIN(p,0) with p <= 31, then the compiler will
turn BETWEEN( x, a, b ) into one comparison (not two!)
ORDINAL, CHAR(1), and WCHAR(1) are optimized in the same way
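The slides do not say how the single comparison is formed. One well-known technique, shown here only as a possibility, is to subtract the lower bound and compare the results as unsigned values. The function name is hypothetical, and the sketch assumes a <= b and that all three values fit in a signed 32-bit integer.

```python
MASK = 0xFFFFFFFF   # keep results to 32 bits, as unsigned values


def between_one_compare(x: int, a: int, b: int) -> bool:
    """(x >= a) & (x <= b) using a single unsigned comparison.

    Assumes a <= b and that x, a and b fit in a signed 32-bit integer.
    If x < a, the subtraction wraps to a huge unsigned value and the
    comparison fails, so both bounds are checked at once.
    """
    return ((x - a) & MASK) <= ((b - a) & MASK)


inside = between_one_compare(7, 5, 10)
below = between_one_compare(4, 5, 10)
above = between_one_compare(11, 5, 10)
```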
BETWEEN
If this function were allowed to have more arguments to test if a value was in
any one of several ranges, for example
BETWEEN( x, a, b, c, d, e, f ) would be equivalent to
BETWEEN( x, a, b ) | BETWEEN( x, c, d ) | BETWEEN( x, e, f )
Then for certain types of x, the compiler could use the vector range compare
instruction to generate nice code to do these tests
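The proposed semantics can be written down directly. This is a sketch of a hypothetical extension, not an existing built-in: the name `between_any` and its flat list of bounds simply mirror the example above.

```python
def between_any(x, *bounds):
    """True if x lies in any of the closed intervals (a, b), (c, d), ...

    Bounds are given as a flat sequence of lower and upper limits, so
    between_any(x, a, b, c, d) means (a <= x <= b) or (c <= x <= d).
    """
    if len(bounds) < 2 or len(bounds) % 2:
        raise ValueError("bounds must come in lower/upper pairs")
    return any(lo <= x <= hi
               for lo, hi in zip(bounds[0::2], bounds[1::2]))


def is_hex_digit(ch: str) -> bool:
    """Three ranges, as in the earlier hex example: 0-9, A-F and a-f."""
    return between_any(ord(ch),
                       ord("0"), ord("9"),
                       ord("A"), ord("F"),
                       ord("a"), ord("f"))
```

Each lower/upper pair corresponds to one even-odd range pair of the vector range compare instruction, which is what would make that instruction a natural fit.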
Other built-in functions
USUPPLEMENTARY
Essentially a range compare
JSONGetComma, JSONGetColon, etc
Requires an initial “VERIFY” against the possible whitespace values
Arrays
PL/I has always had array language
Vector instructions could be used to optimize code such as
A = B; where A is an array of FIXED BIN(31) and B an array of FIXED BIN(15)
A = B + C * D; (etc) where the elements are arrays of FIXED BIN
ALL or ANY when applied to various integer or string arrays
Et al
© Copyright IBM Corporation 2008. All rights reserved. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in these materials may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. IBM, the IBM logo, the on-demand business logo, Rational, the Rational logo, and other IBM products and services are trademarks of the International Business Machines Corporation, in the United States, other countries or both. Other company, product, or service names may be trademarks or service marks of others.