Lecture 9 1
Chinese Character Output
• Character (字符): an abstract object recognized by humans in communication; it is the representation at the conceptual level. Control characters in a computer's internal code are not considered characters.
• Glyph (字形): a character in its concrete form, without regard to thickness, style, size, or the computer's internal representation (bitmap, outline, etc.)
• Font (font set) (字體/字型庫): a specific form of the characters, with all computer internal representation attributes
• The three levels of representation
[Diagram labels: Image (圖像); Font (字型); External Representation (外部表示); GID (Glyph ID); Glyph (字形); Document Description; Character (字符) Code; Internal Representation (內部表示); Rendering; Association; Human perception]
Glyph Representation: Bitmaps
• A matrix of 1s and 0s represents a character
• A typical monitor displays a character using a 16 x 16 bitmap
• Typical sizes and storage demands are shown below
• (Note: double size => quadruple storage)
• Data compression is possible (a lot of empty space)
Total chars: 87 x 94 = 8,178
Type        Size         Storage (est.)
Simple      16 x 16      262k
Common      24 x 24      589k
Common      32 x 32      1M
Detailed    64 x 64      4M
Detailed    96 x 96      8M
Detailed    128 x 128    16M
Detailed    256 x 256    64M
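The storage figures can be checked with a short calculation. The following is a minimal Python sketch, assuming uncompressed bitmaps at 1 bit per pixel for all 87 x 94 = 8,178 characters; the function name is made up for illustration.

```python
TOTAL_CHARS = 87 * 94  # 8,178 characters, as stated in the table


def font_storage_bytes(size, total_chars=TOTAL_CHARS):
    """Bytes needed for total_chars uncompressed size x size bitmaps at 1 bit per pixel."""
    bits_per_char = size * size
    return total_chars * bits_per_char // 8


for size in (16, 24, 32, 64, 96, 128, 256):
    print(f"{size:>3} x {size:<3} {font_storage_bytes(size):>12,} bytes")
```

Doubling the side length from 16 to 32 quadruples the result. Under these assumptions the 96 x 96 case comes to roughly 9.4 million bytes, somewhat more than the rounded estimate listed for it.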
• Usually small bitmaps are stored and scaled up, but the quality of slanted edges suffers
• Linear scaling: from Old(x_old, y_old) to New(x_new, y_new),
where 0 <= x_old <= (Width_old - 1), 0 <= y_old <= (Height_old - 1)
and 0 <= x_new <= (Width_new - 1), 0 <= y_new <= (Height_new - 1),
assuming Height and Width values are integers
• r_x = Width_new / Width_old, r_y = Height_new / Height_old
• If r_x > 1 and r_y > 1, it is called scaling up
• New(x_new, y_new) = New(x * r_x, y * r_y) = Old(x, y)
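Linear scaling can be sketched in a few lines of Python. The slide states the forward mapping from old to new coordinates; this sketch instead assumes the inverse mapping (each new pixel looks up the old pixel at floor(x_new / r_x), floor(y_new / r_y)) so that every new pixel receives a value. The function name and the list-of-rows bitmap layout are assumptions for illustration.

```python
def scale_bitmap(old, width_new, height_new):
    """Linearly scale a bitmap (list of rows of 0/1) to width_new x height_new.

    Each new pixel copies the old pixel that maps onto it:
    New(x_new, y_new) = Old(floor(x_new / r_x), floor(y_new / r_y)).
    """
    height_old = len(old)
    width_old = len(old[0])
    new = []
    for y_new in range(height_new):
        # Integer form of floor(y_new / r_y), with r_y = height_new / height_old.
        y_old = y_new * height_old // height_new
        row = []
        for x_new in range(width_new):
            # Integer form of floor(x_new / r_x), with r_x = width_new / width_old.
            x_old = x_new * width_old // width_new
            row.append(old[y_old][x_old])
        new.append(row)
    return new
```

Scaling the 2 x 2 bitmap [[1, 0], [0, 1]] to 4 x 4 turns each pixel into a 2 x 2 block, so the diagonal becomes a staircase. This is the slanted-edge quality problem mentioned above.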
Smoothing techniques for scaling
• Ad Hoc Techniques (No underlying model but cheap):
– Enlargement (Matrix manipulation)
• Thresholding: convert into bitmap (assign 1 if >= 0.4 for unidirectional)
• Smoothing spline (齒形) and interpolation (嵌入法) (costly)
– Basis: character bitmaps are a coarse sample of the original character
– Approach: recover the curves of the character as continuous functions (cubic splines), then interpolate or generate bitmaps of another size
– Optimization: minimize the unsmoothness
Bezier Curves
• P(t) = (X(t), Y(t)): any point on the curve (0 <= t <= 1)
• Cubic Bezier: 4 points
– end points coincide with the curve
– other points control the shape (can specify the gradient at the end points)
• X(t) = X_0 (1-t)^3 + 3 X_1 (1-t)^2 t + 3 X_2 (1-t) t^2 + X_3 t^3
• Y(t) = Y_0 (1-t)^3 + 3 Y_1 (1-t)^2 t + 3 Y_2 (1-t) t^2 + Y_3 t^3
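The two polynomials can be evaluated directly. A minimal Python sketch follows; the function name and the use of (x, y) tuples for the four points are assumptions for illustration.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Return the point (x, y) on the cubic Bezier curve at parameter t in [0, 1].

    p0 and p3 are the end points; p1 and p2 are the control points.
    """
    s = 1.0 - t
    # Weights of the four points: (1-t)^3, 3(1-t)^2 t, 3(1-t) t^2, t^3.
    b0 = s ** 3
    b1 = 3 * s ** 2 * t
    b2 = 3 * s * t ** 2
    b3 = t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)
```

At t = 0 only the first weight is non-zero and at t = 1 only the last, which is why the curve passes through the two end points but generally not through the two control points.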
Glyph Representation: Outline
• Characters are described as shapes enclosed by lines or curves, which are specified by parameters (i.e. the data is an ASCII file and an interpreter generates the graphic image)
• Line: specified by 2 points
• Curve (usually cubic Bezier): specified by 4 points
– end points coincide with the curve
– other points control the shape
• Advantages compared to bitmaps:
– Scaling does not affect quality (major)
– No need to store fonts in different sizes (a compression of extremely detailed/large fonts)
– Compression (as in standard text)
– Email transport without encoding and decoding
• Example of PostScript for the Chinese character 一: [example not reproduced]
• Unit of measurement: 1 point = 1/72 of an inch. The coordinates start at the bottom-left corner, so coordinate translation is needed.
• A PostScript Type 1 font (base font) can handle only up to 256 characters in each set.
• It maps the 256 codes to the names of the glyphs in the set.
• PostScript Type 0 fonts: composite fonts
– Double-byte encoding:
– 1st byte: index to a base font
– 2nd byte: code in that particular base font
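The double-byte lookup can be illustrated with a small Python sketch. It shows only the splitting of a 16-bit code into the two bytes described above. The function name is hypothetical, and actual composite fonts define their own mapping details.

```python
def split_double_byte(code):
    """Split a 16-bit character code into (base font index, code within that base font)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a double-byte (16-bit) code")
    font_index = code >> 8   # 1st byte: selects one base font
    char_code = code & 0xFF  # 2nd byte: one of the 256 codes in that base font
    return font_index, char_code
```

With 256 possible base fonts of 256 codes each, this scheme can address up to 65,536 glyphs, compared with 256 for a single base font.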
• CID-keyed fonts (p. 288)
A technique to make character glyph definitions independent of the codeset.
– Each character glyph is given a CID which uniquely defines a glyph shape.
– A CMap is a file which contains the mapping from character encodings to glyphs (CIDs).
– A CIDFont file contains the pointers to the actual descriptions of the glyphs. A CIDFont file usually keeps character glyphs with the same style.
• Other outline fonts include TrueType and OpenType fonts. They differ in their data structures and header forms.
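The two-step lookup behind CID-keyed fonts (character code to CID through a CMap, then CID to glyph description through a CIDFont) can be sketched with plain dictionaries. Everything in this Python sketch is hypothetical: the table names, the codes, the CIDs, and the placeholder glyph strings are made up for illustration and do not reflect real CMap or CIDFont file contents.

```python
# Hypothetical CMaps: each maps the character codes of one encoding to CIDs.
CMAP_ENCODING_A = {0xA440: 1, 0xA441: 2}
CMAP_ENCODING_B = {0x4E00: 1, 0x4E59: 2}

# Hypothetical CIDFont: maps each CID to a glyph description (placeholder strings).
CIDFONT_ONE_STYLE = {1: "<outline for CID 1>", 2: "<outline for CID 2>"}


def lookup_glyph(code, cmap, cidfont):
    """Resolve a character code to a glyph description via its CID."""
    cid = cmap[code]     # step 1: encoding-specific code -> CID
    return cidfont[cid]  # step 2: CID -> glyph description


# Two different encodings reach the same glyph through the same CID.
same = lookup_glyph(0xA440, CMAP_ENCODING_A, CIDFONT_ONE_STYLE) == \
    lookup_glyph(0x4E00, CMAP_ENCODING_B, CIDFONT_ONE_STYLE)
print(same)
```

Because the glyph descriptions are keyed only by CID, supporting another encoding needs only another CMap, and the glyph data stays unchanged.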
Bitmap-to-Outline Conversion
• Determine the outline for all the straight lines
• Generate the curve list: a curve must begin and end at two different corners (so corners must be found: compute the angle between two vectors along the outline)
• Preprocessing for curve fitting: knee removal and smooth filtering to yield finer coordinates of sample points
• Perform curve fitting: iterations try to improve the goodness of fit (measured as the least-squares error)
• End point alignment: nearby end points of two consecutive splines are merged by averaging their positions
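The corner-finding step can be sketched as follows. This is only one possible reading of "compute the angle between two vectors along the outline": at each outline point the angle between the vectors to a preceding and a following point is measured, and sharp turns are flagged. The function name, the `step` parameter, and the 135-degree threshold are assumptions, not values from the slides.

```python
import math


def find_corners(points, step=1, max_angle_deg=135.0):
    """Return indices of points on a closed outline where the outline turns sharply.

    At each point P[i], the angle between the vectors P[i]->P[i-step] and
    P[i]->P[i+step] is computed. A straight run gives about 180 degrees,
    so a point is flagged when the angle is at most max_angle_deg.
    """
    n = len(points)
    corners = []
    for i in range(n):
        px, py = points[i]
        ax, ay = points[(i - step) % n]  # wrap around: the outline is closed
        bx, by = points[(i + step) % n]
        v1 = (ax - px, ay - py)
        v2 = (bx - px, by - py)
        norm = math.hypot(*v1) * math.hypot(*v2)
        if norm == 0:
            continue  # repeated point, angle undefined
        cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
        cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding error
        if math.degrees(math.acos(cos_a)) <= max_angle_deg:
            corners.append(i)
    return corners
```

On a square outline sampled at integer coordinates, only the four vertices are flagged. The outline segments between consecutive corners are then the candidates for curve fitting.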
Getting outline pixels through erosion
• Finding the outline of a bitmap means finding the pixels that are located inside an object but have at least one neighbour outside the object
• Basic idea
– Find the bitmap with its edge pixels removed: erosion (a smaller cross)
– Take the original bitmap with the eroded bitmap removed
• More mathematical terms and binary image operations are needed
• Translation: the displacement in the x direction, the y direction, or both at once. It is the repositioning of the coordinate system.
• Suppose B is a binary image;
• B_xy means B moved by the coordinates (x, y).
[Figure: B at the origin (0,0) and B translated to (x,y)]
• Erosion of B (a bitmap): the set of coordinates (x, y) such that S, translated by (x, y), is contained in B.
• E = B ⊖ S = {(x, y) | S_xy ⊆ B}
• S (4 black pixels): [figure not reproduced]
• Against [figure not reproduced] and their rotations
• Returns all the points in B that are not border (edge) pixels.
• Outline pixels:
• B − (B ⊖ S)
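Erosion and outline extraction can be sketched in Python. The structuring element S is assumed here to be a small cross (the centre pixel plus its 4 neighbours), since the original figure for S is not available. The function names and the list-of-rows bitmap layout are also assumptions.

```python
# Assumed structuring element S: a small cross, given as (dx, dy) offsets
# from the origin of S (the centre pixel plus its 4 neighbours).
CROSS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def erode(bitmap, s=CROSS):
    """Return the erosion of bitmap by s: 1 at (x, y) only if S translated by (x, y) lies inside B."""
    height, width = len(bitmap), len(bitmap[0])

    def is_set(x, y):
        # Pixels outside the bitmap count as background (0).
        return 0 <= x < width and 0 <= y < height and bitmap[y][x] == 1

    return [
        [1 if all(is_set(x + dx, y + dy) for dx, dy in s) else 0 for x in range(width)]
        for y in range(height)
    ]


def outline(bitmap, s=CROSS):
    """Return B minus its erosion: the pixels of B that the erosion removed."""
    eroded = erode(bitmap, s)
    return [
        [b & (1 - e) for b, e in zip(b_row, e_row)]
        for b_row, e_row in zip(bitmap, eroded)
    ]
```

For a solid 3 x 3 block, the erosion keeps only the centre pixel, so the outline is the ring of 8 pixels around it.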