1/29
Miscellanea Assignment Extensions Summary Conclusion References Backup slides
CS-695 NoSQL Database
PostgreSQL (part 2 of 2)
Dr. Chuck Cartledge
3 Sept. 2015
2/29
Table of contents I
1 Miscellanea
2 Assignment
3 Extensions
4 Summary
5 Conclusion
6 References
7 Backup slides
3/29
Corrections and additions since last lecture.
Be sure to look at assignment #01.
4/29
Bits and pieces
Little things that mean a lot.
How to “know” what the user intended
How to measure “sameness”
How to connect databases to one another
5/29
What are they and why should I care?
PostgreSQL is largely ANSI-SQL:2008 compliant
“The nice thing about standards is that you have so many to choose from.”
Andrew S. Tanenbaum [8]
ANSI-SQL:2011 adds many temporal-related capabilities.
6/29
What are they and why should I care?
PostgreSQL is open source
“. . . is not only a powerful database system capable of running the enterprise, it is a development platform upon which to develop in-house, web, or commercial software products that require a capable DBMS.”
PostgreSQL Staff [3]
Programmers are tool makers (among other things). When it is possible to extend something, they will.
7/29
What are they and why should I care?
What are extensions?
A way to group a collection of “loose” objects into a named entity.
The collection is called an “extension”
An extension may have many internal objects
An extension is loaded via the CREATE EXTENSION command
An extension is dropped via the DROP EXTENSION command
An extension object can be modified via the CREATE FUNCTION or CREATE OR REPLACE FUNCTION command
\dx to list installed extensions
select * from pg_available_extensions() order by name; is also available, and lists the extensions that can be installed.
8/29
What are they and why should I care?
Where can I find information about an extension?
Like so many other things, it is in the documentation (footnote 1).
Documentation is terse:
A few sentences about the extension.
A list of objects in the collection.
Maybe an example.
Footnote 1: http://www.postgresql.org/docs/9.3/static/contrib.html
9/29
fuzzystrmatch extension
Overview
“The fuzzystrmatch module provides several functions to determine similarities and distance between strings.” [5]
soundex: converts a string to its Soundex code
metaphone: computes a representative string
dmetaphone: computes two “sounds like” strings
levenshtein: computes the “edit distance” between two strings
\dx+ to list objects in an extension
10/29
fuzzystrmatch extension
Levenshtein algorithm
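Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. [7]
There is a source and a target word/string (s, t), of length |s| and |t| respectively.
There is a matrix lev_{s,t} with entries lev_{s,t}(i, j) for 0 ≤ i ≤ |s| and 0 ≤ j ≤ |t|, where:
\[
\mathrm{lev}_{s,t}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0, \\
\min
\begin{cases}
\mathrm{lev}_{s,t}(i-1,j) + 1 & \text{(deletion)} \\
\mathrm{lev}_{s,t}(i,j-1) + 1 & \text{(insertion)} \\
\mathrm{lev}_{s,t}(i-1,j-1) + 1_{(s_i \neq t_j)} & \text{(substitution)}
\end{cases} & \text{otherwise.}
\end{cases}
\]
The indicator 1_{(s_i ≠ t_j)} is 1 when the two characters differ and 0 when they match.
The Levenshtein distance is in cell lev_{s,t}(|s|, |t|).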
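The recurrence above can be sketched in a few lines of Python. This is only an illustration of the matrix definition, not the code PostgreSQL runs; the function name levenshtein simply mirrors the extension's function, and the test words are arbitrary.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance between s and t, filling the (|s|+1) x (|t|+1) matrix."""
    rows, cols = len(s) + 1, len(t) + 1
    # lev[i][j] is the distance between the first i characters of s
    # and the first j characters of t.
    lev = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        lev[i][0] = i  # min(i, j) = 0, so the entry is max(i, j) = i
    for j in range(cols):
        lev[0][j] = j  # min(i, j) = 0, so the entry is max(i, j) = j
    for i in range(1, rows):
        for j in range(1, cols):
            # Indicator term: 0 when the characters match, 1 otherwise.
            cost = 0 if s[i - 1] == t[j - 1] else 1
            lev[i][j] = min(
                lev[i - 1][j] + 1,         # deletion
                lev[i][j - 1] + 1,         # insertion
                lev[i - 1][j - 1] + cost,  # substitution
            )
    return lev[rows - 1][cols - 1]


print(levenshtein("kitten", "sitting"))  # 3
```

The full matrix is kept here to match the definition; a production version would usually keep only two rows.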
11/29
fuzzystrmatch extension
levenshtein example
SELECT *
FROM some_table
WHERE levenshtein(storedValue, 'userInput') <= 2;  -- the threshold 2 is an assumed example value; the slide shows no comparison
12/29
fuzzystrmatch extension
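A different type of similarity (footnote 2)
How similar are these sentences?
Julie loves me more than Linda loves me
Jane likes me more than Julie loves me
\[
A \cdot B = \|A\| \, \|B\| \cos\theta
\]
\[
\begin{aligned}
\cos\theta &= \frac{A \cdot B}{\|A\| \, \|B\|}
 = \frac{\sum_{i=1}^{n} A_i \times B_i}{\sqrt{\sum_{i=1}^{n} (A_i)^2} \times \sqrt{\sum_{i=1}^{n} (B_i)^2}} \\
&= \frac{\sum (A \times B)}{\sqrt{\sum (A)^2} \times \sqrt{\sum (B)^2}}
 = \frac{\sum (A \times B)}{\sqrt{\sum (A \times A)} \times \sqrt{\sum (B \times B)}} \\
&= \frac{A \cdot B}{\sqrt{A \cdot A} \times \sqrt{B \cdot B}}
\end{aligned}
\tag{1}
\]
Math has the answer!
Footnote 2: http://stackoverflow.com/questions/1746501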
13/29
fuzzystrmatch extension
The mechanics (part 1).
Start with term frequency:
A = Julie loves me more than Linda loves me
B = Jane likes me more than Julie loves me
The set of all words (as lower case):
words = me, julie, loves, linda, than, more, likes, jane
14/29
fuzzystrmatch extension
The mechanics (part 2).
How often does each original string use each word in the full set?
Word   A  B
me     2  2
julie  1  1
loves  2  1
linda  1  0
than   1  1
more   1  1
likes  0  1
jane   0  1
Convert sentences to vectors.
A = (2, 1, 2, 1, 1, 1, 0, 0)
B = (2, 1, 1, 0, 1, 1, 1, 1)
\[
\cos\theta = \frac{9}{\sqrt{12} \times \sqrt{10}} = \frac{9}{3.46 \times 3.16} \approx 0.822 \tag{2}
\]
Range of cos θ is: 0 (no match) to 1 (perfect match), because term counts are never negative.
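The term-frequency steps above can be sketched in Python as follows. This is a minimal illustration rather than anything PostgreSQL provides: the function name cosine_similarity and the whitespace tokenizer are assumptions made for the example.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two texts."""
    # Term frequency: count each lower-cased, whitespace-separated word.
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # The union of both word sets fixes the vector dimensions "on the fly".
    words = sorted(set(tf_a) | set(tf_b))
    a = [tf_a[w] for w in words]  # a Counter returns 0 for a missing word
    b = [tf_b[w] for w in words]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat as "no match"
    return dot / (norm_a * norm_b)


score = cosine_similarity(
    "Julie loves me more than Linda loves me",
    "Jane likes me more than Julie loves me",
)
print(round(score, 4))  # 0.8216
```

Without rounding the two norms first, the value for the example sentences is 9 / sqrt(120), about 0.8216.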
15/29
fuzzystrmatch extension
Another example
The input strings:
A = He hates term frequency
B = She loves math
Word       A  B
he         1  0
hates      1  0
term       1  0
frequency  1  0
she        0  1
loves      0  1
math       0  1
Convert sentences to vectors.
A = (1, 1, 1, 1, 0, 0, 0)
B = (0, 0, 0, 0, 1, 1, 1)
\[
\cos\theta = \frac{0}{\sqrt{4} \times \sqrt{3}} = \frac{0}{2 \times 1.73} = 0.0 \tag{3}
\]
The process needs no predefined vocabulary, so it works well for unknown terms (i.e., great flexibility).
16/29
fuzzystrmatch extension
Using “distance” to compute similarity
From the term-frequency (tf) discussion, we have the idea of converting terms (tokens) to a numerical vector
The tf vectors are created “on the fly”
What if the union vector were known in advance?
17/29
fuzzystrmatch extension
As records were added, their “word vector” could be computed
union vector = (me, julie, loves, linda, than, more, likes, jane)
input vector = Julie loves me more than Linda loves me
word vector = (2,1,2,1,1,1,0,0)
Each input vector now “lives” at a point on a multi-dimensional plane. Image from [6].
All documents would “live” on the same multi-dimensional plane.
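With the union vector fixed in advance, computing a record's word vector is a simple count. The sketch below assumes a hypothetical helper named word_vector and a whitespace tokenizer; words outside the fixed vocabulary are silently ignored, which is one consequence of fixing it ahead of time.

```python
from collections import Counter

# The union vector from the slide, fixed before any record is added.
UNION_VECTOR = ("me", "julie", "loves", "linda", "than", "more", "likes", "jane")


def word_vector(text, vocabulary=UNION_VECTOR):
    """Count how often each vocabulary word occurs in text, in vocabulary order."""
    counts = Counter(text.lower().split())
    # Words that are not in the vocabulary never appear in the result.
    return tuple(counts[word] for word in vocabulary)


print(word_vector("Julie loves me more than Linda loves me"))
# (2, 1, 2, 1, 1, 1, 0, 0)
```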
18/29
fuzzystrmatch extension
Now that everything “lives” on the same plane . . .
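We can compute how close they are to each other.
Distance d(p, q) between points p and q:
1D: \( d(p,q) = \sqrt{(p - q)^2} \)
2D: \( d(p,q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2} \)
3D: \( d(p,q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2} \)
nD: \( d(p,q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \)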
Wouldn’t it be nice if PostgreSQL could help us with this math??
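Before handing the math to PostgreSQL, the n-dimensional formula can be checked by hand with a short Python sketch. The function name euclidean_distance is an assumption for illustration; the two sample points are the word vectors of the earlier example sentences.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared per-dimension differences, then take the square root.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


a = (2, 1, 2, 1, 1, 1, 0, 0)  # Julie loves me more than Linda loves me
b = (2, 1, 1, 0, 1, 1, 1, 1)  # Jane likes me more than Julie loves me
print(euclidean_distance(a, b))  # 2.0
```

The two vectors differ by 1 in exactly four dimensions, so the distance is the square root of 4.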
19/29
cube extension
The “cube” extension (part 1 of 2)
“This module implements a data type cube for representing multidimensional cubes.” [4]
Adds a custom data type called cube
A cube type expects a vector, e.g. '(0,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0,0,0)'
The vector contains n values (i.e., n dimensions)
Values are in user units: notional, rather than real.
Image from [2].
20/29
cube extension
The “cube” extension (part 2 of 2)
'Oedipus', '(2,0,0,0,0,9,0,9,9,0,0,0,0,0,0,0,0,0)'
'Gone with the Wind', '(0,0,0,3,0,0,0,5,0,0,0,0,0,0,0,0,0,0)'
'The 40 Year Old Virgin', '(0,0,0,5,5,0,0,0,0,0,0,0,0,0,0,0,0,0)'
'Animal House', '(0,0,0,5,9,0,0,0,0,0,0,0,0,0,0,0,0,0)'
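To see what these genre vectors buy us, the sketch below ranks the other three movies by distance from a chosen one. It is a Python illustration only: inside PostgreSQL the cube extension's cube_distance function would do the equivalent in SQL. The names MOVIES and nearest are hypothetical, and math.dist needs Python 3.8 or later.

```python
import math

# Genre vectors copied from the slide: one value per genre, 18 genres.
MOVIES = {
    "Oedipus": (2, 0, 0, 0, 0, 9, 0, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    "Gone with the Wind": (0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    "The 40 Year Old Virgin": (0, 0, 0, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    "Animal House": (0, 0, 0, 5, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
}


def nearest(title, movies=MOVIES):
    """Return the other titles ordered from closest to farthest."""
    origin = movies[title]
    ranked = sorted(
        (math.dist(origin, vector), name)  # Euclidean distance, then title
        for name, vector in movies.items()
        if name != title
    )
    return [name for _, name in ranked]


print(nearest("Animal House"))
# ['The 40 Year Old Virgin', 'Gone with the Wind', 'Oedipus']
```

The two comedies sit 4 units apart, while Oedipus is the farthest from Animal House, which matches intuition about the genres.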
21/29
postgres_fdw extension
Working with more than one database (part 1 of 2)
The original database design was envisaged as standalone. Needs changed over time.
Two extensions connect one database to another: dblink (pre-9.3) and postgres_fdw (9.3+).
PostgreSQL pre-9.3 used dblink, which primarily executes a query (usually a SELECT) on the remote database and returns its rows:
SELECT *
FROM table1 tb1
LEFT JOIN (
    SELECT *
    FROM dblink('dbname=db2', 'SELECT id, code FROM table2')
        AS tb2(id int, code text)  -- column list declaring the remote row type
) AS tb2 ON tb2.id = tb1.id;  -- assumes table1 also has an id column
22/29
postgres_fdw extension
Working with more than one database (part 2 of 2)
PostgreSQL 9.3 and later can use postgres_fdw (footnote 3)
CREATE SERVER: connect to a remote database
CREATE USER MAPPING: authenticate to the remote database
CREATE FOREIGN TABLE: local table connected to a remote table
CREATE SERVER book_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'postgis_in_action');
CREATE USER MAPPING FOR public SERVER book_server
    OPTIONS (user 'book_guest', password 'whatever');
CREATE FOREIGN TABLE ch01.ft_restaurants
    (id integer, franchise character(3), geom geometry(Point,2163))
    SERVER book_server
    OPTIONS (schema_name 'ch01', table_name 'restaurants');
Footnote 3: http://www.postgresql.org/docs/9.3/static/postgres-fdw.html
http://www.postgresonline.com/journal/archives/322-Generating-Create-
23/29
Strengths and weaknesses
Good and not so good
Strengths
Age, lots of years of active development
Lots of language-specific drivers
Extensibility
Open source
Weaknesses
Partitionability (re. CAP theorem)
Data must be “neat and tidy”
24/29
Applicabilities
Good for, and not so good for
Good fit
Well-structured data
Data known in advance
Data use not known in advance
Not so good fit
Highly variable data
Hierarchical or “object oriented” data
Extremely sparse data
25/29
What have we covered?
Reviewed assignment #01
Covered PostgreSQL extensions
Covered different ways to compute and measure document “sameness”
Remember: Assignment #01 is due before next class
Next time: CRUDy Riak
26/29
References I
[1] Thomas Lockhart, PostgreSQL programmer's guide, Thomas Lockhart (editor), 2001.
[2] Eric Redmond and Jim R. Wilson, Seven databases in seven weeks, Pragmatic Bookshelf, 2012.
[3] PostgreSQL Staff, About, http://www.postgresql.org/about/, 2015.
[4] PostgreSQL Staff, cube, http://www.postgresql.org/docs/9.3/static/cube.html, 2015.
[5] PostgreSQL Staff, fuzzystrmatch, http://www.postgresql.org/docs/9.3/static/fuzzystrmatch.html, 2015.
[6] WikiHow Staff, How to plot points in three dimensions, http://www.wikihow.com/Plot-Points-in-Three-Dimensions, 2015.
27/29
References II
[7] Wikipedia Staff, Levenshtein distance,https://en.wikipedia.org/wiki/Levenshtein_distance, 2015.
[8] Andrew S Tanenbaum, Computer networks, Prentice Hall, 2003.
28/29
Slides
Genres from [2]
INSERT INTO genres (name,position) VALUES
('Action',1),
('Adventure',2),
('Animation',3),
('Comedy',4),
('Crime',5),
('Disaster',6),
('Documentary',7),
('Drama',8),
('Eastern',9),
('Fantasy',10),
('History',11),
('Horror',12),
('Musical',13),
('Romance',14),
('SciFi',15),
('Sport',16),
('Thriller',17),
('Western',18);
29/29
Slides
Connection architecture
Image from [1].