Incomplete Reference of Postgresql
If you can’t RTFM, this is for you my friend
I’m reading this book https://nostarch.com/practicalSQL/ , thought I would simply collect my notes here.
The book is about finding stories about real world from databases using Postgresql. I enjoy technical books written with a twist like that, although I realize I should bring myself to read the dry offical doc, but my brain just don’t click with that. I could also learn by watching videos from the likes of Udemy if I was in a hurry, but I’m not. I also figured that that might leave too many gaps in my knowledge because SQL is one of the few time-tested technologies and I want to know it well for my life projects.
Anyway, so here are the notes:
SELECT DISTINCT school, salary
If we add a column, the query returns each unique pair of values. This technique gives us the ability to ask, “For each x in the table, what are all the y values?” For each factory, what are all the chemicals it produces? For each election district, who are all the candidates running for office? For each concert hall, who are the artists playing this month?
SELECT first_name, last_name, salary
ORDER BY salary DESC;
By ordering the salary column from highest to lowest, I can determine which teachers earn the most
Multiple ORDER BY
SELECT last_name, school, hire_date
ORDER BY school ASC, hire_date DESC;
we create a listing of teachers grouped by school with the most recently hired teachers listed first. This shows us who the newest teachers are at each school.
Better to run multple queries to answer each question you have than
ORDER BY on more than two columns.
to limit the rows a query returns to only those in which one or more columns meet certain criteria.
SELECT last_name, school, hire_date
WHERE school = 'Myers Middle School';
Various operators for WHERE
‘Putting It All Together’ Example
SELECT first_name, last_name, school, hire_date, salary
WHERE school LIKE '%Roos%'
ORDER BY hire_date DESC;
returns teachers at Roosevelt High School, ordered from newest hire to earliest. We can see a clear correlation between a teacher’s hire date at the school and his or her current salary level
CREATE TABLE eagle_watch (
integer above are one of the available data types in database. Each of them specifies what type of data its associated column is allowed to store.
Below is the list of data types that are mostly used:
maximum length is specified by n . If you insert fewer characters than the maximum, PostgreSQL will not store extra spaces. For example, the string ‘blue’ will take four spaces, whereas the string 123 will take three. This saves considerable space in datebase.
No need to use the
char(n) date type.
varchar(n) is solid for most cases.
A variable-length column of unlimited length. the longest possible character string you can store is about 1 gigabyte in Postgresql.
no substantial difference in performance among the two types. but try to define a column that will always have the same number of characters is a good way to signal what data it should contain.
For instance, you might use varchar(2) for U.S. state postal abbreviations.
Note, you actually might wanna just use
TEXT for all your character data.
There are three different data types for integers —
bigint — whole numbers, both positive and negative, including zero.
Below is how they differ:
In other words, go for:
smallintif the column values will always fit within its range e.g. days of the month or years
integerif you are confident numbers will remain within the integer limit.
bigintif you are dealing with numbers larger than about 2.1 billion. The extra storage space usage relative to
integermight be a concern when dealing with millions of data rows.
There are two data types for decimal numbers:
precisionis the maximum number of digits from the left to right of the decimal point. And
scaleas the number of total digits. For example, to record decimal number like
123.52, you will do
numeric(5, 2)(‘5’ is the total digits, ‘2’ is the max fraction digits), in which case, if input is a number like
98.3, database will return it as
98.30. Go for this if calculation need to be perfectly unambiguous.
- real or double precision . For ‘floating-points’.
realhas total 6 decimal digits precision point.
double precisionhas 15. If entered value exceeds those digits, they will be rounded. If fewer, won’t be padded with
numeric. Final value might not be exact. Therefore, they are bad for mission-critical math calculation.
Choose a big enough number type. Unless you’re designing a database to hold millions of rows, err on the side of bigger.
Date and Times
To add timezone to your date,
timestamp with time zone or simply
Cast to a different data type
CAST(<column_name> AS <target_data_type>)
Both do the same thing below:
CREATE TABLE people (id serial, person_name varchar(100));
serial has a limit range of 1 to 2147483647.
It’s useful to let database automatically generate a unique id for a newly inserted row. It’s unique because it’s auto-increment. So that also means you don’t have to specify
id when you
‘Putting It All Together’ Examples
Examples to inspire different ways of writing queries!
JOIN will come very useful when you have structured your data the proper way — Each entity type in its own table, and the relationship among themselves are specified only by a relevant key(foreign key) in their tables.
It’s easy to understand
JOIN with visualization. Let’s see 👇
Say, we have two tables to begin with as follows:
Now, let’s understand each of the JOIN types!
JOIN (aka. INNER JOIN)
The most common
JOIN usually go like this:
In English, the above command go like this:
Hey database! Please SELECT[return] all values in all columns (
*) from both ‘schools_left’ and ‘schools_right’ tables only ON[when] the value of ‘id’ column of ‘schools_left’ matches the value of ‘id’ column of ‘schools_right’.
Then database will return:
Three school IDs match in both tables, so JOIN returns only the three rows of those IDs that match. Schools that exist only in one of the two tables don’t appear in the result.
That’s all about
LEFT JOIN and RIGHT JOIN
In contrast to JOIN , the LEFT JOIN and RIGHT JOIN keywords each return all
rows from one table and display blank rows from the other table if no
matching values are found in the joined columns.
LEFT JOIN :
The result of the query shows all four rows from schools_left as well as
the three rows in schools_right where the id fields matched. Because
schools_right doesn’t contain a value of 5 in its right_id column, there’s no
match, so LEFT JOIN shows an empty row on the right rather than omitting
the entire row from the left table as with JOIN . The rows from schools_right
that don’t match any values in schools_left are omitted from the results
The same logic applies to
RIGHT JOIN but just the other round!
You can explore yourself two other joining types
FULL OTHER JOIN and
CROSS JOIN .
I don’t know about you but holy shit, did we just get the whole venerated
JOIN in database? I sure hope we did lol.
‘Putting it all together’ Example
You can add more operators to
Implement Full-Text Search
ALTER TABLE table_name
ADD COLUMN textsearchable_index_col tsvector
GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
- This will create a column that contains all the lexemes. The weird
GENERATED ALWAYS ASwill automatically create the lexemes from any new raw data.
'english'can be replaced by
'simple'if you don’t need any kinda transformation or the ‘stemming’ from the dictionaries.
coalesceis used if the value can be
null. Otherwise, just do like
to_tsvector('simple', title || ' ' || body).
Let’s create index for the newly created column.
CREATE INDEX textsearch_idx ON table_name USING GIN (textsearchable_index_col);
WHERE textsearchable_index_col @@ to_tsquery('yourQueryHere')
You can also do prefix-matching like
yourQueryHere:* which will match things like