Incomplete Reference of Postgresql

If you can’t RTFM, this is for you my friend

Photo by César Couto on Unsplash

I’m reading this book https://nostarch.com/practicalSQL/ , thought I would simply collect my notes here.

The book is about finding stories about real world from databases using Postgresql. I enjoy technical books written with a twist like that, although I realize I should bring myself to read the dry offical doc, but my brain just don’t click with that. I could also learn by watching videos from the likes of Udemy if I was in a hurry, but I’m not. I also figured that that might leave too many gaps in my knowledge because SQL is one of the few time-tested technologies and I want to know it well for my life projects.

Anyway, so here are the notes:

DISTINCT

SELECT DISTINCT school, salary
FROM teachers;

If we add a column, the query returns each unique pair of values. This technique gives us the ability to ask, “For each x in the table, what are all the y values?” For each factory, what are all the chemicals it produces? For each election district, who are all the candidates running for office? For each concert hall, who are the artists playing this month?

ORDER BY

SELECT first_name, last_name, salary
FROM teachers
ORDER BY salary DESC;

By ordering the salary column from highest to lowest, I can determine which teachers earn the most

Default is ASC order.

SELECT last_name, school, hire_date
FROM teachers
ORDER BY school ASC, hire_date DESC;

we create a listing of teachers grouped by school with the most recently hired teachers listed first. This shows us who the newest teachers are at each school.

Better to run multple queries to answer each question you have than ORDER BY on more than two columns.

WHERE

to limit the rows a query returns to only those in which one or more columns meet certain criteria.

SELECT last_name, school, hire_date
FROM teachers
WHERE school = 'Myers Middle School';
shows just the teachers assigned to Myers Middle
School
Comparison and Matching Operators in PostgreSQL
Examples of using operators with WHERE

‘Putting It All Together’ Example

SELECT first_name, last_name, school, hire_date, salary
FROM teachers
WHERE school LIKE '%Roos%'
ORDER BY hire_date DESC;

returns teachers at Roosevelt High School, ordered from newest hire to earliest. We can see a clear correlation between a teacher’s hire date at the school and his or her current salary level

Data Types

CREATE TABLE eagle_watch (
observed_date date,
eagles_seen integer
);

The date and integer above are one of the available data types in database. Each of them specifies what type of data its associated column is allowed to store.

Below is the list of data types that are mostly used:

maximum length is specified by n . If you insert fewer characters than the maximum, PostgreSQL will not store extra spaces. For example, the string ‘blue’ will take four spaces, whereas the string 123 will take three. This saves considerable space in datebase.

No need to use the char(n) date type. varchar(n) is solid for most cases.

A variable-length column of unlimited length. the longest possible character string you can store is about 1 gigabyte in Postgresql.

Quick note

no substantial difference in performance among the two types. but try to define a column that will always have the same number of characters is a good way to signal what data it should contain.

For instance, you might use varchar(2) for U.S. state postal abbreviations.

Note, you actually might wanna just use TEXT for all your character data.

There are three different data types for integers — smallint , integer , bigint — whole numbers, both positive and negative, including zero.

Below is how they differ:

The three different data types for integers

In other words, go for:

  • smallint if the column values will always fit within its range e.g. days of the month or years
  • integer if you are confident numbers will remain within the integer limit.
  • bigint if you are dealing with numbers larger than about 2.1 billion. The extra storage space usage relative to integer might be a concern when dealing with millions of data rows.
you know what they say: a picture is worth a thousand words

There are two data types for decimal numbers:

  • numeric(precision,scale) precision is the maximum number of digits from the left to right of the decimal point. And scale as the number of total digits. For example, to record decimal number like 123.52 , you will do numeric(5, 2) (‘5’ is the total digits, ‘2’ is the max fraction digits), in which case, if input is a number like 98.3 , database will return it as 98.30 . Go for this if calculation need to be perfectly unambiguous.
  • real or double precision . For ‘floating-points’. real has total 6 decimal digits precision point. double precision has 15. If entered value exceeds those digits, they will be rounded. If fewer, won’t be padded with 0 unlike thenumeric . Final value might not be exact. Therefore, they are bad for mission-critical math calculation.

To illustrate:

preparation
this is what you get

Choose a big enough number type. Unless you’re designing a database to hold millions of rows, err on the side of bigger.

To add timezone to your date, timestamp with time zone or simply timestampz

Cast to a different data type

Two methods:

  1. CAST(<column_name> AS <target_data_type>)
  2. <column_name>::<target_data_type> (Shortcut)

Both do the same thing below:

same thing. choose one. prefer readability maybe?

Auto-increment ID

CREATE TABLE people (id serial, person_name varchar(100));

serial has a limit range of 1 to 2147483647.

It’s useful to let database automatically generate a unique id for a newly inserted row. It’s unique because it’s auto-increment. So that also means you don’t have to specify id when you INSERT .

‘Putting It All Together’ Examples

Examples to inspire different ways of writing queries!

Understanding JOIN

JOIN will come very useful when you have structured your data the proper way — Each entity type in its own table, and the relationship among themselves are specified only by a relevant key(foreign key) in their tables.

It’s easy to understand JOIN with visualization. Let’s see 👇

Say, we have two tables to begin with as follows:

This table is called ‘schools_left’
This table is called ‘schools_right’

Now, let’s understand each of the JOIN types!

The most common JOIN usually go like this:

In English, the above command go like this:

Hey database! Please SELECT[return] all values in all columns (* ) from both ‘schools_left’ and ‘schools_right’ tables only ON[when] the value of ‘id’ column of ‘schools_left’ matches the value of ‘id’ column of ‘schools_right’.

Then database will return:

Three school IDs match in both tables, so JOIN returns only the three rows of those IDs that match. Schools that exist only in one of the two tables don’t appear in the result.

That’s all about JOIN !

In contrast to JOIN , the LEFT JOIN and RIGHT JOIN keywords each return all
rows from one table and display blank rows from the other table if no
matching values are found in the joined columns.

Doing LEFT JOIN :

LEFT JOIN

Return:

The result of the query shows all four rows from schools_left as well as
the three rows in schools_right where the id fields matched. Because
schools_right doesn’t contain a value of 5 in its right_id column, there’s no
match, so LEFT JOIN shows an empty row on the right rather than omitting
the entire row from the left table as with JOIN . The rows from schools_right
that don’t match any values in schools_left are omitted from the results

The same logic applies to RIGHT JOIN but just the other round!

You can explore yourself two other joining types FULL OTHER JOIN and CROSS JOIN .

I don’t know about you but holy shit, did we just get the whole venerated JOIN in database? I sure hope we did lol.

You can add more operators to ON too:

AND operator with ON

Implement Full-Text Search

Reference

ALTER TABLE table_name
ADD COLUMN textsearchable_index_col tsvector
GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
  • This will create a column that contains all the lexemes. The weird GENERATED ALWAYS AS will automatically create the lexemes from any new raw data.
  • The 'english' can be replaced by 'simple' if you don’t need any kinda transformation or the ‘stemming’ from the dictionaries.
  • The coalesce is used if the value can be null . Otherwise, just do like to_tsvector('simple', title || ' ' || body) .

Let’s create index for the newly created column.

CREATE INDEX textsearch_idx ON table_name USING GIN (textsearchable_index_col);
SELECT title
FROM table_name
WHERE textsearchable_index_col @@ to_tsquery('yourQueryHere')
LIMIT 10;

You can also do prefix-matching like yourQueryHere:* which will match things like yourQueryHereBro or yourQueryHereMan etc.

Untitled by Kheoh Yee Wei