0%

关于SQL的笔记

个人用笔记,写的很杂(再放送).包括了初学SQL时用的sqlite,自己刷题时的mysql(dlc篇较多),最近学的Oracle中的语法.
Oracle会单独标出.
DLC篇.

Notes

关于每次每次打开SQLite studio都不记得怎么做这件事你有什么头绪吗:

db文件,是database文件.
Tab左边有open database.要导入db文件是用open database,在小窗里加入db文件的路径.
要注意看哪个database connect了哪些没有.

Tab中间有sql editor,那是写script的地方.
窗口最下面的tab是显示打开了的不同窗口的地方.

Hot key

ctrl + o -> open database
alt + e -> SQL editor
F9 -> Execute query

Comment

1
2
3
4
5
6
-- comments
/*
a lot
a lot
comments
*/

基本小笔记:加入代码和数据库.
点击tool,open sql editor,在第三行的小文件load sql from file就可以加在sql的代码.
导入数据的时候,是先create table再在其table中利用import导入数据.

SQL的function分为两种,对单一value做处理的叫做elementary function(sqrt),会对多个value作处理的的叫做summary function(max).




Side Note

记得查看documentation

SQLite是大小写不敏感的,除了用string来做search的时候.

column name <= 30 characters. 只能用a-z/A-Z/0-9/_/$
doc_sql & doc_sqlite

Oracle中的naming:

  • Up to 30 characters in length
  • Must begin with a letter
  • Can contain letters, numbers, $, _, and #
  • Must be unique within a schema
  • Cannot be an Oracle reserved word
  • Case insensitive

一个table现在能有1000的columns.

如果我们要在搜索有单引号或者双引号的内容.
For example we are searching Can”a’da.
WHERE jurisdiction = ‘Can”a’’da’
两个单引号search一个单引号.
双引号不变.
但如果 WHERE jurisdiction = “Can”a’da”
两个双引号search一个双引号.
单引号不变.










Code

SELECT

Like的用法(note见sql string).

  • _ which denotes one char
  • % which denotes 0 or more chars
1
2
3
4
5
6
SELECT Name, Continent, Population FROM Country
WHERE Name LIKE '%island%' ORDER BY Name; -- wherever there's island in the string, %means 前面或者后面可以有其他字符
SELECT Name, Continent, Population FROM Country
WHERE Name LIKE '_island%' ORDER BY Name; -- _ 的意思是某一个字符.island%前面可以有任何一个字符
SELECT Name, Continent, Population FROM Country
WHERE Continent IN ( 'Europe', 'Asia' ) ORDER BY Name; -- continent是europe或者asia

在sql string tag下接string comparsion笔记.

(oracle课程中) Literal values.
1
2
SELECT column_name_1 ,'Hello',5,'01-Jan-15'
FROM table_name;
这样output时,每一行都会有,Hello,5和01-Jan-15三个value. 强调string和date要放在single quote中.

Substituion variable.

1
2
3
SELECT column_name_1
FROM table_name
WHERE column_name_1 = &theValue;

这是会跳出来一个框让你输入theValue的值.

Use a variable name prefixed with && to prompt the user only once.

1
2
3
SELECT column_name_1, &&theValue
FROM table_name
Order BY &theValue;
(oracle课程笔记结束)

select + as可以用于取名字或改名字.

1
2
3
4
SELECT column_name_1 AS alias_name_1,  -- 这时就可以改名了
column_name_2 AS alias_name_2,
column_name_N AS alias_name_n
FROM table_name;

select + format, 常用format.

1
SELECT amount format=dollar10.2 -- 小数点前保留十位,后面保留两位

select + label, 给variable加label.

1
SELECT amount label = 'usd'

select + *, 全选.

select + calculated, calculated的作用是使用临时计算的值(在原table中没有).

1
2
3
4
5
proc sql outobs=10;
select flightnumber, date, destination,
boarded + transferred + nonrevenue as Total
from sasuser.marchflights
where calculated total < 100;
1
2
3
4
5
proc sql outobs=10;
select flightnumber, date, destination,
boarded + transferred + nonrevenue as Total,
calculated total/2 as Half
from sasuser.marchflights;

where去掉missing(在sas中):

1
where Salary is not missing

在sql中可以写:

1
where Salary ne .

要注意细节

关于select.
我们可以使用NOT, AND, OR 来更加的区别,但注意这几个关键词有优先级.

NOT > AND > OR

所以使用NOT的时候记得用括号.(有 not in (‘A’,’B’)的语法.)

我们还可以用between.有两种写法.

1
2
-- Contiunes: Between value1 and Value2
-- in (value1, value2, ..., valueN)

String也是可以连续的.C001到C004是连续的(按ascii的循序).

search多个无排序element的时候可以用in和括号.

WHERE jurisdiciton in (‘Canada’,’US’);

不等于有 <> 和 !=.



DISTINCT

select distinct 可以remove duplicate,也同时可以用来查看数据一共有的种类…

1
2
3
SELECT Continent FROM Country;
SELECT DISTINCT Continent FROM Country;
-- Using the SELECT DISTINCT statement, you will get only unique results.
1
SELECT DISTINCT column1, column2, column3 FROM Country;

顺便一提NULL也算distinct的一种.

如果使用select distinct就会apply到所有的column,如果只要其中一个column distinct:
可以用group by.
从:

1
2
3
4
5
6
ID  SKU     PRODUCT
=======================
1 FOO-23 Orange
2 BAR-23 Orange
3 FOO-24 Apple
4 FOO-25 Orange

到:

1
2
1   FOO-23  Orange
3 FOO-24 Apple
1
2
3
4
5
SELECT * FROM [TestData] WHERE [ID] IN (
SELECT MIN([ID]) FROM [TestData]
WHERE [SKU] LIKE 'FOO-%'
GROUP BY [PRODUCT]
)


WHEN

condition, case和end是一定要在一起用的.

select + case when (像if else statment.)

1
2
3
4
5
6
7
8
9
proc sql;
create table temp_2 as
select *, case
when cty="North YorkC" then "North York"
when cty="North YorkW" then "North York"
when cty="North Yrko" then "North York"
else cty end as city_clean
from disk.txn_data_v9
quit;

这里as city_clean就是新造的column,同理可以用来categories things.



Oracle中有另一种if else:
1
2
3
4
5
6
SELECT last_name, job_id,
DECODE (job_id, 'ASD_ASST' , 1.1 * salary,
'MK_REP' , 1.15 * salary,
'HR_REP' , 1.2 * salary,
salary) AS revised_salary
FROM employees;


FROM

FROM clause + WHERE clause(WHERE + logical expression)

1
SELECT * FROM Countries WHERE Continent = 'Europe';


UPDATE

用于更改数据.

1
2
3
update table
set column1=value1,column2=value2,...
where columnX=valueX [and/or] columnY=valueY [and/or]...

一次只能更新一个表.
All databases support update with subquery.
Only MySQL support update with subquery and with join statement.

1
2
3
4
5
6
7
8
9
10
11
UPDATE Customer
SET
Address = '123 music ave',
Zip = '98056'
WHERE id = 5; -- which/where row to be updated

UPDATE Customer
SET
Address = NULL, -- 如果没有的话
Zip = NULL
WHERE id = 5;

如果没有where就会把所有的都替换.



DELETE

1
delete from table where column1=value1 [and/or] column2=value2...
1
DELETE FROM Customer WHERE id = 4;

delete删除的数据后台还有保留,就是可以找回.
Note:增删改只能一个一个命令来.



AS

as create column的alias名字.注:as有时可以省略.

1
SELECT ‘Helloworld’ AS Result;


ORDER BY

sorting的一种方式.

1
2
SELECT * FROM Country ORDER BY Name;
SELECT Name, LifeExpectncy FROM Country ORDER BY Name; -- get only Name&LifeExpectncy column data.

Note: single quote for string, double string for most of other things(eg. identifier).

1
2
3
4
5
6
7
SELECT Name FROM Country ORDER BY Name DESC; -- descending 从z到a的顺序
SELECT Name FROM Country ORDER BY Name ASC; -- 从a到z,默认的顺序

SELECT Name, Continent FROM Country ORDER BY Continent, Name; --按两个条件排序,从continent到name
SELECT Name, Continent FROM Country ORDER BY 3, 4; --也可以通过列数指代variable,continent是第3列,name是第4列.

SELECT Name, Continent, Region FROM Country ORDER BY Continent DESC, Region, Name;

Example:
Query the two cities in STATION with the shortest and longest CITY names, as well as their respective lengths (i.e.: number of characters in the name). If there is more than one smallest or largest city, choose the one that comes first when ordered alphabetically.

这道题如果写两个query,我会先把city按alphabetical order给排序好,再用select min(city), length(min(city))来写.

1
2
3
4
5
6
7
(select city, length(city)
from station
order by length(city) desc, city asc limit 1)
union
(select city, length(city)
from station
order by length(city) asc, city asc limit 1);

这是答案的代码,思路也很清晰:按length先排,再按alphabetical order排.
我试max(city)或者min(city)会跑不出来.

max(city):’Zionsville’
order by length(city) desc:’Marine On Saint Croix’

注意到了吗!!直接比较city,比较的不是按string长度,而是acsii的value(震惊).



WHERE

选择部分(row)data.

1
SELECT Name, Continent, Region FROM Country WHERE Continent = 'Europe';

如果要选两个,可以用in+括号.

1
2
3
4
SELECT Name, Continent, Region FROM Country WHERE Continent in ('Europe','Africa');

-- not
SELECT Name, Continent, Region FROM Country WHERE Continent not in ('Europe','Africa');

举例之选择even number of rows:
sql的除法是%.

1
2
select distinct city from station
where ( ID % 2 ) = 0;


LIMIT and OFFSET

选择部分(row)data.

1
2
3
4
5
6
7
8
9
SELECT Name, Continent, Region
FROM Country
WHERE Continent = 'Europe'
ORDER BY Name LIMIT 5;

SELECT Name, Continent, Region
FROM Country
WHERE Continent = 'Europe'
ORDER BY Name LIMIT 5 OFFSET 5; --5-10的数据

我们也可以用limit来做offset的事.

1
LIMIT [# of records]|[(# of start) - 1, # of records];

上面是给SQLite和MySQL的.

或者使用TOP.

Access, MS SQL SERVER - Microsoft – Transact-SQL -> T-SQL
Select top [# of records] | [(# of start) - 1, # of records] column1, column2 From Table
Select top [# of records] | [(# of start) - 1, # of records] * From Table

注意其他的语法可能不一样,最好查看doc.

用limit在求第二大小的数:

1
Limit 1, 1;


COUNT

count rows.

1
2
3
4
5
6
7
SELECT COUNT(*)
FROM Country
WHERE Population > 10000000 AND Continent = 'Europe'; //算表中有多少的记录.
SELECT COUNT(Continent)
FROM Country; //count where continent has data
SELECT COUNT(DISTINCT column)
FROM table_name; //我们还可以这样count unique的值.

注意NULL不会被count进去. Count( * ) 比select * 快很多.

Example:
Find the difference between the total number of CITY entries in the table and the number of distinct CITY entries in the table.

1
select count(city) - count(distinct city) from station;


INSERT INTO

用code码的方式add data.

1
2
3
insert into Table(column1,...) values (value1, ...)
-- or
insert into Table values (value1, ...)

当用第二种方式的时候,可以用NULL来空去没有值的column.
Primary key不能为空(null).

1
2
3
INSERT INTO Customer (name, city, state, zip)
VALUES ('Jimi Hendrix', 'Renton', 'WA', '12398');

1
2
3
4
INSERT INTO test VALUES ( 1, 'This', 'Right here!' );
INSERT INTO test ( b, c ) VALUES ( 'That', 'Over there!' );
INSERT INTO test DEFAULT VALUES;
INSERT INTO test ( a, b, c ) SELECT id, name, description from item;

我们也可以用subquery来insert.
不过注意,如果将一个表格复制给它本身,也就是将数据翻倍.很可能出现primary key也重复而报错的情况.
所以要做改动.

1
2
3
4
insert into bond_price
select "column_name",column1,column2,column3
from bond_price
where column_name = 'data'


INSERT INTO(Oracle)

  • List the columns that will receive values
  • List the corresponding values in the VALUES clause
  • Enclose character and date values in single quotations
1
2
INSERT INTO departments(department_id, department_name, 										  manager_id, location_id)
VALUES(280, 'Operations', 101, 1800);

Inserting nulls

Implicit method: Omit the columns from the column list(会默认null)

1
2
INSERT INTO departments (department_id,department_name)
VALUES(290, 'Dining Services');

Explicit method: Specify NULL in the VALUES list

1
2
INSERT INTO departments(department_id, department_name, 								manager_id, location_id)
VALUES(290, 'Dining Services', NULL, NULL);

Copy rows from another table.

1
2
3
4
5
6
7
8
INSERT INTO sales_reps(id, lname, salary, comm)
SELECT employee_id, last_name, salary, commission_pct
FROM employees
WHERE job_id = 'SA_REP';

INSERT INTO emps_copy
SELECT *
FROM employees;


CREATE TABLE

在SQLite studio中,得先建database,database->add a database,然后创造的table就在database之中了.
新建table后,在tools,import中选择其table和导入文件.

1
2
3
4
5
6
7
8
9
CREATE TABLE test (
a INTEGER,
b TEXT
); -- column name will be a and b, followed by type declaration

INSERT INTO test VALUES ( 1, 'a' );
INSERT INTO test VALUES ( 2, 'b' );
INSERT INTO test VALUES ( 3, 'c' );
SELECT * FROM test;

这里的data type,有 NULL, INTEGER, REAL(像c里的float), TEXT(string value), BLOB(stored exactly as it was input).
SQLite没有Boolean,可以手动用0(false)和1(true).
(在SQL-续里有更详细的笔记).

1
2
3
4
5
CREATE TABLE test (
a TEXT UNIQUE,
b TEXT,
c TEXT DEFAULT 'panda'
); --UNIQUE and DEFAULT

Note: we can have more NULL while using UNIQUE(depends on different system)
格式是 column_name + data_type + PRIMARY KEY/ NOT NULL/ NOT NULL UNIQUE.

specify the column list of the table. Each column has a name, data type, and the column constraint. SQLite supports PRIMARY KEY, UNIQUE, NOT NULL, and CHECK.
specify the table constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE, and CHECK constraints.

1
2
3
4
5
6
CREATE TABLE [IF NOT EXISTS] [schema_name].table_name (
column_1 data_type PRIMARY KEY,
column_2 data_type NOT NULL,
column_3 data_type DEFAULT 0,
table_constraints
) [WITHOUT ROWID];

在initial的时候需要加括号来define column.
中途想要把result记载下来时,

1
2
3
4
5
create table temp_1 as
select trade_id,
transaction_date
from col_trans
;

不需要加括号,注意有as.



CREATE TABLE(Oracle)

跟sqlite不同,oracle定义table时要注明size.

For each column the following must be specified
- The column name
- The column datatype and size where applicable
- Any constraints you want applied to the column

1
2
3
4
5
6
CREATE TABLE table_name
(
column1 datatype(size) constraint,
column2 datatype(size) constraint,
etc
);

Constraints:

  • Primary key: Uniquely identifies each row with non-null value
  • Foreign key: Establishes Referential Integrity between the column and a unique identifier, usually in another table, so the values in the two columns match
  • Unique: The values must be unique in each row
  • Not null: The column cannot contain nulls
  • Check: Specifies which values are allowed
1
2
3
4
5
6
7
CREATE TABLE departments
( department_id NUMBER(4) CONSTRAINT dept_id_pk PRIMARY KEY ,
department_name VARCHAR2(30) NOT NULL,
manager_id NUMBER(6),
location_id NUMBER(4) CONSTRAINT dept_loc_id_fk
REFERENCES locations (location_id)
) ;


Index

Index are used for improving performance in a table.

1
2
3
CREATE INDEX idx_staff_last_name
ON staff
USING (last_name)

index on staff table



DROP TABLE

1
2
3
4
5
CREATE TABLE test ( a TEXT, b TEXT ); --here will have a error since the table is already created
INSERT INTO test VALUES ( 'one', 'two' );
SELECT * FROM test;
DROP TABLE test;
DROP TABLE IF EXISTS test;

Truncate: 跟drop table类似但不同.会把table里所有的数据删掉,table保留.



IS NULL

When search for null

1
2
SELECT * FROM test WHERE a IS NULL;
SELECT * FROM test WHERE a IS NOT NULL; --opposite
1
2
3
4
5
6
DROP TABLE IF EXISTS test;
CREATE TABLE test (
a INTEGER NOT NULL,
b TEXT NOT NULL,
c TEXT
);


NVL(Oracle)

一种查到了null替换值的function.
NVL(val1,val2) -> if val1 null, return val2.
NVL2(val1,val2,val3) -> if val1 null, return val2, else return val3.

1
2
3
SELECT last_name, NVL(commission_pct, 0) AS commission_pct
FROM employees
WHERE department_id IN(60, 80);
1
2
3
4
SELECT last_name, salary, commission_pct,		
NVL2(commission_pct, salary+salary*commission_pct, salary)
AS gross_pay
FROM employees;


NULL

关于null的note:
null在计算avg和sum时不受影响.如果全部都是null的数据,min,max也会return null.
null进行加减乘除,或者大小比较也是return null.
null == null也是return null.

logic involve null(unknown)的时候:

True False Both
or unknown or true == true unknown or false == unknown unknown or unknown == unknown
and unknown and true == unknown unknown and false == false unknown and unknown == unknown
not not(unknown) == unknown


COALESCE

用来默认替换null的.
SELECT COALESCE(要查找的column_name, replace null的值);

Evaluates the arguments in order and returns the current value of the first expression that initially doesn’t evaluate to NULL.
For example, SELECT COALESCE(NULL, NULL, ‘third_value’, ‘fourth_value’); returns the third value because the third value is the first value that isn’t null.



ALTER

add a new column

1
2
ALTER TABLE test ADD d TEXT;
ALTER TABLE test ADD e TEXT DEFAULT 'panda';


ID columns

this is different in different data system

1
2
3
4
5
6
7
8
9
10
CREATE TABLE test (
id INTEGER PRIMARY KEY,
a INTEGER,
b TEXT
);
INSERT INTO test (a, b) VALUES ( 10, 'a' );
INSERT INTO test (a, b) VALUES ( 11, 'b' );
INSERT INTO test (a, b) VALUES ( 12, 'c' );
SELECT * FROM test;
DROP TABLE IF EXISTS test;





Join table

(aka join table)

Each row of table has a unique key. The table’s unique key may or may not correspond with a column in the table.
Sometime unique key is hidden but the table must have one.
When a column is used as a unique key, it often called primary key (子table) .(用于identify)

通过key来连接各个表格.这是的连接用key被称为foreign keys(主table).
如果链接的一方是primary key,自己就是foreign key.

种类:
Inner join(no null),保留两个table的共同值.
Outer join(may have null): LEFT/RIGHT/FULL join
Right and full outer join are not supported in sqlite right now 但是可以用left改写right.

Left join:

Num Var
1 A1
2 A2
3 A3
Num VarB
1 B1
2 B2
4 B4
Num VarA VarB
1 A1 B1
2 A2 B2
3 A3
1
2
3
4
SELECT a.column1, ... ,b.column1, ...
FROM
tableA [AS] [INNER|LEFT|RIGHT|FULL OUTER JOIN] tableB [AS] b --这里的as可以省略
ON a.column1 = b.column1 [AND...];


Another inner join example:
(这样的join默认inner join)

1
2
3
SELECT a.*, b.*
FROM clean_data as a, disk.master_v1 as b
WHERE a.card_num=b.card_num;
1
2
3
4
5
6
proc sql;
create table inner_join_master as
select a.*, b.*
from clean_data as a inner join disk.master_v1 as b
on a.card_num = b.card_num;
quit;

If we have two keys in the right table, only have one in left table.
Join(inner + left + right) will keep two keys in right table and two copy of info in the joined table.

If we have two keys in the right table, have two in left table as well.
Join(inner + left + right) will have 4 results.



Natural Join(Oracle)

Oracle的笔记.

Joins tables based on all columns with the same name and datatype.

1
2
SELECT department_id, department_name, location_id, city
FROM departments NATURAL JOIN locations;


Join using(Oracle)

Oracle里有两种普通join的写法.一种是用using,还有一种是跟sqlite一样用等号.

1
2
3
SELECT employee_id, last_name, department_id, department_name
FROM employees JOIN departments
USING (department_id);

用等号叫做:Equijoin.



Three way join(Oracle)

1
2
3
4
5
SELECT employee_id, last_name, department_name, city
FROM employees JOIN departments
ON employee.department_id = departments.department_id
JOIN locations
ON departments.location_id = locations.location_id;

可以在join时同时加入condition.

1
2
3
4
5
6
7
8
9
SELECT employee_id, last_name, department_name
FROM employees e JOIN departments d
ON e.department_id = d.department_id
AND e.department_id IN (50,60);

SELECT employee_id, last_name, department_name
FROM employees e JOIN departments d
ON e.department_id = d.department_id
WHERE e.department_id IN (50,60);


Non equijoin(Oracle)

An operator such as BETWEEN…AND can be used to display an employee’s salary grade.

1
2
3
4
SELECT grade_level, salary, last_name
FROM employees e JOIN sal_grades s
ON salary BETWEEN lowest_sal AND highest_sal
ORDER BY 1, 2, 3;


Non standard joins(Oracle)

一种格式不大一样的Oracle only的join.

1
2
3
4
SELECT last_name, department_name, city
FROM employees e, departments d, locations l
WHERE e.department_id = d.department_id
AND d.location_id = l.location_id;


Combine two tables(right and left relationship)

Relational database, using JOIN clause

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
CREATE TABLE left ( id INTEGER, description TEXT );
CREATE TABLE right ( id INTEGER, description TEXT );

INSERT INTO left VALUES ( 1, 'left 01' );
INSERT INTO left VALUES ( 2, 'left 02' );

INSERT INTO right VALUES ( 1, 'right 06' );
INSERT INTO right VALUES ( 2, 'right 07' );

SELECT * FROM left;
SELECT * FROM right;

-- 这里拼起来(inner join, 只有id有重叠的行会join)
SELECT l.description AS left, r.description AS right
FROM left AS l
JOIN right AS r ON l.id = r.id -- 用on来告诉他们用什么condition来join(intersection);
1
2
3
4
-- (outer join, 全部join)
SELECT l.description AS left, r.description AS right
FROM left AS l
LEFT JOIN right AS r ON l.id = r.id -- 没有数据的地方就会生成NULL;
1
2
3
SELECT s.id AS sale, s.date, i.name, i.description, s.price -- 这里可以选更多的一起放到表格里来
FROM sale AS s
JOIN item AS i ON s.item_id = i.id;


Oracle中的left join算left full join:
1
2
3
SELECT last_name, department_name
FROM employees e [LEFT|RIGHT|FULL] OUTER JOIN departments d
ON e.department_id = d.department_id;


Junction table(many to many relationship)

1
2
3
4
5
6
7
8
9
10
SELECT * FROM customer;
SELECT * FROM item;
SELECT * FROM sale;

SELECT c.name AS Cust, c.zip, i.name AS Item, i.description, s.quantity AS Quan, s.price AS Price
FROM sale AS s
JOIN item AS i ON s.item_id = i.id
JOIN customer AS c ON s.customer_id = c.id
ORDER BY Cust, Item
;
1
2
3
4
5
6
7
8
9
INSERT INTO customer ( name ) VALUES ( 'Jane Smith' );
SELECT * FROM customer;

-- left joins
SELECT c.name AS Cust, c.zip, i.name AS Item, i.description, s.quantity AS Quan, s.price AS Price
FROM customer AS c
LEFT JOIN sale AS s ON s.customer_id = c.id
LEFT JOIN item AS i ON s.item_id = i.id
ORDER BY Cust, Item;


在oracle中有类似的Cross join: >Generates a Cartesian Product.
1
2
SELECT last_name, department_name
FROM employees CROSS JOIN departments;


UNION

合并table.多个table竖着合并.
Union中会选择union前的column name作为最后的column name.

1
2
3
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
1
2
3
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

注:
Union 会默认remove duplicated rows,union all不会.



关于Oracle的竖着的合并:

Union和Union all 一样,注意union会自带sort:

Union all:

  • Including duplicated rows
  • Result is not sorted

Union:

  • Removing duplicated rows
  • Performs a record sort, from left to right

Intersect:

  • Returns the rows that are common to both queries
  • Displays names that appear in both tables, removing duplicated rows
  • Performs a record sort
1
2
3
4
5
SELECT first_name, last_name
FROM employees
INTERSECT
SELECT first_name, last_name
FROM consultants;

Minus:

  • Returns the distinct rows selected by the first query not selected by the next query
  • Displays employees who are not consultants, removing duplicated rows, if any
  • Performs a record sort

(也就是第二个query的数据用来减第一个query.)

1
2
3
4
5
SELECT first_name, last_name
FROM employees
MINUS
SELECT first_name, last_name
FROM consultants;

关于这些竖着的合并:

  • All column headings come from the first query, use generic column aliases as needed
  • The number of columns selected must match
  • The datatype family of the columns selected must match
  • The order by clause can appear only once, at the very end
  • To avoid confusion, it is common practice to order by column integers

比如把order2放在最后面:

1
2
3
4
5
6
SELECT first_name, last_name
FROM consultants
MINUS
SELECT first_name, last_name
FROM employees
ORDER BY 2;
1
2
3
4
5
SELECT employee_id, job_id, salary
FROM employees
UNION
SELECT employee_id, job_id, 0
FROM job_history;

把从job_history里的数据在salary里默认为0.

1
2
3
4
5
SELECT location_id, department_name, NULL AS city
FROM departments
UNION
SELECT location_id, NULL, city
FROM locations;

注意第一个query中要确认名字,空出null为city给第二个query.
而第二个query中的departent_name就不用.

(oracle笔记结束)


Self join

Write an SQL query to fetch the list of employees with the same salary.

1
2
3
4
Select distinct W.WORKER_ID, W.FIRST_NAME, W.Salary
from Worker W, Worker W1
where W.Salary = W1.Salary
and W.WORKER_ID != W1.WORKER_ID;


来自Oracle的Self join笔记: You may need to scan more than one row in a table to arrive at an answer. 例子:一个employee有id也有manager_id.要直接寻找employee的manager可以按两者的id join.
1
2
3
4
SELECT e.last_name as emp, m.last_name as mgr
FROM employees e JOIN employees m
ON e.manager_id = m.employee_id
ORDER BY emp;



Substitution variables(Oracle)

Use a variable name prefixed with & to prompt the user

1
2
3
SELECT first_name, last_name, salary, job_id
FROM employees
WHERE employee_id = &empid;

会跳出让输入empid的窗口.

Use a variable name prefixed with && to prompt the user only once

1
2
3
SELECT first_name, last_name, salary, &&col4
FROM employees
ORDER BY &col4;



SQL string

1
2
3
4
-- in standard sql case
SELECT 'a literal sql string';
SELECT 'if we need a single quote in our string we do ''two single quote to represent one'
SELECT 'we can also' || 'connect string' || 'like this';

每个system不大一样,记得查看documentation(再放送).
// in SQLite case

1
2
-- string length
SELECT LENGTH('string');
1
2
-- 数每一个city的名字长度,从大到小排序.
SELECT Name, LENGTH(Name) AS Len FROM City ORDER BY Len DESC;

用SUBSTRING来cut string.

What is the difference between substr and substring in SQL?
Both functions take two parameters, but substr() takes the length of the substring to be returned, while substring takes end index (excluding) for a substring.

1
2
3
-- substring
SELECT SUBSTR('this string', 6); -- second argument is the starting position, return 'string'
SELECT SUBSTR('this string', 6, 3); -- third argument is the length of return, return 'str'

截取string可以用于改名.

注意:
在SQLite中没有substring只有substr.
SQLite中也没有right:从右边cut string.
如果要从右边cut string,可以用substr(‘THE STRING’, -6).



接select的string comparison:

Like和equal(=)的区别.,一个是match一组pattern,一个是完全等于.
Special character in sql.
Find all special characters in a column in SQL Server 2008.

^ (as the first character inside []) says “anything not in this range. The [] block can just contain individual characters - you just put each possible character in turn. The above is equivalent to [^aAbBcCdD….yYzZ0123456789’. If you said [^a-Z,0-9] then you’re just allowing , as another character that will not be highlighted.

关于这个capital(^)的字符,在sql里是not的意思,但在bash的笔记中是beginning的意思.
包括bash中并没有百分号(%)的使用.这个两个不同啊.

Other wildcards symbols.
MySQL和sqlite和其他也有区别,要注意(悲).

Example:
Query the list of CITY names starting with vowels (i.e., a, e, i, o, or u) from STATION. Your result cannot contain duplicates.

1
2
3
4
5
-- mysql的答案
SELECT DISTINCT city FROM station WHERE city REGEXP "^[aeiou].*";

-- not start with vowels
SELECT DISTINCT city FROM STATION WHERE city REGEXP '^[^aeiou]';

这题也可以笨一点的做,用cut string的方式来compare第一个字母.

Ending with vowels (i.e., a, e, i, o, or u):

1
2
3
4
SELECT DISTINCT CITY FROM STATION WHERE RIGHT(CITY,1) IN ('a','i','e','o','u');

-- not
select distinct city from station where right(city,1) not in ('a', 'e', 'i', 'o', 'u');
1
2
3
-- % : a%b -> ab, aab, axbdb
-- _ : a_b -> aab, abb, acb,adb
SELECT * FROM customer WHERE credit_rating like 'A%';

注意,这个加下划线和百分号的使用是跟like相关的.
以及并不能一起用like和in.



INSTR(Oracle)

INSTR returns the numeric position of a substring.

INSTR(column, ‘string’ , beginning position , occurrence)

  • 0 is returned if the substring is not found
  • The position where you want the search to start. Position 1 is the default
  • Which occurrence you want searched for. Occurrence 1 is the default
1
2
SELECT street_address, INSTR(street_address, ' ')
FROM locations;

用空格很实用.



INSTR(Oracle)

Use LPAD and RPAD to pad a column on the left or right with specified character(s) up to a specified length
Pad the salary on the left with $ up to a length of 6 characters

1
2
SELECT LPAD(salary, 6, '$')
FROM employees;

一共6个char,左边空出来的加$.



Char data type(Oracle)

关于Orcale char的data type:

Varchar2

  • used for variable-length character columns
  • has a maximum length of 4000 characters
  • most character data is of this type

    last_name VARCHAR2(15)

Char:

  • used for fixed-length character data columns
  • has a maximum length of 2000 characters
  • useful for country codes and other fixed-length character data

    country_id CHAR(2)

My sql也有类似的varchar(没有2),和char,详情.

Oracle:
Use LIKE to perform wildcard searches through character, number, date, and timestamp data
Search conditions will be in quotes, or ticks, and can contain

  • Literal values
  • _ which denotes one character
  • % which denotes 0 or more characters


TRIM

可以做到“remove white spaces from the right side”之类的.

1
2
3
4
SELECT TRIM('   string   '); -- 会remove all spaces
SELECT LTRIM(' string '); -- remove spaces from beginning
SELECT RTRIM(' string '); -- remove spaces from the end
SELECT TRIM('...string...', '.'); -- second statement是要remove的character


TRIM(Oracle)

Oracle中的trim感觉很不一样…

TRIM the leading S from email

1
2
SELECT email, TRIM(LEADING 'S' FROM email) 	FROM employees
WHERE email LIKE 'S%' ;

TRIM the trailing S from email

1
2
SELECT email, TRIM(TRAILING 'S' FROM email) 	FROM employees
WHERE email LIKE '%S' ;

TRIM BOTH the leading and training S from email
Because BOTH is the default, it can be omitted

1
2
3
SELECT email, TRIM(BOTH 'S' FROM email)
FROM employees
WHERE email LIKE 'S%S' ;


REPLACE

Example: Write an SQL query to print FIRST_NAME from Worker table after replacing ‘a’ with ‘A’.

1
Select REPLACE(FIRST_NAME, 'a', 'A') from Worker;

SQLite也能做,好强.



CONCAT

用于合并两个column的value,append string那样的感觉.

Example: Write an SQL query to print the FIRST_NAME and LAST_NAME from Worker table into a single column COMPLETE_NAME. A space char should separate them.

1
Select CONCAT(FIRST_NAME, ' ', LAST_NAME) AS 'COMPLETE_NAME' from Worker;

SQLite并不支持concat.
但是可以使用|| <-真是淳朴的链接方式!

1
Select FIRST_NAME ||  ' ' || LAST_NAME AS 'COMPLETE_NAME' from Worker;

(同时mysql不支持||,悲)



UPPER and LOWER

they convert ASCII characters

1
2
3
4
5
SELECT 'StRiNg' = 'string'; -- this returns false,大小写matters
SELECT LOWER('StRiNg') = LOWER('string'); -- return true
SELECT UPPER('StRiNg') = UPPER('string'); -- return true
SELECT UPPER(Name) FROM City ORDER BY Name;
SELECT LOWER(Name) FROM City ORDER BY Name;


在oracle中还有一种格式function:

Change the output character data to capitalized initials

1
2
SELECT email, INITCAP(email)
FROM employees;


SQL number

again:每个system都不一样
fundamental numeric types: real & integer
integer types:

real number sacrifice accuracy for scale.

Oracle的nuber:

Contain variable-length number data up to 38 digits long
Number data could be

  • Integer data created with a Precision
  • Floating point data created with Precision and Scale.

Precision - the total number of digits:

employee_id NUMBER(6)

Scale specifies how many digits are on the right of the decimal.

gross_pay NUMBER(9,2) – 9 is Precision and 2 is scale



TYPEOF

1
2
3
4
SELECT TYPEOF( 1 + 1 ); -- integer
SELECT TYPEOF( 1 + 1.0 ); -- real
SELECT TYPEOF('panda'); -- text
SELECT TYPEOF('panda' + 'koala'); -- integer -> 注意这里每个system都可能不一样,在这个情况下先string convert to integer再加在一起


CAST

cast用于转换数据类型.cast(express as datatype)
string转int好助手.
//integer的result always be integer

1
2
3
4
5
SELECT 1 / 2; -- result is 0
SELECT 1.0 / 2; -- result 0.5
SELECT CAST(1 AS REAL) / 2; -- result 0.5
SELECT 17 / 5; -- result 3
SELECT 17 / 5, 17 % 5; -- result 3 with remainder of 2

cast可以用来拼接字符串.先把数字转换成char,在用||拼接.可以拼成百分号的数字.

1
Cast(round((high-low)/(open+close) * 2,2) * 100 as varchar) || '%' as vol -- (别名)


ROUND

四舍五入函数.
round(expression, precision精度)
精度可以是负数,保留到百位.

1
2
3
4
SELECT 2.55555;
SELECT ROUND(2.55555); -- result is 3
SELECT ROUND(2.55555, 3); -- 2.556, second is 位数
SELECT ROUND(2.55555, 0); -- default,也就是到整数位

Oracle:
Trunc: Rounds a value down to a specified decimal
Mod: Returns the remainder of division
Ceil: Returns the smallest integer greater than a number
Floor: Returns the largest integer less than a number

1
2
SELECT ROUND(187.8575,2)
FROM DUAL;




Nested and Repeated Data

自带class的感觉.

Nested columns have type STRUCT (or type RECORD). This is reflected in the table schema below.

Recall that we refer to the structure of a table as its schema. If you need to review how to interpret table schema, feel free to check out this lesson from the Intro to SQL micro-course.

1
2
# Print information on all the columns in the table
sample_commits_table.schema

SchemaField(‘author’, ‘RECORD’, ‘NULLABLE’, None, (SchemaField(‘name’, ‘STRING’, ‘NULLABLE’, None, (), None), SchemaField(‘email’, ‘STRING’, ‘NULLABLE’, None, (), None), SchemaField(‘time_sec’, ‘INTEGER’, ‘NULLABLE’, None, (), None), SchemaField(‘tz_offset’, ‘INTEGER’, ‘NULLABLE’, None, (), None), SchemaField(‘date’, ‘TIMESTAMP’, ‘NULLABLE’, None, (), None)), None),

比如说在这个名为committer的column里面,有很多种项目.
如果要access name这一栏,用committer.name.

自带array的感觉.

When querying repeated data, we need to put the name of the column containing the repeated data inside an UNNEST() function.

要选择repeated data的时候:

1
2
FROM `bigquery-public-data.github_repos.languages`,
UNNEST(language) AS l

repeated column会分开来select.





SQL date and time

SQL store date and time in UTC

//for SQLite

1
2
3
4
5
6
7
8
SELECT DATETIME('now'); -- in UTC -> ‘2020-11-24 15:07:13’
SELECT DATE('now'); -- -> ‘2020-11-24'
SELECT TIME('now'); -- ->'15:07:13'
SELECT DATETIME('now', '+1 day');
SELECT DATETIME('now', '+3 days');
SELECT DATETIME('now', '-1 month');
SELECT DATETIME('now', '+1 year');
SELECT DATETIME('now', '+3 hours', '+27 minutes', '-1 day', '+3 years');

这里是后期订正:

在sqlite中遇到了’D/M/YYYY’格式的string,想要转换成timestamp.
用了很笨的方法:

先把最后的year给cut出来,
用负值的substr,substr(‘’D/M/YYYY’’, -4).
把’D/M’给cut出来:
Remove the last character in a string in T-SQL?
再把D和M分开:
SQLite split string.
最后用integer value合并一个timestamp:
select date(my_year ||’-01-01’,’+’||(my_month-1)||’ month’,’+’||(my_day-1)||’ day’);
SQLite: how to create a date from integer year and month components?

想知道更聪明的方法…

关于sql还有一个截取时间的办法.

1
Strftime(format, timestring[,modifiers])
format explanation
%Y year with century(yyyy)
%m month(01-12)
%d day of the month(1-31)
%H hour on 24-hour clock(00-23)
%M minute(00-59)
%S second(00-59)
timestring explanation
now now is a literal used to return the current date
YYYY-MM-DD date value formatted as ‘YYYY-MM-DD’
HH:MM:SS.SSS HH:MM:SS.SSS

对于timestamp来说,可以用 datepart() function来分别时间和日期.

1
select distinct datepart(timestamp) as date format=date --通过format转换成看得懂的格式



SQL date and time(Oracle)

TO_CHAR

TO_CHAR(date, ‘format model’)

Converts dates from the default DD-MON-RR format to a more readable format based on the format model you provide

  • Must be enclosed in single quotes
  • May be caSe sEnsiTive
  • Use fm to remove extra spaces and/or leading zeros
Element Description
YYYY Four digit year
YEAR Year spelled out - case sensitive
MONTH Month spelled out - case sensitive
MON Three letter month - case sensitive
MM Two digital month
DAY Day of the week spelled out - case sensitive
DY Three letter day - case sensitive
DD Two digit day of the month - case sensitive
AM or PM Meridian Indicator
A.M. or P.M. Meridian Indicator with periods
HH or HH12 or HH24 Hour of day 12-11 or 12-11 or 00-23
MI Minutes 0-59
SS Seconds 0-59
SSSSS Seconds since midnight 00000-86399
1
2
3
4
5
SELECT TO_CHAR(start_date, ‘DD Month, yyyy’) AS start_date
FROM job_history;

SELECT TO_CHAR(sysdate, 'DD Month, yyyy HH:MI:SS AM') AS right_now
FROM dual;

dual is the dummy table.

TO_CHAR can also use in number.

Element Description
9 Represents one digit
0 Forces a zero to be displayed
$ Displays a floating dollar sign
. Displays a decimal point
, Displays a comma
L Floating local currency symbol
G Local group separator
D Local decimal symbol
1
2
3
SELECT TO_CHAR(salary, '$999,999.99') AS salary
FROM employees
WHERE department_id = 20;

会得到$13,000.00格式的数字.

类似还有TO_DATE的function.

Converts characters into dates using the same available format elements as TO_CHAR

1
2
3
SELECT last_name, salary, hire_date
FROM employees
WHERE hire_date >= TO_DATE('December 1, 2009','Month dd, yyyy');



Aggregate(聚合函数)

1
2
3
-- here we count all countries

SELECT COUNT(*) FROM Country;
1
2
3
4
5
-- here we count all countries in different region
SELECT Region, COUNT(*) AS Count
FROM Country
GROUP BY Region
ORDER BY Count DESC, Region;

GROUP BY:用于分组.

The GROUP BY clause allows you to apply the summary functions to a group of observations (the same as CLASS statement in PROC MEANS), you only need it when you apply summary functions.

如有以下的数据:

gender height
male 60
female 70

我们通过group by gender来求不同gender的height mean.

If you specify a GROUP BY clause in a query that does not contain a summary
function, your clause is changed to an ORDER BY clause, and a message to
that effect is written to the SAS log.

1
2
select cty, mean(amount) as Avg_fraud_dollar from disk.txn_data_v9 where fraud='Yes'
group by cty having count(*)>=1000;
1
2
3
4
SELECT column1,column2,columnN,
COUNT(*) as rec_per_group --一般会用COUNT(*),因为返回不会为0.
FROM table_name
GROUP BY column1,column2,columnN;

如果用group by搜寻max,每一组只会返回一个数据,在同时出现两个max的情况下,我们可以先找出每组的最高,再用这个数据去搜索拥有最高的人.

select和group by的列得对应.如果run出来有问题…可以试试在group by里增加全select的列.

寻找第二大的数:

1
2
3
4
-- 用order by
select *from employee
group by salary
order by salary desc limit 1,1;

或者

1
2
3
4
SELECT name, MAX(salary) AS salary
FROM employee
WHERE salary <> (SELECT MAX(salary)
FROM employee);

Write an SQL query to determine the nth (say n=5) highest salary from a table.
Ans.

The following MySQL query returns the nth highest salary:

1
SELECT Salary FROM Worker ORDER BY Salary DESC LIMIT n-1,1;

The following SQL Server query returns the nth highest salary:

1
2
3
4
5
6
7
SELECT TOP 1 Salary
FROM (
SELECT DISTINCT TOP n Salary
FROM Worker
ORDER BY Salary DESC
)
ORDER BY Salary ASC;



WITH AS

On its own, AS is a convenient way to clean up the data returned by your query. It’s even more powerful when combined with WITH in what’s called a “common table expression”.

A common table expression (or CTE) is a temporary table that you return within your query. CTEs are helpful for splitting your queries into readable chunks, and you can write queries against them.




Analytic function

This is a good way to do rolling time aggregation.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
WITH trips_by_day AS
(
SELECT DATE(start_date) AS trip_date,
COUNT(*) as num_trips
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
WHERE EXTRACT(YEAR FROM start_date) = 2015
GROUP BY trip_date
)
SELECT *,
SUM(num_trips)
OVER (
ORDER BY trip_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_trips
FROM trips_by_day
trip_date num_trips cumulative_trips
0 2015-01-01 181 181
1 2015-01-02 428 609
2 2015-01-03 283 892
3 2015-01-04 206 1098
4 2015-01-05 1186 2284

具体见kaggle教程.

  1. Analytic aggregate functions

    As you might recall, AVG() (from the example above) is an aggregate function. The OVER clause is what ensures that it’s treated as an analytic (aggregate) function. Aggregate functions take all of the values within the window as input and return a single value.

MIN() (or MAX()) - Returns the minimum (or maximum) of input values
AVG() (or SUM()) - Returns the average (or sum) of input values
COUNT() - Returns the number of rows in the input

  1. Analytic navigation functions

    Navigation functions assign a value based on the value in a (usually) different row than the current row.

FIRST_VALUE() (or LAST_VALUE()) - Returns the first (or last) value in the input
LEAD() (and LAG()) - Returns the value on a subsequent (or preceding) row

  1. Analytic numbering functions

    Numbering functions assign integer values to each row based on the ordering.

ROW_NUMBER() - Returns the order in which rows appear in the input (starting with 1)
RANK() - All rows with the same value in the ordering column receive the same rank value, where the next row receives a rank value which increments by the number of rows with the previous rank value.

1
2
3
4
5
6
7
8
9
10
SELECT pickup_community_area,
trip_start_timestamp,
trip_end_timestamp,
RANK()
OVER (
PARTITION BY pickup_community_area
ORDER BY trip_start_timestamp
) AS trip_number
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE DATE(trip_start_timestamp) = '2017-05-01'


HAVING and WHERE

对分组进行筛选.会需要先group by再用having筛选.

1
2
3
4
5
6
7
8
9
SELECT a.title AS Album, COUNT(t.track_number) as Tracks
FROM track AS t
JOIN album AS a
ON a.id = t.album_id
WHERE a.artist = 'The Beatles'
GROUP BY a.id
HAVING Tracks >= 10 --using having clause similar to where
ORDER BY Tracks DESC, Album
;

where和having的区别.
where用于选行,having用于选组.
where比having先run.
where不可以但having可以加summary function.(where后不可以加单个variable的summary function)



AVG and MIN, MAX and SUM

1
2
3
4
5
SELECT COUNT(Population) FROM Country;
SELECT AVG(Population) FROM Country;
SELECT Region, AVG(Population) FROM Country GROUP BY Region;
SELECT Region, MIN(Population), MAX(Population) FROM Country GROUP BY Region;
SELECT Region, SUM(Population) FROM Country GROUP BY Region;

注,sum一定要写在最外面,sum多项时是sum(A,B,C).

sum(A)会从上到下sum A中的所有值.
sum(A,B)会从左到右从上到下sum A B中所有值.

注,语法顺序:

1
2
3
4
5
6
7
SELECT column1, column2, columnN, sum(columnX)
FROM table_name
WHERE [ condition ]
GROUP BY column1, column2, columnN
HAVING [ condition ]
ORDER BY column1, column2, columnN
;

所以group by后面接having,where不能用在group by后面.
where:在原表中筛选;having是筛完了后再筛.

SELECT 6
FROM 1
WHERE 2
GROUP BY 3
HAVING 4
ORDER BY 5




Printf

在sqlite里面only.printf(“%X.YF”, expression),第一个%没有意义,X为整数位保留的数据,Y为小数位保留的数据,F为format.
没有四舍五入,是硬截.

1
2
3
printf("%.2f%", (high-low)/(open+close)2100) as vol --123.567 = 123.56(整数位不变,小数位取两位,f为float,最后一个%为加上的char)
round(123.456, 2) = 123.46
printf("%.2f", 123.456) = 123.45



Transaction

可以理解为C语言里的try和catch.如果一个命令fail了,保护其他的/原数据库不会造成影响.
For example:

  • A transfer of funds between two accounts should include a debit from one account and a credit to another account in the same amount
  • Both actions should succeed or fail together
  • The commit should occur after both steps are successfully completed
1
2
3
4
5
-- all statement between will be performed by one unit
BEGIN TRANSACTION;
INSERT INTO widgetSales ( inv_id, quan, price ) VALUES ( 1, 5, 500 );
UPDATE widgetInventory SET onhand = ( onhand - 5 ) WHERE id = 1;
END TRANSACTION;
1
2
3
BEGIN TRANSACTION;
INSERT INTO widgetInventory ( description, onhand ) VALUES ( 'toy', 25 );
ROLLBACK; -- if we rollback before end transaction, nothing happened

Aside: transaction makes things faster also.


Oracle中分为:commit and rollback.有点像git保存,在中间做一个savepoint,可以rollback取消更新.
A database transaction begins when the first DML statement is executed

The transaction ends when one of the following events occurs

  • A COMMIT or ROLLBACK statement is issued
  • A DDL statement is executed – implicit COMMIT
  • A DCL statement is executed – implicit COMMIT
  • The user exits SQL Developer
  • The database crashes - implicit ROLLBACK

ALL DML CHANGES MUST BE EITHER COMMITED OR ROLLED BACK




Triggers

Will trigger when certain condition meet. Also highly depends on different system.

1
2
3
4
5
CREATE TRIGGER newWidgetSale AFTER INSERT ON widgetSale
BEGIN
UPDATE widgetCustomer SET last_order_id = NEW.id WHERE widgetCustomer.id = NEW.customer_id;
END
;

we can also use trigger to prevent an update(using rollback).//using before update

1
2
3
4
5
6
7
8
9
10
11

CREATE TRIGGER updateWidgetSale BEFORE UPDATE ON widgetSale
BEGIN
SELECT RAISE(ROLLBACK, 'cannot update table "widgetSale"') FROM widgetSale
WHERE id = NEW.id AND reconciled = 1;
END
;

BEGIN TRANSACTION;
UPDATE widgetSale SET quan = 9 WHERE id = 2;
END TRANSACTION;

and timestamps.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
DROP TABLE IF EXISTS widgetSale;
DROP TABLE IF EXISTS widgetCustomer;

CREATE TABLE widgetCustomer ( id integer primary key, name TEXT, last_order_id INT, stamp TEXT );
CREATE TABLE widgetSale ( id integer primary key, item_id INT, customer_id INTEGER, quan INT, price INT, stamp TEXT );
CREATE TABLE widgetLog ( id integer primary key, stamp TEXT, event TEXT, username TEXT, tablename TEXT, table_id INT);

INSERT INTO widgetCustomer (name) VALUES ('Bob');
INSERT INTO widgetCustomer (name) VALUES ('Sally');
INSERT INTO widgetCustomer (name) VALUES ('Fred');
SELECT * FROM widgetCustomer;

CREATE TRIGGER stampSale AFTER INSERT ON widgetSale //using after insert to create stamps
BEGIN
UPDATE widgetSale SET stamp = DATETIME('now') WHERE id = NEW.id;
UPDATE widgetCustomer SET last_order_id = NEW.id, stamp = DATETIME('now')
WHERE widgetCustomer.id = NEW.customer_id;
INSERT INTO widgetLog (stamp, event, username, tablename, table_id)
VALUES (DATETIME('now'), 'INSERT', 'TRIGGER', 'widgetSale', NEW.id);
END
;

INSERT INTO widgetSale (item_id, customer_id, quan, price) VALUES (1, 3, 5, 1995);
INSERT INTO widgetSale (item_id, customer_id, quan, price) VALUES (2, 2, 3, 1495);
INSERT INTO widgetSale (item_id, customer_id, quan, price) VALUES (3, 1, 1, 2995);

SELECT * FROM widgetSale;
SELECT * FROM widgetCustomer;
SELECT * FROM widgetLog;
-- restore database
DROP TRIGGER IF EXISTS newWidgetSale;
DROP TRIGGER IF EXISTS updateWidgetSale;
DROP TRIGGER IF EXISTS stampSale;

DROP TABLE IF EXISTS widgetCustomer;
DROP TABLE IF EXISTS widgetSale;
DROP TABLE IF EXISTS widgetLog;



Exists(Oracle)

  • Tests for the existence of a value in the result of the subquery
  • If the value is found, the subquery stops and TRUE is returned
  • If the value is not found in the subquery, FALSE is returned
  • The subquery does not return any table data
  • EXISTS was originally introduced to make correlated subqueries more efficient

Who is a manager?

1
2
3
4
5
SELECT employee_id, last_name, job_id
FROM employees e
WHERE EXISTS (SELECT 'x'
FROM employees
WHERE manager_id = e.employee_id);

Find departments with no employees, NOT flips TRUE to FALSE

1
2
3
4
5
SELECT department_id, department_name
FROM departments d
WHERE NOT EXISTS (SELECT 'x'
FROM employees
WHERE department_id = d.department_id);



View

在oracle时的解释:

Table: Basic storage unit composed of rows and columns.
View: Shows subsets of data from one or more underlying tables.

  • Views control which columns and/or rows a user can see
  • Views present a subset of table data to a user
  • A view is a SELECT statement that has a name
  • Views are permanently stored in the Data Dictionary

Creating a view.

1
2
3
4
CREATE OR REPLACE VIEW view_name
AS SELECT col1, col2, col3, etc
FROM tables
WHERE etc;

Describe the structure of the view

1
DESC view_name;

All columns selected in a view must have a name.

1
2
3
CREATE OR REPLACE VIEW yearly_pay_vu
AS SELECT employee_id, last_name, salary * 12 AS yearly_pay -- 用as取名
FROM employees;

Modifying a view.

Modify the emps50_vu by reducing the columns selected and giving each column an alias.

1
2
3
4
CREATE OR REPLACE VIEW emps50_vu	(empid, fname, lname, sal, mgr, deptid)
AS SELECT employee_id, first_name, last_name, salary, manager_id, department_id
FROM employees
WHERE department_id = 50;

Dropping a view.

1
DROP VIEW view_name;



Schemas

A way to organizing tables and views.

Schemas are grouping structures, we create schema by specifying the create schema command.
Visually it allows us to group things like our tables, views, and indexes into a logical unit.
We could create a schema based on a project or kind of analysis that we’re doing.

1
CREATE SCHEMA data_sci



Subselect

把select当作select的source.

1
2
3
4
5
6
7
SELECT co.Name, ss.CCode FROM (
SELECT SUBSTR(a, 1, 2) AS State, SUBSTR(a, 3) AS SCode,
SUBSTR(b, 1, 2) AS Country, SUBSTR(b, 3) AS CCode FROM t
) AS ss
JOIN Country AS co
ON co.Code2 = ss.Country
;

When you SELECT from a subquery, you are selecting from a temporary result set

1
2
3
4
SELECT AVG(dept_tot)
FROM (SELECT manager_id, sum(salary) as dept_tot
FROM employees
GROUP BY manager_id) a;
  • The outer query selects FROM “a”
  • a is a query that is inline, has a name, and represents a result set of data, much like a view
  • a is known as an Inline View
  • An inline view is not permanently stored in the database



Subquery

也就是Nested(嵌套)query.

SQL allows you to embed a query into another query, subquery can be uncorrelated or correlated

Subquery的结构中,整体的叫parent/main query,其中的子query叫nested/inner query.
(当作subset来理解.)
什么时候要使用nested query:无法用and和or做一次查询得出结论,得使用多次查询.(第一次的查询的结果用于第二次查询.)

Uncorrelated subquery:

1
2
3
select distinct card_num from clean_data
where Fraud='Yes' and MCC_desc = 'Transportation' and
card_num in (select card_num from clean_data where Fraud = 'Yes' and MCC_desc = 'Grocery'); -- 如果没有主query副query也可以独立运行

Correlated Subquery:

A sub-query that uses values from the outer query. In this case the inner
query has to be executed for every row of outer query.

1
2
select * from disk.multiple_financial as a where exists
(select * from disk.multiple_rating as b where a.id=b.id and a.year=b.year and b.df=1); -- 不建议用correlated subquery因为run得很慢

Comparison的时候:
When a subquery might return multiple values, you must use one of the conditional operators ANY or ALL to modify a comparison operator in the WHERE or HAVING clause immediately before the subquery.
For example, the following WHERE clause contains the less than (<) comparison operator and the conditional operator ANY:

1
2
where dateofbirth < ANY {subquery...}
where dateofbirth < ALL {subquery...}

other example:这里的使用是两个select套在一起.查询两个表.

1
2
3
4
5
SELECT column1, ... ,columnN
FROM tableA
WHERE column1 IN (SELECT columnN
FROM tableB
WHERE ...)
1
2
3
4
5
6
7
8
9
create table tmp3 as
select *
from col_trans
where customer_id in (
select customer_id from customer
where jurisdiction = 'Canada'
and industry = 'Financial'
)
;

注意这里前一个where和后一个select是对应的,不然会出现return和需求的column个数不对.
不一定需要内容一样,但return的column数得相同.

有时可以用join代替,先join再查询.
但建议使用subquery,运算速度会快(提升性能).
在数量对应不上的时候,我们可以用or来拆开子query.

1
2
3
4
5
6
7
8
9
10
11
create table tmp9 as
select *
from col_trans
where security_id in (
select security_id from sec
where security_type = 'Equity' and price <= 30
) or security_id in (
select security_id_2 from sec
where security_type = 'Equity' and price <= 30
)
;

或者用union合并.

1
2
3
4
5
6
7
8
create table tmp10 as
select *
from col_trans
where security_id in (
select security_id from sec where security_type = 'Equity' and price <=30
union
select security_id_2 from sec where security_type = 'Equity' and price <=30
);



Subquery(Oracle)

Subquery学的很烂所以在oracle中又记一次.

Subqueries in the where clause

  • The inner query executes before the main query.
  • The result of the subquery is used to complete the outer query.
  • The shorthand shows the basic layout of subqueries in the WHERE
1
2
3
4
5
SELECT select_list
FROM table
WHERE column_name operator (SELECT select_list
FROM table
WHERE …);
1
2
3
4
5
SELECT last_name, salary 			 		    
FROM employees
WHERE salary > (SELECT salary
FROM employees
WHERE employee_id = 145);




还可以subquery中套outer query.

1
2
3
4
5
6
SELECT column(s)
FROM outer_table
WHERE column_name operator
(SELECT column
FROM inner_table
WHERE column = outer_table.column);

Example:

1
2
3
4
5
SELECT last_name, salary, job_id
FROM employees e
WHERE salary > (SELECT avg(salary)
FROM employees
WHERE job_id = e.job_id);


Single row subqueries

Use single-value comparison operators in the WHERE clause of the outer query

1
2
3
4
5
6
7
8
SELECT last_name, job_id, salary 			 		    
FROM employees
WHERE job_id = (SELECT job_id
FROM employees
WHERE employee_id = 152)
AND salary > (SELECT salary
FROM employees
WHERE employee_id = 152);

Multi row subqueries

有IN, ANY, ALL三种.

1
2
3
4
5
SELECT last_name, job_id, salary
FROM employees
WHERE salary IN (SELECT MIN(salary)
FROM employees
GROUP BY department_id);
1
2
3
4
5
6
SELECT last_name, job_id, salary
FROM employees
WHERE salary < ANY (SELECT salary
FROM employees
WHERE job_id = ‘IT_PROG’)
AND job_id != ‘IT_PROG’;
1
2
3
4
5
SELECT last_name, job_id, salary
FROM employees
WHERE salary < ALL (SELECT salary
FROM employees
WHERE job_id = ‘IT_PROG’);

Subqueries in the having clause

1
2
3
4
5
SELECT department_id, MIN(salary)
FROM employees
GROUP BY department_id
HAVING MIN(salary) > (SELECT MIN(salary)
FROM employees);

(orcacle笔记结束)




With

1
2
3
4
5
6
WITH cte_name AS (SELECT statement)
[, cte_name2 AS (SELECT statement)]
SELECT ... FROM cte_name [inner join cte_name2]

SELECT ... FROM (SELECT statement) AS
subquery_name

Note: WITH clause is mostly equivalent to a subquery.
WITH RECURSIVE is more useful, provide the ability to do hieracrchical/recursive queries

The WITH clause allows complex queries to be broken into smaller blocks.
It is useful when joins, subqueries, aggregation, or repeated query blocks are being used.
WITH allows you to make inline views which can be repeatedly referenced as needed.
The inline views create output that can be used as input to subsequent queries.

Example:
Suppose you need to find the department names and their total salary for any department whose total salary is greater than the average of all the total salaries by department.

Without the WITH clause.

1
2
3
4
5
6
7
8
9
SELECT department_name, SUM(salary)
FROM employees e JOIN departments d
ON e.department_id = d.department_id
GROUP BY department_name
HAVING SUM(salary) > (SELECT AVG(SUM(salary))
FROM employees
WHERE department_id IS NOT NULL
GROUP BY department_id)
ORDER BY 1;

The WHERE clause removes employee Grant, with a null department_id, from the calculations.

Using the WITH clause.

1
2
3
4
5
6
7
8
9
10
11
12
13
WITH a AS (
SELECT department_name, sum(salary) AS totpay
FROM employees e JOIN departments d
ON e.department_id = d.department_id
GROUP BY department_name),
b AS (
SELECT avg(totpay) AS depts_avg
FROM a)
SELECT department_name, tot_pay
FROM a
WHERE totpay > (SELECT depts_avg
FROM b)
ORDER BY 1;



Autoincrement

用于自动添加从1开始的不重复column内容.可以用为增加新的primary key.

1
2
3
create table 'table_name'{
‘id’ integer primary key autoincrement
};

但注意,在这样生成的表中.如果一开始的id为1-20.有数据删除再添加,新的数据会从21开始.




Writing Efficient Queries

Avoid scanning too much data at once.
To begin,you can estimate the size of any query before running it.




关于bigQuery这个web service.

BigQuery里的数据结构.

在bigQuery里可以create a QueryJobConfig object and set the dry_run parameter to True to see how much data a query will scan.

If you are ever unsure what to put inside the COUNT() function, you can do COUNT(1) to count the rows in each group. Most people find it especially readable, because we know it’s not focusing on other columns. It also scans less data than if supplied column names (making it faster and using less of your data access quota).

There are two ways that dates can be stored in BigQuery: as a DATE or as a DATETIME.

The DATE format has the year first, then the month, and then the day. It looks like this:

YYYY-[M]M-[D]D
YYYY: Four-digit year
[M]M: One or two digit month
[D]D: One or two digit day
So 2019-01-10 is interpreted as January 10, 2019.

The DATETIME format is like the date format … but with time added at the end.

There are two ways that dates can be stored in BigQuery: as a DATE or as a DATETIME.

The DATE format has the year first, then the month, and then the day. It looks like this:

YYYY-[M]M-[D]D
YYYY: Four-digit year
[M]M: One or two digit month
[D]D: One or two digit day
So 2019-01-10 is interpreted as January 10, 2019.

The DATETIME format is like the date format … but with time added at the end.

Often you’ll want to look at part of a date, like the year or the day. You can do this with EXTRACT.




1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
WITH LocationsAndOwners AS
(
SELECT *
FROM CostumeOwners co INNER JOIN CostumeLocations cl
ON co.CostumeID = cl.CostumeID
),
LastSeen AS
(
SELECT CostumeID, MAX(Timestamp)
FROM LocationsAndOwners
GROUP BY CostumeID
)
SELECT lo.CostumeID, Location
FROM LocationsAndOwners lo INNER JOIN LastSeen ls
ON lo.Timestamp = ls.Timestamp AND lo.CostumeID = ls.CostumeID
WHERE OwnerID = MitzieOwnerID
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
WITH CurrentOwnersCostumes AS
(
SELECT CostumeID
FROM CostumeOwners
WHERE OwnerID = MitzieOwnerID
),
OwnersCostumesLocations AS
(
SELECT cc.CostumeID, Timestamp, Location
FROM CurrentOwnersCostumes cc INNER JOIN CostumeLocations cl
ON cc.CostumeID = cl.CostumeID
),
LastSeen AS
(
SELECT CostumeID, MAX(Timestamp)
FROM OwnersCostumesLocations
GROUP BY CostumeID
)
SELECT ocl.CostumeID, Location
FROM OwnersCostumesLocations ocl INNER JOIN LastSeen ls
ON ocl.timestamp = ls.timestamp AND ocl.CostumeID = ls.costumeID

Why is this better?

Instead of doing large merges and running calculations (like finding the last timestamp) for every costume, we discard the rows for other owners as the first step. So each subsequent step (like calculating the last timestamp) is working with something like 99.999% fewer rows than what was needed in the original query.

Databases have something called “Query Planners” to optimize details of how a query executes even after you write it. Perhaps some query planner would figure out the ability to do this. But the original query as written would be very inefficient on large datasets.