SQL and Databases
Sql is a standard for relational database management. It is used in different areas – including data analysis because it is specific for aggregating and manipulating data.
I used different implementations: MySql, Sql Lite, Oracle. I created/managed databases and tables and I run queries of varying complexity to retrieve data.
I used SQL with:
Assignments
I had assignments for a SQL for Dat Science course. I append an exercise below as an example. I eas given a database and I had to answer some questions, after retrieving the data and results I needed with SQL code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36 SELECT SUM(prod_xy)*ABS(SUM(prod_xy))/(SUM(sq_diff_y)*SUM(sq_diff_x)) rho
FROM
(
SELECT c1_t.diff_y, c1_t.diff_x, c1_t.diff_y*c1_t.diff_y sq_diff_y, c1_t.diff_x*c1_t.diff_x sq_diff_x, c1_t.diff_y*c1_t.diff_x prod_xy
FROM
(
SELECT xy.x, av.mu_x, (xy.x - av.mu_x) diff_x, xy.y, av.mu_y, (xy.y - av.mu_y) diff_y
FROM
(SELECT fans x, useful y
FROM user
) xy,
(SELECT AVG(fans) mu_x, AVG(useful) mu_y
FROM user
) av
) c1_t
) c2_t
SELECT SUM(prod_xy)*ABS(SUM(prod_xy))/(SUM(sq_diff_y)*SUM(sq_diff_x)) rho
FROM
(
SELECT c1_t.diff_y, c1_t.diff_x, c1_t.diff_y*c1_t.diff_y sq_diff_y, c1_t.diff_x*c1_t.diff_x sq_diff_x, c1_t.diff_y*c1_t.diff_x prod_xy
FROM
(
SELECT xy.x, av.mu_x, (xy.x - av.mu_x) diff_x, xy.y, av.mu_y, (xy.y - av.mu_y) diff_y
FROM
(SELECT fans x, funny y
FROM user
) xy,
(SELECT AVG(fans) mu_x, AVG(funny) mu_y
FROM user
) av
) c1_t
) c2_t
SELECT COUNT (funny)
FROM user
WHERE funny=0 and review_count != 0