SQL

SQL and Databases

 

Sql is a standard for relational database management. It is used in different areas – including data analysis because it is specific for aggregating and manipulating data.

I used different implementations: MySql, Sql Lite, Oracle. I created/managed databases and tables and I run queries of varying complexity to retrieve data.

I used SQL with:

Assignments

I had assignments for a SQL for Dat Science course. I append an exercise below as an example. I eas given a database and I had to answer some questions, after retrieving the data and results I needed with SQL code


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
SELECT  SUM(prod_xy)*ABS(SUM(prod_xy))/(SUM(sq_diff_y)*SUM(sq_diff_x)) rho
FROM
(
  SELECT c1_t.diff_y, c1_t.diff_x, c1_t.diff_y*c1_t.diff_y sq_diff_y, c1_t.diff_x*c1_t.diff_x sq_diff_x, c1_t.diff_y*c1_t.diff_x prod_xy
  FROM
  (
    SELECT xy.x, av.mu_x, (xy.x - av.mu_x) diff_x, xy.y, av.mu_y, (xy.y - av.mu_y) diff_y
    FROM
      (SELECT fans x, useful y
      FROM user
      ) xy,
      (SELECT AVG(fans) mu_x, AVG(useful) mu_y
      FROM user
      ) av
  ) c1_t
) c2_t
SELECT  SUM(prod_xy)*ABS(SUM(prod_xy))/(SUM(sq_diff_y)*SUM(sq_diff_x)) rho
FROM
(
  SELECT c1_t.diff_y, c1_t.diff_x, c1_t.diff_y*c1_t.diff_y sq_diff_y, c1_t.diff_x*c1_t.diff_x sq_diff_x, c1_t.diff_y*c1_t.diff_x prod_xy
  FROM
  (
    SELECT xy.x, av.mu_x, (xy.x - av.mu_x) diff_x, xy.y, av.mu_y, (xy.y - av.mu_y) diff_y
    FROM
      (SELECT fans x, funny y
      FROM user
      ) xy,
      (SELECT AVG(fans) mu_x, AVG(funny) mu_y
      FROM user
      ) av
  ) c1_t
) c2_t

SELECT COUNT (funny)
FROM user
WHERE funny=0 and review_count != 0