Open all questions about Spring
Open all questions about Android
In this article:
➤ What is the query syntax in SQL
➤ What is VIEW in SQL
➤ What is INDEX (indexes) in SQL
➤ How to Avoid Contention When Accessing a Record in SQL
➤ What types of relations exist in SQL
➤ How to Make Many-to-Many Relationship in SQL
➤ How to combine a query to multiple tables in SQL
➤ How to Use Query Plan in SQL
➤ How to Use Indexes in SQL
➤ What is the query syntax in SQL
There are many keywords in SQL that are used to create, modify, manage, and retrieve data from databases. Here are the main categories and keywords used in SQL:
DML (Data Manipulation Language) – Data Manipulation Language:
Keywords for working with data, such as inserting, updating, and deleting data.
SELECT:
selects data from a table.
SELECT * FROM users WHERE age > 30; SELECT COUNT(*) FROM users WHERE age > 30;
INSERT:
adds new records to the table.
INSERT INTO users (name, age) VALUES ('John', 25);
UPDATE:
updates existing records in a table.
UPDATE users SET age = 26 WHERE name = 'John';
DELETE:
deletes records from the table.
DELETE FROM users WHERE age < 18;
MERGE:
Combines data from two tables based on conditions.
MERGE INTO employees e USING employees_updates eu ON (e.employee_id = eu.employee_id) WHEN MATCHED THEN UPDATE SET e.salary = eu.salary WHEN NOT MATCHED THEN INSERT (employee_id, salary) VALUES (eu.employee_id, eu.salary);
CALL:
calls a stored procedure.
CALL calculate_salary(12345);
DDL (Data Definition Language) – Data Definition Language:
Keywords for creating and modifying database structure (tables, indexes and other objects).
CREATE:
creates new database objects (tables, indexes, etc.).
CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(100), price DECIMAL(10, 2) );
ALTER:
changes the structure of existing database objects.
ALTER TABLE products ADD COLUMN description TEXT;
DROP:
deletes database objects (tables, indexes, etc.).
DROP TABLE products;
TRUNCATE:
deletes all rows from a table, but does not delete the table itself.
TRUNCATE TABLE users;
COMMENT:
Adds comments to database objects.
COMMENT ON TABLE products IS 'Table storing product details';
RENAME:
renames a database object.
RENAME TABLE old_products TO new_products;
DCL (Data Control Language) – Access Control Language:
Keywords for managing data access rights.
GRANT:
Grants users or roles rights to database objects.
GRANT SELECT, INSERT ON orders TO user_admin;
REVOKE:
Revokes permissions on database objects from users or roles.
REVOKE INSERT ON orders FROM user_admin;
TCL (Transaction Control Language) — Transaction Control Language:
Keywords for managing database transactions.
BEGIN TRANSACTION:
starts a transaction.
BEGIN TRANSACTION; UPDATE accounts SET balance = balance - 100 WHERE account_id = 1; UPDATE accounts SET balance = balance + 100 WHERE account_id = 2; COMMIT;
COMMIT:
saves changes made within the current transaction.
BEGIN TRANSACTION; INSERT INTO orders (order_id, customer_id, order_total) VALUES (101, 5, 250.00); INSERT INTO order_items (order_id, product_id, quantity) VALUES (101, 20, 2); COMMIT;
ROLLBACK:
Undoes changes made within the current transaction.
BEGIN TRANSACTION; DELETE FROM employees WHERE employee_id = 45; UPDATE departments SET manager_id = NULL WHERE manager_id = 45; ROLLBACK;
SAVEPOINT:
creates a restore point in a transaction.
BEGIN TRANSACTION; UPDATE products SET price = price * 0.9 WHERE category = 'Electronics'; SAVEPOINT discount_applied; DELETE FROM products WHERE discontinued = true; ROLLBACK TO discount_applied; COMMIT;
SET TRANSACTION:
sets the parameters for the current transaction.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; BEGIN TRANSACTION; SELECT * FROM sales WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31'; COMMIT;
Data Management Keywords (Constraints):
Keywords used to specify constraints on data in tables.
PRIMARY KEY:
defines a unique identifier for each row of the table.
CREATE TABLE customers ( customer_id INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50) );
FOREIGN KEY:
establishes a relationship between two tables.
CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT, order_date DATE, FOREIGN KEY (customer_id) REFERENCES customers(customer_id) );
UNIQUE:
requires that all values in a column or group of columns be unique.
CREATE TABLE users ( user_id INT PRIMARY KEY, username VARCHAR(50) UNIQUE, email VARCHAR(100) UNIQUE );
NOT NULL:
Disallows null values for this column.
CREATE TABLE products ( product_id INT PRIMARY KEY, product_name VARCHAR(100) NOT NULL, price DECIMAL(10, 2) NOT NULL );
CHECK:
defines a condition that the data must fulfill.
CREATE TABLE employees ( employee_id INT PRIMARY KEY, salary DECIMAL(10, 2) CHECK (salary > 0), age INT CHECK (age >= 18) );
DEFAULT:
sets the default value for the column.
CREATE TABLE orders ( order_id INT PRIMARY KEY, order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP, status VARCHAR(20) DEFAULT 'PENDING' );
Keywords for working with conditions:
Keywords that are used to filter, sort, and aggregate data.
WHERE:
filters rows based on a condition.
SELECT * FROM employees WHERE department = 'Sales' AND salary > 50000;
GROUP BY:
Groups rows by the values of one or more columns.
SELECT department, COUNT(*) AS total_employees FROM employees GROUP BY department;
HAVING:
filters rows after grouping.
SELECT department, AVG(salary) AS average_salary FROM employees GROUP BY department HAVING AVG(salary) > 60000;
ORDER BY:
Sorts the result by one or more columns.
SELECT first_name, last_name, hire_date FROM employees ORDER BY hire_date DESC;
DISTINCT:
returns unique values.
SELECT DISTINCT country FROM customers;
LIMIT:
limits the number of rows returned (in some DBMS, such as MySQL).
SELECT * FROM products ORDER BY price DESC LIMIT 10;
OFFSET:
determines which line to start outputting the result from.
SELECT * FROM products ORDER BY price DESC LIMIT 10 OFFSET 10;
Keywords for working with table joins:
Keywords for combining data from multiple tables.
JOIN:
Combines rows from two or more tables based on the relationship between them.
SELECT o.order_id, c.first_name, c.last_name FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
INNER JOIN:
returns rows with matching values in both tables.
SELECT e.first_name, e.last_name, d.department_name FROM employees e INNER JOIN departments d ON e.department_id = d.department_id;
LEFT JOIN (or LEFT OUTER JOIN):
returns all rows from the left table, even if there are no matches in the right table.
SELECT c.first_name, o.order_id FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;
RIGHT JOIN (or RIGHT OUTER JOIN):
returns all rows from the right table, even if there are no matches in the left table.
SELECT o.order_id, c.first_name FROM orders o RIGHT JOIN customers c ON o.customer_id = c.customer_id;
FULL JOIN (or FULL OUTER JOIN):
returns all rows from both tables, padding with NULL values if there are no matches.
SELECT a.account_id, t.transaction_id FROM accounts a FULL JOIN transactions t ON a.account_id = t.account_id;
CROSS JOIN:
returns the Cartesian product of two tables (each row from the first table is joined with each row from the second table).
SELECT p.product_name, s.store_name FROM products p CROSS JOIN stores s;
SELF JOIN:
joins a table to itself.
SELECT e1.first_name AS employee, e2.first_name AS manager FROM employees e1 JOIN employees e2 ON e1.manager_id = e2.employee_id;
Keywords for managing sequences and indices:
Keywords for creating and managing sequences and indexes.
CREATE INDEX:
Creates an index on table columns to improve query performance.
CREATE INDEX idx_last_name ON employees(last_name);
CREATE SEQUENCE:
creates a sequence to generate unique numbers.
CREATE SEQUENCE order_seq START WITH 1000 INCREMENT BY 1;
DROP INDEX:
deletes the index.
DROP INDEX idx_last_name; // for some DBMS, such as PostgreSQL DROP INDEX IF EXISTS idx_last_name; // in MySQL, the index is deleted differently ALTER TABLE employees DROP INDEX idx_last_name;
DROP SEQUENCE:
deletes the sequence.
DROP SEQUENCE order_seq;
Keywords for working with views:
Keywords for creating and managing views.
CREATE VIEW:
creates a performance.
CREATE VIEW active_customers AS SELECT customer_id, first_name, last_name, email FROM customers WHERE status = 'ACTIVE';
DROP VIEW:
deletes the view.
DROP VIEW active_customers;
ALTER VIEW:
modifies an existing view.
ALTER VIEW active_customers AS SELECT customer_id, first_name, last_name, email, phone FROM customers WHERE status = 'ACTIVE';
Keywords for working with stored procedures and triggers:
Keywords for creating and managing stored procedures and triggers.
CREATE PROCEDURE:
creates a stored procedure.
CREATE PROCEDURE increase_salary(IN emp_id INT, IN percent DECIMAL) BEGIN UPDATE employees SET salary = salary + (salary * percent / 100) WHERE employee_id = emp_id; END;
CREATE TRIGGER:
creates a trigger that is automatically executed when a specific event occurs (e.g. INSERT, UPDATE, DELETE).
CREATE TRIGGER set_created_at BEFORE INSERT ON users FOR EACH ROW SET NEW.created_at = CURRENT_TIMESTAMP;
EXECUTE:
executes a stored procedure.
CALL increase_salary(101, 10);
DROP PROCEDURE:
drops a stored procedure.
DROP PROCEDURE increase_salary;
DROP TRIGGER:
removes the trigger.
DROP TRIGGER set_created_at;
Aggregation and analytics functions:
Keywords for data aggregation and analysis.
COUNT():
returns the number of rows.
SELECT COUNT(*) AS total_employees FROM employees;
SUM():
returns the sum of the values.
SELECT SUM(order_total) AS total_revenue FROM orders WHERE order_date >= '2024-01-01';
AVG():
returns the average value.
SELECT AVG(salary) AS average_salary FROM employees WHERE department = 'IT';
MIN():
returns the minimum value.
SELECT MIN(price) AS cheapest_product FROM products;
MAX():
returns the maximum value.
SELECT MAX(price) AS most_expensive_product FROM products;
Other keywords and operators:
EXISTS:
checks if there are rows that match the condition.
SELECT first_name, last_name FROM customers WHERE EXISTS ( SELECT 1 FROM orders WHERE orders.customer_id = customers.customer_id );
IN:
checks if a value is contained in a list of values.
SELECT * FROM products WHERE category_id IN (1, 3, 5);
BETWEEN:
checks if a value is within a certain range.
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
LIKE:
searches for string data based on a pattern.
SELECT * FROM users WHERE email LIKE '%@example.com';
IS NULL:
checks if the value contains NULL.
SELECT * FROM employees WHERE manager_id IS NULL;
CAST():
converts one value to another data type.
SELECT CAST(order_total AS INT) AS total_as_integer FROM orders;
COALESCE():
returns the first non-null value in the list.
SELECT COALESCE(phone, 'No phone') AS contact_number FROM customers;
CASE:
returns a value based on conditions (similar to the if-else statement).
SELECT first_name, last_name, CASE WHEN salary >= 100000 THEN 'High' WHEN salary >= 50000 THEN 'Medium' ELSE 'Low' END AS salary_level FROM employees;
WITH:
used to create temporary results in a query (CTE – Common Table Expression).
WITH high_salary_employees AS ( SELECT * FROM employees WHERE salary > 80000 ) SELECT department, COUNT(*) FROM high_salary_employees GROUP BY department;
UNION:
is used to combine the results of two or more queries. It allows you to combine rows from multiple SELECT queries into a single result set. However, rows from different queries should not be duplicated unless the ALL operator is explicitly used.
SELECT first_name, email FROM customers UNION SELECT name, email FROM newsletter_subscribers;
These are the most common keywords in SQL. They are used to manipulate data and manage the structure of the database. Each DBMS may have its own unique keywords, but the ones listed above are standard SQL commands.
➤ What is VIEW in SQL
VIEW (view) in SQL is a virtual table that is formed based on the query result. Views do not store data physically, they only store the SQL query that is executed each time the view is accessed. This is a convenient tool for simplifying complex queries, ensuring security and improving the convenience of working with data.
Views can be used to display data from one or more tables, and to join, filter, and aggregate data.
Example of creating a VIEW:
CREATE VIEW active_users AS SELECT id, name, email FROM users WHERE status = 'active';
Here the active_users view contains only those users from the users table who have the status "active". Now you can use this view as a regular table:
SELECT * FROM active_users;
This will query the data from the view and actually execute the saved query.
Benefits of using views:
Simplifying complex queries:
Views allow you to hide complex queries behind a simple name, making them easier to work with. Instead of writing long SQL queries every time, you can simply access the view.
Increased security:
Views can be used to restrict access to data. For example, you can grant users access to only certain columns or rows through a view, rather than the original table.
Data abstraction:
Views can help abstract away complex database structures. For example, you can create a view that combines data from multiple tables, and users may not know how the tables are related.
Ensuring Compatibility:
If the table structure changes (for example, new columns are added), the view can serve as a compatibility layer that continues to provide the old interface to existing applications.
Updated views:
Views can support updating, inserting, and deleting data (if this is possible from the SQL server and query logic point of view). However, not all views are updatable. For example, if a view uses aggregate functions (SUM, COUNT, AVG, etc.) or a join (JOIN), then such views usually cannot be directly updated.
Example of an updatable view:
CREATE VIEW basic_users AS SELECT id, name FROM users;
You can update data in this view:
UPDATE basic_users SET name = 'New Name' WHERE id = 1;
Limitations of views:
Do not save data:
Views do not store data, which means that every time you query a view, the database server re-executes the query that created the view.
Not all views can be updated:
Some views cannot be used to modify data, especially if they include complex operations such as aggregations or subqueries.
Performance:
Depending on the complexity of the query on which the view is based, using views can slow down query execution, especially if the query accesses large tables or performs complex operations.
Deleting a view:
If a view is no longer needed, it can be dropped using the DROP VIEW command:
DROP VIEW active_users;
VIEW is a powerful tool for simplifying data management, increasing security, and hiding the complexity of database structure. It allows you to create virtual tables that can combine data, filter it, and offer users more convenient ways to interact with the database.
➤ What is INDEX (indexes) in SQL
Indexes in SQL are special data structures that are used to speed up the search for rows in tables. Indexes are created on one or more columns of a table, and they act as pointers that allow the database to find data faster than if it were to perform a full table scan (i.e., look at all the rows).
How indexes work:
Index structure:
Indexes are often organized as trees (e.g., B-tree or B+ tree), which allows for efficient data retrieval. These structures allow for searches, insertions, and deletions in logarithmic time, which is significantly faster than a full table scan.
Indexes contain sorted data for the columns on which they are built, allowing the database to quickly find the rows it needs.
Main idea:
Instead of scanning the entire table when executing a query (especially if the table contains many rows), the database accesses an index that contains sorted values. Using the index, the database finds the rows it needs with minimal time.
Example of index operation:
Let's say we have a table called employees, which contains information about employees. The table has a column called name, which is often used for search queries.
Without index:
A query like SELECT * FROM employees WHERE name = 'Alice' will cause the database to scan every row in the table (a full table scan) to find rows named "Alice". This is especially expensive for large tables.
With index:
If we create an index on the name column using the command:
CREATE INDEX idx_employee_name ON employees(name);
Then the query SELECT * FROM employees WHERE name = 'Alice' will use this index. The database will already know where the records named "Alice" are, thanks to the index, which will significantly speed up the query.
Advantages of indices:
Speeding up data retrieval queries:
Indexes significantly speed up the execution of SELECT queries, especially those that include WHERE, JOIN, ORDER BY, or GROUP BY conditions.
Improving JOIN performance:
If indexes are used on the columns by which tables are joined, this also speeds up the execution of queries with JOINs.
Data ordering:
Indexes can help organize data in queries with ORDER BY because they already contain sorted column values.
Disadvantages of indices:
Slowing down insert, update and delete operations:
Every time a row is inserted, updated, or deleted in a table, the database must update the index, which requires additional resources. The more indexes a table has, the more overhead these operations incur.
Memory usage:
Indexes take up additional disk and memory space. If a table contains multiple indexes, each index must store a sorted copy of the column data.
Types of indexes in SQL:
Non-Unique Index:
This is an index that can contain non-unique values. Suitable for columns that are frequently queried but where values may be repeated.
CREATE INDEX idx_name ON employees(name);
Unique Index:
This type of index requires that all values in the indexed column be unique. It is often used to ensure uniqueness of data, such as for columns that must contain unique identifiers.
CREATE UNIQUE INDEX idx_unique_email ON employees(email);
Primary Key:
A primary key automatically creates a unique index on the corresponding column or group of columns. There can only be one primary key per table.
CREATE TABLE employees ( id INT PRIMARY KEY, name VARCHAR(100) );
Composite Index:
An index that is created on multiple columns of a table. This index is useful when queries frequently filter data by multiple columns at once.
CREATE INDEX idx_name_department ON employees(name, department);
Full-text Index:
A special type of index for quickly searching text information in large columns such as text fields.
CREATE FULLTEXT INDEX idx_fulltext_description ON articles(description);
Unique Composite Index:
A unique index on multiple columns that ensures that the combination of values in those columns is unique.
CREATE UNIQUE INDEX idx_unique_name_dept ON employees(name, department);
When indexes are not used:
Indexes are not always used, even if they exist. There are several situations where indexes can be ignored:
Small table size:
If the table is very small, the database may decide that scanning the entire table is faster than using an index.
Using functions on indexed columns:
If the query uses a function (such as LOWER(name)), the index may not be used because the data transformation makes it unsuitable for indexing.
Incorrect requests:
If queries are written in a way that does not allow the use of the index (for example, using negative NOT, <>, != conditions), this may reduce the effectiveness of the index.
Frequently updated or deleted tables:
For tables with frequent insert, delete, or update operations, using indexes can reduce performance because they need to be constantly updated.
➤ How to Avoid Contention When Accessing a Record in SQL
Let's say 2 users opened the same record in the database, how can we avoid the situation when the first user updated the data before the second, and the second works with outdated data?
This situation poses a classic problem of contention when working on the same database record, which can lead to the second user overwriting the changes made by the first. To avoid such conflicts, there are several strategies for managing concurrent access to data. The main approaches are:
Database-level locking (Pessimistic Locking)
Pessimistic locking means that once one user starts working on a record, it is locked so that other users cannot modify it until the current transaction is complete. This avoids concurrency issues, but can reduce performance and lead to deadlocks under high load.
Example of use:
One user executes a SELECT … FOR UPDATE, which locks the record for modification by other users.
Until the first user's transaction is completed (via COMMIT or ROLLBACK), other users cannot update it.
Example in SQL:
BEGIN TRANSACTION; SELECT * FROM records WHERE id = 1 FOR UPDATE; -- Changing data... UPDATE records SET field = 'new_value' WHERE id = 1; COMMIT;
When using locks, the database will not allow the second user to access and modify a record until the first user has finished.
Optimistic Locking
Optimistic locking assumes that conflicts when accessing a record are rare, so changes are allowed, but when updating, it checks that the record has not been modified since it was read. If the record has been modified, the second user must retry the operation.
This is achieved by storing a special field, such as the record version or the last modification timestamp. When updating a record, the application checks that the record version matches the one that was read when it started working with it. If the version has changed, the operation is canceled or the data is updated.
Steps of work:
The user reads the record along with its current version.
When updating, the data is sent to the server along with the version.
The server checks whether the version in the database matches the version transmitted.
If the versions match, the data is updated and the version is incremented.
If the versions do not match, the server rejects the request and a conflict message is displayed to the user.
Example of a table with a version field:
CREATE TABLE records ( id INT PRIMARY KEY, name VARCHAR(100), version INT );
Example in SQL:
UPDATE records SET name = 'New Name', version = version + 1 WHERE id = 1 AND version = 3; -- If this update affected 0 lines, then the version has changed.
Using Timestamps (Timestamp-Based Concurrency Control)
Similar to optimistic locking, but instead of storing a version of a record, a timestamp is used (such as an updated_at or last_modified field). When a user updates data, the database checks to see if the timestamp of the record has changed since it was last read.
Example of a table with a timestamp:
CREATE TABLE records ( id INT PRIMARY KEY, name VARCHAR(100), updated_at TIMESTAMP );
Update with timestamp check:
UPDATE records SET name = 'New Name', updated_at = CURRENT_TIMESTAMP WHERE id = 1 AND updated_at = '2024-09-03 12:00:00'; -- If the update did not occur (0 rows changed), then the record was updated by another user.
Using application-level data versions:
At the application level, you can implement logic that monitors the state of the data. For example, when editing data, the user can be shown the version of the record they are editing. If another user has made changes, the application can either warn the user that the data is out of date or offer to automatically merge the changes.
Combining approaches:
Often the best option is a combination of methods. For example:
When working with critical data, you can use pessimistic locking.
For less critical operations, use optimistic locking or timestamp checking.
➤ What types of relations exist in SQL
In relational databases, relationships between tables describe how data in one table is related to data in another. The main types of relationships in SQL databases are:
One-to-Many
This is the most common type of relationship in databases. It refers to a situation where one record in a table can be related to multiple records in another table, but each of those records is related to only one record in the first table.
Example:
The authors table and the books table. One author can write several books, but each book has only one author.
Implementation:
A foreign key is added to the books table that references the primary key of the authors table.
CREATE TABLE authors ( id INT PRIMARY KEY, name VARCHAR(100) ); CREATE TABLE books ( id INT PRIMARY KEY, title VARCHAR(100), author_id INT, FOREIGN KEY (author_id) REFERENCES authors(id) );
Many-to-Many
This type of relationship means that one record in the first table can be related to multiple records in the second table, and vice versa. However, such relationships cannot be expressed directly in SQL. To implement such a relationship, an intermediate table (junction table) is used, which breaks the many-to-many relationship into two one-to-many relationships.
Example:
Students and courses tables. One student can be enrolled in several courses, and one course can be enrolled in several students.
Implementation:
The intermediate table students_courses, which links the students and courses tables.
CREATE TABLE students ( id INT PRIMARY KEY, name VARCHAR(100) ); CREATE TABLE courses ( id INT PRIMARY KEY, title VARCHAR(100) ); CREATE TABLE students_courses ( student_id INT, course_id INT, PRIMARY KEY (student_id, course_id), FOREIGN KEY (student_id) REFERENCES students(id), FOREIGN KEY (course_id) REFERENCES courses(id) );
One to one (One-to-One)
This type of relationship means that one record in one table is related to only one record in the other table, and vice versa. Such relationships are used when data can be divided into two tables for better structure or to store additional information that is used less frequently.
Example:
The users and user_profiles tables. Each user has only one profile, and each profile is associated with only one user.
Implementation:
In the user_profiles table, a foreign key, which is also unique (or primary), references users.
CREATE TABLE users ( id INT PRIMARY KEY, name VARCHAR(100) ); CREATE TABLE user_profiles ( user_id INT PRIMARY KEY, address VARCHAR(255), phone VARCHAR(50), FOREIGN KEY (user_id) REFERENCES users(id) );
Self-Referencing Relationship
This relationship is used when a record in a table refers to another record in the same table. This relationship can describe hierarchical data structures.
Example:
An employees table where one employee can be the boss of another employee.
Implementation:
A foreign key is added to the employees table that references the primary key of the same table.
CREATE TABLE employees ( id INT PRIMARY KEY, name VARCHAR(100), manager_id INT, FOREIGN KEY (manager_id) REFERENCES employees(id) );
Examples of using relations:
One to many:
This is the most common type of relationship. It is used when there is a main entity (e.g. customer) and related entities (e.g. orders).
Example: Clients and their orders (one client – many orders).
Many to many:
Used when objects in both tables can be related to multiple objects in the other table. For example, students and courses (one student can enroll in multiple courses, and one course can be attended by many students).
One to one:
Used to separate entities into separate tables to improve the database structure. For example, a user and their profile.
Self-referential relationships:
Very useful for modeling hierarchies (e.g. employees and their managers, product categories and subcategories).
➤ How to Make Many-to-Many Relationship in SQL
To create a Many-to-Many relationship in relational databases, an intermediate (junction) table is used that links the other two objects (tables). This is done because relational databases do not directly support a many-to-many relationship between two tables.
Example:
Let's assume we have two tables
Students
Courses (courses)
Each student can enroll in several courses, and each course can be enrolled by many students. This is a typical many-to-many relationship.
Create a table students:
CREATE TABLE students ( id INT PRIMARY KEY, name VARCHAR(100) NOT NULL );
Create a courses table:
CREATE TABLE courses ( id INT PRIMARY KEY, title VARCHAR(100) NOT NULL );
Create an intermediate table students_courses:
This table links students and courses via their identifiers (student_id and course_id). It will contain one record for each student's participation in a particular course.
CREATE TABLE students_courses ( student_id INT, course_id INT, PRIMARY KEY (student_id, course_id), FOREIGN KEY (student_id) REFERENCES students(id), FOREIGN KEY (course_id) REFERENCES courses(id) );
What happens in the intermediate table:
The students_courses table contains two columns: student_id and course_id. Together, they form a composite primary key, which prevents duplicate entries (that is, a single student cannot enroll in the same course twice).
Foreign keys (FOREIGN KEY) are used to ensure data integrity by specifying that the values in student_id and course_id must match existing records in the students and courses tables, respectively.
Example of data insertion:
Adding students:
INSERT INTO students (id, name) VALUES (1, 'Alice'), (2, 'Bob');
Adding courses:
INSERT INTO courses (id, title) VALUES (101, 'Mathematics'), (102, 'Biology');
Connecting students with courses:
INSERT INTO students_courses (student_id, course_id) VALUES (1, 101), (1, 102), (2, 101);
Example of a request to get data:
To find out which courses a student is enrolled in, you can use a JOIN query:
SELECT students.name, courses.title FROM students JOIN students_courses ON students.id = students_courses.student_id JOIN courses ON students_courses.course_id = courses.id;
This query will display the names of students and the names of the courses they are enrolled in.
➤ How to combine a query to multiple tables in SQL
In SQL, there are several approaches for combining data from multiple related tables into a single result set.
Let's use the example tables:
CREATE TABLE customers ( customer_id INTEGER PRIMARY KEY, name VARCHAR(100), city VARCHAR(100) ); CREATE TABLE orders ( order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date DATE, total_amount DECIMAL(10,2), FOREIGN KEY (customer_id) REFERENCES customers(customer_id) ); CREATE TABLE archived_orders ( order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date DATE, total_amount DECIMAL(10,2) -- FOREIGN KEY );
Combining with JOIN:
JOIN – the most common and understandable way to “join” tables by a common field. Different types of joins are used, depending on which rows you want to include in the result:
INNER JOIN:
Returns only those rows where the values in the connector fields match in both tables.
Performance: With indexes on the joining columns, INNER JOIN is often very fast because the DBMS uses algorithms such as Nested Loops, Merge Join, or Hash Join.
LEFT (or RIGHT) JOIN:
Returns all rows from the left (or right) table and matching rows from the other table. If there is no match, the columns in the right (or left) table will be NULL.
Performance: These joins typically perform similarly to INNER JOINs, but may be less efficient when there are no indexes or when handling large numbers of NULL values.
FULL OUTER JOIN:
Combines rows from both tables, including those rows where there are no matches (producing NULL in missing fields).
Performance: This type of JOIN can be less efficient because it requires additional operations to combine disjoint rows. Some DBMSs even emulate a FULL JOIN by combining LEFT and RIGHT JOINs.
CROSS JOIN:
Performs the Cartesian product of two tables (each row from the first table is combined with each row from the second).
Performance: Typically used less frequently, since with large table sizes the result can be extremely voluminous and, accordingly, time-consuming.
General points about JOIN performance:
Having indexes on join fields significantly improves performance.
The choice of a specific join algorithm (nested loop, hash join, merge join) depends on the DBMS, the volume of data and the availability of suitable indexes.
The query optimizer may react differently to complex join chains; sometimes it is worth rewriting queries or applying optimization hints.
Example: INNER JOIN
We will receive a list of clients with information on their orders
SELECT c.customer_id, c.name, c.city, o.order_id, o.order_date, o.total_amount FROM customers c INNER JOIN orders o ON c.customer_id = o.customer_id;
Performance comment:
If there are indexes on the customer_id field in both tables, the DBMS will choose the optimal plan (for example, using a hash join or merge join). If the data is distributed normally, such a query works efficiently.
Example: LEFT JOIN
If you want to get all customers even if they have no orders
SELECT c.customer_id, c.name, c.city, o.order_id, o.order_date, o.total_amount FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;
Performance comment:
In case of large data volumes, missing orders may result in more rows being returned with NULL values, but with proper indexing, performance remains decent.
Combining results with UNION / UNION ALL:
UNION And UNION ALL allow you to combine the results of several SELECT queries into a single result set, with the rows of one SELECT placed below the rows of another.
UNION:
With this option, the DBMS automatically eliminates duplicate rows in the resulting set.
Performance:
The additional step of eliminating duplicates can be resource intensive, especially for large numbers of rows.
UNION ALL:
Combines results without removing duplicates.
Performance:
Typically faster, since there is no sorting or duplicate checking step. Good for cases where row uniqueness is guaranteed at the data level or is not critical.
Applicability:
This method is applicable if the queries being combined have the same number of columns and compatible data types. It is used when the data is scattered across different tables, but represents logically homogeneous sets.
Let's imagine that orders are stored in two tables: active orders and archived orders. To get a complete list of orders, you can combine the results of the two queries.
Example: UNION
SELECT order_id, customer_id, order_date, total_amount FROM orders UNION SELECT order_id, customer_id, order_date, total_amount FROM archived_orders;
Performance comment:
UNION performs an additional operation to remove duplicates, which can slow down execution when there are a large number of records.
Example: UNION ALL
SELECT order_id, customer_id, order_date, total_amount FROM orders UNION ALL SELECT order_id, customer_id, order_date, total_amount FROM archived_orders;
Performance comment:
UNION ALL simply combines the results without checking for duplicates, so it is faster.
SELECT product_id, quantity, total_price FROM sales_2022 UNION ALL SELECT product_id, quantity, total_price FROM sales_2023;
Using subqueries and derived tables:
Instead of directly joining tables, you can use subqueries that return data from other tables. A subquery can be in the SELECT, FROM, or WHERE clause:
Subqueries in the FROM clause (derived tables):
Define temporary tables whose results are then joined with the main query data.
Performance:
Non-correlated subqueries:
Typically optimized separately and can be relatively efficient.
Correlated subqueries:
They are calculated for each row of the main query, which can lead to significant slowdowns, especially with large amounts of data.
It is important to note that modern optimizers can rewrite queries by combining them with the main query, but this does not always happen automatically.
Subqueries in the SELECT clause:
Allows you to retrieve individual values or aggregates from another table for each row of the main query.
Example: Subquery in SELECT clause
We will get a list of clients with the number of orders for each:
SELECT c.customer_id, c.name, c.city, (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS orders_count FROM customers c;
Performance comment:
- If the subquery is correlated (executed for each row), then performance degradation may be observed with a large number of clients.
- Some DBMSs can optimize correlated subqueries, but in critical cases it makes sense to consider aggregation in a separate query.
Example: Subquery in FROM clause (derived table)
Aggregate order data and connect it to the customer table
SELECT c.customer_id, c.name, c.city, o.total_orders FROM customers c JOIN ( SELECT customer_id, COUNT(*) AS total_orders FROM orders GROUP BY customer_id ) o ON c.customer_id = o.customer_id;
Performance comment:
A derived table aggregates data once, which is often more efficient than running a correlated subquery for each row.
Common Table Expressions (CTE) with WITH clause:
CTEs allow you to create temporary, named results (like "virtual tables") that can be used in the main query. This improves query readability and makes it easier to structure complex join logic.
Performance:
- CTEs are essentially aliases for subqueries. Their performance is often similar to subqueries in the FROM clause, but some DBMSs may under-optimize if the CTE is not inlined.
- When using recursive CTEs, be aware that they may be less efficient if the recursion depth is high.
Example: CTE to calculate the number of orders
First, let's create a CTE that aggregates orders, and then perform a LEFT JOIN with customers:
WITH OrderCounts AS ( SELECT customer_id, COUNT(*) AS total_orders FROM orders GROUP BY customer_id ) SELECT c.customer_id, c.name, c.city, COALESCE(oc.total_orders, 0) AS total_orders FROM customers c LEFT JOIN OrderCounts oc ON c.customer_id = oc.customer_id;
Performance comment:
- If CTEs are inlined by the optimizer, they work as efficiently as subqueries in the FROM clause.
- In some DBMSs, CTEs are not optimized ("materialized"), which can impact performance on large data sets.
Additional points and recommendations for performance:
Query optimization:
Regardless of the method chosen, having suitable indexes is a key factor in performance. It is also worth examining the query execution plan (EXPLAIN PLAN or similar) to identify bottlenecks.
Table sizes and data distribution:
When working with large tables, all methods require careful testing. For example, if one table contains millions of records and another table contains thousands, the optimizer may choose a different join order to improve efficiency.
Features of the DBMS:
Different systems (PostgreSQL, MySQL, Oracle, SQL Server) may handle JOIN, UNION or CTE differently. Therefore, it is worth testing queries in a specific environment and, if necessary, applying specialized settings or hints for optimization.
Readability vs. Performance:
Sometimes a more readable query design (e.g. using CTE) may perform worse than a rewritten query with JOINs. Therefore, it is recommended to balance between maintainability and execution speed by benchmarking in a real production environment.
Conclusion:
JOIN (INNER, LEFT/RIGHT, FULL, CROSS):
The most common method with good optimization in the presence of indexes.
Good for directly joining related data, especially when indexed properly.
UNION / UNION ALL:
Great for combining multiple SELECT queries; UNION ALL is preferred for performance if duplicate elimination is not required.
Useful for combining results from tables with the same structure (e.g. active and archived orders).
Subqueries and derived tables:
General purpose tool, but correlated subqueries should be avoided in large data volume cases.
Can be useful for aggregating or filtering data, but care should be taken with correlated subqueries.
CTE(WITH):
They improve the readability of complex queries, but do not always guarantee improved performance.
Improves the readability of complex queries and allows reuse of intermediate results, although they can sometimes affect performance if not optimized inline.
➤ How to Use Query Plan in SQL
An execution plan is a detailed description of how the DBMS is going to execute a SQL query. It allows you to see what operations are performed (table scans, indexes, joins, sorts, aggregations, etc.), in what order they occur, and how the costs are distributed across time and resources. It is an important tool for query optimization and diagnosing performance issues.
PostgreSQL:
EXPLAIN:
EXPLAIN SELECT * FROM employees WHERE department_id = 10;
Conclusion:
QUERY PLAN --------------------------------------------------------------- Seq Scan on employees (cost=0.00..35.50 rows=5 width=50) Filter: (department_id = 10) (2 rows)
Explanation:
Seq Scan on employees: sequential table scan employees
.
cost=0.00..35.50: estimating costs from start to finish of an operation.
rows=5: the estimated number of rows that satisfy the condition.
width=50: average string size in bytes.
Filter: (department_id = 10): filtering condition applied during scanning.
EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM employees WHERE department_id = 10;
Conclusion:
QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Seq Scan on employees (cost=0.00..35.50 rows=5 width=50) (actual time=0.021..0.040 rows=2 loops=1) Filter: (department_id = 10) Rows Removed by Filter: 3 Planning Time: 0.123 ms Execution Time: 0.060 ms (5 rows)
Explanation:
actual time=0.021..0.040: actual start and end time of the scan.
rows=2: The actual number of rows that passed the filter.
Rows Removed by Filter: 3: the number of rows filtered out by the filter.
Planning Time And Execution Time: give an idea of the query planning time and its execution time respectively.
MySQL:
EXPLAIN:
EXPLAIN SELECT * FROM employees WHERE department_id = 10;
Conclusion (in table form):
+----+-------------+-----------+------+---------------+-----------+---------+-------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------------+-----------+---------+-------+------+--------------------------+ | 1 | SIMPLE | employees | ref | dep_index | dep_index | 4 | const | 5 | Using where; Using index | +----+-------------+-----------+------+---------------+-----------+---------+-------+------+--------------------------+
Explanation:
id: query identifier (for more complex queries with subqueries or joins).
select_type: query type (SIMPLE means no subqueries).
table: table name.
type: access type (eg. ref
(means using an index for searching).
possible_keys: indexes that could be used.
key: the index actually used.
rows: The estimated number of lines to be read.
Extra: additional information, such as Using where
indicates the use of a filter.
Oracle:
For Oracle, it is often a two-step process: first, create a plan, then review it.
Step 1:
Formulation of a plan
EXPLAIN PLAN FOR SELECT * FROM employees WHERE department_id = 10;
Step 2:
View plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Conclusion (in table form):
-------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 3 | |* 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 5 | 120 | 3 | |* 2 | INDEX RANGE SCAN | DEP_INDEX | 5 | | 2 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------------- 1 - filter("DEPARTMENT_ID"=10) 2 - access("DEPARTMENT_ID"=10)
Explanation:
TABLE ACCESS BY INDEX ROWID: access rows via index.
INDEX RANGE SCAN: using range scan on index DEP_INDEX
.
Cost: assessment of the costs of performing each operation.
Predicate Information: specifies which condition applies to each operation.
Microsoft SQL Server:
In SQL Server, it is possible to obtain an execution plan both in the SSMS graphical interface and via T-SQL.
Example:
using SHOWPLAN
SET SHOWPLAN_ALL ON; GO SELECT * FROM employees WHERE department_id = 10; GO SET SHOWPLAN_ALL OFF;
Conclusion (in table form):
------------------------------------------------------------------------------------------------------------------ | stmtid | nodeid | parent | physical_op | logical_op | argument | ------------------------------------------------------------------------------------------------------------------ | 1 | 0 | NULL | Clustered Index Scan | Clustered Index Scan | Table: employees, Index: PK_employees | | 1 | 1 | 0 | Filter | Filter | Predicate: department_id = 10 | ------------------------------------------------------------------------------------------------------------------
Explanation:
Clustered Index Scan: scan using clustered index.
Filter: applying filtering condition.
stmtid, nodeid, parent: identifiers that define the hierarchy of operations in the execution plan.
➤ How to Use Indexes in SQL
Indexes significantly improve query performance, but excessive or inappropriate use of indexes can slow down insert, update, and delete operations due to additional index maintenance. The choice of index type is determined by specific tasks:
B-Tree index:
standard for most cases.
Unique index:
to ensure uniqueness.
Composite Index:
for queries with filters on multiple columns.
Full text index:
for working with text data.
Spatial index:
for geographic and geometric data.
Functional index:
to optimize queries using functions.
Regular (B-Tree) index:
A standard index type that is used for most search operations. It is built on the principle of a binary tree, which allows you to quickly find rows by the value of a column.
CREATE INDEX idx_employee_department ON employees(department_id);
How to use:
When executing a query with a condition on a column department_id
(For example, WHERE department_id = 10
) the query optimizer chooses an index instead of a full table scan.
Unique index:
Ensures uniqueness of values in a column or set of columns. If you try to insert a duplicate value, the DBMS returns an error.
CREATE UNIQUE INDEX idx_unique_email ON employees(email);
How to use:
If the employees table cannot have two employees with the same email, a unique index ensures that this constraint is met and speeds up searches on the email column.
Composite (multi-column) index:
Created over two or more columns. It is useful when queries use multiple columns in filtering or joining conditions. It is important to remember that the order of the columns in the index matters.
CREATE INDEX idx_employee_dept_lastname ON employees(department_id, last_name);
How to use:
When querying, the index will allow you to quickly select rows by both columns.
SELECT * FROM employees WHERE department_id = 10 AND last_name = 'Ivanov';
Full text index:
Used for fast searching of text data (for example, in description fields or article content). It is built taking into account morphology and stop words, which allows searching by words and phrases.
MySQL:
Used for TEXT or VARCHAR type columns.
CREATE FULLTEXT INDEX idx_fulltext_description ON products(description);
Once the index is created, you can run queries with the MATCH … AGAINST operator:
SELECT * FROM products WHERE MATCH(description) AGAINST('electronics');
PostgreSQL:
uses a GIN index with the to_tsvector function for full-text searching.
CREATE INDEX idx_fts_content ON articles USING gin(to_tsvector('russian', content));
The search is performed as follows:
SELECT * FROM articles WHERE to_tsvector('russian', content) @@ to_tsquery('electronics');
SQL Server:
has a built-in full-text search mechanism. First, you need to create a full-text catalog, then a full-text index.
-- Creating a full-text search catalog CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT; -- Create a full-text index on the products table CREATE FULLTEXT INDEX ON products(description) KEY INDEX PK_products ON ftCatalog;
Spatial index:
Spatial indexes are used to speed up queries on geometric data (points, lines, polygons). They are optimized for searching on two-dimensional (or multidimensional) spatial coordinates.
MySQL:
When using data types GEOMETRY, POINT, etc. Here geom is a column containing geometric data.
CREATE SPATIAL INDEX idx_spatial_location ON locations(geom);
PostgreSQL with PostGIS extension:
For spatial data, the GIST index is used. This index speeds up spatial queries, such as searching for objects within a certain radius.
CREATE INDEX idx_spatial_location ON locations USING gist(geom);
Oracle:
Supports spatial indexes via Oracle Spatial. Example of creation:
CREATE INDEX idx_spatial_location ON locations(geom) INDEXTYPE IS MDSYS.SPATIAL_INDEX;
Functional (computable) index:
Functional indexes are created based on expressions or functions applied to columns. This is useful if your queries frequently use functions, such as lowercase, string trimming, etc.
Oracle and PostgreSQL:
Let's say you want an index for fast, case-insensitive email searching:
CREATE INDEX idx_lower_email ON employees(LOWER(email));
The query will use the index if the condition is written like this:
SELECT * FROM employees WHERE LOWER(email) = '[email protected]';
SQL Server:
In SQL Server, you can create an index on a computed column. First, add the computed column, then create an index on it:
ALTER TABLE employees ADD lower_email AS LOWER(email); CREATE INDEX idx_lower_email ON employees(lower_email);