The REGEXP
operator in SQL is a powerful tool for matching complex string patterns. It is used for extracting information based on specific patterns rather than simple character matches.
Example
SELECT *
FROM Orders
WHERE item REGEXP 'Mouse|Keyboard';
Here, the SQL command selects orders where the item is either Mouse
or Keyboard
.
SQL REGEXP With WHERE Clause
The REGEXP
operator can be effectively combined with a WHERE
clause for precise pattern matching. For example,
SELECT *
FROM Customers
WHERE first_name REGEXP '^J';
Here, the SQL command returns those customers whose first names start with J.
Note: Our online compiler is based on SQLite, which doesn't support the REGEXP
operator by default.
REGEXP in UPDATE Statements
The REGEXP
operator can also be used in UPDATE statements to modify data based on complex patterns. For example,
UPDATE Shippings
SET status = 'Processing'
WHERE status REGEXP 'Pending|Delivered';
SELECT * FROM Shippings;
This SQL command updates the status column in the Shippings table to Processing for shipments with Pending or Delivered status.
Chaining REGEXP Operators
Multiple REGEXP
operators can be chained for complex pattern matching. For example,
SELECT *
FROM Customers
WHERE country
REGEXP 'USA|UK' AND age REGEXP '2[0-9]';
This query finds customers from either the USA or UK who are in their twenties.
Validate Email Addresses With SQL REGEXP
Using SQL's REGEXP
operator, we can validate email addresses based on standard email formatting rules.
Let's look at an example.
SELECT * FROM Customers
WHERE email
REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$';
The REGEXP
pattern here breaks down as follows:
^[A-Za-z0-9._%+-]+
- the email must start with one or more alphanumeric characters, dots, underscores, percent signs, plus signs, or hyphens.@
- this is the mandatory at symbol in an email address.[A-Za-z0-9.-]+
- after the at symbol, the email must have one or more alphanumeric characters, dots, or hyphens.\.[A-Za-z]{2,4}$
- the email must end with a period followed by 2 to 4 alphabetic characters, which represent the domain part.