Learn how to extract substrings in PHP using substr(), mb_substr(), and alternative methods. Handle UTF-8 characters correctly and manipulate strings efficiently.
Introduction
Extracting substrings is a fundamental operation in text processing. Whether you need to trim part of a string, extract user input, or manipulate text dynamically, PHP provides multiple ways to handle substrings.
The two primary methods for extracting substrings in PHP are:
- substr() – A simple function for extracting a portion of a string.
- mb_substr() – A multibyte-safe alternative that correctly handles UTF-8 and non-ASCII text.
This guide covers:
- How substr() and mb_substr() work
- Differences between byte-based and character-based extraction
- Extracting substrings with regex and alternative functions
- Best practices for efficient and safe string manipulation
1. Extracting Substrings Using substr()
The substr() function extracts a portion of a string based on start and length parameters.
Syntax
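substr(string $string, int $offset, ?int $length = null): string
- $string – The input string.
- $offset – The starting position (0-based index, counted in bytes). This guide also calls it the start position.
- $length (optional) – The number of bytes to extract. In single-byte text such as ASCII, one byte equals one character.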
Example: Basic Substring Extraction
$text = "Hello, World!";
$sub = substr($text, 7, 5);
echo $sub; // Output: World
- The function starts at index 7 and extracts 5 characters.
Using Negative Indexes
Negative values can be used to extract from the end of the string.
$text = "Hello, World!";
$sub = substr($text, -6, 5);
echo $sub; // Output: World
- -6 starts 6 characters from the end.
- The call then extracts 5 characters forward.
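The length argument can be negative as well. In that case it does not count forward from the start position; it leaves that many bytes off the end of the string. A minimal sketch reusing the sample string above (the variable names are illustrative only):

```php
<?php
$text = "Hello, World!";

// Negative length: stop 1 byte before the end of the string.
$withoutLast = substr($text, 0, -1);   // "Hello, World"

// Positive start, negative length: from index 7 up to the final "!".
$middle = substr($text, 7, -1);        // "World"

// Negative start and negative length can be combined.
$fromEnd = substr($text, -6, -1);      // "World"

echo $withoutLast, PHP_EOL, $middle, PHP_EOL, $fromEnd, PHP_EOL;
```

This form is handy for dropping a known suffix, such as a trailing punctuation mark, without computing the string length first.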
Extracting Everything from a Position
If $length is omitted, substr() extracts everything from the start position to the end of the string.
$text = "Programming in PHP";
$sub = substr($text, 15); // "PHP" begins at index 15
echo $sub; // Output: PHP
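It also helps to know what happens at the edges. A length that runs past the end of the string is clamped, and on PHP 8.0 and later a start position beyond the end yields an empty string (earlier versions returned false). A short sketch, assuming PHP 8.0 or newer for the values shown in the comments:

```php
<?php
$text = "PHP";

// Length longer than the remaining text: the result is clamped to the end.
$clamped = substr($text, 1, 100);   // "HP"

// Start position past the end: empty string on PHP 8.0+ (false before 8.0).
$pastEnd = substr($text, 10);       // ""

// A defensive check that behaves the same on old and new versions.
$isEmpty = ($pastEnd === '' || $pastEnd === false);

var_dump($clamped, $pastEnd, $isEmpty);
```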
2. Handling Multibyte Strings with mb_substr()
substr() counts bytes, not characters, so on UTF-8 text it can return fewer characters than expected or cut a character in half.
Example of substr() Failing with UTF-8
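$text = "こんにちは世界"; // "Hello, World" in Japanese
$sub = substr($text, 0, 3);
echo $sub; // Output: こ (one character, not three)
- Each of these Japanese characters occupies 3 bytes in UTF-8, so the first 3 bytes hold only one character. A byte range that ends inside a character, such as substr($text, 0, 4), returns a broken byte sequence.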
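The difference between bytes and characters can be checked directly. The sketch below assumes the mbstring extension is enabled and uses a short accented word rather than the Japanese sample, written with a Unicode escape so that the result does not depend on how the source file is saved:

```php
<?php
// "héllo" written with a Unicode escape (PHP 7.0+ syntax).
$text = "h\u{00E9}llo";

$bytes = strlen($text);               // 6: "é" takes 2 bytes in UTF-8
$chars = mb_strlen($text, 'UTF-8');   // 5 characters

// Cutting after 2 bytes splits "é" in half.
$broken = substr($text, 0, 2);
$brokenIsValid = mb_check_encoding($broken, 'UTF-8');   // false

// Cutting after 2 characters keeps "é" intact.
$intact = mb_substr($text, 0, 2, 'UTF-8');              // "hé"
$intactIsValid = mb_check_encoding($intact, 'UTF-8');   // true

var_dump($bytes, $chars, $brokenIsValid, $intactIsValid);
```

A failed mb_check_encoding() call is a useful signal that a byte-based function was applied to multibyte text somewhere upstream.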
Using mb_substr() for UTF-8 Safety
$text = "こんにちは世界";
$sub = mb_substr($text, 0, 3, "UTF-8");
echo $sub; // Output: こんに
Why Use mb_substr()?
✅ Correctly handles non-ASCII characters
✅ Avoids character corruption
✅ Supports multiple encodings
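The fourth argument tells mb_substr() how the input is encoded. As a rough illustration, the sketch below converts a UTF-8 string to ISO-8859-1 and extracts from it with the matching encoding name. It assumes the mbstring extension is enabled and that the file is saved as UTF-8 (for the Japanese literal); the variable names are illustrative only.

```php
<?php
$utf8 = "h\u{00E9}llo";   // "héllo" in UTF-8 (6 bytes)

// Convert to ISO-8859-1, where every character is exactly one byte.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$latin1Bytes = strlen($latin1);                       // 5

// Tell mb_substr() which encoding the input uses.
$second = mb_substr($latin1, 1, 1, 'ISO-8859-1');     // the single byte 0xE9
$secondUtf8 = mb_convert_encoding($second, 'UTF-8', 'ISO-8859-1'); // "é"

// Negative offsets count characters from the end, as with substr().
$tail = mb_substr("こんにちは世界", -2, null, 'UTF-8'); // "世界"

var_dump($latin1Bytes, $secondUtf8, $tail);
```

When the encoding argument is omitted, mb_substr() falls back to the internal encoding reported by mb_internal_encoding(), so passing it explicitly keeps the behavior independent of server configuration.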
3. Finding and Extracting Text Dynamically
Extracting from a Specific Word with strpos()
To extract text starting from a keyword:
$text = "Order ID: 12345";
$start = strpos($text, "ID:") + 4;
$sub = substr($text, $start);
echo $sub; // Output: 12345
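- strpos() finds the position of "ID:".
- Adding 4 skips the three characters of "ID:" and the space that follows.
- substr() extracts everything after that point.
- strpos() returns false when the keyword is missing, so compare the result with === false before using it as an offset.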
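The same idea can be wrapped in a small function that handles a missing keyword and avoids the hard-coded + 4. The helper name textAfter() is hypothetical, not a PHP built-in, and this is only one possible way to structure it:

```php
<?php
/**
 * Hypothetical helper: returns the text that follows $marker,
 * or null when $marker does not occur in $haystack.
 */
function textAfter(string $haystack, string $marker): ?string
{
    $pos = strpos($haystack, $marker);
    if ($pos === false) {          // strict check: position 0 is a valid match
        return null;
    }
    // Skip the marker itself, then drop leading whitespace.
    return ltrim(substr($haystack, $pos + strlen($marker)));
}

$found = textAfter("Order ID: 12345", "ID:");   // "12345"
$missing = textAfter("Order 12345", "ID:");     // null

var_dump($found, $missing);
```

Using strlen($marker) instead of a fixed number keeps the offset correct if the keyword changes.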
4. Extracting Text Using Regular Expressions
preg_match() can extract substrings using pattern-based rules.
Example: Extracting a Number from a String
$text = 'Price: $49.99'; // single quotes keep the $ literal
preg_match('/\d+\.\d+/', $text, $matches);
echo $matches[0]; // Output: 49.99
- The regex pattern \d+\.\d+ matches one or more digits, a literal dot, and one or more digits, which covers simple decimal numbers.
Extracting Email Addresses
$text = "Contact us at support@example.com";
preg_match("/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/", $text, $matches);
echo $matches[0]; // Output: support@example.com
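Two refinements are often useful with these patterns: checking the return value of preg_match() before reading the matches, and using a capture group to keep only part of the match. The sketch below uses a made-up sample string and also shows preg_match_all() for collecting every occurrence:

```php
<?php
$text = 'Was $59.99, now $49.99';

// Capture group: match the "now" price and keep only the number.
$current = null;
if (preg_match('/now \$(\d+\.\d{2})/', $text, $m) === 1) {
    $current = $m[1];              // "49.99"
}

// preg_match_all() collects every match instead of stopping at the first.
preg_match_all('/\d+\.\d+/', $text, $all);
$prices = $all[0];                 // ["59.99", "49.99"]

var_dump($current, $prices);
```

preg_match() returns 1 on a match, 0 on no match, and false on error, so comparing against 1 avoids reading an empty matches array.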
5. Extracting First and Last Words
Extracting the First Word
$text = "Hello World from PHP";
$firstWord = strtok($text, " ");
echo $firstWord; // Output: Hello
- strtok() returns the text up to the first space, which is the first word.
Extracting the Last Word
$text = "PHP is a great language";
$words = explode(" ", $text);
$lastWord = end($words);
echo $lastWord; // Output: language
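Splitting on a single space works for tidy input, but leading, trailing, or repeated whitespace produces empty pieces. One possible alternative, sketched below, splits on any run of whitespace with preg_split(). The helper name words() is hypothetical:

```php
<?php
/** Hypothetical helper: splits on any run of whitespace, ignoring empty pieces. */
function words(string $text): array
{
    $parts = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

$words = words("  PHP is a great   language \n");
$first = $words[0] ?? '';                       // "PHP"
$last  = $words[count($words) - 1] ?? '';       // "language"

// For comparison: explode() keeps the empty piece after a trailing space.
$pieces = explode(" ", "great language ");
$naiveLast = end($pieces);                      // ""

var_dump($first, $last, $naiveLast);
```

Note that end() takes its argument by reference, so the result of explode() has to be stored in a variable first, as the example above does.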
6. Best Practices for Substring Extraction in PHP
✅ Use substr() for basic substring extraction when working with ASCII text.
✅ Use mb_substr() for UTF-8 and multibyte characters.
✅ Use strpos() with substr() to extract text dynamically.
✅ Use preg_match() for pattern-based substring extraction.
✅ Use explode() to split and extract words efficiently.
Conclusion
Extracting substrings is a crucial operation in PHP, and selecting the right method ensures accuracy, performance, and compatibility with different character encodings.
This guide covered:
- Basic substring extraction using substr()
- Handling multibyte characters with mb_substr()
- Extracting text dynamically using strpos()
- Using preg_match() for pattern-based extraction
By following best practices, you can safely manipulate strings in PHP while ensuring correct character handling for internationalization.