Extracting Substrings in PHP

Learn how to extract substrings in PHP using substr(), mb_substr(), and alternative methods. Handle UTF-8 characters correctly and manipulate strings efficiently.

Introduction

Extracting substrings is a fundamental operation in text processing. Whether you need to shorten a string, pull a value out of user input, or manipulate text dynamically, PHP provides several ways to do it.

The two primary methods for extracting substrings in PHP are:

  • substr() – A simple function for extracting a portion of a string.
  • mb_substr() – A multibyte-safe alternative that correctly handles UTF-8 and non-ASCII text.

This guide covers:

  • How substr() and mb_substr() work
  • Differences between byte-based and character-based extraction
  • Extracting substrings with regex and alternative functions
  • Best practices for efficient and safe string manipulation

1. Extracting Substrings Using substr()

The substr() function extracts a portion of a string based on start and length parameters.

Syntax

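substr(string $string, int $offset, ?int $length = null): string
  • $string – The input string.
  • $offset – The starting position as a 0-based byte index. Negative values count back from the end of the string.
  • $length (optional) – The number of bytes to extract. A negative value omits that many bytes from the end. If it is omitted or null, extraction continues to the end of the string.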
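The less obvious parameter combinations are easiest to see side by side. In this sketch the filename report.csv is only an illustrative value, and the empty-string result for an out-of-range offset assumes PHP 8.0 or later (PHP 7 returned false instead).

```php
<?php
$file = "report.csv"; // illustrative value

$base = substr($file, 0, -4); // negative $length: drop the last 4 bytes (".csv")
$ext  = substr($file, -3);    // negative offset, no $length: the last 3 bytes
$none = substr("abc", 5);     // offset past the end: "" on PHP 8.0+

echo $base, "\n"; // report
echo $ext, "\n";  // csv
var_dump($none);  // string(0) ""
```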

Example: Basic Substring Extraction

$text = "Hello, World!";
$sub = substr($text, 7, 5); 

echo $sub; // Output: World
  • The function starts at byte index 7 and extracts 5 bytes, which for ASCII text means 5 characters.

Using Negative Indexes

Negative values can be used to extract from the end of the string.

$text = "Hello, World!";
$sub = substr($text, -6, 5);

echo $sub; // Output: World
  • -6 starts 6 characters from the end.
  • Extracts 5 characters forward.

Extracting Everything from a Position

If $length is omitted or null, substr() extracts everything from the starting position to the end of the string.

$text = "Programming in PHP";
$sub = substr($text, 15);

echo $sub; // Output: PHP
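When the start position comes from a known prefix, it can be computed with strlen() instead of being hard-coded. The helper name strip_prefix() is hypothetical, and str_starts_with() requires PHP 8.0 or later.

```php
<?php
// Hypothetical helper: remove $prefix from the start of $text if present.
function strip_prefix(string $text, string $prefix): string
{
    // strlen() gives the byte offset that substr() expects.
    return str_starts_with($text, $prefix)
        ? substr($text, strlen($prefix))
        : $text;
}

echo strip_prefix("Programming in PHP", "Programming in "), "\n"; // PHP
echo strip_prefix("Learning PHP", "Programming in "), "\n";       // Learning PHP
```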

2. Handling Multibyte Strings with mb_substr()

substr() counts bytes, not characters, so on UTF-8 text it can split a multibyte character in the middle.

Example of substr() Failing with UTF-8

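$text = "こんにちは世界"; // "Hello, world" in Japanese
$sub = substr($text, 0, 4);

echo $sub; // Output: こ followed by an invalid byte (often displayed as �)
  • Each of these Japanese characters takes 3 bytes in UTF-8, so 4 bytes cover one full character plus the first byte of the next, which produces corrupted output. Even substr($text, 0, 3) returns only "こ", one character instead of three.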
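Comparing byte and character counts makes the problem visible. This sketch assumes the mbstring extension is installed (mb_strlen() and mb_check_encoding() come from it) and that the source file is saved as UTF-8.

```php
<?php
$text = "こんにちは世界";

$bytes = strlen($text);             // counts bytes
$chars = mb_strlen($text, "UTF-8"); // counts characters
$cut   = substr($text, 0, 4);       // one full character plus a 1-byte fragment

echo $bytes, "\n"; // 21
echo $chars, "\n"; // 7
var_dump(mb_check_encoding($cut, "UTF-8")); // bool(false): the fragment is not valid UTF-8
```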

Using mb_substr() for UTF-8 Safety

$text = "こんにちは世界";
$sub = mb_substr($text, 0, 3, "UTF-8");

echo $sub; // Output: こんに

Why Use mb_substr()?

✅ Correctly handles non-ASCII characters
✅ Avoids character corruption
✅ Supports multiple encodings
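A common use is shortening text for previews without splitting characters. truncate_utf8() below is a hypothetical helper, not a built-in. It counts characters rather than display width; mbstring's mb_strimwidth() is the built-in option when on-screen width matters.

```php
<?php
// Hypothetical helper: keep at most $max characters (not bytes) and
// append "…" only when something was actually removed.
function truncate_utf8(string $text, int $max): string
{
    if (mb_strlen($text, "UTF-8") <= $max) {
        return $text; // short enough; return unchanged
    }
    return mb_substr($text, 0, $max, "UTF-8") . "…";
}

echo truncate_utf8("こんにちは世界", 5), "\n"; // こんにちは…
echo truncate_utf8("Hello", 10), "\n";        // Hello
```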

3. Finding and Extracting Text Dynamically

Extracting from a Specific Word with strpos()

To extract text starting from a keyword:

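$text = "Order ID: 12345";
$pos = strpos($text, "ID:");

if ($pos !== false) {
    // Skip past "ID:" and the space that follows it
    $sub = substr($text, $pos + strlen("ID: "));
    echo $sub; // Output: 12345
}
  • strpos() finds the byte position of "ID:", or returns false if it is missing.
  • The !== false check matters because a match at position 0 is also falsy.
  • substr() extracts everything after the keyword and its trailing space.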
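The same strpos()/substr() pairing extends to text between two markers, using strpos()'s third argument to start searching after the opening marker. between() is a hypothetical helper name. Because strpos() and substr() both work in bytes, their offsets stay consistent even on UTF-8 input; mixing strpos() with mb_substr() would not be, and mb_strpos() is the matching function in that case.

```php
<?php
// Hypothetical helper: text between $open and $close, or null if a marker is missing.
function between(string $text, string $open, string $close): ?string
{
    $start = strpos($text, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);              // move past the opening marker
    $end = strpos($text, $close, $start); // look for $close only after it
    if ($end === false) {
        return null;
    }
    return substr($text, $start, $end - $start);
}

var_dump(between("Order [ID: 12345] shipped", "[ID: ", "]")); // string(5) "12345"
var_dump(between("No ID here", "[ID: ", "]"));               // NULL
```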

4. Extracting Text Using Regular Expressions

preg_match() can extract substrings using pattern-based rules.

Example: Extracting a Number from a String

$text = 'Price: $49.99';

if (preg_match('/\d+\.\d+/', $text, $matches)) {
    echo $matches[0]; // Output: 49.99
}
  • The regex pattern \d+\.\d+ matches decimal numbers.
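To pull out every match rather than only the first, preg_match_all() collects all of them, and a capture group isolates the part you want. The sample text here is illustrative.

```php
<?php
$text = 'Subtotal: $49.99, Shipping: $5.00, Tax: $4.13'; // illustrative input

// \$ matches a literal dollar sign; group 1 captures the amount after it.
$count = preg_match_all('/\$(\d+\.\d{2})/', $text, $matches);

echo $count, "\n";                     // 3
echo implode(", ", $matches[1]), "\n"; // 49.99, 5.00, 4.13
```

With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches (including "$") and $matches[1] holds only the captured amounts.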

Extracting Email Addresses

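$text = "Contact us at support@example.com";

if (preg_match('/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/', $text, $matches)) {
    echo $matches[0]; // Output: support@example.com
}
  • This pattern is a simplified approximation; it does not accept every address format the email standards allow.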
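Once an address has been extracted, strstr() and strrchr() can split it without another regex. This sketch assumes the string contains an "@"; both functions return false otherwise.

```php
<?php
$email = 'support@example.com';

$user   = strstr($email, '@', true);       // part before the first "@"
$domain = substr(strrchr($email, '@'), 1); // strrchr() returns "@example.com"; drop the "@"

echo $user, "\n";   // support
echo $domain, "\n"; // example.com
```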

5. Extracting First and Last Words

Extracting the First Word

$text = "Hello World from PHP";
$firstWord = strtok($text, " ");

echo $firstWord; // Output: Hello
  • strtok() efficiently extracts the first word.
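strtok() keeps internal state between calls: after the first call, passing only the delimiter returns the next token from the same string. Consecutive delimiters are skipped, so repeated spaces do not produce empty words; the extra spaces in this sample are intentional.

```php
<?php
$text = "Hello   World from PHP"; // note the repeated spaces

$words = [];
$token = strtok($text, " ");  // first call: pass the string
while ($token !== false) {    // strict check: a token like "0" is falsy
    $words[] = $token;
    $token = strtok(" ");     // later calls: delimiter only
}

echo implode("|", $words), "\n"; // Hello|World|from|PHP
```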

Extracting the Last Word

$text = "PHP is a great language";
$words = explode(" ", $text);

$lastWord = end($words);
echo $lastWord; // Output: language
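explode() builds an array of every word just to read the last one, and a trailing space makes end() return an empty string. An alternative, sketched here with a hypothetical last_word() helper, finds the last space with strrpos() and takes the rest with substr().

```php
<?php
// Hypothetical helper: last space-separated word, without building an array.
function last_word(string $text): string
{
    $text = rtrim($text);        // ignore trailing whitespace
    $pos  = strrpos($text, ' '); // byte position of the last space
    return $pos === false ? $text : substr($text, $pos + 1);
}

echo last_word("PHP is a great language"), "\n";  // language
echo last_word("PHP is a great language "), "\n"; // language
```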

6. Best Practices for Substring Extraction in PHP

✅ Use substr() for basic substring extraction when working with ASCII text.
✅ Use mb_substr() for UTF-8 and multibyte characters.
✅ Use strpos() with substr() to extract text dynamically.
✅ Use preg_match() for pattern-based substring extraction.
✅ Use explode() to split and extract words efficiently.

Conclusion

Extracting substrings is a crucial operation in PHP, and selecting the right method ensures accuracy, performance, and compatibility with different character encodings.

This guide covered:

  • Basic substring extraction using substr()
  • Handling multibyte characters with mb_substr()
  • Extracting text dynamically using strpos()
  • Using preg_match() for pattern-based extraction
  • Extracting first and last words with strtok() and explode()

By following best practices, you can safely manipulate strings in PHP while ensuring correct character handling for internationalization.
