If this error message doesn’t make much sense to you, you are probably missing some context, as I was before I ran into it.
The short version: your JSON contains some mangled emoji.

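When a JSON string uses \uXXXX escapes, each escape is one UTF-16 code unit. Valid Unicode characters beyond U+FFFF don’t fit in a single 16-bit unit, so they are split into two “surrogates”: a high surrogate in the range D800 to DBFF followed by a low surrogate in the range DC00 to DFFF.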
See https://stackoverflow.com/questions/66605467/how-does-utf-16-encoding-use-surrogate-code-points
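To make the split concrete, here is a minimal PHP sketch of the UTF-16 arithmetic for one emoji, U+1F602 (😂). Subtract 0x10000, then put the top 10 bits of the result into a high surrogate (0xD800 + bits) and the bottom 10 bits into a low surrogate (0xDC00 + bits). The helper name to_surrogate_pair() is made up for this example.

```php
<?php
// Hypothetical helper: split a code point above U+FFFF into a UTF-16
// surrogate pair, the same pair a JSON \u escape sequence would use.
function to_surrogate_pair(int $codePoint): array
{
    if ($codePoint < 0x10000 || $codePoint > 0x10FFFF) {
        throw new InvalidArgumentException('Expected a code point in U+10000..U+10FFFF');
    }
    $v = $codePoint - 0x10000;       // 20-bit value
    $high = 0xD800 | ($v >> 10);     // top 10 bits    -> 0xD800..0xDBFF
    $low = 0xDC00 | ($v & 0x3FF);    // bottom 10 bits -> 0xDC00..0xDFFF
    return [$high, $low];
}

[$high, $low] = to_surrogate_pair(0x1F602);  // U+1F602, 😂
printf("\\u%04X\\u%04X\n", $high, $low);     // prints \uD83D\uDE02
```

Running the steps in reverse is how a JSON parser rebuilds the emoji, which is why it needs both halves in the right order.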
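Examples in JSON (the // comments are annotations, not part of the JSON):
```
{"emoji": "\uD83D\uDE02"}        // high surrogate \uD83D followed by low surrogate \uDE02 (😂, U+1F602)
{"broken": "\uD83D"}             // high surrogate \uD83D is unpaired
{"also broken": "\uD83DDE02"}    // my case: the \u before DE02 is missing, leaving \uD83D unpaired
```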
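A quick way to see how PHP reacts to these three strings is to feed them to json_decode(). This sketch assumes PHP 7.0 or later, where the parser reports unpaired surrogates with the JSON_ERROR_UTF16 code; the loop and the output labels are only for illustration.

```php
<?php
// The three samples from above (single quotes keep \u as literal text).
$samples = [
    '{"emoji": "\uD83D\uDE02"}',
    '{"broken": "\uD83D"}',
    '{"also broken": "\uD83DDE02"}',
];

foreach ($samples as $sample) {
    json_decode($sample);
    if (json_last_error() === JSON_ERROR_NONE) {
        echo "OK:     $sample\n";
    } elseif (json_last_error() === JSON_ERROR_UTF16) {
        // PHP 7.0+ reports unpaired surrogates with this error code
        echo "UTF-16: $sample (" . json_last_error_msg() . ")\n";
    } else {
        echo "Other:  $sample (" . json_last_error_msg() . ")\n";
    }
}
```

On PHP 7.0 or later I would expect only the first sample to decode; the other two should land in the JSON_ERROR_UTF16 branch, which only tells you that the document is bad, not where.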
You can detect these unpaired or misencoded surrogates with a regular expression and preg_match_all():
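```php
<?php
$json = file_get_contents('https://example.com/bad.json');
if ($json === false) {
    exit("Could not read the JSON source.\n");
}

// Matches a high surrogate (\uD800-\uDBFF) not followed by a low one,
// or a low surrogate (\uDC00-\uDFFF) not preceded by a high one.
$pattern = '/\\\uD[89AB][0-9A-F]{2}(?!\\\uD[CDEF][0-9A-F]{2})'
         . '|(?<!\\\uD[89AB][0-9A-F]{2})\\\uD[CDEF][0-9A-F]{2}/i';

if (preg_match_all($pattern, $json, $matches, PREG_OFFSET_CAPTURE)) {
    foreach ($matches[0] as [$match, $offset]) {
        // $offset is a byte offset into $json
        echo "Invalid Unicode sequence: $match at position $offset\n";
    }
} else {
    echo "No invalid surrogate pairs found.\n";
}
?>
```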
The reported positions should give you enough information to track down the error in the JSON.
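The positions from preg_match_all() with PREG_OFFSET_CAPTURE are byte offsets, which can be awkward to jump to in a large multi-line file. A small helper can turn an offset into a line and column. offset_to_line_col() is a hypothetical name, the sample JSON is made up, and the sketch assumes PHP 7.1 or later for the [$a, $b] destructuring.

```php
<?php
// Hypothetical helper: convert a byte offset into a 1-based line and column.
// The column counts bytes, so it can differ from an editor's column on
// lines that contain multi-byte UTF-8 characters.
function offset_to_line_col(string $text, int $offset): array
{
    $before = substr($text, 0, $offset);
    $line = substr_count($before, "\n") + 1;
    $lastNewline = strrpos($before, "\n");
    $column = ($lastNewline === false) ? $offset + 1 : $offset - $lastNewline;
    return [$line, $column];
}

// Small multi-line sample with an unpaired high surrogate on line 3.
$json = implode("\n", ['{', '  "a": "ok",', '  "b": "\uD83D"', '}']);
$pattern = '/\\\uD[89AB][0-9A-F]{2}(?!\\\uD[CDEF][0-9A-F]{2})/i';

if (preg_match_all($pattern, $json, $matches, PREG_OFFSET_CAPTURE)) {
    foreach ($matches[0] as [$match, $offset]) {
        [$line, $column] = offset_to_line_col($json, $offset);
        // prints: Invalid Unicode sequence: \uD83D at line 3, column 9
        echo "Invalid Unicode sequence: $match at line $line, column $column\n";
    }
}
```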
References:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-encoding-introduction