If this doesn’t make much sense, you are probably missing some context, as I was before I ran into this error.
The short version is that you have some mangled emoji in your JSON.

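Unicode characters beyond U+FFFF do not fit in a single 16-bit \u escape, so JSON (like UTF-16) splits each one into two “surrogates”: a high surrogate in the range \uD800-\uDBFF followed by a low surrogate in the range \uDC00-\uDFFF. Either half on its own is invalid.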
See https://stackoverflow.com/questions/66605467/how-does-utf-16-encoding-use-surrogate-code-points
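
To make the split concrete, here is a minimal sketch of the arithmetic that turns a code point above U+FFFF into its two surrogates. The function name surrogatePair is my own and purely illustrative; it is not part of PHP.

```php
<?php
// Hypothetical helper (not part of PHP): returns the JSON \u escapes
// for a code point above U+FFFF.
function surrogatePair(int $codePoint): string
{
    if ($codePoint < 0x10000 || $codePoint > 0x10FFFF) {
        throw new InvalidArgumentException('Code point must be in U+10000..U+10FFFF');
    }
    $offset = $codePoint - 0x10000;       // leaves a 20-bit value
    $high = 0xD800 + ($offset >> 10);     // top 10 bits
    $low  = 0xDC00 + ($offset & 0x3FF);   // bottom 10 bits
    // Single quotes keep \u as a literal backslash followed by "u".
    return sprintf('\u%04X\u%04X', $high, $low);
}

echo surrogatePair(0x1F602), "\n"; // \uD83D\uDE02
```

U+1F602 is the "face with tears of joy" emoji, which is the pair used in the JSON examples in this post.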

Example in JSON:

{"emoji": "\uD83D\uDE02"}  // High surrogate \uD83D followed by low surrogate \uDE02

{"broken": "\uD83D"}  // High surrogate \uD83D is unpaired

or in my case,

{"also broken": "\uD83DDE02"}  // The \u before the low surrogate DE02 is missing, so \uD83D is unpaired
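
A quick way to see how PHP reacts to each of these is to run them through json_decode and look at json_last_error_msg(). This is only a sketch: the helper name tryDecode is made up for illustration, and the inline // comments from the examples are left out because JSON itself does not allow comments.

```php
<?php
// Hypothetical helper: decode a JSON string and report the outcome.
function tryDecode(string $json): string
{
    json_decode($json);
    return json_last_error() === JSON_ERROR_NONE
        ? 'OK'
        : json_last_error_msg();
}

// Single quotes keep the backslashes literal, so json_decode sees the escapes.
$samples = [
    '{"emoji": "\uD83D\uDE02"}',
    '{"broken": "\uD83D"}',
    '{"also broken": "\uD83DDE02"}',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', tryDecode($sample), "\n";
}
```

The first sample should decode cleanly, while the other two should fail with the JSON_ERROR_UTF16 error code, which is the "single unpaired UTF-16 surrogate" error this post is about.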

You can detect these unpaired or mis-encoded high surrogates with a regular expression and preg_match_all:

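<?php
$json = file_get_contents('https://example.com/bad.json');
if ($json === false) {
    exit("Could not read the JSON source.\n");
}

// A high surrogate escape (\uD800-\uDBFF) that is not immediately followed
// by a low surrogate escape (\uDC00-\uDFFF). The /i flag accepts lowercase hex.
$pattern = '/\\\\uD[89AB][0-9A-F]{2}(?!\\\\uD[CDEF][0-9A-F]{2})/i';

if (preg_match_all($pattern, $json, $matches, PREG_OFFSET_CAPTURE)) {
    foreach ($matches[0] as [$match, $offset]) {
        // $offset is a byte offset into $json.
        echo "Invalid Unicode sequence: $match at position $offset\n";
    }
} else {
    echo "No invalid surrogate pairs found.\n";
}
?>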

The reported position is a byte offset into the JSON string, which should give you enough information to find the error in the JSON.
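
If the file is large, a raw offset can still be awkward to work with. One possible follow-up, sketched here under the assumption that the JSON uses \n line endings, is to turn the offset into a line and column. offsetToLineCol is a name I made up for this example.

```php
<?php
// Hypothetical helper: convert a byte offset into a 1-based [line, column].
// Columns are counted in bytes, which matches PREG_OFFSET_CAPTURE offsets.
function offsetToLineCol(string $text, int $offset): array
{
    $before = substr($text, 0, $offset);
    $line = substr_count($before, "\n") + 1;
    $lastNewline = strrpos($before, "\n");
    $column = $lastNewline === false ? $offset + 1 : $offset - $lastNewline;
    return [$line, $column];
}

// A small three-line document with an unpaired high surrogate on line 2.
$json = "{\n" . '  "broken": "\uD83D"' . "\n}";
[$line, $column] = offsetToLineCol($json, strpos($json, '\u'));
echo "Line $line, column $column\n"; // Line 2, column 14
```

Most editors can jump straight to a line and column, which is usually quicker than counting bytes.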

References:

https://stackoverflow.com/questions/47856350/json-decode-produces-error-single-unpaired-utf-16-surrogate-in-unicode-escape

https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-encoding-introduction
