To explain this a bit more to those who have asked:
The documentation for CAST & CONVERT says:
Starting with SQL Server 2012 (11.x), when using supplementary character (SC) collations, a CAST operation from nchar or nvarchar to an nchar or nvarchar type of smaller length won't truncate inside a surrogate pair. Instead, the operation truncates before the supplementary character.
Under an SC collation, each character has a variable byte length. For example, N'X' takes 2 bytes, while N'😎' uses 4 bytes.
The string N'X😎' (both characters together) requires 2 4 = 6 bytes of storage, corresponding to nvarchar(3). The number in brackets is the number of byte-pairs, not the number of characters. Three byte-pairs = 2 * 3 = 6 bytes. Yes, it's confusing but that's how it is.
-- Returns 6 (bytes) and 2 (characters)
DECLARE
@y nvarchar(3) = N'X😎';
SELECT DATALENGTH(
@y), LEN(
@y);
If we convert the string to a smaller type, truncation occurs:
-- Returns "X", 2 (bytes), and 1 (character)
DECLARE
@y nvarchar(3) = N'X😎';
SELECT
CONVERT(nvarchar(2),
@y),
DATALENGTH(CONVERT(nvarchar(2),
@y)),
LEN(CONVERT(nvarchar(2),
@y));
This is the correct behaviour. Truncation stores as many whole characters from the string as will fit in the destination type. nvarchar(2) offers 2 * 2 = 4 bytes, which is only enough to store the first character, N'X'. There is not enough space left to store the 4 bytes of N'😎'.
The bug occurs when the source is a LOB type, the destination is nvarchar (or nchar), truncation is needed, and the truncation point occurs within a (variable length) character.
Using the same example with a LOB source:
-- Returns "X�", 4 (bytes), and 2 (characters)
DECLARE
@y nvarchar(max) = N'X😎';
SELECT
CONVERT(nvarchar(2),
@y),
DATALENGTH(CONVERT(nvarchar(2),
@y)),
LEN(CONVERT(nvarchar(2),
@y));
The funny-looking second character is the first 2 bytes of N'😎' (which requires 4 bytes). Conversion has split this character in half! This is the bug.