The Length Limitations of LEN()?

I was using LEN() to troubleshoot an issue I was having with a dynamically constructed string truncating while inserting into an NVARCHAR(MAX) column.  Since I know that NVARCHAR(MAX) has a 2 GB limit (goodness only knows how many characters that is!),  I couldn’t explain the truncation.  A colleague suggested doing a test with another dynamically constructed string.  Maybe then, I could find where the cutoff was occurring.

Great idea!

So, I came up with a plan.  I created a large string using the REPLICATE function to save myself the trouble of testing strings of different lengths.  I first checked the length of the output of the function, then inserted the output of that into a temp table, and from there, into a table.  Why, you ask?  Because it closely mirrors what I needed to troubleshoot when I was needing to dynamically construct a string, hold it in memory and then cursor through it and insert into a table.  (Before you ask why in the world I was doing that – yes, it’s an icky process.  Trust me; I wouldn’t do it if I didn’t have a good reason.)

But I digress.

After the inserts are done, I checked the length of the string passed to the temp table and compare it to the insert into the table to see if I could find where the truncation might occur.  Here is the code:

USE AdventureWorks2012;

CREATE TABLE dbo.LenTest (FinishedScript NVARCHAR(MAX));
GO

TRUNCATE TABLE dbo.LenTest;
DECLARE @FinScriptSQL NVARCHAR(MAX);
DECLARE @StringName NVARCHAR(MAX) =
        (
            SELECT 'W' + REPLICATE('h', 7996) + 'en!'
        );
DECLARE @FinScriptName NVARCHAR(MAX) = @StringName;

SELECT LEN(@FinScriptName) AS InitialLengthFromFunction;

DROP TABLE IF EXISTS #t;

CREATE TABLE #t
(
    FinishedScript NVARCHAR(MAX)
);

INSERT INTO #t
(
    FinishedScript
)

SELECT @FinScriptName AS BeginningScriptLength;

SET @FinScriptSQL
    = N'INSERT INTO [dbo].[LenTest]
                                        SELECT FinishedScript
										FROM #t;';
PRINT @FinScriptSQL;
EXECUTE sp_executesql @FinScriptSQL;

SELECT LEN(FinishedScript) AS TempTableScriptLength
FROM #t;

SELECT *
FROM [dbo].[LenTest];

SELECT LEN(FinishedScript) AS FinScriptLengthInTable
FROM [dbo].[LenTest];

Then I took the FinishedScript from the table and copied and pasted it to another window to ensure that it had inserted in its entirety.  I commented out the CREATE TABLE after the first run, of course!

And it was great.  I ran it over and over, increasing the number passed to the REPLICATE() function by 1000 each time.  When I got to 8000, data truncation started, so I backtracked to find where the cutoff was.  I found it when I passed 7996 to the REPLICATE function.  That ran okay. Notice the character length:

Just to prove that it printed in its entirety!

And now, we come to the weird part.  Any number over 7996 passed to the REPLICATE() function still returns 8000 characters when measured by the LEN() function!

Printing the output when I pass 7997 as the number to REPLICATE(). Notice the exclamation point is now gone.

Even stranger – the data truncates at different points, though it shows the same number of characters!

Output when I pass 9000 to the REPLICATE() function – and a different cutoff point. No “en!” at the end now.

You might ask my version and maximum characters retrieved settings. I ran this on SQL Server 2016, and the max characters retrieved was wide open: 65535 for non XML data, and unlimited for XML data.

This is my message in a bottle.  Has anyone else had experience with this? 

Sounds like a job for SQLKiwi to me….

7 comments

  1. So, my very brilliant colleague just solved it. “Note: IF string_expression is of type varchar(max) or nvarchar(max), REPLICATE truncates the return value at 8,000 bytes. To return values greater than 8,000 bytes, string_expression must be explicitly cast to the appropriate large-value data type.”

    #TodayILearned

    I will have to come up with a different test to troubleshoot my original issue, which does not use REPLICATE. Because that fix would have been way too easy.

    Like

    • Your test aslo demonstrates one additional fact : not only that the REPLICATE function limits its own output, it also limits the length of the whole string resulting from this operation, including the characters added at the beginning and end of string (SELECT ‘W’ + REPLICATE(‘h’, 7997) + ‘en!’) to 8000.

      DECLARE @StringName NVARCHAR(MAX) =
      (
      SELECT ‘W’ + REPLICATE(‘h’, 7997) + ‘en!’
      );
      DECLARE @mystring NVARCHAR(max)
      SET @mystring=@StringName+’asdfghj’

      SELECT LEN(@StringName)
      SELECT LEN(@mystring)
      SELECT RIGHT(@StringName,20)
      SELECT RIGHT(@mystring,20)

      If you change the first statement to
      DECLARE @StringName NVARCHAR(MAX) =
      (
      SELECT ‘W’ + REPLICATE(‘h’, 8997) + CAST(‘en!’ as VARCHAR(MAX))
      );
      limitation will apply only to the result of REPLICATE, not to the whole result string.

      Maybe your original problem is caused by something similar – some operation during dynamic construction, that causes the length to be limited, although it doesn’t seem at first sight that it should happen… some implicit conversion during string construction.

      Like

  2. I believe it has something to do with using nvarchar(8000) then if it exceeds the 8K, converting the string to a varbinary outside of the records like TEXT data. So, the LEN() function is reading it as text until……

    Like

    • It seems to have more to do with the limitations of the REPLICATE() function, which is capped at 8000. I didn’t realize that when I was setting up the test. A coworker found out about it and let me know. And thus we learn!

      Like

  3. Hello Pamela. This issue has nothing to do with `REPLICATE` in particular. The issue is the default datatype for string literals. When a query is parsed, any literal values — string, numeric, date, etc — need to be interpreted as being what you are (hopefully) intending them to be. For example, a number like 1234 is interpreted as being an INT. Similarly with strings, a literal without the “N” prefix, and between 0 and 8000 characters (well, technically bytes, but that is a technical detail that would only distract at this point) is interpreted as being VARCHAR(length of literal). A string literal _with_ the “N” prefix, and between 0 and 4000 characters (again, technically bytes, but for all practical purposes it’s ok to think in terms of characters) is interpreted as being NVARCHAR(length of literal). Along these lines, string literals that go over those maximums (i.e. the 8000 and 4000) will be interpreted as being a MAX type.

    With that in mind, operations generally return the datatype passed in. In both cases you are seeing — REPLICATE and string concatenation — you are (or were) only working with non-MAX types since the literals were under those limits. It doesn’t matter if you are trying to store the result of those operations in a MAX type as the destination only matters if it can’t contain it, can’t implicitly convert to it, or collation doesn’t match and can’t be coerced. The fix for REPLICATE is easier to see/understand as it’s a single value: just convert to a MAX type:

    SELECT REPLICATE(CONVERT(NVARCHAR(MAX), N’a’), 5000);

    When concatenating, you just need to make one of the input pieces a MAX type. Datatype precedence will then take over and convert the rest of the non-MAX literals into the appropriate MAX type (or even datatype if you are mixing NVARCHAR and VARCHAR literals: the result will be NVARCHAR). For example:

    “`sql
    SELECT REPLICATE(N’a’, 4000) + CONVERT(NVARCHAR(MAX), N’B’), — result ends with “B”
    REPLICATE(N’c’, 4000) + N’D’; — result ends with “c”
    “`

    The following illustrates how various literals are interpreted when the query is parsed (no example for MAX types as they are not supported by SQL_VARIANT — passing in N’chars’ with 4001 characters will error telling you that NVARCHAR(MAX) is not supported):

    “`sql
    SELECT SQL_VARIANT_PROPERTY(123, ‘BaseType’) AS [NumberLiteral 1], — int
    SQL_VARIANT_PROPERTY(123.4, ‘BaseType’) AS [NumberLiteral 2], — numeric
    SQL_VARIANT_PROPERTY(‘Hello’, ‘BaseType’) AS [StringLiteral 1a], — varchar
    SQL_VARIANT_PROPERTY(‘Hello’, ‘maxlength’) AS [StringLiteral 1b], — 5
    SQL_VARIANT_PROPERTY(N’Hello’, ‘BaseType’) AS [StringLiteral 2a], — nvarchar
    SQL_VARIANT_PROPERTY(N’Hello’, ‘maxlength’) AS [StringLiteral 2b]; — 10 (due to NVARCHAR = UTF-16 = 2-byte code units)
    “`

    The following post of mine might help answer questions related to character data in SQL Server:

    https://sqlquantumleap.com/2019/11/22/how-many-bytes-per-character-in-sql-server-a-completely-complete-guide/

    Take care,
    Solomon…

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s