Skip to content

Read only sequence#3112

Merged
mgravell merged 22 commits into
StackExchange:mainfrom
pairbit:ReadOnlySequence
Jun 25, 2026
Merged

Read only sequence#3112
mgravell merged 22 commits into
StackExchange:mainfrom
pairbit:ReadOnlySequence

Conversation

@pairbit

@pairbit pairbit commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

No description provided.

@mgravell

mgravell commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

OK, I see what you're doing here; it is actually quite nice, I have to admit. There is a small trade-off in needing to recompute the combined sequence (walking the segments), but IMO that is very acceptable to avoid growing the struct, and if you have a long chain, you have a long chain - this won't be the only thing impacted.

So in principal: I'm OK with this; there's a lot of checking needed to make sure we've covered every method that might access the data. The other thought is: right now Sequence actually means ByteSequence - I think that's the only case we realistically need to cover, but technically ROS<char> would be valid. Only mentioning in case we ever need to disambiguate. I think we should not attempt to support both for now.

Thanks for remembering to check for single-segment chains, and the checked. Technically I think we could support uint, but whether that is adviseable is questionably - probably int is fine; however, IMO this OverflowException error will be confusing to the caller - IMO we should do an explicit length check and on failure: call a throw-helper that explains the problem.

But overall: yeah, this looks interesting.

(it is nice but surprisingly rare to see someone who understand ROS chains!)

Comment thread src/StackExchange.Redis/ReadOnlySequenceExtensions.cs Outdated
Comment thread src/StackExchange.Redis/ValueCondition.cs Outdated
@pairbit

pairbit commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

So in principal: I'm OK with this; there's a lot of checking needed to make sure we've covered every method that might access the data. The other thought is: right now Sequence actually means ByteSequence - I think that's the only case we realistically need to cover, but technically ROS<char> would be valid. Only mentioning in case we ever need to disambiguate. I think we should not attempt to support both for now.

I only use byte for buffers. So I'll never need ROS<char>.

Thanks for remembering to check for single-segment chains, and the checked. Technically I think we could support uint, but whether that is adviseable is questionably - probably int is fine; however, IMO this OverflowException error will be confusing to the caller - IMO we should do an explicit length check and on failure: call a throw-helper that explains the problem.

I agree that overflow checking is missing.
As far as I know, the maximum length of a value in redis is 512 MB in bytes.
The int type for length is sufficient.

(it is nice but surprisingly rare to see someone who understand ROS chains!)

Thank you for appreciating my idea. In my work, I specialize in memory optimization.

@mgravell

mgravell commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

As far as I know, the maximum length of a value in redis is 512 MB in bytes.

yep, verified (citation)

@mgravell

Copy link
Copy Markdown
Collaborator

I specialize in memory optimization.

oh, you should be happy when I'm done with the 3.x bits; in the PoC, the memory-graph is a horizontal line while running at full throughput

@pairbit

pairbit commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

@mgravell Why is the GetHashCode method not implemented for the ByteArray and MemoryManager types? Creating a string is not efficient.

private static int GetHashCode(RedisValue x)
{
    x = x.Simplify();
    return x.Type switch
    {
        StorageType.Null => -1,
        StorageType.Double => x.OverlappedValueDouble.GetHashCode(),
        StorageType.Int64 or StorageType.UInt64 => x._valueInt64.GetHashCode(),
        StorageType.String => x.RawString().GetHashCode(),
        _ => ((string)x!).GetHashCode(), // to match equality
    };
}

@pairbit pairbit marked this pull request as ready for review June 24, 2026 21:04
@mgravell

Copy link
Copy Markdown
Collaborator

GetHashCode(): fair, probably oversight. Slightly complicated by the tricky equality rules; one for the Todo pile!

@pairbit

pairbit commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

I have fully implemented everything I planned to support ROS<byte>. Except for the SequenceCompareTo method, which I have commented out (I don't have a verified implementation of this method. I'll add it later). I have also implemented GetHashCode and added other optimizations. Everything should work as expected. Please check it thoroughly.

@pairbit

pairbit commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

I've come to the conclusion that we don't need ROS for internal use. We can use an ROSI, which will iterate through all the segments once. What do you think about this?

internal ref struct ReadOnlySequenceIterator<T>(ReadOnlySequenceSegment<T> segment, int startIndex, int length)

@mgravell

mgravell commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

I've come to the conclusion that we don't need ROS for internal use.

Agreed; most consumption scenarios will just want to walk the segments, which we can do without having to build a ROS - we just need to be careful to track the length; I wonder whether the actual thing to do here is to write a custom iterator so we can do foreach(var chunk in segment.Iterate(offset, length)); something like:

internal static class SequenceSegmentIterator
{
    public static ChunkEnumerator Iterate(
        this ReadOnlySequenceSegment<byte> segment, int offset, int length)
        => new(segment, offset, length);

    internal struct ChunkEnumerator
    {
        private ReadOnlySequenceSegment<byte>? _segment; // current node, null when exhausted
        private int _offset;                             // start offset within _segment
        private int _remaining;                          // total bytes still to yield
        private ReadOnlyMemory<byte> _current;

        public ChunkEnumerator(ReadOnlySequenceSegment<byte> segment, int offset, int length)
        {
            _segment = segment;
            _offset = offset;
            _remaining = length;
            _current = default;
        }

        // 'this' is returned by value, so each foreach gets a fresh, independent cursor.
        public readonly ChunkEnumerator GetEnumerator() => this;

        public readonly ReadOnlyMemory<byte> Current => _current;

        public bool MoveNext()
        {
            while (_segment is not null && _remaining > 0)
            {
                ReadOnlyMemory<byte> mem = _segment.Memory;

                if (_offset < mem.Length) // skips to the first offset and skips empty segments
                {
                    int take = Math.Min(mem.Length - _offset, _remaining);
                    _current = mem.Slice(_offset, take);
                    _remaining -= take;
                    _offset = 0;                  // only the first chunk is offset
                    _segment = _segment.Next;     // advance for the next call
                    return true;
                }

                // offset spills past this (or an empty) segment — skip it
                _offset -= mem.Length;
                _segment = _segment.Next;
            }

            _current = default;
            return false;
        }
    }
}

thoughts?

@pairbit

pairbit commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

My option is without Current Memory:

internal struct ReadOnlySequenceIterator<T>(ReadOnlySequenceSegment<T> segment, int startIndex, int length)
{
    private int _index = startIndex;
    private int _length = length;
    private ReadOnlySequenceSegment<T> _segment = segment;

    public readonly int Length => _length;

    public bool TryNext(out ReadOnlyMemory<T> memory)
    {
        if (_length > 0)
        {
            memory = _segment.Memory;

            // first
            if (_index > 0)
            {
                memory = memory.Slice(_index);
                _index = 0;
            }

            // end
            if (_length <= memory.Length)
            {
                memory = memory.Slice(0, _length);
                _length = 0;
                _segment = null!;
            }
            else
            {
                _length -= memory.Length;
                _segment = _segment.Next ?? throw new InvalidOperationException("EndSegment is null");
            }
            return true;
        }
        memory = default;
        return false;
    }
}

@mgravell

Copy link
Copy Markdown
Collaborator

ah, fine; same concept, different implementation - yours is fine; we might want to ensure we skip empty chunks (rare but technically legal), but otherwise: fine

the hash approach is broken, however; we need to ensure that 10, "10", "10"u8 and (now) a ROS of "1"u8, "0"u8 are all "equal"

I'll see if I can fix both of these and push to the branch

@mgravell

Copy link
Copy Markdown
Collaborator

also: TryParse missing logic; GetCharCount doesn't correctly handle code-points that straddle chunks

@pairbit

pairbit commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

TryParse missing logic;

is it necessary to support numbers in Sequence?

@mgravell

Copy link
Copy Markdown
Collaborator

yes, but I'm on it; the fundamental principle here is that RedisValue should behave the same regardless of how the equivalent value is expressed; the "simplify" logic waves away a lot of this, but...

Re EncodingExtensions: I'm going to lean on dotnet/runtime's logic for both GetChars and GetCharCount: https://github.com/dotnet/dotnet/blob/b0f34d51fccc69fd334253924abd8d6853fad7aa/src/runtime/src/libraries/System.Memory/src/System/Text/EncodingExtensions.cs#L293C13-L321C59

- GetChars/GetCharCount
- Simplify
- normalize .First.Span
@mgravell

mgravell commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator
  • GetHashCode()
  • GetChars/GetCharCount
  • Simplify
  • normalize FirstSpan vs First.Span usage via extension property
    • fixed a side bug found in the cycle-buffer logic!
  • TryParse
  • CompareTo
  • dead WriteUnifiedSequence
  • integration test

@mgravell

Copy link
Copy Markdown
Collaborator

OK, I think that's everything except for a meaningful integration test, i.e. can you do the thing you actually want to do! adding that now

@mgravell

Copy link
Copy Markdown
Collaborator

OK, I think that should do it; any thoughts or other input? I hope I haven't stomped your PR too much!

@mgravell

Copy link
Copy Markdown
Collaborator

I think the remaining gap is GetHashCode - I backed out the changes, as they were fundamentally broken; I need to think what to do there. I agree the string approach is inefficient, but: people usually aren't using BLOB-based RedisValue as keys. If we want to implement it properly, the important thing is that it is consistent between the string and BLOB variants. Let me see whether we can hijack the runtime's GetHashCode() logic against a span, which might light up some options.

@mgravell

mgravell commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

OK, I've improved GetHashCode to not allocate a string in the BLOB cases on modern .NET; we still decode as though it were a string, but: into a lease. Not perfect, but less sucky.

@pairbit

pairbit commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

OK, I think that should do it; any thoughts or other input?

I think that's enough. In the future, we may replace RawSequence() with RawSequenceIterator() in other places, but this is not a critical issue at the moment.

I hope I haven't stomped your PR too much!

I am very pleased that you have approved Sequence support, and I am glad that you have found a critical bug in CycleBuffer.

OK, I've improved GetHashCode to not allocate a string in the BLOB cases; we still decode as though it were a string, but: into a lease. Not perfect, but less sucky.

In what cases does the GetHashCode method of RedisValue get called?

@pairbit

pairbit commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Why not read the string as bytes and hash the bytes?

@mgravell

mgravell commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

In what cases does the GetHashCode method of RedisValue get called?

Scenarios like if a caller creates a Dictionary<RedisValue, whatever> and uses it as the key. If you're not doing that: it doesn't matter. The key thing we need to guard against here is, to illustrate:

RedisValue orig = "some random string here";
dict.Add(orig, someState); // a RedisValue on top of a string
db.SetAsync(key, orig);
...
var fromDb = await db.GetAsync(key); // probably BLOB based
var found = dict[fromDb]; // THIS MUST MATCH, assuming the value was the BLOB of orig

@pairbit

pairbit commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

In what cases does the GetHashCode method of RedisValue get called?

Scenarios like if a caller creates a Dictionary<RedisValue, whatever> and uses it as the key. If you're not doing that: it doesn't matter. The key thing we need to guard against here is, to illustrate:

RedisValue orig = "some random string here";
dict.Add(orig, someState); // a RedisValue on top of a string
db.SetAsync(key, orig);
...
var fromDb = await db.GetAsync(key); // probably BLOB based
var found = dict[fromDb]; // THIS MUST MATCH, assuming the value was the BLOB of orig

Thanks for the clarifications. I thought that the SE.Redis library uses GetHashCode.

@mgravell

Copy link
Copy Markdown
Collaborator

OK, we're good everywhere except Windows; I'm going to go ahead and merge this in - I have some follow-up ideas that I'd like to advance.

Great suggestion, PR, and collaboration.

@mgravell mgravell merged commit 16bde35 into StackExchange:main Jun 25, 2026
5 of 6 checks passed
@pairbit pairbit deleted the ReadOnlySequence branch June 26, 2026 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants