CUTRACE

case04_bmm_sparse

kernel: searchEndMatrixIndicesCudaKernel
thread: block=(0,0,0), thread=(0,0,0)
  • The left pane keeps the same source window across all slides.
  • The right pane mixes replay-watch results with recovered source-level values.
  • Use the chain to explain how the search moves from midIdx = 0 to indices1D[-1].
line 44

The Search Starts by Moving Left

trimRight = (midVal > targetMatNum)
targetMatNum = 0
midIdx = 0
midVal = 1
trimRight = true

The replay-watch log does not print midVal = 1. The value is recovered from the input indices, whose logical contents are [1], and the pre-loop midpoint midIdx = 0. The log directly confirms only the comparison result at line 44.
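The recovery of midVal can be reproduced on the host. The sketch below is not the kernel source. The names recoverFirstComparison and FirstComparison, the int64_t element type, and the pre-loop initialization (startIdx = 0, endIdx = numElements - 1, midpoint computed with the same expression as line 50) are assumptions made for illustration. Only the comparison at line 44 is taken from the slides.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Values recovered for the first comparison (line 44).
struct FirstComparison {
    std::int64_t midIdx;
    std::int64_t midVal;
    bool trimRight;
};

// Host-side sketch. Assumed, not confirmed by the slides: int64_t elements,
// startIdx = 0 and endIdx = numElements - 1 before the loop, and a pre-loop
// midpoint computed like line 50. The caller must pass numElements >= 1 and a
// buffer that covers every strided element.
FirstComparison recoverFirstComparison(const std::vector<std::int64_t>& indices1D,
                                       std::int64_t indices1DStride,
                                       std::int64_t numElements,
                                       std::int64_t targetMatNum) {
    const std::int64_t startIdx = 0;
    const std::int64_t endIdx = numElements - 1;
    const std::int64_t midIdx = (startIdx + endIdx) >> 1;
    const std::int64_t midVal =
        indices1D[static_cast<std::size_t>(midIdx * indices1DStride)];
    const bool trimRight = (midVal > targetMatNum);  // line 44
    return {midIdx, midVal, trimRight};
}
```

Calling recoverFirstComparison with indices1D = {1}, indices1DStride = 1, numElements = 1, and targetMatNum = 0 reproduces the values on this slide: midIdx = 0, midVal = 1, trimRight = true.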

line 48

The Right Bound Drops Below Zero

endIdx = trimRight ? midIdxMinus1 : endIdx;
trimRight = true
midIdxMinus1 = -1
old endIdx = 0
new endIdx = -1

Both values come directly from replay-watch: midIdxMinus1 = -1 at line 45 and endIdx = -1 at line 48. The invalid state already exists at this point, because the right bound is now below zero.
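The update at lines 45 and 48 can be sketched on the host as follows. The helper names trimRightBound and rangeIsValid are hypothetical, int64_t operands are assumed, and the validity test is an assumption about what a well-formed search range means here, not a quote from the kernel.

```cpp
#include <cstdint>

// Lines 45 and 48: compute midIdx - 1 and conditionally move the right bound.
std::int64_t trimRightBound(bool trimRight, std::int64_t midIdx, std::int64_t endIdx) {
    const std::int64_t midIdxMinus1 = midIdx - 1;  // line 45
    return trimRight ? midIdxMinus1 : endIdx;      // line 48
}

// Assumed invariant: a searchable range starts at or after element 0 and is
// non-empty, i.e. 0 <= startIdx <= endIdx.
bool rangeIsValid(std::int64_t startIdx, std::int64_t endIdx) {
    return startIdx >= 0 && startIdx <= endIdx;
}
```

With trimRight = true, midIdx = 0, and endIdx = 0, trimRightBound returns -1, and rangeIsValid(0, -1) is false.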

line 50

The Midpoint Recomputes to -1

midIdx = (startIdx + endIdx) >> 1;
startIdx = 0
endIdx = -1
startIdx + endIdx = -1
midIdx = -1

The replay-watch log directly prints startIdx + endIdx = -1 at line 50. The next midpoint, midIdx = -1, is recovered from that expression: an arithmetic right shift of -1 by one bit yields -1, not 0.
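The reason the shift yields -1 rather than 0 can be checked with a small host-side comparison. This is an illustrative sketch: both function names are hypothetical and int64_t operands are assumed. The sign-preserving behavior of >> on negative values is guaranteed from C++20 on and is implementation-defined in earlier standards, although mainstream compilers behave the same way.

```cpp
#include <cstdint>

// Midpoint as written at line 50: the shift rounds toward negative infinity.
std::int64_t midpointShift(std::int64_t startIdx, std::int64_t endIdx) {
    return (startIdx + endIdx) >> 1;
}

// For contrast only: integer division truncates toward zero.
std::int64_t midpointDivide(std::int64_t startIdx, std::int64_t endIdx) {
    return (startIdx + endIdx) / 2;
}
```

With startIdx = 0 and endIdx = -1, midpointShift returns -1 while midpointDivide returns 0. For non-negative sums the two agree.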

line 51

The Next Load Becomes indices1D[-1]

midVal = indices1D[midIdx * indices1DStride];
midIdx = -1
indices1DStride = 1

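With midIdx = -1 and indices1DStride = 1, the next read is indices1D[-1]. Compute Sanitizer reports the same failure as an "Invalid __global__ read" that lands 8 bytes before the nearest allocation, which is consistent with a read one 8-byte element before the start of the buffer.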
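The 8-byte figure can be related to the index with a short host-side calculation. The sketch assumes 8-byte (int64_t) elements, which the slides do not state. loadByteOffset and loadIsInBounds are hypothetical helpers, and the bounds test illustrates one possible guard rather than a fix taken from the source.

```cpp
#include <cstdint>

// Byte offset of indices1D[midIdx * indices1DStride] relative to the start of
// the buffer, assuming 8-byte int64_t elements.
std::int64_t loadByteOffset(std::int64_t midIdx, std::int64_t indices1DStride) {
    return midIdx * indices1DStride * static_cast<std::int64_t>(sizeof(std::int64_t));
}

// One possible guard: only load when the logical index lies in [0, numElements).
bool loadIsInBounds(std::int64_t midIdx, std::int64_t numElements) {
    return midIdx >= 0 && midIdx < numElements;
}
```

For midIdx = -1 and indices1DStride = 1, loadByteOffset returns -8, and loadIsInBounds(-1, 1) is false, so a guarded kernel would skip the load at line 51.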