Skip to content

HBASE-30038: RefCnt Leak error when caching#7995

Open
dParikesit wants to merge 5 commits intoapache:masterfrom
dParikesit:5-RefCnt-release
Open

HBASE-30038: RefCnt Leak error when caching#7995
dParikesit wants to merge 5 commits intoapache:masterfrom
dParikesit:5-RefCnt-release

Conversation

@dParikesit
Copy link
Copy Markdown
Contributor

JIRA: HBASE-30038

This bug is similar to HBASE-28890

HFileBlock.Writer.getBlockForCaching() creates a ref-counted HFileBlock, and the caller must release its own reference after the cache has taken the ownership it needs.

In HFileWriterImpl.java, the leak happened when shouldCacheBlock() returned before cacheFormatBlock.release() ran.

In HFileBlockIndex.java, the intermediate-index path cached blockForCaching but never released the reference at all.

Because these blocks can be backed by pooled off-heap ByteBuffs, repeated HFile writes could steadily drain allocator buffers and effectively leak memory, even though the blocks were only meant to live long enough to be considered for cache-on-write.

@wchevreuil
Copy link
Copy Markdown
Contributor

Please consider @vaijosh suggestions and also fix spotless issues.

@dParikesit
Copy link
Copy Markdown
Contributor Author

Thanks for the suggestions. I'll get back to you after I fix it.

@wchevreuil
Copy link
Copy Markdown
Contributor

Any news on this, @dParikesit ? Please get the reviews addressed so that we can move forward here, as this is an important fix you have found.

@dParikesit
Copy link
Copy Markdown
Contributor Author

@wchevreuil thanks for the reminder. I have modified the tests to use retry-loop instead of sleep (including the test added in HBASE-28890 ). I've tested it on my machine. Can you help trigger the CI pipeline?

for (int i = 0; i < 15 && counter.get() == 0; i++) {
System.gc();
alloc.allocate(128 * 1024).release();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dParikesit
You have not added sleep inside the loop so it will finish quicky and deafeat the purpose.

Please keep it something like below

for (int i = 0; i < 30 && counter.get() == 0; i++) {
System.gc();
alloc.allocate(128 * 1024).release();
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
break;
}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I believe the sleep should be done after we run allocate.release() right? I have updated the commit to change it.

for (int i = 0; i < 15 && counter.get() == 0; i++) {
System.gc();
allocator.allocate(128 * 1024).release();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dParikesit
Please add Put slip inside the loop and make 30 retries instead of 15

Copy link
Copy Markdown
Contributor

@vaijosh vaijosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dParikesit.
Changes LTGM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants