Skip to content

Fix sf_error(NULL) race condition under concurrent open failures#480

Open
AndreasKaratzas wants to merge 1 commit intobastibe:masterfrom
AndreasKaratzas:akaratza_fix_race_cond
Open

Fix sf_error(NULL) race condition under concurrent open failures#480
AndreasKaratzas wants to merge 1 commit intobastibe:masterfrom
AndreasKaratzas:akaratza_fix_race_cond

Conversation

@AndreasKaratzas
Copy link
Copy Markdown

Serialise each sf_open*() + sf_error(NULL) pair under a class-level lock so the global error state cannot be clobbered between the two calls when multiple threads fail concurrently.

Problem

sf_error(NULL) reads a global (non-thread-safe) error variable in libsndfile. When multiple threads concurrently open unsupported formats (e.g. MP4 bytes via sf_open_virtual), one thread can clear the global error before another reads it. This produces LibsndfileError with code=0 and the message "(Garbled error message from libsndfile)" instead of the correct error code.

See #479 for a full reproduction script.

Verification

Before fix:
400 concurrent attempts on MP4 bytes:
All codes seen: {0, 1}
Code 0 (race condition) count: 23

After fix:
same test:
All codes seen: {1}
Code 0 (race condition) count: 0

Normal read/write and concurrent reads of valid audio files are unaffected.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas
Copy link
Copy Markdown
Author

cc @bastibe Lmk if this looks good :)

@AndreasKaratzas
Copy link
Copy Markdown
Author

cc @bastibe

Does this PR look good to you? Would really appreciate your review :)

Copy link
Copy Markdown
Owner

@bastibe bastibe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this bugfix. Did you observe this bug in the wild, or is this theoretical?

I don't like the way the lock and the error handling is repeated three times. How about we wrap the opening function in an openfunction, much like the isinstance(file, bytes) case (currying with additional virtio-wrapper and args as needed), and call them in a common place in the end?

@AndreasKaratzas
Copy link
Copy Markdown
Author

@bastibe Yep, we observed it in vLLM. The related PR on vLLM: vllm-project/vllm#37616

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants