kjellsbells 2 hours ago [-]
The database wars of the late 1990s were full of this kind of stuff. Oracle, Sybase, IBM etc invested heavily in tuning specifically for benchmarks like TPC-C just so they could post ads in the Wall St Journal saying theirs was faster.
I do sympathize with OP, though: their objection to measuring cold-start queries is incomplete without also describing how often a cold start needs to happen. If you restart once every five years, then it doesn't matter as much if it takes 20 minutes to warm up. If you restart every hour, that would be a real problem.
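A rough way to put numbers on that, as a sketch only: it takes the 20-minute warm-up from the example above and computes what share of each restart interval is spent warming up. The function name `warmup_fraction` and the five-year constant are illustrative, not from any benchmark.

```python
def warmup_fraction(warmup_minutes: float, restart_interval_minutes: float) -> float:
    """Share of each restart interval spent warming up, capped at 1.0."""
    if restart_interval_minutes <= 0:
        raise ValueError("restart interval must be positive")
    return min(1.0, warmup_minutes / restart_interval_minutes)


# Hypothetical restart schedules; leap days are ignored for simplicity.
FIVE_YEARS_MIN = 5 * 365 * 24 * 60
every_five_years = warmup_fraction(20, FIVE_YEARS_MIN)
every_hour = warmup_fraction(20, 60)

print(f"restart every five years: {every_five_years:.6%} of the time spent cold")
print(f"restart every hour:       {every_hour:.1%} of the time spent cold")
```

With hourly restarts a third of every hour is spent cold; with a restart every five years the share is negligible.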
ozgrakkurt 21 minutes ago [-]
The dataset they use is <14GB of parquet [1] so the "cold start" seems to be intended to also measure having a dataset that doesn't fit in memory in a way.
I don't think this is an oversight; it is just what they found to be feasible. This is explicitly written in [1]. Also, the guy who set up this benchmark is very serious about benchmarking under difficult conditions [2].
My personal opinion is that you need a massive amount of data and massive number of different variables to test for separately. For example you might want to monitor how many cache misses/hits there were, p99 latency etc. And you want to do it under full load, expected load etc. And you want to compare the different versions of the same database because comparing different databases makes things combinatorially more difficult, unless you have a real production use case that you are optimizing for ofc.
The SwissTable talk at CppCon is a good example of a useful benchmark and optimization that shows how difficult it is to really assess the performance effects of even "small" changes. [3]
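To make the p99 latency point above concrete, here is a minimal sketch of a nearest-rank percentile over per-query latencies. The sample values and the `percentile` helper are made up for illustration and are not taken from ClickBench or any real run.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast queries and two slow ones.
latencies_ms = [12.0] * 98 + [850.0, 900.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.2f} ms, p50={p50} ms, p99={p99} ms")
```

The median hides the slow queries entirely and the mean blurs them, which is why tail percentiles are worth tracking separately.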
Anyone here using QuestDB in production? What is your use case? What is your experience?
We want to migrate away from InfluxDB eventually (because of their 180 on OSS, and their tendency to reinvent the product every major release), and QuestDB seems like an interesting option.
bitlad 4 hours ago [-]
Reminds me of the recent Terminal Bench controversy [1][2][3]
If there's a benchmark, people will cheat, lie and optimize for that benchmark. Honesty depends on the compliance enforced on teams. But if compliance itself is weak, it is going to be taken advantage of. Like growing up in India: you would optimize for the exam and not for what you learn from it.
Exactly! The task gets even trickier when you're benchmarking lots of systems of different kinds: cloud databases, self-hosted ones, embedded engines, CLI tools.
N_Lens 3 hours ago [-]
Same with LLM benchmarks these days.
Metaluim 2 hours ago [-]
Well, the pelican benchmark is easily verifiable.
echoangle 30 minutes ago [-]
Kind of hard to judge though, it’s not really objective how good a pelican looks.
dkdcdev 1 hours ago [-]
See also “Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing” by the DuckDB folks, with a classic Figure 1.
puzpuzpuz-hn 34 minutes ago [-]
Thanks for the reference. Will check!
ozgrakkurt 2 hours ago [-]
Really respectable writing and perspective. QuestDB blog posts that get posted here never disappoint.
puzpuzpuz-hn 33 minutes ago [-]
Thanks! We do our best to be as transparent as possible when it comes to benchmarking.
[1] https://github.com/ClickHouse/ClickBench#data-loading
[2] https://www.youtube.com/watch?v=CAS2otEoerM
[3] https://www.youtube.com/watch?v=ncHmEUmJZf4
[1] https://news.ycombinator.com/item?id=47920787
[2] https://www.tbench.ai/news/leaderboard-integrity-update
[3] https://debugml.github.io/cheating-agents/