As I understand it, the charts (especially the second) are essentially showing how efficient each project's application is relative to those of other projects. They take credits, which can be equated with FLOPS, and compare that with seconds of CPU time.
So I don't think the Project Credit Comparison charts are lying at all; their purpose is to effectively quantify the difference between how optimized and robust different projects' applications are.
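Put another way, the metric behind the charts boils down to credits granted divided by CPU time. Here's a minimal sketch of that ratio; the function name and the numbers are just illustrative, not how the charts are actually generated:

```python
def credit_rate(credits_granted, cpu_seconds):
    """Credits per second of CPU time -- roughly a proxy for the effective
    FLOPS of a project's application, since the Cobblestone is defined
    against a fixed benchmark rate."""
    return credits_granted / cpu_seconds

# Hypothetical tasks on the same host (numbers are made up):
print(credit_rate(85.0, 14_400))   # project A
print(credit_rate(120.0, 14_400))  # project B: more credit per CPU second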
I can agree to a point. Why I think it is misleading is that it may say project X gives the best credit, but people may use this to decide which project to crunch when in fact project Y may give them better credit performance based on their architecture, OS, and whether they can get an optimized app to work (Windows may be easy, but other OSes can stump many).
To give another example: Windoze may kick butt on one project but Linux on another. How is that ever accurately displayed?
Yet another example: you are comparing a project using fixed credit vs. a project using benchmark-based credit. How is that factor ever compensated for? Through total output? You may be doing more FLOPS on one project that actually grants less credit than another.
IMHO the 'best' way is to run project X vs. project Y on the same host, all variables considered, make your own judgement over a period of time, and only use credit charts as a rough guide.
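For anyone who wants to try that, something like the sketch below is what I mean: pull the per-task credit and CPU time for your own host from each project's results pages and compare credits per CPU hour. The task values here are placeholders, not real data:

```python
# Side-by-side comparison on one host. Each tuple is
# (credit granted, CPU seconds) taken from a project's results page
# for your computer; these values are placeholders.
tasks = {
    "project_x": [(52.3, 7200), (49.8, 7050), (55.1, 7400)],
    "project_y": [(140.0, 21600), (133.5, 20900)],
}

for project, results in tasks.items():
    total_credit = sum(credit for credit, _ in results)
    total_cpu_hours = sum(seconds for _, seconds in results) / 3600
    print(f"{project}: {total_credit / total_cpu_hours:.1f} credits per CPU hour")
```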

More refinement is just a waste of energy unless the 'true' breakdowns can be done. (Actually doing that would be a great BOINC project!)