Guess what, scalar-valued functions are used a lot in where clauses. That means they have to be evaluated for every candidate row. No, I don't mean just each row in the result set, I mean every row that SQL Server thinks might end up in the result set. In order to figure out if a row belongs in the result set, the engine must execute the function with that row's values. That's usually a lot of executions because, in principle, the select can return the whole table. Some rows might be discarded early due to other where conditions but you get the idea.
To end up with bad performance, all you need is a table with a few million rows. Now imagine running your function for a significant portion of those rows. How many executions is that? If the function performs a simple calculation on the parameter values that's not too bad. After all, computers can perform billions (or is it trillions?) of operations per second these days, can't they? The real problem occurs when the function has to run another select. If this second select involves a large number of rows by itself, now the main select has a few orders of magnitude more work to do. But even if the second select only has to look at a single row, you're still left with a significant number of random I/O reads that can bring your main select to a crawl.
So what's the answer? Since a scalar-valued function can contain an endless combination of T-SQL statements, SQL Server has a limited ability to adjust the execution plan of the function's select to optimize the overall performance of the main select. If you can join the function's select into the main query, the engine stands a much better chance of finding an efficient plan. But what about the encapsulation and re-usability of the select in the function? Do you have to throw that away in the name of performance? Not really, there are other ways you can achieve the same goals. You might want to try a view or a stored procedure instead. Even better, have you heard of table-valued functions?
If you use one of them, the overall structure of your query will look very different and that's not a bad thing. The idea of using a non-trivial scalar-valued function in a select sounds very imperative anyway - the kind of thing that appeals to an applications programming mindset. In a relational database, you're usually better off using a set based approach.
To end up with bad performance, all you need is a table with a few million rows. Now imagine running your function for a significant portion of those rows. How many executions is that? If the function performs a simple calculation on the parameter values that's not too bad. After all, computers can perform billions (or is it trillions?) of operations per second these days, can't they? The real problem occurs when the function has to run another select. If this second select involves a large number of rows by itself, now the main select has a few orders of magnitude more work to do. But even if the second select only has to look at a single row, you're still left with a significant number of random I/O reads that can bring your main select to a crawl.
So what's the answer? Since a scalar-valued function can contain an endless combination of T-SQL statements, SQL Server has a limited ability to adjust the execution plan of the function's select to optimize the overall performance of the main select. If you can join the function's select into the main query, the engine stands a much better chance of finding an efficient plan. But what about the encapsulation and re-usability of the select in the function? Do you have to throw that away in the name of performance? Not really, there are other ways you can achieve the same goals. You might want to try a view or a stored procedure instead. Even better, have you heard of table-valued functions?
If you use one of them, the overall structure of your query will look very different and that's not a bad thing. The idea of using a non-trivial scalar-valued function in a select sounds very imperative anyway - the kind of thing that appeals to an applications programming mindset. In a relational database, you're usually better off using a set based approach.
No comments:
Post a Comment