Avoiding Row-by-Row Processing
A set-based program and row-by-row processing are not mutually exclusive: some rules do call for row-by-row processing, but these rules are the exceptions. You can have a row-by-row component within a mostly set-based program.
For example, suppose your program contains five rules that you run against your data. Four of those rules lend themselves well to a set-based approach, while the fifth requires a row-by-row process. In this situation, run the four set-based rules first, and then run the row-by-row step last to resolve the exceptions. Although this is not pure set-based processing, you obtain better performance than if the entire program used a row-by-row approach.
When performing a row-by-row update, reduce the number of rows and the number of columns that you select to an absolute minimum to decrease the data transfer time.
For logic that cannot be coded entirely in a set-based manner, try to process most of the transactions as a set and process only the exceptions in a row-by-row loop. A good example of an exception is the sequence numbering of detail lines within a transaction when most transactions have only a single detail line. You can set the sequence number on all the detail lines to 1 by default in an initial set-based operation, and then run a Select statement to retrieve only the exceptions (the duplicates, that is, the additional detail lines within the same transaction) and update their sequence numbers to 2, 3, and so on.
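The sequence-numbering example can be sketched as follows. This is only an illustration, assuming SQLite through Python's sqlite3 module; the detail table and its columns (txn_id, line_desc, seq_nbr) are hypothetical names that do not come from any real application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detail table; the names are assumptions for illustration.
conn.execute("CREATE TABLE detail (txn_id INTEGER, line_desc TEXT, seq_nbr INTEGER)")
conn.executemany(
    "INSERT INTO detail (txn_id, line_desc) VALUES (?, ?)",
    [(100, "a"), (101, "b"), (102, "c"), (102, "d"), (102, "e"), (103, "f")],
)

# Set-based step: every detail line gets sequence number 1 by default.
conn.execute("UPDATE detail SET seq_nbr = 1")

# Select only the exceptions: lines in transactions that have more than
# one detail line. Only the two columns the loop needs are retrieved.
exceptions = conn.execute(
    """
    SELECT rowid, txn_id
      FROM detail
     WHERE txn_id IN (SELECT txn_id FROM detail
                       GROUP BY txn_id HAVING COUNT(*) > 1)
     ORDER BY txn_id, rowid
    """
).fetchall()

# Row-by-row step, limited to the exceptions. The first line of each
# transaction keeps 1; later lines are renumbered 2, 3, and so on.
next_seq = {}
for rowid, txn_id in exceptions:
    seq = next_seq.get(txn_id, 1)
    if seq > 1:
        conn.execute("UPDATE detail SET seq_nbr = ? WHERE rowid = ?", (seq, rowid))
    next_seq[txn_id] = seq + 1

result = conn.execute(
    "SELECT txn_id, line_desc, seq_nbr FROM detail ORDER BY txn_id, seq_nbr"
).fetchall()
```

In this sample data, only the three lines of transaction 102 enter the loop; the other transactions are finished after the single set-based Update.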
Avoid the tendency to expand row-by-row processing beyond what is necessary. For example, if you are touching all of the rows in a table in a specific row-based process, you do not necessarily gain efficiency by running the rest of your logic on that table in a row-based manner.
When a loop already updates a table, you can add another column to be set in the existing Update statement. However, do not add another SQL statement to your loop simply because your program is looping. If you can apply that SQL in a set-based manner, then in most cases you achieve better performance with a set-based SQL statement outside the loop.
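A minimal sketch of this guideline, assuming SQLite through Python's sqlite3 module. The line table, its columns, and the tiered_price function are hypothetical; the function only stands in for a rule that is assumed to need a row-by-row step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; all names are assumptions for illustration.
conn.execute(
    "CREATE TABLE line (line_id INTEGER PRIMARY KEY, qty INTEGER, "
    "amount REAL, priced TEXT, region TEXT, tax_cd TEXT)"
)
conn.executemany(
    "INSERT INTO line (line_id, qty, region) VALUES (?, ?, ?)",
    [(1, 5, "US"), (2, 50, "US"), (3, 500, "CA")],
)

def tiered_price(qty):
    """Hypothetical stand-in for a rule assumed to require row-by-row logic."""
    unit_price = 10.0 if qty < 10 else 9.0 if qty < 100 else 8.0
    return qty * unit_price

# Row-by-row step: select only the key and the one column the rule needs.
rows = conn.execute("SELECT line_id, qty FROM line").fetchall()
for line_id, qty in rows:
    # The extra column (priced) rides along in the same Update statement,
    # so the loop still issues one statement per row.
    conn.execute(
        "UPDATE line SET amount = ?, priced = 'Y' WHERE line_id = ?",
        (tiered_price(qty), line_id),
    )

# The tax code depends only on region, so it is applied once, as a
# set-based statement outside the loop.
conn.execute(
    "UPDATE line SET tax_cd = CASE region WHEN 'US' THEN 'UST' ELSE 'INTL' END"
)

result = conn.execute(
    "SELECT line_id, amount, priced, tax_cd FROM line ORDER BY line_id"
).fetchall()
```

Putting the tax-code Update inside the loop would issue one extra statement per row for work that a single statement can do for the whole table.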
The rest of this section describes techniques for avoiding row-by-row processing and enhancing performance.
Filtering
Using SQL, filter the set to contain only those rows that are affected or that meet the criteria, and then run the rule on them. Use a Where clause to restrict the statement to the set of affected rows.
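As a small illustration of filtering, the following sketch applies a rule only to the rows that meet the criteria. It assumes SQLite through Python's sqlite3 module, and the invoice table, its columns, and the late-fee rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical invoice table; negative due_days means the invoice is overdue.
conn.execute(
    "CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, status TEXT, "
    "due_days INTEGER, late_fee REAL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO invoice (invoice_id, status, due_days) VALUES (?, ?, ?)",
    [(1, "OPEN", -5), (2, "OPEN", 10), (3, "PAID", -20), (4, "OPEN", -1)],
)

# The Where clause narrows the set to the affected rows (open and overdue),
# and the rule runs against that set in one statement.
cursor = conn.execute(
    "UPDATE invoice SET late_fee = 25.0 WHERE status = 'OPEN' AND due_days < 0"
)
affected = cursor.rowcount

fees = conn.execute(
    "SELECT invoice_id, late_fee FROM invoice ORDER BY invoice_id"
).fetchall()
```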
Two-Pass Approach
Use a two-pass approach, wherein the first pass runs a rule on all of the rows and the second pass resolves any rows that are exceptions to the rule. For instance, bypass exceptions to the rule during the first pass, and then address the exceptions individually in a row-by-row manner.
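The two passes might look like the following sketch, which assumes SQLite through Python's sqlite3 module. The shipment table, the RATE constant, and the free-text note format are all hypothetical; the exceptions here are rows with no recorded weight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical shipment table; rows without a weight are the exceptions.
conn.execute(
    "CREATE TABLE shipment (ship_id INTEGER PRIMARY KEY, weight_kg REAL, "
    "note TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO shipment (ship_id, weight_kg, note) VALUES (?, ?, ?)",
    [(1, 2.0, None), (2, 10.0, None), (3, None, "3 boxes x 1.5"), (4, None, "2 boxes x 4")],
)

RATE = 1.5  # hypothetical cost per kilogram

# Pass 1 (set-based): apply the rule to every row that fits it and
# bypass the exceptions.
conn.execute(
    "UPDATE shipment SET cost = weight_kg * ? WHERE weight_kg IS NOT NULL",
    (RATE,),
)

def weight_from_note(note):
    """Parse the hypothetical '<count> boxes x <kg each>' note format."""
    count, _, each = note.partition(" boxes x ")
    return int(count) * float(each)

# Pass 2 (row-by-row): resolve only the rows that pass 1 left untouched.
leftovers = conn.execute(
    "SELECT ship_id, note FROM shipment WHERE cost IS NULL"
).fetchall()
for ship_id, note in leftovers:
    conn.execute(
        "UPDATE shipment SET cost = ? WHERE ship_id = ?",
        (weight_from_note(note) * RATE, ship_id),
    )

costs = conn.execute("SELECT ship_id, cost FROM shipment ORDER BY ship_id").fetchall()
```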
Parallel Processes
Divide sets into distinct groups and then run the appropriate rules or logic against each set in parallel processes. For example, you could split an employee data population into distinct sets of hourly and salaried employees, and then you could run the appropriate logic for each set in parallel.
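The hourly and salaried split can be sketched in plain Python as follows. This is an illustration only: it uses threads from the standard concurrent.futures module in place of separate database processes, and the employee records and both pay rules are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical employee population; type "H" is hourly, "S" is salaried.
employees = [
    {"id": 1, "type": "H", "hours": 40, "rate": 20.0},
    {"id": 2, "type": "S", "annual": 52000.0},
    {"id": 3, "type": "H", "hours": 35, "rate": 30.0},
    {"id": 4, "type": "S", "annual": 78000.0},
]

# Divide the population into two distinct, non-overlapping sets.
hourly = [e for e in employees if e["type"] == "H"]
salaried = [e for e in employees if e["type"] == "S"]

def pay_hourly(group):
    """Hypothetical rule for the hourly set: hours worked times rate."""
    return {e["id"]: e["hours"] * e["rate"] for e in group}

def pay_salaried(group):
    """Hypothetical rule for the salaried set: annual salary over 52 weeks."""
    return {e["id"]: e["annual"] / 52 for e in group}

# Run the logic for each set concurrently. Because the sets do not
# overlap, neither worker touches the other's rows.
with ThreadPoolExecutor(max_workers=2) as pool:
    hourly_future = pool.submit(pay_hourly, hourly)
    salaried_future = pool.submit(pay_salaried, salaried)
    weekly_pay = {**hourly_future.result(), **salaried_future.result()}
```

The sets must be distinct for this to be safe: if a row could fall into both groups, the parallel workers would contend for it.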
Flat Temporary Tables
Flatten your temporary tables. The best temporary tables are denormalized and follow a flat file model for improved transaction processing.
For example, payroll control data might be keyed by setID and effective dates rather than by business unit and accounting date. Use the temporary table to denormalize the data and switch the keys to business unit and accounting date. Afterwards, you can construct a straight join to the Time Clock table and key it by business unit and date.
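The payroll example above can be sketched as follows, assuming SQLite through Python's sqlite3 module. Every table and column name here (pay_ctl, bu_setid, time_clock, pay_ctl_tmp, ot_factor, and so on) is hypothetical, and the bu_setid table that maps business units to setIDs is an assumption added so that the keys can be switched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical control data, keyed by setID and effective date.
    CREATE TABLE pay_ctl (setid TEXT, effdt TEXT, ot_factor REAL);
    -- Assumed mapping from business unit to setID.
    CREATE TABLE bu_setid (business_unit TEXT, setid TEXT);
    -- Hypothetical Time Clock table, keyed by business unit and date.
    CREATE TABLE time_clock (business_unit TEXT, acct_date TEXT, emplid TEXT, hours REAL);

    INSERT INTO pay_ctl VALUES ('SHARE', '2024-01-01', 1.5), ('SHARE', '2024-07-01', 2.0);
    INSERT INTO bu_setid VALUES ('US001', 'SHARE'), ('US002', 'SHARE');
    INSERT INTO time_clock VALUES
        ('US001', '2024-03-15', 'E1', 2.0),
        ('US002', '2024-08-01', 'E2', 3.0),
        ('US001', '2024-08-01', 'E1', 1.0);

    -- Flatten once: resolve the setID and the effective-dated row for each
    -- business unit and accounting date, and store the result under the new keys.
    CREATE TEMP TABLE pay_ctl_tmp AS
    SELECT d.business_unit, d.acct_date, c.ot_factor
      FROM (SELECT DISTINCT business_unit, acct_date FROM time_clock) d
      JOIN bu_setid m ON m.business_unit = d.business_unit
      JOIN pay_ctl c ON c.setid = m.setid
     WHERE c.effdt = (SELECT MAX(c2.effdt) FROM pay_ctl c2
                       WHERE c2.setid = c.setid AND c2.effdt <= d.acct_date);
    """
)

# Straight join to the Time Clock table on business unit and date;
# no effective-date logic is left in this statement.
result = conn.execute(
    """
    SELECT t.emplid, t.acct_date, t.hours * x.ot_factor
      FROM time_clock t
      JOIN pay_ctl_tmp x
        ON x.business_unit = t.business_unit AND x.acct_date = t.acct_date
     ORDER BY t.emplid, t.acct_date
    """
).fetchall()
```

The effective-date subquery runs once, when the temporary table is built. Any later statement that needs the control data can use the same simple equality join.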
Techniques to Avoid
Note that:
- If you have a series of identical temporary tables, examine your refinement process.
- You should not attempt to accomplish a task that your database platform does not support, such as complex mathematics, non-standard SQL, or complex analytical modeling.
Use standard SQL for set processing.
- Although subqueries are a useful tool for refining your set, make sure that you are not using the same one multiple times.
If you are using the same subquery in more than one statement, you should probably denormalize the query results into a temporary table. Identify the subqueries that appear frequently and, if possible, denormalize the queried data into a temporary table.
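A minimal sketch of replacing a repeated subquery with a temporary table, assuming SQLite through Python's sqlite3 module. The customer, orders, and payment tables and the suspended_tmp temporary table are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; all names are assumptions for illustration.
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, hold TEXT DEFAULT 'N');
    CREATE TABLE payment (pay_id INTEGER PRIMARY KEY, cust_id INTEGER, review TEXT DEFAULT 'N');

    INSERT INTO customer VALUES (1, 'ACTIVE'), (2, 'SUSPENDED'), (3, 'SUSPENDED');
    INSERT INTO orders (order_id, cust_id) VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO payment (pay_id, cust_id) VALUES (20, 2), (21, 1);
    """
)

# Run the shared subquery once and keep its result in a temporary table.
conn.execute(
    "CREATE TEMP TABLE suspended_tmp AS "
    "SELECT cust_id FROM customer WHERE status = 'SUSPENDED'"
)

# Each later statement reads the small temporary table instead of
# repeating the original subquery against the customer table.
conn.execute(
    "UPDATE orders SET hold = 'Y' WHERE cust_id IN (SELECT cust_id FROM suspended_tmp)"
)
conn.execute(
    "UPDATE payment SET review = 'Y' WHERE cust_id IN (SELECT cust_id FROM suspended_tmp)"
)

orders = conn.execute("SELECT order_id, hold FROM orders ORDER BY order_id").fetchall()
payments = conn.execute("SELECT pay_id, review FROM payment ORDER BY pay_id").fetchall()
```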