pg 上拉子链接源码分析
此分析基于 version 13.8, 在pg 中把子查询分为两类,一类是from 中的子查询; 一类是where 中的子查询,叫做子链接(sublink)。
下面用sublink(子链接)指代where中的子查询, subquery(子查询) 指代from 中的子查询。
1. pull_up_sublinks
void pull_up_sublinks(PlannerInfo *root)
- 被 subquery_planner 调用;也可能被pull_up_simple_subquery 调用,用于在 pull_up_subqueries 时处理子查询中的sublink。
- 上拉子链接的入口
- 调用 pull_up_sublinks_jointree_recurse 来递归的上拉子链接, 传入parse->jointree,返回处理后的jointree
- 设置 root->parse->jointree 为处理后的 jointree (不为 FromExpr节点类型 则构建为FromExpr节点)
1.1 pull_up_sublinks_jointree_recurse
/** Recurse through jointree nodes for pull_up_sublinks()** In addition to returning the possibly-modified jointree node, we return* a relids set of the contained rels into *relids.*/
static Node *
pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,Relids *relids)
{if (jtnode == NULL){*relids = NULL;}else if (IsA(jtnode, RangeTblRef)){int varno = ((RangeTblRef *) jtnode)->rtindex;*relids = bms_make_singleton(varno);/* jtnode is returned unmodified */}else if (IsA(jtnode, FromExpr)){FromExpr *f = (FromExpr *) jtnode;List *newfromlist = NIL;Relids frelids = NULL;FromExpr *newf;Node *jtlink;ListCell *l;/* First, recurse to process children and collect their relids */foreach(l, f->fromlist){Node *newchild;Relids childrelids;newchild = pull_up_sublinks_jointree_recurse(root,lfirst(l),&childrelids);newfromlist = lappend(newfromlist, newchild);frelids = bms_join(frelids, childrelids);}/* Build the replacement FromExpr; no quals yet */newf = makeFromExpr(newfromlist, NULL);/* Set up a link representing the rebuilt jointree */jtlink = (Node *) newf;/* Now process qual --- all children are available for use */newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,&jtlink, frelids,NULL, NULL);/** Note that the result will be either newf, or a stack of JoinExprs* with newf at the base. We rely on subsequent optimization steps to* flatten this and rearrange the joins as needed.** Although we could include the pulled-up subqueries in the returned* relids, there's no need since upper quals couldn't refer to their* outputs anyway.*/*relids = frelids;jtnode = jtlink;}else if (IsA(jtnode, JoinExpr)){JoinExpr *j;Relids leftrelids;Relids rightrelids;Node *jtlink;/** Make a modifiable copy of join node, but don't bother copying its* subnodes (yet).*/j = (JoinExpr *) palloc(sizeof(JoinExpr));memcpy(j, jtnode, sizeof(JoinExpr));jtlink = (Node *) j;/* Recurse to process children and collect their relids */j->larg = pull_up_sublinks_jointree_recurse(root, j->larg,&leftrelids);j->rarg = pull_up_sublinks_jointree_recurse(root, j->rarg,&rightrelids);/** Now process qual, showing appropriate child relids as available,* and attach any pulled-up jointree items at the right place. In the* inner-join case we put new JoinExprs above the existing one (much* as for a FromExpr-style join). In outer-join cases the new* JoinExprs must go into the nullable side of the outer join. The* point of the available_rels machinations is to ensure that we only* pull up quals for which that's okay.** We don't expect to see any pre-existing JOIN_SEMI or JOIN_ANTI* nodes here.*/switch (j->jointype){case JOIN_INNER:j->quals = pull_up_sublinks_qual_recurse(root, j->quals,&jtlink,bms_union(leftrelids,rightrelids),NULL, NULL);break;case JOIN_LEFT:j->quals = pull_up_sublinks_qual_recurse(root, j->quals,&j->rarg,rightrelids,NULL, NULL);break;case JOIN_FULL:/* can't do anything with full-join quals */break;case JOIN_RIGHT:j->quals = pull_up_sublinks_qual_recurse(root, j->quals,&j->larg,leftrelids,NULL, NULL);break;default:elog(ERROR, "unrecognized join type: %d",(int) j->jointype);break;}/** Although we could include the pulled-up subqueries in the returned* relids, there's no need since upper quals couldn't refer to their* outputs anyway. But we *do* need to include the join's own rtindex* because we haven't yet collapsed join alias variables, so upper* levels would mistakenly think they couldn't use references to this* join.*/*relids = bms_join(leftrelids, rightrelids);if (j->rtindex)*relids = bms_add_member(*relids, j->rtindex);jtnode = jtlink;}elseelog(ERROR, "unrecognized node type: %d",(int) nodeTag(jtnode));return jtnode;
}
- 根据节点类型进行处理,返回修改后的新 jtnode 及 以此jtnode作为root涉及的 relids。
- 第一次调用,节点类型必为 FromExpr。
- 节点类型为 RangeTblRef – 叶子节点
- 获取relids(bitmap), 返回
- 可能为subquery 表, 这里不处理子查询中的sublink。 在调用pull_up_subqueries (subquery_planner)时处理
- 节点类型为 FromExpr f=(FromExpr *) jtnode
- 遍历 f->fromlist, 对于fromlist 中的每个成员递归调用 pull_up_sublinks_jointree_recurse处理(一直处理到叶子节点 RangeTblRef 才返回), 返回处理后的成员。根据处理后的结构构建 newfromlist 和 frelids (涉及的关系集合)
- 根据 newfromlist 创建新的 FromExpr(newf)。
- 调用 pull_up_sublinks_qual_recurse处理约束 f->quals(包含子链接,即实际在pull_up_sublinks_qual_recurse 中处理子链接) , 返回新的quals,设置到 newf->quals。
- *relids = frelids; 返回涉及的relids
- jtnode = jtlink; 返回新的jtnode
- 节点类型为 JoinExpr j=jtnode actually copy
- 对于join的左右子树(j->larg,j->rarg)递归调用 pull_up_sublinks_jointree_recurse处理, 并赋为新值。
- 根据 j->jointype 处理 join 的关联条件(约束),都调用 pull_up_sublinks_qual_recurse, 传参不同
- JOIN_INNER 内连接:传入 jtlink, bms_union(leftrelids,rightrelids)
- JOIN_LEFT 左连接:传入 j->rarg,rightrelids, 只处理右表(nullable)相关子链接,并上拉子链接到rarg
- JOIN_FULL 全连接, 不处理
- JOIN_RIGHT 右连接:传入 j->larg,leftrelids, 只处理左表(nullable)相关子链接, 并上拉子链接到larg
- 构造 relids(leftrelids, rightrelids,j->rtindex)返回
- 返回新jtnode
1.2 pull_up_sublinks_qual_recurse
/** Recurse through top-level qual nodes for pull_up_sublinks()** jtlink1 points to the link in the jointree where any new JoinExprs should* be inserted if they reference available_rels1 (i.e., available_rels1* denotes the relations present underneath jtlink1). Optionally, jtlink2 can* point to a second link where new JoinExprs should be inserted if they* reference available_rels2 (pass NULL for both those arguments if not used).* Note that SubLinks referencing both sets of variables cannot be optimized.* If we find multiple pull-up-able SubLinks, they'll get stacked onto jtlink1* and/or jtlink2 in the order we encounter them. We rely on subsequent* optimization to rearrange the stack if appropriate.** Returns the replacement qual node, or NULL if the qual should be removed.*/
static Node *
pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,Node **jtlink1, Relids available_rels1,Node **jtlink2, Relids available_rels2)
{if (node == NULL)return NULL;if (IsA(node, SubLink)){SubLink *sublink = (SubLink *) node;JoinExpr *j;Relids child_rels;/* Is it a convertible ANY or EXISTS clause? */if (sublink->subLinkType == ANY_SUBLINK){if ((j = convert_ANY_sublink_to_join(root, sublink,available_rels1)) != NULL){/* Yes; insert the new join node into the join tree */j->larg = *jtlink1;*jtlink1 = (Node *) j;/* Recursively process pulled-up jointree nodes */j->rarg = pull_up_sublinks_jointree_recurse(root,j->rarg,&child_rels);/** Now recursively process the pulled-up quals. Any inserted* joins can get stacked onto either j->larg or j->rarg,* depending on which rels they reference.*/j->quals = pull_up_sublinks_qual_recurse(root,j->quals,&j->larg,available_rels1,&j->rarg,child_rels);/* Return NULL representing constant TRUE */return NULL;}if (available_rels2 != NULL &&(j = convert_ANY_sublink_to_join(root, sublink,available_rels2)) != NULL){/* Yes; insert the new join node into the join tree */j->larg = *jtlink2;*jtlink2 = (Node *) j;/* Recursively process pulled-up jointree nodes */j->rarg = pull_up_sublinks_jointree_recurse(root,j->rarg,&child_rels);/** Now recursively process the pulled-up quals. Any inserted* joins can get stacked onto either j->larg or j->rarg,* depending on which rels they reference.*/j->quals = pull_up_sublinks_qual_recurse(root,j->quals,&j->larg,available_rels2,&j->rarg,child_rels);/* Return NULL representing constant TRUE */return NULL;}}else if (sublink->subLinkType == EXISTS_SUBLINK){if ((j = convert_EXISTS_sublink_to_join(root, sublink, false,available_rels1)) != NULL){/* Yes; insert the new join node into the join tree */j->larg = *jtlink1;*jtlink1 = (Node *) j;/* Recursively process pulled-up jointree nodes */j->rarg = pull_up_sublinks_jointree_recurse(root,j->rarg,&child_rels);/** Now recursively process the pulled-up quals. Any inserted* joins can get stacked onto either j->larg or j->rarg,* depending on which rels they reference.*/j->quals = pull_up_sublinks_qual_recurse(root,j->quals,&j->larg,available_rels1,&j->rarg,child_rels);/* Return NULL representing constant TRUE */return NULL;}if (available_rels2 != NULL &&(j = convert_EXISTS_sublink_to_join(root, sublink, false,available_rels2)) != NULL){/* Yes; insert the new join node into the join tree */j->larg = *jtlink2;*jtlink2 = (Node *) j;/* Recursively process pulled-up jointree nodes */j->rarg = pull_up_sublinks_jointree_recurse(root,j->rarg,&child_rels);/** Now recursively process the pulled-up quals. Any inserted* joins can get stacked onto either j->larg or j->rarg,* depending on which rels they reference.*/j->quals = pull_up_sublinks_qual_recurse(root,j->quals,&j->larg,available_rels2,&j->rarg,child_rels);/* Return NULL representing constant TRUE */return NULL;}}/* Else return it unmodified */return node;}if (is_notclause(node)){/* If the immediate argument of NOT is EXISTS, try to convert */SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node);JoinExpr *j;Relids child_rels;if (sublink && IsA(sublink, SubLink)){if (sublink->subLinkType == EXISTS_SUBLINK){if ((j = convert_EXISTS_sublink_to_join(root, sublink, true,available_rels1)) != NULL){/* Yes; insert the new join node into the join tree */j->larg = *jtlink1;*jtlink1 = (Node *) j;/* Recursively process pulled-up jointree nodes */j->rarg = pull_up_sublinks_jointree_recurse(root,j->rarg,&child_rels);/** Now recursively process the pulled-up quals. Because* we are underneath a NOT, we can't pull up sublinks that* reference the left-hand stuff, but it's still okay to* pull up sublinks referencing j->rarg.*/j->quals = pull_up_sublinks_qual_recurse(root,j->quals,&j->rarg,child_rels,NULL, NULL);/* Return NULL representing constant TRUE */return NULL;}if (available_rels2 != NULL &&(j = convert_EXISTS_sublink_to_join(root, sublink, true,available_rels2)) != NULL){/* Yes; insert the new join node into the join tree */j->larg = *jtlink2;*jtlink2 = (Node *) j;/* Recursively process pulled-up jointree nodes */j->rarg = pull_up_sublinks_jointree_recurse(root,j->rarg,&child_rels);/** Now recursively process the pulled-up quals. Because* we are underneath a NOT, we can't pull up sublinks that* reference the left-hand stuff, but it's still okay to* pull up sublinks referencing j->rarg.*/j->quals = pull_up_sublinks_qual_recurse(root,j->quals,&j->rarg,child_rels,NULL, NULL);/* Return NULL representing constant TRUE */return NULL;}}}/* Else return it unmodified */return node;}if (is_andclause(node)){/* Recurse into AND clause */List *newclauses = NIL;ListCell *l;foreach(l, ((BoolExpr *) node)->args){Node *oldclause = (Node *) lfirst(l);Node *newclause;newclause = pull_up_sublinks_qual_recurse(root,oldclause,jtlink1,available_rels1,jtlink2,available_rels2);if (newclause)newclauses = lappend(newclauses, newclause);}/* We might have got back fewer clauses than we started with */if (newclauses == NIL)return NULL;else if (list_length(newclauses) == 1)return (Node *) linitial(newclauses);elsereturn (Node *) make_andclause(newclauses);}/* Stop if not an AND */return node;
}
-
分为三块来处理:1. sublink 2. not clause ; 3. and clause
-
节点类型为 SubLink pull up any/some/in/exists
-
只对 ANY_SUBLINK(包括any, some, in) 和 EXISTS_SUBLINK 类型进行处理
-
sublink->subLinkType 类型为 ANY_SUBLINK,则依次对available_rels1 和 available_rels2 调用 convert_ANY_sublink_to_join,若成功则构造 JoinExpr并返回。此时把 sublink 作为一个subquery给 pull up 为 join 的右表, 左表为入参的 jtlink1(fromexpr or JoinExpr)
-
新JoinExpr的构造(以available_rels1为例):
- convert_ANY_sublink_to_join 中会创建新的 JoinExpr, 并设置rarg 为sublink转化的subquery, quals设置为对于的约束。
- 把 jtlink1 作为 larg , 并复制给jtlink1(替换jtlink1)。然后继续对上拉的j->rarg 调用pull_up_sublinks_jointree_recurse,尝试上拉rarg中的子链接。最后对上拉的quals,尝试继续上拉其中的表,与larg 或rarg 关联,构成新的join关系。
-
若 sublink->subLinkType 为 EXISTS_SUBLINK, 则是调用 convert_EXISTS_sublink_to_join,直接上拉sublink 中的jointree( from clause), 其他逻辑同ANY_SUBLINK
-
-
节点为 BoolExpr && NOT_EXPR
is_notclause
pull up not exists- 若为 not sublink 且type 为 EXISTS_SUBLINK,则进行处理。即只支持对not exists 进行pullup
- 处理逻辑类似上面,只是调用convert_EXISTS_sublink_to_join时设置under_not 参数为true(表明使用antijoin),对上拉的quals调用pull_up_sublinks_qual_recurse 尝试上拉时,只支持上拉后与 rargs join。
-
节点为 BoolExpr && AND_EXPR
is_andclause
- 遍历((BoolExpr *) node)->args, 对每个条件调用pull_up_sublinks_qual_recurse 处理。
1.2.1 convert_ANY_sublink_to_join
/** convert_ANY_sublink_to_join: try to convert an ANY SubLink to a join** The caller has found an ANY SubLink at the top level of one of the query's* qual clauses, but has not checked the properties of the SubLink further.* Decide whether it is appropriate to process this SubLink in join style.* If so, form a JoinExpr and return it. Return NULL if the SubLink cannot* be converted to a join.** The only non-obvious input parameter is available_rels: this is the set* of query rels that can safely be referenced in the sublink expression.* (We must restrict this to avoid changing the semantics when a sublink* is present in an outer join's ON qual.) The conversion must fail if* the converted qual would reference any but these parent-query relids.** On success, the returned JoinExpr has larg = NULL and rarg = the jointree* item representing the pulled-up subquery. The caller must set larg to* represent the relation(s) on the lefthand side of the new join, and insert* the JoinExpr into the upper query's jointree at an appropriate place* (typically, where the lefthand relation(s) had been). Note that the* passed-in SubLink must also be removed from its original position in the* query quals, since the quals of the returned JoinExpr replace it.* (Notionally, we replace the SubLink with a constant TRUE, then elide the* redundant constant from the qual.)** On success, the caller is also responsible for recursively applying* pull_up_sublinks processing to the rarg and quals of the returned JoinExpr.* (On failure, there is no need to do anything, since pull_up_sublinks will* be applied when we recursively plan the sub-select.)** Side effects of a successful conversion include adding the SubLink's* subselect to the query's rangetable, so that it can be referenced in* the JoinExpr's rarg.*/
JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,Relids available_rels)
{JoinExpr *result;Query *parse = root->parse;Query *subselect = (Query *) sublink->subselect;Relids upper_varnos;int rtindex;ParseNamespaceItem *nsitem;RangeTblEntry *rte;RangeTblRef *rtr;List *subquery_vars;Node *quals;ParseState *pstate;Assert(sublink->subLinkType == ANY_SUBLINK);/** The sub-select must not refer to any Vars of the parent query. (Vars of* higher levels should be okay, though.)*/if (contain_vars_of_level((Node *) subselect, 1))return NULL;/** The test expression must contain some Vars of the parent query, else* it's not gonna be a join. (Note that it won't have Vars referring to* the subquery, rather Params.)*/upper_varnos = pull_varnos(root, sublink->testexpr);if (bms_is_empty(upper_varnos))return NULL;/** However, it can't refer to anything outside available_rels.*/if (!bms_is_subset(upper_varnos, available_rels))return NULL;/** The combining operators and left-hand expressions mustn't be volatile.*/if (contain_volatile_functions(sublink->testexpr))return NULL;/* Create a dummy ParseState for addRangeTableEntryForSubquery */pstate = make_parsestate(NULL);/** Okay, pull up the sub-select into upper range table.** We rely here on the assumption that the outer query has no references* to the inner (necessarily true, other than the Vars that we build* below). Therefore this is a lot easier than what pull_up_subqueries has* to go through.*/nsitem = addRangeTableEntryForSubquery(pstate,subselect,makeAlias("ANY_subquery", NIL),false,false);rte = nsitem->p_rte;parse->rtable = lappend(parse->rtable, rte);rtindex = list_length(parse->rtable);/** Form a RangeTblRef for the pulled-up sub-select.*/rtr = makeNode(RangeTblRef);rtr->rtindex = rtindex;/** Build a list of Vars representing the subselect outputs.*/subquery_vars = generate_subquery_vars(root,subselect->targetList,rtindex);/** Build the new join's qual expression, replacing Params with these Vars.*/quals = convert_testexpr(root, sublink->testexpr, subquery_vars);/** And finally, build the JoinExpr node.*/result = makeNode(JoinExpr);result->jointype = JOIN_SEMI;result->isNatural = false;result->larg = NULL; /* caller must fill this in */result->rarg = (Node *) rtr;result->usingClause = NIL;result->quals = quals;result->alias = NULL;result->rtindex = 0; /* we don't need an RTE for it */return result;
}
- 转为 JOIN_SEMI
- 把整个 subquery 作为rargs 提升,而不是提升subquery 中的表,提升后为表和subquery 进行semijoin。subquery 可以通过pull_up_subquery 继续提升。
- 通过调用
addRangeTableEntryForSubquery
生成RangeTblEntry
- 通过调用
- ANY 类型子链接如果是相关子链接,即子查询中引用 了父查询的列属性,则不能提升。
- ANY 类型子链接 sublink->testexpr(Opexpr arg1(left) arg2)没有引用上一层的列,不能提升,这种情况下上层的表和子链接中的表不能构成连接关系。
if (bms_is_empty(upper_varnos))
- 只能与指定的表 构成连接关系 , 即 sublink->testexpr 只能包含 available_rels 中的关系
if (!bms_is_subset(upper_varnos, available_rels))
- sublink->testexpr 没有列属性也不能提升
- sublink->testexpr 中不能包含volatile functions
- 返回设置了rarg(新创建的RangeTblRef,指向新创建的rte(subquery)),quals(sublink->testexpr转换),jointype的 JoinExpr 对象
1.2.2 convert_EXISTS_sublink_to_join
/** convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join** The API of this function is identical to convert_ANY_sublink_to_join's,* except that we also support the case where the caller has found NOT EXISTS,* so we need an additional input parameter "under_not".*/
JoinExpr *
convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,bool under_not, Relids available_rels)
{JoinExpr *result;Query *parse = root->parse;Query *subselect = (Query *) sublink->subselect;Node *whereClause;int rtoffset;int varno;Relids clause_varnos;Relids upper_varnos;Assert(sublink->subLinkType == EXISTS_SUBLINK);/** Can't flatten if it contains WITH. (We could arrange to pull up the* WITH into the parent query's cteList, but that risks changing the* semantics, since a WITH ought to be executed once per associated query* call.) Note that convert_ANY_sublink_to_join doesn't have to reject* this case, since it just produces a subquery RTE that doesn't have to* get flattened into the parent query.*/if (subselect->cteList)return NULL;/** Copy the subquery so we can modify it safely (see comments in* make_subplan).*/subselect = copyObject(subselect);/** See if the subquery can be simplified based on the knowledge that it's* being used in EXISTS(). If we aren't able to get rid of its* targetlist, we have to fail, because the pullup operation leaves us* with noplace to evaluate the targetlist.*/if (!simplify_EXISTS_query(root, subselect))return NULL;/** Separate out the WHERE clause. (We could theoretically also remove* top-level plain JOIN/ON clauses, but it's probably not worth the* trouble.)*/whereClause = subselect->jointree->quals;subselect->jointree->quals = NULL;/** The rest of the sub-select must not refer to any Vars of the parent* query. (Vars of higher levels should be okay, though.)*/if (contain_vars_of_level((Node *) subselect, 1))return NULL;/** On the other hand, the WHERE clause must contain some Vars of the* parent query, else it's not gonna be a join.*/if (!contain_vars_of_level(whereClause, 1))return NULL;/** We don't risk optimizing if the WHERE clause is volatile, either.*/if (contain_volatile_functions(whereClause))return NULL;/** The subquery must have a nonempty jointree, but we can make it so.*/replace_empty_jointree(subselect);/** Prepare to pull up the sub-select into top range table.** We rely here on the assumption that the outer query has no references* to the inner (necessarily true). Therefore this is a lot easier than* what pull_up_subqueries has to go through.** In fact, it's even easier than what convert_ANY_sublink_to_join has to* do. The machinations of simplify_EXISTS_query ensured that there is* nothing interesting in the subquery except an rtable and jointree, and* even the jointree FromExpr no longer has quals. So we can just append* the rtable to our own and use the FromExpr in our jointree. But first,* adjust all level-zero varnos in the subquery to account for the rtable* merger.*/rtoffset = list_length(parse->rtable);OffsetVarNodes((Node *) subselect, rtoffset, 0);OffsetVarNodes(whereClause, rtoffset, 0);/** Upper-level vars in subquery will now be one level closer to their* parent than before; in particular, anything that had been level 1* becomes level zero.*/IncrementVarSublevelsUp((Node *) subselect, -1, 1);IncrementVarSublevelsUp(whereClause, -1, 1);/** Now that the WHERE clause is adjusted to match the parent query* environment, we can easily identify all the level-zero rels it uses.* The ones <= rtoffset belong to the upper query; the ones > rtoffset do* not.*/clause_varnos = pull_varnos(root, whereClause);upper_varnos = NULL;while ((varno = bms_first_member(clause_varnos)) >= 0){if (varno <= rtoffset)upper_varnos = bms_add_member(upper_varnos, varno);}bms_free(clause_varnos);Assert(!bms_is_empty(upper_varnos));/** Now that we've got the set of upper-level varnos, we can make the last* check: only available_rels can be referenced.*/if (!bms_is_subset(upper_varnos, available_rels))return NULL;/* Now we can attach the modified subquery rtable to the parent */parse->rtable = list_concat(parse->rtable, subselect->rtable);/** And finally, build the JoinExpr node.*/result = makeNode(JoinExpr);result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI;result->isNatural = false;result->larg = NULL; /* caller must fill this in *//* flatten out the FromExpr node if it's useless */if (list_length(subselect->jointree->fromlist) == 1)result->rarg = (Node *) linitial(subselect->jointree->fromlist);elseresult->rarg = (Node *) subselect->jointree;result->usingClause = NIL;result->quals = whereClause;result->alias = NULL;result->rtindex = 0; /* we don't need an RTE for it *//*LightDB add, 2022/2/9, for S202201136600, semijoin hints*/result->qb_name = ((Query *) sublink->subselect)->qb_name;return result;
}
-
JOIN_ANTI(not exists) 或JOIN_SEMI
-
直接提升subquery 中的表, 提升后是表之间join
if (list_length(subselect->jointree->fromlist) == 1)result->rarg = (Node *) linitial(subselect->jointree->fromlist);elseresult->rarg = (Node *) subselect->jointree;
-
不能包含CTE
-
simplify_EXISTS_query(其他地方也会使用, modify cte 仍需判断)
-
不能含有set operation(UNION/INTERSECT/EXCEPT), agg, group sets(
group by grouping sets((brand,segment),(brand),(segment),())
), 窗口函数, set-returning function(返回多行), modify ctes(有insert/update/delete), having 条件, offset (limit), rowmark(for update/share) -
对于limit 不为常量不支持, 为常量但不大于0也不支持pull up, 其他情况会把limit 优化掉(直接去掉,不影响语义),然后就可以pull up了.
-
其他优化:
query->targetList = NIL;query->groupClause = NIL;query->windowClause = NIL;query->distinctClause = NIL;query->sortClause = NIL;query->hasDistinctOn = false;
-
-
去除了whereclause(subselect->jointree->quals)的子链接中不能引用外层对象,The rest of the sub-select must not refer to any Vars of the parent query. (Vars of higher levels should be okay, though.)
-
whereclause(subselect->jointree->quals) 需要引用外层对象,即需要为相关子链接。非相关子链接不能pull up
-
whereclause(subselect->jointree->quals)中不能包含volatile 函数
-
检测引用的上层的对象都在 available_rels 中, 不在则不支持提升。Now that we’ve got the set of upper-level varnos, we can make the last check: only available_rels can be referenced.
-
返回设置了rarg(subselect->jointree->fromlist| subselect->jointree),quals(subselect->jointree->quals),jointype的 JoinExpr 对象
2. 总结
pg 支持对any 类型(包括in, any, some)的子链接提升,把子链接中的查询提升到from中作为一个子查询与表构成semijoin。此子查询可以被 pull up subquery 流程继续提升。
提升条件:
-
子链接的查询中不能引用父查询中表的列属性 any (select * from t2 where )
chuhx@postgres=# explain select * from t1 a where key1=any (select key1 from t2 where a.key1=t2.key1);QUERY PLAN -------------------------------------------------------------------------------Seq Scan on t1 a (cost=0.00..2671.15 rows=1130 width=8)Filter: (SubPlan 1)SubPlan 1-> Index Only Scan using t2_pkey on t2 (cost=0.15..2.17 rows=1 width=4)Index Cond: (key1 = a.key1) (5 rows)chuhx@postgres=#
-
子链接的条件需要引用父查询中的表的列属性 t1.key1 = any
-
子链接的条件引用的表需要符合要求available_rels,比如左连接,子链接的条件只能使用右表的字段才能提升
chuhx@postgres=# explain select * from t1 a left join t3 c on a.key1=c.key1 and a.key2=any (select key1 from t2);QUERY PLAN -----------------------------------------------------------------------------Merge Left Join (cost=196.91..694.38 rows=12769 width=16)Merge Cond: (a.key1 = c.key1)Join Filter: (hashed SubPlan 1)-> Index Scan using key1_i on t1 a (cost=0.15..45.06 rows=2260 width=8)-> Sort (cost=158.51..164.16 rows=2260 width=8)Sort Key: c.key1-> Seq Scan on t3 c (cost=0.00..32.60 rows=2260 width=8)SubPlan 1-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=4) (9 rows)chuhx@postgres=# explain select * from t1 a left join t3 c on a.key1=c.key1 and c.key2=any (select key1 from t2);QUERY PLAN ------------------------------------------------------------------------------Merge Left Join (cost=225.45..659.07 rows=25538 width=16)Merge Cond: (a.key1 = c.key1)-> Index Scan using key1_i on t1 a (cost=0.15..45.06 rows=2260 width=8)-> Sort (cost=225.30..230.95 rows=2260 width=8)Sort Key: c.key1-> Hash Join (cost=60.85..99.39 rows=2260 width=8)Hash Cond: (c.key2 = t2.key1)-> Seq Scan on t3 c (cost=0.00..32.60 rows=2260 width=8)-> Hash (cost=32.60..32.60 rows=2260 width=4)-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=4) (10 rows)
-
子链接的条件中不能使用volatile 函数, 如下使用了random
chuhx@postgres=# explain select * from t1 a where key1+random()=any (select key1 from t2);QUERY PLAN --------------------------------------------------------------Seq Scan on t1 a (cost=38.25..99.10 rows=1130 width=8)Filter: (hashed SubPlan 1)SubPlan 1-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=4) (4 rows)chuhx@postgres=# explain select * from t1 a where key1=any (select key1 from t2);QUERY PLAN ------------------------------------------------------------------Hash Join (cost=60.85..99.39 rows=2260 width=8)Hash Cond: (a.key1 = t2.key1)-> Seq Scan on t1 a (cost=0.00..32.60 rows=2260 width=8)-> Hash (cost=32.60..32.60 rows=2260 width=4)-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=4) (5 rows)chuhx@postgres=#
pg 支持对exists/not exists 类型的子链接提升,把子链接中的表提升到from中与表构成semijoin/antijoin。
-
不能包含CTE
-
不能含有set operation(UNION/INTERSECT/EXCEPT), agg, group sets(
group by grouping sets((brand,segment),(brand),(segment),())
), 窗口函数, set-returning function(返回多行), modify ctes(有insert/update/delete), having 条件, offset (limit), rowmark(for update/share) -
对于limit 不为常量不支持, 为常量但不大于0也不支持pull up, 其他情况会把limit 优化掉(直接去掉,不影响语义)
chuhx@postgres=# explain select * from t1 a where exists (select * from t2 where a.key1=t2.key1 limit 1);QUERY PLAN ------------------------------------------------------------------Hash Join (cost=60.85..99.39 rows=2260 width=8)Hash Cond: (a.key1 = t2.key1)-> Seq Scan on t1 a (cost=0.00..32.60 rows=2260 width=8)-> Hash (cost=32.60..32.60 rows=2260 width=4)-> Seq Scan on t2 (cost=0.00..32.60 rows=2260 width=4) (5 rows)chuhx@postgres=# explain select * from t1 a where exists (select * from t2 where a.key1=t2.key1 limit -1);QUERY PLAN --------------------------------------------------------------------------------Seq Scan on t1 a (cost=0.00..4942.45 rows=1130 width=8)Filter: (SubPlan 1)SubPlan 1-> Limit (cost=0.15..2.17 rows=1 width=8)-> Index Scan using t2_pkey on t2 (cost=0.15..2.17 rows=1 width=8)Index Cond: (key1 = a.key1) (6 rows)
-
去除了whereclause(subselect->jointree->quals)的子链接中不能引用外层对象
chuhx@postgres=# explain select * from t1 a where exists (select * from (select * from t2 where t2.key2=a.key1) x where a.key1=x.key1);QUERY PLAN --------------------------------------------------------------------------Seq Scan on t1 a (cost=0.00..4948.10 rows=1130 width=8)Filter: (SubPlan 1)SubPlan 1-> Index Scan using t2_pkey on t2 (cost=0.15..2.18 rows=1 width=0)Index Cond: (key1 = a.key1)Filter: (key2 = a.key1) (6 rows)
-
whereclause(subselect->jointree->quals) 需要引用外层对象,即需要为相关子链接
chuhx@postgres=# explain select * from t1 a where exists (select * from t2 where key1>10);QUERY PLAN --------------------------------------------------------------Result (cost=0.05..32.65 rows=2260 width=8)One-Time Filter: $0InitPlan 1 (returns $0)-> Seq Scan on t2 (cost=0.00..38.25 rows=753 width=0)Filter: (key1 > 10)-> Seq Scan on t1 a (cost=0.05..32.65 rows=2260 width=8) (6 rows)
-
whereclause(subselect->jointree->quals)中不能包含volatile 函数
-
检测引用的上层的对象都在 available_rels 中, 不在则不支持提升