 |
 |
Archives of the TeradataForum
Message Posted: Mon, 17 Feb 2003 @ 19:43:05 GMT
| Subj: | | INSERT/SELECT out of one table into two |
| |
| From: | | Frank C. Martinez IV |
Ok everyone, I've got another dumb question, When you do a INS/SEL operation from two+ tables with the same structure, same PI as
the resulting table, with the resulting table starting out empty, you get a very fast load. This works if you do a multistatement
request (or if you use a UNION, I've heard. Ok, so I'm old-fashioned!). Now, why doesn't that happen when you extract from one
table into two+ tables, same structure, same PI, or at least why doesn't the explain show the extraction happenning at the same time,
so that synch scanning can be accomplished. For example:
create table stg_y_1 as stg_yopenorder_hist with no data;
create table stg_y_2 as stg_yopenorder_hist with no data;
EXPLAIN
INSERT stg_y_1
SELECT *
FROM stg_yopenorder_hist
WHERE CAST(SUBSTR(EXTRACT_TS, 1, 10) AS DATE) < '2003-01-10'
;INSERT stg_y_2
SELECT *
FROM stg_yopenorder_hist
WHERE CAST(SUBSTR(EXTRACT_TS, 1, 10) AS DATE) >= '2003-01-10';
with the explain:
Explanation -------------------------------------------------- | | 1) First, we lock a distinct GDYR_HIST."pseudo table" for write on a RowHash to prevent global deadlock for GDYR_HIST.stg_y_2.
2) Next, we lock a distinct GDYR_HIST."pseudo table" for write on a RowHash to prevent global deadlock for GDYR_HIST.stg_y_1.
3) We lock a distinct GDYR_HIST."pseudo table" for read on a RowHash to prevent global deadlock for
GDYR_HIST.stg_yopenorder_hist.
4) We lock GDYR_HIST.stg_y_2 for write, we lock GDYR_HIST.stg_y_1 for write, and we lock GDYR_HIST.stg_yopenorder_hist for read.
5) We do an all-AMPs RETRIEVE step from GDYR_HIST.stg_yopenorder_hist by way of an all-rows scan with a condition of
("(SUBSTR(GDYR_HIST.stg_yopenorder_hist.EXTRACT_TS ,1 ,10 )(DATE, FORMAT 'yyyy-mm-dd')LATIN)< DATE '2003-01-10'") into Spool 1, which
is built locally on the AMPs. Then we do a SORT to order Spool 1 by row hash. The input table will not be cached in memory, but it
is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no
confidence to be 9,031,152 rows. The estimated time for this step is 2 hours and 29 minutes.
6) We execute the following steps in parallel.
- 1) We do a MERGE into GDYR_HIST.stg_y_1 from Spool 1 (Last Use).
- 2) We do an all-AMPs RETRIEVE step from GDYR_HIST.stg_yopenorder_hist by way of an all-rows scan with a condition of
("(SUBSTR(GDYR_HIST.stg_yopenorder_hist.EXTRACT_TS ,1 ,10)(DATE, FORMAT 'yyyy-mm-dd')LATIN)>= DATE '2003-01-10'") into Spool 2, which
is built locally on the AMPs. Then we do a SORT to order Spool 2 by row hash. The input table will not be cached in memory, but it
is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 2 is estimated with no
confidence to be 9,031,152 rows. The estimated time for this step is 2 hours and 29 minutes.
7) We do a MERGE into GDYR_HIST.stg_y_2 from Spool 2 (Last Use).
8) We spoil the parser's dictionary cache for the table.
9) We spoil the parser's dictionary cache for the table.
10) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
| | -> | No rows are returned to the user as the result of statement 1. No rows are returned to the user as the result of statement
2.
| |
Steps #5 and #6.2 are doing an all-amps retrieve, building the spool files locally. Both steps suggest that synch scan is
an option, but the explain puts the retrieve at different times. Is the only way to achieve a synch scan to start them up as
different processes at the same time? Ok, somebody klonk me on the head and straighten me out on this one.
iv
| |