Monday, March 12, 2012

midian calculation

Hi guys,
I have a dilemma, I used the following query to find the Q-by-q median. It
uses a temporary table. I think it's because it has no index and foreign
key, it takes over 5 minutes to run. When we look at subquestions, there are
over 10,000 records and cross join takes forever. I don't know if I can
create indexes on temp table and wonder if the process will take long
anyway.
--Put the course marks in temp table
--DROP TABLE #QBYQ
CREATE TABLE #QBYQ (QUESTION_ID INT, QUESTION_POINT_MARK DECIMAL)
INSERT INTO #QBYQ
SELECT QM.QUESTION_ID, QM.QUESTION_POINT_MARK
from QUESTION_MARK QM, Question q, Enrollment e
where
e.ENROLLMENT_ID = qm.ENROLLMENT_ID
and q.QUESTION_ID = qm.QUESTION_ID
and e.COURSE_YEAR = 2005
and e.COURSE_SESSION = 1
and e.COURSE_CODE = 'FA4'
--and q.PARENT_QUESTION_ID is null
AND QM.QUESTION_POINT_MARK IS NOT NULL
and q.QUESTION_TYPE_CODE = 'EN'
ORDER BY QM.QUESTION_ID, QM.QUESTION_POINT_MARK
-- It is currently doing statistical median, the smaller of the two if even
rows
-- Get this working first before calculating the financial median (avg of
middle 2 rows)
SELECT Q.LABEL AS QUESTION,
QM1.QUESTION_POINT_MARK AS MEDIAN
from QUESTION Q,
#QBYQ QM1 CROSS JOIN #QBYQ QM2
WHERE QM1.QUESTION_ID = QM2.QUESTION_ID
AND Q.QUESTION_ID = QM1.QUESTION_ID
GROUP BY Q.LABEL, QM1.QUESTION_POINT_MARK
HAVING sum(CASE WHEN QM2.QUESTION_POINT_MARK <= QM1.QUESTION_POINT_MARK
THEN 1 ELSE NULL END)>=(COUNT(*)+1)/2
AND sum(CASE WHEN QM2.QUESTION_POINT_MARK >= QM1.QUESTION_POINT_MARK
THEN 1 ELSE NULL END)>=(COUNT(*)+1)/2
ORDER BY Q.LABEL
I then try to put the select in-line but didn't get the correct results. Can
you tell what's wrong? Looks like HAVING is not doing what it is supposed to
do.
SELECT Q.LABEL AS QUESTION,
QM1.QUESTION_POINT_MARK AS MEDIAN
from QUESTION_MARK QM1, QUESTION_MARK QM2, ENROLLMENT E, QUESTION Q
WHERE
QM1.QUESTION_ID = QM2.QUESTION_ID
AND QM1.QUESTION_ID = Q.QUESTION_ID
AND QM1.ENROLLMENT_ID = E.ENROLLMENT_ID
AND QM2.QUESTION_ID = Q.QUESTION_ID
AND QM2.ENROLLMENT_ID = E.ENROLLMENT_ID
and e.COURSE_YEAR = 2005
and e.COURSE_SESSION = 1
and e.COURSE_CODE = 'FA4'
and q.PARENT_QUESTION_ID is null
and q.QUESTION_TYPE_CODE = 'EN'
GROUP BY
Q.LABEL,
QM1.QUESTION_POINT_MARK
HAVING sum(CASE WHEN QM2.QUESTION_POINT_MARK <= QM1.QUESTION_POINT_MARK
THEN 1 ELSE NULL END)>=(COUNT(*)+1)/2
AND sum(CASE WHEN QM2.QUESTION_POINT_MARK >= QM1.QUESTION_POINT_MARK
THEN 1 ELSE NULL END)>=(COUNT(*)+1)/2
ORDER BY Q.LABEL, QM1.QUESTION_POINT_MARK
ThanksHi Ray5531,
Medians have been discussed many times in this newsgroup. You may want
to google past threads.
Maybe the following two threads help you out:
http://tinyurl.com/7ba4w
http://tinyurl.com/7fsjr
Hope this helps,
Gert-Jan
Ray5531 wrote:
> Hi guys,
> I have a dilemma, I used the following query to find the Q-by-q median. It
> uses a temporary table. I think it's because it has no index and foreign
> key, it takes over 5 minutes to run. When we look at subquestions, there a
re
> over 10,000 records and cross join takes forever. I don't know if I can
> create indexes on temp table and wonder if the process will take long
> anyway.
> --Put the course marks in temp table
> --DROP TABLE #QBYQ
> CREATE TABLE #QBYQ (QUESTION_ID INT, QUESTION_POINT_MARK DECIMAL)
> INSERT INTO #QBYQ
> SELECT QM.QUESTION_ID, QM.QUESTION_POINT_MARK
> from QUESTION_MARK QM, Question q, Enrollment e
> where
> e.ENROLLMENT_ID = qm.ENROLLMENT_ID
> and q.QUESTION_ID = qm.QUESTION_ID
> and e.COURSE_YEAR = 2005
> and e.COURSE_SESSION = 1
> and e.COURSE_CODE = 'FA4'
> --and q.PARENT_QUESTION_ID is null
> AND QM.QUESTION_POINT_MARK IS NOT NULL
> and q.QUESTION_TYPE_CODE = 'EN'
> ORDER BY QM.QUESTION_ID, QM.QUESTION_POINT_MARK
> -- It is currently doing statistical median, the smaller of the two if eve
n
> rows
> -- Get this working first before calculating the financial median (avg of
> middle 2 rows)
> SELECT Q.LABEL AS QUESTION,
> QM1.QUESTION_POINT_MARK AS MEDIAN
> from QUESTION Q,
> #QBYQ QM1 CROSS JOIN #QBYQ QM2
> WHERE QM1.QUESTION_ID = QM2.QUESTION_ID
> AND Q.QUESTION_ID = QM1.QUESTION_ID
> GROUP BY Q.LABEL, QM1.QUESTION_POINT_MARK
> HAVING sum(CASE WHEN QM2.QUESTION_POINT_MARK <= QM1.QUESTION_POINT_MARK
> THEN 1 ELSE NULL END)>=(COUNT(*)+1)/2
> AND sum(CASE WHEN QM2.QUESTION_POINT_MARK >= QM1.QUESTION_POINT_MARK
> THEN 1 ELSE NULL END)>=(COUNT(*)+1)/2
> ORDER BY Q.LABEL
> I then try to put the select in-line but didn't get the correct results. C
an
> you tell what's wrong? Looks like HAVING is not doing what it is supposed
to
> do.
> SELECT Q.LABEL AS QUESTION,
> QM1.QUESTION_POINT_MARK AS MEDIAN
> from QUESTION_MARK QM1, QUESTION_MARK QM2, ENROLLMENT E, QUESTION Q
> WHERE
> QM1.QUESTION_ID = QM2.QUESTION_ID
> AND QM1.QUESTION_ID = Q.QUESTION_ID
> AND QM1.ENROLLMENT_ID = E.ENROLLMENT_ID
> AND QM2.QUESTION_ID = Q.QUESTION_ID
> AND QM2.ENROLLMENT_ID = E.ENROLLMENT_ID
> and e.COURSE_YEAR = 2005
> and e.COURSE_SESSION = 1
> and e.COURSE_CODE = 'FA4'
> and q.PARENT_QUESTION_ID is null
> and q.QUESTION_TYPE_CODE = 'EN'
> GROUP BY
> Q.LABEL,
> QM1.QUESTION_POINT_MARK
> HAVING sum(CASE WHEN QM2.QUESTION_POINT_MARK <= QM1.QUESTION_POINT_MARK
> THEN 1 ELSE NULL END)>=(COUNT(*)+1)/2
> AND sum(CASE WHEN QM2.QUESTION_POINT_MARK >= QM1.QUESTION_POINT_MARK
> THEN 1 ELSE NULL END)>=(COUNT(*)+1)/2
> ORDER BY Q.LABEL, QM1.QUESTION_POINT_MARK
> Thanks|||On Mon, 25 Apr 2005 10:26:30 -0700, Ray5531 wrote:
(snip)
>I don't know if I can
>create indexes on temp table
Hi Ray,
You can. The best way is to use CREATE TABLE to create the table (with
PRIMARY KEY and UNIQUE constraints; accompanying indexes will be generated
automatically), then use CREATE INDEX to add extra (non-unique) indexes,
then use INSERT INTO ... SELECT to load the temp table with data. (This
order of creation minimizes the number of recompiles). OTOH, there might
be cases where definin g an index after loading the table with data might
be quicker - test all variation to find the quickest way in your scenario.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment