<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments for The Rhodium Toad</title>
	<link>http://blog.rhodiumtoad.org.uk</link>
	<description>an experimental blog</description>
	<pubDate>Sat, 31 Jul 2010 04:34:01 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.1</generator>

	<item>
		<title>Comment on The Rule Challenge by Andrew</title>
		<link>http://blog.rhodiumtoad.org.uk/2010/06/21/the-rule-challenge/#comment-8750</link>
		<author>Andrew</author>
		<pubDate>Tue, 22 Jun 2010 17:36:48 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2010/06/21/the-rule-challenge/#comment-8750</guid>
		<description>We've had more than the usual number of rules questions on IRC recently, at least that's my impression; that would probably explain the multiple recent blog posts. (And I only wrote this one because someone on IRC bugged me about it.)</description>
		<content:encoded><![CDATA[<p>We&#8217;ve had more than the usual number of rules questions on IRC recently, at least that&#8217;s my impression; that would probably explain the multiple recent blog posts. (And I only wrote this one because someone on IRC bugged me about it.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Rule Challenge by Robert Young</title>
		<link>http://blog.rhodiumtoad.org.uk/2010/06/21/the-rule-challenge/#comment-8735</link>
		<author>Robert Young</author>
		<pubDate>Tue, 22 Jun 2010 12:30:56 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2010/06/21/the-rule-challenge/#comment-8735</guid>
		<description>I'm not principally a PostgreSQL person; DB2 is my main squeeze, but I do follow it a bit.  The whole rules thing is something only PG did, and that was Stonebraker's doing.  One would have to read up his rationale way back when.  From what I can find, view support is the justification for using rules, since that is the only way to get it.  There have been a few links from the PostgreSQL site to rule posts such as this recently.  I wonder why rules have become a topic du jour?

The challenge as written isn't provable; the counterexample is exhaustive enumeration of all non-trivial rules.

As to better, one of those linked posts asserts that triggers are better.  I would agree with that.</description>
		<content:encoded><![CDATA[<p>I&#8217;m not principally a PostgreSQL person; DB2 is my main squeeze, but I do follow it a bit.  The whole rules thing is something only PG did, and that was Stonebraker&#8217;s doing.  One would have to read up his rationale way back when.  From what I can find, view support is the justification for using rules, since that is the only way to get it.  There have been a few links from the PostgreSQL site to rule posts such as this recently.  I wonder why rules have become a topic du jour?</p>
<p>The challenge as written isn&#8217;t provable; the counterexample is exhaustive enumeration of all non-trivial rules.</p>
<p>As to better, one of those linked posts asserts that triggers are better.  I would agree with that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Rule Challenge by Andrew</title>
		<link>http://blog.rhodiumtoad.org.uk/2010/06/21/the-rule-challenge/#comment-8710</link>
		<author>Andrew</author>
		<pubDate>Tue, 22 Jun 2010 04:41:57 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2010/06/21/the-rule-challenge/#comment-8710</guid>
		<description>That is indeed the real question, but it's one for another blog post rather than this one :-)</description>
		<content:encoded><![CDATA[<p>That is indeed the real question, but it&#8217;s one for another blog post rather than this one :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Rule Challenge by Tom Lane</title>
		<link>http://blog.rhodiumtoad.org.uk/2010/06/21/the-rule-challenge/#comment-8706</link>
		<author>Tom Lane</author>
		<pubDate>Tue, 22 Jun 2010 03:43:46 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2010/06/21/the-rule-challenge/#comment-8706</guid>
		<description>Rules work all right for views (ie, ON SELECT DO INSTEAD SELECT cases).  There's no question that every other case sucks.  The *real* question is how to do better?</description>
		<content:encoded><![CDATA[<p>Rules work all right for views (ie, ON SELECT DO INSTEAD SELECT cases).  There&#8217;s no question that every other case sucks.  The *real* question is how to do better?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Selecting random rows from a table by Andrew</title>
		<link>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-4221</link>
		<author>Andrew</author>
		<pubDate>Sat, 29 Aug 2009 20:13:08 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-4221</guid>
		<description>That's not unbiased; some rows are more likely than others to be returned.</description>
		<content:encoded><![CDATA[<p>That&#8217;s not unbiased; some rows are more likely than others to be returned.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Selecting random rows from a table by Andrew</title>
		<link>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-4172</link>
		<author>Andrew</author>
		<pubDate>Thu, 27 Aug 2009 16:51:31 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-4172</guid>
		<description>If you don't really care about insert performance, and you only need one (or a few) random rows at once, you can do this:

ALTER TABLE x ADD COLUMN r DOUBLE PRECISION;
ALTER TABLE x ALTER COLUMN r SET DEFAULT random();
UPDATE x SET r = random() WHERE r IS NULL; -- this will be slow
ALTER TABLE x ALTER COLUMN r SET NOT NULL;
CREATE INDEX i ON x(r); -- also slow
ANALYZE x(r);

Then take a sample row quickly by running this:
SELECT * FROM x WHERE r &#62;= (SELECT random()) ORDER BY r LIMIT 1;

I'm not sure if asking for more than one row in the LIMIT clause would be statistically sound or not.  The "random" order is fixed, so whenever you land in an overlapping spot the sequence will be the same.

If you just need a few rows, you can UNION a few of those together, and that should be as random as you could care for.</description>
		<content:encoded><![CDATA[<p>If you don&#8217;t really care about insert performance, and you only need one (or a few) random rows at once, you can do this:</p>
<p>ALTER TABLE x ADD COLUMN r DOUBLE PRECISION;<br />
ALTER TABLE x ALTER COLUMN r SET DEFAULT random();<br />
UPDATE x SET r = random() WHERE r IS NULL; -- this will be slow<br />
ALTER TABLE x ALTER COLUMN r SET NOT NULL;<br />
CREATE INDEX i ON x(r); -- also slow<br />
ANALYZE x(r);</p>
<p>Then take a sample row quickly by running this:<br />
SELECT * FROM x WHERE r &gt;= (SELECT random()) ORDER BY r LIMIT 1;</p>
<p>I&#8217;m not sure if asking for more than one row in the LIMIT clause would be statistically sound or not.  The &#8220;random&#8221; order is fixed, so whenever you land in an overlapping spot the sequence will be the same.</p>
<p>If you just need a few rows, you can UNION a few of those together, and that should be as random as you could care for.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Selecting random rows from a table by Joanmi</title>
		<link>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-1079</link>
		<author>Joanmi</author>
		<pubDate>Fri, 17 Apr 2009 23:18:11 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-1079</guid>
		<description>Tell it to my boss.

Sorry for the noise.

Regards.</description>
		<content:encoded><![CDATA[<p>Tell it to my boss.</p>
<p>Sorry for the noise.</p>
<p>Regards.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Selecting random rows from a table by Andrew</title>
		<link>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-1076</link>
		<author>Andrew</author>
		<pubDate>Fri, 17 Apr 2009 15:28:50 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-1076</guid>
		<description>I don't have much sympathy for people still using 7.4 (which is approaching EOL).

The optimization for min() and max() to use indexes was added in 8.2.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t have much sympathy for people still using 7.4 (which is approaching EOL).</p>
<p>The optimization for min() and max() to use indexes was added in 8.2.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Selecting random rows from a table by Joanmi</title>
		<link>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-1074</link>
		<author>Joanmi</author>
		<pubDate>Fri, 17 Apr 2009 09:52:19 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-1074</guid>
		<description>Dear Andrew,

We are currently migrating to PostgreSQL 8.3, but I can say that, at least in Postgres 7.4, the max() and min() functions cause a sequential scan, which can be quite expensive on tables with a few million rows.

I know that using the sequence last_value is not a good idea, but it is the best way I have found (and tested).

Also, the generate_series() function doesn't exist in Postgres 7.4. I thought it was a user-defined function that the author hadn't reproduced, but I have now found it in the Postgres 8.3 documentation and, yes, it really is a better way to get some rows (but with Postgres 7 we don't have this option).

You are right in that we need the outer level order by random() to guarantee the random order. I apologize for that.


So I think that in Postgres 7 the best solution would be something like this:

select * from (
  select * from item
  where item_id in (
    select floor(random() * (
      select last_value
      from item_item_id_seq
   ))::bigint
   from item
   limit 100
   ) limit 10
) as foo
order by random();

Of course, if we have Postgres 8, we can use generate_series() to improve it.

For the limits, I suggest testing the max() and min() functions in Postgres 8 (I will do so as soon as I can). I also think that an index ought to speed up the search for maximum and minimum values but, at least when we examine big subsets of wide tables, the Postgres planner uses a sequential scan because, in those cases, it is more efficient than an index scan.

I repeat: I think that finding the maximum or minimum ought to be more efficient using an index, but Postgres 7, at least, does not do it this way.

For this reason, I suggest trying "explain select max (item_id)" first. I understand your arguments, but I have often discovered that things are not as they seem.


PS: I have just remembered a better way to implement 'max()' and 'min()' that really takes advantage of the index, which I used in the past. I also remembered the reason (I think) why max() doesn't take advantage of the index: we might need to select max() from a subset of the table or from a join result. Of course, it is possible, and desirable, that Postgres 8 implements a way to always take advantage of indexes.

Look at this:

calm=# explain select max(lid) from location;
                                QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=214987.09..214987.09 rows=1 width=8)
   -&#62;  Seq Scan on "location"  (cost=0.00..206792.67 rows=3277767 width=8)
(2 rows)

calm=# explain select lid from location order by lid desc limit 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.79 rows=1 width=8)
   -&#62;  Index Scan Backward using location_pkey on "location"  (cost=0.00..12438139.10 rows=3277767 width=8)
(2 rows)


I don't really need to obtain random rows from a table. I simply found this post yesterday while searching for something else and thought it was interesting.

But, if you need to, you can use this trick to eliminate the use of the sequence last_value in my query (PG7), or of max() and min() in the post author's query on Postgres 8 if max() and min() still do not take advantage of indexes.

Regards.</description>
		<content:encoded><![CDATA[<p>Dear Andrew,</p>
<p>We are currently migrating to PostgreSQL 8.3, but I can say that, at least in Postgres 7.4, the max() and min() functions cause a sequential scan, which can be quite expensive on tables with a few million rows.</p>
<p>I know that using the sequence last_value is not a good idea, but it is the best way I have found (and tested).</p>
<p>Also, the generate_series() function doesn&#8217;t exist in Postgres 7.4. I thought it was a user-defined function that the author hadn&#8217;t reproduced, but I have now found it in the Postgres 8.3 documentation and, yes, it really is a better way to get some rows (but with Postgres 7 we don&#8217;t have this option).</p>
<p>You are right in that we need the outer level order by random() to guarantee the random order. I apologize for that.</p>
<p>So I think that in Postgres 7 the best solution would be something like this:</p>
<p>select * from (<br />
  select * from item<br />
  where item_id in (<br />
    select floor(random() * (<br />
      select last_value<br />
      from item_item_id_seq<br />
   ))::bigint<br />
   from item<br />
   limit 100<br />
   ) limit 10<br />
) as foo<br />
order by random();</p>
<p>Of course, if we have Postgres 8, we can use generate_series() to improve it.</p>
<p>For the limits, I suggest testing the max() and min() functions in Postgres 8 (I will do so as soon as I can). I also think that an index ought to speed up the search for maximum and minimum values but, at least when we examine big subsets of wide tables, the Postgres planner uses a sequential scan because, in those cases, it is more efficient than an index scan.</p>
<p>I repeat: I think that finding the maximum or minimum ought to be more efficient using an index, but Postgres 7, at least, does not do it this way.</p>
<p>For this reason, I suggest trying &#8220;explain select max (item_id)&#8221; first. I understand your arguments, but I have often discovered that things are not as they seem.</p>
<p>PS: I have just remembered a better way to implement &#8216;max()&#8217; and &#8216;min()&#8217; that really takes advantage of the index, which I used in the past. I also remembered the reason (I think) why max() doesn&#8217;t take advantage of the index: we might need to select max() from a subset of the table or from a join result. Of course, it is possible, and desirable, that Postgres 8 implements a way to always take advantage of indexes.</p>
<p>Look at this:</p>
<p>calm=# explain select max(lid) from location;<br />
                                QUERY PLAN<br />
---------------------------------------------------------------------------<br />
 Aggregate  (cost=214987.09..214987.09 rows=1 width=8)<br />
   -&gt;  Seq Scan on "location"  (cost=0.00..206792.67 rows=3277767 width=8)<br />
(2 rows)</p>
<p>calm=# explain select lid from location order by lid desc limit 1;<br />
                                                 QUERY PLAN<br />
------------------------------------------------------------------------------------------------------------<br />
 Limit  (cost=0.00..3.79 rows=1 width=8)<br />
   -&gt;  Index Scan Backward using location_pkey on "location"  (cost=0.00..12438139.10 rows=3277767 width=8)<br />
(2 rows)</p>
<p>I don&#8217;t really need to obtain random rows from a table. I simply found this post yesterday while searching for something else and thought it was interesting.</p>
<p>But, if you need to, you can use this trick to eliminate the use of the sequence last_value in my query (PG7), or of max() and min() in the post author&#8217;s query on Postgres 8 if max() and min() still do not take advantage of indexes.</p>
<p>Regards.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Selecting random rows from a table by Andrew</title>
		<link>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-1069</link>
		<author>Andrew</author>
		<pubDate>Fri, 17 Apr 2009 01:52:37 +0000</pubDate>
		<guid>http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/#comment-1069</guid>
		<description>Iterating a single-row fetch in the client is potentially quite a lot slower than the approach given in the original post.

Using the sequence last_value is essentially always a bad idea; better to use min() and max() on the actual id column. Due to the non-transactional nature of sequences, it's possible (for example in the case where a large bulk insert is running in another session) for the sequence last_value to be a long way ahead of the visible maximum ID.

Using a real table rather than a generate_series() call to produce multiple rows in the IN subquery is just going to slow things down.

And finally, omitting the outer level ORDER BY random() clause means that the results will not be in a random order (even though it may look random).</description>
		<content:encoded><![CDATA[<p>Iterating a single-row fetch in the client is potentially quite a lot slower than the approach given in the original post.</p>
<p>Using the sequence last_value is essentially always a bad idea; better to use min() and max() on the actual id column. Due to the non-transactional nature of sequences, it&#8217;s possible (for example in the case where a large bulk insert is running in another session) for the sequence last_value to be a long way ahead of the visible maximum ID.</p>
<p>Using a real table rather than a generate_series() call to produce multiple rows in the IN subquery is just going to slow things down.</p>
<p>And finally, omitting the outer level ORDER BY random() clause means that the results will not be in a random order (even though it may look random).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
