Bigquery: Select a column with any value not in the group by clause

advertisements

It is a classical question and I know there are many work around like here: Select a Column in SQL not in Group By but they do not work for my issue on Bigquery.

I have a table with tweets from Twitter and I want a ranking for the urls including any tweet text.

ID         tweet               url
1          my github tweet     http://www.github.com/xyz
2          RT github tweet     http://www.github.com/xyz
3          another tweet       http://www.twitter.com
4          more tweeting       http://www.github.com/abc

I tried the following query, but then id 1 and 2 are counted separately.

SELECT tweet, count(url) as popularity, url FROM table group by tweet, url order by popularity desc

How can I count/rank the urls correctly and still preserve any associated tweet text in the result? I do not care if it is from ID 1 or 2.


Here is one approach:

SELECT url, COUNT(*) AS popularity, GROUP_CONCAT(tweet)
FROM Table GROUP BY url ORDER BY popularity

GROUP_CONCAT aggregation function will concatenate all the tweets associated with same URL using comma as separator (you can pick another separator as second parameter to GROUP_CONCAT).