Basic Queries
Basic Queries⚓︎
search by time⚓︎
The underlying data has a 15 minute granularity. Aggregating the data into higher time intervals is achieved using Time Functions. Here we are using toYear(datetime) and the count() function to count the number of events by year.
This returns about 344 million events with 61 million in 2018 and about 51 million in 2019
| Result of query | |
|---|---|
We could have partitioned by toMonth(), toWeek(), toHour(), toQuarter() and hence derived all the extra unnecessary time fields in the GDELT data schema.
This query used absolute times, but relative time and date math is also valid. We could as easily query the last 10 years(subtractYears(date, num)) from now()
Time Functions can be used in SQL SELECT, WHERE and GROUP clauses.
Count distinct values⚓︎
Figure out how many distinct primary actors there are.
During this time, there were 16,955 unique identities in actor1_name.
Find the most frequently occuring values⚓︎
Find the 10 most occuring values of actor1_name
This query returns this list:
['UNITED STATES','\0','POLICE','UNITED KINGDOM','PRESIDENT','GOVERNMENT','CHINA','SCHOOL','RUSSIA','CANADA']
One of the most occurring values is '\0' where no primary actor exists.
Find the last article classified SCHOOL & avg_tone > 1⚓︎
We can easily find the last value in a filtered data set. argMax(field, datetime) is much simpler than grouping and ordering by time desc. It also works with field tuples.
The latest article classified as SCHOOL and with a positive avg_tone was an Edison State Community College conference
| Result of query | |
|---|---|
Find the average value⚓︎
Calculate the average tone of events for a particular primary actor by month.
avg_tone has values ranging from -100 (very negative) to +100 (very positive). Even though average sentiment has gone up in events where SCHOOL is the primary actor, these articles very slightly negative in tone. The value is greater than -1, but less than zero.
Count event types⚓︎
Protests (event_root_code='14') are an interesting event type to monitor over time.
Take for example the 2016 US election time frame. There is a marked jump in the number of protest around election time.
| Result of query | |
|---|---|
Find the top source_url domains by year⚓︎
Each event has a source_url. We can use url functions to easily extract root domains domain(url).
We have extracted the root domain from the full url at query time, allowing us to trend news outlets over time.