Writing Your Own Queries
In addition to using the dashboards supplied by Cascade, you can write your own queries for deeper knowledge about the performance of your CDN.
Keep in mind that querying Cascade data demands extra care due to the very large amounts of data involved. Writing queries for multi-billion row data sets requires a change of mindset.
At this scale, it’s easy to get into a situation where queries are slow and time out, possibly affecting other users in your organization. Here are some tips to help produce the best results when you’re writing your own queries.
General Advice
- Always limit a query’s output by using time constraints. Queries unbounded by time will surprise you with the amount of data returned and processing required.
- Keep your search narrow by limiting the number of columns in your query to just what you need. Don’t use SELECT *. Hydrolix is a column-oriented database: it’s different from a row-based database. Data in adjacent columns is not stored in an adjacent manner, possibly requiring an undue amount of work to retrieve data you don’t need.
- Consider creating a Summary Table. At this scale, even a 30% reduction in compute, memory, and storage can be significant. Also, we see many summaries that save 95% while still presenting the same data.
Specific Tips
- Be careful with LIMIT. LIMIT is a valuable tool, especially when you’re debugging a new query for syntax and other problems. However, don’t rely on it to protect you from mistakes on a large dataset, even though the Hydrolix database has some safeguards in place for this. Don’t scan 100 billion rows to return a LIMIT of 50 rows.
- Limit your results before using ORDER and GROUP. Ordering 100 billion rows is going to take a long time and use lots of memory. In the same vein, GROUPing by billions of high-cardinality fields is an invitation to failure.
- Use predicates. Predicates are your greatest path to simple, fast queries, so use them whenever you can. Hydrolix will push predicate knowledge to the lowest levels of the system, saving you time and money.
- Limit JOINs, which are expensive at this scale. If you must use JOIN, make sure you first omit all the columns you don’t need. There are often better ways to accomplish what you need without using a complicated JOIN statement.
- Watch your CTEs. Common Table Expressions are still queries under the hood, so be aware of them and make sure they obey these guidelines, too.
Grafana Dashboards
- Don’t pipe data from SELECT * into a Grafana widget. Select just the columns you need.
- Keep the count of your result sets as low as possible while still showing what you need to see. Grafana will only show a limited number of data points per widget – in the low thousands.
- Make sure your dashboard filters are efficiently written. Dashboard filters run every time the page is loaded, so they can slow down the entire dashboard.
- Be aware of Grafana’s lazy loading and use it to your advantage. If there’s a widget that’s expensive to generate and is rarely used, put it below the fold so those queries are only executed when necessary.
- Use Summary Tables to provide a separate dashboard for management and stakeholders. Save the unsummarized, low-latency dashboard for you and your NOC. This helps prevent query slowness based on sudden sharing of dashboards among many people.
Updated 11 days ago