Data Prep tools — finishing touches before visualization
The Data Prep dropdown sits in the Composer toolbar — Top N, Simple Pivot, Field Builder, Turn On Select Distinct. Three of those are the topic of this lesson; Field Builder is the entry to drill-downs and gets its own lesson next. Think of these as the polish you apply before a report goes to a chart or dashboard.
By the end of this lesson
- A dataset that reads cleanly when you take it into a chart or dashboard
- A working sense of which Data Prep tool fits which scenario (Top N for limits, Pivot for transposition, Select Distinct for deduplication)
- Awareness of the silent traps — Top N without a sort returning database-order rows, Select Distinct masking an App Builder join issue
You'll need
- A working report — easiest with the one from Lesson 4 that already has an aggregate on it
- A sense of what you'll do with the prepped report next: a data grid you'll save, a chart you'll build, a dashboard widget you'll embed
Background
Data Prep is the last stage of report building. If your end product is a data grid you'll save and share, Data Prep is the cleanup pass before you save it. If you're heading into Module 8 to build a chart, Data Prep is what makes the chart readable instead of a wall of 200 unsorted rows.
The dropdown lives in the Composer toolbar and contains four entries:
None of these tools change the underlying data in your database. They reshape what this particular report displays. Different Composers can build different reports off the same dataset, each with their own Data Prep choices.
Not sure which Data Prep tool fits your scenario? Email [email protected] with a short description of what your report currently shows and what you want it to show. Same business day reply.
Do it
-
Top N — limit to the top rows (the most-used Data Prep tool)
The most common Data Prep operation by a wide margin. Click Data Prep → Top N. The dialog gives you a list of preset values (1, 5, 10, 50) plus a custom field for anything else.
The defaults cover most cases. Top 1 for KPI-style reports where you want to display a single hero metric (the customer with the most tickets, the agent with the fastest response time, etc.). Top 5 / Top 10 for executive summaries. Top 25 is a custom value worth remembering: it's the practical cap for chart readability — more than 25 categories on a column chart turns into a wall of bars no one can read.
Top N depends on three things being set up correctly before it does what you expect:
- An aggregate. Top N on a report of raw rows just returns the first N rows in database order, which is rarely useful. Top N on a report with
COUNT(?)orSUM(?)returns the top N groups by that aggregate. - A sort order. "Top 5" only means something if there's a sort that defines what "top" is. Sort descending on the aggregate column for "top 5 by count"; sort ascending for "bottom 5." Without an explicit sort, the database picks rows in storage order — not what you want.
- A grouping that matches your aggregate. If you've added other display columns that fragment the grouping, "top 5 customers" becomes "top 5 customer-channel-status combinations" which is rarely the intent. Use the non-visible-column pattern for fields you need in the data flow but not in the displayed output.
Once Top N is on, the dropdown menu shows the active value so you can see at a glance it's applied. Click Save and Apply. Set it back to All Rows when you want to turn it off.
- An aggregate. Top N on a report of raw rows just returns the first N rows in database order, which is rarely useful. Top N on a report with
-
Simple Pivot — restructure rows into columns
Pivot (sometimes called transpose) is the move you make when your dataset has the wrong shape for what you're trying to display. Example from the video: a count-tickets-by-customer-and-status report comes out with multiple rows per customer (one row for "Closed," another for "Open," another for any other status). That's hard to read in a grid and impossible to chart cleanly because most charts want one row per x-axis value.
Pivot fixes this by promoting one of the dimensions into column headers. Click Data Prep → Pivot. The dialog asks you to map three things:
- Column — the field whose distinct values will become column headers. Best practice: pick the field with the fewest distinct values. If you have 5 statuses and 1,200 customers, status goes in the Column slot, not customer.
- Row — the field that will produce one row per distinct value. The "other" category. Following the example, customer goes here.
- Data field — the numeric measure that fills the cells. The aggregate column from your report. Usually a
COUNT(?)orSUM(?).
The pivot dialog has additional options for sorting and customization that the video skips — they're documented in the help center if you need them.
Click Save and Apply. The grid transposes: one row per customer, columns for Closed / Open / etc., counts in each cell. To clear the pivot and return to the original row-per-combination shape, open Data Prep → Pivot again and click Clear.
Don't confuse Simple Pivot with the Pivot Grid visualization. The Pivot Grid is a Module 8 visualization with its own configuration. Simple Pivot here is data-shaping that runs before any visualization layer. Both transpose data; one is a data prep step (this lesson), the other is a chart type (Module 8). When in doubt: if you want the result to be the report data itself, use Simple Pivot. If you want a rendered grid widget with subtotals, totals, and drill-down behavior, use the Pivot Grid visualization.
-
Select Distinct — deduplicate (with a caveat)
Click Data Prep → Turn On Select Distinct. The report collapses any rows that are exact duplicates across all selected fields.
Useful when:
- You need a unique list of values from a column that naturally contains duplicates ("which distinct statuses exist in our system?", "which channels are in use?").
- Your dataset legitimately has multiple events per entity and you want a uniques-only summary view.
The caveat: if your report has duplicate rows that you didn't expect, Select Distinct will quietly hide them — and you may have a join problem in the App Builder that the duplicates were telling you about. A many-to-many join, a missing filter on a one-to-many, an overlooked Cartesian product. Before reaching for Select Distinct to "fix" duplicate rows, ask whether the duplicates are real. If they're a sign of a bad join, the right fix is in App Builder, not in Composer. Talk to your App Builder before papering over it.
Toggle Select Distinct off in the same menu when you want the duplicates back.
You have a working connection. Below is what to tighten before real users see it. Skip if you're just testing.
Make it real
Three places these tools earn their keep — concrete scenarios pulled from the patterns most teams hit, each pointing at where Module 8 picks up.
Top 25 customers for a chart that's actually readable
The most common Top N use case. You've built a report counting tickets per customer with a descending sort on the count. There are 200+ customers in your data. Add Top N = 25 via Data Prep. Save the report as a data grid for browsing, or take it to a column chart — 25 bars is the practical readability ceiling. More than that, you're back to the wall of bars no one parses. Module 8 — Charts picks up here.
Pivot before charting a two-dimensional comparison
You want a column chart showing Open vs Closed ticket counts side-by-side for each customer. Without pivoting, your report has two rows per customer (one for each status) — the chart can't render that cleanly because each customer would appear twice on the x-axis. With Simple Pivot (status as column, customer as row, count as data field), each customer is a single row with separate Open / Closed columns, which is what a multi-series column chart expects.
This pivot-then-chart pattern is so common that some teams treat it as default: any chart with two categorical dimensions starts with a pivot in Data Prep. The Module 8 — Charts lesson assumes the report shape is already chart-ready.
Top 1 for a KPI tile on a dashboard
Module 11 (dashboards) uses KPI tiles that display a single hero number. The standard recipe: build a small report that returns exactly one row with the metric you want, then drop it on the dashboard as a KPI widget. Top N = 1 (paired with the right sort) gets you to that single row reliably. Examples: "highest-volume customer this quarter," "ticket with the longest open time," "channel with the most-recent activity." Each is a one-row Top 1 report on a different sort key.
When NOT to use Data Prep
These tools are report-shape decisions. They don't fix database-side problems. Three patterns to watch for:
- "My counts are off." Don't reach for Select Distinct. Check the non-visible-column pattern first — a sort on a hidden column silently changes your GROUP BY. Data Prep won't fix that.
- "My report is too slow." Top N limits the displayed rows but the underlying query still computes everything. For real performance work, criteria (Lesson 3) limit the rows the query touches; that's where slow reports get fixed.
- "I have duplicate rows." Read the Select Distinct caveat above. Duplicates often mean a join issue, not a deduplication problem.
If you're stuck
Five Data Prep traps, in roughly the order Composers stumble into them.
Top N is on but I'm getting weird random-looking rows
No sort order, or the sort is on the wrong field. Top N returns "the top N rows according to the report's current sort" — and if there's no sort, that's whatever rows the database happens to return first (storage order, not by your aggregate). Open the action menu on the column you actually want ranked, set Sort to Descending, and Top N will now mean what you expect. This is the most common Data Prep mistake because the report runs fine and the output looks plausible; it's just not the top N by anything meaningful.
Select Distinct made my counts go down by a lot — should I be worried?
Yes, possibly. Duplicate rows can come from a few places, but the most consequential cause is a join problem in the App Builder: a many-to-many relationship without the right filter, a one-to-many that was joined as if it were one-to-one, a Cartesian product from a missing ON clause. If Select Distinct removes a substantial fraction of your rows, ask your App Builder to look at the joins behind this report tree before relying on the deduplicated view. The duplicates may be hiding inflated metrics elsewhere (sums and counts that double-count) that Select Distinct doesn't fix because those columns aren't part of the distinct comparison.
Top N gives a smaller-than-expected number of rows
Two causes. One: your report has fewer total groups than your Top N value. If you asked for Top 5 statuses and there are only 4 distinct statuses, you get 4 rows — Top N can't invent rows. Two: a non-visible column with a sort is fragmenting the grouping, so what looks like 50 customer groups is actually 50 customer-channel-status combinations of which only the top N appear. The non-visible-column pattern matters here too; check that any non-visible fields don't have sorts that pull them into the GROUP BY.
Pivot output is one giant column-heavy grid I can't read
You put the high-cardinality field in the Column slot. The transcript demonstrates this directly: putting customer (hundreds of distinct values) in the Column slot creates a grid with hundreds of columns and a small number of rows — unreadable. The best-practice rule from step 2: put the field with the fewest distinct values in the Column slot. Open Data Prep → Pivot → Clear, then re-apply with the dimensions reversed. The grid should now have many rows (one per customer) and a small number of status columns.
I'm trying to pivot but I see "Pivot Grid" as a chart type — different thing?
Yes — different feature. Simple Pivot (Data Prep) is data-shaping that changes the report's underlying grid. Pivot Grid (Module 8 visualization) is a rendered widget with its own configuration, subtotals, drill-down behavior, and styling. Both transpose data; one is a data prep step, the other is a visualization. For chart preparation, use Simple Pivot here. For a polished pivot-style display widget, use the Pivot Grid in Module 8. You can't use both on the same report at the same time — pick one path.
None of these match my situation
Email [email protected] with the report you're trying to prep, which Data Prep tool you tried, and what's not lining up. A screenshot of the grid before and after Data Prep helps. Same business day reply.
