One question we get a lot at Harmonic is, “Can you detect sensitive content in files?” The short answer is yes.
In this blog, I want to walk through the prevalence of file uploads in GenAI usage and how Harmonic helps security teams keep control without blocking innovation.
Trends in GenAI Use Across the Enterprise
As enterprise use of GenAI matures, a number of powerful use cases are becoming standard. Adoption has progressed significantly over the last 12 months, and we’re now seeing a host of creative tools and use cases propelling businesses forward.
The most mature organizations have identified hundreds of AI use cases in their business. Here are some of the most popular:
- Analyzing memos: Private equity analysts are feeding investment memos into GenAI tools to help assess potential deals.
- Translating documents: Teams can now translate reports, presentations, and communications with just a few clicks.
- Processing insurance claims: AI helps claims analysts quickly parse complex forms and supporting documentation.
- Reviewing legal documents: Legal teams are using AI to reduce the time spent reviewing contracts and other dense documentation.
- Creating marketing assets: Content teams are generating compelling creative assets for campaigns and social media posts.
- Extracting insights from large datasets: One fast-growing use case is analyzing large, data-heavy files (think reports, logs, CSVs) to surface insights or trends that would take a human hours or days to uncover.
All of these use cases have one thing in common: they involve uploading a file. And while that makes AI incredibly useful, it also opens the door to a real challenge: how do we enable all of this without exposing our most sensitive data?
Files Uploaded into ChatGPT: A Deep Dive
To understand the scale of this challenge, we examined a quarter’s worth of real-world file upload data: 8,226 files uploaded to various ChatGPT plans over the course of that quarter.
The findings were surprising.
First, image files were the dominant file type, accounting for 68.3% of all uploads. While some of this likely includes product screenshots and diagrams, a fair share is probably employees indulging in AI-generated Ghibli-style portraits of coworkers or creating packaged-toy images.

Beyond images, more work-oriented file types made up the majority of the remaining uploads:
- PDFs: 13.4%
- Word documents (.docx): 5.46%
- Excel files (.xlsx): 4.90%
- CSV files: 3.17%
- PowerPoint files (.pptx): 1.45%
Multimedia files like .mp4 videos were far less common, making up only 0.18% of uploads. QuickTime files were virtually nonexistent at 0.01%.
Employees are clearly using GenAI to analyze and interact with business documents. This makes sense. These tools are great at summarizing reports, reviewing documentation, generating slide decks, and even turning spreadsheets into visualizations.
But it also means sensitive data is being uploaded in large volumes, and security teams have limited control over most of it.
“Aren’t My Files Just Labeled Anyway?”
This is a great question, and it comes up often when teams already use tools like Microsoft Purview.
Purview and similar solutions are good at identifying well-defined, structured data types. If you're worried about Social Security numbers, credit card data, or API keys, these tools usually catch them. They also offer automated classification capabilities, which is a solid start.
But they’re not built for what’s actually being uploaded.
Think about an investment memo, an insurance claim, a customer success QBR deck, or a legal argument. These are complex, unstructured documents. They don’t follow a set format, and the context that makes them sensitive isn’t easy to define in a few keywords or regex rules.
This is where traditional classification falls short. Security teams end up with blind spots because the tooling wasn’t designed for the shape of today’s GenAI usage.
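To make that gap concrete, here is a minimal sketch in Python, standing in for any pattern-based classifier (it is not Purview’s actual engine, and the sample documents are invented): the regex rules catch a structured identifier but see nothing wrong with a confidential deal memo.

```python
import re

# Patterns like these reliably catch well-defined, structured data.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CREDIT_CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")

claim_form = "Policyholder SSN: 078-05-1120, claim amount $4,200."
memo = (
    "Project Falcon: recommend acquiring Acme Corp at a 7.2x EBITDA "
    "multiple; diligence flagged customer-concentration risk."
)

# The structured field is caught...
print(bool(SSN.search(claim_form)))  # True

# ...but the deal memo matches no pattern, so a rules-based
# classifier calls it clean, even though its terms are confidential.
print(bool(SSN.search(memo)), bool(CREDIT_CARD.search(memo)))  # False False
```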
Harmonic Provides In-Line Identification of Sensitive Files
Harmonic steps in to close that gap.
When an employee uploads a file containing sensitive content, Harmonic detects it in real time, even when the content is highly unstructured. Whether it’s a legal document, a strategy deck, a sales pipeline report, or a private equity memo, we flag it before it ever leaves your organization’s control.
Depending on how you’ve configured your environment, Harmonic can take several actions:
- Alert the security team about the sensitive upload.
- Block the upload if it violates policy (such as uploading customer data into a free-tier GenAI tool).
- Redirect the user to a safer, sanctioned workflow or internal tool.
You can also apply granular policies. For example, you can allow uploads to enterprise-approved GenAI tools while blocking sensitive uploads to ChatGPT’s free tier, or restrict the use of personal accounts, certain use cases, or specific high-risk applications. A sketch of what such a policy set might look like follows below.
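As a rough illustration, a granular policy set along those lines might be expressed like this. This is a hypothetical structure for the sake of the example, not Harmonic’s actual configuration syntax; the rule names, destinations, and redirect URL are made up.

```python
# Hypothetical policy set -- illustrative only, not Harmonic's real
# configuration format or API. Each rule maps a destination (or
# account type) to the action taken when a sensitive upload is detected.
POLICIES = [
    {
        "name": "allow-enterprise-genai",
        "destination": "chatgpt-enterprise",  # sanctioned enterprise plan
        "on_sensitive_upload": "alert",       # notify the security team
    },
    {
        "name": "block-free-tier",
        "destination": "chatgpt-free",        # unsanctioned free tier
        "on_sensitive_upload": "block",       # stop the upload outright
    },
    {
        "name": "redirect-personal-accounts",
        "account_type": "personal",
        "on_sensitive_upload": "redirect",    # steer users to a sanctioned tool
        "redirect_to": "https://ai.internal.example.com",  # placeholder URL
    },
]
```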
Most importantly, Harmonic does this with speed and accuracy, minimizing disruption for employees. Your users don’t want to wait five minutes for a policy check to clear. You also don’t want a backlog of alerts to wade through.
Harmonic flags issues in milliseconds, letting you maintain security without killing productivity.
Adding Controls for Users
As more employees interact with GenAI tools, file uploads will continue to grow. From research memos to marketing slides, AI tools are already reshaping how work gets done. But if those files contain sensitive, proprietary, or regulated data, the risk multiplies.
With Harmonic, you get deep visibility into file uploads, real-time detection of sensitive content, and the controls you need to keep data safe.
All without standing in the way of innovation.
If you're interested in seeing how Harmonic handles file-based GenAI usage in your organization, reach out for a demo.