Skip to content
HotelSEO Lab
← The Lab
Advanced Measurement & Experimentation

Building a Tiny Data Warehouse to Blend Your Hotel's Booking and Marketing Data

A no-engineer guide to piping your PMS, booking engine, GA4, and ad data into BigQuery so you can finally answer revenue questions instead of squinting at five dashboards.

HotelSEO LabSeptember 4, 2025 10 min read

I want to talk about the least glamorous thing I do for hotels, which is also one of the most quietly profitable: building a tiny data warehouse so a property can stop guessing about its own numbers.

Here is the situation I walk into constantly. The GM has a PMS dashboard. The revenue manager lives in the booking engine reporting. Marketing has GA4 open in one tab and the Google Ads dashboard in another. Someone exports the OTA extranet stats into a spreadsheet once a month. Five sources of truth, five different definitions of a “booking,” and nobody can answer a simple question like “what did it actually cost us to acquire a direct booking last month versus a booking through an OTA?”

That question matters because OTA commissions run roughly 15 to 25 percent of the booking value. When you cannot see acquisition cost cleanly across channels, you cannot make smart bets about where to push. So you keep feeding the OTAs because they are easy, and you never quite claw back the margin you could. I wrote a whole separate piece on the book-direct math behind OTA commission if you want the napkin numbers, but this post is about the plumbing that lets you actually run those numbers without a part-time job in spreadsheets.

Why five dashboards is worse than it sounds

Each of your tools is great at one thing and blind to everything else.

Your PMS knows the stay happened and what the guest paid, but it does not know that the guest first found you in an AI overview, clicked a Google Ad three weeks earlier, then came back direct. GA4 knows about that click journey but has no idea the booking was later cancelled and re-booked at a different rate. The OTA extranet knows its own commission but pretends your direct channel does not exist.

The result is that every “report” is really one department’s narrow slice, and when two slices disagree, the meeting becomes an argument about whose number is right instead of a decision about what to do next.

A warehouse fixes this by being the boring shared place where all of those slices land, get lined up by the same dates and the same booking IDs, and become one table you can ask questions of.

You do not need more dashboards. You need one place where your existing dashboards stop lying to each other.

What I actually mean by “tiny data warehouse”

When I say warehouse, please do not picture a data engineering team and a six-figure platform. For a single independent or boutique property, I mean:

That is it. No streaming, no real-time, no Kubernetes. A boutique hotel generates maybe a few hundred to a few thousand bookings a month. That is a rounding error to BigQuery.

A 40-room independent hotel running near full occupancy produces well under 20,000 bookings a year. That entire dataset is smaller than a single mid-size marketing email list. The problem was never data volume. It was that nobody put the pieces in the same room.

The four feeds, and how I get each one in

Let me go feed by feed, because the honest truth is that the difficulty is wildly uneven across sources.

1. GA4 to BigQuery (the easy one)

GA4 has a native, free BigQuery export built in. In the GA4 admin, under Product Links, there is a BigQuery Links option. You point it at your BigQuery project, choose daily export, and from then on Google drops a raw events table into a dataset every day. No connector to buy, no code.

This is the single highest-leverage thirty minutes in the whole project, so do it first even if you do nothing else for a month. Raw GA4 export is far richer than the GA4 interface. You get every event with its parameters, which means you can later reconstruct the real path a guest took from first AI mention or organic click through to the booking confirmation page. If you care about whether ChatGPT and AI search are sending you anyone, this is where the evidence eventually lives, which ties straight into the work on AI visibility and AEO/GEO.

2. Google / Meta ad cost (managed connectors)

For ad spend, do not try to build an API integration yourself. Use a managed connector. There are several in the cheap tier that pull Google Ads and Meta cost data into BigQuery on a schedule for a small monthly fee, and the free tiers on some of them cover a single small advertiser.

What you want landing in your warehouse is daily cost, impressions, clicks, and campaign names, keyed by date. You are not trying to replicate the ad platform’s reporting. You just need spend in the same database as revenue so you can divide one by the other.

3. Booking engine transactions (varies a lot)

Your booking engine is where direct revenue is born, so this feed is worth fighting for. The better engines have an API or a scheduled export. Some integrate with GA4 ecommerce already, which means part of this data may arrive through your GA4 export for free. If your engine supports a nightly transaction export to cloud storage, take it. If it only offers a manual CSV download, start with a weekly manual drop and automate later. Do not let perfect block you here.

4. The PMS (the messy one)

I will be straight with you: the PMS is almost always the hardest feed, and it varies enormously by vendor. Some modern cloud PMS platforms have a clean API. Plenty of older ones give you a nightly emailed report or a scheduled export to an FTP folder and nothing else.

My rule: whatever the PMS can give you on a schedule without a human clicking, take that, even if it is ugly. A nightly CSV of arrivals, departures, rate, channel, and a booking ID dropped into Google Cloud Storage is enough to load into BigQuery automatically. If the only option is a weekly manual export, fine, do that for now. One imperfect feed inside the warehouse beats a perfect report trapped in a tool nobody else can see.

The part everyone skips: agreeing on definitions

Here is where most DIY attempts quietly fail, and it has nothing to do with technology.

Before you join anything, you have to decide what your words mean. What counts as a “booking”? Is it the reservation when it is made, or the stay when it happens? Do you report on booking date or stay date? Cancellations: do they reverse the original record, or sit as a separate negative line? When an OTA reservation gets modified, is that one booking or two?

If you do not nail this down, your blended table will produce numbers that look authoritative and are subtly wrong, which is worse than five honest dashboards that disagree.

My default for a hotel is to build the core table on stay date, because that is the night revenue is actually earned and the night you had operational cost. I keep booking date as a separate column so I can still analyze lead time and pace, but the spine of the warehouse is the stay.

I also insist on one shared channel taxonomy. Pick the exact channel labels once and force every feed to map into them:

Channel groupWhat goes in it
Direct - BrandGuest searched your name, came direct, booked on your engine
Direct - Non-brandFound you via non-brand organic, local, or AI, booked direct
Paid SearchBooked after a Google or Bing ad click
OTA - Booking.comReservations through that OTA
OTA - Expedia groupReservations through that OTA family
MetasearchCame via a metasearch comparison click
OtherPhone, walk-in, group, corporate

That table looks trivial. It is the whole game. Once every booking is tagged with one of those labels and sits next to its true cost, the revenue questions answer themselves. The “Direct - Brand” versus OTA split, in particular, is where you see how much margin you are leaving on the table, which is exactly the fight I describe in why your hotel ranks below the OTAs for your own name.

What you can finally answer

Once the feeds land and the definitions hold, here are the questions that go from “nobody knows” to “thirty-second query.”

True cost of acquisition by channel. Add OTA commission plus any ad spend attributable to a channel, divide by the bookings it produced. Now “direct is cheaper” stops being a slogan and becomes a number you can defend in a budget meeting.

Real direct-booking ROI. Your direct channel is not free. It costs you ad spend, content, agency fees, and your booking engine’s cut. But once you can put all of that against direct revenue in one table, you can see the actual return, and it is almost always dramatically better than the OTA path even after costs. That is the number that justifies investing in book-direct conversion work and in SEO for the hotel itself.

Channel mix trend over time. Is your OTA share creeping up or down month over month? You want a healthier mix, more direct and less commission leakage, and you can only manage that if you can see it moving.

Pace and lead time by channel. Direct guests and OTA guests often book on very different timelines. Knowing that changes when and where you spend.

Illustrative, not a promise: imagine an OTA booking at a 100 dollar nightly rate carrying a 20 percent commission. That is 20 dollars of margin gone before the guest checks in. If a quarter of your room nights flow through OTAs at that rate, the warehouse makes the annual size of that leak impossible to ignore. The point is not the exact figure, it is that you finally see it in one place.

A realistic build order and timeline

I am not going to pretend this is a weekend if you have never touched cloud tooling. But it is genuinely weeks, not months, for one property.

  1. Week one: Create the BigQuery project, turn on the GA4 export, set a budget alert. Watch raw event data start flowing.
  2. Week two: Add the ad cost connector. Now spend and behavior live together.
  3. Week three: Get whatever the booking engine and PMS can give you on a schedule into cloud storage and loaded, even if one of them is a weekly manual drop to start.
  4. Week four: Agree the channel taxonomy and stay-date definition, then build the one blended table on top. This is the deliverable. Everything before it was prep.

After that, the maintenance is light. You are mostly babysitting the messy PMS feed and occasionally fixing a renamed campaign that broke a channel mapping.

A quick honesty note, because it is in my bones from a billing scare on another project: when you stand up any cloud account, set a budget or usage alert immediately, before you load a single row. For a hotel-sized warehouse the bill should be tiny, but the discipline of capping it on day one is what keeps “tiny” from ever becoming “surprise.” It is a two-minute setting and the cheapest insurance you will buy all year.

Is this worth it for your property?

If you are a 12-room inn doing everything by hand and OTAs are barely on your radar, honestly, skip the warehouse for now and go fix your Google Business Profile first. The plumbing pays off when you have enough volume and enough channels that the dashboards genuinely contradict each other and real money rides on the answer.

But if you are a 30-to-100-room independent or boutique property, spending on ads, leaking double-digit commission to OTAs, and making channel decisions on vibes, then a tiny warehouse is one of the highest-ROI things you can build. It does not magically win you bookings. What it does is end the guessing, so every other thing you invest in, content and reputation, CRO, AI visibility, gets measured honestly and the winners get more budget.

I cannot promise the warehouse moves a single ranking by itself, because it does not. It is the instrument panel, not the engine. But I have never seen a property build one and then keep over-feeding the OTAs out of habit, because once you can see the margin walking out the door, you stop holding the door open.

If you want help scoping this without turning it into a six-month data project, I do free intro calls. Come tell me how many of those five dashboards you are currently squinting at, and book a call here: /book. Or, if you would rather start with the marketing side that the warehouse measures, take a look at how OTAs quietly steal your search traffic first.

FAQ

Quick answers

Do I need to hire a data engineer to build a hotel data warehouse?

No. For a single independent property you can stand up a BigQuery warehouse with managed connectors and a couple of scheduled exports. The hard part is naming things consistently and agreeing on definitions, not writing code.

How much does a small BigQuery setup cost per month?

For one boutique hotel the data volumes are tiny, so most months land inside or just above the free tier. Plan for a low monthly bill rather than a big platform cost, and set a budget alert so nothing surprises you.

What is the single most useful thing a blended warehouse lets me answer?

True cost of acquisition by channel. Once OTA commission, ad spend, and direct revenue all sit in one table keyed by stay date, you can see which channels actually grow margin versus which just shuffle bookings around.

Can I do this if my PMS has no real export?

Usually yes, but it is the messiest part. Many PMS platforms have a nightly email report or a basic API. Worst case you schedule a weekly CSV drop into cloud storage and load that. Imperfect data in one place beats perfect data in five places.

Keep reading

More from the Lab

Free intro call

Let's go find out why the OTAs are outranking you for your own name.

20 free minutes. We'll look at your hotel live, show you where you're invisible — on Google and in the AI answers — and tell you straight whether we can help.

No lock-in · No 12-month handcuffs · You talk to the strategist