Government salary transparency for Texas — how we built it


Methodology

How we built this dataset

A plain-English walkthrough of where the OpenPayrolls Texas data comes from, how it's cleaned, what we publish, and what we deliberately leave out.

1. The source of record

Every salary record on OpenPayrolls Texas is derived from a publicly available payroll snapshot of the State of Texas workforce. The specific file we mirror is the non-duplicated employees CSV released by the Texas Tribune as part of its Texas Tribune Government Salaries Explorer. The Tribune in turn obtains the data from the Texas Comptroller of Public Accounts under the Texas Public Information Act. The release for the period reflected in this site is dated 2026; the file URL is preserved in our seed script for reproducibility.

We chose this source because it is already de-duplicated at the row level (people who hold multiple positions appear once with their primary role) and because it carries the same provenance the rest of the Texas reporting community relies on. If you reproduce our numbers, this is the file to start with.

2. What we filter out

We exclude two kinds of records before the data is published on this site:

3. How we choose which employees to publish

The full underlying release contains roughly 155,000 records, which is more than necessary for a browsable site and would slow page loads to a crawl on cheap hosting. We publish a curated working set of approximately 6,000 employees that combines two slices of the source data:

This is not the same as publishing the full dataset, and we say so plainly. If you need every record for an investigation, the original Tribune CSV is linked from our seed script. We make no claim to be a comprehensive raw archive; we are a browsable lens on top of one.

4. What "annual pay" means

The annual-pay number on every employee page is the annualized base salary at the time of the snapshot. For salaried employees that is the monthly base rate × 12. For hourly employees it is the hourly rate × scheduled weekly hours × 52. It does not include:

For the great majority of Texas state employees, base pay is the dominant component of total compensation. For senior university administrators, head coaches, and certain regulated medical positions, base pay can be a small fraction of total earnings — and you should treat the number on this site as a floor, not a ceiling, for those roles. When in doubt, consult the agency's annual financial report.
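The two annualization rules above can be sketched as follows. This is a minimal Python illustration, not an excerpt from seed.php (which is PHP). The function name and the `pay_type`, `rate`, and `weekly_hours` parameters are hypothetical stand-ins for whatever columns the source file actually uses, and reading "scheduled hours" as hours per week is an assumption. It uses `Decimal` so cents do not pick up floating-point error.

```python
from decimal import Decimal
from typing import Optional

def annual_pay(pay_type: str, rate: Decimal, weekly_hours: Optional[Decimal] = None) -> Decimal:
    """Annualize base pay only (no overtime, bonuses, or benefits).

    pay_type, rate and weekly_hours are hypothetical field names, not the
    real column names in the Tribune CSV.
    """
    if pay_type == "salaried":
        # Monthly base rate x 12 months
        return rate * 12
    if pay_type == "hourly":
        if weekly_hours is None:
            raise ValueError("hourly records need scheduled weekly hours")
        # Hourly rate x scheduled weekly hours x 52 weeks
        return rate * weekly_hours * 52
    raise ValueError(f"unknown pay type: {pay_type!r}")
```

For example, a salaried employee at 5,000 a month annualizes to 60,000, and an hourly employee at 20 an hour on a 40-hour schedule annualizes to 41,600.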

5. Aggregations

Every agency page and job-title page shows three derived statistics: average pay, highest pay, and lowest pay. These are computed only across the records actually published on this site (the curated working set described above), not across the full Tribune file. The numbers are stable per release: they will not drift between page loads.
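A minimal sketch of the per-agency statistics, assuming the published records are available as hypothetical `(agency, annual_pay)` pairs. Because it only sees the published working set, its output matches the site's scope rather than the full Tribune file. The same function works for job-title pages if you pass titles instead of agency names. Given the same input for a release, it returns the same numbers every time.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, List, Tuple

def agency_stats(records: Iterable[Tuple[str, Decimal]]) -> Dict[str, dict]:
    """Average, highest and lowest annual pay per group.

    `records` is assumed to be (agency, annual_pay) pairs drawn only from
    the published working set, not the full source file.
    """
    pays: Dict[str, List[Decimal]] = defaultdict(list)
    for agency, pay in records:
        pays[agency].append(pay)

    stats = {}
    # Sorting the keys keeps the output order stable between runs.
    for agency, values in sorted(pays.items()):
        stats[agency] = {
            "average": sum(values) / len(values),
            "highest": max(values),
            "lowest": min(values),
            "count": len(values),
        }
    return stats
```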

Sector pages additionally show a sector-wide average across every agency in that sector. Sectors are assigned heuristically from the agency name (universities and A&M campuses to "Higher Education", anything containing "Police" or "Public Safety" or "Criminal Justice" to "Public Safety", and so on). The sector grouping is editorial; if you think an agency is misclassified, please flag it via our contact page.
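The sector heuristic can be sketched as an ordered list of keyword rules where the first match wins. Only the rules named above are included. The `"Other"` fallback, the rule order, and the function name are illustrative assumptions; the real rule list behind "and so on" is longer. Under first-match ordering, an agency name containing both "University" and "Police" would land in Higher Education, which is exactly the kind of editorial call the paragraph above describes.

```python
from typing import List, Tuple

# Ordered (keywords, sector) rules; matching is case-insensitive substring search.
# Only the rules named in the methodology text are listed here.
SECTOR_RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("university", "a&m"), "Higher Education"),
    (("police", "public safety", "criminal justice"), "Public Safety"),
]

def classify_sector(agency_name: str, default: str = "Other") -> str:
    """Return the first sector whose keywords appear in the agency name."""
    name = agency_name.lower()
    for keywords, sector in SECTOR_RULES:
        if any(keyword in name for keyword in keywords):
            return sector
    # Hypothetical fallback for agencies no rule covers.
    return default
```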

6. URL design

Slugs are derived from the published employee name, agency name, or job title by lowercasing, replacing each run of non-alphanumeric characters with a single hyphen, and trimming leading and trailing hyphens. Where the source produces collisions (for example, two distinct employees with identical names), we append a numeric suffix in seeding order to keep URLs unique. Slugs are stable within a release; they may change between releases if the State of Texas renames an agency.
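A sketch of the slug rules in Python. The regex treats only ASCII letters and digits as alphanumeric, so accented characters become hyphens; whether the real seed script transliterates them first is not stated here. The collision scheme (the first occurrence keeps the bare slug, later ones get `-2`, `-3`, and so on) is also an assumption; the text above says only that a numeric suffix is appended in seeding order.

```python
import re
from typing import Dict, Iterable, List, Set

def slugify(text: str) -> str:
    """Lowercase, collapse each non-alphanumeric run to '-', trim edge hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def unique_slugs(names: Iterable[str]) -> List[str]:
    """Assign slugs in input (seeding) order, suffixing later duplicates.

    Assumption: the first occurrence keeps the bare slug, later ones get -2, -3, ...
    """
    next_suffix: Dict[str, int] = {}  # base slug -> next number to try
    used: Set[str] = set()
    out: List[str] = []
    for name in names:
        base = slugify(name)
        n = next_suffix.get(base, 1)
        slug = base if n == 1 else f"{base}-{n}"
        # Skip numbers already taken, e.g. by a name that literally ends in "2".
        while slug in used:
            n += 1
            slug = f"{base}-{n}"
        next_suffix[base] = n + 1
        used.add(slug)
        out.append(slug)
    return out
```

For example, `unique_slugs(["John Smith", "John Smith"])` gives `["john-smith", "john-smith-2"]`, and `slugify("Texas A&M University")` gives `"texas-a-m-university"`.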

7. Refresh cadence

The State of Texas releases a new payroll snapshot roughly twice a year. We refresh OpenPayrolls within two weeks of each new release. Dates are visible in the dataset metadata in the footer of every page.

8. Reproducibility

The full code that builds this site is open. The seed.php script (written in PHP) in our repository reads the public Tribune CSV, applies the filters and aggregations described above, and writes a small set of JSON files to disk that the page templates render from. Anyone with a copy of the seed script and the source CSV can regenerate every page on this site byte-for-byte. The only random step, the sample drawn in section 3, uses a fixed seed in our build, so it reproduces as well.
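Two ingredients make a rebuild repeatable: a sample drawn from a privately seeded random generator over records in a fixed order, and JSON written with sorted keys and fixed formatting. Below is a minimal Python sketch with hypothetical helper names. It will not reproduce the site's actual sample, which is drawn by the PHP seed script with its own generator and seed. Python only guarantees that `random.sample` returns the same sequence for a given seed on the same interpreter version, so a reproducible build should pin that version.

```python
import json
import random
from pathlib import Path
from typing import Any, List

def seeded_sample(records: List[Any], k: int, seed: int) -> List[Any]:
    """Return the same k records on every run, given the same seed and input order."""
    # A private Random instance keeps other code from advancing a shared generator.
    rng = random.Random(seed)
    return rng.sample(records, k)

def to_json_text(data: Any) -> str:
    """Serialize with sorted keys and fixed indentation so the text never varies."""
    return json.dumps(data, sort_keys=True, ensure_ascii=False, indent=2) + "\n"

def write_json(path: Path, data: Any) -> None:
    # Writing encoded bytes avoids platform-specific newline translation.
    path.write_bytes(to_json_text(data).encode("utf-8"))
```

The input list must itself be in a stable order (for example, sorted by a source record ID) before sampling; the same seed over a reordered list selects different records.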

9. Corrections

Because we mirror a publisher of record, corrections must originate upstream. If a name, title, or pay figure is wrong on this site, it is almost certainly wrong in the underlying state release as well. The right path is to contact the agency's HR office and the Comptroller's open-data team, who will republish a corrected snapshot in the next cycle. We will pick up the correction automatically. If you believe a record should be suppressed entirely for a safety reason, please contact us and we will work with the upstream publisher.


Methodology last updated: May 2026.