Data Analysis Methodology

For some quick background, you should know that in my private industry career I have often been in charge of modifying and implementing changes in compensation plans, in some cases for national organizations.

In all cases, that process starts with establishing baselines, so you have an accurate picture of where your employee group currently IS in their compensation.  That way your “what if” scenarios will accurately reflect the impact potential changes are likely to have on both groups and individuals.

I’ve done this for national groups that have comp plans far more complicated than OUSD’s plans (often involving commission rates that change on a sliding structure, bonus plans based on hitting various targets, performance incentives that depend on ancillary reporting, and requirements that vary state-by-state for work hours, overtime calculation, minimum wage rates, etc.).

OUSD’s plans can be complex, but they’re no more so than some private industry plans I’ve been in charge of administering (or changing…) and in some respects simpler.

In examining the data, here’s an outline of the methodology I’ve used.

State Bureau of Education Average Pay (J90) data discussion

Everyone tends to use the State BOE’s “J90” teacher pay data to make their arguments, but that data is nearly useless for that purpose.

Why?  It’s an average, and as such is misleading and statistically useless.


Imagine we had 100 employees all making $100,000/year in 2012.  The “average pay” of this group would be $100,000/year.

Now imagine that 50 of those retire between 2012 and 2016 and are replaced by 50 new teachers making substantially less.  Let’s assume that by 2016 the average wage of THAT group has risen from the starting rate of $43,912 to $50,000/year (just a random number I’m picking to make the math easy to see…)

That means in 2016 we now have a group of teachers in which 50% make $100K and 50% make $50K.  The “average” of that group would therefore be $75K/year.

If you simply look at that average, you would say “Oceanside has CUT what it pays teachers!!!” and be up in arms.

However, the actual data would show the opposite – NO cuts had happened, and as a matter of fact, 50 of the teachers in OUSD had received raises during that time.

Since the J90 data does not contain any information on the mix of new vs. senior teachers, simply looking at the “average” in this data tells you almost nothing.
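As a rough check of the arithmetic in this example, here is a minimal Python sketch.  The figures are the hypothetical ones from the example (100 teachers, 50 retirements, a $43,912 starting rate rising to $50,000), not actual OUSD data, and the variable names are just illustrative.

```python
# Hypothetical figures from the example above, not actual OUSD data.
STARTING_RATE = 43_912   # starting rate assumed for the replacement hires

# 2012: 100 teachers, all at $100,000.
pay_2012 = [100_000] * 100

# 2016: 50 of the original teachers remain at $100,000,
# and 50 replacements have risen from the starting rate to $50,000.
stayers_2016 = [100_000] * 50
new_hires_2016 = [50_000] * 50
pay_2016 = stayers_2016 + new_hires_2016

avg_2012 = sum(pay_2012) / len(pay_2012)
avg_2016 = sum(pay_2016) / len(pay_2016)

# The group-wide average falls by a quarter...
avg_change = (avg_2016 - avg_2012) / avg_2012

# ...yet nobody's pay was cut, and each new hire received a raise.
new_hire_raise = (new_hires_2016[0] - STARTING_RATE) / STARTING_RATE

print(f"Average 2012: ${avg_2012:,.0f}")
print(f"Average 2016: ${avg_2016:,.0f}")
print(f"Change in average: {avg_change:.1%}")
print(f"Raise for each new hire: {new_hire_raise:.1%}")
```

The average drops only because the mix of senior and new teachers changed, which is exactly the information the J90 data leaves out.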

This data is even useless when comparing the average in one school district to another.

What if both OUSD and Vista Unified start out in 2012 with 100 teachers making $100K, OUSD has the turnover described above happen (dropping its average to $75K) and Vista does not, so all its teachers stay?

Using simply the “average” would say that OUSD is now paying substantially less than VUSD, when in fact it is paying “the same teachers with the same seniority levels the same pay” as it was before.

Averages are useless for this purpose, sorry.

Transparent California

The accuracy of the Transparent CA data (including recognition of known flaws) is covered in a separate discussion of data accuracy.

Meanwhile, this data is “all we’ve got” that allows in-depth analysis of pay and benefits in OUSD by employee group.  I’m using it because, as outlined in the discussion of data accuracy, I feel it’s “close enough” to give overall numbers that are accurate to within a small percentage.

For all Transparent California data, my objective was to see what the real increase in pay and benefits was for employees who have been with the district for enough time to have a significant history.

Examination Period

Since Transparent CA data begins with the 2011-12 school year, I chose that as my baseline measurement point.

This is also a good point to start with since 2012 is the year we voted, in Proposition 30, to raise taxes to increase school funding, which began the flow of additional monies (called “Education Protection Account” or EPA funds) into the school districts.

By looking at 2012 as a starting point for both Transparent California and the actual budget documents we can see what has happened with both since those funds began coming into the district.

2016 was chosen as the “end date” because it is the last period for which Transparent CA data is available, and is also the last year we have completed, audited budget documentation available.

I have indicated in some of these notes when 2017 data is also included, but that is mostly just for reference, to indicate “where we’re going now” with spending.

Employee Selection

Unfortunately the Transparent CA data has none of the data identifiers I would normally use to separate out employee groups, determine longevity, identify specific pay plans or levels within a plan, or any other “standard identifiers.”   Normally working with a data set like this is done with data that gives, say, a department code, pay plan code, employee start dates, raise dates and amounts, etc, etc.   None of that is here.

With that, there are some filters one can apply to get a data set that is usable, and here’s how I did that.

I wanted to track only a cohort of employees who existed in the data in 2012 and also in the data in 2016.  To do this I had to use the reported name.  This was originally a bit problematical because the data appears to change in mid-stream from reporting names using middle initials (and sometimes full middle names) to reporting purely based on first and last names.

To accommodate this, I simply ran a small bit of code on the Excel data that “normalized” all the data sets to use only first and last name, and then matched “identical names” in 2012 to the same name in 2016.
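The normalize-and-match step could look something like the Python sketch below.  This is not the code actually run on the Excel data; it is only an illustration.  It assumes names are reported as “First [Middle] Last” separated by spaces, and the function names and sample names are made up.

```python
def normalize_name(full_name: str) -> str:
    """Reduce a reported name to 'first last', lower-cased.

    Assumes the name is reported as 'First [Middle ...] Last'.  Multi-word
    surnames or suffixes (e.g. 'Jr') would need extra handling.
    """
    parts = full_name.replace(".", " ").split()
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0].lower()
    return f"{parts[0]} {parts[-1]}".lower()


def match_cohort(names_2012, names_2016):
    """Return the normalized names that appear in both years."""
    set_2012 = {normalize_name(n) for n in names_2012}
    set_2016 = {normalize_name(n) for n in names_2016}
    return sorted(set_2012 & set_2016)


# Made-up names for illustration only.
names_2012 = ["Jane Q Doe", "John Michael Smith", "Alex Lee"]
names_2016 = ["Jane Doe", "John Smith", "Pat Jones"]
cohort = match_cohort(names_2012, names_2016)
print(cohort)
```

Dropping everything between the first and last word is the simplest rule that handles both middle initials and full middle names, at the cost of mishandling surnames with more than one word.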

Recognized potential problems:

1.  This approach cannot distinguish between two employees with the same name.  If there is more than one “John Smith” in the data that would be a problem.  None of these were found in examining the data, but I can’t say for sure they don’t exist.

2.  This approach cannot tell me if someone was in the data in 2012, left the district, and returned later in 2016.
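Both problems can be partly checked in code.  The Python sketch below is only an illustration with made-up names: the first helper flags names that occur more than once within a single year, and the second keeps only names present in every year supplied.  The second check assumes the intermediate years’ files are on hand, which is my assumption and not part of the method described above.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return names that occur more than once in one year's list."""
    counts = Counter(name.strip().lower() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


def continuous_names(names_by_year):
    """Return names present in every year of a {year: [names]} mapping."""
    sets = [{n.strip().lower() for n in names}
            for names in names_by_year.values()]
    return set.intersection(*sets) if sets else set()


# Made-up names for illustration only.
names_2016 = ["John Smith", "Jane Doe", "john smith ", "Pat Jones"]
duplicates = find_duplicate_names(names_2016)
print(duplicates)

by_year = {
    2012: ["Jane Doe", "Pat Jones"],
    2014: ["Jane Doe"],              # Pat Jones is absent mid-period
    2016: ["Jane Doe", "Pat Jones"],
}
stayed = continuous_names(by_year)
print(stayed)
```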

“Full-Time” Identification

The district has a lot of employees who do not appear to be full time: part-time classified workers, substitute teachers, etc.

There are undoubtedly also people who joined or left the district before a full year was completed in either 2012 or 2016.

If I left in people who were not working full time (or did not work full years in both periods) that would skew the data.  For example, if someone were only part time (or only worked a partial year) in 2012 but was full time in 2016, leaving them in would make it appear that their pay rose far more than it really did.

Ditto the other way, if someone switched to part time in 2016 – or left without completing a full year – their pay would show an artificial drop.

To eliminate this, I first removed all records from both 2012 and 2016 that did not meet minimum wage criteria for each year ($16,640 in 2012, $21,840 in 2016).  I’m assuming that OUSD did not violate minimum wage requirements, so anyone making less than those numbers did not work full time.
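A minimal Python sketch of this filter follows.  The record layout (a “pay” field holding annual pay) and the sample records are hypothetical, and which Transparent CA pay column the floor is applied to is left as an assumption here.

```python
# Annual floors used in the text.  Each equals 2,080 hours (40 hours x 52 weeks)
# at an hourly rate: 16,640 = 2,080 x $8.00 and 21,840 = 2,080 x $10.50.
FULL_TIME_FLOOR = {2012: 16_640, 2016: 21_840}


def likely_full_time(records, year):
    """Keep records whose annual pay meets that year's floor."""
    floor = FULL_TIME_FLOOR[year]
    return [r for r in records if r["pay"] >= floor]


# Made-up records for illustration only.
records_2012 = [
    {"name": "jane doe", "pay": 52_000},
    {"name": "pat jones", "pay": 9_500},   # below the floor: part time or partial year
]
kept = likely_full_time(records_2012, 2012)
print([r["name"] for r in kept])
```

Anyone dropped in either year also drops out of the matched 2012-to-2016 cohort, since they no longer appear in both filtered lists.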

Next up, teachers have a minimum pay level in their contract.  I do not know what the minimum was in 2012, but in 2016 I know it’s at least $38,319 (and $43,912 for at least some).

To avoid “over-stating” the increases teachers have gotten I used $38,319 as the cutoff in 2016, since I don’t know when the $43,912 rate was implemented.

To avoid including part-time teachers in the 2012 data, I picked an arbitrary cutoff of $35,000.  If anyone knows the real starting wage for teachers that year I’d be happy to revise my threshold, but as I understand it the 2015-16 contract represented at least a 3% raise from the former contract (which I do not have) so it’s likely the starting wage was something higher than $35K.
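The reasoning behind the $35,000 cutoff can be worked through numerically.  The Python sketch below assumes the raise applied to the starting wage and that $38,319 was the starting wage after it; both are assumptions, since the former contract is not available.

```python
CUTOFF_2016 = 38_319   # lowest 2016 contract rate cited in the text
CUTOFF_2012 = 35_000   # arbitrary 2012 cutoff chosen in the text
RAISE = 0.03           # "at least a 3% raise", as understood in the text

# If the raise was exactly 3%, the prior starting wage was about $37,203.
implied_prior_start = CUTOFF_2016 / (1 + RAISE)

# The prior starting wage stays above $35,000 as long as the raise was
# smaller than this break-even percentage (a little under 9.5%).
break_even_raise = CUTOFF_2016 / CUTOFF_2012 - 1

print(f"Implied prior starting wage at 3%: ${implied_prior_start:,.0f}")
print(f"Raise at which $35,000 stops being conservative: {break_even_raise:.1%}")
```

A larger raise implies a lower prior starting wage, so the $35,000 cutoff only risks excluding full-time teachers if the raise was well above the 3% figure.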

Employee Group Selection

Although my real point to make involves the overall growth in pay and benefits for the district as a whole, we have a situation now where different employee groups are pointing to the other groups as part of the cases they’re making for increases.

Accordingly, I had to separate the data by groups to see how each was impacted.

Unfortunately, again the data does not contain the kinds of data labels one would ideally have – “department code” or perhaps “employee type”, for example.

I had to separate them out based on a few key indicators in the data.

For teachers, this was somewhat easy – most of the teachers in the data have “teacher” at the beginning of their job title.  There were also job titles that did not start with “Teacher”, used abbreviations, or indicated positions that are part of the OTA contract but do not have “teacher” in them.  Those were also included in the “Teachers” group.

Such titles I included were:

Counselor – High
Counselor – RWP
Counselor/Coordinator – AARC
Intervention Teacher
Library Media Technician
Preschool Teacher
Reg Sub Tchr Prep Pay
Resource Specialist
Resource Specialist – AARC
Resource Specialist – Special
Resource Specialist Spec Ed
Resource Teacher
Rop Teacher
Rotc Instructor
Rsp Teacher
School Base Resource Teacher
School Based Resource Teacher
School Psychologist
Sped Sub Teacher Prep Pay
Speech And Language Specialist
Speech And Language Specialist
Speech Therapist
Sub Teacher
Summer School Teacher
Supervising Teacher – Aarc
Supervising Teacher – AARC
Supervising Teacher-Aarc

As a check, this gave me a list of 846 “teachers” in 2016.  OUSD reports 865 teacher positions in 2016, so I’m obviously missing 19 names, but this seems close enough to be statistically accurate overall.
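The title-based grouping and the coverage check can be sketched in Python as below.  Only a handful of the listed titles are included to keep it short, “Teacher - Elementary” is a made-up title standing in for the titles that start with “teacher”, and the function name is illustrative.

```python
# A few of the non-"Teacher" titles from the list above (not the full list).
EXTRA_TEACHER_TITLES = {
    "school psychologist",
    "speech therapist",
    "resource specialist",
    "rotc instructor",
    "rsp teacher",
    "sub teacher",
}


def is_teacher_title(job_title: str) -> bool:
    """True if the title starts with 'teacher' or is on the extra list."""
    title = " ".join(job_title.lower().split())   # normalize case and spacing
    return title.startswith("teacher") or title in EXTRA_TEACHER_TITLES


# Made-up sample titles for illustration only.
titles = ["Teacher - Elementary", "School Psychologist", "Custodian", "Rotc Instructor"]
flags = [is_teacher_title(t) for t in titles]
print(flags)

# Coverage check using the counts reported in the text.
matched, reported = 846, 865
coverage = matched / reported
print(f"Matched {matched} of {reported} positions ({coverage:.1%}); missing {reported - matched}")
```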

For Classified and Administrative, the separation was harder because there are literally hundreds of job titles.

After examining the names, and knowing what I know from personal knowledge of administrative and classified staff members, I determined that if I used a cutoff of a base pay of $80,000/year it pretty effectively separated people I know to be administrative from people I know to be classified.

I have NO way to know if this is really accurate, but it resulted in a list of 78 names meeting this criterion where the district reports 69 positions (again, close…)

Classified was much more vague.  Using the range below the $80K cutoff (and above minimum wage) gives me 631 names, but the district reports 941 classified positions.  I’m guessing the difference is because classified workers are the ones most likely to be working less than full time, but I have no way to verify that.
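Here is a small Python sketch of the $80,000 split and of how the resulting counts compare with the district’s reported positions.  Whether someone at exactly $80,000 counts as administrative is my assumption; the counts are the ones given above.

```python
ADMIN_BASE_PAY_CUTOFF = 80_000


def classify_non_teacher(base_pay):
    """Label a non-teacher record by base pay (at or above the cutoff = admin)."""
    return "Administrative" if base_pay >= ADMIN_BASE_PAY_CUTOFF else "Classified"


# Made-up base pay values for illustration only.
labels = [classify_non_teacher(p) for p in (95_000, 80_000, 41_500)]
print(labels)

# Names found by the filter vs. positions reported by the district.
found = {"Administrative": 78, "Classified": 631}
reported = {"Administrative": 69, "Classified": 941}
coverage = {group: found[group] / reported[group] for group in found}
for group, ratio in coverage.items():
    print(f"{group}: {found[group]} names vs {reported[group]} positions ({ratio:.0%})")
```

The administrative list overshoots the reported positions by roughly 13%, while the classified list covers only about two thirds of them, which is why the classified results deserve less confidence.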


Using this methodology resulted in the following cohorts of employees in each group who appear in both the 2012 and 2016 data:

Administrative – 42

Certificated – 355

Classified – 161

So… again, I have no idea how accurate this is from an absolute standpoint.  The Admin and Certificated numbers appear “reasonable”, while the Classified number is hard to evaluate.  I therefore have relatively high confidence that the results for Admin and Certificated are fairly good, not so much for Classified.