Currently just notes, under construction.


I must be unbiased in this study

Material of interest


Create a grounded, source-agnostic incident index

Rather than starting with lawsuits or admin comms, I’ll build my incident list from the most neutral, time-ordered source available:

Best options:

Build a master list of incidents first, based only on what happened and when, without regard to group or admin reaction. Only after that do you code for Target_Group, Response, etc.

Define an incident inclusion rule (must stick to it)

Any campus-affiliated event between Sept 2023 and June 2024 where:

⚠️ Considerations:

The rule applies consistently. Some incidents will involve Jewish students, some Arab students — the rule determines inclusion, not identity or severity.

Tag by source, don't filter by source

I might still find incidents via lawsuits, surveys, etc., but I won’t use those to determine which ones get included.

Instead:

This avoids confirmation bias.

Supplement for representation, don’t balance artificially

I don’t need an equal number of Jewish and Palestinian incidents — real-world bias might mean there are more of one kind of incident and fewer of another. What’s important is:

“Do incidents involving Palestinian students receive less administrative response than those involving Jewish students even when the severity and visibility are the same?”

“Among incidents with equal media coverage and equal severity, does the administration still respond differently depending on which group is affected?”

If I find a consistent disparity after controlling for those other factors → that’s evidence of bias.

“Account for” = don’t ignore the fact that Jewish incidents might look different on paper — control for that so your findings reflect bias, not circumstance.

But: “If both the media and the administration are biased — how can I isolate administrative bias without just proving they respond to media pressure?”

If admin responses correlate with media coverage, it could mean:

You’re testing whether the pattern of behavior is systematically unequal even after accounting for neutral factors like visibility or severity.

🧱 Use Media Coverage as a Covariate, Not a Shield

You’re not saying media coverage justifies admin action. You’re testing whether media coverage explains the action — and whether it explains all of it.
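
One way to operationalize this once the master incident list exists: within each (severity, coverage) stratum, compare response rates by group. A minimal pandas sketch, assuming a CSV export with the field names used later in these notes (the file name is a placeholder):

```python
# Minimal sketch: compare admin response rates by target group,
# holding severity and media coverage constant (stratified comparison).
# Assumes a CSV export of the Master Incident List with the column
# names used elsewhere in these notes.
import pandas as pd

incidents = pd.read_csv("master_incident_list.csv")

# Convert the Yes/No Admin_Response field to 1/0 so it can be averaged.
incidents["responded"] = (incidents["Admin_Response"] == "Yes").astype(int)

# Within each (severity, coverage) stratum, compare response rates by group.
strata = (
    incidents
    .groupby(["Severity_Score", "Media_Coverage_Level", "Target_Group"])
    ["responded"]
    .agg(["mean", "count"])
)
print(strata)

# A persistent gap between groups within the same stratum is the pattern
# of interest; differences across strata may just reflect visibility or severity.
```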

🔐 Lean Into Transparency

You’ll strengthen your work by saying:

“Media coverage does influence administrative behavior — but we tested whether that explains all the disparity, and it doesn’t.”


Methodology

What is this research

In research terms, you’re working in the realm of:

Given your project is:

Using categories like Yes/Strong for bias detection patterns is a well-established approach in policy analysis, institutional review, and legal-impact studies.

As long as:

Structured Comparison, Not Inference

Your work involves:

You’re not trying to say:

“This proves, with p < 0.05, that UCLA is biased.”

You’re saying:

“Across all comparable incidents documented via a neutral source, there is a consistent pattern of unequal treatment that is not explained by severity, visibility, or legality.”

That’s qualitative comparative research, not statistical inference — and that’s totally valid in policy, legal, and bias studies.

So I can’t extrapolate: I can’t claim the university as a whole is biased. I can say the administration showed biased patterns of response — within the scope of the dataset and under clearly defined parameters.

✅ Here’s How You Make That Claim Rigorously

“Based on a comprehensive review of all incidents reported by The Daily Bruin between [years], and applying a consistent inclusion rule and coding scheme, UCLA’s administration demonstrated a pattern of differential response depending on the identity group affected — even when controlling for severity, visibility, and policy violations.”

This is a strong claim:

📐 What You Can Say

✅ “There is clear evidence of disparate treatment within documented cases.”

✅ “Patterns of administrative response show group-based asymmetry.”

✅ “Even among incidents of similar severity and visibility, the university responded differently depending on the group affected.”

❌ What You Shouldn’t Say (without statistical inference)

❌ “UCLA is universally biased across all incidents involving these communities.”

❌ “This proves the administration is institutionally antisemitic/Islamophobic.”

❌ “X% of the time, they behave in a biased way.”

🧠 So Yes — You’re Claiming Conditional Bias

UCLA exhibited bias under the following conditions:

And that’s how most serious bias studies work.


Defend the methodology: DB is my sole incident source

“The Daily Bruin is the most comprehensive and continuously maintained public record of UCLA campus life from the student perspective.”

✅ 1. It’s UCLA’s Student Paper of Record

✅ 2. You’re Not Using It for Interpretation — Just Event Discovery

“Articles were used to identify and time-stamp relevant incidents. No editorial interpretation from the Bruin was included in bias analysis.”

✅ 3. You Apply a Consistent Keyword Search and Inclusion Rule

This makes your methodology replicable and objective: the same keyword list and inclusion rule, applied in the same order, should surface the same candidate incidents (see the sketch below).
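
A minimal sketch of that keyword screen, assuming articles are stored as simple dicts with title, text, and date; the keyword list and date window here are placeholders, not the final search vocabulary:

```python
# Minimal sketch: keyword screen over candidate Daily Bruin articles.
# The keywords and date window are placeholders; the real list comes from
# the inclusion rule. Screening by the rule itself still happens manually.
from datetime import date

KEYWORDS = ["protest", "encampment", "antisemitism", "islamophobia", "vigil"]
WINDOW = (date(2023, 9, 1), date(2024, 6, 30))

def is_candidate(article: dict) -> bool:
    """Return True if an article falls in the study window and matches any keyword."""
    in_window = WINDOW[0] <= article["date"] <= WINDOW[1]
    text = (article["title"] + " " + article["text"]).lower()
    return in_window and any(kw in text for kw in KEYWORDS)

articles = [
    {"title": "Police remove protesters from encampment",
     "text": "…", "date": date(2024, 5, 1)},
]
candidates = [a for a in articles if is_candidate(a)]
print(f"{len(candidates)} candidate article(s) pass the keyword screen")
```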

✅ 4. You Cross-Check Admin Response Using Admin’s Own Words

🎯 Why I’m Right to Stick with the Daily Bruin

✅ 1. Social media is not objective or complete

✅ 2. DB gives you structure and timestamped reporting

✅ 3. Methodology matters more than exhaustiveness

“This isn’t about cutting corners — it’s about minimizing noise so I can precisely measure the administration’s behavior against an externally grounded record of student life.”

Trying to include every mention from social media would make your study:

By contrast, sticking to DB ensures:


Defend the methodology: My choices for the dependent variable sources

✅ 1. Define a fixed set of DV source types up front

For structure and transparency

Examples

✅ Stick to these consistently — no ad hoc additions later unless logged as a scope expansion.

✅ 2. Apply all relevant DV source types to every incident

Don’t pick and choose based on which sources are available or interesting.

Instead:

If yes → log and code it

If no → mark as “none observed” or “no public response”

This way you’re not selecting responses — you’re checking whether they exist, from a consistent list.
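
A sketch of that checklist logic, assuming the Source Appendix is available as a mapping from incident IDs to Source_IDs (the ADM- prefix follows the source-category table at the end of these notes):

```python
# Minimal sketch: apply the same fixed list of DV source types to every
# incident, recording "no public response" when nothing is found.
DV_SOURCE_TYPES = ["ADM"]  # per the source-category table: admin communications

def check_dv_sources(incident_id: str, source_index: dict) -> dict:
    """For one incident, record what was found for each DV source type.

    source_index maps incident IDs to lists of Source_IDs (e.g., built from
    the Source Appendix); this structure is an assumption for illustration.
    """
    found = {}
    sources = source_index.get(incident_id, [])
    for prefix in DV_SOURCE_TYPES:
        matches = [s for s in sources if s.startswith(prefix + "-")]
        found[prefix] = matches if matches else "no public response"
    return found

source_index = {"INC-001": ["DB-045", "ADM-014", "HIL-003"], "INC-003": ["DB-048"]}
print(check_dv_sources("INC-001", source_index))  # ADM-014 found
print(check_dv_sources("INC-003", source_index))  # no public response
```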

✅ 3. Make your coding definitions as replicable as your inclusion rule

Field: Admin_Response

Definition:

🎯 Dependent Variables (DVs) Examples

These measure administrative behavior — the outcomes you’re testing for bias.

Examples (Structured):


Defend the methodology: My choices for 🧩 Independent / Control Variable Sources

✅ 1. Define a fixed set of orgs up front

But don’t aim for perfect symmetry — aim for methodological neutrality.

Choose orgs based on:

🧠 So yes — your approach might look like:

Included student orgs (based on incident involvement and visibility):

Other relevant docs

You can say: “These orgs were selected based on their repeated appearance in Daily Bruin coverage of relevant incidents between [dates].”

✅ This makes your selection criteria visibility-based, not identity-based.

✅ 2. What if the visibility is lopsided?

That’s okay — and in fact, it’s data. If certain orgs are more active, more covered, or more responded to, that’s part of the story.

The key is:

🧱 Final structure:

Define a fixed list of org accounts you’ll monitor

That’s how you avoid both cherry-picking and artificial balancing — you’re just tracking who actually showed up.

So you’re not treating student org posts as DVs — you’re using them to code:

These are for capturing nuance. You want student_tone to be separate from incident_severity and media_coverage_level because it doesn’t necessarily correlate with those things.

They become independent or control variables to isolate bias in admin behavior.

For the Kaplan example:

You’re not “including Hillel” as a party to the incident unless they were directly involved.

You’re just logging that they responded — and that may factor into things like:

Media_Coverage_Level:

🧩 Independent / Control Variables Examples

These help you explain or isolate what might influence the DVs.

They can be either:

Examples (Structured):

Examples (Qualitative):

🧠 Why this matters:

You’re capturing the ecosystem around the incident — who was involved, who amplified, and who shaped admin perception. But your unit of analysis stays the same: the incident, not the org.


List organization

What is a Master Incident List?

Your main dataset — one row per incident.

Includes:

📋 Sample Master Incident List

Incident_ID | Date       | Target_Group | Severity_Score | Admin_Response | Tone_of_Response | Media_Coverage_Level | Source_IDs
INC-001     | 2024-04-30 | Palestinian  | High           | Yes            | Punitive         | High                 | DB-045, ADM-014, HIL-003
INC-002     | 2024-05-03 | Jewish       | Moderate       | Yes            | Conciliatory     | Moderate             | DB-047, ADM-017
INC-003     | 2024-05-05 | Palestinian  | Low            | No             | (none)           | Low                  | DB-048
INC-004     | 2024-05-10 | Jewish       | High           | Yes            | Neutral          | High                 | DB-050, ADM-020, HIL-004
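
A minimal sketch of keeping this list as a CSV, one row per incident; the columns mirror the sample above, and the semicolon-joined Source_IDs are just one way to keep the file flat:

```python
# Minimal sketch: the Master Incident List as a CSV, one row per incident.
# Columns mirror the sample table above; Source_IDs are stored as a
# semicolon-separated list so the file stays one-row-per-incident.
import csv

FIELDS = ["Incident_ID", "Date", "Target_Group", "Severity_Score",
          "Admin_Response", "Tone_of_Response", "Media_Coverage_Level", "Source_IDs"]

rows = [
    {"Incident_ID": "INC-001", "Date": "2024-04-30", "Target_Group": "Palestinian",
     "Severity_Score": "High", "Admin_Response": "Yes", "Tone_of_Response": "Punitive",
     "Media_Coverage_Level": "High", "Source_IDs": "DB-045;ADM-014;HIL-003"},
]

with open("master_incident_list.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```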

What is the Source Appendix?

It’s a master list of all individual sources, regardless of incident. Each row = one source, with a unique Source_ID.

So yes — a source (e.g., ADM-014) can be related to multiple incidents if relevant to each.

📎 Sample Source Appendix Structure

Source_ID | Type        | Title                        | Date       | Use (Incidents)
DB-045    | DB Article  | “Police Remove Protesters…”  | 2024-05-01 | INC-004, INC-006
ADM-014   | Admin Email | Chancellor’s Campus Update   | 2024-05-02 | INC-004
HIL-003   | Hillel IG   | “We are alarmed by recent…”  | 2024-05-02 | INC-004

📚 What is a Source Library / Repository?

This is your folder of saved source materials.

It’s not a table — it’s where you store the actual documents or links (PDFs, screenshots, archived web pages).

Each file or link should be named by its Source_ID, so DB-045.pdf or ADM-014.txt matches the entries in your Source Appendix.
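
A small consistency check is easy to automate. This sketch assumes the appendix is a CSV with a Source_ID column and the library is a flat folder of files named by Source_ID (both paths are placeholders):

```python
# Minimal sketch: verify every file in the source library matches an entry
# in the Source Appendix, and flag appendix entries with no saved file.
import csv
from pathlib import Path

appendix_ids = set()
with open("source_appendix.csv", newline="") as f:
    for row in csv.DictReader(f):
        appendix_ids.add(row["Source_ID"])

# Path.stem drops the extension, so DB-045.pdf -> DB-045.
library_ids = {p.stem for p in Path("source_library").iterdir() if p.is_file()}

print("Files with no appendix entry:", sorted(library_ids - appendix_ids))
print("Appendix entries with no saved file:", sorted(appendix_ids - library_ids))
```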


Possible outcome of study

That possibility is exactly what makes your research credible.

If you go through this process honestly and rigorously, and come out with:

…then you can confidently say:

“This study found no evidence of systematic administrative bias in UCLA’s public response to documented incidents, based on a consistent, neutral inclusion rule.”

That is still a valid and valuable outcome.

🧭 But Here’s the Reality:

Given what you’ve already seen — lawsuits, student testimony, visibility patterns, unequal framing — it’s unlikely you’ll come out with nothing.

You might find:

Even if the pattern isn’t across the board, you’ll likely find:

And that’s enough to make a powerful and specific claim.

🎓 Research Isn’t About Proving a Point — It’s About Testing One

You’re not “trying to show bias” — you’re trying to find out whether bias exists under defined, observable conditions.

If you do that transparently, then whether your conclusion is yes or no, your work is:

And that makes it powerful.

Example

Apply all code definitions in good faith, and the conclusions emerge as a pattern in the analysis phase, not as a coded variable.

Your coding job is to:

Then, in analysis:

You can write:

“Across 12 incidents affecting Palestinian students, administrative responses consistently used language coded as conciliatory, but avoided naming harm or offering specific recourse — suggesting a pattern of rhetorical response that deflects institutional responsibility, aligning more with reputational safeguarding than material recourse.”


🧩 Table 1: Incident Evaluation Pipeline

Inclusion Rule
Defines what counts as an incident
→ Neutral, identity-agnostic

Keyword Search
Retrieves a superset of candidate articles
→ Designed to surface events likely to match rule

Screening by Rule
Filters keyword results using defined criteria in the inclusion rule
→ Apply consistently — group/outcome blind

Logging
Track both included and excluded articles with reasons for transparency
→ Maintain transparency and repeatability

Structured Coding
Assign rule-based fields (e.g., group, severity, policy, response)
→ Enables categorical comparison across incidents

Qualitative Coding
Apply interpretive rubrics to capture tone, framing, or narrative position
→ Adds context and nuance beyond numeric fields

Consistency Checks
Test and refine coding for replicability across all incidents
→ Apply to both structured and qualitative variables

Controlled Comparison
Analyze disparities while holding severity, visibility, and legality constant
→ Reveals potential group-based bias in admin behavior

🧬 Table 2: Variable Types

Type                  | Description                                                                                                                    | Examples
Structured Attributes | Rule-based, consistent, and quantifiable. These variables are coded using explicit criteria, allowing categorical comparison. | Target_Group, Severity_Score, Policy_Broken, Media_Coverage_Level, Admin_Response
Qualitative Variables | Interpretive but systematic. These capture nuance (e.g., tone or framing) using defined rubrics with consistent categories.   | Tone_of_Response, Framing_Language, Narrative_Positioning, Latency_Tone, Follow_Up_Action

⏱️ Table 3: Temporal Analysis Integration

Tool                               | Use
Date_of_Incident, Date_of_Response | Calculate latency, map timelines
Academic_Term, Policy_Epoch        | Compare behavior pre-/post-major events (design choice, maybe)
Time-windowed analysis             | Detect episodic or event-specific bias
Visual tools                       | Reveal clusters, escalation patterns, or administrative silences
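
A minimal latency sketch, assuming each incident carries Date_of_Incident and Date_of_Response as ISO-format strings (the field names follow the table above):

```python
# Minimal sketch: compute response latency in days for one incident.
from datetime import date
from typing import Optional

def latency_days(date_of_incident: str, date_of_response: Optional[str]) -> Optional[int]:
    """Days between incident date and first admin response; None if no response."""
    if not date_of_response:
        return None
    return (date.fromisoformat(date_of_response) - date.fromisoformat(date_of_incident)).days

print(latency_days("2024-04-30", "2024-05-02"))  # 2
print(latency_days("2024-05-05", None))          # None: no public response observed
```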

🗂️ Table 4: Data Organization Structure

✅ Master Incident List
One row per incident
→ Includes date, location, structured and qualitative fields
→ References source(s) used via Source_IDs
→ Chronologically ordered and searchable

✅ Source Appendix
One row per unique source
→ Includes Source_ID, title, date, type (e.g., Daily Bruin, admin email)
→ Describes how the source was used
→ Linked to incidents via shared IDs

Why This Works:

Getting started

✅ Core Fields to Code From the Start

These are foundational — you need them early to build structured comparisons:

Format tables to separate DVs from other variables, and to group structured vs. qualitative fields

🆔 Incident ID (Incident_ID) format

📎 Source ID (Source_ID) format
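
The exact formats still need to be pinned down, but based on the IDs used in the sample tables (INC-001, DB-045, ADM-014), a hedged validation sketch:

```python
# Minimal sketch: validate ID formats inferred from the sample tables
# (INC-### for incidents; a 2-4 letter source-type prefix plus -### for sources).
# Treat these patterns as placeholders until the formats are formally defined.
import re

INCIDENT_ID = re.compile(r"^INC-\d{3}$")
SOURCE_ID = re.compile(r"^[A-Z]{2,4}-\d{3}$")

assert INCIDENT_ID.match("INC-001")
assert SOURCE_ID.match("DB-045") and SOURCE_ID.match("ADM-014")
assert not INCIDENT_ID.match("INC-1")  # would need zero-padding to three digits
```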

✅ Add-On Fields You Can Layer In Later

You don’t need to lock these in up front — just keep them in mind:


🧱 Staying Grounded: Avoiding Outside Inference and Project Creep

✅ What “no outside inference required” means:

⚠️ Project Creep Risks:

🛡️ Guardrails to Prevent Drift

🧾 Source vs. Source Type

🧾 Source = A specific document or artifact
An individual item you cite or use to code an incident.
Examples:

🗂️ Source Type = A category of source
A class of materials that you allow into your dataset.
Examples:

✅ Why This Matters for Your Methodology


🧭 Choosing Your Next Step

The best next step depends on your immediate goal: refining your process vs. scaling your dataset.

✅ Option 1: Start with a Few Candidate Incidents

Best if your goal is to test and refine your pipeline

Walk 2–3 candidate incidents through the entire evaluation pipeline:

🔁 This approach reduces rework later and ensures your system holds up under real examples.

✅ Option 2: Design Your Inclusion Rule and Scrape Keywords

Best if your goal is to begin scaling up the dataset

🧱 This sets the foundation for consistent data gathering and prevents selection bias.

🧠 Suggested Hybrid Approach

Do one full test incident first, end-to-end (cherry-picked is fine):

Example—Field: Severity_Score

Then, run your inclusion rule and draft keyword search on a small batch (5–10 real Daily Bruin articles):

✅ When to Consider Your System Finalized

You can consider your inclusion rule, keyword search, and coding definitions finalized when they all hold steady across the batch — meaning:

At that point, your system is stable, and you’re ready to scale up full incident discovery and coding with confidence.


A full breakdown of field types you’ll use in structured research, with examples and how they relate to each other:

Some of the notes above are hazy on field types and need to be corrected. For clarity for now:

🟢 1. Binary / Boolean

🔵 2. Nominal Categorical

🟡 3. Ordinal Categorical

🔴 4. Quantitative (Discrete or Continuous)

🟠 5. Structured Qualitative

⚫️ 6. Unstructured Qualitative

Field Type Reference

Type                     | Ordered | Numeric | Needs Coding Rules?  | Structured | Examples
Binary / Boolean         | No      | No      | No                   | ✅ Yes     | Admin_Response, Follow_Up_Action
Nominal Categorical      | No      | No      | ✅ Yes (defined set) | ✅ Yes     | Target_Group, Media_Coverage_Level
Ordinal Categorical      | ✅ Yes  | No      | ✅ Yes (codebook)    | ✅ Yes     | Severity_Score, Tone_of_Response
Quantitative             | ✅ Yes  | ✅ Yes  | No                   | ✅ Yes     | Latency (days), Injury_Count
Structured Qualitative   | Maybe   | No      | ✅ Yes (codebook)    | ✅ Yes     | Narrative_Framing, Student_Tone
Unstructured Qualitative | No      | No      | ❌ No                | ❌ No      | Admin_Statement_Text, Raw Chants
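
One way to make this table operational is a machine-readable codebook keyed by field name; the allowed values below are drawn from the samples in these notes and are placeholders, not the finalized scheme:

```python
# Minimal sketch: a machine-readable codebook keyed by field name.
# Allowed values come from the sample tables in these notes; they are
# placeholders, not the finalized coding scheme.
CODEBOOK = {
    "Admin_Response":       {"type": "binary",  "allowed": ["Yes", "No"]},
    "Target_Group":         {"type": "nominal", "allowed": ["Jewish", "Palestinian"]},
    "Media_Coverage_Level": {"type": "nominal", "allowed": ["Low", "Moderate", "High"]},
    "Severity_Score":       {"type": "ordinal", "allowed": ["Low", "Moderate", "High"]},
    "Latency_Days":         {"type": "quantitative"},
    "Narrative_Framing":    {"type": "structured_qualitative",
                             "allowed": ["Civil Rights", "Security Threat", "Procedural"]},
    "Admin_Statement_Text": {"type": "unstructured_qualitative"},
}

def validate(field: str, value) -> bool:
    """Check a coded value against the codebook (quantitative/free-text pass through)."""
    spec = CODEBOOK[field]
    return "allowed" not in spec or value in spec["allowed"]

assert validate("Severity_Score", "Moderate")
assert not validate("Admin_Response", "Maybe")
```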

Ordinal Categorical v. Structured Qualitative

✅ Both Need a Codebook or Rubric

Yes, both require:

But the type of structure differs:

🟡 Ordinal Categorical

Example rubric for Severity_Score:

Value    | Definition
Low      | No injuries, no arrests, no building closures
Moderate | 1–2 arrests OR building disruptions
High     | Injuries OR multiple arrests OR widespread closures
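
If the underlying counts are logged per incident, the rubric can also be applied mechanically. A sketch whose thresholds simply restate the rubric above (the boolean inputs for disruptions and closures are an assumption for illustration):

```python
# Minimal sketch: apply the Severity_Score rubric from structured counts.
# Thresholds restate the rubric above; building disruptions and closures are
# recorded here as simple booleans for illustration.
def severity_score(injuries: int, arrests: int, building_disruption: bool,
                   widespread_closures: bool) -> str:
    if injuries > 0 or arrests > 2 or widespread_closures:
        return "High"
    if arrests >= 1 or building_disruption:
        return "Moderate"
    return "Low"

assert severity_score(0, 0, False, False) == "Low"
assert severity_score(0, 2, False, False) == "Moderate"
assert severity_score(1, 0, False, False) == "High"
```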

🟠 Structured Qualitative

Values may be non-ordered

Example rubric for Narrative_Framing:

Category        | Description                              | Example Phrase
Civil Rights    | Emphasizes student rights, equality      | “Free expression is essential”
Security Threat | Emphasizes danger, policing, disruption  | “We must restore order”
Procedural      | Uses neutral, bureaucratic language      | “We are reviewing the matter”


👩‍🏫 Coder Training = Ensuring consistency in how fields are applied

It’s about making sure that:

🔁 Why It Matters:

🛠️ Coder Training Often Includes:

✅ Coder Training Checklist

  1. Codebook Prep
  2. Training Set
  3. Practice Coding
  4. Compare Results
  5. Refine Rules
  6. Retest if Needed

📊 Measuring Inter-Coder Reliability

🧮 Cohen’s Kappa (for 2 coders)

Use tools like Excel, Python (sklearn.metrics.cohen_kappa_score), or R to calculate.
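
A sketch with scikit-learn, assuming two coding passes over the same incidents are stored as parallel lists of labels (the labels here are illustrative Severity_Score codes):

```python
# Minimal sketch: Cohen's kappa between two coding passes of the same field.
# The labels are illustrative Severity_Score codes for ten incidents.
from sklearn.metrics import cohen_kappa_score

pass_1 = ["High", "Low", "Moderate", "High", "Low", "Low", "High", "Moderate", "Low", "High"]
pass_2 = ["High", "Low", "Moderate", "High", "Moderate", "Low", "High", "Moderate", "Low", "High"]

kappa = cohen_kappa_score(pass_1, pass_2)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement; ~0 = chance-level agreement
```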

🧮 Krippendorff’s Alpha (if >2 coders or mixed data types)

👤👤 Yes — you can absolutely be both coders.

Solo Coding for Consistency:

When you’re the only coder:

This still lets you:

🛠️ Tip:

Use a spreadsheet with hidden columns or versioned YAML files to store your first pass, then code again and compare.
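
A sketch of that compare step, assuming each pass is saved as its own CSV keyed by Incident_ID (file and field names are placeholders):

```python
# Minimal sketch: diff two coding passes of the same field and list
# the incidents that disagree, so the codebook can be tightened.
import csv

def load_pass(path: str, field: str) -> dict:
    """Map Incident_ID -> coded value for one coding pass stored as CSV."""
    with open(path, newline="") as f:
        return {row["Incident_ID"]: row[field] for row in csv.DictReader(f)}

first = load_pass("coding_pass_1.csv", "Tone_of_Response")
second = load_pass("coding_pass_2.csv", "Tone_of_Response")

disagreements = {inc: (first[inc], second[inc])
                 for inc in first if inc in second and first[inc] != second[inc]}
for inc, (a, b) in sorted(disagreements.items()):
    print(f"{inc}: pass 1 = {a!r}, pass 2 = {b!r}")
```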


For situations where impactful factors only apply to a few cases

You can say something like:

In several high-profile cases, such as [INC-012], internal records released by the university revealed a significant volume of public feedback (e.g., hundreds of emails and community letters), suggesting a level of visibility and pressure not captured by external media metrics alone. While this internal engagement was not available across all incidents, its presence in select cases may have amplified administrative responsiveness or public framing.

This maintains scientific transparency and causal discipline without overcoding or biasing your results. You’re flagging a potential unmeasured covariate — totally legit (check this across coders, e.g., Claude, etc.)

And in the case where administrative posts go viral:

Suggested narrative framing (formal tone): In a subset of cases, administrative posts themselves became focal points of public engagement. For example, in [INC-014], a university statement posted to social media received over 25,000 views and was widely circulated by students and external media. While such amplification is not present across all incidents, its presence in these select cases likely heightened public awareness and influenced both perception and institutional pressure.

This lets you:

You’re treating it as a qualitative explanatory factor, not a coded metric — which is the right move if it’s only in a few incidents.


Source Categories

🔹 Dependent Variable (DV) Sources

Prefix | Source Type                   | Function in Dataset
ADM-   | Administrative communications | Used to code administrative response, tone, latency, and stated recourse.

🔸 Incident-Triggering Source (Primary IVs)

Prefix | Source Type          | Function in Dataset
DB-    | Daily Bruin articles | Defines incident inclusion. Used for timing, location, participants, and anchoring events.

🟨 Explanatory / Control Variable Sources

Prefix | Source Type                        | Function in Dataset
MED-   | Third-party news media             | Visibility, amplification, and narrative framing (e.g., LA Times, Jewish Journal).
SOC-   | Social media posts                 | Public-facing visibility and grassroots traction (e.g., X, Instagram).
ORG-   | Student org materials              | Protest tone, framing, and actor intent.
LEG-   | Legal documents                    | Legal escalation, OCR complaints, lawsuits, external review.
RPT-   | Reports, investigations, or audits | Institutional context, third-party evaluations, policy framing.

Reddit limitations (source biases)