About — Not Like Us

Overview

This dataset tracks how members of the United States Congress have talked about immigrants and immigration across 152 years of legislative debate. It covers every congressional session from the 43rd Congress (1873–75) through the 119th (2025–27), drawing on the full text of the Congressional Record. Each paragraph mentioning immigration or immigrant groups was extracted, classified by sentiment, rhetorical framing, and target group, and then aggregated into the visualizations you see here.

Data source

Most speeches come from the Stanford Congressional Record Database, which digitized and structured the Congressional Record from 1873 through 2017 (the 43rd through 114th Congresses). The raw corpus contains millions of paragraphs spanning floor speeches, debate excerpts, and committee proceedings from both the House and Senate. Floor speeches for the 115th through 119th Congresses (2017–27) were downloaded from the GovInfo Congressional Record.
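Sessions are identified throughout by Congress number. Because the 1st Congress convened in 1789 and each Congress lasts two years, the n-th Congress begins in 1789 + 2(n - 1), which is how the 43rd maps to 1873 and the 119th to 2025. The short sketch below encodes that arithmetic together with the source split described above; `congress_years` and `corpus_source` are hypothetical helper names for illustration, not part of the project's pipeline.

```python
def congress_years(n: int) -> tuple[int, int]:
    """Return the (start, end) calendar years of the n-th Congress.

    The 1st Congress began in 1789 and each Congress lasts two years,
    so the n-th Congress starts in 1789 + 2 * (n - 1) and ends early in
    the year start + 2.
    """
    if n < 1:
        raise ValueError("Congress numbers start at 1")
    start = 1789 + 2 * (n - 1)
    return start, start + 2


def corpus_source(n: int) -> str:
    """Hypothetical routing of a Congress to its text source:
    the Stanford database through the 114th, GovInfo for the 115th-119th."""
    if 43 <= n <= 114:
        return "Stanford Congressional Record Database"
    if 115 <= n <= 119:
        return "GovInfo Congressional Record"
    raise ValueError(f"Congress {n} is outside the dataset (43rd-119th)")
```

For example, `congress_years(80)` returns `(1947, 1949)`, the 80th Congress used as the cutoff for party-stratified analyses.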

Classification methodology

Each paragraph was passed through Gemini 2.0 Flash (via Google Vertex AI) and classified along four dimensions:

Relevance — Is this paragraph actually about immigration? Classified as direct, indirect, or false positive (e.g., the word "alien" used in a physics context). False positives are removed from subsequent visualizations.
Sentiment — The emotional valence of the paragraph toward the immigrant group it mentions: positive, negative, neutral, or mixed.
Target group — Which immigrant group is being discussed? Categories include Immigrants General, Undocumented Immigrants, Refugees, Chinese, Irish, Italian, German, Mexican, Latin American, Jewish, and others.
Framing — The rhetorical frame applied to the group: Economic contributor, Economic threat, Cultural enrichment, Cultural threat, Security threat, Humanitarian concern, Legal/procedural, Family values, Criminal, or Victim.

The classifier also flags paragraphs that contain stereotyping — broad, group-level generalizations about behavior, character, or capabilities.
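Taken together, the four dimensions and the stereotyping flag form a fixed label schema. The sketch below shows one way a single model response could be validated against it, assuming (hypothetically) that the classifier returns JSON with the keys `relevance`, `sentiment`, `target_group`, `framings`, and `stereotyping`; the project's actual prompt and output format may differ.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Relevance(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    FALSE_POSITIVE = "false positive"


class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    MIXED = "mixed"


# The ten framing categories listed above.
FRAMINGS = frozenset({
    "Economic contributor", "Economic threat", "Cultural enrichment",
    "Cultural threat", "Security threat", "Humanitarian concern",
    "Legal/procedural", "Family values", "Criminal", "Victim",
})


@dataclass(frozen=True)
class ParagraphLabel:
    relevance: Relevance
    sentiment: Sentiment
    target_group: str          # open-ended: "Chinese", "Refugees", "Immigrants General", ...
    framings: frozenset[str]   # a paragraph may carry several framings
    stereotyping: bool


def parse_label(raw: str) -> ParagraphLabel:
    """Validate one JSON response; the key names are hypothetical.

    Enum lookups raise ValueError on an unknown sentiment or relevance value.
    """
    data = json.loads(raw)
    framings = frozenset(data.get("framings", []))
    unknown = framings - FRAMINGS
    if unknown:
        raise ValueError(f"unknown framing(s): {sorted(unknown)}")
    return ParagraphLabel(
        relevance=Relevance(data["relevance"]),
        sentiment=Sentiment(data["sentiment"]),
        target_group=data["target_group"],
        framings=framings,
        stereotyping=bool(data["stereotyping"]),
    )
```

Rejecting unexpected labels at parse time, rather than during aggregation, keeps malformed responses out of every downstream chart.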

What the visualizations show

Sentiment over time traces how the emotional tone of immigration discourse shifts across sessions, broken out by party. Across all 152 years, a majority of immigration mentions carry negative sentiment. That negative share has risen in recent decades, with the Republican negative rate exceeding 70% in the 118th Congress (2023–25).

Stereotyping rate shows the fraction of paragraphs flagged as containing stereotyping language. Among Republican speakers, the stereotyping rate has risen steadily since the late 1980s.
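Both the sentiment and stereotyping charts reduce to a share of paragraphs per session and party. A minimal sketch of that aggregation follows, assuming a hypothetical flat record per paragraph; false positives are dropped first, as described under Classification methodology, and paragraphs without party metadata are skipped.

```python
from collections import defaultdict
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical per-paragraph record; field names are illustrative.
    congress: int
    party: str          # "R" or "D"; empty when party metadata is missing
    relevance: str      # "direct", "indirect", or "false positive"
    sentiment: str      # "positive", "negative", "neutral", or "mixed"
    stereotyping: bool


def party_rates(rows):
    """Return {(congress, party): (negative_share, stereotyping_rate)}.

    False positives and rows without a party are excluded, so both rates
    are fractions of immigration-related paragraphs with known party.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [total, negative, stereotyping]
    for r in rows:
        if r.relevance == "false positive" or not r.party:
            continue
        c = counts[(r.congress, r.party)]
        c[0] += 1
        c[1] += r.sentiment == "negative"   # bool counts as 0 or 1
        c[2] += r.stereotyping
    return {key: (neg / total, stereo / total)
            for key, (total, neg, stereo) in counts.items()}
```

Counting totals, negatives, and stereotyping flags in one pass keeps both charts on the same denominator for each session and party.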

Target groups over time shows which immigrant communities attracted congressional attention in each era: Chinese immigrants in the 19th century, refugees in the 20th century, Jewish/Soviet immigrants in the 1980s, and undocumented immigrants from the 1980s onward.

Framing categories capture the rhetorical strategies used: economic threat framings dominate in periods of nativist backlash; humanitarian and legal-procedural frames rise during refugee crises and reform debates.
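Because a single paragraph can carry more than one framing, each framing's share is computed over paragraphs rather than over labels, so the shares can sum to more than 100%. The sketch below illustrates this with a hypothetical helper, `framing_shares`, that takes one set of framings per paragraph.

```python
from collections import Counter


def framing_shares(paragraph_framings):
    """Share of paragraphs carrying each framing.

    Each element is the collection of framings assigned to one paragraph.
    Since a paragraph may have several, the shares can sum to more than 1.
    """
    n = len(paragraph_framings)
    if n == 0:
        return {}
    # set() guards against counting a framing twice within one paragraph.
    counts = Counter(f for framings in paragraph_framings for f in set(framings))
    return {f: k / n for f, k in counts.items()}
```

Dividing by the number of paragraphs answers "how often does this frame appear?"; dividing by the total number of framing labels would instead give each frame's slice of all framings, which always sums to exactly 100%.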

Partisan sentiment gap charts the difference between Republican and Democratic negative-sentiment rates for each target group across time. Positive values mean Republicans were more negative; negative values mean Democrats were. The gap has widened sharply since the mid-1990s for almost every group.
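For a target group g in Congress t, the gap is the Republican negative-sentiment rate minus the Democratic one: gap(g, t) = neg_R(g, t) - neg_D(g, t). A minimal sketch, assuming the per-party negative rates have already been computed and are keyed by a hypothetical `(target_group, congress, party)` tuple; "Immigrants General" is skipped, as in the chart.

```python
def partisan_gap(neg_rates):
    """Compute R-minus-D negative-sentiment gaps.

    neg_rates maps (target_group, congress, party) -> negative rate in [0, 1].
    Returns {(target_group, congress): gap}. Positive means Republicans were
    more negative; negative means Democrats were.
    """
    gaps = {}
    for (group, congress, party), rate in neg_rates.items():
        if party != "R" or group == "Immigrants General":
            continue
        dem = neg_rates.get((group, congress, "D"))
        if dem is not None:  # a gap needs rates from both parties
            gaps[(group, congress)] = rate - dem
    return gaps
```

Sessions where only one party mentions a group are left out rather than treated as a zero rate, which would otherwise inflate the gap.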

Geographic distribution maps each speech to the state of the member who delivered it.

Narratives visualizes selected quotes for individual immigrant groups, illustrating a recurring pattern: a new immigrant group is first viewed with suspicion, then recognized for its economic and cultural contributions, and finally held up as a model by which to denigrate more recent immigrant waves.

Limitations and caveats

LLM classification is imperfect. Gemini 2.0 Flash performs well on clear cases but can misclassify ironic, hypothetical, or procedurally complex language. It often struggles to attribute sentiment to the speaker rather than to the views the speaker describes; for example, a speaker criticizing a negative stereotype may be labeled as engaging in negative stereotyping. Individual classifications should be verified against the original text, which is linked from every speech page.
Party attribution is incomplete. Congressional Record entries do not always include reliable party metadata, particularly for 19th-century sessions. Analyses stratified by party should be treated with caution for sessions before the 80th Congress (1947).
Framing categories are not mutually exclusive. A single paragraph can carry multiple framings; the charts show the share of paragraphs with each framing, which can sum to more than 100%.
"Immigrants General" is the most common target group category and acts as a catch-all for speeches that discuss immigration broadly without naming a specific group. It is excluded from the partisan sentiment gap chart to focus on more specific comparisons.

Citations

Please cite the Stanford Congressional Record Database, which powered most of this analysis. For a peer-reviewed analysis of immigration sentiment in this data (through 2020), see Card et al. (2022), "Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized framing of immigration," PNAS.