A network of 1,637 economists built from Wikidata and Wikipedia, enriched with AI-generated summaries, ranked by PageRank, and organized around the intellectual debates that have defined the discipline.
"Economists, like any other discipline, do not produce ideas in a vacuum. Their contributions build upon and influence one another, forming a complex and evolving web of intellectual history."
The goal of Econograph is to uncover and visualize those interconnections: academic lineage, intellectual influence, and institutional affiliations, using structured data scraped from Wikipedia. Inspired by similar projects in the history of philosophy, this effort brings the tools of data scraping, network analysis, and machine learning to the history of economic thought. While some comparable projects exist, they tend toward either manual curation or simple lists. Here, the aim is to automate data collection at scale and create a reusable foundation for historical, institutional, or theoretical investigations into the economics profession.
One of the most meaningful courses in my training as an economist was the history of economic thought. It is the kind of course that forces you to step back and ask a harder set of questions: how did we come to think the way we think? What problems were economists actually trying to solve when they built the models we now take for granted? What did they assume away, and why? The history of the discipline is, in this sense, a history of the questions economists decided were worth asking. Econograph is an attempt to make that history navigable.
The graph you see is the result of a six-stage automated pipeline. None of the school assignments, summaries, or network scores were entered by hand. The source code lives at github.com/mmarteaga/econograph.
Six stages turn raw Wikidata and Wikipedia records into the interactive graph.
The dataset starts with Wikidata, Wikipedia's structured knowledge base. Every economist with a Wikipedia article has a Wikidata entry recording structured facts: birth and death dates, a photo, doctoral advisor relationships, doctoral student relationships, and intellectual influences.
The scraper queries Wikidata's SPARQL endpoint and the MediaWiki API to collect this data. The result is 1,637 economists and 1,800 direct connections.
Why Wikidata instead of scraping Wikipedia infoboxes? Wikidata represents structured facts with explicit property types. P184 is doctoral advisor, P185 is doctoral student, P737 is influenced by. This distinction matters: knowing that Paul Samuelson was Robert Solow's doctoral advisor is richer information than knowing they are "connected" in some undifferentiated sense.
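The SPARQL queries behind this stage can be sketched as follows. The property IDs (P184, P185, P737) and occupation property P106 with item Q188094 ("economist") are standard Wikidata identifiers, and the endpoint URL is Wikidata's public SPARQL service; treat the exact query shape and the function name as illustrative, not the project's literal code.

```python
# Illustrative sketch of a Wikidata SPARQL query for advisor edges.
# P106 = occupation, Q188094 = economist, P184 = doctoral advisor.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def build_advisor_query(limit: int = 5000) -> str:
    """Return a SPARQL query pairing economists with their doctoral advisors."""
    return f"""
    SELECT ?economist ?economistLabel ?advisor ?advisorLabel WHERE {{
      ?economist wdt:P106 wd:Q188094 .   # occupation: economist
      ?economist wdt:P184 ?advisor .     # doctoral advisor relationship
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """
```

Analogous queries over P185 and P737 yield the student and influence edges; the three result sets are then merged into one node and edge list.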
For each economist with a Wikipedia page, the introductory section (the paragraphs before any section headers) is fetched using the Wikipedia Action API. This gives a plain-text summary of who each economist is and what they worked on.
This text serves two purposes. First, it is the raw material for LLM school classification in Stage 3. Second, it is embedded directly in the graph file so the Research Assistant can perform full-text search without additional API calls at browse time.
A note on batching: The Wikipedia Action API silently caps prop=extracts responses at 20 articles per request, regardless of how many titles are submitted. Batches are set to 20 accordingly. This cap is not prominently documented; it was discovered empirically during development.
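The batching logic is simple once the cap is known. A minimal sketch (the constant and function name are hypothetical, not taken from the project's code):

```python
from typing import Iterator

EXTRACTS_CAP = 20  # empirical per-request cap on prop=extracts responses

def batched(titles: list[str], size: int = EXTRACTS_CAP) -> Iterator[list[str]]:
    """Yield successive batches of article titles no larger than `size`,
    so no extracts are silently dropped by the API."""
    for i in range(0, len(titles), size):
        yield titles[i:i + size]
```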
Assigning an economist to a school of thought is genuinely hard. Two problems make traditional keyword matching unreliable.
The first is polysemy. The word "development" means something entirely different in "development economics" (growth in low-income countries) versus "financial development" (depth of capital markets). A keyword approach cannot resolve this.
The second is network contamination. Community detection (Louvain algorithm) groups economists by who they cite, but intellectual network proximity does not equal school membership. Raj Chetty co-publishes with behavioral economists but is primarily a labor economist. Naive community detection dragged the entire Harvard labor economics cluster (Autor, Katz, Diamond) into the wrong school because of their network adjacency to behavioral researchers.
The solution is to read what each economist's Wikipedia page actually says and classify from that text directly. Each Wikipedia intro is sent to Claude Haiku with a structured prompt listing all 20 valid schools and including explicit disambiguation rules.
156 economists are classified by hand as authoritative seeds and are never sent to the LLM. They serve as anchors: Keynes is Keynesian, Hayek is Austrian School, Samuelson is Classical/Neoclassical. The LLM classifies everyone else.
The system prompt gives the LLM explicit guidance on these edge cases, including disambiguation rules for ambiguous terms like "development" and instructions not to infer school membership from co-authorship alone.
Coverage: 1,388 of 1,637 economists had Wikipedia intro text available. All 1,388 were classified. 863 received a different school assignment than the prior keyword-based approach. The remaining 249 economists either had no Wikipedia URL or insufficient text; they retain their seed assignment or prior classification.
The LLM approach substantially outperforms keyword matching on edge cases. Tobias Adrian (an expert on financial stability and capital market risk, wrongly labeled "Development" by keyword matching) is correctly classified as "Finance." The Harvard labor economics cluster is no longer contaminated by its Louvain community membership.
The 1,637 economists and 1,800 connections form an undirected graph. NetworkX computes PageRank for each node, the same iterative algorithm Google introduced to rank web pages.
PageRank works recursively: a node receives a higher score when it is connected to many others, and when those others are themselves highly connected. The formal update equation is:

PR(u) = (1 − d)/N + d · Σ_{v ∈ B(u)} PR(v)/L(v)

where:
PR(u) = PageRank score of node u (an economist)
d = damping factor, set to 0.85 (probability of following a link rather than jumping randomly)
N = total number of nodes in the graph (1,637)
B(u) = set of nodes sharing an edge with u (the graph is undirected, so these are simply u's neighbors)
L(v) = number of outbound edges from node v
Scores converge through repeated iteration until the change between rounds falls below a tolerance threshold (10⁻⁶). The result is a continuous score for each economist reflecting both breadth of influence and the prestige of their peers.
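The update rule above can be implemented in a few lines of power iteration. The project uses NetworkX's built-in pagerank; this self-contained sketch exists only to make the equation concrete, and it assumes every node has at least one edge (no dangling nodes).

```python
def pagerank(adj: dict[str, set[str]], d: float = 0.85,
             tol: float = 1e-6, max_iter: int = 100) -> dict[str, float]:
    """Power iteration on an undirected graph. Each neighbor v
    contributes PR(v)/L(v) where L(v) is v's degree, damped by d,
    plus the (1 - d)/N random-jump term."""
    n = len(adj)
    pr = {u: 1.0 / n for u in adj}
    for _ in range(max_iter):
        new = {
            u: (1 - d) / n + d * sum(pr[v] / len(adj[v]) for v in adj[u])
            for u in adj
        }
        if sum(abs(new[u] - pr[u]) for u in adj) < tol:
            return new
        pr = new
    return pr
```

On a symmetric graph every node gets the same score; on a hub-and-spoke graph the hub scores highest, which is exactly the "prestige of peers" effect described above.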
PageRank scores are used throughout the interface. They rank search results, determine which connections appear first in the detail panel, size nodes in the mini network diagram, and weight the "Surprise me" selection toward historically significant figures.
Beyond the raw graph, Wikidata records the type of each relationship. These are resolved into three labeled categories in the detail panel: doctoral advisors, doctoral students, and intellectual influences. A fourth category ("Also Connected") captures all remaining edges that carry no specific relationship type in the data, typically colleagues or frequent co-authors.
For each economist with a Wikipedia article, Claude Haiku generates two things. The first is a one-paragraph contribution summary (three to five sentences) describing their most important ideas, theorems, and intellectual legacy in specific terms. The second is a set of eight identifying keywords drawn from the theories they created, the institutions they shaped, or the results they are best known for.
The summaries and keywords are generated offline as part of the build pipeline and stored directly in the graph data file. No API calls are made at browse time. This makes the site fully static and compatible with GitHub Pages hosting.
Wikipedia text (up to 3,000 characters of the article's introductory section) is fetched via the Wikipedia Action API in batches of 20. Each batch is then processed concurrently by Claude Haiku with up to 8 simultaneous API calls. A checkpoint file saves progress after each batch, making the process crash-safe and resumable.
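The concurrency-plus-checkpoint pattern can be sketched as below. The function name, checkpoint format, and `summarize` callable are hypothetical; the batch size of 20, the 8 concurrent calls, and the save-after-each-batch behavior come from the text.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

def process_batch(names: list[str], summarize: Callable[[str], dict],
                  checkpoint: Path, max_workers: int = 8) -> dict[str, dict]:
    """Summarize one batch with up to `max_workers` concurrent calls,
    skipping names already in the checkpoint, then persist results so
    an interrupted run can resume where it left off."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    todo = [n for n in names if n not in done]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name, result in zip(todo, pool.map(summarize, todo)):
            done[name] = result
    checkpoint.write_text(json.dumps(done))
    return done
```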
The model is instructed to respond with valid JSON only: a "summary" field containing the paragraph and a "keywords" array of exactly eight strings. Responses are parsed strictly; any output that does not parse as JSON is retried up to three times before being skipped.
1,636 of 1,637 economists received summaries and keywords. The single economist without coverage has no Wikipedia URL in the Wikidata record and could not be fetched. A URL-encoding issue discovered during the run (Wikipedia URLs containing percent-encoded characters such as G%C3%A9rard_Debreu were not being decoded before lookup) caused 97 economists to be missed on the first pass; this was corrected with urllib.parse.unquote() and a resume run covered the remaining cases.
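The fix itself is a one-liner around `urllib.parse.unquote()`; the helper name is hypothetical, but the decoding behavior is exactly what the standard library provides:

```python
from urllib.parse import unquote

def title_from_url(url: str) -> str:
    """Extract and decode the article title from a Wikipedia URL, so
    percent-encoded names match the API's canonical titles."""
    return unquote(url.rsplit("/", 1)[-1])
```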
Why not use Wikipedia bios for the same purpose? Wikipedia intros vary enormously in length, quality, and focus. The AI summaries are consistently structured and use active voice ("Arrow proved...," "Friedman argued..."), making them more useful for quick orientation. The Wikipedia bio text is preserved separately in the data for full-text search.
The Research Assistant uses MiniSearch, a lightweight BM25 full-text search library that runs entirely in the browser. On page load, it builds an in-memory index over every economist's name, school, Wikipedia bio, AI-generated summary, and keywords. No server is required.
The BM25 relevance score for a query q against a document d is:

score(q, d) = Σ_{t ∈ q} IDF(t) · f(t, d) · (k1 + 1) / ( f(t, d) + k1 · (1 − b + b · |d|/avgdl) )

where:
IDF(t) = inverse document frequency of term t (rare terms score higher)
f(t, d) = frequency of term t in document d
|d| = length of document d in tokens
avgdl = average document length across the corpus
k1 = term saturation parameter (1.2 by default; controls how much repeated mentions add)
b = length normalization parameter (0.75 by default; penalizes very long documents)
Field weights are applied on top of BM25: name matches receive a boost of 4, keywords 3, summary 2, school 2, and bio text 1. This ensures that an economist whose name matches the query outranks one who merely mentions the query term in passing.
The final score for each result is a weighted combination of MiniSearch relevance and the economist's PageRank score, so prominent economists surface above obscure figures with the same keyword match.
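The combination step reduces to a weighted sum. The weights below are purely illustrative (the actual values live in the site's search code); the scaling factor exists because PageRank scores sum to 1 across 1,637 nodes and would otherwise be negligible next to BM25 relevance.

```python
def final_score(relevance: float, pagerank: float,
                w_rel: float = 1.0, w_pr: float = 50.0) -> float:
    """Blend text relevance with network prominence. The PageRank
    weight is large because individual PageRank values are tiny
    (on the order of 1/1637). Weights here are hypothetical."""
    return w_rel * relevance + w_pr * pagerank
```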
For broader conceptual queries, Econograph fetches the Wikipedia article for the search query itself, extracts the first 600 characters of its intro, and runs a second MiniSearch pass using those expanded terms. A query for "history of corporate governance" pulls Wikipedia's intro for that topic, which mentions "agency theory," "hostile takeover," and "institutional investors." Those terms appear in relevant economists' bios and substantially improve recall. When this enrichment fires, a small badge appears on the results.
The debates view surfaces six recurring intellectual tensions that have driven the development of economics as a discipline. Rather than treating the history of economic thought as a sequence of settled questions, this view frames it as a set of ongoing arguments, each of which has shifted in character but never been fully resolved.
The selection criteria were that a debate had to meet three conditions. It had to be genuinely unresolved, meaning that thoughtful economists today still disagree about it rather than having reached a stable consensus. It had to be traceable through at least 150 years of the discipline's history, with identifiable figures on recognizable sides. It also had to connect to something visible in contemporary policy or research, so that a reader could see why it still matters.
Applying those criteria produces the six debates presented in the view.
The assignment process works in two layers. The first layer is a hand-curated set of named economists for each side of each debate. These are the figures most closely associated with each position in the historical literature. The second layer applies school-of-thought matching to extend coverage: economists in the Keynesian or Post-Keynesian schools are assigned to the "Discretion" side of the rules debate, for example, while Chicago School economists are assigned to the "Rules" side. Force-listed names always take precedence over school-based assignment. Economists whose work spans multiple debates appear in each one independently.
A note on simplification: Reducing an intellectual tradition to "two sides" necessarily loses nuance. Many economists occupy complex positions that shift across different policy questions. The debates view is designed to orient readers to the major fault lines, not to serve as a complete intellectual taxonomy.
English Wikipedia bias. The dataset is sourced entirely from English Wikipedia and Wikidata. Economists whose work is primarily documented in other languages are underrepresented or absent, particularly thinkers from Latin America, East Asia, and continental Europe outside major research universities.
The full pipeline is open source at github.com/mmarteaga/econograph.