This module introduces large language models as practical research tools.
You’ll learn:
What LLMs really are and why they don’t understand language the way we do.
How to set up and use Ollama for local model inference with Python.
Practical workflows for summarization, extraction, and hypothesis generation in research.
The failure modes and boundaries of LLMs, and when to trust (or verify) their outputs.
Do LLMs Understand Language?
Let’s talk about the most fundamental question: Can LLMs understand the world and reason about it?
One might argue that fluency demonstrates understanding. This is the intuition behind Turing’s 1950 test: if you can’t tell it’s a machine, treat it as intelligent. But let’s examine counter-arguments starting with ELIZA, developed by Joseph Weizenbaum in the mid-1960s as one of the first chatbots. It simulated a Rogerian psychotherapist using simple pattern matching and keyword substitution. Despite its lack of true understanding, ELIZA famously convinced many users that they were conversing with an intelligent entity, highlighting the human tendency to anthropomorphize technology and the limitations of the Turing Test.
Another argument against fluency is the Chinese Room argument, proposed by philosopher John Searle. Imagine a person in a room who receives Chinese characters and, using an English rulebook, manipulates these symbols to produce new Chinese characters. To an outside observer, it appears the room understands Chinese, yet the person inside merely follows instructions to manipulate symbols without understanding their meaning. Searle argues that this is analogous to how computers, including LLMs, operate: they process symbols based on rules without genuine comprehension.
So do LLMs understand the world? Probably not in the same way we do. LLMs are lossy compression algorithms, compressing data into their parameters to generate fluent outputs. To predict “The capital of France is ___,” the model must compress not just the fact (Paris) but the statistical regularities governing how facts appear in text: that capitals follow “The capital of,” that France is a country, that countries have capitals. The model stores P(\text{word}_{n+1} \mid \text{word}_1, \ldots, \text{word}_n), which words tend to follow which other words in which contexts, just as a lottery memorizer stores patterns of number sequences.
Training feeds the model billions of sentences. For each sentence, the model predicts the next word, compares its prediction to the actual next word, and adjusts its parameters to increase the probability of the correct word. Repeat trillions of times. The result: a compressed representation of how language behaves statistically. The model doesn’t learn “Paris is the capital of France” as a fact but rather that in contexts matching the pattern [The capital of France is], the token “Paris” appears with high probability.
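To make that concrete, here is a minimal sketch of next-word probability estimation: a toy bigram counter over a three-sentence corpus. It stands in for, but is in no way equivalent to, the neural network a real LLM trains over billions of sentences.

from collections import Counter, defaultdict

# A toy corpus standing in for billions of training sentences.
corpus = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "the capital of france is paris",
]

# Count how often each word follows each preceding word (a bigram model;
# real LLMs condition on thousands of previous tokens, not just one).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1

# Estimate P(next word | "is") purely from co-occurrence counts.
counts = following["is"]
total = sum(counts.values())
for word, count in counts.most_common():
    print(f"P({word!r} | 'is') = {count / total:.2f}")

In this toy corpus, "paris" gets the highest probability after "is" simply because it co-occurred most often, which is exactly the pattern-storage behavior described above.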
The lottery memorizer doesn’t understand what the draws mean; it just knows which patterns appear most often. This is why LLMs hallucinate, producing fluent but false outputs. Truth and fluency correlate in the training data, so the model is mostly truthful, but in the tails (obscure topics, recent events, precise recall), fluency diverges from truth, and the model follows fluency.
Keep this limitation in mind and use LLMs as tools to scale pattern recognition, not judgment. Let’s learn how to use them.
Setting Up Ollama
For this course, we use Ollama, a tool for running LLMs locally, with Gemma 3N, a 4-billion parameter open-source model. It’s free, private, and capable enough for research tasks.
Visit ollama.ai, download the installer, and verify installation.
ollama --version
ollama pull gemma3n:latest
ollama run gemma3n:latest "What is a complex system?"
If you receive a coherent response, install the Python client and send your first prompt.
pip install ollama
import ollama

params_llm = {"model": "gemma3n:latest", "options": {"temperature": 0.3}}

response = ollama.generate(
    prompt="Explain emergence in two sentences.",
    **params_llm
)
print(response.response)
Emergence is when complex patterns and behaviors arise from simple interactions between individual components in a system. These emergent properties are not predictable from the properties of the individual parts alone, representing a novel level of organization.
Run this code twice and you’ll get different outputs. Why? Because LLMs sample from probability distributions. The temperature parameter controls this randomness: lower values (0.1) make outputs more deterministic, higher values (1.0) increase diversity. You’re controlling how far into the tail of the probability distribution the model samples. Low temperature means the model picks the most likely next word, while high temperature ventures into less probable territory, sometimes producing creativity, sometimes nonsense.
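To see the difference yourself, a small sketch (the two temperature values are just illustrative endpoints) runs the same prompt at a low and a high temperature:

import ollama

prompt = "Explain emergence in two sentences."

# Same prompt, two sampling temperatures: low stays close to the most probable
# continuation, high samples further into the tail of the distribution.
for temperature in (0.1, 1.0):
    response = ollama.generate(
        prompt=prompt,
        model="gemma3n:latest",
        options={"temperature": temperature},
    )
    print(f"--- temperature={temperature} ---")
    print(response.response)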
Research Applications
The strategy is simple: use LLMs for tasks where speed trumps precision, then verify the outputs that matter. Three workflows demonstrate this pattern.
Abstract Summarization
You collected 50 papers on network science. Which deserve detailed reading?
You don’t have time to read all 50 abstracts carefully. An LLM scans them in seconds.
abstract ="""Community detection in networks is a fundamental problem in complex systems.While many algorithms exist, most assume static networks. We propose a dynamiccommunity detection algorithm that tracks evolving communities over time usinga temporal smoothness constraint. We evaluate our method on synthetic and realtemporal networks, showing it outperforms static methods applied to temporalsnapshots. Our approach reveals how communities merge, split, and persist insocial networks, biological systems, and transportation networks."""prompt =f"Summarize this abstract in one sentence:\n\n{abstract}"response = ollama.generate(prompt=prompt, **params_llm)print(response.response)
This paper introduces a novel dynamic community detection algorithm that effectively tracks evolving communities in networks over time, outperforming static methods and revealing community dynamics in various real-world systems.
The model captures the pattern: propose method, evaluate, outperform baselines. It doesn’t understand the paper but has seen enough academic abstracts to recognize the structure. For multiple abstracts, loop through them.
for i, abstract in enumerate(["Abstract 1...", "Abstract 2..."], 1):
    response = ollama.generate(prompt=f"Summarize:\n\n{abstract}", **params_llm)
    print(f"{i}. {response.response}")
1. Please provide me with "Abstract 1"! I need the text of the abstract to be able to summarize it for you.
Just paste the abstract here, and I'll give you a concise summary. 😊
I'm ready when you are!
2. Please provide me with the content of "Abstract 2"! I need the text of the abstract to be able to summarize it for you.
Just paste the abstract here, and I'll give you a concise summary. 😊
Local models are slow (2–5 seconds per abstract). For thousands of papers, switch to cloud APIs. But the workflow scales: delegate skimming to the model, retain judgment for yourself.
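If you want to gauge throughput on your own hardware before committing to a large batch, a rough timing sketch (the abstract list here is a placeholder) measures seconds per summary:

import time
import ollama

params_llm = {"model": "gemma3n:latest", "options": {"temperature": 0.3}}
abstracts = ["...first abstract text...", "...second abstract text..."]  # placeholders

start = time.perf_counter()
for abstract in abstracts:
    ollama.generate(prompt=f"Summarize in one sentence:\n\n{abstract}", **params_llm)
elapsed = time.perf_counter() - start

# A few seconds per abstract is typical for a local model of this size.
print(f"{elapsed / len(abstracts):.1f} s per abstract")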
I ran this on 200 abstracts about power-law distributions. Gemma flagged the 15 that used preferential attachment models. Saved me 4 hours. I still read all 15 myself.
Structured Extraction
Turn unstructured text into structured data automatically.
abstract ="""We analyze scientific collaboration networks using 5 million papers from2000-2020. Using graph neural networks and community detection, we identifydisciplinary boundaries and interdisciplinary bridges. Interdisciplinarityincreased 25%, with physics and CS showing strongest cross-connections."""prompt =f"""Extract: Domain, Methods, Key Finding\n\n{abstract}\n\nFormat:\nDomain:...\nMethods:...\nKey Finding:..."""response = ollama.generate(prompt=prompt, **params_llm)print(response.response)
Here's the extraction in the requested format:
Domain: Scientific Collaboration Networks
Methods: Graph Neural Networks, Community Detection, Analysis of 5 million papers (2000-2020)
Key Finding: Interdisciplinarity increased by 25% between 2000-2020, with the strongest cross-connections observed between Physics and Computer Science.
Scale this to hundreds of papers for meta-analysis, but always verify. LLMs misinterpret obscure terminology and fabricate plausible-sounding technical details when uncertain, pattern-matching against academic writing they’ve seen rather than reasoning about your domain.
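A sketch of what that loop might look like at scale: the parsing is deliberately naive and assumes the model follows the requested Domain/Methods/Key Finding format, which it sometimes won't, so the raw output is kept alongside the parsed fields for spot-checking.

import csv
import ollama

params_llm = {"model": "gemma3n:latest", "options": {"temperature": 0.3}}
abstracts = ["...abstract 1 text...", "...abstract 2 text..."]  # placeholders

rows = []
for abstract in abstracts:
    prompt = (
        "Extract: Domain, Methods, Key Finding\n\n"
        f"{abstract}\n\n"
        "Format:\nDomain: ...\nMethods: ...\nKey Finding: ..."
    )
    text = ollama.generate(prompt=prompt, **params_llm).response
    record = {"Domain": "", "Methods": "", "Key Finding": "", "raw": text}
    # Naive line-based parsing: only works if the model honored the format.
    for line in text.splitlines():
        for field in ("Domain", "Methods", "Key Finding"):
            if line.strip().startswith(f"{field}:"):
                record[field] = line.split(":", 1)[1].strip()
    rows.append(record)

# Keep the raw output next to the parsed fields so extraction errors are easy to spot.
with open("extractions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Domain", "Methods", "Key Finding", "raw"])
    writer.writeheader()
    writer.writerows(rows)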
Hypothesis Generation
LLMs pattern-match against research questions they’ve encountered in training data.
context ="""I study concept spread in citation networks. Highly cited paperscombine existing concepts novelty. What should I study next?"""prompt =f"""Suggest three follow-up research questions:\n\n{context}"""response = ollama.generate(prompt=prompt, **params_llm)print(response.response)
Okay, here are three follow-up research questions, building on your work on concept spread in citation networks, focusing on highly cited papers and the interplay of existing concepts and novelty. I've tried to offer a mix of methodological and theoretical directions:
**1. How does the *type* of novelty (e.g., incremental, radical, convergent) in highly cited papers influence the rate and direction of concept spread?**
* **Rationale:** You've identified that highly cited papers combine existing concepts with novelty. However, the *nature* of that novelty likely matters. Is it a small tweak to an existing idea (incremental), a completely new paradigm (radical), or a synthesis of multiple existing ideas (convergent)? Different types of novelty might spread differently through the citation network.
* **Methodology:** This could involve:
* **Concept Extraction & Categorization:** Develop a method (potentially using NLP techniques like topic modeling or knowledge graph extraction) to identify and categorize the types of novelty present in highly cited papers.
* **Network Analysis:** Analyze the citation network to see if papers with different types of novelty have different citation patterns (e.g., different citation paths, different communities of citing papers).
* **Temporal Analysis:** Track the spread of concepts over time, looking for differences in the spread dynamics based on the type of novelty.
* **Potential Insights:** This could reveal whether incremental novelty spreads quickly within a well-established field, while radical novelty requires more time and a different set of initial citations to gain traction.
**2. To what extent does the *citation context* (i.e., how a highly cited paper is cited) mediate the relationship between novelty and concept spread?**
* **Rationale:** It's not just *that* a paper is highly cited, but *how* it's cited that matters. Is it cited as a foundational work, a contrasting viewpoint, a building block for further research, or something else? The citation context could significantly influence how the novelty is perceived and incorporated by subsequent researchers.
* **Methodology:**
* **Citation Context Analysis:** Develop a method to classify the citation context of highly cited papers (e.g., using NLP to analyze the surrounding text in citations).
* **Network Analysis:** Analyze the citation network to see if the citation context of a paper is correlated with the subsequent spread of concepts.
* **Sentiment Analysis:** Use sentiment analysis on the citation text to gauge the attitude towards the novelty being presented.
* **Potential Insights:** This could reveal whether a paper's novelty is more likely to spread if it's cited as a key foundational work, or if it's more likely to be incorporated if it's cited as a contrasting viewpoint that sparks debate.
**3. Can we identify "concept amplifiers" – papers that, due to their specific combination of existing concepts and novelty, act as particularly effective catalysts for concept spread?**
* **Rationale:** Not all highly cited papers are created equal in terms of their ability to spread concepts. Some papers might be inherently more influential due to their specific combination of existing knowledge and new ideas.
* **Methodology:**
* **Feature Engineering:** Develop a set of features that capture the combination of existing concepts and novelty in a paper (e.g., the number of distinct concepts introduced, the degree of overlap with existing concepts, the "surprise" or unexpectedness of the novelty).
* **Machine Learning:** Use machine learning techniques (e.g., regression, classification) to identify papers that are strong predictors of concept spread, based on their feature values.
* **Network Analysis:** Analyze the citation network to see if papers identified as "concept amplifiers" have distinct network properties (e.g., high betweenness centrality, strong connections to diverse communities).
* **Potential Insights:** This could lead to a better understanding of the factors that contribute to the influence of scientific papers and potentially inform strategies for promoting impactful research.
These questions are designed to be relatively focused and address different aspects of your initial research. They also offer opportunities to combine quantitative network analysis with qualitative analysis of the content and context of citations. I hope this helps! Let me know if you'd like me to elaborate on any of these or suggest alternative directions.
Treat the model as a thought partner, not an oracle. It helps structure thinking but doesn’t possess domain expertise, reflecting patterns in how research questions are framed rather than deep knowledge of your field.
Failure Modes and Boundaries
The failure modes follow directly from the mechanism. LLMs fabricate plausibly because they optimize for fluency, not truth. Ask about a non-existent “Smith et al. quantum paper” and receive fluent academic prose describing results that never happened. Always verify citations: the model has seen thousands of papers cited in the format “Smith et al. (2023) demonstrated that…” and generates outputs matching that pattern even when the citation is fictional.
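You can reproduce this failure mode directly. In the sketch below the citation is fictional by construction, echoing the "Smith et al." example above; any fluent summary the model returns is a hallucination, not evidence that the reference exists.

import ollama

params_llm = {"model": "gemma3n:latest", "options": {"temperature": 0.3}}

# The citation below is invented on purpose (the paper does not exist); a fluent
# answer here demonstrates fabrication, so always check references yourself.
prompt = "Summarize the main result of the Smith et al. (2023) paper on quantum effects."
print(ollama.generate(prompt=prompt, **params_llm).response)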
Context limits are architectural. Models see only 2,000–8,000 tokens at once. Paste 100 abstracts and early ones are mathematically evicted from working memory, gone. Knowledge cutoffs are temporal: Gemma 3N’s training ended early 2024, so asking about recent events yields outdated information or plausible fabrications constructed from pre-cutoff patterns.
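Ollama exposes the context window per request through the num_ctx option; the sketch below pairs that with a rough characters-per-token heuristic (an approximation, not a real tokenizer) to decide whether the input should be chunked first:

import ollama

params_llm = {
    "model": "gemma3n:latest",
    # num_ctx sets the context window Ollama allocates for this request.
    "options": {"temperature": 0.3, "num_ctx": 8192},
}

long_text = "... many concatenated abstracts ..."  # placeholder

# Rough heuristic: roughly 4 characters per token for English prose.
approx_tokens = len(long_text) / 4
if approx_tokens > 8192:
    print("Input likely exceeds the context window; process it in chunks instead.")
else:
    print(ollama.generate(prompt=f"Summarize:\n\n{long_text}", **params_llm).response)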
Reasoning is absent. LLMs pattern-match, they don’t reason. Ask “How many r’s in ‘Strawberry’?” and the model might answer correctly via pattern matching against similar questions in training data, not by counting letters. Sometimes right, often wrong, with no internal representation of what counting means. These aren’t bugs to be fixed but intrinsic to the architecture.
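This one is easy to test at your desk: put the model's answer next to the deterministic count.

import ollama

params_llm = {"model": "gemma3n:latest", "options": {"temperature": 0.3}}

# The model pattern-matches against similar questions; str.count actually counts.
print("Model says: ", ollama.generate(prompt="How many r's are in 'strawberry'?", **params_llm).response)
print("Python says:", "strawberry".count("r"))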
Use LLMs to accelerate work, not replace judgment. They excel at summarizing text, extracting structure, reformulating concepts, brainstorming, generating synthetic examples, and translation. They fail at literature reviews without verification, factual claims without sources, statistical analysis, and ethical decisions.
Harvest the center of the distribution where fluency and truth correlate. Defend against the tails where they diverge.
What Comes Next
You’ve seen LLMs in practice: setup, summarization, extraction, limitations. But how do they actually work?
What happens inside when you send a prompt? The rest of this module unboxes the technology: prompt engineering (communicating with LLMs), embeddings (representing meaning as numbers), transformers (the architecture enabling modern NLP), fundamentals (from word counts to neural representations).