Center for Techno-Anthropology (CETAN)

Hyphe Coding Retreat 2019

The Techno-Anthropology Lab invited designer-developers from the Sciences Po médialab (Paris) and the Digital Methods Initiative (DMI, Amsterdam) to a one-week coding retreat (21.11.-25.11.2019) on the web crawler Hyphe.

Last modified: 08.11.2019

Hyphe is a free, libre, open source software developed mainly at the Sciences Po médialab. It offers an online interface to scholars who want to engage with the study of the web. It allows harvesting web pages and curating so-called "web entities", groups of pages that can be websites but also many other things. The typical outcome is a corpus represented as a network of web entities linked by hyperlinks. Hyphe is a scientific instrument designed for social science, humanities, and teaching.
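
To make the idea of a network of web entities concrete, here is a minimal sketch in Python. It is not Hyphe's implementation: the rule "one web entity per host name" (`entity_of`) is a deliberately simplified assumption, whereas Hyphe lets users define web entities much more flexibly. The function names and the sample links are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def entity_of(url: str) -> str:
    """Hypothetical grouping rule: one web entity per host name, ignoring 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def entity_network(page_links):
    """Aggregate page-to-page hyperlinks into weighted entity-to-entity edges."""
    edges = Counter()
    for source, target in page_links:
        a, b = entity_of(source), entity_of(target)
        # Links inside a single web entity do not create an edge.
        if a and b and a != b:
            edges[(a, b)] += 1
    return edges


# Illustrative page-level hyperlinks (made-up URLs).
page_links = [
    ("https://www.example.org/a", "https://example.com/x"),
    ("https://example.org/b", "https://example.com/y"),
    ("https://example.org/b", "https://example.org/a"),
    ("https://example.com/x", "https://example.net/"),
]

network = entity_network(page_links)
for (source, target), weight in sorted(network.items()):
    print(f"{source} -> {target} (weight {weight})")
```

The two links from pages of `example.org` to pages of `example.com` collapse into a single weighted edge, which is the kind of aggregation that turns a set of crawled pages into a corpus-level network.
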
We drew inspiration from the Agile approach to software development to organize our collaboration. We wrote user stories to get an idea of which features were most desired by each of us, techno-anthropologists and engineers alike. A user story formalizes a feature from the point of view of the user, favoring context and sense-making over a precise specification. For instance:


***As an STS researcher or as an actor of a domain, I can map the web landscape of a domain/topic (e.g. smart cities), in order to get a sense of how certain topics are talked about by different groups of people.***
***I would be happy if:***
* ***I can see the multiple dividing lines that separate groups of people (in the sense that two groups can be aligned on a sub-topic and opposed on another one).***
* ***As an actor of the domain, I can see where I stand in the landscape in order to have a ground for finding a strategic standpoint (who to align with, who to differentiate from...)***


We used Trello to organize our user stories and tasks during the coding retreat. Trello is the online equivalent of a "Kanban board", organized into columns where you can add and move post-it-like user stories.
A strong focal point quickly emerged: exploiting the text of web entities inside Hyphe, during the process of crawling and curating a corpus. Is it even possible? What would the technical constraints be? What could it look like? Several subprojects branched out of these questions.
* We compared two different approaches to delineating a topic inside a web entity: (1) by looking at the URL structure, or (2) using a full-text search query. Although there is no such feature in Hyphe currently, we were able to repurpose Google and DMI's Lippmannian Device to simulate it. We found that the two approaches gave quite different results.
* We tested the feasibility of (1) embedding an Elasticsearch engine in Hyphe once the corpus is finalized and (2) having it work live, i.e. during the crawl and before the corpus is finalized. Because of a heavy technical constraint, the second feature is much more difficult than the first one. We were able to code a proof of concept for solution (2), showing that text-mining features in Hyphe are, in fact, within our grasp.
* We wireframed views using text-mining features. Wireframing is a design step during which we think about how different features are distributed across different screens. It is about the use of space and about which elements must be displayed together, but it is not yet about graphic design or implementation. We proposed an early design for the most important screens and prioritized them.
* Finally, we made progress towards repurposing Hyphe's backend for DMI's Issue Crawler. The backend of the Issue Crawler is now so old that its maintenance is a major issue, and benefiting from Hyphe's maintenance efforts would extend its life.
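
The first comparison in the list above, URL structure versus full-text query, can be illustrated with a small Python sketch. It does not use Hyphe, Google, or the Lippmannian Device; the pages, the path prefix `/smart-city/`, the query `smart city`, and both function names are made-up assumptions chosen only to show why the two approaches can select different pages.

```python
from urllib.parse import urlsplit

# Illustrative pages of one hypothetical web entity (made-up URLs and texts).
pages = [
    {"url": "https://city.example/smart-city/plan",
     "text": "Our smart city roadmap for sensors and open data."},
    {"url": "https://city.example/news/2019/budget",
     "text": "The smart city programme receives new funding."},
    {"url": "https://city.example/smart-city/contact",
     "text": "Phone and address."},
    {"url": "https://city.example/news/2019/parks",
     "text": "New playgrounds open in spring."},
]


def by_url_structure(pages, path_prefix):
    """Approach (1): keep pages whose URL path starts with a given prefix."""
    return {p["url"] for p in pages
            if urlsplit(p["url"]).path.startswith(path_prefix)}


def by_full_text(pages, query):
    """Approach (2): keep pages whose text contains the query (case-insensitive)."""
    q = query.lower()
    return {p["url"] for p in pages if q in p["text"].lower()}


url_hits = by_url_structure(pages, "/smart-city/")
text_hits = by_full_text(pages, "smart city")

print("URL structure only:", sorted(url_hits - text_hits))
print("Full text only:   ", sorted(text_hits - url_hits))
print("Both:             ", sorted(url_hits & text_hits))
```

In this toy corpus the contact page matches only by URL, the budget news item matches only by text, and a single page is selected by both approaches.
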
We dedicated three mornings to collective discussions about Hyphe, the constraints of a web crawler, and what it could become in the future. Debates about how existing or possible features affect the methodology showed how distributed these decisions are. Of course, the designer-developers of Hyphe are also users with their own research agendas, and their methodological perspective allows them to make meaningful design decisions for a scientific instrument. In that sense, Hyphe carries a methodological influence. However, users also repurpose tools in unexpected ways. For instance, during the coding retreat, our micro-experiment on how to delineate a topic inside a website sparked a previously unconsidered use scenario that pushed Hyphe's limits (crawls with hundreds of starting pages). As co-designers of Hyphe, we were all aware of these possibilities; we were willing to let users explore the tool in their own ways, and we took care to remain open to other influences. In that sense, Hyphe is also a toolbox with multiple applications, not the implementation of a precise methodology. Within this design negotiation between technical constraints, research interests, misleading or dangerous features we refrain from implementing, and ideas that could open the user's methodological horizon, it became clear that "blackboxing" a scientific instrument could mean different things to different publics.


Hyphe website: https://hyphe.medialab.sciences-po.fr
Hyphe demo: https://hyphe.medialab.sciences-po.fr/demo

Names of participants:
Anders Koed Madsen
Anders Munk
Andreas Birkbak
Arnaud Pichon
Benjamin Ooghe Tabanou
Guillaume Plique
Martin Delabre
Mathieu Jacomy
Oscar Coromina
Paul Girard
Sofie Torsen
Stijn Peeters
Torben Elgaard Jensen