Background

In Oceanus, commercial fishing is a major part of the local economy, and the activities of local fishing companies are often discussed in local news reports, especially when there is a major disruption or controversial issue. Lately, the news has been filled with stories of companies that are willing to break the law and fish illegally.

FishEye International is an organization whose mission is to discover and stop illegal, unreported, and unregulated fishing. FishEye is an independent non-profit, but they share their findings with law enforcement when warranted. FishEye analysts collect open-source data, including news articles and public reports, and have recently started extracting embedded knowledge from these free-text sources using several advanced language models. Knowledge from multiple sources is combined to form CatchNet: the Oceanus Knowledge Graph, which is used to search for evidence of possible illegal fishing activity. Analysts know that data may be biased, so they review and edit extracted information before it is added to CatchNet. Data for this challenge includes both the extracted and analyst-modified knowledge graph and the original source articles.

Recently, the commercial fishing community in Oceanus was rocked by scandal after SouthSeafood Express Corp was caught fishing illegally. The company’s leaders claim it was an innocent mistake, while others in the industry have suggested that getting caught was overdue and is likely part of a larger pattern of suspect behavior. FishEye analysts are aware that a polarizing event like this is likely to attract biased perspectives. They are looking for your help to identify sources of bias in their data. They are having particular trouble determining whether bias comes from the source article, the algorithms used to extract the knowledge graph, or possibly somewhere else…

Tasks and Questions:

Your task is to develop visual analytics approaches that FishEye analysts can use to verify that the facts included in their knowledge graph are representative of facts stated in the source text. Analysts should be able to compare the consistency of the extracted knowledge with the source and to identify and trace sources of bias in the data. Novel use of large language models (LLMs) as part of a visual analytics process is encouraged.

  1. Use novel visualizations and visual analytic workflows to examine the bias in each news source. Create visualizations to help FishEye analysts understand how bias in the original sources changes over time. You may use the knowledge graph extracts and may use a large language model to supplement your understanding.

  2. FishEye uses two LLM extraction algorithms: ShadGPT and BassLine. Develop visualizations to compare the bias of each algorithm. Though not required, you may develop your own LLM-based extraction and include it in the comparison.

  3. FishEye is also interested in understanding the reliability of their human analysts. Use visual analytics to examine potential analyst bias. Provide visual examples of the types of bias present.

  4. Identify unreliable actors: news sources, algorithms, or analysts. Use visualizations to provide evidence for your conclusions. Can you use the data provided and a visual analytics workflow to determine who else may be involved?
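
As a starting point for comparing extraction algorithms (task 2), one possible first step is to tabulate how each algorithm distributes its extracted edges across edge types before designing a visualization. The sketch below is illustrative only: the record layout, the field names `algorithm` and `edge_type`, the edge-type labels, and the helper `edge_type_shares` are all assumptions and do not reflect the actual CatchNet schema.

```python
from collections import Counter, defaultdict

# Hypothetical records. The field names and edge-type labels are illustrative
# assumptions, not the real CatchNet schema or real challenge data.
edges = [
    {"algorithm": "ShadGPT", "edge_type": "negative_event"},
    {"algorithm": "ShadGPT", "edge_type": "negative_event"},
    {"algorithm": "ShadGPT", "edge_type": "positive_event"},
    {"algorithm": "BassLine", "edge_type": "negative_event"},
    {"algorithm": "BassLine", "edge_type": "positive_event"},
    {"algorithm": "BassLine", "edge_type": "positive_event"},
    {"algorithm": "BassLine", "edge_type": "positive_event"},
]


def edge_type_shares(records, group_key="algorithm", type_key="edge_type"):
    """Return, per group, the fraction of that group's edges of each type."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[group_key]][rec[type_key]] += 1

    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {etype: n / total for etype, n in counter.items()}
    return shares


shares = edge_type_shares(edges)

# Print one row per (algorithm, edge type) pair, ready to feed a bar chart.
for algorithm, dist in sorted(shares.items()):
    for edge_type, share in sorted(dist.items()):
        print(f"{algorithm:10s} {edge_type:16s} {share:.2f}")
```

The same grouping could be applied to a news-source or analyst field (again, hypothetical names) by changing `group_key`. A skewed distribution is only a prompt for closer inspection against the source articles, not evidence of bias on its own.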

Note: the VAST challenge is focused on visual analytics, and graphical figures should be included with your response to each question. Please include a reasonable number of figures for each question (no more than about 6) and keep written responses as brief as possible (around 250 words per question). Participants are encouraged to develop new visual representations rather than relying on traditional or existing approaches.

Reflection Questions

Download the Submission Form and the Data

VAST 2024 Submission Instructions

All VAST data is fully synthetic. Any resemblance to real people, places, or events is purely coincidental. Some elements of the 2024 VAST Challenge resemble data released for the 2023 VAST Challenge. Participants should not assume that there is any continuity and should not use any earlier or any external data for their submission. Mini-Challenge participants should only use data supplied for this mini-challenge in their submission. Data from other challenges should not be combined, except in the Grand Challenge.