Center for Global Cyber Strategy (CGCS) researchers have used the data donated by the white hat groups to create anonymized profiles of the groups. One such profile has been identified by CGCS sociopsychologists as most likely to resemble the structure of the group that accidentally caused this internet outage. You have been asked to examine CGCS records and identify the groups that most closely resemble the identified profile.


Data for this mini-challenge consists of the following. This data is described in more detail in the download file.

Note: In the sample, data channels were each in their own file. In the full release these have been merged into a single file.

Note: Having difficulty with scale when working with the very large graph? Feel free to ask questions about approaching this challenge. Answers to the questions we receive will be posted on this page for all contestants to see.

Tasks and Questions

  1. Using visual analytics, compare the template subgraph with the potential matches provided. Show where the two graphs agree and disagree. Use your tool to answer the following questions:
     1. Compare the five candidate subgraphs to the provided template. Show where the two graphs agree and disagree. Which subgraph matches the template the best? Please limit your answer to seven images and 500 words.
     2. Which key parts of the best match help discriminate it from the other potential matches? Please limit your answer to five images and 300 words.
  2. CGCS has a set of “seed” IDs that may be members of other potential networks that could have been involved. Take a look at the very large graph. Can you determine if those IDs lead to other networks that match the template? Describe your process and findings in no more than ten images and 500 words.
  3. Optional: Take a look at the very large graph. Can you find other subgraphs that match the template provided? Describe your process and your findings in no more than ten images and 500 words.
  4. Based on your answers to the questions above, identify the group of people that you think is responsible for the outage. What is your rationale? Please limit your response to five images and 300 words.
  5. What was the greatest challenge you had when working with the large graph data? How did you overcome that difficulty? What could make it easier to work with this kind of data?
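
One possible way to start on the seed-ID question is to pull out the neighborhood around each seed before attempting any template matching, so that the very large graph never has to be visualized whole. The sketch below is only an illustration, not an official method: the function name `expand_from_seeds`, the `max_hops` cutoff, and the choice to treat edges as undirected are all assumptions, and the edge list is assumed to be available as simple (source, target) pairs.

```python
from collections import defaultdict, deque


def expand_from_seeds(edges, seeds, max_hops=2):
    """Breadth-first search from the seed IDs.

    Returns {node_id: hop_distance} for every node within max_hops of a seed.
    Edges are treated as undirected (an assumption made for this sketch).
    """
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Made-up IDs, purely for illustration.
toy_edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
nearby = expand_from_seeds(toy_edges, seeds=[1], max_hops=2)
```

The resulting node set can then be used to filter the large graph down to a subgraph small enough to compare against the template.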

Clarification Requests

1. Problems downloading data


I’m having trouble downloading the large files when I click on the link in the form. How do I download the files?


Links in the form must be opened in a new browser tab. Clicking on the links within the form may not work.

2. Shared IDs


Are the IDs shared between the candidate graphs?


Yes, IDs are always shared between the candidate graphs. A given node ID in any graph provided with this mini-challenge always refers to the same person or thing.

A person who appears in two candidate subgraphs appears to have a different set of edges in each graph. What causes this?

Candidate subgraphs do not contain the full set of edges connected to each node. The full set of edges for each node can only be found in the large graph.
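
As a rough sketch of how the full edge set for one node might be recovered from the large graph, the rows can be streamed and filtered without loading the whole file into memory. The column positions used here (source ID in the first field, target ID in the third) are an assumption based on the sample row shown elsewhere on this page, and `edges_for_node` is a hypothetical helper name; check the download file's documentation for the actual layout.

```python
import csv
import io


def edges_for_node(rows, node_id, source_col=0, target_col=2):
    """Return every row in which node_id is the source or the target.

    rows is any iterable of lists of strings, for example a csv.reader.
    The default column positions are assumptions, not a documented layout.
    """
    wanted = str(node_id)
    return [
        row
        for row in rows
        if len(row) > max(source_col, target_col)
        and wanted in (row[source_col], row[target_col])
    ]


# Tiny made-up sample standing in for the large graph file.
sample = io.StringIO("10,0,20,100,1\n20,1,30,200,1\n30,0,40,300,1\n")
full_edges = edges_for_node(csv.reader(sample), 20)
```

For the real data, the same function could be fed `csv.reader(open_file)` where `open_file` is the large graph opened with `newline=""`.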

3. Strange data values


There are travel records with negative weights. What does this signify?


492850 6 625756 1641600 -1 5 3 22 156 -25 -111


There is no special significance to these values. Datasets can be messy and contain errors and unknown values.
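
Since such values carry no special meaning, one reasonable approach is to flag them during ingestion and decide later whether to keep or drop them. The following is a minimal sketch under stated assumptions: the field names in `FIELD_NAMES` are guesses at the column meanings (in particular, that the fifth field is the weight) and should be checked against the documentation in the download file, and both helper names are hypothetical.

```python
# Assumed column order; verify against the data documentation before relying on it.
FIELD_NAMES = (
    "source", "etype", "target", "time", "weight",
    "source_location", "target_location",
    "source_latitude", "source_longitude",
    "target_latitude", "target_longitude",
)


def parse_edge(line):
    """Split one whitespace- or comma-separated record into a dict of ints."""
    values = [int(value) for value in line.replace(",", " ").split()]
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    return dict(zip(FIELD_NAMES, values))


def has_negative_weight(record):
    """True when the (assumed) weight field is below zero."""
    return record["weight"] < 0


record = parse_edge("492850 6 625756 1641600 -1 5 3 22 156 -25 -111")
flagged = has_negative_weight(record)
```

Records with missing or non-integer fields would make `int` raise an error in this sketch, so a real ingestion step would need to handle those cases explicitly.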

4. Column confusion


There is a mismatch in the descriptions of eType 0 and eType 1 in the PDF documentation for mini-challenge 1 (CGCS-GraphData-Readme.pdf). Which is correct?


The table on the first page of the PDF was incorrect. Emails are designated as eType 0 and calls are designated as eType 1.
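
To avoid carrying the documentation mismatch into analysis code, the corrected mapping can be written down once and reused. This small sketch encodes only the two codes confirmed in the answer above; any other eType code is left unlabeled rather than guessed, and `count_by_channel` is a hypothetical helper name.

```python
from collections import Counter

# Only the two codes confirmed in the clarification are labeled here.
ETYPE_LABELS = {0: "email", 1: "call"}


def count_by_channel(etypes):
    """Count edges per channel, leaving unconfirmed codes as 'eType <code>'."""
    return Counter(ETYPE_LABELS.get(code, f"eType {code}") for code in etypes)


channel_counts = count_by_channel([0, 0, 1, 6])
```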

Download the Submission Form and the Data

Note: Links in the form must be opened in a new tab.

VAST 2020 Submission Instructions

Please provide a valid email address in order to get access to the data. Your email address may be used if we need to contact you with important information about the Challenge.

Sample Data

Sample data for this mini-challenge is available in order to give teams a small set to prepare their tools and work through ingesting the data.

Note: Data from multiple channels is provided in separate files in this sample.