How do I avoid duplicating data?

If you're importing spreadsheet data into Kumu, you'll most likely want to add new data, edit existing data, or maybe just fix a typo at some point. When that happens, you will want to make sure that Kumu isn't duplicating data when you re-import your up-to-date dataset.

To avoid duplicating data, you first need to understand how Kumu decides whether to create new elements & connections or update existing elements & connections.

When does Kumu create, and when does it update?

The rules are a bit different for elements and connections, so we'll tackle them one at a time below.

Elements

For the simplest possible element spreadsheet, which only contains a Label column, Kumu will create a new element for each unique label. Kumu would create three elements for this example (a list of famous surfing waves in Hawaii):

Label

Waimea Bay

Pipeline

Sunset Beach

For this next example, Kumu will also create only three elements, since one of the Labels is repeated in the data:

Label

Waimea Bay

Pipeline

Pipeline

Sunset Beach
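To make the Label-only rule concrete, here is a small Python model of what happens to the two lists above. It is a sketch under our own assumptions, not Kumu's code. In particular, it compares labels as exact, case-sensitive strings, which the rules above don't spell out.

```python
def elements_from_labels(labels: list[str]) -> list[str]:
    """Model of a Label-only import: one element per unique label.

    Assumption: labels are compared as exact, case-sensitive strings.
    """
    unique: dict[str, None] = {}
    for label in labels:
        unique.setdefault(label, None)  # a repeated label maps to the same element
    return list(unique)  # dicts keep first-seen order


first_sheet = ["Waimea Bay", "Pipeline", "Sunset Beach"]
second_sheet = ["Waimea Bay", "Pipeline", "Pipeline", "Sunset Beach"]

print(len(elements_from_labels(first_sheet)))   # 3
print(len(elements_from_labels(second_sheet)))  # 3
```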

You can use this behavior to your advantage when you want to update an existing element's data. Let's say you have already imported the first dataset, which created three elements in Kumu, and now you want to add a new field to those existing elements, using this data:

Label | Live Surf Report
Waimea Bay | http://www.surfline.com/surf-report/waimea-bay-oahu_4755/
Pipeline | http://www.surfline.com/surf-report/pipeline-oahu_4750/
Sunset Beach | http://www.surfline.com/surf-report/sunset-beach-oahu_4746/
Pua'ena Point | http://www.surfline.com/surf-report/puaena-point-oahu_49940/

When you import this spreadsheet, Kumu will search through all existing elements to find any Labels that match the Labels in your import file. Then, Kumu will update the matched elements with your new field. If you had any existing data in an element's field, it would be replaced by the data in the new import.

If Kumu can't find a match, it will add a new element to the map. In the example above, Kumu would update the existing Waimea Bay, Pipeline, and Sunset Beach elements, and it would create a new element for Pua'ena Point, which wasn't included in the original dataset.
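The update-or-create behavior can be sketched as a simple "upsert" keyed by Label. This is a simplified model, not Kumu's implementation. The function name `import_by_label` is hypothetical, and the model doesn't try to reproduce how Kumu treats blank cells. The rows mirror the Live Surf Report spreadsheet above.

```python
def import_by_label(project: dict[str, dict[str, str]],
                    rows: list[dict[str, str]]) -> tuple[list[str], list[str]]:
    """Simplified model of a re-import keyed by Label (hypothetical, not Kumu's code).

    Matched elements get the row's fields written over their existing values;
    unmatched labels become new elements.
    """
    updated: list[str] = []
    created: list[str] = []
    for row in rows:
        label = row["Label"]
        if label in project:
            project[label].update(row)  # existing values in these fields are replaced
            updated.append(label)
        else:
            project[label] = dict(row)
            created.append(label)
    return updated, created


# The three elements created by the first, Label-only import.
project = {name: {"Label": name} for name in ["Waimea Bay", "Pipeline", "Sunset Beach"]}

rows = [
    {"Label": "Waimea Bay", "Live Surf Report": "http://www.surfline.com/surf-report/waimea-bay-oahu_4755/"},
    {"Label": "Pipeline", "Live Surf Report": "http://www.surfline.com/surf-report/pipeline-oahu_4750/"},
    {"Label": "Sunset Beach", "Live Surf Report": "http://www.surfline.com/surf-report/sunset-beach-oahu_4746/"},
    {"Label": "Pua'ena Point", "Live Surf Report": "http://www.surfline.com/surf-report/puaena-point-oahu_49940/"},
]

updated, created = import_by_label(project, rows)
print(updated)  # ['Waimea Bay', 'Pipeline', 'Sunset Beach']
print(created)  # ["Pua'ena Point"]
```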

Now, if you also include a Type column on your spreadsheet, Kumu will create a new element for each unique Label-Type combination. For example, this dataset will create five elements, even though one Label is repeated:

Label | Type
Waimea Bay | Double overhead
Pipeline | Head high
Pipeline | Double overhead
Sunset Beach | Head high
Pua'ena Point | Head high

The same rules apply when updating data on existing elements: if Kumu can find a Label-Type match for an element you upload, it will update that element's data; otherwise, it will create a new element.
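A quick way to see why the Label-Type dataset above produces five elements is to count unique keys. In this sketch, `element_key` is a hypothetical helper, and treating a missing Type as an empty string is our assumption rather than documented Kumu behavior.

```python
def element_key(row: dict[str, str]) -> tuple[str, str]:
    """Matching key when a Type column is present: the (Label, Type) pair."""
    return (row["Label"], row.get("Type", ""))  # assumption: missing Type -> ""


rows = [
    {"Label": "Waimea Bay", "Type": "Double overhead"},
    {"Label": "Pipeline", "Type": "Head high"},
    {"Label": "Pipeline", "Type": "Double overhead"},
    {"Label": "Sunset Beach", "Type": "Head high"},
    {"Label": "Pua'ena Point", "Type": "Head high"},
]

print(len({row["Label"] for row in rows}))      # 4 unique labels
print(len({element_key(row) for row in rows}))  # 5 unique Label-Type pairs, so 5 elements
```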

Use IDs to separate identical labels

Finally, you can use an ID column to tell Kumu to ignore both Label and Type, and only match existing elements based on their unique ID. This can be really useful when you want to change the Label and/or Type of elements without creating duplicates:

ID | Label | Type
Wave-1 | Waimea Bay | Double overhead
Wave-2 | Pipeline | Head high
Wave-2 | Pipeline | Double overhead
Wave-3 | Sunset Beach | Head high
Wave-4 | Pua'ena Point | Head high

In the earlier example, before we added ID, Kumu was creating two elements with the label Pipeline. One of those elements had the type Head high, and the other had the type Double overhead. But now that we're using ID, Kumu understands that those are the same element with the ID Wave-2.

With the help of this new ID column, Kumu will create only one Wave-2 element, and it will import the data from the Wave-2 row that is furthest down the list (i.e., the type will be Double overhead, not Head high).
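The ID rule can be modeled the same way, with the ID as the only matching key. This sketch uses a hypothetical `import_by_id` function and is not Kumu's implementation. It processes rows top to bottom, so the last row for a repeated ID wins. It also shows a later import changing a label without creating a duplicate ("Puaena Point" is just an illustrative new label).

```python
def import_by_id(project: dict[str, dict[str, str]],
                 rows: list[dict[str, str]]) -> dict[str, dict[str, str]]:
    """Simplified model of an ID-keyed import: Label and Type are not used for matching."""
    for row in rows:
        # Merge onto any existing element with this ID. Rows are processed top to
        # bottom, so the row furthest down wins for a repeated ID.
        project[row["ID"]] = {**project.get(row["ID"], {}), **row}
    return project


rows = [
    {"ID": "Wave-1", "Label": "Waimea Bay", "Type": "Double overhead"},
    {"ID": "Wave-2", "Label": "Pipeline", "Type": "Head high"},
    {"ID": "Wave-2", "Label": "Pipeline", "Type": "Double overhead"},
    {"ID": "Wave-3", "Label": "Sunset Beach", "Type": "Head high"},
    {"ID": "Wave-4", "Label": "Pua'ena Point", "Type": "Head high"},
]

project = import_by_id({}, rows)
print(len(project))               # 4
print(project["Wave-2"]["Type"])  # Double overhead

# A later import renames Wave-4 in place instead of creating a new element.
import_by_id(project, [{"ID": "Wave-4", "Label": "Puaena Point"}])
print(len(project))               # 4
```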

If you're using IDs on your elements sheet, you need to use those IDs in the From and To columns on your connections spreadsheet, instead of using Labels.

Connections

If you want to create multiple connections between the same elements, or want to update existing connections with future imports, make sure to add IDs for each connection:

ID | From | To
Connection-1 | Oahu | Waimea Bay
Connection-2 | Oahu | Waimea Bay

In a future import for this example, we could use the ID Connection-2 to update the existing connection with any new data, including a new From or To value:

ID | From | To | Type
Connection-2 | Oahu | Sunset Bay | Location

This import would update the existing Connection-2 in place, without merging your connections or creating a new, duplicate connection.
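Here is a minimal model of the two connection imports above, assuming connections are matched only by their ID column. The function name `import_connections` is hypothetical, and this is a sketch rather than Kumu's actual logic. Two rows with identical From and To values stay separate because their IDs differ, and the second import changes Connection-2's To value without adding a third connection.

```python
def import_connections(project: dict[str, dict[str, str]],
                       rows: list[dict[str, str]]) -> dict[str, dict[str, str]]:
    """Simplified model: connections are matched only by their ID column."""
    for row in rows:
        # New From/To/Type values overwrite the old ones on the same connection.
        project[row["ID"]] = {**project.get(row["ID"], {}), **row}
    return project


connections = import_connections({}, [
    {"ID": "Connection-1", "From": "Oahu", "To": "Waimea Bay"},
    {"ID": "Connection-2", "From": "Oahu", "To": "Waimea Bay"},
])
print(len(connections))  # 2: same endpoints, but distinct IDs keep them apart

import_connections(connections, [
    {"ID": "Connection-2", "From": "Oahu", "To": "Sunset Bay", "Type": "Location"},
])
print(len(connections))                   # 2: updated in place, no duplicate
print(connections["Connection-2"]["To"])  # Sunset Bay
```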

If you're using an ID column on your connections sheet, you don't necessarily have to add IDs to your elements sheet. However, if you are using IDs on your elements sheet, you need to use those IDs in the From and To columns on your connections spreadsheet, instead of using Labels.
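Before re-importing, you can check your connections sheet against your element IDs. This hypothetical helper flags any From or To value that isn't one of your element IDs, such as a leftover label. "Island-1" is an ID we made up for an Oahu element, and the check only mirrors the advice above; it doesn't predict what Kumu will do with an unmatched value.

```python
def unmatched_endpoints(element_ids: set[str],
                        connections: list[dict[str, str]]) -> list[tuple[str, str, str]]:
    """Return (connection ID, column, value) for every From/To that isn't an element ID."""
    problems = []
    for conn in connections:
        for column in ("From", "To"):
            if conn[column] not in element_ids:
                problems.append((conn["ID"], column, conn[column]))
    return problems


# "Island-1" is a hypothetical ID for an Oahu element.
element_ids = {"Island-1", "Wave-1", "Wave-2", "Wave-3", "Wave-4"}
connections = [
    {"ID": "Connection-1", "From": "Island-1", "To": "Wave-1"},
    {"ID": "Connection-2", "From": "Island-1", "To": "Waimea Bay"},  # a label, not an ID
]

print(unmatched_endpoints(element_ids, connections))
# [('Connection-2', 'To', 'Waimea Bay')]
```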

Good to know:

  • You can't update map data when clustering is turned on. Make sure to turn off all clustering options before importing any data into Kumu.

  • All of the same import rules apply when you are importing a JSON blueprint.

  • If you use underscores (_) or periods (.) in your IDs, you won't be able to select elements from the search results in your map.

  • To send somebody a link directly to an element, connection, or loop, you can follow this pattern: https://kumu.io/YourUsername/ProjectName#MapName/ViewName/ID

  • To remove duplicates, check out these steps
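Two of the notes above lend themselves to small helpers: spotting IDs that contain underscores or periods, and filling in the direct-link pattern. Both functions are hypothetical sketches. The link builder assumes every part is already URL-safe and does no percent-encoding, and the username, project, map, and view names in the example are placeholders.

```python
def ids_that_break_search(ids: list[str]) -> list[str]:
    """IDs containing an underscore or a period, per the note on search results."""
    return [item_id for item_id in ids if "_" in item_id or "." in item_id]


def kumu_link(username: str, project: str, map_name: str, view_name: str, item_id: str) -> str:
    """Fill in https://kumu.io/YourUsername/ProjectName#MapName/ViewName/ID.

    Assumption: every part is already URL-safe, so nothing is percent-encoded.
    """
    return f"https://kumu.io/{username}/{project}#{map_name}/{view_name}/{item_id}"


print(ids_that_break_search(["Wave-1", "wave_2", "wave.3"]))  # ['wave_2', 'wave.3']

# All of these names are hypothetical placeholders.
print(kumu_link("surf-club", "hawaii-waves", "oahu", "default", "Wave-2"))
# https://kumu.io/surf-club/hawaii-waves#oahu/default/Wave-2
```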

Alternatively, delete all data to avoid duplicates

We get it—sometimes, you just don't want to think through the complexity of Kumu's import rules; all you want is a nice, clean map with no duplicates. In that case, your best option might be to delete all the data from your project and start over with a fresh import.

Happy importing!
