most efficient way in GDscript to check an array for duplicates?

:information_source: Attention Topic was automatically imported from the old Question2Answer platform.
:bust_in_silhouette: Asked By independentCog
:warning: Old Version Published before Godot 3 was released.

Are there any built in functions for doing this? If not what is the best method in this language? I know python has a number of one liners that don’t even require for loops, just curious if this is possible in GDScript as well.

Do you want to know if there are duplicates or do you want all indexes that are duplicates?

Zylann | 2016-06-02 18:11

just to see if the array that had values appended into it beforehand contains duplicates and than if it does it will change a bool that will decide whether a later if statement does something or not. In a nutshell what I’m trying to do.

independentCog | 2016-06-02 18:30

:bust_in_silhouette: Reply From: Zylann

This function will return which indexes of the array are containing duplicates.

func get_duplicates(a):
	if a.size() < 2:
		return []
	
	var seen = {}
	seen[a[0]] = true
	var duplicate_indexes = []
	
	for i in range(1, a.size()):
		var v = a[i]
		if seen.has(v):
			# Duplicate!
			duplicate_indexes.append(i)
		else:
			seen[v] = true
	
	return duplicate_indexes

There are faster ways without hashing on small arrays, but I don’t think they make a difference in GDScript.

The array itself is keyed into a dictionary, wasn’t sure how using .has on that would affect it. I’ll have to play with this method some to see what else I can do with it. Thanks =D

independentCog | 2016-06-02 18:50

func check_duplicates(a):
	var seen = {}
	seen[a[0]] = true
	
	for i in range(1, a.size()):
		print(a[i])
		var v = a[i]
		if seen.has(v):
			bool_trigger = false
		else:
			seen[v] = true

I just modified it like this to do what I needed. Works like a charm. Your awesome!

independentCog | 2016-06-02 19:13

:bust_in_silhouette: Reply From: Warlaan

I’d say it depends on the size of the array. For small arrays (less than ~10) it’s probably fastest to just loop as many times as there are entries as the overhead of creating a dictionary will be bigger than the benefit.
For large collections on the other hand I would do it the way it’s usually done in lua: just use a dictionary with empty values. In other words do what Zylann recommended but don’t rebuild the “seen”-dictionary every time.

The array I’m accumulating only has 5 elements before being cleared at the end of state. Would it be possible for you too share a working example of what I would use for that? I’ve been looking through the reference in the code editor and it seems like nocasecmp_to() or casecmp_to() were made for this situation, is that what you would use?
This is something I’ve been playing with this morning, but haven’t quite gotten to work. Admittedly its most likely worse than Zylann’s examples but I’m trying to learn so thats expected.

The idea I had is to compare an array of strings against itself if the index’s match skip it so that it doesn’t count it as a duplicate otherwise keep going, if duplicate found than change a bool to signify that the array has one so that a further condition can react accordingly.

I tried using find() to reference the index except that say in test_arr = [“test0”,“test1”,“test2”,“test2”] it will only return index 2 and ignore index 3 completely because they share the same value. You would think that it would return index 2 and 3 because they both have the value set in its parameter. Regardless that thought was a bust.

independentCog | 2016-06-03 11:47

Any help with this would be appreciated =)

independentCog | 2016-06-03 15:39

Are you checking for duplicates only when you add an element to your array? If yes, you can just do one loop before inserting it. Can you explain what you want to achieve in your game?

Zylann | 2016-06-04 10:51

Sorry, I didn’t have time to answer sooner. I was aiming at code like the one Zylann suggested in his second answer. The reason why this should be faster than using a dictionary is that a small array will definitely fit into the CPU’s cache, so accessing the elements of the array will be very fast. If checking for equality is cheap then up to 10 checks with cheap memory accesses are most likely cheaper than reserving memory for a dictionary, hashing the values upon inserting and again upon lookup, finding the value in the hash table etc.

This does however depend on the two key elements mentioned above: the array needs to fit into the CPU’s cache and checks for equality need to be cheap. Those two conditions should be fulfilled if you use an IntArray or a RealArray. If you use a StringArray or just a plain Array on the other hand I wouldn’t expect it to work as well, because both Strings and Variants usually don’t store the data they contain directly but instead store a pointer to the actual memory. That’s necessary because a string object cannot know how long the string will be that you intend to store in it, so it wouldn’t be wise to allocate some made up amount of memory which may be a waste or may still be insufficient.

Long story short: try to avoid using strings if this is performance critical. You should be able to use integer IDs instead which would probably make this a lot faster.

Warlaan | 2016-06-04 18:40

Is CPU cache optimization really relevant with a GDScript implementation?

Zylann | 2016-06-04 18:47

I haven’t read the underlying code yet, so I can’t tell for sure, but my impression is that gdscript is a rather light wrapper around C++ calls, so unlike in script languages like Lua or C# I wouldn’t expect it to have it’s own memory stack / virtual machine. If that is correct then cache considerations should apply to gdscript just like they do to C++.

But that’s why I added “if this is performance critical”. You are probably right and the code we are talking about is run maybe once every few frames, so it doesn’t make any difference how well it performs.

Warlaan | 2016-06-04 18:52

The reason I’m trying to have this duplicate check as optimal as possible is that it is validation against inputs for a combo system where no two same inputs should be entered while that part of the statemachine is running. That loop will be called pretty often and so I didn’t want to risk incurring input lag. Probably a bit pretentious because I’ve yet to see it function with the full game environment but I’d rather have a better safe than sorry approach than have to optimize it anyway. I originally was experimenting with find() when checking the array to do something like:

if arr.find(arr[i]) == arr.find(arr[j]):
    continue

to skip matching indexes until I realized that its essentially the same as arr.index from python that has a first come first serve policy with duplicates so that gave some strange results =P. The code from the answer below doesn’t pass syntax (gave a symbol + in wrong place error essentially), I looked into the range() api and did this which I believe is comparable for the syntax range() requires to put the initial of j from 0 to 1:

for i in range(0, arr.size()):
	print("1st loop")
	print(arr[i])
	for j in range(1, arr.size()):
		print(arr[j])			
		print("2nd loop")
		if arr[j] == arr[i]:
			print("duplicate")

This didn’t work either though. On a unique array it still outputted duplicate. So I came up with this which does work and minimizes the amount loops to find a dupe.

for i in range(arr.size()):
	print("1st loop")
	print(arr[i])
	if is_dupe == true:
		break
	for j in range(1, arr.size()):
		print(arr[j])			
		print("2nd loop")
		if arr[j] == arr[i] && i != j:
			is_dupe = true
			print("duplicate")
			print("should break here")
            break 

I also took your suggestion Warlaan and switched the array from strings to ints because yeah that just seems more efficient. Come to think of it the only reason I had it as strings was because originally(before making two separate arrays) I had two arrays inside of a dictionary and I did the strings as a type of description. Didn’t make sense though it’s only getting processed for validation and I’m not ever outputting the appended strings before being cleared so data wise they serve no real purpose as strings. Trying to use find instead of just referencing i and j was a classic example of me over-complicating things. I feel like this is optimal for a small array, if statements are about as efficient as it gets process wise I belive, but if something sleeker can be found by all means let me know. I’m always looking to optimize my code in anyway possible. Now onto my other post to hash that out. Thank you both for the help on this.

independentCog | 2016-06-04 22:00

The outer loop should go up to arr.size()-1 and the inner one from i+1 to arr.size(). Also make sure you explicitly use the IntArray class, just switching to ints will probably not do the trick. Last but not least consider following Zylann’s advice and benchmark a C++ implementation as well. I woulg expect it to be only slightly faster, but performance is hard to predict.

Warlaan | 2016-06-05 02:02

Just curious. How reliable are the performance readings in Godot have you had a chance to test them against other analytics?

independentCog | 2016-06-05 02:21

Not sure if I implemented this correctly, I’ve never used integer arrays before so excuse my newbness.

func check_duplicates(a):
	var is_dupe = false
	for i in range(0, IntArray(a).size(), -1):
		if is_dupe == true:
			break
		for j in range(IntArray(a).size()):
			if IntArray(a)[j] == IntArray(a)[i] && i != j:
				is_dupe = true
				spell_cast_success = false
				print("duplicate")
				break

I’m pretty sure I made the changes you recommended but now it doesn’t recognize duplicates. A code example of what you mean would be helpful.

independentCog | 2016-06-05 02:51

So much for guessing performance. :slight_smile:
I did a quick test where I generated 5 numbers between 0 and 7 and added them to a container that checked for duplicates. That test ran in a loop 10.000 times for each of four containers: one that uses a dictionary to check for duplicates, one that uses the GDScript Array class, one that uses the IntArray class and one that uses a raw int-Array in C++ code.

These are the results:

** Debug Process Started **
Native code needed 44.984001ms.
IntArray code needed 262.967987ms.
Array code needed 165.378006ms.
Dictionary code needed 140.057007ms.
** Debug Process Stopped **

I am mostly surprised that the IntArray did not lead to better performance, but now that I checked the documentation it says that it’s mostly meant to prevent fragmentation with large arrays, so at least it’s not meant to be any faster for small arrays. I’ll have to check the source to understand what it really does.

In any way the lesson here is that a native implementation is the fastest solution (not due to GDScript overhead since the GDScript interface of all these classes is the same, so the number of calls from GDScript to C++ should be identical) and in your case it won’t matter which solution you use since if you run the test only once (i.e. add 5 items) the slowest solution will still need no more than 0.05 milliseconds (= 1/400th of a frame at 50fps).

FYI this is the test code:
(NativeSet and PerformanceTimer are C++ classes written by me, so don’t google for them :wink: )

extends Node

var tries = 1
var size = 5
var items = 8

func _ready():
	var timer = PerformanceTimer.new()
	
	testNative()
	print("Native code needed " + str(timer.get_elapsed_ms()) + "ms.")
	
	timer.restart()
	testIntArray()
	print("IntArray code needed " + str(timer.get_elapsed_ms()) + "ms.")
	
	timer.restart()
	testArray()
	print("Array code needed " + str(timer.get_elapsed_ms()) + "ms.")
	
	timer.restart()
	testDictionary()
	print("Dictionary code needed " + str(timer.get_elapsed_ms()) + "ms.")


func testNative():
	var set = NativeSet.new()
	set.setup(size)
	test(set)

func testIntArray():
	var set = IntArraySet.new()
	test(set)

func testArray():
	var set = ArraySet.new()
	test(set)

func testDictionary():
	var set = DictionarySet.new()
	test(set)

func test(var set):
	for i in range(tries):
		for j in range(size):
			var newItem = randi() % items
			set.add(newItem)
		set.clear()

class IntArraySet:
	var array = IntArray([])
	
	func add(value):
		if (contains(value)):
			return false
		array.push_back(value)
		return true
	
	func contains(value):
		for val in array:
			if (val == value):
				return true
		return false
	
	func clear():
		array.resize(0)

class ArraySet:
	var array = []
	
	func add(value):
		if (contains(value)):
			return false
		array.push_back(value)
		return true
	
	func contains(value):
		for val in array:
			if (val == value):
				return true
		return false
	
	func clear():
		array.clear()

class DictionarySet:
	var array = {}
	
	func add(value):
		if (contains(value)):
			return false
		array[value] = 1
		return true
	
	func contains(value):
		return array.has(value)
	
	func clear():
		array.clear()

Warlaan | 2016-06-05 09:05

It also gives a good example I believe of the OOP you were talking about in the other post. It is interesting how proportionally faster the native c++ is compared to gdscripts array class. It makes more sense now why its recommended for more intense things. I also didn’t think to look into the timer class that counts ms. I’ll have to keep that in mind for further tests as well. So many different options for each node in this engine. Overall great info.

independentCog | 2016-06-05 15:17

The timer class? Do you mean PerformanceTimer? That’s a very simple class that I wrote myself. If you like I can give you the code, you’ll just have to place it in the modules folder and compile Godot yourself (which thanks to SCons and very few dependencies is really easy to do).

Warlaan | 2016-06-05 20:14

Timer — Godot Engine (latest) documentation in English
Nope I meant this timer class. =P I haven’t played with it enough yet. I need to.
But yeah I’d definitely be interested in seeing how it works and I’ve been wondering on the process of adding my own classes and nodes into Godot. The fact that I can do that with a game engine is too amazing to pass up.
EDIT: nm lol I just looked at the reference and realized the ms feature is from your class and not a part of godots timers. Just thought it would have something like that for some reason. I must of read the top part of your code here too fast and missed you were refering to performanceTimer with timer Oops.

independentCog | 2016-06-06 00:36

:bust_in_silhouette: Reply From: Zylann

This is another version for smaller arrays:

func find_duplicates_small(a):
	var duplicates = []
	for i in range(0, a.size()):
		for j in range(j+1, a.size()):
			if a[j] == a[i]:
				duplicates.append(j)
	return duplicates

The idea is to loop a second time for each element, but only starting from the next so we can discard b == a because that’s the same as a == b.

As for my first version, it will return the index of duplicates, but not the first occurences of the value. So for example, if you have [“a”, “b”, “c”, “b”], the function will return [3]. If you want both you can adapt the function so it also inserts i the first time a duplicate is found. Or, if order doesn’t matters, you can check the size of duplicates after the inner loop, then if it changed insert i.

Again, you can tweak it the way you like. You know your problem, so you could make this algorithm more efficient by adapting it for you need :slight_smile:

In godot there is a easier 1 line method to check the number of times an element is repeated in an array. The below method can be used to avoid iterating the whole length of the array.
Try something like
var count=array_name.count(element_to_be_found)
if count>0:
print(“Element is present multiple times”)

ShashankLReddy | 2020-01-17 15:17

if ($“/root/Profile”.listeJoueurs).has(leJoueur):
OS.alert(“Le nom du nouveau joueur est déjà utilisé.”)
in godot documentation go to => has(value: Variant)
in my list of users, I do verify duplication to avoid a name repition
Michel

mabelair88 | 2021-01-05 21:17

In this solution, we return the array without for duplicates.

static func delete_duplicates(array: Array):
	var result = []
	for i in range(array.size()):
		var duplicated = false
		for j in range(i+1, array.size()):
			if array[i] == array[j]:
				duplicated = true
				break
		if not duplicated:
			result += [array[i]]
	return result

luislodosm | 2021-05-30 14:53

:bust_in_silhouette: Reply From: kristapsf
var unique_numbers = []
for n in all_numbers:
    if not n in unique_numbers:
        unique_numbers.append(n)