Web Scraping with Godot

Attention: This topic was automatically imported from the old Question2Answer platform.
Asked by: ramazan

I want to do web scraping with Godot.
As an example, I found this:

How do I get only the text part of the output (“User ramazan”)?

My code:

var data : PoolStringArray
var url := "https://godotengine.org/qa/user/ramazan"

func _ready():
	$HTTPRequest.request(url)

func _on_HTTPRequest_request_completed(_result, _response_code, _headers, body):
	var response = body.get_string_from_utf8()
	if 'godotengine' in url:
		data = response.split('<div class="page-title">')
		var price = str(data[1]).split('div')
		price = str(price[0])
		print(price)

Output:

<h1>          <- erase
User ramazan  <- I only want this
</h1>         <- erase
</            <- erase
Reply from: kelaia

Godot has an XMLParser class that you can use to parse HTML. Keep in mind that HTML and XML have some subtle differences, but for most scenarios you should be able to get the result you expect. I really don't recommend manually splitting, searching, or using regex on the text.

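func _ready():
    $HTTPRequest.connect("request_completed", self, "_on_request_completed")
    $HTTPRequest.request("https://godotengine.org/qa/user/ramazan")

func _on_request_completed(_result, _response_code, _headers, body):
    var parser: XMLParser = XMLParser.new()
    parser.open_buffer(body)

    # read() returns OK for each node and ERR_FILE_EOF once the buffer is exhausted.
    while parser.read() == OK:
        # get_node_name() errors on text/comment nodes, so only look at opening tags.
        if parser.get_node_type() != XMLParser.NODE_ELEMENT:
            continue
        if parser.get_node_name() == "form" and parser.has_attribute("method") and parser.has_attribute("action"):
            # Look the attribute up by name rather than relying on its position.
            var action: String = parser.get_named_attribute_value("action")
            if action.begins_with("../user/"):
                print(action) # ../user/ramazan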
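The loop above prints the form's action attribute. To pull out the heading text the question actually asks for (“User ramazan”), the same XMLParser approach can track when it is inside the title h1 and collect its text nodes. This is a minimal sketch for Godot 3.x, assuming the page wraps the heading as <div class="page-title"><h1>...</h1>, which is what the split output in the question suggests; extract_page_title is a hypothetical helper name, not part of Godot.

```gdscript
extends Node

# Hypothetical helper (not a Godot API): returns the text of the first <h1>
# found after <div class="page-title">, or "" if there is none.
# Assumes Godot 3.x and the page structure shown in the question.
func extract_page_title(body: PoolByteArray) -> String:
	var parser := XMLParser.new()
	if parser.open_buffer(body) != OK:
		return ""

	var in_title_div := false  # set once <div class="page-title"> is seen
	var in_h1 := false         # true while inside that div's <h1>
	var text := ""

	# read() returns OK for each node and ERR_FILE_EOF at the end.
	while parser.read() == OK:
		var node_type := parser.get_node_type()
		if node_type == XMLParser.NODE_ELEMENT:
			var tag := parser.get_node_name()
			if tag == "div" and parser.get_named_attribute_value_safe("class") == "page-title":
				in_title_div = true
			elif tag == "h1" and in_title_div:
				in_h1 = true
		elif node_type == XMLParser.NODE_TEXT and in_h1:
			# Text nodes keep surrounding newlines, so collect now and trim at the end.
			text += parser.get_node_data()
		elif node_type == XMLParser.NODE_ELEMENT_END and in_h1 and parser.get_node_name() == "h1":
			break

	return text.strip_edges()
```

In the request callback you could then call print(extract_page_title(body)).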

That said, you could get the same user name just by parsing the URL.
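Parsing the URL only needs Godot's String methods. A minimal sketch for Godot 3.x, assuming the user name is always the last path segment, as in https://godotengine.org/qa/user/ramazan; user_name_from_url is a hypothetical helper name.

```gdscript
extends Node

# Hypothetical helper: returns the last path segment of a profile URL,
# e.g. "https://godotengine.org/qa/user/ramazan" -> "ramazan".
func user_name_from_url(url: String) -> String:
	# Drop any query string, then any trailing slash.
	var path: String = url.split("?")[0]
	path = path.trim_suffix("/")
	# get_file() returns the part of the string after the last "/".
	return path.get_file()
```

The same function also works on the relative path printed by the XMLParser loop, since "../user/ramazan" ends in the same segment.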

Thank you very much. I tried to do it with Python, but I learned everything in Godot.

ramazan | 2022-02-15 13:46
