Unfortunately, this job posting is no longer active.

Interactive Labeling for ML-based Structural Formula Extraction in Karlsruhe

If you are interested, answer the questions and send your CV and a current transcript of records. Feel free to reach out beforehand if you have any questions.

4-40 hours per week

Job description

Subject

Joint Master’s thesis offered by IAR/SZS (research group of Prof. Stiefelhagen, CV:HCI) and IISM (research group of Prof. Mädche, ISSD) for both computer science and information systems students. Open for applications. WHO CAN APPLY? Only students enrolled at KIT (Karlsruhe Institute of Technology) in one of the following degree programs: Wirtschaftsinformatik, Wirtschaftsingenieurwesen, Informationswirtschaft, or Technische Volkswirtschaftslehre.

Problem

Scientific publications, lecture slides, and other documents convey information not only in plain text but also in figures and images. This makes documents less accessible to humans and machines alike. For machines, it impairs automated metadata extraction, full-text search, and information aggregation. Less obviously, but potentially even more importantly, it also hinders human accessibility. Figures are often entirely incomprehensible to visually impaired users, and people less familiar with the domain could also benefit from support. For example, visually impaired readers have only limited access to graphical representations of structural formulas, even though such graphics are often a crucial part of lecture slides and scientific publications on the topic.

Goals

The goal of this Master’s thesis is to design, develop, and evaluate an interactive labeling system that supports the accessibility of figures. Here, interactive labeling refers to a human-machine cooperative approach that combines automatic and manual steps. Chemical structural formulas are a suitable application domain for this system, as they are used frequently and well-established standards for them already exist. We envision a semi-automated approach in which the machine supports user input. Well-structured tasks like this one lend themselves to support by machine learning models. Because a user is always involved, the model does not need to achieve near-perfect accuracy; rather, it should support users with suggestions. Allowing the model to improve from new user input would be a bonus. In a first step, we expect the student to survey the state of the art in such systems and identify components that could be reused or adapted to this context. Afterwards, the solution should be developed. A full-fledged evaluation of the system is expected as well. The typical workflow for the system should look as follows:

  • Import a PDF document into the system.
  • The system suggests areas in which figures of chemical formulas could be found.
  • Correct the system's suggestions.
  • Crop out all marked areas to obtain individual figures.
  • For each figure create
    • a chemfig representation of the figure (e.g. “\chemfig{*6(=-=-=-)}”),
    • a non-informative textual description of the figure (e.g. “a hexagon where three edges are double lines”)
    • and an interpretation of the figure (e.g. “Benzene”).
  • The system supports the user in creating the above representations with automatically generated suggestions. To this end, a classifier that translates images to chemfig should be trained on automatically generated training data.
  • Export an accessible EPUB 3 document in which each original figure is augmented with the above data as alternative versions.
  • Export a version of each figure for use on a braille printer (OpenDocument Graphics format).
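The bullet on automatically generated training data can be illustrated with a minimal sketch, assuming that simple unlabeled rings are a reasonable starting point. It enumerates chemfig ring codes of the form `*n(...)` (the syntax of the `\chemfig{*6(=-=-=-)}` example above) together with matching non-informative descriptions. All function names are hypothetical, the filter against adjacent double bonds is a simplifying assumption rather than real chemistry validation, and compiling the generated standalone LaTeX files into images (e.g. with pdflatex) is left out.

```python
import itertools

# Hypothetical helpers; only the *n(...) ring syntax is taken from the posting's example.
POLYGON_NAMES = {3: "triangle", 4: "square", 5: "pentagon", 6: "hexagon", 7: "heptagon", 8: "octagon"}
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four"}


def ring_chemfig(size: int, bonds: str) -> str:
    """Return chemfig code for an unlabeled ring, one bond symbol per edge."""
    if len(bonds) != size:
        raise ValueError("expected exactly one bond symbol per ring edge")
    return f"\\chemfig{{*{size}({bonds})}}"


def describe_ring(size: int, bonds: str) -> str:
    """Return a purely visual (non-informative) description of the ring."""
    shape = POLYGON_NAMES[size]
    doubles = bonds.count("=")
    if doubles == 0:
        return f"a {shape} with single lines only"
    if doubles == 1:
        return f"a {shape} where one edge is a double line"
    return f"a {shape} where {NUMBER_WORDS[doubles]} edges are double lines"


def has_adjacent_doubles(bonds: str) -> bool:
    """True if two double bonds share an atom; the last edge wraps around to the first."""
    n = len(bonds)
    return any(bonds[i] == "=" and bonds[(i + 1) % n] == "=" for i in range(n))


def generate_pairs(sizes=range(3, 9)) -> list[tuple[str, str]]:
    """Enumerate (chemfig code, description) training pairs for the given ring sizes."""
    pairs = []
    for size in sizes:
        for combo in itertools.product("-=", repeat=size):
            bonds = "".join(combo)
            if has_adjacent_doubles(bonds):
                continue  # simplifying assumption, not real chemistry validation
            pairs.append((ring_chemfig(size, bonds), describe_ring(size, bonds)))
    return pairs


def standalone_document(chemfig_code: str) -> str:
    """Wrap chemfig code in a minimal LaTeX file that could be compiled into an image."""
    return (
        "\\documentclass[border=2pt]{standalone}\n"
        "\\usepackage{chemfig}\n"
        "\\begin{document}\n"
        f"{chemfig_code}\n"
        "\\end{document}\n"
    )


if __name__ == "__main__":
    pairs = generate_pairs()
    print(len(pairs))  # 116 pairs for ring sizes 3 to 8
    print(pairs[0])
```

For a hexagon with alternating bonds, `describe_ring(6, "=-=-=-")` reproduces the description from the workflow above. Rotated variants of the same ring are kept as separate examples, since they produce different chemfig strings and different renderings.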

Requirements

We expect the student to be familiar with web development. The system should be developed with a modern web frontend framework (e.g., Angular, React, or Vue) and a JavaScript or Python backend.
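As a sketch of how a Python backend might hold the per-figure data from the workflow above, the following hypothetical `FigureRecord` stores the cropped area and the three representations, serializes them to JSON for the web frontend, and renders one possible EPUB 3 (XHTML) figure. The field names, the bounding-box convention, and the choice of a short `alt` text plus an `aria-describedby` extended description are assumptions, not requirements of this thesis.

```python
import json
from dataclasses import asdict, dataclass
from html import escape


@dataclass
class FigureRecord:
    """Hypothetical record for one cropped figure and its three representations."""

    figure_id: str
    page: int
    # Assumed convention: (x0, y0, x1, y1) of the marked area in PDF points.
    bbox: tuple[float, float, float, float]
    image_src: str
    chemfig: str = ""
    description: str = ""
    interpretation: str = ""

    def to_json(self) -> str:
        """Serialize the record, e.g. for a REST response to the frontend."""
        return json.dumps(asdict(self))

    def to_epub_xhtml(self) -> str:
        """Render the figure with a short alt text and a linked extended description."""
        desc_id = f"{self.figure_id}-desc"
        return (
            f'<figure id="{escape(self.figure_id)}">\n'
            f'  <img src="{escape(self.image_src)}" alt="{escape(self.interpretation)}"'
            f' aria-describedby="{escape(desc_id)}"/>\n'
            f"  <figcaption>{escape(self.interpretation)}</figcaption>\n"
            f'  <div id="{escape(desc_id)}">\n'
            f"    <p>{escape(self.description)}</p>\n"
            f"    <pre><code>{escape(self.chemfig)}</code></pre>\n"
            "  </div>\n"
            "</figure>"
        )


if __name__ == "__main__":
    record = FigureRecord(
        figure_id="fig-1",
        page=3,
        bbox=(72.0, 144.0, 288.0, 360.0),
        image_src="images/fig-1.png",
        chemfig=r"\chemfig{*6(=-=-=-)}",
        description="a hexagon where three edges are double lines",
        interpretation="Benzene",
    )
    print(record.to_json())
    print(record.to_epub_xhtml())
```

Keeping one record type for both the JSON exchange and the export means that every user correction made in the frontend ends up in the EPUB output without a separate mapping step.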

Contact

If you are interested in this topic and want to apply for this thesis, please apply via Campusjäger.

Additional information

Status: Inactive
Education level: College/University
Location: Karlsruhe
Working hours per week: 4-40
Job type: Internship
Field: IT / Software Development / Programming
Driver's license required? No
Car required? No
Cover letter required? No
Language skills: German