Class ScraperIngest

java.lang.Object
org.localify.data.ingest.ScraperIngest

@Service public class ScraperIngest extends Object
Service that ingests scraped data from multiple sources. It processes artist, event, city, and venue data, persists the results to the database, and dispatches requests for further scraping.
  • Constructor Details

    • ScraperIngest

      public ScraperIngest(org.springframework.jdbc.core.JdbcTemplate jdbcTemplate, BandsInTownDispatch bandsInTownDispatch, PollstarDispatch pollstarDispatch, ArtistRepository artistRepository, GenreRepository genreRepository, EventRepository eventRepository, VenueRepository venueRepository, CityRepository cityRepository, ArtistEventRepository artistEventRepository, ArtistCityRepository artistCityRepository, ArtistGenreRepository artistGenreRepository, UserFavoriteEventRepository userFavoriteEventRepository, UserRecentEventViewRepository userRecentEventViewRepository, CityService cityService)
      Constructs a new ScraperIngest service.
      Parameters:
      jdbcTemplate - The JDBC template for database operations.
      bandsInTownDispatch - The dispatch service for BandsInTown.
      pollstarDispatch - The dispatch service for Pollstar.
      artistRepository - The repository for artist data.
      genreRepository - The repository for genre data.
      eventRepository - The repository for event data.
      venueRepository - The repository for venue data.
      cityRepository - The repository for city data.
      artistEventRepository - The repository for artist-event relationships.
      artistCityRepository - The repository for artist-city relationships.
      artistGenreRepository - The repository for artist-genre relationships.
      userFavoriteEventRepository - The repository for user favorite events.
      userRecentEventViewRepository - The repository for user recent event views.
      cityService - The service for city-related operations.
  • Method Details

    • processStartTime

      public Instant processStartTime(EventResponse resp)
      Processes the start time of an event from an event response. For Pollstar data, it converts the local time to an Instant using the venue's city's timezone. For other data sources, it parses the start time string directly.
      Parameters:
      resp - The event response containing the start time and other event details.
      Returns:
      The start time as an Instant, or null if it cannot be determined.
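The two paths described above can be sketched with java.time. This is a minimal illustration, not the service's actual code: the class and method names are hypothetical, and it assumes the local time arrives as an ISO-8601 local date-time string and the city's timezone as an IANA zone ID.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeParseException;

// Hypothetical sketch; none of these names come from ScraperIngest itself.
final class StartTimeSketch {

    // Interprets a zone-less local time in the given timezone (the Pollstar-style path).
    // Returns null when the text or the zone ID cannot be interpreted.
    static Instant fromLocalTime(String localDateTime, String zoneId) {
        if (localDateTime == null || zoneId == null) {
            return null;
        }
        try {
            return LocalDateTime.parse(localDateTime)
                    .atZone(ZoneId.of(zoneId))
                    .toInstant();
        } catch (DateTimeException e) {
            // Covers both unparseable text and unknown or malformed zone IDs.
            return null;
        }
    }

    // Parses a start time that already carries its offset (the path for other sources).
    // Assumes an ISO-8601 instant such as "2024-07-05T00:00:00Z".
    static Instant fromOffsetString(String startTime) {
        if (startTime == null) {
            return null;
        }
        try {
            return Instant.parse(startTime);
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

Returning null for unparseable input mirrors the documented contract; a caller would need to check for null before using the result.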
    • processArtistResp

      @Transactional(isolation=READ_COMMITTED) public void processArtistResp(ArtistResponse resp)
      Processes an artist response from a scraper. This method updates an artist's information, including their music service IDs, genres, and origin city. It also handles dispatching the artist to other scrapers for more information.
      Parameters:
      resp - The artist response to process.
    • getOrCreateGenreByName

      public Genre getOrCreateGenreByName(String name, DataSource dataSource)
      Retrieves a genre by name, or creates a new one if it doesn't exist.
      Parameters:
      name - The name of the genre.
      dataSource - The data source that provided the genre.
      Returns:
      The existing or newly created Genre object.
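The get-or-create behavior can be illustrated with an in-memory map standing in for GenreRepository. This is only a sketch under assumptions: the Genre stand-in class, the use of a String for the data source, and exact-match lookup on the name are all hypothetical, and the real service may normalize names differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; an in-memory map replaces the real repository.
final class GenreLookupSketch {

    // Simplified stand-in for the Genre entity.
    static final class Genre {
        final String name;
        final String dataSource;

        Genre(String name, String dataSource) {
            this.name = name;
            this.dataSource = dataSource;
        }
    }

    private final Map<String, Genre> genresByName = new HashMap<>();

    // Returns the stored genre for this name, creating it on first use.
    // The data source is recorded only when the genre is created.
    Genre getOrCreateGenreByName(String name, String dataSource) {
        return genresByName.computeIfAbsent(name, n -> new Genre(n, dataSource));
    }

    int size() {
        return genresByName.size();
    }
}
```

A second call with the same name returns the same object, so the data source recorded on the genre is the one that first supplied it.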
    • dispatchBitNotScrapedArtists

      public void dispatchBitNotScrapedArtists(int numDispatch)
      Dispatches artists that the BandsInTown scraper has not yet scraped.
      Parameters:
      numDispatch - The maximum number of artists to dispatch.
    • dispatchBitArtists

      public void dispatchBitArtists(int numDispatch)
      Dispatches artists to be updated by the BandsInTown scraper. This method selects artists that have not been updated in the last 7 days.
      Parameters:
      numDispatch - The maximum number of artists to dispatch.
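The selection rule (not updated in the last 7 days, capped at numDispatch) could look roughly like the following. The names are hypothetical, artists are reduced to string keys, and the oldest-first ordering is an assumption; the source does not say how candidates are prioritized.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the staleness filter; not the service's actual query.
final class StaleArtistSketch {

    // Returns up to numDispatch artist keys whose last update is more than 7 days before 'now'.
    // Ordering by oldest update first is an assumption made for this sketch.
    static List<String> selectStale(Map<String, Instant> lastUpdatedByArtist,
                                    Instant now,
                                    int numDispatch) {
        Instant cutoff = now.minus(Duration.ofDays(7));
        return lastUpdatedByArtist.entrySet().stream()
                .filter(e -> e.getValue().isBefore(cutoff))
                .sorted(Map.Entry.comparingByValue())
                .limit(Math.max(0, numDispatch))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Passing the current time as a parameter keeps the rule easy to test; in a database-backed service the same filter would more likely be expressed in the query itself.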
    • getArtistsForEvent

      public Set<Artist> getArtistsForEvent(EventResponse resp)
      Retrieves the set of artists associated with an event. Artists are identified by their Localify IDs, by data-source-specific IDs, and by candidate names from the event response.
      Parameters:
      resp - The event response containing artist information.
      Returns:
      A set of Artist objects associated with the event.
    • getOrCreateVenue

      public Venue getOrCreateVenue(VenueResponse venueResp) throws BadScraperDataException
      Retrieves an existing venue from the database, or creates a new one if none exists.
      Parameters:
      venueResp - The scraper data transfer object for the venue.
      Returns:
      The Venue object.
      Throws:
      BadScraperDataException - if the venue cannot be found or created based on the provided data.
    • createOrUpdateEvent

      public void createOrUpdateEvent(EventResponse resp)
      Creates or updates an event based on the provided event response.
      Parameters:
      resp - The event response containing the event details.
    • searchForEventByArtistsAndVenueAndDate

      public Event searchForEventByArtistsAndVenueAndDate(String eventName, Set<Artist> artists, Venue venue, Instant startTime) throws BadDatabaseStateException
      Searches for an event by artists, venue, and a time window. This method is used to find duplicate events.
      Parameters:
      eventName - The name of the event.
      artists - The set of artists performing at the event.
      venue - The venue of the event.
      startTime - The start time of the event.
      Returns:
      The found Event, or null if no event is found.
      Throws:
      BadDatabaseStateException - if multiple matching events are found.
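A duplicate search of this shape can be sketched over an in-memory list. Everything here is illustrative: the Event stand-in, the string keys for venues and artists, the "shares at least one artist" rule, and the window parameter are assumptions, and IllegalStateException stands in for BadDatabaseStateException.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the real method queries the database instead of a list.
final class DuplicateSearchSketch {

    // Simplified stand-in for the Event entity.
    static final class Event {
        final String name;
        final String venue;
        final Instant startTime;
        final Set<String> artists;

        Event(String name, String venue, Instant startTime, Set<String> artists) {
            this.name = name;
            this.venue = venue;
            this.startTime = startTime;
            this.artists = artists;
        }
    }

    // Returns the single event at the same venue that shares an artist and starts
    // within 'window' of startTime, or null if there is none.
    // Throws IllegalStateException (stand-in exception) when more than one event matches.
    static Event findMatch(List<Event> events, Set<String> artists, String venue,
                           Instant startTime, Duration window) {
        List<Event> matches = new ArrayList<>();
        for (Event e : events) {
            boolean sameVenue = e.venue.equals(venue);
            boolean inWindow =
                    Duration.between(e.startTime, startTime).abs().compareTo(window) <= 0;
            boolean sharesArtist = !Collections.disjoint(e.artists, artists);
            if (sameVenue && inWindow && sharesArtist) {
                matches.add(e);
            }
        }
        if (matches.size() > 1) {
            throw new IllegalStateException("Multiple matching events: " + matches.size());
        }
        return matches.isEmpty() ? null : matches.get(0);
    }
}
```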
    • updateEvent

      @Transactional public void updateEvent(EventResponse resp, Event event, Set<Artist> artists)
      Updates an existing event with information from an event response. This includes updating the start time, door time, end time, price, and ticket URL. It also updates the timestamp for the data source that provided the update.
      Parameters:
      resp - The event response containing the new information.
      event - The event to be updated.
      artists - The set of artists performing at the event.
    • updateEventArtists

      @Transactional public void updateEventArtists(Event event, Set<Artist> artists)
      Updates the artists associated with an event. It adds new artists to the event and ensures that existing artist associations are maintained.
      Parameters:
      event - The event to update.
      artists - The set of artists to associate with the event.
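The add-only behavior described above (new artists are attached, existing associations are left in place) amounts to a set union. A minimal sketch, with artists reduced to string keys and a hypothetical helper name:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; the real method works on ArtistEvent rows, not plain sets.
final class EventArtistSketch {

    // Adds every incoming artist that is not already associated with the event.
    // Existing associations are never removed. Returns the artists that were newly added.
    static Set<String> addMissingArtists(Set<String> existing, Set<String> incoming) {
        Set<String> added = new LinkedHashSet<>(incoming);
        added.removeAll(existing);
        existing.addAll(added);
        return added;
    }
}
```

Returning only the newly added artists lets a caller persist just those associations rather than rewriting all of them.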
    • findOrCreateCity

      public City findOrCreateCity(String locationText) throws BadScraperDataException
      Finds or creates a city based on a location text string. This method uses geocoding to find the city's details and then searches for it in the database. If the city is not found, a new one is created.
      Parameters:
      locationText - The text describing the location of the city.
      Returns:
      The found or created City object.
      Throws:
      BadScraperDataException - if the city cannot be geocoded or found.
    • processEvent

      public void processEvent(EventResponse resp)
      Processes an event response from a scraper. This method handles the creation or update of an event, its venue, and the associated artists. It also checks for duplicate events to avoid creating redundant data.
      Parameters:
      resp - The event response to process.
    • createEvent

      @Transactional public Event createEvent(EventResponse resp, Venue venue, String topArtistName)
      Creates a new event.
      Parameters:
      resp - The event response containing the event details.
      venue - The venue for the event.
      topArtistName - The name of the top artist, used as a fallback for the event name.
      Returns:
      The newly created Event object, or null if the start time is not parseable.
    • dispatchVenueToScrapers

      @Transactional public void dispatchVenueToScrapers(VenueResponse venueResponse, Venue venue) throws BadScraperDataException
      Dispatches a venue to the appropriate scrapers to fetch more data. It only dispatches if the venue has not been updated recently.
      Parameters:
      venueResponse - The venue response from the initial scrape.
      venue - The venue object to be dispatched.
      Throws:
      BadScraperDataException - if there is an issue with the scraper data.
    • checkForDuplicateEvents

      public Event checkForDuplicateEvents(EventResponse resp, Set<Artist> artists, Venue venue, Event event)
      Checks for duplicate events in the database. If a duplicate is found, it merges the events.
      Parameters:
      resp - The event response being processed.
      artists - The set of artists for the event.
      venue - The venue for the event.
      event - The current event object (can be null if it's a new event).
      Returns:
      The original event, a merged event, or null if a duplicate is found and the event should be skipped.
    • mergeEventRelationships

      public Event mergeEventRelationships(Event event1, Event event2)
      Merges the relationships of two events. This is used when a duplicate event is found. The user-related data from the second event is merged into the first, and the second event is deleted.
      Parameters:
      event1 - The event to merge into.
      event2 - The event to merge from and then delete.
      Returns:
      The merged event (event1).
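The merge-then-delete flow can be pictured with a map from event ID to the users who favorited it. This is a sketch under assumptions: the class, its methods, and the use of numeric IDs are hypothetical, and only favorites are modeled (recent views would follow the same pattern).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the real service merges rows through its repositories.
final class EventMergeSketch {

    // Event ID -> IDs of the users who favorited that event.
    private final Map<Long, Set<Long>> favoritesByEvent = new HashMap<>();

    void favorite(long eventId, long userId) {
        favoritesByEvent.computeIfAbsent(eventId, id -> new HashSet<>()).add(userId);
    }

    Set<Long> favoritesOf(long eventId) {
        return favoritesByEvent.getOrDefault(eventId, new HashSet<>());
    }

    // Moves every favorite of dropId onto keepId, then forgets dropId.
    // Using a set means a user who favorited both events ends up with one favorite.
    void merge(long keepId, long dropId) {
        Set<Long> moved = favoritesByEvent.remove(dropId);
        if (moved != null && !moved.isEmpty()) {
            favoritesByEvent.computeIfAbsent(keepId, id -> new HashSet<>()).addAll(moved);
        }
    }
}
```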
    • mergeUserFavoriteEvents

      @Transactional public void mergeUserFavoriteEvents(Event event1, Event event2)
      Merges the user favorite associations of event2 into event1.
      Parameters:
      event1 - The event to merge favorite associations into.
      event2 - The event to merge favorite associations from.
    • mergeUserRecentEventViews

      @Transactional public void mergeUserRecentEventViews(Event event1, Event event2)
      Merges the user recent-view associations of event2 into event1.
      Parameters:
      event1 - The event to merge recent view associations into.
      event2 - The event to merge recent view associations from.
    • deleteEventAndEventArtists

      @Transactional(propagation=REQUIRES_NEW) public void deleteEventAndEventArtists(Event event)
      Deletes an event and all of its associated artist-event relationships. This is used when merging duplicate events.
      Parameters:
      event - The event to delete.
    • processCity

      @Transactional public void processCity(CityResponse resp)
      Processes a city response from a scraper. This method updates the last-scraped timestamp for the city from the given data source.
      Parameters:
      resp - The city response to process.
    • processRymJsonArtist

      @Transactional public void processRymJsonArtist(RymJsonArtist rymArtist)
      Processes an artist's data from a Rate Your Music (RYM) JSON object. This method adds origin cities and genres to the artist based on the RYM data.
      Parameters:
      rymArtist - The RYM JSON artist data.