Package org.localify.data.ingest
Class ScraperIngest
java.lang.Object
org.localify.data.ingest.ScraperIngest
Service for ingesting scraped data from various sources.
This class handles the processing of artist, event, city, and venue data,
and persists it to the database. It also dispatches requests for more data
to be scraped.
-
Constructor Summary
Constructors
ScraperIngest(org.springframework.jdbc.core.JdbcTemplate jdbcTemplate, BandsInTownDispatch bandsInTownDispatch, PollstarDispatch pollstarDispatch, ArtistRepository artistRepository, GenreRepository genreRepository, EventRepository eventRepository, VenueRepository venueRepository, CityRepository cityRepository, ArtistEventRepository artistEventRepository, ArtistCityRepository artistCityRepository, ArtistGenreRepository artistGenreRepository, UserFavoriteEventRepository userFavoriteEventRepository, UserRecentEventViewRepository userRecentEventViewRepository, CityService cityService)
    Constructs a new ScraperIngest service.
Method Summary
Modifier and Type, Method, Description
checkForDuplicateEvents(EventResponse resp, Set<Artist> artists, Venue venue, Event event)
    Checks for duplicate events in the database.
createEvent(EventResponse resp, Venue venue, String topArtistName)
    Creates a new event.
void createOrUpdateEvent(...)
    Creates or updates an event based on the provided event response.
void deleteEventAndEventArtists(Event event)
    Deletes an event and all of its associated artist-event relationships.
void dispatchBitArtists(int numDispatch)
    Dispatches artists to be updated by the BandsInTown scraper.
void dispatchBitNotScrapedArtists(int numDispatch)
    Dispatches artists that have not yet been scraped by BandsInTown.
void dispatchVenueToScrapers(VenueResponse venueResponse, Venue venue)
    Dispatches a venue to the appropriate scrapers to fetch more data.
findOrCreateCity(String locationText)
    Finds or creates a city based on a location text string.
getArtistsForEvent(...)
    Retrieves the set of artists associated with an event.
getOrCreateGenreByName(String name, DataSource dataSource)
    Retrieves a genre by name, or creates a new one if it doesn't exist.
getOrCreateVenue(VenueResponse venueResp)
    Retrieves an existing venue from the database, or creates a new one if it doesn't exist.
mergeEventRelationships(Event event1, Event event2)
    Merges the relationships of two events.
void mergeUserFavoriteEvents(Event event1, Event event2)
    Merges the user favorite events from one event to another.
void mergeUserRecentEventViews(Event event1, Event event2)
    Merges the user recent event views from one event to another.
void processArtistResp(...)
    Processes an artist response from a scraper.
void processCity(CityResponse resp)
    Processes a city response from a scraper.
void processEvent(EventResponse resp)
    Processes an event response from a scraper.
void processRymJsonArtist(RymJsonArtist rymArtist)
    Processes an artist's data from a Rate Your Music (RYM) JSON object.
processStartTime(...)
    Processes the start time of an event from an event response.
searchForEventByArtistsAndVenueAndDate(String eventName, Set<Artist> artists, Venue venue, Instant startTime)
    Searches for an event by artists, venue, and a time window.
void updateEvent(EventResponse resp, Event event, Set<Artist> artists)
    Updates an existing event with information from an event response.
void updateEventArtists(Event event, Set<Artist> artists)
    Updates the artists associated with an event.
-
Constructor Details
-
ScraperIngest
public ScraperIngest(org.springframework.jdbc.core.JdbcTemplate jdbcTemplate, BandsInTownDispatch bandsInTownDispatch, PollstarDispatch pollstarDispatch, ArtistRepository artistRepository, GenreRepository genreRepository, EventRepository eventRepository, VenueRepository venueRepository, CityRepository cityRepository, ArtistEventRepository artistEventRepository, ArtistCityRepository artistCityRepository, ArtistGenreRepository artistGenreRepository, UserFavoriteEventRepository userFavoriteEventRepository, UserRecentEventViewRepository userRecentEventViewRepository, CityService cityService)
Constructs a new ScraperIngest service.
Parameters:
jdbcTemplate - The JDBC template for database operations.
bandsInTownDispatch - The dispatch service for BandsInTown.
pollstarDispatch - The dispatch service for Pollstar.
artistRepository - The repository for artist data.
genreRepository - The repository for genre data.
eventRepository - The repository for event data.
venueRepository - The repository for venue data.
cityRepository - The repository for city data.
artistEventRepository - The repository for artist-event relationships.
artistCityRepository - The repository for artist-city relationships.
artistGenreRepository - The repository for artist-genre relationships.
userFavoriteEventRepository - The repository for user favorite events.
userRecentEventViewRepository - The repository for user recent event views.
cityService - The service for city-related operations.
-
-
Method Details
-
processStartTime
Processes the start time of an event from an event response. For Pollstar data, it converts the local time to an Instant using the time zone of the venue's city. For other data sources, it parses the start time string directly.
Parameters:
resp - The event response containing the start time and other event details.
Returns:
The start time as an Instant, or null if it cannot be determined.
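The two conversion paths described for processStartTime can be sketched roughly as follows. This is an illustration only, not the ScraperIngest implementation: the class StartTimeSketch, the method toInstant, and the assumption that both inputs are ISO-8601 strings and that the city exposes an IANA zone ID are all hypothetical.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper; names and input formats are assumptions for illustration.
final class StartTimeSketch {
    private StartTimeSketch() {}

    /**
     * Converts a start time string to an Instant.
     * When zoneId is non-null, the text is read as a local date-time in that
     * zone (the Pollstar-style case). Otherwise it is parsed directly as an
     * ISO-8601 instant. Returns null when the value cannot be determined.
     */
    static Instant toInstant(String startTime, String zoneId) {
        if (startTime == null) {
            return null;
        }
        try {
            if (zoneId != null) {
                // Local wall-clock time plus the city's zone gives an absolute instant.
                return LocalDateTime.parse(startTime).atZone(ZoneId.of(zoneId)).toInstant();
            }
            return Instant.parse(startTime);
        } catch (DateTimeException e) {
            // Covers unparseable text as well as unknown zone IDs.
            return null;
        }
    }
}
```

Returning null instead of throwing mirrors the documented contract that the start time may be undeterminable.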
-
processArtistResp
Processes an artist response from a scraper. This method updates an artist's information, including their music service IDs, genres, and origin city. It also handles dispatching the artist to other scrapers for more information.
Parameters:
resp - The artist response to process.
-
getOrCreateGenreByName
Retrieves a genre by name, or creates a new one if it doesn't exist.
Parameters:
name - The name of the genre.
dataSource - The data source that provided the genre.
Returns:
The existing or newly created Genre object.
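The get-or-create pattern behind getOrCreateGenreByName can be illustrated with an in-memory stand-in. Everything here is an assumption made for the sketch: the class GenreRegistrySketch, the integer IDs, and the case-insensitive name matching are hypothetical, and the dataSource parameter and the real GenreRepository are left out.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for a genre store; not the real repository.
final class GenreRegistrySketch {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private int nextId = 1;

    /** Returns the ID for the genre name, allocating a new ID on first sight. */
    int getOrCreate(String name) {
        // Assumption: names are matched case-insensitively after trimming.
        String key = name.trim().toLowerCase(Locale.ROOT);
        return idsByName.computeIfAbsent(key, k -> nextId++);
    }

    /** Number of distinct genres stored so far. */
    int size() {
        return idsByName.size();
    }
}
```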
-
dispatchBitNotScrapedArtists
public void dispatchBitNotScrapedArtists(int numDispatch)
Dispatches artists that have not yet been scraped by BandsInTown.
Parameters:
numDispatch - The maximum number of artists to dispatch.
-
dispatchBitArtists
public void dispatchBitArtists(int numDispatch)
Dispatches artists to be updated by the BandsInTown scraper. This method selects artists that have not been updated in the last 7 days.
Parameters:
numDispatch - The maximum number of artists to dispatch.
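The 7-day staleness rule combined with the numDispatch cap might look like the sketch below. The class StaleArtistSketch, the method selectStale, and the map of artist name to last-update time are hypothetical; the real method presumably queries the database, and the order in which it picks artists is not documented here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical selection logic; the real service's query and ordering are unknown.
final class StaleArtistSketch {
    private StaleArtistSketch() {}

    /**
     * Returns up to numDispatch artist names whose last update is older than
     * 7 days relative to now, in the iteration order of the given map.
     */
    static List<String> selectStale(Map<String, Instant> lastUpdatedByArtist,
                                    Instant now, int numDispatch) {
        Instant cutoff = now.minus(Duration.ofDays(7));
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastUpdatedByArtist.entrySet()) {
            if (selected.size() >= numDispatch) {
                break; // cap reached
            }
            if (entry.getValue().isBefore(cutoff)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

Passing now as a parameter rather than calling Instant.now() inside the method keeps the cutoff deterministic, which makes the rule easy to test.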
-
getArtistsForEvent
Retrieves the set of artists associated with an event. The artists are identified using localify IDs, data-source-specific IDs, and candidate names from the event response.
Parameters:
resp - The event response containing artist information.
Returns:
A set of Artist objects associated with the event.
-
getOrCreateVenue
Retrieves an existing venue from the database, or creates a new one if it doesn't exist. It can also throw an exception if the venue cannot be created.
Parameters:
venueResp - The scraper data transfer object for the venue.
Returns:
The Venue object.
Throws:
BadScraperDataException - if the venue cannot be found or created based on the provided data.
-
createOrUpdateEvent
Creates or updates an event based on the provided event response.
Parameters:
resp - The event response containing the event details.
-
searchForEventByArtistsAndVenueAndDate
public Event searchForEventByArtistsAndVenueAndDate(String eventName, Set<Artist> artists, Venue venue, Instant startTime) throws BadDatabaseStateException
Searches for an event by artists, venue, and a time window. This method is used to find duplicate events.
Parameters:
eventName - The name of the event.
artists - The set of artists performing at the event.
venue - The venue of the event.
startTime - The start time of the event.
Returns:
The found Event, or null if no event is found.
Throws:
BadDatabaseStateException - if multiple matching events are found.
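A time-window duplicate search of this kind can be sketched over an in-memory list. The class EventWindowSketch, the StoredEvent type, the window parameter, and the rule that one shared artist counts as a match are all assumptions for illustration; the actual window size and matching criteria are not stated in this documentation, and IllegalStateException stands in for BadDatabaseStateException.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory version of a duplicate-event lookup.
final class EventWindowSketch {
    private EventWindowSketch() {}

    /** Minimal stand-in for a persisted event. */
    static final class StoredEvent {
        final String venue;
        final Set<String> artists;
        final Instant startTime;

        StoredEvent(String venue, Set<String> artists, Instant startTime) {
            this.venue = venue;
            this.artists = artists;
            this.startTime = startTime;
        }
    }

    /**
     * Returns the single stored event at the same venue that shares at least
     * one artist and starts within the window of startTime, or null if there
     * is none. Throws IllegalStateException when more than one event matches.
     */
    static StoredEvent findMatch(List<StoredEvent> stored, Set<String> artists,
                                 String venue, Instant startTime, Duration window) {
        List<StoredEvent> matches = new ArrayList<>();
        for (StoredEvent e : stored) {
            boolean sameVenue = e.venue.equals(venue);
            boolean sharesArtist = !Collections.disjoint(e.artists, artists);
            boolean inWindow =
                Duration.between(e.startTime, startTime).abs().compareTo(window) <= 0;
            if (sameVenue && sharesArtist && inWindow) {
                matches.add(e);
            }
        }
        if (matches.size() > 1) {
            throw new IllegalStateException("multiple matching events: " + matches.size());
        }
        return matches.isEmpty() ? null : matches.get(0);
    }
}
```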
-
updateEvent
Updates an existing event with information from an event response. This includes updating the start time, door time, end time, price, and ticket URL. It also updates the timestamp for the data source that provided the update.
Parameters:
resp - The event response containing the new information.
event - The event to be updated.
artists - The set of artists performing at the event.
-
updateEventArtists
Updates the artists associated with an event. It adds new artists to the event and ensures that existing artist associations are maintained.
Parameters:
event - The event to update.
artists - The set of artists to associate with the event.
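The add-without-removing behavior described for updateEventArtists amounts to a set union. A minimal sketch, assuming artists are represented by name and the event's current associations by a mutable set (the class EventArtistSketch and the method addMissing are hypothetical):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of adding artists while keeping existing ones.
final class EventArtistSketch {
    private EventArtistSketch() {}

    /**
     * Adds every incoming artist that is not yet associated with the event.
     * Existing associations are never removed. Returns the artists that were
     * newly added, in the iteration order of the incoming set.
     */
    static Set<String> addMissing(Set<String> existing, Set<String> incoming) {
        Set<String> added = new LinkedHashSet<>();
        for (String artist : incoming) {
            if (existing.add(artist)) { // add() returns false if already present
                added.add(artist);
            }
        }
        return added;
    }
}
```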
-
findOrCreateCity
Finds or creates a city based on a location text string. This method uses geocoding to find the city's details and then searches for it in the database. If the city is not found, a new one is created.
Parameters:
locationText - The text describing the location of the city.
Returns:
The found or created City object.
Throws:
BadScraperDataException - if the city cannot be geocoded or found.
-
processEvent
Processes an event response from a scraper. This method handles the creation or update of an event, its venue, and the associated artists. It also checks for duplicate events to avoid creating redundant data.
Parameters:
resp - The event response to process.
-
createEvent
Creates a new event.
Parameters:
resp - The event response containing the event details.
venue - The venue for the event.
topArtistName - The name of the top artist, used as a fallback for the event name.
Returns:
The newly created Event object, or null if the start time is not parseable.
-
dispatchVenueToScrapers
@Transactional
public void dispatchVenueToScrapers(VenueResponse venueResponse, Venue venue) throws BadScraperDataException
Dispatches a venue to the appropriate scrapers to fetch more data. It only dispatches if the venue has not been updated recently.
Parameters:
venueResponse - The venue response from the initial scrape.
venue - The venue object to be dispatched.
Throws:
BadScraperDataException - if there is an issue with the scraper data.
-
checkForDuplicateEvents
public Event checkForDuplicateEvents(EventResponse resp, Set<Artist> artists, Venue venue, Event event)
Checks for duplicate events in the database. If a duplicate is found, it merges the events.
Parameters:
resp - The event response being processed.
artists - The set of artists for the event.
venue - The venue for the event.
event - The current event object (can be null if it's a new event).
Returns:
The original event, a merged event, or null if a duplicate is found and the event should be skipped.
-
mergeEventRelationships
Merges the relationships of two events. This is used when a duplicate event is found. The user-related data from the second event is merged into the first, and the second event is deleted.
Parameters:
event1 - The event to merge into.
event2 - The event to merge from and then delete.
Returns:
The merged event (event1).
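The merge-then-delete flow can be pictured with user favorites held in a map from event ID to user IDs. This is a sketch under stated assumptions, not the repository-backed implementation: the class EventMergeSketch, the method mergeFavorites, and the in-memory map are hypothetical, and removing the source entry stands in for deleting the second event.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model: event ID -> IDs of users who favorited it.
final class EventMergeSketch {
    private EventMergeSketch() {}

    /**
     * Moves all favorites of fromEventId onto intoEventId and drops the
     * fromEventId entry. A user who favorited both events ends up with a
     * single favorite, because the target is a set.
     */
    static void mergeFavorites(Map<Long, Set<Long>> favoritesByEvent,
                               long intoEventId, long fromEventId) {
        Set<Long> moved = favoritesByEvent.remove(fromEventId);
        if (moved == null) {
            return; // nothing to merge
        }
        favoritesByEvent.computeIfAbsent(intoEventId, k -> new HashSet<>()).addAll(moved);
    }
}
```

Using a set for the target is what prevents duplicate favorites when the same user had marked both events.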
-
mergeUserFavoriteEvents
Merges the user favorite events from one event to another.
Parameters:
event1 - The event to merge favorite associations into.
event2 - The event to merge favorite associations from.
-
mergeUserRecentEventViews
Merges the user recent event views from one event to another.
Parameters:
event1 - The event to merge recent view associations into.
event2 - The event to merge recent view associations from.
-
deleteEventAndEventArtists
Deletes an event and all of its associated artist-event relationships. This is used when merging duplicate events.
Parameters:
event - The event to delete.
-
processCity
Processes a city response from a scraper. This method updates the last-scraped timestamp for the city from the given data source.
Parameters:
resp - The city response to process.
-
processRymJsonArtist
Processes an artist's data from a Rate Your Music (RYM) JSON object. This method adds origin cities and genres to the artist based on the RYM data.
Parameters:
rymArtist - The RYM JSON artist data.
-