When I prepare classes, I spend a significant amount of time (maybe 20-30% of the whole process) finding or constructing **perfect** language examples or images that I can use for my Japanese language class. A classic example is images for the unit on counters (e.g., 一枚、二枚、一冊、二個), where an instructor has to find or create images for different numbers of objects and different shapes. It is a lot harder than you might think to find an image that fits an exact description (try finding an image of 8 pieces of paper, for example).
So I became very curious about what AI-based image generation tools can offer and wanted to see how useful they could be in designing images, text, and audio for language instruction.
Starting with images, I have tried to create several different types of instructional materials with AI. Below are some example resources generated by AI and the prompts that I used to design these materials.
By 2023, OpenAI (DALL-E), Stability AI (Stable Diffusion), and Midjourney had all released text-to-image models. Each model had its own strengths (and weaknesses), but overall they successfully produced target objects (such as "books" and "apples") as photorealistic images. The generated images were not consistent, however, especially when I provided detailed prompts -- for example, the output was not accurate when I asked for "five apples and two bananas on the table."
Cost was another challenge. Generating images and videos with AI is usually significantly more expensive than generating text. For example, the latest models in 2025 (Imagen 4, etc.) cost about $0.20-$0.40 per image. If we need to generate dozens of images regularly, the cost becomes prohibitive for instructional use.
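To see whether image generation fits a course budget, the per-image prices above can be turned into a quick back-of-the-envelope estimate. This is a minimal sketch; the weekly volume and the 15-week semester are hypothetical values for illustration.

```python
# Back-of-the-envelope budget for AI-generated images, using the
# 2025 per-image price range quoted above ($0.20-$0.40). The weekly
# volume and the 15-week semester are assumptions for illustration.
def image_budget(images_per_week: int, price_per_image: float, weeks: int = 15) -> float:
    """Total cost in dollars over the given number of weeks."""
    return round(images_per_week * price_per_image * weeks, 2)

# 30 images a week at $0.30 each over one semester:
print(image_budget(30, 0.30))  # 135.0
```

Even at the low end of the range, a few dozen images per week adds up quickly over a semester, which is exactly the concern raised above.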
Much of the criticism of text-to-image technology has come from creative professionals such as artists, photographers, and videographers. While I personally have no objection to using AI-generated images, I agree with the need for ethical practices in using them (or AI in general). One suggested practice is to always disclose when an image is AI-generated. Unfortunately, there is no consensus yet on the ethical approach to AI-generated images: while some believe disclosure is good enough, others find any use of AI-generated images offensive. This is a major hurdle for anyone who intends to use AI for classroom instruction.
Below is the first set of images that I generated with Stability AI in 2023. The prompts were rather simple, since the output went wild when I provided too long a prompt. When I wanted an image of books, for example, the overall quality was better with a simple prompt such as "books" or "books on the desk" than with a long, detailed description. An example prompt for each image is also presented.
A realistic photo of following object for instructional materials. Simplify the image so that it highlights the object: [A SINGLE WORD TARGET (such as "teacher" and "apple")] [INFO] Generated image ....
Google's Imagen 3 was probably the most advanced AI image generation tool in 2024. Imagen 3 was particularly good at understanding instructions (prompts) written in natural language. It claims to be able to generate English text (such as a sign on a store), but it often produced errors, and it never produced correct Japanese text. The prompt must be in English -- if I used a Japanese word in the prompt, it often produced erratic images (you can see some of them in the sample images below).
The price for image generation became accessible with this model, in the range of $0.10-$0.20 per image. In late 2024, I generated many images that can be used for our Japanese language courses. The first set of images is for the vocabulary items in the Genki textbooks. Below are the prompts and the images that were generated.
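Such a vocabulary run can also be scripted. The sketch below shows one way to do it with the google-genai Python SDK; the call shape and model id reflect my understanding of that SDK's documented interface, not the tooling actually used for these images, and the helper names are hypothetical.

```python
# A sketch of scripting Imagen 3 through the google-genai SDK. The
# prompt template is the one used for the Genki vocabulary items; the
# SDK call shape and model id are assumptions from the SDK docs.
def vocab_prompt(expression: str) -> str:
    return ("Create a photorealistic image without any text that "
            "illustrates the following expression for learners of "
            "Japanese.: " + expression)

def generate_image(expression: str, api_key: str) -> bytes:
    from google import genai              # requires the `google-genai` package
    from google.genai import types
    client = genai.Client(api_key=api_key)
    result = client.models.generate_images(
        model="imagen-3.0-generate-002",  # Imagen 3 model id
        prompt=vocab_prompt(expression),
        config=types.GenerateImagesConfig(number_of_images=1),
    )
    return result.generated_images[0].image.image_bytes
```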
Create a photorealistic image without any text that illustrates the following expression for learners of Japanese.: teacher or professor. [INFO] Generated image ....
In late 2024, DeepSeek started to offer extremely affordable API access to its reasoning model (DeepSeek R1), which turned out to be extremely good at generating Japanese language learning resources (see below for more information). I tried using a combination of DeepSeek R1 and Imagen 3 to make more complex images. Some examples are below.
The process involved two steps: generating the example sentences and their English translations with DeepSeek R1, and then generating an image for each sentence with Imagen 3.
Since Imagen 3 does not perform very well with Japanese prompts, I had to create English translations of the Japanese practice sentences. I noticed that Imagen 3 did not generate any meaningful Japanese text, so I added an instruction not to use any text in the image (but the model still generated text sometimes). People from Asian backgrounds were also overrepresented in the images, so I added an instruction to diversify the people's backgrounds.
「～ませんか (e.g., コーヒーを飲みませんか。)」を使った例文を、日本語の教科書「げんき」を使ってニューヨークのラガーディアコミュニティーカレッジで日本語を学習している学生のために50個作ってください。「げんき」のCh.1-Ch.03までの単語を使うようにし、その範囲内の単語だけの例文を作るようにしてください。トピックはニューヨークに関する場所、イベント、文化を使うようにしてください。出力はデータのみにして、「日本語の例文; 英語の訳」形式で出して、解説はつけないでください。 コーヒーを飲みませんか。; Would you like to drink coffee? 映画を見ませんか。; Would you like to watch a movie? セントラルパークで散歩しませんか。; Would you like to take a walk in Central Park? 図書館で勉強しませんか。; Would you like to study at the library? ニューヨークの美術館へ行きませんか。; Would you like to go to a museum in New York? 昼ご飯を食べませんか。; Would you like to eat lunch? 紅茶を飲みませんか。; Would you like to drink tea? 友達に会いませんか。; Would you like to meet a friend? タイムズスクエアを見ませんか。; Would you like to see Times Square? 音楽を聞きませんか。; Would you like to listen to music? 本を読みませんか。; Would you like to read a book? 公園でピクニックをしませんか。; Would you like to have a picnic in the park? ニューヨークのパレードを見ませんか。; Would you like to watch a parade in New York? ... [snip]
A note about diverse backgrounds was added to the prompt because the skewed representation of people in Imagen 3 became a major concern (i.e., minorities were rarely represented in the images). When the target sentence included "Japanese," people from Asian backgrounds were overrepresented: for example, many Japanese-looking students appeared for the sentence "I go to the Japanese class," which does not make sense for a class in New York. AI-generated nudity also became an issue, and Imagen 3 later added a very strong filter that blocked a certain set of words such as "student," "young," and "girls." The prompts therefore had to be modified -- for example, I had to use "a classroom packed with people" instead of "a classroom packed with students."
A note about text/characters was also added, since Imagen 3 was never able to produce correct text in its images. About 50-70% of the English text was wrong, and almost none of the Japanese text was correct.
Make a realistic photo for the following sentence. Make sure to represent a diverse background. Do not include any text or written characters. Would you like to drink coffee? [INFO] Generated image ....
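The two-step process described above can be sketched as a small pipeline: parse DeepSeek's "Japanese sentence; English translation" lines, then build an Imagen 3 prompt from the English half only. The prompt template mirrors the instructions discussed above (no text in the image, diverse backgrounds); the helper names are mine, not part of any API.

```python
# A minimal sketch of the two-step DeepSeek + Imagen 3 pipeline
# described above. parse_pairs() and image_prompt() are hypothetical
# helper names.
def parse_pairs(deepseek_output: str) -> list[tuple[str, str]]:
    """Split 'Japanese sentence; English translation' lines into pairs."""
    pairs = []
    for line in deepseek_output.splitlines():
        if ";" in line:
            japanese, english = line.split(";", 1)
            pairs.append((japanese.strip(), english.strip()))
    return pairs

def image_prompt(english_sentence: str) -> str:
    """Build an Imagen 3 prompt from the English translation only."""
    return ("Make a realistic photo for the following sentence. "
            "Make sure to represent a diverse background. "
            "Do not include any text or written characters. "
            + english_sentence)

sample = "コーヒーを飲みませんか。; Would you like to drink coffee?"
for _, english in parse_pairs(sample):
    print(image_prompt(english))
```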
UNDER CONSTRUCTION
A realistic photo of following object for instructional materials. Simplify the image so that it highlights the object: [A SINGLE WORD TARGET (such as "teacher" and "apple")] [INFO] OUTPUT... ....
DeepSeek came to wide attention in the U.S. in 2025, offering a highly capable reasoning model at a significantly reduced cost. In my observation, DeepSeek performs particularly well on Japanese language learning topics, probably because the model was trained with a large number of Japanese language learning materials (DeepSeek was developed in China). For example, DeepSeek understands what kind of words and grammatical structures we should use just from an instruction like "Use Japanese appropriate for students who are studying chapter 3 of the Genki textbook."
Its reasoning model (R1) was extremely affordable (less than $0.01 per query), which made it possible to produce a large number of text-based materials. As an initial project, I generated a large number of example sentences that I can use in my class. The generated output was appropriate for the students' level, and its topics were diverse. To be honest, it generated much better example sentences than I do, since I tend to reuse the same topics and sentence patterns in my examples. Errors did happen, but they were rare (about 1 in 100-200 sentences).
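As a sketch, the generation can be scripted through DeepSeek's OpenAI-compatible API (the base URL and the "deepseek-reasoner" model name are DeepSeek's published values; the helper names are mine, and the full classroom prompt, shown below, is abbreviated here).

```python
# A sketch of requesting example sentences from DeepSeek R1 through its
# OpenAI-compatible API, then parsing the requested
# "Japanese sentence; English translation" output format.
def split_pairs(text: str) -> list[tuple[str, str]]:
    """Parse 'Japanese sentence; English translation' lines into pairs."""
    return [tuple(part.strip() for part in line.split(";", 1))
            for line in text.splitlines() if ";" in line]

def generate_sentences(prompt: str, api_key: str) -> list[tuple[str, str]]:
    from openai import OpenAI  # requires the `openai` package
    client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com")
    response = client.chat.completions.create(
        model="deepseek-reasoner",  # DeepSeek R1
        messages=[{"role": "user", "content": prompt}],
    )
    return split_pairs(response.choices[0].message.content)
```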
「～ませんか (e.g., コーヒーを飲みませんか。)」を使った例文を、日本語の教科書「げんき」を使ってニューヨークのラガーディアコミュニティーカレッジで日本語を学習している学生のために50個作ってください。「げんき」のCh.1-Ch.03までの単語を使うようにし、その範囲内の単語だけの例文を作るようにしてください。トピックはニューヨークに関する場所、イベント、文化を使うようにしてください。出力はデータのみにして、「日本語の例文; 英語の訳」形式で出して、解説はつけないでください。 コーヒーを飲みませんか。; Would you like to drink coffee? 映画を見ませんか。; Would you like to watch a movie? セントラルパークで散歩しませんか。; Would you like to take a walk in Central Park? 図書館で勉強しませんか。; Would you like to study at the library? ニューヨークの美術館へ行きませんか。; Would you like to go to a museum in New York? 昼ご飯を食べませんか。; Would you like to eat lunch? 紅茶を飲みませんか。; Would you like to drink tea? 友達に会いませんか。; Would you like to meet a friend? タイムズスクエアを見ませんか。; Would you like to see Times Square? 音楽を聞きませんか。; Would you like to listen to music? 本を読みませんか。; Would you like to read a book? 公園でピクニックをしませんか。; Would you like to have a picnic in the park? ニューヨークのパレードを見ませんか。; Would you like to watch a parade in New York? ケーキを食べませんか。; Would you like to eat cake? 水を飲みませんか。; Would you like to drink water? 映画館へ行きませんか。; Would you like to go to the movie theater? 学校で日本語を勉強しませんか。; Would you like to study Japanese at school? 自由の女神を見ませんか。; Would you like to see the Statue of Liberty? コーヒーショップで休みませんか。; Would you like to rest at a coffee shop? ニューヨークのコンサートへ行きませんか。; Would you like to go to a concert in New York? テレビを見ませんか。; Would you like to watch TV? 食事をしませんか。; Would you like to have a meal? 散歩しませんか。; Would you like to take a walk? ... [snip]
DeepSeek is particularly good at generating text for Japanese language learners (probably because DeepSeek, a Chinese LLM initiative, had access to a large amount of materials for Japanese language learners during training).
Here I tried to generate reading passages for each target grammar point as well as for each chapter of the Genki textbook.
「～ませんか (e.g., コーヒーを飲みませんか。)」を使った50語程度の読み物を、日本語の教科書「げんき」を使って勉強している日本語を学習している学生のために作ってください。「げんき」のCh.1-Ch.03の単語を使うようにし、その範囲内の単語だけを利用して読み物を作るようにしてください。読み物は、ニューヨークのトピック、特にニューヨークのラガーディアコミュニティーカレッジで日本語を勉強している学生が興味がありそうなトピック(アニメ、ゲーム、漫画など)を取り扱うようにして、5-6語程度のタイトルで始まるようにしてください。出力はデータのみにして、解説はつけないでください。 公園で本を読みませんか こんにちは。私は鈴木です。大学の学生です。友達の高橋さんがいます。今日、天気がいいです。高橋さんが「公園で本を読みませんか。」と言いました。私は「はい、読みましょう。」と言いました。公園へ行きます。公園は大きいです。木の下で本を読みます。本は面白いです。「あとで、お茶を飲みませんか。」高橋さんは「いいですね。」と言いました。楽しい午後です。
These are reading passages for each chapter of the Genki textbook. Beginning-level passages (Ch01-Ch05) tend to use many vocabulary items and structures that the chapters have not covered yet, but they can still be useful with some modifications/corrections. Passages for Chapter 6 and later have a good balance of target vocabulary items and structures and new items; the new items are often presented with sufficient context, so proficient readers will be able to make reasonable guesses.
日本語の教科書「げんき」のCh.3のトピック、文法項目、ボキャブラリーに沿った50語程度の読み物を、日本語の教科書「げんき」を使って勉強している日本語を学習している学生のために作ってください。「げんき」のCh.1-Ch.03の単語を使うようにし、その範囲内の単語だけを利用して読み物を作るようにしてください。読み物は、ニューヨークのトピック、特にニューヨークのラガーディアコミュニティーカレッジで日本語を勉強している学生が興味がありそうなトピック(アニメ、ゲーム、漫画など)を取り扱うようにして、5-6語程度のタイトルで始まるようにしてください。出力はデータのみにして、解説はつけないでください。 学校での一日の勉強 今日、私は朝、学校へ行きました。教室で友達と会いました。私たちは日本語の勉強をしました。先生が「教科書を読んでください。」と言いました。私は声を出して読みました。友達は「質問してもいいですか。」と聞きました。先生は「はい、どうぞ。」と答えました。昼休みに、食堂でご飯を食べました。午後、図書館へ行って、本を借りました。とても楽しい一日でした。
Task-based Language Learning (TBLL) is an extremely engaging and effective approach to language teaching. However, the major challenge in implementing TBLL is the amount of time and energy required to build meaningful tasks that are relevant to the learning objective (e.g., target grammar or vocabulary) and to students' lives.
It turned out that DeepSeek can generate reasonably good tasks. We can also make the tasks specific to students' environments by adjusting prompts.
In my experience, it works best to generate a few (probably 4-5) different tasks by adjusting the prompts -- at least one of them is usually good enough for actual classroom instruction.
「～ませんか (e.g., コーヒーを飲みませんか。)」を練習するtask-basedの課題を、日本語の教科書「げんき」を使って勉強している日本語を学習している学生のために作ってください。ニューヨークの大学や短大で勉強している学生のために、ニューヨークに関係するトピックを積極的に利用してください。「げんき」のCh.1-Ch.03までの単語を使うようにし、その範囲内の単語だけを利用するように心がけて下さい。タスクは、学生が実際に直面するような状況を設定し、その詳しいバックグラウンドやタスクで解決すべき問題について英語で詳しく述べて、その後に必要なボキャブラリーを提示し、指導のステップを3-5ステップで表示して下さい。出力はデータのみにして、日本語の単語や例文以外は英語で書いてもらって、タスクの名前、必要な教材、練習する言語スキル、必要な時間などを最初に書いて、タスクの詳細なステップを書いてください。解説はつけないでください。 ### Task-Based Activity: "Exploring NYC Together - Museum Invitation Challenge" **Task Name:** Weekend Museum Invitation Plan **Target Language Pattern:** ～ませんか (e.g., コーヒーを飲みませんか。) **Textbook Source:** Genki I, Chapters 1-3 **Target Learners:** Beginner Japanese students in New York colleges/community colleges **Materials Needed:** Whiteboard, Genki vocabulary lists (Ch.1-3), scenario cards, timer **Language Skills Practiced:** Speaking (invitations), listening (responses), vocabulary recall **Time Required:** 30-40 minutes ### Background Scenario (in English) You are a new student at a New York college and want to make friends while exploring the city. This weekend, you plan to visit the **American Museum of Natural History** (or a similar NYC landmark). Your task is to invite classmates to join you using Japanese. You must negotiate a day/time, suggest meeting points, and politely handle acceptances or refusals. The challenge: use only vocabulary and grammar from Genki Chapters 1-3.
### Key Vocabulary (from Genki Ch.1-3) **Nouns:** - コーヒー (coffee), おちゃ (tea), ばんごはん (dinner) - にちようび (Sunday), どようび (Saturday), あさ (morning), ごご (afternoon) - いま (now), あした (tomorrow), きょう (today) - せんせい (teacher), ともだち (friend), がくせい (student) - えいが (movie), ミュージアム (museum -- loanword allowed in Ch.3) **Verbs:** - いきます (go), みます (see/watch), たべます (eat), のみます (drink), します (do) - ねます (sleep), おきます (wake up), べんきょうします (study) **Expressions:** - いいですね (That's good!), ちょっと... (a little... [for hesitance]) - なんようび？ (What day?), なんじ？ (What time?) ### Task Steps 1. **Preparation (5 minutes)** - Review the vocabulary list above. Write down 2-3 invitation sentences using "～ませんか" (e.g., どようびにミュージアムへいきませんか。). - Brainstorm how to suggest meeting at a specific time/place (e.g., ここであいませんか。). 2. **Role-Play Practice (15 minutes)** - Pair up. One student is the "inviter," the other is the "classmate." - Inviter: Propose a museum visit using "～ませんか." Include day, time, and activity (e.g., にちようびのあさにミュージアムをみませんか。). - Classmate: Respond using Ch.1-3 vocabulary -- either accept (いいですね！) or politely decline (ちょっと...べんきょうします). - Swap roles and repeat with a different day/activity. 3. **Expansion and Negotiation (10 minutes)** - In pairs, plan details: - Decide where to meet (e.g., きっさてんで、パークで). - Suggest an after-museum activity (e.g., コーヒーをのみませんか。). - Use "なんじ？" or "なんようび？" to ask questions. 4. **Group Sharing (5 minutes)** - Each pair shares their final plan with the class in simple Japanese (e.g., どようびのあさにミュージアムへいきます。そして、コーヒーをのみます。). 5. **Optional Challenge** - If time allows, try inviting a "busy friend" (played by the teacher) who declines twice before accepting -- practice persistent but polite invitations.
Projects are extremely useful as student-centered homework and authentic assessment. Like tasks, however, it takes a considerable amount of time and energy to design effective projects.
It turned out that DeepSeek can generate reasonably good projects as well. As with tasks, it works best to generate a few (probably 4-5) different projects by adjusting the prompts; at least one of them is usually good enough for actual classroom instruction.
「～ませんか (e.g., コーヒーを飲みませんか。)」を練習するプロジェクトを、日本語の教科書「げんき」を使って勉強している日本語を学習している学生のために作ってください。「げんき」のCh.1-Ch.03までの単語を使うようにし、その範囲内の単語だけを使うようにしてください。プロジェクトは、だいたい1時間程度で終わる範囲で、ニューヨークにある日本に関わるお店や団体、日本語が使われている場所などを利用するようにしてください。場所は日本のウェブサイトなどのオンラインでもいいです。プロジェクトは、提出できる「deliverable」を明確にするようにしてください。出力はデータのみにして、日本語の単語や例文以外は英語で書いてもらって、プロジェクトの名前、必要な教材、練習する言語スキル、必要な時間などを最初に書いて、プロジェクトの詳細なステップを書いてください。解説はつけないでください。 **Project Name:** "NYCでさそう！～ませんか Invitation Project" **Materials Needed:** - "Genki I" Textbook (Ch.1-3 vocabulary reference) - Internet access (for website research) - Worksheet (printed or digital) - Pen/Notebook or digital document **Language Skills Practiced:** - Grammar: ～ませんか (informal invitations) - Vocabulary: Genki Ch.1-3 only - Reading: Simple website navigation - Writing: Forming invitation sentences **Time Required:** Approximately 60 minutes --- **Detailed Project Steps:** 1. **Preparation (5 minutes)** - Review the "～ませんか" grammar point from Genki. - Review vocabulary lists from Genki Chapters 1, 2, and 3. - Open a web browser. 2. **Research Phase (25 minutes)** - Visit the following real websites based in or related to New York: a. **Mitsuwa Marketplace (Edgewater, NJ - near NYC)** website: Look for sections like レストラン (restaurant), カフェ (cafe), 本 (books), 食べ物 (food). b. **Japan Society (NYC)** website: Look for イベント (events), ビデオ (videos), クラス (classes). c. **Kinokuniya Bookstore (NYC)** website: Look for 本, 雑誌 (magazines), 文房具 (stationery). - Using **only vocabulary from Genki Ch.1-3**, identify 5 items/activities you find. Write them down in Japanese. *Example Finds: コーヒー、おちゃ、えいが、ほん、パーティー、みせ、サンドイッチ、ミーティング* 3. 
**Sentence Creation Phase (20 minutes)** - Using your 5 found items/activities, create 5 invitation sentences using the "～ませんか" pattern. - Each sentence must use **only vocabulary and numbers from Genki Ch.1-3**. *Example Sentences:* 1. あした、コーヒーを飲みませんか。 2. きょう、ほんを読みませんか。 3. 木よう日に、えいがを見ませんか。 4. ここで、サンドイッチを食べませんか。 5. にちよう日に、パーティーへ行きませんか。 4. **Deliverable Creation & Submission (10 minutes)** - Fill out the final worksheet with: A. The 5 items/activities you found (in Japanese). B. Your 5 original invitation sentences. C. One chosen invitation (from your 5 sentences) written neatly or typed for submission. - Submit your final chosen invitation sentence and the name of the website you used most to your instructor via email or the provided Google Form. **Deliverable Format for Submission:** - Student Name: - Website Used Most: - My Invitation: [One complete sentence using ～ませんか]
Although often overlooked, OpenAI's audio-generation -- or text-to-speech (tts) -- technology is a major asset. Before OpenAI's tts models, numerous commercial and academic projects produced synthetic speech systems, but they all had noticeable accents that could alienate audiences. OpenAI applied large language model technology to speech generation, and the result outperformed previous systems that attempted to emulate human phonetic and phonological patterns. (Whisper, OpenAI's better-known audio model, handles the reverse task of speech-to-text.)
OpenAI first announced the tts-1 model in 2023 and later superseded it with the gpt-4o-mini-tts model in 2025. As of this writing, the latest model remains gpt-4o-mini-tts (gpt-4o-mini-tts-2025-12-15).
OpenAI's gpt-4o-mini-tts offers 13 different speakers (voices) and supports audio instructions, with which you can make minor adjustments to the speech patterns. Below is an example command that generates a speech sample for each voice.
for speaker in "alloy" "ash" "ballad" "cedar" "coral" "echo" "fable" "marin" "nova" "onyx" "sage" "shimmer" "verse"; do generate_speech.py --audioModel gpt-4o-mini-tts-2025-12-15 --audioSpeaker $speaker --audioInstructions "Accent: warm, refined, and gently instructive, reminiscent of a friendly instructor. Tone: Calm, encouraging, and articulate. Pacing: Deliberate, pausing often to allow the listener to follow instructions comfortably. Emotion: Cheerful, supportive, and pleasantly enthusiastic" --text "The quick brown fox jumps over the lazy dog. one, two, three, four, five, six, seven, eight, nine, ten. Sally sells seashells by the seashore. Six sleek swans swam swiftly south."; done [INFO] Audio has been generated... ....
Although these models are not specifically trained for Japanese, they tend to do very well with Japanese speech. Below is a sample with Japanese text.
generate_speech.py "次の日本語の文章を音声に変更してください" --audioInstructions "Speak in Japanese. Accent: warm, refined, and gently instructive, reminiscent of a friendly instructor. Tone: Calm, encouraging, and articulate. Pacing: Natural speed. Make sure to pause for one or two seconds at the end of each sentence. Emotion: Cheerful, supportive, and pleasantly enthusiastic" --audioModel gpt-4o-mini-tts-2025-12-15 --audioSpeaker echo "音声チェックです。ただいまマイクのテストを行っています。いち、に、さん、し、ご、ろく、なな、はち、きゅう、じゅう。生麦、生米、生卵。隣の客はよく柿食う客だ。ただいま、自然な速さと一定の音量で読んでいます。あいうえお、かきくけこ、さしすせそ。" [INFO] Audio has been generated... ....
The cost for OpenAI's tts is extremely affordable (about $0.01 per minute), which allows us to generate a large number of audio files as instructional materials.
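A batch run can also be scripted directly against the API. This is a sketch rather than the generate_speech.py wrapper used above; the client call shape reflects my understanding of the OpenAI Python SDK, and the characters-per-minute rate in the cost helper is a rough assumption.

```python
# A sketch of batch audio generation with OpenAI's speech endpoint,
# plus a rough cost estimate based on the ~$0.01/min figure above.
# The 400-characters-per-minute reading rate is an assumption.
def estimate_cost(sentences: list[str], chars_per_minute: int = 400,
                  price_per_minute: float = 0.01) -> float:
    total_chars = sum(len(s) for s in sentences)
    return round(total_chars / chars_per_minute * price_per_minute, 4)

def synthesize_all(sentences: list[str], api_key: str) -> None:
    from openai import OpenAI  # requires the `openai` package
    client = OpenAI(api_key=api_key)
    for i, text in enumerate(sentences):
        audio = client.audio.speech.create(
            model="gpt-4o-mini-tts", voice="echo", input=text,
            instructions="Speak in Japanese, calmly, at a natural speed.",
        )
        audio.write_to_file(f"sentence_{i:03d}.mp3")  # one file per sentence
```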
generate_speech.py "次の日本語の文章を音声に変更してください" --audioInstructions "Speak in Japanese. Accent: warm, refined, and gently instructive, reminiscent of a friendly instructor. Tone: Calm, encouraging, and articulate. Pacing: Natural speed. Make sure to pause for one or two seconds at the end of each sentence. Emotion: Cheerful, supportive, and pleasantly enthusiastic" --audioModel gpt-4o-mini-tts-2025-12-15 --audioSpeaker echo "コーヒーを飲みませんか" "映画を見ませんか" "セントラルパークで写真を撮りませんか" "図書館で勉強しませんか" "メトロポリタン美術館へ行きませんか" ... [snip] [INFO] Audio has been generated... ....
Eleven Labs specializes in AI-based speech synthesis and other audio-based services (such as transcription). In terms of quality, it probably surpasses speech recorded by non-professional humans -- the audio emulates human speech almost perfectly and does not have any unintended interruptions such as external noise or coughing. You have fine-grained control over the synthesis and can add various extra-speech features by using tags such as [surprised] and [uninterested]. There are over 1,000 speakers to choose from, and you can also use your own voice as a speaker by uploading a short (2-3 min) speech sample.
Eleven Labs is probably the best speech synthesis service (a lot better than OpenAI's tts), but it costs a lot more than other services. On average, Eleven Labs costs about $0.20-$0.30 per minute, while OpenAI's tts costs about $0.01-$0.02 per minute.
Eleven Labs offers a free subscription, which comes with some credits (sufficient for 10-15 min of speech synthesis).
Below are speech samples from Eleven Labs. With over 1,000 voices and the option to use your own, these are just a fraction of the speech samples that you can generate.
generate_speech.py --model-id "eleven_v3" --voice-id Ellen "音声チェックです。ただいまマイクのテストを行っています。いち、に、さん、し、ご、ろく、なな、はち、きゅう、じゅう。生麦、生米、生卵。隣の客はよく柿食う客だ。ただいま、自然な速さと一定の音量で読んでいます。あいうえお、かきくけこ、さしすせそ。" [INFO] Audio has been generated... ....
I generated the audio for the sample sentences again with Eleven Labs (see the OpenAI tts samples above). The audio quality is significantly better than OpenAI's gpt-4o-mini-tts.
generate_speech.py --voice-id "Yui" --model-id "eleven_v3" "コーヒーを飲みませんか" "映画を見ませんか" "セントラルパークで写真を撮りませんか" "図書館で勉強しませんか" "メトロポリタン美術館へ行きませんか" ... [snip] [INFO] Audio has been generated... ....
Eleven Labs offers a great deal of control over speech synthesis, and you can also synthesize a dialogue among multiple people (voices). Below is an example of dialogue speech synthesis.
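Before a dialogue can be synthesized, each scripted line has to be paired with a voice. The sketch below shows that bookkeeping for the "Role:Voice" map format used by the CLI example below; both helper functions are hypothetical, not part of the Eleven Labs SDK.

```python
# A sketch of pairing a "Role: line" dialogue script with a
# "Role:Voice, Role:Voice" map. Both helpers are hypothetical names,
# not part of any SDK.
def parse_voice_map(arg: str) -> dict[str, str]:
    """Parse 'Male1:Akira, Female1:Shizuka' into {role: voice}."""
    pairs = (item.split(":", 1) for item in arg.split(","))
    return {role.strip(): voice.strip() for role, voice in pairs}

def assign_voices(script: str, voices: dict[str, str]) -> list[tuple[str, str]]:
    """Attach the mapped voice to each 'Role: line' of the script."""
    turns = []
    for line in script.splitlines():
        role, _, text = line.partition(":")
        if role.strip() in voices and text.strip():
            turns.append((voices[role.strip()], text.strip()))
    return turns
```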
generate_speech.py --model-id "eleven_v3" --dialogue-voice-map "Male1:Akira, Female1:Shizuka, Male2:Austin, Female2:Ellen" Female1: [warmly] こんにちは。あきらさん、お元気ですか。 Male1: はい、元気です。しずかさん、こちらは友達のマイクさんとサラさんです。 Female2: [curious] はじめまして。サラです。アメリカ人です。学生です。 Male2: はじめまして。マイクです。アメリカ人です。よろしくお願いします。 Female1: 私はしずかです。日本人です。大学で日本語を勉強しています。 Male1: 私はあきらです。日本人です。学生です。 Female2: [thoughtful] あのう、日本語の本を読みませんか。 Male2: [surprised] 本ですか？いいですね。 Female1: [warmly] いいですね。一緒に図書館へ行きませんか。 Male1: ええ、行きましょう。 Female2: [laugh] ありがとうございます。 Male2: [curious] 図書館はどこですか。 Female1: 大学の近くです。 Male1: [warmly] じゃあ、一緒に行きましょう。 Female2: はい、お願いします。 [INFO] Audio has been generated... ....
Last update: